## Rational Kernels: Theory and Algorithms

**Corinna Cortes, Patrick Haffner, Mehryar Mohri**; 5(Aug):1035--1062, 2004.

### Abstract

Many classification algorithms were originally designed for fixed-size vectors. However, recent applications in text and speech processing and computational biology require the analysis of variable-length sequences and, more generally, weighted automata. An approach widely used in statistical learning techniques such as Support Vector Machines (SVMs) is that of kernel methods, due to their computational efficiency in high-dimensional feature spaces. We introduce a general family of kernels based on weighted transducers or rational relations, *rational kernels*, that extend kernel methods to the analysis of variable-length sequences or, more generally, weighted automata. We show that rational kernels can be computed efficiently using a general algorithm of composition of weighted transducers and a general single-source shortest-distance algorithm.

Not all rational kernels are *positive definite and symmetric*
(PDS), or equivalently verify the Mercer condition, a condition that
guarantees the convergence of training for discriminant classification
algorithms such as SVMs. We present several theoretical results
related to PDS rational kernels. We show that under some general
conditions these kernels are closed under sum, product, or
Kleene-closure and give a general method for constructing a PDS
rational kernel from an arbitrary transducer defined on some
non-idempotent semirings. We give the proof of several
characterization results that can be used to guide the design of PDS
rational kernels. We also show that some commonly used string kernels
or similarity measures such as the edit-distance, the convolution
kernels of Haussler, and some string kernels used in the context of
computational biology are specific instances of rational kernels. Our
results include the proof that the edit-distance over a non-trivial
alphabet is not *negative definite*, which, to the best of our
knowledge, was never stated or proved before.
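
As a rough illustration of a PDS sequence kernel, the sketch below computes an n-gram count kernel, K(x, y) = Σ_z count_x(z) · count_y(z), by direct counting. It is not the transducer-composition and shortest-distance algorithm described in the abstract; it only evaluates, on plain strings, the kind of inner-product value such a kernel would return. The function names `ngram_counts` and `ngram_kernel` and the sample strings are assumptions made for this example.

```python
from collections import Counter


def ngram_counts(s, n):
    """Count every contiguous n-gram of s (hypothetical helper)."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_kernel(x, y, n=2):
    """Inner product of the n-gram count vectors of x and y."""
    cx, cy = ngram_counts(x, n), ngram_counts(y, n)
    # Counter returns 0 for n-grams missing from y, so they drop out.
    return sum(c * cy[z] for z, c in cx.items())


# Gram matrix over a few illustrative strings (assumed sample data).
words = ["abab", "abba", "bbbb"]
gram = [[ngram_kernel(u, v) for v in words] for u in words]
```

Because each entry is an inner product of explicit count vectors, the Gram matrix is symmetric and positive semi-definite by construction, which is the property the PDS results above are concerned with.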

Rational kernels can be combined with SVMs to form efficient and powerful techniques for a variety of classification tasks in text and speech processing, or computational biology. We describe examples of general families of PDS rational kernels that are useful in many of these applications and report the results of our experiments illustrating the use of rational kernels in several difficult large-vocabulary spoken-dialog classification tasks based on deployed spoken-dialog systems. Our results show that rational kernels are easy to design and implement and lead to substantial improvements in classification accuracy.