## Probability Product Kernels

*Tony Jebara, Risi Kondor, Andrew Howard*; 5(Jul):819-844, 2004.

### Abstract

The advantages of discriminative learning algorithms and kernel
machines are combined with generative modeling using a novel kernel
between distributions. In the probability product kernel, data points
in the input space are mapped to distributions over the sample space
and a general inner product is then evaluated as the integral of the
product of pairs of distributions. The kernel is straightforward to
evaluate for all exponential family models such as multinomials and
Gaussians and yields interesting nonlinear kernels. Furthermore, the
kernel is computable in closed form for latent distributions such as
mixture models, hidden Markov models and linear dynamical systems. For
intractable models, such as switching linear dynamical systems,
structured mean-field approximations can be brought to bear on the
kernel evaluation. For general distributions, even if an analytic
expression for the kernel is not feasible, we show a straightforward
sampling method to evaluate it. Thus, the kernel permits
discriminative learning methods, including support vector machines, to
exploit the properties, metrics and invariances of the generative
models we infer from each datum. Experiments are shown using
multinomial models for text, hidden Markov models for biological
data sets and linear dynamical systems for time series data.
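
As a rough illustration of the "integral of the product of pairs of distributions" described above, the following sketch evaluates the kernel for two simple cases and checks a sampling estimate against a closed form. It assumes the general form K(p, q) = ∫ p(x)^ρ q(x)^ρ dx with an exponent ρ, which the abstract itself does not state; the function names (`multinomial_ppk`, `gaussian_ppk_1d`, `sampled_ppk_1d`) and the parameter `rho` are hypothetical and chosen only for this example. It is not the authors' implementation.

```python
import numpy as np


def multinomial_ppk(p, q, rho=0.5):
    """Sum over outcomes of p_i^rho * q_i^rho for two discrete distributions.

    Assumption: rho = 0.5 gives a Bhattacharyya-style kernel, rho = 1 gives
    the plain inner product of the two probability vectors.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum((p ** rho) * (q ** rho)))


def gaussian_ppk_1d(mu1, var1, mu2, var2):
    """Closed form for rho = 1 with two 1-D Gaussians.

    Uses the standard identity
    integral N(x; mu1, var1) N(x; mu2, var2) dx = N(mu1; mu2, var1 + var2).
    """
    s = var1 + var2
    return float(np.exp(-0.5 * (mu1 - mu2) ** 2 / s) / np.sqrt(2 * np.pi * s))


def sampled_ppk_1d(mu1, var1, mu2, var2, n=200_000, seed=0):
    """Monte Carlo estimate for rho = 1: average q(x) over samples x ~ p."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu1, np.sqrt(var1), size=n)
    qx = np.exp(-0.5 * (x - mu2) ** 2 / var2) / np.sqrt(2 * np.pi * var2)
    return float(qx.mean())


# Word-frequency style example: two normalized count vectors.
p = np.array([3, 1, 0, 2], dtype=float)
q = np.array([1, 1, 2, 2], dtype=float)
p /= p.sum()
q /= q.sum()
k_text = multinomial_ppk(p, q, rho=0.5)

# Gaussian example: closed form versus sampling.
k_exact = gaussian_ppk_1d(0.0, 1.0, 1.0, 1.0)
k_sampled = sampled_ppk_1d(0.0, 1.0, 1.0, 1.0)
```

In this sketch each datum would first be mapped to a fitted distribution (here, a normalized count vector or a Gaussian), and the resulting kernel values could then fill a Gram matrix for a kernel machine such as a support vector machine. The sampling estimate is only a sanity check for the ρ = 1 case and is not necessarily the sampling scheme used in the paper.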
