## Tracking the Best Linear Predictor

** Mark Herbster, Manfred K. Warmuth**;
1(Sep):281-309, 2001.

### Abstract

In most on-line learning research the total on-line loss of the algorithm is compared to the total loss of the best off-line predictor*u*from a comparison class of predictors. We call such bounds

*static bounds*. The interesting feature of these bounds is that they hold for an arbitrary sequence of examples. Recently some work has been done where the predictor

*u*at each trial

_{t}*t*is allowed to change with time, and the total on-line loss of the algorithm is compared to the sum of the losses of

*u*at each trial plus the total "cost" for shifting to successive predictors. This is to model situations in which the examples change over time, and different predictors from the comparison class are best for different segments of the sequence of examples. We call such bounds

_{t}*shifting bounds*. They hold for arbitrary sequences of examples and arbitrary sequences of predictors.

Naturally shifting bounds are much harder to prove. The only known
bounds are for the case when the comparison class consists of a
sequences of experts or boolean disjunctions. In this paper we develop
the methodology for lifting known static bounds to the shifting case.
In particular we obtain bounds when the comparison class consists of
linear neurons (linear combinations of experts). Our essential
technique is to *project* the hypothesis of the static algorithm
at the end of each trial into a suitably chosen convex region. This
keeps the hypothesis of the algorithm well-behaved and the static
bounds can be converted to shifting bounds.