Comparisons with Other Algorithms

Recently, many authors have proposed decomposition algorithms for
regression. For instance, Shevade *et al.* [13] proposed two modifications of
the *SMO* algorithm of Platt [12] for regression, based on a previous paper
from the same team [7] on classification problems.
Laskov [8] also proposed a decomposition method for
regression problems which is very similar to the second modification
proposed by Shevade *et al.*
In fact, it is easy to see that Laskov's method with a subproblem of size 2
uses the same selection algorithm and the same termination criterion
as Shevade *et al.*

Their method for selecting the working set is *very* similar to the one
we present in this paper, but while we propose to select the variables
independently of their counterparts ($\alpha_i$ is selected independently
of $\alpha_i^*$, and vice versa), they propose to select *pairs* of
variables $(\alpha_i, \alpha_i^*)$ simultaneously. Even if this
seems to be a small difference, let us note that $\alpha_i \alpha_i^* = 0$
holds at the optimal solution as well as during the optimization process,
as proved by Lin [9] in the case of algorithms such as the one we
propose here.^{4}
Thus, one of the two variables $\alpha_i$ or $\alpha_i^*$ is
always equal to 0, and choosing the $\alpha_i$ and $\alpha_i^*$ independently
can thus help to quickly eliminate many variables, thanks to
the *shrinking* phase,^{5} which, of
course, has a direct impact on the speed of our algorithm. In fact,
working with pairs of variables would force the algorithm to do many
computations with null variables until the end of the optimization process.
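
To make the mechanism concrete, here is a minimal sketch (not the paper's
implementation) of how treating the $2n$ regression variables as one flat
vector lets the shrinking and selection steps ignore pairing entirely. The
function names, the tolerance `tol`, and the simplified KKT tests (the
equality constraint of the dual is omitted for brevity) are our illustrative
assumptions:

```python
import numpy as np

def shrink_null_variables(alpha_flat, grad_flat, active, C, tol=1e-12):
    """Drop variables stuck at a bound whose gradient keeps them there
    (a common shrinking heuristic; the exact test is illustrative).
    Since alpha_i * alpha_i^* = 0 holds throughout the optimization,
    one member of every pair is null and can often be removed early,
    without waiting for its counterpart."""
    at_lower = (alpha_flat <= tol) & (grad_flat > 0.0)       # wants to stay at 0
    at_upper = (alpha_flat >= C - tol) & (grad_flat < 0.0)   # wants to stay at C
    return active & ~(at_lower | at_upper)

def select_working_variable(alpha_flat, grad_flat, active, C, tol=1e-12):
    """Pick the active variable with the largest KKT violation,
    with no notion of pairs whatsoever."""
    viol = np.where(
        (alpha_flat > tol) & (alpha_flat < C - tol),
        np.abs(grad_flat),                       # free: gradient should vanish
        np.where(alpha_flat <= tol,
                 np.maximum(-grad_flat, 0.0),    # at 0: gradient should be >= 0
                 np.maximum(grad_flat, 0.0)))    # at C: gradient should be <= 0
    return int(np.argmax(np.where(active, viol, -np.inf)))
```

A pairwise scheme would instead have to keep both members of a pair active
as long as either one is, which is precisely the extra bookkeeping on null
variables described above.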

Similarly, Smola and Schölkopf [14] had earlier proposed a
decomposition algorithm for regression based on *SMO*, using an analytical
solution for the subproblems, but again they proposed to select two pairs of
variables (two $\alpha_i$ and their corresponding $\alpha_i^*$) instead of
two variables as we propose in this paper.

Finally, Flake and Lawrence [3] proposed a modification of
*SMO* for regression that uses the heuristics proposed by Platt [12]
as well as those from Smola and Schölkopf [14], but their modification
uses a new variable $\beta_i = \alpha_i - \alpha_i^*$. Once again, this
forces the use of the pairs $(\alpha_i, \alpha_i^*)$ during the computations.
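
To see why, note that under the complementarity condition the single
variable determines both members of the pair. A short derivation in our own
notation (assuming the standard $\varepsilon$-insensitive dual):

$$
\beta_i = \alpha_i - \alpha_i^*, \qquad \alpha_i \alpha_i^* = 0
\quad\Longrightarrow\quad
\alpha_i = \max(\beta_i, 0), \qquad \alpha_i^* = \max(-\beta_i, 0),
$$

so every update of $\beta_i$ necessarily manipulates the pair
$(\alpha_i, \alpha_i^*)$ as a whole, and neither member can be shrunk on
its own.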

The originality of our algorithm is thus to select the variables
$\alpha_i$ and $\alpha_i^*$ *independently*, which has the side
effect of allowing an efficient adaptation of the shrinking step proposed
for classification by Joachims [5] (it is indeed less easy to devise an
efficient shrinking method in
the context of pairs of variables). This also helps to simplify the resolution
of the subproblem in the case where *q*=2. Finally, experiments given
in section 3 suggest that this idea leads
to faster convergence times.
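
As an illustration of why independent variables keep the size-2 subproblem
simple, here is a minimal analytic step written under our own assumptions: a
dual of the form $\min_\lambda \frac{1}{2}\lambda^\top Q \lambda + p^\top
\lambda$ over the $2n$ flat variables, with signs $s_k = \pm 1$ encoding
whether $\lambda_k$ stands for an $\alpha$ or an $\alpha^*$, and the equality
constraint $\sum_k s_k \lambda_k = \text{const}$. All names and interfaces
are ours, not the paper's:

```python
import numpy as np

def solve_size2_subproblem(lam, g, Q, s, i, j, C, eps=1e-12):
    """One analytic step for a working set {i, j} of two independent
    variables (lam, g, s: 1-D NumPy arrays; Q: kernel matrix).
    Moving along d_i = s[i]*u, d_j = -s[j]*u preserves the equality
    constraint, so the subproblem is one-dimensional."""
    # Curvature of the objective along the feasible direction.
    eta = max(Q[i, i] + Q[j, j] - 2.0 * s[i] * s[j] * Q[i, j], eps)
    # Unconstrained minimizer along u.
    u = -(s[i] * g[i] - s[j] * g[j]) / eta
    # Clip u so that both variables stay inside the box [0, C].
    lo_i, hi_i = (-lam[i], C - lam[i]) if s[i] > 0 else (lam[i] - C, lam[i])
    lo_j, hi_j = (lam[j] - C, lam[j]) if s[j] > 0 else (-lam[j], C - lam[j])
    u = min(max(u, max(lo_i, lo_j)), min(hi_i, hi_j))
    lam[i] += s[i] * u
    lam[j] -= s[j] * u
    return u  # the caller updates the gradient g accordingly
```

Because each variable carries its own box constraint and nothing else ties
it to its counterpart, the step reduces to a single clipped line search; a
pairwise formulation would have to move up to four variables at once.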