Shrinking

Next: Termination Criterion Up: The Decomposition Algorithm Previous: Solving the Subproblem

Shrinking

The idea of shrinking is to remove some variables whose values have been equal to the bounds 0 or C for a long time, and that will probably not change anymore. To do this, we use the fact that $(\boldsymbol{\alpha},\, \boldsymbol{\alpha^{\star}})$ minimizes the problem (3) under the constraints (4) and (5) if and only if there exists numbers $\boldsymbol{\lambda}^{up} \in \mathbb{R} ^{2l}$ , $\boldsymbol{\lambda}^{low} \in \mathbb{R} ^{2l}$ , $\lambda^{eq} \in \mathbb{R}$ that verify the following KKT conditions:

$\begin{displaymath}\mathcal{W}^{'}(\boldsymbol{\alpha},\, \boldsymbol{\alpha}^{\... ...ol{\lambda}^{low} + \boldsymbol{\lambda}^{up} = \boldsymbol{0} \end{displaymath}$

(24)

$\begin{displaymath}\lambda^{low}_i \, \alpha_i = 0, \ \ i = 1\ldots l \ \ \ \tex... ...da^{low}_i \, \alpha^{\star}_{i-l} = 0, \ \ i = (l+1)\ldots 2l \end{displaymath}$

(25)

$\begin{displaymath}\lambda^{up}_i \left( \alpha_i - C \right) = 0, \ \ i = 1\ldo... ...( \alpha^{\star}_{i-l} - C \right) = 0, \ \ i = (l+1)\ldots 2l \end{displaymath}$

(26)

$\begin{displaymath}\boldsymbol{\lambda}^{low} \geq \boldsymbol{0} \end{displaymath}$

(27)

$\begin{displaymath}\boldsymbol{\lambda}^{up} \geq \boldsymbol{0} \end{displaymath}$

(28)

$\begin{displaymath}(\boldsymbol{\alpha}-\boldsymbol{\alpha}^{\star})^{\raisebox{2pt}{\tiny T}} \, \boldsymbol{1} = 0 \end{displaymath}$

(29)

$\begin{displaymath}\boldsymbol{0} \leq \boldsymbol{\alpha}^{\star}, \ \boldsymbol{\alpha} \leq \boldsymbol{C}. \end{displaymath}$

(30)

Note that if $\lambda^{low}_i > 0$ , then the corresponding variable is equal to 0. Also, if $\lambda^{up}_i > 0$ , then the corresponding variable is equal to C. The idea is thus to search at each iteration for variables $\boldsymbol{\lambda}^{up}$ , $\boldsymbol{\lambda}^{low}$ and $\lambda^{eq}$ that verify as well as possible³ the equations (24)-(28), and to remove a variable whose value is equal to 0 if its coefficient $\lambda^{low}_i$ is strictly positive (or just above a constant $\epsilon_{shrink}$ ) during a given number of iterations. Using the same idea, we also eliminate a variable whose value is equal to Cif its coefficient $\lambda^{up}_i$ stays strictly positive for a sufficient number of iterations.

To estimate $\boldsymbol{\lambda}^{low}$ or $\boldsymbol{\lambda}^{up}$ , we start by estimating $\lambda^{eq}$ (note that if $0 < \alpha_i < C$ then $\lambda^{low}_i = \lambda^{up}_i = 0$ and if $0 < \alpha^{\star}_i < C$ then $\lambda^{low}_{i+l} = \lambda^{up}_{i+l} = 0$ ). Noting

$\begin{displaymath}A = \{ i, \ 0 < \alpha_i < C \}, \ \ B = \{ i, \ 0 < \alpha^{\star}_i < C \}\end{displaymath}$

we have (with $\hat{\lambda}$ standing for an estimation of $\lambda$ ) :

$\begin{displaymath}\hat{\lambda}^{eq} = \frac{1}{\vert A\cup B\vert} $ \sum_{i ... ...i^{'}(\boldsymbol{\alpha},\, \boldsymbol{\alpha}^{\star}) $ . \end{displaymath}$

(31)

Then if we have

$\begin{displaymath}\begin{array}{rclcrcl} \alpha_i = 0 & \textrm{we compute} & \... ... = & \hat{\lambda}^{eq} & - & \mathcal{W}_{i+l}^{'} \end{array}\end{displaymath}$

(32)

and if a variable stays a sufficient number of iterations at 0 (or C) with its corresponding coefficient $\hat{\lambda}^{low}_j > \epsilon_{shrink}$ (or $\hat{\lambda}^{up}_j > \epsilon_{shrink}$ ), then we remove it from the problem. Note that this can lead to an incorrect solution if $\epsilon_{shrink}$ or the number of iterations before removing a variable is too small. This is verified in the experimental section.

Next: Termination Criterion Up: The Decomposition Algorithm Previous: Solving the Subproblem

Journal of Machine Learning Research