## Algorithms and Hardness Results for Parallel Large Margin Learning

*Philip M. Long, Rocco A. Servedio*; 14(Oct):3105–3128, 2013.

### Abstract

We consider the problem of learning an unknown large-margin
halfspace in the context of parallel computation, giving both
positive and negative results. As our main positive result, we
give a parallel algorithm for learning a large-margin halfspace,
based on an algorithm of Nesterov's that performs gradient
descent with a momentum term. We show that this algorithm can
learn an unknown $\gamma$-margin halfspace over $n$ dimensions
using $n \cdot \text{poly}(1/\gamma)$ processors and running in time
$\tilde{O}(1/\gamma)+O(\log n)$. In contrast, naive parallel
algorithms that learn a $\gamma$-margin halfspace in time that
depends only polylogarithmically on $n$ have a running time whose
dependence on the margin parameter $\gamma$ is inverse quadratic,
i.e., $\Omega(1/\gamma^2)$.
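
The positive result builds on Nesterov's accelerated gradient method. As a rough illustration, here is a minimal single-machine sketch of gradient descent with a Nesterov-style momentum term applied to a smooth surrogate loss for halfspace learning; the logistic surrogate, step size, momentum schedule, iteration count, and all names below are illustrative assumptions, not the paper's exact construction. In the paper, the speedup comes from distributing the $n$-dimensional vector operations across the $n \cdot \text{poly}(1/\gamma)$ processors.

```python
import numpy as np

def nesterov_halfspace(X, y, gamma, T=None):
    """Gradient descent with a Nesterov-style momentum term on a
    smooth (logistic) surrogate loss for halfspace learning.
    X: (m, n) array of unit-norm examples; y: labels in {-1, +1}.
    The surrogate loss, step size, and momentum schedule are
    illustrative choices, not the paper's exact construction."""
    m, n = X.shape
    if T is None:
        # ~O(1/gamma) iterations: the source of the speedup over the
        # ~1/gamma^2 iterations of naive (momentum-free) approaches
        T = int(np.ceil(10.0 / gamma))
    w = np.zeros(n)       # current iterate
    w_prev = np.zeros(n)  # previous iterate, used by the momentum term
    eta = 1.0             # safe step size: the logistic loss on
                          # unit-norm examples is 1/4-smooth
    for t in range(T):
        beta = t / (t + 3.0)         # momentum coefficient
        v = w + beta * (w - w_prev)  # look-ahead (momentum) point
        margins = np.clip(y * (X @ v), -60, 60)  # clip for stability
        # gradient of the average logistic loss at the look-ahead point;
        # in the parallel setting these n-dimensional vector operations
        # are distributed across the processors
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / m
        w_prev, w = w, v - eta * grad
    return w

# Toy usage: a random gamma-margin halfspace over n dimensions.
rng = np.random.default_rng(0)
n, m, gamma = 50, 500, 0.1
w_star = rng.normal(size=n)
w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(m, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)
X = X[np.abs(X @ w_star) >= gamma]  # keep only gamma-margin examples
y = np.sign(X @ w_star)
w = nesterov_halfspace(X, y, gamma)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```
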
Our negative result deals with boosting, which is a standard
approach to learning large-margin halfspaces. We prove that in
the original PAC framework, in which a weak learning algorithm
is provided as an oracle that is called by the booster, boosting
cannot be parallelized. More precisely, we show that even if the
booster is allowed to call the weak learner multiple times in
parallel within a single boosting stage, this ability does not
reduce the overall number of successive boosting stages needed
for learning by even a single stage. Our proof is
information-theoretic and does not rely on unproven assumptions.
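
To make the notion of a boosting "stage" concrete, the schematic loop below is standard textbook AdaBoost (not the paper's construction), with a hypothetical decision-stump weak learner standing in for the oracle. It shows why stages are inherently sequential: the distribution fed to the weak learner in stage $t+1$ is computed from the hypothesis returned in stage $t$, and the lower bound says that parallel weak-learner calls within a stage cannot shorten this chain.

```python
import numpy as np

def stump_learner(X, y, D):
    """Hypothetical weak learner: the best single-feature threshold
    stump under the example distribution D (illustrative only)."""
    best, best_err = None, 1.0
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1.0, -1.0):
                pred = s * np.sign(X[:, j] - thr + 1e-12)
                err = np.sum(D * (pred != y))
                if err < best_err:
                    best_err, best = err, (j, thr, s)
    j, thr, s = best
    return lambda Z: s * np.sign(Z[:, j] - thr + 1e-12)

def boost(X, y, weak_learner, T):
    """Schematic AdaBoost-style loop (standard textbook boosting, not
    the paper's construction).  The distribution D_{t+1} handed to the
    weak learner is computed from the hypothesis h_t returned in stage
    t, so the T stages form a chain; extra parallel calls to the weak
    learner *within* a stage do not shorten it."""
    m = len(y)
    D = np.full(m, 1.0 / m)  # D_1: uniform over the examples
    hyps, alphas = [], []
    for t in range(T):       # T successive, inherently ordered stages
        h = weak_learner(X, y, D)  # stage t: one call to the oracle
        preds = h(X)
        eps = np.clip(np.sum(D * (preds != y)), 1e-10, 0.5 - 1e-10)
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        D = D * np.exp(-alpha * y * preds)  # depends on h_t: this is
        D /= D.sum()                        # the sequential bottleneck
        hyps.append(h)
        alphas.append(alpha)
    return lambda Z: np.sign(sum(a * h(Z) for a, h in zip(alphas, hyps)))

# Toy usage with the hypothetical stump learner.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
H = boost(X, y, stump_learner, T=10)
print("training accuracy:", np.mean(H(X) == y))
```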
