## From External to Internal Regret

** Avrim Blum, Yishay Mansour**; 8(Jun):1307--1324, 2007.

### Abstract

External regret compares the performance of an online algorithm,
selecting among *N* actions, to the performance of the best of those
actions in hindsight. Internal regret compares the loss of an online
algorithm to the loss of a modified online algorithm, which
consistently replaces one action by another.

In this paper we give a simple generic reduction that, given an
algorithm for the external regret problem, converts it to an efficient
online algorithm for the internal regret problem. We provide methods
that work both in the * full information* model, in which the loss
of every action is observed at each time step, and the * partial
information* (bandit) model, where at each time step only the loss of
the selected action is observed.
The importance of internal regret in game theory is due to the
fact that in a general game, if each player has sublinear internal
regret, then the empirical frequencies converge to a correlated
equilibrium.

For external regret we also derive a quantitative regret bound for a
very general setting of regret, which includes an arbitrary set of
modification rules (that possibly modify the online algorithm) and an
arbitrary set of time selection functions (each giving different
weight to each time step). The regret for a given time selection and
modification rule is the difference between the cost of the online
algorithm and the cost of the modified online algorithm, where the
costs are weighted by the time selection function. This can be viewed
as a generalization of the previously-studied *sleeping experts*
setting.

[abs][pdf]