## Contextual Bandits with Similarity Information

** Aleksandrs Slivkins**; 15(Jul):2533−2568, 2014.

### Abstract

In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time- invariant set of alternatives and receives the payoff associated with this alternative. While the case of small strategy sets is by now well-understood, a lot of recent work has focused on MAB problems with exponentially or infinitely large strategy sets, where one needs to assume extra structure in order to make the problem tractable. In particular, recent literature considered information on similarity between arms.

We consider similarity information in the setting
of *contextual bandits*, a natural extension of the basic
MAB problem where before each round an algorithm is given the
*context*---a hint about the payoffs in this round.
Contextual bandits are directly motivated by placing
advertisements on web pages, one of the crucial problems in
sponsored search. A particularly simple way to represent
similarity information in the contextual bandit setting is via a
*similarity distance* between the context- arm pairs which
bounds from above the difference between the respective expected
payoffs.

Prior work on contextual bandits with similarity
uses â€œuniform" partitions of the similarity space, so that each
context-arm pair is approximated by the closest pair in the
partition. Algorithms based on â€œuniform" partitions disregard
the structure of the payoffs and the context arrivals, which is
potentially wasteful. We present algorithms that are based on
*adaptive* partitions, and take advantage of "benign"
payoffs and context arrivals without sacrificing the worst-case
performance. The central idea is to maintain a finer partition
in high-payoff regions of the similarity space and in popular
regions of the context space. Our results apply to several other
settings, e.g., MAB with constrained temporal change (Slivkins
and Upfal, 2008) and sleeping bandits (Kleinberg et al.,
2008a).