This project contains a working example of a contextual multi-armed bandit.
I wrote this after becoming interested in the contextual bandit problem for providing personalized recommendations, but not being able to find any working code. So I made this to understand how the algorithm works :)
This repository indexes high on code and low on docs. Briefly, it contains:

- A driver IPython notebook, `contextual_bandit_sim.ipynb`. Start here to understand the contents.
- A data generator. We initialize hidden contextual variables that are used to create synthetic samples. Let's call this latent contextual variable set `L`, and the synthetic data `X` (see the first sketch after this list).
- Two strategies / reinforcement functions `S` by which the bandits can be evaluated: binomial (`BinaryStrategy.py`) and continuous (`PositiveStrategy.py`). A sketch of the idea follows the list.
- A simulator, `Simulator.py`, that maintains a set of variables `M` estimating `L`. By observing `X` and evaluating those observations with the strategies `S`, the simulator updates `M` so that it converges to the hidden `L` (see the last sketch below).
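
To make the moving parts concrete, here is a minimal sketch of the data-generation idea, assuming a linear model per arm. This is not the repo's actual generator; the names (`n_arms`, `make_context`) and the Gaussian draws are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, n_features = 3, 5  # hypothetical sizes, not the repo's defaults

# Hidden latent variable set L: one weight vector per arm.
L = rng.normal(size=(n_arms, n_features))

def make_context():
    """Draw one synthetic context vector (one row of the data X)."""
    return rng.normal(size=n_features)

x = make_context()
print(L @ x)  # each arm's latent affinity for this context
```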
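The strategies then turn an arm's latent score for a context into an observed reward. Here is a hedged sketch of what binomial vs. continuous evaluation might look like; the function names and the sigmoid/Gaussian-noise choices are mine, not necessarily what `BinaryStrategy.py` and `PositiveStrategy.py` do.

```python
import numpy as np

def binary_reward(score, rng):
    """Binomial-style strategy: reward ~ Bernoulli(sigmoid(score))."""
    p = 1.0 / (1.0 + np.exp(-score))
    return float(rng.random() < p)

def positive_reward(score, rng):
    """Continuous strategy: a noisy, non-negative reward around the score."""
    return max(0.0, score + rng.normal(scale=0.1))

rng = np.random.default_rng(0)
print(binary_reward(0.7, rng), positive_reward(0.7, rng))
```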
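Finally, a sketch of the simulator loop: keep estimates `M`, pick an arm for each context (here with epsilon-greedy exploration, which is an assumption; `Simulator.py` may use a different policy), observe a reward generated from the hidden `L`, and nudge the pulled arm's row of `M` toward `L` with a least-squares SGD step.

```python
import numpy as np

rng = np.random.default_rng(1)
n_arms, n_features, lr, eps = 3, 5, 0.05, 0.1  # hypothetical settings

L = rng.normal(size=(n_arms, n_features))  # hidden truth
M = np.zeros((n_arms, n_features))         # running estimates of L

for t in range(5000):
    x = rng.normal(size=n_features)       # observe a context
    if rng.random() < eps:
        a = rng.integers(n_arms)          # explore a random arm
    else:
        a = int(np.argmax(M @ x))         # exploit current estimates
    r = x @ L[a] + rng.normal(scale=0.1)  # reward drawn from the hidden L
    M[a] += lr * (r - M[a] @ x) * x       # SGD step toward L[a]

print(np.abs(M - L).mean())  # mean absolute error between M and L
```

The update minimizes the squared error between the predicted and observed reward for the pulled arm, so the mean absolute error printed at the end should shrink as `M` converges toward the hidden `L`.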
If you find this useful, please share, retweet, and follow me @allenday on Twitter.
Have fun!