Skip to content

A python library for (finite) Partial Monitoring algorithms

License

Notifications You must be signed in to change notification settings

TanguyUrvoy/pmlib

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

56 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pmlib

A python library for (finite) Partial Monitoring algorithms

Partial monitoring

Partial Monitoring(PM) is a general framework for sequential decision making with imperfect feedback. PM generalizes a host of problems including for instance multi-armed bandits, prediction with expert advices, dynamic pricing, apple tasting, dark pools, label efficient prediction and dueling bandits.

Each problem is formalized by a couple of matrices and . At each step of the game, the learner chooses an action and the environment chooses an outome . gives the loss of action for outome and gives a (symbolic or numeric) feedback for this situation. The aim of the learner is to control her regret against an informed policy.

See N. Cesa-Bianchi, G. Lugosi "Prediction, Learning, and Games" 2006 on chapter 6 for an introduction: http://homes.dsi.unimi.it/~cesabian/predbook/

The library

We plan to add several generic PM algorithm, but the present version only includes FeedExp3 and its variants. See http://archive.cone.informatik.uni-freiburg.de/pubs/siim-tr-00-18.pdf or http://stoltz.perso.math.cnrs.fr/Publications/CBLS-pmonit.pdf

We also provide Rex3, an adhoc algorithm for dueling bandits: http://proceedings.mlr.press/v37/gajane15.html

The Function pmlib.problemClass(game) can analyze any game and provide its position in the PM complexity hierarchy as defined in (Bartok et al. "Partial monitoring – classification, regret bounds, and algorithms" 2013). This can be either:

  • TRIVIAL if a single action domines all others. It gives a constant regret.
  • EASY if all pairs of actions are globally observable and all neighbouring actions are locally observable. It gives a min-max regret in .
  • HARD if all pairs of actions are globally observable. It gives a min-max regret in .
  • INTRACTABLE if some pairs of actions are non globally observable.

Install guide

This library is based on the the Parma Polyhedra Library for the "Cell decomposition":

  • http://bugseng.com/products/ppl/
  • Reference: R. Bagnara et al.. "The Parma Polyhedra Library: Toward a complete set of numerical abstractions for the analysis and verification of hardware and software systems." Science of Computer Programming, 72(1-2):3-21, 2008

You must intall this library and its python wrapper to use pmlib:

We also use numpy, scipy and pandas.