---
title: 'CompressedBeliefMDPs.jl: A Julia Package for Solving Large POMDPs with Belief Compression'
date: 13 April 2024
bibliography: paper.bib
---
Partially observable Markov decision processes (POMDPs) are a standard mathematical model for sequential decision making under state and outcome uncertainty [@AFDM]. They commonly feature in reinforcement learning research and have applications spanning medicine [@drugs], sustainability [@carbon], and aerospace [@planes]. Unfortunately, real-world POMDPs often require bespoke solutions, because they are too large to be tractable with traditional methods [@complexity1; @complexity2]. Belief compression [@Roy] is a general-purpose technique that focuses planning on relevant belief states, thereby making it feasible to solve complex, real-world POMDPs more efficiently.
CompressedBeliefMDPs.jl is a Julia package [@Julia] for solving large POMDPs in the POMDPs.jl ecosystem [@POMDPs.jl] with belief compression (described below). It offers a simple interface for efficiently sampling and compressing beliefs and for constructing and solving belief-state MDPs. The package can be used to benchmark techniques for sampling, compressing, and planning. It can also solve complex POMDPs to support applications in a variety of domains.
While traditional tabular methods like policy and value iteration scale poorly, there are modern methods such as point-based algorithms [@PBVI; @perseus; @hsvi; @SARSOP] and online planners [@AEMS; @despot; @mcts; @pomcp; @sunberg2018online] that perform well on real-world POMDPs in practice. Belief compression is an equally powerful but often overlooked alternative that is especially potent when belief is sparse.
CompressedBeliefMDPs.jl is a modular generalization of the original algorithm. It can be used independently or in conjunction with other planners. It also supports both continuous and discrete state, action, and observation spaces.
CompressedBeliefMDPs.jl abstracts the belief compression algorithm of @Roy into four steps: sampling, compression, construction, and planning. The `Sampler` abstract type handles belief sampling; the `Compressor` abstract type handles belief compression; the `CompressedBeliefMDP` struct handles constructing the compressed belief-state MDP; and the `CompressedBeliefSolver` and `CompressedBeliefPolicy` structs handle planning in the compressed belief-state MDP.
Our framework generalizes the original belief compression algorithm. @Roy uses a heuristic controller for sampling beliefs; exponential family principal component analysis with Poisson loss for compression [@EPCA]; and local approximation value iteration as the base solver. CompressedBeliefMDPs.jl, on the other hand, is a modular framework: belief compression can be applied with any combination of sampler, compressor, and MDP solver.
To our knowledge, no prior Julia or Python package implements POMDP belief compression. A similar package exists for MATLAB [@epca-MATLAB], but it focuses on Poisson exponential family principal component analysis and not general belief compression.
The `Sampler` abstract type handles sampling. CompressedBeliefMDPs.jl supports sampling with policy rollouts through `PolicySampler` and `ExplorationSampler`, which wrap `Policy` and `ExplorationPolicy` from POMDPs.jl, respectively. These objects can be used to collect beliefs with a random or $\epsilon$-greedy policy, for example.
CompressedBeliefMDPs.jl also supports fast exploratory belief expansion on POMDPs with discrete state, action, and observation spaces. Our implementation is an adaptation of Algorithm 21.13 in @AFDM. We use a $k$-d tree to efficiently find the belief sample furthest from the current set in each expansion step.
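To make the expansion step concrete, here is a self-contained toy sketch of exploratory belief expansion in the spirit of Algorithm 21.13 in @AFDM. It is not the package's implementation, and the two-state POMDP below (its transition and observation matrices) is invented for illustration:

```julia
# Hypothetical 2-state, 2-action, 2-observation POMDP (numbers made up).
T = [[0.9 0.1; 0.1 0.9], [0.5 0.5; 0.5 0.5]]  # T[a][s, s′] = P(s′ | s, a)
Z = [[0.8 0.2; 0.3 0.7], [0.6 0.4; 0.4 0.6]]  # Z[a][s′, o] = P(o | a, s′)

# Bayesian belief update: b′(s′) ∝ P(o | a, s′) Σ_s P(s′ | s, a) b(s)
function update(b, a, o)
    bp = [Z[a][sp, o] * sum(T[a][s, sp] * b[s] for s in 1:2) for sp in 1:2]
    return bp ./ sum(bp)
end

# From each sampled belief, try every (action, observation) successor and
# keep the candidate farthest (in L1 distance) from the current sample set.
function expand!(B)
    for b in copy(B)
        cands = vec([update(b, a, o) for a in 1:2, o in 1:2])
        dists = [minimum(sum(abs, c .- b2) for b2 in B) for c in cands]
        push!(B, cands[argmax(dists)])
    end
    return B
end

B = [[0.5, 0.5]]   # start from the uniform belief
expand!(B)
expand!(B)         # B now holds 4 increasingly spread-out beliefs
```

Each pass doubles the sample set while biasing it toward unexplored regions of the belief simplex, which is what makes the sampled beliefs useful targets for compression.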
The `Compressor` abstract type handles compression. CompressedBeliefMDPs.jl provides seven off-the-shelf compressors:
- Principal component analysis (PCA) [@PCA],
- Kernel PCA [@kernelPCA],
- Probabilistic PCA [@PPCA],
- Factor analysis [@factor],
- Isomap [@isomap],
- Autoencoder [@autoencoder], and
- Variational auto-encoder (VAE) [@VAE].
The first four are supported through MultivariateStats.jl; Isomap is supported through ManifoldLearning.jl; and the last two are implemented in Flux.jl [@flux].
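To illustrate what a compressor does, here is a toy, pure-Julia sketch of PCA-style belief compression. It is not the package's `PCACompressor`; it simply projects sampled beliefs onto their top-$k$ principal components, and the belief samples are invented:

```julia
using LinearAlgebra, Statistics

# Fit PCA on a matrix whose rows are sampled beliefs.
function fit_pca(B, k)
    μ = mean(B; dims=1)          # 1 × n row of column means
    _, _, V = svd(B .- μ)        # right singular vectors of centered data
    return μ, V[:, 1:k]          # mean and top-k principal directions
end

compress(b, μ, W) = (b' .- μ) * W        # belief → k-dimensional code
decompress(c, μ, W) = vec(μ .+ c * W')   # approximate reconstruction

B = [0.8 0.10 0.10;              # beliefs over 3 states (rows sum to 1)
     0.6 0.20 0.20;
     0.9 0.05 0.05]
μ, W = fit_pca(B, 1)
c = compress([0.8, 0.1, 0.1], μ, W)      # 1-dimensional code
b̂ = decompress(c, μ, W)                  # ≈ [0.8, 0.1, 0.1] for this data
```

Because these sample beliefs lie on a one-dimensional subspace after centering, a single principal component reconstructs them essentially exactly; real belief sets are only approximately low-dimensional, which is where the choice of compressor matters.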
First, recall that any POMDP can be viewed as a belief-state MDP [@belief-state-MDP], where states are beliefs and transitions are belief updates (e.g., with Bayesian or Kalman filters). Formally, a POMDP is a tuple $\langle S, A, T, R, \Omega, O \rangle$, where $S$ is the state space, $A$ is the action space, $T(s' \mid s, a)$ is the transition model, $R(s, a)$ is the reward model, $\Omega$ is the observation space, and $O(o \mid a, s')$ is the observation model. The corresponding belief-state MDP is the tuple $\langle B, A, \tau, \rho \rangle$, where $B$ is the space of beliefs over $S$, $\tau(b' \mid b, a)$ is the belief-update transition model, and $\rho(b, a) = \sum_{s \in S} b(s) R(s, a)$ is the belief reward.
Given a compression function $\phi: B \to \tilde{B}$, we define the corresponding compressed belief-state MDP (CBMDP) as the tuple $\langle \tilde{B}, A, \tilde{\tau}, \tilde{\rho} \rangle$, where $\tilde{B} = \phi(B)$ is the compressed belief space and $\tilde{\tau}$ and $\tilde{\rho}$ are the transition and reward models induced on $\tilde{B}$.
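For discrete spaces, the Bayesian belief update underlying the belief-state transitions can be written explicitly, where $T(s' \mid s, a)$ is the transition model and $O(o \mid a, s')$ is the observation model:

$$
b'(s') \propto O(o \mid a, s') \sum_{s \in S} T(s' \mid s, a)\, b(s),
$$

normalized so that $\sum_{s'} b'(s') = 1$.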
The `CompressedBeliefMDP` struct contains a `GenerativeBeliefMDP`, a `Compressor`, and a cache that maps compressed beliefs back to the original beliefs they represent. Any POMDPs.jl `Solver` can solve a `CompressedBeliefMDP`:
```julia
using POMDPs, POMDPModels, POMDPTools
using CompressedBeliefMDPs

# construct the CBMDP
pomdp = BabyPOMDP()
sampler = BeliefExpansionSampler(pomdp)
updater = DiscreteUpdater(pomdp)
compressor = PCACompressor(1)
cbmdp = CompressedBeliefMDP(pomdp, sampler, updater, compressor)

# solve the CBMDP
solver = MyMDPSolver()::POMDPs.Solver  # placeholder for any MDP solver
policy = solve(solver, cbmdp)
```
`CompressedBeliefSolver` and `CompressedBeliefPolicy` wrap the belief compression pipeline, meaning belief compression can be applied without explicitly constructing a `CompressedBeliefMDP`.
```julia
using POMDPs, POMDPModels, POMDPTools
using CompressedBeliefMDPs

pomdp = BabyPOMDP()
base_solver = MyMDPSolver()
solver = CompressedBeliefSolver(
    pomdp,
    base_solver;
    updater=DiscreteUpdater(pomdp),
    sampler=BeliefExpansionSampler(pomdp),
    compressor=PCACompressor(1),
)
policy = POMDPs.solve(solver, pomdp)  # CompressedBeliefPolicy
s = initialstate(pomdp)
v = value(policy, s)
a = action(policy, s)
```
Following @Roy, we use local value approximation as our default base solver, because it bounds the value estimation error [@error_bound].
```julia
using POMDPs, POMDPTools, POMDPModels
using CompressedBeliefMDPs

pomdp = BabyPOMDP()
solver = CompressedBeliefSolver(pomdp)
policy = solve(solver, pomdp)
```
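To illustrate the idea behind local value approximation, here is a toy, pure-Julia sketch of value iteration with local (grid plus linear interpolation) function approximation over a hypothetical one-dimensional compressed belief space $[0, 1]$. It is not the package's default solver, and the dynamics and reward below are invented:

```julia
grid = range(0.0, 1.0; length=11)  # grid over the 1-D compressed space
γ = 0.9                            # discount factor

step_fn(x, a) = clamp(x + (a == 1 ? -0.1 : 0.1), 0.0, 1.0)  # two actions
reward(x, a) = -abs(x - 0.5)       # made-up reward peaked at the middle

# Linearly interpolate values stored at the grid points.
function interp(V, x)
    i = clamp(searchsortedlast(grid, x), 1, length(grid) - 1)
    t = (x - grid[i]) / (grid[i+1] - grid[i])
    return (1 - t) * V[i] + t * V[i+1]
end

function local_value_iteration(; iters = 100)
    V = zeros(length(grid))
    for _ in 1:iters   # Bellman backups at the grid points only
        V = [maximum(reward(x, a) + γ * interp(V, step_fn(x, a)) for a in 1:2)
             for x in grid]
    end
    return V
end

V = local_value_iteration()  # highest value at x = 0.5 (grid point 6)
```

Because backups are performed only at grid points and intermediate values are interpolated, the approximation error is controlled by the grid resolution, which is the intuition behind the error bound cited above.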
To solve a continuous-space POMDP, simply swap the base solver. More details, examples, and instructions on implementing custom components can be found in the documentation.
CompressedBeliefMDPs.jl also includes the Circular Maze POMDP from @Roy and scripts to recreate figures from the original paper. Additional details can be found in the documentation.
```julia
using CompressedBeliefMDPs

n_corridors = 2
corridor_length = 100
pomdp = CircularMaze(n_corridors, corridor_length)
```
We thank Arec Jamgochian, Robert Moss, Dylan Asmar, and Zachary Sunberg for their help and guidance.