A headers-only C++ library for submodular optimization (subset selection) problems on arbitrary data types.
This library implements a handful of basic algorithms for submodular function maximization. In particular, they solve the problem:
where
for any
Monotone functions are functions that preserve the subset partial order on the power set
For such functions, greedy algorithms are both efficient and provably near-optimal when the set
This library uses Bazel as its build system. To compile, make sure you have Bazel installed on your system and run:
bazel build ...
which will build and install both the headers library and the tests into the /build
directory. If you would like to run the tests, you can test the library with:
bazel test ...
Alternatively, you can run a specific test via:
bazel test sfo_cpp/tests/test_monotone_greedy
or similar, for a different test.
Basic usage follows four simple steps:
- Define a
std::unordered_set
of "ground set" elements to summarize. - Import and create the appropriate
GreedyAlgorithm
object from this library. - Give the
GreedyAlgorithm
object a reference to the ground set andConstraint
s it must satisfy (if desired). - Call
GreedyAlgorithm.run_greedy(CostFunction)
on asfo_cpp::CostFunction
abstract base class.
The entire library is templated via some typename E
, where the instatiation of E
defines the C++ objects that the algorithms will be summarizing. The library accomplishes this by populating a std::unordered_set<E*>
of pointers to elements in
All you need to do is implement a CostFunction
that evaluates how "good" a given summary is, and a Constraint
that evaluates if a summary satisfies some constraint set or not, and hand everything over to the GreedyAlgorithms
implemented here.
Once they are run, the algorithms will hold a variable curr_set
, which is a STL unordered_set<E*>
of pointers to the elements of the ground set (which is also a STL unordered_set<E*>
).
To do this, the library defines a templated override of the comparison operator <
for basic pairs std::pair<E, double>
. The library uses this operator to interface with the STL std::priority_queue
container and sort elements
For convenience, the library includes a simple Element
class, which the library will fall back to and instantiate if the template instantiation for typename E
is not given.
In cost_function.hpp
, the library defines the templated (typename E
) abstract base class CostFunction
to represent the mathematical function
The CostFunction
abstract base class only has one virtual method:
-
operator()
: returns a C++double
corresponding to$F(S)$ when$S$ is a singleton (E*
argument) or a subset (std::unordered_set<E*>
argument) for any$S\subseteq V$ .
A couple important example derived classes (specific cost functions) are implemented, such as the Modular
and SquareRootModular
cost functions.
In principle, however, one needs only to define an appropriate CostFunction<E>
object with evaluation overloads to run the greedy algorithms on it.
In constraint.hpp
, the library defines the templated (typename E
) abstract base class Constraint
to represent the mathematical constraint
The set
The Constraint<E>
abstract base class has two pure virtual functions:
test_membership
: returns a Boolean value stating if a singleton<E*>
or subsetstd::unordered_set<E*>
satisfy the constraint or not;is_saturated
: returns a Boolean value stating if the singleton<E*>
or subsetstd::unordered_set<E*>
can have any elements added without violating the constraint. (Overriding this is optional, but it helps stop the algorithms faster)
To implement a constraint, you just have to override the test_membership
functions. A couple simple derived examples such as Knapsack
and Cardinality
constraints are implemented in constraint.hpp
.
The CostFunction
and Constraint
objects are handed to one of the Algorithm objects, which implement the optimization routines to select a (provably near-optimal) subset of elements.
-
Naive Greedy (
GreedyAlgorithm
):- Valid constraints: Matroid, Knapsack
- Valid cost functions: Monotone
The naive greedy algorithm requires
$\mathcal{O}(n)$ computations per iteration as it evaluates the marginal benefit of every possible element in$V$ and greedily selects the best element. Since we select$B$ elements, this algorithm has$\mathcal{O}(Bn^2)$ complexity. This algorithm, while simple, produces a subset$S\subseteq V$ such that $F(\hat{S}) \geq (1-\frac{1}{e})F(S^)$, where $S^$ is the global optimum (which is NP-Hard to compute) when$F$ is monotone and submodular.Reference here.
-
Lazy Greedy (
LazyGreedy
).- Valid constraints: Matroid, Knapsack
- Valid cost functions: Monotone
The lazy greedy algorithm abuses the submodularity of
$F$ to perform iterations in the order dictated by a priority queue. While this still has complexity$\mathcal{O}(n)$ per iteration, in practice this method exhibits orders of magnitude speedup. Because, in principle, the lazy greedy algorithm defaults to the naive greedy algorithm, this algorithm also comes with the same guarantee of$F(\hat{S})\geq (1-\frac{1}{e})F(S^*)$ .Reference here.
-
Stochastic Greedy (
StochasticGreedy
)- Valid constraints: Cardinality
- Valid cost functions: Monotone
The stochastic greedy algorithm instead selects a uniform random subset of elements to greedily choose from each iteration. For a given
$\varepsilon \geq 0$ , the algorithm samples$\frac{n}{B}\log\frac{1}{\varepsilon}$ elements each iteration and has an approximation guarantee of$F(\hat{S}) \geq (1-\frac{1}{e}-\varepsilon)F(S^*)$ in expectation.Reference here.
-
"Lazier Than Lazy Greedy" (
LazierThanLazyGreedy
)- Valid constraints: Cardinality
- Valid cost functions: Monotone
This algorithm is Stochastic Greedy, but also implements the priority queue for lazy evaluations, giving the same guarantee as Stochastic Greedy in expectation.
Reference here.
-
Bidirectional Greedy (
BidirectionalGreedy
)- Valid constraints: None
- Valid cost functions: Monotone, non-monotone
This algorithm is only valid for unconstrained problems (
$\mathcal{C} = V$ ), but returns a set$\hat{S}\subseteq V$ with $F(\hat{S}) \geq \frac{1}{3}F(S^{})$ for any submodular function $F$. It also has a flagrandomized
that, if set totrue
, will run the randomized variant that returns a set with $F(\hat{S}) \geq\frac{1}{2}F(S^{})$ guarantee in expectation.Reference here.
-
Approximate local search (
ApxLocalSearch
)- Valid constraints: k-Matroid
- Valid cost functions: Monotone, non-monotone
WIP
Reference here.
Many algorithms, when asked to optimize the cost function with the run_{ALG_NAME}()
call, may be handed a flag to instead run the cost-benefit algorithm instead. In the cost-benefit algorithm, the benefit of each Element
is divided by its additional cost according to the knapsack constraint. As a result, this specific variant will only run when the Constraint
that GreedyAlgorithm
or LazyGreedyAlgorithm
is provided with is of the specific derived class Knapsack
.
Quick testing scripts are given in test_monotone_greedy.cpp
and test_non_monotone_greedy.cpp
.
Coming soon: non-monotone algorithms, pre-emption (streaming), semi-streaming, and more!