Skip to content

thm-mni-ii/sea

Repository files navigation


Space-Efficient
Algorithms

Build Status License: GPL v3 Coverage Status Quality Gate Reliability

SEA is a project to research and implement an open C++ library for Space-Efficient (Graph) Algorithms (SEA). Besides the running time of algorithms, their space requirements cause a problem if dealing with huge data sets or executing them on tiny devices where memory is heavily limited. Therefore, we want to provide algorithms and data structures to tackle this problem by treating space as a scarce resource.

Table of Contents

Motivation

Space-efficient algorithms in practice are justified by two main arguments. The first one is that due to memory limitation of a real computer a space-efficient algorithm runs on instances where a non space-efficient algorithm would crash due to out of memory errors. This is especially interesting where all cores of a computer should be used and multiple algorithms compete for the available memory. The second reason is that space-efficient algorithms can run faster on huge instances because they use less memory and therefore cause less cache faults. (A cache fault occurs when a system must move some of the data stored in the memory to a slower one, i.e., from the L1/L2/L3 cache of the CPU to the RAM or from the RAM to the disc drive.)

We now show these two points on an example. Assume we need to compute an Eulerian tour in an outerplanar graph. Without going into detail, we use the Hierholzer algorithm where one key step of the algorithm is the creation of recursive graph instances. There are many other graph algorithms that need to create recursive subgraphs, such as computing tree decompositions. The first step to make the algorithm space-efficient is to handle the recursive graph instances with a subgraph stack. We have implemented a folklore algorithm and a space-efficient version of it and run tests to measure the running time and the space requirements of both algorithms.

The next image shows the space requirements of the creation of subgraph instances using a space-efficient subgraph stack (blue line) and a simple subgraph stack, which creates subgraphs instances by copying vertices and edges of the original graph that remain in the subgraph together with a mapping between the new vertex numbering and the original vertex numbering, for easy translation. The system the tests have been run on has an Intel Core i7-3770T CPU @ 2.50GHz with 16 GB of RAM and 32 GB swap. Note that the simple subgraph stack exhibits a large runtime penalty as soon as cache faults become a large problem because of having to use the swap space.

Algorithms and Data Structures

This section gives you a brief overview over the implemented algorithms and data structures. For a detailed documentation click on an algorithm. For some data structures and algorithms we also provide a folklore implementation that we use to compare the memory consumption and runtime efficiency.

Algorithms

Algorithm Runtime Memory (bits) Details
Depth First Search O(n+m) O(n+m) here
Depth First Search O(n+m) O(n log(log n)) here
Depth First Search O((n+m) log n) O((log(3)+ε) n) here
Reverse DFS O(n+m) O(n log(log n)) here
Breadth First Search O(n+m) O(n) here
Cut-Vertex O(n+m) O(n+m) here
Biconnected-Component O(n+m) O(n+m) here
Outerplanar Detection O(n log(log n)) O(n) here

Data Structures

  • InitializedArray: An array consisting of fields that in total can be initialized with an user defined value in constant time by using O(1) computer words. The array provides constant time access (read/write) to fields.
  • Graph(G = {V, E}): An adjacency list graph representation that occupies O((n + m) log n) bits.
  • Bitset: A bitset of n bits that supports access in O(1) time and occupies O(n) bits.
  • AVL tree: A self-balancing binary tree with O(log(n)) time for search, insertion and removal of a node.
  • Choice Dictionary: A bitset that supports a choice operation in O(1) time that returns the position of a bit set to 1. The choice dictionary occupies O(n) bits.
  • Rank-Select: A bit sequence that supports the operations rank(k) and select(k) in O(1) time and occupies O(n) bits. rank(k) returns the number of set bits up to index k, and select(k) returns the index of the k-th set bit.
  • Ragged Dictionary: A set of n/log(n) key-value tuples with O(log(log(n))) time for get, insert and remove operations. The ragged dictionary occupies O(n) bits.
  • Static-Space Storage: A sequence of n bit packs of variable size that can be accessed in O(1) time and occupies O(n + N) bits. N is the total usable size of the static-space storage.
  • Subraph Stack: Initialized with an n-vertex m-edge graph the stack allows to remove vertices and edges and provides a resulting graph using O(n + m) bits.

Build

  1. Install CMake and a C++ compiler for your specific operating system.
  2. Build a make file for your system by using CMake -> cmake .
  3. Build the artifacts by executing make -> make

Now, the include folder contains the necessary header files and the lib folder contains the build library.

If you encounter any bugs, missing or misleading documentation, do not hesitate to create an issue ticket.

Using the Library

  • Copy the include/sealib folder into your own project's include path.
  • Copy the lib/libsealib.so file into your own project's library path. Make sure that the environment variable LD_LIBRARY_PATH also points there, or else the shared library won't be found.
  • Include the header files you want to use in your code.
  • Compile with correct -I and -L flags, and use -std=c++11.

Usage example

#include <vector>
#include <sealib/_types.h>
#include <sealib/iterator/dfs.h>
#include <sealib/graph/graphcreator.h>

using namespace Sealib;

bool reachable(DirectedGraph &g, uint64_t a, uint64_t b) {
    bool ret = false;
    std::vector<bool> started(100);
    std::vector<bool> done(100);
    DFS::nloglognBitDFS(g,
                        [&](uint64_t u) {
                            started[u]=1;
                            if (u == b && started[a] && !done[a]) {
                                ret = true;
                            }
                        },
                        DFS_NOP_EXPLORE, DFS_NOP_EXPLORE,
                        [&](uint64_t u) { done[u]=1; });
    return ret;
}

int main(void) {
    DirectedGraph g = GraphCreator::kOutdegree(100, 30);
    return reachable(g, 10, 25);
}

Compile with:

clang++ -I<include-path> -L<libary-path> -std=c++11 -o reachable reachable.cpp -lsealib

Run the executable:

export LD_LIBRARY_PATH=<library-path>
./reachable

Project Structure

.
├── CMakeLists.txt  # CMake build script
├── LICENSE         # Licence description
├── README.md       # You are reading this file now
├── third-party     # Third party libraries
├── include         # The library's header files (*.h)
├── src             # The library's source files (*.cpp)
├── src-view        # The source files for the visualization (*.cpp)
├── test            # The test files
├── lib             # The library files
└── bin             # Executable files to test the project

Research

We publish most of our research on arXiv.org.

License

Licensed under the MIT licence. For detailed license information look inside the LICENSE file.

Acknowledgments

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 379157101.