Here is an interesting article, *The Hunt for the Missing Data Type*, that talks about the challenges in developing a graph library. A few quotes:
> [G]raph algorithm performance is a serious consideration and the special cases really matter. Kelly raised the example of maximum weight matching. If you know that your graph is "bipartite", you can use a particular fast algorithm to find a matching, while for other graphs you need to use a slow, more general algorithm.

This is exactly what the `Guarantee` trait and "restricted encapsulations" on top of the storage try to help with.
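A minimal sketch of the idea in Rust (illustrative names, not the library's actual API): a marker trait encodes a structural guarantee, and an algorithm can require that guarantee to unlock the specialized fast path at compile time.

```rust
/// Minimal core interface, just enough for the sketch.
trait Graph {
    fn node_count(&self) -> usize;
}

/// Marker trait: implementors promise the graph is bipartite.
/// (Hypothetical name; it stands in for a guarantee mechanism.)
trait Bipartite: Graph {}

/// A storage that is bipartite by construction.
struct TwoColoredGraph {
    left: usize,
    right: usize,
}

impl Graph for TwoColoredGraph {
    fn node_count(&self) -> usize {
        self.left + self.right
    }
}

// Bipartite by construction, so it can opt in to the guarantee.
impl Bipartite for TwoColoredGraph {}

/// Only callable on graphs that guarantee bipartiteness, so a fast
/// matching algorithm (e.g. Hopcroft–Karp) could be used safely here.
fn maximum_matching<G: Bipartite>(g: &G) -> usize {
    // ... the specialized algorithm would go here ...
    g.node_count() / 2
}
```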
> Different graph operations have different performance characteristics on different representations. Take a directed graph with 100 nodes and 200 edges. If we use an adjacency matrix representation, we need a 100×100 matrix containing 200 ones and 9,800 zeros. If we instead use an edge list we need only 200 pairs of nodes. Depending on your PL and level of optimizations that could be a memory difference of 20x or more.
>
> Now instead take a graph with 100 nodes and 8,000 edges and try to find whether an edge exists between node 0 and node 93. In the matrix representation, that’s an O(1) lookup on graph[0][93]. In the edge list representation, that’s an O(|edge|) iteration through all 8,000 edges.
That is why it is possible to choose the exact representation for storing the graph while the algorithms remain storage-agnostic. It is definitely true, though, that a generic interface required of all representations is limiting, and some features at which specific representations excel might not be generalizable.
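For concreteness, here is a minimal sketch (hypothetical types, not the library's interface) of two storages behind a single trait: the same `has_edge` query costs O(1) on the matrix and O(|E|) on the edge list, while callers stay storage-agnostic.

```rust
trait EdgeSet {
    fn has_edge(&self, u: usize, v: usize) -> bool;
}

/// Dense |V|×|V| table: O(|V|^2) memory, O(1) edge lookup.
struct AdjMatrix {
    n: usize,
    bits: Vec<bool>, // row-major, length n * n
}

impl EdgeSet for AdjMatrix {
    fn has_edge(&self, u: usize, v: usize) -> bool {
        self.bits[u * self.n + v]
    }
}

/// Plain list of (u, v) pairs: O(|E|) memory, O(|E|) edge lookup.
struct EdgeList {
    edges: Vec<(usize, usize)>,
}

impl EdgeSet for EdgeList {
    fn has_edge(&self, u: usize, v: usize) -> bool {
        self.edges.iter().any(|&(a, b)| a == u && b == v)
    }
}

/// Storage-agnostic caller: works with either representation.
fn connected_directly<G: EdgeSet>(g: &G, u: usize, v: usize) -> bool {
    g.has_edge(u, v)
}
```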
> Supporting many representations has a serious downside: you have to do a lot more work to add algorithms. If you write a separate version of the algorithm for each graph representation, you’re tripling or quadrupling the maintenance burden.
I believe the impact of this problem is reduced by implementing only a set of core features for each storage and building the higher-level operations on top of them in a generic manner, but...
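Something like the following sketch (names are illustrative): each storage implements only a small core, and the higher-level operations are written once against that core, so adding a new representation does not multiply the algorithm count.

```rust
/// The small per-storage core.
trait Core {
    fn node_count(&self) -> usize;
    fn neighbors(&self, v: usize) -> Box<dyn Iterator<Item = usize> + '_>;
}

/// One concrete storage implementing only the core.
struct AdjList {
    adj: Vec<Vec<usize>>,
}

impl Core for AdjList {
    fn node_count(&self) -> usize {
        self.adj.len()
    }
    fn neighbors(&self, v: usize) -> Box<dyn Iterator<Item = usize> + '_> {
        Box::new(self.adj[v].iter().copied())
    }
}

/// High-level operations written once, for every storage.
fn out_degree<G: Core>(g: &G, v: usize) -> usize {
    g.neighbors(v).count()
}

fn is_isolated<G: Core>(g: &G, v: usize) -> bool {
    out_degree(g, v) == 0
}
```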
> If you instead write a generic abstraction over polymorphic types, then your library is less performant.
I suppose this is true. It would be interesting to create some benchmarks to quantify this claim.
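A rough shape such a benchmark could take (a toy harness using `std::time::Instant`; for trustworthy numbers one would reach for a harness like `criterion` and guard against the optimizer): the same lookup driven once through static generics and once through a trait object, to measure the cost of dynamic dispatch.

```rust
use std::time::Instant;

trait EdgeSet {
    fn has_edge(&self, u: usize, v: usize) -> bool;
}

struct AdjMatrix {
    n: usize,
    bits: Vec<bool>,
}

impl EdgeSet for AdjMatrix {
    fn has_edge(&self, u: usize, v: usize) -> bool {
        self.bits[u * self.n + v]
    }
}

/// Static dispatch: monomorphized, calls can be inlined.
fn count_static<G: EdgeSet>(g: &G, n: usize) -> usize {
    let mut hits = 0;
    for u in 0..n {
        for v in 0..n {
            if g.has_edge(u, v) {
                hits += 1;
            }
        }
    }
    hits
}

/// Dynamic dispatch: every call goes through the vtable.
fn count_dyn(g: &dyn EdgeSet, n: usize) -> usize {
    let mut hits = 0;
    for u in 0..n {
        for v in 0..n {
            if g.has_edge(u, v) {
                hits += 1;
            }
        }
    }
    hits
}

fn main() {
    let n = 1_000;
    let g = AdjMatrix { n, bits: vec![true; n * n] };

    let t = Instant::now();
    let a = count_static(&g, n);
    println!("static:  {a} hits in {:?}", t.elapsed());

    let t = Instant::now();
    let b = count_dyn(&g, n);
    println!("dynamic: {b} hits in {:?}", t.elapsed());
}
```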
> [W]hat if the graph is simply too big to work with? He gave the example of finding a solution to the 15 puzzle. This is done by running an A* search on the state space. A state space with over 20 trillion states.
That is why we aim for convenient support for implicit graphs... or for using a custom graph storage, as long as the common interface is implemented for it.
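As an illustration of what an implicit graph buys you here (a sketch, not the library's API): the 15-puzzle state space is never materialized; successor states are generated on demand by sliding a tile into the blank, so an A* search can walk a graph of roughly 20 trillion states without ever storing it. The `Hash`/`Eq` derives are what a search would use for its visited set.

```rust
#[derive(Clone, PartialEq, Eq, Hash)]
struct State {
    tiles: [u8; 16], // row-major 4x4 board, 0 marks the blank
}

impl State {
    /// Successor states: every legal single-tile slide into the blank.
    /// This *is* the graph; no edges are stored anywhere.
    fn neighbors(&self) -> Vec<State> {
        let blank = self.tiles.iter().position(|&t| t == 0).unwrap();
        let (r, c) = ((blank / 4) as i32, (blank % 4) as i32);
        let mut out = Vec::new();
        for (dr, dc) in [(-1, 0), (1, 0), (0, -1), (0, 1)] {
            let (nr, nc) = (r + dr, c + dc);
            if (0..4).contains(&nr) && (0..4).contains(&nc) {
                let mut next = self.clone();
                next.tiles.swap(blank, (nr * 4 + nc) as usize);
                out.push(next);
            }
        }
        out
    }
}
```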
It was a very good read.