Dijkstra*: refactor for performance and API cleanup #100

jrtom · 2017-08-04T21:35:49Z

DijkstraDistance and DijkstraShortestPath each (optionally) cache information about previously calculated shortest distances/paths, as a time/space tradeoff in case those distances/paths are needed again.

However, they don't do so in a very intelligent fashion: when we've calculated the shortest path/distance from A to C, and the path passes through B, we already know the shortest path/distance from B to C also, but we don't store it in such a way as to be able to take advantage of this.

This is known to be a problem in practice; one user has reported that with a graph of ~10k nodes and 100k edges, after a number of calls for path lengths, there are ~2M distances being stored. :(

The task here is to come up with a better representation of the cached distances/paths that is more compact. Offhand I'd guess that this can be done in O(m + n) space (i.e., basically the size of the input graph), or perhaps O(n^2); what we have now looks more like O(mn).

(While we're in here, we should also clean up the API, e.g., make use of builders, make the interface more uniform, etc.)

The text was updated successfully, but these errors were encountered:

gstevey · 2017-08-04T22:10:28Z

O(m+n) would be fantastic. You could just precompute all shortest paths and be O(1) on every query. I'm finding that shortest-path queries are taking 3-6 seconds, though fortunately most are avoided because of caching.

For what it's worth, my band-aid fix of calling reset() every few hours is going to carry me for at least a year, as far as I can tell. But it really does use a ton of space. In addition to the 2M distances, there are ~4M LinkedHashMap node entries for my m+n=110k graph.

jbduncan · 2017-08-04T22:14:10Z

@jrtom I've not yet looked at DijkstraDistance or DijkstraShortestPath in detail, but whilst we're cleaning up the API, I would encourage us to see if we could merge the two classes together.

Perhap's JGraphT's DijkstraShortestPath (javadoc) could prove to be a source of inspiration for a cleaner API.

gstevey · 2017-08-04T22:17:13Z

That would be nice. It was somewhat nontrivial to figure out how to use the API. My code looks like this:

        val edges = getAlgorithm(graph).getPath(a, b)
        if (edges.size == 0) return listOf()
        val result: MutableList<Location> = mutableListOf()
        var v_prev: Location?
        var v: Location? = a
        result.add(a)
        for (edge_num in edges) {
            v_prev = v
            v = graph.getOpposite(v_prev, edge_num)
            result.add(v)
        }

jrtom · 2017-08-04T22:23:52Z

Merging the classes together is almost certainly the right thing to do. The one caveat is that we don't want to cause people who only care about distances to incur the costs for calculating (and storing) paths as well; that was the original motivation for the distinction.

There's also an internal discussion going on with the Guava team about what a good API looks like for graph distance/path calculation; once that's resolved we'll almost certainly be using its results.

@gstevey just curious, is that Kotlin?

jbduncan · 2017-08-04T22:27:38Z

There's also an internal discussion going on with the Guava team about what a good API looks like for graph distance/path calculation; once that's resolved we'll almost certainly be using its results.

Oh wow! The Guava team continue to amaze me with their approach to API design, so I look forward to any results that may crop up from their discussion. :)

jrtom · 2017-08-04T22:31:34Z

That discussion has been on the back burner for a while, mostly because it got blocked on a related issue that got blocked on basically coming up with definitions for graph theoretical terms. Which led to me dragging out all my graph theory/algorithms texts and writing up a doc that established that, sure enough, there is not a lot of consensus in the graph theory world on some very basic semantic questions. :/

But I'll get back to that soon, and hopefully that will lead to various other things getting unblocked. :)

jrtom · 2017-08-09T02:37:02Z

related: #80 (proposal to allow the size of the cache to be limited)

jrtom · 2017-08-09T17:05:38Z

bringing in comment from elsewhere:

I liked your idea about sharing the common shortest paths, but I also think you can get significant mileage out of reusing edges (or their distances) -- make a Set and then create or fetch the edge distance object box from the set when storing it in a path. You'd still need all the path holder boxes, but it would at least avoid duplicating identical distances everywhere. I'm seeing an order of magnitude more distance objects than edges in my graph.

Also something to consider: using fastutil for cases like these, although I'd kind of hate to add a dependency.

jbduncan · 2017-09-07T22:10:09Z

I admit I do not fully understand yet why we'd want to use fastutil (to share double-typed distances between... something, presumably), but I also agree with you that I'd kind of hate to bring in an additional dependency just to effectively cache such distances.

Hmm, I do know that Bazel has an internal class called CompactHashSet (which I believe consumes less memory than HashSet), so even if we'd still box the distances as objects, I wonder if we could borrow CompactHashSet as a sort of stopgap. But having said that, it does occur to me that Bazel is licensed under Apache 2.0 rather than BSD, so I don't know how that would affect us legal-stuff-wise.

jrtom · 2017-09-08T01:42:36Z

The Dijkstra* classes keep track of distances using a Map<N, Number> objects rather than using primitive doubles, which adds overhead.

The question of whether CompactHashSet is worth it has been the subject of a great deal of internal debate (which is why you saw it in Bazel but not Guava). I don't think it helps that much.

In any event, these are definitely third-order concerns, after being more intelligent about storing distances and sharing common distances (as discussed above).

jbduncan · 2017-11-12T19:00:33Z

@jrtom Do you know if the Guava team came to any conclusions regarding a good API for graph distance/path calculation? If not, then it seems sensible for me to have a go at another issue in the meantime.

jrtom · 2017-11-13T00:55:47Z

We haven't yet sorted that out, no (and that's on me, and it's blocked behind a few other common.graph-related things, I'm afraid). That said, even if we change the API, we're still going to need to figure out the performance issues, so if you want to dig into that some more, that would be great. If you do, I'll paste into this bug some more details that @gstevey gave me based on his investigations.

jbduncan · 2017-11-15T02:23:13Z

@jrtom Sounds interesting! Time and energies permitting, I'd be interested in pursuing this issue further, so if you could indeed paste here the details from @gstevey's investigations, then that would be great!

jbduncan · 2017-11-19T18:21:36Z

Hi @jrtom, just wanted to follow up on this particular issue. Would you have time over the next week or so to share @gstevey's investigations with me?

jrtom closed this as completed Nov 13, 2017

jrtom reopened this Nov 13, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dijkstra*: refactor for performance and API cleanup #100

Dijkstra*: refactor for performance and API cleanup #100

jrtom commented Aug 4, 2017

gstevey commented Aug 4, 2017

jbduncan commented Aug 4, 2017

gstevey commented Aug 4, 2017

jrtom commented Aug 4, 2017

jbduncan commented Aug 4, 2017 •

edited

Loading

jrtom commented Aug 4, 2017

jrtom commented Aug 9, 2017

jrtom commented Aug 9, 2017

jbduncan commented Sep 7, 2017 •

edited

Loading

jrtom commented Sep 8, 2017

jbduncan commented Nov 12, 2017 •

edited

Loading

jrtom commented Nov 13, 2017

jbduncan commented Nov 15, 2017

jbduncan commented Nov 19, 2017

Dijkstra*: refactor for performance and API cleanup #100

Dijkstra*: refactor for performance and API cleanup #100

Comments

jrtom commented Aug 4, 2017

gstevey commented Aug 4, 2017

jbduncan commented Aug 4, 2017

gstevey commented Aug 4, 2017

jrtom commented Aug 4, 2017

jbduncan commented Aug 4, 2017 • edited Loading

jrtom commented Aug 4, 2017

jrtom commented Aug 9, 2017

jrtom commented Aug 9, 2017

jbduncan commented Sep 7, 2017 • edited Loading

jrtom commented Sep 8, 2017

jbduncan commented Nov 12, 2017 • edited Loading

jrtom commented Nov 13, 2017

jbduncan commented Nov 15, 2017

jbduncan commented Nov 19, 2017

jbduncan commented Aug 4, 2017 •

edited

Loading

jbduncan commented Sep 7, 2017 •

edited

Loading

jbduncan commented Nov 12, 2017 •

edited

Loading