-
-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use PriorityQueue for k-way merge (perf) #3
Comments
True, existing way is very simple and perhaps simplistic; I only tested it with simplest of sorts (byte array based one), where output speed was bounded by i/o speed. |
Hmmh. Actually, checking complexity again (re-reading code I wrote), am I missing something if I suggested that current implementation has same complexity as a binary heap, i.e. log2n? (with respect to number of comparisons). While various heap impls do have constant (at least amortized) access time for top entry, it is followed by insertion (remove + insert) which typically is log N isnt it? |
Yes, indeed. Wolfgang. On Dec 15, 2011, at 1:25 PM, Tatu Saloranta wrote:
|
Ok. So current impl might work well enough, with respect to number of comparisons. Although one could reduce number of calls being made (due to chaining). So maybe I'll still consider binary heap, like adapting https://github.com/simonwrafter/BinaryHeap |
I think a java.util.PriorityQueue would work just fine, no? At least that's what we ended up using for our external merge sort impl a few years ago. BTW, every time you take an entry object from that queue you can reuse it and put it right back into the queue, with a new value from the next item in the input stream. This is to avoid gc issues. Wolfgang. On Dec 15, 2011, at 2:38 PM, Tatu Saloranta wrote:
|
My question is basically would PriorityQueue work better? Or would it just be for reducing amount of code? (which is nice thing of course, assuming it'd work as well) Although if one can recycle entries, that's good (to know) -- I wasn't sure, and if not, it would indeed produce tons of unnecessary garbage. |
Just for reducing code I guess. I think they're all binary heap impls with the same performance characteristics. Here is an outline of how we reuse the entries:
Wolfgang. On Dec 15, 2011, at 3:15 PM, Tatu Saloranta wrote:
|
Ok, gotcha. Just wanted to be sure I understood -- for some reason I first thought I had used equivalent of linear lookup, plus, didnt realize JDK had this nice class available. :-) |
Just read through the code: indeed it is using the binary heap on array so it should work just fine. |
I am not familiar enough with these issues to completely follow the discussion here, but I thought I'd share a link to a similar library that uses a PriorityQueue (though perhaps not in the manner implied in the discussion above). Written by Daniel Lemire (https://github.com/lemire) and in the public domain. PS Wolfgang: huge fan of Nux, btw! |
For a k-way merge with k>2 consider using a PriorityQueue instead of multiple 2-way mergers for enhanced performance.
The text was updated successfully, but these errors were encountered: