Simple, yet the fastest implementation so far (Rust) #179
Replies: 4 comments 5 replies
-
Nice work! Although given the details you shared, I’m a tiny bit skeptical that this finishes in 7 seconds while for example the SIMD C++ version needs 12. Are you sure you’re comparing a warm pagecache vs. a warm pagecache? EDIT: Runtimes on my ADM Ryzen 4800U with 16GB RAM for running twice in a row, without graphical environment and as little running as possible:
For reference, this is on par with the fastest Java implementation:
But at least twice as slow as the C implementation here:
I can't get the even faster C++ SIMD version to run on my machine without having to modify its source, but it should be about 10-20% faster even. |
Beta Was this translation helpful? Give feedback.
-
I get ~18.5s with 1 thread on cheap AMD 4350G, 2x 8GB 2400MHz RAM. 12 seconds with multi threads is really slow, so I think something is not correct with your test setup. For example |
Beta Was this translation helpful? Give feedback.
-
@dannyvankooten @lehuyduc That's interesting. I've run my code and others 7 times each, some of which are interleaved (to make sure there's no advantage of running earlier/later). I've also tried to purge the disk cache for some runs ( |
Beta Was this translation helpful? Give feedback.
-
Hi! I'm late to the game but I also just jumped into the fun! Currently, I look for other solutions, and jump into some discussions here and there. My current solution [0] runs 23.5s in single-core mode and 2.8s in multicore-mode (16 threads). I didn't run the Java versions (yet) on my machine.
If you mmap the whole file, you never have to copy any data or buffer it. The data is immediately available in the address space (but page faults will happen transparently in the background). mmap'ing the file is the approach I've taken in the end. So all I have to do is to send the right reference (i.e., pointer) to the threads but not copy any data to them. Some say |
Beta Was this translation helpful? Give feedback.
-
It runs for about 7-8 seconds in MBP 2019 (2.6 GHz 6-Core Intel Core i7).
As a comparison, the current top 3 Java implementations run in my machine more than 15 seconds. Other implementations (C, C++, Rust) shared in the Discussions run for more than 12 seconds.
Summary of the approach:
HashMap
, which is exactly the same asHashMap
in the stdlib, except the former has theentry_ref
method. It turns out usingentry_ref
has practically the same performance as using stdlib'sHashMap
+ theget_mut
-then-insert
approach. That's because the input data has a very few unique city names, compared to the total input lines.rayon
before, but the lock contention became a bottleneck.The implementation can be further improved by reading the input file in parallel.
Repo: https://github.com/danieljl/rust-1B-row-challenge
Beta Was this translation helpful? Give feedback.
All reactions