Simple, yet the fastest implementation so far (Rust) #179

danieljl · 2024-01-06T19:11:08Z

It runs in about 7-8 seconds on a 2019 MBP (2.6 GHz 6-Core Intel Core i7).

For comparison, the current top 3 Java implementations take more than 15 seconds on my machine. Other implementations (C, C++, Rust) shared in the Discussions take more than 12 seconds.

Summary of the approach:

Uses a general-purpose float parser (not an input-specific one)
Uses a general-purpose hash function (not an input-specific one)
Uses hashbrown's HashMap, which is the same as HashMap in the stdlib, except that the former has the entry_ref method. It turns out that using entry_ref has practically the same performance as using the stdlib's HashMap with the get_mut-then-insert approach. That's because the input data has very few unique city names compared to the total number of input lines.
Uses crossbeam_channel for sending data from the main thread (which only reads the input file) to the worker threads. I tried rayon first, but lock contention became a bottleneck.
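
A minimal sketch of the pipeline described in the list above, not the code from the repo. To stay dependency-free it assumes the stdlib's HashMap with get_mut-then-insert and std::sync::mpsc (one channel per worker, filled round-robin) in place of hashbrown and crossbeam_channel. The names Stats, aggregate, merge_into and run are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Running statistics for one station (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Stats {
    pub min: f64,
    pub max: f64,
    pub sum: f64,
    pub count: u64,
}

impl Stats {
    fn new(v: f64) -> Self {
        Stats { min: v, max: v, sum: v, count: 1 }
    }

    fn add(&mut self, v: f64) {
        self.min = self.min.min(v);
        self.max = self.max.max(v);
        self.sum += v;
        self.count += 1;
    }

    fn merge(&mut self, other: &Stats) {
        self.min = self.min.min(other.min);
        self.max = self.max.max(other.max);
        self.sum += other.sum;
        self.count += other.count;
    }
}

/// Aggregates one chunk made of complete `name;value` lines.
fn aggregate(chunk: &str) -> HashMap<String, Stats> {
    let mut map: HashMap<String, Stats> = HashMap::new();
    for line in chunk.lines() {
        let Some((name, value)) = line.split_once(';') else { continue };
        // General-purpose float parser from the standard library.
        let Ok(v) = value.parse::<f64>() else { continue };
        // get_mut-then-insert: a String key is allocated only the first
        // time a station is seen, which is rare with few unique names.
        if let Some(stats) = map.get_mut(name) {
            stats.add(v);
        } else {
            map.insert(name.to_owned(), Stats::new(v));
        }
    }
    map
}

/// Folds one station's partial result into `map`.
fn merge_into(map: &mut HashMap<String, Stats>, name: String, s: Stats) {
    if let Some(existing) = map.get_mut(&name) {
        existing.merge(&s);
    } else {
        map.insert(name, s);
    }
}

/// The calling thread only hands out chunks; the workers do the parsing.
pub fn run(chunks: Vec<String>, n_workers: usize) -> HashMap<String, Stats> {
    let n_workers = n_workers.max(1);
    let mut senders = Vec::new();
    let mut handles = Vec::new();
    for _ in 0..n_workers {
        let (tx, rx) = mpsc::channel::<String>();
        senders.push(tx);
        handles.push(thread::spawn(move || {
            let mut local: HashMap<String, Stats> = HashMap::new();
            for chunk in rx {
                for (name, s) in aggregate(&chunk) {
                    merge_into(&mut local, name, s);
                }
            }
            local
        }));
    }
    for (i, chunk) in chunks.into_iter().enumerate() {
        senders[i % n_workers].send(chunk).expect("worker hung up");
    }
    drop(senders); // closing the channels lets the workers finish
    let mut total = HashMap::new();
    for handle in handles {
        for (name, s) in handle.join().expect("worker panicked") {
            merge_into(&mut total, name, s);
        }
    }
    total
}
```

Calling run(chunks, n) with chunks that each end on a line boundary returns the merged per-station statistics. Each worker keeps its own map, so no lock is shared between threads.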

The implementation can be further improved by reading the input file in parallel.

Repo: https://github.com/danieljl/rust-1B-row-challenge

dannyvankooten · 2024-01-06T21:42:24Z

Nice work! Although given the details you shared, I’m a tiny bit skeptical that this finishes in 7 seconds while for example the SIMD C++ version needs 12. Are you sure you’re comparing a warm pagecache vs. a warm pagecache?

EDIT: Runtimes on my AMD Ryzen 4800U with 16GB RAM for two runs in a row, without a graphical environment and with as little else running as possible:

$ time ./target/release/rs-1brc measurements.txt

real	0m6.493s
user	0m55.099s
sys	0m4.489s

$ time ./target/release/rs-1brc measurements.txt

real	0m5.195s
user	0m55.983s
sys	0m3.257s

For reference, this is on par with the fastest Java implementation:

$ ./calculate_average_royvanrijn.sh

real	0m5.604s
user	1m3.284s
sys	0m2.472s

$ ./calculate_average_royvanrijn.sh

real	0m4.693s
user	1m5.548s
sys	0m1.205s

But at least twice as slow as the C implementation here:

$ time ./bin/analyze measurements.txt


real	0m2.037s
user	0m25.888s
sys	0m0.912s


$ time ./bin/analyze measurements.txt

real	0m1.988s
user	0m25.474s
sys	0m1.045s

I can't get the even faster C++ SIMD version to run on my machine without modifying its source, but it should be about 10-20% faster still.

lehuyduc · 2024-01-07T01:32:43Z

I get ~18.5s with 1 thread on a cheap AMD 4350G with 2x 8GB 2400MHz RAM. 12 seconds with multiple threads is really slow, so I think something is wrong with your test setup.

For example, N_THREADS should be 12 on your PC. The compiler version, running background tasks, etc. also matter.
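
As a rough illustration of picking the thread count at runtime instead of hard-coding it, here is a small Rust sketch using the standard library. The function name worker_count and its fallback parameter are hypothetical, and this is not how N_THREADS is set in the C++ solution.

```rust
use std::thread;

/// Returns the number of worker threads to use: the parallelism the OS
/// reports for this process (typically the logical CPU count), or
/// `fallback` if it cannot be determined.
fn worker_count(fallback: usize) -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(fallback)
}
```

On a 6-core CPU with hyper-threading this would typically report 12.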

danieljl · 2024-01-07T07:32:51Z

Author

@dannyvankooten @lehuyduc That's interesting. I've run my code and the others 7 times each, with some of the runs interleaved (to make sure there's no advantage to running earlier or later). I've also purged the disk cache before some runs (sudo purge on macOS). In all the runs, my implementation ran significantly faster than the others.

5 replies

lehuyduc Jan 7, 2024

Which C++ compiler version did you use? That might be a problem.

danieljl Jan 7, 2024
Author

I'm currently away from my laptop, but I think the compiler is at least 2 years old. How does my implementation run on your machine?

lehuyduc Jan 7, 2024

Actually 2 years is not really old. What command do you use to compile + run?

dannyvankooten Jan 7, 2024

Could it be that mmap is somehow really slow on MacOS? In #67 there is @corlinp with a similarly fast solution on his machine that is much slower on other machines (or vice versa, the mmap solutions are slow on his while his is fast).

danieljl Jan 7, 2024
Author

> What command do you use to compile + run?

@lehuyduc I just ran the build script of each repo. For your case, it's run_cpp.sh.

> Could it be that mmap is somehow really slow on MacOS?

That seems likely to be the culprit. See https://stackoverflow.com/a/5837676

phip1611 · 2024-05-06T20:14:32Z

Hi! I'm late to the game, but I also just jumped into the fun! I'm currently looking at other solutions and joining discussions here and there. My current solution [0] runs in 23.5s in single-core mode and in 2.8s in multi-core mode (16 threads). I haven't run the Java versions on my machine (yet).

> Uses crossbeam_channel for sending data from the main thread (which only reads the input file) to the worker threads. Tried using rayon before, but the lock contention became a bottleneck.

If you mmap the whole file, you never have to copy any data or buffer it. The data is immediately available in the address space (but page faults will happen transparently in the background). mmap'ing the file is the approach I've taken in the end. So all I have to do is to send the right reference (i.e., pointer) to the threads but not copy any data to them.
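
A minimal sketch of that idea, under assumptions: a plain byte slice stands in for the mmap'ed file (with a memory-mapping crate, the slice would come from the mapping instead), and the helper names split_at_newlines and count_lines_parallel are hypothetical. It only shows how threads can borrow newline-aligned slices of one buffer without copying, not the code from [0].

```rust
use std::thread;

/// Splits `data` into at most `parts` sub-slices, each ending on a line
/// boundary (or at the end of the data).
fn split_at_newlines(data: &[u8], parts: usize) -> Vec<&[u8]> {
    let mut out = Vec::new();
    let target = data.len() / parts.max(1) + 1;
    let mut start = 0;
    while start < data.len() {
        let mut end = (start + target).min(data.len());
        // Grow the chunk until it ends right after a newline, so that no
        // line is cut in half.
        while end < data.len() && data[end - 1] != b'\n' {
            end += 1;
        }
        out.push(&data[start..end]);
        start = end;
    }
    out
}

/// Counts lines in parallel. Every thread receives only a borrowed slice
/// of the shared buffer; no bytes are copied.
fn count_lines_parallel(data: &[u8], n_threads: usize) -> usize {
    let chunks = split_at_newlines(data, n_threads);
    thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| s.spawn(move || chunk.iter().filter(|&&b| b == b'\n').count()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .sum::<usize>()
    })
}
```

Scoped threads are used here because they may borrow from the caller's stack, so the slices can be handed to the workers as plain references.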

Some say mmap is slower than reading the file the regular way. However, given how different the two interfaces for accessing files are, I can't easily compare the two approaches. But I think that 2.8s on my laptop makes it pretty clear that it works well. My NVMe SSD is capable of reading 6 GBit/s. The file with 1B rows is 14GB on my machine.

[0] https://github.com/phip1611/1brc-rust
