-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: rework MemoryCell
for cache efficiency
#1672
Conversation
Benchmark Results for unmodified programs 🚀
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1672 +/- ##
=======================================
Coverage 94.80% 94.81%
=======================================
Files 101 101
Lines 38689 38715 +26
=======================================
+ Hits 36679 36707 +28
+ Misses 2010 2008 -2 ☔ View full report in Codecov by Sentry. |
4bbc851
to
84dac58
Compare
Store data and metadata inline as a single `[u64; 4]` with `32` bytes alignment, fitting a whole number of cells per cache line to reduce evictions and double sharing and to avoid split line access. Besides performance, an observable change is that now `Memory::get` always returns a `Cow::Owned` variant because the decoding process always creates a new value.
84dac58
to
96835a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing! 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before merging I think it will be nice to add a workflow that benchmarks the hyper-threading performance, so we can measure the improvement done here
Benchmark Results for modified programs 🚀
|
|
…erf/compact_memory_cell_main
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to check the benchmarks here, they are giving wrong numbers
Benchmarks are going OK now |
Store data and metadata inline as a single
[u64; 4]
with32
bytes alignment, fitting a whole number of cells per cache line to reduce evictions and double sharing and to avoid split line access.Besides performance, an observable change is that now
Memory::get
always returns aCow::Owned
variant because the decoding process always creates a new value.Checklist