Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache-friendly prime table #31993

Closed

Conversation

blacktea
Copy link

@blacktea blacktea commented Jul 21, 2024

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.

This PR optimizes the getting modulo method. Based on the benchmarks https://quick-bench.com/q/-bPdO1GtFx1DQ6Ve5dZhaah_eso, the performance increased by up to 4 times.

Method:
The original __stl_prime_list table is split into arrays of 8 elements. It allows the cache line to accommodate these arrays decreasing cache misses.
The __upper_bound_list_size array is used to find a nested array where the given newSize is located.

Testing:
The method is checked here: https://godbolt.org/z/bd7rM71f6

@vekterli vekterli requested a review from baldersheim July 22, 2024 09:45
@blacktea
Copy link
Author

Note: I'd avoid using double underscores as well since they are supposed to be reserved by compilers.
Let me know if you want me to rename variable names.

@@ -4,33 +4,32 @@

namespace {

static const unsigned long __stl_prime_list[] =
static const size_t __chunk_size{8};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further modernize:

  • Remove static, anonymous namespace has similar effect.
  • Rename to avoid leading underscore.
  • Use constexpr

};

static const unsigned long __upper_bound_list[] = {1543ul, 393241ul, 100663319ul, 4294967291ul};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid duplicating constants. I guess you can initialize by refering to last elements in stl_prime_list. Like {stl_prime_list[0][chunk_size - 1], ...}

size_t
hashtable_base::getModuloStl(size_t size) noexcept
{
return getModulo(size, __stl_prime_list, sizeof(__stl_prime_list)/sizeof(__stl_prime_list[0]));
for(size_t i = 0; i < __upper_bound_list_size; ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Standard is to use space after for, if, etc. Like for ()

@jonmv
Copy link
Contributor

jonmv commented Aug 14, 2024

If this is a critical piece of code, it's much faster to find the relevant part of the list straight from the order of the msb: https://quick-bench.com/q/9W-F5Q5sCBJr96PXjsPq2a9PFGU, and then comparing the only two relevant elements. I have no idea how to write C++, though, so don't look too hard at the details :p Also, this is obviously only true while the list has the property that the primes are between consecutive powers of two (except for the first one, which is handled with the + 8 🙈).

@baldersheim
Copy link
Contributor

I have never observed this as consuming time. Perhaps @blacktea can add some info on how this was observed ?
I agree the optimal solution would be to use the msb as lookup index.

@baldersheim
Copy link
Contributor

@blacktea Do you intend to complete this one anytime soon ?

@baldersheim
Copy link
Contributor

Implemented msb index lookup in #32168, closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants