Glebashnik/fix tensor adress hash code #32943
Looks like a lot of time in the benchmark is spent resolving hash collisions in the binary tree fallback path. My guess is that hashing order impacts performance because of how the hash is ultimately computed. Example:

Arrays.hashCode(new long[]{-2, -3, -4, -5, -6}) = 29615266
Arrays.hashCode(new long[]{-2, -3, -4, -5, -7}) = 29615267
Arrays.hashCode(new long[]{-2, -3, -4, -5, -8}) = 29615268

Flip the ordering around and you get much more dispersion due to earlier multiplies:

Arrays.hashCode(new long[]{-6, -5, -4, -3, -2}) = 33368866
Arrays.hashCode(new long[]{-7, -5, -4, -3, -2}) = 34292387
Arrays.hashCode(new long[]{-8, -5, -4, -3, -2}) = 35215908

This is very data-dependent; e.g. in the benchmark case the matrix changes most rapidly in the last dimension, but it could just as well have been the other way around.

Here's what I get when I run the benchmark on this branch locally:
Running with the right-to-left
High 🐌 factor as observed. Let's sneakily change the hashCode() implementation:

@Override
public int hashCode() {
    // return Long.hashCode(numeric);
    long k = numeric;
    // 64-bit avalanche finalization mix from MurmurHash3, with final fold down to 32 bits
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return Long.hashCode(k);
}

With this change and still with left-to-right eval:
With this change and right-to-left eval (i.e. the only diff from the branch is
Benchmark timings fluctuate a bit between runs, so minor differences shouldn't be read into too much. Not likely a silver bullet (as with all things), but could be worth looking into as a more "generalized" solution.
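To make the collision argument concrete, here is a minimal sketch (not code from this PR) that counts distinct address hash codes for a two-dimensional grid. It assumes labels are represented by consecutive small long values 0..n-1; the class name HashDispersionDemo and the helper names are hypothetical. It compares plain Arrays.hashCode over the raw label values with the same polynomial combination applied to per-label hashes that first go through the MurmurHash3 finalization mix shown above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class HashDispersionDemo {

    // MurmurHash3 64-bit finalization mix, folded down to 32 bits
    // (same steps as the hashCode() snippet in the comment above).
    static int mixedHash(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return Long.hashCode(k);
    }

    // Counts distinct hash codes over an n x n grid of two-label addresses.
    // Assumption: labels are the consecutive values 0..n-1.
    static int distinctHashes(int n, boolean mixLabels) {
        Set<Integer> seen = new HashSet<>();
        for (long a = 0; a < n; a++) {
            for (long b = 0; b < n; b++) {
                int h = mixLabels
                        ? Arrays.hashCode(new int[] {mixedHash(a), mixedHash(b)})
                        : Arrays.hashCode(new long[] {a, b});
                seen.add(h);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Plain: codes are 961 + 31*a + b, so only 3169 distinct values for 10000 addresses.
        System.out.println("plain: " + distinctHashes(100, false));
        // Mixed: per-label hashes are spread over 32 bits, so far fewer collisions are expected.
        System.out.println("mixed: " + distinctHashes(100, true));
    }
}
```

Under these assumptions the plain variant maps 10,000 addresses onto 3,169 distinct codes, because 31*a + b repeats whenever b spans more than 31 values. The mixed variant should land close to 10,000, although the exact count depends on the mix and is not guaranteed.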
👍
b2230ba to d0aca6d
I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.
Replaced the custom hash code implementation with one similar to the Java standard library's.
The order of labels is reversed when generating hash codes for multi-label addresses.
For some reason this increased performance 8x for matrices with string labels.
Benchmarked with TensorFunctionBenchmark.java.
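As a rough illustration of the reversed label order described above, the sketch below applies the same polynomial as Arrays.hashCode(long[]) but walks the labels from last to first. The class ReversedAddressHash and the method reversedHash are hypothetical names for this example, not the code in this PR, and the long values stand in for whatever numeric representation the labels actually have.

```java
import java.util.Arrays;

public class ReversedAddressHash {

    // Hypothetical helper: the 31-based polynomial used by Arrays.hashCode(long[]),
    // but consuming the labels from the last dimension to the first.
    static int reversedHash(long[] labels) {
        int result = 1;
        for (int i = labels.length - 1; i >= 0; i--) {
            result = 31 * result + Long.hashCode(labels[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two addresses that differ only in the last dimension.
        long[] a = {-2, -3, -4, -5, -6};
        long[] b = {-2, -3, -4, -5, -7};

        // Left to right: the last label gets no multiplier, so the codes are adjacent.
        System.out.println(Arrays.hashCode(b) - Arrays.hashCode(a)); // prints 1

        // Right to left: the last label is multiplied by 31^4, so the codes are far apart.
        System.out.println(reversedHash(b) - reversedHash(a)); // prints 923521
    }
}
```

When the last dimension varies fastest, as in the benchmark matrix, the reversed order gives the fastest-changing label the largest multiplier. Data that varies fastest in the first dimension would see the opposite effect.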