Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Glebashnik/fix tensor adress hash code #32943

Merged
merged 7 commits into from
Nov 26, 2024

Conversation

glebashnik
Copy link
Contributor

@glebashnik glebashnik commented Nov 25, 2024

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.

Replaced custom hash code implementation with the one that is similar to a Java standart library.
The order of labels is reversed when generating hash codes for multi-label addresses.
For some reason this increased performance 8x for matrices with string labels.
Benchmarked with TensorFunctionBenchmark.java.

@glebashnik glebashnik assigned vekterli and unassigned vekterli Nov 25, 2024
@glebashnik glebashnik requested a review from vekterli November 25, 2024 11:25
@vekterli
Copy link
Member

Looks like a lot of time in the benchmark is spent resolving hash collisions in the binary tree fallback path in HashMap.

My guess is that hashing order impacting performance is due to how it all boils down to Long.hashCode(numeric) in the label, which is the identity function for all numbers using less than 32 bits (it just XOR-folds the top 32 bits on top of the low 32 bits). With Arrays.hashCode() the contribution of the final element hash to the output is a simple addition, so returning the identity of strictly increasing internal label IDs will yield hashes with very little difference.

Example:

Arrays.hashCode(new long[]{-2, -3, -4, -5, -6}) = 29615266
Arrays.hashCode(new long[]{-2, -3, -4, -5, -7}) = 29615267
Arrays.hashCode(new long[]{-2, -3, -4, -5, -8}) = 29615268

Flip the ordering around and you get much more dispersion due to earlier multiplies:

Arrays.hashCode(new long[]{-6, -5, -4, -3, -2}) = 33368866
Arrays.hashCode(new long[]{-7, -5, -4, -3, -2}) = 34292387
Arrays.hashCode(new long[]{-8, -5, -4, -3, -2}) = 35215908

This is very data-dependent—e.g. in the benchmark case the matrix changes most rapidly in the last dimension, but it could just as well have been the other way around.

Here's if I run the benchmark on this branch locally:

Indexed unbound vectors,                     time per join:    0.279 ms
Indexed unbound matrix,                      time per join:    0.423 ms
Indexed bound vectors,                       time per join:    0.267 ms
Indexed bound matrix,                        time per join:    0.467 ms
Mapped vectors,                              time per join:   12.158 ms
Mapped matrix,                               time per join:   18.290 ms
Mapped vectors with string labels,           time per join:   12.222 ms
Mapped matrix with string labels,            time per join:   17.490 ms
Indexed vectors, x space                     time per join:   38.040 ms
Indexed matrix, x space                      time per join:   18.380 ms
Mapped vectors, x space with string labels   time per join:   54.260 ms
Mapped matrix, x space with string labels    time per join:   92.710 ms

Running with the right-to-left hashCode evaluation order "reverted" to left-to-right:

Indexed unbound vectors,                     time per join:    0.293 ms
Indexed unbound matrix,                      time per join:    0.972 ms
Indexed bound vectors,                       time per join:    0.288 ms
Indexed bound matrix,                        time per join:    0.976 ms
Mapped vectors,                              time per join:   11.998 ms
Mapped matrix,                               time per join:  134.430 ms
Mapped vectors with string labels,           time per join:   14.824 ms
Mapped matrix with string labels,            time per join:  121.720 ms
Indexed vectors, x space                     time per join:   40.900 ms
Indexed matrix, x space                      time per join:   20.440 ms
Mapped vectors, x space with string labels   time per join:   59.540 ms
Mapped matrix, x space with string labels    time per join:  459.680 ms

High 🐌 factor as observed.

Let's sneakily change LabelImpl.hashCode() to mix bits around in a blender...

@Override
public int hashCode() {
    // return Long.hashCode(numeric);
    long k = numeric;
    // 64-bit avalanche finalization mix from MurmurHash3, with final fold down to 32 bits
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return Long.hashCode(k);
}

With this change and still with left-to-right eval:

Indexed unbound vectors,                     time per join:    0.293 ms
Indexed unbound matrix,                      time per join:    0.966 ms
Indexed bound vectors,                       time per join:    0.293 ms
Indexed bound matrix,                        time per join:    0.972 ms
Mapped vectors,                              time per join:    7.126 ms
Mapped matrix,                               time per join:   25.250 ms
Mapped vectors with string labels,           time per join:    7.496 ms
Mapped matrix with string labels,            time per join:   20.450 ms
Indexed vectors, x space                     time per join:   39.400 ms
Indexed matrix, x space                      time per join:   18.600 ms
Mapped vectors, x space with string labels   time per join:   62.380 ms
Mapped matrix, x space with string labels    time per join:   95.650 ms

With this change and right-to-left eval (i.e. the only diff from the branch is LabelImpl.hashCode:

Indexed unbound vectors,                     time per join:    0.291 ms
Indexed unbound matrix,                      time per join:    0.985 ms
Indexed bound vectors,                       time per join:    0.291 ms
Indexed bound matrix,                        time per join:    0.973 ms
Mapped vectors,                              time per join:    7.292 ms
Mapped matrix,                               time per join:   21.660 ms
Mapped vectors with string labels,           time per join:    8.214 ms
Mapped matrix with string labels,            time per join:   23.760 ms
Indexed vectors, x space                     time per join:   40.780 ms
Indexed matrix, x space                      time per join:   18.440 ms
Mapped vectors, x space with string labels   time per join:   61.510 ms
Mapped matrix, x space with string labels    time per join:   98.160 ms

Benchmark timings fluctuate a bit between runs, so minor differences shouldn't be read too much into.

Not likely a silver bullet (as with all things), but could be worth looking into as a more "generalized" solution.

vekterli
vekterli previously approved these changes Nov 26, 2024
Copy link
Member

@vekterli vekterli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@glebashnik glebashnik force-pushed the glebashnik/fix-tensor-adress-hash-code branch from b2230ba to d0aca6d Compare November 26, 2024 13:46
@glebashnik glebashnik merged commit 9536e24 into master Nov 26, 2024
3 checks passed
@glebashnik glebashnik deleted the glebashnik/fix-tensor-adress-hash-code branch November 26, 2024 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants