Implement LFU sketch using arm64 intrinsics #595

Closed
bitfaster wants to merge 11 commits into main from users/alexpeck/arm64

Owner

@bitfaster bitfaster commented May 24, 2024

In the results below, the *BlockAvx benchmarks run on the ARM64 intrinsics; FlatAvx is not implemented.

Mac M2

Reproduces on commit 7d7d023, but not on the newest commit.

BenchmarkDotNet v0.13.12, macOS Sonoma 14.5 (23F79) [Darwin 23.5.0]
Apple M2, 1 CPU, 8 logical and 8 physical cores
.NET SDK 8.0.100
  [Host]   : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD
  .NET 6.0 : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.0 (8.0.23.53103), Arm64 RyuJIT AdvSIMD

BitFaster Caching Benchmarks Lfu SketchIncrement-columnchart
BitFaster Caching Benchmarks Lfu SketchFrequency-columnchart

Method Runtime Size Mean Error StdDev Ratio Allocated
IncFlat .NET 6.0 1024 13.870 ns 0.0025 ns 0.0020 ns 1.00 -
IncBlock .NET 6.0 1024 13.042 ns 0.0172 ns 0.0161 ns 0.94 -
IncBlockAvx .NET 6.0 1024 6.736 ns 0.1010 ns 0.0944 ns 0.49 -
IncFlat .NET 8.0 1024 6.348 ns 0.0012 ns 0.0009 ns 1.00 -
IncBlock .NET 8.0 1024 9.951 ns 0.0909 ns 0.0851 ns 1.57 -
IncBlockAvx .NET 8.0 1024 5.842 ns 0.0145 ns 0.0135 ns 0.92 -
IncFlat .NET 6.0 32768 14.036 ns 0.2049 ns 0.1916 ns 1.00 -
IncBlock .NET 6.0 32768 13.799 ns 0.0731 ns 0.0683 ns 0.98 -
IncBlockAvx .NET 6.0 32768 7.490 ns 0.0143 ns 0.0134 ns 0.53 -
IncFlat .NET 8.0 32768 7.036 ns 0.0955 ns 0.0893 ns 1.00 -
IncBlock .NET 8.0 32768 10.223 ns 0.0179 ns 0.0150 ns 1.45 -
IncBlockAvx .NET 8.0 32768 6.507 ns 0.0051 ns 0.0042 ns 0.92 -
IncFlat .NET 6.0 524288 14.622 ns 0.0821 ns 0.0768 ns 1.00 -
IncBlock .NET 6.0 524288 16.510 ns 0.1516 ns 0.1418 ns 1.13 -
IncBlockAvx .NET 6.0 524288 7.660 ns 0.0179 ns 0.0159 ns 0.52 -
IncFlat .NET 8.0 524288 7.607 ns 0.0578 ns 0.0541 ns 1.00 -
IncBlock .NET 8.0 524288 13.160 ns 0.0397 ns 0.0371 ns 1.73 -
IncBlockAvx .NET 8.0 524288 6.672 ns 0.0112 ns 0.0099 ns 0.88 -
IncFlat .NET 6.0 8388608 61.644 ns 0.0636 ns 0.0564 ns 1.00 -
IncBlock .NET 6.0 8388608 53.673 ns 0.0489 ns 0.0458 ns 0.87 -
IncBlockAvx .NET 6.0 8388608 30.969 ns 0.0283 ns 0.0236 ns 0.50 -
IncFlat .NET 8.0 8388608 34.704 ns 0.0418 ns 0.0349 ns 1.00 -
IncBlock .NET 8.0 8388608 39.008 ns 0.0364 ns 0.0322 ns 1.12 -
IncBlockAvx .NET 8.0 8388608 27.053 ns 0.0253 ns 0.0224 ns 0.78 -
IncFlat .NET 6.0 134217728 68.909 ns 0.1676 ns 0.1486 ns 1.00 -
IncBlock .NET 6.0 134217728 63.177 ns 0.0881 ns 0.0824 ns 0.92 -
IncBlockAvx .NET 6.0 134217728 35.213 ns 0.0200 ns 0.0177 ns 0.51 -
IncFlat .NET 8.0 134217728 39.842 ns 0.0742 ns 0.0657 ns 1.00 -
IncBlock .NET 8.0 134217728 44.355 ns 0.0494 ns 0.0438 ns 1.11 -
IncBlockAvx .NET 8.0 134217728 30.773 ns 0.0278 ns 0.0247 ns 0.77 -
Method Runtime Size Mean Error StdDev Ratio Allocated
FrequencyFlat .NET 6.0 1024 22.029 ns 0.0152 ns 0.0134 ns 1.00 -
FrequencyBlock .NET 6.0 1024 17.766 ns 0.0166 ns 0.0147 ns 0.81 -
FrequencyBlockAvx .NET 6.0 1024 11.007 ns 0.0011 ns 0.0008 ns 0.50 -
FrequencyFlat .NET 8.0 1024 10.750 ns 0.0018 ns 0.0015 ns 1.00 -
FrequencyBlock .NET 8.0 1024 12.805 ns 0.0063 ns 0.0056 ns 1.19 -
FrequencyBlockAvx .NET 8.0 1024 8.762 ns 0.0895 ns 0.0837 ns 0.81 -
FrequencyFlat .NET 6.0 32768 22.058 ns 0.0131 ns 0.0103 ns 1.00 -
FrequencyBlock .NET 6.0 32768 19.284 ns 0.0289 ns 0.0241 ns 0.87 -
FrequencyBlockAvx .NET 6.0 32768 11.791 ns 0.1241 ns 0.1161 ns 0.53 -
FrequencyFlat .NET 8.0 32768 11.351 ns 0.0099 ns 0.0092 ns 1.00 -
FrequencyBlock .NET 8.0 32768 13.599 ns 0.0457 ns 0.0405 ns 1.20 -
FrequencyBlockAvx .NET 8.0 32768 9.286 ns 0.0235 ns 0.0208 ns 0.82 -
FrequencyFlat .NET 6.0 524288 22.361 ns 0.3072 ns 0.2873 ns 1.00 -
FrequencyBlock .NET 6.0 524288 20.041 ns 0.0355 ns 0.0332 ns 0.90 -
FrequencyBlockAvx .NET 6.0 524288 12.155 ns 0.0188 ns 0.0157 ns 0.54 -
FrequencyFlat .NET 8.0 524288 11.830 ns 0.0348 ns 0.0326 ns 1.00 -
FrequencyBlock .NET 8.0 524288 14.052 ns 0.0447 ns 0.0397 ns 1.19 -
FrequencyBlockAvx .NET 8.0 524288 9.436 ns 0.0219 ns 0.0205 ns 0.80 -
FrequencyFlat .NET 6.0 8388608 105.323 ns 0.1536 ns 0.1283 ns 1.00 -
FrequencyBlock .NET 6.0 8388608 62.690 ns 0.0273 ns 0.0228 ns 0.60 -
FrequencyBlockAvx .NET 6.0 8388608 43.439 ns 0.0327 ns 0.0255 ns 0.41 -
FrequencyFlat .NET 8.0 8388608 59.741 ns 0.1223 ns 0.1085 ns 1.00 -
FrequencyBlock .NET 8.0 8388608 59.074 ns 0.0829 ns 0.0735 ns 0.99 -
FrequencyBlockAvx .NET 8.0 8388608 40.865 ns 0.0844 ns 0.0789 ns 0.68 -
FrequencyFlat .NET 6.0 134217728 121.675 ns 0.4978 ns 0.4656 ns 1.00 -
FrequencyBlock .NET 6.0 134217728 72.443 ns 0.0492 ns 0.0411 ns 0.60 -
FrequencyBlockAvx .NET 6.0 134217728 48.987 ns 0.0746 ns 0.0662 ns 0.40 -
FrequencyFlat .NET 8.0 134217728 67.616 ns 0.2006 ns 0.1877 ns 1.00 -
FrequencyBlock .NET 8.0 134217728 68.747 ns 0.0631 ns 0.0527 ns 1.02 -
FrequencyBlockAvx .NET 8.0 134217728 46.334 ns 0.1333 ns 0.1113 ns 0.69 -
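As a reading aid for the Ratio column, here is a minimal sketch of the arithmetic, assuming the baseline is the Flat row with Ratio 1.00 in the same Runtime and Size group. The class and method names are hypothetical and not part of BitFaster.Caching or BenchmarkDotNet. As far as I know, BenchmarkDotNet derives Ratio from the distribution of ratios rather than from a plain quotient of means, so small differences from the printed values are expected.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class BenchmarkRatio
{
    // Quotient of two mean timings, rounded to two decimals like the Ratio column.
    public static double Of(double meanNs, double baselineMeanNs)
    {
        if (baselineMeanNs <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(baselineMeanNs));
        }

        return Math.Round(meanNs / baselineMeanNs, 2);
    }

    // Speedup factor: how many times faster than the baseline a method is.
    public static double Speedup(double meanNs, double baselineMeanNs)
        => baselineMeanNs / meanNs;
}
```

For the IncBlockAvx .NET 6.0 row at size 1024, `BenchmarkRatio.Of(6.736, 13.870)` gives 0.49, which matches the table, and `BenchmarkRatio.Speedup(6.736, 13.870)` is roughly 2.06.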

Windows Cobalt 100 (VM)

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22000.2960/21H2/SunValley)
Pioneer, 1 CPU, 16 logical and 16 physical cores
.NET SDK 8.0.300
  [Host]   : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD
  .NET 6.0 : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD

Job=.NET 6.0  Runtime=.NET 6.0  Alloc Ratio=NA

BitFaster Caching Benchmarks Lfu SketchIncrement-columnchart
BitFaster Caching Benchmarks Lfu SketchFrequency-columnchart

Method Size Mean Error StdDev Ratio Allocated
IncFlat 32768 23.19 ns 0.013 ns 0.012 ns 1.00 -
IncFlatAvx 32768 23.41 ns 0.017 ns 0.015 ns 1.01 -
IncBlock 32768 20.36 ns 0.010 ns 0.009 ns 0.88 -
IncBlockAvx 32768 13.89 ns 0.004 ns 0.003 ns 0.60 -
IncFlat 524288 64.55 ns 1.262 ns 2.243 ns 1.00 -
IncFlatAvx 524288 64.56 ns 1.251 ns 1.753 ns 1.01 -
IncBlock 524288 50.67 ns 1.008 ns 1.120 ns 0.78 -
IncBlockAvx 524288 31.84 ns 0.632 ns 1.290 ns 0.49 -
IncFlat 8388608 92.59 ns 2.143 ns 6.285 ns 1.00 -
IncFlatAvx 8388608 91.21 ns 2.231 ns 6.507 ns 0.99 -
IncBlock 8388608 82.57 ns 1.642 ns 4.710 ns 0.90 -
IncBlockAvx 8388608 41.18 ns 0.815 ns 1.921 ns 0.45 -
IncFlat 134217728 205.91 ns 3.865 ns 4.135 ns 1.00 -
IncFlatAvx 134217728 205.51 ns 2.790 ns 2.473 ns 0.99 -
IncBlock 134217728 183.31 ns 3.610 ns 5.061 ns 0.89 -
IncBlockAvx 134217728 87.76 ns 1.755 ns 3.739 ns 0.43 -
Method Size Mean Error StdDev Ratio Allocated
FrequencyFlat 32768 38.03 ns 0.034 ns 0.028 ns 1.00 -
FrequencyFlatAvx 32768 38.05 ns 0.040 ns 0.037 ns 1.00 -
FrequencyBlock 32768 25.98 ns 0.006 ns 0.005 ns 0.68 -
FrequencyBlockAvx 32768 23.02 ns 0.007 ns 0.006 ns 0.61 -
FrequencyFlat 524288 80.73 ns 1.593 ns 2.831 ns 1.00 -
FrequencyFlatAvx 524288 81.06 ns 1.606 ns 3.056 ns 1.01 -
FrequencyBlock 524288 56.54 ns 1.009 ns 1.381 ns 0.70 -
FrequencyBlockAvx 524288 58.12 ns 1.147 ns 2.067 ns 0.72 -
FrequencyFlat 8388608 111.51 ns 2.220 ns 6.077 ns 1.00 -
FrequencyFlatAvx 8388608 113.42 ns 2.580 ns 7.525 ns 1.02 -
FrequencyBlock 8388608 89.49 ns 1.769 ns 4.784 ns 0.80 -
FrequencyBlockAvx 8388608 87.31 ns 1.840 ns 5.397 ns 0.79 -
FrequencyFlat 134217728 226.23 ns 4.506 ns 4.215 ns 1.00 -
FrequencyFlatAvx 134217728 223.52 ns 2.770 ns 2.591 ns 0.99 -
FrequencyBlock 134217728 195.64 ns 3.683 ns 3.941 ns 0.87 -
FrequencyBlockAvx 134217728 187.74 ns 3.590 ns 4.135 ns 0.83 -

Notes

To run the benchmarks, start the host process as .NET 6 or later. BenchmarkDotNet fails to run the .NET 6+ benchmarks when launched from a .NET Framework 4.8 process, possibly because that process starts under emulation on the test machine.
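One way to check the emulation suspicion is to print the OS and process architectures from the host process. This is a minimal sketch, not something from the PR: `HostCheck` and its methods are hypothetical names, and only `RuntimeInformation` from `System.Runtime.InteropServices` is used. Some older runtimes may report the emulated architecture as the OS architecture, so a negative result should be treated as inconclusive.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical diagnostic helper, for illustration only.
public static class HostCheck
{
    // Pure rule, kept separate so it is easy to test:
    // on an Arm64 OS, any process that is not Arm64 is not running natively.
    public static bool IsNotNativeArm64(Architecture os, Architecture process)
        => os == Architecture.Arm64 && process != Architecture.Arm64;

    // Summarises what the current process reports about itself.
    public static string Describe()
    {
        Architecture os = RuntimeInformation.OSArchitecture;
        Architecture process = RuntimeInformation.ProcessArchitecture;

        return $"{RuntimeInformation.FrameworkDescription}: OS={os}, " +
               $"Process={process}, NotNativeArm64={IsNotNativeArm64(os, process)}";
    }
}
```

Writing `HostCheck.Describe()` to the console at the start of the benchmark entry point shows which runtime and architecture the host was actually started with.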

@bitfaster bitfaster changed the title from "Implement Sketch using arm intrinsics" to "Implement Sketch using arm64 intrinsics" May 24, 2024
@bitfaster
Owner Author

bitfaster commented May 24, 2024

Windows on ARM results (virtual machine).

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22000.2960/21H2/SunValley)
Pioneer, 1 CPU, 16 logical and 16 physical cores
.NET SDK 8.0.300
  [Host]   : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD
  .NET 6.0 : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD

Job=.NET 6.0  Runtime=.NET 6.0  Alloc Ratio=NA
Method Size Mean Error StdDev Ratio Allocated
FrequencyFlat 32768 37.98 ns 0.031 ns 0.027 ns 1.00 -
FrequencyFlatAvx 32768 37.97 ns 0.049 ns 0.046 ns 1.00 -
FrequencyBlock 32768 25.94 ns 0.009 ns 0.008 ns 0.68 -
FrequencyBlockAvx 32768 23.77 ns 0.010 ns 0.010 ns 0.63 -
FrequencyFlat 524288 80.29 ns 1.583 ns 2.934 ns 1.00 -
FrequencyFlatAvx 524288 78.85 ns 1.542 ns 2.005 ns 0.98 -
FrequencyBlock 524288 55.42 ns 1.100 ns 1.391 ns 0.69 -
FrequencyBlockAvx 524288 59.96 ns 1.185 ns 2.226 ns 0.75 -
FrequencyFlat 8388608 106.38 ns 2.236 ns 6.522 ns 1.00 -
FrequencyFlatAvx 8388608 109.39 ns 2.584 ns 7.578 ns 1.03 -
FrequencyBlock 8388608 89.16 ns 1.964 ns 5.699 ns 0.84 -
FrequencyBlockAvx 8388608 84.27 ns 1.784 ns 5.233 ns 0.79 -
FrequencyFlat 134217728 221.45 ns 4.327 ns 4.809 ns 1.00 -
FrequencyFlatAvx 134217728 215.51 ns 4.152 ns 3.884 ns 0.97 -
FrequencyBlock 134217728 188.04 ns 3.680 ns 5.837 ns 0.86 -
FrequencyBlockAvx 134217728 185.44 ns 3.694 ns 5.750 ns 0.83 -
Method Size Mean Error StdDev Ratio Allocated
IncFlat 32768 23.17 ns 0.018 ns 0.017 ns 1.00 -
IncFlatAvx 32768 23.38 ns 0.025 ns 0.024 ns 1.01 -
IncBlock 32768 20.27 ns 0.041 ns 0.039 ns 0.87 -
IncBlockAvx 32768 14.90 ns 0.012 ns 0.011 ns 0.64 -
IncFlat 524288 65.01 ns 1.285 ns 2.900 ns 1.00 -
IncFlatAvx 524288 60.48 ns 1.207 ns 1.070 ns 0.93 -
IncBlock 524288 51.21 ns 1.015 ns 2.207 ns 0.79 -
IncBlockAvx 524288 33.04 ns 0.649 ns 1.326 ns 0.51 -
IncFlat 8388608 87.51 ns 2.014 ns 5.938 ns 1.00 -
IncFlatAvx 8388608 87.62 ns 2.163 ns 6.378 ns 1.01 -
IncBlock 8388608 78.41 ns 1.555 ns 3.479 ns 0.91 -
IncBlockAvx 8388608 40.51 ns 0.503 ns 0.420 ns 0.47 -
IncFlat 134217728 202.39 ns 3.469 ns 3.245 ns 1.00 -
IncFlatAvx 134217728 203.19 ns 4.017 ns 4.626 ns 1.01 -
IncBlock 134217728 179.61 ns 3.520 ns 4.190 ns 0.89 -
IncBlockAvx 134217728 88.39 ns 1.746 ns 4.081 ns 0.44 -

@bitfaster
Owner Author

bitfaster commented May 25, 2024

Windows on ARM, .NET 8

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22000.2960/21H2/SunValley)
Pioneer, 1 CPU, 16 logical and 16 physical cores
.NET SDK 8.0.300
  [Host]   : .NET 8.0.5 (8.0.524.21615), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.5 (8.0.524.21615), Arm64 RyuJIT AdvSIMD

Job=.NET 8.0  Runtime=.NET 8.0  Alloc Ratio=NA
Method Size Mean Error StdDev Ratio Allocated
FrequencyFlat 32768 17.29 ns 0.004 ns 0.004 ns 1.00 -
FrequencyFlatAvx 32768 17.33 ns 0.010 ns 0.010 ns 1.00 -
FrequencyBlock 32768 20.70 ns 0.010 ns 0.009 ns 1.20 -
FrequencyBlockAvx 32768 17.26 ns 0.004 ns 0.003 ns 1.00 -
FrequencyFlat 524288 55.96 ns 1.112 ns 3.174 ns 1.00 -
FrequencyFlatAvx 524288 57.27 ns 1.151 ns 3.395 ns 1.03 -
FrequencyBlock 524288 51.76 ns 1.016 ns 1.424 ns 0.91 -
FrequencyBlockAvx 524288 43.08 ns 0.842 ns 0.969 ns 0.75 -
FrequencyFlat 8388608 77.04 ns 2.205 ns 6.466 ns 1.00 -
FrequencyFlatAvx 8388608 76.54 ns 2.012 ns 5.932 ns 1.00 -
FrequencyBlock 8388608 74.94 ns 1.497 ns 4.318 ns 0.98 -
FrequencyBlockAvx 8388608 67.66 ns 1.536 ns 4.480 ns 0.88 -
FrequencyFlat 134217728 191.81 ns 3.758 ns 3.691 ns 1.00 -
FrequencyFlatAvx 134217728 193.56 ns 3.856 ns 5.406 ns 1.02 -
FrequencyBlock 134217728 178.97 ns 3.544 ns 4.851 ns 0.93 -
FrequencyBlockAvx 134217728 175.34 ns 3.502 ns 4.169 ns 0.91 -
Method Size Mean Error StdDev Ratio Allocated
IncFlat 32768 13.09 ns 0.016 ns 0.015 ns 1.00 -
IncFlatAvx 32768 13.13 ns 0.038 ns 0.036 ns 1.00 -
IncBlock 32768 16.90 ns 0.018 ns 0.016 ns 1.29 -
IncBlockAvx 32768 11.13 ns 0.013 ns 0.012 ns 0.85 -
IncFlat 524288 39.74 ns 0.794 ns 2.132 ns 1.00 -
IncFlatAvx 524288 39.70 ns 0.769 ns 1.078 ns 1.03 -
IncBlock 524288 44.80 ns 0.724 ns 0.566 ns 1.20 -
IncBlockAvx 524288 26.39 ns 0.525 ns 1.083 ns 0.67 -
IncFlat 8388608 41.43 ns 0.824 ns 2.269 ns 1.00 -
IncFlatAvx 8388608 42.81 ns 0.894 ns 2.622 ns 1.03 -
IncBlock 8388608 65.80 ns 1.314 ns 3.748 ns 1.59 -
IncBlockAvx 8388608 36.27 ns 0.722 ns 1.715 ns 0.88 -
IncFlat 134217728 101.26 ns 2.001 ns 2.382 ns 1.00 -
IncFlatAvx 134217728 101.27 ns 1.329 ns 1.038 ns 1.01 -
IncBlock 134217728 151.88 ns 3.026 ns 6.952 ns 1.49 -
IncBlockAvx 134217728 80.18 ns 1.547 ns 3.965 ns 0.82 -

@coveralls

coveralls commented May 25, 2024

Coverage Status

coverage: 99.162% (-0.06%) from 99.218%
when pulling 8ca47d2 on users/alexpeck/arm64
into 25ea2bd on main.

@bitfaster
Owner Author

With VectorTableLookup

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22000.2960/21H2/SunValley)
Pioneer, 1 CPU, 16 logical and 16 physical cores
.NET SDK 8.0.300
  [Host]   : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD
  .NET 6.0 : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD

Job=.NET 6.0  Runtime=.NET 6.0  Alloc Ratio=NA
Method Size Mean Error StdDev Ratio Allocated
FrequencyFlat 32768 38.20 ns 0.037 ns 0.035 ns 1.00 -
FrequencyFlatAvx 32768 38.00 ns 0.043 ns 0.040 ns 1.00 -
FrequencyBlock 32768 26.04 ns 0.019 ns 0.015 ns 0.68 -
FrequencyBlockAvx 32768 24.85 ns 0.016 ns 0.013 ns 0.65 -
FrequencyFlat 524288 78.99 ns 1.569 ns 1.927 ns 1.00 -
FrequencyFlatAvx 524288 80.64 ns 1.591 ns 3.321 ns 1.03 -
FrequencyBlock 524288 56.74 ns 1.127 ns 2.116 ns 0.71 -
FrequencyBlockAvx 524288 60.36 ns 1.207 ns 2.465 ns 0.77 -
FrequencyFlat 8388608 106.68 ns 2.151 ns 6.240 ns 1.00 -
FrequencyFlatAvx 8388608 119.36 ns 4.384 ns 12.580 ns 1.12 -
FrequencyBlock 8388608 91.31 ns 2.181 ns 6.398 ns 0.86 -
FrequencyBlockAvx 8388608 94.26 ns 2.138 ns 6.169 ns 0.89 -
FrequencyFlat 134217728 224.86 ns 4.456 ns 4.376 ns 1.00 -
FrequencyFlatAvx 134217728 222.78 ns 4.218 ns 4.142 ns 0.99 -
FrequencyBlock 134217728 195.09 ns 3.887 ns 5.450 ns 0.86 -
FrequencyBlockAvx 134217728 189.32 ns 2.508 ns 2.223 ns 0.84 -

@bitfaster
Owner Author

bitfaster commented May 25, 2024

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22000.2960/21H2/SunValley)
Pioneer, 1 CPU, 16 logical and 16 physical cores
.NET SDK 8.0.300
  [Host]   : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD
  .NET 6.0 : .NET 6.0.30 (6.0.3024.21525), Arm64 RyuJIT AdvSIMD

Job=.NET 6.0  Runtime=.NET 6.0  Alloc Ratio=NA
Method Size Mean Error StdDev Ratio Allocated
IncFlat 32768 23.19 ns 0.013 ns 0.012 ns 1.00 -
IncFlatAvx 32768 23.41 ns 0.017 ns 0.015 ns 1.01 -
IncBlock 32768 20.36 ns 0.010 ns 0.009 ns 0.88 -
IncBlockAvx 32768 13.89 ns 0.004 ns 0.003 ns 0.60 -
IncFlat 524288 64.55 ns 1.262 ns 2.243 ns 1.00 -
IncFlatAvx 524288 64.56 ns 1.251 ns 1.753 ns 1.01 -
IncBlock 524288 50.67 ns 1.008 ns 1.120 ns 0.78 -
IncBlockAvx 524288 31.84 ns 0.632 ns 1.290 ns 0.49 -
IncFlat 8388608 92.59 ns 2.143 ns 6.285 ns 1.00 -
IncFlatAvx 8388608 91.21 ns 2.231 ns 6.507 ns 0.99 -
IncBlock 8388608 82.57 ns 1.642 ns 4.710 ns 0.90 -
IncBlockAvx 8388608 41.18 ns 0.815 ns 1.921 ns 0.45 -
IncFlat 134217728 205.91 ns 3.865 ns 4.135 ns 1.00 -
IncFlatAvx 134217728 205.51 ns 2.790 ns 2.473 ns 0.99 -
IncBlock 134217728 183.31 ns 3.610 ns 5.061 ns 0.89 -
IncBlockAvx 134217728 87.76 ns 1.755 ns 3.739 ns 0.43 -
Method Size Mean Error StdDev Ratio Allocated
FrequencyFlat 32768 38.03 ns 0.034 ns 0.028 ns 1.00 -
FrequencyFlatAvx 32768 38.05 ns 0.040 ns 0.037 ns 1.00 -
FrequencyBlock 32768 25.98 ns 0.006 ns 0.005 ns 0.68 -
FrequencyBlockAvx 32768 23.02 ns 0.007 ns 0.006 ns 0.61 -
FrequencyFlat 524288 80.73 ns 1.593 ns 2.831 ns 1.00 -
FrequencyFlatAvx 524288 81.06 ns 1.606 ns 3.056 ns 1.01 -
FrequencyBlock 524288 56.54 ns 1.009 ns 1.381 ns 0.70 -
FrequencyBlockAvx 524288 58.12 ns 1.147 ns 2.067 ns 0.72 -
FrequencyFlat 8388608 111.51 ns 2.220 ns 6.077 ns 1.00 -
FrequencyFlatAvx 8388608 113.42 ns 2.580 ns 7.525 ns 1.02 -
FrequencyBlock 8388608 89.49 ns 1.769 ns 4.784 ns 0.80 -
FrequencyBlockAvx 8388608 87.31 ns 1.840 ns 5.397 ns 0.79 -
FrequencyFlat 134217728 226.23 ns 4.506 ns 4.215 ns 1.00 -
FrequencyFlatAvx 134217728 223.52 ns 2.770 ns 2.591 ns 0.99 -
FrequencyBlock 134217728 195.64 ns 3.683 ns 3.941 ns 0.87 -
FrequencyBlockAvx 134217728 187.74 ns 3.590 ns 4.135 ns 0.83 -

@bitfaster bitfaster changed the title from "Implement Sketch using arm64 intrinsics" to "Implement LFU sketch using arm64 intrinsics" May 25, 2024

// Before: < 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F >
// After: < 0, 1, 2, 3, 8, 9, A, B, 4, 5, 6, 7, C, D, E, F >
var min = AdvSimd.Arm64.VectorTableLookup(a.AsByte(), Vector128.Create(0x0B0A090803020100, 0xFFFFFFFFFFFFFFFF).AsByte());
Owner Author

On later versions of .NET, VectorTableLookup can also take two or more 128-bit registers as input.

See this example:
dotnet/runtime#87126
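To make the lookup semantics concrete, here is a scalar model of the ARM64 table lookup (TBL) that `VectorTableLookup` maps to. It is a sketch for illustration, not the PR's implementation, and `TableLookupModel` is a hypothetical name. Each index byte selects one byte of the table, and any index past the end of the table yields zero. The same rule covers a 16-byte table (one register) and a 32-byte table (two registers). The exact shape of the multi-register overload is not shown in this thread, so the model does not call it.

```csharp
using System;

// Hypothetical scalar model of TBL, for illustration only.
public static class TableLookupModel
{
    // Index bytes equivalent to
    // Vector128.Create(0x0B0A090803020100, 0xFFFFFFFFFFFFFFFF).AsByte()
    // on a little-endian machine.
    public static readonly byte[] SnippetIndexes =
    {
        0x00, 0x01, 0x02, 0x03, 0x08, 0x09, 0x0A, 0x0B,
        0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
    };

    // table:   16 bytes for one register, 32 for two, and so on.
    // indexes: one selector byte per output lane.
    public static byte[] Lookup(byte[] table, byte[] indexes)
    {
        if (table == null) throw new ArgumentNullException(nameof(table));
        if (indexes == null) throw new ArgumentNullException(nameof(indexes));

        var result = new byte[indexes.Length];

        for (int lane = 0; lane < indexes.Length; lane++)
        {
            int index = indexes[lane];

            // Out-of-range selectors (such as 0xFF) produce zero.
            result[lane] = index < table.Length ? table[index] : (byte)0;
        }

        return result;
    }
}
```

With the index vector from the snippet, table bytes 0 to 3 and 8 to B land in the low eight lanes, and the 0xFF selectors zero the high eight lanes. Passing a 32-byte table makes selectors 16 to 31 valid, which is the extra reach a second register provides.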
