Skip to content

Commit

Permalink
Detailed algorithm description
Browse files Browse the repository at this point in the history
  • Loading branch information
IlyaGrebnov committed Apr 21, 2024
1 parent 38f39d2 commit a472b25
Show file tree
Hide file tree
Showing 2 changed files with 101 additions and 44 deletions.
49 changes: 49 additions & 0 deletions Benchmarks.md
Original file line number Diff line number Diff line change
Expand Up @@ -116,3 +116,52 @@ The times are the minimum of five runs measuring **single-threaded (ST)** and **
| fib41 | 267914296 | 8.390 sec ( 31.93 MB/s) | 37.397 sec ( 7.16 MB/s) |**+345.72%**| 3.794 sec ( 70.61 MB/s) | 37.158 sec ( 7.21 MB/s) |**+879.30%**|
| rs.13 | 216747218 | 6.425 sec ( 33.74 MB/s) | 28.994 sec ( 7.48 MB/s) |**+351.30%**| 2.807 sec ( 77.21 MB/s) | 31.609 sec ( 6.86 MB/s) |**+1025.97%**|
| tm29 | 268435456 | 8.931 sec ( 30.06 MB/s) | 31.267 sec ( 8.59 MB/s) |**+250.09%**| 3.729 sec ( 71.99 MB/s) | 31.384 sec ( 8.55 MB/s) |**+741.69%**|



## Large pages and multi-core systems support

Large-pages and OpenMP improves the libsais performance. Here is an example of such improvements on Manzini Corpus.

| file | size | baseline| LP | LP w 2c | LP w 3c | LP w 4c | LP w 5c | LP w 6c | LP w 7c | LP w 8c |
|:---------------:|:---------:|:-------:|:-------:|:-------:|:-------:|:-------:|:-------:|:-------:|:-------:|:-------:|
| chr22.dna | 34553758 |43.50MB/s|50.18MB/s|61.20MB/s|73.66MB/s|78.91MB/s|81.20MB/s|81.49MB/s|81.52MB/s|80.42MB/s|
| etext99 | 105277340 |32.96MB/s|40.73MB/s|50.19MB/s|59.34MB/s|62.97MB/s|64.06MB/s|62.83MB/s|63.08MB/s|62.49MB/s|
| gcc-3.0.tar | 86630400 |44.32MB/s|50.13MB/s|58.51MB/s|68.85MB/s|73.82MB/s|75.76MB/s|76.14MB/s|75.85MB/s|75.24MB/s|
| howto | 39422105 |42.78MB/s|48.10MB/s|57.38MB/s|67.75MB/s|71.91MB/s|73.67MB/s|73.61MB/s|73.17MB/s|72.38MB/s|
| jdk13c | 69728899 |42.70MB/s|47.77MB/s|54.50MB/s|64.85MB/s|69.63MB/s|71.66MB/s|72.15MB/s|71.96MB/s|71.24MB/s|
| linux-2.4.5.tar | 116254720 |42.46MB/s|48.85MB/s|57.60MB/s|67.92MB/s|72.29MB/s|73.88MB/s|74.11MB/s|73.59MB/s|73.27MB/s|
| rctail96 | 114711151 |36.39MB/s|43.19MB/s|50.96MB/s|60.60MB/s|64.33MB/s|65.43MB/s|65.79MB/s|65.78MB/s|65.18MB/s|
| rfc | 116421901 |39.81MB/s|46.76MB/s|55.92MB/s|66.48MB/s|70.79MB/s|71.68MB/s|72.21MB/s|71.92MB/s|71.06MB/s|
| sprot34.dat | 109617186 |36.09MB/s|45.06MB/s|53.26MB/s|61.60MB/s|59.69MB/s|62.25MB/s|67.20MB/s|66.84MB/s|66.38MB/s|
| w3c2 | 104201579 |42.97MB/s|47.09MB/s|54.01MB/s|63.79MB/s|67.67MB/s|69.84MB/s|69.94MB/s|69.65MB/s|68.86MB/s|

Note, multi-core scalability is limited by RAM bandwidth and adding more RAM channels improves performance:
![enwik9 BWT throughput in MB/s on Azure DS14 v2 (Intel Xeon Platinum 8171M)](Azure_enwik9_benchmark.png?raw=true "enwik9 BWT throughput in MB/s on Azure DS14 v2 (Intel Xeon Platinum 8171M)")

## libsais64 for inputs larger than 2GB

Starting from version 2.2.0 libsais64 could process inputs larger than 2GB.

The times below are the minimum of five runs measuring **multi-threaded (MT)** performance of suffix array construction on Azure DS14 v2 (Intel Xeon Platinum 8171M).

| file | size | libsais64 2.2.0 (MT) | divsufsort64 2.0.2 (MT) |speedup (MT)|
|:---------------:|:-----------:|:--------------------------:|:--------------------------:|:----------:|
| english | 2210395553 | 61.499 sec ( 34.28 MB/s) | 435.199 sec ( 4.84 MB/s) |**+607.65%**|
| GRCh38.p13.fa | 3321586957 | 84.068 sec ( 37.68 MB/s) | 782.938 sec ( 4.05 MB/s) |**+831.32%**|
| enwik10 | 10000000000 | 303.542 sec ( 31.42 MB/s) |1927.351 sec ( 4.95 MB/s) |**+534.95%**|

## Additional memory

The libsais reuses space allocated for suffix array during construction. Sometimes this free space is not sufficient for most optimal algorithm (this is uncommon) and libsais will need to fallback to less efficient one (libsais has 4 algorithms at different break-points point: 6k, 4k, 2k and 1k; where k is alphabet size). To improve performance for those cases you could allocating additional space at the end of suffix array.

| file | size | libsais + O(n) (ST) | libsais + O(1) (ST) |speedup (ST)| libsais + O(n) (MT) | libsais + O(1) (ST) |speedup (MT)|
|:---------------:|:-----------:|:--------------------------:|:--------------------------:|:----------:|:--------------------------:|:--------------------------:|:----------:|
| osdb | 10085684 | 0.222 sec ( 45.52 MB/s) | 0.228 sec ( 44.20 MB/s) | **+2.97%**| 0.150 sec ( 67.30 MB/s) | 0.162 sec ( 62.25 MB/s) | **+8.11%**|
| x-ray | 8474240 | 0.190 sec ( 44.52 MB/s) | 0.217 sec ( 39.11 MB/s) | **+13.82%**| 0.122 sec ( 69.46 MB/s) | 0.156 sec ( 54.16 MB/s) | **+28.25%**|
| sao | 7251944 | 0.175 sec ( 41.48 MB/s) | 0.182 sec ( 39.75 MB/s) | **+4.37%**| 0.127 sec ( 57.26 MB/s) | 0.140 sec ( 51.87 MB/s) | **+10.39%**|
| ooffice | 6152192 | 0.113 sec ( 54.55 MB/s) | 0.117 sec ( 52.45 MB/s) | **+4.01%**| 0.081 sec ( 76.38 MB/s) | 0.088 sec ( 70.30 MB/s) | **+8.65%**|
| abac | 200000 | 0.002 sec ( 84.36 MB/s) | 0.003 sec ( 73.63 MB/s) | **+14.56%**| 0.002 sec ( 105.08 MB/s) | 0.002 sec ( 86.64 MB/s) | **+21.27%**|
| test3 | 2097088 | 0.034 sec ( 61.54 MB/s) | 0.037 sec ( 56.45 MB/s) | **+9.03%**| 0.028 sec ( 75.76 MB/s) | 0.032 sec ( 64.93 MB/s) | **+16.68%**|

> * All other files from [Benchmarks](#benchmarks) above do not suffer from this fallbacks.
Loading

0 comments on commit a472b25

Please sign in to comment.