Wrong order after rdst #12

vigna · 2024-01-27T09:45:50Z

We have a file containing a billion pairs of usize that rdst fails to sort properly. You can get the data at http://vigna.di.unimi.it/data.tsv.gz (you'll have to gunzip it), and the program to replicate the problem is

use rayon::prelude::*;
use rdst::*;
use std::{fs::File, io::BufRead, io::BufReader};

#[derive(Clone, Copy)]
struct Pair([usize; 2]);

impl RadixKey for Pair {
    const LEVELS: usize = 16;

    fn get_level(&self, level: usize) -> u8 {
        (self.0[1 - level / 8] >> ((level % 8) * 8)) as u8
    }
}

fn main() {
    let mut v = Vec::with_capacity(1_000_000_000);
    for l in BufReader::new(File::open("data.tsv").unwrap()).lines() {
        let line = l.unwrap();
        let mut iter = line.split("\t").map(|x| x.parse::<usize>().unwrap());
        v.push(Pair([iter.next().unwrap(), iter.next().unwrap()]));
    }
    v.radix_sort_unstable();
    v.par_windows(2)
        .for_each(|v| assert!(v[0].0 <= v[1].0, "{:?} > {:?}", v[0].0, v[1].0));
}

which prints

thread '<unnamed>' panicked at src/bin/test.rs:25:23:
[1050093940, 84812442] > [1050093939, 86974995]
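To rule out the key definition itself, here is a small standalone check that does not use rdst at all. It copies the `get_level` formula into a free function (`level_byte` and `cmp_by_levels` are names made up for this sketch, and a 64-bit `usize` is assumed) and confirms that comparing level bytes from the most significant level down matches the ordinary lexicographic order of the pair:

```rust
use std::cmp::Ordering;

/// Same formula as `get_level` above, as a free function.
/// Assumes a 64-bit `usize` (16 levels of one byte each).
fn level_byte(pair: &[usize; 2], level: usize) -> u8 {
    (pair[1 - level / 8] >> ((level % 8) * 8)) as u8
}

/// Compares two pairs by their level bytes, most significant level first.
fn cmp_by_levels(a: &[usize; 2], b: &[usize; 2]) -> Ordering {
    for level in (0..16).rev() {
        match level_byte(a, level).cmp(&level_byte(b, level)) {
            Ordering::Equal => continue,
            other => return other,
        }
    }
    Ordering::Equal
}

fn main() {
    // The two pairs from the panic message above.
    let a = [1050093940usize, 84812442];
    let b = [1050093939usize, 86974995];
    // The level-based comparison agrees with the array comparison used in the assert.
    assert_eq!(cmp_by_levels(&a, &b), a.cmp(&b));
    println!("level order matches lexicographic order: {:?}", a.cmp(&b));
}
```

Since both orders agree, the pair reported in the panic really is out of order with respect to the key.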

nessex · 2024-01-30T12:38:56Z

Thanks! I've grabbed the file and managed to replicate it. Still trying to work out the cause though. It takes a while, the issue appears right at the end of the dataset, and when taking a subset of the data it disappears 😅

It looks like it may be related to one of the level skips in LsbSort. Unfortunately, for data that big there isn't really a great replacement for that algorithm. You could limit the sort to just Regions sort and Ska sort for now, but that's a lot slower for this data.

vigna · 2024-01-30T13:36:13Z

Ok thanks!

BTW, any possibility of getting a stable version? 😏

nessex · 2024-02-03T11:12:36Z

> BTW, any possibility of getting a stable version? 😏

Depends on whether you mean a stable library or a stable sort order 😅. For the former, I've released 0.20.13, which includes a fix for the ordering issue here. It was caused by the combination of two optimizations (level skip + alternating input vs. output arrays), where counting wasn't alternating input arrays. The counts were correct (same data), but the flag / check for whether the data was "already sorted" wasn't.
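For readers who want to see how those two optimizations interact, here is a minimal LSD radix sort sketch over `u32` keys. It is not rdst's code: `count_level` and `lsd_sort` are hypothetical names, and instead of an explicit input/output flag it swaps the two buffers so that `data` always names the live one. The point it illustrates is that the histogram depends only on which keys are present, while the "already sorted" flag depends on their order, so the flag is only valid when it is computed from the buffer that currently holds the data.

```rust
/// Histogram of the byte at `level`, plus whether `src` is already
/// non-decreasing in that byte. The flag is order-dependent, so `src`
/// must be the buffer that currently holds the live data.
fn count_level(src: &[u32], level: usize) -> ([usize; 256], bool) {
    let mut counts = [0usize; 256];
    let mut sorted = true;
    let mut prev = 0u8;
    for &x in src {
        let b = (x >> (level * 8)) as u8;
        sorted &= b >= prev;
        prev = b;
        counts[b as usize] += 1;
    }
    (counts, sorted)
}

/// Illustrative LSD radix sort with level skipping and two alternating buffers.
fn lsd_sort(data: &mut Vec<u32>) {
    let mut scratch = vec![0u32; data.len()];
    for level in 0..4 {
        // Count from the live buffer; thanks to the swap below that is always `data`.
        let (counts, sorted) = count_level(data.as_slice(), level);
        if sorted {
            continue; // level skip: a stable pass on this byte would move nothing
        }
        // Exclusive prefix sums give the first output slot of each bucket.
        let mut offsets = [0usize; 256];
        let mut sum = 0usize;
        for (o, c) in offsets.iter_mut().zip(counts.iter()) {
            *o = sum;
            sum += *c;
        }
        // Stable scatter from the live buffer into the other one.
        for &x in data.iter() {
            let b = (x >> (level * 8)) as u8 as usize;
            scratch[offsets[b]] = x;
            offsets[b] += 1;
        }
        // Alternate the buffers: the output of this pass is the next input.
        std::mem::swap(data, &mut scratch);
    }
}

fn main() {
    // Deterministic pseudo-random input from a simple LCG.
    let mut state = 12345u64;
    let mut v: Vec<u32> = (0..1000)
        .map(|_| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            (state >> 32) as u32
        })
        .collect();
    lsd_sort(&mut v);
    assert!(v.windows(2).all(|w| w[0] <= w[1]));
    println!("sorted {} keys", v.len());
}
```

If `count_level` were called on the stale buffer after a non-skipped pass, the counts would still add up, but `sorted` could report the wrong answer and a needed pass could be skipped, which is the kind of failure described above.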

If you mean stable sort order, give Algorithm::MtLsb and Algorithm::Lsb a try! They have stable output. Something like this should do:

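// The tuner types live in rdst's `tuner` module.
use rdst::tuner::{Algorithm, Tuner, TuningParams};

struct StableTuner;
impl Tuner for StableTuner {
    fn pick_algorithm(&self, p: &TuningParams, _counts: &[usize]) -> Algorithm {
        if p.input_len >= 500_000 {
            Algorithm::MtLsb
        } else {
            Algorithm::Lsb
        }
    }
}

// Plug it in through the sort builder:
// v.radix_sort_builder().with_tuner(&StableTuner {}).sort();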

vigna · 2024-02-03T11:49:12Z

The second thing LOL. Thanks for the fix!

However, in the end we realized unstable is fine. Any suggestion for optimizing the sort for a few billion pairs (usize, usize)?

nessex · 2024-02-03T14:26:13Z

The most important thing would be to make the RadixKey impl as fast as possible. If you can flip the order of the pair of usize, and if you're exclusively on little-endian machines, something like this is probably faster:

impl RadixKey for Pair {
    const LEVELS: usize = 16;

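    #[inline(always)]
    fn get_level(&self, level: usize) -> u8 {
        // SAFETY: assumes `level < Self::LEVELS` and a 64-bit target, where
        // `[usize; 2]` is exactly 16 bytes, so the read stays in bounds.
        #[cfg(target_endian = "little")]
        unsafe { (self.0.as_ptr() as *const u8).add(level).read_unaligned() }

        #[cfg(target_endian = "big")]
        compile_error!("big endian is not supported");
    }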
}

This function is used in some very hot loops, so a single conditional or divide can have a major impact.
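As a quick sanity check of why the pair has to be flipped, the sketch below compares the byte read against the shift-based formula for a pair stored as [low, high]. It uses `u64` rather than `usize` so the 16-byte size is explicit, and `level_by_shift` / `level_by_read` are names invented for this example, not part of rdst:

```rust
#[cfg(target_endian = "big")]
compile_error!("this sketch assumes a little-endian target");

/// Shift-based extraction for a pair stored as [low, high] (the flipped order).
fn level_by_shift(pair: &[u64; 2], level: usize) -> u8 {
    (pair[level / 8] >> ((level % 8) * 8)) as u8
}

/// Byte-read extraction: on little-endian targets, byte `level` of the
/// in-memory representation is the byte of significance `level`.
fn level_by_read(pair: &[u64; 2], level: usize) -> u8 {
    debug_assert!(level < 16);
    // SAFETY: `[u64; 2]` is 16 bytes and `level < 16`, so the read is in bounds.
    unsafe { (pair.as_ptr() as *const u8).add(level).read() }
}

fn main() {
    // Byte k of this pair (counting from the least significant) has value k + 1.
    let pair = [0x0807_0605_0403_0201u64, 0x100F_0E0D_0C0B_0A09];
    for level in 0..16 {
        assert_eq!(level_by_shift(&pair, level), level_by_read(&pair, level));
        assert_eq!(level_by_read(&pair, level), level as u8 + 1);
    }
    println!("both extractions agree on all 16 levels");
}
```

With the original [high, low] layout the byte read would visit the high word first, so levels 0 to 7 would come from the wrong element; flipping the pair is what makes the plain pointer offset line up with the level number.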

Beyond that, the standard tuner is still in the same ballpark as other options I tried when running it against that billion pairs dataset you provided. Occasionally a different tuning was a little faster / slower, but nothing consistently so.

Depending on the machine you're running on, the included low_mem tuner could be faster though. Most of my runs were on an M1 MacBook, which seems to have great memory performance (both bandwidth and latency). On a mid-range Ryzen it's a different story: memory can be a bit of a bottleneck, so the low_mem tuner wins out more often. If you test the standard tuner, the low_mem tuner and also the MtLsb option from my comment above, you'll have a pretty good snapshot of what options exist. More complex tunings probably won't have a great impact beyond those three options.
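One way to run that kind of comparison is a tiny timing harness that treats every candidate as a closure. The sketch below is standard-library only, so it times `sort_unstable` and `sort` as stand-ins; `time_sort` is a hypothetical helper, and the rdst variants would be passed in as additional closures in the same way:

```rust
use std::time::{Duration, Instant};

/// Runs `sort` on a fresh copy of `input`, checks the result is ordered,
/// and returns the time spent inside `sort` only.
fn time_sort<T, F>(input: &[T], sort: F) -> Duration
where
    T: Clone + Ord,
    F: FnOnce(&mut Vec<T>),
{
    let mut v = input.to_vec();
    let start = Instant::now();
    sort(&mut v);
    let elapsed = start.elapsed();
    assert!(v.windows(2).all(|w| w[0] <= w[1]), "output is not sorted");
    elapsed
}

fn main() {
    // Deterministic pseudo-random pairs (simple LCG) standing in for real data.
    let mut state = 0x2545_F491_4F6C_DD1Du64;
    let mut next = move || {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (state >> 16) as usize
    };
    let input: Vec<[usize; 2]> = (0..1_000_000).map(|_| [next(), next()]).collect();

    // Each candidate is just a closure over `&mut Vec<_>`.
    println!("sort_unstable: {:?}", time_sort(&input, |v| v.sort_unstable()));
    println!("sort:          {:?}", time_sort(&input, |v| v.sort()));
}
```

Because every run starts from a fresh clone and the result is checked, an ordering bug like the one in this issue would show up as a panic rather than as a misleadingly fast timing.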

vigna · 2024-02-03T15:22:43Z

Thanks! As it is, it's two times faster than par_sort_unstable, so we're already very happy. We'll keep experimenting. Thanks for a great crate!
