
memmem::find performance regression since 2.6.0 #161

Open
Ambyjkl opened this issue Sep 11, 2024 · 3 comments

Ambyjkl commented Sep 11, 2024

I noticed when updating memchr that memmem::find was considerably slower, and digging deeper I found that memmem::Finder takes considerably longer to build in versions 2.6.0 and above. The good thing is this revealed some opportunities for caching and reusing existing Finder instances (there is a rough sketch of what I mean at the end of this comment), but there are still cases where I have a fresh one-time needle to search for in the haystack and hence cannot cache. Comparing the commits between 2.5.0 and 2.6.0, the major suspects appear to be e49a1b8, 00c6372 and 93662e7, with the first and the last being the biggest. Here are the benchmark results of simulating real-world usage in my application:

Summary
  /tmp/m-2.5.0 ran
    1.19 ± 0.02 times faster than /tmp/m-54c893176860c15cb41be03412ab0bc30f20df25
    1.20 ± 0.02 times faster than /tmp/m-e49a1b89ea8c7617de2e3447c356f7813d0a88e9
    1.20 ± 0.02 times faster than /tmp/m-7af67ce25998d7e91c349f39eb3e38efcc5d5e8f
    1.21 ± 0.01 times faster than /tmp/m-00c6372daa25224a3a8698f0d0a3b222bb8b0795
    1.28 ± 0.01 times faster than /tmp/m-93662e796001b795283ec6badb59a17b4b5492cc

And here is a microbenchmark just running memmem::find in a loop to get the net perf regression:

fn main() {
    // Needle and haystack never match; the one-shot find rebuilds its
    // searcher internally on every call, so construction cost is paid each time.
    for _ in 0..1000000 {
        memchr::memmem::find(
            b"123456789012374890172390417230941723401723490293084570293457890234589023478",
            b"testadfjgadfaopisdfjoipasdf",
        );
    }
}
Summary
  target/release/2.5.0 ran
    3.31 ± 0.20 times faster than target/release/2.6.0

Which comes out to ~3.3x. Here are the flamegraphs

2.6.0: [flamegraph]

2.5.0: [flamegraph]
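
For context, the caching and reuse I mentioned above looks roughly like this (just a sketch; the strings are placeholders):

use memchr::memmem;

fn main() {
    // One-shot API: convenient, but any searcher state is rebuilt per call.
    let found = memmem::find(b"haystack with a needle in it", b"needle");
    assert!(found.is_some());

    // Cached Finder: built once and reused across many haystacks. This is
    // what I can do when the needle is known up front, but not for a fresh
    // one-time needle.
    let finder = memmem::Finder::new(b"needle");
    for haystack in [&b"nothing here"[..], &b"haystack with a needle in it"[..]] {
        let _ = finder.find(haystack);
    }
}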

BurntSushi (Owner) commented

93662e7 in particular was basically a re-organization of the entire crate. So unfortunately, we don't really have one specific and small change we can point to that is the cause of this regression.

In general, this crate prioritizes search speed over construction speed. Of course, there has to be a balance here. I'll need to do an investigation to figure out if there's anything we can improve. But this could come down to something like, "previously, we weren't building an SSE2 fallback for AVX2 searchers, and now we are, and that causes construction times to increase." I don't know that that is the issue, but I think it represents the possible flavor. In which case, it's probably a regression we have to eat. Maybe we can "make up" for it somewhere else, I'm not sure.
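
To help narrow that down, one rough way to separate construction cost from search cost is to time the two independently (a sketch only; black_box is just there to keep the compiler from discarding the work):

use std::hint::black_box;
use std::time::Instant;

use memchr::memmem;

fn main() {
    let haystack = &b"123456789012374890172390417230941723401723490293084570293457890234589023478"[..];
    let needle = &b"testadfjgadfaopisdfjoipasdf"[..];

    // Construction only: build a Finder repeatedly and throw it away.
    let start = Instant::now();
    for _ in 0..1_000_000 {
        black_box(memmem::Finder::new(needle));
    }
    println!("construction: {:?}", start.elapsed());

    // Search only: build the Finder once, then search repeatedly.
    let finder = memmem::Finder::new(needle);
    let start = Instant::now();
    for _ in 0..1_000_000 {
        black_box(finder.find(haystack));
    }
    println!("search:       {:?}", start.elapsed());
}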


Ambyjkl commented Oct 8, 2024

I see, that makes sense, and I wasn't aware the SSE2 fallback is new. So if you're right, the trick might be to skip SSE2 if AVX2 is supported and this check fails, unless that's how it already is:

        if haystack.len() < self.avx2.min_haystack_len() {
            self.sse2.find(haystack, needle)
        } else {
            self.avx2.find(haystack, needle)
        }

And also have a heuristic for skipping everything and going with Rabin-Karp for small needles and haystacks.
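
Something like this at the call-site, purely as a sketch of the idea (the 32-byte threshold is made up, a naive scan stands in for Rabin-Karp, and inside the crate the shape would obviously be different):

use memchr::memmem;

// Hypothetical caller-side heuristic: for very small inputs, a naive scan
// avoids building any searcher at all; otherwise defer to memmem::find.
fn find_small_aware(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    if haystack.len() < 32 || needle.len() > haystack.len() {
        // windows() yields nothing when the needle is longer than the haystack.
        return haystack.windows(needle.len()).position(|w| w == needle);
    }
    memmem::find(haystack, needle)
}

fn main() {
    assert_eq!(find_small_aware(b"abcxyz", b"xyz"), Some(3));
    assert_eq!(find_small_aware(b"abcxyz", b"q"), None);
}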

BurntSushi (Owner) commented

> and I wasn't aware the SSE2 fallback is new

I don't think it is. I can't remember. It was just an example of what kinds of issues might appear here. :-)

> So if you're right, the trick might be to skip SSE2 if AVX2 is supported and this check fails, unless that's how it already is:

That check is done at search time, past the point where SSE2 construction is done. In general, construction can't use facts about the haystack to change course. The higher level search APIs like memchr::memmem::find do, but only because that API is explicitly for one-shot usage.

> And also have a heuristic for skipping everything and going with Rabin-Karp for small needles and haystacks.

You can't for memmem::Finder because construction doesn't know anything about haystack length. Otherwise, memchr::memmem::find already does this.
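
To make the distinction concrete, a sketch of the two API shapes:

use memchr::memmem;

fn main() {
    let haystack = b"some fresh haystack";
    let needle = b"hay";

    // One-shot API: sees needle and haystack together at call time, so it
    // can make haystack-dependent choices (like the small-input heuristic above).
    let a = memmem::find(haystack, needle);

    // Finder API: construction sees only the needle; the haystack length is
    // unknown until find() is called, so construction cannot adapt to it.
    let finder = memmem::Finder::new(needle);
    let b = finder.find(haystack);

    assert_eq!(a, b);
}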
