-
-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
memmem::find performance regression since 2.6.0 #161
Comments
93662e7 in particular was basically a re-organization of the entire crate. So unfortunately, we don't really have one specific and small change we can point to that is the cause of this regression. In general, this crate generally prioritizes search speed over construction speed. Of course, there has to be a balance here. I'll need to do an investigation to figure out if there's anything we can improve. But this could come down to something like, "previously, we weren't building an SSE2 fallback for AVX2 searchers, and now we are, and that causes construction times to increase." I don't know that that is the issue, but I think it represents the possible flavor. In which case, it's probably a regression we have to eat. Maybe we can "make up" for it somewhere else, I'm not sure. |
I see, that makes sense, and i wasn't aware the sse2 fallback is new. So if you're right, the trick might be to skip sse2 if avx2 is supported and this check fails, unless that's how it already is: if haystack.len() < self.avx2.min_haystack_len() {
self.sse2.find(haystack, needle)
} else {
self.avx2.find(haystack, needle)
} and also have a heuristic for skipping everying and going with rabin-karp for small needles and haystacks |
I don't think it is. I can't remember. It was just an example of what kinds of issues might appear here. :-)
That check is done at search time, past the point where SSE2 construction is done. In general, construction can't use facts about the haystack to change course. The higher level search APIs like
You can't for |
I noticed when updating
memchr
and thatmemmem::find
was considerably slower, and digging deeper found thatmemmem::Finder
takes considerably longer to build in versions 2.6.0 and above. The good thing is this revealed some opportunities for caching and reusing existingFinder
instances, but there are still cases where I have a fresh one-time needle to search in the haystack and hence I cannot cache. Comparing the commits between2.5.0
and2.6.0
revealed that the major suspects as being e49a1b8 00c6372 93662e7, with the first and the last ones being the biggest. Here are the benchmark results of simulating real-world usage in my applicationand here is a microbenchmark just running
memmem::find
in a loop to get the net perf regressionWhich comes out to ~3.3x. Here are the flamegraphs
2.6.0:
2.5.0:
The text was updated successfully, but these errors were encountered: