Make search independent of word order and fuzzy #2335
-
If you want to give it a shot, here's the dev getting started guide: And here's the core search logic: https://github.com/hay-kot/mealie/blob/de4debe74963ef48f057588c666ca2e214ab4cde/mealie/repos/repository_recipes.py#L153
-
You probably want to include what hardware you're bench testing on. Lots of people run this on Raspberry Pis, which I think will totally tank any feasibility here, but that's just a guess. Bet it runs great on my M1 though!
-
This has since been implemented.
-
Before submitting this feature request I have
Please Describe The Problem To Be Solved
Recipe search is currently highly dependent on search phrasing and is non-fuzzy. For example, "pinto beans" and "beans pinto" are different searches, and "eggs over easy" is different than "egg over easy". See #2325
Suggest A Solution
Levenshtein distance and trigram search are exposed under different function names in SQLite and Postgres, which makes this a bit complicated to do through the SQLAlchemy interface. @fleshgolem therefore suggested a two-stage fix:
This is safe, sane and makes a lot of sense.
However, I also did some benchmarking with RapidFuzz (https://maxbachmann.github.io/RapidFuzz/) to test the crazy idea of just pulling the relevant database fields from all recipes and then fuzzy-matching the query against everything to figure out what matches. This would be database independent and could easily be done in Python inside the search function, but I expected it to be slower. To my surprise, RapidFuzz is fast enough that this approach stays quick even with a large dataset.
I started with a test set of 5 different recipes and made a "meta string" from the "name", "description", "tags" and "recipeIngredient:note" fields of each. Running 10'000 iterations of a four-word search against these meta strings (i.e. simulating 5 separate 10'000-recipe databases) with even the slowest RapidFuzz algorithms (partial_token_set_ratio and partial_token_sort_ratio) takes on average 0.1 seconds per 10'000-recipe search. These algorithms tokenise the strings and then either sort the tokens or reduce them to a set, which makes the comparison independent of word order and removes repeated words. The fuzzy matching was excellent (stop words should be removed). A one-word search is even faster, averaging 0.05 seconds per 10'000-recipe search.
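To make the meta-string idea concrete, here is a minimal sketch using only the Python standard library. It is not RapidFuzz and not my benchmark code: `query_score` is a simple difflib-based stand-in for a token-based scorer, and the recipe dict keys (in particular `ingredient_notes`), the function names and the cutoff of 80 are assumptions made up for illustration. A RapidFuzz scorer such as `fuzz.partial_token_set_ratio(query, meta)` has the same "two strings in, 0-100 score out" shape and could be swapped in.

```python
from difflib import SequenceMatcher


def build_meta_string(recipe: dict) -> str:
    """Join the searchable fields of one recipe into a single lowercase string.

    The dict keys are hypothetical; they stand for the "name", "description",
    "tags" and "recipeIngredient:note" fields mentioned above.
    """
    parts = [recipe.get("name", ""), recipe.get("description", "")]
    parts.extend(recipe.get("tags", []))
    parts.extend(recipe.get("ingredient_notes", []))
    return " ".join(p for p in parts if p).lower()


def query_score(query: str, meta: str) -> float:
    """Order-independent fuzzy score in the range 0-100 (stdlib stand-in).

    Each distinct query word is compared with every word of the meta string,
    the best similarity per query word is kept, and the results are averaged.
    """
    q_tokens = sorted(set(query.lower().split()))
    m_tokens = sorted(set(meta.lower().split()))
    if not q_tokens or not m_tokens:
        return 0.0
    best = [
        max(SequenceMatcher(None, q, m).ratio() for m in m_tokens)
        for q in q_tokens
    ]
    return 100.0 * sum(best) / len(best)


def search(query: str, recipes: list, cutoff: float = 80.0) -> list:
    """Return recipes scoring at least `cutoff`, best match first."""
    scored = [(query_score(query, build_meta_string(r)), r) for r in recipes]
    hits = [(score, r) for score, r in scored if score >= cutoff]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in hits]


recipes = [
    {
        "name": "Pinto Bean Chili",
        "description": "Hearty chili with pinto beans",
        "tags": ["dinner"],
        "ingredient_notes": ["dried pinto beans, soaked overnight"],
    },
    {
        "name": "Eggs Over Easy",
        "description": "Fried eggs with runny yolks",
        "tags": ["breakfast"],
        "ingredient_notes": [],
    },
]

# Word order and a missing plural "s" no longer change which recipe is found.
print([r["name"] for r in search("beans pinto", recipes)])
print([r["name"] for r in search("egg over easy", recipes)])
```

Wrapping `search` in `timeit` on real hardware would give a rough equivalent of the timings above, although this stdlib stand-in will be much slower than RapidFuzz.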
Breaking the search up into tiered search of various fields could be even faster, if real-life usage suggests that the meta-string approach is problematic.
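A tiered search could look roughly like the sketch below: try the most relevant field first and only fall back to the next field when nothing matches. This is an assumption about how the tiers might work, not tested mealie code. The tier order, the 0.8 cutoff, the function names and the flattened recipe dicts (with tags as one space-separated string) are all hypothetical, and `difflib.get_close_matches` again stands in for a RapidFuzz scorer.

```python
from difflib import get_close_matches

# Hypothetical tier order: most relevant field first.
TIERS = ("name", "tags", "description")


def field_matches(query: str, text: str, cutoff: float = 0.8) -> bool:
    """True if every query word has a close match among the words of `text`."""
    query_words = query.lower().split()
    words = text.lower().split()
    if not query_words:
        return False
    return all(
        get_close_matches(q, words, n=1, cutoff=cutoff) for q in query_words
    )


def tiered_search(query: str, recipes: list, tiers: tuple = TIERS):
    """Return (field, hits) for the first tier that yields any hits."""
    for field in tiers:
        hits = [r for r in recipes if field_matches(query, r.get(field, ""))]
        if hits:
            return field, hits
    return None, []


recipes = [
    {
        "name": "Pinto Bean Chili",
        "tags": "dinner",
        "description": "Hearty chili with pinto beans",
    },
    {
        "name": "Eggs Over Easy",
        "tags": "breakfast",
        "description": "Fried eggs with runny yolks",
    },
]

# Matched in the first tier, so tags and descriptions are never scanned.
print(tiered_search("egg easy over", recipes)[0])
# No name or tag matches, so the search falls through to the description tier.
print(tiered_search("runny yolks", recipes)[0])
```

The trade-off is that an early tier with a weak hit hides a better hit in a later tier, which is one reason the single meta string may turn out to be the simpler option.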
Additional Information
Sort of. I have never developed inside Docker and am only passingly familiar with the mealie codebase. But I could share the inner code for a potential RapidFuzz-based change with someone who has more experience with the mealie code integration workflow (e.g. testing).