Full-text search is slow #1573

Open
harrislapiroff opened this issue Jan 18, 2023 · 5 comments

@harrislapiroff
Contributor

We observed a number of timeouts over this past weekend. Trawling through the logs, Cameron noticed slow requests around those timeouts for pages that included a text search term. I can reproduce this: a text search on the tracker website takes several seconds to respond, e.g. https://pressfreedomtracker.us/all-incidents/?search=antifa

Let's find ways to improve performance here. I'll set the bar for closing this ticket as consistently achieving <1s responses for filtering queries involving text searches on the database page.

Some ideas that came up in discussion:

  • Can we rely on Wagtail's internal page search index?
  • Can we get better performance by reordering how filters are applied (e.g., if we do text-search first can that search be run on an indexed dataset instead of dynamically generating the index)?
  • Something something materialized view (I didn't fully understand this, feel free to add your actual idea @chigby)
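
To make the "indexed dataset vs. dynamically generating the index" idea concrete, here is a toy sketch in plain Python. It is not the tracker's code and not PostgreSQL: the `INCIDENTS` data and all function names are made up for illustration. It only shows why tokenizing every document once up front makes each later query cheaper than re-tokenizing every document per search, which is roughly the trade-off behind a stored, indexed search column.

```python
import re
from collections import defaultdict

# Hypothetical stand-in data; the real tracker keeps incidents in a database.
INCIDENTS = {
    1: "Journalist arrested while covering protest",
    2: "Reporter subpoenaed for source material",
    3: "Photographer arrested and equipment seized",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_by_scanning(term, incidents):
    """Tokenize every document at query time; the work is repeated per search."""
    term = term.lower()
    return sorted(pk for pk, body in incidents.items() if term in tokenize(body))


def build_index(incidents):
    """Tokenize every document once, mapping each token to the ids containing it."""
    index = defaultdict(set)
    for pk, body in incidents.items():
        for token in tokenize(body):
            index[token].add(pk)
    return index


def search_with_index(term, index):
    """Answer a query with a single dictionary lookup on the prebuilt index."""
    return sorted(index.get(term.lower(), set()))


INDEX = build_index(INCIDENTS)
print(search_by_scanning("arrested", INCIDENTS))
print(search_with_index("arrested", INDEX))
```

Both functions return the same ids; the difference is only where the tokenizing cost is paid. The catch with any precomputed index is that it has to be refreshed when the underlying pages change.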
@chigby
Contributor

chigby commented Mar 30, 2023

Saw some more worker timeouts today, and here are the surrounding log entries:

Around 02:00

Around 05:50

Around 08:57

Looking at these logs, it is hard to discern an immediate pattern in the requests; they seem almost random (e.g. page 65 for a nonexistent target). Some of these might be slower than others, but what strikes me across all three of these times is that there are a lot of requests at once. Then again, the volume doesn't really seem that far out of line with what we get in, say, the middle of the day?
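
One way to check the "lot of requests at once" impression against normal midday traffic would be to bucket request timestamps by minute and compare the busiest buckets. This is a minimal sketch, assuming the timestamps can be pulled out of the logs as ISO 8601 strings; the function names and the `SAMPLE` values are invented for illustration and are not taken from the real logs.

```python
from collections import Counter
from datetime import datetime


def requests_per_minute(timestamps):
    """Count how many requests fall into each one-minute bucket."""
    buckets = Counter()
    for ts in timestamps:
        minute = datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        buckets[minute] += 1
    return buckets


def busiest_minutes(timestamps, top=3):
    """Return the `top` busiest minutes as (minute, request_count) pairs."""
    return requests_per_minute(timestamps).most_common(top)


# Made-up timestamps for illustration only; not real log data.
SAMPLE = [
    "2000-01-01T02:00:05",
    "2000-01-01T02:00:17",
    "2000-01-01T02:00:42",
    "2000-01-01T02:01:03",
    "2000-01-01T13:15:30",
]

for minute, count in busiest_minutes(SAMPLE):
    print(minute.isoformat(), count)
```

Running the same count over a window around each timeout and over a quiet midday window would show whether the bursts are actually unusual.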

@harrislapiroff
Contributor Author

Do you think it's a deliberate DDoS? Someone trying to inflate the usage metrics on our filters 😉?

@chigby
Contributor

chigby commented May 18, 2023

As of right now, the code from #1626 is deployed on staging, making direct comparisons with production somewhat straightforward.

With https://staging.pressfreedomtracker.us/all-incidents/?search=arrested, I'm getting response times of roughly 3 to 3.5 seconds. With the same query in production, https://pressfreedomtracker.us/all-incidents/?search=arrested, I'm getting roughly 6 seconds.
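
For repeating this comparison, a rough sketch of a timing helper follows. It uses only the Python standard library; the function names, the default of five runs, and the choice to judge the goal by the slowest run are my assumptions, with the 1-second threshold taken from the bar set at the top of this issue.

```python
import statistics
import time
import urllib.request


def fetch(url):
    """Download the full page body; this is the default operation being timed."""
    with urllib.request.urlopen(url, timeout=30) as response:
        response.read()


def time_requests(url, runs=5, fetcher=fetch):
    """Return the wall-clock duration in seconds of each of `runs` requests."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fetcher(url)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings, goal_seconds=1.0):
    """Report the median and slowest run, and whether every run beat the goal."""
    return {
        "median": statistics.median(timings),
        "slowest": max(timings),
        "meets_goal": max(timings) < goal_seconds,
    }
```

Calling `summarize(time_requests(url))` once with the staging URL and once with the production URL gives comparable numbers. Timings taken from a single machine include network latency, so they are only a rough proxy for server-side query time.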

@harrislapiroff
Contributor Author

🔥

@sssoleileraaa

As a follow-up to our discussion about setting a clear performance goal, I created a new "Scalability & Performance" page in the wiki as a starting point: https://github.com/freedomofpress/fpf-www-projects/wiki

@chigby chigby mentioned this issue Oct 2, 2023
@soleilera soleilera added someday and removed someday labels Oct 4, 2023