Bump changes_doc_ids_optimization_threshold for Couch 3.4 #9642

Closed
dianabarsan opened this issue Nov 13, 2024 · 1 comment

@dianabarsan
Member

Describe the performance issue
CouchDB 3.4 introduces an "optimization" for changes feed requests that pass doc_ids: the targeted docs are retrieved directly only when the payload contains under 1000 doc_ids, and the whole changes feed is traversed when it contains over 1000.
Previously, there was no limit.
This makes purging, and other mechanisms that rely on querying changes with doc ids, very slow.
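
To make the cutoff concrete, here is a minimal sketch of the behavior as described above. It is a hypothetical model, not CouchDB's actual code: the function name, the strategy labels, and the handling of a payload of exactly 1000 ids are all assumptions.

```python
from typing import Sequence

# Assumed cutoff, taken from the description above ("under 1000 doc_ids").
ASSUMED_THRESHOLD = 1000


def changes_lookup_strategy(doc_ids: Sequence[str],
                            threshold: int = ASSUMED_THRESHOLD) -> str:
    """Return which path a doc_ids-filtered changes request would take.

    Hypothetical model only: the real decision is made inside CouchDB.
    Treating a payload of exactly `threshold` ids as "targeted" is an
    assumption; the issue does not say what happens at the boundary.
    """
    if len(doc_ids) <= threshold:
        return "targeted"   # only the requested docs are looked up
    return "full_scan"      # the whole changes feed is walked and filtered


small_batch = [f"doc-{i}" for i in range(500)]
large_batch = [f"doc-{i}" for i in range(20_000)]

print(changes_lookup_strategy(small_batch))
print(changes_lookup_strategy(large_batch))
# With a larger threshold, the same large batch stays on the targeted path.
print(changes_lookup_strategy(large_batch, threshold=30_000))
```

Under this model, any purge batch larger than the threshold falls onto the slow path, which would explain the slowdown.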

Describe the improvement you'd like
Update purging so that it uses other endpoints, or find a way to optimize it while still using the changes feed.

Measurements
We should get similar purging times on Couch 3.3 and Couch 3.4.

Additional context

#9303 (comment)

@dianabarsan dianabarsan added the Type: Performance Make something faster label Nov 13, 2024
@dianabarsan dianabarsan self-assigned this Nov 13, 2024
@dianabarsan
Member Author

dianabarsan commented Nov 15, 2024

I tried this against a local database with 100k docs, and these are the purge times I measured:

| CouchDB version | Method | Time |
| --- | --- | --- |
| 3.3.3 | _changes | 5.3 minutes |
| 3.4.2 | _changes | 11 minutes |
| 3.4.2 | _all_docs | 18 minutes |
| 3.4.2 | _changes with increased changes_doc_ids_optimization_threshold | 5.5 minutes |
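
For a quick comparison, the measured times can be expressed as a slowdown relative to the 3.3.3 baseline. This is only arithmetic on the numbers above; the variable and function names are made up for illustration.

```python
# Baseline: CouchDB 3.3.3 using _changes, in minutes.
BASELINE_MINUTES = 5.3

# Purge times measured on CouchDB 3.4.2, in minutes.
measurements = {
    "_changes": 11.0,
    "_all_docs": 18.0,
    "_changes, raised threshold": 5.5,
}


def slowdown(minutes: float, baseline: float = BASELINE_MINUTES) -> float:
    """Ratio of a measured time to the baseline time."""
    return minutes / baseline


for method, minutes in measurements.items():
    print(f"{method}: {slowdown(minutes):.2f}x the 3.3.3 time")
```

This gives roughly 2x for _changes, 3.4x for _all_docs, and close to 1x once the threshold is raised.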

So it turns out that using _all_docs instead of changes requests is even worse than using the changes feed with the performance hit.
The times depend on the dataset and on how many doc ids are passed as payload to these requests, but I'm afraid the increase when using _all_docs is serious enough to disqualify it as a viable option.
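
For reference, the two request shapes being compared look roughly like this. It is a sketch that only builds the requests and does not send them; the helper names and the database name `example-db` are hypothetical, and the actual purge code may construct its requests differently.

```python
import json
from typing import Dict, Sequence


def changes_request(db: str, doc_ids: Sequence[str]) -> Dict[str, str]:
    """Changes feed restricted to specific docs via the built-in _doc_ids filter."""
    return {
        "method": "POST",
        "path": f"/{db}/_changes?filter=_doc_ids",
        "body": json.dumps({"doc_ids": list(doc_ids)}),
    }


def all_docs_request(db: str, doc_ids: Sequence[str]) -> Dict[str, str]:
    """The alternative that was tested: _all_docs restricted by keys."""
    return {
        "method": "POST",
        "path": f"/{db}/_all_docs",
        "body": json.dumps({"keys": list(doc_ids)}),
    }


ids = ["doc-1", "doc-2", "doc-3"]
print(changes_request("example-db", ids))    # "example-db" is a placeholder name
print(all_docs_request("example-db", ids))
```

Both carry the same list of ids as payload, so the difference in timings comes from how the server handles each endpoint.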

So our only alternative is to set the changes_doc_ids_optimization_threshold config to a sufficiently large value. We limit the number of docs handled in a single purge request to roughly 20,000, so to be safe I bumped the threshold to 30,000, which keeps the current performance.
This means that no code changes are required, except for adding changes_doc_ids_optimization_threshold as a couch config value.
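
As an illustration, one way to apply such a value on a running node is CouchDB's node config endpoint (`PUT /_node/{node-name}/_config/{section}/{key}`). The sketch below only builds the request and does not send it. The `couchdb` section name, the base URL, and the helper name are assumptions, and the commit referenced in this issue may well set the value through a config file instead.

```python
import json
import urllib.request

MAX_PURGE_BATCH = 20_000   # approximate per-request limit mentioned above
NEW_THRESHOLD = 30_000     # value chosen above, with some headroom


def build_threshold_request(base_url: str,
                            node: str = "_local",
                            section: str = "couchdb",  # assumed config section
                            threshold: int = NEW_THRESHOLD) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets the threshold."""
    if threshold <= MAX_PURGE_BATCH:
        raise ValueError("threshold should exceed the largest purge batch")
    url = (f"{base_url}/_node/{node}/_config/{section}/"
           "changes_doc_ids_optimization_threshold")
    # The config API takes the new value as a JSON string.
    body = json.dumps(str(threshold)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical local instance; authentication is omitted for brevity.
request = build_threshold_request("http://localhost:5984")
print(request.get_method(), request.full_url)
print(request.data)
```

The guard reflects the reasoning above: the threshold only helps if it stays larger than the biggest batch of doc ids sent in one purge request.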

@dianabarsan dianabarsan changed the title Refactor purging so it does not rely on the changes feed Bump changes_doc_ids_optimization_threshold for Couch 3.4 Nov 16, 2024
@dianabarsan dianabarsan added this to the 4.16.0 milestone Nov 18, 2024
dianabarsan added a commit that referenced this issue Nov 26, 2024
set `changes_doc_ids_optimization_threshold` a high value

#9642
@andrablaj andrablaj moved this to Done in CHT Stewardship Dec 19, 2024