Describe the performance issue
CouchDB 3.4 introduces an "optimization" for the changes feed: when a request filters by doc_ids, CouchDB retrieves only the targeted docs when the payload contains fewer than 1000 doc_ids, and scans the whole changes feed when it contains more.
Previously, there was no limit.
This makes purging, and other mechanisms that rely on querying changes with doc ids, very slow.
Describe the improvement you'd like
Update purging so it hits other endpoints, or work out a way to optimize it while still using the changes feed.
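One way to keep using the changes feed might be to split the doc ids into batches that stay below the threshold, so each request can still take the targeted path. The sketch below is only an illustration and was not measured as part of this issue: `ASSUMED_THRESHOLD`, `chunk_doc_ids` and `build_changes_request` are hypothetical names, and the 1000 limit is taken from the description above. It only builds request descriptions for `POST /{db}/_changes?filter=_doc_ids`; it does not send them.

```python
from typing import Dict, Iterator, List

# Assumption: the targeted lookup applies below this many doc_ids,
# as reported in this issue. Adjust to match the server config.
ASSUMED_THRESHOLD = 1000


def chunk_doc_ids(doc_ids: List[str], size: int = ASSUMED_THRESHOLD - 1) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` doc ids."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(doc_ids), size):
        yield doc_ids[start:start + size]


def build_changes_request(db: str, doc_ids: List[str]) -> Dict[str, object]:
    """Describe one _changes request filtered by doc ids (not sent here)."""
    return {
        "method": "POST",
        "path": f"/{db}/_changes?filter=_doc_ids",
        "body": {"doc_ids": doc_ids},
    }


if __name__ == "__main__":
    ids = [f"doc{i}" for i in range(2500)]
    requests_to_send = [build_changes_request("medic", batch) for batch in chunk_doc_ids(ids)]
    print(len(requests_to_send), "requests")
```

More, smaller requests add round trips, so whether this beats a single large request would need to be measured on a real dataset.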
Measurements
We should get similar purging times on Couch 3.3 and Couch 3.4.
I've tried this on a local database with 100k docs, and these are the purge times I got:
| CouchDB version | Method | Time |
|---|---|---|
| 3.3.3 | _changes | 5.3 minutes |
| 3.4.2 | _changes | 11 minutes |
| 3.4.2 | _all_docs | 18 minutes |
| 3.4.2 | _changes with increased changes_doc_ids_optimization_threshold | 5.5 minutes |
So it turned out that using _all_docs instead of _changes requests is even worse than using the changes feed with the performance hit.
The times depend on the dataset and on how many doc ids get passed as payload to these requests, but I'm afraid the increased time when using _all_docs is serious enough to disqualify it as a viable option.
So our only alternative is to set the changes_doc_ids_optimization_threshold config to a significantly larger value. We limit the maximum number of docs we handle in a single purge request to ~20,000, so for safety I bumped the threshold to 30,000, which keeps the current performance.
This means that no code changes are required, except for adding changes_doc_ids_optimization_threshold as a couch config value.
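As a rough sketch of what setting that value could look like through CouchDB's node config API (`PUT /_node/{node-name}/_config/{section}/{key}`): the helper below only builds the request and does not send it. The function name is hypothetical, and the `couchdb` section name is an assumption that should be checked against the CouchDB 3.4 configuration docs before use.

```python
import json
from typing import Dict


def build_threshold_config_request(
    threshold: int,
    node: str = "_local",
    section: str = "couchdb",  # assumption: verify the section name in the CouchDB docs
) -> Dict[str, str]:
    """Describe the config API call that would set the doc_ids threshold."""
    if threshold < 1:
        raise ValueError("threshold must be a positive integer")
    return {
        "method": "PUT",
        "path": f"/_node/{node}/_config/{section}/changes_doc_ids_optimization_threshold",
        # The config API expects the new value as a JSON string.
        "body": json.dumps(str(threshold)),
    }


if __name__ == "__main__":
    # 30,000 is the value chosen in this issue (above the ~20,000 purge batch limit).
    print(build_threshold_config_request(30000))
```

The same value could instead be placed in the server's ini config under the matching section, which avoids a runtime call.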
dianabarsan changed the title from "Refactor purging so it does not rely on the changes feed" to "Bump changes_doc_ids_optimization_threshold for Couch 3.4" on Nov 16, 2024
Additional context
#9303 (comment)