Fix flaky TestQueryFrontendNoRetryChunkPool test #5631
Merged
What this PR does:
The test TestQueryFrontendNoRetryChunkPool has failed intermittently many times since it was added. For example: https://github.com/cortexproject/cortex/actions/runs/6764732560/job/18383544803
In the test, we expect the querier to fail to query the store gateway (SG) because of a chunk pool error. The querier retries 3 times and returns the error to the query frontend (QFE). The QFE does not retry the error and returns 500, because it is a chunk pool exhaustion error.
The querier has a consistency checker that decides whether a block should be retried. In the flaky runs, the block was uploaded within the upload grace period, so the querier does not retry it and returns 200, which causes the test to fail.
The upload grace period is consistency check delay + 3 * bucket sync interval. In the test, consistency check delay is configured as 0s and bucket sync interval as 5s, so the grace period is 15s in total. When I tried to reproduce the issue locally, it failed every time once I increased bucket sync interval to a larger value.
This PR fixes the flakiness by reducing bucket sync interval from 5s to 1s, which shortens the grace period to 3s, and by sleeping at least 3s so that the consistency checker retries the block.
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]