Add MongoDB query timeout #3158

Closed
jnm opened this issue Apr 28, 2021 · 2 comments · Fixed by kobotoolbox/kobocat#710 or #3210
Labels
high priority To be done soon

Comments

@jnm
Member

jnm commented Apr 28, 2021

We have an issue where zombie Mongo queries are able to run for several hours, long after any client that requested them has perished. Unfortunately, a global query timeout seems impossible, but maybe we could wrap pymongo somehow so that it sends maxTimeMS with every query: https://stackoverflow.com/a/60542564/2402324

Whatever solution we arrive at we should implement in KoBoCAT as well.

Somewhat related to kobotoolbox/kobocat#696: both issues contribute to MongoDB slowdowns, which in turn lead to users getting 502s.
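One possible shape for the "wrap pymongo" idea, as a minimal sketch under assumptions: TimeLimitedCollection is a hypothetical proxy (not part of pymongo, KPI, or KoBoCAT) that adds a server-side time limit to the read methods it overrides and passes every other attribute through to the real collection. pymongo's find() and find_one() take the limit as the keyword max_time_ms, while aggregate() and count_documents() take the server option name maxTimeMS.

```python
class TimeLimitedCollection:
    """Hypothetical proxy around a pymongo Collection.

    Overridden read methods send a time limit unless the caller already
    passed one; everything else is forwarded to the wrapped collection.
    """

    def __init__(self, collection, max_time_ms):
        self._collection = collection
        self._max_time_ms = max_time_ms

    def find(self, *args, **kwargs):
        # Collection.find() accepts max_time_ms as a keyword argument
        kwargs.setdefault("max_time_ms", self._max_time_ms)
        return self._collection.find(*args, **kwargs)

    def find_one(self, *args, **kwargs):
        # find_one() forwards extra keyword arguments to find()
        kwargs.setdefault("max_time_ms", self._max_time_ms)
        return self._collection.find_one(*args, **kwargs)

    def aggregate(self, pipeline, **kwargs):
        # aggregate() takes the server option name, maxTimeMS
        kwargs.setdefault("maxTimeMS", self._max_time_ms)
        return self._collection.aggregate(pipeline, **kwargs)

    def count_documents(self, query, **kwargs):
        # count_documents() also takes maxTimeMS
        kwargs.setdefault("maxTimeMS", self._max_time_ms)
        return self._collection.count_documents(query, **kwargs)

    def __getattr__(self, name):
        # Anything not overridden above (insert_one, update_many, ...) passes through
        return getattr(self._collection, name)
```

Usage would be something like `instances = TimeLimitedCollection(MONGO_DB.instances, max_time_ms=...)`. The catch is that only the overridden methods are limited; any query issued through another method, or directly on MONGO_DB, still runs without a timeout.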

@jnm jnm added the high priority To be done soon label May 5, 2021
@jnm
Member Author

jnm commented May 5, 2021

Elevated the priority because the servers are really struggling under the load.

As I mentioned earlier, I don't see a way to set maxTimeMS on every query used with our pymongo MONGO_DB, but maybe the easiest thing to do is to make a helper function that wraps MONGO_DB.instances.find() and adds the maxTimeMS argument.

The limit should be CELERY_TASK_TIME_LIMIT (converted to milliseconds) plus some grace period:

kpi/kobo/settings/base.py, lines 422 to 426 in 7edbc13:

# Default to a 30-minute soft time limit and a 35-minute hard time limit
CELERY_TASK_TIME_LIMIT = int(os.environ.get('CELERYD_TASK_TIME_LIMIT', 2100))
CELERY_TASK_SOFT_TIME_LIMIT = int(os.environ.get(
    'CELERYD_TASK_SOFT_TIME_LIMIT', 1800))
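A minimal sketch of the simpler helper described above, assuming the settings shown here. MONGO_QUERY_GRACE_PERIOD_SECONDS, mongo_query_timeout_ms and find_with_timeout are hypothetical names, and the 10-second grace period is an arbitrary choice (it happens to match the 2110-second cutoff in the shell loop further down, i.e. 2100 + 10).

```python
import os

# Mirrors the settings quoted above (value in seconds)
CELERY_TASK_TIME_LIMIT = int(os.environ.get("CELERYD_TASK_TIME_LIMIT", 2100))

# Hypothetical setting: how long past the hard Celery limit a query may keep running
MONGO_QUERY_GRACE_PERIOD_SECONDS = 10


def mongo_query_timeout_ms(task_time_limit=CELERY_TASK_TIME_LIMIT,
                           grace_seconds=MONGO_QUERY_GRACE_PERIOD_SECONDS):
    """CELERY_TASK_TIME_LIMIT plus the grace period, converted to milliseconds."""
    return (task_time_limit + grace_seconds) * 1000


def find_with_timeout(collection, *args, **kwargs):
    """Hypothetical wrapper for calls like MONGO_DB.instances.find().

    Adds max_time_ms unless the caller already supplied one.
    """
    kwargs.setdefault("max_time_ms", mongo_query_timeout_ms())
    return collection.find(*args, **kwargs)
```

With the defaults this gives (2100 + 10) × 1000 = 2,110,000 ms. Since find() returns a lazy cursor, a query that exceeds the limit should surface as pymongo.errors.ExecutionTimeout while the cursor is being iterated, not when find_with_timeout() itself returns.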

@jnm
Member Author

jnm commented May 5, 2021

It's more of a sysadmin thing, but for reference, here's a quick and really-very-dirty 🙈 method to reduce load from runaway queries:

root@mongo:/# while true; do mongo -u root -p "$MONGO_INITDB_ROOT_PASSWORD" admin --eval 'db.currentOp(true).inprog.forEach(function(op){ if(op.secs_running > 2110) { print(op.opid); db.killOp(op.opid) } });'; sleep 10; done
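For comparison, a rough pymongo sketch of the same cleanup, assuming `client` is a MongoClient authenticated as a user allowed to run the server's currentOp and killOp admin commands. runaway_op_ids, kill_runaway_ops and watch are hypothetical names. Unlike `db.currentOp(true)`, this leaves out "$all", so idle connections and system operations are not listed. Otherwise it is just as blunt as the shell loop: anything running past the cutoff is killed, whatever it is.

```python
import time


def runaway_op_ids(inprog, limit_secs):
    """Return opids of operations that have been running longer than limit_secs."""
    return [op["opid"] for op in inprog if op.get("secs_running", 0) > limit_secs]


def kill_runaway_ops(client, limit_secs=2110):
    """Kill operations older than limit_secs and return their opids."""
    # Like db.adminCommand({currentOp: 1}); without "$all", idle
    # connections and system operations are left out of "inprog"
    inprog = client.admin.command("currentOp").get("inprog", [])
    killed = runaway_op_ids(inprog, limit_secs)
    for opid in killed:
        # Like db.adminCommand({killOp: 1, op: opid})
        client.admin.command("killOp", op=opid)
    return killed


def watch(client, limit_secs=2110, interval_secs=10):
    """Poll forever, like the shell loop's `while true; ...; sleep 10`."""
    while True:
        for opid in kill_runaway_ops(client, limit_secs):
            print(opid)
        time.sleep(interval_secs)
```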
