Optimize compute_dataset_taxonomy_stats queries #6

Open
ffont opened this issue Mar 28, 2017 · 0 comments

ffont commented Mar 28, 2017

In compute_dataset_taxonomy_stats (https://github.com/MTG/freesound-datasets/blob/master/datasets/tasks.py#L89) we run a single big query to get the number of annotations and the number of sounds for all taxonomy categories, and then one extra small query per category to get the number of non-validated annotations. There are probably two ways to optimize this:
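To make the cost of the current approach concrete, here is a minimal sketch of the "1 big query + 1 small query per category" pattern. It uses sqlite3 and a simplified, hypothetical schema (tables `annotation` and `vote`, with a `category` column) standing in for the real Django models. The table/column names and the definition of "non-validated" (an annotation that has no votes yet) are assumptions for illustration, not the project's actual code.

```python
import sqlite3

# Hypothetical schema standing in for the project's Django models;
# table and column names are assumptions for illustration only.
SCHEMA = """
CREATE TABLE annotation (id INTEGER PRIMARY KEY, sound_id INTEGER, category TEXT);
CREATE TABLE vote (id INTEGER PRIMARY KEY, annotation_id INTEGER, value REAL);
"""

# Assumed definition: an annotation is non-validated while it has no votes.
NON_VALIDATED = "NOT EXISTS (SELECT 1 FROM vote v WHERE v.annotation_id = a.id)"


def stats_one_plus_n(conn):
    """Sketch of the current pattern: 1 big query + 1 small query per category."""
    num_queries = 0
    stats = {}
    # Big query: annotation and distinct-sound counts for every category.
    rows = conn.execute(
        "SELECT category, COUNT(*), COUNT(DISTINCT sound_id) "
        "FROM annotation GROUP BY category"
    ).fetchall()
    num_queries += 1
    for category, n_annotations, n_sounds in rows:
        # One extra round trip per category for the non-validated count.
        (n_non_validated,) = conn.execute(
            "SELECT COUNT(*) FROM annotation a WHERE a.category = ? AND " + NON_VALIDATED,
            (category,),
        ).fetchone()
        num_queries += 1
        stats[category] = (n_annotations, n_sounds, n_non_validated)
    return stats, num_queries


# Tiny example dataset: annotation 1 has a vote, annotations 2 and 3 do not.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)",
                 [(1, 10, "dog"), (2, 11, "dog"), (3, 10, "cat")])
conn.execute("INSERT INTO vote VALUES (1, 1, 1.0)")
stats, num_queries = stats_one_plus_n(conn)
print(stats, num_queries)
```

The number of queries grows linearly with the number of categories, which is what both options below try to avoid.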

  • Get all the information for all categories in a single query. We tried this, but the resulting query took hours to compute for a full-sized dataset (approx. 250k sounds and 500k annotations). We reverted to separate queries as a quick fix to make this function usable, but perhaps the single query can be improved to run quickly. One sure way to improve it would be to add an is_validated field to the datasets.models.Annotation model, updated whenever a new vote for an annotation is created. However, the first thing to try would be making the query fast without storing that intermediate value.

  • Get the number of sounds and annotations for all categories in one big query (as now), and the number of non-validated annotations for all categories in a second big query (i.e., 2 big queries instead of 1 + N, where N is the number of categories).
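The second option could look roughly like the sketch below: two grouped queries whose results are merged in Python. As before, this assumes a simplified, hypothetical sqlite3 schema (`annotation`, `vote`, `category`) rather than the real Django models, and it assumes an annotation counts as non-validated while it has no votes; adjust both to the actual definitions.

```python
import sqlite3

# Hypothetical schema standing in for the project's Django models;
# table and column names are assumptions for illustration only.
SCHEMA = """
CREATE TABLE annotation (id INTEGER PRIMARY KEY, sound_id INTEGER, category TEXT);
CREATE TABLE vote (id INTEGER PRIMARY KEY, annotation_id INTEGER, value REAL);
"""


def stats_two_queries(conn):
    """Sketch of option 2: two grouped queries, independent of the number of categories."""
    # Query 1 (as now): annotations and distinct sounds for every category.
    stats = {
        category: {"num_annotations": n_ann, "num_sounds": n_snd, "num_non_validated": 0}
        for category, n_ann, n_snd in conn.execute(
            "SELECT category, COUNT(*), COUNT(DISTINCT sound_id) "
            "FROM annotation GROUP BY category"
        )
    }
    # Query 2: non-validated annotations for ALL categories at once.
    # Assumed definition: an annotation is non-validated while it has no votes.
    # Categories missing from this result keep the default count of 0.
    for category, n_non_validated in conn.execute(
        "SELECT a.category, COUNT(*) FROM annotation a "
        "WHERE NOT EXISTS (SELECT 1 FROM vote v WHERE v.annotation_id = a.id) "
        "GROUP BY a.category"
    ):
        stats[category]["num_non_validated"] = n_non_validated
    return stats


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)",
                 [(1, 10, "dog"), (2, 11, "dog"), (3, 10, "cat"), (4, 12, "bird")])
# Annotations 1 and 4 have votes; 2 and 3 do not.
conn.executemany("INSERT INTO vote VALUES (?, ?, ?)", [(1, 1, 1.0), (2, 4, 0.5)])
stats = stats_two_queries(conn)
for category, values in sorted(stats.items()):
    print(category, values)
```

The query count stays at 2 no matter how many categories exist. In the Django ORM, each query would roughly correspond to a grouped `.values(...).annotate(...)` call, with the exact lookups depending on the real models. With the is_validated field suggested in the first option, the NOT EXISTS subquery would reduce to a plain filter on that column.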
