Calculate hash for new non-deferred datasets when finishing a job #19181

Merged (5 commits) on Nov 21, 2024

nsoranzo
Member

This is configurable with two new options:

  • calculate_dataset_hash: when Galaxy should calculate a hash for a new dataset. Possible values: 'always', 'upload' (the default), 'never'.
  • hash_function: the hash function to use. Possible values: 'md5', 'sha1', 'sha256', 'sha512'.
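
As a rough illustration of what the `hash_function` option selects, here is a minimal sketch of hashing a dataset file with Python's standard `hashlib`. The helper name `compute_file_hash` and the chunk size are assumptions made for this example; this is not the code added by the PR.

```python
import hashlib

# The four values this PR lists for `hash_function`.
ALLOWED_HASH_FUNCTIONS = ("md5", "sha1", "sha256", "sha512")


def compute_file_hash(path, hash_function, chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path`.

    Hypothetical helper for illustration only, not Galaxy's API.
    """
    if hash_function not in ALLOWED_HASH_FUNCTIONS:
        raise ValueError(f"Unsupported hash function: {hash_function!r}")
    digest = hashlib.new(hash_function)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large dataset is never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use flat regardless of dataset size, which matters when the hash is calculated in a background task.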

Hashes are calculated via a Celery task, so currently they are only calculated if the 'enable_celery_tasks' option is set to true.
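
The interaction between the options can be sketched as a small decision function. This is only an illustration: the function name, its parameters, and the reading of 'upload' as "only datasets created by an upload job" are assumptions, not the PR's actual implementation.

```python
def should_calculate_hash(calculate_dataset_hash, is_upload_job, celery_enabled, is_deferred=False):
    """Decide whether to hash a new dataset when its job finishes.

    Hypothetical sketch. Assumes 'upload' means "only for datasets created
    by an upload job".
    """
    if calculate_dataset_hash not in ("always", "upload", "never"):
        raise ValueError(f"Invalid calculate_dataset_hash: {calculate_dataset_hash!r}")
    # Hashing runs as a Celery task, so nothing happens when Celery tasks are disabled.
    if not celery_enabled:
        return False
    # The PR title limits this to non-deferred datasets.
    if is_deferred:
        return False
    if calculate_dataset_hash == "always":
        return True
    if calculate_dataset_hash == "upload":
        return is_upload_job
    return False  # 'never'
```

Under these assumptions, the default 'upload' setting hashes the output of an upload job but skips outputs of other tools, and no hash is calculated at all unless 'enable_celery_tasks' is true.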

Also:

  • Type annotation improvements
  • Small refactorings and fixes

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

Member

jmchilton left a comment

Oh the anxiety this causes but I think this is a good step forward.

@nsoranzo
Member Author

> Oh the anxiety this causes but I think this is a good step forward.

In case of issues, changing calculate_dataset_hash to 'never' would be a quick fix.

@nsoranzo
Member Author

The integration test failure is unrelated.

jmchilton merged commit 4c46062 into galaxyproject:dev on Nov 21, 2024
57 of 58 checks passed
nsoranzo deleted the dataset_hashes_at_job_finish branch on November 21, 2024 23:52