-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add audbackend.checksum() #245
Conversation
Reviewer's Guide by SourceryThe PR introduces a new Class diagram for the new checksum functionclassDiagram
class audbackend {
+String checksum(String file)
}
class audeer {
+String md5(String file)
}
audbackend --|> audeer : uses
note for audbackend "The checksum function handles parquet files with metadata and falls back to MD5 if necessary."
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @hagenw - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟡 Testing: 1 issue found
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All threads resolved. Approving.
Tackles part of audeering/audb#459
This copies the checksum handling for parquet files from
audb.core.utils.md5()
into the new functionaudbackend.checksum()
. Insideaudbackend
, it replaces the usage ofaudeer.md5()
by it, to make sure we are getting the correct checksum for local parquet files, which we use for example insideaudbackend.backend.Base.get_file()
to decide if the file exists already locally.The checksum for parquet is given by its
"hash"
metadata entry, or if not available byaudeer.md5()
.As
pyarrow
is not a dependency ofaudbackend
, and should also not be, I decided to also useaudeer.md5()
ifpyarrow
is not installed and we can hence not access the metadata of the parquet file (or is there a way to access it with pure Python functions?).Summary by Sourcery
Add a new checksum function to the audbackend module, replacing the existing MD5 checksum method with support for deterministic checksums for parquet files using metadata. Update documentation and tests accordingly.
New Features:
Enhancements:
Documentation:
Tests:
Summary by Sourcery
Add a new checksum function to the audbackend module, replacing the existing MD5 checksum method with support for deterministic checksums for parquet files using metadata. Update documentation and tests accordingly.
New Features:
Enhancements:
Documentation:
Tests: