Replace data backend and increase flexibility. #39

Merged
wfondrie merged 23 commits into main from feat/parquet-backend on Oct 10, 2023

wfondrie
Owner

This PR migrates the data backend for Depthcharge from HDF5 to Apache Arrow-based formats and updates the API in the process.

Overall, this PR simplifies the code base and gives advanced users more flexibility in how data is parsed and used. Please note that I'm still adding documentation, but the code itself is ready for review.

@codecov

codecov bot commented Sep 28, 2023

Codecov Report

Merging #39 (c2a4d48) into main (54f36bf) will increase coverage by 1.00%.
The diff coverage is 94.94%.

@@            Coverage Diff             @@
##             main      #39      +/-   ##
==========================================
+ Coverage   91.48%   92.48%   +1.00%     
==========================================
  Files          19       22       +3     
  Lines         963      972       +9     
==========================================
+ Hits          881      899      +18     
+ Misses         82       73       -9     
Files                                  Coverage Δ
depthcharge/__init__.py                100.00% <100.00%> (ø)
depthcharge/data/__init__.py           100.00% <100.00%> (ø)
depthcharge/data/fields.py             100.00% <100.00%> (ø)
depthcharge/primitives.py              96.66% <100.00%> (-0.65%) ⬇️
depthcharge/tokenizers/peptides.py     70.42% <100.00%> (+1.76%) ⬆️
depthcharge/transformers/spectra.py    97.87% <100.00%> (+1.87%) ⬆️
depthcharge/utils.py                   100.00% <100.00%> (+27.27%) ⬆️
depthcharge/version.py                 100.00% <100.00%> (ø)
depthcharge/data/arrow.py              97.14% <97.14%> (ø)
depthcharge/data/parsers.py            97.22% <96.19%> (+0.25%) ⬆️
... and 2 more

@wfondrie wfondrie requested a review from jspaezp September 28, 2023 23:12
@wfondrie
Owner Author

@jspaezp, on second thought, I'll update the docs in a separate PR. This one is large enough 😅

Review threads (resolved):
data/TMT10-Trial-8.mzML
depthcharge/data/parsers.py
depthcharge/data/parsers.py (outdated)
depthcharge/utils.py (outdated)
depthcharge/__init__.py
depthcharge/data/parsers.py
depthcharge/data/parsers.py
depthcharge/data/spectrum_datasets.py
@wfondrie wfondrie requested a review from jspaezp October 3, 2023 20:09
AssertionError
Indicates that the two dictionaries are not equal.
"""
bad_keys = []
Collaborator

I believe this test does not check both directions: if dict1 is empty but dict2 isn't, it will pass.
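
For illustration, here is a minimal sketch of a comparison that checks both directions. The name `assert_dicts_equal` and its signature are hypothetical (the helper under review is not shown in full here), and it assumes the values compare to a plain boolean with `!=`, which would not hold for array-valued entries.

```python
def assert_dicts_equal(dict1, dict2):
    """Raise AssertionError unless both dicts hold the same keys and values.

    Hypothetical helper, not the function from this PR.
    """
    bad_keys = []
    # Iterate over the union of keys so that a key missing from either
    # side is reported, not only keys missing from dict2.
    for key in dict1.keys() | dict2.keys():
        if key not in dict1 or key not in dict2:
            bad_keys.append(key)
        elif dict1[key] != dict2[key]:
            # Assumes `!=` yields a plain bool for these values.
            bad_keys.append(key)

    if bad_keys:
        raise AssertionError(f"Mismatched or missing keys: {bad_keys}")
```

Looping over `dict1` alone would never enter the loop body when `dict1` is empty, so no mismatch would be recorded. Taking the union of the key sets avoids that case.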

@jspaezp jspaezp self-requested a review October 9, 2023 22:38
@wfondrie wfondrie merged commit 2079903 into main Oct 10, 2023
@wfondrie wfondrie deleted the feat/parquet-backend branch October 10, 2023 03:45