Report number of batches in a spectrum dataset #60

Open: bittremieux wants to merge 1 commit into main


bittremieux
Collaborator

Useful for timing estimates in the progress bar.
@bittremieux bittremieux requested a review from wfondrie July 24, 2024 07:39

codecov bot commented Jul 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.50%. Comparing base (486221c) to head (e188d80).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #60   +/-   ##
=======================================
  Coverage   97.49%   97.50%           
=======================================
  Files          24       24           
  Lines         957      960    +3     
=======================================
+ Hits          933      936    +3     
  Misses         24       24           


@bittremieux
Collaborator Author

Hmm, using this a bit more, it actually gives the following warning:

WARNING: UserWarning: Your IterableDataset has __len__ defined. In combination with multi-process data loading (when num_workers > 1), __len__ could be inaccurate if each worker is not configured independently to avoid having duplicate data.

So maybe not an ideal tweak in the end. Is the SpectrumDataset compatible with getting accurate timing estimates from the PyTorch Lightning progress bar (for which the number of batches is needed)?
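
To make the concern behind the warning concrete, here is a minimal pure-Python sketch of the arithmetic. It assumes the items are split round-robin across workers, which is a hypothetical sharding scheme and not necessarily what SpectrumDataset does. The function names are illustrative only. With an IterableDataset, the PyTorch DataLoader forms batches inside each worker, so each shard can end in its own partial batch and the true batch count can exceed ceil(len(dataset) / batch_size).

```python
from __future__ import annotations

from math import ceil


def shard_sizes(n_items: int, num_workers: int) -> list[int]:
    """Per-worker shard sizes under round-robin sharding (an assumption)."""
    # num_workers=0 means the data is loaded in the main process.
    workers = max(num_workers, 1)
    return [len(range(w, n_items, workers)) for w in range(workers)]


def naive_num_batches(n_items: int, batch_size: int) -> int:
    """Estimate that ignores workers: ceil(n_items / batch_size)."""
    return ceil(n_items / batch_size)


def sharded_num_batches(n_items: int, batch_size: int, num_workers: int) -> int:
    """Batches produced when every worker batches its own shard separately."""
    return sum(ceil(size / batch_size) for size in shard_sizes(n_items, num_workers))


if __name__ == "__main__":
    # 10 items, batch size 4: one worker yields 3 batches,
    # two workers yield 2 + 2 = 4 batches (two of them partial).
    print(naive_num_batches(10, 4))
    print(sharded_num_batches(10, 4, 2))
```

Under these assumptions, a `__len__` that returns the sharded count would match what the progress bar sees, but it would need to know the number of workers and the batch size, which the dataset does not normally have access to.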

@bittremieux
Collaborator Author

Addition: because SpectrumDataset is an IterableDataset, it also doesn't support shuffling, which we might want during training.
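
One common workaround for streaming data is an approximate shuffle through a fixed-size buffer. The sketch below is plain Python and is not part of this PR or of SpectrumDataset. `buffered_shuffle` and its parameters are hypothetical names used for illustration. It only mixes items that are within roughly `buffer_size` positions of each other, so it is weaker than a full shuffle.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(
    items: Iterable[T], buffer_size: int, seed: Optional[int] = None
) -> Iterator[T]:
    """Yield every item exactly once, in approximately shuffled order."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and reuse its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Stream exhausted: emit what is left in random order.
    rng.shuffle(buffer)
    yield from buffer


if __name__ == "__main__":
    print(list(buffered_shuffle(range(10), buffer_size=4, seed=0)))
```

A larger buffer gives better mixing at the cost of memory. With multiple workers, each worker would need its own seed to avoid producing identical orderings.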
