Using IterableDataset with Torch DataLoader throws error. #2577
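PyTorch only recognizes instances of torch.utils.data.IterableDataset as iterable-style datasets. Anything else passed to DataLoader is treated as a map-style dataset, so the default sampler calls len() on it, and a streaming datasets IterableDataset has no len().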
Or you can customize the dataset class with something like this:
Reference: datasets/src/datasets/iterable_dataset.py, lines 480 to 494 in c722810
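A minimal sketch of that kind of customization, assuming only that the streaming dataset can be iterated more than once. This is not the code at the reference; TorchStream and ToyStream are hypothetical names, and ToyStream stands in for a streaming datasets IterableDataset so the example runs without downloading anything.

```python
from torch.utils.data import DataLoader, IterableDataset


class TorchStream(IterableDataset):
    """Hypothetical wrapper (not part of datasets): exposes any re-iterable
    stream of examples as a torch IterableDataset, so DataLoader iterates it
    directly instead of building a len()-based sampler."""

    def __init__(self, stream):
        self.stream = stream

    def __iter__(self):
        # Ask the underlying stream for a fresh iterator on every pass.
        return iter(self.stream)


class ToyStream:
    """Hypothetical stand-in for a streaming dataset: iterable, no __len__."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        for i in range(self.n):
            yield {"id": i, "text": f"example {i}"}


# The default collate_fn turns a batch of dicts into a dict of batched values:
# ints become a tensor, strings stay a list.
loader = DataLoader(TorchStream(ToyStream(10)), batch_size=4)
batches = list(loader)
for batch in batches:
    print(batch["id"].tolist(), batch["text"])
```

Batches come out in stream order and len() is never called. Two caveats of iterable-style datasets in PyTorch: DataLoader(shuffle=True) is rejected for them, and with num_workers > 0 every worker gets its own copy of the wrapper and replays the whole stream unless examples are split per worker (for example with torch.utils.data.get_worker_info()).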
Hi,
I have a general question about using Dataset in streaming mode: is IterableDataset not meant to be used with the PyTorch DataLoader? I can use Dataset with the DataLoader without any issues (as the examples also show), but I cannot do the same with IterableDataset. I am quite new to the HF Datasets library, so my apologies if this is already covered somewhere (I am still looking).
I get the following error, which makes sense in streaming mode, but I am not sure how to set things up so that I can still do batching:
File "/data/leshekha/lib/HFDatasets/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 67, in __iter__
return iter(range(len(self.data_source)))
TypeError: object of type 'IterableDataset' has no len()
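The same error can be reproduced without the datasets library. In this sketch, ToyStream is a hypothetical stand-in for the streaming dataset: it is iterable but defines no __len__ and does not subclass torch.utils.data.IterableDataset, so DataLoader falls back to a SequentialSampler, which calls len() as soon as the first batch is requested.

```python
from torch.utils.data import DataLoader


class ToyStream:
    """Hypothetical stand-in for a streaming dataset: iterable, but no __len__."""

    def __iter__(self):
        for i in range(10):
            yield {"id": i}


# ToyStream is not a torch IterableDataset, so DataLoader treats it as
# map-style and wraps it in a SequentialSampler, which needs len().
loader = DataLoader(ToyStream(), batch_size=4)
try:
    next(iter(loader))
    error = None
except TypeError as exc:
    error = exc
print(error)
```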
Any help is appreciated. Thank you.
I have asked the same question here: https://discuss.huggingface.co/t/roadmap-timeline-for-dataset-streaming/6789/5