Replies: 1 comment
We don't have an official API to push data to the Hub (or remote storage) on the fly without caching it beforehand, but it shouldn't be too hard to implement manually with …
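A minimal sketch of such a manual implementation, using only the standard library: a generator stands in for a real (streaming) dataset, and the `upload` callback is a hypothetical hook where a real Hub or cloud-storage client call would go. Records are flushed to small local shards that are uploaded and deleted immediately, so the full dataset is never cached.

```python
import json
import os
import tempfile

def _flush(rows, idx, upload, out_dir):
    # Write one shard locally, hand it to the upload hook, then delete
    # the local copy so at most one shard ever sits on disk.
    path = os.path.join(out_dir, f"shard-{idx:05d}.jsonl")
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    upload(path)  # hypothetical: replace with a real Hub / cloud-storage upload
    os.remove(path)

def push_in_shards(records, shard_size, upload, out_dir):
    # Stream records into fixed-size shards instead of caching everything.
    buffer, shard_idx = [], 0
    for record in records:
        buffer.append(record)
        if len(buffer) == shard_size:
            _flush(buffer, shard_idx, upload, out_dir)
            buffer, shard_idx = [], shard_idx + 1
    if buffer:  # trailing partial shard
        _flush(buffer, shard_idx, upload, out_dir)

# Demo: 10 records in shards of 4 -> 3 uploads (4 + 4 + 2 rows).
uploaded = []
with tempfile.TemporaryDirectory() as tmp:
    records = ({"id": i, "text": f"passage {i}"} for i in range(10))
    push_in_shards(records, 4, lambda p: uploaded.append(os.path.basename(p)), tmp)
print(uploaded)  # -> ['shard-00000.jsonl', 'shard-00001.jsonl', 'shard-00002.jsonl']
```

The `shard_size`, file format, and naming scheme here are illustrative choices, not an official API.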
Concerning https://huggingface.co/docs/datasets/package_reference/builder_classes#datasets.DatasetBuilder.download_and_prepare
Hi,
as I understand https://huggingface.co/docs/datasets/filesystems#download-and-prepare-a-dataset-into-a-cloud-storage, the mentioned function first downloads the full dataset to the cache, then processes it and uploads it to the cloud storage. Is there a way to do this batch-wise, so that I do not have to load e.g. the full wiki_dpr dataset (https://huggingface.co/datasets/wiki_dpr/discussions?status=open&type=discussion) into my cache at once?
Am I missing something here?
(Beyond that, is there a best practice for loading the above-mentioned dataset to a cloud storage?)
Thanks
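On the memory question: `load_dataset(..., streaming=True)` returns a lazily-read dataset, so only the rows of the current batch are ever materialized. A stdlib-only sketch of that batch-wise pattern, where the `stream_records` generator is an illustrative stand-in for a real streaming dataset (the real call would be along the lines of `load_dataset("wiki_dpr", streaming=True)`):

```python
from itertools import islice

def stream_records(n):
    # Stand-in for a streaming dataset: yields rows one at a time,
    # never materializing the whole dataset in memory.
    for i in range(n):
        yield {"id": i}

def batched(iterable, batch_size):
    # Group a lazy iterator into lists of at most `batch_size` rows;
    # at any moment only one batch is held in memory.
    it = iter(iterable)
    while batch := list(islice(it, batch_size)):
        yield batch

sizes = [len(b) for b in batched(stream_records(10), 4)]
print(sizes)  # -> [4, 4, 2]
```

Each batch could then be processed and written to cloud storage before the next one is pulled, keeping peak memory bounded by `batch_size` rows.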