Download a very large dataset #5707
Replies: 1 comment
Hi! Loading from Parquet is already very fast, but you can make it even faster by calling |
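The exact call is not spelled out above, so purely as a hedged sketch: the `datasets` library does expose a `num_proc` argument on `load_dataset` for parallelizing download and preparation of the Parquet shards, and a `streaming=True` mode for iterating over them lazily. The repo id below is a placeholder, not one from this thread.

```python
# A minimal sketch, assuming a dataset that was pushed to the Hub as Parquet shards.
# "username/my-large-dataset" is a placeholder repo id.
from datasets import load_dataset

# num_proc spawns multiple worker processes to download and prepare shards in parallel.
ds = load_dataset("username/my-large-dataset", num_proc=8)

# Alternatively, streaming=True avoids downloading everything up front and
# iterates over the Parquet shards lazily.
streamed = load_dataset("username/my-large-dataset", split="train", streaming=True)
```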
Hello, I want to upload a very large dataset to the Hub and would like to make sure that users can download it efficiently. I know that when I upload it, it will automatically be divided into shards, which is great. What will be the most efficient way to download the dataset afterwards, e.g. making use of the largest number of concurrent processes/threads?
Also, if I upload with push_to_hub, do I need to add a custom download script to make the download more efficient? Locally, loading the dataset with load_from_disk works fine.
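As a hedged sketch of one way to pull the shards concurrently: `snapshot_download` from `huggingface_hub` accepts a `max_workers` argument for parallel file downloads, and a dataset pushed with `push_to_hub` is stored as Parquet, so plain `load_dataset` can read it without a custom script. The repo id and worker count below are placeholders.

```python
# A minimal sketch, assuming the goal is to fetch every shard of a Hub dataset
# repo with concurrent downloads. Repo id and worker count are placeholders.
from huggingface_hub import snapshot_download
from datasets import load_dataset

# Download all files in the dataset repo; max_workers controls how many
# files are fetched concurrently.
local_dir = snapshot_download(
    repo_id="username/my-large-dataset",
    repo_type="dataset",
    max_workers=8,
)

# Datasets pushed with push_to_hub are stored as Parquet, so load_dataset
# can read them directly; no custom loading script is required.
ds = load_dataset("username/my-large-dataset")
```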