fast seek() for multiprocessing #28

Open
meshiguge opened this issue Dec 18, 2017 · 1 comment

Comments

@meshiguge

I want to split a WARC file into small chunks and then process them with multiprocessing in Python.

For a text file we can use seek(), but how do we seek with the warc module, or in gzipped (.gz) WARC files?
Any advice?

@kartheek7895

kartheek7895 commented Mar 19, 2018

You can open it as a gzip file and seek to the position you need, then pass the file object to WARCReader from there.
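A rough sketch of one way to do the seeking, using only the standard library. It assumes the .gz file was written with one gzip member per record, which is common for WARC but not guaranteed, so check your files first. The helper names `member_offsets` and `read_member` are made up for this example and are not part of the warc module. The idea is to scan once for the byte offset of every gzip member in the compressed file, then let each worker seek straight to its own offset.

```python
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # tell zlib to expect a gzip header


def member_offsets(path, chunk_size=1 << 16):
    """Return the byte offset of every gzip member in the compressed file.

    Hypothetical helper: assumes the file is a plain concatenation of
    gzip members with no padding between them.
    """
    offsets = []
    with open(path, "rb") as raw:
        pos = 0  # offset of the first byte in buf
        buf = raw.read(chunk_size)
        while buf:
            offsets.append(pos)
            d = zlib.decompressobj(GZIP_WBITS)
            while True:
                d.decompress(buf)  # output is discarded, we only want the boundary
                if d.eof:
                    # Bytes left over belong to the next member.
                    pos += len(buf) - len(d.unused_data)
                    buf = d.unused_data
                    break
                pos += len(buf)
                buf = raw.read(chunk_size)
                if not buf:
                    raise EOFError("truncated gzip member")
            if not buf:
                buf = raw.read(chunk_size)
    return offsets


def read_member(path, offset, chunk_size=1 << 16):
    """Seek to offset in the compressed file and decompress one member."""
    out = []
    with open(path, "rb") as raw:
        raw.seek(offset)
        d = zlib.decompressobj(GZIP_WBITS)
        while not d.eof:
            chunk = raw.read(chunk_size)
            if not chunk:
                raise EOFError("truncated gzip member")
            out.append(d.decompress(chunk))
    return b"".join(out)
```

The offsets list can then be split across workers, for example with `multiprocessing.Pool.map`, with each worker calling `read_member` on its share. If you need a file-like object for a reader, wrapping the returned bytes in `io.BytesIO` should work, though I have not checked that against WARCReader here. Note that the indexing pass still decompresses the whole file once, so it pays off mainly when the per-record processing is the expensive part or when the offsets are saved and reused.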
