Unsupported WARC version: 1.1 #34

Open
kiska3 opened this issue Feb 12, 2021 · 2 comments

kiska3 commented Feb 12, 2021

example file

import warc

f = warc.open("example.warc.gz")
for record in f:
    print record['WARC-Target-URI'], record['Content-Length']

Expected Behaviour

Prints each record's target URI and content length.

Observed Behaviour

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kiska/.local/lib/python2.7/site-packages/warc/warc.py", line 390, in __iter__
    record = self.read_record()
  File "/home/kiska/.local/lib/python2.7/site-packages/warc/warc.py", line 373, in read_record
    header = self.read_header(fileobj)
  File "/home/kiska/.local/lib/python2.7/site-packages/warc/warc.py", line 334, in read_header
    raise IOError("Unsupported WARC version: %s" % version)
IOError: Unsupported WARC version: 1.1
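
To confirm that the file itself declares WARC/1.1 (rather than the library misreading it), the version line of the first record can be read directly with the standard library. This is only a rough diagnostic sketch in Python 3. The function name `read_warc_version` is made up for illustration and is not part of the `warc` package, and it assumes the file is gzip-compressed and starts with a WARC record.

```python
import gzip


def read_warc_version(path):
    """Return the version string declared on the first line of a gzipped WARC file.

    Hypothetical helper for diagnosis only; it does not parse the record.
    """
    with gzip.open(path, "rb") as f:
        first_line = f.readline().strip()
    # Every WARC record starts with a line of the form "WARC/<version>".
    if not first_line.startswith(b"WARC/"):
        raise ValueError("not a WARC record: %r" % first_line)
    return first_line[len(b"WARC/"):].decode("ascii")
```

For the file in this report, `read_warc_version("example.warc.gz")` would be expected to return `"1.1"`, which matches the version named in the error message.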

ivanistheone commented Feb 12, 2021

This seems to be a known issue for .warc.gz files.

See the workarounds in #21 (comment) and #21 (comment).

@JustAnotherArchivist

@ivanistheone Yes, there are many bugs in this library. This one is a lack of support for the WARC/1.1 specification, though, and has nothing to do with #21.
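
To illustrate the kind of check that raises this error, here is a small standalone sketch of a header parser that accepts both version strings. It is not the library's actual code. `parse_warc_header` and `SUPPORTED_VERSIONS` are names invented for this example, and it ignores folded (multi-line) header values. Accepting the version string is also only a first step, since WARC/1.1 may differ from 1.0 in other details that a reader has to handle.

```python
import io

# Assumption for this sketch: the versions a reader chooses to accept.
SUPPORTED_VERSIONS = {"1.0", "1.1"}


def parse_warc_header(fileobj):
    """Parse one WARC record header from a binary file-like object.

    Returns (version, headers). Hypothetical helper, not the warc package API.
    """
    version_line = fileobj.readline().decode("utf-8").strip()
    if not version_line.startswith("WARC/"):
        raise IOError("Bad version line: %r" % version_line)
    version = version_line[len("WARC/"):]
    if version not in SUPPORTED_VERSIONS:
        raise IOError("Unsupported WARC version: %s" % version)

    headers = {}
    # Header fields run until the first empty line (or end of input).
    for raw in iter(fileobj.readline, b""):
        line = raw.decode("utf-8").rstrip("\r\n")
        if not line:
            break
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, headers


sample = io.BytesIO(
    b"WARC/1.1\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)
version, headers = parse_warc_header(sample)
```

With the set above, a `WARC/1.1` header parses normally. Removing `"1.1"` from `SUPPORTED_VERSIONS` reproduces the same `IOError` message seen in the traceback.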
