KeyError warc-target-uri #19

Open
vschiavoni opened this issue Nov 4, 2014 · 3 comments

Comments

@vschiavoni

I get this:

~$ python warcread.py 
<warc.warc.WARCFile instance at 0x7fc61fc34290>
Traceback (most recent call last):
  File "warcread.py", line 6, in <module>
    print(record['WARC-Target-URI'])
  File "/usr/local/lib/python2.7/dist-packages/warc/warc.py", line 199, in __getitem__
    return self.header[name]
  File "/usr/local/lib/python2.7/dist-packages/warc/utils.py", line 34, in __getitem__
    return self._d[name.lower()]
KeyError: 'warc-target-uri'

The input file can be downloaded here (191 MB):
https://www.dropbox.com/s/25tk1mpo03g73pj/1009wb-39.warc.gz?dl=0
@Gijs-Koot

Hi, I believe the problem is an error in the documentation:

    print(record['WARC-Target-URI'])

doesn't work; you should use

    print(record.header['WARC-Target-URI'])

I came here to suggest a fix, but judging from the commits, I think this library is no longer maintained.

@Segerberg

Segerberg commented Jun 29, 2016

My temporary workaround:

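import warc

f = warc.open("file.warc.gz", "rb")

for record in f:
    # warcinfo records carry no WARC-Target-URI field, so skip them.
    if record['WARC-Type'] != 'warcinfo':
        # %-formatting prints the same text under Python 2 and Python 3.
        print('%s : %s' % (record['WARC-Type'], record['WARC-Target-URI']))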



@girishmt4

> Hi, I believe the problem is an error in the documentation:
>
>     print(record['WARC-Target-URI'])
>
> doesn't work; you should use
>
>     print(record.header['WARC-Target-URI'])
>
> I came here to suggest a fix, but judging from the commits, I think this library is no longer maintained.

Actually, it's

    record.header.get('WARC-Target-URI')
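
For anyone comparing the suggestions in this thread, here is a minimal sketch of the difference between bracket lookup and `.get()`. It assumes, based only on the `self._d[name.lower()]` line in the traceback, that the header is a case-insensitive mapping that raises `KeyError` for a missing field. `HeaderSketch`, `target_uris`, and the sample records are hypothetical stand-ins made up for illustration; they are not part of the warc library.

```python
# Hypothetical stand-in for the library's header mapping, modelled on the
# `self._d[name.lower()]` line in the traceback; it is not the warc API.
class HeaderSketch(object):
    def __init__(self, fields):
        # Store keys lower-cased so lookups are case-insensitive.
        self._d = dict((k.lower(), v) for k, v in fields.items())

    def __getitem__(self, name):
        # Raises KeyError when the field is absent, as in the report.
        return self._d[name.lower()]

    def get(self, name, default=None):
        # Returns `default` instead of raising when the field is absent.
        return self._d.get(name.lower(), default)


# Made-up records: a warcinfo record without a target URI, then a response.
records = [
    HeaderSketch({"WARC-Type": "warcinfo"}),
    HeaderSketch({"WARC-Type": "response",
                  "WARC-Target-URI": "http://example.com/"}),
]


def target_uris(headers):
    """Collect the target URI of every record that has one."""
    uris = []
    for header in headers:
        uri = header.get("WARC-Target-URI")
        if uri is not None:
            uris.append(uri)
    return uris


found = target_uris(records)
print(found)
```

Under these assumptions, bracket lookup on the first record raises the same `KeyError` as in the report, while `.get()` returns `None` and lets the loop continue. Checking for `None` has a similar effect to skipping `warcinfo` records, and would also cover any other record type that lacks the field.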
