This repository has been archived by the owner on Nov 17, 2022. It is now read-only.

lxml.etree raises an error while scraping a particular book (fix included) #89

Open
yyihuan opened this issue Jun 13, 2021 · 0 comments

yyihuan commented Jun 13, 2021

While scraping the book https://hit-scir.gitbooks.io/neural-networks-and-deep-learning-zh_cn/content/, every other page was processed normally, but one page raised an error and aborted the run.

done :  https://hit-scir.gitbooks.io/neural-networks-and-deep-learning-zh_cn/content/chap3/c3s0.html
Traceback (most recent call last):
  File "gitbook.py", line 5, in <module>
    Gitbook2PDF(url).run()
  File "/Users/cxjh168/Downloads/gitbook2pdf-master/gitbook2pdf/gitbook2pdf.py", line 198, in run
    loop.run_until_complete(self.crawl_main_content(content_urls))
  File "/Users/cxjh168/anaconda3/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
    return future.result()
  File "/Users/cxjh168/Downloads/gitbook2pdf-master/gitbook2pdf/gitbook2pdf.py", line 220, in crawl_main_content
    await asyncio.gather(*tasks)
  File "/Users/cxjh168/Downloads/gitbook2pdf-master/gitbook2pdf/gitbook2pdf.py", line 241, in gettext
    text = ChapterParser(metatext, title, level, ).parser()
  File "/Users/cxjh168/Downloads/gitbook2pdf-master/gitbook2pdf/gitbook2pdf.py", line 105, in parser
    return html.unescape(ET.tostring(context).decode())
  File "src/lxml/etree.pyx", line 3437, in lxml.etree.tostring
  File "src/lxml/serializer.pxi", line 139, in lxml.etree._tostring
  File "src/lxml/serializer.pxi", line 199, in lxml.etree._raiseSerialisationError
lxml.etree.SerialisationError: IO_ENCODER
(base) MacBook-Pro:gitbook2pdf-master

My fix was to change line 105 of gitbook2pdf.py to pass an explicit encoding:

return html.unescape(ET.tostring(context).decode())  # original
return html.unescape(ET.tostring(context, encoding='utf-8').decode())  # modified

After that change, the crawl ran normally.
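A minimal, self-contained sketch of the patched line, assuming `ET` is `lxml.etree` (as the traceback suggests). The function name `chapter_to_text` is hypothetical; in gitbook2pdf the line lives inside `ChapterParser.parser`. It only illustrates the serialization call; it has not been checked against the page that originally failed.

```python
import html

from lxml import etree as ET


def chapter_to_text(context):
    """Hypothetical helper mirroring line 105 of gitbook2pdf.py after the fix.

    Serializes an lxml element to bytes, decodes them, then unescapes
    HTML entities so the result is a plain str of markup.
    """
    # Without `encoding`, lxml serializes to US-ASCII and writes non-ASCII
    # characters as numeric character references. Passing UTF-8 explicitly
    # writes them out directly; lxml adds no XML declaration for UTF-8.
    raw = ET.tostring(context, encoding='utf-8')
    return html.unescape(raw.decode('utf-8'))


if __name__ == '__main__':
    chapter = ET.fromstring('<div><p>神经网络与深度学习 &amp; more</p></div>')
    print(chapter_to_text(chapter))
    # prints: <div><p>神经网络与深度学习 & more</p></div>
```

lxml also accepts `encoding='unicode'`, which returns a `str` directly and makes the `.decode()` step unnecessary. It should serve the same purpose here, but it is not the change that was tested in this issue.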
