This repository has been archived by the owner on Nov 17, 2022. It is now read-only.
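When I convert The Go Programming Language to PDF, the output file is truncated after section 5.2.
The reason is that the parser calls html.unescape() to convert HTML character references into the corresponding Unicode characters.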
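However, the original HTML of "练习 5.3: 编写函数输出所有text结点的内容。注意不要访问<script>和<style>元素,因为这些元素对浏览者是不可见的。" (Exercise 5.3: write a function that prints the contents of all text nodes; do not descend into <script> and <style> elements, because they are not visible to the reader) contains <code>&lt;script&gt;</code>和<code>&lt;style&gt;</code>.
As a result, once &lt;script&gt; is converted into a literal <script> tag, the content after <script> is truncated.
When I remove the html.unescape() call as follows, the output PDF contains the whole content.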
         def parser(self):
             tree = ET.HTML(self.original)
             if tree.xpath('//section[@class="normal markdown-section"]'):
                 context = tree.xpath('//section[@class="normal markdown-section"]')[0]
             else:
                 context = tree.xpath('//section[@class="normal"]')[0]
             if context.find('footer'):
                 context.remove(context.find('footer'))
             context = self.parsehead(context)
    -        return html.unescape(ET.tostring(context).decode())
    +        return ET.tostring(context).decode()
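The effect described in this report can be reproduced with a small standard-library sketch. It is not the project's code: `VisibleText` and `visible_text` are hypothetical names, and `html.parser` stands in for whatever actually renders the PDF, on the assumption that the renderer also hides everything inside a `<script>` element. The sample string imitates serialized HTML in which the tag name is still escaped.

```python
import html
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect the text a reader would see, skipping <script>/<style> bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.hidden = False   # True while inside a <script> or <style> element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.hidden = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.hidden = False

    def handle_data(self, data):
        if not self.hidden:
            self.parts.append(data)


def visible_text(markup):
    # Hypothetical helper: parse the markup and return only the visible text.
    parser = VisibleText()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


# Serialized HTML as it looks before html.unescape(): the tag name is escaped.
serialized = "<p>before <code>&lt;script&gt;</code> after</p>"

# Escaped form: "<script>" is ordinary text, so the whole sentence survives.
print(repr(visible_text(serialized)))

# Unescaped form: "<script>" becomes a real start tag that is never closed,
# so the text following it is no longer treated as visible.
print(repr(visible_text(html.unescape(serialized))))
```

In the second call the word "after" is missing from the result, which matches the truncation seen in the PDF. Leaving the serialized markup escaped, as in the diff above, keeps `<script>` as plain text.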