
Process character references in data #60

Open
mina86 wants to merge 5 commits into master


mina86 (Contributor) commented Mar 3, 2021

Re-escape characters in data to minimise the markup further. In data
sections only the ampersand and the less-than sign need to be escaped.
Since a character is always shorter than any reference that encodes it,
not escaping what doesn’t need to be escaped saves space. Furthermore,
don’t escape an ampersand in situations in which HTML5 dictates that it
doesn’t need to be escaped.
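A minimal sketch of the re-escaping step. `escape_data` is a hypothetical helper, not htmlmin’s actual code. It uses a deliberately conservative rule for the ampersand: escape it only when the next character is an ASCII letter, a digit or `#`, because only then could it start a character reference. The real HTML5 rules are more permissive. For example, `&#` followed by a non-digit is also literal. So this sketch errs on the side of escaping.

```python
import re

# Hypothetical helper, not htmlmin's API: escape decoded text for an HTML5
# data section (text between tags, outside raw-text elements such as
# <script> and <style>).  Only '&' and '<' are ever escaped; '>', quotes and
# all other characters stay literal.
#
# Conservative rule for '&': escape it only if the next character is an
# ASCII letter, digit or '#', i.e. something that could begin a character
# reference.  A '&' before a space, punctuation or end of text stays as-is.
_AMP_BEFORE_REF_START = re.compile(r'&(?=[A-Za-z0-9#])')


def escape_data(text):
    # Escape '&' first so the '&' introduced by '&lt;' is not escaped again.
    text = _AMP_BEFORE_REF_START.sub('&amp;', text)
    return text.replace('<', '&lt;')


print(escape_data('R&D < 5 & rising'))  # R&amp;D &lt; 5 & rising
```

Only the first ampersand is escaped here, because it precedes a letter. The one before a space cannot begin a reference, so it is saved verbatim.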

mina86 added 2 commits March 4, 2021 23:56
First of all, avoid an unnecessary dictionary lookup by calling the get()
method once rather than using the ‘in’ operator followed by a lookup.
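The difference can be sketched as follows. Python’s standard `html.entities.html5` table stands in for whatever mapping the patch actually consults, and both function names are illustrative only:

```python
from html.entities import html5  # e.g. {'amp;': '&', 'copy': '©', ...}


def lookup_twice(name):
    # 'in' test, then indexing: two hash lookups whenever the name exists.
    if name in html5:
        return html5[name]
    return None


def lookup_once(name):
    # dict.get() hashes and probes once, returning None on a miss.
    return html5.get(name)
```

Both return the same result. The `get()` form simply avoids hashing the key twice on a hit. Using `None` as the miss value is safe here because every value in an entity table is a string.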

Second, optimise _charref by observing that no named character
reference starts with a digit or contains non-alphanumeric characters
(apart from the terminating semicolon), and that the longest named
reference consists of 31 letters (not 32).
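The three observations can be checked against Python’s own copy of the WHATWG table. The module-level assertions below do exactly that. The rest is a sketch of how a parser might use those facts to bound its scan after `&`. `_NAMED_REF` and `match_named_ref` are hypothetical and are not the patch’s `_charref`. The matching itself follows the HTML5 “longest identifier” rule and ignores the separate attribute-value exception.

```python
import re
from html.entities import html5  # WHATWG named references, e.g. 'amp;' -> '&'

# The observations from the commit message, checked against Python's table
# (keys appear both with and without the trailing ';').
_names = [k.rstrip(';') for k in html5]
assert all(n.isascii() and n.isalnum() for n in _names)
assert not any(n[0].isdigit() for n in _names)
assert max(len(n) for n in _names) == 31  # 'CounterClockwiseContourIntegral'

# Hypothetical bounded pattern: a letter followed by at most 30 more
# alphanumerics, so the scan after '&' never looks past 31 characters.
_NAMED_REF = re.compile(r'[A-Za-z][A-Za-z0-9]{0,30}')


def match_named_ref(text, pos):
    """Return the named reference starting at text[pos] (just after '&'):
    the ';'-terminated form if it exists, else the longest legacy name
    without ';'.  Return None if nothing matches."""
    m = _NAMED_REF.match(text, pos)
    if m is None:
        return None  # digit, '#' or punctuation: cannot be a named reference
    name = m.group()
    if text.startswith(';', m.end()) and name + ';' in html5:
        return name + ';'
    for end in range(m.end(), pos, -1):  # longest prefix first
        if text[pos:end] in html5:
            return text[pos:end]
    return None
```

For example, `match_named_ref('&copyx', 1)` falls back to the legacy name `'copy'`, which is why data such as `&copyx` decodes to `©x`.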
Add a handful of new cases to FEATURES_TEXTS; create a new class for
the convert_charrefs feature and add more cases testing it, including
its behaviour with text; add a new test for the quality of
minification when all options are turned off; and (when verifying
quality) check the reduction in bytes in addition to the reduction in
character count.
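Why bytes are worth checking separately: decoding a reference to a non-ASCII character saves fewer bytes than characters. A hypothetical test helper, assuming the minified output is encoded as UTF-8, could measure both:

```python
def reduction(original, minified):
    """Hypothetical test helper: how much smaller `minified` is, measured
    both in characters and in UTF-8 bytes."""
    chars = len(original) - len(minified)
    nbytes = len(original.encode('utf-8')) - len(minified.encode('utf-8'))
    return chars, nbytes


# Decoding '&eacute;' saves 7 characters but only 6 bytes, because 'é'
# takes two bytes in UTF-8.
print(reduction('caf&eacute;', 'café'))  # (7, 6)
```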

Some of the new tests demonstrate bugs.  Add appropriate comments.
mankyd (Owner) commented Mar 5, 2021

Thanks for the PR. I am going to spend some time digesting this and hope to get it in soon if it makes sense.

mina86 added 3 commits March 7, 2021 13:57
For historical reasons, inside an attribute value, a named character
reference which is not terminated by a semicolon and is immediately
followed by ‘=’ or an ASCII alphanumeric character must be interpreted
verbatim.
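The attribute-value exception reduces to a small predicate. `keep_verbatim_in_attribute` is a hypothetical name for illustration. It assumes the caller has already matched the longest named reference after `&`.

```python
def keep_verbatim_in_attribute(matched, next_char):
    """Hypothetical helper for the HTML5 attribute-value rule.

    matched:   the named reference consumed after '&', e.g. 'copy' or 'copy;'
    next_char: the character right after it, or '' at the end of the value

    Returns True when the reference must be left as literal text.
    """
    if matched.endswith(';'):
        return False  # properly terminated references are always decoded
    return next_char == '=' or (next_char.isascii() and next_char.isalnum())
```

So in `href="?a=1&copy=2"` the `&copy` stays literal, whereas in `title="&copy 2021"` it is still decoded to `©`. The latter is a parse error, but it is decoded nonetheless. A minifier that decodes and re-escapes attribute values has to preserve that distinction.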
Re-escape characters in data to minimise the markup further.  In data
sections only the ampersand and the less-than sign need to be escaped.
Since a character is always shorter than any reference that encodes it,
not escaping what doesn’t need to be escaped saves space.  Furthermore,
don’t escape an ampersand in situations in which HTML5 dictates that it
doesn’t need to be escaped.

2 participants