-
-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup XML input before processing #2190
Conversation
-- the source. We're not using it, so we don't care and it is clutter in the AST that makes it different from | ||
-- ASTs generated from SIL inputs. | ||
for i = 1, #options do | ||
options[i] = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cost/benefit triggering key removals on all attributes in a big XML file? (triggering table realloc and GC) vs. just keeping these entries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cost is less than nil
. The info is going to be garbage collected anyway at the end of the SILE run, and doing it sooner rather than later actually reduces overall memory usage. The table realloc is hard to measure; this abstraction for dropping the list-like keys is pretty efficient. In fact most of my test runs have it coming in faster, possible because of the way we copy the options table around later in processing.
On a sample XML stress test file (99,999 commands with 9 args each) the difference is in the noise, but with the "cleaned" version coming in faster on most runs (the error bars are affected by other things I'm doing on my desktop while the test runs):
$ hyperfine -r 10 -L sile sile-master,sile-clean-xml "./{sile} -b dummy foo.xml"
Benchmark 1: ./sile-master -b dummy foo.xml
Time (mean ± σ): 9.631 s ± 0.404 s [User: 9.192 s, System: 0.400 s]
Range (min … max): 9.267 s … 10.563 s 10 runs
Benchmark 2: ./sile-clean-xml -b dummy foo.xml
Time (mean ± σ): 9.567 s ± 0.190 s [User: 9.113 s, System: 0.418 s]
Range (min … max): 9.317 s … 9.808 s 10 runs
Summary
./sile-clean-xml -b dummy foo.xml ran
1.01 ± 0.05 times faster than ./sile-master -b dummy foo.xml
Benefit: Nothing to get confused in the debug output, the debug output matches what we're actually looking at in the data, and the SIL vs. XML ASTs match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, would be neat to propose this lunarmodules/luaexpat#41
And then use local parser = lxp.new(content, nil, nil, false)
in our XML inputter.
I'd be happy to have this be an option upstream too, but I see no harm in parsing the input for just what we plan to work with until such an option is available upstream. |
I proposed such an option upstream, we'll see. |
See discussion in #1837, not actual issue but a side quest.