Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup XML input before processing #2190

Merged
merged 2 commits into from
Dec 7, 2024
Merged

Conversation

alerque
Copy link
Member

@alerque alerque commented Dec 5, 2024

See discussion in #1837, not actual issue but a side quest.

@alerque alerque added this to the v0.15.8 milestone Dec 5, 2024
@alerque alerque added the todo label Dec 5, 2024
@alerque alerque requested a review from Omikhleia December 5, 2024 13:40
-- the source. We're not using it, so we don't care and it is clutter in the AST that makes it different from
-- ASTs generated from SIL inputs.
for i = 1, #options do
options[i] = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cost/benefit triggering key removals on all attributes in a big XML file? (triggering table realloc and GC) vs. just keeping these entries?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cost is less than nil. The info is going to be garbage collected anyway at the end of the SILE run, and doing it sooner rather than later actually reduces overall memory usage. The table realloc is hard to measure; this abstraction for dropping the list-like keys is pretty efficient. In fact most of my test runs have it coming in faster, possible because of the way we copy the options table around later in processing.

On a sample XML stress test file (99,999 commands with 9 args each) the difference is in the noise, but with the "cleaned" version coming in faster on most runs (the error bars are affected by other things I'm doing on my desktop while the test runs):

$ hyperfine -r 10 -L sile sile-master,sile-clean-xml "./{sile} -b dummy foo.xml"
Benchmark 1: ./sile-master -b dummy foo.xml
  Time (mean ± σ):      9.631 s ±  0.404 s    [User: 9.192 s, System: 0.400 s]
  Range (min … max):    9.267 s … 10.563 s    10 runs

Benchmark 2: ./sile-clean-xml -b dummy foo.xml
  Time (mean ± σ):      9.567 s ±  0.190 s    [User: 9.113 s, System: 0.418 s]
  Range (min … max):    9.317 s …  9.808 s    10 runs

Summary
  ./sile-clean-xml -b dummy foo.xml ran
    1.01 ± 0.05 times faster than ./sile-master -b dummy foo.xml

Benefit: Nothing to get confused in the debug output, the debug output matches what we're actually looking at in the data, and the SIL vs. XML ASTs match.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, would be neat to propose this lunarmodules/luaexpat#41

And then use local parser = lxp.new(content, nil, nil, false) in our XML inputter.

@alerque
Copy link
Member Author

alerque commented Dec 5, 2024

I'd be happy to have this be an option upstream too, but I see no harm in parsing the input for just what we plan to work with until such an option is available upstream.

@Omikhleia
Copy link
Member

I'd be happy to have this be an option upstream too, but I see no harm in parsing the input for just what we plan to work with until such an option is available upstream.

I proposed such an option upstream, we'll see.
We have been living so long with it that I don't feel there's any urge to "fix" a non-issue that just affects some bad tracing utilities 😁

@alerque alerque merged commit c3fdfcc into sile-typesetter:master Dec 7, 2024
21 checks passed
@alerque alerque deleted the xml branch December 7, 2024 23:21
@alerque alerque self-assigned this Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants