Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving Precompute performance #6931

Open
kripken opened this issue Sep 11, 2024 · 4 comments
Open

Improving Precompute performance #6931

kripken opened this issue Sep 11, 2024 · 4 comments

Comments

@kripken
Copy link
Member

kripken commented Sep 11, 2024

Running on the binary from Google Sheets,

BINARYEN_CORES=1 perf record bin/wasm-opt binary.wasm -all --precompute --no-validation

(1 core to avoid noise from multithreading; validation disabled to focus on the optimization)

The top item from perf report is

     7.41%  wasm-opt  libbinaryen.so        [.] wasm::Type::isRef() const

Reading the code, I can't see an obvious reason for that slowness.

@tlively You mentioned some lock you might look into with wasm::Type - is this related? I don't seem to see a lock taken here though.

My only other guess is that this may just be called many times, and by not being in a header, it isn't getting inlined.

@tlively
Copy link
Member

tlively commented Sep 11, 2024

The lock shouldn't be related, no. That lock is only taken when creating a new HeapType (edit: or Type). The inlining hypothesis seems likely. Perhaps you could look for places that handle the isRef() case before handling the isBasic() case. isBasic() should be super fast and inlineable, so handling that first should generally be faster.

@kripken kripken assigned kripken and unassigned kripken Sep 11, 2024
@kripken
Copy link
Member Author

kripken commented Sep 11, 2024

I experimented with moving all the isBasic() checks into the header, to make them fast:

kripken@90b6e7a?w=1

That does not help, though. Runtime is the same, and the profile shows isRef() replaced with isNonBasicRef(), that is, the majority of the calls seem to go through the slow path anyhow, which is not inlined.

So this still seems like a surprising amount of overhead. @tlively I wonder if we can at least make isRef() super-fast, as it only differentiates references from tuples? Maybe another representation of tuples - which are rare and do not need to be fast - could help the references case.

@tlively
Copy link
Member

tlively commented Sep 12, 2024

I once tried to change the representation of reference types to be pointers to HeapType infos with their nullability encoded in their low bits, but didn't quite get it working. It would be good to try again. This would also remove the need to take the lock when creating a Type other than a tuple type.

kripken added a commit that referenced this issue Sep 16, 2024
…*() methods (#6936)

This just moves code around. As a result, isRef() vanishes entirely from the
profiling traces in #6931, since now the core isRef/Tuple/etc. methods are
all inlineable.

This also required some reordering of wasm-type.h, namely to move HeapType
up front. No changes to that class otherwise.

TypeInfo is now in the header. getTypeInfo is now a static method on Type.

This has the downside of moving internal details into the header, and it may
increase compile time a little. The upside is making the --precompute benchmark
from #6931 significantly faster, 33%, and it will also help the many
Type::isNonNullable() etc. calls we have scattered around the codebase in
other passes too.
kripken added a commit that referenced this issue Sep 17, 2024
This avoids creating a large Literals (SmallVector of Literal) and then copying it. All the places
that construct GCData do not need the Literals afterwards.

This gives a 7% speedup on the --precompute benchmark from #6931
tlively added a commit that referenced this issue Dec 13, 2024
This makes Precompute about 5% faster on a WasmGC binary.

Inspired by #6931.
@tlively
Copy link
Member

tlively commented Dec 13, 2024

I checked the profile here again today and Type::isRef() is thankfully no longer anywhere to be seen. The lowest hanging fruit is now

  • Literal::~Literal() (12.4%)
  • Literal::Literal(const Literal&) (11.9%)
  • Flow::Flow(Literal) (11.6%)

tlively added a commit that referenced this issue Dec 13, 2024
This makes Precompute about 5% faster on a WasmGC binary.

Inspired by #6931.
@tlively tlively changed the title Type::isRef() oddly slow in Precompute Improving Precompute performance Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants