Optimize switch fallthrough #2054
Conversation
fallThroughBlock = getOneIntersectionBlock(out, caseBlocks, df);
} else {
	if (postDomMap == null) {
		Set<BlockNode> allPathsBlocks = BlockUtils.getAllPathsBlocks(block, out);
Looks like your change is semantically different from the original algorithm. You are using the constant block variable in the loop here, but before it was successor, which changes on each iteration; that is why these results were recalculated every time.
For now I'm not sure why it was done this way, and since the tests didn't fail, maybe we really can use your approach. Still, I want to investigate and check this on other cases, so it may take some time.
Also, if possible, please share your sample APKs, so I can profile this code too.
Thanks for the quick response!
It is slightly semantically different, since I'm computing the post-dom map for the whole switch once, instead of calculating a separate post-dom map for each case in the switch.
As for a sample APK, WhatsApp is an example of an app that has been optimized by Redex and spends a lot of time in processSwitch, making it relevant in this case.
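For illustration, a minimal sketch of the two shapes being compared. The types and the computePostDomMap helper are hypothetical stand-ins, not jadx's actual API (in jadx, that helper's role is played by BlockUtils.calcPartialPostDominance):

```java
import java.util.List;
import java.util.Set;

class SwitchPostDomSketch {
    interface Block {}
    interface PostDomMap { Set<Block> postDominatorsOf(Block b); }

    // Assumed helper: builds post-dominance info for the blocks lying
    // between 'start' and 'exit'.
    static PostDomMap computePostDomMap(Block start, Block exit) {
        throw new UnsupportedOperationException("sketch only");
    }

    // Before: one partial post-dominance computation per case block.
    static void perCase(List<Block> caseBlocks, Block exit) {
        for (Block caseBlock : caseBlocks) {
            PostDomMap map = computePostDomMap(caseBlock, exit); // recomputed every iteration
            // ... find this case's fall-through block using 'map' ...
        }
    }

    // After: one computation for the whole switch, queried per case.
    static void wholeSwitch(Block switchBlock, List<Block> caseBlocks, Block exit) {
        PostDomMap map = computePostDomMap(switchBlock, exit); // computed once
        for (Block caseBlock : caseBlocks) {
            Set<Block> postDoms = map.postDominatorsOf(caseBlock);
            // ... find this case's fall-through block using 'postDoms' ...
        }
    }
}
```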
return calcPartialPostDominance(mth, mth.getBasicBlocks(), mth.getPreExitBlocks().get(0));
Map<BlockNode, BitSet> postDominanceMap = mth.getPostDominanceMap();
if (postDominanceMap == null) {
	postDominanceMap = calcPartialPostDominance(mth, mth.getBasicBlocks(), mth.getPreExitBlocks().get(0));
Caching here may cause issues, because the basic-block tree changes during the 'blocks' stage, and since this is a utility method, we need some guard to prevent using it at that stage.
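Such a guard could look roughly like the sketch below; the isBlocksLocked flag is an assumption for illustration, not an existing jadx API:

```java
import java.util.BitSet;
import java.util.Map;

class PostDomGuardSketch {
    interface Method {
        boolean isBlocksLocked(); // assumed: set once the 'blocks' stage is finished
    }

    static Map<Object, BitSet> getPostDominance(Method mth) {
        if (!mth.isBlocksLocked()) {
            // The CFG can still change here, so any cached result would go stale.
            throw new IllegalStateException("post-dominance requested during 'blocks' stage");
        }
        // ... safe to compute (and cache) the map now ...
        throw new UnsupportedOperationException("sketch only");
    }
}
```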
I'm assuming that by "blocks stage" you mean the passes relating to the blocks IR (BlocksSplitter and BlockProcessor). Is that correct? Or is the CFG modified again in later passes?
I've looked into how you've implemented dominator computation in BlockProcessor, and it seems that the same thing can be done for post-dominators: simply compute all post-dominators in BlockProcessor and store them as a postDoms field in BlockNode. What do you think?
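To make that proposal concrete, here is a small self-contained sketch of the iterative bit-set data-flow computation, mirrored for post-dominators: it walks successors starting from the exit block instead of predecessors from the entry. The Block class is a hypothetical stand-in for jadx's BlockNode, and the sketch assumes a single exit block that every block can reach:

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for jadx's BlockNode: 'id' indexes into the
// method's block list, 'successors' are the CFG edges.
class Block {
    final int id;
    List<Block> successors = List.of();
    BitSet postDoms; // analogous to the proposed 'postDoms' field

    Block(int id) { this.id = id; }
}

class PostDominatorsSketch {
    // Iterative computation of post-dominator sets, mirroring the
    // dominator algorithm with the edge direction reversed.
    static void compute(List<Block> blocks, Block exit) {
        int n = blocks.size();
        for (Block b : blocks) {
            b.postDoms = new BitSet(n);
            b.postDoms.set(0, n); // start with "everything post-dominates b"
        }
        exit.postDoms.clear();
        exit.postDoms.set(exit.id); // the exit post-dominates only itself

        boolean changed = true;
        while (changed) {
            changed = false;
            for (Block b : blocks) {
                if (b == exit) {
                    continue;
                }
                // postDoms(b) = {b} + intersection of postDoms(s) over successors s
                BitSet d = new BitSet(n);
                d.set(0, n);
                for (Block s : b.successors) {
                    d.and(s.postDoms);
                }
                d.set(b.id);
                if (!d.equals(b.postDoms)) {
                    b.postDoms = d;
                    changed = true;
                }
            }
        }
    }
}
```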
by "blocks stage" you mean the passes relating to blocks IR
Yes, and blocks are locked at the end of BlockProcessor and not modified later.
Simply compute all post dominators BlockProcessor and store them as a postDoms field in BlockNode. What do you think?
This is how it supposes to be and this way it is easier to use in other places 👍
The locking looks very cool!
I'll start working on it then 💪
@DanielFi is there any progress?

Update: Ok, I implemented calculation of post-dominators using the same approach and algorithm as dominators. But after that I actually changed the blocks structure to have only one exit node, so now we really can calculate post-dominance for the whole method and cache it 🎉
I profiled jadx and noticed that RegionMaker::processSwitch was taking up a surprising amount of the total decompilation time (1.92% on my laptop with a certain real-world APK). Apps that use Redex for optimization tend to have many large switches, which amplifies this problem.
The computationally hardest part of processSwitch is calculating the post-dominators for almost every case in the switch. I improved this by calculating the post-dominator map only once for the whole switch and reusing it, instead of once for each case.
While doing so, I noticed that most other calls to BlockUtils::calcPartialPostDominance were computing the post-dominance for the entire method, so I added a "cache" to BlockUtils::calcPostDominance in order to avoid recomputing the same thing over and over.
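The caching amounts to something like the following sketch; the field and method names here are illustrative, not the actual jadx code:

```java
import java.util.BitSet;
import java.util.Map;

// Hypothetical method-level cache, built lazily on first request.
// Assumed to run only after blocks are locked (see the discussion above).
class CachedPostDomSketch {
    private Map<Object, BitSet> postDominanceMap; // null until first use

    Map<Object, BitSet> getPostDominanceMap() {
        if (postDominanceMap == null) {
            postDominanceMap = computeForWholeMethod();
        }
        return postDominanceMap;
    }

    // In jadx terms: what BlockUtils.calcPartialPostDominance does when
    // given all basic blocks and the method's pre-exit block.
    private Map<Object, BitSet> computeForWholeMethod() {
        throw new UnsupportedOperationException("sketch only");
    }
}
```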