-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Evaluate and investigate performance #108
Comments
There should be some very low hanging fruit here - I am seeing almost 100ms to run a phase which is doing nothing at all (types phase, for json_codable). |
Ok, so I narrowed this down to the initial query for the target, which is taking >80ms.
I actually suspect this is just us needing to change the TCP connection settings, but will continue to dig. The actual query on the server side is <1ms to compute. This query is going away soon, but in general queries should not be this slow 🤣 Update: Confirmed we weren't setting tcpNoDelay, fixed in #109. Still more work to do though, we are about 200ms now in total which is overall about 20x faster end to end. |
#104 should also improves things here. |
We previously discussed invalidation in dart-lang/sdk#55784, and it seems to me that timings provided there will still be the same, even with new queries. The analyzer invalidates and links whole library cycles, and has to parse all files of the library cycle. We cannot currently do it with higher granularity. We still want by UX reasons to have single macro generated file per library. We can avoid running individual macros, or individual phases, but this will save us only so much. |
This is why I was saying yesterday that I didn't think it would be sufficient to just do the "easy" thing and make an implementation of the macro executor which can return cached results. It sounded like more work would be needed specific to the implementation to cache the fully merged result. Essentially, you need to be able to ask if any macro application in the entire library was invalidated, and if not be able to re-use the parsed AST for the fully merged result. So, in general yes I agree, except that the new queries can simplify (and unify) the cache invalidation story, and they can also allow for more granular invalidation. |
Yes, I agree that queries might simplify the invalidation checks. Could you walk me through a scenario? Lets say we have the library However we cannot use previously remembered It looks to me that we cannot just run all queries from all macros and decide whether we can reuse the merged result of running macros. We still have to do this macro by macro. |
cc @davidmorgan Good points, I had forgotten this extra complication of the query results being affected also by macros. So, you need to somehow be able to return the query results as of a specific point in time, which is not only per phase but also even can be affected by individual macros in the phase. One possible idea, is to track somewhere on each declaration information about when it was introduced (either in the element model or AST, not sure which makes most sense). That sounds like potentially a lot of work to track though, and it isn't immediately obvious the best way to track the stuff that happens mid-phase, although I am sure we could work something out. Does it sound like I am understanding the issue properly? |
I am also curious how much of this is a result of the merged augmentation file. If we decided to not merge augmentations, but instead created one per macro application (and phase), it might simplify/improve this situation. Then you could save each of the parsed augmentation results separately, and go back through the merging process from the beginning, but with cached results for most of the augmentations? |
My thoughts about many macro generated files are still the same as in dart-lang/sdk#55784 (comment). |
I agree from a usability standpoint that one file is better, but I think performance is the more important usability concern. If not merging the files means we can avoid re-parsing all downstream macro generated code, then I think that is probably worth doing. We can evaluate other options for improving the user experience too. |
Hm... Actually, we merge macro generated results into a single file after running all macros. |
Well, I think the difference is we would have to cache all the parsed but pre-merged augmentations, as well as the final parsed merged augmentation? If none of the pre-merged ones had to be regenerated, then we can re-use the final result as well. So, double the AST nodes cached for macro files. But, if we did that, we might be able to do something reasonable here? Essentially something like this at a very high level:
At a very high level does that sound correct/feasible? I worry a bit about the last step, it has to happen always no matter what. And in general, we are always repeating the work of merging the augmentations into the element model. Is this more work than if the file was hand written, or would we always be re-creating the entire element model from scratch anyways? @johnniwinther is this something that would be feasible in the CFE? I don't want to overfit to the analyzer here either. |
We probably have different conceptions how the element model is handled in the analyzer. When there is any change to a file in a library cycle itself, or any of its dependencies, we discard the whole element model for this library cycle, and re-link it from scratch. This starts with parsing all files of the library cycle. So, the step "remove all macro augmentations" does not make sense. If there was a change, there is nothing left already. Also, saying "re-use previous parsed AST" might have different meaning. We do not, and probably should not, keep parsed ASTs that were used for linking element models after linking is done. It would consume more memory than necessary. Theoretically we could serialize parsed ASTs, and this might be faster to read from some binary format than to parse from scratch. Probably not hard to do, but quite a lot of work, and the performance benefits are unknown until we try. |
I think it would help to have concrete goals for performance to motivate the work, particularly to guide how far we need to go here in considering different designs and approaches. Here are my thoughts as a starting point :) GoalsBetter Than CodegenMacros need to be a better user experience than
There are two paths to improvement here:
We will certainly work on making macros faster, the question is how far we can go: can macros be fast enough that we just say to users "you have no choices here because it's always fast enough"? I think there's a reasonable chance the answer is "no", so now seems a good time to ask what options we have and how much work they are to build. ScalableWe've talked about macros being "scalable", and I know what I think that means, let's see if we have consensus :) For me "scalable" means that we've checked how performance scales with a set of measures of input size, and are satisfied with the results. Example measures of input size: total program number of different macros, total program number of macro applications, library cycle number of different macros, library cycle number of macro applications. "Satisfied with the results" is tricky because macro runtime is always part of a larger operation: it's part of a larger analysis or compile. We know for example that incremental analysis on a library cycle has to do work on the whole library cycle, so analysis time scales with the library cycle size. That could tempt us to say that it's okay for macro runtime to also scale with library cycle size. What worries me about that outcome is that if we design macros so that So I think it's on us at design time to justify the scalability as much as possible from basic principles: by arguing that the work done is precisely what is necessary and no more. With this in mind, an important challenge to macro scalability is the generality of the feature: for reasons of usability, a lot of things "just work", sharp edges are minimized. But the common cases, and so the most important cases for performance, will tend to be simple. For example, if a library cycle uses ten different macros, it's extremely unlikely that they all interact, in the sense of output from one macro being input to another. It is much more likely that they are mostly independent: acting on different declarations or within different parts of the namespace when they do touch the same declarations. So we need that to be fast, with little or no cost paid for generality that goes unused. BraindumpVarious performance-related thoughts. Lazy MacrosThe analyzer doesn't actually need full macro output unless the user clicks through to an augmentation file; before that it only needs sufficient output to analyze other files, which could for example mean skipping phase 3. More fine-grained optimizations are possible. For example, Swift macros can specify in advance the names they will produce. The compiler can then avoid running the macro at all if the possible names are not interesting for the currently-running compiler task. Our macros are much more complicated, of course :) Optimistic MacrosIf macros generally don't interact with each other, then "run quickly assuming no interactions, start a slower run in parallel" might give good results. Analyzer UX / ParallelismMost of what the analyzer does is faster than running macros, and the results are wanted with low latency: so it might be an improvement to delay running macros until the faster results are reported, or to push macro evaluation to another process so it doesn't block the faster results. Just generally we should look for any opportunities to use multiple processes as this looks like free additional resources on most dev machines. Simple Hacks for Simple ThingsWith If we decide "correct macro analysis after every keypress" is a hard requirement, it's likely the hardest case to optimize for so we should be prepared to tackle it directly. Pause ButtonIf we can't make macro execution lazy / off the critical path, another option is to have a way to choose when to run it. Disk SupportJust generally a lot of the "escape hatches" for codegen performance relate to it working with files on disk that can be reused, prebuilt, and manually hacked on. Maybe it's something we should look at fleshing out. Share Analyzer Output with CFEUsers of incremental CFE compile are likely also running the analyzer; the CFE could directly use augmentations output by the analyzer in this case. Publish to PubFor faster initial analysis / initial compile. VersioningNow that we plan to support breaking changes to macros, we are at least not fully stuck if we discover that macros are slow in an important+unfixable way: we can add an alternative faster implementation and deprecate (but not remove) the slow one. Next StepsI guess, let's chat in tomorrow's meeting :) Thanks. |
This is actually not true, based on dart-lang/sdk#55784 (comment) and dart-lang/sdk#55784 (comment). Macros themselves actually are less than half the time even after some optimizations to other pieces were applied. This is why I have been trying to address things from the parsing/merging of results angle, that appears to actually be a fundamental issue which making macros faster, or even more incremental, will not solve (not unless the incrementality also fixes those other issues). |
Sorry, what I meant was: the findings the analyzer could report by totally ignoring macros, or pretending the output didn't change; vs the findings after fully evaluating macros. But, yes, and one related thought: it might be useful to talk about performance of augmentations as a separate thing, it's not clear to me how much of that incremental cost is special to augmentations vs what we would see with the code just inline. |
@scheglov should comment but my understanding is that this isn't an issue with augmentations themselves. It is an issue with us re-creating new augmentation files all the time whenever macros are applied to a library, as well as the duplicate parsing of all macro produced results. We end up always regenerating and re-parsing macro generated augmentation files whenever a macro re-runs, even if we don't actually re-run the macro itself. |
There are many aspects to this, I figure we can collect info in this issue so as not to duplicate too much work.
Some specific tasks are:
The text was updated successfully, but these errors were encountered: