-
Great feedback, thanks for calling this out! Closing gaps like this with WPF is certainly a goal. @Austin-Lamb is on the hook to investigate. |
-
@Xarlot - thank you for making such a minimal repro, that is greatly appreciated :) I'm about to go on vacation for Thanksgiving, so I won't be able to get to this until early December, but I'm putting it in my queue and will get back to you. |
-
any news? |
-
@danzil are you aware of the issue of not being able to build Release apps? @Austin-Lamb are you alright if I take this? I'd be hesitant to do any further investigation at the moment. Debug builds don't use the .NET Native toolchain, so the better comparison would be to use UWP w/o WinUI and test Release vs Release. However, that might even be a moot point because the current plan of record is to move to .NET 5, which has a different interop layer, and we need to be cognizant of the perf implications of that. I'd also be curious to see what the time is for a C++ app and whether the C++/C# boundary is causing major issues in a test like this.
-
tagging @bartekk8 since this is related to perf. |
-
@stevenbrix Are you referring to the issue that when you target SDK 10.0.17763.0 you get an internal compiler error when building release (this test project targets that SDK and fails to build Release)? I have not seen that error previously, but if you build against a newer SDK like 10.0.18362.0 it works fine. |
-
@stevenbrix - this is about runtime perf rather than build perf, so I'm going to assign it over to @bartekk8 |
-
@Austin-Lamb i'm not sure this is runtime perf. in a regular c++ app i'm seeing it's much faster (by about 10x). I'm thinking this might be because of the interop layer, which is something that I'm looking into validating we don't regress with CS/WinRT |
-
I think we may want 2 issues. We are definitely slow, a C++ app is around 400ms, which we can definitely improve on. But a C# app is about 3500-4000ms on my machine, so we need to make sure the interop layer is much more efficient. |
-
This in part is likely boxing overhead, the kind of thing that would be helped by using XamlDirect.GetInt32Property rather than DependencyObject.GetValue. |
-
@MikeHillberg, @AaronRobinsonMSFT looked at this the other day and found that there is a lot of GC overhead that we are causing, most likely due to the high amount of allocations. Hopefully with cs/winrt we can improve this, although unlikely in the WinUI3.0 timeframe. |
-
I've created #2028 so we can track the managed interop layer. We should move any discussion about that there, and let this issue stay related to |
-
You're seeing GC overhead even when using XamlDirect.GetInt32Property instead of GetValue? GetInt32Property is typed so that boxing isn't necessary, which saves a lot of allocations. |
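To make the boxing point concrete, here is a minimal sketch in plain C++. It is not WinUI code: `ToyPropertyStore`, `BoxedValue`, `BoxedInt`, `GetValueBoxed`, and `GetInt32` are hypothetical stand-ins that only model the allocation pattern (an untyped getter that wraps each int in a fresh heap object versus a typed getter that returns the int directly). It says nothing about how `XamlDirect` or `DependencyObject.GetValue` are actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-ins, NOT WinUI types: they only model the allocation pattern.
struct BoxedValue {
    virtual ~BoxedValue() = default;
};

struct BoxedInt final : BoxedValue {
    explicit BoxedInt(int v) : value(v) {}
    int value;
};

// Counts how many boxes were created, so the two access paths can be compared.
std::size_t g_boxAllocations = 0;

class ToyPropertyStore {
public:
    void SetInt(int v) { value_ = v; }

    // Untyped getter: every read wraps the int in a fresh heap object.
    std::unique_ptr<BoxedValue> GetValueBoxed() const {
        ++g_boxAllocations;
        return std::make_unique<BoxedInt>(value_);
    }

    // Typed getter: returns the int directly, no box is created.
    int GetInt32() const { return value_; }

private:
    int value_ = 0;
};

// Reads the property `reads` times through the boxed path and sums the results.
long long SumBoxed(const ToyPropertyStore& store, int reads) {
    long long sum = 0;
    for (int i = 0; i < reads; ++i) {
        std::unique_ptr<BoxedValue> boxed = store.GetValueBoxed();
        sum += static_cast<const BoxedInt&>(*boxed).value;  // unbox
    }
    return sum;
}

// Reads the property `reads` times through the typed path and sums the results.
long long SumTyped(const ToyPropertyStore& store, int reads) {
    long long sum = 0;
    for (int i = 0; i < reads; ++i) {
        sum += store.GetInt32();
    }
    return sum;
}
```

Both functions return the same sum, but only the boxed path increments `g_boxAllocations`, once per read. In a garbage-collected runtime each of those boxes would also be work for the GC, which is the overhead being discussed here.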
-
No, just with |
-
We can definitely use a more efficient API. Thanks for the hint. Looking forward to the .NET 5 integration.
-
Could someone from Microsoft comment on this discussion? Something like this could give us more than no comments from your side: "I apologize, guys, we messed up, but we know where the problem lies, and we will work on resolving it by version 1.5" or "It's not up to us; it depends on another team that plans to address it in the coming six months, which will make WinUI faster." |
-
Our performance problems were solved by the new version of Windows App SDK, 1.3.2 (1.3.230602002), specifically application initialization and opening new forms. The speed increase has been drastic. I recommend trying the new Windows App SDK 1.3.2.
-
Thank you @ADD-Eugenio-Lopez for the quick response. Currently, we use ver. 1.3.230502000 and opening a Page with grouped data inside DataGrid takes ages (it is only one example). |
-
So, we should stick with WPF for performance? Plus some library to make the app look more modern such as wpfui? |
-
@MikeHillberg @Austin-Lamb @stevenbrix @BorzillaR where is Microsoft on this issue? Most of the recent comments seem to be from users in the community rather than status updates from Microsoft. Is MS taking action on this? |
-
(I'm not actually on the XAML team so don't @ me, but...) I did find the following internal work items tracking this:
(I however don't understand the internals of XAML enough to really be able to parse those threads) |
-
Why was this converted to a discussion? Seriously? This seems like a serious issue. |
-
I guess I'm not surprised that the Windows/WinRT people don't care about perf. :/ From my experience with UWP and WinRT, the WinRT APIs are enormously slow compared to their Win32 equivalents, sometimes by a factor of more than 100. Meanwhile, they avoid using .NET because they think .NET is too slow... while their C++-based WinUI is 50 times slower than .NET's WPF. Oh the irony...
-
@Xarlot's benchmarks with the newest 1.5 SDK, 64-bit Release mode, Intel Core i9 (WinUI vs. WPF): about 20x difference for write and 10x for read. Obviously much better than before but still pretty friggin abysmal considering WPF does all this in purely managed code. For WinUI, if I remove the property change callback (which is the interop from C++ to C#), write drops to around 150-200. So the interop may not be the whole problem, but it's still a huge source.
-
I mean don't denigrate managed code here, it gets jitted. What I assume we are paying for is the managed to native boundary that exists in WinUI for these scenarios. Something about that must be chatty enough and involve enough tax that it really adds up. So as I said above, either that tax needs to be lowered or the managed to native seams need to be moved to less chatty areas. Or just do it all managed like WPF and put the managed to native seams somewhere with really low chattiness. |
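A rough sketch of what "moving the seam to a less chatty area" means, in plain C++. Everything here is hypothetical (`ToyNativeSide`, `SetOne`, `SetMany`, `WriteChatty`, `WriteBatched` are made-up names, not WinUI or CsWinRT APIs). The counter just stands in for whatever fixed tax each boundary crossing costs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the "native" side of a boundary; NOT a WinUI or CsWinRT API.
class ToyNativeSide {
public:
    // Chatty entry point: one crossing per value, so the fixed tax is paid every time.
    void SetOne(std::size_t index, int value) {
        ++crossings_;
        if (index >= values_.size()) {
            values_.resize(index + 1);
        }
        values_[index] = value;
    }

    // Batched entry point: one crossing for the whole set, so the tax is paid once.
    void SetMany(const std::vector<int>& values) {
        ++crossings_;
        values_ = values;
    }

    std::size_t crossings() const { return crossings_; }
    const std::vector<int>& values() const { return values_; }

private:
    std::size_t crossings_ = 0;
    std::vector<int> values_;
};

// Writes every element through the chatty entry point; returns total crossings so far.
std::size_t WriteChatty(ToyNativeSide& native, const std::vector<int>& data) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        native.SetOne(i, data[i]);
    }
    return native.crossings();
}

// Writes every element through the batched entry point; returns total crossings so far.
std::size_t WriteBatched(ToyNativeSide& native, const std::vector<int>& data) {
    native.SetMany(data);
    return native.crossings();
}
```

Both paths leave the same values on the "native" side; the chatty one crosses once per element, the batched one crosses once in total. Whether a real framework can expose a batched seam for property access is a separate design question that this sketch does not answer.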
-
I managed to get WinUI 3 running with NativeAOT. With the latest WinUI 3, the result for the AOT-compiled app (WinUI vs. WPF) is still about a 20x difference for write and 10x for read. Apparently the slowness happens in the native code; I think the same performance characteristics should be observed in a C++ WinUI 3 app as well.
-
A significant chunk of this slowness comes from CsWinRT, not from WinUI's native code. Here are results with the same benchmark written in C++ (which removes that overhead). I took the liberty of fixing a bug where it was only doing 100k writes and not 200k (because of a …).

All tests were run on a Ryzen 9 5900X, using x64 Release mode and running without a debugger. Latest WASDK 1.5.

WinUI 3 - C# .NET 8 JIT (don't know how to get AOT working):
WinUI 3 - C++/WinRT:

FWIW, these results are still miles better than UWP. Here are the results in UWP:

UWP - C# .NET Native:
UWP - C++/WinRT:

Considering WinUI 3 is derived from UWP XAML code, this is a considerable improvement already. And realistically, I suspect the current numbers are fast enough for most apps out there.

C++ benchmark code in case anyone wants to try it:

```cpp
template<typename T>
struct TestClass : DependencyObjectT<TestClass<T>>
{
    // Registers the dependency property once and reuses it on every call.
    DependencyProperty TestProperty()
    {
        static const auto prop = DependencyProperty::Register(L"Test", winrt::xaml_typename<T>(), {L"TestClass<int>", Windows::UI::Xaml::Interop::TypeKind::Custom}, PropertyMetadata(winrt::box_value(T{})));
        return prop;
    }

    // Read path: GetValue returns a boxed value that is converted back to T.
    T Test()
    {
        return GetValue(TestProperty()).as<T>();
    }

    // Write path: the value is boxed before being handed to SetValue.
    void Test(T value)
    {
        SetValue(TestProperty(), box_value(value));
    }
};

// Write benchmark: 200k SetValue calls, elapsed milliseconds shown in the window title.
void MainWindow::Button_Click(IInspectable const&, RoutedEventArgs const&)
{
    const auto testClass = make_self<TestClass<int>>();
    const auto before = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 200000; i++)
        testClass->Test(i + 1);
    const auto after = std::chrono::high_resolution_clock::now();
    const auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(after - before);
    Title(winrt::to_hstring(duration.count()));
}

// Read benchmark: 200k GetValue calls, elapsed milliseconds shown in the window title.
void MainWindow::Button_Click_1(IInspectable const&, RoutedEventArgs const&)
{
    const auto testClass = make_self<TestClass<int>>();
    const auto before = std::chrono::high_resolution_clock::now();
    int test = 0;
    for (int i = 0; i < 200000; i++)
        test += testClass->Test();
    const auto after = std::chrono::high_resolution_clock::now();
    const auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(after - before);
    Title(winrt::to_hstring(duration.count()));
}
```
-
Perhaps @DarranRowe can comment on this thread. |
-
I built the nightly version of CsWinRT (which has a bunch of optimizations compared to the current latest release) and ran a simple benchmark comparing an in-proc WinRT call (calling a method in a native DLL through the WinRT ABI) with a managed direct call (calling a managed method directly, without inlining), each computing 12+34, in both async and sync variants. This is the result:
Note that an in-proc WinRT call being slower than a direct call is expected, as it needs to cross the managed-native boundary. The same is true of calling a method defined in a native DLL through P/Invoke. As you can see, calling a method through the WinRT ABI is generally 10x-20x slower than calling a managed method directly. That performance is quite attractive for a foreign-language interface call. WinUI 3 is a native UI framework, which means a method cannot be called directly as in WPF; any call to the framework must go through some form of native ABI (the C ABI, the WinRT ABI, or something else).
Update: #9154 (reply in thread) |
-
More information on WinRT's general efficiency problems compared to vanilla C++: https://devblogs.microsoft.com/oldnewthing/20240301-00/?p=109468
-
Based on my quick test, WinUI 3 dependency properties are nearly 50 times slower than in WPF.
https://github.com/Xarlot/dependencypropertytest
Do you have any plans to improve this behavior?