[iOS / Native AOT] Bad stack trace due to unsupported marshalling behavior #3280

Open
tipa opened this issue Apr 10, 2024 · 14 comments · May be fixed by #3729

Comments

@tipa

tipa commented Apr 10, 2024

Package

Sentry

.NET Flavor

.NET

.NET Version

8.0.2

OS

iOS

SDK Version

4.2.1

Self-Hosted Sentry Version

No response

Steps to Reproduce

Throw an exception in the AppDelegate.FinishedLaunching method in a NativeAot-compiled app.

To run a Native AOT compiled app locally on a physical device, use dotnet publish /t:Run /p:_DeviceName=DEVICEID where DEVICEID can be queried using xcrun xctrace list devices.

There might be more prerequisites, such as the managed exception being marshalled to native code and back to managed code.

Expected Result

Crash with a useful stack trace that includes the managed C# function where the exception originated

Actual Result

Thread 0 Crashed:
0   libsystem_kernel.dylib          0x39fe95fbc         __pthread_kill
1   libsystem_pthread.dylib         0x3e519567c         pthread_kill
2   libsystem_c.dylib               0x3210abb8c         abort
3   MyXYTestApp                     0x20499ce8c         xamarin_assertion_message (runtime.m:1461)
4   MyXYTestApp                     0x20499d5d4         xamarin_process_managed_exception (runtime.m:2441)
5   MyXYTestApp                     0x20499d458         xamarin_process_managed_exception_gchandle (runtime.m:1123)
6   MyXYTestApp                     0x2049c6508         -[App application:didFinishLaunchingWithOptions:] (registrar.mm:2760)
7   UIKitCore                       0x31562361c         -[UIApplication _handleDelegateCallbacksWithOptions:isSuspended:restoreState:]
8   UIKitCore                       0x315622784         -[UIApplication _callInitializationDelegatesWithActions:forCanvas:payload:fromOriginatingProcess:]
9   UIKitCore                       0x315621768         -[UIApplication _runWithMainScene:transitionContext:completion:]
10  UIKitCore                       0x3156213b4         -[_UISceneLifecycleMultiplexer completeApplicationLaunchWithFBSScene:transitionContext:]
11  UIKitCore                       0x31559dedc         _UIScenePerformActionsWithLifecycleActionMask
12  UIKitCore                       0x3156253b0         __101-[_UISceneLifecycleMultiplexer _evalTransitionToSettings:fromSettings:forceExit:withTransitionStore:]_block_invoke
13  UIKitCore                       0x31554ce44         -[_UISceneLifecycleMultiplexer _performBlock:withApplicationOfDeactivationReasons:fromReasons:]
14  UIKitCore                       0x31554b8bc         -[_UISceneLifecycleMultiplexer _evalTransitionToSettings:fromSettings:forceExit:withTransitionStore:]
15  UIKitCore                       0x31554b224         -[_UISceneLifecycleMultiplexer uiScene:transitionedFromState:withTransitionContext:]
16  UIKitCore                       0x31554b0f4         __186-[_UIWindowSceneFBSSceneTransitionContextDrivenLifecycleSettingsDiffAction _performActionsForUIScene:withUpdatedFBSScene:settingsDiff:fromSettings:transitionContext:lifecycleActionType:]_block_invoke
17  UIKitCore                       0x31554affc         +[BSAnimationSettings(UIKit) tryAnimatingWithSettings:fromCurrentState:actions:completion:]
18  UIKitCore                       0x31554a884         _UISceneSettingsDiffActionPerformChangesWithTransitionContextAndCompletion
19  UIKitCore                       0x31554a534         -[_UIWindowSceneFBSSceneTransitionContextDrivenLifecycleSettingsDiffAction _performActionsForUIScene:withUpdatedFBSScene:settingsDiff:fromSettings:transitionContext:lifecycleActionType:]
20  UIKitCore                       0x3158ce26c         __64-[UIScene scene:didUpdateWithDiff:transitionContext:completion:]_block_invoke.225
21  UIKitCore                       0x3155496b8         -[UIScene _emitSceneSettingsUpdateResponseForCompletion:afterSceneUpdateWork:]
22  UIKitCore                       0x315549528         -[UIScene scene:didUpdateWithDiff:transitionContext:completion:]
23  UIKitCore                       0x315661c94         -[UIApplication workspace:didCreateScene:withTransitionContext:completion:]
24  UIKitCore                       0x315661a2c         -[UIApplicationSceneClientAgent scene:didInitializeWithEvent:completion:]
25  FrontBoardServices              0x3419176d0         -[FBSScene _callOutQueue_didCreateWithTransitionContext:completion:]
26  FrontBoardServices              0x34191756c         __92-[FBSWorkspaceScenesClient createSceneWithIdentity:parameters:transitionContext:completion:]_block_invoke.108
27  FrontBoardServices              0x341916198         -[FBSWorkspace _calloutQueue_executeCalloutFromSource:withBlock:]
28  FrontBoardServices              0x341921f88         __92-[FBSWorkspaceScenesClient createSceneWithIdentity:parameters:transitionContext:completion:]_block_invoke
29  libdispatch.dylib               0x320fac2fc         _dispatch_client_callout
30  libdispatch.dylib               0x320fafd44         _dispatch_block_invoke_direct
31  FrontBoardServices              0x34191251c         __FBSSERIALQUEUE_IS_CALLING_OUT_TO_A_BLOCK__
32  FrontBoardServices              0x34191249c         -[FBSMainRunLoopSerialQueue _targetQueue_performNextIfPossible]
33  FrontBoardServices              0x341912374         -[FBSMainRunLoopSerialQueue _performNextFromRunLoopSource]
34  CoreFoundation                  0x310fef0a8         __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__
35  CoreFoundation                  0x310fee324         __CFRunLoopDoSource0
36  CoreFoundation                  0x310fecad8         __CFRunLoopDoSources0
37  CoreFoundation                  0x310feb814         __CFRunLoopRun
38  CoreFoundation                  0x310feb3f4         CFRunLoopRunSpecific
39  GraphicsServices                0x397b374f4         GSEventRunModal
40  UIKitCore                       0x31563e89c         -[UIApplication _run]
41  UIKitCore                       0x31563ded8         UIApplicationMain
42  MyXYTestApp                     0x20499034c         xamarin_UIApplicationMain (bindings.m:126)

I believe this is caused by this event handler, which changes the default marshalling behavior of exceptions:

ObjCRuntime.Runtime.MarshalManagedException += (_, args) =>
{
    args.ExceptionMode = ObjCRuntime.MarshalManagedExceptionMode.UnwindNativeCode;
};

The code was introduced after this discussion in the Xamarin repo: xamarin/xamarin-macios#15252

According to @rolfbjarne from the .NET iOS/macOS team, CoreCLR (which is what is used on iOS when using Native AOT as well as on macOS) doesn't support MarshalManagedExceptionMode.UnwindNativeCode and therefore, this assertion is crashing the process:
https://github.com/xamarin/xamarin-macios/blob/907081f787315704a01c940cf28b46b47db23df0/runtime/runtime.m#L2362-L2364
https://github.com/xamarin/xamarin-macios/blob/907081f787315704a01c940cf28b46b47db23df0/runtime/runtime.m#L2452-L2455

References

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 2 Apr 10, 2024
@tipa tipa changed the title [iOS / Native AOT] App crashes while unwinding [iOS / Native AOT] Bad stack trace due to unsupported marshalling behavior Apr 10, 2024
@jamescrosswell
Collaborator

According to @rolfbjarne from the .NET iOS/macOS team, CoreCLR (which is what is used on iOS when using Native AOT as well as on macOS) doesn't support MarshalManagedExceptionMode.UnwindNativeCode and therefore, this assertion is crashing the process:

Hm, I'm not sure I follow everything that's going on here but it seems like that assertion is also dependent on xamarin_is_gc_coop, which is only true when TARGET_OS_WATCH:

#if TARGET_OS_WATCH
bool xamarin_is_gc_coop = true;
#else
bool xamarin_is_gc_coop = false;
#endif

I can't see anything from Sentry in that stack trace though. Do you know what kind of exception was originally thrown (that resulted in this stack trace)?

@tipa
Author

tipa commented Apr 11, 2024

I don't think it is dependent on xamarin_is_gc_coop. I might have posted links from the wrong commit; this one shows how the xamarin_assertion_message method is called in line 2441, like in the stack trace:
https://github.com/xamarin/xamarin-macios/blob/ed26faa94fe2734d9c1014d2e6ef7173d4d77690/runtime/runtime.m#L2441
and then abort() in line 1461:
https://github.com/xamarin/xamarin-macios/blob/ed26faa94fe2734d9c1014d2e6ef7173d4d77690/runtime/runtime.m#L1461

I don't know what exception was originally thrown (that's what I'm trying to find out). @rolfbjarne recommended that I also subscribe to the ObjCRuntime.Runtime.MarshalManagedException event and then log the exception manually (more on that is documented here). UnwindNativeCode is documented as "This isn't available when [...] when using CoreCLR".

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 2 Apr 11, 2024
@jamescrosswell
Collaborator

OK thanks, that makes more sense. If we can find a way to throw an exception that reproduces this behaviour then we should be able to work out a solution 👍🏻

@tipa
Author

tipa commented Sep 13, 2024

@jamescrosswell you can reproduce this behavior / stack trace by simply throwing an exception in the AppDelegate.FinishedLaunching method in a NativeAot-compiled app. You might have to upload a build to TestFlight as (to my knowledge) it is not possible to run NativeAOT compiled iOS apps locally.
Since there will be more adoption of NativeAOT in the future (e.g. with the launch of .NET 9), would it be possible to optionally disable this part of the Sentry initialization, so that the unsupported marshalling behavior isn't used?

ObjCRuntime.Runtime.MarshalManagedException += (_, args) =>
{
    args.ExceptionMode = ObjCRuntime.MarshalManagedExceptionMode.UnwindNativeCode;
};

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 3 Sep 13, 2024
@bitsandfoxes
Contributor

would it be possible to optionally disable this part of the Sentry initialization

We could definitely put these on the Cocoa specific native options - like we do with the LogCat options for Android.

@bitsandfoxes bitsandfoxes added this to the 5.0.0 milestone Sep 16, 2024
@bitsandfoxes
Contributor

Upping the priority on this as it crashes the process.

@jamescrosswell
Collaborator

Upping the priority on this as it crashes the process.

@bitsandfoxes what do we want to do about this?

you might have to upload a build to TestFlight as (to my knowledge) it is not possible to run NativeAOT compiled iOS apps locally.

Realistically, we can't test for things like this until/unless we create a developer profile and an app in the Apple store and start down the QA route. Having done that once personally for a Flutter app, I suspect it's going to take about 3-5 days to get this all set up via CI so that we can easily deploy new versions of our app (e.g. Sentry.Samples.MAUI) to the iOS store... there are various encryption/deployment keys to manage that we'll need to sync between CI and our local machines etc.

I don't think we've got the bandwidth to do this right now. Perhaps once we've sorted out what we're going to do about net9.0 (and the device tests) we could circle back to this.

@tipa
Author

tipa commented Oct 8, 2024

@bitsandfoxes what do we want to do about this?

Would it not be possible to allow an optional/temporary way to disable the changing of marshalling behavior, as I suggested above?
I don't know if it would really work, but at the moment, the stack traces are just barely understandable...

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 3 Oct 8, 2024
@tipa
Author

tipa commented Oct 8, 2024

I was able to run a Native AOT compiled app locally on a physical device using dotnet publish /t:Run /p:_DeviceName=DEVICEID where DEVICEID can be queried using xcrun xctrace list devices

@jamescrosswell
Collaborator

I was able to run a Native AOT compiled app locally on a physical device using dotnet publish /t:Run /p:_DeviceName=DEVICEID where DEVICEID can be queried using xcrun xctrace list devices

That helps... it means we don't have to push it all the way through to TestFlight. Just a bit of fluffing around setting up encryption keys, developer profiles and configuring an app in the Apple developer portal.

So it sounds a little less painful, when we do have the bandwidth to look into this.

@jamescrosswell
Collaborator

@tipa I'm just looking at this issue and, embarrassingly, you opened it in April last year!

In any event, I'm a bit stuck. Bryce created a PR attempting to fix this but when I tested it out, I didn't see any noticeable difference in the stack traces with MarshalManagedExceptionMode.UnwindNativeCode and MarshalManagedExceptionMode.Disabled... So it doesn't look like that code is causing the problem.

Ultimately, the problem is that you're not getting very useful stack traces for exceptions in AOT-compiled applications though, right?

@tipa
Author

tipa commented Jan 16, 2025

Yes I didn't receive useful stack traces, because the app process crashed here when going through the native & managed layers: https://github.com/xamarin/xamarin-macios/blob/907081f787315704a01c940cf28b46b47db23df0/runtime/runtime.m#L2362-L2364 - at least this is the explanation I got from @rolfbjarne.

I am currently not using Sentry because I had to remove it after this problem while migrating to .NET9. At the moment I am just registering to the Runtime.MarshalManagedException event and logging exceptions to my backend, without altering the MarshalManagedExceptionMode. The stack traces look good.
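For anyone wanting to try the same log-only approach, here is a minimal sketch of what such a handler could look like. It only reads the event arguments and never assigns args.ExceptionMode, so the runtime's default marshalling behavior stays in place. The class name ManagedExceptionLogger and the Install method are made up for illustration, and writing to Console.Error is just a stand-in for whatever backend logging is actually used; this is not code taken from Sentry or from my app.

```csharp
using System;
using ObjCRuntime;

// Illustrative sketch only: ManagedExceptionLogger and Install are hypothetical names.
// Assumes a .NET for iOS project, where ObjCRuntime.Runtime is available.
public static class ManagedExceptionLogger
{
    // Call once, early during startup (for example at the top of Main).
    public static void Install()
    {
        Runtime.MarshalManagedException += (_, args) =>
        {
            // args.Exception is the managed exception that is about to be
            // marshalled to native code. Console.Error is a placeholder for
            // a real logging backend (an assumption, not part of any SDK).
            Console.Error.WriteLine(
                $"[MarshalManagedException] mode={args.ExceptionMode}: {args.Exception}");

            // Deliberately do NOT assign args.ExceptionMode here, so the
            // platform's default mode is kept (UnwindNativeCode is the mode
            // reported as unsupported on CoreCLR / Native AOT in this thread).
        };
    }
}
```

Because the handler only observes the exception, it should not change how the process terminates afterwards; it just gives the managed stack trace a chance to be recorded before the exception crosses into native code.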

I don't know why you don't see any difference in your tests. But from what I can tell based on your screenshot (#3729 (comment)), the stack trace looks pretty similar to the one I posted and also contains this part that indicates the app being aborted due to unsupported marshaling behavior:

Image

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 3 Jan 16, 2025
@jamescrosswell
Collaborator

Thanks again for all the info @tipa, as always.

I am currently not using Sentry because I had to remove it after #3620 while migrating to .NET9.

That's a shame. That was due to an issue in the Apple tooling, which was resolved in the Xcode 16.2 release.

At the moment I am just registering to the Runtime.MarshalManagedException event and logging exceptions to my backend, without altering the MarshalManagedExceptionMode. The stack traces look good.

Arggh. Frustrating. I'll give it another go - although it feels a bit like Einstein's definition of insanity at this point!

@tipa
Author

tipa commented Jan 17, 2025

When I create a blank iOS app (dotnet new ios) and throw an exception in FinishedLaunching while running with NativeAot, I do not get this part in the native stack trace (gathered by NSThread.NativeCallStack), which would indicate the process crashing here.

Thread 0 Crashed:
0   libsystem_kernel.dylib          0x39fe95fbc         __pthread_kill
1   libsystem_pthread.dylib         0x3e519567c         pthread_kill
2   libsystem_c.dylib               0x3210abb8c         abort
3   MyXYTestApp                     0x20499ce8c         xamarin_assertion_message (runtime.m:1461)

In your screenshot, these lines still seem to be present:

Image

I also cannot reproduce the stack trace at the moment when using my own logging; the top of the stack trace always looks like the one below, even when setting the ExceptionMode to UnwindNativeCode. Maybe that's because I have a different setup (e.g. no native library catching and sending out the crash report):

0   test                                0x00000001050f9ef0 xamarin_log + 6740
1   test                                0x00000001049ca660 test + 206432
2   test                                0x0000000104e28d80 test + 4787584
3   test                                0x0000000104e10b00 test + 4688640
4   test                                0x00000001050f0f94 xamarin_get_block_descriptor + 8052

Wish I could provide more help, but I'm also no expert in the matter, unfortunately.

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 3 Jan 17, 2025

4 participants