Crash when NullReferenceException occurs in background thread on x64 #107026

Open
rolfbjarne opened this issue Aug 27, 2024 · 4 comments

@rolfbjarne
Member

Description

The app crashes after exception handling when a NullReferenceException occurs in a background thread.

Reproduction Steps

using System;
using System.Threading;

static class MainClass {
	static int Main (string [] args)
	{
		var thread = new Thread (() =>
		{
			try {
				Crash.Me ();
			} catch (Exception e) {
				Console.WriteLine ($"E: {e.Message}");
			}
			Console.WriteLine ("C");
		});
		thread.Start ();
		thread.Join ();
		Console.WriteLine ("D");

		return 0;
	}
}

public class Crash {
	public static void Me ()
	{
		Console.WriteLine ("A");
		((object) null).ToString ();
		Console.WriteLine ("B");
	}
}

Project file:

<?xml version="1.0" encoding="utf-8"?>
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <RuntimeIdentifier>osx-x64</RuntimeIdentifier>
    <OutputType>Exe</OutputType>
    <SelfContained>true</SelfContained>
  </PropertyGroup>
</Project>

Run like this:

$ dotnet run
A
E: Object reference not set to an instance of an object.
$ echo $?
138

Two points of note here:

  1. Neither "C" nor "D" from the code is printed.
  2. The exit code is 138 (128 + 10), which indicates the executable was terminated by signal 10 (SIGBUS).
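
The arithmetic behind the second point: common shells report a child killed by signal N with exit status 128 + N, and SIGBUS is signal 10 on macOS (the number differs on some other platforms, such as Linux). A minimal sketch of that decoding, where `signal_from_exit_code` is a hypothetical helper name and not part of any tool:

```shell
# Hypothetical helper: map a shell exit status to the signal number that
# terminated the process, following the 128 + N convention.
signal_from_exit_code() {
    code=$1
    if [ "$code" -gt 128 ]; then
        echo $((code - 128))
    else
        # Status of 128 or below: not reported as a signal termination.
        echo "none"
    fi
}

signal_from_exit_code 138   # prints 10
```

On macOS, `kill -l 10` can then be used to translate the number into a signal name.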

macOS also creates a crash report: https://gist.github.com/rolfbjarne/4b6ba90b127d180a07414c18fef4b17e (which corroborates the SIGBUS termination).

The crashing thread:

Thread 1:: com.apple.rosetta.exceptionserver
0   runtime                       	    0x7ff7ffc97414 0x7ff7ffc93000 + 17428

While creating a smaller test case, the crashing stack trace was typically a bit different: https://gist.github.com/rolfbjarne/6d0d1ee838cdae83cfddc8970afe01ec

Thread 2 Crashed:
0   <translation info unavailable>	       0x100d69ba0 ???
1   libsystem_platform.dylib      	    0x7ff80aafaff3 _sigtramp + 51
2   libcoreclr.dylib              	       0x109cff74c SEHExceptionThread(void*) + 1580
3   libsystem_pthread.dylib       	    0x7ff80aacc18b _pthread_start + 99
4   libsystem_pthread.dylib       	    0x7ff80aac7ae3 thread_start + 15

Hopefully it's the same issue though.

Expected behavior

No crash.

Actual behavior

Crash

Regression?

Yes.

This started happening in a maestro bump here: dotnet/macios#21021, which at the moment is a bump from 8.0.109-servicing.24407.6 to 8.0.109-servicing.24419.10.

Known Workarounds

No response

Configuration

dotnet --info
.NET SDK:
 Version:           8.0.109
 Commit:            6e9002c2ef
 Workload version:  8.0.100-manifests.70d157ca

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  14.6
 OS Platform: Darwin
 RID:         osx-arm64
 Base Path:   /Users/rolf/work/maccore/main/xamarin-macios/builds/downloads/dotnet-sdk-8.0.109-servicing.24419.10/sdk/8.0.109/

.NET workloads installed:
 Workload version: 8.0.100-manifests.70d157ca
 [macos]
   Installation Source: SDK 8.0.100
   Manifest Version:    14.5.8059-ci.darc-main-ba8b4a5c-703d-4d22-97b2-7323315a2e65/8.0.100
   Manifest Path:       /Users/rolf/work/maccore/main/xamarin-macios/builds/downloads/dotnet-sdk-8.0.109-servicing.24419.10/sdk-manifests/8.0.100/microsoft.net.sdk.macos/WorkloadManifest.json
   Install Type:        FileBased

 [maccatalyst]
   Installation Source: SDK 8.0.100
   Manifest Version:    17.5.8059-ci.darc-main-ba8b4a5c-703d-4d22-97b2-7323315a2e65/8.0.100
   Manifest Path:       /Users/rolf/work/maccore/main/xamarin-macios/builds/downloads/dotnet-sdk-8.0.109-servicing.24419.10/sdk-manifests/8.0.100/microsoft.net.sdk.maccatalyst/WorkloadManifest.json
   Install Type:        FileBased


Host:
  Version:      8.0.8
  Architecture: arm64
  Commit:       08338fcaa5

.NET SDKs installed:
  8.0.109 [/Users/rolf/work/maccore/main/xamarin-macios/builds/downloads/dotnet-sdk-8.0.109-servicing.24419.10/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 8.0.8 [/Users/rolf/work/maccore/main/xamarin-macios/builds/downloads/dotnet-sdk-8.0.109-servicing.24419.10/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 8.0.8 [/Users/rolf/work/maccore/main/xamarin-macios/builds/downloads/dotnet-sdk-8.0.109-servicing.24419.10/shared/Microsoft.NETCore.App]

Other information

I'm on an M1, and this only happens when building for x64. I haven't tested on an x64 machine, so it's possible that this is limited to Rosetta.

This only happens when using CoreCLR, not with MonoVM.

@dotnet-issue-labeler bot added the needs-area-label label Aug 27, 2024
@dotnet-policy-service bot added the untriaged label Aug 27, 2024
@jkotas added the area-ExceptionHandling-coreclr label and removed the needs-area-label label Aug 27, 2024
@lewing
Member

lewing commented Aug 27, 2024

Sounds like a coreclr regression between runtime 8.0.7 and 8.0.8? The diff in dotnet/macios#21021 is confusing because the ref packs appear to be trailing the SDK version.

@mangod9 removed the untriaged label Aug 28, 2024
@mangod9 added this to the 10.0.0 milestone Aug 28, 2024
@mangod9
Member

mangod9 commented Aug 28, 2024

Believe there are no guarantees around unhandled exceptions. @janvorli ?

@rolfbjarne
Member Author

> Believe there are no guarantees around unhandled exceptions. @janvorli ?

It's handled:

} catch (Exception e) {

@janvorli
Member

janvorli commented Aug 29, 2024

The problem happens only under Rosetta. It was introduced by #104818. We incorrectly leave CONTEXT_XSTATE set on the context even if the context returned by the OS didn't contain any AVX state. When execution later resumes after the catch, our RtlRestoreContext attempts to set the ymm registers because CONTEXT_XSTATE is present. That crashes with SIGBUS, as Rosetta doesn't support the AVX instructions that are used to set the ymm registers.
The issue doesn't occur on .NET 9 because we recently added stripping of CONTEXT_XSTATE from the context before we start unwinding from it during EH. We are using ClrRestoreNonVolatileContext, which doesn't restore the ymm registers.
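
To check on a given Mac whether x64 code runs under Rosetta and whether AVX is reported to it, the relevant sysctl values can be queried from an x86_64 process. This is a diagnostic sketch only, not part of the fix. It assumes the macOS `arch` launcher and the `sysctl.proc_translated` and `hw.optional.avx1_0` keys; `describe_flag` and `query_x64` are hypothetical helper names.

```shell
# Hypothetical helper: turn a 0/1 sysctl value into a readable label.
describe_flag() {
    case "$1" in
        1) echo "yes" ;;
        0) echo "no" ;;
        *) echo "unknown" ;;
    esac
}

# Hypothetical helper: query a sysctl key from an x86_64 process. Prints
# "unknown" when the key, Rosetta, or the macOS `arch` launcher is unavailable.
query_x64() {
    arch -x86_64 sysctl -n "$1" 2>/dev/null || echo "unknown"
}

echo "Translated by Rosetta: $(describe_flag "$(query_x64 sysctl.proc_translated)")"
echo "AVX reported:          $(describe_flag "$(query_x64 hw.optional.avx1_0)")"
```

A result of "yes" for translation together with "no" for AVX would be consistent with the explanation in this comment; on systems other than macOS both lines are expected to print "unknown".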
