-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
[release/9.0] Fix init race in mono_class_try_get_[shortname]_class. #112296
Merged
steveisok
merged 1 commit into
release/9.0-staging
from
backport/pr-112282-to-release/9.0
Feb 12, 2025
Merged
[release/9.0] Fix init race in mono_class_try_get_[shortname]_class. #112296
steveisok
merged 1 commit into
release/9.0-staging
from
backport/pr-112282-to-release/9.0
Feb 12, 2025
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
We observed several crash reports in dotnet Android coming from google play store, like this one: #109921 (comment) and it happens at the following callstack: mono_class_setup_vtable_full at /__w/1/s/src/mono/mono/metadata/class-setup-vtable.c:900 init_io_stream_slots at /__w/1/s/src/mono/mono/metadata/icall.c:3378 ves_icall_System_IO_Stream_HasOverriddenBeginEndRead at /__w/1/s/src/mono/mono/metadata/icall.c:3437 (inlined by) ves_icall_System_IO_Stream_HasOverriddenBeginEndRead_raw at /__w/1/s/src/mono/mono/metadata/../metadata/icall-def.h:228 Looking a little deeper at that that code path expects call to mono_class_try_get_stream_class to succeed since it calls mono_class_setup_vtable on returned class pointer. There are other places where code expectes mono_class_try_get_[shortname]_class to always succeed or it will assert. Other locations handles the case where it returns NULL meaning that the class has been linked out. After looking into the macro implementation generating the mono_class_try_get_[shortname]_class functions it appears that it has a race where one thread could return NULL even if the class successfully gets loaded by another racing thread. Initially that might have been intentionally and callers would need to redo the call, but then there is no way for callers to distinct between hitting the race and class not available. Adding to the fact that code would also expect that some critical classes never fail loading, like what init_io_stream_slots does, this race ended up in race conditions. In the past, this particular race in init_io_stream_slots was hidden due to multiple calls to mono_class_try_get_stream_class minimzing the risk of a race to cause a crash. That implementation was however changed by e74da7c ending up in just one call to mono_class_try_get_stream_class in the critical paths. Since code relies on mono_class_try_get_[shortname]_class to succeed for some critical classes, code asserts if it returns NULL or crashes. The fix will use acquire/release semantics on the static variable guarding the cached type and make sure memory order is preserved on both read and write. Previous implementation adopted a similar approach but it took a copy of the static tmp_class into a local variable meaning that memory order would not be guaranteed, even if it applied memory barriers and volatile keywords on the static variables. Fix also adds an assert into init_io_stream_slots since it expects failures in calls to mono_class_try_get_stream_class as fatal.
This was referenced Feb 8, 2025
Open
jeffschwMSFT
approved these changes
Feb 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. please get a code review. we will take for consideration in 9.0.x
lateralusX
approved these changes
Feb 12, 2025
This was referenced Feb 12, 2025
# for free
to join this conversation on GitHub.
Already have an account?
# to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport of #112282 to release/9.0
/cc @steveisok @lateralusX
Customer Impact
As reported in #109921 (comment), customers experienced a crash when trying to open a stream due to the runtime accessing a NULL class pointer. The reason for the NULL class pointer was due to a race where one thread could return NULL even if another thread successfully loaded it.
The fix uses acquire/release semantics on the static variable guarding the init of the cached class and makes sure memory order is preserved on both read and write.
Regression
This problem existed long ago, but a change we made back in .NET 7 made this more likely to happen.
Testing
Manual testing
Risk
Low
IMPORTANT: If this backport is for a servicing release, please verify that:
release/X.0-staging
, notrelease/X.0
.Package authoring no longer needed in .NET 9
IMPORTANT: Starting with .NET 9, you no longer need to edit a NuGet package's csproj to enable building and bump the version.
Keep in mind that we still need package authoring in .NET 8 and older versions.