Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[release/9.0] Fix init race in mono_class_try_get_[shortname]_class. #112296

Merged
merged 1 commit into from
Feb 12, 2025

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented Feb 8, 2025

Backport of #112282 to release/9.0

/cc @steveisok @lateralusX

Customer Impact

  • Customer reported
  • Found internally

As reported in #109921 (comment), customers experienced a crash when trying to open a stream due to the runtime accessing a NULL class pointer. The reason for the NULL class pointer was due to a race where one thread could return NULL even if another thread successfully loaded it.

The fix uses acquire/release semantics on the static variable guarding the init of the cached class and makes sure memory order is preserved on both read and write.

Regression

  • Yes
  • No

This problem existed long ago, but a change we made back in .NET 7 made this more likely to happen.

Testing

Manual testing

Risk

Low

IMPORTANT: If this backport is for a servicing release, please verify that:

  • The PR target branch is release/X.0-staging, not release/X.0.

Package authoring no longer needed in .NET 9

IMPORTANT: Starting with .NET 9, you no longer need to edit a NuGet package's csproj to enable building and bump the version.
Keep in mind that we still need package authoring in .NET 8 and older versions.

We observed several crash reports in dotnet Android coming from
google play store, like this one:

#109921 (comment)

and it happens at the following callstack:

mono_class_setup_vtable_full at /__w/1/s/src/mono/mono/metadata/class-setup-vtable.c:900
init_io_stream_slots at /__w/1/s/src/mono/mono/metadata/icall.c:3378
ves_icall_System_IO_Stream_HasOverriddenBeginEndRead at /__w/1/s/src/mono/mono/metadata/icall.c:3437
 (inlined by) ves_icall_System_IO_Stream_HasOverriddenBeginEndRead_raw at /__w/1/s/src/mono/mono/metadata/../metadata/icall-def.h:228

Looking a little deeper at that that code path expects call to
mono_class_try_get_stream_class to succeed since it calls
mono_class_setup_vtable on returned class pointer. There are other
places where code expectes mono_class_try_get_[shortname]_class to
always succeed or it will assert. Other locations handles the case
where it returns NULL meaning that the class has been linked out.

After looking into the macro implementation generating the
mono_class_try_get_[shortname]_class functions it appears that it has
a race where one thread could return NULL even if the class successfully
gets loaded by another racing thread. Initially that might have
been intentionally and callers would need to redo the call, but then
there is no way for callers to distinct between hitting the race and
class not available. Adding to the fact that code would also expect that
some critical classes never fail loading, like what
init_io_stream_slots does, this race ended up in race conditions.

In the past, this particular race in init_io_stream_slots was hidden
due to multiple calls to mono_class_try_get_stream_class minimzing the
risk of a race to cause a crash. That implementation was however
changed by e74da7c
ending up in just one call to mono_class_try_get_stream_class in the
critical paths.

Since code relies on mono_class_try_get_[shortname]_class to succeed
for some critical classes, code asserts if it returns NULL or crashes.
The fix will use acquire/release semantics on the static variable
guarding the cached type and make sure memory order is preserved
on both read and write. Previous implementation adopted a similar approach
but it took a copy of the static tmp_class into a local variable meaning
that memory order would not be guaranteed, even if it applied memory
barriers and volatile keywords on the static variables.

Fix also adds an assert into init_io_stream_slots since it expects
failures in calls to mono_class_try_get_stream_class as fatal.
Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. please get a code review. we will take for consideration in 9.0.x

@jeffschwMSFT jeffschwMSFT added the Servicing-consider Issue for next servicing release review label Feb 11, 2025
@jeffschwMSFT jeffschwMSFT added this to the 9.0.x milestone Feb 11, 2025
@leecow leecow added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Feb 11, 2025
@leecow leecow modified the milestones: 9.0.x, 9.0.3 Feb 11, 2025
@steveisok steveisok changed the base branch from release/9.0 to release/9.0-staging February 12, 2025 13:58
@steveisok steveisok merged commit dd88d29 into release/9.0-staging Feb 12, 2025
76 of 105 checks passed
@steveisok steveisok deleted the backport/pr-112282-to-release/9.0 branch February 12, 2025 14:44
@steveisok steveisok modified the milestones: 9.0.3, 9.0.4 Feb 18, 2025
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
area-VM-meta-mono Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants