Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Stack overflow from ILLink when LogWarning called #109157

Closed
mrvoorhe opened this issue Oct 23, 2024 · 11 comments · Fixed by #109207
Closed

Stack overflow from ILLink when LogWarning called #109157

mrvoorhe opened this issue Oct 23, 2024 · 11 comments · Fixed by #109207
Assignees
Labels
area-Tools-ILLink .NET linker development as well as trimming analyzers
Milestone

Comments

@mrvoorhe
Copy link
Contributor

Description

A stack overflow can be caused while logging a warning in a complex scenario involving compiler generated code. In my case, IL code gen was also involved.

I hit this error while running some of our tests using Unity's latest UnityLinker which is admittedly only using https://github.com/dotnet/linker/tree/release/7.0 and not the latest illink code in the runtime repo. However, I've gone over the diff of the offending code as well as the history and I don't see any reason to believe this issues has been addressed.

The scenario is:

In GetCompilerGeneratedStateForType, the call to LogWarning starts things off. From there it's
-> MessageContainer.CreateWarningMessage
-> context.IsWarningSuppressed
-> Suppressions.IsSuppressed
-> TryGetOwningMethodForCompilerGeneratedMember
-> GetCompilerGeneratedStateForType.
Rinse & Repeat until boom.

The repro code I have is fairly complex and I don't have a standalone repro.

Somehow it seems like the cycle needs to be broken.

I played around with a fix 2d2b36a and it does work for the scenario I'm hitting. But...

  1. There are other LogWarning calls in GetCompilerGeneratedStateForType and I'm not sure if the fix I did should be used there as well?

  2. GetGeneratedTypeAttributes also calls GetCompilerGeneratedStateForType which makes me wonder if it could also be vulnerable to a stack overflow.

Reproduction Steps

I don't have a reproduction case. Here is the information I have.

The LogWarning that is leads to the stack overflow is this one

When LogWarning is called, the state of the interesting locals is

stateMachineType = Unity.DataFlowGraph.PerformanceTests.SimpleBurstStateMachinesPerformanceTests/<<Unit>g__Test|1_0>d
alreadyAssociatedMethod = Unity.DataFlowGraph.Routine Unity.DataFlowGraph.PerformanceTests.SimpleBurstStateMachinesPerformanceTests::<Unit>g__Test|1_0(Unity.DataFlowGraph.Routine)
method =                  Unity.DataFlowGraph.Routine Unity.DataFlowGraph.PerformanceTests.SimpleBurstStateMachinesPerformanceTests::<Unit>g__Test|1_0$BurstManaged(Unity.DataFlowGraph.Routine)

And here is the il of the assembly LogWarningStackOverflow.txt

Expected behavior

ILLink issues a warning and completes successfully.

Actual behavior

The linker crashes with FullCrashStack.txt

Regression?

Technically, yes this is a regression. Our older version of UnityLinker does not crash on this code. However, our old linker is based on a version of Mono.Linker that is so old it doesn't have CompilerGeneratedState.cs so this issue has been around for awhile.

Known Workarounds

Don't trigger the LogWarning call.

Configuration

No response

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 23, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Oct 23, 2024
@mrvoorhe
Copy link
Contributor Author

@sbomer @vitek-karas I'm curious what you think about this. I realize Unity's IL code gen is creating the LogWarning scenario in my case. I don't know this code well enough to know if it's possible to hit this warning without IL modifications? Can the stackoverflow be hit without IL modifications?

I don't see any tests covering the warnings issued by GetCompilerGeneratedStateForType. That said, debugging I did see other warnings from GetCompilerGeneratedStateForType, so the LogWarning calls in GetCompilerGeneratedStateForType do not always produce a stack overflow. It does also require certain IL for the stack overflow to happen.

Is my fix a viable route to go? Do you have any ideas on a different approach to take to avoid the stack overflow?

@sbomer sbomer added the area-Tools-ILLink .NET linker development as well as trimming analyzers label Oct 23, 2024
@sbomer sbomer removed untriaged New issue has not been triaged by the area owner needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Oct 23, 2024
@sbomer sbomer added this to the 10.0.0 milestone Oct 23, 2024
Copy link
Contributor

Tagging subscribers to this area: @agocke, @sbomer, @vitek-karas
See info in area-owners.md if you want to be subscribed.

@sbomer sbomer self-assigned this Oct 24, 2024
@sbomer
Copy link
Member

sbomer commented Oct 24, 2024

Thanks for the report. It looks like this happens when multiple methods have an AsyncStateMachineAttribute that point to the same state machine type, and one of the annotated methods is itself a compiler-generated method. I reduced it down to a self-contained repro in #109207, which also has a fix for the issue in ILLink. However, ILC has the same problem and the same fix doesn't apply there because the code is architected differently (I think the same is true of your fix).

I'll keep thinking about a fix that applies to both tools.

Can the stackoverflow be hit without IL modifications?

I'm not sure, but I wasn't able to find a pure C# example. @agocke do you know any cases where the compiler will use a state machine that's shared by multiple user methods? If we can come up with a case that hits the stackoverflow without IL modifications, we could change the logic to handle that and get rid of this warning.

@mrvoorhe
Copy link
Contributor Author

Thanks @sbomer !

@agocke
Copy link
Member

agocke commented Oct 25, 2024

As far as I know, in Roslyn async state machines are 1:1 with async functions. Since state machines have the actual lowered code from the original method, it's not obvious to me how two methods can share the same state machine type, unless they have exactly the same code -- and if they do, why have two methods at all?

Is it possible that this is a bug in code gen?

@mrvoorhe
Copy link
Contributor Author

Our code gen might be problematic, but the linker still shouldn't crash.

@agocke
Copy link
Member

agocke commented Oct 28, 2024

Agreed

@mrvoorhe
Copy link
Contributor Author

mrvoorhe commented Jan 6, 2025

@sbomer What happened with #109207 ? I tried your fix and it worked.

@sbomer
Copy link
Member

sbomer commented Jan 6, 2025

I think we need a different fix for nativeaot, but maybe we can take the ILLink fix in the meantime. I'll reopen it.

@mrvoorhe
Copy link
Contributor Author

mrvoorhe commented Jan 6, 2025

I think we need a different fix for nativeaot, but maybe we can take the ILLink fix in the meantime. I'll reopen it.

Thanks!

@mrvoorhe
Copy link
Contributor Author

mrvoorhe commented Jan 8, 2025

Thanks!

@github-actions github-actions bot locked and limited conversation to collaborators Feb 16, 2025
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
area-Tools-ILLink .NET linker development as well as trimming analyzers
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants