-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Failure when building self-hosted NativeAOT compiler changes with 8.0 Preview 2 SDK #83695
Comments
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas Issue DetailsThe changes in #81205 have caused a failure when attempting to use .NET 8 Preview 2 SDK to build the Preview 3 source. This is a required scenario for .NET's Source-Build. This was identified in the changes to update the source-build configuration so that it references the Preview 2 SDK: dotnet/installer#15851. The error that occurs is for the ILCompiler project:
An analysis of the error and its symptoms is given here: dotnet/installer#15851 (comment). As stated in the analysis, the behavior is sporadic for some reason. One notable symptom is that It's been confirmed that reverting the changes in #81205 resolves the issue. We should consider reverting this change or getting a fix in to address this in the Preview 3 timeframe.
|
Reverting would be pretty bad -- we're planning on relying on this change to do more testing with the Pri0 coreclr tests. Otherwise the AOT compile is too slow. How can we service the old compiler? I think we should start by getting rid of the catch and seeing the actual exception, with message. |
If we need to unblock source build, adding a NativeAotSupported=false under SourceBuild here would unblock: runtime/src/coreclr/tools/aot/ILCompiler/ILCompiler.csproj Lines 13 to 14 in 059898f
I think this is some kind of build race caused by how source build is set up. I'm not aware of seeing this before. We do build NativeAOT compiler with NativeAOT in dotnet/runtime builds and CI and haven't seen this. The ComputeManagedAssembliesToCompileToNative is another remnant from CoreRT/runtimelab when we could not make changes to the SDK targets to accomodate the compiler. If we address #67080 and similar by moving more of this logic into the SDK, the task can potentially be deleted. The SDK knows what assemblies we need to compile - we shouldn't have to recompute that in a task. |
The exception message is Line 119 in c1acb35
|
?? Ok this must be some sort of race condition where the binaries are still being written when we read them. Agreed with @MichalStrehovsky -- let's try to fix this authoring. |
Gentle ping. What is the status/eta on fixing this? It is affecting source-build. |
@sbomer Could you also look at this? This seems to be some incredibly strange race condition that results in a partially-written file being read by another task. From @MichalStrehovsky's comment, it sounds like we might be able to just avoid some complexity here entirely by modifying the SDK. |
The ILCompiler that we're building now has a dependency on the System.CommandLine NuGet. The LKG version of ILCompiler is failing because the reference is bad. How does source build restore the NuGets that the repo build depends on? I don't think this is fixable from the runtime repo. The problem is with the NuGet that source build is providing to the repo build system. When the task is running, the NuGets are not fully written to disk. Does source build do something with how NuGets are used? Does it run the restore as usual or there is something special going on? |
Before #81205 the partially restored nugets would still be a problem (we would likely sometimes construct a ILCompiler that doesn't work because we'd blindly copy the partial assembly) but I doubt any NativeAOT testing is happening in source build so this would never be detected. |
The only intended difference between a repo build and source-build are the nuget feeds. For source-build we have to build everything from source so any dependencies are restored from local feeds that consist of artifacts built prior in the dependency tree or from the N-1 build (necessary to break circular dependencies) |
How are the contributing repo dependencies tracked by source build? We need to guarantee that https://github.com/dotnet/dotnet/tree/main/src/command-line-api finishes building before it gets used by https://github.com/dotnet/dotnet/tree/main/src/runtime . What does guarantee that? |
Maybe |
I agree this dependency should be expressed but it won't actually change the build order as command-line-api is built before runtime as expressed in the vmr's top level proj file. |
Bottom line is the command-line-api is the first product repo in the build. |
Could you please share links to a recent build logs with the failure? (The build logs shared above have been deleted.) What are the steps that can reproduce this? I have tried https://github.com/dotnet/dotnet#building a few times and it always works fine for me. |
I was just able to repro it with these changes: https://github.com/dotnet/dotnet/compare/main...sbomer:dotnet:tmp?expand=1. |
We expect the implementation of these assemblies to come as part of msbuild. Fixes dotnet#83695
This is regression introduced by #79783. Before this change, the only assemblies copied next to ILCompiler.Build.Tasks.dll are System.Collections.Immutable.dll and System.Reflection.Metadata.dll. After this change, the build is also copying over System.Buffers.dll, System.Memory.dll and System.Numerics.Vectors.dll. The problem is that these are reference assemblies that fail spectacularly when loaded for execution. It is causing a problem in the source build environment only. It is not causing a problem in regular shipping build environment since these assemblies end up being loaded from somewhere else and so these broken reference assemblies do not come into the picture. #86423 is proposed fix. |
@jkotas - You must remove the workaround patch in the VMR - https://github.com/dotnet/installer/blob/main/src/SourceBuild/patches/runtime/0002-Revert-switch-to-self-hosted-NativeAOT-compiler.patch |
Yep, I have figured that based on Sven's hint. Based on earlier comments, I assumed that this is a race condition that will reproduce intermittently, and I did not expect there to be a patch for it. It turns out that this is not a race condition. It reproduces 100% of time with the workaround patch removed. |
* Exclude System.* reference assemblies in ILCompiler.Build.Tasks We expect the implementation of these assemblies to come as part of msbuild. Fixes #83695 * Do not include any System.* assemblies as part of the task * Delete custom resolver
@MichaelSimons The fix is checked in. Could you please take care of deleting the workaround from source build? |
Will do - Thanks for investigating and resolving the runtime issue. |
@jkotas - I am seeing the same failure as before when trying to remove the source build workaround - dotnet/installer#16466. I did verify the runtime changes flowed in last week - dotnet/dotnet@cda4753#diff-531a689fcb6b7793148534b23232ba5733d6afe2ca51c1a31119defd88090bfd. |
@MichaelSimons Your PR was submitted against stale main that did not have the fix. I have merged current main into your PR and it is all green now. |
Thanks for updating my PR. That confuses me though, because azdo syncs in latest when running PR validation. |
I do not think that installer repo is configured to sync when running PR validation. |
Source build internal CI which runs several distro variants is failing in multiple legs with the original symptoms. Failing Legs Note: There are several other legs failing but they are caused from different issues. |
The failure is:
Notice that this is running .NET 8.0 Preview 4 build from package-cache (package-cache/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.0-preview.4.23259.5). This fix is present in .NET 8.0 Preview 5+ only. The failure should go away once this part of the source build is upgraded to .NET 8.0 Preview 5+. |
The source build seems to be combining live and prebuilt binaries in a way that is not present in the regular build. Is that intentional? It is the reason why this fails in source build only while regular build is fine. Alternative way to address this issue sooner may be to make source build closer to how regular build works. |
Can you point me to the indicators that |
Notice that the The relevant part of the CentOSStream8_Online_MsftSdk_x64 build log:
The relevant part of the CentOSStream8_Offline_MsftSdk_x64 build log:
The assembly version is |
Ok, I've tracked down the cause of the use of inconsistent versions. It's due to the bootstrapping logic having a casing typo for the package name. This causes two different names of the However, there is another related failure that we missed earlier through all the noise of the build errors:
This occurs when using a source-built SDK to build the VMR. In my test scenario, the RID of the SDK is runtime/src/coreclr/nativeaot/BuildIntegration/Microsoft.DotNet.ILCompiler.SingleEntry.targets Lines 7 to 9 in 10222f9
This fails in the following code because there's no corresponding case for handling a runtime/src/coreclr/tools/Common/CommandLineHelpers.cs Lines 57 to 86 in 10222f9
Here's a link to the binlog (internal link) which comes from this build (internal link). @jkotas - Can you provide your input here? Clearly this code isn't comprehensive with respect to the available RIDs. But it seems broken that it should have to account for everything. Is there a better way to do this? Because of this issue, it will delay our ability to remove the |
I think the RID logic in the targets is broken -- instead of using |
Do you mean crossgen2 task has some complicated logic for this that walks RID graph here: https://github.com/dotnet/sdk/blob/dc8c847472b827dae15b3f865a5615fd7717b83e/src/Tasks/Microsoft.NET.Build.Tasks/ResolveReadyToRunCompilers.cs#L68-L73 . Is there a simpler way to do it?
The complete removal of this patch is going to depend enabling native AOT for source build - tracked by dotnet/source-build#1215 (comment) . If you would really like to see this patch gone, we can check in in to dotnet/runtime - we have number of similar conditions throughout the repo. It does not hurt anything. |
I think that makes sense to do then. We're making a push to eliminate all patches for source-build. |
The changes in #81205 have caused a failure when attempting to use .NET 8 Preview 2 SDK to build the Preview 3 source. This is a required scenario for .NET's Source-Build. This was identified in the changes to update the source-build configuration so that it references the Preview 2 SDK: dotnet/installer#15851.
The error that occurs is for the ILCompiler project:
An analysis of the error and its symptoms is given here: dotnet/installer#15851 (comment). As stated in the analysis, the behavior is sporadic for some reason. One notable symptom is that
BadImageFormatException
is thrown when attempting to load certain assemblies in theComputeManagedAssembliesToCompileToNative
build task.It's been confirmed that reverting the changes in #81205 resolves the issue. We should consider reverting this change or getting a fix in to address this in the Preview 3 timeframe.
cc @MichalStrehovsky
The text was updated successfully, but these errors were encountered: