Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[libraries-pgo] Issues related to overlapped IO #75828

Closed
jakobbotsch opened this issue Sep 19, 2022 · 8 comments · Fixed by #75922
Closed

[libraries-pgo] Issues related to overlapped IO #75828

jakobbotsch opened this issue Sep 19, 2022 · 8 comments · Fixed by #75922
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@jakobbotsch
Copy link
Member

Multiple failures related to overlapped IO in net7.0-windows-Release-x64-CoreCLR_checked-jitosr_stress_random.

Work item link: https://dev.azure.com/dnceng-public/public/_build/results?buildId=21469&view=ms.vss-test-web.build-test-results-tab&runId=432776&paneView=debug


https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.NetworkInformation.Functional.Tests/1/console.cc044db2.log?helixlogtype=result

Unhandled exception. System.InvalidOperationException: 'overlapped' has already been freed.
   at System.Threading.ThreadPoolBoundHandleOverlapped.CompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolBoundHandleOverlapped.cs:line 49
   at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
   at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.Requests.Tests/1/console.7662a692.log?helixlogtype=result

Unhandled exception. System.InvalidOperationException: Handle is not initialized.
   at System.Threading.Overlapped.GetOverlappedFromNative(NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Overlapped.cs:line 228
   at System.Threading.IOCompletionCallbackHelper.PerformSingleIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/IOCompletionCallbackHelper.cs:line 37
   at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
   at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77

Also, there are several AVs inside coreclr.dll and uses of corrupted OBJECTREFs in the same scenario.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 19, 2022
@ghost
Copy link

ghost commented Sep 19, 2022

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Issue Details

Multiple failures related to overlapped IO in net7.0-windows-Release-x64-CoreCLR_checked-jitosr_stress_random.

Work item link: https://dev.azure.com/dnceng-public/public/_build/results?buildId=21469&view=ms.vss-test-web.build-test-results-tab&runId=432776&paneView=debug


https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.NetworkInformation.Functional.Tests/1/console.cc044db2.log?helixlogtype=result

Unhandled exception. System.InvalidOperationException: 'overlapped' has already been freed.
   at System.Threading.ThreadPoolBoundHandleOverlapped.CompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolBoundHandleOverlapped.cs:line 49
   at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
   at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.Requests.Tests/1/console.7662a692.log?helixlogtype=result

Unhandled exception. System.InvalidOperationException: Handle is not initialized.
   at System.Threading.Overlapped.GetOverlappedFromNative(NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Overlapped.cs:line 228
   at System.Threading.IOCompletionCallbackHelper.PerformSingleIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/IOCompletionCallbackHelper.cs:line 37
   at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
   at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77

Also, there are several AVs inside coreclr.dll and uses of corrupted OBJECTREFs in the same scenario.

Author: jakobbotsch
Assignees: -
Labels:

area-System.Threading, untriaged

Milestone: -

@jakobbotsch jakobbotsch added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-System.Threading untriaged New issue has not been triaged by the area owner labels Sep 19, 2022
@jakobbotsch jakobbotsch added this to the 8.0.0 milestone Sep 19, 2022
@ghost
Copy link

ghost commented Sep 19, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Multiple failures related to overlapped IO in net7.0-windows-Release-x64-CoreCLR_checked-jitosr_stress_random.

Work item link: https://dev.azure.com/dnceng-public/public/_build/results?buildId=21469&view=ms.vss-test-web.build-test-results-tab&runId=432776&paneView=debug


https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.NetworkInformation.Functional.Tests/1/console.cc044db2.log?helixlogtype=result

Unhandled exception. System.InvalidOperationException: 'overlapped' has already been freed.
   at System.Threading.ThreadPoolBoundHandleOverlapped.CompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolBoundHandleOverlapped.cs:line 49
   at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
   at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-main-d8e4b98572834b0b9b/System.Net.Requests.Tests/1/console.7662a692.log?helixlogtype=result

Unhandled exception. System.InvalidOperationException: Handle is not initialized.
   at System.Threading.Overlapped.GetOverlappedFromNative(NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/Overlapped.cs:line 228
   at System.Threading.IOCompletionCallbackHelper.PerformSingleIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pNativeOverlapped) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/IOCompletionCallbackHelper.cs:line 37
   at System.Threading.PortableThreadPool.IOCompletionPoller.Callback.Invoke(Event e) in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs:line 265
   at System.Threading.ThreadPoolTypedWorkItemQueue`2.System.Threading.IThreadPoolWorkItem.Execute() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 1134
   at System.Threading.ThreadPoolWorkQueue.Dispatch() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs:line 919
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() in /_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs:line 77

Also, there are several AVs inside coreclr.dll and uses of corrupted OBJECTREFs in the same scenario.

Author: jakobbotsch
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch
Copy link
Member Author

cc @dotnet/jit-contrib

@AndyAyersMS
Copy link
Member

I'll take a look.

@AndyAyersMS AndyAyersMS self-assigned this Sep 20, 2022
@AndyAyersMS
Copy link
Member

AndyAyersMS commented Sep 20, 2022

For win-x64, issue seems to be related to codegen in this method (via Complus_JitEnablePatchpointRange isolation)

System.Reflection.PortableExecutable.PEHeaders:TryGetDirectoryOffset(System.Reflection.PortableExecutable.DirectoryEntry,byref,bool):bool:this [Tier1-OSR @0x5e, IL size=136, code size=125, hash=0xd9042efb]

Hmm, that method may not be buggy though -- it is correlated with failure but not the cause of failure. I think it only gets called after the exception is raised.

So the root cause may not be bad codegen from OSR. Still digging.

@AndyAyersMS
Copy link
Member

Another possibility is that the tier0 code with patchpoints is somehow buggy. Further isolation suggests this is the likely culprit:

 SocketDelegateHelper:ConnectEx(System.Net.Sockets.SafeSocketHandle,long,int,long,int,byref,ulong):bool:this [Tier0, IL size=98, code size=587, hash=0xd9557380]

@AndyAyersMS
Copy link
Member

Guessing the issue is that we insert a patchpoint callout between a pinvoke and fetching of last error:

G_M35967_IG06:              ;; offset=011CH
       FFD0                 call     rax
						;; size=2 bbWeight=1    PerfScore 3.00
G_M35967_IG07:              ;; offset=011EH
       488B5580             mov      rdx, qword ptr [rbp-80H]
       C6420C01             mov      byte  ptr [rdx+0CH], 1
       833D6B184B5E00       cmp      dword ptr [(reloc 0x7ffcc493bd08)], 0
       7406                 je       SHORT G_M35967_IG08
       FF154B69495E         call     [CORINFO_HELP_STOP_FOR_GC]
						;; size=23 bbWeight=1    PerfScore 9.00
G_M35967_IG08:              ;; offset=0135H
       488B4D80             mov      rcx, qword ptr [rbp-80H]
       488B9540FFFFFF       mov      rdx, bword ptr [rbp-C0H]
       48895110             mov      qword ptr [rcx+10H], rdx
       8945B8               mov      dword ptr [rbp-48H], eax
						;; size=18 bbWeight=1    PerfScore 4.00
G_M35967_IG09:              ;; offset=0147H
       488BCC               mov      rcx, rsp
       E87B000000           call     G_M35967_IG21
						;; size=8 bbWeight=1    PerfScore 1.25
G_M35967_IG10:              ;; offset=014FH
       90                   nop      
						;; size=1 bbWeight=1    PerfScore 0.25
G_M35967_IG11:              ;; offset=0150H
       8B4D90               mov      ecx, dword ptr [rbp-70H]
       FFC9                 dec      ecx
       894D90               mov      dword ptr [rbp-70H], ecx
       837D9000             cmp      dword ptr [rbp-70H], 0
       7F0E                 jg       SHORT G_M35967_IG13
						;; size=14 bbWeight=1    PerfScore 5.25
G_M35967_IG12:              ;; offset=015EH
       488D4D90             lea      rcx, [rbp-70H]
       BA45000000           mov      edx, 69
       E8E49BB95D           call     CORINFO_HELP_PATCHPOINT
						;; size=14 bbWeight=0.01 PerfScore 0.02
G_M35967_IG13:              ;; offset=016CH
       FF15B64C93FE         call     [System.Runtime.InteropServices.Marshal:GetLastSystemError():int]

and so the last error state is corrupted.

Normally we wouldn't put a patchpoint here as this is not a loop backedge source or target, a so this is (currently) an OSR stress mode only problem.

Fix is either:

  • avoid doing instrumentation in these places (seems tricky)
  • ensure that last error is preserved (fairly straightforward in cases where the helper returns, but less obvious if the helper transitions -- maybe we are lucky and ClrRestoreNonvolatileContext or similar does not corrupt last error?

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Sep 20, 2022
In stress modes (and eventually perhaps in normal uses) the jit may insert
patchpoint helper calls in regions where last error is live. So the helpers
need to preserve last error.

Because some invocations of the helpers may transition to OSR methods instead
of returning, we can't use the normal macros for this.

Fixes dotnet#75828.
@AndyAyersMS
Copy link
Member

Seems like EnC could have similar problems preserving last error, especially if the interop code is source generated and there are user-accessible breakpoints between the pinvoke call and the last error handling.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Sep 20, 2022
AndyAyersMS added a commit that referenced this issue Sep 22, 2022
In stress modes (and in normal uses, see test case) the jit may insert
patchpoint helper calls in regions where last error is live. So the helpers
need to preserve last error.

Because some invocations of the helpers may transition to OSR methods instead
of returning, we can't use the normal macros for this.

Fixes #75828.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Sep 22, 2022
github-actions bot pushed a commit that referenced this issue Sep 26, 2022
In stress modes (and eventually perhaps in normal uses) the jit may insert
patchpoint helper calls in regions where last error is live. So the helpers
need to preserve last error.

Because some invocations of the helpers may transition to OSR methods instead
of returning, we can't use the normal macros for this.

Fixes #75828.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Sep 26, 2022
carlossanlop pushed a commit that referenced this issue Sep 27, 2022
* Preserve last error for patchpoint helpers

In stress modes (and eventually perhaps in normal uses) the jit may insert
patchpoint helper calls in regions where last error is live. So the helpers
need to preserve last error.

Because some invocations of the helpers may transition to OSR methods instead
of returning, we can't use the normal macros for this.

Fixes #75828.

* fix build issues, review feedback

* add test case

Co-authored-by: Andy Ayers <andya@microsoft.com>
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Sep 27, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Oct 27, 2022
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants