os: on NetBSD, *Process.Wait sometimes deadlocks after cmd.Process.Signal returns "process already finished" #48789
Comments
@bsiegert, @coypoop: do you know who might be able to look into this on the NetBSD side? @ianlancetaylor, @tklauser: is this possibly related to #13987? |
#44801 may be related, in that it involves a hang in |
I see two possible sequences that could lead to this. First, Second, Neither should be possible. The first case seems more likely. If we don't see a response from somebody familiar with NetBSD we should probably move the |
The first case seems more likely to me too, given that the failure seems to have started occurring only after https://golang.org/cl/315281 was submitted on 2021-05-02. I'll send a CL to move the |
Change https://golang.org/cl/354249 mentions this issue: |
CL 315281 changed the os package to use wait6 on netbsd. This seems to be causing frequent test failures as reported in #48789. Revert that change using wait6 on netbsd for now. Updates #13987 Updates #16028 For #48789 Change-Id: Ieddffc65611c7f449971eaa8ed6f4299a5f742c2 Reviewed-on: https://go-review.googlesource.com/c/go/+/354249 Trust: Tobias Klauser <tobias.klauser@gmail.com> Trust: Bryan C. Mills <bcmills@google.com> Trust: Benny Siegert <bsiegert@gmail.com> Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com> Reviewed-by: Benny Siegert <bsiegert@gmail.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
@bcmills Have you seen more deadlocks since the commit above landed at the beginning of October? |
Looks like the deadlocks ended at that CL. 👍 The only failures I can find involving
2021-11-05T16:51:14-c58417b/netbsd-amd64-9_0 |
We believe that the
2021-12-14T01:48:22-1afa432/netbsd-amd64-9_0-n2
|
Hmm, maybe not! That send is here: which is not on the |
Change https://go.dev/cl/431855 mentions this issue: |
Resend of CL 315281 which was partially reverted by CL 354249 after the original CL was suspected to cause test failures as reported in #48789. It seems that both wait4 and wait6 lead to that particular deadlock, so let's use wait6. That way we at least don't hit #13987 on netbsd. Updates #13987 For #48789 For #50138 Change-Id: Iadc4a771217b7e9e821502e89afa07036e0dcb6f Reviewed-on: https://go-review.googlesource.com/c/go/+/431855 Reviewed-by: Benny Siegert <bsiegert@gmail.com> Auto-Submit: Tobias Klauser <tobias.klauser@gmail.com> Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> Reviewed-by: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
Change https://go.dev/cl/483396 mentions this issue: |
CL 431855 changed (*Process).blockUntilWaitable on netbsd to use wait6 again. Update #48789 Change-Id: I948f5445a44ab2e82c02560480a2a244d2b5f473 Reviewed-on: https://go-review.googlesource.com/c/go/+/483396 Reviewed-by: Benny Siegert <bsiegert@gmail.com> Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>
I've noticed a recurring pattern in cmd/go test failures on NetBSD builders, and I believe that it indicates a bug in either os.Process or the kernel itself on that platform.

The cmd/go tests start processes running “in the background” using os/exec, and use a waitOrStop function to terminate any remaining processes at the conclusion of each test.

The waitOrStop function starts a goroutine, then blocks on cmd.Wait(). The background goroutine blocks until either the call to cmd.Wait completes or the Context is canceled, then sends a signal to the process. If that signal fails with "os: process already finished", then we assume that the process actually has already finished, and the background goroutine simply blocks until the call to cmd.Wait (inevitably) returns.

That all happens here: go/src/cmd/go/script_test.go, lines 1164 to 1206 in a05a7d4.
What I'm seeing on (some of?) the NetBSD builders is that after cmd.Process.Signal fails with "process already finished", the call to cmd.Wait continues to block, seemingly forever.

The relevant goroutine traces are:

Note that goroutine 1537 is blocked at script_test.go:1176, which is the send on errc after cmd.Process.Signal fails with "os: process already finished".

Goroutine 1536 is blocked at the call to cmd.Wait, which is itself blocked on syscall.Wait4.

The failure rate with these symptoms is fairly high: something on the order of 20 failures per month.
greplogs --dashboard -md -l -e '(?m)panic: test timed out.*(?:.*\n)*.*\[syscall, .* minutes\]:\n(?:.+\n\t.+\n)*syscall\.Wait.*\n\t.+\n(?:.+\n\t.+\n)*cmd/go_test\.waitOrStop'
2021-10-04T22:46:23-17674e2/netbsd-386-9_0
2021-10-04T18:15:09-9f8d558/netbsd-386-9_0
2021-09-30T19:56:06-eb9f090/netbsd-386-9_0
2021-09-29T15:23:27-aeb4fba/netbsd-amd64-9_0
2021-09-28T17:18:36-ff7b041/netbsd-amd64-9_0
2021-09-28T15:26:21-583eeaa/netbsd-386-9_0
2021-09-27T18:57:20-3d795ea/netbsd-amd64-9_0
2021-09-22T16:24:17-74ba70b/netbsd-amd64-9_0
2021-09-22T15:00:53-91c2318/netbsd-amd64-9_0
2021-09-21T20:39:31-48cf96c/netbsd-386-9_0
2021-09-20T23:04:13-d7e3e44/netbsd-arm64-bsiegert
2021-09-20T00:13:47-a83a558/netbsd-amd64-9_0
2021-09-17T19:32:44-74e384f/netbsd-amd64-9_0
2021-09-16T23:57:40-8d2a9c3/netbsd-386-9_0
2021-09-16T19:38:19-bcdc61d/netbsd-amd64-9_0
2021-09-10T17:11:39-5a4b9f9/netbsd-amd64-9_0
2021-09-09T16:32:28-a53e3d5/netbsd-amd64-9_0
2021-09-08T11:57:03-9295723/netbsd-386-9_0
2021-09-07T03:56:13-6226020/netbsd-386-9_0
2021-09-04T10:58:11-5ec298d/netbsd-386-9_0
2021-08-31T16:43:46-6815235/netbsd-386-9_0
2021-08-30T22:07:49-b06cfe9/netbsd-386-9_0
2021-08-27T05:13:44-2c60a99/netbsd-386-9_0
2021-08-24T22:23:12-54cdef1/netbsd-386-9_0
2021-08-23T21:22:58-8157960/netbsd-386-9_0
2021-08-23T21:22:58-8157960/netbsd-arm64-bsiegert
2021-08-22T21:43:43-1958582/netbsd-386-9_0
2021-08-20T03:25:17-c92c2c9/netbsd-386-9_0
2021-08-19T20:50:13-65074a4/netbsd-amd64-9_0
2021-08-18T21:19:22-c2bd9ee/netbsd-386-9_0
2021-08-18T20:11:28-165ebd8/netbsd-386-9_0
2021-08-17T16:22:15-cf12b0d/netbsd-386-9_0
2021-08-17T15:00:04-3001b0a/netbsd-386-9_0
2021-08-17T04:37:32-a304273/netbsd-386-9_0
2021-08-17T01:29:37-1951afc/netbsd-386-9_0
2021-08-16T18:44:38-56a919f/netbsd-386-9_0
2021-08-16T18:44:32-ff36d11/netbsd-amd64-9_0
2021-08-16T13:38:52-a192ef8/netbsd-386-9_0
2021-08-12T17:43:16-39634e7/netbsd-386-9_0
2021-08-09T20:06:35-f1dce31/netbsd-386-9_0
2021-08-06T16:51:12-70546f6/netbsd-386-9_0
2021-07-28T03:27:13-b39e0f4/netbsd-386-9_0
2021-07-15T20:39:22-0941dbc/netbsd-amd64-9_0
2021-06-29T16:57:13-3463852/netbsd-386-9_0
2021-06-28T20:51:30-956c81b/netbsd-386-9_0
2021-06-24T03:45:33-a9bb382/netbsd-386-9_0
2021-06-15T20:59:42-d77f4c0/netbsd-amd64-9_0
2021-06-11T20:31:30-16b5d76/netbsd-386-9_0
2021-06-09T17:11:44-df35ade/netbsd-386-9_0
2021-06-09T15:09:13-139e935/netbsd-386-9_0
2021-05-21T17:43:46-4fda54c/netbsd-arm64-bsiegert
2021-05-21T17:35:47-8876b9b/netbsd-386-9_0
2021-05-17T16:02:12-b1aff42/netbsd-386-9_0
2021-05-12T15:23:09-0388670/netbsd-386-9_0
2021-05-12T02:04:57-1a0ea1a/netbsd-386-9_0
2021-05-11T18:22:54-9b84814/netbsd-386-9_0
2021-05-10T13:16:56-2870259/netbsd-386-9_0
2021-05-06T16:00:55-6c591f7/netbsd-386-9_0
2020-09-19T05:13:19-ccf581f/netbsd-386-9_0