Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

runtime/pprof: Missing labels from goroutines stacks containing systemstack #48577

Closed
felixge opened this issue Sep 23, 2021 · 6 comments
Closed
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.

Comments

@felixge
Copy link
Contributor

felixge commented Sep 23, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17.1 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/felix.geisendoerfer/Library/Caches/go-build"
GOENV="/Users/felix.geisendoerfer/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS="-count=1"
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/felix.geisendoerfer/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/felix.geisendoerfer/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.17.1/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.17.1/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.17.1"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/felix.geisendoerfer/go/src/github.com/felixge/dump/cpu-profiler-label-bug/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/mg/xwzs985x35789zk2l14nk8k80000gn/T/go-build2698991037=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I created a Go program like cpu-profiler-label-bug.go shown below that uses pprof labels and generates a CPU profile:

package main

import (
	"context"
	"fmt"
	"io"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

func main() {
	file, _ := os.Create("./cpu.pprof")
	pprof.StartCPUProfile(file)
	defer pprof.StopCPUProfile()

	ctx := context.Background()
	labels := pprof.Labels("foo", "bar")
	pprof.Do(ctx, labels, func(_ context.Context) {
		for i := 0; i < runtime.GOMAXPROCS(0)*10; i++ {
			go cpuHog()
		}
	})
	time.Sleep(10 * time.Second)
}

func cpuHog() {
	for i := 0; ; i++ {
		fmt.Fprintf(io.Discard, "%d", i)
	}
}

I then executed the program and used pprofutils labelframes to turn cpu.pprof into cpu.label.pprof. The latter modifies each sample in the pprof to add a new root frame that contains the value of my label foo (or N/A if the label is not present). This makes it easier to view the labels than in the graph view of pprof.

go run cpu-profiler-label-bug.go
curl --data-binary @cpu.pprof pprof.to/labelframes?label=foo > cpu.label.pprof
go tool pprof -http=:6060 cpu.label.pprof

What did you expect to see?

main.cpuHog should show up only under the foo=bar frame and not the foo=N/A frame:

expected-flamegraph

What did you see instead?

3.95% of all samples containing main.cpuHog show below the foo=N/A frame which means they were missing the foo=bar label.

flamegraph

Zooming into those 3.95% of the flame graph, I noticed that most samples contain a systemstack frame.

flamegraph-detail

See zip file below for code, pprofs and screenshots: cpu-profiler-label-bug.zip

My Analysis

As far as I can tell this problem can be easily fixed via the patch below. In fact that's what I used to create the "What did you expect to see" screenshot above.

diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index a144b36ec8..bebbd7b3b1 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -4773,7 +4773,7 @@ func sigprof(pc, sp, lr uintptr, gp *g, mp *m) {
        }
 
        if prof.hz != 0 {
-               cpuprof.add(gp, stk[:n])
+               cpuprof.add(gp.m.curg, stk[:n])
        }
        getg().m.mallocing--
 }

The reason this is the right thing to do is because gp.m.curg should always point to the right goroutine, even when the systemstack goroutine is executing. At least that's how traceback.go stitches the stacks together.

Since I suspect this to be the correct fix, I'll be submitting a CL shortly. It's like to fix more situations as well (morestack, newstack, etc.)

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/351751 mentions this issue: runtime: fix missing pprof labels

@seankhliao seankhliao added the NeedsFix The path to resolution is known, but the work has not been done. label Sep 24, 2021
felixge added a commit to felixge/go that referenced this issue Oct 1, 2021
Use gp.m.curg instead of the gp when recording cpu profiler stack
traces. This ensures profiler labels are captured when systemstack or similar
is executing on behalf of the current goroutine.

After this there are still rare cases of samples containing the labelHog
function, so more work might be needed. This patch should fix ~99% of the
problem.

Fixes golang#48577.

Change-Id: I27132110e3d09721ec3b3ef417122bc70d8f3279
@prattmic
Copy link
Member

prattmic commented Nov 2, 2021

Reverted in https://golang.org/cl/360757 due to broken builders on the new test.

@prattmic prattmic reopened this Nov 2, 2021
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/362614 mentions this issue: runtime/pprof: include labels in profile dump

gopherbot pushed a commit that referenced this issue Nov 9, 2021
For tests of pprof label support having the sample labels in the output
is needed for effective debugging.

For #48577

Change-Id: Ic7c5bc90cb33e8fb477f7db62d9b56a7a9d6ffa8
Reviewed-on: https://go-review.googlesource.com/c/go/+/362614
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/367200 mentions this issue: runtime: fix missing pprof labels

@felixge
Copy link
Contributor Author

felixge commented Dec 31, 2021

Update: Turns out this issue also impacted cgo stacks, which pretty much never have labels in go1.17 in my testing.

However, CL 367200 seems to have also fixed the cgo case 🥳

# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

4 participants