Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

String with nil backing array causing panics in 1.23 #69710

Closed
watchforstock opened this issue Sep 30, 2024 · 10 comments
Closed

String with nil backing array causing panics in 1.23 #69710

watchforstock opened this issue Sep 30, 2024 · 10 comments
Labels
WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

Comments

@watchforstock
Copy link

Go version

go1.23.1 windows/amd64

Output of go env in your module/workspace:

Note this is from a development machine. Our current deployed system is run on windows (cross compiled on a linux build server)

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/Username/Library/Caches/go-build'
GOENV='/Users/Username/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/username/go/pkg/mod'
GONOPROXY='*.company.local'
GONOSUMDB='*.company.local'
GOOS='darwin'
GOPATH='/Users/Username/go'
GOPRIVATE='*.company.local'
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.23.1/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.23.1/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.1'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/username/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/Users/username/project/go.mod'
GOWORK='/Users/username/Development/go.work'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/_s/568w3p_n6y5f1dlkb7r5_snh0000gn/T/go-build424052742=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

We have a large complex code base and are still tracking down the issue. Code that has been running fine on go1.22 seems to be failing on go1.23. At the moment we haven't been able to recreate the issue except on a customer system.

What did you see happen?

We noticed that when requesting the length of a string variable, we were getting a size > 1TB. On closer inspection it appears that the data pointer for the string is nil and there is (we assume) invalid data in the length property).

The string in question may be being created programmatically, or may be coming as the result of unmarshalling JSON into an object (we initially suspected the unmarshalling library, but we switched this out for the standard-library version and saw the same results). We are continuing to try and isolate where the invalid string is being created.

It's worth noting that this appears to have a degree of randomness to it, in that we can sometimes have a run which doesn't exhibit the issue - also the code path where we're having the issue is executed thousands of times during a run and many of those times are successful before we see a failure.

We are getting a strack trace as follows (only partial I'm afraid):

.\project-win-snapshot-f18b047.exe : runtime: VirtualAlloc of 1099511701504 bytes failed with errno=1455
At line:1 char:1
+ .\project-win-snapshot-f18b047.exe run -c .\static\deployments...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (runtime: Virtua...with errno=1455:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError
fatal error: out of memory
 
runtime stack:
runtime.throw({0x393303b?, 0x1cdd51bc000?})
                /usr/local/go/src/runtime/panic.go:1067 +0x4d fp=0x32a57ffb70 sp=0x32a57ffb40 pc=0xba97ed
runtime.sysUsedOS(0x1c000000000, 0x10000012000)
                /usr/local/go/src/runtime/mem_windows.go:83 +0x1bb fp=0x32a57ffbd0 sp=0x32a57ffb70 pc=0xb4aa3b
runtime.sysUsed(...)
                /usr/local/go/src/runtime/mem.go:77
runtime.(*mheap).allocSpan(0xb8baba0, 0x8000009, 0x0, 0x1)
                /usr/local/go/src/runtime/mheap.go:1345 +0x487 fp=0x32a57ffc78 sp=0x32a57ffbd0 pc=0xb5d9c7
runtime.(*mheap).alloc.func1()
                /usr/local/go/src/runtime/mheap.go:962 +0x5c fp=0x32a57ffcc0 sp=0x32a57ffc78 pc=0xb5d17c
runtime.systemstack(0xc000106000)
                /usr/local/go/src/runtime/asm_amd64.s:514 +0x49 fp=0x32a57ffcd0 sp=0x32a57ffcc0 pc=0xbb0fa9
 
goroutine 165899 gp=0xc019b60e00 m=5 mp=0xc000100008 [running]:
runtime.systemstack_switch()
                /usr/local/go/src/runtime/asm_amd64.s:479 +0x8 fp=0xc009c85fd0 sp=0xc009c85fc0 pc=0xbb0f48
runtime.(*mheap).alloc(0x10000012000?, 0x8000009?, 0x4d?)
                /usr/local/go/src/runtime/mheap.go:956 +0x5b fp=0xc009c86018 sp=0xc009c85fd0 pc=0xb5d0db
runtime.(*mcache).allocLarge(0x1b30?, 0x10000010003, 0x1)
                /usr/local/go/src/runtime/mcache.go:234 +0x87 fp=0xc009c86068 sp=0xc009c86018 pc=0xb497e7
runtime.mallocgc(0x10000010003, 0x3361520, 0x1)
                /usr/local/go/src/runtime/malloc.go:1177 +0x5d0 fp=0xc009c86108 sp=0xc009c86068 pc=0xba3d90
runtime.makeslice(0x60?, 0xb475bd?, 0xc025cb5801?)
                /usr/local/go/src/runtime/slice.go:116 +0x49 fp=0xc009c86130 sp=0xc009c86108 pc=0xbabaa9
scm.company.local/project/project.git/data.Node.MarshalJSON(...)
                /go/pkg/mod/scm.company.local/project/project.git@v0.26.1-0.20240930081403-908ad04ce8ae/data/node.go:46
scm.company.local/project/project.git/data.(*Node).MarshalJSON(0x3801980?)
                <autogenerated>:1 +0x7f fp=0xc009c861e8 sp=0xc009c86130 pc=0x10b591f
github.com/goccy/go-json/internal/encoder.AppendMarshalJSON(0xc023efba00, 0xc003dade90, {0xc02f3f8000, 0x735c, 0x2a000},
{0x3801980?, 0xb8f89c0?})
                /go/pkg/mod/github.com/goccy/go-json@v0.10.2/internal/encoder/encoder.go:430 +0x276 fp=0xc009c86358 sp=0xc009c861e8 pc=0xf8f516
github.com/goccy/go-json/internal/encoder/vm.appendMarshalJSON(0x88b6c80?, 0xc013a17590?, {0xc02f3f8000?, 0xc025281b08?,
0xc0025e3b90?}, {0x3801980?, 0xb8f89c0?})
                /go/pkg/mod/github.com/goccy/go-json@v0.10.2/internal/encoder/vm/util.go:152 +0x25 fp=0xc009c863a0 sp=0xc009c86358 pc=0x10127e5
github.com/goccy/go-json/internal/encoder/vm.Run(0xc023efba00, {0xc02f3f8000?, 0xbac49c?, 0xc009c885a0?}, 0x0?)
                /go/pkg/mod/github.com/goccy/go-json@v0.10.2/internal/encoder/vm/vm.go:271 +0x217e fp=0xc009c886f0 sp=0xc009c863a0 pc=0x1014b3e
github.com/goccy/go-json.encodeRunCode(0xc023efba00?, {0xc02f3f8000?, 0x23?, 0x0?}, 0xbc3c5d?)
                /go/pkg/mod/github.com/goccy/go-json@v0.10.2/encode.go:310 +0x56 fp=0xc009c88728 sp=0xc009c886f0 pc=0x10399d6
github.com/goccy/go-json.encode(0xc023efba00, {0x375d580, 0xc019e719e0})
                /go/pkg/mod/github.com/goccy/go-json@v0.10.2/encode.go:235 +0x205 fp=0xc009c887a8 sp=0xc009c88728 pc=0x1039565
github.com/goccy/go-json.marshal({0x375d580, 0xc019e719e0}, {0x0, 0x0, 0xc0263f4b50?})
                /go/pkg/mod/github.com/goccy/go-json@v0.10.2/encode.go:150 +0xb9 fp=0xc009c88808 sp=0xc009c887a8 pc=0x1039019
github.com/goccy/go-json.MarshalWithOption(...)
                /go/pkg/mod/github.com/goccy/go-json@v0.10.2/json.go:186
github.com/goccy/go-json.Marshal({0x375d580?, 0xc019e719e0?})
                /go/pkg/mod/github.com/goccy/go-json@v0.10.2/json.go:171 +0x25 fp=0xc009c88840 sp=0xc009c88808 pc=0x1039c25

The code at /go/pkg/mod/scm.company.local/project/project.git@v0.26.1-0.20240930081403-908ad04ce8ae/data/node.go:46 is as follows:

func (n Node) MarshalJSON() ([]byte, error) {
    data := make([]byte, 0, len(n.str)+2)  // line 46
    data = append(data, doubleQuote)
    data = append(data, []byte(n.str)...)
    data = append(data, doubleQuote)
    return data, nil
}

What did you expect to see?

We expected it to run the same on version 1.23 as on version 1.22.

As we are still trying to track this down, if there are additional things we could try (or try to eliminate) that would be helpful.

We are somewhat at a loss as to how the string could get a nil backing array (we're not doing anything with unsafe pointers etc in our code base).

@Zxilly
Copy link
Member

Zxilly commented Sep 30, 2024

cc @golang/runtime

@Zxilly
Copy link
Member

Zxilly commented Sep 30, 2024

but we switched this out for the standard-library version and saw the same results

Can you share the stacktrace when you met the error with the encoding/json?

Also, can you share more details about how n.str was created? In particular, was the unsafe package or reflect.StringHeader used?

@seankhliao seankhliao added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Sep 30, 2024
@randall77
Copy link
Contributor

Have you run your application under the race detector (-race)?

The size in question is 0x10000010003. Makes me think of #68227 which was a missing zero extension on arm64.

@watchforstock
Copy link
Author

Stack trace when we used built-in json marshaller:

fatal error: out of memory
 
runtime stack:
runtime.throw({0x2ff3043?, 0xce28a77000?})
                /usr/local/go/src/runtime/panic.go:1067 +0x4d fp=0x500b3ff500 sp=0x500b3ff4d0 pc=0x2397ed
runtime.sysUsedOS(0xc04daca000, 0x10000012000)
                /usr/local/go/src/runtime/mem_windows.go:83 +0x1bb fp=0x500b3ff560 sp=0x500b3ff500 pc=0x1daa3b
runtime.sysUsed(...)
                /usr/local/go/src/runtime/mem.go:77
runtime.(*mheap).allocSpan(0xaf7da80, 0x8000009, 0x0, 0x1)
                /usr/local/go/src/runtime/mheap.go:1345 +0x487 fp=0x500b3ff608 sp=0x500b3ff560 pc=0x1ed9c7
runtime.(*mheap).alloc.func1()
                /usr/local/go/src/runtime/mheap.go:962 +0x5c fp=0x500b3ff650 sp=0x500b3ff608 pc=0x1ed17c
runtime.systemstack(0xc00d7f0000)
                /usr/local/go/src/runtime/asm_amd64.s:514 +0x49 fp=0x500b3ff660 sp=0x500b3ff650 pc=0x240fa9
 
goroutine 411448 gp=0xc0197ffa40 m=13 mp=0xc0043ba808 [running]:
runtime.systemstack_switch()
                /usr/local/go/src/runtime/asm_amd64.s:479 +0x8 fp=0xc004fe36b8 sp=0xc004fe36a8 pc=0x240f48
runtime.(*mheap).alloc(0x10000012000?, 0x8000009?, 0x40?)
                /usr/local/go/src/runtime/mheap.go:956 +0x5b fp=0xc004fe3700 sp=0xc004fe36b8 pc=0x1ed0db
runtime.(*mcache).allocLarge(0x239b1f?, 0x10000012000, 0x1)
                /usr/local/go/src/runtime/mcache.go:234 +0x87 fp=0xc004fe3750 sp=0xc004fe3700 pc=0x1d97e7
runtime.mallocgc(0x10000012000, 0x0, 0x0)
                /usr/local/go/src/runtime/malloc.go:1177 +0x5d0 fp=0xc004fe37f0 sp=0xc004fe3750 pc=0x233d90
runtime.growslice(0xc00721a000, 0x16?, 0xc004fe393c?, 0x0?, 0x0?)
                /usr/local/go/src/runtime/slice.go:264 +0x5c9 fp=0xc004fe3860 sp=0xc004fe37f0 pc=0x23c109
fmt.(*buffer).writeString(...)
                /usr/local/go/src/fmt/print.go:108
fmt.(*fmt).padString(0x7fe2fd0?, {0x0, 0x10000010001})
                /usr/local/go/src/fmt/format.go:113 +0x217 fp=0xc004fe38d0 sp=0xc004fe3860 pc=0x36d6f7
fmt.(*fmt).fmtS(0xc004fe3928?, {0x0?, 0x16?})
                /usr/local/go/src/fmt/format.go:362 +0x39 fp=0xc004fe3908 sp=0xc004fe38d0 pc=0x36e239
fmt.(*pp).fmtString(0xc00721a000?, {0x0?, 0x1?}, 0x10?)
                /usr/local/go/src/fmt/print.go:494 +0x85 fp=0xc004fe3958 sp=0xc004fe3908 pc=0x371465
fmt.(*pp).printArg(0xc02047cd00, {0x2a214a0, 0xc01a0f8360}, 0x76)
                /usr/local/go/src/fmt/print.go:741 +0x346 fp=0xc004fe39f0 sp=0xc004fe3958 pc=0x3730c6
fmt.(*pp).doPrintln(0xc02047cd00, {0xc004fe3b48?, 0x2, 0xc004fe3aa0?})
                /usr/local/go/src/fmt/print.go:1220 +0x38 fp=0xc004fe3a60 sp=0xc004fe39f0 pc=0x376cf8
fmt.Fprintln({0x7f6b5a0, 0xc00010e098}, {0xc004fe3b48, 0x2, 0x2})
                /usr/local/go/src/fmt/print.go:304 +0x49 fp=0xc004fe3ab0 sp=0xc004fe3a60 pc=0x370289
fmt.Println(...)
                /usr/local/go/src/fmt/print.go:314
scm.company.local/project/project.git/data.(*Node).MarshalJSON(0xafbb8a0)
                /go/pkg/mod/scm.company.local/project/project.git@v0.26.1-0.20240927134325-92e0b7ffb6b8/data/node.go:51 +0x152 fp=0xc004fe3b98
sp=0xc004fe3ab0 pc=0x689a12
encoding/json.addrMarshalerEncoder(0xc017ebd440, {0x2eb9440?, 0xafbb8a0?, 0x3a5b25?}, {0x0?, 0x1?})
                /usr/local/go/src/encoding/json/encode.go:460 +0x15a fp=0xc004fe3c58 sp=0xc004fe3b98 pc=0x38df1a
encoding/json.condAddrEncoder.encode({0x7a4b2f0?, 0xc0045cfd70?}, 0xc0063cb8c0?, {0x2eb9440?, 0xafbb8a0?, 0xc0060642a0?}, {0xc0?,0x68?})
                /usr/local/go/src/encoding/json/encode.go:902 +0x44 fp=0xc004fe3c90 sp=0xc004fe3c58 pc=0x391824
encoding/json.condAddrEncoder.encode-fm(0x29e0940?, {0x2eb9440?, 0xafbb8a0?, 0x90?}, {0x1?, 0x0?})
                <autogenerated>:1 +0x46 fp=0xc004fe3cd8 sp=0xc004fe3c90 pc=0x3a5746
encoding/json.arrayEncoder.encode({0xc001186900?}, 0xc017ebd440, {0x29e0940?, 0xc00ce868f0?, 0x3a5b25?}, {0x0?, 0x1?})
                /usr/local/go/src/encoding/json/encode.go:858 +0xcf fp=0xc004fe3d48 sp=0xc004fe3cd8 pc=0x39112f
encoding/json.arrayEncoder.encode-fm(0x2?, {0x29e0940?, 0xc00ce868f0?, 0xc017ebd440?}, {0xc0?, 0x68?})
                <autogenerated>:1 +0x3f fp=0xc004fe3d88 sp=0xc004fe3d48 pc=0x3a58bf
encoding/json.sliceEncoder.encode({0x7fa0f40?}, 0xc017ebd440, {0x29e0940?, 0xc00ce868f0?, 0x92?}, {0x8?, 0x0?})
                /usr/local/go/src/encoding/json/encode.go:831 +0x338 fp=0xc004fe3eb0 sp=0xc004fe3d88 pc=0x390c98
encoding/json.sliceEncoder.encode-fm(0x2c670e0?, {0x29e0940?, 0xc00ce868f0?, 0x17?}, {0xd8?, 0x17?})
                <autogenerated>:1 +0x3f fp=0xc004fe3ef0 sp=0xc004fe3eb0 pc=0x3a597f
encoding/json.structEncoder.encode({{{0xc0064f1200, 0x2, 0x2}, 0xc006de1980, 0xc006de19b0}}, 0xc017ebd440, {0x2c670e0?,
0xc00ce868d8?, 0x7fa0f40?}, {0x0, ...})
                /usr/local/go/src/encoding/json/encode.go:715 +0x21e fp=0xc004fe3fa0 sp=0xc004fe3ef0 pc=0x38fade
encoding/json.structEncoder.encode-fm(0x29e82a0?, {0x2c670e0?, 0xc00ce868d8?, 0x10041401701?}, {0x30?, 0xb3?})
                <autogenerated>:1 +0x65 fp=0xc004fe4000 sp=0xc004fe3fa0 pc=0x3a5b25
encoding/json.arrayEncoder.encode({0xc004fe40a8?}, 0xc017ebd440, {0x29e82a0?, 0xc0067056a8?, 0xc001ec3a1c?}, {0x4?, 0x0?})
                /usr/local/go/src/encoding/json/encode.go:858 +0xcf fp=0xc004fe4070 sp=0xc004fe4000 pc=0x39112f
encoding/json.arrayEncoder.encode-fm(0xc0067056a0?, {0x29e82a0?, 0xc0067056a8?, 0x0?}, {0xc0?, 0xfe?})
                <autogenerated>:1 +0x3f fp=0xc004fe40b0 sp=0xc004fe4070 pc=0x3a58bf
encoding/json.sliceEncoder.encode({0xc004fe4250?}, 0xc017ebd440, {0x29e82a0?, 0xc0067056a8?, 0x2a214a0?}, {0x14?, 0x0?})
                /usr/local/go/src/encoding/json/encode.go:831 +0x338 fp=0xc004fe41d8 sp=0xc004fe40b0 pc=0x390c98
encoding/json.sliceEncoder.encode-fm(0x2e1d580?, {0x29e82a0?, 0xc0067056a8?, 0xc004fe4268?}, {0xa2?, 0x46?})
                <autogenerated>:1 +0x3f fp=0xc004fe4218 sp=0xc004fe41d8 pc=0x3a597f
encoding/json.structEncoder.encode({{{0xc001bd9b08, 0x5, 0x8}, 0xc006de1b30, 0xc006de1b60}}, 0xc017ebd440, {0x2e1d580?,
0xc006705680?, 0x24278791908?}, {0x0, ...})
                /usr/local/go/src/encoding/json/encode.go:715 +0x21e fp=0xc004fe42c8 sp=0xc004fe4218 pc=0x38fade
encoding/json.structEncoder.encode-fm(0x2e1d580?, {0x2e1d580?, 0xc006705680?, 0xc004fe4398?}, {0x8?, 0xa8?})
                <autogenerated>:1 +0x65 fp=0xc004fe4328 sp=0xc004fe42c8 pc=0x3a5b25
encoding/json.(*encodeState).reflectValue(0xc017ebd440, {0x2e1d580?, 0xc006705680?, 0x30?}, {0x20?, 0xba?})
                /usr/local/go/src/encoding/json/encode.go:322 +0x73 fp=0xc004fe4380 sp=0xc004fe4328 pc=0x38d153
encoding/json.(*encodeState).marshal(0x40?, {0x2e1d580?, 0xc006705680?}, {0x8?, 0xa8?})
                /usr/local/go/src/encoding/json/encode.go:298 +0xbd fp=0xc004fe43f8 sp=0xc004fe4380 pc=0x38cf1d
encoding/json.Marshal({0x2e1d580, 0xc006705680})
                /usr/local/go/src/encoding/json/encode.go:164 +0xbe fp=0xc004fe44b8 sp=0xc004fe43f8 pc=0x38c73e

The Node object is defined as:

type Node struct {
    typ     string
    subtype string
    value   string
    str     string
}

When we create a new Node object it sets n.str as n.str = n.typ + ":" + n.subtype + ":" + n.value. We don't use unsafe or reflect.StringHeader in our code anyway (but the Node item may have been unmarshalled from JSON as part of its creation).

Contrary to what I said above, we think we've now been able to recreate it with a 1.22 build. However, it manifested slightly differently with a panic which was caught in a recover and reported makeslice: cap out of range. The rest of the stack trace looks the same. It took more runs on 1.22 to trigger.

@watchforstock
Copy link
Author

Have you run your application under the race detector (-race)?

The size in question is 0x10000010003. Makes me think of #68227 which was a missing zero extension on arm64.

We haven't run it under the race detector. At the moment we're only seeing the issue on Windows / amd64 (although we're generally using mac arm64 as a development environment). That said, I noticed that the size looked "structured" rather than random and indeed we have seen the same value multiple times (I can't be definitive that it was always the same)

@watchforstock
Copy link
Author

I have now realised, partly with the prompt from @randall77 , that if n.str was equal to the empty string then it would legitimately have a nil pointer for its underlying storage (we had previously been trying to understand how that might be nil). That would leave the length potentially being corrupted somehow from its zero value.

Is there anything useful we can look for in the executable to help track down the situation where this issue might be coming from? Unfortunately I can't share the executable itself (and it's about 140MB)

@randall77
Copy link
Contributor

Please run with the race detector if you can. Heap corruption like this (if that's what it is) is easily caused by data races.

@watchforstock
Copy link
Author

Please run with the race detector if you can. Heap corruption like this (if that's what it is) is easily caused by data races.

Thanks. We will do that (probably tomorrow UK time as we've lost our connection to the customer's environment for the day). Thanks for your help.

@watchforstock
Copy link
Author

Based on today's investigations it seems most likely that this is a race condition leading to corruption. My team and I very much appreciate the rapid responses and guidance given!

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

5 participants