Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Incorrect use of atomic.Value in the syncutil.Go utility function causes panic #916

Open
apparentlymart opened this issue Mar 6, 2025 · 2 comments

Comments

@apparentlymart
Copy link

apparentlymart commented Mar 6, 2025

While using ORAS-Go to push a layout with some slightly-incorrect manifest contents I encountered this panic:

panic: sync/atomic: compare and swap of inconsistently typed value into Value

goroutine 83 [running]:
sync/atomic.(*Value).CompareAndSwap(0xc00034a700, {0x0, 0x0}, {0x99fe80, 0xc0004a2660})
        sync/atomic/value.go:174 +0x150
oras.land/oras-go/v2/internal/syncutil.Go[...].func1.func2()
        oras.land/oras-go/v2@v2.5.1-0.20250221033735-cb6d75be7dd4/internal/syncutil/limit.go:86 +0xe6
golang.org/x/sync/errgroup.(*Group).Go.func1()
        golang.org/x/sync@v0.11.0/errgroup/errgroup.go:78 +0x50
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 46
        golang.org/x/sync@v0.11.0/errgroup/errgroup.go:75 +0x93

Unfortunately I can't share with you the incorrect content since it was just throwaway stuff in a job where I didn't preserve the input artifacts, but I found the relevant code here:

for _, item := range items {
region := LimitRegion(egCtx, limiter)
if err := region.Start(); err != nil {
if egErr, ok := egErr.Load().(error); ok && egErr != nil {
return egErr
}
return err
}
eg.Go(func(t T) func() error {
return func() error {
defer region.End()
err := fn(egCtx, region, t)
if err != nil {
egErr.CompareAndSwap(nil, err)
return err
}
return nil
}
}(item))
}

I think the egErr.CompareAndSwap call on line 86 is incorrect, because err here has an interface type (error) and so whichever concrete error type is stored there first freezes the type expected for all future calls, per the Value.CompareAndSwap documentation:

All calls to CompareAndSwap for a given Value must use values of the same concrete type. CompareAndSwap of an inconsistent type panics, as does CompareAndSwap(old, nil).

If egErr.CompareAndSwap gets subsequently called again with a different concrete error type from one of the other goroutines then I expect that this would generate the panic I observed.

I think one potential way to make this more robust would be for this function to declare a local struct type to act as the consistent concrete type for egErr, with the dynamically-typed error inside it:

type WrappedError struct {
    err error
}
# ...

egErr.CompareAndSwap(nil, WrappedErr{err})

Then when loading the error again:

if egErrWrap, ok := egErr.Load().(WrappedError) ok && egErrWrap.err != nil {
	return egErrWrap.err
}

The concrete type in egErr would therefore always be exactly WrappedError, and so this panic would then be impossible.

Again I'm sorry I can't offer a specific reproduction case for this, but I hope the above is useful. I'm not blocked by this because I've now corrected my manifest content so that it doesn't cause these errors.

@apparentlymart
Copy link
Author

apparentlymart commented Mar 6, 2025

I encountered the same panic again a little while later and so I added some print statements to get a better idea of what was going on. It turns out that it occurs when I try to push a manifest using the standard empty JSON config descriptor into a registry that doesn't have that blob already -- this time using the oras cp --from-oci-layout command rather than ORAS-Go directly -- causing the following sequence of stores to egErr:

  • *fmt.wrapError: sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a: application/vnd.oci.empty.v1+json: not found
  • *url.Error: Post "http://127.0.0.1:5000/v2/test-thing/blobs/uploads/": sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a: application/vnd.oci.empty.v1+json: not found
  • *fmt.wrapError: sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a: application/vnd.oci.empty.v1+json: not found

The second store is causing the panic because *url.Error is inconsistent with the previous *fmt.wrapError.

I captured all three of these using a version of this function modified as I suggested in my original writeup, which fixed the problem and so also allowed the ORAS CLI to return one of the errors without crashing:

// Go concurrently invokes fn on items.
func Go[T any](ctx context.Context, limiter *semaphore.Weighted, fn GoFunc[T], items ...T) error {
	type WrapErr struct {
		err error
	}

	eg, egCtx := errgroup.WithContext(ctx)
	var egErr atomic.Value
	for _, item := range items {
		region := LimitRegion(egCtx, limiter)
		if err := region.Start(); err != nil {
			if egErrWrap, ok := egErr.Load().(WrapErr); ok && egErrWrap.err != nil {
				return egErrWrap.err
			}
			return err
		}
		eg.Go(func(t T) func() error {
			return func() error {
				defer region.End()
				err := fn(egCtx, region, t)
				if err != nil {
					fmt.Fprintf(os.Stderr, "%T: %s\n", err, err)
					egErr.CompareAndSwap(nil, WrapErr{err})
					return err
				}
				return nil
			}
		}(item))
	}
	return eg.Wait()
}

In case it's important for reproduction, my local OCI layout directory also lacks the blob for the empty JSON object. I suspect that's what the first of the three errors was complaining about, but I've not confirmed that.

@Wwwsylvia
Copy link
Member

Hi @apparentlymart , thanks for opening the issue. We will look into it once we have some bandwidth.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants