Bad file descriptor when running with Github Actions #2593

Closed
3 tasks done
Jufik opened this issue Jul 12, 2024 · 12 comments · Fixed by #2629
Assignees
Labels
area/buildkit kind/bug Something isn't working
Milestone

Comments

Jufik commented Jul 12, 2024

Contributing guidelines

I've found a bug and checked that ...

  • ... the documentation does not mention anything about my problem
  • ... there are no open or closed issues that are related to my problem

Description

The command:

docker buildx build \
  --cache-from type=local,compression-level=2,src=/var/lib/docker/actions/$image \
  --cache-to type=local,dest=/var/lib/docker/actions/$image,mode=max \
  --file ./Dockerfile \
  --tag hello:world \
  .

Throws a bad file descriptor error

Expected behaviour

The docker image should be built, pushed and cached.
Pinning buildx to 0.15.2 in setup-buildx-action solves the issue with the exact same action configuration.
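Such a pin can be expressed in the workflow like this (a sketch, assuming docker/setup-buildx-action v3 and its documented version input; adjust the version to whichever release works in your environment):

```yaml
# Workaround sketch: pin buildx instead of using the latest release.
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
  with:
    version: v0.15.2
```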

Actual behaviour

Caching fails and throws a bad file descriptor error:
ERROR: could not lock /var/lib/docker/actions/$image/index.json.lock: bad file descriptor

When looking into the runner's /var/lib/docker/actions/$image path, index.json.lock exists with the runner's permissions.

Buildx version

v0.16.0 10c9ff9

Docker info

NA

Builders list

NA

Configuration

FROM nginx
COPY ./index.html /usr/share/nginx/html

Build logs

#12 pushing manifest for [...] 0.6s done
#12 DONE 25.1s

#14 exporting cache to client directory
#14 preparing build cache for export
#14 writing layer sha256:[...]
#14 writing layer sha256:[...] 1.0s done
#14 writing layer sha256:[...]
#14 writing layer sha256:[...] 0.2s done
#14 writing layer sha256:[...]
#14 writing layer sha256:[...] 0.2s done
#14 writing layer sha256:[...]
#14 writing layer sha256:[...] 0.2s done
#14 writing layer sha256:[...]
#14 writing layer sha256:[...] 0.6s done
#14 writing layer sha256:[...]
#14 writing layer sha256:[...] 1.9s done
#14 writing layer sha256:[...]
#14 writing layer sha256:[...] 0.1s done
#14 writing layer sha256:[...]
#14 writing layer sha256:[...] 0.1s done
#14 writing layer sha256:[...]
#14 writing layer sha256:[...] 0.2s done
#14 writing layer sha256:[...]
#14 writing layer sha256:[...] 0.2s done
#14 writing config sha256:[...] 0.1s done
#14 writing cache manifest sha256:[...]
#14 preparing build cache for export 4.9s done
#14 writing cache manifest sha256:[...] 0.1s done
#14 DONE 4.9s
ERROR: could not lock /var/lib/docker/actions/$image/index.json.lock: bad file descriptor
Error: buildx failed with: ERROR: could not lock /var/lib/docker/actions/$image/index.json.lock: bad file descriptor

Additional info

Context:

  • The buildx command runs within GitHub Actions runners on Kubernetes.
  • Buildx is installed through the docker/setup-buildx-action GitHub Action without the version param, so the latest buildx release is used with the docker-container driver.
  • The image is built and pushed through the docker/build-push-action GitHub Action.
@crazy-max (Member) commented:

Is it a self-hosted runner? Can you show the output of docker info and docker buildx ls?

@ozydingo commented:

I have this exact issue. Pinning buildx to 0.15.1 resolves the immediate issue (pinning 0.15.2 failed with "could not find version").

ozydingo commented Jul 16, 2024

Self-hosted runner, using cache settings:

      build_cache_from: |
        type=local,src=/home/runner/work/shared/main/${{ matrix.image.name }}
        type=local,src=/home/runner/work/shared/${{ needs.plan.outputs.branch_name }}/${{ matrix.image.name }}
      build_cache_to: 'type=local,mode=max,compression=zstd,compression-level=4,dest=/home/runner/work/shared/${{ needs.plan.outputs.branch_name }}/${{ matrix.image.name }}'

where /home/runner/work/shared is a mounted EFS volume in order to share cache between different workflow runs.

The issue persisted when I used a completely new cache location and when I removed all build_cache_from options, indicating that the issue lies solely in the cache-writing step, regardless of any existing cache.

Removing build_cache_to while keeping build_cache_from resolved the error (but that obviously isn't a viable long-term solution).

@ozydingo commented:

On 0.16 / latest:

docker info

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Client:
 Version:    24.0.7
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.16.0
    Path:     /home/runner/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.23.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 2
 Server Version: 24.0.7
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 091922f03c2762540fd057fba91260237ff86acb
 runc version: v1.1.9-0-gccaecfc
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.1.92
 Operating System: Ubuntu 22.04.4 LTS (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 30.41GiB
 Name: app-runner-wsz5t-62w5d
 ID: c9fcc801-c6ae-4879-9419-da9d7638033d
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: ***
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

docker buildx ls

NAME/NODE                                           DRIVER/ENDPOINT      STATUS    BUILDKIT               PLATFORMS
builder-45d3ecc3-2ecb-4fd7-887b-99a6733fd6e0*       docker-container                                      
 \_ builder-45d3ecc3-2ecb-4fd7-887b-99a6733fd6e00    \_ buildx-context   running   v0.14.1                linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/amd64/v4, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
buildx-context                                      docker                                                
 \_ buildx-context                                   \_ buildx-context   running   v0.11.7+d3e6c1360f6e   linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/amd64/v4, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
default                                             docker                                                
 \_ default                                          \_ default          running   v0.11.7+d3e6c1360f6e   linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/amd64/v4, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6

@SaschaSchwarze0 commented:

I reproduced this with pure BuildKit. It started happening with BuildKit v0.15.0. I patched v0.15.0, reverting only github.com/gofrs/flock back to v0.8.1, and that resolves it.

In my environment, the cache directory is a volume mount in a pod, backed by an NFS-based persistent volume.
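For reference, a revert like the one described above can be expressed as a go.mod replace directive (a sketch of the dependency pin, not BuildKit's actual tree):

```go
// go.mod fragment (sketch): force the transitive flock dependency
// back to the last version that predates the regression.
replace github.com/gofrs/flock => github.com/gofrs/flock v0.8.1
```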

@crazy-max (Member) commented:

@SaschaSchwarze0 Thanks for your repro, could you post BuildKit logs in debug please?

@SaschaSchwarze0 commented:

> @SaschaSchwarze0 Thanks for your repro, could you post BuildKit logs in debug please?

One quick clarification in addition to what I wrote above: the patch (= the revert of github.com/gofrs/flock) is necessary for buildctl, not for buildkitd.

With debugging enabled on buildctl, the following stack trace is shown:

error: could not lock /tmp/buildkit-cache/index.json.lock: bad file descriptor
122 v0.15.0 buildctl --debug build --trace=/tmp/buildkit-cache/trace.log --progress=plain --frontend=dockerfile.v0 --opt=filename=Dockerfile --opt=platform=linux/amd64,linux/arm64 --local=context=/workspace/source --local=dockerfile=/workspace/source --output=type=oci,tar=false,dest=/workspace/output-image --export-cache=type=local,mode=max,dest=/tmp/buildkit-cache --import-cache=type=local,src=/tmp/buildkit-cache
github.com/moby/buildkit/client/ociindex.StoreIndex.Put
    /src/client/ociindex/ociindex.go:65
github.com/moby/buildkit/client.(*Client).solve
    /src/client/solve.go:349
github.com/moby/buildkit/client.(*Client).Build
    /src/client/build.go:64
main.buildAction.func5
    /src/cmd/buildctl/build.go:369
golang.org/x/sync/errgroup.(*Group).Go.func1
    /src/vendor/golang.org/x/sync/errgroup/errgroup.go:78
runtime.goexit
    /usr/local/go/src/runtime/asm_arm64.s:1222

@crazy-max (Member) commented:

@SaschaSchwarze0 Thanks, do you repro with v0.11.0 as well? I wonder if this issue is related to this change gofrs/flock#87

SaschaSchwarze0 commented Jul 21, 2024

> @SaschaSchwarze0 Thanks, do you repro with v0.11.0 as well? I wonder if this issue is related to this change gofrs/flock#87

buildctl v0.15.0 compiled with github.com/gofrs/flock@v0.11.0: works

So yeah, the change must be in https://github.com/gofrs/flock/releases/tag/v0.12.0.

EDIT1: one second, copied the wrong file to my test setup

EDIT2: no; the change you refer to also looked suspicious to me, but it actually looks like this:

buildctl v0.15.0 compiled with github.com/gofrs/flock@v0.10.0: works
buildctl v0.15.0 compiled with github.com/gofrs/flock@v0.11.0: fails

So it must be something in gofrs/flock@v0.10.0...v0.11.0.

@crazy-max (Member) commented:

Seems to be gofrs/flock@b659e1e, where f.flag would no longer be in read-write mode (gofrs/flock@b659e1e#diff-87c2c4fe0fb43f4b38b4bee45c1b54cfb694c61e311f93b369caa44f6c1323ffR192) but read-only (gofrs/flock@b659e1e#diff-22145325dded38eb5288ed3321a113d8260ccc70747ee04d4551bfd2fba975fdR69).

@crazy-max (Member) commented:

@SaschaSchwarze0 Should be fixed with moby/buildkit#5183

jamshid commented Jul 24, 2024

Is there an easy workaround for docker-ce users on Ubuntu 20.04?
I just did an apt-get upgrade to get a reported Docker security fix, and now my build scripts are DOA.

$ docker buildx prune -f --verbose
ERROR: bad file descriptor

OK, downgrading the docker-buildx-plugin package seems to fix it. Hopefully that's the right approach and the next upgrade fixes it.

# apt list --all-versions docker-buildx-plugin
Listing... Done
docker-buildx-plugin/focal,now 0.16.1-1~ubuntu.20.04~focal amd64 [installed]
docker-buildx-plugin/focal 0.15.1-1~ubuntu.20.04~focal amd64
...

# apt-get install docker-buildx-plugin=0.15.1-1~ubuntu.20.04~focal
...
The following packages will be DOWNGRADED:
  docker-buildx-plugin
...
Setting up docker-buildx-plugin (0.15.1-1~ubuntu.20.04~focal)
...
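To keep a routine apt-get upgrade from pulling 0.16.x back in after the downgrade, holding the package is a common follow-up (a sketch; the version string is taken from the listing above):

```shell
# Downgrade, then hold so upgrades don't reinstall 0.16.x
sudo apt-get install --allow-downgrades docker-buildx-plugin=0.15.1-1~ubuntu.20.04~focal
sudo apt-mark hold docker-buildx-plugin

# Later, once a fixed release ships:
sudo apt-mark unhold docker-buildx-plugin
```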

5 participants