gradle tasks are stuck forever on ARM64 image builds #42

Closed
balusarakesh opened this issue Sep 30, 2021 · 8 comments

@balusarakesh

balusarakesh commented Sep 30, 2021

Behaviour

  • The Gradle build hangs indefinitely while building an ARM64 Docker image under QEMU emulation set up with image: tonistiigi/binfmt:latest
  • Here is the process that never exits:
/usr/bin/qemu-aarch64 /usr/lib/jvm/java-1.8.0-amazon-corretto/bin/java -Dorg.gradle.daemon=false -Dorg.gradle.appname=gradlew -classpath /usr/app/gradle/wrapper/gradle-wrapper.jar org.gradle.wrapper.GradleWrapperMain build -x :bootRepackage -x test --continue

Steps to reproduce this issue

  1. Use the following workflow to build a Docker image from the Dockerfile below, on a runner pod using the image summerwind/actions-runner-dind:v2.283.1-ubuntu-20.04-24602ff:
jobs:
  docker:
    runs-on: [size/lg]
    steps:
      - name: Set up  QEMU
        uses: docker/setup-qemu-action@v1
        with:
          image: tonistiigi/binfmt:latest
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      
      - name: Build/Push
        uses: docker/build-push-action@v2
        with:
          context: ./
          platforms: linux/arm64

Dockerfile:

FROM amazoncorretto:8u302
ENV GRADLE_OPTS=-Dorg.gradle.daemon=false
ENV APP_HOME=/usr/app/
WORKDIR $APP_HOME
COPY . .
RUN ./gradlew build -x :bootRepackage -x test --continue
RUN echo "hello"
  2. Gradle version:
Gradle 3.5
Build time:   2017-04-10 13:37:25 UTC
Revision:     b762622a185d59ce0cfc9cbc6ab5dd22469e18a6
Groovy:       2.4.10
Ant:          Apache Ant(TM) version 1.9.6 compiled on June 29 2015
JVM:          1.8.0_302 (Amazon.com Inc. 25.302-b08)
  3. Once the ./gradlew build -x :bootRepackage -x test --continue command finishes, the GitHub Actions logs show that the build passed, but the runner does not proceed to the next step, echo "hello". When we exec into the runner pod, we can see that the gradlew command is still running. Either the command's exit code is not being received by the QEMU process, or there is some other issue.

Expected behaviour

  • ./gradlew build -x :bootRepackage -x test --continue command should finish and the build should proceed to the next step echo "hello"

Actual behaviour

  • gradlew process is stuck forever

Logs

  • Unfortunately, all the logs look normal, and they contain private build info that I cannot share here
  • As you can see in the screenshot, the build is stuck for hours and only gets terminated when the runner pod is killed

Screen Shot 2021-09-30 at 12 10 52 PM

issue.

workaround

  • We noticed that switching to the image tag qemu-v6.0.0-12 fixed the issue:
- name: Set up  QEMU
  uses: docker/setup-qemu-action@v1
  with:
    image: tonistiigi/binfmt:qemu-v6.0.0-12
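
To try the same pin outside GitHub Actions (for example directly on a self-hosted runner), a minimal sketch along these lines may help. It assumes Docker is available and that the binfmt image accepts the `--install` flag as described in the tonistiigi/binfmt README. The `install_binfmt` helper and the `DRY_RUN` variable are hypothetical names introduced here, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: register QEMU emulators from a pinned binfmt image.
# Defaults mirror the workaround above (tag qemu-v6.0.0-12, arm64 only).
install_binfmt() {
    tag="${1:-qemu-v6.0.0-12}"
    arch="${2:-arm64}"
    # Build the command as positional parameters so it can be printed or run.
    set -- docker run --privileged --rm "tonistiigi/binfmt:${tag}" --install "${arch}"
    if [ -n "${DRY_RUN:-}" ]; then
        # Assumption: DRY_RUN is only a convenience of this sketch.
        echo "$*"
    else
        "$@"
    fi
}

# Example: print the command without running it.
# DRY_RUN=1 install_binfmt
```

Running it once with the pinned tag and once with `latest` on the same host is one way to check whether the hang follows the QEMU version.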
@crazy-max
Member

crazy-max commented Oct 4, 2021

@balusarakesh I don't think that's a QEMU issue; it looks more like the Gradle daemon. You should avoid the Gradle daemon inside a Dockerfile so the process is terminated after the build completes:

FROM amazoncorretto:8u302
ENV GRADLE_OPTS=-Dorg.gradle.daemon=false
ENV APP_HOME=/usr/app/
WORKDIR $APP_HOME
COPY . .
RUN ./gradlew build --no-daemon -x :bootRepackage -x test --continue

Edit: My bad, I just saw ENV GRADLE_OPTS=-Dorg.gradle.daemon=false.

@crazy-max
Member

@balusarakesh echo "hello" is not valid Dockerfile syntax. It should be RUN echo "hello".

@balusarakesh
Author

@balusarakesh echo "hello" is not a valid syntax. Should be RUN echo "hello".

@crazy-max

  • I posted the wrong syntax in the ticket description, but I'm using the correct syntax in my Dockerfile
  • just saw your comment here
  • this doesn't feel like an issue with the runner because the workflow finishes sometimes
  • is there a way to enable debug logging on the qemu side to see what's going on?

Also, we noticed a weird phenomenon:

  • While the gradlew build command is running, the java process uses a lot of CPU, around 3-4 cores
  • Once the build finishes, we see the BUILD SUCCESSFUL message in the workflow logs (first screenshot) and CPU usage for the runner drops to a few millicores (second screenshot) - this clearly means the gradlew command finished running
  • At this point I don't see any java process in the top command, BUT I still see the /usr/bin/qemu-aarch64 /usr/lib/jvm/java-1.8.0-amazon-corretto/bin/java -Dorg.gradle.daemon=false -Dorg.gradle.appname=gradlew -classpath /home/app/gradle/wrapper/gradle-wrapper.jar org.gradle.wrapper.GradleWrapperMain bootRepackage command running
  • I'm thinking, what if the qemu binary thinks that the gradlew command is still running even though it is not? Maybe enabling debug logging will tell us something?

Screen Shot 2021-10-04 at 1 08 04 PM

Screen Shot 2021-10-04 at 1 07 21 PM
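
One way to check whether the emulated process is really idle or gone is to look at its state and cumulative CPU time from inside the runner pod. The sketch below is only an illustration: it assumes a `ps` that supports the POSIX `-o` output fields, and `emulated_procs` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `ps` output on stdin and keep only processes whose
# executable (5th column) matches the emulator name, qemu-aarch64 by default.
# Prints: PID, state, elapsed time, cumulative CPU time, emulated binary.
emulated_procs() {
    awk -v pat="${1:-qemu-aarch64}" 'index($5, pat) > 0 { print $1, $2, $3, $4, $6 }'
}

# Example: take a snapshot inside the runner pod.
# ps -e -o pid= -o stat= -o etime= -o time= -o args= | emulated_procs
```

Taking two snapshots a minute apart and comparing the CPU time column shows whether the process is still doing work. A state starting with S and an unchanged CPU time would point to a process that is alive but waiting, while Z would indicate a zombie that has exited but has not been reaped.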

@crazy-max
Member

crazy-max commented Oct 4, 2021

this doesn't feel like an issue with the runner because the workflow finishes sometimes

I'd like to be sure whether it works on a standard GitHub-hosted runner, because I don't know how your self-hosted runner has been configured.

is there a way to enable debug logging on the qemu side to see what's going on?

No, but you can take a look at the BuildKit container logs and post them here.

now I don't see any java process running in top command BUT I still see the /usr/bin/qemu-aarch64 /usr/lib/jvm/java-1.8.0-amazon-corretto/bin/java -Dorg.gradle.daemon=false -Dorg.gradle.appname=gradlew -classpath /home/app/gradle/wrapper/gradle-wrapper.jar org.gradle.wrapper.GradleWrapperMain bootRepackage command running

Hmm, that doesn't feel like an issue with QEMU to me, but something looks odd with Gradle. Can you post the full logs of the workflow and, if you can, give me the link to your repo? Thanks.

@balusarakesh
Author

  • I was wrong about the java process not running: it was still running, but it was not using any CPU
  • I've enabled debug logging for buildx and am uploading the logs here:
    9_Set up Docker Buildx.txt

here's a successful build which passed just now (I changed nothing except for enabling buildkit logs):
Screen Shot 2021-10-04 at 1 39 29 PM

  • unfortunately this is a private repo and the logs have proprietary data that I cannot share :(
  • Let me know if you think there is anything else I can look into (any alternatives to qemu you can suggest?)

@crazy-max
Member

I've enabled debug logging for buildx and uploading them here:

I need the whole logs, not just the setup-buildx step, because the BuildKit container logs are only available at the end of the job, as explained here. Can you post the whole archive?

Mahoney added a commit to Mahoney/docker-multiarch-java-issue that referenced this issue Feb 9, 2022
@crazy-max
Member

You can set ENV QEMU_STRACE=1; it might print some additional info.
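
As a rough sketch of how that could be scoped to a single command instead of the whole image, the wrapper below sets `QEMU_STRACE=1` only for the command it runs and saves stderr to a file. It assumes the QEMU user-mode emulator writes its syscall trace to stderr, which should be its default log destination. `run_with_qemu_strace` and the log path are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with QEMU_STRACE=1 in its environment
# only, redirecting stderr (where the trace is assumed to go) to a log file.
run_with_qemu_strace() {
    log="$1"
    shift
    QEMU_STRACE=1 "$@" 2>"$log"
}

# Example for a Dockerfile RUN step (the log path is an assumption):
# run_with_qemu_strace /tmp/qemu-strace.log ./gradlew build -x :bootRepackage -x test --continue
```

Note that Gradle's own stderr output ends up in the same file, and the trace can be very large for a JVM build.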

@crazy-max closed this as not planned on May 2, 2022
@shlyakpavel

Can paper-plane-developers/paper-plane#284 be related?
