
Unreliable DNS during Container Builds #101

Open
tpanum opened this issue Dec 11, 2023 · 11 comments

Comments

@tpanum

tpanum commented Dec 11, 2023

I have a GitHub Actions pipeline that roughly proceeds like so:

  1. Connect to a private Tailscale network using this GitHub Action.
  2. Start a Docker build of a multi-stage Dockerfile in which an internal Python package index[1] is accessed to pull dependencies.

[1]: This package index is available within the Tailscale network and is resolved via an internal DNS server configured in Tailscale.

Step 2 occasionally fails because DNS lookups for the package index fail, so DNS resolution is not working reliably.
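For reference, a minimal sketch of such a workflow (the action version, OAuth inputs, and the index hostname in the comment are assumptions, not from the report):

```yaml
- name: Connect to the tailnet
  uses: tailscale/github-action@v2
  with:
    oauth-client-id: ${{ secrets.TS_OAUTH_CLIENT_ID }}
    oauth-secret: ${{ secrets.TS_OAUTH_SECRET }}
    tags: tag:ci

- name: Build image
  # The Dockerfile pip-installs from an internal index reachable only
  # through the tailnet, e.g. pypi.internal.example (hypothetical name).
  run: docker build -t myimage .
```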

@mloberg

mloberg commented Jan 12, 2024

Are you using buildx? I was having this issue, but it seems to be resolved by setting network=host in driver-opts.

@tpanum
Author

tpanum commented Jan 13, 2024

I have been using buildx, yeah. I have tried network=host in the past, but in my experience it did not make things more reliable. I ended up resolving the IP with dig example.com and then passing it to the build with --add-host.
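That workaround can be sketched as follows: a small, hypothetical helper that resolves the hostname on the runner (where the tailnet resolver works) and pins the answer into the build via --add-host, so the build itself needs no DNS lookup. The function and hostname are illustrative, not from the thread.

```python
import socket
import subprocess

def build_with_pinned_host(hostname: str, context: str = ".") -> None:
    """Resolve `hostname` using the runner's resolver, then run
    `docker build` with the answer pinned via --add-host."""
    ip = socket.gethostbyname(hostname)  # resolved outside the build
    subprocess.run(
        ["docker", "build", f"--add-host={hostname}:{ip}", context],
        check=True,
    )

# Flag construction only (no docker invocation here):
print(f"--add-host=localhost:{socket.gethostbyname('localhost')}")
```

This sidesteps BuildKit's DNS handling entirely, at the cost of baking a point-in-time answer into the build.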

@henworth

I was dealing with essentially the same issue (container builds that rely on internal resources failing DNS resolution), and what finally fixed it in my case was configuring buildx with the internal DNS hosts:

- name: Setup buildx with internal DNS
  uses: docker/setup-buildx-action@v3
  with:
    config-inline: |
      [dns]
        nameservers="<comma separated list>"

@kdpuvvadi

kdpuvvadi commented Mar 26, 2024

Even that is inconsistent; it works sometimes, but mostly it doesn't.

For some reason, it's using 168.63.129.16 for dns resolution.

@tpanum
Author

tpanum commented Mar 27, 2024

Following @henworth's example, I had to change it slightly to get it working:

with:
  buildkitd-config-inline: |
    [dns]
      nameservers=["..."]

@kdpuvvadi

kdpuvvadi commented Mar 27, 2024

Still the same though, @tpanum.

Mine looks like this:

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3.2.0
  with:
    buildkitd-config-inline: |
      [dns]
        nameservers=["100.78.xx.xx","100.78.xx.xx"]

And it errors out:

#45 ERROR: failed to push git.local.puvvadi.net/***/blog:e234247: failed to do request: Head "https://git.local.puvvadi.net/v2/***/blog/blobs/sha256:bbba97e7b63ba8e2a28aa20a0a10b6ba491f29a395e8f7cc5bdf6ff4fe783000": dial tcp: lookup git.local.puvvadi.net on 168.63.129.16:53: no such host

@marcelofernandez

We've also seen this same issue on docker run GitHub Actions steps, so it's not only a docker build issue. For example, a job containing this bash step:

runner@fv-az1797-395:~/work/repo/repo$ docker run -it --rm \
                --env ENV_VAR \
                <aws_account_id>.dkr.ecr.us-east-2.amazonaws.com/repo:latest /bin/bash
root@35f3013f9caf:/opt/code# cat /etc/resolv.conf
# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 168.63.129.16
nameserver 100.100.100.100
search vnw05d5vvvpeplv1mpaxmbipab.bx.internal.cloudapp.net tail1abc1.ts.net
root@35f3013f9caf:/opt/code#

As a result, we had many issues resolving Tailscale hosts inside this container. Fortunately, we managed to fix it using the --dns docker run option:

runner@fv-az1797-395:~/work/repo/repo$ docker run --dns=100.100.100.100 -it --rm \
                --env ENV_VAR \
                <aws_account_id>.dkr.ecr.us-east-2.amazonaws.com/repo:latest /bin/bash
root@35f3013f9caf:/opt/code# cat /etc/resolv.conf
# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 100.100.100.100
search vnw05d5vvvpeplv1mpaxmbipab.bx.internal.cloudapp.net tail1abc1.ts.net
root@35f3013f9caf:/opt/code#

Regards

@jaxxstorm
Contributor

This also creates a scenario where any action with

runs:
  using: docker

is unreliable.

@jaxxstorm
Contributor

jaxxstorm commented Jul 7, 2024

I think I've discovered why this happens, at least in my case.

The GitHub Actions runners are on 172.17.0.0/16:

===== Network Interfaces =====
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo 
       valid_lft forever preferred_lft forever
4: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever
===== Routing Table =====
default via 172.17.0.1 dev eth0 
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.2 

I have a subnet router that was advertising that same CIDR. As soon as I connected Tailscale via the action, any Docker-based workload could no longer reach the DNS server inside the runner:

Status: Downloaded newer image for nicolaka/netshoot:latest
===== DNS Configuration =====
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

nameserver 168.63.129.16
search grvplcbrxqwulonopmolb0o12f.dx.internal.cloudapp.net tail9e93b.ts.net

# Based on host file: '/run/systemd/resolve/resolv.conf' (legacy)
# Overrides: []
===== Network Interfaces =====
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo 
       valid_lft forever preferred_lft forever
5: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever
===== Routing Table =====
default via 172.17.0.1 dev eth0 
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.2 
===== DNS Resolution Test =====
;; communications error to 168.63.129.16#53: timed out
;; communications error to 168.63.129.16#53: timed out
;; communications error to 168.63.129.16#53: timed out
;; no servers could be reached


;; communications error to 168.63.129.16#53: timed out
;; communications error to 168.63.129.16#53: timed out
;; communications error to 168.63.129.16#53: timed out
;; no servers could be reached
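The CIDR collision described above can be checked programmatically before connecting the tailnet. A minimal sketch using Python's ipaddress module; the advertised route list is an assumption matching this comment, not output from any tool:

```python
import ipaddress

# Default docker0 bridge subnet on the runner.
docker_bridge = ipaddress.ip_network("172.17.0.0/16")

# Routes advertised by the Tailscale subnet router (assumed example).
advertised_routes = [ipaddress.ip_network("172.17.0.0/16")]

for route in advertised_routes:
    if docker_bridge.overlaps(route):
        # An overlapping route hijacks container traffic to the runner's
        # DNS server, producing the timeouts shown in the log above.
        print(f"warning: {route} collides with the Docker bridge {docker_bridge}")
```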

@youp-augur

@kdpuvvadi I had a similar issue and the solution was to set network=host on the Docker BuildX setup step

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
        with:
          driver-opts: |
            network=host

@kdpuvvadi

@kdpuvvadi I had a similar issue and the solution was to set network=host on the Docker BuildX setup step


This is not reliable at all; hit or miss. I switched to self-hosted runners, and local DNS always worked there.
