apk fetch hangs #307
Comments
Are you able to start another container shell and curl dl-cdn.alpinelinux.org? Sounds like a networking issue somewhere.
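For example, something along these lines, using a throwaway container (the MIRRORS.txt path is just a small file that should be reachable on the CDN):

```sh
# Throwaway Alpine container: try to pull a small file from the default CDN.
# busybox wget is in the base image, so nothing needs to be installed first.
docker run --rm alpine wget -O /dev/null http://dl-cdn.alpinelinux.org/alpine/MIRRORS.txt
```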
Seems like a DNS issue. Not sure why, I've set the correct DNS settings in
Got around this by using https
Doesn't seem like a DNS issue, since the name has resolved. Unfortunately, for a while now, I'm at the same point: names are resolved, but I can't connect to anything.
In another issue I thought this could be a DNS issue because the CDN POP IP addresses may change more frequently. If DNS is being cached somewhere and the TTL is not honored, then an outdated IP address may be returned for dl-cdn.alpinelinux.org. This is why I need debugging information to help pinpoint it. When the issue happens, I need the
Success while another process hangs
One more with matching url
I ran into this issue in Kubernetes. I bounced the kube-dns pod to flush any records it might be caching. This fixed the problem for me. EDIT: Actually it didn't. Still having the problem.
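For reference, bouncing the DNS pods amounts to something like this (the namespace and label are the common kubeadm defaults and may differ in your cluster):

```sh
# Delete the cluster DNS pods so the Deployment recreates them,
# dropping anything they may have cached.
kubectl -n kube-system delete pod -l k8s-app=kube-dns
```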
Are you running Docker-in-Docker by any chance? I have this issue only in dind containers.
I only have this with dind + Kubernetes. However, it doesn't happen if I use '--network host' or '--net host'. I am using the weave overlay network.
@andremarianiello thanks for the info, I am also using dind + Kubernetes (though with flannel). Have you enabled
@nogoegst No I haven't. It worked without doing that.
@andremarianiello so you mean you set
I added it to my docker client commands, e.g. 'docker build --network host ...'
I've seen the k8s issue quite a bit. Wireshark shows Fastly getting stuck sending oversized packets with a do-not-fragment flag. I don't think this is OP's issue though, as it's Docker for Windows. I recently started running into a similar issue as well. On Linux, but the behavior is the same: apk fails fetching, mostly on the index. Again, pulled up Wireshark and recreated the problem. I see things going smoothly, then the apk process seems to stop ACK'ing segments from the Fastly server. Fastly starts throttling and resending segments and it lags out. I've never recreated this with curl, but it looks like apk uses a built-in BSD libfetch for its HTTP communications, so maybe there's a bug in there? My understanding of network communication is just enough to get me this far, so here's a link to the Wireshark log of the communications. Hopefully an Alpine dev has a better understanding and can parse out a clue or find the problem.
It seems like Fastly is filtering ICMP need-to-fragment packets, which means that PMTU discovery does not work. This can be a problem if your traffic goes via a network link that has an MTU lower than 1500 (typically tunnels/VPNs, PPPoE and similar). This can be worked around by enabling TCP MSS clamping in the network.
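On a Linux host that forwards the container traffic, enabling that clamping is typically a single iptables rule along these lines (chain and exact placement may need adjusting for your setup):

```sh
# Rewrite the MSS in forwarded TCP SYN packets down to the path MTU,
# so peers never send segments larger than the tunnel/overlay can carry.
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --clamp-mss-to-pmtu
```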
Yeah, I was treating this as a different issue because it has slightly different characteristics and is not the same as #279. The Wireshark link in #307 (comment) shows different traffic behavior. Instead of the traffic getting killed at the bridge, it is never ACK'd by libfetch and Fastly's TCP session gets stuck trying to recover. I don't know if it's even Fastly's fault, as on the surface it seems to be doing the right thing.
Observation: networking is hard.
Where exactly did you put this? I'm facing this on Kubernetes right now, where GitLab spins up a container with Docker running in Docker... driving me nuts for the last 4 hours.
@evanrich My GitLab CI was using docker:dind as a service container, and my main build container had a docker client in it which I used to connect to the service container. My repo has a Dockerfile in it that I need to be built by the GitLab runner. My .gitlab-ci.yaml file contained the command
This builds my docker image. One of my layers in the Dockerfile runs
docker will run the
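Roughly, the build step with the workaround applied looks like this (the image name and tag are placeholders, not taken from the original pipeline):

```sh
# The docker client talks to the dind service; --network host makes the
# RUN steps share the dind container's network namespace, sidestepping the
# nested bridge and its MTU mismatch while the image is being built.
docker build --network host -t registry.example.com/group/project:latest .
```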
I believe that the problem is that in Docker the MTU is lower than on the host. The way this is supposed to work is via path MTU discovery, but Fastly appears to block the PMTU ICMP packets (I guess it is part of their DDoS defence). The way to "fix" this properly is to enable MSS clamping on the host. The other alternative is to use a different mirror that does not block the PMTU traffic.
@ncopa How can we check whether our Docker MTU is lower than our host MTU?
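For anyone who wants to check, one way to compare the two from a shell (interface names here are the usual defaults and may differ on your system):

```sh
# MTU of the host's outward-facing interface
ip link show eth0 | grep -o 'mtu [0-9]*'

# MTU of the default Docker bridge
ip link show docker0 | grep -o 'mtu [0-9]*'

# MTU as seen from inside a container
docker run --rm alpine ip link show eth0
```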
Are you not using Auto DevOps? I haven't specified a .gitlab-ci.yml file yet. I seem to have worked around part of it by switching to alpine.global.ssl.fastly.net, but I get this
and it just hangs at installing binutils every time. Found this: #279. It seems to be a widespread issue in k8s due to the lower MTU. I was able to get slightly further by changing my mirror from a Fastly mirror to mirror.clarkson.edu; builds are running, will update when they finish. Edit: Just finished successfully... build 174 (that's how many times it's taken trying to get this to work).
On Kubernetes one should run these containers with |
Made this small change in the playbook to include the `--network host` parameter in the docker build command, to avoid apk hanging when trying to fetch the Alpine Linux image, as described in [docker-alpine GitHub issue 307](gliderlabs/docker-alpine#307).
Hi, is anyone still running into this issue? It seems to have gone away somehow.
It was still present 20 days ago, see above.
It was still present last week; what about these two days? Could you give it a try 👀
I tried and it worked for these two days. However, it hangs again now.
Hi, thanks for this. It helped my build. I remember finding an article about MTU that may be useful for more information.
@smnbbrv Thanks a lot for the MTU hint, now Drone is finally building...
apk fetch hangs indefinitely for me; it's not because of MTU but a network glitch. My Alpine and apk versions: `apk --version`. I can reproduce the issue by simply shutting down my eth interface while apk fetch is downloading packages: `apk fetch -R linux-lts`. Then I bring eth back up and check that network connectivity is fine, yet the fetch stays hung. UPDATE 2020-06-16:
I was able to narrow down the issue and it is IPv6. If the docker host has IPv6 enabled you are pretty much f**** as apk fetch from inside the container will get stuck trying to fetch over IPv6. APK gets fully stuck without ever timing out or trying to use the IPv4 addresses, which would likely work. That problem is a huge PITA as normal debugging techniques will not give any usable results:
UPDATE, we have a working hack: I can confirm that the https://stackoverflow.com/a/41497555/99834 hack works on both docker and podman, mainly adding
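Assuming the hack referred to is the usual per-container IPv6 disable, it looks roughly like this (the sysctl is the standard kernel knob; the image and fetch command are only illustrative):

```sh
# Turn off IPv6 inside the container so apk/libfetch uses the
# IPv4 addresses of the mirror instead of stalling on IPv6.
docker run --rm --sysctl net.ipv6.conf.all.disable_ipv6=1 alpine \
  apk fetch -R linux-lts
```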
Old problem, but it still happens! For me, none of the options worked!
1. Repository change to any mirror: add a RUN line or join an existing RUN:
2. The one that behaved best was to change the DNS of the image: add a RUN line or join an existing RUN:
3. Change the Docker DNS:
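A sketch of what those three attempts typically look like as shell lines (the mirror host and resolver addresses are placeholders, not recommendations):

```sh
# 1) Point the apk repositories at a different mirror
sed -i 's#dl-cdn.alpinelinux.org#mirror.clarkson.edu#g' /etc/apk/repositories

# 2) Override the resolver inside the image/container
echo 'nameserver 8.8.8.8' > /etc/resolv.conf

# 3) Or set the DNS handed out by the Docker daemon on the host:
#    add  "dns": ["8.8.8.8"]  to /etc/docker/daemon.json and restart Docker
```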
Running the official Drone Helm chart on k3os (v0.11), I had to set the MTU to 1450 for my build to finish and not stall on fetching the APKINDEX.
I had a similar issue. We have a docker-in-docker build container within a Rancher 2 / Kubernetes environment. I had to decrease the MTU of the inner docker service by adding a daemon.json
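A minimal sketch of such a daemon.json for the inner Docker daemon (1400 is only an example value; it needs to be below the MTU of your overlay network):

```sh
# Write the config and restart the inner dockerd so its bridge
# (and therefore the build containers) use the lower MTU.
cat > /etc/docker/daemon.json <<'EOF'
{
  "mtu": 1400
}
EOF
```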
Did the trick, thx!
We just got hit by this, running the Drone docker plugin in a Kubernetes cluster. Decreasing the MTU to the value used by the cluster's overlay network fixed it. What I absolutely do not understand is how it worked for almost a year without this workaround. We didn't change anything about our cluster or Drone setup, or the Alpine versions used in our pipelines. If someone has discovered more information about this, please do share.
After 4 hours of debugging, I managed to solve this by changing this in the gitlab-ci file:
TO
On Codefresh runners you can set
Fetch of the APK index just hangs. I hit this now on an Ubuntu server and with Docker for Windows.
Docker version 17.03.1-ce, build c6d412e