Fallback to default registry endpoint is broken when using "*" wildcard mirror in registries.yaml with containerd 2.0
#11857
Comments
After changing the

```yaml
mirrors:
  "*":
    endpoint:
      - "http://localhost:5000"
configs:
  "docker.io":
  "quay.io":
  "*":
    tls:
      insecure_skip_verify: true
```

to

```yaml
#mirrors:
#  "*":
#    endpoint:
#      - "http://localhost:5000"
configs:
  "docker.io":
  "quay.io":
  "*":
    tls:
      insecure_skip_verify: true
```

all newer Deployments are working now. |
Did you disable fallback to the default endpoint? I would probably increase the containerd log level to see what it is actually doing. |
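For reference, the fallback being asked about can be turned off in k3s itself; a hedged sketch of what that would look like in `/etc/rancher/k3s/config.yaml` (the `disable-default-registry-endpoint` option name is taken from recent k3s documentation, not from this thread):

```yaml
# Hypothetical example: if set, containerd uses ONLY the configured
# mirror endpoints and never falls back to the upstream registry.
# (assumption: option name as documented in recent k3s releases)
disable-default-registry-endpoint: true
```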
Hi Brandon, no I didn't disable the fallback; here is my `root@staging1:~# cat /etc/rancher/k3s/config.yaml`:

```yaml
flannel-backend: none
disable-kube-proxy: true
disable-network-policy: true
disable-helm-controller: true
disable:
  - servicelb
  - traefik
tls-san:
  - localhost
  - 127.0.0.1
  - cluster
  - 10.10.1.254
  - staging1
  - 172.16.0.1
  - 192.168.122.246
  - staging2
  - 172.16.0.2
  - 192.168.122.90
bind-address: "172.16.0.1"
node-ip: "172.16.0.1"
node-external-ip: "172.16.0.1"
#kubelet-arg:
#- "node-ip=172.16.0.1"
#cluster-cidr: "10.42.0.0/16" # <- managed by cilium ipam
cluster-dns: "10.43.0.10"
service-cidr: "10.43.0.0/16"
egress-selector-mode: cluster
debug: false
#datastore-endpoint: "mysql://k3s:aeh0Eu$p3O@tcp(cluster:3306)/k3s"
datastore-endpoint: "http://172.16.0.1:2379,https://172.16.0.2:2379"
```

That's why I am wondering why this worked in the previous version. |
If you look at the release notes, you should see that we upgraded from containerd 1.7 to 2.0 in this release. Lots of changes there. Check the logs to see what exactly it's doing. Could be a regression in fallback to the default. Do you see the same thing if you explicitly list your mirror as a mirror for docker hub, instead of using the wildcard? |
So I assume the `registries.yaml` is read by k3s itself? |
It is read by k3s, but its contents are pretty much exclusively used to generate the containerd configuration file. It is containerd that actually pulls and runs images. |
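To illustrate the reply above: k3s renders each `registries.yaml` entry into containerd's host configuration. A hedged sketch of what a `docker.io` mirror roughly becomes under containerd 2.x (the path and exact keys are assumptions based on containerd's `certs.d` layout, not verbatim k3s output):

```toml
# Sketch: /var/lib/rancher/k3s/agent/etc/containerd/certs.d/docker.io/hosts.toml
# "server" is the default upstream endpoint containerd is expected to
# fall back to when every mirror host misses.
server = "https://registry-1.docker.io"

[host."http://localhost:5000"]
  capabilities = ["pull", "resolve"]
```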
Ok can't find much, the
So it seems it can't use the local registry as a mirror; the reason is unknown. The
|
this is just an Alpine test image I created before. So now it looks like the fallback doesn't work anymore, correct? |
here the |
and here the k3s |
I am currently reading the containerd migration docs; there is a section, https://containerd.io/releases/#deprecated-features, which mentions a change, so we should use CONTAINERD_ENABLE_DEPRECATED_PULL_SCHEMA_1_IMAGE=1. So I added that environment variable to |
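For anyone trying the same thing: a hedged sketch of setting that environment variable for a systemd-managed k3s (assuming a standard `k3s.service` installation; the drop-in path is the usual systemd convention, not something shown in this thread):

```ini
# /etc/systemd/system/k3s.service.d/override.conf
# k3s launches containerd as a child process, so environment variables
# set on the service should propagate to it.
[Service]
Environment="CONTAINERD_ENABLE_DEPRECATED_PULL_SCHEMA_1_IMAGE=1"
```

followed by `systemctl daemon-reload` and a restart of the k3s service.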
Also affects v1.31.6+k3s1 with containerd://2.0.2-k3s2. After deleting /etc/rancher/k3s/registries.yaml and restarting the agents/servers, everything works fine, but without the integrated mirror. |
Same problem here after upgrading from 1.31.5 to 1.31.6. The embedded registry mirror is busted, and any images that are not already cached immediately get 404 not found when I try to pull. This is all that's in my registries.yaml:
|
@sholdee what do you mean by "embedded registry mirror is busted"? The conversation here so far has not involved the embedded registry mirror (spegel) at all. If you are having a similar problem with that, please provide concrete details. @lirtistan were you able to try listing your registry as a mirror for docker hub, instead of using the wildcard? I suspect perhaps only wildcard support is broken in containerd 2.0. I will also note that you don't need to set that. Try this:

```yaml
mirrors:
  docker.io:
    endpoint:
      - "http://localhost:5000"
  quay.io:
    endpoint:
      - "http://localhost:5000"
```
|
Sure, give me some minutes to verify; I'm actually at dinner. |
"*"
in registries.yaml with containerd 2.0
"*"
in registries.yaml with containerd 2.0"*"
wildcard mirror in registries.yaml with containerd 2.0
@brandond I can verify that the Deployments are now working with your suggestion, tyvm for investigating ❤️ |
After 1.31.5 > 1.31.6 upgrade, all images that are not already cached fail to pull with
Pulling images with
My containerd/registries configuration is all default except for enabling Spegel and the following registries.yaml on all my nodes:
Either removing registries.yaml and restarting k3s or removing |
@sholdee please see the comment you responded to, and let me know if using explicit mirror entries instead of the wildcard works around the issue for you. This does not appear to have anything to do with the embedded registry; rather, containerd 2.0 is failing to fall back to the default endpoint when using the wildcard. |
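For context on the suspected regression: a `"*"` mirror in `registries.yaml` is rendered (roughly; this is a hedged sketch, and the `_default` directory name and keys are assumptions based on containerd's fallback host configuration, not verbatim k3s output) as a catch-all host file:

```toml
# Sketch: .../etc/containerd/certs.d/_default/hosts.toml
# Expected behavior: try localhost:5000 first, then fall back to the
# registry's own default endpoint. The reported regression: the fallback
# never happens and the pull fails with "not found".
[host."http://localhost:5000"]
  capabilities = ["pull", "resolve"]
```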
This does seem to fix the issue:

```console
ethan@k3s-worker-1:~ $ sudo cat /etc/rancher/k3s/registries.yaml
mirrors:
  "*":
ethan@k3s-worker-1:~ $ sudo crictl pull quay.io/coreos/etcd:v3.6.0-rc.1
E0228 12:43:09.049821 1023110 log.go:32] "PullImage from image service failed" err="rpc error: code = NotFound desc = failed to pull and unpack image \"quay.io/coreos/etcd:v3.6.0-rc.1\": failed to resolve reference \"quay.io/coreos/etcd:v3.6.0-rc.1\": quay.io/coreos/etcd:v3.6.0-rc.1: not found" image="quay.io/coreos/etcd:v3.6.0-rc.1"
FATA[0000] pulling image: rpc error: code = NotFound desc = failed to pull and unpack image "quay.io/coreos/etcd:v3.6.0-rc.1": failed to resolve reference "quay.io/coreos/etcd:v3.6.0-rc.1": quay.io/coreos/etcd:v3.6.0-rc.1: not found
ethan@k3s-worker-1:~ $ sudo nano /etc/rancher/k3s/registries.yaml
ethan@k3s-worker-1:~ $ sudo cat /etc/rancher/k3s/registries.yaml
mirrors:
  docker.io:
  quay.io:
  ghcr.io:
  gcr.io:
  registry.k8s.io:
  public.ecr.aws:
  oci.external-secrets.io:
ethan@k3s-worker-1:~ $ sudo systemctl restart k3s-agent
ethan@k3s-worker-1:~ $ sudo crictl pull quay.io/coreos/etcd:v3.6.0-rc.1
Image is up to date for sha256:f3788da74c9c2dce76fc84ff0ff64636641ed52412438d88cb69c475e848cd56
ethan@k3s-worker-1:~ $
```
|
Thanks, that confirms what I thought was going on. I can take a look at where this is broken; I suspect we may need to open an issue/PR against https://github.com/containerd/containerd to resolve this regression if it is not already addressed for the upcoming v2.0.3 release. |
Environmental Info:
K3s Version:

Node(s) CPU architecture, OS, and Version:
2 node test cluster, `uname -r` reporting `6.1.0-30-amd64`. Both installed with a minimal Debian 12 OS (ansible deployment).

Cluster Configuration:
`lo` plus two interfaces: `eth0` for WAN traffic and `eth1` for LAN traffic.

Describe the bug:
New workload Deployments in K3s v1.32.2+k3s1 are failing/hanging in `ContainerCreating` status, because something must have changed with the format of the `registries.yaml` config.

Output from a `cilium-agent` Pod describe:

So I moved `/etc/rancher/k3s/registries.yaml` to another location, restarted the `k3s.service`, and voilà, everything got pulled.

Content of the `registries.yaml`:

Steps To Reproduce:
See bug description above.

Expected behavior:
`registries.yaml` hasn't changed between my earlier deployments, nor does the documentation mention any change, so everything should keep working.

Actual behavior:
Images can't be pulled; the root cause is actually unknown, and currently I don't have much time to dive into the code.

Additional context / logs:
See bug description above.