
Clear warning required for containerd documentation #27017

Closed
aaabdallah opened this issue Mar 12, 2021 · 9 comments · Fixed by #27073
Labels
kind/support Categorizes issue or PR as a support question.
language/en Issues or PRs related to English language
needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done.
sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@aaabdallah

This is a Feature Request

What would you like to be added

The documentation for container runtimes at:

https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd-systemd

needs a clear warning that adding the suggested change to config.toml will actually PREVENT kubeadm init from successfully initializing a cluster (this has been my experience multiple times with K8s 1.20.4 on Ubuntu 20.04 with containerd, not Docker).

Why is this needed

The current wording does not indicate that the suggested change for containerd's config.toml will actually break things.

Comments

It is not even clear why that suggestion is there, given that it stops kubeadm init from working.

@k8s-ci-robot
Contributor

@aaabdallah: This issue is currently awaiting triage.

SIG Docs takes a lead on issue triage for this website, but any Kubernetes member can accept issues by applying the triage/accepted label.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage label Mar 12, 2021
@sftim
Contributor

sftim commented Mar 12, 2021

The current text reads:

> When using kubeadm, manually configure the cgroup driver for kubelet.

How would you reword this @aaabdallah? How about:

> When using kubeadm, you must manually configure the cgroup driver for kubelet.

/language en
/priority awaiting-more-evidence
/sig cluster-lifecycle

@k8s-ci-robot k8s-ci-robot added the language/en, priority/awaiting-more-evidence, and sig/cluster-lifecycle labels Mar 12, 2021
@aaabdallah
Author

> The current text reads:
>
> > When using kubeadm, manually configure the cgroup driver for kubelet.
>
> How would you reword this @aaabdallah? How about:
>
> > When using kubeadm, you must manually configure the cgroup driver for kubelet.

Thank you for all the work you do on Kubernetes and all the other open source projects you contribute to. Your efforts are really appreciated.

My own experience is that the configuration snippet there is actually harmful when using kubeadm. I realize now that the page discusses how to use containerd with the systemd cgroup driver in general... but the typical person coming to the kubernetes.io documentation is there to learn how to use containerd in the context of Kubernetes. That makes the configuration snippet a bit out of place, since it will actually prevent the use of kubeadm.

For this reason, I believe it is better to be crystal clear and explicitly mention all of that.

"To use the systemd cgroup driver...

[plugins...
...
SystemdCgroup = true

However, when using kubeadm to initialize the cluster, the above configuration should NOT be applied (it will actually prevent the kubelet from bringing up static pods). Instead, manually configure the cgroup driver for kubelet."
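
For reference, "manually configure the cgroup driver for kubelet" with kubeadm means passing a kubeadm config file like the minimal sketch below (the same pattern appears in the verification log further down this thread):

$ cat config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd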

@neolit123
Member

neolit123 commented Mar 13, 2021

I don't see anything invalid on the page, but ideas for clarification are welcome.

https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers

> When systemd is chosen as the init system for a Linux distribution, the init process generates and consumes a root control group (cgroup) and acts as a cgroup manager
> ...

^ the above already explains when the systemd driver is needed.

> My own experience is that the configuration snippet there is actually harmful when using kubeadm.

It is the opposite: if you are using kubeadm, you must configure containerd with SystemdCgroup = true.

You then pass the same driver to the kubelet as explained here:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd-systemd
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-control-plane-node

Cgroup drivers have been confusing users for a long time, and this is really a problem in the CRI spec.

There is a plan to handle them automatically, but this might take a while to complete.
kubernetes/kubernetes#99808
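
For concreteness, the containerd side of that pairing is the following stanza in /etc/containerd/config.toml (quoted again further down the thread); containerd must be restarted for it to take effect:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true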

@neolit123
Member

neolit123 commented Mar 13, 2021

This area of the docs is seeing some refactoring, so you can add comments here too:
#26786

@aaabdallah
Author

> It is the opposite: if you are using kubeadm, you must configure containerd with SystemdCgroup = true.

This is the source of our difference:

I have tried that setting multiple times (at least 4 times). It does NOT work for me on Ubuntu 20.04. When I do NOT specify that setting in containerd's config.toml, then and ONLY THEN does it work... for me.

@neolit123
Member

neolit123 commented Mar 13, 2021

> I have tried that setting multiple times (at least 4 times). It does NOT work for me on Ubuntu 20.04. When I do NOT specify that setting in containerd's config.toml, then and ONLY THEN does it work... for me.

I just tried following our k8s.io documentation for installing containerd / installing kubeadm / creating a kubeadm cluster.

$ containerd --version
containerd containerd.io 1.4.4 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
$ cat /etc/containerd/config.toml | grep Systemd -C 2
          base_runtime_spec = ""
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            SystemdCgroup = true
    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/opt/cni/bin"
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.10
Release:        20.10
Codename:       groovy
$ cat config.yaml 
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
$ kubeadm version && kubelet --version
kubeadm version: &version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4", GitCommit:"e87da0bd6e03ec3fea7933c4b5263d151aafd07c", GitTreeState:"clean", BuildDate:"2021-02-18T16:09:38Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes v1.20.4
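
(The exact init invocation isn't shown in the log above; with a config file like that one, the cluster would typically be brought up with:)

$ sudo kubeadm init --config config.yaml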

Everything works fine for me:

$ kubectl get po -A
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE
kube-system   cilium-fg9s7                         1/1     Running   0          2m20s
kube-system   cilium-operator-67895d78b7-ccn6g     1/1     Running   0          2m20s
kube-system   coredns-74ff55c5b-9ntwp              1/1     Running   0          3m36s
kube-system   coredns-74ff55c5b-w9x75              1/1     Running   0          3m36s
kube-system   etcd-instance-1                      1/1     Running   0          3m37s
kube-system   kube-apiserver-instance-1            1/1     Running   0          3m37s
kube-system   kube-controller-manager-instance-1   1/1     Running   0          3m37s
kube-system   kube-proxy-w7l5k                     1/1     Running   0          3m36s
kube-system   kube-scheduler-instance-1            1/1     Running   0          3m37s

Also, a number of different CI setups run kubeadm like this, so you must be getting something wrong in your setup.
I doubt this is a containerd / OS bug of sorts, but it could be.
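
If it still fails, a hypothetical way to check both sides of the cgroup driver pairing (these commands are not from the thread):

$ sudo journalctl -u kubelet | grep -i cgroup      # a driver mismatch shows up in the kubelet logs
$ grep SystemdCgroup /etc/containerd/config.toml   # what containerd is configured to use
$ grep cgroupDriver /var/lib/kubelet/config.yaml   # what the kubelet is using (file written by kubeadm)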

/kind support

@k8s-ci-robot k8s-ci-robot added the kind/support label Mar 13, 2021
@neolit123
Member

neolit123 commented Mar 13, 2021

Did you restart containerd after you added:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
   SystemdCgroup = true

Step 3 here should come after the driver is changed, so that's a documentation problem:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd-systemd
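
For completeness, on a systemd-managed host the restart step would typically be:

$ sudo systemctl restart containerd
$ sudo systemctl status containerd   # confirm the service came back up cleanly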

@neolit123
Member

> Did you restart containerd after you added:
>
>     [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
>        SystemdCgroup = true
>
> Step 3 here should come after the driver is changed, so that's a documentation problem:
> https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd-systemd

Opened a PR for this:
#27073
(it also closes this issue)
