Upgrade kubernetes to v1.10.0 #68

Closed
wants to merge 2 commits into from

Conversation

pgayvallet
Contributor

No description provided.

Signed-off-by: pgayvallet <pierre.gayvallet@gmail.com>
Signed-off-by: pgayvallet <pierre.gayvallet@gmail.com>
@ijc
Contributor

ijc commented Apr 5, 2018

CI here lacks the capability to actually boot a VM with Kube in a reasonable time, so I've taken to running the tests from the tests directory (simply an rtf run in there) on exciting PRs such as this.

Unfortunately the result here is not good. There are 4 tests covering the cross-product of Docker vs CRI for runtime and bridge vs weave for networking. I'm seeing:

test$ rtf run
LABELS: linux, Debian, testing, amd64
ID: c864d026-6689-4a06-86b3-f4031df35885
[FAIL    ] kubernetes.smoke.cri-bridge 73.32s
[FAIL    ] kubernetes.smoke.cri-weave 76.40s
[FAIL    ] kubernetes.smoke.docker-bridge 1922.39s
[FAIL    ] kubernetes.smoke.docker-weave 1919.32s
[SUMMARY ] LogDir: c864d026-6689-4a06-86b3-f4031df35885
[SUMMARY ] Version: testing
[SUMMARY ] Passed: 0
[SUMMARY ] Failed: 4
[SUMMARY ] Cancelled: 0
[SUMMARY ] Skipped: 0
[SUMMARY ] Duration: 3991.75s
Some tests failed

results.zip

The cri ones both failed with:

[STDOUT  ] 2018-04-05T12:39:08.881590376+01:00: [init] This might take a minute or longer if the control plane images have to be pulled.
[STDOUT  ] 2018-04-05T12:39:10.186350065+01:00: Connection to localhost closed.
[STDOUT  ] 2018-04-05T12:39:10.187096495+01:00: FAIL kubeadm-init.sh (eof)

While the docker ones both failed with:

[STDOUT  ] 2018-04-05T12:41:12.299059146+01:00: [init] This might take a minute or longer if the control plane images have to be pulled.
[STDOUT  ] 2018-04-05T13:11:12.302084127+01:00: 
[STDOUT  ] 2018-04-05T13:11:12.303340814+01:00: Unfortunately, an error has occurred:
[STDOUT  ] 2018-04-05T13:11:12.303361383+01:00:         timed out waiting for the condition
[STDOUT  ] 2018-04-05T13:11:12.303371730+01:00: 
[STDOUT  ] 2018-04-05T13:11:12.303382034+01:00: This error is likely caused by:
[STDOUT  ] 2018-04-05T13:11:12.303392927+01:00:         - The kubelet is not running
[STDOUT  ] 2018-04-05T13:11:12.303402566+01:00:         - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
[STDOUT  ] 2018-04-05T13:11:12.303414947+01:00:         - Either there is no internet connection, or imagePullPolicy is set to "Never",
[STDOUT  ] 2018-04-05T13:11:12.303424089+01:00:           so the kubelet cannot pull or find the following control plane images:
[STDOUT  ] 2018-04-05T13:11:12.303433422+01:00:                 - k8s.gcr.io/kube-apiserver-amd64:v1.10.0
[STDOUT  ] 2018-04-05T13:11:12.303442569+01:00:                 - k8s.gcr.io/kube-controller-manager-amd64:v1.10.0
[STDOUT  ] 2018-04-05T13:11:12.303451511+01:00:                 - k8s.gcr.io/kube-scheduler-amd64:v1.10.0
[STDOUT  ] 2018-04-05T13:11:12.303460277+01:00:                 - k8s.gcr.io/etcd-amd64:3.1.12 (only if no external etcd endpoints are configured)
[STDOUT  ] 2018-04-05T13:11:12.303469419+01:00: 
[STDOUT  ] 2018-04-05T13:11:12.303478832+01:00: If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
[STDOUT  ] 2018-04-05T13:11:12.303507289+01:00:         - 'systemctl status kubelet'
[STDOUT  ] 2018-04-05T13:11:12.303516004+01:00:         - 'journalctl -xeu kubelet'
[STDOUT  ] 2018-04-05T13:11:12.303525415+01:00: couldn't initialize a Kubernetes cluster
[STDOUT  ] 2018-04-05T13:11:12.307491683+01:00: linuxkit-e694aaf0b765:/# ESC[6nFAIL kubeadm-init.sh (returned to prompt)

Not sure what's going on here -- did you manage to boot/initialise a node with this?

I'm going to repeat the tests in case I did something wrong with my builds.

@ijc
Contributor

ijc commented Apr 5, 2018

I repeated the run and got the same first two failures. Looking at the in-progress logs for the third, I fully expect it is going to fail after ~1900s too.

@ijc
Contributor

ijc commented Apr 5, 2018

FWIW master (018027c) passes; starting to dig a bit deeper now.

@ijc
Contributor

ijc commented Apr 5, 2018

I don't think this PR is complete; e.g. /var/log/kubelet.err.log contains entries like:

Flag --pod-manifest-path has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --allow-privileged has been deprecated, will be removed in a future version
Flag --cluster-dns has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cluster-domain has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cgroups-per-qos has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --enforce-node-allocatable has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cadvisor-port has been deprecated, The default will change to 0 (disabled) in 1.12, and the cadvisor port will be removed entirely in 1.13
Flag --kube-reserved-cgroup has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --system-reserved-cgroup has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cgroup-root has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.

I don't think they are the cause of the failure, but they seem like something that should be addressed as part of this upgrade.
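
For reference, a minimal sketch (not something this PR does yet) of moving those settings into a KubeletConfiguration file and passing it via --config; field names are taken from the kubelet.config.k8s.io/v1beta1 type introduced in 1.10, and the values are purely illustrative, not the ones the LinuxKit kubelet actually uses:

# Hypothetical sketch only: field names per kubelet.config.k8s.io/v1beta1,
# values illustrative (kubeadm-style defaults).
cat > /etc/kubernetes/kubelet-config.yaml <<'EOF'
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
staticPodPath: /etc/kubernetes/manifests   # replaces --pod-manifest-path
clusterDNS: [10.96.0.10]                   # replaces --cluster-dns
clusterDomain: cluster.local               # replaces --cluster-domain
cgroupsPerQOS: true                        # replaces --cgroups-per-qos
enforceNodeAllocatable: [pods]             # replaces --enforce-node-allocatable
cgroupRoot: /                              # replaces --cgroup-root
EOF

# --allow-privileged has no config-file equivalent, so it stays on the command line.
kubelet --config=/etc/kubernetes/kubelet-config.yaml --allow-privileged=true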

@ijc
Contributor

ijc commented Apr 5, 2018

I wonder if this is the root issue:

E0405 13:49:04.072893     638 remote_runtime.go:209] StartContainer "4ea4bde9b43a9eb241a5d7d98abf87184938f85ce9139949a3a246b6fe6b8985" from runtime service failed: rpc error: code = Unknown desc = failed to start container "4ea4bde9b43a9eb241a5d7d98abf87184938f85ce9139949a3a246b6fe6b8985": Error response from daemon: linux mounts: path /etc/kubernetes/pki/etcd is mounted on /etc/kubernetes but it is not a shared or slave mount
E0405 13:49:04.072979     638 kuberuntime_manager.go:733] container start failed: RunContainerError: failed to start container 4ea4bde9b43a9eb241a5d7d98abf87184938f85ce9139949a3a246b6fe6b8985": Error response from daemon: linux mounts: path /etc/kubernetes/pki/etcd is mounted on /etc/kubernetes but it is not a shared or slave mount
E0405 13:49:04.073043     638 pod_workers.go:186] Error syncing pod 5afaaee6f646c04b39d2a4753f62ca14 ("etcd-linuxkit-b6e09efea36e_kube-system(5afaaee6f646c04b39d2a4753f62ca14)"), skipping: failed to "StartContainer" for "etcd" with RunContainerError: "failed to start container \"4ea4bde9b43a9eb241a5d7d98abf87184938f85ce9139949a3a246b6fe6b8985\": Error response from daemon: linux mounts: path /etc/kubernetes/pki/etcd is mounted on /etc/kubernetes but it is not a shared or slave mount"
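
If that is it, one possible workaround (a sketch only, not verified here) would be to make the mount that backs /etc/kubernetes shared, so that docker accepts the nested /etc/kubernetes/pki/etcd bind source:

# Sketch: assumes /etc/kubernetes is its own mount point, as the error suggests.
mount --make-rshared /etc/kubernetes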

@ijc
Contributor

ijc commented Apr 5, 2018

In https://kubernetes.io/docs/imported/release/notes/#before-upgrading I notice:

[action-required] The Container Runtime Interface (CRI) version has increased from v1alpha1 to v1alpha2. Runtimes implementing the CRI will need to update to the new version, which configures container namespaces using an enumeration rather than booleans. (#58973, @verb)

So it may be that there is a dependency on a newer CRI (which I think might mean the one embedded in containerd v1.1).

Contributor

@ijc ijc left a comment


We need to sort out the rtf test failures before merging.

@ijc
Contributor

ijc commented Apr 5, 2018

I think I've found a fix (well, more of a workaround really) for the docker parts; I still need to dig into the cri half.

@ijc
Contributor

ijc commented Apr 9, 2018

Lots more changes are needed to make this work, so I'm going to carry this one.

@ijc ijc closed this Apr 9, 2018
@ijc ijc mentioned this pull request Apr 9, 2018