Upgrade kubernetes to v1.10.0 #68

Closed
wants to merge 2 commits into from

Conversation

pgayvallet
Contributor

No description provided.

Signed-off-by: pgayvallet <pierre.gayvallet@gmail.com>
Signed-off-by: pgayvallet <pierre.gayvallet@gmail.com>
@ijc
Contributor

ijc commented Apr 5, 2018

CI here lacks the capability to actually boot a VM with Kube in a reasonable time, so I've taken to running the tests from the tests directory (simply an rtf run in there) on exciting PRs such as this.

Unfortunately the result here is not good. There are 4 tests covering the cross-product of Docker vs CRI for runtime and bridge vs weave for networking. I'm seeing:

test$ rtf run
LABELS: linux, Debian, testing, amd64
ID: c864d026-6689-4a06-86b3-f4031df35885
[FAIL    ] kubernetes.smoke.cri-bridge 73.32s
[FAIL    ] kubernetes.smoke.cri-weave 76.40s
[FAIL    ] kubernetes.smoke.docker-bridge 1922.39s
[FAIL    ] kubernetes.smoke.docker-weave 1919.32s
[SUMMARY ] LogDir: c864d026-6689-4a06-86b3-f4031df35885
[SUMMARY ] Version: testing
[SUMMARY ] Passed: 0
[SUMMARY ] Failed: 4
[SUMMARY ] Cancelled: 0
[SUMMARY ] Skipped: 0
[SUMMARY ] Duration: 3991.75s
Some tests failed

results.zip

The cri ones both failed with:

[STDOUT  ] 2018-04-05T12:39:08.881590376+01:00: [init] This might take a minute or longer if the control plane images have to be pulled.
[STDOUT  ] 2018-04-05T12:39:10.186350065+01:00: Connection to localhost closed.
[STDOUT  ] 2018-04-05T12:39:10.187096495+01:00: FAIL kubeadm-init.sh (eof)

While the docker ones both failed with:

[STDOUT  ] 2018-04-05T12:41:12.299059146+01:00: [init] This might take a minute or longer if the control plane images have to be pulled.
[STDOUT  ] 2018-04-05T13:11:12.302084127+01:00: 
[STDOUT  ] 2018-04-05T13:11:12.303340814+01:00: Unfortunately, an error has occurred:
[STDOUT  ] 2018-04-05T13:11:12.303361383+01:00:         timed out waiting for the condition
[STDOUT  ] 2018-04-05T13:11:12.303371730+01:00: 
[STDOUT  ] 2018-04-05T13:11:12.303382034+01:00: This error is likely caused by:
[STDOUT  ] 2018-04-05T13:11:12.303392927+01:00:         - The kubelet is not running
[STDOUT  ] 2018-04-05T13:11:12.303402566+01:00:         - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
[STDOUT  ] 2018-04-05T13:11:12.303414947+01:00:         - Either there is no internet connection, or imagePullPolicy is set to "Never",
[STDOUT  ] 2018-04-05T13:11:12.303424089+01:00:           so the kubelet cannot pull or find the following control plane images:
[STDOUT  ] 2018-04-05T13:11:12.303433422+01:00:                 - k8s.gcr.io/kube-apiserver-amd64:v1.10.0
[STDOUT  ] 2018-04-05T13:11:12.303442569+01:00:                 - k8s.gcr.io/kube-controller-manager-amd64:v1.10.0
[STDOUT  ] 2018-04-05T13:11:12.303451511+01:00:                 - k8s.gcr.io/kube-scheduler-amd64:v1.10.0
[STDOUT  ] 2018-04-05T13:11:12.303460277+01:00:                 - k8s.gcr.io/etcd-amd64:3.1.12 (only if no external etcd endpoints are configured)
[STDOUT  ] 2018-04-05T13:11:12.303469419+01:00: 
[STDOUT  ] 2018-04-05T13:11:12.303478832+01:00: If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
[STDOUT  ] 2018-04-05T13:11:12.303507289+01:00:         - 'systemctl status kubelet'
[STDOUT  ] 2018-04-05T13:11:12.303516004+01:00:         - 'journalctl -xeu kubelet'
[STDOUT  ] 2018-04-05T13:11:12.303525415+01:00: couldn't initialize a Kubernetes cluster
[STDOUT  ] 2018-04-05T13:11:12.307491683+01:00: linuxkit-e694aaf0b765:/# ESC[6nFAIL kubeadm-init.sh (returned to prompt)

Not sure what's going on here -- did you manage to boot/initialise a node with this?

I'm going to repeat the tests in case I did something wrong with my builds.

@ijc
Contributor

ijc commented Apr 5, 2018

I repeated the run and got the same first two failures. Looking at the in-progress logs for the third, I fully expect it is going to fail after ~1900s too.

@ijc
Contributor

ijc commented Apr 5, 2018

FWIW master (018027c) passes; starting to dig a bit deeper now.

@ijc
Contributor

ijc commented Apr 5, 2018

I don't think this PR is complete; e.g. /var/log/kubelet.err.log contains entries like:

Flag --pod-manifest-path has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --allow-privileged has been deprecated, will be removed in a future version
Flag --cluster-dns has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cluster-domain has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cgroups-per-qos has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --enforce-node-allocatable has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cadvisor-port has been deprecated, The default will change to 0 (disabled) in 1.12, and the cadvisor port will be removed entirely in 1.13
Flag --kube-reserved-cgroup has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --system-reserved-cgroup has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Flag --cgroup-root has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.

I don't think they are the cause of the failure, but they seem like something that should be addressed as part of this upgrade.
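
For reference, a minimal sketch (not something this PR does yet) of moving those settings into a KubeletConfiguration file and passing it via --config; field names are taken from the kubelet.config.k8s.io/v1beta1 type introduced in 1.10, and the values are purely illustrative, not the ones the LinuxKit kubelet actually uses:

# Hypothetical sketch only: field names per kubelet.config.k8s.io/v1beta1,
# values illustrative (kubeadm-style defaults).
cat > /etc/kubernetes/kubelet-config.yaml <<'EOF'
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
staticPodPath: /etc/kubernetes/manifests   # replaces --pod-manifest-path
clusterDNS: [10.96.0.10]                   # replaces --cluster-dns
clusterDomain: cluster.local               # replaces --cluster-domain
cgroupsPerQOS: true                        # replaces --cgroups-per-qos
enforceNodeAllocatable: [pods]             # replaces --enforce-node-allocatable
cgroupRoot: /                              # replaces --cgroup-root
EOF

# --allow-privileged has no config-file equivalent, so it stays on the command line.
kubelet --config=/etc/kubernetes/kubelet-config.yaml --allow-privileged=true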

@ijc
Contributor

ijc commented Apr 5, 2018

I wonder if this is the root issue:

E0405 13:49:04.072893     638 remote_runtime.go:209] StartContainer "4ea4bde9b43a9eb241a5d7d98abf87184938f85ce9139949a3a246b6fe6b8985" from runtime service failed: rpc error: code = Unknown desc = failed to start container "4ea4bde9b43a9eb241a5d7d98abf87184938f85ce9139949a3a246b6fe6b8985": Error response from daemon: linux mounts: path /etc/kubernetes/pki/etcd is mounted on /etc/kubernetes but it is not a shared or slave mount
E0405 13:49:04.072979     638 kuberuntime_manager.go:733] container start failed: RunContainerError: failed to start container 4ea4bde9b43a9eb241a5d7d98abf87184938f85ce9139949a3a246b6fe6b8985": Error response from daemon: linux mounts: path /etc/kubernetes/pki/etcd is mounted on /etc/kubernetes but it is not a shared or slave mount
E0405 13:49:04.073043     638 pod_workers.go:186] Error syncing pod 5afaaee6f646c04b39d2a4753f62ca14 ("etcd-linuxkit-b6e09efea36e_kube-system(5afaaee6f646c04b39d2a4753f62ca14)"), skipping: failed to "StartContainer" for "etcd" with RunContainerError: "failed to start container \"4ea4bde9b43a9eb241a5d7d98abf87184938f85ce9139949a3a246b6fe6b8985\": Error response from daemon: linux mounts: path /etc/kubernetes/pki/etcd is mounted on /etc/kubernetes but it is not a shared or slave mount"
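
If that is it, one possible workaround (a sketch only, not verified here) would be to make the mount that backs /etc/kubernetes shared, so that docker accepts the nested /etc/kubernetes/pki/etcd bind source:

# Sketch: assumes /etc/kubernetes is its own mount point, as the error suggests.
mount --make-rshared /etc/kubernetes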

@ijc
Contributor

ijc commented Apr 5, 2018

In https://kubernetes.io/docs/imported/release/notes/#before-upgrading I notice:

[action-required] The Container Runtime Interface (CRI) version has increased from v1alpha1 to v1alpha2. Runtimes implementing the CRI will need to update to the new version, which configures container namespaces using an enumeration rather than booleans. (#58973, @verb)

So it may be that there is a dependency on a newer CRI (which I think might mean the one embedded in containerd v1.1).

Contributor

@ijc ijc left a comment


We need to sort out the rtf test failures before merging.

@ijc
Contributor

ijc commented Apr 5, 2018

I think I've found a fix (well, more of a workaround really) for the docker parts; I still need to dig into the cri half.

@ijc
Contributor

ijc commented Apr 9, 2018

Lots more changes are needed to make this work, so I'm going to carry this one.

@ijc ijc closed this Apr 9, 2018
@ijc ijc mentioned this pull request Apr 9, 2018