EBS CSI Driver issue causing kubetest2 failures - IMDS metadata and Kubernetes metadata are both unavailable #1061

mmerkes · 2024-11-25T18:53:15Z

Which jobs are failing:

pull-cloud-provider-aws-e2e-kubetest2-quick
pull-cloud-provider-aws-e2e-kubetest2

Which test(s) are failing:
BeforeSuite is failing because CPI nodes aren't stabilizing.

Since when has it been failing:
This one passed on 10/31.

This one failed on 11/6, so it started failing sometime between these two runs.

Testgrid link:

Reason for failure:

EBS CSI pod is not stabilizing:

2024-11-25T18:30:42.52251214Z stderr F I1125 18:30:42.522404       1 main.go:157] "Initializing metadata"
2024-11-25T18:30:47.523520821Z stderr F E1125 18:30:47.523424       1 metadata.go:51] "Retrieving IMDS metadata failed, falling back to Kubernetes metadata" err="could not get EC2 instance identity metadata: operation error ec2imds: GetInstanceIdentityDocument, canceled, context deadline exceeded"
2024-11-25T18:30:47.530862069Z stderr F E1125 18:30:47.530760       1 metadata.go:58] "Retrieving Kubernetes metadata failed" err="could not retrieve instance type from topology label"
2024-11-25T18:30:47.530928736Z stderr F E1125 18:30:47.530882       1 main.go:162] "Failed to initialize metadata when it is required" err="IMDS metadata and Kubernetes metadata are both unavailable"
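
The log lines above suggest a two-step lookup: IMDS first, then labels on the Kubernetes Node object, with a hard failure when both come up empty. A minimal Python sketch of that fallback order follows. It is not the driver's actual code (the driver is written in Go). `fetch_imds`, `Metadata`, `MetadataUnavailable`, and the exact set of labels consulted are assumptions for illustration. The label keys are the well-known Kubernetes ones, and the log itself only mentions the instance type.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Well-known Kubernetes node labels. Which of them the driver actually
# reads is an assumption here; the log only mentions the instance type.
INSTANCE_TYPE_LABEL = "node.kubernetes.io/instance-type"
REGION_LABEL = "topology.kubernetes.io/region"
ZONE_LABEL = "topology.kubernetes.io/zone"


class MetadataUnavailable(Exception):
    """Raised when neither IMDS nor node labels yield metadata."""


@dataclass(frozen=True)
class Metadata:
    instance_type: str
    region: str
    zone: str
    source: str  # "imds" or "kubernetes"


def from_node_labels(labels: Mapping[str, str]) -> Metadata:
    """Build metadata from node labels; fail if a required label is missing."""
    instance_type = labels.get(INSTANCE_TYPE_LABEL)
    if not instance_type:
        raise MetadataUnavailable("could not retrieve instance type from topology label")
    region = labels.get(REGION_LABEL)
    zone = labels.get(ZONE_LABEL)
    if not region or not zone:
        raise MetadataUnavailable("could not retrieve region or zone from topology label")
    return Metadata(instance_type, region, zone, source="kubernetes")


def initialize_metadata(
    fetch_imds: Callable[[], Optional[Metadata]],
    node_labels: Mapping[str, str],
) -> Metadata:
    """Try IMDS first, then node labels; raise if both are unavailable."""
    try:
        imds = fetch_imds()  # hypothetical callable; may raise or return None
        if imds is not None:
            return imds
    except Exception as err:
        print(f"Retrieving IMDS metadata failed, falling back to Kubernetes metadata: {err}")
    try:
        return from_node_labels(node_labels)
    except MetadataUnavailable as err:
        print(f"Retrieving Kubernetes metadata failed: {err}")
    raise MetadataUnavailable("IMDS metadata and Kubernetes metadata are both unavailable")
```

Under these assumptions, a node whose IMDS call times out and whose labels were never populated ends in the same terminal error as the pod above, while the same node with the three labels set would initialize from the Kubernetes side.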

Anything else we need to know:

/kind failing-test

mmerkes · 2024-11-25T18:55:06Z

/triage accepted

dims · 2024-11-25T18:58:15Z

cc @ConnorJC3 @torredil

mmerkes · 2024-11-25T19:06:17Z

Not sure if they're related to each other, but also see this error in kubelet:

Nov 25 18:34:03 ip-172-31-24-156 kubelet[6298]: E1125 18:34:03.425509 6298 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"aws-cloud-controller-manager\" with ImagePullBackOff: \"Back-off pulling image \\\"209411653980.dkr.ecr.us-east-1.amazonaws.com/provider-aws/cloud-controller-manager:v1.30.0-beta.0-110-gac63fea\\\": ErrImagePull: rpc error: code = NotFound desc = failed to pull and unpack image \\\"209411653980.dkr.ecr.us-east-1.amazonaws.com/provider-aws/cloud-controller-manager:v1.30.0-beta.0-110-gac63fea\\\": failed to resolve reference \\\"209411653980.dkr.ecr.us-east-1.amazonaws.com/provider-aws/cloud-controller-manager:v1.30.0-beta.0-110-gac63fea\\\": 209411653980.dkr.ecr.us-east-1.amazonaws.com/provider-aws/cloud-controller-manager:v1.30.0-beta.0-110-gac63fea: not found\"" pod="kube-system/aws-cloud-controller-manager-cq6m2" podUID="b6d43d27-1967-414e-86f8-72b3e9375664"
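
The kubelet error says the tag could not be resolved in ECR, so one way to narrow this down is to check whether that tag exists in the repository. The sketch below splits an ECR reference like the one in the log into its parts and builds an `aws ecr describe-images` invocation for it. `EcrImage`, `parse_ecr_ref`, and `describe_images_command` are hypothetical helper names, and running the command assumes credentials with read access to that registry, which may not be available outside the CI account.

```python
import re
from typing import NamedTuple


class EcrImage(NamedTuple):
    account_id: str
    region: str
    repository: str
    tag: str


# Matches private ECR references of the form
# <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
_ECR_REF = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repo>[^:@]+):(?P<tag>[^:@]+)$"
)


def parse_ecr_ref(ref: str) -> EcrImage:
    """Split a tagged ECR image reference into account, region, repo, and tag."""
    m = _ECR_REF.match(ref)
    if m is None:
        raise ValueError(f"not a tagged ECR image reference: {ref!r}")
    return EcrImage(m["account"], m["region"], m["repo"], m["tag"])


def describe_images_command(image: EcrImage) -> list[str]:
    """Build the AWS CLI arguments that look up the tag in the repository."""
    return [
        "aws", "ecr", "describe-images",
        "--registry-id", image.account_id,
        "--region", image.region,
        "--repository-name", image.repository,
        "--image-ids", f"imageTag={image.tag}",
    ]
```

The returned list can be passed to `subprocess.run`. An error from the CLI for that tag would match the `not found` in the kubelet log and point at the image build or push step rather than at the node.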

ConnorJC3 · 2024-11-25T19:11:57Z

> Not sure if they're related to each other, but also see this error in kubelet:

Very likely related - as I believe it is the AWS CCM that adds the labels we rely on for metadata to the node.

mmerkes · 2024-11-25T19:18:48Z

> Very likely related - as I believe it is the AWS CCM that adds the labels we rely on for metadata to the node.

Sounds right. Looks like that's a red herring.

lavalex · 2024-12-18T13:56:52Z

I'm getting this error on OpenShift. Any ideas how to solve it? Thanks.
