
eks: Auth manifest fails to provision with cluster in isolated subnets #30442

Closed
hakenmt opened this issue Jun 3, 2024 · 5 comments
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service bug This issue is a bug. closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. effort/medium Medium work item – several days of effort p3 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.

Comments

@hakenmt

hakenmt commented Jun 3, 2024

Describe the bug

I'm trying to deploy a basic EKS cluster in a private VPC (but with public and private control plane endpoint access). I can provision the cluster itself just fine, but when I try to add an auto scaling group as a managed node group, the Lambda functions that create the auth manifest fail.

Expected Behavior

The auth manifest is applied successfully.

Current Behavior

The Lambda function creating the auth manifest fails with the following error:

Received response status [FAILED] from custom resource. Message returned: TimeoutError: {"state":"TIMEOUT","reason":"Waiter has timed out"}
    at checkExceptions (/var/runtime/node_modules/@aws-sdk/node_modules/@smithy/util-waiter/dist-cjs/index.js:59:26)
    at waitUntilFunctionActiveV2 (/var/runtime/node_modules/@aws-sdk/client-lambda/dist-cjs/index.js:5895:49)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async defaultInvokeFunction (/var/task/outbound.js:1:1024)
    at async invokeUserFunction (/var/task/framework.js:1:2287)
    at async onEvent (/var/task/framework.js:1:369)
    at async Runtime.handler (/var/task/cfn-response.js:1:1676)
(RequestId: b13fb0f2-88dd-4ff8-a866-54c4963e9bbb)

The function's logs show the following:

executing user function arn:aws:lambda:us-east-1:123456789012:function:workshop-EKSNestedStackEK-Handler886CB40B-qIenQtrVLhHt with payload
{ "RequestType": "Create", "ServiceToken": "arn:aws:lambda:us-east-1:123456789012:function:multi-az-workshop-EKSNest-ProviderframeworkonEvent-KXYzmcHXxo0s", "ResponseURL": "...", "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/workshop-EKSNestedStackEKSNestedStackResourceAE427C53-KBXM35391R6Q/534250a0-1eba-11ef-ba72-1216c147ed83", "RequestId": "f30ba971-e153-4768-b2fe-88cbc4e671cb", "LogicalResourceId": "EKSClusterAwsAuthmanifestA4E0796C", "ResourceType": "Custom::AWSCDK-EKS-KubernetesResource", "ResourceProperties": { "ServiceToken": "arn:aws:lambda:us-east-1:123456789012:function:workshop-EKSNest-ProviderframeworkonEvent-KXYzmcHXxo0s", "Overwrite": "true", "PruneLabel": "aws.cdk.eks/prune-c8c15f43727ab77e57a21567c165e7ba9169d8e3ed", "ClusterName": "EKSClusterE11008B6-91e4fa7eeaf6422d9937cdfeabb95ggg", "Manifest": "[{\"apiVersion\":\"v1\",\"kind\":\"ConfigMap\",\"metadata\":{\"name\":\"aws-auth\",\"namespace\":\"kube-system\",\"labels\":{\"aws.cdk.eks/prune-c8c15f43727ab77e57a21567c165e7ba9169d8e3ed\":\"\"}},\"data\":{\"mapRoles\":\"[{\\\"rolearn\\\":\\\"arn:aws:iam::123456789012:role/workshop-EKSNest-EKSClusterMNGInstanceRole-BbGdW8Ua0qcZ\\\",\\\"username\\\":\\\"system:node:{{EC2PrivateDNSName}}\\\",\\\"groups\\\":[\\\"system:bootstrappers\\\",\\\"system:nodes\\\"]}]\",\"mapUsers\":\"[]\",\"mapAccounts\":\"[]\"}}]", "RoleArn": "arn:aws:iam::123456789012:role/workshop-EKSNest-EKSClusterCreationRoleB86-zJ9j0anFlnrC" } }

But I never see any logs produced for the referenced Handler function, and the function's metrics don't show any invocations. The Lambdas aren't being placed in the VPC, so it's unclear why the handler function is never invoked.

Reproduction Steps

IRole eksRole = new Role(this, "EKSWorkerRole", new RoleProps() {
    AssumedBy = new ServicePrincipal("ec2.amazonaws.com")
});

eksRole.AddManagedPolicy(ManagedPolicy.FromAwsManagedPolicyName("AmazonEKSVPCResourceController"));
eksRole.AddManagedPolicy(ManagedPolicy.FromAwsManagedPolicyName("AmazonEKSWorkerNodePolicy"));
eksRole.AddManagedPolicy(ManagedPolicy.FromAwsManagedPolicyName("AmazonSSMManagedEC2InstanceDefaultPolicy"));
eksRole.AddManagedPolicy(ManagedPolicy.FromAwsManagedPolicyName("AmazonEC2ContainerRegistryReadOnly"));

eksRole.AddManagedPolicy(new ManagedPolicy(this, "EKSManagedPolicy", new ManagedPolicyProps() {
    Statements = new PolicyStatement[] {
        new PolicyStatement(new PolicyStatementProps() {
            Effect = Effect.ALLOW,
            Resources = new string[] { "*" },
            Actions = new string[] {
                "s3:GetObject",
                "s3:ListBucket"
            }
        })
    }
}));

SecurityGroup sg = new SecurityGroup(this, "EKSSecurityGroup", new SecurityGroupProps() {
    Description = "Allow inbound access from the load balancer and public clients",
    Vpc = props.Vpc
});

sg.AddIngressRule(Peer.AnyIpv4(), Port.Tcp(80));
sg.AddIngressRule(Peer.Ipv4(props.Vpc.VpcCidrBlock), Port.Tcp(80));
sg.AddIngressRule(Peer.Ipv4(props.Vpc.VpcCidrBlock), Port.IcmpPing());

Cluster eksCluster = new Cluster(this, "EKSCluster", new ClusterProps() {
    Vpc = props.Vpc,
    VpcSubnets = new SubnetSelection[] { new SubnetSelection() { SubnetType = SubnetType.PRIVATE_ISOLATED } },
    DefaultCapacity = 0,
    Version = KubernetesVersion.Of("1.30"),
    PlaceClusterHandlerInVpc = false,
    EndpointAccess = EndpointAccess.PUBLIC_AND_PRIVATE,
    KubectlLayer = new KubectlV29Layer(this, "KubectlV29Layer")
});

eksCluster.AddAutoScalingGroupCapacity("MNG", new AutoScalingGroupCapacityOptions() {
    MinCapacity = 6,
    MaxCapacity = 6,
    InstanceType = InstanceType.Of(props.CpuArch == InstanceArchitecture.ARM_64 ? InstanceClass.T4G : InstanceClass.T3, InstanceSize.MICRO),
    MachineImageType = MachineImageType.AMAZON_LINUX_2,
    SsmSessionPermissions = true
});

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.144.0

Framework Version

No response

Node.js Version

v20.9.0

OS

darwin

Language

.NET

Language Version

No response

Other information

No response

@hakenmt hakenmt added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 3, 2024
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Jun 3, 2024
@pahud
Contributor

pahud commented Jun 4, 2024

If you create the cluster with isolated subnets only and PlaceClusterHandlerInVpc=false, your Lambda function would not be able to access your cluster control plane.

You may find these useful though:
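
As a rough, untested sketch of what that could look like with the snippet above (the interface endpoints shown are an assumed subset, not necessarily everything the handlers need):

// Assumed prerequisite: interface endpoints so handlers running inside isolated
// subnets can still reach AWS APIs (illustrative subset only).
props.Vpc.AddInterfaceEndpoint("StsEndpoint", new InterfaceVpcEndpointOptions() {
    Service = InterfaceVpcEndpointAwsService.STS
});
props.Vpc.AddInterfaceEndpoint("EksEndpoint", new InterfaceVpcEndpointOptions() {
    Service = InterfaceVpcEndpointAwsService.EKS
});

// Run the cluster handler inside the VPC so it reaches the control plane over
// the private endpoint instead of over the public internet.
Cluster eksCluster = new Cluster(this, "EKSCluster", new ClusterProps() {
    Vpc = props.Vpc,
    VpcSubnets = new SubnetSelection[] { new SubnetSelection() { SubnetType = SubnetType.PRIVATE_ISOLATED } },
    DefaultCapacity = 0,
    Version = KubernetesVersion.Of("1.30"),
    KubectlLayer = new KubectlV29Layer(this, "KubectlV29Layer"),
    EndpointAccess = EndpointAccess.PUBLIC_AND_PRIVATE,
    PlaceClusterHandlerInVpc = true
});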

@pahud pahud added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p3 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Jun 4, 2024
@pahud pahud changed the title EKS: Auth manifest fails to provision EKS: Auth manifest fails to provision with cluster in isolated subnets Jun 4, 2024
@pahud pahud changed the title EKS: Auth manifest fails to provision with cluster in isolated subnets eks: Auth manifest fails to provision with cluster in isolated subnets Jun 4, 2024
@hakenmt
Author

hakenmt commented Jun 4, 2024

Actually, the root of the problem is that the Lambda function used for kubectl isn't configured to use regional STS endpoints. Its manifest validation fails because it defaults to sts.amazonaws.com, even when it runs in a VPC that has an STS VPC endpoint.
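
If that's the root cause, one possible workaround (untested sketch; KubectlEnvironment and ClusterHandlerEnvironment are existing ClusterProps, and AWS_STS_REGIONAL_ENDPOINTS is the standard AWS SDK setting for forcing regional STS endpoints) would be to set that variable on the handlers:

// Untested sketch: force the kubectl and cluster handlers onto the regional STS
// endpoint (e.g. sts.us-east-1.amazonaws.com) instead of the global sts.amazonaws.com.
// Requires: using System.Collections.Generic;
Cluster eksCluster = new Cluster(this, "EKSCluster", new ClusterProps() {
    Vpc = props.Vpc,
    VpcSubnets = new SubnetSelection[] { new SubnetSelection() { SubnetType = SubnetType.PRIVATE_ISOLATED } },
    DefaultCapacity = 0,
    Version = KubernetesVersion.Of("1.30"),
    EndpointAccess = EndpointAccess.PUBLIC_AND_PRIVATE,
    KubectlLayer = new KubectlV29Layer(this, "KubectlV29Layer"),
    KubectlEnvironment = new Dictionary<string, string>() {
        { "AWS_STS_REGIONAL_ENDPOINTS", "regional" }
    },
    ClusterHandlerEnvironment = new Dictionary<string, string>() {
        { "AWS_STS_REGIONAL_ENDPOINTS", "regional" }
    }
});

Whether this alone resolves the timeout would depend on where the handlers actually run and which endpoints the VPC exposes, but it should at least keep the STS calls regional.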

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jun 4, 2024
@pahud
Contributor

pahud commented Jun 5, 2024

Yes, this could be a consideration. Thanks for sharing.

@pahud pahud added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jun 5, 2024

github-actions bot commented Jun 7, 2024

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Jun 7, 2024
@github-actions github-actions bot added closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Jun 12, 2024
@aws-cdk-automation
Collaborator

Comments on closed issues and PRs are hard for our team to see. If you need help, please open a new issue that references this one.

@aws aws locked as resolved and limited conversation to collaborators Jul 25, 2024