Steps to reproduce
run cortex up on the cluster.yml specified above, using us-west-2 as the region.
Expected behavior
cortex up to complete successfully, just as it does when region is us-east-2
Actual behavior
cortex up exits nonzero and reports a failure
Stack traces
failure trace
cortex cluster up ./<MASKED>/cluster.yaml
using aws credentials with access key <MASKED>
verifying your configuration ...
aws resource cost per hour
1 eks cluster $0.10
nodegroup tmp: 1-5 m5.large instances $0.102 each
2 t3.medium instances (cortex system) $0.088 total
1 t3.medium instance (prometheus) $0.05
2 network load balancers $0.045 total
your cluster will cost $0.38 - $0.79 per hour based on cluster size
cortex will also create an s3 bucket (this-config-fails-36f0f6ff) and a cloudwatch log group (this-config-fails)
would you like to continue? (y/n): y
○ creating a new s3 bucket: this-config-fails-36f0f6ff ✓
○ creating a new cloudwatch log group: this-config-fails ✓
○ spinning up the cluster (this will take about 30 minutes) ...
2022-02-23 19:01:51 [ℹ] eksctl version 0.67.0
2022-02-23 19:01:51 [ℹ] using region us-west-2
2022-02-23 19:01:51 [ℹ] subnets for us-west-2a - public:192.168.0.0/19 private:192.168.96.0/19
2022-02-23 19:01:51 [ℹ] subnets for us-west-2b - public:192.168.32.0/19 private:192.168.128.0/19
2022-02-23 19:01:51 [ℹ] subnets for us-west-2c - public:192.168.64.0/19 private:192.168.160.0/19
2022-02-23 19:01:51 [!] Custom AMI detected for nodegroup cx-operator. Please refer to https://github.com/weaveworks/eksctl/issues/3563 for upcoming breaking changes
2022-02-23 19:01:51 [ℹ] nodegroup "cx-operator" will use "ami-002539dd2c532d0a5" [AmazonLinux2/1.21]
2022-02-23 19:01:51 [!] Custom AMI detected for nodegroup cx-prometheus. Please refer to https://github.com/weaveworks/eksctl/issues/3563 for upcoming breaking changes
2022-02-23 19:01:51 [ℹ] nodegroup "cx-prometheus" will use "ami-002539dd2c532d0a5" [AmazonLinux2/1.21]
2022-02-23 19:01:51 [!] Custom AMI detected for nodegroup cx-wd-tmp. Please refer to https://github.com/weaveworks/eksctl/issues/3563 for upcoming breaking changes
2022-02-23 19:01:51 [ℹ] nodegroup "cx-wd-tmp" will use "ami-002539dd2c532d0a5" [AmazonLinux2/1.21]
2022-02-23 19:01:51 [ℹ] using Kubernetes version 1.21
2022-02-23 19:01:51 [ℹ] creating EKS cluster "this-config-fails" in "us-west-2" region with un-managed nodes
2022-02-23 19:01:51 [ℹ] 3 nodegroups (cx-operator, cx-prometheus, cx-wd-tmp) were included (based on the include/exclude rules)
2022-02-23 19:01:51 [ℹ] will create a CloudFormation stack for cluster itself and 3 nodegroup stack(s)
2022-02-23 19:01:51 [ℹ] will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s)
2022-02-23 19:01:51 [ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-west-2 --cluster=this-config-fails'
2022-02-23 19:01:51 [ℹ] CloudWatch logging will not be enabled for cluster "this-config-fails" in "us-west-2"
2022-02-23 19:01:51 [ℹ] you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=us-west-2 --cluster=this-config-fails'
2022-02-23 19:01:51 [ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "this-config-fails" in "us-west-2"
2022-02-23 19:01:51 [ℹ] 2 sequential tasks: { create cluster control plane "this-config-fails", 3 sequential sub-tasks: { 2 sequential sub-tasks: { wait for control plane to become ready, tag cluster }, 1 task: { create addons }, 3 parallel sub-tasks: { create nodegroup "cx-operator", create nodegroup "cx-prometheus", create nodegroup "cx-wd-tmp" } } }
2022-02-23 19:01:51 [ℹ] building cluster stack "eksctl-this-config-fails-cluster"
2022-02-23 19:01:52 [ℹ] deploying stack "eksctl-this-config-fails-cluster"
2022-02-23 19:02:22 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-cluster"
2022-02-23 19:02:52 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-cluster"
2022-02-23 19:03:52 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-cluster"
2022-02-23 19:04:52 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-cluster"
2022-02-23 19:05:52 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-cluster"
2022-02-23 19:06:52 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-cluster"
2022-02-23 19:07:52 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-cluster"
2022-02-23 19:08:52 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-cluster"
2022-02-23 19:09:52 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-cluster"
2022-02-23 19:10:52 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-cluster"
2022-02-23 19:11:53 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-cluster"
2022-02-23 19:12:53 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-cluster"
2022-02-23 19:13:53 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-cluster"
2022-02-23 19:14:53 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-cluster"
2022-02-23 19:16:54 [✔] tagged EKS cluster (cortex.dev/cluster-name=this-config-fails)
2022-02-23 19:18:55 [!] OIDC is disabled but policies are required/specified for this addon. Users are responsible for attaching the policies to all nodegroup roles
2022-02-23 19:18:55 [ℹ] creating addon
2022-02-23 19:23:25 [ℹ] addon "vpc-cni" active
2022-02-23 19:23:25 [ℹ] building nodegroup stack "eksctl-this-config-fails-nodegroup-cx-wd-tmp"
2022-02-23 19:23:25 [!] Custom AMI detected for nodegroup cx-wd-tmp, using legacy nodebootstrap mechanism. Please refer to https://github.com/weaveworks/eksctl/issues/3563 for upcoming breaking changes
2022-02-23 19:23:25 [ℹ] building nodegroup stack "eksctl-this-config-fails-nodegroup-cx-operator"
2022-02-23 19:23:25 [!] Custom AMI detected for nodegroup cx-operator, using legacy nodebootstrap mechanism. Please refer to https://github.com/weaveworks/eksctl/issues/3563 for upcoming breaking changes
2022-02-23 19:23:25 [ℹ] building nodegroup stack "eksctl-this-config-fails-nodegroup-cx-prometheus"
2022-02-23 19:23:25 [!] Custom AMI detected for nodegroup cx-prometheus, using legacy nodebootstrap mechanism. Please refer to https://github.com/weaveworks/eksctl/issues/3563 for upcoming breaking changes
2022-02-23 19:23:26 [ℹ] deploying stack "eksctl-this-config-fails-nodegroup-cx-wd-tmp"
2022-02-23 19:23:26 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-wd-tmp"
2022-02-23 19:23:26 [ℹ] deploying stack "eksctl-this-config-fails-nodegroup-cx-prometheus"
2022-02-23 19:23:26 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-prometheus"
2022-02-23 19:23:26 [ℹ] deploying stack "eksctl-this-config-fails-nodegroup-cx-operator"
2022-02-23 19:23:26 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-operator"
2022-02-23 19:23:42 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-prometheus"
2022-02-23 19:23:44 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-wd-tmp"
2022-02-23 19:23:44 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-operator"
2022-02-23 19:23:57 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-prometheus"
2022-02-23 19:23:59 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-wd-tmp"
2022-02-23 19:24:02 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-operator"
2022-02-23 19:24:16 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-wd-tmp"
2022-02-23 19:24:16 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-prometheus"
2022-02-23 19:24:20 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-operator"
2022-02-23 19:24:33 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-prometheus"
2022-02-23 19:24:36 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-wd-tmp"
2022-02-23 19:24:40 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-operator"
2022-02-23 19:24:51 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-prometheus"
2022-02-23 19:24:53 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-wd-tmp"
2022-02-23 19:24:55 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-operator"
2022-02-23 19:25:10 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-prometheus"
2022-02-23 19:25:11 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-wd-tmp"
2022-02-23 19:25:11 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-operator"
2022-02-23 19:25:25 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-prometheus"
2022-02-23 19:25:28 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-wd-tmp"
2022-02-23 19:25:30 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-operator"
2022-02-23 19:25:44 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-wd-tmp"
2022-02-23 19:25:44 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-prometheus"
2022-02-23 19:25:50 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-operator"
2022-02-23 19:26:00 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-wd-tmp"
2022-02-23 19:26:00 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-prometheus"
2022-02-23 19:26:10 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-operator"
2022-02-23 19:26:17 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-wd-tmp"
2022-02-23 19:26:20 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-prometheus"
2022-02-23 19:26:25 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-operator"
2022-02-23 19:26:33 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-wd-tmp"
2022-02-23 19:26:40 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-prometheus"
2022-02-23 19:26:41 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-operator"
2022-02-23 19:26:51 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-wd-tmp"
2022-02-23 19:26:56 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-operator"
2022-02-23 19:26:57 [ℹ] waiting for CloudFormation stack "eksctl-this-config-fails-nodegroup-cx-prometheus"
2022-02-23 19:26:57 [ℹ] waiting for the control plane availability...
2022-02-23 19:26:57 [✔] saved kubeconfig as "/root/.kube/config"
2022-02-23 19:26:57 [ℹ] 1 task: { suspend ASG processes for nodegroup cx-wd-tmp }
2022-02-23 19:26:58 [ℹ] suspended ASG processes [AZRebalance] for cx-wd-tmp
2022-02-23 19:26:58 [✔] all EKS cluster resources for "this-config-fails" have been created
2022-02-23 19:26:58 [ℹ] adding identity "arn:aws:iam::<MASKED>:role/eksctl-this-config-fails-nodegrou-NodeInstanceRole-OG2YBA75HYPE" to auth ConfigMap
2022-02-23 19:26:58 [ℹ] nodegroup "cx-operator" has 0 node(s)
2022-02-23 19:26:58 [ℹ] waiting for at least 2 node(s) to become ready in "cx-operator"
2022-02-23 19:27:30 [ℹ] nodegroup "cx-operator" has 2 node(s)
2022-02-23 19:27:30 [ℹ] node "ip-192-168-20-129.us-west-2.compute.internal" is ready
2022-02-23 19:27:30 [ℹ] node "ip-192-168-88-85.us-west-2.compute.internal" is ready
2022-02-23 19:27:30 [ℹ] adding identity "arn:aws:iam::<MASKED>:role/eksctl-this-config-fails-nodegrou-NodeInstanceRole-75KAP1SXG8SQ" to auth ConfigMap
2022-02-23 19:27:30 [ℹ] nodegroup "cx-prometheus" has 0 node(s)
2022-02-23 19:27:30 [ℹ] waiting for at least 1 node(s) to become ready in "cx-prometheus"
2022-02-23 19:28:32 [ℹ] nodegroup "cx-prometheus" has 1 node(s)
2022-02-23 19:28:32 [ℹ] node "ip-192-168-54-32.us-west-2.compute.internal" is ready
2022-02-23 19:28:32 [ℹ] adding identity "arn:aws:iam::<MASKED>:role/eksctl-this-config-fails-nodegrou-NodeInstanceRole-JLK3EF72JQAV" to auth ConfigMap
2022-02-23 19:28:32 [ℹ] nodegroup "cx-wd-tmp" has 0 node(s)
2022-02-23 19:28:32 [ℹ] waiting for at least 1 node(s) to become ready in "cx-wd-tmp"
2022-02-23 19:31:00 [ℹ] nodegroup "cx-wd-tmp" has 1 node(s)
2022-02-23 19:31:00 [ℹ] node "ip-192-168-72-237.us-west-2.compute.internal" is ready
2022-02-23 19:33:01 [ℹ] kubectl command should work with "/root/.kube/config", try 'kubectl get nodes'
2022-02-23 19:33:01 [✔] EKS cluster "this-config-fails" in "us-west-2" region is ready
○ updating cluster configuration ✓
○ configuring networking (this will take a few minutes) ✓
○ configuring autoscaling ✓
○ configuring async gateway ✓
○ configuring logging ✓
○ configuring metrics ✓
○ configuring gpu support (for nodegroups that may require it) ✓
○ configuring inf support (for nodegroups that may require it) ✓
○ starting operator ✓
○ starting controller manager ✓
○ waiting for load balancers .............................................................................................................................................................................................................................................................................................................................................
timeout has occurred when validating your cortex cluster
debugging info:
operator pod name: pod/operator-controller-manager-6f8bb85b96-clqxf
operator pod is ready: true
operator endpoint: <MASKED>.elb.us-west-2.amazonaws.com
operator curl response:
{}
additional networking events:
LAST SEEN TYPE REASON OBJECT MESSAGE
30m Normal EnsuringLoadBalancer service/ingressgateway-apis Ensuring load balancer
30m Normal EnsuredLoadBalancer service/ingressgateway-apis Ensured load balancer
30m Normal EnsuringLoadBalancer service/ingressgateway-operator Ensuring load balancer
30m Normal EnsuredLoadBalancer service/ingressgateway-operator Ensured load balancer
30m Normal Scheduled pod/ingressgateway-apis-69465f9956-gzxtf Successfully assigned istio-system/ingressgateway-apis-69465f9956-gzxtf to ip-192-168-20-129.us-west-2.compute.internal
30m Normal Pulling pod/ingressgateway-operator-7b54fcf5cd-gsvcb Pulling image "quay.io/cortexlabs/istio-proxy:0.42.0"
30m Normal Pulling pod/ingressgateway-apis-69465f9956-gzxtf Pulling image "quay.io/cortexlabs/istio-proxy:0.42.0"
30m Normal Pulled pod/ingressgateway-operator-7b54fcf5cd-gsvcb Successfully pulled image "quay.io/cortexlabs/istio-proxy:0.42.0" in 4.987000991s
30m Normal Created pod/ingressgateway-operator-7b54fcf5cd-gsvcb Created container istio-proxy
30m Normal Started pod/ingressgateway-operator-7b54fcf5cd-gsvcb Started container istio-proxy
30m Normal Pulled pod/ingressgateway-apis-69465f9956-gzxtf Successfully pulled image "quay.io/cortexlabs/istio-proxy:0.42.0" in 6.764940388s
30m Normal Started pod/ingressgateway-apis-69465f9956-gzxtf Started container istio-proxy
30m Normal Created pod/ingressgateway-apis-69465f9956-gzxtf Created container istio-proxy
30m Warning Unhealthy pod/ingressgateway-apis-69465f9956-gzxtf Readiness probe failed: Get "http://192.168.3.27:15021/healthz/ready": dial tcp 192.168.3.27:15021: connect: connection refused
please run `cortex cluster down` to delete the cluster before trying to create this cluster again
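When the timeout hits, a reasonable next diagnostic step (not part of the output above; the helper name is made up here) is to inspect the unhealthy ingress gateway pod with kubectl, using the kubeconfig that eksctl saved:

```shell
#!/usr/bin/env bash
# Hypothetical diagnosis helper (not a Cortex command): dump the state of
# an ingress gateway pod whose readiness probe is failing. Assumes kubectl
# is configured with the kubeconfig eksctl wrote (/root/.kube/config).
inspect_ingressgateway() {
  local pod=${1:?usage: inspect_ingressgateway <pod-name>}
  kubectl -n istio-system get pods -o wide
  kubectl -n istio-system describe pod "$pod"
  kubectl -n istio-system logs "$pod" -c istio-proxy --tail=100
}
```

For the events above that would be `inspect_ingressgateway ingressgateway-apis-69465f9956-gzxtf`; the pod name differs on every cluster.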
Additional context
I've only tested us-west-2 and us-east-2 so far. I've repeated the experiment a number of times. I see consistent failure when region is us-west-2 and consistent success when region is us-east-2.
A search in the Slack channel for "timeout has occurred when validating your cortex cluster" shows that this issue is pretty common. I see four or five reports of the issue in the last year.
us-west-2 is my default region.
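For what it's worth, the "waiting for load balancers" step above appears to be a poll-until-healthy loop with a deadline. A minimal sketch of that pattern (my own illustration, not Cortex's actual code; `wait_for_endpoint` is a made-up name):

```shell
#!/usr/bin/env bash
# Poll a URL until curl succeeds or a deadline passes. Illustrative only:
# this is the general shape of the validation that times out, not
# Cortex's actual implementation.
wait_for_endpoint() {
  local url=$1 timeout=${2:-300} interval=${3:-5} elapsed=0
  while (( elapsed < timeout )); do
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      return 0   # endpoint answered
    fi
    sleep "$interval"
    (( elapsed += interval ))
  done
  return 1       # deadline passed: the "timeout has occurred" case
}
```

Pointed at the operator endpoint printed in the debugging info, a loop like this would return 1 in us-west-2 and 0 in us-east-2 in my tests.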
I just tried creating a new cluster with the cluster configuration you provided, and it worked for me in us-west-2. I ran this from the master branch, but there have not been any changes that should affect the cluster creation process since the v0.42.0 release. Do you mind trying again?
Version

Description
cortex up fails with "timeout has occurred when validating your cortex cluster". This happens consistently. The failure only occurs with region: us-west-2. When region is set to us-east-2, cortex up succeeds.

Configuration
cortex up fails when using this cluster.yml:

while this cluster.yml succeeds: