
Kuma Control Plane Stuck on creating default mesh #614

Closed

somejavadev opened this issue Mar 16, 2021 · 10 comments

@somejavadev

Summary

I am trying to create the Kuma control plane on a GKE cluster by following the standalone instructions as detailed here, but after everything has completed, Kuma appears to be stuck creating the default mesh. The control plane logs show the entry

INFO defaults trying to create default Mesh

repeating.

I have tried to create a mesh manually by applying the following with kubectl apply -f mesh.yml:

mesh.yml

apiVersion: kuma.io/v1alpha1
kind: Mesh
metadata:
  name: my-mesh
spec:
  mtls:
    enabledBackend: test
    backends:
      - name: test
        type: builtin
    enabled: true

but this resulted in the following error:

Error from server (InternalError): error when creating "test.yml": Internal error occurred: failed calling webhook "mesh.defaulter.kuma-admission.kuma.io": Post https://kuma-control-plane.kuma-system.svc:443/default-kuma-io-v1alpha1-mesh?timeout=30s: context deadline exceeded
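To narrow down where the webhook call fails, a few read-only checks can be run first; these assume the default kuma-system namespace and the service name shown in the error above:

```shell
# Inspect the webhook registrations that the api-server is calling.
kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations | grep kuma

# Verify the webhook's backing service and its endpoints exist.
kubectl -n kuma-system get svc kuma-control-plane
kubectl -n kuma-system get endpoints kuma-control-plane
```

If the service and endpoints look healthy but the webhook still times out, the problem is likely network reachability between the api-server and the pods, not the control plane itself.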

Steps To Reproduce

  1. Setup a GKE cluster using the following parameters
gcloud beta container clusters create test \
  --no-enable-basic-auth \
  --cluster-version "1.17.17-gke.1101" \
  --machine-type "n1-standard-2" \
  --image-type "COS" \
  --disk-type "pd-standard" \
  --disk-size "20" \
  --metadata disable-legacy-endpoints=true \
  --service-account limited-gke-service-account \
  --max-pods-per-node "30" \
  --num-nodes "3" \
  --enable-stackdriver-kubernetes  \
  --enable-private-nodes \
  --master-ipv4-cidr "$MASTER_IPV4" \
  --enable-ip-alias \
  --default-max-pods-per-node "110" \
  --enable-autoscaling \
  --min-nodes "0" \
  --max-nodes "3" \
  --enable-network-policy \
  --enable-master-authorized-networks \
  --master-authorized-networks 0.0.0.0/0 \
  --addons HorizontalPodAutoscaling,HttpLoadBalancing \
  --enable-autoupgrade \
  --enable-autorepair \
  --enable-shielded-nodes

The limited-gke-service-account has a Kubernetes Engine Admin IAM role as well as the Metric Writer role.

  2. Use the 1.1.1 release of kuma, and deploy the control plane as described here
  3. Monitor the control plane logs and observe the repeated attempts to create the default mesh. When you access the UI you will also note that there isn't a default mesh displayed.
  4. While this is occurring, you may try to deploy the mesh.yml above to create the my-mesh mesh, which should result in the following error: Error from server (InternalError): error when creating "test.yml": Internal error occurred: failed calling webhook "mesh.defaulter.kuma-admission.kuma.io": Post https://kuma-control-plane.kuma-system.svc:443/default-kuma-io-v1alpha1-mesh?timeout=30s: context deadline exceeded

Additional Details & Logs

  • Version

Tested on both kuma 1.1.1 and 1.1.0

  • Error logs

There don't seem to be any other logs indicating an error besides the repeating trying to create default Mesh INFO entries.

Update: leaving the control plane in this state after a while results in this error message:

ERROR mesh-insight-resyncer component terminated with an error {"generationID": 1, "error": "stop channel was closed", "errorVerbose": "stop channel was closed\ngithub.heygears.com/kumahq/kuma/pkg/events.(*reader).Recv\n\t/go/src/github.com/kumahq/kuma/pkg/events/eventbus.go:57\ngithub.heygears.com/kumahq/kuma/pkg/insights.(*resyncer).Start\n\t/go/src/github.com/kumahq/kuma/pkg/insights/resyncer.go:101\ngithub.heygears.com/kumahq/kuma/pkg/core/runtime/component.(*resilientComponent).Start.func1\n\t/go/src/github.com/kumahq/kuma/pkg/core/runtime/component/resilient.go:43\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1374"}

Although it could be due to GKE restarting the pod.

  • Configuration

Default quick start, no custom changes.

  • Platform and Operating System

GKE

  • Installation Method (Helm, kumactl, AWS CloudFormation, etc.)

This occurs in the latest helm charts as well as with the kumactl client.

@bartsmykla
Contributor

bartsmykla commented Mar 25, 2021

Hi @somejavadev, I will try to reproduce it first and then come back to you

@bartsmykla
Contributor

@somejavadev so I managed to reproduce the issue, and I found out the problem is related to the control plane not being able to communicate with the k8s api-server, which is related to the "hardened" configuration of the cluster.

I'm not an expert in GCP so I haven't managed to find a fix for you, as I don't know enough about the flags used to create the cluster, but I will do my best to work on it next week.

I wanted to let you know about my findings, as maybe you are more proficient with GCP itself and you'll have a better idea about this.

@austince austince changed the title Kuma Conrol Plane Stuck on creating default mesh Kuma Control Plane Stuck on creating default mesh May 10, 2021
@nhamlh

nhamlh commented May 28, 2021

@somejavadev could you please try re-creating the control-plane deployment with hostNetwork: true?
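For anyone wanting to try this suggestion without redoing the whole install, one way to apply it is to patch the existing Deployment in place; the deployment and namespace names below are those used by the default install, so adjust them if yours differ:

```shell
# Patch the Kuma control-plane Deployment to run on the host network.
# kuma-control-plane / kuma-system match the default installation names.
kubectl -n kuma-system patch deployment kuma-control-plane \
  --type merge \
  -p '{"spec":{"template":{"spec":{"hostNetwork":true}}}}'
```

Note that hostNetwork changes which firewall rules apply to the pod's traffic, which is why it can mask (or work around) master-to-node connectivity restrictions.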

@arjunsalyan

arjunsalyan commented Jun 9, 2021

The issue is that in a private GKE cluster, by default, the Kubernetes control plane is only allowed access to ports 443 and 10250 on the pods and nodes.

But admission webhooks are involved when creating a mesh, and that is why you have to add custom firewall rules so that the k8s control plane is allowed access to the right ports.

From here, https://kuma.io/docs/1.0.1/documentation/networking/#kuma-cp-ports, it seems that it is port 5443 that the k8s control plane lacks access to. You can either edit the existing firewall rule for the k8s master and add port 5443, or create a new firewall rule with your master IP ranges as the source and the nodes as the target.

This might help: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#add_firewall_rules

UPDATE:
I reproduced the problem and adding 5443 to the master firewall rule fixed it.
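The fix above can be sketched as a single gcloud command; the rule name, network, and node tag below are placeholders to substitute with your own cluster's values, and $MASTER_IPV4 is the master CIDR used when the cluster was created:

```shell
# Allow the GKE master to reach the Kuma admission webhook on port 5443.
# my-allow-kuma-webhook, my-network, and my-node-tag are hypothetical names.
gcloud compute firewall-rules create my-allow-kuma-webhook \
  --network my-network \
  --direction INGRESS \
  --action ALLOW \
  --rules tcp:5443 \
  --source-ranges "$MASTER_IPV4" \
  --target-tags my-node-tag
```

Editing the auto-created master-to-node rule to add tcp:5443 achieves the same result; a separate rule just survives rule regeneration more cleanly.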

@jpeach
Contributor

jpeach commented Jul 1, 2021

UPDATE:
I reproduced the problem and adding 5443 to the master firewall rule fixed it.

Nice! Can we update the Kuma deployment docs with what you needed to change?

@itspngu

itspngu commented Oct 12, 2021

UPDATE:
I reproduced the problem and adding 5443 to the master firewall rule fixed it.

Nice! Can we update the Kuma deployment docs with what you needed to change?

Since @arjunsalyan seems to have disappeared, I just ran into this issue and adding 5443 to the list of allowed ports in the master<->nodes firewall rule fixed it.

@arjunsalyan

@itspngu Oops! I thought my comment above was detailed enough for someone running into this issue. But it would make more sense to add this to the documentation somewhere.

@github-actions
Contributor

This issue was inactive for 30 days. It will be reviewed in the next triage meeting and might be closed.
If you think this issue is still relevant, please comment on it promptly or attend the next triage meeting.

@lahabana lahabana transferred this issue from kumahq/kuma Nov 29, 2021
@lahabana
Contributor

Off to kuma-website to add this to the docs

@lahabana
Contributor

lahabana commented Dec 6, 2021

This port is listed there: https://kuma.io/docs/1.4.0/networking/networking/#kuma-cp-ports

Closing this ticket.


7 participants