Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Connectivity tests fail when run in non cilium-test namespace #1109

Closed
pchaigno opened this issue Sep 26, 2022 · 8 comments Β· Fixed by #1112
Closed

Connectivity tests fail when run in non cilium-test namespace #1109

pchaigno opened this issue Sep 26, 2022 · 8 comments Β· Fixed by #1112
Assignees
Labels
kind/bug Something isn't working

Comments

@pchaigno
Copy link
Member

The connectivity tests consistently fail when run in a namespace other than cilium-test:

$ ./cilium connectivity test --test-namespace latest
[...]
πŸ“‹ Test Report
❌ 6/31 tests failed (16/233 actions), 0 tests skipped, 1 scenarios skipped:
Test [client-egress]:
  ❌ client-egress/pod-to-pod/curl-0: latest/client-594cbbdf44-dt5fj (10.72.0.241) -> latest/echo-other-node-848c6d48fb-2hx2n (10.72.1.34:8080)
  ❌ client-egress/pod-to-pod/curl-1: latest/client-594cbbdf44-dt5fj (10.72.0.241) -> latest/echo-same-node-6d78d7f79b-j94j7 (10.72.0.98:8080)
  ❌ client-egress/pod-to-pod/curl-2: latest/client2-8484df99d-cbwj6 (10.72.0.230) -> latest/echo-other-node-848c6d48fb-2hx2n (10.72.1.34:8080)
  ❌ client-egress/pod-to-pod/curl-3: latest/client2-8484df99d-cbwj6 (10.72.0.230) -> latest/echo-same-node-6d78d7f79b-j94j7 (10.72.0.98:8080)
Test [client-egress-expression]:
  ❌ client-egress-expression/pod-to-pod/curl-2: latest/client2-8484df99d-cbwj6 (10.72.0.230) -> latest/echo-other-node-848c6d48fb-2hx2n (10.72.1.34:8080)
  ❌ client-egress-expression/pod-to-pod/curl-3: latest/client2-8484df99d-cbwj6 (10.72.0.230) -> latest/echo-same-node-6d78d7f79b-j94j7 (10.72.0.98:8080)
Test [client-egress-to-echo-service-account]:
  ❌ client-egress-to-echo-service-account/pod-to-pod/curl-0: latest/client-594cbbdf44-dt5fj (10.72.0.241) -> latest/echo-other-node-848c6d48fb-2hx2n (10.72.1.34:8080)
  ❌ client-egress-to-echo-service-account/pod-to-pod/curl-1: latest/client-594cbbdf44-dt5fj (10.72.0.241) -> latest/echo-same-node-6d78d7f79b-j94j7 (10.72.0.98:8080)
Test [client-egress-to-echo-deny]:
  ❌ client-egress-to-echo-deny/pod-to-pod/curl-0: latest/client-594cbbdf44-dt5fj (10.72.0.241) -> latest/echo-other-node-848c6d48fb-2hx2n (10.72.1.34:8080)
  ❌ client-egress-to-echo-deny/pod-to-pod/curl-1: latest/client-594cbbdf44-dt5fj (10.72.0.241) -> latest/echo-same-node-6d78d7f79b-j94j7 (10.72.0.98:8080)
  ❌ client-egress-to-echo-deny/pod-to-pod/curl-2: latest/client2-8484df99d-cbwj6 (10.72.0.230) -> latest/echo-other-node-848c6d48fb-2hx2n (10.72.1.34:8080)
  ❌ client-egress-to-echo-deny/pod-to-pod/curl-3: latest/client2-8484df99d-cbwj6 (10.72.0.230) -> latest/echo-same-node-6d78d7f79b-j94j7 (10.72.0.98:8080)
Test [client-ingress-to-echo-named-port-deny]:
  ❌ client-ingress-to-echo-named-port-deny/pod-to-pod/curl-0: latest/client-594cbbdf44-dt5fj (10.72.0.241) -> latest/echo-other-node-848c6d48fb-2hx2n (10.72.1.34:8080)
  ❌ client-ingress-to-echo-named-port-deny/pod-to-pod/curl-1: latest/client-594cbbdf44-dt5fj (10.72.0.241) -> latest/echo-same-node-6d78d7f79b-j94j7 (10.72.0.98:8080)
Test [client-egress-to-echo-service-account-deny]:
  ❌ client-egress-to-echo-service-account-deny/pod-to-pod/curl-0: latest/client-594cbbdf44-dt5fj (10.72.0.241) -> latest/echo-other-node-848c6d48fb-2hx2n (10.72.1.34:8080)
  ❌ client-egress-to-echo-service-account-deny/pod-to-pod/curl-1: latest/client-594cbbdf44-dt5fj (10.72.0.241) -> latest/echo-same-node-6d78d7f79b-j94j7 (10.72.0.98:8080)
connectivity test failed: 6 tests failed

I've rerun with -p to pause on the first failure and collected a sysdump:
cilium-sysdump-20220926-205342.zip

This is with Cilium CLI d9371d3.

Investigation

cilium monitor shows this is caused by policy drops:

xx drop (Policy denied) flow 0x76b6c553 to endpoint 0, ifindex 25, file 2:1188, , identity 4620->34411: 10.72.1.9:52440 -> 10.72.1.28:8080 tcp SYN

Looking at the BPF policy map for that endpoint, it is indeed missing a rule to allow on that port:

$ ks exec cilium-x6rtv -- cilium bpf policy get -n 705
POLICY   DIRECTION   IDENTITY   PORT/PROTO   PROXY PORT   BYTES   PACKETS   
Allow    Ingress     0          ANY          NONE         0       0         
Allow    Ingress     1          ANY          NONE         392     4         
Allow    Egress      15025      53/TCP       NONE         0       0         
Allow    Egress      15025      53/UDP       NONE         0       0         
Allow    Egress      15025      53/SCTP      NONE         0       0

Dumping the only CNP in that namespace reveals the issue:

W0926 20:54:10.346854  299992 gcp.go:119] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
apiVersion: v1
items:
- apiVersion: cilium.io/v2
  kind: CiliumNetworkPolicy
  metadata:
    creationTimestamp: "2022-09-26T18:50:44Z"
    generation: 1
    name: client-egress-to-echo
    namespace: c1
    resourceVersion: "37680"
    uid: f4a95476-ed39-473e-bb00-bec11882b02b
  spec:
    egress:
    - toEndpoints:
      - matchLabels:
          any:io.kubernetes.pod.namespace: cilium-test
          any:kind: echo
      toPorts:
      - ports:
        - port: "8080"
          protocol: TCP
    - toEndpoints:
      - matchExpressions:
        - key: any:k8s-app
          operator: In
          values:
          - kube-dns
          - coredns
          - node-local-dns
        - key: any:io.kubernetes.pod.namespace
          operator: In
          values:
          - kube-system
      toPorts:
      - ports:
        - port: "53"
          protocol: ANY
    endpointSelector:
      matchLabels:
        any:kind: client
kind: List
metadata:
  resourceVersion: ""

Root Cause

It seems commit fc192b0 (and maybe others) added policies with the cilium-test hardcoded. I'm not sure what the intent was here given they are CNPs anyway so the namespace is implicit.

cc @sayboras

@pchaigno pchaigno added the kind/bug Something isn't working label Sep 26, 2022
@sayboras
Copy link
Member

sayboras commented Sep 27, 2022

It seems commit fc192b0 (and maybe others) added policies with the cilium-test hardcoded. I'm not sure what the intent was here given they are CNPs anyway so the namespace is implicit.

thanks for highlighting this, I don't think it's intended πŸ’―

@pchaigno
Copy link
Member Author

@sayboras I think you missed some policies as it's still failing with the same policy drops. And I still have the same policy loaded with cilium-test hardcoded. I checked out latest:

$ git log -n1
commit e206203241b7e8164195bbc16bf46617ef3928a6 (HEAD -> master, origin/master, origin/HEAD)
Author: Tam Mach <tam.mach@cilium.io>
Date:   Tue Sep 27 13:07:55 2022 +1000

    test/connectivity: Remove explicit namespace in policy manifest
    
    As the connectivity test can be from another namespace, which is passed
    by --test-namespace CLI flag, we should not have any hard coded namespace
    in all related policy manifests.
    
    Signed-off-by: Tam Mach <tam.mach@cilium.io>

and I ran make already.

@pchaigno
Copy link
Member Author

Those maybe?

$ git grep cilium-test connectivity/manifests/
connectivity/manifests/client-egress-to-echo-deny.yaml:        io.kubernetes.pod.namespace: cilium-test
connectivity/manifests/client-egress-to-echo-expression.yaml:        io.kubernetes.pod.namespace: cilium-test
connectivity/manifests/client-egress-to-echo-named-port-deny.yaml:        io.kubernetes.pod.namespace: cilium-test
connectivity/manifests/client-egress-to-echo-service-account-deny.yaml:        io.kubernetes.pod.namespace: cilium-test
connectivity/manifests/client-egress-to-echo-service-account.yaml:        io.kubernetes.pod.namespace: cilium-test
connectivity/manifests/client-egress-to-echo.yaml:        io.kubernetes.pod.namespace: cilium-test

@tklauser
Copy link
Member

Maybe we should convert one of the CI tests to use a test namespace name other than cilium-test, so we'll be more likely to catch regressions like this in the future?

@sayboras
Copy link
Member

sorry all, I thought I removed the label in match selector before

#1113

@sayboras
Copy link
Member

Actually, we need those dummy labels, so that the value can be updated dynamically.

So it could be something else, let me take a deeper look

@sayboras
Copy link
Member

@pchaigno this should fix our issue I reckon #1113

@sayboras
Copy link
Member

sayboras commented Oct 3, 2022

This should be fixed as part of recent PR.

@sayboras sayboras closed this as completed Oct 3, 2022
sayboras added a commit to sayboras/cilium-cli that referenced this issue Oct 3, 2022
This is to make sure that we have the coverage of running tests in a
namespace other than cilium-test.

Relates: cilium#1109 (comment)
Signed-off-by: Tam Mach <tam.mach@cilium.io>
tklauser pushed a commit that referenced this issue Oct 3, 2022
This is to make sure that we have the coverage of running tests in a
namespace other than cilium-test.

Relates: #1109 (comment)
Signed-off-by: Tam Mach <tam.mach@cilium.io>
michi-covalent pushed a commit to michi-covalent/cilium that referenced this issue May 30, 2023
This is to make sure that we have the coverage of running tests in a
namespace other than cilium-test.

Relates: cilium/cilium-cli#1109 (comment)
Signed-off-by: Tam Mach <tam.mach@cilium.io>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
kind/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants