This repository has been archived by the owner on Oct 10, 2023. It is now read-only.

Fix flaky package plugin tests #3323

Merged
merged 1 commit into from
Sep 20, 2022

Conversation

haoranleo
Contributor

@haoranleo haoranleo commented Sep 13, 2022

What this PR does / why we need it

  • Fix flaky package plugin tests by waiting for the packaging API to become available only after the registry pod is running
  • This change simply reorders the check blocks in the package plugin tests

Root cause of flaky plugin tests:

  • With the original check order (first wait for the packaging API to become available, then wait for the registry pod to be running), the packaging API might still be available only because the old kapp-controller pod has NOT been deleted yet.
  • In this case, the packaging API check might pass before kapp-controller actually restarts (that is, before the old kapp-controller pod is replaced by a new one). This means the "make sure package API is available after kapp-controller restart" block in the setUpPrivateRegistry function cannot guarantee that the packaging API is available AFTER kapp-controller restarts.
  • So when the tests enter testHelper and try to call the package repository API, kapp-controller may still be restarting. As a result, the packaging API is not available and the call fails with this error:
Error: failed to check for the availability of 'packaging.carvel.dev' API: failed to discover unmatched GroupVersionResources: the server is currently unable to handle the request

Evidence of this analysis:

In a pipeline run, the following logs are observed:

01:12:58  $ tanzu package repository list --all-namespaces --namespace test-ns --output json --kubeconfig /home/kubo/.kube/config 
01:12:58  [
01:12:58    {
01:12:58      "details": "",
01:12:58      "name": "tanzu-standard",
01:12:58      "namespace": "tanzu-package-repo-global",
01:12:58      "repository": "projects.registry.vmware.com/tkg/packages/standard/repo",
01:12:58      "status": "Reconcile succeeded",
01:12:58      "tag": "v1.6.0"
01:12:58    },
01:12:58    {
01:12:58      "details": "",
01:12:58      "name": "tanzu-core",
01:12:58      "namespace": "tkg-system",
01:12:58      "repository": "projects.registry.vmware.com/tkg/packages/core/repo",
01:12:58      "status": "Reconcile succeeded",
01:12:58      "tag": "v1.23.8_vmware.2-tkg.1"
01:12:58    }
01:12:58  ]
01:13:16  $ tanzu package repository update carvel-test --url registry-svc.registry.svc.cluster.local:443/secret-test/test-repo@sha256:e07483e2140fa427d9875aee9055d72efc49a732f3a3fb2c9651d9f39159315a --namespace test-ns --create-namespace --create --wait=true --poll-interval 20s --poll-timeout 30s --kubeconfig /home/kubo/.kube/config 
01:13:16  Error: failed to check for the availability of 'packaging.carvel.dev' API: failed to discover unmatched GroupVersionResources: the server is currently unable to handle the request

Note that the first tanzu package repository list command is executed by the "make sure package API is available after kapp-controller restart" block. The packaging API was available at that point, so the check passed. However, as the tests proceeded, the packaging API became unavailable, which caused the package repository operations to fail. This symptom shows that kapp-controller had not finished restarting when the corresponding check ("make sure package API ... restart") passed.

Reasoning for the fix:

After switching the order, the tests no longer check the availability of the packaging API right after deleting the old kapp-controller pod. Instead, they first wait for the registry pod to be up. This gives the old kapp-controller pods time to be deleted before the check runs, so the check no longer passes merely because the old pods are still serving the packaging API. Testing details are in the "Describe testing done for PR" section.

Which issue(s) this PR fixes

Fixes #3324

Describe testing done for PR

Ran the pipeline multiple times and verified that all package plugin tests passed. The following log snippet also shows that the "make sure package API is available after kapp-controller restart" test block now actually waits for the packaging API to become available after the restart:

15:56:29  $ tanzu package repository list --all-namespaces --namespace test-ns --output json --kubeconfig /home/kubo/.kube/config 
15:56:29  Error: failed to check for the availability of 'packaging.carvel.dev' API: failed to discover unmatched GroupVersionResources: the server is currently unable to handle the request
15:56:29  
15:56:29  ✖  exit status 1 
15:56:29  Error: exit status 1
15:56:29  <nil>
15:56:29  <nil>
15:56:47  $ tanzu package repository list --all-namespaces --namespace test-ns --output json --kubeconfig /home/kubo/.kube/config 
15:56:47  Error: failed to check for the availability of 'packaging.carvel.dev' API: failed to discover unmatched GroupVersionResources: the server is currently unable to handle the request
15:56:47  
15:56:47  ✖  exit status 1 
15:56:47  Error: exit status 1
15:56:47  <nil>
15:56:47  <nil>
15:57:02  $ tanzu package repository list --all-namespaces --namespace test-ns --output json --kubeconfig /home/kubo/.kube/config 
15:57:02  Error: failed to check for the availability of 'packaging.carvel.dev' API: failed to discover unmatched GroupVersionResources: the server is currently unable to handle the request
15:57:02  
15:57:02  ✖  exit status 1 
15:57:02  Error: exit status 1
15:57:02  <nil>
15:57:02  <nil>
15:57:20  $ tanzu package repository list --all-namespaces --namespace test-ns --output json --kubeconfig /home/kubo/.kube/config 
15:57:20  Error: failed to check for the availability of 'packaging.carvel.dev' API: failed to discover unmatched GroupVersionResources: the server is currently unable to handle the request
15:57:20  
15:57:20  ✖  exit status 1 
15:57:20  Error: exit status 1
15:57:20  <nil>
15:57:20  <nil>
15:57:35  $ tanzu package repository list --all-namespaces --namespace test-ns --output json --kubeconfig /home/kubo/.kube/config 
15:57:35  [
15:57:35    {
15:57:35      "details": "",
15:57:35      "name": "tanzu-standard",
15:57:35      "namespace": "tanzu-package-repo-global",
15:57:35      "repository": "projects.registry.vmware.com/tkg/packages/standard/repo",
15:57:35      "status": "Reconcile succeeded",
15:57:35      "tag": "v1.6.0"
15:57:35    },
15:57:35    {
15:57:35      "details": "",
15:57:35      "name": "tanzu-core",
15:57:35      "namespace": "tkg-system",
15:57:35      "repository": "projects.registry.vmware.com/tkg/packages/core/repo",
15:57:35      "status": "Reconcile succeeded",
15:57:35      "tag": "v1.23.8_vmware.2-tkg.1"
15:57:35    }
15:57:35  ]
15:57:35  $ tanzu package repository update carvel-test --url registry-svc.registry.svc.cluster.local:443/secret-test/test-repo@sha256:e07483e2140fa427d9875aee9055d72efc49a732f3a3fb2c9651d9f39159315a --namespace test-ns --create-namespace --create --wait=true --poll-interval 20s --poll-timeout 30s --kubeconfig /home/kubo/.kube/config 
15:57:57  
15:57:57  
15:57:57  Please consider using 'tanzu package repository update' to update the package repository with correct settings

Release note

Fix flakiness in package plugin tests

Additional information

Special notes for your reviewer

@codecov

codecov bot commented Sep 13, 2022

Codecov Report

Merging #3323 (2ed3857) into main (6ab1811) will increase coverage by 0.05%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #3323      +/-   ##
==========================================
+ Coverage   52.94%   52.99%   +0.05%     
==========================================
  Files         103      103              
  Lines       10419    10419              
==========================================
+ Hits         5516     5522       +6     
+ Misses       4443     4439       -4     
+ Partials      460      458       -2     
Impacted Files                                          Coverage Δ
addons/controllers/clusterbootstrap_controller.go       63.18% <0.00%> (+0.25%) ⬆️
...ons/controllers/packageinstallstatus_controller.go   79.15% <0.00%> (+1.15%) ⬆️


@haoranleo haoranleo force-pushed the lhaoran/fix-flaky-package-plugin-tests branch from 0b01fe1 to a67a8ad Compare September 13, 2022 20:58
@haoranleo haoranleo force-pushed the lhaoran/fix-flaky-package-plugin-tests branch from a67a8ad to 5683d1c Compare September 20, 2022 00:00
@haoranleo haoranleo requested a review from maralavi September 20, 2022 00:38
@haoranleo haoranleo force-pushed the lhaoran/fix-flaky-package-plugin-tests branch from 5683d1c to 2ed3857 Compare September 20, 2022 16:34
@maralavi maralavi added the ok-to-merge label Sep 20, 2022
@haoranleo haoranleo merged commit 67d32de into main Sep 20, 2022
@haoranleo haoranleo deleted the lhaoran/fix-flaky-package-plugin-tests branch September 20, 2022 20:54
Labels
cla-not-required, ok-to-merge
Development

Successfully merging this pull request may close these issues.

Flaky package plugin integration tests
4 participants