[SURE-9137] ClusterValues dont apply changes if one of the clusters is missing the templateValues #2943

Closed
1 task done
skanakal opened this issue Oct 8, 2024 · 10 comments

skanakal commented Oct 8, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

If a GitRepo targets two or more clusters and its fleet.yaml file references ${ .ClusterValues}, missing templateValues in the spec of any one cluster prevent updates from being deployed to every targeted cluster, including those where templateValues are properly configured.

Expected Behavior

  • The changes should be applied to the clusters where templateValues are defined.
  • The UI should show a clear error message.

Steps To Reproduce

  1. Install Rancher 2.9.2 with Fleet 0.10.3.
  2. Register two downstream clusters, ensuring that only one of them includes templateValues:
apiVersion: fleet.cattle.io/v1alpha1
kind: Cluster
metadata:
  annotations:
  labels:
    foo: bar
    management.cattle.io/cluster-display-name: rke2custom1
    management.cattle.io/cluster-name: c-m-qmc767s2
    objectset.rio.cattle.io/hash: 464bd091084175e4d5572051571f4dfb39bcf2fd
    provider.cattle.io: rke2
  name: rke2custom1
  namespace: fleet-default
spec:
  agentAffinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
              - key: fleet.cattle.io/agent
                operator: In
                values:
                  - 'true'
          weight: 1
  clientID: pl882vs458n4lqqrj8jc58jvkvq4xgqdfv9l7q7spnrhh7s8wjgj8v
  kubeConfigSecret: rke2custom1-kubeconfig
  kubeConfigSecretNamespace: fleet-default
  templateValues:
    generated:
      cluster_metadata:
        fqdn: server-1.example.com
        name: server-1
  3. Create a GitRepo from this example path: templateValues
  4. Check the GitRepo dashboard for resourceReady.

Environment

- Architecture: x86_64
- Fleet Version: fleet:104.0.3+up0.10.3
- Cluster:
  - Provider: custom
  - Options: 1
  - Kubernetes Version: v1.30.5+rke2r1

Logs

From the fleet-controller logs:

2024-10-08T11:59:49Z	DEBUG	bundle	Unchanged bundledeployment	{"controller": "bundle", "controllerGroup": "fleet.cattle.io", "controllerKind": "Bundle", "Bundle": {"name":"mcc-rke2custom1-managed-system-upgrade-controller","namespace":"fleet-default"}, "namespace": "fleet-default", "name": "mcc-rke2custom1-managed-system-upgrade-controller", "reconcileID": "04c5c324-f0f4-4f19-bc31-1e11a890da3e", "bundledeployment": {"apiVersion": "fleet.cattle.io/v1alpha1", "kind": "BundleDeployment", "namespace": "cluster-fleet-default-rke2custom1-43138de7906f", "name": "mcc-rke2custom1-managed-system-upgrade-controller"}, "operation": "unchanged"}
2024-10-08T11:59:49Z	DEBUG	bundle	Unchanged bundledeployment	{"controller": "bundle", "controllerGroup": "fleet.cattle.io", "controllerKind": "Bundle", "Bundle": {"name":"fleet-agent-rke2custom1","namespace":"fleet-default"}, "namespace": "fleet-default", "name": "fleet-agent-rke2custom1", "reconcileID": "d63cdb5d-544d-4356-b269-350b5564aa21", "bundledeployment": {"apiVersion": "fleet.cattle.io/v1alpha1", "kind": "BundleDeployment", "namespace": "cluster-fleet-default-rke2custom1-43138de7906f", "name": "fleet-agent-rke2custom1"}, "operation": "unchanged"}
2024-10-08T11:59:49Z	ERROR	Reconciler error	{"controller": "bundle", "controllerGroup": "fleet.cattle.io", "controllerKind": "Bundle", "Bundle": {"name":"templatevalues-templatevalues-5bfacaa9","namespace":"fleet-default"}, "namespace": "fleet-default", "name": "templatevalues-templatevalues-5bfacaa9", "reconcileID": "2a8aaea7-2194-46c2-a923-bf6f745b1a4a", "error": "failed to render helm values template: template: values:56:40: executing \"values\" at <.ClusterValues.generated.cluster_metadata.fqdn>: map has no entry for key \"generated\""}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:261
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:222
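
The error text in the log above has the shape produced by Go's standard text/template package when a map key is missing and the `missingkey=error` option is set. The following is a minimal sketch of that behavior, not Fleet's actual code: the function name `renderValues`, the `${`/`}` delimiters, and the one-line template are assumptions chosen to mirror the syntax and message shown in this issue.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// renderValues renders a values template against one cluster's templateValues.
// Assumption: "${" / "}" delimiters and missingkey=error, chosen to mirror the
// syntax and error text quoted in this issue.
func renderValues(tpl string, clusterValues map[string]interface{}) (string, error) {
	t, err := template.New("values").
		Delims("${", "}").
		Option("missingkey=error").
		Parse(tpl)
	if err != nil {
		return "", err
	}
	var out bytes.Buffer
	data := map[string]interface{}{"ClusterValues": clusterValues}
	if err := t.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	tpl := "fqdn: ${ .ClusterValues.generated.cluster_metadata.fqdn }"

	// Mirrors the templateValues of the Cluster resource in the reproduction steps.
	withValues := map[string]interface{}{
		"generated": map[string]interface{}{
			"cluster_metadata": map[string]interface{}{
				"fqdn": "server-1.example.com",
				"name": "server-1",
			},
		},
	}
	// A cluster whose spec has no templateValues at all.
	withoutValues := map[string]interface{}{}

	for _, cv := range []map[string]interface{}{withValues, withoutValues} {
		out, err := renderValues(tpl, cv)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

In this sketch the first cluster renders successfully and the second fails with a "map has no entry for key" error, which is the per-cluster failure that blocks the whole bundle in the reported behavior.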

Anything else?

current behavior:
image

@skanakal skanakal added kind/bug JIRA Must shout labels Oct 8, 2024
@rancherbot rancherbot added this to Fleet Oct 8, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in Fleet Oct 8, 2024
@kkaempf kkaempf added this to the v2.9.4 milestone Oct 8, 2024
@kkaempf kkaempf moved this from 🆕 New to To Triage in Fleet Oct 8, 2024
Member

manno commented Oct 23, 2024

We should not fail all bundle deployments when one cluster is missing a label.

@manno manno modified the milestones: v2.9.4, v2.9.5 Oct 23, 2024
@p-se p-se self-assigned this Nov 7, 2024
@p-se p-se moved this from 📋 Backlog to 🏗 In progress in Fleet Nov 7, 2024
@manno manno modified the milestones: v2.9.5, v2.9.6 Dec 9, 2024
Contributor

p-se commented Dec 9, 2024

> We should not fail all bundle deployments when one cluster is missing a label.

Clarification: We have decided not to ignore template errors when they occur, but to make them visible in the Bundle and GitRepo statuses. The corresponding PR is #3114.

p-se added 8 commits to p-se/fleet that referenced this issue between Dec 9 and Dec 10, 2024
@manno manno moved this from 🏗 In progress to 👀 In review in Fleet Dec 11, 2024
p-se added 12 commits to p-se/fleet that referenced this issue between Dec 16, 2024 and Jan 8, 2025
p-se added a commit that referenced this issue Jan 9, 2025
* Import v1alpha1 package as fleet

* Show bundle errors in Bundle and GitRepo

Refers to #2943

* Add E2E tests

Refers to #2943
Contributor

p-se commented Jan 9, 2025

/backport v2.10.2

Contributor

p-se commented Jan 9, 2025

/backport v2.9.6

p-se added a commit to p-se/fleet that referenced this issue Jan 9, 2025
…#3114)

(cherry picked from commit 235e8ef)
p-se added a commit to p-se/fleet that referenced this issue Jan 9, 2025
…#3114)

(cherry picked from commit 235e8ef)
(cherry picked from commit 3417071)
manno pushed a commit that referenced this issue Jan 10, 2025
…3196)

(cherry picked from commit 235e8ef)
(cherry picked from commit 3417071)
manno pushed a commit that referenced this issue Jan 10, 2025
…3193)

(cherry picked from commit 235e8ef)
@manno manno modified the milestones: v2.9.6, v2.11.0 Jan 13, 2025
Contributor

weyfonk commented Jan 13, 2025

Additional QA

Problem

When a workload targets multiple clusters, and one of those clusters is missing a template value, the following happens:

  • the workload is not deployed to any of the target clusters
  • a reconcile error appears, but only in the fleet-controller pod logs; it is not visible in the Rancher UI.

Solution

Fleet now reflects targeting errors, such as those caused by missing template values on clusters, in the bundle and GitRepo statuses.
Fleet deliberately refrains from creating bundle deployments for clusters without targeting issues. A bundle working with a subset of its expected bundle deployments would be expected to cause inconsistencies in resource counts and a possible cascade of other issues. This could be revisited in a further iteration.
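
The all-or-nothing behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Fleet's implementation: the types `Target` and `Status`, the functions `render` and `reconcileBundle`, the flattened values map, and the cluster names are all hypothetical.

```go
package main

import "fmt"

// Target is a hypothetical stand-in for a cluster matched by a bundle.
type Target struct {
	Cluster string
	Values  map[string]string // simplified, flattened templateValues
}

// Status is a hypothetical stand-in for a bundle status.
type Status struct {
	Deployments []string // one entry per created bundle deployment
	Message     string   // user-visible error, empty when all is well
}

// render fails when the key required by the template is missing on the cluster.
func render(t Target, key string) (string, error) {
	v, ok := t.Values[key]
	if !ok {
		return "", fmt.Errorf("map has no entry for key %q", key)
	}
	return t.Cluster + ":" + v, nil
}

// reconcileBundle renders every target before creating anything. On the first
// rendering error it returns no deployments at all and records the error in
// the status, instead of deploying to the subset of clusters that rendered.
func reconcileBundle(targets []Target, key string) Status {
	rendered := make([]string, 0, len(targets))
	for _, t := range targets {
		out, err := render(t, key)
		if err != nil {
			return Status{Message: "targeting error: " + err.Error()}
		}
		rendered = append(rendered, out)
	}
	return Status{Deployments: rendered}
}

func main() {
	targets := []Target{
		{Cluster: "cluster-a", Values: map[string]string{"fqdn": "server-1.example.com"}},
		{Cluster: "cluster-b", Values: map[string]string{}}, // missing value
	}
	status := reconcileBundle(targets, "fqdn")
	fmt.Println("deployments:", len(status.Deployments))
	fmt.Println("message:", status.Message)
}
```

The design choice illustrated here is that a partial result is never returned: either every target renders and all deployments are created, or none are and the status carries the error.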

Testing

Engineering Testing

Manual Testing

N/A

Automated Testing

End-to-end tests have been added to check for the presence of targeting errors in bundle and GitRepo statuses.

QA Testing Considerations

Suggestion: follow the reproduction steps above, and check that:

  • targeting errors appear in the Rancher UI
  • no bundle deployments are created

Regressions Considerations

N/A

@weyfonk weyfonk moved this from 👀 In review to Needs QA review in Fleet Jan 13, 2025
Contributor

sbulage commented Jan 21, 2025

System information

Before Upgrade

Rancher Version    Fleet Version
Prime v2.10.1      0.11.2

Steps used to perform

  • Verified that no resources are created on any cluster.
  • No error message is shown.

Note: The steps from the issue description were performed both before and after the upgrade.


After Upgrade

Rancher Version                                        Fleet Version
v2.11-f0b88cfe74d14b4431d11cda695abd69cc1b951d-head    v0.12.0-alpha.3

Steps used to perform

  • Upgraded the same cluster to v2.11-f0b88cfe74d14b4431d11cda695abd69cc1b951d-head.
  • Navigated to Continuous Delivery --> GitRepo.
  • An error message is shown on the GitRepo page (see the screenshot below).
  • No BundleDeployment is created for the same GitRepo.
Screenshot showing the template error message on the GitRepo:

Image


The video below shows the upgrade from Prime v2.10.1 to v2.11-f0b88cfe74d14b4431d11cda695abd69cc1b951d-head.
cluster_gitrepo_message.mp4

@sbulage sbulage closed this as completed Jan 21, 2025
@github-project-automation github-project-automation bot moved this from Needs QA review to ✅ Done in Fleet Jan 21, 2025

mahauke commented Jan 22, 2025

@sbulage From the screenshot, it is hard to tell whether the cluster with missing values will be visible. Does the Fleet UI show the display name of the cluster which is missing the values?

Contributor

sbulage commented Jan 22, 2025

> @sbulage From the screenshot, it is hard to tell whether the cluster with missing values will be visible. Does the Fleet UI show the display name of the cluster which is missing the values?

Hello @mahauke,
Previously, no error was shown on the GitRepo page to explain what had happened. Now the page shows that the GitRepo cannot proceed because of missing cluster values.

Perhaps @p-se or @manno can shed some light on your specific question.

Contributor

p-se commented Jan 22, 2025

@mahauke

> @sbulage From the screenshot, it is hard to tell whether the cluster with missing values will be visible. Does the Fleet UI show the display name of the cluster which is missing the values?

Whether a value is missing can only be determined by a GitRepo that points to a repository using those cluster values. In fact, the error first appears when the BundleDeployments for the targets are created, so the error is appended to the status of the Bundle. But because we treat the Bundle as a Fleet-internal resource, which is not prominently visible in the Rancher UI, the error is propagated to the GitRepo, where you can find it in the UI.

Image

It appears the name of the cluster that is expected to have a value isn't shown. I will revisit the code and check if we can improve on that matter.


mahauke commented Jan 22, 2025

@p-se Thank you for the explanations. It would be great if that could be improved!
