[bug] HelmRepository blocked if secret on startup not exists #1173

Open
genofire opened this issue Jul 21, 2023 · 16 comments

@genofire

genofire commented Jul 21, 2023

Steps:

  • install fluxcd
  • add HelmRepository resources with a secretRef
  • wait until the HelmRepository fails
  • add the Secret (the one that was referenced)
    • or unseal it via a SealedSecret ...
  • ....

Error behaviour:

  • HelmRepository does not reconcile with the new, working secret

Expected behaviour:

  • HelmRepository reconciles after the given time / interval

Workaround:

  • kill / restart source-controller pod

Seen in fluxcd versions:

  • 0.41.2
  • 2.0.0-rc5
  • 2.0.1
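
The expected behaviour described above can be sketched as a loop that retries on every interval instead of giving up after the first failure. This is a minimal illustration under stated assumptions, not source-controller's implementation: `retryEveryInterval`, `errNotFound`, and the in-memory `secrets` map are hypothetical names introduced here.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotFound stands in for the API server's "not found" response (hypothetical).
var errNotFound = errors.New("secret not found")

// retryEveryInterval calls reconcile once per interval until it succeeds or
// maxAttempts is reached. It returns the number of attempts that were made.
func retryEveryInterval(reconcile func() error, interval time.Duration, maxAttempts int) (int, error) {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if lastErr = reconcile(); lastErr == nil {
			return attempt, nil
		}
		if attempt < maxAttempts {
			time.Sleep(interval) // wait for the next interval before retrying
		}
	}
	return maxAttempts, lastErr
}

func main() {
	secrets := map[string]string{} // the referenced Secret does not exist yet
	calls := 0
	reconcile := func() error {
		calls++
		if calls == 3 {
			// Simulate the Secret being created (or unsealed) later on.
			secrets["example"] = "credentials"
		}
		if _, ok := secrets["example"]; !ok {
			return fmt.Errorf("failed to get secret '%s': %w", "default/example", errNotFound)
		}
		return nil
	}
	attempts, err := retryEveryInterval(reconcile, 10*time.Millisecond, 5)
	fmt.Println("attempts:", attempts, "error:", err)
}
```

With this shape, a Secret that appears after the first failure is picked up on a later attempt without restarting the process.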
@genofire
Author

Could it be caused by the error handling here?

e := &serror.Event{
	Err:    fmt.Errorf("failed to get secret '%s': %w", name.String(), err),
	Reason: sourcev1.AuthenticationFailedReason,
}
conditions.MarkTrue(obj, sourcev1.FetchFailedCondition, e.Reason, e.Err.Error())
return sreconcile.ResultEmpty, e

On the GitRepository (where it works), we get a "Generic" error instead:

e := serror.NewGeneric(
	fmt.Errorf("failed to get secret '%s': %w", name.String(), err),
	sourcev1.AuthenticationFailedReason,
)
conditions.MarkTrue(obj, sourcev1.FetchFailedCondition, e.Reason, e.Err.Error())
// Return error as the world as observed may change
return sreconcile.ResultEmpty, e

Maybe the older error type causes the object to block permanently.
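
To make the hypothesis concrete: the sketch below shows one way an error's type could decide whether a reconciler tries again. It is purely illustrative and does not reproduce source-controller's `serror` package or how it actually handles `Event` and `Generic` errors; `retryableError`, `terminalError`, `shouldRequeue`, and `errSecretMissing` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errSecretMissing stands in for the "secret not found" failure in this thread.
var errSecretMissing = errors.New("secrets \"example\" not found")

// retryableError is a hypothetical error type that the reconciler retries.
type retryableError struct{ err error }

func (e *retryableError) Error() string { return e.err.Error() }

// terminalError is a hypothetical error type that the reconciler never retries.
type terminalError struct{ err error }

func (e *terminalError) Error() string { return e.err.Error() }

// shouldRequeue decides from the error's type whether to try again later.
func shouldRequeue(err error) bool {
	if err == nil {
		return false // success: nothing to retry
	}
	var term *terminalError
	if errors.As(err, &term) {
		return false // treated as permanent until the object itself changes
	}
	return true // the world as observed may change, so retry
}

func main() {
	fmt.Println("retryable requeues:", shouldRequeue(&retryableError{err: errSecretMissing}))
	fmt.Println("terminal requeues:", shouldRequeue(&terminalError{err: errSecretMissing}))
}
```

If a missing Secret were ever classified like `terminalError` here, the object would stay failed until something else triggered a reconcile. Whether that happens in source-controller is exactly the open question.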

@darkowlzz
Contributor

Hi, I just tried it but I couldn't reproduce it.
I created the following helmrepo:

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 1m
  url: https://stefanprodan.github.io/podinfo
  secretRef:
    name: "example"

The secret doesn't exist yet.
Got the following errors in the logs

{"level":"error","ts":"2023-07-21T16:04:40.646+0530","msg":"Reconciler error","controller":"helmrepository","controllerGroup":"source.toolkit.fluxcd.io","controllerKind":"HelmRepository","HelmRepository":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"3276abd8-8a54-4057-8bc7-ab7664327a44","error":"failed to get secret 'default/example': secrets "example" not found"}

The status of helmrepo shows (kubectl get helmrepository podinfo -o yaml):

status:
  conditions:
  - lastTransitionTime: "2023-07-21T10:34:45Z"
    message: building artifact
    observedGeneration: 1
    reason: ProgressingWithRetry
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2023-07-21T10:34:45Z"
    message: 'failed to get secret ''default/example'': secrets "example" not found'
    observedGeneration: 1
    reason: AuthenticationFailed
    status: "False"
    type: Ready
  - lastTransitionTime: "2023-07-21T10:34:40Z"
    message: 'failed to get secret ''default/example'': secrets "example" not found'
    observedGeneration: 1
    reason: AuthenticationFailed
    status: "True"
    type: FetchFailed
  observedGeneration: -1

After creating the secret, within a few seconds, the logs show

{"level":"info","ts":"2023-07-21T16:06:16.387+0530","msg":"stored fetched index of size 43.13kB from 'https://stefanprodan.github.io/podinfo'","controller":"helmrepository","controllerGroup":"source.toolkit.fluxcd.io","controllerKind":"HelmRepository","HelmRepository":{"name":"podinfo","namespace":"default"},"namespace":"default","name":"podinfo","reconcileID":"96dcf686-9538-462e-b832-be6f1f873be5"}

and the helmrepo status shows:

status:
  artifact:
    digest: sha256:80b091a3a69b9ecfebde40ce2a5f19e95f8f8ea956bd5635a31701f7fad1616e
    lastUpdateTime: "2023-07-21T10:36:16Z"
    path: helmrepository/default/podinfo/index-80b091a3a69b9ecfebde40ce2a5f19e95f8f8ea956bd5635a31701f7fad1616e.yaml
    revision: sha256:80b091a3a69b9ecfebde40ce2a5f19e95f8f8ea956bd5635a31701f7fad1616e
    size: 43126
    url: http://:0/helmrepository/default/podinfo/index-80b091a3a69b9ecfebde40ce2a5f19e95f8f8ea956bd5635a31701f7fad1616e.yaml
  conditions:
  - lastTransitionTime: "2023-07-21T10:36:16Z"
    message: 'stored artifact: revision ''sha256:80b091a3a69b9ecfebde40ce2a5f19e95f8f8ea956bd5635a31701f7fad1616e'''
    observedGeneration: 1
    reason: Succeeded
    status: "True"
    type: Ready
  - lastTransitionTime: "2023-07-21T10:36:16Z"
    message: 'stored artifact: revision ''sha256:80b091a3a69b9ecfebde40ce2a5f19e95f8f8ea956bd5635a31701f7fad1616e'''
    observedGeneration: 1
    reason: Succeeded
    status: "True"
    type: ArtifactInStorage
  observedGeneration: 1
  ...

An object can get blocked if it has a Stalled condition in its status, which is not the case here.
Can you check the status of the blocked helmrepo and share?
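
For reference, checking a status for a Stalled condition can be sketched as follows. The `condition` struct and `isConditionTrue` helper are hypothetical stand-ins defined here for illustration; the condition values are taken from the failing status shown earlier in this comment.

```go
package main

import "fmt"

// condition mirrors only the status condition fields that matter for this check.
type condition struct {
	Type   string
	Status string
	Reason string
}

// isConditionTrue reports whether a condition of the given type exists
// and has status "True". A missing condition counts as not true.
func isConditionTrue(conds []condition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == "True"
		}
	}
	return false
}

func main() {
	// Conditions from the failing HelmRepository status (secret not found).
	conds := []condition{
		{Type: "Reconciling", Status: "True", Reason: "ProgressingWithRetry"},
		{Type: "Ready", Status: "False", Reason: "AuthenticationFailed"},
		{Type: "FetchFailed", Status: "True", Reason: "AuthenticationFailed"},
	}
	fmt.Println("stalled:", isConditionTrue(conds, "Stalled"))
	fmt.Println("reconciling:", isConditionTrue(conds, "Reconciling"))
}
```

A status like this one, with Reconciling set and no Stalled condition, indicates the object is still expected to be retried.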

@stefanprodan
Member

@genofire when reporting bugs please say which version you're using by simply posting the flux check output.

@genofire
Author

genofire commented Jul 21, 2023

► checking prerequisites
✔ Kubernetes 1.24.6 >=1.24.0-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.34.1
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v1.0.0-rc.4
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v1.0.0-rc.4
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v1.0.0-rc.5
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta2
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta2
✔ receivers.notification.toolkit.fluxcd.io/v1
✔ all checks passed

@stefanprodan
Member

That's the CLI version, what about controllers and CRDs? flux check prints those.

@genofire
Author

genofire commented Jul 21, 2023

No, I meant that the namespace has the version label 2.0.0-rc5. I have edited / updated the message above.

@stefanprodan
Member

Can you please upgrade to Flux v2.0.1 and see if this issue persists?

@genofire
Author

That will take time: we have 30 clusters with staging.

@stefanprodan
Member

Not asking you to upgrade all of them, just one to rerun the test. We've tried to replicate this with 2.0.1 and the HelmRepository is not getting stuck. Also what type of repo are you using? OCI or Helm HTTP?

@stefanprodan
Member

It would also be helpful if you could post here the output of kubectl get helmrepository --show-managed-fields -oyaml for the one that's stuck.

@genofire
Author

genofire commented Jul 21, 2023

The secret has now existed for 31 minutes:

helmrepo:

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  annotations:
    meta.helm.sh/release-name: infra-infra-base
    meta.helm.sh/release-namespace: infra
  creationTimestamp: "2023-07-21T12:46:01Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: infra-base
    helm.toolkit.fluxcd.io/namespace: flux-system
  managedFields:
  - apiVersion: source.toolkit.fluxcd.io/v1beta2
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:meta.helm.sh/release-name: {}
          f:meta.helm.sh/release-namespace: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/managed-by: {}
          f:helm.toolkit.fluxcd.io/name: {}
          f:helm.toolkit.fluxcd.io/namespace: {}
      f:spec:
        .: {}
        f:interval: {}
        f:provider: {}
        f:secretRef:
          .: {}
          f:name: {}
        f:timeout: {}
        f:url: {}
    manager: helm-controller
    operation: Update
    time: "2023-07-21T12:46:01Z"
  - apiVersion: source.toolkit.fluxcd.io/v1beta2
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions: {}
    manager: source-controller
    operation: Update
    subresource: status
    time: "2023-07-21T12:49:23Z"
  name: opstree
  namespace: infra
  resourceVersion: "32531330664"
  uid: 3021755c-d010-454f-8b88-fecf6ded654f
spec:
  interval: 5m
  provider: generic
  secretRef:
    name: internal-artifactory-auth
  timeout: 60s
  url: https://repo-ex.internal.de/artifactory/ot-container-kit-helm-remote/
status:
  conditions:
  - lastTransitionTime: "2023-07-21T12:49:23Z"
    message: building artifact
    observedGeneration: 1
    reason: ProgressingWithRetry
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2023-07-21T12:49:23Z"
    message: 'failed to get secret ''infra/internal-artifactory-auth'': secrets "internal-artifactory-auth"
      not found'
    observedGeneration: 1
    reason: AuthenticationFailed
    status: "False"
    type: Ready
  - lastTransitionTime: "2023-07-21T12:46:02Z"
    message: 'failed to get secret ''infra/internal-artifactory-auth'': secrets "internal-artifactory-auth"
      not found'
    observedGeneration: 1
    reason: AuthenticationFailed
    status: "True"
    type: FetchFailed
  observedGeneration: -1

OCI helmrepo:

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  annotations:
    meta.helm.sh/release-name: infra-infra-base
    meta.helm.sh/release-namespace: infra
  creationTimestamp: "2023-07-21T12:46:01Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: infra-base
    helm.toolkit.fluxcd.io/namespace: flux-system
  managedFields:
  - apiVersion: source.toolkit.fluxcd.io/v1beta2
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:meta.helm.sh/release-name: {}
          f:meta.helm.sh/release-namespace: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/managed-by: {}
          f:helm.toolkit.fluxcd.io/name: {}
          f:helm.toolkit.fluxcd.io/namespace: {}
      f:spec:
        .: {}
        f:interval: {}
        f:provider: {}
        f:secretRef:
          .: {}
          f:name: {}
        f:timeout: {}
        f:type: {}
        f:url: {}
    manager: helm-controller
    operation: Update
    time: "2023-07-21T12:46:01Z"
  - apiVersion: source.toolkit.fluxcd.io/v1beta2
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions: {}
    manager: source-controller
    operation: Update
    subresource: status
    time: "2023-07-21T12:49:23Z"
  name: weave-gitops
  namespace: infra
  resourceVersion: "32531330680"
  uid: 2a2a7e1b-a809-4992-9f76-a8c5d7650133
spec:
  interval: 60m0s
  provider: generic
  secretRef:
    name: internal-artifactory-auth
  timeout: 60s
  type: oci
  url: oci://docker-virtual.repo-ex.internal.de/weaveworks/charts
status:
  conditions:
  - lastTransitionTime: "2023-07-21T12:49:22Z"
    message: 'processing object: new generation -1 -> 1'
    observedGeneration: 1
    reason: ProgressingWithRetry
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2023-07-21T12:46:02Z"
    message: 'failed to get secret ''infra/internal-artifactory-auth'': secrets "internal-artifactory-auth"
      not found'
    observedGeneration: 1
    reason: AuthenticationFailed
    status: "False"
    type: Ready
  observedGeneration: -1


@stefanprodan
Member

If you run flux reconcile helmrepository, does it find the secret, or does the same thing happen?

@genofire
Author

genofire commented Jul 21, 2023

If I trigger it twice:

# flux reconcile source helm -n infra weave-gitops
► annotating HelmRepository weave-gitops in infra namespace
✔ HelmRepository annotated
◎ waiting for HelmRepository reconciliation
✗ HelmRepository reconciliation failed: 'failed to get secret 'infra/internal-artifactory-auth': secrets "internal-artifactory-auth" not found'


# flux reconcile source helm -n infra weave-gitops
► annotating HelmRepository weave-gitops in infra namespace
✔ HelmRepository annotated
◎ waiting for HelmRepository reconciliation
✔ Helm repository is ready


@stefanprodan
Member

This is really strange. Is your Kubernetes API under heavy load, or is etcd having any issues? This may be a caching issue: we have disabled the caching of Secrets in our controllers, but the API does it as well.
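
As an illustration of what a caching issue of this kind could look like, the sketch below models a cache that also remembers misses, so a lookup keeps answering "not found" until the cached entry expires. Nothing in this thread confirms that this is the cause; `negativeCache`, `newNegativeCache`, and the integer clock are hypothetical and chosen only to keep the example deterministic.

```go
package main

import "fmt"

// cachedLookup remembers the result of one lookup and when it goes stale.
type cachedLookup struct {
	found     bool
	expiresAt int64
}

// negativeCache is a hypothetical read-through cache that also stores misses.
type negativeCache struct {
	ttl     int64                   // lifetime of an entry, in clock ticks
	backend map[string]bool         // the source of truth
	entries map[string]cachedLookup // cached answers, including "not found"
}

func newNegativeCache(ttl int64) *negativeCache {
	return &negativeCache{ttl: ttl, backend: map[string]bool{}, entries: map[string]cachedLookup{}}
}

// exists answers from the cache while the entry is fresh; otherwise it asks
// the backend and caches whatever the backend says.
func (c *negativeCache) exists(name string, now int64) bool {
	if e, ok := c.entries[name]; ok && now < e.expiresAt {
		return e.found
	}
	found := c.backend[name]
	c.entries[name] = cachedLookup{found: found, expiresAt: now + c.ttl}
	return found
}

func main() {
	c := newNegativeCache(30)
	name := "internal-artifactory-auth"
	fmt.Println(c.exists(name, 0)) // miss is looked up and cached
	c.backend[name] = true         // the Secret is created afterwards
	fmt.Println(c.exists(name, 10)) // stale miss is still served
	fmt.Println(c.exists(name, 40)) // entry expired, backend is asked again
}
```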

@genofire
Author

genofire commented Jul 21, 2023

It is our cloud provider, IONOS ... we have no control over etcd. My problem is that I do not see any logs about a reconciliation of this HelmRepository (I do see them for others) ... as if it were stalled.

We have had this problem daily for over two months (whenever we create a new cluster and install our default resources there).

@genofire
Author

If you are right that the kube-api is under heavy load, maybe we should set a timeout on the requests there (maybe that is the problem).
Here is my code:
fluxcd/pkg#627
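
The general idea of bounding a request with a timeout can be sketched with the standard `context` package. This is not the content of the linked change; `getWithTimeout`, `slowFetch`, and `requestTimesOut` are hypothetical helpers, and the durations are arbitrary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// getWithTimeout runs fetch with a deadline so a slow call cannot block forever.
func getWithTimeout(parent context.Context, timeout time.Duration, fetch func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel() // release the timer as soon as fetch returns
	return fetch(ctx)
}

// slowFetch simulates an API request that answers after delay,
// unless the context is cancelled or times out first.
func slowFetch(delay time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(delay):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// requestTimesOut reports whether a request taking delay exceeds timeout.
func requestTimesOut(timeout, delay time.Duration) bool {
	err := getWithTimeout(context.Background(), timeout, slowFetch(delay))
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("slow request timed out:", requestTimesOut(50*time.Millisecond, 2*time.Second))
	fmt.Println("fast request timed out:", requestTimesOut(2*time.Second, 10*time.Millisecond))
}
```

A request that hits the deadline returns an error the caller can retry on, rather than leaving the reconcile hanging.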
