unable to load specified CA cert in target allocator #3572

Open
oszlak opened this issue Dec 23, 2024 · 3 comments
Labels
bug Something isn't working needs triage

Comments

@oszlak

oszlak commented Dec 23, 2024

Component(s)

target allocator

What happened?

Description

I'm trying to run the target allocator (TA) with Prometheus CRs, with autoGenerateCert set to true and certManager set to false.
I see the secret is populated:
apiVersion: v1
data:
  ca.crt: ++++++++
  tls.crt: ++++++++
  tls.key: ++++++++
kind: Secret
metadata:
  annotations:
    helm.sh/hook: 'pre-install,pre-upgrade'
    helm.sh/hook-delete-policy: before-hook-creation
    kubectl.kubernetes.io/last-applied-configuration: >-
      {"apiVersion":"v1","data":{"ca.crt":"++++++++","tls.crt":"++++++++","tls.key":"++++++++"},"kind":"Secret","metadata":{"annotations":{"helm.sh/hook":"pre-install,pre-upgrade","helm.sh/hook-delete-policy":"before-hook-creation"},"labels":{"app.kubernetes.io/component":"webhook","app.kubernetes.io/instance":"<cluster_name>-opentelemetry-operator","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"opentelemetry-operator","app.kubernetes.io/version":"0.94.0","argocd.argoproj.io/instance":"<cluster_name>-opentelemetry-operator","helm.sh/chart":"opentelemetry-operator-0.48.0"},"name":"<cluster_name>-opentelemetry-operator-controller-manager-service-cert","namespace":"opentelemetry"},"type":"kubernetes.io/tls"}
  creationTimestamp: '2024-12-23T08:55:47Z'
  labels:
    app.kubernetes.io/component: webhook
    app.kubernetes.io/instance: <cluster_name>-opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.94.0
    argocd.argoproj.io/instance: <cluster_name>-opentelemetry-operator
    helm.sh/chart: opentelemetry-operator-0.48.0
  name: >-
    <cluster_name>-opentelemetry-operator-controller-manager-service-cert
  namespace: opentelemetry
  resourceVersion: '665456594'
  uid: a5c19d0f-414c-40b5-a4da-7da52cde746a
type: kubernetes.io/tls
but I still can't get it to work.
I also tried mounting it in the collector CRD:
volumes:
  - name: prometheus-certs
    secret:
      secretName: {{ .Values.scraper.prometheusSecretName }}
      items:
        - key: ca.crt
          path: {{ .Values.scraper.prometheusSecretPath }}
containers:
  - name: otel-scraper
    volumeMounts:
      - name: prometheus-certs
        mountPath: /etc/prometheus/certs/
        readOnly: true

and I'm still getting the same error.
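
For reference, a Secret volume exposes each listed key at mountPath joined with the item's path, so the mounted file only helps if that joined path equals the one named in the error. A minimal Python sketch of that comparison, assuming a hypothetical value of ca.crt for prometheusSecretPath (the real value is not shown in this issue); mounted_files is an illustrative helper, not part of any library:

```python
import posixpath


def mounted_files(mount_path, items):
    """Return the absolute paths a Secret volume exposes for its key-to-path items."""
    return [posixpath.join(mount_path, item["path"]) for item in items]


# Path the receiver tried to open, copied from the error message.
expected = "/etc/prometheus/certs/secret_monitoring_<cluster_name>-admission_ca"

# Hypothetical: assumes prometheusSecretPath renders to "ca.crt".
items = [{"key": "ca.crt", "path": "ca.crt"}]

paths = mounted_files("/etc/prometheus/certs/", items)
print(paths)              # ['/etc/prometheus/certs/ca.crt']
print(expected in paths)  # False
```

Under that assumption the file would land at /etc/prometheus/certs/ca.crt, which is not the path the scrape config references, so the mount alone would not clear the error.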

Steps to Reproduce

Install the operator and enable the target allocator with self-signed certs.

Expected Result

I'm able to scrape targets over HTTPS.

Actual Result

The collector logs "error creating new scrape pool" (see the log output below).

Kubernetes Version

1.30.0

Operator version

0.94.0

Collector version

0.94.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

Log output

2024-12-23T09:19:20.836Z	error	scrape/manager.go:219	error creating new scrape pool	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "error": "error creating HTTP client: unable to load specified CA cert /etc/prometheus/certs/secret_monitoring_<cluster_name>-admission_ca: open /etc/prometheus/certs/secret_monitoring_<cluster_name>-admission_ca: no such file or directory", "errorVerbose": "unable to load specified CA cert /etc/prometheus/certs/secret_monitoring_<cluster_name>-admission_ca: open /etc/prometheus/certs/secret_monitoring_<cluster_name>-admission_ca: no such file or directory\nerror creating HTTP client\ngithub.com/prometheus/prometheus/scrape.newScrapePool\n\tgithub.com/prometheus/prometheus@v0.48.1/scrape/scrape.go:293\ngithub.com/prometheus/prometheus/scrape.(*Manager).reload\n\tgithub.com/prometheus/prometheus@v0.48.1/scrape/manager.go:217\ngithub.com/prometheus/prometheus/scrape.(*Manager).reloader\n\tgithub.com/prometheus/prometheus@v0.48.1/scrape/manager.go:199\nruntime.goexit\n\truntime/asm_amd64.s:1650", "scrape_pool": "serviceMonitor/monitoring/<cluster_name>-operator/0"}
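
The path the receiver is looking for can be pulled out of a log line like the one above. A rough Python sketch, assuming the tab-separated console format with a trailing JSON object seen in this log; missing_ca_path is a hypothetical helper name, and the sample is a shortened stand-in for the real line:

```python
import json
import re

# Matches the "open <path>: no such file or directory" part of the error text.
_MISSING_FILE = re.compile(r"open (\S+): no such file or directory")


def missing_ca_path(log_line):
    """Return the file path the receiver failed to open, or None if not found."""
    fields = log_line.rstrip("\n").split("\t")
    try:
        payload = json.loads(fields[-1])
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    match = _MISSING_FILE.search(payload.get("error", ""))
    return match.group(1) if match else None


# Shortened stand-in for the log line in this issue.
sample = "\t".join([
    "2024-12-23T09:19:20.836Z",
    "error",
    "scrape/manager.go:219",
    "error creating new scrape pool",
    json.dumps({
        "error": "error creating HTTP client: unable to load specified CA cert "
                 "/etc/prometheus/certs/secret_monitoring_<cluster_name>-admission_ca: "
                 "open /etc/prometheus/certs/secret_monitoring_<cluster_name>-admission_ca: "
                 "no such file or directory",
    }),
])

print(missing_ca_path(sample))
```

The extracted path can then be compared against the files actually present under /etc/prometheus/certs/ in the collector pod.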

Additional context

No response

@oszlak added the bug and needs triage labels on Dec 23, 2024
@mtthwcmpbll

I'm noticing this error too, and I think it's related to enabling the Target Allocator's mTLS feature flag. I have kube-state-metrics deployed, which deploys a ServiceMonitor with the TLS section filled out. When the Target Allocator discovers this, it's throwing this error at the receiver trying to discover the certificate described in the ServiceMonitor.

I had this halfway configured from some earlier work on this feature, and I missed a few key steps:

  1. I was missing the cert-manager RBAC shown in the mTLS documentation on my operator (I had incorrectly done that on the target allocator itself and not the operator controller)
  2. I was seeing an error in the operator stating "Cert-Manager is not available to the operator, skipping adding to scheme.". This issue pointed me toward setting specific environment variables on the operator so that cert-manager was successfully autodiscovered.

After fixing those two issues, I no longer see the periodic issue when the TA discovers a ServiceMonitor with a TLS block.

@oszlak

oszlak commented Jan 7, 2025

Thank you @mtthwcmpbll.
I have followed the steps above.
The role now has:

  certificaterequests.cert-manager.io                  []                 []              [create get list watch update patch delete]
  certificates.cert-manager.io                         []                 []              [create get list watch update patch delete]
  issuers.cert-manager.io                              []                 []              [create get list watch update patch delete]
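
A quick way to double-check rows like these is to parse them and confirm that each cert-manager resource is present with the expected verbs. A small Python sketch, assuming the rows keep the layout shown above (resource name first, verbs in the last bracketed column) and that the verb list in this comment is the one wanted; both helper names are made up for illustration:

```python
def parse_role_rows(rows):
    """Map each resource name to the set of verbs in its last [...] column."""
    rules = {}
    for row in rows:
        row = row.strip()
        if not row or "[" not in row or "]" not in row:
            continue
        resource = row.split()[0]
        verbs = row[row.rindex("[") + 1 : row.rindex("]")].split()
        rules[resource] = set(verbs)
    return rules


def missing_verbs(rules, resources, verbs):
    """Return {resource: sorted missing verbs} for anything not fully covered."""
    gaps = {}
    for resource in resources:
        absent = set(verbs) - rules.get(resource, set())
        if absent:
            gaps[resource] = sorted(absent)
    return gaps


rows = [
    "certificaterequests.cert-manager.io  []  []  [create get list watch update patch delete]",
    "certificates.cert-manager.io         []  []  [create get list watch update patch delete]",
    "issuers.cert-manager.io              []  []  [create get list watch update patch delete]",
]
wanted_resources = [
    "certificaterequests.cert-manager.io",
    "certificates.cert-manager.io",
    "issuers.cert-manager.io",
]
wanted_verbs = ["create", "get", "list", "watch", "update", "patch", "delete"]

print(missing_verbs(parse_role_rows(rows), wanted_resources, wanted_verbs))  # {}
```

An empty result means every listed resource carries all of the verbs; any gap shows up keyed by resource name.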

and also in the operator logs I can see:

{"level":"INFO","timestamp":"2025-01-07T10:57:54Z","logger":"setup","message":"Cert-Manager is available to the operator, adding to scheme."}
{"level":"INFO","timestamp":"2025-01-07T10:57:54Z","logger":"setup","message":"Securing the connection between the target allocator and the collector"}

but I still see the issue in the collectors:

2025-01-07T11:06:17.598Z	warn	internal/transaction.go:129	Failed to scrape Prometheus endpoint	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1736247977598, "target_labels": "{__name__=\"up\", instance=\"{ip}:10249\", job=\"kube-proxy\"}"}
2025-01-07T11:06:17.645Z	info	Metrics	{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 5, "data points": 5}
2025-01-07T11:06:18.600Z	warn	internal/transaction.go:129	Failed to scrape Prometheus endpoint	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1736247968598, "target_labels": "{__name__=\"up\", endpoint=\"https-metrics\", instance=\"{ip}:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"ip-{ip}.ec2.internal\", service=\"test-chart-prom-rw-kube-pr-kubelet\"}"}

Could this be related to the fact that the operator logs this message?

{"level":"INFO","timestamp":"2025-01-07T10:58:30Z","logger":"collector-upgrade","message":"no instances to upgrade"}

@oszlak

oszlak commented Jan 7, 2025

and the main issue still exists:

2025-01-07T11:16:44.095Z	error	scrape/manager.go:180	error creating new scrape pool	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "error": "error creating HTTP client: unable to read CA cert: unable to read file /etc/prometheus/certs/0_monitoring_{cluster}-admission_ca: open /etc/prometheus/certs/0_monitoring_isr-playground-k8s-cen-dev-admission_ca: no such file or directory", "scrape_pool": "serviceMonitor/monitoring/i{cluster}-operator/0"}
