OpenCost metrics interfere with OpenShift's "degraded control plane" detection? #249
Hmmm, this seems to completely break any cost calculation in OpenCost. After setting this, there are no more metrics visible. I enabled the |
Even after re-enabling |
As just stated in #252 I am not sure if this issue should rather go to the opencost repository, as it seems (to me, with the knowledge I have today...) like not just a problem of disabling some things on OpenShift, but a general problem of OpenCost not working on OpenShift without interfering with OpenShift itself? |
Hi @kastl-ars |
Thank you! We are using the latest chart version 1.43.1. |
The more pressing issue would be #252 as a wrong CPU count sounds more problematic. But my guess is they are related... |
Thank you, |
Any news on this? |
So sorry, did not get time for it this week. Checking right now |
I am able to reproduce this on my end. count(kube_pod_labels{label_app="openshift-kube-apiserver", label_apiserver="true", namespace="openshift-kube-apiserver" }) gives me a value of 6, while there are only 3 with |
This also leads to the number of CPUs being doubled on the overview page, as stated in #252 |
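To illustrate why the count doubles, here is a minimal sketch in plain Python (not PromQL). It assumes the same three apiserver pods are exported as `kube_pod_labels` series by two different scrape jobs. The job names and pod names below are hypothetical placeholders, not values taken from a real cluster.

```python
from collections import Counter

# Hypothetical kube_pod_labels series: the same three pods are reported
# once by each of two scrape jobs. All names here are assumptions.
PODS = ("kube-apiserver-master-0", "kube-apiserver-master-1", "kube-apiserver-master-2")
JOBS = ("kube-state-metrics", "opencost")

series = [
    {"job": job, "namespace": "openshift-kube-apiserver", "pod": pod}
    for job in JOBS
    for pod in PODS
]


def count_series(samples):
    """Mimic a bare count(): every matching series is counted once."""
    return len(samples)


def count_distinct_pods(samples):
    """Count each (namespace, pod) pair only once, regardless of the job."""
    return len({(s["namespace"], s["pod"]) for s in samples})


def count_by_job(samples):
    """Show how many series each scrape job contributes."""
    return dict(Counter(s["job"] for s in samples))


print(count_series(series))         # total number of series
print(count_distinct_pods(series))  # number of pods actually present
print(count_by_job(series))         # contribution per job
```

With this sample data the bare count is twice the number of distinct pods, which matches the 6 versus 3 observed above. Grouping by job shows which exporter contributes the extra series.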
Thanks for looking into this, glad you could reproduce this! |
Hi @kastl-ars. One thing to point out here, as I was looking into this further, is that setting

opencost:
  metrics:
    kubecostMetrics:
      emitKsmV1Metrics: false

does not break anything, and I am still able to view the cost in the UI. Can you confirm this if possible? I am not able to reproduce what you were experiencing here. |
not sure if it is intentional, but you have a wrong key in your YAML snippet. According to the values.yaml, it should be
I would be surprised if with your snippet above the errors (apiservers degraded, number of CPUs wrong) were gone. I have been trying this snippet in our cluster for a week, I get metrics (i.e. there are bars shown in OpenCost), but all of the values are just zero. There is a total of 700$, but all of it is from |
Ahh my bad here. Got mixed up in two different things. |
Any news? Sorry for the hassle, but this is currently rendering OpenCost unusable on our OpenShift clusters... |
Hi @kastl-ars, can you try the values below

opencost:
  metrics:
    serviceMonitor:
      enabled: true
    kubeStateMetrics:
      emitKsmV1Metrics: true
    config:
      enabled: true
      disabledMetrics:
        - cluster:hyperthread_enabled_nodes
        - deployment_match_labels
        - kube_job_status_failed
        - kube_namespace_labels
        - kube_node_labels
        - kube_node_status_allocatable
        - kube_node_status_capacity
        - kube_persistentvolume_capacity_bytes
        - kube_persistentvolumeclaim_info
        - kube_persistentvolumeclaim_resource_requests_storage_bytes
        - kube_pod_container_resource_requests
        - kube_pod_container_status_terminated_reason
        - kube_pod_labels
        - kube_pod_owner
  prometheus:
    kubeRBACProxy: true
    createMonitoringClusterRoleBinding: true
    createMonitoringResourceReaderRoleBinding: true
    monitoringServiceAccountName: prometheus-k8s
    monitoringServiceAccountNamespace: openshift-monitoring
    external:
      enabled: true
      url: https://prometheus-k8s.openshift-monitoring.svc.cluster.local:9091
    internal:
      enabled: false
  exporter:
    extraVolumeMounts:
      - name: configs
        mountPath: /var/configs/metrics.json
        subPath: metrics.json
  ui:
    extraVolumeMounts:
      - name: empty-var-www
        mountPath: /var/www
      # - name: opencost-ui-nginx-config-volume
      #   mountPath: /etc/nginx/conf.d/default.nginx.conf
      #   subPath: default.nginx.conf
extraVolumes:
  - name: empty-var-www
    emptyDir: {}
  - name: configs
    configMap:
      name: custom-metrics
  # - name: opencost-ui-nginx-config-volume
  #   configMap:
  #     name: opencost-ui-nginx-config

and see if you are still not able to get the cost? This seems to fix the overview page CPU count and the degraded warnings. I am testing on my end too and will update in some time. |
Thanks for digging into this, @mittal-ishaan. I will test this and report back. |
OK, I immediately see OpenCost displaying actual data for today after using the workaround. I'll check if the warnings and the wrong CPU count reappear over the course of today. |
Seems like the workaround works. I have not seen any errors or wrong CPU counts on 4 clusters, and so far all 4 clusters are reporting values in the OpenCost UI. So the question is: how do we get this into the chart properly, and how do we document it properly? I think the docs need an OpenShift section... :-( |
That sounds great, thank you @kastl-ars |
Having a |
We have implemented this workaround in all of our OpenShift clusters, and so far the "degraded apiserver" error as well as the bogus CPU numbers are gone. So I daresay it works. |
Dear OpenCost maintainers,
Since last week, we have noticed that our OpenShift clusters show a degradation warning, as only 50% of the apiservers are responding.
Turns out this seems to be related to metrics exposed by OpenCost, scraped by Prometheus and then returned by the query used for this degradation detection.
We have explicitly disabled the emission of pod annotations, namespace annotations and KSM v1 metrics, and the error vanished.
The following lines appeared in the deployment:
I would like to see this added to the documentation that @mittal-ishaan was working on IIRC.
The query that went wrong was this:
Before we introduced the workaround described above, this returned 6 pods, while only three were really running. Hence the degradation warning as only 50% were working...
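The arithmetic behind the 50% figure can be sketched as follows. This is only an illustration of the ratio, assuming the check divides the number of apiserver pods that are really running by the number of pod series returned by the query. It is not the actual OpenShift alerting rule, and the function name is made up for this example.

```python
def healthy_fraction(running_pods: int, counted_series: int) -> float:
    """Fraction of counted pod series that correspond to a running pod.

    Hypothetical helper for illustration only, not OpenShift's rule.
    """
    if counted_series <= 0:
        raise ValueError("counted_series must be positive")
    return running_pods / counted_series


# Before the workaround: 3 running pods, but 6 series returned.
print(healthy_fraction(3, 6))

# After the workaround: the duplicate series are gone.
print(healthy_fraction(3, 3))
```

With duplicated series the ratio drops to one half even though every apiserver pod is healthy, which is consistent with the warning going away once the duplicate metrics were disabled.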
Kind Regards,
Johannes