Delete metrics for deleted pods #1279
Conversation
Force-pushed from ab39f6c to b55de47
Just an idea: would there be a way to do the delete on a timer, so the metrics are not immediately lost? Then we could configure enough time to ensure that we get at least one poll of the metrics before they are deleted. Alternatively, maybe mark them ready for delete and only delete them after a read?
I tend to like the idea where they get marked for delete on the next read after the pod is removed; then we don't need another timer value to tune.
Force-pushed from c77835a to e5bc30a
@jrfastab Do you have a good use case where this last value would be really important? This is in a TODO, yes, but I'm debating how useful it is.
This would mean that if the scraper is not working, Tetragon never deletes stale metrics. That's kind of against Prometheus's pull design IMO: the idea is that Tetragon exposes a metrics endpoint, and whether something is reading it or not shouldn't be its concern.
Force-pushed from 2de7d4d to abe499b
Force-pushed from abe499b to e98ad7a
Force-pushed from 636200d to cdbcbb3
OK, I implemented this using workqueue.DelayingInterface. But I hit an import cycle, so CI is red. I'll refactor in a separate PR to resolve it.
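For context, here is a minimal sketch of how delayed cleanup with client-go's workqueue.DelayingInterface can look. The EnqueuePodDelete and processPodQueue helpers are illustrative names, not the actual Tetragon code:

```go
package podmetrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"k8s.io/client-go/util/workqueue"
)

var (
	// metricsWithPod holds all metric vectors that carry a "pod" label.
	metricsWithPod []*prometheus.MetricVec
	// podQueue delays cleanup so the scraper can collect the last values.
	podQueue    = workqueue.NewDelayingQueue()
	deleteDelay = 1 * time.Minute
)

// EnqueuePodDelete (illustrative name) schedules metrics cleanup for a
// deleted pod after deleteDelay has passed.
func EnqueuePodDelete(podName string) {
	podQueue.AddAfter(podName, deleteDelay)
}

// processPodQueue drains the queue and removes all metrics labeled with the
// deleted pod's name.
func processPodQueue() {
	for {
		item, shutdown := podQueue.Get()
		if shutdown {
			return
		}
		for _, vec := range metricsWithPod {
			vec.DeletePartialMatch(prometheus.Labels{"pod": item.(string)})
		}
		podQueue.Done(item)
	}
}
```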
Force-pushed from cbb0424 to 20e9a2c
Force-pushed from 20e9a2c to 8712376
Force-pushed from 8712376 to 8e8583a
Force-pushed from 05f85a1 to 97679ce
Force-pushed from 97679ce to 726ce8f
pkg/metrics/metrics.go
Outdated
var (
	metricsWithPod []*prometheus.MetricVec
	podQueue       workqueue.DelayingInterface
	deleteDelay    = 1 * time.Minute
In a follow-up PR we can make this configurable.
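As a rough sketch of that follow-up, assuming a plain flag-based option; the flag name and wiring are hypothetical and not Tetragon's actual configuration handling:

```go
package podmetrics

import (
	"flag"
	"time"
)

// deleteDelay is read from a hypothetical CLI flag; Tetragon's real
// configuration plumbing may differ.
var deleteDelay = flag.Duration(
	"metrics-pod-delete-delay",
	1*time.Minute,
	"How long to keep a deleted pod's metrics before removing them",
)
```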
Force-pushed from 4a39c4a to 80df0bf
Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
Some of the exposed metrics have a "pod" label, which contains the name of the monitored pod. So far, when a pod got deleted, Tetragon kept exposing stale metrics for it. This was causing a continuous increase in memory usage in the Tetragon agent as well as in the metrics scraper. This commit fixes the issue: now, if metrics and the k8s API are both enabled, an additional pod hook gets registered that deletes the metrics associated with a pod when it is deleted. Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
Force-pushed from 3fb8d8d to 82b1da6
Instead of deleting metrics immediately after a pod is deleted, use a workqueue to delay the deletion by a minute. This allows the scraper to scrape the last values of the metrics. It's particularly useful when the cluster has short-lived pods: with immediate deletion, the scraper could completely miss their metrics. Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
This commit moves the InitAllMetrics function from pkg/metrics to a separate package, pkg/metrics/config. This is done to avoid importing all packages that define metrics in pkg/metrics. pkg/metrics should act as a library that can be imported by packages that define metrics, but not the other way round. Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
The intention is to make it easier to register metrics that should be cleaned up on pod deletion. Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
Force-pushed from 82b1da6 to 2e1041b
Thanks!
Some of the exposed metrics have a "pod" label, which contains the name of the
monitored pod. So far, when a pod got deleted, Tetragon kept exposing stale
metrics for it. This was causing a continuous increase in memory usage in the
Tetragon agent as well as in the metrics scraper.
This PR fixes the issue. Now, if metrics and the k8s API are both enabled,
an additional pod hook gets registered that deletes the metrics associated
with a pod when it is deleted.
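A hedged sketch of what such a hook can look like using a client-go pod informer; the RegisterPodDeleteHook and EnqueuePodDelete names are illustrative and not Tetragon's actual API:

```go
package podmetrics

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/tools/cache"
)

// RegisterPodDeleteHook wires metrics cleanup into pod informer delete events.
func RegisterPodDeleteHook(factory informers.SharedInformerFactory) {
	factory.Core().V1().Pods().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) {
			pod, ok := obj.(*corev1.Pod)
			if !ok {
				// Handle the tombstone case when the delete event was missed.
				tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
				if !ok {
					return
				}
				pod, ok = tombstone.Obj.(*corev1.Pod)
				if !ok {
					return
				}
			}
			// Schedule delayed deletion of metrics carrying this pod's name.
			EnqueuePodDelete(pod.Name)
		},
	})
}
```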