Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

metrics: Refactor metrics label filter logic #2321

Merged
merged 8 commits into from
Apr 17, 2024

Conversation

lambdanis
Copy link
Contributor

@lambdanis lambdanis commented Apr 10, 2024

Tetragon has a functionality to disable some of the high-cardinality metrics
labels. So far it worked like this:

  1. Configure option.Config.MetricsLabelFilter as a set (map[string]interface{})
    of enabled labels.
  2. metrics.granularLabelFilter wraps enabled labels and configurable labels in
    one global variable.
  3. Define a metric using a "granular metric" wrapper, which uses
    granularLabelFilter.
  4. When the metric is registered, the inner Prometheus metric is registered
    with the enabled labels only.
  5. Then we update the metric like a regular Prometheus metric, using
    WithLabelValues.

Refactor this logic so that now it works like this:

  1. Configure MetricsLabelFilter similarly like before, but as map[string]bool
  2. metrics.ProcessLabels struct implements metrics.FilteredLabels interface and
    contains label values of configurable labels. Disabled labels are set to ""
    when the struct is instantiated.
  3. Define a metric using a "granular metric" wrapper, which now uses Go generics
    to specify configurable labels (FilteredMetrics type).
  4. The inner Prometheus metric is now registered with all labels.
  5. Then we update the metric using a slightly different implementation of
    WithLabelValues, which takes a generic FilteredLabels struct and any
    additional label values as strings.

This approach provides better types, so that it's not as easy for developers to
make a mistake and pass incorrect labels when updating a metric. Additionally,
it makes it easy to define a different set of configurable labels (another
FilteredMetrics type) and to implement dynamic metrics labels configuration (as
it doesn't require re-registering metrics when the filter changes).

@lambdanis lambdanis added area/metrics Related to prometheus metrics release-note/misc This PR makes changes that have no direct user impact. labels Apr 10, 2024
Copy link

netlify bot commented Apr 10, 2024

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit 3b11b2f
🔍 Latest deploy log https://app.netlify.com/sites/tetragon/deploys/661d721b7ba3b80007444c61
😎 Deploy Preview https://deploy-preview-2321--tetragon.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@lambdanis lambdanis force-pushed the pr/lambdanis/labels-filter branch 2 times, most recently from 86d44db to ceac60b Compare April 10, 2024 15:57
@lambdanis lambdanis marked this pull request as ready for review April 10, 2024 15:57
@lambdanis lambdanis requested a review from a team as a code owner April 10, 2024 15:57
@lambdanis lambdanis requested a review from olsajiri April 10, 2024 15:57
@lambdanis lambdanis marked this pull request as draft April 10, 2024 16:04
@lambdanis
Copy link
Contributor Author

I'm still doing some testing so I converted it to draft, but please review 🙇‍♀️

@lambdanis lambdanis force-pushed the pr/lambdanis/labels-filter branch from ceac60b to dd68c2a Compare April 15, 2024 13:13
There are no functional changes in this commit, just shuffling code around to
improve navigation.

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
Nothing defined in this file is used.

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
@lambdanis lambdanis force-pushed the pr/lambdanis/labels-filter branch 2 times, most recently from 53d774c to 3b11b2f Compare April 15, 2024 18:29
@lambdanis lambdanis marked this pull request as ready for review April 15, 2024 18:30
@lambdanis lambdanis requested a review from mtardy as a code owner April 15, 2024 18:30
@lambdanis lambdanis force-pushed the pr/lambdanis/labels-filter branch from b3ed29f to a9c29d5 Compare April 16, 2024 07:07
Copy link
Contributor

@kkourt kkourt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

}
if !option.Config.MetricsLabelFilter["binary"] {
binary = ""
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Configure MetricsLabelFilter similarly like before, but as map[string]bool
  2. metrics.ProcessLabels struct implements metrics.FilteredLabels interface and
    contains label values of configurable labels. Disabled labels are set to ""
    when the struct is instantiated.
  3. Define a metric using a "granular metric" wrapper, which now uses Go generics
    to specify configurable labels (FilteredMetrics type).
  4. The inner Prometheus metric is now registered with all labels.
  5. Then we update the metric using a slightly different implementation of
    WithLabelValues, which takes a generic FileredLabels struct and any
    additional label values as strings.

AFAIU, the disabled labels are set to "". The thing I'm not sure I understand is where does the filtering happens? Is it .WithLabelValues()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When calling WithLabelValues() you need to pass a ProcessLabels struct. If it's created with NewProcessLabels then disabled labels are set to "", what from Prometheus perspective is equivalent to not having these labels at all. So filtering relies on always creating ProcessLabels with NewProcessLabels. I'll add a comment about it, as it's not obvious indeed.

Tetragon has a functionality to disable some of the high-cardinality metrics
labels. So far it worked like this:
1. Configure option.Config.MetricsLabelFilter as a set (map[string]interface{})
   of enabled labels.
2. metrics.granularLabelFilter wraps enabled labels and configurable labels in
   one global variable.
3. Define a metric using a "granular metric" wrapper, which uses
   granularLabelFilter.
4. When the metric is registered, the inner Prometheus metric is registered
   with the enabled labels only.
5. Then we update the metric like a regular Prometheus metric, using
   WithLabelValues.

Refactor this logic so that now it works like this:
1. Configure MetricsLabelFilter similarly like before, but as map[string]bool
2. metrics.ProcessLabels struct implements metrics.FilteredLabels interface and
   contains label values of configurable labels. Disabled labels are set to ""
   when the struct is instantiated.
3. Define a metric using a "granular metric" wrapper, which now uses Go generics
   to specify configurable labels (FilteredMetrics type).
4. The inner Prometheus metric is now registered with all labels.
5. Then we update the metric using a slightly different implementation of
   WithLabelValues, which takes a generic FilteredLabels struct and any
   additional label values as strings. The FilteredLabels struct should be
   always instantiated with a helper function that applies the filter.

This approach provides better types, so that it's not as easy for developers to
make a mistake and pass incorrect labels when updating a metric. Additionally,
it makes it easy to define a different set of configurable labels (another
FilteredMetrics type) and to implement dynamic metrics labels configuration (as
it doesn't require re-registering metrics when the filter changes).

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
Tetragon allows users to configure high-cardinality metrics labels via a Helm
value (tetragon.prometheus.metricsLabelFilter) and an agent flag
(metrics-label-filter). The Helm value default is to enable all labels,
however, the agent flag defaulted to disabling all, even though its docs said
otherwise. Let's fix it and default the agent flag to enable all labels.

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
There is no reason to export them - new instances of the labels should be
always created with NewProcessLabels.

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
Instead, implement prometheus.Collector interface for granular metrics structs,
i.e. Describe and Collect methods that access the underlying Prometheus metric.
This allows us to register and test granular metrics directly, without
unnecessarily exposing internal implementation.

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
Hopefully this comment makes it clearer how the labels filtering happens.

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
@lambdanis lambdanis force-pushed the pr/lambdanis/labels-filter branch from 1e0aa21 to 317fb2d Compare April 17, 2024 15:24
@lambdanis
Copy link
Contributor Author

I only added one commit with a docstring and fixed a typo in a previous commit. Will merge after CI passes.

@lambdanis lambdanis merged commit 25a0050 into cilium:main Apr 17, 2024
38 checks passed
@lambdanis lambdanis mentioned this pull request Jul 3, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
area/metrics Related to prometheus metrics release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants