Is your feature request related to a problem? Please describe.
I'm currently working on rolling out the OpenTelemetry Operator across all of the Kubernetes (OpenShift) clusters in our environment. Auto-instrumenting our application workloads will become crucial to our ability to support our systems. If something happens to the operator that results in pods NOT getting auto-instrumented, we'd potentially be "flying blind".
I'd like the ability to have finer insights into the counts of auto-instrumentation attempts and failures to build the proper alerting (SLOs).
Describe the solution you'd like
Instrument the pod mutator to create/increment metrics that indicate that a pod contains the instrumentation annotation and is subject to receive auto-instrumentation. Some initial ideas on the types of scenarios/metrics to expose:
- pod contained instrumentation/sidecar annotation (may or may not be valid config) -> increment some counter saying "the podmutator will attempt to process"
- pod contained invalid "inject" type -> pod mutation didn't happen, increment a counter to reflect this scenario
- pod contained invalid instrumentation or sidecar reference in the annotation value -> pod mutation didn't happen, increment a counter to reflect this scenario
- pod contained valid instrumentation or sidecar annotation/reference, but an unexpected error occurred -> pod mutation failed, increment a counter to reflect this scenario
I know some of these scenarios may be available in container or Kubernetes logs, but managing a fleet of operators across multiple clusters is much easier with aggregate metrics that feed our alerting infrastructure.
Describe alternatives you've considered
I'm currently leveraging the metrics provided by the Kubernetes API server admission controller to see the counts of webhook invocations sent to mpod.kb.io. It does provide some insights, but not all pod creations will be eligible for OTel instrumentation (that is, they may or may not have the instrumentation.opentelemetry.io annotations).
Additional context
No response
We discussed this during the SIG meeting on 13.02.2025, and agreed that this would be a desirable feature. There are some performance concerns with reporting the number of Pods that should be instrumented but aren't. Simply counting errors as they happen should be fine.
What we need to do next is propose names for the new metrics and attributes. If anyone has suggestions, feel free to post them in this issue.
Component(s)
auto-instrumentation