add observability ui operator proposal #1494
Conversation
/retest
- As an OpenShift user, I want an operator from the Red Hat catalog that can deploy various observability UI components so that all signals supported by the cluster are easily accessible and can be used for troubleshooting.
- As an OpenShift administrator, I want a centralized operator for observability UI components so that I can streamline console requirements and integrate diverse signals effectively.
My main concern is that it will require the user to install another operator just to get the UI portion, unless more work is done to provide an option to install it for you when you install a component such as OpenShift Logging or Network Observability.
A unique console plugin for all observability operators would solve that point without the need to install an extra operator. As soon as you have at least one of them installed, the observability plugin is there.
Behind the scenes, the plugin can feature-gate the exposed pages according to the available metrics, logs, etc.
The challenge with this approach would be the compatibility between the plugin version and each operator version, but there are ways to remediate that.
> A unique console plugin for all observability operators would solve that point without the need to install an extra operator. As soon as you have at least one of them installed, the observability plugin is there.
@jpinsonneau But how does the observability plugin get installed? Is the suggestion for all observability operators to include a copy of the plugin?
Yes, all operators could embed the same plugin, and the most up-to-date one is used. That requires a mechanism to identify who is responsible for it, using owner/version labels for example. The code managing this must be shared between all operators to avoid any reconcile issues. It's a cross-team effort!
As an alternative, the plugin could be embedded in the Monitoring Operator or the OCP Console directly, and enabled when needed. The downside of this approach is the update cycle, which is tied to the owner.
Both of these approaches seem to be a better match for the objectives listed in this doc than creating yet another operator.
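To make the idea concrete, here is a minimal sketch of such a label convention — the label keys, names, and namespace are hypothetical, not something defined in this proposal:

```yaml
# Hypothetical labels on a shared plugin Deployment; each operator's reconciler
# only takes over the Deployment when the plugin version it ships is newer than
# the one recorded in observability.ui/version.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: observability-console-plugin      # assumed shared name
  namespace: openshift-observability-ui   # assumed namespace
  labels:
    observability.ui/owner: netobserv-operator   # which operator last reconciled it
    observability.ui/version: "1.4.0"             # plugin version it deployed
spec:
  replicas: 1
  selector:
    matchLabels:
      app: observability-console-plugin
  template:
    metadata:
      labels:
        app: observability-console-plugin
    spec:
      containers:
        - name: plugin
          image: example.com/observability-console-plugin:1.4.0  # placeholder image
          ports:
            - containerPort: 9443
```

The comparison-and-takeover logic around these labels would be the piece that has to live in a shared library across the operators.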
> Yes, all operators could embed the same plugin, and the most up-to-date one is used. That requires a mechanism to identify who is responsible for it, using owner/version labels for example. The code managing this must be shared between all operators to avoid any reconcile issues. It's a cross-team effort!
This sounds pretty involved and easy to get wrong. If the goal is to simplify the signal operators, is it possible we're actually making things harder?
@stleerh I believe this will give customers the flexibility to add visualization based on their needs. If in some cases metrics or logs are just forwarded, the visualization piece might not be needed, saving resources. For multi-cluster setups it should be enabled only in the hub cluster. If visualization should be installed by default, this can come from a meta-operator like OBO that enables the signal operator and enables that piece in the Observability UI operator.
So to me it's still a bit heavy to create a dedicated operator, not including monitoring, just to avoid update cycles.
It would make more sense if the target was:
- including monitoring dashboards / metrics query pages
- sharing storages (Prometheus, Loki, etc.)
- managing the gateway + roles
- enabling correlation between any metrics / storages
Your plugin should rely on the `logStore` you configured in the `ClusterLogging` CR, so even if it doesn't consume the logging services, it relies on its configuration.
Our Console Plugin is deployed by the Network Observability Operator with Loki storage behind the scenes, just the same as yours:
- both the `collector` and `plugin` clients for Loki are configured in the same section of the `FlowCollector` CR (a rough sketch follows after this list)
- the plugin is optional
- FYI, even Loki became optional since we want to provide lightweight observability based on Prometheus metrics
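A rough sketch of that `FlowCollector` shape — the field names are recalled from memory and differ between FlowCollector API versions, so treat them as assumptions rather than a reference:

```yaml
apiVersion: flows.netobserv.io/v1beta1   # API version varies by operator release
kind: FlowCollector
metadata:
  name: cluster
spec:
  loki:
    # one Loki client configuration, shared by the flow collection pipeline
    # and by the console plugin
    url: https://loki-gateway-http.netobserv.svc:8080/   # placeholder URL
  consolePlugin:
    enable: true   # the plugin is optional; set false to skip the UI piece
```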
The correlation is given by korrel8r, and the enhanced customizable dashboards by Perses. So the Observability UI operator is intended to do this heavy lifting of configuring and installing the Perses operator and korrel8r for correlation.
The logging plugin could be connected to any Loki store without even having cluster logging in the cluster, and the only task for the logging operator is to enable it (see the sketch below). So it currently relies on the CR configuration, but this is not ideal as any other operator could do the job.
IIUC, even if the NetObserv plugin is optional, it is coupled to the NetObserv operator backend, regardless of the store. This is a reason why we won't include the network observability plugin: it has a clear 1:1 relationship with its operator.
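For readers less familiar with the mechanics, "enabling" a console plugin boils down to adding its name to the console operator configuration — a minimal sketch, with the plugin name assumed for illustration:

```yaml
apiVersion: operator.openshift.io/v1
kind: Console
metadata:
  name: cluster
spec:
  plugins:
    - logging-view-plugin   # assumed name; whichever operator owns the plugin
                            # adds or removes it from this list
```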
In the current implementation, yes; however that 1:1 relationship may change in the future, depending on multi-cluster needs.
I think that's the reason why the logging plugin can be installed without the operator, right?
Yes, this would be the ideal in a multi-cluster scenario, so visualization is present only where it is needed.
The Observability UI Operator will be available in the Red Hat catalog.
It will manage the deployment of several components which will be added incrementally based on priority, as shown in the table below:
If there is only one Observability UI Operator, this means all of the components must be compatible with a specific version of the UI operator. Will the Observability UI Operator guarantee backwards-compatibility?
**OpenShift cluster administrator** is responsible for installing, enabling, configuring, and managing the plugins and operators within the OpenShift environment.
**OpenShift user** is the end-user interfacing with the OpenShift console and making use of the observability signals presented by the dynamic console plugins.
1. The cluster administrator installs the ObservabilityUI operator from the RedHat Catalog.
We need to be able to install operator dependencies automatically and not expect the user to do this, similar to all package managers like apt, rpm, dnf, npm, pip, go get, etc. Let's do some investigation on what OLM supports.
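For what it's worth, OLM already supports declaring dependencies in a bundle's `metadata/dependencies.yaml`, and resolves them automatically at install time; a minimal sketch, with the package and GVK names made up for illustration:

```yaml
# bundle/metadata/dependencies.yaml
dependencies:
  - type: olm.package
    value:
      packageName: observability-ui-operator   # hypothetical package name
      version: ">=0.1.0"
  - type: olm.gvk
    value:
      group: observability.openshift.io        # hypothetical API group
      version: v1alpha1
      kind: ObservabilityUIConsolePlugin
```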
- As an OpenShift administrator, I want a centralized operator for observability UI components so that I can streamline console requirements and integrate diverse signals effectively.
- As an OpenShift user, I want to customize observability dashboards with various signals so that I can quickly identify and resolve issues.
Will this replace monitoring?
The opposite is mentioned in the non-goals section, so the difference between the monitoring dashboards and the plugin ones should be clear.
Unifying monitoring in a single place would be real value for the user. It makes no sense to have dashboards separated per team today.
This will add new customizable dashboards, as the dashboards should not be specific to a single datasource. I totally agree that dashboards per team create disparity. Hence the goal is to add new dashboards that can consume multiple datasources and have richer charts, so teams are not constrained in visualization. This is described in the design section: Dashboards console plugin.
So it definitely makes sense for monitoring to be part of it and take advantage of the improvements?
See #1494 (comment)
Monitoring currently is more than only dashboards, as it includes alerting rules and service monitors. So the part that is not coupled (dashboards) and offers other teams more advantages is the part we are extracting.
My biggest concern is that the transition to this model sounds pretty rough for cluster admins, with several steps and a lot of manual configuration. If those steps aren't completed, the observability UI will simply disappear for users. We should look at whether there are ways to streamline the transition and make things just work.
The other major drawback is that plugins will now have to deal with version skew between the Observability UI operator and the signal operators. If the plugin is packaged with the signal operator, this isn't an issue. We should decide whether we only want this for cross-cutting plugins.
**OpenShift user** is the end-user interfacing with the OpenShift console and making use of the observability signals presented by the dynamic console plugins.
1. The cluster administrator installs the ObservabilityUI operator from the RedHat Catalog.
2. If there is an existing observability UI plugin deployed by another operator, the cluster administrator disables it.
This doesn't necessarily need to be a manual process. One pattern we've taken when phasing out static plugins is for the new plugin to set a console feature flag like `OBSERVABILITY_PLUGIN`; the old plugin then disables all of its extension points when that flag is present. That way, if the admin doesn't disable the old plugin, you won't have duplicate pages.
Thanks for the suggestion, will update accordingly.
1. The cluster administrator installs the ObservabilityUI operator from the RedHat Catalog.
2. If there is an existing observability UI plugin deployed by another operator, the cluster administrator disables it.
3. The cluster administrator configures the operator, adding a custom resource (CR) to deploy the desired plugins and link them with the corresponding signal operators.
Is the presence of the observability operators something we can detect ourselves using console feature flags so that this step isn't required?
### API Extensions
This enhancement introduces a new CRD to represent observability UI console plugins. The `ObservabilityUIConsolePlugin` CR for a plugin will be defined as follows:
It feels like we're asking a lot of administrators to set this up. If they forget to do it or get it wrong, the UI will be missing or non-functional. I think we should look at whether we can discover these services on the cluster instead of requiring manual configuration.
Is it the role of the admin, or the role of the signal operator, to create this CR?
It will be the role of the admin, but as @spadgett suggests above, this might also come from the operator discovering the services and enabling the plugins accordingly.
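The actual `ObservabilityUIConsolePlugin` definition lives in the proposal diff and isn't quoted in this thread; purely to ground the discussion above, a hypothetical shape could look like the following — every field name here is invented, not taken from the proposal:

```yaml
apiVersion: observability.openshift.io/v1alpha1   # hypothetical group/version
kind: ObservabilityUIConsolePlugin
metadata:
  name: logging
spec:
  type: logs                       # hypothetical: which signal UI to deploy
  datasource:
    lokiStack:
      name: logging-loki           # hypothetical reference to an existing LokiStack
      namespace: openshift-logging
```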
### Open Questions
- How does the operator enable the plugins from the Observability UI Operator without having to patch the console operator? Answer from the console team: Plugins signed by Red Hat can be enabled by default.
Red Hat plugins aren't enabled by default. If it's a Red Hat operator, we do default the radio button to enabled when installing the operator in the UI. This isn't done through signing, but by looking at the catalog source IIRC.
I adjusted the phrasing as I wanted to describe what could be a solution in the future.
### Why:
The current state of observability signals in the OpenShift console has each operator responsible for its own console plugin. This sometimes results in operators deploying plugins outside their primary scope. As the requirements for the console's UI grow, there's a clear need for a centralized system that can manage diverse UI components spanning across various signals to offer a unified observability experience in the console.
Is the intent to move all observability UIs into this common plugin or only plugins that don't have a clear 1:1 relationship with an existing operator?
Only plugins that do not have a clear 1:1 relationship, as they are currently misplaced inside other operators, like the cluster logging operator.
# Observability UI Operator
The Observability UI Operator aims to manage dynamic console plugins for observability signals inside the OpenShift console, ensuring a consistent user experience and efficient management of UI plugins.
If I understand the proposal, the Observability UI operator itself would contain the implementation of the UI plugins. Is that correct?
If so, it might be good to clarify that right at the beginning, as one could still read this proposal as the plugins being installed from separate components/repositories.
Decouple the responsibility of managing observability UI from operators, enabling each operator to focus solely on its primary functionalities.
Enhance the observability experience on the console by providing components like [Perses](https://github.com/perses/perses) for customizable dashboards and [korrel8r](https://github.com/korrel8r/korrel8r) for correlation.
@jgbernalp just to clarify, after our discussion I still have a doubt: if an operator like NetObserv wants to deploy Perses dashboards (and/or datasources), does it also have to have a new plugin integrated via the ObservabilityUIConsolePlugin CRD? Or can these be two independent things, so that deployed Perses dashboards go into a "generic pool" like what exists today in the "Observe > Dashboards" menu, without the need for a dedicated plugin?
@jotak I believe these are independent things. When an operator uses a Perses dashboard, the Observability UI operator must enable the necessary plugins/proxies in the console so the user can see the dashboard; ideally this will also fall into Observe > Dashboards with an enhanced UI.
This aligns with what Sam suggested about having less manual configuration for admins, or in this case, for other operators.
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting /remove-lifecycle stale. If this proposal is safe to close now please do so with /close. /lifecycle stale
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting /remove-lifecycle rotten. If this proposal is safe to close now please do so with /close. /lifecycle rotten
/remove-lifecycle stale
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /reopen. /close
@openshift-bot: Closed this PR.
/reopen
@jgbernalp: Reopened this PR.
/remove-lifecycle rotten
/remove-lifecycle stale
/remove-lifecycle rotten
#1555 is changing the enhancement template in a way that will cause the header check in the linter job to fail for existing PRs. If this PR is merged within the development period for 4.16, you may override the linter if the only failures are caused by issues with the headers (please make sure the markdown formatting is correct). If this PR is not merged before 4.16 development closes, please update the enhancement to conform to the new template.
@openshift-bot: Closed this PR.
/reopen
@periklis: Reopened this PR.
[APPROVALNOTIFIER] This PR is NOT APPROVED
@jgbernalp: The following test failed, say /retest to rerun all failed tests.
@openshift-bot: Closed this PR.
(automated message) This pull request is closed with lifecycle/rotten. The associated Jira ticket, OU-204, has status "New". Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot to ignore it in the future.
(automated message) This pull request is closed with lifecycle/rotten. The associated Jira ticket, OU-204, has status "In Progress". Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot to ignore it in the future.
This proposal by the Console UI team aims to improve the way we manage observability plugins in the OpenShift console. The focus is on reducing complexity and unifying the Observability experience in the OpenShift console.