Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[k8s] improve kubernetes_secrets provider secret logging #6841

Conversation

pkoutsovasilis
Copy link
Contributor

@pkoutsovasilis pkoutsovasilis commented Feb 13, 2025

What does this PR do?

This PR enhances the logging of the kubernetes_secrets provider to improve visibility into secret updates and cache operations. The changes include:

  • Improving cache update logging to specify which secrets were updated or expired.
  • Enhancing error messages to clarify failures in secret retrieval.
  • Updating the function fetchFromAPI to return the resource version of a secret.
  • Adding unit tests for fetchFromAPI to validate secret retrieval behavior.

Why is it important?

Previously, the kubernetes_secrets provider had minimal logging, making it difficult to determine when secrets were updated, expired, or failed to refresh. This lack of visibility led to troubleshooting challenges, especially in cases where secrets did not rotate as expected. With these improvements, users can now track secret updates more effectively without exposing sensitive values.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

This change does not alter the core functionality of the kubernetes_secrets provider but improves logging. Users should see more detailed logs related to secret updates, which can be useful for debugging. There are no breaking changes expected.

How to test this PR locally

Run the kubernetes_secrets provider unit-tests

Related issues

@pkoutsovasilis pkoutsovasilis added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Feb 13, 2025
@pkoutsovasilis pkoutsovasilis self-assigned this Feb 13, 2025
Copy link
Contributor

mergify bot commented Feb 13, 2025

This pull request does not have a backport label. Could you fix it @pkoutsovasilis? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@pkoutsovasilis pkoutsovasilis force-pushed the k8s/kubernetes_secrets_provider_logging branch from ada501f to 205b8f1 Compare February 13, 2025 09:13
@pkoutsovasilis pkoutsovasilis added backport-8.x Automated backport to the 8.x branch with mergify backport-9.0 Automated backport to the 9.0 branch labels Feb 13, 2025
@pkoutsovasilis pkoutsovasilis marked this pull request as ready for review February 14, 2025 21:06
@pkoutsovasilis pkoutsovasilis requested a review from a team as a code owner February 14, 2025 21:06
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Copy link
Contributor

@swiatekm swiatekm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good, just a question about why the use of backslash when golang offers a way to not have to do that.

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed the inline comment.

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@pkoutsovasilis
Copy link
Contributor Author

Thanks!

I thank you for the comments 🙂

@pkoutsovasilis
Copy link
Contributor Author

@cmacknz if we follow our policies this shouldn't be backported on any active releases branches. But I believe such enhancement will be helpful for users that use kubernetes_secrets provider on active release such 8.17.x. WDYT? 🙂

@cmacknz
Copy link
Member

cmacknz commented Feb 19, 2025

I would backport this everywhere the secrets provider implementation has changed.

@cmacknz
Copy link
Member

cmacknz commented Feb 19, 2025

Is there a way we can get the current secret resource version agent substituted into the agent policy added into say computed-config.yaml or something? I suppose we could just log it, that might get verbose if there are a lot of secrets and/or policy changes.

One of the questions I had when I was trying to debug this was, "is the secret provider working correctly but the agent somehow has a stale secret value" and I don't see anything to help me answer that.

@pkoutsovasilis pkoutsovasilis added backport-8.17 Automated backport with mergify backport-8.18 Automated backport to the 8.18 branch labels Feb 20, 2025
@pkoutsovasilis
Copy link
Contributor Author

after looking more into running this PR actually on a k8s cluster to see how variables of kubernetes_secrets provider are printed in the computed-config inside a diagnostics, I run into this issue panic

panic: assignment to entry in nil map

goroutine 145 [running]:
github.com/elastic/elastic-agent-libs/mapstr.M.Put(0x0, {0x40000bbd40, 0x12}, {0x9a8db80, 0x4000444320})
        /Users/pkoutsovasilis/go/pkg/mod/github.com/elastic/elastic-agent-libs@v0.18.2/mapstr/mapstr.go:383 +0x140
github.com/elastic/elastic-agent/internal/pkg/composable.(*controller).Run.func2()
        /Users/pkoutsovasilis/repos/elastic-agent/internal/pkg/composable/controller.go:204 +0x164
created by github.com/elastic/elastic-agent/internal/pkg/composable.(*controller).Run in goroutine 131
        /Users/pkoutsovasilis/repos/elastic-agent/internal/pkg/composable/controller.go:194 +0x440

I don't see how the provider can cause this and I have already reached out to @blakerouse for support

Until then let's keep this PR un-merged

@pkoutsovasilis
Copy link
Contributor Author

So @cmacknz I did get a hands-on experience with it. As of now, you could debug the kubernetes_secrets provider by just downloading the diagnostics and looking at the computed-config.yaml. This will contain the actual secret values where you can deem if the provider has stale secrets or not. Although this might seem somewhat convenient, IMO this is an issue and we should change it. I opened this one as a follow-up.

@pkoutsovasilis
Copy link
Contributor Author

created issue for one new flaky test here #6965

@pkoutsovasilis
Copy link
Contributor Author

created another issue for a new flaky test here

@elasticmachine
Copy link
Contributor

elasticmachine commented Feb 25, 2025

💚 Build Succeeded

History

cc @pkoutsovasilis

@pkoutsovasilis
Copy link
Contributor Author

pkoutsovasilis commented Feb 25, 2025

created one more issue for Flaky one here. Thank you all for the reviews 🤘

@pkoutsovasilis pkoutsovasilis merged commit 0ae5d35 into elastic:main Feb 25, 2025
14 checks passed
mergify bot pushed a commit that referenced this pull request Feb 25, 2025
* feat: improve kubernetes_secrets provider secret logging

* feat: improve readability by replacing Info with Infof

* fix: categorise properly the type of change

* feat: more logging readability improvements

* feat: improve readability by replacing "%s" with %q

(cherry picked from commit 0ae5d35)
mergify bot pushed a commit that referenced this pull request Feb 25, 2025
* feat: improve kubernetes_secrets provider secret logging

* feat: improve readability by replacing Info with Infof

* fix: categorise properly the type of change

* feat: more logging readability improvements

* feat: improve readability by replacing "%s" with %q

(cherry picked from commit 0ae5d35)
mergify bot pushed a commit that referenced this pull request Feb 25, 2025
* feat: improve kubernetes_secrets provider secret logging

* feat: improve readability by replacing Info with Infof

* fix: categorise properly the type of change

* feat: more logging readability improvements

* feat: improve readability by replacing "%s" with %q

(cherry picked from commit 0ae5d35)
mergify bot pushed a commit that referenced this pull request Feb 25, 2025
* feat: improve kubernetes_secrets provider secret logging

* feat: improve readability by replacing Info with Infof

* fix: categorise properly the type of change

* feat: more logging readability improvements

* feat: improve readability by replacing "%s" with %q

(cherry picked from commit 0ae5d35)
@@ -151,7 +151,8 @@ func (p *contextProviderK8SSecrets) Fetch(key string) (string, bool) {

if p.config.DisableCache {
// cache disabled - fetch secret from the API
return p.fetchFromAPI(ctx, secretName, secretNamespace, secretKey)
val, _, ok := p.fetchFromAPI(ctx, secretName, secretNamespace, secretKey)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pkoutsovasilis I'm guessing that adding logging here would add too much noise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is with cache disabled thus Fetch will always return the latest value from API server, right? we could definitely add it but since there is no intermediate layer that can bear stale secrets I thought we could try and minimise noise. What do you think? 🙂

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a debug-level log make sense?

secretKey: "secret_key_not_found",
},
{
name: "key in secret not found",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the happy path case, where the key is found?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the almost happy path 😆 so the secret is found in the api server, there is no error reading it, but the key inside the secret we are trying to fetch doesn't exist. does it make sense?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must be missing something 😕 IIUC, the key we're looking for is secret_key (that's what we're passing to the fetchFromAPI function on line 978), and the mock secret we're building also has a key of secret_key (line 974). So to me it seems like the key is found

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify backport-8.17 Automated backport with mergify backport-8.18 Automated backport to the 8.18 branch backport-9.0 Automated backport to the 9.0 branch Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make it possible to view and debug the lifecycle of a secret populated by the kubernetes_secrets provider
8 participants