
KEP 4447: Promote PolicyReport API to Kubernetes SIG API #4448

Closed
wants to merge 6 commits

Conversation

anusha94

  • One-line PR description: Adding a new KEP-4447 to promote PolicyReport API to a Kubernetes SIG API
  • Other comments: None

/sig auth
/wg policy

cc @JimBugwadia

@k8s-ci-robot k8s-ci-robot added sig/auth Categorizes an issue or PR as relevant to SIG Auth. wg/policy Categorizes an issue or PR as relevant to WG Policy. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 27, 2024
@k8s-ci-robot k8s-ci-robot requested a review from deads2k January 27, 2024 10:09
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Jan 27, 2024
@k8s-ci-robot
Contributor

Welcome @anusha94!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 27, 2024
@k8s-ci-robot
Contributor

Hi @anusha94. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 27, 2024
Comment on lines +302 to +304
Based on the producer and usage, it is possible to create lots of report objects.
For example, if a policy engine has 20 policy rules and a namespace has 1000 pods,
an implementation may produce 20,000 reports. This can overwhelm etcd.

Suggested change
Based on the producer and usage, it is possible to create lots of report objects.
For example, if a policy engine has 20 policy rules and a namespace has 1000 pods,
an implementation may produce 20,000 reports. This can overwhelm etcd.
Based on the producer and usage, it is possible to create lots of report objects.
For example, if a policy engine has 20 policy rules and a namespace has 1000 pods,
an implementation may produce 20,000 reports. If a cluster operator deploys PolicyReport
into their cluster, using this API can overwhelm etcd.

(is this a risk? We already let people deploy any CRD they like.)
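The arithmetic behind the quoted concern is worth making concrete. A back-of-envelope sketch, where the per-report size is an assumption and not a number from the KEP:

```python
# Back-of-envelope sketch of the report volume described above:
# one result object per (policy rule, pod) pair in a namespace.
policy_rules = 20            # from the example: rules in the policy engine
pods = 1000                  # from the example: pods in one namespace
avg_report_bytes = 2 * 1024  # ASSUMED average serialized object size

reports = policy_rules * pods
storage_mib = reports * avg_report_bytes / (1024 ** 2)

print(reports)      # 20000 report objects
print(storage_mib)  # ~39 MiB of etcd data, before revision history
```

Multiplied across namespaces, and with etcd also retaining revision history until compaction, this grows quickly, which is why the discussion turns to bounding results and alternative storage backends.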


@sudermanjr sudermanjr Jan 31, 2024


It has definitely been reported as an issue for users of the API. Whether that constitutes a risk or not is a good question, but this should be highlighted somewhere


Contributor

@maxsmythe maxsmythe Mar 4, 2024


This is my core concern with the concept of reporting findings via the API server.

I think events are another example of a high-volume object? One salient difference between this and events is that events are understood to be subject to throttling/sampling. Security reports may not have the same luxury.

I like the idea of reports-server, but IMO it would need to be an expectation that all clusters have a similar scalable backend solution before reports could be reliably enabled without risking cluster stability.

Contributor


I should also add that there is a difference between "users can deploy any CRD they like" and "K8s accepts using KRM/the API server this way as a valid practice", the second statement has much stronger implications around supportability.

Member

@ritazh ritazh Aug 14, 2024


@JimBugwadia can you comment on the cluster reliability and performance concerns brought up here?


Hi @ritazh - can you please help clarify what exactly is expected?

The proposal is for a uniform API for reporting, and reliability or performance will depend heavily on implementations. For example, the API as a contract between consumers and producers can be used as a bounded log for the last N results.

We can help document best practices, but seems like a number of those may be applicable to any other API as well. For example, the standard size limits would apply, and resource limits can be configured.

Is there any prior work on testing the performance and reliability impacts of other APIs that we can reference?

If there are specific tests or measurements that are recommended, happy to help capture the data.
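The "bounded log for the last N results" contract mentioned above can be sketched on the producer side. This is a hypothetical illustration; the class name and cap are assumptions, not part of the proposed API:

```python
from collections import deque

# Hypothetical sketch of a producer keeping only the newest N results
# per resource, instead of one object per (rule, resource) pair.
MAX_RESULTS = 100  # ASSUMED per-resource cap


class BoundedResultLog:
    """Keeps at most max_results results; the oldest are evicted first."""

    def __init__(self, max_results=MAX_RESULTS):
        self._results = deque(maxlen=max_results)

    def add(self, result):
        # deque silently drops the oldest entry once maxlen is reached
        self._results.append(result)

    def results(self):
        return list(self._results)


log = BoundedResultLog(max_results=3)
for i in range(5):
    log.add({"policy": "p", "rule": f"r{i}", "result": "pass"})

print(len(log.results()))        # 3: only the newest results survive
print(log.results()[0]["rule"])  # "r2": r0 and r1 were evicted
```

The point of the contract is that consumers can rely on the report being bounded regardless of which engine produced it, which keeps worst-case object counts independent of churn.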

Comment on lines +405 to +406
We need approvals from the following stakeholders:
[TBD]

Will we target this API at a release?


No, it's decoupled from Kubernetes releases.


- Add `policy-report-api` as a new project under kubernetes-sigs, i.e. `github.com/kubernetes-sigs/policy-report-api`
- Provide guidance on building consumers and producers


Do we want to publish official artefacts for the API?

  • YAML manifest?
  • OCI image of Helm chart?
  • something else?


Here's what I suggest:

  • Golang client set to reuse in producers and consumers
  • Generated YAMLs
  • API spec
  • Docs
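For concreteness, here is what one of those generated YAMLs could look like. This is illustrative only, shaped after the existing `wgpolicyk8s.io/v1alpha2` PolicyReport CRD published by WG Policy; the API group and version a kubernetes-sigs project would use are not decided in this KEP:

```yaml
# Illustrative manifest based on the existing wgpolicyk8s.io/v1alpha2 CRD.
apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  name: polr-ns-default
  namespace: default
summary:
  pass: 1
  fail: 1
  warn: 0
  error: 0
  skip: 0
results:
  - source: example-engine        # hypothetical producer name
    policy: require-labels
    rule: check-team-label
    result: pass
    resources:
      - apiVersion: v1
        kind: Pod
        name: nginx
        namespace: default
  - source: example-engine
    policy: disallow-latest-tag
    rule: check-image-tag
    result: fail
    message: "container 'nginx' uses the ':latest' tag"
```

A shared Go client set plus manifests like this are what would let different engines act as interchangeable producers against one consumer-facing schema.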


@nilekhc
Contributor

nilekhc commented Feb 12, 2024

/assign @ritazh

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 2, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 2, 2024
@JimBugwadia

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 2, 2024
Co-authored-by: Andy Suderman <andy@suderman.dev>
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 24, 2024
@JimBugwadia

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 24, 2024
not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->
- Adopt PolicyReport as an official, in-tree Kubernetes API
Member


I get that this proposal is not about putting the API in-tree, but I'd like to register my opposition to doing so.

From a scalability/stability point of view, reporting seems like a secondary concern compared to running an application, and is better targeted at a datastore that is not serving traffic (i.e. not the cluster running the actual workloads being audited).

From a security operations point of view, I'd be concerned about the veracity of this API if it came from the cluster hosting the workload being reported on. I'd really want security reporting information stored in a separate domain (in this case, a separate cluster) from the source of the data. If it were in the same cluster, any Kubernetes CVE or authorization misconfiguration becomes that much worse and calls the authenticity of the report data into question. For anyone relying on reports for compliance, that becomes a real business-impacting issue.

Given all that, it feels like making such an API part of Kubernetes core would naturally lead people to adopt anti-patterns in both the stability and security cases, making the cluster a self-contained unit where source data is gathered and reports are stored.

Author


@micahhausler Yes, this proposal is not for adding the API in-tree. Instead, it is for a uniform API for reporting. I have listed in-tree as an alternative that was considered, but ruled out as policy reports are best managed as a Custom Resource. Perhaps I should clarify that.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: anusha94
Once this PR has been reviewed and has the lgtm label, please ask for approval from ritazh. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@anusha94
Author

Closing this PR in favor of the OpenReports proposal.

@anusha94 anusha94 closed this Feb 21, 2025

10 participants