Enhancement to outline path for network policies for all core components #1720

knobunc · 2024-11-26T17:38:26Z

This enhancement outlines how we will add network policies to all core components in OpenShift. It also outlines how it will eventually become enforcing, and how we will test compliance.

danwinship · 2024-12-02T16:51:11Z

enhancements/network/core-network-policies.md

+- As an administrator, I need to be able to override specific
+  OpenShift policies to be more restrictive so that I can satisfy my
+  security department


For the most part, this should not be necessary. If a given restriction is safe to apply, then it should be part of the default policy. And in that case, anything "more restrictive" would be something that would cause cluster functionality to break in some way. IMO we shouldn't allow admins to break functionality they aren't using; we should allow them to disable that functionality instead (and then they don't need to add NetworkPolicies restricting its use).

One specific case I can think of where this might apply is admission control webhooks / validation webhooks. The apiserver needs to reach out to any namespace that contains such a webhook, so by default, we'd have to allow the apiserver egress access to all namespaces (or at least, all "system" namespaces?). Administrators might want to disable that. But should that be expressed as "kube-apiserver can reach all namespaces by default but administrators can create additional NetworkPolicies to restrict that", or should we instead have kube-apiserver-operator just automatically add NPs for all and only the namespaces that contain webhooks? Or should we have a rule like "webhooks can only exist in namespaces with the label foo"?

(Also, FTR, there is no way to implement "kube-apiserver can reach all namespaces by default but administrators can create additional NetworkPolicies to restrict that" given the semantics of NetworkPolicy; we would have to say that administrators are allowed to delete the "kube-apiserver can reach all namespaces" NetworkPolicy.)
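To make the additive-semantics point concrete, here is a minimal Go model of how NetworkPolicies combine for one pod in one direction. It is a sketch, not the Kubernetes API: the `Policy` type and `allowed` function are hypothetical, and peers are reduced to namespace names instead of label selectors, ports, and ipBlocks.

```go
package main

// Policy is a hypothetical, simplified stand-in for one NetworkPolicy as it
// applies to a single pod in a single direction (ingress or egress; each
// direction is evaluated independently). Real policies use label selectors,
// ports, and ipBlocks; peers are reduced to namespace names here.
type Policy struct {
	Name                  string
	SelectsPod            bool     // does the policy's podSelector match the pod?
	AllowedPeerNamespaces []string // peers allowed by the policy's rules
}

// allowed reports whether traffic between the pod and a peer in peerNS is
// permitted. A pod selected by no policy is non-isolated and allows
// everything; once any policy selects it, traffic is allowed only if at
// least one selecting policy allows it. Policies can only ever add allows.
func allowed(policies []Policy, peerNS string) bool {
	isolated := false
	for _, p := range policies {
		if !p.SelectsPod {
			continue
		}
		isolated = true
		for _, ns := range p.AllowedPeerNamespaces {
			if ns == peerNS {
				return true
			}
		}
	}
	return !isolated
}
```

With a broad "kube-apiserver can reach all namespaces" policy in place, an extra policy that allows only a subset leaves the union unchanged, so the only way to narrow it within NetworkPolicy is to delete the broad policy.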

I want administrators to create AdminNetworkPolicy to express their exceptions (where they can deny).

Ultimately, I would love for operators to have configuration that lets them create policies restricting things like webhooks. They may need an allow-list of valid domains, or something.

But in the shorter term, people are already deploying their own NetworkPolicies (not AdminNetworkPolicies, which weren't ready at the time) to restrict our components, since we ship no policies today.
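AdminNetworkPolicy can express these exceptions because it is evaluated in a tier ahead of NetworkPolicy and, unlike NetworkPolicy, supports Deny. Below is a minimal Go sketch of that ordering, assuming a hypothetical flattened rule list (`ANPRule`) and a simplified summary of the NetworkPolicy tier (`NPView`); BaselineAdminNetworkPolicy is left out for brevity.

```go
package main

import "sort"

// Action mirrors the three AdminNetworkPolicy rule actions.
type Action string

const (
	Allow Action = "Allow"
	Deny  Action = "Deny"
	Pass  Action = "Pass"
)

// ANPRule is a hypothetical flattening of one AdminNetworkPolicy rule: the
// owning policy's priority, the peer namespace it matches, and the action.
// Rules of one policy should be listed in their declared order.
type ANPRule struct {
	Priority int // lower value is evaluated first (0 is highest precedence)
	PeerNS   string
	Action   Action
}

// NPView summarizes the NetworkPolicy tier for the pod in question.
type NPView struct {
	Isolated   bool            // does any NetworkPolicy select the pod?
	AllowedNSs map[string]bool // union of peers allowed by those policies
}

// anpVerdict returns (verdict, true) if an ANP rule decides the traffic,
// or (false, false) if no rule matched or a Pass rule handed it off.
func anpVerdict(rules []ANPRule, peerNS string) (verdict, final bool) {
	sorted := append([]ANPRule(nil), rules...)
	// Stable sort keeps the declared order of rules within one priority.
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Priority < sorted[j].Priority })
	for _, r := range sorted {
		if r.PeerNS != peerNS {
			continue
		}
		switch r.Action {
		case Allow:
			return true, true
		case Deny:
			return false, true
		case Pass:
			return false, false // skip remaining ANP rules
		}
	}
	return false, false
}

// decide evaluates the ANP tier first, then NetworkPolicy; a pod that no
// NetworkPolicy isolates falls through to the default of allowing traffic.
func decide(rules []ANPRule, np NPView, peerNS string) bool {
	if v, final := anpVerdict(rules, peerNS); final {
		return v
	}
	if np.Isolated {
		return np.AllowedNSs[peerNS]
	}
	return true
}
```

An administrator's ANP Deny therefore wins over a Red Hat-shipped NetworkPolicy allow, without anyone having to edit or delete that policy.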


danwinship · 2024-12-02T16:58:14Z

enhancements/network/core-network-policies.md

+1. Change the namespace admission controller so all namespaces with an
+   openshift- prefix (since those are special anyway) are labeled with
+   the "openshift-namespace" label so **that network policy can address
+   them**


I always forget what the rule about OLM operators and openshift- is, so maybe you should explain that somewhere. (In particular, it seems from some of the other comments that OLM operators can make openshift- namespaces, but then how does that work with "(Not) Adding policies for operators not created by Red Hat"?)

The rule is horrible.

They get installed into a default openshift-operators namespace, but OLM will need to own the policy for that namespace. I suspect it will need to be a wildcard allow for both egress and ingress.

But, worse, the admin can choose any namespace to install it into... there are ongoing conversations about how to deal with this.

Former OLM team member here (recently moved to control plane group) 👋

OLM-installed operators probably won't be able to specify NetworkPolicy resources in their bundles, according to https://olm.operatorframework.io/docs/tasks/creating-operator-manifests/#packaging-additional-objects-alongside-an-operator

OLMv1 is supposed to GA in 4.18 and will, initially, respect a subset of the existing bundle format. The desire is to eventually support bundling formats that allow operator authors more control over the contents packaged within their bundle (OLMv1 is going to be unopinionated about what is in the bundles), meaning operator authors will eventually be able to specify NetworkPolicy resources in their bundles.

With the current bundling format, it seems like OLM would need to be updated to either:

1. Allow NetworkPolicy resources in the bundle alongside the CSV. IIUC, to achieve compliance with the high-level goal of this proposal, this would likely mean updating all bundles in the catalogs to contain a sufficient NetworkPolicy resource. Some open questions for this approach:
   - Do we have enough control of the catalog curation pipelines to ensure this is done properly?
   - Should a wildcard NetworkPolicy be added to all bundles by default if the authors of the operator can't/won't add one themselves?
2. Stamp out a permissive NetworkPolicy resource on each operator installation.

With the focus on OLMv1 and some general attrition, there are not many folks left on the OLM team with knowledge of the legacy "OLMv0". Neither of these is likely to be a trivial ask of the OLM team.

I do wonder whether having CVO manage a default NetworkPolicy for the openshift-operators namespace, permissive enough for all operators installed in that namespace, would be sufficient to achieve the compliance we are looking for, even if admins can install an operator in any namespace.

👀

Operators running in the openshift-operators namespace may need network access to just about anything, so I agree with @knobunc that OLM will likely end up needing to ship with an allow-all policy.

Beyond that, I am pretty leery of any solution that requires operators to change their bundle contents, as this will effectively block the use of older versions of $everything.

I'm not clear on whether there's a mechanism that allows the cluster admin to install/override/configure the NetworkPolicy in an impacted namespace. I see a knob to shut the whole thing down. And this section hints at it, but I can't tell what actor (cluster admin or developer) or environment (dev/test or production) it's talking about. Anyway, it seems like a per-namespace "escape hatch" would help mitigate this issue, as it would allow operators to publish documentation describing how to retrofit policies for legacy versions.

The cluster admin can always create Admin Network Policies to override the Network Policies. They could also add a Network Policy to a namespace, but I recommend against that: NPs are purely additive, so there is no way to deny something another policy allows.

Not sure what you mean by "Beyond that, I am pretty leery of any solution that requires operators to change their bundle contents, as this will effectively block the use of older versions of $everything."

Can you clarify your concern?

Scenario: Customer is currently making use of some number of operators that were built/published without embedded NetworkPolicyz, either because they are versions that were built prior to this effort, or because the authors simply didn't. Customer upgrades their cluster to a version that enforces the requirement for NetworkPolicyz for everything running in openshift-* namespaces. Operators stop working because they don't have 'em. Customer is sad.

Note that I hedged my concern with speculations on how the admin could override this behavior. If there's a relatively easy way for them to do that, it might be okay. But of course there's the risk that they'll never go back and fix it up even if/when that becomes possible. (My oldest and wisest son likes to say, "There's nothing more permanent than a temporary solution.")

Added a section about it. In general, I see this as a decision that the OLM team should own, and they can work out how they want to handle migration to a default deny world for all of the operators.

However, they are somewhat of a unique case, and I have had a few conversations with @joelanford about how to handle it, so I feel comfortable adding a section to this enhancement.


danwinship · 2024-12-02T17:14:04Z

enhancements/network/core-network-policies.md

+
+We also need to provide this information to the support organization
+in a must-gather.  We will augment the must-gather tool to include
+information about network policy blocks in OpenShift namespaces.


Note that sufficiently-broken policies could interfere with both the Network Observability tool and must-gather.

Hopefully we can catch at least some of that in automated testing. I am assuming that these are the RH shipped policies, not random ones that a customer applied to us.
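For the must-gather piece, a sketch of the kind of aggregation that could be collected, assuming blocked-connection records have already been parsed into a hypothetical `PolicyFlow` type. The real source (OVN-Kubernetes ACL audit logs or Network Observability flows) has its own format, and parsing it is out of scope here.

```go
package main

import (
	"sort"
	"strings"
)

// PolicyFlow is a hypothetical, already-parsed record of a policy verdict.
type PolicyFlow struct {
	Namespace string // namespace whose policy produced the verdict
	Direction string // "Ingress" or "Egress"
	Verdict   string // e.g. "allow" or "drop"
}

// Summary counts dropped flows for one (namespace, direction) pair.
type Summary struct {
	Namespace string
	Direction string
	Count     int
}

// summarizeForMustGather keeps only drops in openshift-* namespaces (the
// Red Hat-shipped policies) and aggregates them, so support can see which
// platform namespaces are blocking traffic without reading every flow.
func summarizeForMustGather(flows []PolicyFlow) []Summary {
	type key struct{ ns, dir string }
	counts := map[key]int{}
	for _, f := range flows {
		if f.Verdict != "drop" || !strings.HasPrefix(f.Namespace, "openshift-") {
			continue
		}
		counts[key{f.Namespace, f.Direction}]++
	}
	out := make([]Summary, 0, len(counts))
	for k, c := range counts {
		out = append(out, Summary{Namespace: k.ns, Direction: k.dir, Count: c})
	}
	// Sort for stable, diff-friendly output in the must-gather archive.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Namespace != out[j].Namespace {
			return out[i].Namespace < out[j].Namespace
		}
		return out[i].Direction < out[j].Direction
	})
	return out
}
```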


openshift-bot · 2025-01-01T01:15:07Z

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

deads2k · 2025-01-07T15:49:23Z

enhancements/network/core-network-policies.md

+
+## Proposal
+
+1. Change the namespace admission controller so all namespaces with an


A new openshift/kubernetes admission plugin is probably a simpler implementation. This will not work on HCP.

Use ValidatingAdmissionPolicy to prevent the removal of the label on openshift-* namespaces.

I assume you mean that we would add a new default mutating admission plugin (and not as a webhook?)

And I took your comment to mean that my proposal would not work with HCP and that is why we should use the admission plugin.
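A minimal sketch of the mutating half of such an in-tree admission plugin, written as a pure function over the namespace name and labels. The label key comes from the proposal; the value `"true"` and the function name `admitNamespace` are assumptions for illustration. Blocking removal of the label would be a separate validation step (for example the ValidatingAdmissionPolicy suggested above).

```go
package main

import "strings"

// platformLabel is the key named in the proposal; the value is assumed.
const (
	platformLabel      = "security.openshift.io/openshift-namespace"
	platformLabelValue = "true"
)

// admitNamespace adds the platform label to openshift-* namespaces that
// lack it. It returns the (possibly copied) labels and whether it mutated
// anything; other namespaces pass through untouched.
func admitNamespace(name string, labels map[string]string) (map[string]string, bool) {
	if !strings.HasPrefix(name, "openshift-") {
		return labels, false
	}
	if labels[platformLabel] == platformLabelValue {
		return labels, false
	}
	// Copy rather than mutate the caller's map.
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	out[platformLabel] = platformLabelValue
	return out, true
}
```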

deads2k · 2025-01-07T15:50:38Z

enhancements/network/core-network-policies.md

+3. Work out how we can identify connections blocked by policy in
+   OpenShift namespaces (either egress or ingress) and work out how to
+   include it in a must-gather


Critical feature for development and field debugging. Very glad to see it here.

deads2k · 2025-01-07T15:52:33Z

enhancements/network/core-network-policies.md

+4.  If it does not start with `openshift-` then any
+    `security.openshift.io/openshift-namespace` label will be stripped out and can not be set


Don't mutate the input, but use ValidatingAdmissionPolicy to cause creation to fail to prevent this in newer clusters. Having the cluster-network-operator detect a violation of this rule and prevent upgrade until the cluster-admin corrects the problem is a good idea.
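The validate-instead-of-mutate rule can be sketched as a predicate over old and new labels, the same check one would express in a ValidatingAdmissionPolicy expression. This Go version is illustrative only; `validateNamespace` is a hypothetical name, and `oldLabels` is nil on create.

```go
package main

import (
	"fmt"
	"strings"
)

const platformLabel = "security.openshift.io/openshift-namespace"

// validateNamespace rejects (never rewrites) requests that break either
// rule from the review: only openshift-* namespaces may carry the platform
// label, and openshift-* namespaces may not drop it once set.
func validateNamespace(name string, oldLabels, newLabels map[string]string) error {
	_, hasNew := newLabels[platformLabel]
	isPlatform := strings.HasPrefix(name, "openshift-")
	if !isPlatform && hasNew {
		return fmt.Errorf("namespace %q: label %s is reserved for openshift-* namespaces", name, platformLabel)
	}
	if _, hadOld := oldLabels[platformLabel]; isPlatform && hadOld && !hasNew {
		return fmt.Errorf("namespace %q: label %s may not be removed", name, platformLabel)
	}
	return nil
}
```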

deads2k · 2025-01-07T15:53:50Z

enhancements/network/core-network-policies.md

+   3. Otherwise it will strip the `security.openshift.io/openshift-namespace` label
+
+   TODO: Decide if this is the correct behavior... do we want to allow
+   namespaces to opt-in to being part of the platform.  I do not think


Our platform namespaces are openshift-* namespaces. That's how things like monitoring recognize them. To be a part of the platform requires using that prefix.

deads2k · 2025-01-07T15:56:01Z

enhancements/network/core-network-policies.md

+    1. Document what traffic flows (ingress and egress) need to be
+       allowed for pods in the namespaces
+
+    2. The ACS product can be used to analyze workloads in a cluster to


We don't deploy ACS in CI. When ACS isn't installed, what options are available?

This is purely for the developers who do not know what their network flows are (which appears to be the case sometimes, even though I find that surprising). It would not be something end-users would need to do. And most of the time I don't think developers would need it.

deads2k · 2025-01-07T15:56:36Z

enhancements/network/core-network-policies.md

+       trying to talk to whom and mint tight network policies to reflect
+       that model.
+
+    4. The Network Observability tool can be used to detect when a network policy blocked traffic


Network observability isn't available by default. Without network observability, how will we identify and resolve issues?

We would need to enable https://docs.openshift.com/container-platform/4.18/networking/network_security/logging-network-security.html#nw-networkpolicy-audit-configure_logging-network-security by default with a low rate limit. We'd also need to make sure that gets into a must-gather. I'll update the doc.
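The linked audit logging is enabled per namespace through an annotation. A sketch that builds a deny-only value, assuming the `k8s.ovn.org/acl-logging` key and the syslog-style severities described in that documentation (verify both there). Logging only denials keeps volume low on top of the node-level rate limit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// aclLoggingAnnotation is the per-namespace annotation OVN-Kubernetes
// reads to decide which ACL verdicts to log (assumed from the linked docs).
const aclLoggingAnnotation = "k8s.ovn.org/acl-logging"

// validSeverities lists the syslog-style levels the docs describe.
var validSeverities = map[string]bool{
	"alert": true, "warning": true, "notice": true, "info": true, "debug": true,
}

// denyOnlyLoggingAnnotations returns annotations that log denied
// connections at the given severity and do not log allowed ones.
func denyOnlyLoggingAnnotations(severity string) (map[string]string, error) {
	if !validSeverities[severity] {
		return nil, fmt.Errorf("unsupported severity %q", severity)
	}
	b, err := json.Marshal(map[string]string{"deny": severity})
	if err != nil {
		return nil, err
	}
	return map[string]string{aclLoggingAnnotation: string(b)}, nil
}
```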

deads2k · 2025-01-07T15:58:08Z

enhancements/network/core-network-policies.md

+Are there any unique considerations for making this change work with Hypershift?
+
+No. 


I'm surprised there is no impact to having the kube-apiserver run outside the cluster and reach into the cluster to call things like admission webhooks. How about a specific admission webhook and CRD conversion webhook test to confirm that there is no HCP impact?

Well, I had hoped we would not need a webhook. I thought from our earlier conversation about this that you were not opposed to changing the namespace admission controller. But if we need to have a webhook, then I agree with your concern. Will update.

deads2k · 2025-01-07T16:00:03Z

enhancements/network/core-network-policies.md

+How does this proposal affect MicroShift? For example, if the proposal
+adds configuration options through API resources, should any of those
+behaviors also be exposed to MicroShift admins through the
+configuration file for MicroShift?


Microshift doesn't run the cluster-network-operator. How will upgrades of microshift get the appropriate namespace labels? How will upgrades prevent non-openshift-* namespaces from squatting on the label?
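Whichever component ends up owning this on MicroShift, the upgrade-time work reduces to one scan over existing namespaces. A sketch, with hypothetical `Namespace`, `UpgradePlan`, and `planUpgrade` names: missing labels on openshift-* namespaces can be added automatically, while squatters are only reported so the upgrade can be blocked until an admin fixes them, in line with the "don't mutate, block upgrade" suggestion above.

```go
package main

import (
	"sort"
	"strings"
)

const platformLabel = "security.openshift.io/openshift-namespace"

// Namespace is a minimal stand-in for the fields this check needs.
type Namespace struct {
	Name   string
	Labels map[string]string
}

// UpgradePlan is what a hypothetical upgrade-time reconciler would act on.
type UpgradePlan struct {
	ToLabel   []string // openshift-* namespaces missing the label
	Squatters []string // other namespaces carrying it; block until fixed
}

// planUpgrade classifies every namespace once and returns sorted results.
func planUpgrade(nss []Namespace) UpgradePlan {
	var plan UpgradePlan
	for _, ns := range nss {
		_, has := ns.Labels[platformLabel]
		platform := strings.HasPrefix(ns.Name, "openshift-")
		if platform && !has {
			plan.ToLabel = append(plan.ToLabel, ns.Name)
		} else if !platform && has {
			plan.Squatters = append(plan.Squatters, ns.Name)
		}
	}
	sort.Strings(plan.ToLabel)
	sort.Strings(plan.Squatters)
	return plan
}
```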

openshift-ci · 2025-01-07T16:04:08Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from knobunc. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

enhancements/network/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-bot · 2025-01-15T00:45:11Z

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot · 2025-01-22T08:15:21Z

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

openshift-ci · 2025-01-22T08:15:36Z

@openshift-bot: Closed this PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

knobunc · 2025-03-04T19:08:32Z

/reopen
/remove-lifecycle rotten

openshift-ci · 2025-03-04T19:08:43Z

@knobunc: Reopened this PR.


openshift-ci · 2025-03-04T19:31:40Z

@knobunc: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | `25b3070` | link | true | `/test markdownlint` |



openshift-ci bot requested review from danwinship and trozet November 26, 2024 17:38

knobunc changed the title ~~Enhancement to outline path for network policies for all core components~~ WIP Enhancement to outline path for network policies for all core components Nov 26, 2024

openshift-ci bot added the do-not-merge/work-in-progress label Nov 26, 2024

danwinship reviewed Dec 2, 2024


knobunc added 2 commits December 3, 2024 13:53

Enhancement to outline how we get network policies for all core compo…

c185b85

…nents This enhancement outlines how we will add network policies to all core components in OpenShift. It also outlines how it will eventually become enforcing, and how we will test compliance.

Responded to review comments

25b3070

First round of updates in response to review comments, and some other clean-up.

knobunc force-pushed the core-network-policies branch from 0244f94 to 25b3070 December 3, 2024 18:57

openshift-ci bot added the lifecycle/stale label Jan 1, 2025

deads2k requested changes Jan 7, 2025


openshift-ci bot added lifecycle/rotten and removed lifecycle/stale labels Jan 15, 2025

openshift-ci bot closed this Jan 22, 2025

openshift-ci bot reopened this Mar 4, 2025

openshift-ci bot removed the lifecycle/rotten label Mar 4, 2025

knobunc changed the title ~~WIP Enhancement to outline path for network policies for all core components~~ Enhancement to outline path for network policies for all core components Mar 12, 2025

openshift-ci bot removed the do-not-merge/work-in-progress label Mar 12, 2025


knobunc commented Nov 26, 2024

danwinship Dec 2, 2024

knobunc Dec 3, 2024

danwinship Dec 2, 2024

knobunc Dec 3, 2024

everettraven Dec 3, 2024

2uasimojo Jan 7, 2025

knobunc Mar 5, 2025

2uasimojo Mar 5, 2025

knobunc Mar 12, 2025

danwinship Dec 2, 2024

knobunc Dec 3, 2024

openshift-bot commented Jan 1, 2025

deads2k Jan 7, 2025

knobunc Mar 12, 2025

deads2k Jan 7, 2025

deads2k Jan 7, 2025

deads2k Jan 7, 2025

deads2k Jan 7, 2025

knobunc Mar 5, 2025

deads2k Jan 7, 2025

knobunc Mar 5, 2025

deads2k Jan 7, 2025

knobunc Mar 5, 2025

deads2k Jan 7, 2025

openshift-ci bot commented Jan 7, 2025

openshift-bot commented Jan 15, 2025

openshift-bot commented Jan 22, 2025

openshift-ci bot commented Jan 22, 2025

knobunc commented Mar 4, 2025

openshift-ci bot commented Mar 4, 2025

openshift-ci bot commented Mar 4, 2025

