STOR-2141: add support for maxAllowedBlockVolumesPerNode #287

Open
wants to merge 4 commits into base: main


@RomanBednar RomanBednar commented Feb 4, 2025

Depends on

Manual verification

Test value limits for maxAllowedBlockVolumesPerNode field:

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"maxAllowedBlockVolumesPerNode":-1}}}}'
The ClusterCSIDriver "csi.vsphere.vmware.com" is invalid: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode: Invalid value: -1: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode in body should be greater than or equal to 1

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"maxAllowedBlockVolumesPerNode":0}}}}'
The ClusterCSIDriver "csi.vsphere.vmware.com" is invalid: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode: Invalid value: 0: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode in body should be greater than or equal to 1

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"maxAllowedBlockVolumesPerNode":256}}}}'
The ClusterCSIDriver "csi.vsphere.vmware.com" is invalid: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode: Invalid value: 256: spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode in body should be less than or equal to 255
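
The bounds visible in the rejections above (1 to 255 inclusive) can be sketched as a simple range check. This is only an illustration: the error messages suggest the API server enforces the limits through schema validation, and the helper name and constants below are hypothetical, not taken from this PR.

```go
package main

import "fmt"

// Bounds copied from the error messages in the manual verification output.
const (
	minBlockVolumesPerNode = 1
	maxBlockVolumesPerNode = 255
)

// validateMaxAllowedBlockVolumesPerNode is a hypothetical helper that mirrors
// the accepted range; it is not part of the operator or the API.
func validateMaxAllowedBlockVolumesPerNode(v int32) error {
	if v < minBlockVolumesPerNode {
		return fmt.Errorf("invalid value %d: should be greater than or equal to %d", v, minBlockVolumesPerNode)
	}
	if v > maxBlockVolumesPerNode {
		return fmt.Errorf("invalid value %d: should be less than or equal to %d", v, maxBlockVolumesPerNode)
	}
	return nil
}

func main() {
	// The same probe values as the oc patch commands, plus the two edges.
	for _, v := range []int32{-1, 0, 1, 255, 256} {
		fmt.Printf("%d: %v\n", v, validateMaxAllowedBlockVolumesPerNode(v))
	}
}
```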

Validate maxAllowedBlockVolumesPerNode value propagation to driver deployment as MAX_VOLUMES_PER_NODE:

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"maxAllowedBlockVolumesPerNode":60}}}}'
clustercsidriver.operator.openshift.io/csi.vsphere.vmware.com patched

$ oc -n openshift-cluster-csi-drivers get deployment.apps/vmware-vsphere-csi-driver-controller -o jsonpath='{.spec.template.spec.containers[0].env}'
[{"name":"CSI_ENDPOINT","value":"unix:///var/lib/csi/sockets/pluginproxy/csi.sock"},{"name":"X_CSI_MODE","value":"controller"},{"name":"VSPHERE_CSI_CONFIG","value":"/etc/kubernetes/vsphere-csi-config/cloud.conf"},{"name":"INCLUSTER_CLIENT_QPS","value":"100"},{"name":"INCLUSTER_CLIENT_BURST","value":"100"},{"name":"CSI_NAMESPACE","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}},{"name":"X_CSI_SERIAL_VOL_ACCESS_TIMEOUT","value":"3m"},{"name":"X_CSI_SPEC_DISABLE_LEN_CHECK","value":"true"},{"name":"MAX_VOLUMES_PER_NODE","value":"60"}]

$ oc -n openshift-cluster-csi-drivers get daemonset.apps/vmware-vsphere-csi-driver-node -o jsonpath='{.spec.template.spec.containers[0].env}'
[{"name":"NODE_NAME","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"spec.nodeName"}}},{"name":"CSI_ENDPOINT","value":"unix:///csi/csi.sock"},{"name":"X_CSI_MODE","value":"node"},{"name":"X_CSI_SPEC_DISABLE_LEN_CHECK","value":"true"},{"name":"CSI_NAMESPACE","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}},{"name":"MAX_VOLUMES_PER_NODE","value":"60"}]
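
The propagation checked above could look roughly like the following sketch, which sets `MAX_VOLUMES_PER_NODE` on a container's environment. `EnvVar` and `Container` are simplified stand-ins for the Kubernetes types, and `setMaxVolumesPerNode` is a hypothetical name; the actual hook in this PR may be structured differently.

```go
package main

import (
	"fmt"
	"strconv"
)

// EnvVar and Container are simplified stand-ins for the Kubernetes core types.
type EnvVar struct {
	Name  string
	Value string
}

type Container struct {
	Name string
	Env  []EnvVar
}

// setMaxVolumesPerNode (hypothetical) updates MAX_VOLUMES_PER_NODE if it is
// already present on the container and appends it otherwise.
func setMaxVolumesPerNode(c *Container, limit int) {
	value := strconv.Itoa(limit)
	for i := range c.Env {
		if c.Env[i].Name == "MAX_VOLUMES_PER_NODE" {
			c.Env[i].Value = value
			return
		}
	}
	c.Env = append(c.Env, EnvVar{Name: "MAX_VOLUMES_PER_NODE", Value: value})
}

func main() {
	c := Container{
		Name: "csi-driver",
		Env:  []EnvVar{{Name: "X_CSI_MODE", Value: "node"}},
	}
	setMaxVolumesPerNode(&c, 60)
	fmt.Printf("%+v\n", c.Env)
}
```

Updating in place rather than always appending keeps the variable unique when the hook runs repeatedly against the same object.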

Validate propagation to CSINode as allocatable count:

$ oc get csinode/ci-ln-k30mn5t-c1627-2tk2k-worker-0-72mfn -o jsonpath='{.spec.drivers[0].allocatable.count}'
60

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 4, 2025

openshift-ci-robot commented Feb 4, 2025

@RomanBednar: This pull request references STOR-2141 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.19.0" version, but no target version was set.


@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 4, 2025
@openshift-ci openshift-ci bot requested review from dobsonj and gnufied February 4, 2025 11:34
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 4, 2025

openshift-ci bot commented Feb 6, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RomanBednar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@RomanBednar RomanBednar changed the title from "WIP: STOR-2141: add daemonset hook to allow setting custom volume limit" to "WIP: STOR-2141: add support for maxAllowedBlockVolumesPerNode" Mar 17, 2025
@RomanBednar RomanBednar force-pushed the STOR-2141 branch 3 times, most recently from 623109c to ea8a387 Compare March 18, 2025 13:48

openshift-ci-robot commented Mar 18, 2025

@RomanBednar: This pull request references STOR-2141 which is a valid jira issue.

openshift-ci-robot commented Mar 20, 2025

@RomanBednar: This pull request references STOR-2141 which is a valid jira issue.

To validate the maximum volume attachment limit set by the user,
NodeChecker needs access to ClusterCSIDriver, which is where users
set the value.
We need to check the versions of all ESXi hosts in the cluster and,
if we detect that users set an incorrect custom volume attachment
limit, degrade the cluster.

An incorrect value is any value above 59 when any of the vSphere hosts
in the cluster is not on ESXi version 8 or higher.

Since NodeChecker now checks the maximum attachment limit, it is safe
to add hooks that reflect the maxAllowedBlockVolumesPerNode field of
ClusterCSIDriver into the deployment and daemonset as an environment
variable.
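
The rule from the commit notes above (a limit above 59 is only valid when every host runs ESXi 8 or higher) can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, host versions are assumed to be dotted strings such as "7.0.3", and the real NodeChecker may obtain and compare versions differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Highest limit allowed when any host is older than ESXi 8, per the commit notes.
const maxVolumesBeforeESXi8 = 59

// majorVersion extracts the leading number from a dotted version string.
// The "8.0.1" style format is an assumption made for this sketch.
func majorVersion(v string) (int, error) {
	major, _, _ := strings.Cut(v, ".")
	return strconv.Atoi(major)
}

// checkVolumeLimit (hypothetical) returns an error when the limit exceeds 59
// and at least one host is not on ESXi 8 or higher.
func checkVolumeLimit(limit int, hostVersions []string) error {
	if limit <= maxVolumesBeforeESXi8 {
		return nil
	}
	for _, v := range hostVersions {
		major, err := majorVersion(v)
		if err != nil {
			return fmt.Errorf("cannot parse ESXi version %q: %w", v, err)
		}
		if major < 8 {
			return fmt.Errorf("limit %d exceeds %d but a host runs ESXi %s", limit, maxVolumesBeforeESXi8, v)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkVolumeLimit(60, []string{"8.0.1", "7.0.3"}))
	fmt.Println(checkVolumeLimit(60, []string{"8.0.1", "8.0.2"}))
}
```

In the operator, a non-nil result from such a check would correspond to the degraded condition the commit notes describe.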
@RomanBednar RomanBednar changed the title from "WIP: STOR-2141: add support for maxAllowedBlockVolumesPerNode" to "STOR-2141: add support for maxAllowedBlockVolumesPerNode" Mar 21, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 21, 2025
@RomanBednar

/retest-required


openshift-ci bot commented Mar 21, 2025

@RomanBednar: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-e2e-aws-ovn | 3942bf2 | link | false | /test okd-scos-e2e-aws-ovn |


@RomanBednar

/assign @gnufied

For review; feel free to reassign to another candidate.


// Check if a custom value is set in the ClusterCSIDriver
if clusterCSIDriver != nil && clusterCSIDriver.Spec.DriverConfig.VSphere != nil {
	maxVolumesPerNode = int(clusterCSIDriver.Spec.DriverConfig.VSphere.MaxAllowedBlockVolumesPerNode)

What does this return when no limit is set? 0 ??
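
To illustrate the concern: if the field is a plain integer rather than a pointer (an assumption here, suggested by the `int(...)` conversion in the snippet), an unset value reads as Go's zero value, 0. A minimal sketch of one way to guard against that is below; the types are simplified stand-ins for the API types, and the default of 59 is an assumption based on the limit mentioned in the commit notes, not something this PR confirms.

```go
package main

import "fmt"

// Simplified stand-ins for the ClusterCSIDriver config types (assumed shapes).
type VSphereConfig struct {
	MaxAllowedBlockVolumesPerNode int32
}

type DriverConfig struct {
	VSphere *VSphereConfig
}

// Assumed default for this sketch only.
const defaultMaxVolumesPerNode = 59

// effectiveMaxVolumes (hypothetical) falls back to the default when the
// config is missing or the field still holds its zero value.
func effectiveMaxVolumes(cfg *DriverConfig) int {
	if cfg == nil || cfg.VSphere == nil || cfg.VSphere.MaxAllowedBlockVolumesPerNode == 0 {
		return defaultMaxVolumesPerNode
	}
	return int(cfg.VSphere.MaxAllowedBlockVolumesPerNode)
}

func main() {
	unset := &DriverConfig{VSphere: &VSphereConfig{}}
	custom := &DriverConfig{VSphere: &VSphereConfig{MaxAllowedBlockVolumesPerNode: 60}}
	fmt.Println(effectiveMaxVolumes(nil), effectiveMaxVolumes(unset), effectiveMaxVolumes(custom))
}
```

Treating 0 as "unset" is safe in this sketch only because the validated range starts at 1, so 0 can never be a user-supplied value.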


gnufied commented Mar 25, 2025

Shouldn't all this code be behind a feature gate?
