---
title: vsphere-configurable-maximum-allowed-number-of-block-volumes-per-node
authors:
  - "@rbednar"
reviewers:
  - "@jsafrane"
  - "@gnufied"
  - "@deads2k"
approvers:
  - "@jsafrane"
  - "@gnufied"
  - "@deads2k"
api-approvers:
  - "@deads2k"
creation-date: 2025-01-31
last-updated: 2025-01-31
tracking-link:
  - https://issues.redhat.com/browse/OCPSTRAT-1829
see-also:
  - "None"
replaces:
  - "None"
superseded-by:
  - "None"
---

# vSphere configurable maximum allowed number of block volumes per node

This document proposes an enhancement to the vSphere CSI driver to allow administrators to configure the maximum number
of block volumes that can be attached to a single vSphere node. This enhancement addresses a limitation of the
current driver, which relies on a static limit that cannot be changed by cluster administrators.

## Summary

The vSphere CSI driver for vSphere version 7 uses a constant to determine the maximum number of block volumes that can
be attached to a single node. This limit is influenced by the number of SCSI controllers available on the node.
By default, a node can have up to four SCSI controllers, each supporting up to 15 devices, allowing for a maximum of 60
volumes per node (59 + root volume).

However, vSphere version 8 increased the maximum number of volumes per node to 256 (255 + root volume). This enhancement
aims to leverage this increased limit and provide administrators with finer-grained control over volume allocation,
allowing them to configure the maximum number of block volumes that can be attached to a single node.

Details about configuration maximums: https://configmax.broadcom.com/guest?vmwareproduct=vSphere&release=vSphere%208.0&categories=3-0
Volume limit configuration for the vSphere storage plug-in: https://techdocs.broadcom.com/us/en/vmware-cis/vsphere/container-storage-plugin/3-0/getting-started-with-vmware-vsphere-container-storage-plug-in-3-0/vsphere-container-storage-plug-in-concepts/configuration-maximums-for-vsphere-container-storage-plug-in.html

## Motivation

### User Stories

- As a vSphere administrator, I want to configure the maximum number of volumes that can be attached to a node, so that
  I can optimize resource utilization and prevent oversubscription.
- As a cluster administrator, I want to ensure that the vSphere CSI driver operates within the limits imposed by the
  underlying vSphere infrastructure.

### Goals

- Provide administrators with control over the volume allocation limit on vSphere nodes.
- Improve resource utilization and prevent oversubscription.
- Ensure compatibility with existing vSphere infrastructure limitations.
- Maintain backward compatibility with existing deployments.

### Non-Goals

- Support heterogeneous environments with different ESXi versions on the nodes that form an OpenShift cluster.
- Dynamically adjust the limit based on real-time resource usage.
- Implement per-namespace or per-workload volume limits.
- Modify the underlying vSphere VM configuration.

## Proposal

1. Driver Feature State Switch (FSS):

   - Use the FSS (`max-pvscsi-targets-per-vm`) of the vSphere driver to control activation of the maximum volume limit
     functionality.
   - No changes are needed; the feature is enabled by default.

2. API for Maximum Volume Limit:

   - Introduce a new field `spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode` in the ClusterCSIDriver API to allow
     administrators to configure the desired maximum number of volumes per node.
   - This field should default to the current maximum limit of 59 volumes per node for vSphere 7.
   - The API will not allow `MAX_VOLUMES_PER_NODE` to be set to `0`, nor allow the field to be unset, since either would
     effectively disable the limit.
   - The allowed range of values is 1 to 255. The maximum value matches the vSphere 8 limit.

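For illustration, a configured ClusterCSIDriver might look like the following sketch. The field placement follows this
proposal; the value `255` assumes a homogeneous cluster where every node runs ESXi 8 or later.

```yaml
apiVersion: operator.openshift.io/v1
kind: ClusterCSIDriver
metadata:
  name: csi.vsphere.vmware.com
spec:
  driverConfig:
    driverType: vSphere
    vSphere:
      # Raise the per-node block volume limit.
      # Values above 59 require all nodes to run ESXi 8 or later.
      maxAllowedBlockVolumesPerNode: 255
```
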
3. Update CSI Node Pods:

   - After reading the new `maxAllowedBlockVolumesPerNode` API field from ClusterCSIDriver, the operator will inject the
     `MAX_VOLUMES_PER_NODE` environment variable into the node pods using a DaemonSet hook.
   - Any value that is statically set for the `MAX_VOLUMES_PER_NODE` environment variable in the DaemonSet asset file will
     be overwritten. If the variable is omitted in the asset, the DaemonSet hook will add it and set its value to the
     default defined in the ClusterCSIDriver API.

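After the hook runs, the node DaemonSet could carry the configured value roughly as sketched below. The DaemonSet and
container names here are illustrative, not taken from the actual operator assets.

```yaml
# Excerpt of the driver node DaemonSet after the hook has run (names illustrative).
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vmware-vsphere-csi-driver-node
  namespace: openshift-cluster-csi-drivers
spec:
  template:
    spec:
      containers:
        - name: csi-driver
          env:
            # Injected or overwritten by the operator's DaemonSet hook
            # from spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode.
            - name: MAX_VOLUMES_PER_NODE
              value: "255"
```
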
4. Driver Behavior:

   - The vSphere CSI driver needs to allow the higher limit with the Feature State Switch (FSS) `max-pvscsi-targets-per-vm`.
   - The switch is already enabled by default in the driver versions shipped in OpenShift 4.19.
   - The driver will report the volume limit as usual in response to `NodeGetInfo` calls.

5. Documentation:

   - Update the vSphere CSI driver documentation to include information about the new feature and how to configure it.
   - Include a statement informing users of the current requirement of having a homogeneous cluster with all nodes
     running ESXi 8 or higher. Until this requirement is met, the limit set in `maxAllowedBlockVolumesPerNode` must not be
     increased above 59.

### Workflow Description

1. Administrator configures the limit:
   - The administrator creates or updates a ClusterCSIDriver object to specify the desired maximum number of volumes per
     node using the new `maxAllowedBlockVolumesPerNode` API field.
2. Operator reads the configuration:
   - The vSphere CSI operator monitors the ClusterCSIDriver object for changes.
   - Upon detecting a change, the operator reads the configured limit value.
3. Operator updates the new limit in the DaemonSet:
   - The operator updates the DaemonSet for the vSphere CSI driver, injecting the `MAX_VOLUMES_PER_NODE` environment
     variable with the configured limit value into the driver node pods on worker nodes.
4. Driver reflects the limit in deployments:
   - The vSphere CSI driver checks that the `max-pvscsi-targets-per-vm` FSS is set to true, reads the limit from the
     `MAX_VOLUMES_PER_NODE` environment variable, and uses the configured limit when handling volume provisioning
     requests.

### API Extensions

- New field in the ClusterCSIDriver CRD:
  - A single new field (`spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode`) will be introduced in the
    ClusterCSIDriver CRD to define the desired maximum volume limit.
  - The field will default to the current maximum limit of 59 volumes per node for vSphere 7.
  - The CRD will validate that the value fits within the defined range (1-255).

### Topology Considerations

#### Hypershift / Hosted Control Planes

No unique considerations for Hypershift. The configuration and behavior of the vSphere CSI driver with respect to the
maximum volume limit will remain consistent across standalone and managed clusters.

#### Standalone Clusters

This enhancement is fully applicable to standalone OpenShift clusters.

#### Single-node Deployments or MicroShift

No unique considerations for MicroShift. The configuration and behavior of the vSphere CSI driver with respect to the
maximum volume limit will remain consistent across standalone and SNO/MicroShift clusters.

### Implementation Details/Notes/Constraints

A possible future constraint is the maximum increasing again with newer vSphere versions. However, we expect the
limit to increase rather than decrease, and relaxing the API validation later is possible.

### Risks and Mitigations

- None.

### Drawbacks

- Increased complexity: Introducing a new API field and operator logic adds complexity to the vSphere CSI driver ecosystem.
- Potential for configuration errors: Incorrectly configuring the volume limit can lead to unexpected behavior
  or pod scheduling failures.
- Limited granularity: The current proposal provides a global node-level limit. More fine-grained control
  (e.g., per-namespace or per-workload limits) would require further investigation and development.

## Open Questions [optional]

None.

## Test Plan

- E2E tests will be implemented to verify the correct propagation of the configured limit to the driver pods.
  These tests will be executed only on vSphere 8.

## Graduation Criteria

- GA in 4.19.
- E2E tests are implemented and passing.
- Documentation is updated.

### Dev Preview -> Tech Preview

- Ability to utilize the enhancement end to end.

### Tech Preview -> GA

- E2E test coverage demonstrating stability.
- Available by default.
- User-facing documentation created in [openshift-docs](https://github.com/openshift/openshift-docs/).

### Removing a deprecated feature

- Not applicable; no feature is being removed.

## Upgrade / Downgrade Strategy

- **Upgrades:** During an upgrade, the operator will apply the new API field value and update the driver DaemonSet with
  the new `MAX_VOLUMES_PER_NODE` value. If the field is not set, the default value (59) is used to match the current limit
  for vSphere 7, so the limit will not change for existing deployments.
- **Downgrades:** Downgrading to a version without this feature will result in the API field being ignored, and the
  operator will revert to its previous hardcoded value configured in the DaemonSet (59). If the number of attached
  volumes on a node is higher than the limit after the downgrade, the vSphere CSI driver will not be able to attach new
  volumes to that node, and users will need to manually detach the extra volumes.

## Version Skew Strategy

There are no version skew concerns for this enhancement.

## Operational Aspects of API Extensions

- The API extension does not pose any operational challenges.

## Support Procedures

* To check the status of the vSphere CSI operator, use the following command:
  `oc get deployments -n openshift-cluster-csi-drivers`. Ensure that the operator is running and healthy, and inspect its logs.
* To inspect the `ClusterCSIDriver` CR, use the following command: `oc get clustercsidriver/csi.vsphere.vmware.com -o yaml`.
  Examine the `spec.driverConfig.vSphere.maxAllowedBlockVolumesPerNode` field.

## Alternatives

- We considered adding version checks to the CSI operator to prevent users from setting this value incorrectly for
  versions that do not support higher limits. In order to do this we would need to check the vSphere/vCenter version and
  the ESXi version of every node, probably with a custom webhook validating the limit value in ClusterCSIDriver. Due to
  the complexity and low demand we do not plan to add this logic in 4.19. We might expand this enhancement in the future
  if needed.

## Infrastructure Needed [optional]

- The infrastructure needed to support this enhancement is already available for testing with vSphere version 8.