-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Expand FAILED_PRECONDITION error in NodeExpandVolume RPC call #431
base: master
Are you sure you want to change the base?
Expand FAILED_PRECONDITION error in NodeExpandVolume RPC call #431
Conversation
cc @bswartz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
23d227b
to
f72fbbe
Compare
The build failures on this branch are unrelated to this PR. Filed #434 for tracking. |
898aec4
to
bd02984
Compare
@@ -2576,7 +2580,7 @@ message NodeExpandVolumeResponse { | |||
|-----------------------|-----------|-----------------------|-----------------------------------| | |||
| Exceeds capabilities | 3 INVALID_ARGUMENT | Indicates that CO has specified capabilities not supported by the volume. | Caller MAY verify volume capabilities by calling ValidateVolumeCapabilities and retry with matching capabilities. | | |||
| Volume does not exist | 5 NOT FOUND | Indicates that a volume corresponding to the specified volume_id does not exist. | Caller MUST verify that the volume_id is correct and that the volume is accessible and has not been deleted before retrying with exponential back off. | | |||
| Volume in use | 9 FAILED_PRECONDITION | Indicates that the volume corresponding to the specified `volume_id` could not be expanded because it is node-published or node-staged and the underlying filesystem does not support expansion of published or staged volumes. | Caller MUST NOT retry. | | |||
| Volume in use or readonly | 9 FAILED_PRECONDITION | Indicates that the volume corresponding to the specified `volume_id` could not be expanded because it is node-published or node-staged or readonly and the storage provider does not support expansion of published or staged or readonly volumes. | Caller MUST NOT retry. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Volume in use or readonly | 9 FAILED_PRECONDITION | Indicates that the volume corresponding to the specified `volume_id` could not be expanded because it is node-published or node-staged or readonly and the storage provider does not support expansion of published or staged or readonly volumes. | Caller MUST NOT retry. | | |
| Volume in use or readonly | 9 FAILED_PRECONDITION | Indicates that the volume corresponding to the specified `volume_id` could not be expanded because it is node-published or node-staged or readonly and the storage provider does not support expansion of published or staged or readonly volumes. | Caller SHOULD ensure that volume is not node-published before retrying with exponential back off. | |
@@ -2523,12 +2523,16 @@ If plugin has `STAGE_UNSTAGE_VOLUME` node capability then: | |||
|
|||
Otherwise `NodeExpandVolume` MUST be called after successful `NodePublishVolume`. | |||
|
|||
A plugin that has `STAGE_UNSTAGE_VOLUME` node capability and supports `NodeExpandVolume` ONLY after `NodeStageVolume` and before `NodePublishVolume` but not after `NodePublishVolume` may return `FAILED_PRECONDITION` error code if `NodeExpandVolume` is called after `NodePublishVolume` - CO MUST NOT retry if volume is node-published. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to say CO MUST NOT retry if volume is node-published
? Because that means offline is not supported?
A plugin that has `STAGE_UNSTAGE_VOLUME` node capability and supports `NodeExpandVolume` ONLY after `NodeStageVolume` and before `NodePublishVolume` but not after `NodePublishVolume` may return `FAILED_PRECONDITION` error code if `NodeExpandVolume` is called after `NodePublishVolume` - CO MUST NOT retry if volume is node-published. | |
An SP that has `STAGE_UNSTAGE_VOLUME` node capability and supports `NodeExpandVolume` ONLY after `NodeStageVolume` and before `NodePublishVolume` but not after `NodePublishVolume` may return `FAILED_PRECONDITION` error code if `NodeExpandVolume` is called after `NodePublishVolume` and, in response, the CO SHOULD ensure that the volume is is node-staged but not node-published before retrying with exponential backoff. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did mean "CO MUST NOT retry if volume is node-published". The reason being - if a SP does not allow expansion after node-publish and volume already has been node-published, the only way NodeExpandVolume
can succeed after retry is - if they NodeUnpublish
the volume and the only way CO can retry expansion is on another node after node-stage but before node-publish. For CO it is already too late to retry expansion on this node because NodeUnpublish
is very likely to be a unsafe operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because that means offline is not supported?
I think it does not mean that - it means that CO must try expansion on another node after node-stage and before node-publish. It just means it can't happen on this node. Having said that - there is a slight problem with language:
If plugin has
STAGE_UNSTAGE_VOLUME
node capability then:
NodeExpandVolume
MUST be called after successfulNodeStageVolume
.NodeExpandVolume
MAY be called before or afterNodePublishVolume
.
I know Kubernetes always calls NodeExpandVolume
after NodeStageVolume
if NodeExpand
is supported but in general a different CO may choose to interpret this differently and may skip calling NodeExpand
between node-stage and node-publish and only call NodeExpand
after node-publish in which case "offline" expansion may not work on that particular CO. Even then I think recovery from volume in-use
error on node shouldn't be "retry with exponential backoff" because it is highly problematic to node-unpublish a volume or if volume is readonly on a node, make it RW.
I think we have two options:
- Make no spec change for this particular behaviour in
NodeExpandVolume
. The downside might be - another CO may interpret this spec differently and implement call toNodeExpandVolume
differently and plugins might be dependent on k8s behaviour (although I don't know any plugins which want this behaviour). - Make the language change around when
NdoeExpandVolume
can be called. so something like - "it is desirable to call NodeExpandVolume before NodePublish to permit expansion of volumes which only permit expansion after node-stage and before node-publish and not after node-publish"
bd02984
to
0d6a8ba
Compare
Expand the error from
NodeExpandVolume
to include readonly volume errors and add additional context about returningFAILED_PRECONDITION
error ifNodeExpandVolume
is not supported for node-published volumes.