Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

ControllerUnpublishVolume should require "OK" when node/volume unpublished and get rid of "recoverable" NOT FOUND errors #382

Open
davidz627 opened this issue Aug 15, 2019 · 3 comments
Milestone

Comments

@davidz627
Copy link
Contributor

This would be a breaking change and would require a major version bump of the spec but putting here as a backlog item.

NOT_FOUND errors in ControllerUnpublishVolume are unrecoverable based on the current recovery behavior defined in the spec. The recovery behavior asks the caller to "verify ... the volume is accessible" which can only be verified through calling the driver. Currently the only call to find that is ListVolumes which is an optional capability. Therefore, a MUST to verify the volume exist after this error is actually logically impossible.

The driver is actually the one that can/should decide to return an OK code when the node/volume is unavailable and it can interpret it as the volume has been controllerunpublished. There should be no situation where an error occurs where the driver will return NOT_FOUND that could be recoverable by the caller - this is unless the caller got the volume_id wrong which I would argue should also be returned as an OK just like the specification states for DeleteVolume.

The solution to the above problems is to amend the wording:

If the volume corresponding to the volume_id or the node corresponding to node_id cannot be found by the Plugin and the volume can be safely regarded as ControllerUnpublished from the node, the plugin **MUST** return 0 OK.

And remove the must/should recovery behavior from the two NOT_FOUND errors in ControllerUnpublishVolume and just cover it with a blanket retry.

The implementation of the external-attacher for Kubernetes already does not do the recovery behaviors, and is in the process of releasing a new major version which treats NOT_FOUND errors as real errors that must be retried.

@davidz627
Copy link
Contributor Author

/cc @saad-ali
I feel like I didn't really explain this very eloquently, feel free to reword and clarify

@davidz627
Copy link
Contributor Author

#375 (comment)

@jdef jdef added this to the v2.0 milestone Aug 15, 2019
@saad-ali
Copy link
Member

tldr; for 2.0 we have to make sure that we require SP to return OK for ControllerPublish if they volume or node does not exist and can be assumed to be not published. And indicate that a NOT_FOUND will be treated like any other error (caller, CO, will just retry for ever). Or introduce a set of calls that would allow caller to figure out things.

shuo-wu pushed a commit to shuo-wu/longhorn-manager that referenced this issue Nov 1, 2019
…tation

Now the provisioner won't wait for Detach
(ControllerUnpublishVolume()) complete before invoking DeleteVolume.
Then Detach will get stuck in the NOT_FOUND error since there is no
way to verify the volume before detaching.

For the details, See
container-storage-interface/spec#382

And the spec doc has removed NOT_FOUND error for
ControllerUnpublishVolume():
https://github.com/container-storage-interface/spec/blob/release-1.2/spec.md#controllerunpublishvolume

Signed-off-by: Shuo Wu <shuo@rancher.com>
shuo-wu pushed a commit to shuo-wu/longhorn-manager that referenced this issue Nov 1, 2019
…tation

Now the provisioner won't wait for Detach
(ControllerUnpublishVolume()) complete before invoking DeleteVolume.
Then Detach will get stuck in the NOT_FOUND error since there is no
way to verify the volume before detaching.

For the details, See
container-storage-interface/spec#382

And the spec doc has removed NOT_FOUND error for
ControllerUnpublishVolume():
https://github.com/container-storage-interface/spec/blob/release-1.2/spec.md#controllerunpublishvolume

Signed-off-by: Shuo Wu <shuo@rancher.com>
shuo-wu pushed a commit to shuo-wu/longhorn-manager that referenced this issue Nov 1, 2019
…tation

Now the provisioner won't wait for Detach
(ControllerUnpublishVolume()) complete before invoking DeleteVolume.
Then Detach will get stuck in the NOT_FOUND error since there is no
way to verify the volume before detaching.

For the details, See
container-storage-interface/spec#382

And the spec doc has removed NOT_FOUND error for
ControllerUnpublishVolume():
https://github.com/container-storage-interface/spec/blob/release-1.2/spec.md#controllerunpublishvolume

Signed-off-by: Shuo Wu <shuo@rancher.com>
shuo-wu pushed a commit to shuo-wu/longhorn-manager that referenced this issue Nov 1, 2019
…tation

Now the provisioner won't wait for Detach
(ControllerUnpublishVolume()) complete before invoking DeleteVolume.
Then Detach will get stuck in the NOT_FOUND error since there is no
way to verify the volume before detaching.

For the details, See
container-storage-interface/spec#382

And the spec doc has removed NOT_FOUND error for
ControllerUnpublishVolume():
https://github.com/container-storage-interface/spec/blob/release-1.2/spec.md#controllerunpublishvolume

Signed-off-by: Shuo Wu <shuo@rancher.com>
shuo-wu pushed a commit to shuo-wu/longhorn-manager that referenced this issue Nov 1, 2019
…tation

Now the provisioner won't wait for Detach
(ControllerUnpublishVolume()) complete before invoking DeleteVolume.
Then Detach will get stuck in the NOT_FOUND error since there is no
way to verify the volume before detaching.

For the details, See
container-storage-interface/spec#382

And the spec doc has removed NOT_FOUND error for
ControllerUnpublishVolume():
https://github.com/container-storage-interface/spec/blob/release-1.2/spec.md#controllerunpublishvolume

Signed-off-by: Shuo Wu <shuo@rancher.com>
shuo-wu pushed a commit to shuo-wu/longhorn-manager that referenced this issue Nov 4, 2019
…tation

Now the provisioner won't wait for Detach
(ControllerUnpublishVolume()) complete before invoking DeleteVolume.
Then Detach will get stuck in the NOT_FOUND error since there is no
way to verify the volume before detaching.

For the details, See
container-storage-interface/spec#382

And the spec doc has removed NOT_FOUND error for
ControllerUnpublishVolume():
https://github.com/container-storage-interface/spec/blob/release-1.2/spec.md#controllerunpublishvolume

Signed-off-by: Shuo Wu <shuo@rancher.com>
shuo-wu pushed a commit to shuo-wu/longhorn-manager that referenced this issue Nov 4, 2019
…tation

Now the provisioner won't wait for Detach
(ControllerUnpublishVolume()) complete before invoking DeleteVolume.
Then Detach will get stuck in the NOT_FOUND error since there is no
way to verify the volume before detaching.

For the details, See
container-storage-interface/spec#382

And the spec doc has removed NOT_FOUND error for
ControllerUnpublishVolume():
https://github.com/container-storage-interface/spec/blob/release-1.2/spec.md#controllerunpublishvolume

Signed-off-by: Shuo Wu <shuo@rancher.com>
shuo-wu pushed a commit to shuo-wu/longhorn-manager that referenced this issue Nov 4, 2019
…tation

Now the provisioner won't wait for Detach
(ControllerUnpublishVolume()) complete before invoking DeleteVolume.
Then Detach will get stuck in the NOT_FOUND error since there is no
way to verify the volume before detaching.

For the details, See
container-storage-interface/spec#382

And the spec doc has removed NOT_FOUND error for
ControllerUnpublishVolume():
https://github.com/container-storage-interface/spec/blob/release-1.2/spec.md#controllerunpublishvolume

Signed-off-by: Shuo Wu <shuo@rancher.com>
shuo-wu pushed a commit to shuo-wu/longhorn-manager that referenced this issue Nov 5, 2019
…tation

Now the provisioner won't wait for Detach
(ControllerUnpublishVolume()) complete before invoking DeleteVolume.
Then Detach will get stuck in the NOT_FOUND error since there is no
way to verify the volume before detaching.

For the details, See
container-storage-interface/spec#382

And the spec doc has removed NOT_FOUND error for
ControllerUnpublishVolume():
https://github.com/container-storage-interface/spec/blob/release-1.2/spec.md#controllerunpublishvolume

Signed-off-by: Shuo Wu <shuo@rancher.com>
shuo-wu pushed a commit to shuo-wu/longhorn-manager that referenced this issue Nov 5, 2019
…tation

Now the provisioner won't wait for Detach
(ControllerUnpublishVolume()) complete before invoking DeleteVolume.
Then Detach will get stuck in the NOT_FOUND error since there is no
way to verify the volume before detaching.

For the details, See
container-storage-interface/spec#382

And the spec doc has removed NOT_FOUND error for
ControllerUnpublishVolume():
https://github.com/container-storage-interface/spec/blob/release-1.2/spec.md#controllerunpublishvolume

Signed-off-by: Shuo Wu <shuo@rancher.com>
shuo-wu pushed a commit to shuo-wu/longhorn-manager that referenced this issue Nov 5, 2019
…tation

Now the provisioner won't wait for Detach
(ControllerUnpublishVolume()) complete before invoking DeleteVolume.
Then Detach will get stuck in the NOT_FOUND error since there is no
way to verify the volume before detaching.

For the details, See
container-storage-interface/spec#382

And the spec doc has removed NOT_FOUND error for
ControllerUnpublishVolume():
https://github.com/container-storage-interface/spec/blob/release-1.2/spec.md#controllerunpublishvolume

Longhorn longhorn#347

Signed-off-by: Shuo Wu <shuo@rancher.com>
shuo-wu pushed a commit to shuo-wu/longhorn-manager that referenced this issue Nov 8, 2019
…tation

Now the provisioner won't wait for Detach
(ControllerUnpublishVolume()) complete before invoking DeleteVolume.
Then Detach will get stuck in the NOT_FOUND error since there is no
way to verify the volume before detaching.

For the details, See
container-storage-interface/spec#382

And the spec doc has removed NOT_FOUND error for
ControllerUnpublishVolume():
https://github.com/container-storage-interface/spec/blob/release-1.2/spec.md#controllerunpublishvolume

Longhorn longhorn#347

Signed-off-by: Shuo Wu <shuo@rancher.com>
yasker pushed a commit to longhorn/longhorn-manager that referenced this issue Nov 8, 2019
…tation

Now the provisioner won't wait for Detach
(ControllerUnpublishVolume()) complete before invoking DeleteVolume.
Then Detach will get stuck in the NOT_FOUND error since there is no
way to verify the volume before detaching.

For the details, See
container-storage-interface/spec#382

And the spec doc has removed NOT_FOUND error for
ControllerUnpublishVolume():
https://github.com/container-storage-interface/spec/blob/release-1.2/spec.md#controllerunpublishvolume

Longhorn #347

Signed-off-by: Shuo Wu <shuo@rancher.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants