This repository has been archived by the owner on Nov 9, 2020. It is now read-only.

Make ESX OPs resilient to retries #1076

Closed
pdhamdhere opened this issue Mar 24, 2017 · 7 comments

Comments

@pdhamdhere
Contributor

With #1049, the plugin retries an operation in case of VMCI errors. However, the operation on ESX may have succeeded on the first attempt. We need to gracefully handle "retry" requests on ESX. For example:

For Remove requests, the second attempt to removeVMDK fails with "File not found" since the file was already deleted on the first attempt. We should simply return success.

Similarly, we should check the other operations. CREATE: return success if the VMDK already exists. ATTACH/DETACH: return success if the VMDK is already attached/detached.

See https://ci.vmware.run/vmware/docker-volume-vsphere/1797 where remove is retried due to a "Bad Magic" error and fails with "file not found".
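
A minimal sketch of what such retry-tolerant handlers might look like, using an in-memory dict in place of the datastore. Every name here (`VolumeStore`, `create`, `remove`, `attach`, `detach`) is hypothetical and not taken from the ESX service code; a return value of `None` is assumed to mean success.

```python
class VolumeStore:
    """Toy stand-in for ESX-side volume state (hypothetical, in-memory)."""

    def __init__(self):
        # volume name -> name of the VM it is attached to, or None if detached
        self.attached_to = {}

    def create(self, name):
        # Retried CREATE: the volume already exists, so report success.
        self.attached_to.setdefault(name, None)
        return None

    def remove(self, name):
        # Retried REMOVE: the volume is already gone, so report success.
        self.attached_to.pop(name, None)
        return None

    def attach(self, name, vm):
        if name not in self.attached_to:
            return "volume not found"
        current = self.attached_to[name]
        if current is not None and current != vm:
            return "volume is attached to another VM"
        # Either detached, or already attached to this VM (a retry).
        self.attached_to[name] = vm
        return None

    def detach(self, name, vm):
        if name not in self.attached_to:
            return "volume not found"
        current = self.attached_to[name]
        if current is None:
            # Retried DETACH: already detached, so report success.
            return None
        if current != vm:
            return "volume is attached to another VM"
        self.attached_to[name] = None
        return None
```

With handlers shaped like this, sending the same request twice leaves the state unchanged and returns the same result, which is what a blind retry needs.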

@pdhamdhere
Contributor Author

Hmm. Should the ESX service be resilient, or should plugin retries be more intelligent?
For example, before retrying CREATE/REMOVE, the plugin can call GET and see whether the volume has already been created/deleted.
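
A rough sketch of the "check before retrying" idea for REMOVE, written in Python purely for illustration. `TransportError`, `remove_with_check`, and the `get`/`remove` client methods are assumptions made for this sketch, not the plugin's actual API; `FlakyClient` fakes a server that deletes the volume but loses the first reply.

```python
class TransportError(Exception):
    """Hypothetical stand-in for a VMCI-level failure such as "Bad Magic"."""


def remove_with_check(client, name, attempts=3):
    """Retry REMOVE, calling GET before each retry.

    `client` is assumed to offer get(name) -> bool and remove(name).
    Returns True once the volume is known to be gone.
    """
    for attempt in range(attempts):
        if attempt > 0 and not client.get(name):
            # An earlier attempt succeeded on ESX; only the reply was lost.
            return True
        try:
            client.remove(name)
            return True
        except TransportError:
            continue
    # Out of attempts: the last REMOVE may still have gone through.
    return not client.get(name)


class FlakyClient:
    """Fake server: the first REMOVE deletes the volume but the reply is lost."""

    def __init__(self):
        self.volumes = {"vol1"}
        self.remove_calls = 0

    def get(self, name):
        return name in self.volumes

    def remove(self, name):
        self.remove_calls += 1
        if name not in self.volumes:
            # What a blind retry would hit on the second attempt.
            raise FileNotFoundError(name)
        self.volumes.discard(name)
        if self.remove_calls == 1:
            raise TransportError("Bad Magic")


client = FlakyClient()
ok = remove_with_check(client, "vol1")
```

Here the retry never reaches the second `remove` call, so the "file not found" failure is avoided without changing the server.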

@govint
Contributor

govint commented Mar 24, 2017

I'd seen and considered this behavior for the ESX service when dealing with the VMCI issue, and it seemed like a good thing to do. But it may mask issues by allowing retries to succeed. For example, two users could both create a volume with the same name and different properties on the same DS, and both would get success. I'd prefer that the VMCI code handle error scenarios; doing a Get() before Remove() is one such case.

@pdhamdhere
Contributor Author

@govint I don't have a strong opinion on this yet. Wouldn't it be nice, and make the client simpler and more predictable, if the server provided an idempotency guarantee? For example, PUT and DELETE (with exception) are idempotent.

From a code structure perspective, the "retry" logic is deep inside the socket/vmci code, and adding OPs logic there would really mess up the modularity of the code.

Re the Create Volume example: Docker doesn't allow this, and the server won't even see the Create request.

@govint
Contributor

govint commented Mar 24, 2017

Agreed, any checks will have to be added above the VMCI layer. But a call like creating a volume with the same name from two Docker hosts should ensure that one gets the error.

@pdhamdhere
Contributor Author

creating volume with same name done from two docker hosts should ensure one gets the error.

This won't happen, since Docker doesn't distinguish volumes based on properties other than name. Before "create", Docker issues "get" and will return success since a volume with the same name already exists.

@shuklanirdesh82
Contributor

One more failure instance: https://ci.vmware.run/vmware/docker-volume-vsphere/1803

@govint
Contributor

govint commented Mar 24, 2017

Could the sequence below be an issue?

  1. Docker on two Docker hosts creates a volume of the same name (but, say, with different options), and the volume doesn't exist yet.
  2. Both hosts issue a Get() to the service and get back an error that the volume isn't there.
  3. Both hosts now issue the create. Only one actually creates the VMDK, but both hosts get back an OK status and both believe they have created the volume per their options.
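
The sequence above can be replayed with a small simulation. This is only a sketch under the assumption that CREATE returns success whenever a volume with that name already exists; the `Service` class, the `"OK"` status string, and the `size` option are made up for illustration.

```python
class Service:
    """Toy service whose CREATE succeeds if the volume already exists."""

    def __init__(self):
        self.volumes = {}  # volume name -> options used at creation time

    def get(self, name):
        # Returns the stored options, or None when the volume is missing.
        return self.volumes.get(name)

    def create(self, name, options):
        # Lenient CREATE: keep the first writer's options, always report OK.
        self.volumes.setdefault(name, options)
        return "OK"


svc = Service()

# Step 2: both hosts ask for the volume and learn it does not exist.
seen_by_a = svc.get("vol1")
seen_by_b = svc.get("vol1")

# Step 3: both hosts issue CREATE with different options.
status_a = svc.create("vol1", {"size": "1gb"})
status_b = svc.create("vol1", {"size": "10gb"})

# Host B was told OK, but the stored options are host A's.
stored = svc.get("vol1")
```

After this run both statuses are `"OK"` while `stored` holds only the first host's options, which is the mismatch the sequence describes.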
