Make ESX OPs resilient to retries #1076
Comments
Hmm, should the ESX service be resilient, or should the plugin's retries be more intelligent?
I had seen and considered this behavior for the ESX service when dealing with the VMCI issue, and it seemed like a good thing to do. But it may mask issues by letting retries succeed. For example, two users could both create a volume with the same name and different properties on the same datastore, and both get success. I prefer that the VMCI code handle error scenarios; doing a Get() before Remove() is one such check.
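A minimal sketch of the Get()-before-Remove() idea on the client side, written in Python purely for illustration. Everything here (`FakeEsxService`, `VmciError`, `remove_with_retry`) is hypothetical. The fake service deletes the volume and then "loses" the reply, the way a transport error such as "Bad Magic" might. Before retrying, the client checks whether the volume still exists.

```python
class VmciError(Exception):
    """Hypothetical transient transport error (e.g. a "Bad Magic" reply)."""


class FakeEsxService:
    """Hypothetical in-memory stand-in for the ESX service."""

    def __init__(self, volumes):
        self.volumes = set(volumes)
        self.drop_next_reply = False

    def get(self, name):
        return name in self.volumes

    def remove(self, name):
        if name not in self.volumes:
            raise FileNotFoundError(name)
        self.volumes.discard(name)
        if self.drop_next_reply:
            # The remove succeeded on ESX, but the reply was lost in transit.
            self.drop_next_reply = False
            raise VmciError("Bad Magic")


def remove_with_retry(svc, name, attempts=3):
    for attempt in range(attempts):
        # Only on a retry: if the volume is already gone, an earlier
        # attempt must have removed it, so report success.
        if attempt > 0 and not svc.get(name):
            return "removed"
        try:
            svc.remove(name)
            return "removed"
        except VmciError:
            continue  # transient transport error: try again
    raise VmciError("giving up on remove of " + name)
```

Only retries consult `get()`. A "not found" on the first attempt still surfaces as an error, so a genuine mistake is not masked.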
@govint I don't have a strong opinion on this yet. Wouldn't it be nice, and make the client simpler and more predictable, if the server provided an idempotency guarantee? For example, PUT and DELETE are idempotent (with some exceptions). From a code-structure perspective, the "retry" logic is deep inside the socket/VMCI code, and adding per-operation logic there would really hurt the code's modularity. Regarding Create Volume: Docker doesn't allow this, so the server won't even see the Create request.
Agreed, any checks will have to be added above the VMCI layer. But a call like creating a volume with the same name from two Docker hosts should ensure that one of them gets the error.
This won't happen, since Docker doesn't distinguish volumes by any property other than name. Before issuing "create", Docker issues "get", and it returns success because a volume with the same name already exists.
One more failure instance: https://ci.vmware.run/vmware/docker-volume-vsphere/1803
Can the following sequence be an issue?
With #1049, the plugin retries an operation in case of VMCI errors. However, the operation on ESX may already have succeeded on the first attempt, so we need to handle "retry" requests gracefully on ESX. For example:
For Remove requests, the second attempt to removeVMDK fails with "File not found" since the VMDK was already deleted on the first attempt. We should simply return success.
Similarly, we should check CREATE (return success if the VMDK already exists) and ATTACH/DETACH (return success if the VMDK is already attached/detached).
See https://ci.vmware.run/vmware/docker-volume-vsphere/1797, where remove is retried due to a "Bad Magic" error and then fails with "file not found".
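The server-side handling proposed above could look roughly like the following sketch. It uses a hypothetical in-memory `FakeDatastore` instead of real VMDK files, and it assumes a reply convention of `None` for success and `{"Error": msg}` for failure. As an assumption beyond the issue text, `create_vmdk` treats an existing VMDK as a successful retry only when its options match. This addresses the concern raised earlier in the thread about two conflicting creates both succeeding. Refusing to remove an attached VMDK is also an illustrative assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Vmdk:
    opts: Dict[str, str]
    attached_to: Optional[str] = None  # VM name, if attached


@dataclass
class FakeDatastore:
    """Hypothetical in-memory stand-in for a datastore."""
    vmdks: Dict[str, Vmdk] = field(default_factory=dict)


def err(msg):
    # Assumed reply convention: None on success, {"Error": msg} on failure.
    return {"Error": msg}


def create_vmdk(ds, name, opts):
    existing = ds.vmdks.get(name)
    if existing is not None:
        # Same options: most likely a retry of a create that already succeeded.
        # Different options: a genuine conflict, so report it.
        if existing.opts == opts:
            return None
        return err(f"{name} already exists with different options")
    ds.vmdks[name] = Vmdk(opts=dict(opts))
    return None


def remove_vmdk(ds, name):
    vmdk = ds.vmdks.get(name)
    if vmdk is None:
        return None  # "File not found": an earlier attempt already deleted it
    if vmdk.attached_to is not None:
        return err(f"{name} is in use by {vmdk.attached_to}")
    del ds.vmdks[name]
    return None


def attach_vmdk(ds, name, vm):
    vmdk = ds.vmdks.get(name)
    if vmdk is None:
        return err(f"{name} not found")
    if vmdk.attached_to == vm:
        return None  # already attached to this VM: a retried attach
    if vmdk.attached_to is not None:
        return err(f"{name} is attached to {vmdk.attached_to}")
    vmdk.attached_to = vm
    return None


def detach_vmdk(ds, name, vm):
    vmdk = ds.vmdks.get(name)
    if vmdk is None:
        return err(f"{name} not found")
    if vmdk.attached_to is None:
        return None  # already detached: a retried detach
    if vmdk.attached_to != vm:
        return err(f"{name} is attached to {vmdk.attached_to}, not {vm}")
    vmdk.attached_to = None
    return None
```

Each handler checks the current state first and returns success when that state already matches the request. A repeated call after a lost reply is therefore harmless, while a call that conflicts with the existing state still fails.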