Device attach fails on volume creation due to duplicate bios.uuid #1373
Timing in the log seems suspicious: the plugin issued the attach (19:07:25), timed out after 10 seconds, and issued a detach after 12 seconds (19:07:44).
Tried this out multiple times, and the attach-wait times are anywhere between 6s and 10s. From debugging, the plugin starts its wait after the Docker host kernel has completed attaching the device. There are no obvious delays in the kernel logs.
Docker host kernel logs:
Plugin logs:
As seen above, it takes around 6s for the plugin to detect the device. The issue reproduces at times, but none of the available logs suggest a problem on the ESX side.
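The comment measures how long the plugin waits, after the kernel has finished the attach, before it sees the device, against a 10-second limit. The real plugin does this with a Go watcher; the sketch below swaps in simple Python polling purely to illustrate that measurement. The function name `wait_for_device`, the device path, and the 0.1s poll interval are hypothetical assumptions; only the 10s default comes from the thread.

```python
import os
import time
from typing import Optional


def wait_for_device(path: str, timeout: float = 10.0,
                    interval: float = 0.1) -> Optional[float]:
    """Poll until `path` exists.

    Returns the number of seconds waited, or None if `timeout` elapses first.
    Polling is an illustrative substitute for the plugin's event-driven watcher.
    """
    start = time.monotonic()
    deadline = start + timeout
    while True:
        if os.path.exists(path):
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            return None  # caller would treat this as an attach timeout
        time.sleep(interval)


# Hypothetical device path; a real attach would name the new disk's device node.
waited = wait_for_device("/dev/disk/by-path/hypothetical-new-disk", timeout=0.5)
print("timed out" if waited is None else f"device seen after {waited:.2f}s")
```

Logging the returned duration on every attach would show whether waits cluster close to the timeout, which is the pattern being investigated here.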
Summarizing, this doesn't seem to be a host (ESX) or VM issue. None of the logs point to any latencies in the host or the VM, other than perhaps the Go watcher itself, which could be what's causing this behavior.
@pdhamdhere, in local tests I was able to repro this issue some of the time, but after that it didn't recur. I don't believe it's a plugin or ESX service issue, as there are no delays there that point to an obvious problem. I'll have an NFS datastore added so we can see whether this reproduces in CI; otherwise I'll close this issue. @shuklanirdesh82, can you get an NFS DS added to the CI so we can find out whether this can happen there?
Local tests don't seem to repro this issue, at least not for the reasons reported. The issue is being addressed over email; closing this issue.
Let's keep the issue open until we find the root cause in this unique setup.
@govint writes:
The issue still exists, and we are trying to get additional info from the setup where it was reported, even though it's unique so far. Until we know the root cause, it represents a significant risk, as it's a blocker for whoever runs into it.
More info. The customer writes: Here is the information from the ESX side of things:
and later writes the following (hostnames and VM names redacted):
So the theory is that we are running into VM UUID uniqueness issues. We can do the following:
I'll take care of this.
@msterin, are we still targeting this for the 0.16 release? We have just 2 more days to go!
When communication over VMCI vSocket is established in vmdk_ops.py, we find the VM ID from the socket and then locate the VM ManagedObject by this ID. There are 2 IDs: the VC UUID and the BIOS UUID. When a .vmx file is copied (e.g. by VCD or other products), the BIOS UUID can be duplicated, causing failures like #1373. However, the VC UUID is unique on creation of the VM and stays unique as long as the ESXi host is part of a VC, so it is a much better candidate for a unique ID. This change tries the VC UUID first and falls back to the BIOS UUID. The change also checks that the VM name (for a VM found by ID) matches the one we see from vSocket, and adds a few other minor prints.
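A minimal sketch of the lookup order described above, assuming an in-memory list of hypothetical `VmRecord` entries in place of real vSphere managed objects (a real service would query the vSphere API instead, e.g. `SearchIndex.FindByUuid`, which takes an `instanceUuid` flag to choose between the two IDs). `VmRecord`, `_first_match`, and `find_vm` are illustrative names, not the actual vmdk_ops.py code.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VmRecord:
    """Hypothetical stand-in for a VM managed object."""
    name: str
    vc_uuid: Optional[str]  # VC instance UUID; unique while the host is in a VC
    bios_uuid: str          # BIOS UUID from the .vmx; duplicated when .vmx is copied


def _first_match(vms: List[VmRecord], attr: str, value: str) -> Optional[VmRecord]:
    # Return the first VM whose `attr` equals `value`, like a lookup by ID.
    return next((vm for vm in vms if getattr(vm, attr) == value), None)


def find_vm(vms: List[VmRecord], vm_name: str,
            vc_uuid: Optional[str], bios_uuid: str) -> Optional[VmRecord]:
    """Find a VM by VC UUID first, falling back to BIOS UUID."""
    vm = _first_match(vms, "vc_uuid", vc_uuid) if vc_uuid else None
    if vm is None:
        # Fallback: a BIOS UUID may match a copied VM rather than the caller.
        vm = _first_match(vms, "bios_uuid", bios_uuid)
    if vm is not None and vm.name != vm_name:
        # Name reported over vSocket disagrees with the VM found by ID.
        logging.warning("VM name mismatch: found %s, expected %s", vm.name, vm_name)
        return None
    return vm


inventory = [
    VmRecord("vm-a", "vc-a", "bios-1"),
    VmRecord("vm-b", "vc-b", "bios-1"),  # copy of vm-a's .vmx, same BIOS UUID
]
print(find_vm(inventory, "vm-b", "vc-b", "bios-1"))  # found via VC UUID
print(find_vm(inventory, "vm-b", None, "bios-1"))    # BIOS lookup hits vm-a, so None
```

With two copied VMs sharing BIOS UUID `bios-1`, passing the VC UUID finds the right VM, while a BIOS-only lookup that lands on the wrong copy is rejected by the name check rather than attaching the disk to the wrong VM.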
* Added proper return on protocol version mismatch
* Using VC UUID for attach/detach where possible
* Try to use VC UUID first for locating VMs, then fall back to BIOS UUID
* Unbundled detachVMDK and attachVMDK to address review comments
* Fixed the VM instance ID
Copy-n-paste from email: