Volume create fails with meta-data in use error #626
Today, if the metadata file is missing (which could happen when some ops fail after the disk is created OK but before the metadata file is created - I hit it a few times in testing, though I am not sure which op exactly failed), the whole plugin goes into an unstable state due to an unhandled exception (below). This fix makes it a little more predictable: everything keeps working for other volumes, but any operation on a volume with damaged metadata will return a JSON parse error from the Docker volume driver. The issue still has to be properly handled (Issue #626); this is a smaller fix.

Before the fix:
---------------
1. Docker commands hang
2. Backtrace on the server:

```
10/15/16 06:12:28 41380 [photon-1] [INFO ] executeRequest 'list' completed with ret=[{u'Attributes': {}, u'Name': 'myvol'}, {u'Attributes': {}, u'Name': 'refCountTestVol'}]
10/15/16 06:12:33 41380 [photon-1] [ERROR ] Failed to access /vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd
Traceback (most recent call last):
  File "/usr/lib/vmware/vmdkops/Python/kvESX.py", line 242, in load
    with open(meta_file, "r") as fh:
IOError: [Errno 2] No such file or directory: '/vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd'
10/15/16 06:12:33 41380 [photon-1] [ERROR ] 'NoneType' object has no attribute '__getitem__'
Traceback (most recent call last):
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 1186, in main
    handleVmciRequests(port)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 1143, in handleVmciRequests
    opts=opts)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 568, in executeRequest
    response = getVMDK(vmdk_path, vol_name, datastore)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 400, in getVMDK
    return vol_info(kv.getAll(vmdk_path), kv.get_vol_info(vmdk_path), datastore)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 350, in vol_info
    vinfo = {CREATED_BY_VM : vol_meta[kv.CREATED_BY],
TypeError: 'NoneType' object has no attribute '__getitem__'
```

After the fix:
1. Docker command reports an error:

```
Error response from daemon: get v3: VolumeDriver.Get: json: cannot unmarshal string into Go value of type map[string]interface {}
```

2. Server log reports an error:

```
10/15/16 06:33:35 45724 [photon-1] [INFO ] executeRequest 'get' completed with ret=Failed to get disk details
10/15/16 06:33:52 45724 [photon-1] [ERROR ] Failed to access /vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd
Traceback (most recent call last):
  File "/usr/lib/vmware/vmdkops/Python/kvESX.py", line 242, in load
    with open(meta_file, "r") as fh:
IOError: [Errno 2] No such file or directory: '/vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd'
```
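The `TypeError` in the first traceback comes from `vol_info()` indexing into a `vol_meta` that is `None` when the metadata file could not be loaded. A defensive guard of roughly this shape (a hypothetical sketch, not the actual patch; the function name and the error string are taken from the logs above, while the key names and return convention are assumptions) would keep one damaged volume from destabilizing the whole request loop:

```python
# Hypothetical sketch of guarding vol_info() against missing metadata.
# CREATED_BY_VM and the dict layout are illustrative, not the plugin's
# actual definitions.
CREATED_BY_VM = "created-by-VM"

def vol_info(vol_meta, disk_info, datastore):
    """Return volume details, or an error string when metadata is absent."""
    if vol_meta is None:
        # Metadata file was missing or unreadable -- report a per-volume
        # error instead of raising TypeError out of the request handler.
        return "Failed to get disk details"
    return {CREATED_BY_VM: vol_meta.get(CREATED_BY_VM, "unknown"),
            "datastore": datastore,
            "disk-info": disk_info}
```

With this guard, a volume whose `.vmfd` file is gone yields a plain error response for that one request, matching the "After the fix" log line above, while other volumes continue to work.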
@brunotm Can you say what version of ESX you were using for the test that failed with "Failed to save meta-data for /vmfs/volumes/vsanDatastore/dockvols/volP10.vmdk", while in parallel volp00 was created fine? @brunotm - were any parallel commands being run when the create test was happening? If by chance the disk was opened in parallel, the KV file may not be usable. |
@govint Also, I don't recall seeing this error on non-vsan datastores.
|
Renamed the issue, as we have a couple of cases which are not reproducible. This issue is for what Bruno reported, where a volume create failed with the KV file being in use during creation. @brunotm, can you say what test was run when this error was seen, and can you post the steps used? |
Occasionally observed this issue with the code from #612 when running make test-vm in a vsan (only?) environment. ESX 6u2 with a vsan datastore |
This is a ***workaround*** for the timing/locking issue with metadata files (issue vmware-archive#626).
Been trying this with VMFS and without the threaded code. I'll need to check this with the threading code included. Seems like the KV file is being accessed ahead of it being written. Will wait for the threading code to be submitted in its final form. |
I think this is unlikely to be the file-busy issue in kv.save() when creating a volume. It looks more like a timing issue where ESX has not yet released the file lock between kv.create() and kv.save(). If you can, it would be nice to know the descriptions of the error codes returned by the disklib, mainly 4008 and 27 (this one is from memory). Thanks! |
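If the lock really is released shortly after kv.create() returns, a short retry with backoff around the save would confirm the theory. This is a generic sketch under that assumption, not the plugin's code; `save_fn` is a hypothetical stand-in for kv.save():

```python
import time

def save_with_retry(save_fn, path, retries=5, delay=0.2):
    """Retry a metadata save that can fail while ESX still holds the
    file lock. save_fn is a hypothetical stand-in for kv.save()."""
    for attempt in range(retries):
        try:
            return save_fn(path)
        except IOError:
            # Lock may not be released yet -- back off and try again.
            time.sleep(delay * (attempt + 1))
    # Final attempt; let the exception propagate if the file is still busy.
    return save_fn(path)
```

If the error stops reproducing with a retry in place, that would point at lock-release timing rather than a genuinely concurrent opener.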
Made handling of the missing metadata file less harsh.
If the metadata file is missing (which could happen when some ops fail after the disk is created OK,
but before the metadata file is created - I hit it a few times in testing, though I am not sure which op exactly failed), the whole plugin goes into an unstable state due to an unhandled exception (below).
Backtrace (from my private branch, so line numbers could be a bit off)
@pdhamdhere - please adjust priority as needed
Assigning to @govint per git blame for
`return vol_info(kv.getAll(vmdk_path), kv.get_vol_info(vmdk_path), datastore)`