Made handling of missing metafile less harsh. #627
Today, if the metadata file is missing, the whole plugin goes into an unstable state due to an unhandled exception (below). This can happen when some ops fail and the disk is created OK but the metadata file is not; I hit it a few times in testing, though I am not sure exactly which op failed.

This fix makes the behavior a little more predictable. Everything keeps working for other volumes, but any operation on a volume with damaged metadata will return a JSON parse error from the Docker volume driver. The issue has to be properly handled (Issue #626); this is a smaller fix.

Before the fix:
---------------
1. Docker commands hang
2. Backtrace on the server:

10/15/16 06:12:28 41380 [photon-1] [INFO ] executeRequest 'list' completed with ret=[{u'Attributes': {}, u'Name': 'myvol'}, {u'Attributes': {}, u'Name': 'refCountTestVol'}]
10/15/16 06:12:33 41380 [photon-1] [ERROR ] Failed to access /vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd
Traceback (most recent call last):
  File "/usr/lib/vmware/vmdkops/Python/kvESX.py", line 242, in load
    with open(meta_file, "r") as fh:
IOError: [Errno 2] No such file or directory: '/vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd'
10/15/16 06:12:33 41380 [photon-1] [ERROR ] 'NoneType' object has no attribute '__getitem__'
Traceback (most recent call last):
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 1186, in main
    handleVmciRequests(port)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 1143, in handleVmciRequests
    opts=opts)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 568, in executeRequest
    response = getVMDK(vmdk_path, vol_name, datastore)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 400, in getVMDK
    return vol_info(kv.getAll(vmdk_path), kv.get_vol_info(vmdk_path), datastore)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 350, in vol_info
    vinfo = {CREATED_BY_VM : vol_meta[kv.CREATED_BY],
TypeError: 'NoneType' object has no attribute '__getitem__'

After the fix:
1. Docker command reports error

Error response from daemon: get v3: VolumeDriver.Get: json: cannot unmarshal string into Go value of type map[string]interface {}

2. Server log reports error:

10/15/16 06:33:35 45724 [photon-1] [INFO ] executeRequest 'get' completed with ret=Failed to get disk details
10/15/16 06:33:52 45724 [photon-1] [ERROR ] Failed to access /vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd
Traceback (most recent call last):
  File "/usr/lib/vmware/vmdkops/Python/kvESX.py", line 242, in load
    with open(meta_file, "r") as fh:
IOError: [Errno 2] No such file or directory: '/vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd'
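The failure mode and the guard can be sketched roughly as follows. This is not the actual patch: `load` and `get_vol_info` here are simplified stand-ins, the JSON on-disk format and the `CREATED_BY` key name are assumptions made for illustration, and only the "Failed to get disk details" message is taken from the server log above. The idea is that a loader which returns None for a missing file must be checked before its result is indexed.

```python
import json
import logging

CREATED_BY = "created-by"  # hypothetical metadata key, for illustration only


def load(meta_file):
    """Return the parsed metadata dict, or None if the file cannot be read.

    Assumes (for this sketch) that the metadata file holds JSON.
    """
    try:
        with open(meta_file, "r") as fh:
            return json.load(fh)
    except (IOError, OSError, ValueError):
        # Log the traceback but do not let the exception escape.
        logging.exception("Failed to access %s", meta_file)
        return None


def get_vol_info(meta_file):
    """Return volume details, or an error string if metadata is unavailable."""
    vol_meta = load(meta_file)
    if not vol_meta:
        # Without this check, vol_meta[CREATED_BY] raises TypeError on None
        # and the request handler dies, which is the "before" behavior.
        return "Failed to get disk details"
    return {"CreatedBy": vol_meta.get(CREATED_BY, "unknown")}
```

With this shape, a request for the damaged volume fails on its own while requests for other volumes keep being served.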
Don't we need the same exception handling in other kv.* operations in vmdk_ops?
Yes, if there are unhandled exceptions. I opened a separate issue for that. This PR is a quick fix I did to prevent the "docker volume" command from hanging on my demo bed in case I get this corruption again.
If the disk is created OK and metadata creation fails, then the disk is removed. Exactly how did this condition occur, and what was the op that failed, leading to a disk getting created and ack'ed to the client without the KV getting created?
@govint - I have no idea how it happened, but it did happen. It is easy to fake by just removing the .vmfd file to get the same behavior.
This change looks good. However, the scenario where a VMDK exists with no KV is concerning.
@govint - manual removal is for testing the fix. On my demo bed the file disappeared on its own after running (and failing) make test-all, but as I mentioned, I do not know exactly what happened.
* master: (25 commits) Update new ESX IP added forgotten .so file Install sqlite3 py libs on ESX and load for Python2 Added py code and binaries for sqlite3 python libs Update drone security Removed accidental .pyc files Handle byte to string conversions for status command. Auth configuration and operation admission check (Auth.liping) (vmware-archive#603) Revert "Cli auth.liping" Cli auth.liping (vmware-archive#640) Handle missing or invalid fs type on mount. (vmware-archive#639) Updated Admin CLI commands to support tenants. (vmware-archive#620) Workaround older versions of e2fsprogs (vmware-archive#631) Add auth proposal Made handing of missing metafile less harsh. (vmware-archive#627) Fixed ACLs in payload bin dir (vmware-archive#624) Fixed error handling for set command. (vmware-archive#610) Use new error variables when rolling back volume creation to avoid nil reassignment. (vmware-archive#617) Change wording Fix broken link ...
Today, if the metadata file is missing (which could happen when some ops fail and the disk is created OK,
but the metadata file is not created - I had it a few times in test, though I am not sure which op exactly failed),
the whole plugin goes into an unstable state due to an unhandled exception (below).
This fix makes it a little more predictable. Everything keeps working for other volumes, but any operation on a volume with damaged metadata
will return a JSON parse error from the Docker volume driver.
The issue has to be properly handled (Issue #626); this is a smaller fix.
Before the fix:
10/15/16 06:12:28 41380 [photon-1] [INFO ] executeRequest 'list' completed with ret=[{u'Attributes': {}, u'Name': 'myvol'}, {u'Attributes': {}, u'Name': 'refCountTestVol'}]
10/15/16 06:12:33 41380 [photon-1] [ERROR ] Failed to access /vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd
Traceback (most recent call last):
File "/usr/lib/vmware/vmdkops/Python/kvESX.py", line 242, in load
with open(meta_file, "r") as fh:
IOError: [Errno 2] No such file or directory: '/vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd'
10/15/16 06:12:33 41380 [photon-1] [ERROR ] 'NoneType' object has no attribute '__getitem__'
Traceback (most recent call last):
File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 1186, in main
handleVmciRequests(port)
File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 1143, in handleVmciRequests
opts=opts)
File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 568, in executeRequest
response = getVMDK(vmdk_path, vol_name, datastore)
File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 400, in getVMDK
return vol_info(kv.getAll(vmdk_path), kv.get_vol_info(vmdk_path), datastore)
File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 350, in vol_info
vinfo = {CREATED_BY_VM : vol_meta[kv.CREATED_BY],
TypeError: 'NoneType' object has no attribute '__getitem__'
After the fix:
Error response from daemon: get v3: VolumeDriver.Get: json: cannot unmarshal string into Go value of type map[string]interface {}
_Note_: I cannot change the code to return an explicit err(): due to #628, Docker reports "no such volume" in that case, which is even more confusing.
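The "cannot unmarshal string" message is what Go's encoding/json reports when a JSON string arrives where a JSON object is expected. The sketch below mimics that in Python as an analogy only: `encode_reply` and `decode_as_map` are hypothetical helpers, and how the service and plugin actually serialize replies is an assumption, not something taken from this PR.

```python
import json


def encode_reply(ret):
    """Serialize a reply as a service might before sending it (assumed)."""
    return json.dumps(ret)


def decode_as_map(payload):
    """Decode a reply, insisting on a JSON object as a typed client would."""
    value = json.loads(payload)
    if not isinstance(value, dict):
        # Rough analogue of Go refusing to unmarshal a non-object
        # into map[string]interface{}.
        raise TypeError(
            "cannot unmarshal %s into a map" % type(value).__name__
        )
    return value
```

A dict reply such as `{"Name": "myvol"}` round-trips cleanly, while the plain string "Failed to get disk details" encodes to a JSON string and is rejected by `decode_as_map`, which is the shape of error Docker surfaces after the fix.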