This repository has been archived by the owner on Nov 9, 2020. It is now read-only.

Made handing of missing metafile less harsh. #627

Merged: 1 commit, Oct 19, 2016

Contributor

msterin commented Oct 17, 2016

Today, if the metadata file is missing (which can happen when an op fails after the disk is created OK
but before the metadata file is created; I hit this a few times in testing, though I am not sure exactly which op failed),
the whole plugin goes into an unstable state due to an unhandled exception (below).

This fix makes the behavior a little more predictable. Everything keeps working for other volumes, but any operation on a volume with damaged metadata
returns a JSON parse error from the Docker volume driver.

The issue still has to be handled properly (Issue #626); this is a smaller fix.

Before the fix:

  1. Docker commands hang
  2. Backtrace on the server:

10/15/16 06:12:28 41380 [photon-1] [INFO   ] executeRequest 'list' completed with ret=[{u'Attributes': {}, u'Name': 'myvol'}, {u'Attributes': {}, u'Name': 'refCountTestVol'}]
10/15/16 06:12:33 41380 [photon-1] [ERROR  ] Failed to access /vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd
Traceback (most recent call last):
  File "/usr/lib/vmware/vmdkops/Python/kvESX.py", line 242, in load
    with open(meta_file, "r") as fh:
IOError: [Errno 2] No such file or directory: '/vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd'
10/15/16 06:12:33 41380 [photon-1] [ERROR  ] 'NoneType' object has no attribute '__getitem__'
Traceback (most recent call last):
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 1186, in main
    handleVmciRequests(port)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 1143, in handleVmciRequests
    opts=opts)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 568, in executeRequest
    response = getVMDK(vmdk_path, vol_name, datastore)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 400, in getVMDK
    return vol_info(kv.getAll(vmdk_path), kv.get_vol_info(vmdk_path), datastore)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 350, in vol_info
    vinfo = {CREATED_BY_VM : vol_meta[kv.CREATED_BY],
TypeError: 'NoneType' object has no attribute '__getitem__'

After the fix:

  1. Docker command reports error

Error response from daemon: get v3: VolumeDriver.Get: json: cannot unmarshal string into Go value of type map[string]interface {}

  2. Server log reports error:

10/15/16 06:33:35 45724 [photon-1] [INFO   ] executeRequest 'get' completed with ret=Failed to get disk details
10/15/16 06:33:52 45724 [photon-1] [ERROR  ] Failed to access /vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd
Traceback (most recent call last):
  File "/usr/lib/vmware/vmdkops/Python/kvESX.py", line 242, in load
    with open(meta_file, "r") as fh:
IOError: [Errno 2] No such file or directory: '/vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd'

_Note_: I cannot change the code to return an explicit err(): due to #628, Docker reports "no such volume" in that case, which is even more confusing.
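
The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual kvESX.py / vmdk_ops.py code: the loader returns None when the metadata file cannot be opened, and the caller checks for None instead of indexing it. The function name `get_vmdk_info`, the value of `CREATED_BY`, the output key, and the assumption that the metadata file holds JSON are all hypothetical; only the two message strings are taken from the logs above.

```python
import json
import logging

# Hypothetical key name, standing in for kv.CREATED_BY in the traceback.
CREATED_BY = "created-by"


def load(meta_file):
    """Return the parsed metadata dict, or None if the file cannot be opened.

    Assumes (for illustration only) that the metadata file contains JSON.
    """
    try:
        with open(meta_file, "r") as fh:
            return json.load(fh)
    except IOError:
        # Log the traceback but let the caller decide how to report it.
        logging.exception("Failed to access %s", meta_file)
        return None


def get_vmdk_info(meta_file):
    """Return volume details, or a plain error string if metadata is missing."""
    vol_meta = load(meta_file)
    if vol_meta is None:
        # Returning a string instead of a dict is what leads the Docker side
        # to report a JSON unmarshal error rather than hanging.
        return "Failed to get disk details"
    return {"Created by VM": vol_meta.get(CREATED_BY)}
```

With a guard like this, a missing file affects only the request for that one volume; the service loop keeps running for the others.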

Contributor

brunotm commented Oct 17, 2016

Don't we need the same exception handling in other kv.* operations in vmdk_ops?

Contributor Author

msterin commented Oct 17, 2016

Yes, if there are unhandled exceptions. I opened a separate issue for that. This PR is a quick fix to keep the "docker volume" command from hanging on my demo bed in case I hit this corruption again.
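
One way the broader handling discussed here could look, as a sketch only: a decorator that catches any unhandled exception in a request handler, logs it, and returns an error value so the service loop survives. Nothing below is from the repository; `guarded`, `get_created_by`, and the shape of `err()` are hypothetical. Note that, per the description above, returning an explicit error currently makes Docker report "no such volume" (#628), so this is not a drop-in fix.

```python
import functools
import logging


def err(msg):
    # Hypothetical error shape; the real err() helper may differ.
    return {"Error": msg}


def guarded(handler):
    """Wrap a request handler so an unhandled exception becomes an error value."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        try:
            return handler(*args, **kwargs)
        except Exception as ex:
            # Keep the traceback in the server log for later diagnosis.
            logging.exception("Unhandled exception in %s", handler.__name__)
            return err("Server error: %s" % ex)
    return wrapper


@guarded
def get_created_by(vol_meta):
    # Raises TypeError when vol_meta is None, as in the traceback above.
    return {"created-by": vol_meta["created-by"]}
```

Calling `get_created_by(None)` then yields an error dict instead of propagating a TypeError up to the request loop.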

Contributor

govint commented Oct 17, 2016

If the disk is created OK and metadata creation fails, then the disk is removed. Exactly how did this condition occur, and which op failed such that a disk was created and ack'ed to the client without the KV getting created?

Contributor Author

msterin commented Oct 17, 2016

@govint - I have no idea how it happened, but it did happen. It is easy to fake: just remove the .vmfd file to get the same behavior.
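
The failure can be reproduced in miniature with any file that exists and is then deleted. The sketch below uses a temporary file as a stand-in for the metadata file; `metadata_missing` is a hypothetical helper, not something in the plugin.

```python
import errno
import os
import tempfile


def metadata_missing(meta_file):
    """Return True if opening meta_file fails with 'No such file or directory'."""
    try:
        with open(meta_file, "r"):
            return False
    except IOError as ex:
        if ex.errno == errno.ENOENT:
            return True
        raise  # Other I/O errors (permissions, etc.) are not this condition.


def simulate_missing_metadata():
    """Create a stand-in metadata file, remove it, and report both states."""
    fd, path = tempfile.mkstemp(suffix=".vmfd")
    os.close(fd)
    before = metadata_missing(path)
    os.remove(path)  # Mimics the metadata file disappearing.
    after = metadata_missing(path)
    return before, after
```

Here `simulate_missing_metadata()` checks the file once while it exists and once after removal, which is the same Errno 2 condition shown in the server log.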

Contributor

pdhamdhere left a comment

This change looks good. However, the scenario where a VMDK exists with no KV is concerning.

Contributor

govint commented Oct 18, 2016

@msterin That's what I was expecting, because the disk being there without the KV isn't going to happen in any of the workflows unless the KV is removed. Also, #626 is where the disk/KV seems to be opened, causing the KV file to be 'in-use' when the server is updating it.

This change is fine for this PR.

Contributor Author

msterin commented Oct 18, 2016

@govint - manual removal is only for testing the fix. On my demo bed the file disappeared on its own after running (and failing) make test-all, but as I mentioned, I do not know exactly what happened.

msterin merged commit 26be8bd into master Oct 19, 2016
msterin deleted the kv-miss.msterin branch October 19, 2016 10:50
msterin restored the kv-miss.msterin branch October 19, 2016 10:50
msterin deleted the kv-miss.msterin branch October 22, 2016 00:42
brunotm added a commit to brunotm/docker-volume-vsphere that referenced this pull request Oct 26, 2016
* master: (25 commits)
  Update new ESX IP
  added forgotten .so file
  Install sqlite3 py libs on ESX and load for Python2
  Added py code and binaries for sqlite3 python libs
  Update drone security
  Removed accidental .pyc files
  Handle byte to string conversions for status command.
  Auth configuration and operation admission check (Auth.liping) (vmware-archive#603)
  Revert "Cli auth.liping"
  Cli auth.liping (vmware-archive#640)
  Handle missing or invalid fs type on mount. (vmware-archive#639)
  Updated Admin CLI commands to support tenants. (vmware-archive#620)
  Workaround older versions of e2fsprogs (vmware-archive#631)
  Add auth proposal
  Made handing of missing metafile less harsh. (vmware-archive#627)
  Fixed ACLs in payload bin dir (vmware-archive#624)
  Fixed error handling for set command. (vmware-archive#610)
  Use new error variables when rolling back volume creation to avoid nil reassignment. (vmware-archive#617)
  Change wording
  Fix broken link
  ...
5 participants