This repository has been archived by the owner on Nov 9, 2020. It is now read-only.

Volume create fails with meta-data in use error #626

Closed
msterin opened this issue Oct 17, 2016 · 7 comments

Comments

@msterin
Contributor

msterin commented Oct 17, 2016

Made handling of a missing metafile less harsh.

If the metadata file is missing (which could happen when some op fails after the disk is created OK but before the metadata file is written - I hit it a few times in testing, though I am not sure which op exactly failed), the whole plugin goes into an unstable state due to an unhandled exception (below).

Backtrace (from my private branch, so line numbers could be a bit off)

10/15/16 06:12:28 41380 [photon-1] [INFO   ] executeRequest 'list' completed with ret=[{u'Attributes': {}, u'Name': 'myvol'}, {u'Attributes': {}, u'Name': 'refCountTestVol'}]
10/15/16 06:12:33 41380 [photon-1] [ERROR  ] Failed to access /vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd
Traceback (most recent call last):
  File "/usr/lib/vmware/vmdkops/Python/kvESX.py", line 242, in load
    with open(meta_file, "r") as fh:
IOError: [Errno 2] No such file or directory: '/vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd'
10/15/16 06:12:33 41380 [photon-1] [ERROR  ] 'NoneType' object has no attribute '__getitem__'
Traceback (most recent call last):
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 1186, in main
    handleVmciRequests(port)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 1143, in handleVmciRequests
    opts=opts)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 568, in executeRequest
    response = getVMDK(vmdk_path, vol_name, datastore)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 400, in getVMDK
    return vol_info(kv.getAll(vmdk_path), kv.get_vol_info(vmdk_path), datastore)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 350, in vol_info
    vinfo = {CREATED_BY_VM : vol_meta[kv.CREATED_BY],
TypeError: 'NoneType' object has no attribute '__getitem__'
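
For illustration, here is a rough sketch (not the actual fix; the key names and return shape are made up) of the kind of guard in vol_info() that would avoid the TypeError above when the metadata load returns None:

```python
# Sketch only: illustrative code, not the real vmdk_ops.py. kv.getAll()
# returns None when the .vmfd sidecar cannot be read (see the IOError above),
# so indexing vol_meta directly raises the TypeError in the traceback.
CREATED_BY_VM = "created-by-VM"   # hypothetical key name

def vol_info(vol_meta, vol_size_info, datastore):
    if vol_meta is None or vol_size_info is None:
        # Damaged/missing metadata: report a per-volume failure instead of
        # letting the exception escape and destabilize the whole plugin.
        return None   # caller can turn this into "Failed to get disk details"
    return {CREATED_BY_VM: vol_meta.get(CREATED_BY_VM, "unknown"),
            "datastore": datastore,
            "size": vol_size_info.get("size")}
```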

@pdhamdhere - please adjust priority as needed

Assigning to @govint per git blame for return vol_info(kv.getAll(vmdk_path), kv.get_vol_info(vmdk_path), datastore)

@msterin msterin added the P0 label Oct 17, 2016
msterin pushed a commit that referenced this issue Oct 17, 2016
Today if the metadata file is missing (which could happen when some op fails after the disk is created OK but before the metadata file is written - I had it a few times in testing, though I am not sure which op exactly failed), the whole plugin goes into an unstable state due to an unhandled exception (below).

This fix makes it a little more predictable. Everything keeps working for other volumes, but any operation on a volume with damaged metadata will return a JSON parse error from the Docker volume driver.

The issue still has to be properly handled (Issue #626); this is a smaller fix.

Before the fix:
---------------
1. Docker commands hang

2. Backtrace on the server:

10/15/16 06:12:28 41380 [photon-1] [INFO   ] executeRequest 'list' completed with ret=[{u'Attributes': {}, u'Name': 'myvol'}, {u'Attributes': {}, u'Name': 'refCountTestVol'}]
10/15/16 06:12:33 41380 [photon-1] [ERROR  ] Failed to access /vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd
Traceback (most recent call last):
  File "/usr/lib/vmware/vmdkops/Python/kvESX.py", line 242, in load
    with open(meta_file, "r") as fh:
IOError: [Errno 2] No such file or directory: '/vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd'
10/15/16 06:12:33 41380 [photon-1] [ERROR  ] 'NoneType' object has no attribute '__getitem__'
Traceback (most recent call last):
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 1186, in main
    handleVmciRequests(port)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 1143, in handleVmciRequests
    opts=opts)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 568, in executeRequest
    response = getVMDK(vmdk_path, vol_name, datastore)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 400, in getVMDK
    return vol_info(kv.getAll(vmdk_path), kv.get_vol_info(vmdk_path), datastore)
  File "/usr/lib/vmware/vmdkops/bin/vmdk_ops.py", line 350, in vol_info
    vinfo = {CREATED_BY_VM : vol_meta[kv.CREATED_BY],
TypeError: 'NoneType' object has no attribute '__getitem__'

After the fix:

1. Docker command reports error

Error response from daemon: get v3: VolumeDriver.Get: json: cannot unmarshal string into Go value of type map[string]interface {}

2. Server log reports error:

10/15/16 06:33:35 45724 [photon-1] [INFO   ] executeRequest 'get' completed with ret=Failed to get disk details
10/15/16 06:33:52 45724 [photon-1] [ERROR  ] Failed to access /vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd
Traceback (most recent call last):
  File "/usr/lib/vmware/vmdkops/Python/kvESX.py", line 242, in load
    with open(meta_file, "r") as fh:
IOError: [Errno 2] No such file or directory: '/vmfs/volumes/datastore1/dockvols/refCountTestVol-8bfdeb99e642a231.vmfd'
@govint
Contributor

govint commented Oct 17, 2016

@brunotm Can you say what version of ESX you are using for the test that failed?
What's happened is that just after the KV file got created (it's created by ESX) on vSAN, the file is opened to write the initial KV data. Apparently the system still has the file open, and hence the write fails with a busy error. I'll need to check this behavior on vSAN more, as volp00 creation went through OK at around the same time (almost in parallel).


Failed to save meta-data for /vmfs/volumes/vsanDatastore/dockvols/volP10.vmdk
IOError: [Errno 16] Device or resource busy: '/vmfs/volumes/vsanDatastore/dockvols/volP10-77ad58548f419806.vmfd'


Meanwhile, volp00 was created fine in parallel.

@brunotm - Were any parallel commands being run when the create test was happening? If by chance the disk was opened in parallel, the KV file may not be usable.
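
If it really is just a transient lock right after creation, one possible mitigation (a sketch only; save_with_retry and the plain open()/write() here are my assumptions, not the plugin's actual kvESX code) would be to retry the initial KV write with a short backoff:

```python
# Sketch: retry the initial KV write while vSAN still holds the freshly
# created .vmfd open and the write fails with EBUSY (Errno 16).
import errno
import time

def save_with_retry(meta_file, data, attempts=5, delay=0.5):
    for i in range(attempts):
        try:
            with open(meta_file, "w") as fh:
                fh.write(data)
            return True
        except IOError as e:
            if e.errno != errno.EBUSY or i == attempts - 1:
                raise             # not the transient busy case, or out of retries
            time.sleep(delay)     # give ESX a moment to release the file
    return False
```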

@brunotm
Contributor

brunotm commented Oct 17, 2016

@govint
Version 6u2.

Also, I don't recall seeing this error on non-vSAN datastores.

msterin pushed a commit that referenced this issue Oct 19, 2016
@govint govint changed the title Missing metadata file causes instability and docker volume command hang Volume create fails with meta-data in use error Oct 21, 2016
@govint
Contributor

govint commented Oct 21, 2016

Renamed the issue, as we have a couple of cases which are not reproducible. This issue is for what Bruno reported, where a volume create failed because the KV file was in use during creation.

@brunotm, can you say what test was run when this error was seen? Can you post the steps used?

@brunotm
Contributor

brunotm commented Oct 21, 2016

@govint

I occasionally observed this issue with the code from #612 when running make test-vm in a vSAN (only?) environment.

ESX 6u2 with a vSAN datastore
Docker host VMs: Ubuntu 16.04
Docker version: 1.12.1

Logs http://pastebin.com/UybXCQME

brunotm added a commit to brunotm/docker-volume-vsphere that referenced this issue Oct 22, 2016
This is a ***workaround*** for the timing/locking issue with metadata files. Issue vmware-archive#626
@govint
Contributor

govint commented Oct 24, 2016

I've been trying this with VMFS and without the threaded code. I'll need to check this with the threading code included. It seems like the KV file is being accessed ahead of it being written. I'll wait for the threading code to be submitted in its final form.

brunotm added a commit to brunotm/docker-volume-vsphere that referenced this issue Oct 26, 2016
This is a ***workaround*** for the timing/locking issue with metadata files. Issue vmware-archive#626
brunotm added a commit to brunotm/docker-volume-vsphere that referenced this issue Oct 28, 2016
This is a ***workaround*** for the timing/locking issue with metadata files. Issue vmware-archive#626
kerneltime pushed a commit to brunotm/docker-volume-vsphere that referenced this issue Oct 29, 2016
This is a ***workaround*** for the timing/locking issue with metadata files. Issue vmware-archive#626
@brunotm
Contributor

brunotm commented Nov 3, 2016

@govint

I've been trying this with VMFS and without the threaded code. I'll need to check this with the threading code included. It seems like the KV file is being accessed ahead of it being written. I'll wait for the threading code to be submitted in its final form.

I think this is unlikely to be the cause of the file-busy issue in kv.save() when creating a volume. It looks more like a timing issue with ESX releasing the file lock between kv.create() and kv.save().

If you can, it would be nice to know the descriptions of the error codes returned by DiskLib, mainly 4008 and 27 (this one is from memory).

Thanks!
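
Not an answer on the DiskLib codes (I don't have their descriptions either), but for the errnos that have shown up in this thread so far, here is a tiny sketch of how the plugin could classify the failure instead of letting it propagate (the helper name is hypothetical):

```python
# Sketch: distinguish the two failure modes seen in this thread when touching
# the .vmfd sidecar - ENOENT (2, metadata file never created) vs EBUSY
# (16, file still locked right after kv.create()).
import errno

def classify_meta_error(exc):
    if exc.errno == errno.ENOENT:
        return "metadata file missing - volume has no .vmfd"
    if exc.errno == errno.EBUSY:
        return "metadata file still in use - likely kv.create()/kv.save() timing on vSAN"
    return "unexpected errno %d (%s)" % (exc.errno, errno.errorcode.get(exc.errno, "?"))
```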

@govint
Contributor

govint commented Nov 9, 2016

Closing with the changes merged via #708. All analysis is in #695.
