This repository has been archived by the owner on Nov 9, 2020. It is now read-only.

Trying to remove a volume from a different node multiple times leaves the volume in an unusable state #716

Closed
kerneltime opened this issue Nov 7, 2016 · 9 comments


@kerneltime (Contributor)

I ran into this issue when trying to test the retry logic for volumes.
Node 1

docker volume create --driver=vmdk --name=storage -o size=100MB
docker run --rm -it -v storage:/vol busybox 

Node 2

docker volume rm storage 
docker volume rm storage

The first time, I get the following logs:

07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] lib.vmci_reply: VMCI replied with errcode 0
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] Released lock: storage
07/13/16 23:11:02 2710275 [MainThread] [DEBUG  ] lib.vmci_get_one_op returns 7, buffer '{"cmd":"remove","details":{"Name":"storage"}}'
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] LockManager.get_lock: new lock: storage, <thread.lock object at 0xffe9e7c0>
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] LockManager existing locks in store: [u'storage']
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] Trying to aquire lock: storage
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] Aquired lock: storage
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] get_datastores: [('datastore1', '57302fae-961b9cf9-2e49-000c29acb565', '/vmfs/volumes/57302fae-961b9cf9-2e49-000c29acb565/dockvols'), ('vsanDatastore', 'vsan:5201dd577fe793bd-112186a6c2b42e0c', '/vmfs/volumes/vsan:5201dd577fe793bd-112186a6c2b42e0c/dockvols')]
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] get_datastore_name: path=vsan:5201dd577fe793bd-112186a6c2b42e0c name=['vsanDatastore']
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] Authorize: vm_uuid=564d6f23-eabb-1b63-c79d-2086cca704d5
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] Authorize: datastore=vsanDatastore
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] Authorize: cmd=remove
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] Authorize: opt={}
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] get tenant vm_uuid=564d6f23-eabb-1b63-c79d-2086cca704d5 tenant_id=4acd19ee-1ed6-4716-bcda-c8637c9658f2
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] get_privileges tenant_uuid=4acd19ee-1ed6-4716-bcda-c8637c9658f2 datastore=vsanDatastore
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] delete_volume=1
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [INFO   ] cmd remove with opts {} on tenant_uuid 4acd19ee-1ed6-4716-bcda-c8637c9658f2 datastore vsanDatastore is allowed to execute
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] Found /vmfs/volumes/vsanDatastore/dockvols/tenant1, returning
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] executeRequest tenant1 /vmfs/volumes/vsanDatastore/dockvols/tenant1
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [INFO   ] *** removeVMDK: /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk
07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [DEBUG  ] Running cmd /sbin/vmkfstools -U  /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk

07/13/16 23:11:02 2710275 [photon.vsan.2-storage] [INFO   ] *** removeVMDK: /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk, coudn't lock volume for removal. Retrying...
07/13/16 23:11:03 2710275 [photon.vsan.2-storage] [DEBUG  ] Running cmd /sbin/vmkfstools -U  /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk
07/13/16 23:11:04 2710275 [photon.vsan.2-storage] [INFO   ] *** removeVMDK: /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk, coudn't lock volume for removal. Retrying...
07/13/16 23:11:05 2710275 [photon.vsan.2-storage] [DEBUG  ] Running cmd /sbin/vmkfstools -U  /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk
07/13/16 23:11:05 2710275 [photon.vsan.2-storage] [INFO   ] *** removeVMDK: /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk, coudn't lock volume for removal. Retrying...
07/13/16 23:11:06 2710275 [photon.vsan.2-storage] [DEBUG  ] Running cmd /sbin/vmkfstools -U  /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk
07/13/16 23:11:06 2710275 [photon.vsan.2-storage] [INFO   ] *** removeVMDK: /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk, coudn't lock volume for removal. Retrying...
07/13/16 23:11:07 2710275 [photon.vsan.2-storage] [DEBUG  ] Running cmd /sbin/vmkfstools -U  /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk
07/13/16 23:11:07 2710275 [photon.vsan.2-storage] [INFO   ] *** removeVMDK: /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk, coudn't lock volume for removal. Retrying...
07/13/16 23:11:08 2710275 [photon.vsan.2-storage] [DEBUG  ] Running cmd /sbin/vmkfstools -U  /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk
07/13/16 23:11:08 2710275 [photon.vsan.2-storage] [INFO   ] executeRequest 'remove' completed with ret={u'Error': u'Failed to remove /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk. Failed to delete virtual disk: Failed to lock the file (16392).\n'}
07/13/16 23:11:08 2710275 [photon.vsan.2-storage] [DEBUG  ] lib.vmci_reply: VMCI replied with errcode 0
07/13/16 23:11:08 2710275 [photon.vsan.2-storage] [DEBUG  ] Released lock: storage

After that, each retry gives the following result:

07/13/16 23:11:22 2710275 [MainThread] [DEBUG  ] lib.vmci_get_one_op returns 4, buffer '{"cmd":"get","details":{"Name":"storage"}}'
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] LockManager.get_lock: new lock: storage, <thread.lock object at 0xffe9e810>
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] LockManager existing locks in store: [u'storage']
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] Trying to aquire lock: storage
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] Aquired lock: storage
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] get_datastores: [('datastore1', '57302fae-961b9cf9-2e49-000c29acb565', '/vmfs/volumes/57302fae-961b9cf9-2e49-000c29acb565/dockvols'), ('vsanDatastore', 'vsan:5201dd577fe793bd-112186a6c2b42e0c', '/vmfs/volumes/vsan:5201dd577fe793bd-112186a6c2b42e0c/dockvols')]
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] get_datastore_name: path=vsan:5201dd577fe793bd-112186a6c2b42e0c name=['vsanDatastore']
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] Authorize: vm_uuid=564d6f23-eabb-1b63-c79d-2086cca704d5
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] Authorize: datastore=vsanDatastore
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] Authorize: cmd=get
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] Authorize: opt={}
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] get tenant vm_uuid=564d6f23-eabb-1b63-c79d-2086cca704d5 tenant_id=4acd19ee-1ed6-4716-bcda-c8637c9658f2
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] get_privileges tenant_uuid=4acd19ee-1ed6-4716-bcda-c8637c9658f2 datastore=vsanDatastore
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] mount_volume=1
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [INFO   ] cmd get with opts {} on tenant_uuid 4acd19ee-1ed6-4716-bcda-c8637c9658f2 datastore vsanDatastore is allowed to execute
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] Found /vmfs/volumes/vsanDatastore/dockvols/tenant1, returning
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] executeRequest tenant1 /vmfs/volumes/vsanDatastore/dockvols/tenant1
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] Trying to acquire lock: <_RLock owner=None count=0> @ 0x3433acec, caller: load, args: (u'/vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk',) {}
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] Aquired lock: <_RLock owner='photon.vsan.2-storage' count=1> @ 0x3433acec, caller: load, args: (u'/vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk',) {}
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [ERROR  ] Failed to access /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage-579734cb32c89965.vmfd
Traceback (most recent call last):
  File "/usr/lib/vmware/vmdkops/Python/kvESX.py", line 266, in load
    with open(meta_file, "r") as fh:
IOError: [Errno 2] No such file or directory: '/vmfs/volumes/vsanDatastore/dockvols/tenant1/storage-579734cb32c89965.vmfd'
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] Trying to acquire lock: <_RLock owner=None count=0> @ 0x3433acec, caller: get_info, args: (u'/vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk',) {}
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] Aquired lock: <_RLock owner='photon.vsan.2-storage' count=1> @ 0x3433acec, caller: get_info, args: (u'/vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk',) {}
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] Trying to acquire lock: <_RLock owner='photon.vsan.2-storage' count=1> @ 0x3433acec, caller: vol_open_path, args: (u'/vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk', 524289) {}
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] Aquired lock: <_RLock owner='photon.vsan.2-storage' count=2> @ 0x3433acec, caller: vol_open_path, args: (u'/vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk', 524289) {}
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [ERROR  ] Failed to get disk details for /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk ('NoneType' object has no attribute '__getitem__')
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [INFO   ] executeRequest 'get' completed with ret={u'Error': u"Failed to get disk details for /vmfs/volumes/vsanDatastore/dockvols/tenant1/storage.vmdk ('NoneType' object has no attribute '__getitem__')"}
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] lib.vmci_reply: VMCI replied with errcode 0
07/13/16 23:11:22 2710275 [photon.vsan.2-storage] [DEBUG  ] Released lock: storage
@brunotm (Contributor) commented Nov 8, 2016

@kerneltime
Shouldn't this fail earlier since the volume is attached?

@brunotm (Contributor) commented Nov 8, 2016

I have to investigate the behavior with vim.VirtualDiskManager, but I think vmkfstools -U is too destructive to allow execution without first checking the attachment status.

@govint (Contributor) commented Nov 8, 2016

vmkfstools removes the sidecar (KV) files ahead of unlinking the vmdk itself, so it's possible that the command errors out but has already removed the sidecar file. That looks like a bug in disk removal, as it leaves the disk behind with its metadata removed. The underlying error appears to be a busy error at the lowest level, which is expected given that the disk is attached and hence open.
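To make the failure mode concrete, here is a minimal Python model of that ordering. It is illustrative only: the sidecar handling actually happens inside vmkfstools, and the paths and helper name are assumptions, not the real vmdkops code.

import os
import subprocess

def remove_vmdk(vmdk_path, sidecar_path):
    # Model of the ordering described above: the KV sidecar goes first...
    if os.path.exists(sidecar_path):
        os.remove(sidecar_path)
    # ...then the vmdk is unlinked. If the disk is attached on another
    # node, this step fails with "Failed to lock the file", but the
    # sidecar is already gone, which is why the later 'get' requests hit
    # the IOError on the .vmfd file in the logs above.
    return subprocess.call(["/sbin/vmkfstools", "-U", vmdk_path])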

@govint (Contributor) commented Nov 8, 2016

Yes, removeVMDK needs to check that the disk isn't in use. There is no such check in vmdk_ops.py:removeVMDK(), which is a bug in itself. Issue #719 will fix this.
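For illustration, a minimal sketch of the check-before-remove guard being discussed, assuming a hypothetical get_attached_vm() helper. This is not the actual vmdk_ops.py code, and the fix in #719 may differ.

import subprocess

def get_attached_vm(vmdk_path):
    # Hypothetical helper: return the name of the VM the disk is
    # currently attached to, or None. A real lookup would be
    # platform-specific (e.g. via the vSphere API).
    raise NotImplementedError

def removeVMDK(vmdk_path):
    # Refuse to remove a disk that is still attached, rather than let
    # vmkfstools partially delete its metadata.
    vm = get_attached_vm(vmdk_path)
    if vm:
        return {"Error": "%s is attached to VM %s, refusing to remove" % (vmdk_path, vm)}
    rc = subprocess.call(["/sbin/vmkfstools", "-U", vmdk_path])
    if rc != 0:
        return {"Error": "Failed to remove %s (rc=%d)" % (vmdk_path, rc)}
    return None  # success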

@brunotm (Contributor) commented Nov 8, 2016

@govint
The volume was attached to another node. If the removal command were issued from the same node, the Docker daemon would check and error out.

@govint (Contributor) commented Nov 8, 2016

@brunotm, absolutely, but removeVMDK must handle that check itself, as it has no idea who's asking it to remove a VMDK. And with the multi-tenant support coming up, multiple VMs in the same tenant can have access to the same VMDKs, so this scenario could be common.

@brunotm (Contributor) commented Nov 8, 2016

@govint
That was my first comment :)

@msterin (Contributor) commented Nov 9, 2016

FWIW I think adding overhead to every remove operation (check before remove) is less appealing than adding overhead to very few operations (recover on failure). Besides, I suspect checking for usage is not going to be trivial.
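A minimal sketch of what recover-on-failure could look like, with hypothetical file handling (real code would rebuild the KV metadata through the kvESX sidecar API rather than writing a bare file):

import os
import subprocess

def remove_with_recovery(vmdk_path, sidecar_path):
    rc = subprocess.call(["/sbin/vmkfstools", "-U", vmdk_path])
    if rc == 0:
        return None  # success
    # The unlink failed, e.g. because the disk is attached elsewhere.
    # Repair any partial damage: if the vmdk survived but its KV sidecar
    # is gone, recreate a placeholder so 'get' requests keep working.
    if os.path.exists(vmdk_path) and not os.path.exists(sidecar_path):
        with open(sidecar_path, "w") as fh:
            fh.write("{}")  # placeholder; real code would restore the metadata
    return {"Error": "Failed to remove %s (rc=%d)" % (vmdk_path, rc)}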

@pdhamdhere (Contributor)

Fixed with Bruno's PR #719
