-
Notifications
You must be signed in to change notification settings - Fork 95
Add volume cloning support -o clone-from=volname #726
Add volume cloning support -o clone-from=volname #726
Conversation
src_vol_info = kv.get_vol_info(src_vmdk_path) | ||
opts["size"] = src_vol_info["size"] | ||
error_info, tenant_uuid, tenant_name = auth.authorize(vm_uuid, vm_datastore, "create", opts) | ||
if error_info: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do the authorize step for "create" before doing get_vol_info(), why read the KV if the VM doesn't have create permission on the datastore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@govint
You're right. thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the implementation is correct for this.
- In executeRequest() we fetch and validate datastore, volume and tenant info and authorize the op without the size information.
- In cloneVMDK() we need to do the same to validate access to the src volume, then fetch its size information and _authorize again_ the create op with the correct size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brunotm I see the create check @ line 251 but not the src check that you mention. This is a quick review and I might have missed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kerneltime
If I understood your question correctly:
From @220, first we get tenant info again to validate accessibility with volume and datastore, and fetch its path. Then create opts[size] is updated with the src_vol_info[size] and authorize again with the correct size info @251
thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. to validate accessibility with source volume and datastore
return err(error_info) | ||
|
||
attached, uuid, attach_as = getStatusAttached(src_vmdk_path) | ||
if attached: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should call handle_stale_attach() in case the disk is not in use but KV says so, or disk is attached to a VM thats powered off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. thanks!
|
||
si = get_si() | ||
task = si.content.virtualDiskManager.CopyVirtualDisk( | ||
sourceName=source_vol, destName=dest_vol, destSpec=vdisk_spec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use vmkfstools -i src-disk.vmdk cloned-disk.vmdk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the API approach is better, we could also go that path for the other ops. Also, and i might be wrong, but from the docs vmkfstools doesn't support delta disks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on "API is better". Besides: #39 (remove command lines usage where API is available) is still open
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing to API calls on the *VMDK ops would be a straightforward change, if I manage to get time I can investigate it further
@@ -58,6 +58,7 @@ | |||
# The disk allocation format for vmdk | |||
DISK_ALLOCATION_FORMAT = 'diskformat' | |||
VALID_ALLOCATION_FORMATS = ["zeroedthick", "thin", "eagerzeroedthick"] | |||
CLONE_ADD_ALLOCATION_FORMAT = "delta" # Additional allocation format for clones | |||
DEFAULT_ALLOCATION_FORMAT = 'thin' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest not having delta as a format. Its an actual new disk that we get as a result of clone.
Clone could mean a full clone - a full copy of the source disk. Its not possible to get a snap of a disk that can be a delta over a base disk. And in fact we could avoid putting in ESX internal format types for a docker user.
Something like "fast", "full" clone types can be what the user specifies which sounds generic.
For now we can't support a "fast" clone as thats tied to a VM snapshot, but "full" clone can be supported, which may be slow depending on source disk size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can be a source of confusion. And we already expose disk formats to the users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also agree with @brunotm here.
In vmware speak the disks are either "delta disks" or "redo logs" . So I think "delta" in format is fine. Also, "cloning", while usually applies to VM, is also already used as a "disk cloning" (https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1028042) so we can keep using --option clone with "delta" disk IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed and suggested changes,
3ee3862
to
0e5c797
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice change!
What is expected behavior for 'docker volume rm source-volume' if there are clones. I assume it should fail with reasonable error.
@pdhamdhere Thanks! |
That is accurate for one chain only and AFAIK is managed only in the context of VM snapshots. When you have multiple "clones" (i.e. delta disks with shared parent) I am not aware of any code that does search of all the children and consolidation into EACH of them. So I am also wondering what happens in the current code if parent is deleted. I suspect operation will fail of the volume is currently attached, and the operation will succeed if it's detached - but all children will be rendered unusable. |
@msterin |
@msterin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good change ! A few comments included - the major one being I am still not clear on what's the actual result of removing something that has clones.
A small requests - can you add a few examples or command line (cut-n-paste from xterm) ? I am wondering if commands like docker volume create -d vmdk --name my@datastore1 --option clone-from=old@datastore2
would work, and what happens on parent delete.
@@ -40,6 +40,9 @@ | |||
# glob expression to match end of 'delta' (aka snapshots) file names. | |||
SNAP_SUFFIX_GLOB = "-[0-9][0-9][0-9][0-9][0-9][0-9].vmdk" | |||
|
|||
# regexp for finding datastore path "[datastore] path/to/file.vmdk" from full vmdk path | |||
DATASTORE_PATH_REGEXP = r"^/vmfs/volumes/([a-zA-Z0-9_-]+?)/(.*)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Datastores can have pretty much anything in the name. It's a symlink after all. I suspect there are interesting corner cases wich will be broken by this regexp.
Also, I think this may NOT work with the tenant since componet of the path is not limited to [a-zA-Z0-9_-]
(aka w
) @lipingxue
maybe something like this ^/vmfs/volumes/([^/]+)/(.*\.vmdk)$
?
@@ -209,6 +213,91 @@ def make_create_cmd(opts, vmdk_path): | |||
return "{0} -d {1} -c {2} {3}".format(VMDK_CREATE_CMD, disk_format, size, vmdk_path) | |||
|
|||
|
|||
def cloneVMDK(vm_name, vmdk_path, opts={}, vm_uuid=None, vm_datastore=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both new params seem excessive. vm_name should be enough got find uuid, and vmdk_path should be enough to find data store. It's OK to have redundant params to save on extra search, I just want to make sure it's a deliberate choice ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. we already have them :)
I am wondering how it is even possible to find all clones (delta disks) ? They can be on different datastores in different folders what all we know , and there is nothing in parent VMDK to track children. Something does not add up, we are missing something. Please do share the xterm logs and the vmdk_ops log and maybe 'ls -l' on the dockvols too. |
The test and logs are a bit long but they cover most of the concerns. _Create master volume_
_Create 1st delta on different datastore_
_dockvols on datastores_
_Create 2nd delta from 1st delta on different datastore_
_dockvols on datastores_
_Create 3rd delta from 2nd on different datastore_
_Inspect_
_dockvols on datastores_
_Remove master volume_
_Change more data on delta3_
_Inspect_
_Use delta3, for delete delta2 from another vm_
_dockvols on datastores_
_Inspect_
_Fill all available space on delta3_*
_Create delta4 from delta3 on a different datastore_
_Remove delta1_
_dockvols on datastores_
_Use delta4 while deleting delta3 from another vm_
_dockvols on datastores_
_Create and use a thin clone from delta4_
_dockvols on datastores_
|
Thanks for the tip, i missed that one in a rebase. |
@govint |
@brunotm - fascinating :-) ! I see all successes in the log and the actual data is obviously in place, but I still have no idea how is it possible. Evidently, either it does not really create "redo log" aka "delta" disk on on CopyDisk with target format "delta", or I am missing something. Can you share the content of vol34189-delta?.vmdk and vol34189-master.vmdk files ? Thanks. |
@brunotm - can you post the contents .vmdk files of the base disk and each Its impossible to use a "delta" disk if the parent has been deleted. The size calculation is based off following links in a disk chain (of On Tue, Nov 15, 2016 at 7:34 AM, Mark Sterin notifications@github.com
|
Ok, from the implementation of this feature all it does is to create a new Yes, the spec doesn't call out clearly, but the fact that you can have On Tue, Nov 15, 2016 at 1:17 PM, Bruno Moura notifications@github.com
|
All would be good, if the stored size on the datastore wasn't this.
_Size in MB of files in datastore_
_Size in MB of files in datastore_
_Size in MB of files in datastore_
_VMDK content, the createtype differs from parent vmfs to deltas vmfsSparse_
|
@govint writes
This is not fully accurate. In disklib slang, "child disk" can be wherever. It will just refer to parent disk on a different DS. This is how Linked Clones VMs can be on different stores. @brunotm writes
this is totally expected with fully cloned disks (i.e. when all "src" content is actually copied to 'target" on 'clone' operation) if the src and target have different format. Which they do. VMFSSparse allocates in smaller grains that basic VMFS thin format, and it tends to be better compacted (but slower in access) due to structure of metadata ("grain tables"). So what we see is perfectly expected - but there is NO reuse of base disk, it's content is simply copied. For real chains, child .VMDK would have non-empty parent (or parentPath, I don't recall). And, as @govint mentioneed, the child The good news is that it also explains why "remove" works just fine :) so there is no issue ! So, the bottom line - the API doc is misleading, it does not create chains but creates full copy instead - at least based on .vmdks you shared. Maybe -delta?.vmdk have more about parents ? Note that all key scenarios (backup, checkpoint) are still satisfied, albeit slower and with more space waste. |
@msterin, sorry, by same datastore I meant same type, mixed types aren't On Tue, Nov 15, 2016 at 3:23 PM, Mark Sterin notifications@github.com
|
@msterin 👍 A test with random data:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code LGTM. The doc docs/user-guide/docker-volume-cli.md needs to be updated with a quick write up, and it's good to go,
No, the nomenclature is more like "full" or "linked". The exact filesystem On Wed, Nov 16, 2016 at 12:35 PM, Bruno Moura notifications@github.com
|
vmdk_ops_admin_test: make expected_column_count a named global
vmdk_ops.executeRequest: remove stale code vmdk_ops: Fix argument passing.
vmdk_ops: fix assignment.
94e2506
to
248ff71
Compare
@kerneltime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only reviewed the last 2 commits (as we have discussed and reviewed the rest already). LGTM imo- ready to merge pending CI pass
@kerneltime
|
248ff71
to
c4d7def
Compare
@brunotm Yes, but the latest push from you has passed. The good news is that CI will a lot more robust here on.. but it did need everyone pushing to rebase and pull latest from master. |
@kerneltime |
if vm_uuid and vm_datastore: | ||
src_vol_info = kv.get_vol_info(src_vmdk_path) | ||
opts["size"] = src_vol_info["size"] | ||
error_info, tenant_uuid, tenant_name = auth.authorize(vm_uuid, vm_datastore, "create", opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be src_datastore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this @kerneltime, i understand that we're missing a explicit auth on the src_datastore. But its not only that, we're also missing a check of the src_volume file existence and a lock on it for the clone operation.
Thanks!
6790414
to
b02a629
Compare
@kerneltime @msterin Thanks. |
error_info, tenant_uuid, tenant_name = auth.authorize(vm_uuid, | ||
src_datastore, auth.CMD_ATTACH, {}) | ||
if error_info: | ||
return err(error_info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could log if the user is not authorized. May be not in this change as the change is more or less final and looking good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@govint
Pushed the last commit again with the suggested changes.
Thanks!
vmdk_ops.cloneVMDK: add auth on src_datastore @kerneltime vmdk_ops.cloneVMDK: check if source volume file exists vmdk_ops.cloneVMDK: fix logging on invalid datastores vmdk_ops.cloneVMDK: use kv.CLONE_FROM instead of literals vmdk_ops.cloneVMDK: Log errors in tenant authorisation for src_datastore @govint vmdk_utils: add get_datastore_from_vmdk_path()
b02a629
to
b64b6f6
Compare
@@ -71,3 +71,11 @@ docker volume create --driver=vmdk --name=MyVolume -o size=10gb -o fstype=ext4 ( | |||
|
|||
Specifies which filesystem will be created on the new volume. vSphere Docker Volume Service will search for a existing /sbin/mkfs.**fstype** on the docker host to create the filesystem, and if not found it will return a list of filesystems for which it has found a corresponding mkfs. The specified filesystem must be supported by the running kernel and support labels (-L flag for mkfs). Defaults to ext4 if not specified. | |||
|
|||
### clone-from | |||
``` | |||
docker volume create --driver=vmdk --name=CloneVolume -o clone-from=MyVolume -o access=read-only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if you have multi-datstore setup. Did you test following;
docker volume create --driver=vmdk --name=CloneVolume@DS2 -o clone-from=MyVolume@DS3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, on the examples above in the thread:
Create 3rd delta from 2nd on different datastore
root@dev01:~# docker volume create -d vmdk --name vol$$-delta3@datastore1 -o clone-from=vol$$-delta2@datastore2 -o diskformat=delta
vol34189-delta3@datastore1
root@dev01:~# docker run -it -v vol$$-delta3@datastore1:/mnt busybox
/ # ls /mnt/
delta-ds1.img group hosts lost+found mtab resolv.conf
delta-ds2 hostname localtime master.img passwd shadow
/ # md5sum /mnt/master.img
8f4e33f3dc3e414ff94e5fb6905cba8c /mnt/master.img
/ # md5sum /mnt/delta-ds1.img
5f363e0e58a95f06cbe9bbc662c5dfb6 /mnt/delta-ds1.img
/ # md5sum /mnt/delta-ds2
f1c9645dbc14efddc7d8a322685f26eb /mnt/delta-ds2
root@dev01:~# docker rm -vf $(docker ps -qa)
91aaa793bce2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Awesome! Thanks again @brunotm for your contribution. With required approvals and green CI, I am merging this PR. |
@pdhamdhere |
Fixes #462 |
This PR adds volume cloning support, including delta disks.
The use cases would be:
This needs #723 to correctly report the size of delta disks.
EDIT
The delta allocation format was dropped because the resulting volumes were not deltas from their parents but full sparse clones. More bellow.