
Serialization of multiple operation requests (Create/Delete) #695

Closed
kerneltime opened this issue Nov 3, 2016 · 16 comments
@kerneltime
Contributor

A test run for #693 failed.
The failure is not related to the PR; the sanity test appears to have become flaky since the multi-threading change.

build-failure.txt

@brunotm
Contributor

brunotm commented Nov 3, 2016

Hello @kerneltime, this is related to the kv/metadata issues, which were already happening in the pre-threaded code. See #662 and #626. The parallel tests seem to trigger them more easily.

The KV creation/save terminates with error codes from disklib/libvmsnapshot. Is it possible to get a description of those error codes? In the log it's 4008.

11/03/16 08:19:21 2254625 [Ubuntu.1404.vsan-volTestP14] [WARNING] Side car create for /vmfs/volumes/vsanDatastore/dockvols/volTestP14.vmdk failed - 4008

@govint
Contributor

govint commented Nov 3, 2016

Actually, unless the KV file has been modified in some way, there isn't code that retains single KV pairs while removing everything else, and the APIs used are Python read/write APIs. Until now we haven't actually seen a repro of the kind of KV state reported in #662, nor do we have logs to look at.

That said, multi-threading the server may be opening up new scenarios which we don't yet handle when accessing metadata.


@govint
Contributor

govint commented Nov 3, 2016

From the test logs it looks like this is a repro of #626. One thing to note is that we use Python open()/read()/write() APIs to access the KV.

KV file creation is done with the ESX private library APIs and that "always" works, because we create the file with the ESX library APIs and close it via the ESX private lib API immediately after creating it, and that close op actually closes the file.

But thereafter we use Python open()/read()/write() APIs to load/store the KV. The open is done in our code, but Python is expected to close the file, and when that happens isn't in our control. With multiple threads possibly accessing the same file, the file may still be open when another thread comes in to store to the KV, which can result in this scenario. But exactly why two threads would be accessing the same file during create seems odd.
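For illustration, making the close deterministic instead of leaving it to the interpreter would mean wrapping the Python load/store path in a context manager. This is only a sketch of that idea; load_kv/save_kv and the JSON format are hypothetical stand-ins, not the plugin's actual helpers:

import json

def load_kv(meta_file):
    # "with" closes the descriptor as soon as the block exits, rather than
    # whenever the interpreter gets around to finalizing the file object.
    with open(meta_file, "r") as f:
        return json.load(f)

def save_kv(meta_file, kv_dict):
    with open(meta_file, "w") as f:
        json.dump(kv_dict, f)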

Let me confirm the exact scenario from the logs.

@brunotm
Contributor

brunotm commented Nov 3, 2016

@govint

KV file creation is done with the ESX private library APIs and that "always" works. Because we are able to create the file with the ESX library APIs and close the KV file via the ESX private lib API immediately after creating the KV file. And that close op will close the file.

The error code 4008 comes from disklib:

if use_sidecar_create:
    res = lib.DiskLib_SidecarCreate(dhandle, DVOL_KEY.encode(),
                                    KV_CREATE_SIZE, KV_SIDECAR_CREATE,
                                    byref(obj_handle))
else:
    res = lib.DiskLib_SidecarOpen(dhandle, DVOL_KEY.encode(),
                                  KV_SIDECAR_CREATE,
                                  byref(obj_handle))
if res != 0:
    logging.warning("Side car create for %s failed - %x", volpath, res)
    lib.DiskLib_Close(dhandle)
    return False

@govint
Contributor

govint commented Nov 3, 2016

True, but the cause seems to be the same. From the ESX side, during the steps of file creation, the KV file (assuming that's the one) was found to be "EBUSY", and that gets translated to this error, which matches the busy error reported in #626.

I'll dig more into the ESX side logs, if those are available, and also into the code paths where the KV file gets created.

@brunotm
Contributor

brunotm commented Nov 3, 2016

@govint
Not saying they aren't related, but we have three distinct errors, two of them from the Python stack in open() calls:

  • the vmfd file being absent after it was created
  • the EBUSY on KV save (workaround in place)

And the third on sidecar creation, failing with 4008 from disklib.

We can of course serialize disklib operations and observe, but it would be really helpful to understand the error codes from disklib.
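Serializing the disklib operations to observe, as suggested, could be as little as a module-level lock around the sidecar calls. A minimal sketch under that assumption; the lock name and wrapper are hypothetical, not existing plugin code:

import threading

# Hypothetical module-level lock: while observing whether the 4008 failures
# go away, only one thread at a time issues DiskLib sidecar calls.
disklib_lock = threading.Lock()

def sidecar_create_serialized(lib, dhandle, key, size, flags, obj_handle_ref):
    with disklib_lock:
        return lib.DiskLib_SidecarCreate(dhandle, key, size, flags, obj_handle_ref)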

@brunotm
Contributor

brunotm commented Nov 3, 2016

A side note on this: currently, for any operation on a volume, vmdk_ops takes an exclusive lock on that volume. The only code path that would allow concurrent access to the KV for the same volume would be listVMDK(), but it doesn't use the KV at all.
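For reference, the per-volume exclusive lock described here can be pictured as a dictionary of locks keyed by volume name. A simplified sketch of the concept only, not the actual vmdk_ops implementation:

import threading

# One lock per volume name; the dict itself is protected by a guard lock.
_vol_locks = {}
_vol_locks_guard = threading.Lock()

def get_volume_lock(vol_name):
    with _vol_locks_guard:
        return _vol_locks.setdefault(vol_name, threading.Lock())

def execute_request(vol_name, op):
    # Operations on the same volume are serialized; operations on
    # different volumes can still run in parallel.
    with get_volume_lock(vol_name):
        return op()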

@kerneltime
Contributor Author

hostd.log
hostd.txt

@brunotm
Contributor

brunotm commented Nov 4, 2016

@kerneltime
From the logs we ended up, again, in a situation where we have volumes without the KV that need manual cleanup.

Can I propose merging a less noisy version of #698 (which is clean and easily revertible) so we can gather actual data and better understand when this happens in the kv/disklib/thread code?

From the actual build log of #698, _in the parallel tests only_ we had 273 kvLock acquires and 40 very short waits to acquire the lock (~14.65%), and none of them were for the same volume as suggested. (As said before, that is not possible for vmdk_ops thread requests, which take volume-level locks.)
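For context, wait numbers like these can be gathered by simply timing the acquire. A minimal sketch of that kind of instrumentation, not the actual #698 change, with hypothetical names:

import logging
import threading
import time

kv_lock = threading.Lock()   # stand-in for the actual kvLock

def with_kv_lock(op):
    start = time.time()
    with kv_lock:
        waited = time.time() - start
        if waited > 0.001:
            # Log only the non-trivial waits, like the 40 observed above.
            logging.info("kvLock acquired after %.3fs wait", waited)
        return op()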

Of course that build is not representative (because of the existing volumes without the KV), and I would like to get data from clean builds when they fail, to identify and solve the problem ASAP.

//CC @govint @pdhamdhere

@govint
Contributor

govint commented Nov 9, 2016

Looking at the logs (from the test run), it seems like there was parallel access (a create from VM2 and an attach from VM1) to the disk at some point.

There are at least two volumes (volTestP04 and volTestP14) that show a scenario that's possible only with two threads hitting the same disk in parallel, in each case a create racing an attach or get.

Given that the test was done with locking and with parallel accesses, and the logs aren't really complete, the analysis is approximate. But I doubt that the race is inside the ESX system libs.

a. volTestP14 (not all logs are available)

i. There are clearly two errors:

  • an error creating the KV for volume volTestP14
  • an error on the get for the same volume volTestP14

First, the create happens: the VMDK is created first and then the KV file (which happens to be a sidecar for the VMDK).
11/03/16 08:19:17 2254625 [photon.ga.vsan-volTestP04] [INFO ] *** createVMDK: /vmfs/volumes/vsanDatastore/dockvols/volTestP04.vmdk opts = {u'size': u'1gb', u'fstype': u'ext4'}

The VMDK is created here, but the sidecar creation failed with a "BUSY" error:

11/03/16 08:19:21 2254625 [Ubuntu.1404.vsan-volTestP14] [WARNING] Side car create for /vmfs/volumes/vsanDatastore/dockvols/volTestP14.vmdk failed - 4008

11/03/16 08:19:23 2254625 [Ubuntu.1404.vsan-volTestP14] [WARNING] Failed to create metadata kv store for /vmfs/volumes/vsanDatastore/dockvols/volTestP14.vmdk

And the VMDK gets removed:
11/03/16 08:19:23 2254625 [Ubuntu.1404.vsan-volTestP14] [INFO ] *** removeVMDK: /vmfs/volumes/vsanDatastore/dockvols/volTestP14.vmdk

ii. The create request completes having removed the VMDK

11/03/16 08:19:25 2254625 [Ubuntu.1404.vsan-volTestP14] [INFO ] executeRequest 'create' completed with ret={u'Error': 'Failed to create metadata kv store for /vmfs/volumes/vsanDatastore/dockvols/volTestP14.vmdk'}

iii. VM1 (??) seems to have issued a get(), which fails right away with a missing-disk error:

11/03/16 08:19:25 2254625 [Ubuntu.1404.vsan-volTestP14] [INFO ] executeRequest 'get' completed with ret={u'Error': 'Volume volTestP14 not found (file: /vmfs/volumes/vsanDatastore/dockvols/volTestP14.vmdk)'}

It's unclear how the get() was issued when the create failed - there are no logs. Nor is it clear whether other calls were made that accessed the same VMDK file (since it's there) and caused the busy error reported on the KV file create.

b. Client log #2 - this client is VM2, which issued the create request and got the error:

2016-11-03 01:18:52.785479122 -0700 PDT [INFO] Attaching volume and creating filesystem name=volTestP13 fstype=ext4
2016-11-03 01:19:01.903464997 -0700 PDT [INFO] Volume and filesystem created fstype=ext4 name=volTestP13
2016-11-03 01:19:03.126066682 -0700 PDT [INFO] Removing volume name=volTestP13
2016-11-03 01:19:25.801264017 -0700 PDT [ERROR] Create volume failed name=volTestP14 error="Failed to create metadata kv store for /vmfs/volumes/vsanDatastore/dockvols/volTestP14.vmdk"

c. volTestP04 (logs available) - the issue doesn't happen here because of the retry loop on KV save() (see the sketch after the log excerpts below), but there clearly seems to be parallel access.

i. Volume creation starts

11/03/16 08:19:17 2254625 [photon.ga.vsan-volTestP04] [INFO ] *** createVMDK: /vmfs/volumes/vsanDatastore/dockvols/volTestP04.vmdk opts = {u'size': u'1gb', u'fstype': u'ext4'}

ii. Meanwhile, another thread seems to be attempting a write of the same sidecar KV file (which matches the volume attach in the client logs from VM1).

11/03/16 08:19:24 2254625 [photon.ga.vsan-volTestP04] [WARNING] Meta file /vmfs/volumes/vsanDatastore/dockvols/volTestP04-42961025dc8ffd7a.vmfd busy for save(), retrying...
11/03/16 08:19:25 2254625 [photon.ga.vsan-volTestP04] [WARNING] Meta file /vmfs/volumes/vsanDatastore/dockvols/volTestP04-42961025dc8ffd7a.vmfd busy for save(), retrying...

iii. The create proceeds to completion and the guest formats the filesystem on the new disk volTestP04.

11/03/16 08:19:26 2254625 [photon.ga.vsan-volTestP04] [INFO ] executeRequest 'create' completed with ret=None
11/03/16 08:19:26 2254625 [photon.ga.vsan-volTestP04] [INFO ] *** attachVMDK: /vmfs/volumes/vsanDatastore/dockvols/volTestP04.vmdk to photon.ga.vsan VM uuid = 564d5ef2-89dc-1cfa-abc6-1289c486759b
11/03/16 08:19:26 2254625 [photon.ga.vsan-volTestP04] [INFO ] Attaching /vmfs/volumes/vsanDatastore/dockvols/volTestP04.vmdk as independent_persistent
11/03/16 08:19:27 2254625 [photon.ga.vsan-volTestP04] [INFO ] Disk /vmfs/volumes/vsanDatastore/dockvols/volTestP04.vmdk successfully attached. controller pci_slot_number=192, disk_slot=0
11/03/16 08:19:27 2254625 [photon.ga.vsan-volTestP04] [INFO ] executeRequest 'attach' completed with ret={'ControllerPciSlotNumber': '192', 'Unit': '0'}
11/03/16 08:19:29 2254625 [photon.ga.vsan-volTestP04] [INFO ] *** detachVMDK: /vmfs/volumes/vsanDatastore/dockvols/volTestP04.vmdk from photon.ga.vsan VM uuid = 564d5ef2-89dc-1cfa-abc6-1289c486759b
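The busy-then-retry behaviour seen in step ii above amounts to a bounded retry around the save. A minimal sketch of that idea only, using a hypothetical save_with_retry() helper rather than the plugin's actual code:

import errno
import logging
import time

def save_with_retry(meta_file, data, attempts=5, delay=1.0):
    # Retry the KV save while the .vmfd sidecar is reported busy (EBUSY).
    for _ in range(attempts):
        try:
            with open(meta_file, "w") as f:
                f.write(data)
            return True
        except EnvironmentError as e:
            if e.errno != errno.EBUSY:
                raise
            logging.warning("Meta file %s busy for save(), retrying...", meta_file)
            time.sleep(delay)
    return False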

Overall, the multi-threaded server exposed an issue in the plugin server itself:

a. The plugin server doesn't implement any state for the stages involved when a disk is created. Ideally, the disk shouldn't be available or even visible until the KV is created and populated; only then should the disk be visible to any request.

b. Locking down access to the KV as done in #708 is OK, but it doesn't address the root issue: a disk can be accessed and used before it is "ready".
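To illustrate point (a), the "not visible until the KV is populated" idea could be a per-volume state that gates visibility. A rough sketch of the concept only, with hypothetical state names and callbacks, not a concrete proposal for the plugin's API:

import threading

# Hypothetical creation states: a volume only becomes visible to
# get/attach/list once it reaches READY.
CREATING, READY = "creating", "ready"

_state = {}
_state_lock = threading.Lock()

def _set(vol_name, value):
    with _state_lock:
        if value is None:
            _state.pop(vol_name, None)
        else:
            _state[vol_name] = value

def is_visible(vol_name):
    with _state_lock:
        return _state.get(vol_name) == READY

def create_volume(vol_name, create_vmdk, create_kv, remove_vmdk):
    _set(vol_name, CREATING)            # disk may exist but is not yet "ready"
    if not create_vmdk(vol_name):
        _set(vol_name, None)
        return False
    if not create_kv(vol_name):
        remove_vmdk(vol_name)           # roll back; the volume never became visible
        _set(vol_name, None)
        return False
    _set(vol_name, READY)               # only now should get/attach see it
    return True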

@brunotm
Contributor

brunotm commented Nov 9, 2016

@govint
I have mentioned this a few times:

Right now they'll be serialized within the lifetime of a request, but an op (mainly create/delete) is a composition of several of them; for that we'll need a higher-level lock and to remember the plugin's intention with some kind of id and state. I have some ideas for that and can propose something as soon as I have time.

How do you think that should be handled?
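For illustration, "remember the plugin intention with some kind of id and state" could look like a small journal of in-flight multi-step operations. A sketch of that notion only, with hypothetical names:

import threading
import uuid

# Journal of in-flight multi-step operations, keyed by an operation id.
_ops = {}
_ops_lock = threading.Lock()

def start_op(kind, vol_name):
    op_id = str(uuid.uuid4())
    with _ops_lock:
        _ops[op_id] = {"kind": kind, "volume": vol_name, "state": "started"}
    return op_id

def advance_op(op_id, state):
    with _ops_lock:
        _ops[op_id]["state"] = state   # e.g. "vmdk_created", "kv_created", "done"

def finish_op(op_id):
    with _ops_lock:
        _ops.pop(op_id, None)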

@kerneltime kerneltime added the P0 label Nov 11, 2016
@kerneltime
Contributor Author

The original issue around the tests being flaky due to the vmfd file being deleted has been addressed; renaming this issue to address the concern raised by @brunotm around serialization of multiple op requests.

@kerneltime kerneltime changed the title Sanity test is flaky. Serialization of multiple operation requests (Create/Delete) Nov 11, 2016
@govint
Contributor

govint commented Nov 17, 2016

Are there any more changes to be made for this issue? @brunotm @kerneltime

@brunotm
Contributor

brunotm commented Nov 17, 2016

@govint
I think this can be handled more gracefully (and more simply) with your suggestion of visibility control via state changes, rather than by introducing a higher-level locking mechanism (higher than volume/request).

So, specifically for this issue, if we decide to go the state/visibility path: no.
If needed, I can look at it by the end of the month.

Thanks.

@govint
Contributor

govint commented Feb 13, 2017

For this issue I'm suggesting we go with the last update from @brunotm. There are no more changes needed unless we see the issue happening in our testing; if we agree, I suggest closing this issue.

@govint
Contributor

govint commented Feb 15, 2017

Closing per last updates.

@govint govint closed this as completed Feb 15, 2017