Serialization of multiple operation requests (Create/Delete) #695
Hello @kerneltime, this is related to the kv/metadata issues, and they were happening in the pre-threaded code. See #662 and #626. The parallel tests seem to trigger them more easily. The KV creation/save terminates with error codes from disklib/libvmsnapshot. Is it possible to get a description of those error codes? In the log it's 4008.
Actually, unless the KV file has been modified in some way, there isn't code [...]. That said, multi-threading the server may be opening up new scenarios which we don't yet handle when accessing metadata.
From the test logs this looks like a repro of #626. One thing to note is that we use Python open(), read(), write() APIs to access the KV. KV file creation is done with the ESX private library APIs, and that "always" works: we create the file with the ESX library APIs and close the KV file via the ESX private lib API immediately after creating it, and that close op does close the file. But thereafter we use Python open()/read()/write() APIs to load/store the KV. The open is done in the code and Python is expected to close the file; when that happens isn't in our control, and with possibly multiple threads accessing the same file, it's possible that the file is still open when another thread comes in to store to the KV, which can result in this scenario. But exactly why two threads would be accessing the same file during create seems odd. Let me confirm the exact scenario from the logs.
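To make the window described above concrete, here is a minimal sketch. It is not the actual vmdk_ops/kvESX code; the helper names (load_kv, save_kv) and paths are hypothetical. It only illustrates the pattern of a reader relying on implicit close while another thread writes the same sidecar file.

```python
# Sketch only: a KV sidecar read via plain open() with no explicit close(),
# so the descriptor stays open until Python reclaims the file object,
# while another thread opens the same file to store updated metadata.
import json
import tempfile
import threading

def load_kv(kv_path):
    # No explicit close(); the file object lives until garbage collection.
    f = open(kv_path, "r")
    return json.load(f)

def save_kv(kv_path, meta):
    # A second thread writing the same sidecar while the reader's handle
    # may still be open; on VMFS/vsan this kind of overlap could surface
    # as a busy-style error rather than succeeding (assumption).
    with open(kv_path, "w") as f:
        json.dump(meta, f)

# Stand-in for a sidecar file such as volTestP04-<id>.vmfd
kv_path = tempfile.mkstemp(suffix=".vmfd")[1]
save_kv(kv_path, {"status": "detached"})

t1 = threading.Thread(target=load_kv, args=(kv_path,))
t2 = threading.Thread(target=save_kv, args=(kv_path, {"status": "attached"}))
t1.start(); t2.start(); t1.join(); t2.join()
```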
The error code 4008 comes from disklib:
True, but the cause seems to be the same. From the ESX side, during the steps of file creation, the KV file (assuming that's the one) has been found to be "EBUSY", and that gets translated to this error, which matches the busy error reported in #626. I'll dig more into the ESX-side logs, if those are available, and also into the code paths when the KV file gets created.
@govint
And the 3 failures on side car creation were with the 4008 from disklib. We can of course serialize disklib operations and observe. But it would be really helpful to understand the error codes from disklib.
A side note on this: currently, in vmdk_ops, for any operation on a volume we take an exclusive volume lock for it. The only code path that would allow concurrent access to the KV for the same volume would be listVMDK(), but it doesn't use the KV at all.
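For readers not familiar with that locking scheme, a rough sketch is below. The names (get_volume_lock, execute_request) are hypothetical, not the actual vmdk_ops code; it only shows the per-volume exclusivity being described.

```python
# Sketch: one exclusive lock per volume name, so two operations on the
# same volume serialize, while different volumes can proceed in parallel.
import threading

_locks = {}
_locks_guard = threading.Lock()

def get_volume_lock(vol_name):
    # One lock object per volume name, created lazily.
    with _locks_guard:
        return _locks.setdefault(vol_name, threading.Lock())

def execute_request(vol_name, op):
    # Exclusive access to the volume (and therefore its KV sidecar)
    # for the duration of the operation.
    with get_volume_lock(vol_name):
        return op(vol_name)

# Example: a create and a remove on "volTestP04" cannot overlap.
execute_request("volTestP04", lambda v: "create %s" % v)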
hostd.log
@kerneltime Can I propose merging a less noisy version of #698 (that's clean and easily revertible) so we can gather actual data and better understand when this happens in the kv/disklib/thread code? From an actual build log of #698, _in the parallel tests only_ we had 273 kvLock acquires and 40 very short waits to acquire the lock (~14.65%), and none of them were for the same volumes as suggested (as said before, that is not possible for vmdk_ops thread-requests, which take volume-level locks). Of course that build is not representative (because of the existing volumes without the KV), and I would like to get data from clean builds when they fail, to identify and solve the problem asap. //CC @govint @pdhamdhere
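The kind of instrumentation being proposed could look roughly like the sketch below. This is an assumption about the approach, not the code from #698; the class name, the 1 ms wait threshold, and the counters are all hypothetical.

```python
# Sketch: time every acquisition of the KV lock and count how many had to
# wait, so contention can be measured from the logs of a real test run.
import logging
import threading
import time

class InstrumentedLock:
    def __init__(self, name):
        self._lock = threading.Lock()
        self._name = name
        self.acquires = 0
        self.waits = 0

    def __enter__(self):
        start = time.time()
        self._lock.acquire()
        waited = time.time() - start
        self.acquires += 1
        if waited > 0.001:  # treat anything over 1 ms as a "wait"
            self.waits += 1
            logging.warning("%s: waited %.3fs to acquire", self._name, waited)
        return self

    def __exit__(self, *exc):
        self._lock.release()

kv_lock = InstrumentedLock("kvLock")
with kv_lock:
    pass  # KV load/save would happen here
```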
Looking at the logs (from the test run), it seems like there was a parallel access (create from VM2 and attach from VM1) for the disk at some point. There are at least two volumes (volTestP04 and volTestP14) that show a scenario that's possible only with two threads hitting the same disk in parallel - each one being a create vs. an attach or get in parallel. Given that the test was done with locking and with parallel accesses, and the logs aren't really complete, the analysis is approximate. But I doubt that the race is inside the ESX system libs.

a. volTestP14 (not all logs are available) - there are two clear errors

i. First, the create happens - the VMDK is created first and then the KV file (which happens to be a sidecar for the VMDK). The VMDK is created here, but the side car creation failed with a "BUSY" error:

11/03/16 08:19:21 2254625 [Ubuntu.1404.vsan-volTestP14] [WARNING] Side car create for /vmfs/volumes/vsanDatastore/dockvols/volTestP14.vmdk failed - 4008
11/03/16 08:19:23 2254625 [Ubuntu.1404.vsan-volTestP14] [WARNING] Failed to create metadata kv store for /vmfs/volumes/vsanDatastore/dockvols/volTestP14.vmdk

And the VMDK is getting removed.

ii. The create request completes, having removed the VMDK:

11/03/16 08:19:25 2254625 [Ubuntu.1404.vsan-volTestP14] [INFO ] executeRequest 'create' completed with ret={u'Error': 'Failed to create metadata kv store for /vmfs/volumes/vsanDatastore/dockvols/

iv. VM1 (??) seems to have issued a Get() which fails right away with a "missing disk" error:

11/03/16 08:19:25 2254625 [Ubuntu.1404.vsan-volTestP14] [INFO ] executeRequest 'get' completed with ret={u'Error': 'Volume volTestP14 not found (file: /vmfs/volumes/vsanDatastore/dockvols/volTest

It's unclear how the get() was issued when the create failed - no logs. And whether there were other calls made that accessed the same VMDK file (since it's there) and caused a busy error to get reported on the KV file create.

b. Client log #2 - this client is VM2, which is doing the create request and got the error:

2016-11-03 01:18:52.785479122 -0700 PDT [INFO] Attaching volume and creating filesystem name=volTestP13 fstype=ext4

a. volTestP04 (logs available) - the issue doesn't happen because of the retry loop on KV save(), but clearly there seems to be a parallel access

i. Volume creation starts:

11/03/16 08:19:17 2254625 [photon.ga.vsan-volTestP04] [INFO ] *** createVMDK: /vmfs/volumes/vsanDatastore/dockvols/volTestP04.vmdk opts = {u'size': u'1gb', u'fstype': u'ext4'}

ii. While another thread seems to be attempting a write of the same side car KV file (which matches the volume attach in client logs from VM1):

11/03/16 08:19:24 2254625 [photon.ga.vsan-volTestP04] [WARNING] Meta file /vmfs/volumes/vsanDatastore/dockvols/volTestP04-42961025dc8ffd7a.vmfd busy for save(), retrying...

iii. The create proceeds and completes, and the guest formats the filesystem on the new disk volTestP04:

11/03/16 08:19:26 2254625 [photon.ga.vsan-volTestP04] [INFO ] executeRequest 'create' completed with ret=None

Overall, the multi-threaded server exposed an issue in the plugin server itself:

a. The plugin server doesn't implement any state for the stages involved when a disk is created. Ideally, the disk shouldn't be available or even visible until the KV is created and populated. Only then should the disk be visible for any request.

b. Locking down access to the KV as done in #708 is ok, but it's not addressing the root issue that a disk can be accessed and used ahead of it being "ready".
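One way to picture the state/visibility idea in point (a) is sketched below. This is only an illustration under assumed names (create_volume, get_volume, the callback parameters); it is not a proposal for the actual vmdk_ops API. A volume becomes visible to other requests only after both the VMDK and its KV exist, and a failed create cleans up before anything can see the half-made disk.

```python
# Sketch: publish a volume only once its metadata KV has been created.
import threading

_visible_volumes = set()
_registry_lock = threading.Lock()

def create_volume(name, create_vmdk, create_kv, remove_vmdk):
    create_vmdk(name)
    try:
        create_kv(name)
    except OSError:
        # KV creation failed (e.g. sidecar busy): remove the VMDK so the
        # disk never becomes visible in a half-initialized state.
        remove_vmdk(name)
        raise
    with _registry_lock:
        _visible_volumes.add(name)  # only now can get/attach find it

def get_volume(name):
    with _registry_lock:
        if name not in _visible_volumes:
            raise LookupError("Volume %s not found" % name)
    return name

# Hypothetical usage with no-op callbacks:
create_volume("volTestP14", lambda n: None, lambda n: None, lambda n: None)
print(get_volume("volTestP14"))
```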
@govint
How do you think that should be handled?
The original issue around tests being flaky due to the vmfd file being deleted has been addressed; renaming this to address the concern raised by @brunotm around serialization of multiple op requests.
Are there any more changes to be made for this issue? @brunotm @kerneltime
@govint So specifically for this issue, if we decide to go the state/visibility path, no. Thanks.
For this issue I'm suggesting going with the last update from @brunotm. There are no more changes needed unless we do see an issue happening in our testing. If we agree, I suggest closing this issue, unless a problem is identified.
Closing per last updates.
A test run for #693 failed.
The failure is not related to the PR. The sanity test, after the multi-threaded change, seems flaky.
build-failure.txt