CI is taking more time than usual while running tests on vsan datastore #1057
Comments
The refcnt test takes between 2 and a little over 3 minutes, and it runs 3 to 4 times in this test (why?). Each time it is waiting for the plugin to restart (after the volumes are created and attached to 5 containers). This looks like a plugin startup issue (managed plugin?). The refcnt test is also flaky because it greps the logs to check that the refcnt for a volume is reported correctly. Let's change the plugin to add a refcnt field to the volume status, so that the refcnt test and any user can see how many users a volume has. The plugin can populate the refcnt field after getting the volume metadata from the ESX service. With the managed plugin, preferably use the API to determine volume usage rather than grepping the plugin logs. Beyond that, the real question is why the plugin took so long to come up and report one volume in use. |
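A minimal sketch of the refcnt-in-status idea, to make the proposal concrete. Everything named here is an assumption for illustration only: `buildStatus`, the `refcnt` key, and the shape of the metadata map are hypothetical and are not the plugin's actual code or API.

```go
package main

import "fmt"

// buildStatus is a hypothetical helper: it copies the volume metadata
// (assumed to come from the ESX service) into a status map and adds a
// "refcnt" entry holding the number of current users of the volume.
// The key name "refcnt" is an assumption, not an existing plugin field.
func buildStatus(meta map[string]interface{}, refcnt int) map[string]interface{} {
	status := make(map[string]interface{}, len(meta)+1)
	for k, v := range meta {
		status[k] = v
	}
	status["refcnt"] = refcnt
	return status
}

func main() {
	// Hypothetical metadata for one volume attached to 5 containers.
	meta := map[string]interface{}{"datastore": "vsanDatastore"}
	status := buildStatus(meta, 5)
	fmt.Printf("datastore=%v refcnt=%v\n", status["datastore"], status["refcnt"])
}
```

With something along these lines, the test could read the count from the volume status returned by the API and compare it with the number of containers it started, instead of grepping the plugin logs.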
Does the degradation apply to the ESX unit tests? |
It seems so to me. Currently test-esx runs twice, once against ESX 6.5 and once against ESX 6.0, and the difference is 3+ minutes (the run against ESX 6.0 takes longer). |
vm-test (go test) runs 4 times: twice from VMs on VMFS backed by ESX 6.0/6.5, and twice from VMs on the VSAN datastore backed by ESX 6.0/6.5. @govint Please let me know if I have missed any question from your comment. If I understand correctly, you are proposing to change the plugin to report the refcount inline with the volume status. |
@shuklanirdesh82 the CI run doesn't have the plugin logs. Where can we get those? |
Ran locally with Photon VMs on VMFS. We should be able to figure out exactly what's causing the slowness on CI (only) if we have the plugin and ESX service logs. |
The testbed (single node vsan) is being or has been retired. Closing this issue unless the slow runs are observed again. |
Reopening this issue; #1088 has disabled tests on vsan60 |
The issue has been root-caused. Logs have been collected for both of the following runs, and we are following up further with the VSAN team.
|
So what is the root cause? |
Keeping this in the open state, as discussed offline (thanks @msterin!) |
Checked with the VSAN performance team using the performance data collected from our run. The outcome is consistent with the VSAN performance team's view as well: creating an object on a vSAN datastore involves some extra steps compared with a VMFS datastore. In addition, our CI uses nested ESX VMs. We plan to improve our test execution, hence closing this issue as |
I've noticed many timeouts today, so I went ahead and grabbed some data that raises some concern.
The following data was grabbed from https://ci.vmware.run/vmware/docker-volume-vsphere/1686 ... It is quite possible that some bad code went in during the past couple of months.
Some observations:
VMs on VSAN vs. VMs on VMFS
TestConcurrency (3462.32s): taking close to 1 hr, which is ridiculous
Test run with test execution time:
on ESX 6.5 using VMs created on VMFS
on ESX 6.5 using VMs created on VSAN
on ESX 6.0 using VMs created on VMFS
on ESX 6.0 using VMs created on VSAN
Note: The last successful run I can recall before testConcurrency was commented out is https://ci.vmware.run/vmware/docker-volume-vsphere/1020, about 2 months back, where the test completion times were not as bad as those shown above.
/cc @govint @msterin @pdhamdhere