[qemu] q35 VFIO passthrough fails on both bridge and pcie-root-port #2664
Comments
Confirmed that forcing isLargeBarSpace to return true also fixes the root-port scenario, having the following log:
So, I tried to reproduce the race condition generically, running
I don't think this depends on the size of the BAR. Even if the problem was solved in the kernel, the result would be to serialize the hotplug and the rescan (making the rescan hold the slot lock?), which does not solve the 5s delay that justified the rescan in the first place. There is still something I'm missing because I cannot reproduce the race on hotplug on the root-port. However, forcing kata to delay the plug (forcing isLargeBarSpace to return true) solves the issue. If I don't do that I simply see:
and nothing else... Regardless of the mysterious disappearance of the device, I see the following ways forward:
@amorenoz I'm running some tests to see if we can get rid of pci-rescan; that way lazy attach won't be needed.
PCI bus rescan code was added a long time ago in Clear Containers due to the lack of ACPI support in QEMU 2.9 + q35 [1]. Now this code is messing up PCIe hotplug in Kata Containers. A workaround for this issue is the "lazy attach" mechanism [2], which hotplugs LBS (Large BAR space) devices after re-scanning the PCI bus. Unfortunately, some non-LBS devices are affected too, for instance SR-IOV devices. It would not make sense to lazy-attach non-LBS devices, because kata would end up lazy-attaching all devices. Therefore, the PCI bus rescan code and the "lazy attach" mechanism should be removed.
fixes kata-containers#781
fixes kata-containers/runtime#2664
[1]: clearcontainers/agent#139
[2]: kata-containers/runtime#2461
Signed-off-by: Julio Montes <julio.montes@intel.com>
The "lazy attach" mechanism [1] was added to hotplug LBS (Large BAR space) devices after re-scanning the PCI bus, fixing LBS hotplug in kata containers. Since PCI rescan is removed in kata-containers/agent#782, lazy attach is no longer needed.
Depends-on: github.com/kata-containers/agent#782
fixes kata-containers#2664
[1] kata-containers#2461
Signed-off-by: Julio Montes <julio.montes@intel.com>
PCI bus rescan code was added a long time ago in Clear Containers due to the lack of ACPI support in QEMU 2.9 + q35 [1]. Now this code is messing up PCIe hotplug in Kata Containers. A workaround for this issue is the "lazy attach" mechanism [2], which hotplugs LBS (Large BAR space) devices after re-scanning the PCI bus. Unfortunately, some non-LBS devices are affected too, for instance SR-IOV devices. It would not make sense to lazy-attach non-LBS devices, because kata would end up lazy-attaching all devices. Therefore, the PCI bus rescan code and the "lazy attach" mechanism should be removed.
Depends-on: github.com/kata-containers/runtime#2670
fixes kata-containers#781
fixes kata-containers/runtime#2664
[1] clearcontainers/agent#139
[2] kata-containers/runtime#2461
Signed-off-by: Julio Montes <julio.montes@intel.com>
Thanks @devimc. In the tests I ran, disabling the rescan worked fine. Also, the pcie-root-port hotplug did not incur the 5s delay.
Splitting Problem 2 described above into a new issue: #2678
The presence of /sys/bus/pci/slots seems to be very random; looking at 4 machines, 2 of them have it and 2 of them have it completely empty, even though there are PCIe devices in there.
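For anyone who wants to compare machines, here is a minimal sketch that distinguishes the three cases (directory absent, present but empty, populated). The function name `list_pci_slots` is made up for illustration; only the `/sys/bus/pci/slots` path comes from the comment above.

```python
import os
from typing import List, Optional


def list_pci_slots(slots_dir: str = "/sys/bus/pci/slots") -> Optional[List[str]]:
    """Return sorted slot names, [] if the directory is empty, None if it is absent.

    `slots_dir` is a parameter only so the check can be pointed at a test
    directory; the default is the sysfs path mentioned in this thread.
    """
    if not os.path.isdir(slots_dir):
        return None
    return sorted(os.listdir(slots_dir))


if __name__ == "__main__":
    slots = list_pci_slots()
    if slots is None:
        print("slots directory is absent")
    elif not slots:
        print("slots directory exists but is empty")
    else:
        print("slots:", ", ".join(slots))
```

Running it on each machine should make it easier to tell whether "has it" means the directory exists or that it actually contains slot entries.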
The original report covers two problems. The |
We send information about several kinds of devices to the agent so that it can apply specific handling. We don't currently do this with VFIO devices. However, we need to do that so that the agent can properly wait for VFIO devices to be ready (previously it did that using a PCI rescan, which may not be reliable and has some very bad side effects). This patch collates and sends the relevant information.
Depends-on: github.com/kata-containers/agent#850
fixes kata-containers#2664
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
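To illustrate what "waiting for a VFIO device to be ready" could look like without a rescan, here is a rough sketch that polls sysfs until the device directory shows up in the guest. This is not the agent's implementation (a real agent would more likely react to uevents); `wait_for_pci_device`, its parameters, and the example addresses are all hypothetical.

```python
import os
import time


def wait_for_pci_device(
    bdf: str,
    timeout: float = 5.0,
    interval: float = 0.05,
    devices_dir: str = "/sys/bus/pci/devices",
) -> bool:
    """Poll until the sysfs entry for `bdf` exists or `timeout` seconds elapse.

    Returns True if the device appeared, False on timeout. No bus rescan
    is triggered; the function only observes what the kernel has enumerated.
    """
    path = os.path.join(devices_dir, bdf)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The point of the sketch is that the waiter needs to know which guest address to expect, which is why the runtime has to send that information to the agent.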
The "lazy attach" mechanism [1] was added to hotplug LBS (Large BAR space) devices after re-scanning the PCI bus, fixing LBS hotplug in kata containers. Since PCI rescan is removed in kata-containers/agent#782, lazy attach is no longer needed.
fixes kata-containers#2664
[1] kata-containers#2461
Signed-off-by: Julio Montes <julio.montes@intel.com>
I'm no longer planning to pursue this in Kata 1; I'll be following up in Kata 2 instead.
I have recently tried VFIO passthrough with and without SR-IOV and detected a number of problems that make it fail.
Reporting them as a single issue since they all contribute to "VFIO passthrough not working". Let me know if you prefer to split them.
The below tests were performed with the following device:
65:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
Description of the problem
In general, trying to add a PF or VF via VFIO passthrough fails.
Problem 1: pci-rescan
The pci-rescan triggered by kata-agent makes shpchp fail. Basically, the same as reported in #2460.
@devimc pointed me to the origin of the rescan, which seems to be related to the lack of ACPI hotplug support in old versions of qemu.
Question: Could we implement a mechanism to tell the agent whether a rescan is needed?
Should we keep the rescan as the default behaviour in that case?
Alternatively, could the mechanism that was implemented to fix #2460 be extended to support more than just devices with large BARs?
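To make the first question concrete, here is a minimal sketch of a rescan gated behind a flag. Writing `1` to `/sys/bus/pci/rescan` is the standard kernel interface for a full PCI rescan (it needs root); the `rescan_needed` flag and both function names are hypothetical and only illustrate the idea of the runtime telling the agent what to do.

```python
def rescan_pci_bus(rescan_path: str = "/sys/bus/pci/rescan") -> None:
    """Ask the kernel to rescan all PCI buses by writing "1" to the sysfs file."""
    with open(rescan_path, "w") as f:
        f.write("1")


def prepare_for_hotplug(
    rescan_needed: bool, rescan_path: str = "/sys/bus/pci/rescan"
) -> bool:
    """Rescan only when the caller says it is needed.

    `rescan_needed` stands in for whatever signal the runtime would send
    to the agent (hypothetical). Returns True if a rescan was triggered.
    """
    if rescan_needed:
        rescan_pci_bus(rescan_path)
    return rescan_needed
```

With something like this, machines that have working ACPI/PCIe hotplug could leave the flag off and avoid racing the rescan against the hotplug handler.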
Failing dmesg:
And, lspci -vv reports:
FWIW: Forcing isLargeBarSpace to return true eliminates the issue.
This problem is reproducible with both the PF and one of its VFs.
Problem 2: isPcieDevice does not account for pcie-root-port
When trying to work around problem 1, I tried to take the pcie-pci-bridge out of the picture, so I enabled:
and tried to add the PF:
Looking at the logs, the following caught my eye:
especially the part where it says it's not a PCIe device.
Looking at isPCIeDevice and at my PCI tree:
it seems it's not being detected as a PCIe device despite being connected to a pcie-root-port.
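For reference, one common way to tell PCI from PCIe via sysfs is the size of the device's `config` file: conventional PCI exposes 256 bytes of configuration space, while PCIe exposes 4096. The sketch below uses that heuristic; whether isPCIeDevice does exactly this is my assumption, and `is_pcie_device` is an illustrative name.

```python
import os

# Conventional PCI configuration space is 256 bytes; PCIe extends it to 4096.
PCI_CONFIG_SPACE_SIZE = 256


def is_pcie_device(bdf: str, devices_dir: str = "/sys/bus/pci/devices") -> bool:
    """Guess whether `bdf` is PCIe from the size of its sysfs config file.

    Returns False when the file is missing or unreadable. Note that the
    size reflects what the kernel could access, so a PCIe device whose
    extended configuration space is unreachable would look like plain PCI.
    """
    config_path = os.path.join(devices_dir, bdf, "config")
    try:
        return os.stat(config_path).st_size > PCI_CONFIG_SPACE_SIZE
    except OSError:
        return False
```

A check of this kind looks only at the device itself, not at what it is (or will be) plugged into.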
Note that even forcing the hotplug into the pci-root-device does not work. However, repeating the process manually (via QMP) does work, which makes me think the use of pci-root-device is still affected by the rescan race condition described above.
Expected result
We should be able to add both PFs and VFs to a kata container, using either a bridge or a pcie-root-port.
Actual result
It is not possible to add a PF or VF to a kata container, either on a bridge or on a pcie-root-port.
Credits:
Thanks to @devimc for his help troubleshooting the issues