This repository has been archived by the owner on May 12, 2021. It is now read-only.
[qemu] q35: PCI bus rescan code is messing up PCIe hotplug #781
devimc pushed a commit to devimc/kata-agent that referenced this issue on May 7, 2020
PCI bus rescan code was added a long time ago in Clear Containers due to the lack of ACPI support in QEMU 2.9 + q35 [1]. Now this code is messing up PCIe hotplug in Kata Containers. A workaround for this issue is the "lazy attach" mechanism [2], which hotplugs LBS (Large BAR Space) devices after re-scanning the PCI bus. Unfortunately, some non-LBS devices, for instance SR-IOV devices, are affected too. It would not make sense to lazy-attach non-LBS devices, because Kata would end up lazy-attaching all devices. Therefore, the PCI bus rescan code and the "lazy attach" mechanism should be removed.

fixes kata-containers#781
fixes kata-containers/runtime#2664

[1]: clearcontainers/agent#139
[2]: kata-containers/runtime#2461

Signed-off-by: Julio Montes <julio.montes@intel.com>
devimc pushed a commit to devimc/kata-agent that referenced this issue on May 7, 2020
PCI bus rescan code was added a long time ago in Clear Containers due to the lack of ACPI support in QEMU 2.9 + q35 [1]. Now this code is messing up PCIe hotplug in Kata Containers. A workaround for this issue is the "lazy attach" mechanism [2], which hotplugs LBS (Large BAR Space) devices after re-scanning the PCI bus. Unfortunately, some non-LBS devices, for instance SR-IOV devices, are affected too. It would not make sense to lazy-attach non-LBS devices, because Kata would end up lazy-attaching all devices. Therefore, the PCI bus rescan code and the "lazy attach" mechanism should be removed.

Depends-on: github.com/kata-containers/runtime#2670
fixes kata-containers#781
fixes kata-containers/runtime#2664

[1]: clearcontainers/agent#139
[2]: kata-containers/runtime#2461

Signed-off-by: Julio Montes <julio.montes@intel.com>
devimc pushed a commit to devimc/kata-agent that referenced this issue on May 8, 2020
This issue covers a number of separate bugs. The most relevant one (and the one not covered by other, more specific issues) is the problem of PCI rescans colliding with hotplug operations. Can we retitle the issue to reflect that?
@dgibson title updated.
devimc pushed a commit to devimc/kata-agent that referenced this issue on Sep 4, 2020
This was referenced Sep 23, 2020
As @jodh-intel pointed out in #782, removing the rescan has a side effect: we no longer wait until the hotplugged device is ready before starting the container, which breaks the CI (and probably other things). I'm submitting a new PR that includes @devimc's patch to remove the rescan, along with extra logic to have the agent explicitly wait for the VFIO hotplug to complete.
I'm no longer planning to get this finished for Kata 1; I'll be pursuing it in Kata 2 instead.
see kata-containers/runtime#2664