This repository has been archived by the owner on May 12, 2021. It is now read-only.

[qemu] q35: PCI bus rescan code is messing up PCIe hotplug #781

Closed
devimc opened this issue May 7, 2020 · 4 comments

Comments

@devimc

devimc commented May 7, 2020

see kata-containers/runtime#2664

@devimc devimc added bug Incorrect behaviour needs-review Needs to be assessed by the team. labels May 7, 2020
@devimc devimc self-assigned this May 7, 2020
devimc pushed a commit to devimc/kata-agent that referenced this issue May 7, 2020
The PCI bus rescan code was added long ago in Clear Containers because QEMU 2.9
+ q35 lacked ACPI support [1]. This code now interferes with PCIe hotplug in
Kata Containers. The current workaround is the "lazy attach" mechanism [2],
which hotplugs LBS (Large BAR space) devices after rescanning the PCI bus.
Unfortunately, some non-LBS devices, for instance SR-IOV devices, are affected
too. Lazy-attaching non-LBS devices would not make sense, because kata would
end up lazy-attaching all devices. Therefore, the PCI bus rescan code and the
"lazy attach" mechanism should be removed.

fixes kata-containers#781
fixes kata-containers/runtime#2664

[1]: clearcontainers/agent#139
[2]: kata-containers/runtime#2461

Signed-off-by: Julio Montes <julio.montes@intel.com>
devimc pushed a commit to devimc/kata-agent that referenced this issue May 7, 2020
The PCI bus rescan code was added long ago in Clear Containers because QEMU 2.9
+ q35 lacked ACPI support [1]. This code now interferes with PCIe hotplug in
Kata Containers. The current workaround is the "lazy attach" mechanism [2],
which hotplugs LBS (Large BAR space) devices after rescanning the PCI bus.
Unfortunately, some non-LBS devices, for instance SR-IOV devices, are affected
too. Lazy-attaching non-LBS devices would not make sense, because kata would
end up lazy-attaching all devices. Therefore, the PCI bus rescan code and the
"lazy attach" mechanism should be removed.

Depends-on: github.com/kata-containers/runtime#2670
fixes kata-containers#781
fixes kata-containers/runtime#2664

[1] clearcontainers/agent#139
[2] kata-containers/runtime#2461

Signed-off-by: Julio Montes <julio.montes@intel.com>
devimc pushed a commit to devimc/kata-agent that referenced this issue May 8, 2020
@dgibson
Contributor

dgibson commented Sep 3, 2020

This issue covers a number of separate bugs. The most relevant one (and the one not covered by other, more specific issues) is the problem of PCI rescans colliding with hotplug operations. Can we retitle it to reflect that?

@devimc devimc changed the title from "[qemu] q35 VFIO passthrough fails on both bridge and pcie-root-port" to "[qemu] q35: PCI bus rescan code is messing up PCIe hotplug" Sep 3, 2020
@devimc
Author

devimc commented Sep 3, 2020

@dgibson title updated

devimc pushed a commit to devimc/kata-agent that referenced this issue Sep 4, 2020
@dgibson
Contributor

dgibson commented Sep 23, 2020

As @jodh-intel pointed out in #782, removing the rescan has the side effect that we don't wait until the hotplugged device is ready before starting the container, which breaks the CI (probably amongst other things).

I'm submitting a new PR which includes @devimc's patch to remove the rescan along with extra logic to have the agent explicitly wait for the VFIO hotplug to complete.

@dgibson
Contributor

dgibson commented Dec 18, 2020

I'm no longer planning to try to get this finished for Kata1; I'll be pursuing it with Kata2 instead.
