[B,0/1] Bionic kernel 4.15.0-136 causes VMM freezes due to lack of KVM patch

Message ID 20210303213348.31319-1-gpiccoli@canonical.com

Message

Guilherme G. Piccoli March 3, 2021, 9:33 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1917138

[Impact]
* Since kernel 4.15.0-136, the Bionic kernel includes a complex KVM fix for a
race in the interrupt-window handling with split irqchip (reported in [0]).
The fix was proposed as a series of 2 patches [1]; it reached Ubuntu through
the stable tree as the following commit:
71cc849b7093 ("KVM: x86: Fix split-irqchip vs interrupt injection window request") [2]

* The problem is that this commit requires a companion commit, which was not
proposed for the stable tree. In fact, the missing commit caused confusion
between the KVM community and the stable maintainer [3]; because of that, the
series was dropped from the 4.14.y and 4.9.y stable trees, but the lone commit
was still merged into the Ubuntu kernel.

* Without the companion patch, KVM can end up in an infinite "loop" in the
core IRQ handling, because the merged commit requires an extra check in
kvm_cpu_has_extint() and an inverted condition in kvm_cpu_get_extint(), both
present only in the missing companion patch. Users reported that this shows up
as dosemu2 (running in KVM mode) getting stuck on kernels 4.15.0-136 and -137,
while it works fine on 4.15.0-135 and on -137 with the companion patch applied.

* So, we backport the companion commit, which upstream is:
72c3bcdcda ("KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint") [4]

[Test Case]
* The proposed test case is the reported bug itself: run dosemu2 with KVM
mode enabled; it fails without the companion commit.

* To test the correctness of both fixes together, we could rely on the test
proposed in [0] (running a guest with "noapic"), but it was not consistent and
the VMM used was not mentioned, so a workaround mechanism in QEMU, for example,
might prevent this test from reproducing the issue.

[Where problems could occur]
* Since this is a KVM core modification, it could affect interrupt handling in
KVM; however, without the fix we are already experiencing a bug. Also, both
commits were backported to 5.4.y and 4.19.y, so Focal and subsequent releases
are already running with them.

[0] https://lore.kernel.org/kvm/62918f65ec78f8990278a6a0db0567968fa23e49.camel@infradead.org/

[1] https://lore.kernel.org/kvm/20201127112114.3219360-1-pbonzini@redhat.com/

[2] http://git.kernel.org/linus/71cc849b70

[3] https://lore.kernel.org/stable/d29c4b25-33f6-8d99-7a45-8f4e06f5ade6@redhat.com/

[4] http://git.kernel.org/linus/72c3bcdcda


Paolo Bonzini (1):
  KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint

 arch/x86/kvm/irq.c   | 65 ++++++++++++++++++++++++--------------------
 arch/x86/kvm/lapic.c |  2 +-
 2 files changed, 37 insertions(+), 30 deletions(-)