[0/1,SRU,B,F] KVM emulation failure when booting into VM crash kernel with multiple CPUs

Message ID 20211027171516.313906-1-halves@canonical.com

Heitor Alves de Siqueira Oct. 27, 2021, 5:15 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1948862

[Impact]
When kexec'ing into a crash kernel with more than one CPU enabled, VMs
can raise a KVM emulation failure. This causes the VM to go into the
"paused" state and prevents it from being restored without a full VM
restart.

This happens only when there are multiple enabled CPUs in the crash
kernel command-line, regardless of whether `nr_cpus` or `maxcpus` is
being used. Because the vCPU MMU state is not cleaned up correctly,
the secondary CPUs try to access virtual addresses with a faulty MMU
context, which results in the emulation failure. The QEMU log then
shows output similar to the following:

$ sudo tail -n20 /var/log/libvirt/qemu/focal-vm.log
KVM internal error. Suberror: 1
emulation failure
EAX=0000de8f EBX=00000000 ECX=0000008f EDX=00000600
ESI=00000000 EDI=00000000 EBP=00000000 ESP=0000f90c
EIP=0000cdb1 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 000f0000 0000ffff 00009b00
SS =de00 000de000 0000ffff 00009300
DS =de00 000de000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=290b8001 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=66 83 c4 28 66 5b 66 c3 66 56 66 53 66 52 b1 8f 88 c8 e6 70 <e4> 71 66 0f
b6 f0 66 89 f2 67 88 54 24 03 88 c8 e6 70 66 31 db 88 d8 e6 71 66 56 66 68 1a
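
To check for this signature without reading the log by hand, a minimal
sketch along these lines could be used. The function names are
hypothetical, and the patterns match only the two marker lines shown in
the excerpt above, so other KVM internal errors may need different
patterns.

```shell
# Succeeds if the given QEMU log records a KVM emulation failure.
# Both marker lines from the excerpt must be present in the file.
has_kvm_emulation_failure() {
    grep -q '^KVM internal error' "$1" && grep -q '^emulation failure' "$1"
}

# Prints the suberror number(s) reported by KVM, one per line.
kvm_suberror() {
    sed -n 's/^KVM internal error\. Suberror: \([0-9][0-9]*\).*/\1/p' "$1"
}
```

Run as a user who can read the log, for example against
/var/log/libvirt/qemu/focal-vm.log.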

[Test Plan]
1. Boot an Ubuntu guest VM with e.g. multipass:
$ multipass launch daily:focal -c8 -m16g -n focal-vm

2. Configure guest crash kernel command-line with `nr_cpus=8`:
ubuntu@focal-vm:~$ grep CMDLINE_APPEND /etc/default/kdump-tools
# KDUMP_CMDLINE_APPEND - Additional arguments to append to the command line
KDUMP_CMDLINE_APPEND="reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=8 irqpoll nousb ata_piix.prefer_ms_hyperv=0"

3. Crash guest VM and watch for the KVM emulation failure:
ubuntu@focal-vm:~$ echo c | sudo tee /proc/sysrq-trigger
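
The configuration in step 2 and the expected outcome of step 3 can be
checked with a small sketch like the one below. Both function names are
hypothetical. The second helper assumes the guest is managed by libvirt
(as the log path in [Impact] suggests) and that the domain is named
focal-vm; it relies on `virsh domstate`, which prints the current
domain state.

```shell
# Prints the CPU count requested via nr_cpus= or maxcpus= in a kernel
# command line string and succeeds; fails if neither option is present.
# The argument is split on whitespace deliberately, so it is unquoted.
crash_cpu_count() {
    for arg in $1; do
        case $arg in
            nr_cpus=*|maxcpus=*)
                echo "${arg#*=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Succeeds if libvirt reports the named domain as paused (run on the host).
vm_is_paused() {
    [ "$(virsh domstate "$1" 2>/dev/null)" = "paused" ]
}
```

Inside the guest, /etc/default/kdump-tools can be sourced and
"$KDUMP_CMDLINE_APPEND" passed to crash_cpu_count; a value greater than
1 means the crash kernel will bring up secondary CPUs. On an affected
host, `vm_is_paused focal-vm` is expected to succeed after step 3.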

[Where problems could occur]
As we're resetting MMU context on vCPUs, potential regressions would
show up in workloads relying on KVM guests. We should properly test
the scenario mentioned in the bug to make sure secondary CPUs are
being cleaned up properly, and that no other regressions have been
introduced when rebooting or kexec'ing into different kernels.
Since we're adding an MMU reset in kvm_vcpu_reset(), the overall
regression potential should be fairly low and contained to
starting/resetting vCPUs (i.e. VM start and reboot).

[Other info]
This has been fixed by upstream commit:
  0aa1837533e5 KVM: x86: Properly reset MMU context at vCPU RESET/INIT

The commit above has been picked up by stable trees up until 5.11, so
it's only needed in Bionic and Focal (4.15 and 5.4 kernels). There are
also two follow-up commits, which remove the vendor-specific resets:
  5d2d7e41e3b8 KVM: SVM: Drop explicit MMU reset at RESET/INIT
  61152cd907d5 KVM: VMX: Remove explicit MMU reset in enter_rmode()

These follow-ups have not been picked up in stable trees due to the
risk of regressions. According to the original fix, they were split
out primarily to aid bisection in case there are workflows relying on
the vendor resets. As they are not required for the fix and don't
conflict with the backport, we should leave them out to prevent
potential regressions in the older kernels.
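
To see whether a given kernel tree already carries the fix, a rough
sketch is to search commit messages for the upstream hash. This assumes
the backport records its origin in the commit message with a line such
as "(cherry picked from commit ...)" or "commit ... upstream."; these
are conventions, not guarantees, and the function name is hypothetical.

```shell
# Abbreviated upstream hash of the fix, as listed above.
FIX_COMMIT=0aa1837533e5

# Reads commit messages on stdin and succeeds if any line references
# the upstream fix as "commit <hash>" (case-insensitive).
mentions_fix() {
    grep -q -i "commit ${FIX_COMMIT}"
}
```

For example, from a checkout of the kernel tree:
`git log --format=%B -- arch/x86/kvm/x86.c | mentions_fix && echo present`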

Sean Christopherson (1):
  KVM: x86: Properly reset MMU context at vCPU RESET/INIT

 arch/x86/kvm/x86.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Tim Gardner Oct. 27, 2021, 5:37 p.m. UTC | #1
Acked-by: Tim Gardner <tim.gardner@canonical.com>

Kleber Sacilotto de Souza Nov. 4, 2021, 4:59 p.m. UTC | #2
Applied to [bionic/focal]:linux.

Thanks,
Kleber