diff mbox

[2/2,V2] qemu-kvm: fix improper nmi emulation

Message ID 4E9476E2.1070804@cn.fujitsu.com
State New
Headers show

Commit Message

Lai Jiangshan Oct. 11, 2011, 5:03 p.m. UTC
From: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>

Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
button event happens. This doesn't properly emulate real hardware on
which NMI button event triggers LINT1. Because of this, NMI is sent to
the processor even when LINT1 is maskied in LVT. For example, this
causes the problem that kdump initiated by NMI sometimes doesn't work
on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.

With this patch, inject-nmi request is handled as follows.

- When in-kernel irqchip is disabled, inject LINT1 instead of NMI
  interrupt.
- When in-kernel irqchip is enabled, send nmi event to kernel as the
  current code does. LINT1 should be emulated in kernel.

(laijs) changed from v1:
use KVM_CAP_LAPIC_NMI
adjust the pic_deliver_nmi() API

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Tested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 hw/apic.c |   26 ++++++++++++++++++++++++++
 hw/apic.h |    1 +
 monitor.c |    6 +++++-
 3 files changed, 32 insertions(+), 1 deletions(-)

Comments

Jan Kiszka Oct. 11, 2011, 6:17 p.m. UTC | #1
On 2011-10-11 19:03, Lai Jiangshan wrote:
> From: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
> 
> Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
> button event happens. This doesn't properly emulate real hardware on
> which NMI button event triggers LINT1. Because of this, NMI is sent to
> the processor even when LINT1 is maskied in LVT. For example, this
> causes the problem that kdump initiated by NMI sometimes doesn't work
> on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.
> 
> With this patch, inject-nmi request is handled as follows.
> 
> - When in-kernel irqchip is disabled, inject LINT1 instead of NMI
>   interrupt.
> - When in-kernel irqchip is enabled, send nmi event to kernel as the
>   current code does. LINT1 should be emulated in kernel.
> 
> (laijs) changed from v1:
> use KVM_CAP_LAPIC_NMI
> adjust the pic_deliver_nmi() API
> 
> Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
> Tested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
>  hw/apic.c |   26 ++++++++++++++++++++++++++
>  hw/apic.h |    1 +
>  monitor.c |    6 +++++-
>  3 files changed, 32 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/apic.c b/hw/apic.c
> index 69d6ac5..76e8208 100644
> --- a/hw/apic.c
> +++ b/hw/apic.c
> @@ -205,6 +205,32 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
>      }
>  }
>  
> +void apic_deliver_nmi(DeviceState *d)
> +{
> +    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
> +
> +#ifdef KVM_CAP_LAPIC_NMI

Unneeded #ifdef, x86 has this defined unconditionally.

> +    static int kernel_lapic_nmi;

Some enum with symbolic states would be more readable.

> +
> +    if (kernel_lapic_nmi == 0) {
> +        if (!kvm_enabled() || !kvm_irqchip_in_kernel() ||
> +            !kvm_check_extension(kvm_state, KVM_CAP_LAPIC_NMI)) {

This is wrong: If we run with in-kernel irqchip on an old kernel without
KVM_CAP_LAPIC_NMI, we still must not call into the user space APIC model.

As explained in some other mail, we could then emulate the missing
kernel feature by reading out the current in-kernel APIC state, testing
if LINT1 is unmasked, and then delivering the NMI directly.

Jan
Lai Jiangshan Oct. 14, 2011, 12:53 a.m. UTC | #2
> 
> As explained in some other mail, we could then emulate the missing
> kernel feature by reading out the current in-kernel APIC state, testing
> if LINT1 is unmasked, and then delivering the NMI directly.
> 

Only the thread of the VCPU can safely get the in-kernel LAPIC states,
so this approach will cause some troubles.

Since it is a kernel bug, I think we need to change the kernel.

Thanks,
lai
Jan Kiszka Oct. 14, 2011, 5:53 a.m. UTC | #3
On 2011-10-14 02:53, Lai Jiangshan wrote:
> 
>>
>> As explained in some other mail, we could then emulate the missing
>> kernel feature by reading out the current in-kernel APIC state, testing
>> if LINT1 is unmasked, and then delivering the NMI directly.
>>
> 
> Only the thread of the VCPU can safely get the in-kernel LAPIC states,
> so this approach will cause some troubles.

run_on_cpu() can help.

Jan
diff mbox

Patch

diff --git a/hw/apic.c b/hw/apic.c
index 69d6ac5..76e8208 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -205,6 +205,32 @@  void apic_deliver_pic_intr(DeviceState *d, int level)
     }
 }
 
+void apic_deliver_nmi(DeviceState *d)
+{
+    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
+
+#ifdef KVM_CAP_LAPIC_NMI
+    static int kernel_lapic_nmi;
+
+    if (kernel_lapic_nmi == 0) {
+        if (!kvm_enabled() || !kvm_irqchip_in_kernel() ||
+            !kvm_check_extension(kvm_state, KVM_CAP_LAPIC_NMI)) {
+            kernel_lapic_nmi = -1;
+        } else {
+            kernel_lapic_nmi = 1;
+        }
+    }
+#else
+    int kernel_lapic_nmi = -1;
+#endif
+
+    if (kernel_lapic_nmi == 1) {
+        cpu_interrupt(s->cpu_env, CPU_INTERRUPT_NMI);
+    } else {
+        apic_local_deliver(s, APIC_LVT_LINT1);
+    }
+}
+
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
     int __i, __j, __mask;\
diff --git a/hw/apic.h b/hw/apic.h
index c857d52..3a4be0a 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -10,6 +10,7 @@  void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
                              uint8_t trigger_mode);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
+void apic_deliver_nmi(DeviceState *d);
 int apic_get_interrupt(DeviceState *s);
 void apic_reset_irq_delivered(void);
 int apic_get_irq_delivered(void);
diff --git a/monitor.c b/monitor.c
index cb485bf..0b81f17 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2616,7 +2616,11 @@  static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data)
     CPUState *env;
 
     for (env = first_cpu; env != NULL; env = env->next_cpu) {
-        cpu_interrupt(env, CPU_INTERRUPT_NMI);
+        if (!env->apic_state) {
+            cpu_interrupt(env, CPU_INTERRUPT_NMI);
+        } else {
+            apic_deliver_nmi(env->apic_state);
+        }
     }
 
     return 0;