diff mbox

[1/1,V4] qemu-kvm: fix improper nmi emulation

Message ID 4E97D85C.7070107@cn.fujitsu.com
State New
Headers show

Commit Message

Lai Jiangshan Oct. 14, 2011, 6:36 a.m. UTC
On 10/14/2011 01:53 PM, Jan Kiszka wrote:
> On 2011-10-14 02:53, Lai Jiangshan wrote:
>>
>>>
>>> As explained in some other mail, we could then emulate the missing
>>> kernel feature by reading out the current in-kernel APIC state, testing
>>> if LINT1 is unmasked, and then delivering the NMI directly.
>>>
>>
>> Only the thread of the VCPU can safely get the in-kernel LAPIC states,
>> so this approach will cause some troubles.
> 
> run_on_cpu() can help.
> 
> Jan
> 

Ah, I forgot it, Thanks.

From: Lai Jiangshan <laijs@cn.fujitsu.com>

Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
button event happens. This doesn't properly emulate real hardware on
which NMI button event triggers LINT1. Because of this, NMI is sent to
the processor even when LINT1 is maskied in LVT. For example, this
causes the problem that kdump initiated by NMI sometimes doesn't work
on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.

With this patch, inject-nmi request is handled as follows.

- When in-kernel irqchip is disabled, deliver LINT1 instead of NMI
  interrupt.
- When in-kernel irqchip is enabled, get the in-kernel LAPIC states
  and test the APIC_LVT_MASKED, if LINT1 is unmasked, and then
  delivering the NMI directly. (Suggested by Jan Kiszka)

Changed from old version:
  re-implement it by the Jan's suggestion.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reported-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
---
 hw/apic.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
 hw/apic.h |    1 +
 monitor.c |    6 +++++-
 3 files changed, 54 insertions(+), 1 deletions(-)

Comments

Jan Kiszka Oct. 14, 2011, 6:49 a.m. UTC | #1
On 2011-10-14 08:36, Lai Jiangshan wrote:
> On 10/14/2011 01:53 PM, Jan Kiszka wrote:
>> On 2011-10-14 02:53, Lai Jiangshan wrote:
>>>
>>>>
>>>> As explained in some other mail, we could then emulate the missing
>>>> kernel feature by reading out the current in-kernel APIC state, testing
>>>> if LINT1 is unmasked, and then delivering the NMI directly.
>>>>
>>>
>>> Only the thread of the VCPU can safely get the in-kernel LAPIC states,
>>> so this approach will cause some troubles.
>>
>> run_on_cpu() can help.
>>
>> Jan
>>
> 
> Ah, I forgot it, Thanks.
> 
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> 
> Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
> button event happens. This doesn't properly emulate real hardware on
> which NMI button event triggers LINT1. Because of this, NMI is sent to
> the processor even when LINT1 is maskied in LVT. For example, this
> causes the problem that kdump initiated by NMI sometimes doesn't work
> on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.
> 
> With this patch, inject-nmi request is handled as follows.
> 
> - When in-kernel irqchip is disabled, deliver LINT1 instead of NMI
>   interrupt.
> - When in-kernel irqchip is enabled, get the in-kernel LAPIC states
>   and test the APIC_LVT_MASKED, if LINT1 is unmasked, and then
>   delivering the NMI directly. (Suggested by Jan Kiszka)
> 
> Changed from old version:
>   re-implement it by the Jan's suggestion.
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Reported-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
> ---
>  hw/apic.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/apic.h |    1 +
>  monitor.c |    6 +++++-
>  3 files changed, 54 insertions(+), 1 deletions(-)
> diff --git a/hw/apic.c b/hw/apic.c
> index 69d6ac5..9a40129 100644
> --- a/hw/apic.c
> +++ b/hw/apic.c
> @@ -205,6 +205,54 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
>      }
>  }
>  
> +#ifdef KVM_CAP_IRQCHIP

Again, this is always defined on x86 thus pointless to test.

> +static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
> +
> +struct kvm_get_remote_lapic_params {
> +    CPUState *env;
> +    struct kvm_lapic_state klapic;
> +};
> +
> +static void kvm_get_remote_lapic(void *p)
> +{
> +    struct kvm_get_remote_lapic_params *params = p;
> +
> +    kvm_get_lapic(params->env, &params->klapic);

When you already interrupted that vcpu, why not inject from here? Avoids
one further ping-pong round.

> +}
> +
> +void apic_deliver_nmi(DeviceState *d)
> +{
> +    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
> +
> +    if (kvm_irqchip_in_kernel()) {
> +        struct kvm_get_remote_lapic_params p = {.env = s->cpu_env,};
> +        uint32_t lvt;
> +
> +        run_on_cpu(s->cpu_env, kvm_get_remote_lapic, &p);
> +        lvt = kapic_reg(&p.klapic, 0x32 + APIC_LVT_LINT1);
> +
> +        if (lvt & APIC_LVT_MASKED) {
> +            return;
> +        }
> +
> +        if (((lvt >> 8) & 7) != APIC_DM_NMI) {
> +            return;
> +        }
> +
> +        cpu_interrupt(s->cpu_env, CPU_INTERRUPT_NMI);

Err, aren't you introducing KVM_CAP_LAPIC_NMI that allows to test if
this workaround is needed? Oh, your latest kernel patch is missing this
again - requires fixing as well.

Jan
Lai Jiangshan Oct. 14, 2011, 7:43 a.m. UTC | #2
On 10/14/2011 02:49 PM, Jan Kiszka wrote:
> On 2011-10-14 08:36, Lai Jiangshan wrote:
>> On 10/14/2011 01:53 PM, Jan Kiszka wrote:
>>> On 2011-10-14 02:53, Lai Jiangshan wrote:
>>>>
>>>>>
>>>>> As explained in some other mail, we could then emulate the missing
>>>>> kernel feature by reading out the current in-kernel APIC state, testing
>>>>> if LINT1 is unmasked, and then delivering the NMI directly.
>>>>>
>>>>
>>>> Only the thread of the VCPU can safely get the in-kernel LAPIC states,
>>>> so this approach will cause some troubles.
>>>
>>> run_on_cpu() can help.
>>>
>>> Jan
>>>
>>
>> Ah, I forgot it, Thanks.
>>
>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>
>> Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
>> button event happens. This doesn't properly emulate real hardware on
>> which NMI button event triggers LINT1. Because of this, NMI is sent to
>> the processor even when LINT1 is maskied in LVT. For example, this
>> causes the problem that kdump initiated by NMI sometimes doesn't work
>> on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.
>>
>> With this patch, inject-nmi request is handled as follows.
>>
>> - When in-kernel irqchip is disabled, deliver LINT1 instead of NMI
>>   interrupt.
>> - When in-kernel irqchip is enabled, get the in-kernel LAPIC states
>>   and test the APIC_LVT_MASKED, if LINT1 is unmasked, and then
>>   delivering the NMI directly. (Suggested by Jan Kiszka)
>>
>> Changed from old version:
>>   re-implement it by the Jan's suggestion.
>>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Reported-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
>> ---
>>  hw/apic.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  hw/apic.h |    1 +
>>  monitor.c |    6 +++++-
>>  3 files changed, 54 insertions(+), 1 deletions(-)
>> diff --git a/hw/apic.c b/hw/apic.c
>> index 69d6ac5..9a40129 100644
>> --- a/hw/apic.c
>> +++ b/hw/apic.c
>> @@ -205,6 +205,54 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
>>      }
>>  }
>>  
>> +#ifdef KVM_CAP_IRQCHIP
> 
> Again, this is always defined on x86 thus pointless to test.
> 
>> +static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
>> +
>> +struct kvm_get_remote_lapic_params {
>> +    CPUState *env;
>> +    struct kvm_lapic_state klapic;
>> +};
>> +
>> +static void kvm_get_remote_lapic(void *p)
>> +{
>> +    struct kvm_get_remote_lapic_params *params = p;
>> +
>> +    kvm_get_lapic(params->env, &params->klapic);
> 
> When you already interrupted that vcpu, why not inject from here? Avoids
> one further ping-pong round.

get_remote_lapic and inject nmi are two different things,
so I don't inject nmi from here. I didn't notice this ping-pond overhead.
Thank you.

> 
>> +}
>> +
>> +void apic_deliver_nmi(DeviceState *d)
>> +{
>> +    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
>> +
>> +    if (kvm_irqchip_in_kernel()) {
>> +        struct kvm_get_remote_lapic_params p = {.env = s->cpu_env,};
>> +        uint32_t lvt;
>> +
>> +        run_on_cpu(s->cpu_env, kvm_get_remote_lapic, &p);
>> +        lvt = kapic_reg(&p.klapic, 0x32 + APIC_LVT_LINT1);
>> +
>> +        if (lvt & APIC_LVT_MASKED) {
>> +            return;
>> +        }
>> +
>> +        if (((lvt >> 8) & 7) != APIC_DM_NMI) {
>> +            return;
>> +        }
>> +
>> +        cpu_interrupt(s->cpu_env, CPU_INTERRUPT_NMI);
> 
> Err, aren't you introducing KVM_CAP_LAPIC_NMI that allows to test if
> this workaround is needed? Oh, your latest kernel patch is missing this
> again - requires fixing as well.
> 


Kernel site patch is dropped with this v4 patch.

Did you mean you want KVM_CAP_SET_LINT1 + KVM_SET_LINT1 patches?
I have made them.

Sent soon.

Thanks,
Lai
Jan Kiszka Oct. 14, 2011, 8:31 a.m. UTC | #3
On 2011-10-14 09:43, Lai Jiangshan wrote:
> On 10/14/2011 02:49 PM, Jan Kiszka wrote:
>> On 2011-10-14 08:36, Lai Jiangshan wrote:
>>> On 10/14/2011 01:53 PM, Jan Kiszka wrote:
>>>> On 2011-10-14 02:53, Lai Jiangshan wrote:
>>>>>
>>>>>>
>>>>>> As explained in some other mail, we could then emulate the missing
>>>>>> kernel feature by reading out the current in-kernel APIC state, testing
>>>>>> if LINT1 is unmasked, and then delivering the NMI directly.
>>>>>>
>>>>>
>>>>> Only the thread of the VCPU can safely get the in-kernel LAPIC states,
>>>>> so this approach will cause some troubles.
>>>>
>>>> run_on_cpu() can help.
>>>>
>>>> Jan
>>>>
>>>
>>> Ah, I forgot it, Thanks.
>>>
>>> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>>>
>>> Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
>>> button event happens. This doesn't properly emulate real hardware on
>>> which NMI button event triggers LINT1. Because of this, NMI is sent to
>>> the processor even when LINT1 is maskied in LVT. For example, this
>>> causes the problem that kdump initiated by NMI sometimes doesn't work
>>> on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.
>>>
>>> With this patch, inject-nmi request is handled as follows.
>>>
>>> - When in-kernel irqchip is disabled, deliver LINT1 instead of NMI
>>>   interrupt.
>>> - When in-kernel irqchip is enabled, get the in-kernel LAPIC states
>>>   and test the APIC_LVT_MASKED, if LINT1 is unmasked, and then
>>>   delivering the NMI directly. (Suggested by Jan Kiszka)
>>>
>>> Changed from old version:
>>>   re-implement it by the Jan's suggestion.
>>>
>>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>> Reported-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
>>> ---
>>>  hw/apic.c |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>  hw/apic.h |    1 +
>>>  monitor.c |    6 +++++-
>>>  3 files changed, 54 insertions(+), 1 deletions(-)
>>> diff --git a/hw/apic.c b/hw/apic.c
>>> index 69d6ac5..9a40129 100644
>>> --- a/hw/apic.c
>>> +++ b/hw/apic.c
>>> @@ -205,6 +205,54 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
>>>      }
>>>  }
>>>  
>>> +#ifdef KVM_CAP_IRQCHIP
>>
>> Again, this is always defined on x86 thus pointless to test.
>>
>>> +static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
>>> +
>>> +struct kvm_get_remote_lapic_params {
>>> +    CPUState *env;
>>> +    struct kvm_lapic_state klapic;
>>> +};
>>> +
>>> +static void kvm_get_remote_lapic(void *p)
>>> +{
>>> +    struct kvm_get_remote_lapic_params *params = p;
>>> +
>>> +    kvm_get_lapic(params->env, &params->klapic);
>>
>> When you already interrupted that vcpu, why not inject from here? Avoids
>> one further ping-pong round.
> 
> get_remote_lapic and inject nmi are two different things,
> so I don't inject nmi from here. I didn't notice this ping-pond overhead.
> Thank you.

Actually, it is not performance-critical. But there is a race between
obtaining the APIC state and testing for the NMI injection path. So it's
better to define an on-vcpu LINT1 NMI injection service.

> 
>>
>>> +}
>>> +
>>> +void apic_deliver_nmi(DeviceState *d)
>>> +{
>>> +    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
>>> +
>>> +    if (kvm_irqchip_in_kernel()) {
>>> +        struct kvm_get_remote_lapic_params p = {.env = s->cpu_env,};
>>> +        uint32_t lvt;
>>> +
>>> +        run_on_cpu(s->cpu_env, kvm_get_remote_lapic, &p);
>>> +        lvt = kapic_reg(&p.klapic, 0x32 + APIC_LVT_LINT1);
>>> +
>>> +        if (lvt & APIC_LVT_MASKED) {
>>> +            return;
>>> +        }
>>> +
>>> +        if (((lvt >> 8) & 7) != APIC_DM_NMI) {
>>> +            return;
>>> +        }
>>> +
>>> +        cpu_interrupt(s->cpu_env, CPU_INTERRUPT_NMI);
>>
>> Err, aren't you introducing KVM_CAP_LAPIC_NMI that allows to test if
>> this workaround is needed? Oh, your latest kernel patch is missing this
>> again - requires fixing as well.
>>
> 
> 
> Kernel site patch is dropped with this v4 patch.
> 
> Did you mean you want KVM_CAP_SET_LINT1 + KVM_SET_LINT1 patches?
> I have made them.

OK, so this is going to be applied on top? Then I take this remark back.

Jan
diff mbox

Patch

diff --git a/hw/apic.c b/hw/apic.c
index 69d6ac5..9a40129 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -205,6 +205,54 @@  void apic_deliver_pic_intr(DeviceState *d, int level)
     }
 }
 
+#ifdef KVM_CAP_IRQCHIP
+static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
+
+struct kvm_get_remote_lapic_params {
+    CPUState *env;
+    struct kvm_lapic_state klapic;
+};
+
+static void kvm_get_remote_lapic(void *p)
+{
+    struct kvm_get_remote_lapic_params *params = p;
+
+    kvm_get_lapic(params->env, &params->klapic);
+}
+
+void apic_deliver_nmi(DeviceState *d)
+{
+    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
+
+    if (kvm_irqchip_in_kernel()) {
+        struct kvm_get_remote_lapic_params p = {.env = s->cpu_env,};
+        uint32_t lvt;
+
+        run_on_cpu(s->cpu_env, kvm_get_remote_lapic, &p);
+        lvt = kapic_reg(&p.klapic, 0x32 + APIC_LVT_LINT1);
+
+        if (lvt & APIC_LVT_MASKED) {
+            return;
+        }
+
+        if (((lvt >> 8) & 7) != APIC_DM_NMI) {
+            return;
+        }
+
+        cpu_interrupt(s->cpu_env, CPU_INTERRUPT_NMI);
+    } else {
+        apic_local_deliver(s, APIC_LVT_LINT1);
+    }
+}
+#else
+void apic_deliver_nmi(DeviceState *d)
+{
+    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
+
+    apic_local_deliver(s, APIC_LVT_LINT1);
+}
+#endif
+
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
     int __i, __j, __mask;\
diff --git a/hw/apic.h b/hw/apic.h
index c857d52..3a4be0a 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -10,6 +10,7 @@  void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
                              uint8_t trigger_mode);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
+void apic_deliver_nmi(DeviceState *d);
 int apic_get_interrupt(DeviceState *s);
 void apic_reset_irq_delivered(void);
 int apic_get_irq_delivered(void);
diff --git a/monitor.c b/monitor.c
index cb485bf..0b81f17 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2616,7 +2616,11 @@  static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data)
     CPUState *env;
 
     for (env = first_cpu; env != NULL; env = env->next_cpu) {
-        cpu_interrupt(env, CPU_INTERRUPT_NMI);
+        if (!env->apic_state) {
+            cpu_interrupt(env, CPU_INTERRUPT_NMI);
+        } else {
+            apic_deliver_nmi(env->apic_state);
+        }
     }
 
     return 0;