
[1/1,V6] qemu-kvm: fix improper nmi emulation

Message ID 4E9C5106.5070506@cn.fujitsu.com
State New

Commit Message

Lai Jiangshan Oct. 17, 2011, 4 p.m. UTC
On 10/17/2011 05:49 PM, Avi Kivity wrote:
> On 10/17/2011 11:40 AM, Lai Jiangshan wrote:
>>>>
>>>
>>> LINT1 may have been programmed as a level-triggered interrupt instead
>>> of edge-triggered (NMI or interrupt).  We can use the ioctl argument for
>>> the level (and pressing the NMI button needs to pulse the level to 1 and
>>> back to 0).
>>>
>>
>> Hi, Avi, Jan,
>>
>> Which approach do you prefer?
>> I need to know before wasting too much time respinning
>> the patches.
> 
> Yes, sorry about the slow and sometimes conflicting feedback.
> 
>> 1) Fix the KVM_NMI emulation (the v3 patchset)
>> 	- Directly fixes the problem and matches real hardware
>> 	  more closely, but changes KVM_NMI behavior.
>> 	- Requires both kernel-side and userspace-side fixes.
>>
>> 2) Get the LAPIC state from the kernel irqchip, and inject the NMI if it is allowed
>>    (the v4 patchset)
>> 	- Simple; doesn't change any kernel behavior.
>> 	- Only needs the userspace-side fix.
>>
>> 3) Add a KVM_SET_LINT1 ioctl (the v5 patchset)
>> 	- Doesn't change the kernel's KVM_NMI behavior.
>> 	- Much more complex.
>> 	- Requires both kernel-side and userspace-side fixes.
>> 	- Userspace must also handle the !KVM_SET_LINT1
>> 	  case, which uses all of approach 2)'s code. This means
>> 	  this approach equals approach 2) + the KVM_SET_LINT1 ioctl.
>>
>> This is an urgent bug for us; we need to settle it soon.
> 
> While (1) is simple, it overloads a single ioctl with two meanings;
> that's not so good.
> 
> Whether we do (1) or (3), we need (2) as well, for older kernels.
> 
> So I recommend first focusing on (2) and merging it, then doing (3).
> 
> (Note: an additional issue with (3) is whether to make it a VM or vCPU
> ioctl; we've been assuming a vCPU ioctl, but it's not necessarily the
> best choice.)
> 
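The quoted suggestion about a level-triggered LINT1 is that pressing the NMI button should look like a pulse on the pin (raise it to 1, then drop it back to 0) rather than a line left asserted. A minimal sketch of that idea, using a hypothetical `lint_pin` model that is not a KVM or QEMU interface: an edge-triggered input sees exactly one rising edge per press, and the line ends deasserted, so a level-triggered input is not left permanently asserted.

```c
#include <stdbool.h>

/* Hypothetical model of a LINT pin; not a KVM or QEMU interface. */
struct lint_pin {
    bool level;             /* current line level */
    unsigned rising_edges;  /* 0 -> 1 transitions seen so far */
};

static void lint_set_level(struct lint_pin *pin, bool level)
{
    if (level && !pin->level) {
        pin->rising_edges++;  /* what an edge-triggered input would latch */
    }
    pin->level = level;
}

/* One NMI button press: assert the line, then release it. */
static void nmi_button_pulse(struct lint_pin *pin)
{
    lint_set_level(pin, true);
    lint_set_level(pin, false);
}
```

Two consecutive presses therefore produce two rising edges, with the line low after each one.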

This is approach 2).
It only changes the userspace side; the kernel side is not touched.
It differs from the previous v4 patch by fixing the problems Jan found.
----------------------------

From: Lai Jiangshan <laijs@cn.fujitsu.com>

Currently, an NMI is blindly sent to all vCPUs when the NMI button
event happens. This doesn't properly emulate real hardware, on which
the NMI button event triggers LINT1. Because of this, an NMI is sent to
the processor even when LINT1 is masked in the LVT. For example, this
causes kdump initiated by NMI to sometimes fail on KVM, because kdump
assumes NMI is masked on CPUs other than CPU0.

With this patch, an inject-nmi request is handled as follows:

- When the vCPU has no APIC, raise the NMI directly, as before.
- When the in-kernel irqchip is disabled, deliver LINT1 instead of an
  NMI interrupt.
- When the in-kernel irqchip is enabled, read the in-kernel LAPIC state
  and test APIC_LVT_MASKED; if LINT1 is unmasked and programmed for NMI
  delivery, deliver the NMI directly. (Suggested by Jan Kiszka)
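The in-kernel check boils down to reading the LINT1 entry of the local vector table and testing two fields. Below is a minimal, self-contained sketch of that decoding, not code from the patch. It assumes the Intel SDM layout (LVT registers at byte offsets 0x320 to 0x370, 16 bytes apart, mask in bit 16, delivery mode in bits 8 to 10, NMI delivery mode 0b100), which is why the patch reads register index `0x32 + APIC_LVT_LINT1`, i.e. byte offset 0x360. The helpers `lapic_reg()` and `lint1_allows_nmi()` and the `LVT_*`/`DM_NMI` constants are hypothetical stand-ins for the patch's `kapic_reg()`, `APIC_LVT_MASKED` and `APIC_DM_NMI`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout, following the Intel SDM local APIC register map. */
#define LAPIC_REGS_SIZE 1024        /* size of the raw register page */
#define LVT_BASE_INDEX  0x32        /* LVT timer at byte offset 0x320 */
#define LVT_LINT1       4           /* timer, thermal, perf, LINT0, LINT1 */
#define LVT_MASKED      (1u << 16)  /* mask bit */
#define DM_NMI          4           /* delivery mode 0b100 */

/* Hypothetical stand-in for kapic_reg(): registers are 16 bytes apart,
 * so register index i lives at byte offset i << 4. */
static uint32_t lapic_reg(const uint8_t *regs, int reg_index)
{
    uint32_t val;
    memcpy(&val, regs + (reg_index << 4), sizeof(val));
    return val;
}

/* True only if LINT1 is unmasked and programmed for NMI delivery. */
static bool lint1_allows_nmi(const uint8_t *regs)
{
    uint32_t lvt = lapic_reg(regs, LVT_BASE_INDEX + LVT_LINT1);

    if (lvt & LVT_MASKED) {
        return false;
    }
    return ((lvt >> 8) & 7) == DM_NMI;
}
```

For example, an entry holding only the mask bit (0x00010000, the SDM reset value for LVT entries) is rejected, as is an unmasked entry programmed for fixed delivery.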

Changes from the previous version:
  re-implemented following Jan's suggestion;
  fixed the race found by Jan.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reported-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
---
 hw/apic.c |   33 +++++++++++++++++++++++++++++++++
 hw/apic.h |    1 +
 monitor.c |    6 +++++-
 3 files changed, 39 insertions(+), 1 deletions(-)

Comments

Jan Kiszka Oct. 18, 2011, 7:41 p.m. UTC | #1
On 2011-10-17 18:00, Lai Jiangshan wrote:
> [...]

Looks OK to me.

Please don't forget to bake a qemu-only patch for those bits that apply
to upstream as well (ie. the user space APIC path).

Jan
Lai Jiangshan Oct. 19, 2011, 6:33 a.m. UTC | #2
On 10/19/2011 03:41 AM, Jan Kiszka wrote:
> [...]
> 
> Looks OK to me.
> 
> Please don't forget to bake a qemu-only patch for those bits that apply
> to upstream as well (ie. the user space APIC path).
> 
> Jan
> 

I did forget it.
Did you mean we need to add "#ifdef KVM_CAP_IRQCHIP" back?

Thanks
Lai
Avi Kivity Oct. 19, 2011, 9:29 a.m. UTC | #3

On 10/18/2011 09:41 PM, Jan Kiszka wrote:
>
> Looks OK to me.
>
>

Same here.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Jan Kiszka Oct. 19, 2011, 10:57 a.m. UTC | #4
On 2011-10-19 08:33, Lai Jiangshan wrote:
> [...]
> 
> I did forget it.
> Did you mean we need to add "#ifdef KVM_CAP_IRQCHIP" back?

No. I meant basically your patch minus the kvm_irqchip_in_kernel() code
paths, applicable against current qemu.git. Those paths will be re-added
(slightly differently) when upstream gains that support. I'm working on
a basic version and will incorporate the logic if your qemu patch is
already available.

Jan
Lai Jiangshan Oct. 19, 2011, 3:32 p.m. UTC | #5
On 10/19/2011 05:29 PM, Avi Kivity wrote:
> On 10/18/2011 09:41 PM, Jan Kiszka wrote:
>>
>> Looks OK to me.
>>
> 
> Same here.

Who will merge it?

Thanks,
Lai
Avi Kivity Dec. 7, 2011, 10:29 a.m. UTC | #6
On 10/17/2011 06:00 PM, Lai Jiangshan wrote:
> [...]

This patch fell through the cracks, sorry.  Now applied.

Sasha, this patch highlights the issues with KVM_NMI.
Jan Kiszka Dec. 8, 2011, 9:42 a.m. UTC | #7
On 2011-12-07 11:29, Avi Kivity wrote:
> On 10/17/2011 06:00 PM, Lai Jiangshan wrote:
>> [...]
> 
> This patch fell through the cracks, sorry.  Now applied.

Lai, what is the state of a corresponding QEMU upstream patch? I'd like
to build on top of it for my upstream irqchip series.

Thanks,
Jan
Jan Kiszka Dec. 8, 2011, 10:20 a.m. UTC | #8
On 2011-12-08 10:42, Jan Kiszka wrote:
> On 2011-12-07 11:29, Avi Kivity wrote:
>> On 10/17/2011 06:00 PM, Lai Jiangshan wrote:
>>> [...]
>>
>> This patch fell through the cracks, sorry.  Now applied.
> 
> Lai, what is the state of a corresponding QEMU upstream patch? I'd like
> to build on top of it for my upstream irqchip series.

Never mind, I'll include a patch in my series as it requires some
tweaking to the APIC backend concept.

Jan

Patch

diff --git a/hw/apic.c b/hw/apic.c
index 69d6ac5..922796a 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -205,6 +205,39 @@  void apic_deliver_pic_intr(DeviceState *d, int level)
     }
 }
 
+static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
+
+static void kvm_irqchip_deliver_nmi(void *p)
+{
+    APICState *s = p;
+    struct kvm_lapic_state klapic;
+    uint32_t lvt;
+
+    kvm_get_lapic(s->cpu_env, &klapic);
+    lvt = kapic_reg(&klapic, 0x32 + APIC_LVT_LINT1);
+
+    if (lvt & APIC_LVT_MASKED) {
+        return;
+    }
+
+    if (((lvt >> 8) & 7) != APIC_DM_NMI) {
+        return;
+    }
+
+    kvm_vcpu_ioctl(s->cpu_env, KVM_NMI);
+}
+
+void apic_deliver_nmi(DeviceState *d)
+{
+    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
+
+    if (kvm_irqchip_in_kernel()) {
+        run_on_cpu(s->cpu_env, kvm_irqchip_deliver_nmi, s);
+    } else {
+        apic_local_deliver(s, APIC_LVT_LINT1);
+    }
+}
+
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
     int __i, __j, __mask;\
diff --git a/hw/apic.h b/hw/apic.h
index c857d52..3a4be0a 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -10,6 +10,7 @@  void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
                              uint8_t trigger_mode);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
+void apic_deliver_nmi(DeviceState *d);
 int apic_get_interrupt(DeviceState *s);
 void apic_reset_irq_delivered(void);
 int apic_get_irq_delivered(void);
diff --git a/monitor.c b/monitor.c
index cb485bf..0b81f17 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2616,7 +2616,11 @@  static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data)
     CPUState *env;
 
     for (env = first_cpu; env != NULL; env = env->next_cpu) {
-        cpu_interrupt(env, CPU_INTERRUPT_NMI);
+        if (!env->apic_state) {
+            cpu_interrupt(env, CPU_INTERRUPT_NMI);
+        } else {
+            apic_deliver_nmi(env->apic_state);
+        }
     }
 
     return 0;
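The combined control flow of `do_inject_nmi()` and `apic_deliver_nmi()` can be modeled outside QEMU as below. This is a hedged sketch, not QEMU code: the `vcpu` struct, the `apic_mode` enum and the `nmi_action` results are hypothetical, and the real code performs the in-kernel check on the target vCPU's own thread via `run_on_cpu()`, which this single-threaded model leaves out.

```c
#include <stdint.h>

/* Hypothetical model of the three cases the patch distinguishes. */
enum apic_mode {
    NO_APIC,         /* env->apic_state == NULL */
    USERSPACE_APIC,  /* APIC emulated in QEMU */
    IN_KERNEL_APIC,  /* kvm_irqchip_in_kernel() is true */
};

enum nmi_action {
    RAW_NMI,         /* cpu_interrupt(env, CPU_INTERRUPT_NMI) */
    DELIVER_LINT1,   /* apic_local_deliver(s, APIC_LVT_LINT1) */
    KVM_NMI_IOCTL,   /* kvm_vcpu_ioctl(env, KVM_NMI) */
    DROPPED,         /* LINT1 masked or not in NMI delivery mode */
};

struct vcpu {
    enum apic_mode mode;
    uint32_t lvt_lint1;  /* only consulted for IN_KERNEL_APIC */
};

#define LVT_MASKED (1u << 16)
#define DM_NMI     4

static enum nmi_action inject_nmi_action(const struct vcpu *v)
{
    switch (v->mode) {
    case NO_APIC:
        return RAW_NMI;
    case USERSPACE_APIC:
        /* The emulated APIC decides based on its own LVT entry. */
        return DELIVER_LINT1;
    case IN_KERNEL_APIC:
    default:
        if (v->lvt_lint1 & LVT_MASKED) {
            return DROPPED;
        }
        if (((v->lvt_lint1 >> 8) & 7) != DM_NMI) {
            return DROPPED;
        }
        return KVM_NMI_IOCTL;
    }
}
```

One consequence the model makes visible: with the in-kernel irqchip, a LINT1 entry programmed for any mode other than NMI is simply dropped, whereas the userspace path hands LINT1 to `apic_local_deliver()`, which acts on whatever entry the guest has programmed.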