Message ID | 4E9C5106.5070506@cn.fujitsu.com |
---|---|
State | New |
On 2011-10-17 18:00, Lai Jiangshan wrote:
> On 10/17/2011 05:49 PM, Avi Kivity wrote:
>> On 10/17/2011 11:40 AM, Lai Jiangshan wrote:
>>>>
>>>> LINT1 may have been programmed as a level-triggered interrupt instead
>>>> of edge triggered (NMI or interrupt). We can use the ioctl argument for
>>>> the level (and pressing the NMI button needs to pulse the level to 1 and
>>>> back to 0).
>>>>
>>>
>>> Hi, Avi, Jan,
>>>
>>> Which approach do you prefer?
>>> I need to know the result before wasting too much time to respin
>>> the approach.
>>
>> Yes, sorry about the slow and sometimes conflicting feedback.
>>
>>> 1) Fix KVM_NMI emulation approach (which is v3 patchset)
>>>    - It directly fixes the problem and matches the
>>>      real hardware more, but it changes KVM_NMI behavior.
>>>    - Requires both a kernel-side and a userspace-side fix.
>>>
>>> 2) Get the LAPIC state from kernel irqchip, and inject NMI if it is allowed
>>>    (which is v4 patchset)
>>>    - Simple, doesn't change any kernel behavior.
>>>    - Only needs the userspace-side fix.
>>>
>>> 3) Add KVM_SET_LINT1 approach (which is v5 patchset)
>>>    - Doesn't change the kernel's KVM_NMI behavior.
>>>    - Much more complex.
>>>    - Requires both a kernel-side and a userspace-side fix.
>>>    - The userspace side should also handle the !KVM_SET_LINT1
>>>      condition, which uses all of approach 2)'s code. It means
>>>      this approach equals approach 2) + the KVM_SET_LINT1 ioctl.
>>>
>>> This is an urgent bug for us, we need to settle it soon.
>>
>> While (1) is simple, it overloads a single ioctl with two meanings,
>> that's not so good.
>>
>> Whether we do (1) or (3), we need (2) as well, for older kernels.
>>
>> So I recommend first focusing on (2) and merging it, then doing (3).
>>
>> (note an additional issue with 3 is whether to make it a vm or vcpu
>> ioctl - we've been assuming vcpu ioctl but it's not necessarily the best
>> choice).
>>
>
> It is the 2) approach.
> It only changes the userspace side, the kernel side is not touched.
> It is changed from the previous v4 patch, fixing problems found by Jan.
> ----------------------------
>
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>
> Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
> button event happens. This doesn't properly emulate real hardware on
> which NMI button event triggers LINT1. Because of this, NMI is sent to
> the processor even when LINT1 is masked in LVT. For example, this
> causes the problem that kdump initiated by NMI sometimes doesn't work
> on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.
>
> With this patch, inject-nmi request is handled as follows.
>
> - When in-kernel irqchip is disabled, deliver LINT1 instead of NMI
>   interrupt.
> - When in-kernel irqchip is enabled, get the in-kernel LAPIC states
>   and test APIC_LVT_MASKED; if LINT1 is unmasked, deliver the NMI
>   directly. (Suggested by Jan Kiszka)
>
> Changed from old version:
>   re-implement it following Jan's suggestion.
>   fix the race found by Jan.
>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Reported-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
> ---
>  hw/apic.c |   33 +++++++++++++++++++++++++++++++++
>  hw/apic.h |    1 +
>  monitor.c |    6 +++++-
>  3 files changed, 39 insertions(+), 1 deletions(-)
> diff --git a/hw/apic.c b/hw/apic.c
> index 69d6ac5..922796a 100644
> --- a/hw/apic.c
> +++ b/hw/apic.c
> @@ -205,6 +205,39 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
>      }
>  }
>
> +static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
> +
> +static void kvm_irqchip_deliver_nmi(void *p)
> +{
> +    APICState *s = p;
> +    struct kvm_lapic_state klapic;
> +    uint32_t lvt;
> +
> +    kvm_get_lapic(s->cpu_env, &klapic);
> +    lvt = kapic_reg(&klapic, 0x32 + APIC_LVT_LINT1);
> +
> +    if (lvt & APIC_LVT_MASKED) {
> +        return;
> +    }
> +
> +    if (((lvt >> 8) & 7) != APIC_DM_NMI) {
> +        return;
> +    }
> +
> +    kvm_vcpu_ioctl(s->cpu_env, KVM_NMI);
> +}
> +
> +void apic_deliver_nmi(DeviceState *d)
> +{
> +    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
> +
> +    if (kvm_irqchip_in_kernel()) {
> +        run_on_cpu(s->cpu_env, kvm_irqchip_deliver_nmi, s);
> +    } else {
> +        apic_local_deliver(s, APIC_LVT_LINT1);
> +    }
> +}
> +
>  #define foreach_apic(apic, deliver_bitmask, code) \
>  {\
>      int __i, __j, __mask;\
> diff --git a/hw/apic.h b/hw/apic.h
> index c857d52..3a4be0a 100644
> --- a/hw/apic.h
> +++ b/hw/apic.h
> @@ -10,6 +10,7 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
>                               uint8_t trigger_mode);
>  int apic_accept_pic_intr(DeviceState *s);
>  void apic_deliver_pic_intr(DeviceState *s, int level);
> +void apic_deliver_nmi(DeviceState *d);
>  int apic_get_interrupt(DeviceState *s);
>  void apic_reset_irq_delivered(void);
>  int apic_get_irq_delivered(void);
> diff --git a/monitor.c b/monitor.c
> index cb485bf..0b81f17 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2616,7 +2616,11 @@ static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data)
>      CPUState *env;
>
>      for (env = first_cpu; env != NULL; env = env->next_cpu) {
> -        cpu_interrupt(env, CPU_INTERRUPT_NMI);
> +        if (!env->apic_state) {
> +            cpu_interrupt(env, CPU_INTERRUPT_NMI);
> +        } else {
> +            apic_deliver_nmi(env->apic_state);
> +        }
>      }
>
>      return 0;

Looks OK to me.

Please don't forget to bake a qemu-only patch for those bits that apply
to upstream as well (ie. the user space APIC path).

Jan

On 10/19/2011 03:41 AM, Jan Kiszka wrote:
> Looks OK to me.
>
> Please don't forget to bake a qemu-only patch for those bits that apply
> to upstream as well (ie. the user space APIC path).
>
> Jan
>

I did forget it.
Did you mean we need to add "#ifdef KVM_CAP_IRQCHIP" back?

Thanks
Lai

On 10/18/2011 09:41 PM, Jan Kiszka wrote:
>
> Looks OK to me.
>

Same here.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

On 2011-10-19 08:33, Lai Jiangshan wrote:
> On 10/19/2011 03:41 AM, Jan Kiszka wrote:
>> Looks OK to me.
>>
>> Please don't forget to bake a qemu-only patch for those bits that apply
>> to upstream as well (ie. the user space APIC path).
>>
>> Jan
>>
>
> I did forget it.
> Did you mean we need to add "#ifdef KVM_CAP_IRQCHIP" back?

No. I meant basically your patch minus the kvm_in_kernel_irqchip code
paths, applicable against current qemu.git. Those paths will be re-added
(slightly differently) when upstream gains that support. I'm working on
a basic version and will incorporate the logic if your qemu patch is
already available.

Jan

On 10/19/2011 05:29 PM, Avi Kivity wrote:
> On 10/18/2011 09:41 PM, Jan Kiszka wrote:
>>
>> Looks OK to me.
>>
>
> Same here.

Who will merge it?

Thanks,
Lai

On 10/17/2011 06:00 PM, Lai Jiangshan wrote:
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>
> Currently, NMI interrupt is blindly sent to all the vCPUs when NMI
> button event happens. This doesn't properly emulate real hardware on
> which NMI button event triggers LINT1. Because of this, NMI is sent to
> the processor even when LINT1 is masked in LVT. For example, this
> causes the problem that kdump initiated by NMI sometimes doesn't work
> on KVM, because kdump assumes NMI is masked on CPUs other than CPU0.
>
> With this patch, inject-nmi request is handled as follows.
>
> - When in-kernel irqchip is disabled, deliver LINT1 instead of NMI
>   interrupt.
> - When in-kernel irqchip is enabled, get the in-kernel LAPIC states
>   and test APIC_LVT_MASKED; if LINT1 is unmasked, deliver the NMI
>   directly. (Suggested by Jan Kiszka)
>
> Changed from old version:
>   re-implement it following Jan's suggestion.
>   fix the race found by Jan.

This patch fell through the cracks, sorry. Now applied.

Sasha, this patch highlights the issues with KVM_NMI.

On 2011-12-07 11:29, Avi Kivity wrote:
> This patch fell through the cracks, sorry. Now applied.

Lai, what is the state of a corresponding QEMU upstream patch? I'd like
to build on top of it for my upstream irqchip series.

Thanks,
Jan

On 2011-12-08 10:42, Jan Kiszka wrote:
> Lai, what is the state of a corresponding QEMU upstream patch? I'd like
> to build on top of it for my upstream irqchip series.

Never mind, I'll include a patch in my series as it requires some
tweaking to the APIC backend concept.

Jan

diff --git a/hw/apic.c b/hw/apic.c
index 69d6ac5..922796a 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -205,6 +205,39 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
     }
 }
 
+static inline uint32_t kapic_reg(struct kvm_lapic_state *kapic, int reg_id);
+
+static void kvm_irqchip_deliver_nmi(void *p)
+{
+    APICState *s = p;
+    struct kvm_lapic_state klapic;
+    uint32_t lvt;
+
+    kvm_get_lapic(s->cpu_env, &klapic);
+    lvt = kapic_reg(&klapic, 0x32 + APIC_LVT_LINT1);
+
+    if (lvt & APIC_LVT_MASKED) {
+        return;
+    }
+
+    if (((lvt >> 8) & 7) != APIC_DM_NMI) {
+        return;
+    }
+
+    kvm_vcpu_ioctl(s->cpu_env, KVM_NMI);
+}
+
+void apic_deliver_nmi(DeviceState *d)
+{
+    APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
+
+    if (kvm_irqchip_in_kernel()) {
+        run_on_cpu(s->cpu_env, kvm_irqchip_deliver_nmi, s);
+    } else {
+        apic_local_deliver(s, APIC_LVT_LINT1);
+    }
+}
+
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
     int __i, __j, __mask;\
diff --git a/hw/apic.h b/hw/apic.h
index c857d52..3a4be0a 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -10,6 +10,7 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
                              uint8_t trigger_mode);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
+void apic_deliver_nmi(DeviceState *d);
 int apic_get_interrupt(DeviceState *s);
 void apic_reset_irq_delivered(void);
 int apic_get_irq_delivered(void);
diff --git a/monitor.c b/monitor.c
index cb485bf..0b81f17 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2616,7 +2616,11 @@ static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data)
     CPUState *env;
 
     for (env = first_cpu; env != NULL; env = env->next_cpu) {
-        cpu_interrupt(env, CPU_INTERRUPT_NMI);
+        if (!env->apic_state) {
+            cpu_interrupt(env, CPU_INTERRUPT_NMI);
+        } else {
+            apic_deliver_nmi(env->apic_state);
+        }
     }
 
     return 0;
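
The LVT test performed by kvm_irqchip_deliver_nmi() in the patch can be tried in isolation. The following is only a rough sketch and not part of the patch: every `sketch_`/`SKETCH_` name is made up for illustration, the register page is a plain byte array standing in for `struct kvm_lapic_state`, and the constants assume that `APIC_LVT_LINT1` is 4 (so `0x32 + APIC_LVT_LINT1` selects the register at offset 0x360), that `APIC_LVT_MASKED` is bit 16, and that `APIC_DM_NMI` is 4. It also assumes the forward-declared `kapic_reg()` reads a 32-bit value at `reg_id << 4`.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative constants; assumed to match the values the patch relies on. */
#define SKETCH_APIC_PAGE_SIZE 0x400        /* assumed size of the LAPIC register page */
#define SKETCH_LVT_LINT1_REG  0x36         /* 0x32 + APIC_LVT_LINT1, assuming LINT1 == 4 */
#define SKETCH_LVT_MASKED     (1u << 16)   /* LVT mask bit */
#define SKETCH_DM_NMI         4u           /* delivery mode field value for NMI */

/* Stand-in for struct kvm_lapic_state: a raw copy of the register page. */
struct sketch_lapic_state {
    uint8_t regs[SKETCH_APIC_PAGE_SIZE];
};

/* Read the 32-bit register with index reg_id; registers are 16 bytes apart. */
static uint32_t sketch_lapic_reg(const struct sketch_lapic_state *s, int reg_id)
{
    uint32_t value;

    memcpy(&value, s->regs + ((size_t)reg_id << 4), sizeof(value));
    return value;
}

/* Write a 32-bit register; only needed to set up test states. */
static void sketch_lapic_set_reg(struct sketch_lapic_state *s, int reg_id,
                                 uint32_t value)
{
    memcpy(s->regs + ((size_t)reg_id << 4), &value, sizeof(value));
}

/*
 * Return 1 if an NMI-button event should reach this CPU, 0 otherwise.
 * Same two checks as the patch: LINT1 must be unmasked and its delivery
 * mode (bits 8..10 of the LVT entry) must be NMI.
 */
static int sketch_lint1_accepts_nmi(const struct sketch_lapic_state *s)
{
    uint32_t lvt = sketch_lapic_reg(s, SKETCH_LVT_LINT1_REG);

    if (lvt & SKETCH_LVT_MASKED) {
        return 0;
    }
    if (((lvt >> 8) & 7u) != SKETCH_DM_NMI) {
        return 0;
    }
    return 1;
}
```

With an LVT LINT1 value of 0x00000400 (unmasked, NMI delivery mode) the check passes; with 0x00010400 (same mode, mask bit set) or 0x00000700 (unmasked, ExtINT mode) it does not, which is the case kdump depends on for CPUs other than CPU0.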