Message ID | 201412311145449265941@tencent.com |
---|---|
State | New |
On Dec 31 03:45, kevinnma(马文霜) wrote:
> diff --git a/kvm-all.c b/kvm-all.c
> index 18cc6b4..f47e1b1 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1123,6 +1123,17 @@ static int kvm_irqchip_get_virq(KVMState *s)
>      int i, bit;
>      bool retry = true;
>
> +    /*
> +     * PIC and IOAPIC share the first 15 GSI numbers,available GSI
> +     * numbers greater than IRQ route entries. If allocate GSI number
> +     * succeeds, a new route entry can be added, so total IRQ route
> +     * enties can exceed gsi_count, flush dynamic MSI entries when
> +     * IRQ route entries arrive gsi_count.
> +     */
> +    if (!s->direct_msi && s->irq_routes->nr == s->gsi_count) {
> +        kvm_flush_dynamic_msi_routes(s);
> +    }
> +
>  again:
>      /* Return the lowest unused GSI in the bitmap */
>      for (i = 0; i < max_words; i++) {

Any comments on this patch?
Ping.

Patch here:
http://patchwork.ozlabs.org/patch/424738/

Description:
In a multi-core guest, changing IRQ affinity eventually crashes the guest. This is a severe bug, and I do not know why this patch was ignored.

Wenshuang Ma
On 08/01/2015 04:28, kevinnma(马文霜) wrote:
> Ping
>
> Patches here:
> http://patchwork.ozlabs.org/patch/424738/
>
> Description:
> In multi-core guest, set irq affinity will eventually lead to guest crash, this is a
> severe BUG, I do not know why this patch was ignored?

Because there is one maintainer and he (I) was on holiday.

Paolo
On 31/12/2014 04:45, kevinnma(马文霜) wrote:
> Last month, we experienced several guests crash(6cores-8cores),qemu logs
> display the following messages:
>
> qemu-system-x86_64: /build/qemu-2.1.2/kvm-all.c:976:
> kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
>
> After analysis and verification, we can confirm it's irq-balance
> daemon(in guest) leads to the assertion failure.So start a 8 core guest
> with two disks, execute the following scripts will reproduce the BUG quickly:
>
> vda_irq_num=25
> vdb_irq_num=27
> while [ 1 ]
> do
>     for irq in {1,2,4,8,10,20,40,80}
>     do
>         echo $irq > /proc/irq/$vda_irq_num/smp_affinity
>         echo $irq > /proc/irq/$vdb_irq_num/smp_affinity
>         dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct
>         dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct
>     done
> done
>
> QEMU setup static irq route entries in kvm_pc_setup_irq_routing(),PIC and
> IOAPIC share the first 15 GSI numbers,take up 23 GSI numbers,but take up 38
> irq route entries.When change irq smp_affinity in guest,a dynamic route
> entry may be setup,the current logic is:if allocate GSI number succeeds,
> a new route entry can be added.The available dynamic GSI numbers is
> 1001(KVM_MAX_IRQ_ROUTES-23),but available irq route entries is only
> 986(KVM_MAX_IRQ_ROUTES-38),GSI numbers greater than route entries.
> irq-balance's behavior will eventually leads to total irq route entries
> exceed KVM_MAX_IRQ_ROUTES,ioctl(KVM_SET_GSI_ROUTING) fail and
> kvm_irqchip_commit_routes() trigger assertion failure.

I have two questions:

1) Why isn't the existing check in kvm_irqchip_get_virq enough to fix the bug?

    if (!s->direct_msi && retry) {
        retry = false;
        kvm_flush_dynamic_msi_routes(s);
        goto again;
    }

2) If you introduce this extra call to kvm_flush_dynamic_msi_routes, does the existing check become obsolete?

Thanks,

Paolo
diff --git a/kvm-all.c b/kvm-all.c
index 18cc6b4..f47e1b1 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1123,6 +1123,17 @@ static int kvm_irqchip_get_virq(KVMState *s)
     int i, bit;
     bool retry = true;
 
+    /*
+     * PIC and IOAPIC share the first 15 GSI numbers, so more GSI
+     * numbers than IRQ route entries are available. Whenever a GSI
+     * is allocated, a new route entry can be added, so the total
+     * number of route entries can exceed gsi_count. Flush the
+     * dynamic MSI entries once the route entries reach gsi_count.
+     */
+    if (!s->direct_msi && s->irq_routes->nr == s->gsi_count) {
+        kvm_flush_dynamic_msi_routes(s);
+    }
+
 again:
     /* Return the lowest unused GSI in the bitmap */
     for (i = 0; i < max_words; i++) {
Last month we saw several guests (6 to 8 cores) crash. The QEMU logs showed the following message:

qemu-system-x86_64: /build/qemu-2.1.2/kvm-all.c:976:
kvm_irqchip_commit_routes: Assertion `ret == 0' failed.

After analysis and verification, we can confirm that the irq-balance daemon in the guest triggers the assertion failure. To reproduce the bug quickly, start an 8-core guest with two disks and run the following script in the guest:

vda_irq_num=25
vdb_irq_num=27
while [ 1 ]
do
    for irq in {1,2,4,8,10,20,40,80}
    do
        echo $irq > /proc/irq/$vda_irq_num/smp_affinity
        echo $irq > /proc/irq/$vdb_irq_num/smp_affinity
        dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct
        dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct
    done
done

QEMU sets up the static IRQ route entries in kvm_pc_setup_irq_routing(). PIC and IOAPIC share the first 15 GSI numbers, so the static routes take up 23 GSI numbers but 38 IRQ route entries. When the guest changes an IRQ's smp_affinity, a dynamic route entry may be set up, and the current logic is: if allocating a GSI number succeeds, a new route entry can be added. There are 1001 available dynamic GSI numbers (KVM_MAX_IRQ_ROUTES-23) but only 986 available IRQ route entries (KVM_MAX_IRQ_ROUTES-38), so free GSI numbers outnumber free route entries. irq-balance's behavior therefore eventually pushes the total number of IRQ route entries past KVM_MAX_IRQ_ROUTES, ioctl(KVM_SET_GSI_ROUTING) fails, and kvm_irqchip_commit_routes() triggers the assertion failure.

This patch fixes the bug.

Signed-off-by: Wenshuang Ma <kevinnma@tencent.com>
---
 kvm-all.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

-- 
1.7.1