Fix irq route entries exceed KVM_MAX_IRQ_ROUTES

Message ID 201412311145449265941@tencent.com
State New

Commit Message

kevinnma(马文霜) Dec. 31, 2014, 3:45 a.m. UTC
Last month several of our guests (6 to 8 cores) crashed, and the QEMU logs
showed the following message:

qemu-system-x86_64: /build/qemu-2.1.2/kvm-all.c:976:
kvm_irqchip_commit_routes: Assertion `ret == 0' failed.

After analysis and verification, we confirmed that the irqbalance daemon
running in the guest triggers the assertion failure. To reproduce the bug
quickly, start an 8-core guest with two disks and run the following script in it:

vda_irq_num=25
vdb_irq_num=27
while [ 1 ]
do
    for irq in {1,2,4,8,10,20,40,80}
        do
            echo $irq > /proc/irq/$vda_irq_num/smp_affinity
            echo $irq > /proc/irq/$vdb_irq_num/smp_affinity
            dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct
            dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct
        done
done

QEMU sets up the static IRQ route entries in kvm_pc_setup_irq_routing(). The
PIC and the IOAPIC share the first 15 GSI numbers, so together they take up
23 GSI numbers but 38 IRQ route entries. When the guest changes an IRQ's
smp_affinity, a dynamic route entry may be set up. The current logic is: if
allocating a GSI number succeeds, a new route entry can be added. However,
1001 GSI numbers (KVM_MAX_IRQ_ROUTES - 23) are available for dynamic routes,
while only 986 IRQ route entries (KVM_MAX_IRQ_ROUTES - 38) are, so free GSI
numbers outlast free route entries. irqbalance's behavior therefore eventually
pushes the total number of IRQ route entries past KVM_MAX_IRQ_ROUTES,
ioctl(KVM_SET_GSI_ROUTING) fails, and kvm_irqchip_commit_routes() hits the
assertion.

This patch fixes the bug.

Signed-off-by: Wenshuang Ma <kevinnma@tencent.com>

---
 kvm-all.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

-- 
1.7.1

Comments

William Dauchy Jan. 3, 2015, 5:50 p.m. UTC | #1
On Dec31 03:45, kevinnma(马文霜) wrote:
> diff --git a/kvm-all.c b/kvm-all.c
> index 18cc6b4..f47e1b1 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1123,6 +1123,17 @@ static int kvm_irqchip_get_virq(KVMState *s)
>      int i, bit;
>      bool retry = true;
>  
> +    /*
> +     * PIC and IOAPIC share the first 15 GSI numbers,available GSI
> +     * numbers greater than IRQ route entries. If allocate GSI number
> +     * succeeds, a new route entry can be added, so total IRQ route
> +     * enties can exceed gsi_count, flush dynamic MSI entries when
> +     * IRQ route entries arrive gsi_count.
> +     */
> +    if (!s->direct_msi && s->irq_routes->nr == s->gsi_count) {
> +        kvm_flush_dynamic_msi_routes(s);
> +    }
> +
>  again:
>      /* Return the lowest unused GSI in the bitmap */
>      for (i = 0; i < max_words; i++) {

Any comments on this patch?
kevinnma(马文霜) Jan. 8, 2015, 3:28 a.m. UTC | #2
Ping

Patches here:
http://patchwork.ozlabs.org/patch/424738/

Description:
On a multi-core guest, setting IRQ affinity eventually crashes the guest.
This is a severe bug. Why was this patch ignored?

Wenshuang Ma
Paolo Bonzini Jan. 8, 2015, 8:51 a.m. UTC | #3
On 08/01/2015 04:28, kevinnma(马文霜) wrote:
> Ping
> 
> Patches here:
> http://patchwork.ozlabs.org/patch/424738/
> 
> Description:
> On a multi-core guest, setting IRQ affinity eventually crashes the guest.
> This is a severe bug. Why was this patch ignored?

Because there is one maintainer and he (I) was on holiday.

Paolo
Paolo Bonzini Jan. 8, 2015, 9 a.m. UTC | #4
On 31/12/2014 04:45, kevinnma(马文霜) wrote:
> Last month several of our guests (6 to 8 cores) crashed, and the QEMU logs
> showed the following message:
> 
> qemu-system-x86_64: /build/qemu-2.1.2/kvm-all.c:976:
> kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
> 
> After analysis and verification, we confirmed that the irqbalance daemon
> running in the guest triggers the assertion failure. To reproduce the bug
> quickly, start an 8-core guest with two disks and run the following script in it:
> 
> vda_irq_num=25
> vdb_irq_num=27
> while [ 1 ]
> do
>     for irq in {1,2,4,8,10,20,40,80}
>         do
>             echo $irq > /proc/irq/$vda_irq_num/smp_affinity
>             echo $irq > /proc/irq/$vdb_irq_num/smp_affinity
>             dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct
>             dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct
>         done
> done
> 
> QEMU sets up the static IRQ route entries in kvm_pc_setup_irq_routing(). The
> PIC and the IOAPIC share the first 15 GSI numbers, so together they take up
> 23 GSI numbers but 38 IRQ route entries. When the guest changes an IRQ's
> smp_affinity, a dynamic route entry may be set up. The current logic is: if
> allocating a GSI number succeeds, a new route entry can be added. However,
> 1001 GSI numbers (KVM_MAX_IRQ_ROUTES - 23) are available for dynamic routes,
> while only 986 IRQ route entries (KVM_MAX_IRQ_ROUTES - 38) are, so free GSI
> numbers outlast free route entries. irqbalance's behavior therefore eventually
> pushes the total number of IRQ route entries past KVM_MAX_IRQ_ROUTES,
> ioctl(KVM_SET_GSI_ROUTING) fails, and kvm_irqchip_commit_routes() hits the
> assertion.

I have two questions:

1) why isn't the existing check in kvm_irqchip_get_virq enough to fix
the bug?

    if (!s->direct_msi && retry) {
        retry = false;
        kvm_flush_dynamic_msi_routes(s);
        goto again;
    }

2) If you introduce this extra call to kvm_flush_dynamic_msi_routes,
does the existing check become obsolete?

Thanks,

Paolo
Patch

diff --git a/kvm-all.c b/kvm-all.c
index 18cc6b4..f47e1b1 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1123,6 +1123,17 @@  static int kvm_irqchip_get_virq(KVMState *s)
     int i, bit;
     bool retry = true;
 
+    /*
+     * PIC and IOAPIC share the first 15 GSI numbers, so more GSI
+     * numbers than IRQ route entries are available. Allocating a GSI
+     * number always allows adding a route entry, so the number of
+     * route entries can exceed gsi_count. Flush the dynamic MSI
+     * entries once the number of route entries reaches gsi_count.
+     */
+    if (!s->direct_msi && s->irq_routes->nr == s->gsi_count) {
+        kvm_flush_dynamic_msi_routes(s);
+    }
+
 again:
     /* Return the lowest unused GSI in the bitmap */
     for (i = 0; i < max_words; i++) {