diff mbox

[BUG] Guest kernel divide error in kvm_unlock_kick

Message ID 54116F40.8090704@redhat.com
State New
Headers show

Commit Message

Paolo Bonzini Sept. 11, 2014, 9:45 a.m. UTC
Il 08/09/2014 15:28, Chris Webb ha scritto:
> divide error: 0000 [#1] PREEMPT SMP 
> Modules linked in:
> CPU: 0 PID: 743 Comm: syslogd Not tainted 3.16.2-guest #2
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> task: ffff88007c972580 ti: ffff88007cb7c000 task.ti: ffff88007cb7c000
> RIP: 0010:[<ffffffff81037fe2>]  [<ffffffff81037fe2>] kvm_unlock_kick+0x72/0x80
> RSP: 0000:ffff88007fc03ec8  EFLAGS: 00010046
> RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000003
> RDX: 0000000000000003 RSI: ffffffff81a466a0 RDI: 0000000000000000
> RBP: ffffffff81a466a0 R08: ffffffff81b98940 R09: 0000000000000246
> R10: 0000000000000400 R11: 0000000000000000 R12: 00000000000000ea
> R13: 0000000000000009 R14: 0000000000000002 R15: ffff88007fc0d300
> FS:  00007f2a6473e700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000000004a8240 CR3: 000000007ac75000 CR4: 00000000000406f0
> Stack:
>  ffffffff81a46400 0000000000000246 0000000000000001 ffffffff8168979d
>  0000000000000282 ffffffff81110d97 0000000000000007 ffff88007cb7ffd8
>  ffff88007c972580 000000004b0782e8 0000000000000002 ffffffff81a0b0c8
> Call Trace:
>  <IRQ> 
>  [<ffffffff8168979d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
>  [<ffffffff81110d97>] ? rcu_process_callbacks+0x337/0x4f0
>  [<ffffffff810cde2d>] ? __do_softirq+0xfd/0x210
>  [<ffffffff810ce06e>] ? irq_exit+0x7e/0xa0
>  [<ffffffff8103063b>] ? smp_apic_timer_interrupt+0x3b/0x50
>  [<ffffffff8168b04d>] ? apic_timer_interrupt+0x6d/0x80
>  <EOI> 
>  [<ffffffff8114180b>] ? filemap_map_pages+0x17b/0x240
>  [<ffffffff811418c0>] ? filemap_map_pages+0x230/0x240
>  [<ffffffff811679e2>] ? do_read_fault.isra.70+0x2a2/0x320
>  [<ffffffff811696cc>] ? handle_mm_fault+0x37c/0xd00
>  [<ffffffff8103bb45>] ? __do_page_fault+0x185/0x4c0
>  [<ffffffff8168b958>] ? async_page_fault+0x28/0x30
>  [<ffffffff813b9610>] ? __put_user_4+0x20/0x30
>  [<ffffffff8168b958>] ? async_page_fault+0x28/0x30
> Code: c0 ca a7 81 48 8d 04 0b 48 8b 30 48 39 ee 75 c9 0f b6 40 08 44 38 e0 75 c0 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 <0f> 01 c1 0f 1f 00 5b 5d 41 5c c3 0f 1f 00 48 c7 c0 10 cf 00 00 

Hi Chris,

sorry for not following up on your previous patch.

This is a hypercall that should have kicked VCPU 3 (see rcx).

Can you please apply this patch and gather a trace of the host
(using "trace-cmd -e kvm qemu-kvm <arguments>")?

Thanks,




Paolo

Comments

Chris Webb Sept. 11, 2014, 5:03 p.m. UTC | #1
Paolo Bonzini <pbonzini@redhat.com> wrote:

> This is a hypercall that should have kicked VCPU 3 (see rcx).
> 
> Can you please apply this patch and gather a trace of the host
> (using "trace-cmd -e kvm qemu-kvm <arguments>")?

Sure, no problem. I've built the trace-cmd tool against udis86 (I hope) and
have put the resulting trace.dat at

  http://cdw.me.uk/tmp/trace.dat

This is actually for a -smp 2 qemu (failing to kick VCPU 1?) as I was having
trouble persuading the -smp 4 qemu to crash as reliably under tracing.
(Something timing related?) Otherwise the qemu-system-x86 command line is
exactly as before.

The guest kernel crash message which corresponds to this trace was:

divide error: 0000 [#1] PREEMPT SMP 
Modules linked in:
CPU: 0 PID: 618 Comm: mkdir Not tainted 3.16.2-guest #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
task: ffff88007c997080 ti: ffff88007c614000 task.ti: ffff88007c614000
RIP: 0010:[<ffffffff81037fe2>]  [<ffffffff81037fe2>] kvm_unlock_kick+0x72/0x80
RSP: 0018:ffff88007c617d40  EFLAGS: 00010046
RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000001
RDX: 0000000000000001 RSI: ffff88007fd11c40 RDI: 0000000000000000
RBP: ffff88007fd11c40 R08: ffffffff81b98940 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000007 R12: 00000000000000f6
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000011c40
FS:  00007f43eb1ed700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f43eace0a30 CR3: 0000000001a12000 CR4: 00000000000406f0
Stack:
 ffff88007c994380 ffff88007c9949aa 0000000000000046 ffffffff81689715
 ffffffff810f3174 0000000000000001 ffffea0001f16320 ffffea0001f17860
 0000000000000000 ffff88007c99e1e8 ffff88007c997080 0000000000000001
Call Trace:
 [<ffffffff81689715>] ? _raw_spin_unlock+0x45/0x70
 [<ffffffff810f3174>] ? try_to_wake_up+0x2a4/0x330
 [<ffffffff81101e2c>] ? __wake_up_common+0x4c/0x80
 [<ffffffff81102418>] ? __wake_up_sync_key+0x38/0x60
 [<ffffffff810d873a>] ? do_notify_parent+0x19a/0x280
 [<ffffffff810f4d56>] ? sched_move_task+0xb6/0x190
 [<ffffffff810cb4fc>] ? do_exit+0xa1c/0xab0
 [<ffffffff810cc344>] ? do_group_exit+0x34/0xb0
 [<ffffffff810cc3cb>] ? SyS_exit_group+0xb/0x10
 [<ffffffff8168a16d>] ? system_call_fastpath+0x1a/0x1f
Code: c0 ca a7 81 48 8d 04 0b 48 8b 30 48 39 ee 75 c9 0f b6 40 08 44 38 e0 75 c0 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 <0f> 01 c1 0f 1f 00 5b 5d 41 5c c3 0f 1f 00 48 c7 c0 10 cf 00 00 
RIP  [<ffffffff81037fe2>] kvm_unlock_kick+0x72/0x80
 RSP <ffff88007c617d40>
---[ end trace bf5a4445f9decdbb ]---
Fixing recursive fault but reboot is needed!
BUG: scheduling while atomic: mkdir/618/0x00000006
Modules linked in:
CPU: 0 PID: 618 Comm: mkdir Tainted: G      D       3.16.2-guest #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
 0000000000000000 00000000c022d302 ffffffff81684029 0000000000000000
 ffffffff810ee956 ffffffff81686266 0000000000011c40 ffff88007c617fd8
 0000000000011c40 ffff88007c997080 0000000000000006 0000000000000046
Call Trace:
 [<ffffffff81684029>] ? dump_stack+0x49/0x6a
 [<ffffffff810ee956>] ? __schedule_bug+0x46/0x60
 [<ffffffff81686266>] ? __schedule+0x5a6/0x7c0
 [<ffffffff816828cd>] ? printk+0x59/0x75
 [<ffffffff810cb33b>] ? do_exit+0x85b/0xab0
 [<ffffffff816828cd>] ? printk+0x59/0x75
 [<ffffffff8100614a>] ? oops_end+0x7a/0x100
 [<ffffffff810033e5>] ? do_error_trap+0x85/0x110
 [<ffffffff81037fe2>] ? kvm_unlock_kick+0x72/0x80
 [<ffffffff8114a358>] ? __alloc_pages_nodemask+0x108/0xa60
 [<ffffffff8168b57e>] ? divide_error+0x1e/0x30
 [<ffffffff81037fe2>] ? kvm_unlock_kick+0x72/0x80
 [<ffffffff81689715>] ? _raw_spin_unlock+0x45/0x70
 [<ffffffff810f3174>] ? try_to_wake_up+0x2a4/0x330
 [<ffffffff81101e2c>] ? __wake_up_common+0x4c/0x80
 [<ffffffff81102418>] ? __wake_up_sync_key+0x38/0x60
 [<ffffffff810d873a>] ? do_notify_parent+0x19a/0x280
 [<ffffffff810f4d56>] ? sched_move_task+0xb6/0x190
 [<ffffffff810cb4fc>] ? do_exit+0xa1c/0xab0
 [<ffffffff810cc344>] ? do_group_exit+0x34/0xb0
 [<ffffffff810cc3cb>] ? SyS_exit_group+0xb/0x10
 [<ffffffff8168a16d>] ? system_call_fastpath+0x1a/0x1f

Best wishes,

Chris.
Paolo Bonzini Sept. 22, 2014, 10:58 a.m. UTC | #2
Il 11/09/2014 19:03, Chris Webb ha scritto:
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> This is a hypercall that should have kicked VCPU 3 (see rcx).
>>
>> Can you please apply this patch and gather a trace of the host
>> (using "trace-cmd -e kvm qemu-kvm <arguments>")?
> 
> Sure, no problem. I've built the trace-cmd tool against udis86 (I hope) and
> have put the resulting trace.dat at
> 
>   http://cdw.me.uk/tmp/trace.dat
> 
> This is actually for a -smp 2 qemu (failing to kick VCPU 1?) as I was having
> trouble persuading the -smp 4 qemu to crash as reliably under tracing.
> (Something timing related?) Otherwise the qemu-system-x86 command line is
> exactly as before.

Do you by chance have CONFIG_DEBUG_RODATA set?  In that case, the fix is
simply not to set it.

Paolo

> The guest kernel crash message which corresponds to this trace was:
> 
> divide error: 0000 [#1] PREEMPT SMP 
> Modules linked in:
> CPU: 0 PID: 618 Comm: mkdir Not tainted 3.16.2-guest #2
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> task: ffff88007c997080 ti: ffff88007c614000 task.ti: ffff88007c614000
> RIP: 0010:[<ffffffff81037fe2>]  [<ffffffff81037fe2>] kvm_unlock_kick+0x72/0x80
> RSP: 0018:ffff88007c617d40  EFLAGS: 00010046
> RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000001
> RDX: 0000000000000001 RSI: ffff88007fd11c40 RDI: 0000000000000000
> RBP: ffff88007fd11c40 R08: ffffffff81b98940 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000007 R12: 00000000000000f6
> R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000011c40
> FS:  00007f43eb1ed700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f43eace0a30 CR3: 0000000001a12000 CR4: 00000000000406f0
> Stack:
>  ffff88007c994380 ffff88007c9949aa 0000000000000046 ffffffff81689715
>  ffffffff810f3174 0000000000000001 ffffea0001f16320 ffffea0001f17860
>  0000000000000000 ffff88007c99e1e8 ffff88007c997080 0000000000000001
> Call Trace:
>  [<ffffffff81689715>] ? _raw_spin_unlock+0x45/0x70
>  [<ffffffff810f3174>] ? try_to_wake_up+0x2a4/0x330
>  [<ffffffff81101e2c>] ? __wake_up_common+0x4c/0x80
>  [<ffffffff81102418>] ? __wake_up_sync_key+0x38/0x60
>  [<ffffffff810d873a>] ? do_notify_parent+0x19a/0x280
>  [<ffffffff810f4d56>] ? sched_move_task+0xb6/0x190
>  [<ffffffff810cb4fc>] ? do_exit+0xa1c/0xab0
>  [<ffffffff810cc344>] ? do_group_exit+0x34/0xb0
>  [<ffffffff810cc3cb>] ? SyS_exit_group+0xb/0x10
>  [<ffffffff8168a16d>] ? system_call_fastpath+0x1a/0x1f
> Code: c0 ca a7 81 48 8d 04 0b 48 8b 30 48 39 ee 75 c9 0f b6 40 08 44 38 e0 75 c0 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 <0f> 01 c1 0f 1f 00 5b 5d 41 5c c3 0f 1f 00 48 c7 c0 10 cf 00 00 
> RIP  [<ffffffff81037fe2>] kvm_unlock_kick+0x72/0x80
>  RSP <ffff88007c617d40>
> ---[ end trace bf5a4445f9decdbb ]---
> Fixing recursive fault but reboot is needed!
> BUG: scheduling while atomic: mkdir/618/0x00000006
> Modules linked in:
> CPU: 0 PID: 618 Comm: mkdir Tainted: G      D       3.16.2-guest #2
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
>  0000000000000000 00000000c022d302 ffffffff81684029 0000000000000000
>  ffffffff810ee956 ffffffff81686266 0000000000011c40 ffff88007c617fd8
>  0000000000011c40 ffff88007c997080 0000000000000006 0000000000000046
> Call Trace:
>  [<ffffffff81684029>] ? dump_stack+0x49/0x6a
>  [<ffffffff810ee956>] ? __schedule_bug+0x46/0x60
>  [<ffffffff81686266>] ? __schedule+0x5a6/0x7c0
>  [<ffffffff816828cd>] ? printk+0x59/0x75
>  [<ffffffff810cb33b>] ? do_exit+0x85b/0xab0
>  [<ffffffff816828cd>] ? printk+0x59/0x75
>  [<ffffffff8100614a>] ? oops_end+0x7a/0x100
>  [<ffffffff810033e5>] ? do_error_trap+0x85/0x110
>  [<ffffffff81037fe2>] ? kvm_unlock_kick+0x72/0x80
>  [<ffffffff8114a358>] ? __alloc_pages_nodemask+0x108/0xa60
>  [<ffffffff8168b57e>] ? divide_error+0x1e/0x30
>  [<ffffffff81037fe2>] ? kvm_unlock_kick+0x72/0x80
>  [<ffffffff81689715>] ? _raw_spin_unlock+0x45/0x70
>  [<ffffffff810f3174>] ? try_to_wake_up+0x2a4/0x330
>  [<ffffffff81101e2c>] ? __wake_up_common+0x4c/0x80
>  [<ffffffff81102418>] ? __wake_up_sync_key+0x38/0x60
>  [<ffffffff810d873a>] ? do_notify_parent+0x19a/0x280
>  [<ffffffff810f4d56>] ? sched_move_task+0xb6/0x190
>  [<ffffffff810cb4fc>] ? do_exit+0xa1c/0xab0
>  [<ffffffff810cc344>] ? do_group_exit+0x34/0xb0
>  [<ffffffff810cc3cb>] ? SyS_exit_group+0xb/0x10
>  [<ffffffff8168a16d>] ? system_call_fastpath+0x1a/0x1f
> 
> Best wishes,
> 
> Chris.
>
Chris Webb Sept. 22, 2014, 7:08 p.m. UTC | #3
Paolo Bonzini <pbonzini@redhat.com> wrote:

> Il 11/09/2014 19:03, Chris Webb ha scritto:
>> Paolo Bonzini <pbonzini@redhat.com> wrote:
>> 
>>> This is a hypercall that should have kicked VCPU 3 (see rcx).
>>> 
>>> Can you please apply this patch and gather a trace of the host
>>> (using "trace-cmd -e kvm qemu-kvm <arguments>")?
>> 
>> Sure, no problem. I've built the trace-cmd tool against udis86 (I hope) and
>> have put the resulting trace.dat at
>> 
>>  http://cdw.me.uk/tmp/trace.dat
>> 
>> This is actually for a -smp 2 qemu (failing to kick VCPU 1?) as I was having
>> trouble persuading the -smp 4 qemu to crash as reliably under tracing.
>> (Something timing related?) Otherwise the qemu-system-x86 command line is
>> exactly as before.
> 
> Do you by chance have CONFIG_DEBUG_RODATA set?  In that case, the fix is
> simply not to set it.

Absolutely right: my host and guest kernels do have CONFIG_DEBUG_RODATA set!

Your patch to use alternatives for VMCALL vs VMMCALL definitely fixed the
divide-by-zero crashes I saw.

Given that I can easily use either (or both) of these solutions, is it be
more efficient to turn off CONFIG_DEBUG_RODATA in the guest kernel so kvm
can fix up the instructions in-place, or is using alternatives for
VMCALL/VMMCALL as implemented by your patch just as good?

Best wishes,

Chris.
Paolo Bonzini Sept. 22, 2014, 7:14 p.m. UTC | #4
Il 22/09/2014 21:08, Chris Webb ha scritto:
> > Do you by chance have CONFIG_DEBUG_RODATA set?  In that case, the fix is
> > simply not to set it.
> 
> Absolutely right: my host and guest kernels do have CONFIG_DEBUG_RODATA set!
> 
> Your patch to use alternatives for VMCALL vs VMMCALL definitely fixed the
> divide-by-zero crashes I saw.
> 
> Given that I can easily use either (or both) of these solutions, is it be
> more efficient to turn off CONFIG_DEBUG_RODATA in the guest kernel so kvm
> can fix up the instructions in-place, or is using alternatives for
> VMCALL/VMMCALL as implemented by your patch just as good?

I posted a patch to use alternatives if CONFIG_DEBUG_RODATA is enabled,
but the bug is in KVM that explicitly documents "you can use any of
VMCALL or VMMCALL".

I'll also see to fix KVM, but the patch is still useful because a) KVM
would not patch a read-only text segment, so there would be a small
performance benefit; b) you cannot control already deployed hypervisors.

However, since there is a workaround, I won't push it into 3.17 so late
in the cycle.  Also, there's a chance that it is NACKed since it touches
non-KVM files.

Paolo
diff mbox

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index fb919c574e23..25ed29f68419 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -709,6 +709,8 @@  static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 	int result = 0;
 	struct kvm_vcpu *vcpu = apic->vcpu;
 
+	trace_kvm_apic_accept_irq(vcpu->vcpu_id, delivery_mode,
+				  trig_mode, vector, false);
 	switch (delivery_mode) {
 	case APIC_DM_LOWEST:
 		vcpu->arch.apic_arb_prio++;
@@ -730,8 +732,6 @@  static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 			kvm_make_request(KVM_REQ_EVENT, vcpu);
 			kvm_vcpu_kick(vcpu);
 		}
-		trace_kvm_apic_accept_irq(vcpu->vcpu_id, delivery_mode,
-					  trig_mode, vector, false);
 		break;
 
 	case APIC_DM_REMRD: