diff mbox

SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

Message ID 20151119134039.GA27717@morn.lan
State New
Headers show

Commit Message

Kevin O'Connor Nov. 19, 2015, 1:40 p.m. UTC
On Thu, Nov 19, 2015 at 12:42:50PM +0000, Xulei (Stone) wrote:
> Kevin,
> 
> After deeply analyzing, i think there may be 3 possible reasons:
> 1)wrong CountCPUs value. It seems CountCPUs++ in handle_smp() has no
> lock to protect.  So, sometimes, 2 or more vcpu may get the same
> current value of CountCPUs. Then we'll get a single incrementation
> instead of 2 or more and "while (cmos_smp_count != CountCPUs)" will
> loop forever;

The handle_smp() code is called from romlayout.S:entry_smp() which
does take a lock.  So, all of handle_smp() should run synchronous.

> 2)wrong cmos_smp_count value. SeaBIOS rtc reads an incorrect number?

Not sure - the last time there were problems in this area of the code
others used kvmtrace to try and track this down.  Since you are
getting dprintf statements, you could also try outputting
cmos_smp_count prior to the loop (see patch below).

> 3)yield() stuck. Is it possible that SeaBIOS is stuck during yield?
> I've tested, when yield() is running, SeaBIOS seems has not created
> some other threads except the main thread. So I don't know what's
> the function of yield() here.?

The yield() allows hardware interrupts to occur.  But note that
yield() isn't called in the loop - is is only called after the loop
completes.

If you are only getting this on massive repetitive reboot requests,
there are some other possible explanations:

- perhaps the SIPI is getting lost because one of the CPUs is still
  resetting or still processing a SIPI from the last reboot?

- the seabios code itself may have been corrupted if the memcpy() in
  qemu_prep_reset() got far enough along to clear HaveRunPost, but did
  not get far enough along to fully complete the memcpy().

If the failure is reproducible, the patch below could help narrow the
possibilities.

-Kevin

Comments

Xulei (Stone, Euler) Nov. 20, 2015, 2:05 a.m. UTC | #1
>On Thu, Nov 19, 2015 at 12:42:50PM +0000, Xulei (Stone) wrote:

>> Kevin,

>> 

>> After deeply analyzing, i think there may be 3 possible reasons:

>> 1)wrong CountCPUs value. It seems CountCPUs++ in handle_smp() has no

>> lock to protect.  So, sometimes, 2 or more vcpu may get the same

>> current value of CountCPUs. Then we'll get a single incrementation

>> instead of 2 or more and "while (cmos_smp_count != CountCPUs)" will

>> loop forever;

>

>The handle_smp() code is called from romlayout.S:entry_smp() which

>does take a lock.  So, all of handle_smp() should run synchronous.

>


Ok, this possibility is ruled out!

>> 2)wrong cmos_smp_count value. SeaBIOS rtc reads an incorrect number?

>

>Not sure - the last time there were problems in this area of the code

>others used kvmtrace to try and track this down.  Since you are

>getting dprintf statements, you could also try outputting

>cmos_smp_count prior to the loop (see patch below).

>

I'll test again with this patch, and observe the output.
But frankly speaking, i don't think SeaBIOS may read an incorrect number.
Because, QEMU set smp_cpus value in pc_cmos_init (cmos ox5f) at the 1st
time (QEMU does not execute pc_cmos_init again during repetitive reboot).
SeaBIOS works well for many times which means there is no reason that
a time SeaBIOS suddenly reads an incorrect number.

>> 3)yield() stuck. Is it possible that SeaBIOS is stuck during yield?

>> I've tested, when yield() is running, SeaBIOS seems has not created

>> some other threads except the main thread. So I don't know what's

>> the function of yield() here.?

>

>The yield() allows hardware interrupts to occur.  But note that

>yield() isn't called in the loop - is is only called after the loop

>completes.

>

>If you are only getting this on massive repetitive reboot requests,

>there are some other possible explanations:

>

>- perhaps the SIPI is getting lost because one of the CPUs is still

>  resetting or still processing a SIPI from the last reboot?

>

Seems impossible. Seen from the qemu log and SeaBIOS log, the VM has 
booted successfully and can execute our "regular rebooting" daemon
process before the last reboot.

>- the seabios code itself may have been corrupted if the memcpy() in

>  qemu_prep_reset() got far enough along to clear HaveRunPost, but did

>  not get far enough along to fully complete the memcpy().

>

BTW, my VM reboots every 220 second (reboot time interval = 220s). I think
SeaBIOS has enough time to process all kinds of affairs, like SIPI and
memcpy().

Kevin, I want to know whether it is possible that if my VM is stuck at QEMU
(a point of pci device reset procedure? or whatever) SeaBIOS will hold at
hanle_smp() and could not printf "Found %d cpu(s) max supported %d cpu(s)"?
Is this possible?

================== bad QEMU log=======
[2015-11-13 18:45:57] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:45:57] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:45:58] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411558, "microseconds": 650381}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel0"}}
[2015-11-13 18:45:58] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411558, "microseconds": 796285}, "event": "RESET"}
[2015-11-13 18:45:58] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:45:59] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:46:00] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411560, "microseconds": 212196}, "event": "RESET"}
[2015-11-13 18:46:00] qemu_reset_report:749 domain is rebooting
[2015-11-13 18:46:00] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411558, "microseconds": 650558}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel3"}}

================ good QEMU log=========
[2015-11-13 18:42:12] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:42:12] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:42:13] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411333, "microseconds": 718617}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel0"}}
[2015-11-13 18:42:13] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411333, "microseconds": 848236}, "event": "RESET"}
[2015-11-13 18:42:14] qemu_devices_reset:1941 reset all devices
[2015-11-13 18:42:14] set_nmi_flag:71 set nmi val = 0
[2015-11-13 18:42:15] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411335, "microseconds": 280198}, "event": "RESET"}
[2015-11-13 18:42:15] qemu_reset_report:749 domain is rebooting
[2015-11-13 18:42:15] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411333, "microseconds": 718794}, "event": "VSERPORT_CHANGE", "data": {"open": false, "id": "channel3"}}
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:15] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:23] virtio_set_status:524 virtio-serial device status is 1 that means ACKNOWLEDGE
[2015-11-13 18:42:23] virtio_set_status:524 virtio-serial device status is 3 that means DRIVER
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '-1' hanle control message event = 0, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 1 send control message event = 1, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 2 send control message event = 1, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 3 send control message event = 1, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 4 send control message event = 1, value = 1
[2015-11-13 18:42:23] virtio_set_status:524 virtio-serial device status is 7 that means DRIVER OK
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '1' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 1 send control message event = 6, value = 1
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '2' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 2 send control message event = 6, value = 1
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '3' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] send_control_event:225 virtio serial port 3 send control message event = 6, value = 1
[2015-11-13 18:42:23] handle_control_message:333 virtio serial port '4' hanle control message event = 3, value = 1
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 1 that means ACKNOWLEDGE
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 1 that means ACKNOWLEDGE
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 3 that means DRIVER
[2015-11-13 18:42:23] virtio_set_status:524 virtio-blk device status is 7 that means DRIVER OK
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '2' hanle control message event = 6, value = 1
[2015-11-13 18:42:30] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411350, "microseconds": 214826}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel1"}}
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '1' hanle control message event = 6, value = 1
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '3' hanle control message event = 6, value = 1
[2015-11-13 18:42:30] handle_control_message:333 virtio serial port '4' hanle control message event = 6, value = 1
[2015-11-13 18:42:31] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1447411350, "microseconds": 220665}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel3"}}

>If the failure is reproducible, the patch below could help narrow the

>possibilities.

>

>-Kevin

>

>

>--- a/src/fw/smp.c

>+++ b/src/fw/smp.c

>@@ -125,6 +125,7 @@ smp_setup(void)

> 

>     // Wait for other CPUs to process the SIPI.

>     u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;

>+    dprintf(1, "cmos_smp_count=%d\n", cmos_smp_count);

>     while (cmos_smp_count != CountCPUs)

>         asm volatile(

>             // Release lock and allow other processors to use the stack.

>@@ -136,6 +137,7 @@ smp_setup(void)

>             "  jc 1b\n"

>             : "+m" (SMPLock), "+m" (SMPStack)

>             : : "cc", "memory");

>+    dprintf(1, "finish smp\n");

>     yield();

> 

>     // Restore memory.
Kevin O'Connor Dec. 18, 2015, 11:13 p.m. UTC | #2
On Fri, Dec 18, 2015 at 03:04:58AM +0000, Gonglei (Arei) wrote:
> Hi Kevin & Paolo,
> 
> Luckily, I reproduced this problem last night. And I got the below log when SeaBIOS is stuck.
[...]
> [2015-12-18 10:38:10]  gonglei: finish while   
[...]
> <...>-31509 [035] 154753.180077: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306
> <...>-31509 [035] 154753.180077: kvm_emulate_insn: 0:3:f0 53 (real)
> <...>-31509 [035] 154753.180077: kvm_inj_exception: #UD (0x0)
> <...>-31509 [035] 154753.180077: kvm_entry: vcpu 0

This is an odd finding.  It seems to indicate that the code is caught
in an infinite irq loop once irqs are enabled.  What doesn't make
sense is that an NMI shouldn't depend on the cpu irq enable flag.
Also, I can't explain why rip would be 0x03, nor why a #UD in an
exception handler wouldn't result in a triple fault.  Maybe someone
with more kvm knowledge could help here.

I did notice that you appear to be running with SeaBIOS v1.8.1 - I
recommend you upgrade to the latest.  There were two important fixes
in this area (8b9942fa and 3156b71a).  I don't think either of these
fixes would explain the log above, but it would be best to eliminate
the possibility.

-Kevin
Gonglei (Arei) Dec. 19, 2015, 6:28 a.m. UTC | #3
>
> From: Kevin O'Connor [mailto:kevin@koconnor.net]
> Sent: Saturday, December 19, 2015 7:13 AM
> To: Gonglei (Arei)
> Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seabios@seabios.org;
> Huangweidong (C); kvm@vger.kernel.org
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
> 
> On Fri, Dec 18, 2015 at 03:04:58AM +0000, Gonglei (Arei) wrote:
> > Hi Kevin & Paolo,
> >
> > Luckily, I reproduced this problem last night. And I got the below log when
> SeaBIOS is stuck.
> [...]
> > [2015-12-18 10:38:10]  gonglei: finish while
> [...]
> > <...>-31509 [035] 154753.180077: kvm_exit: reason EXCEPTION_NMI rip 0x3
> info 0 80000306
> > <...>-31509 [035] 154753.180077: kvm_emulate_insn: 0:3:f0 53 (real)
> > <...>-31509 [035] 154753.180077: kvm_inj_exception: #UD (0x0)
> > <...>-31509 [035] 154753.180077: kvm_entry: vcpu 0
> 
> This is an odd finding.  It seems to indicate that the code is caught
> in an infinite irq loop once irqs are enabled.  What doesn't make
> sense is that an NMI shouldn't depend on the cpu irq enable flag.
> Also, I can't explain why rip would be 0x03, nor why a #UD in an
> exception handler wouldn't result in a triple fault.  Maybe someone
> with more kvm knowledge could help here.
> 

Ccing Paolo and Radim.

> I did notice that you appear to be running with SeaBIOS v1.8.1 - I
> recommend you upgrade to the latest.  There were two important fixes
> in this area (8b9942fa and 3156b71a).  I don't think either of these
> fixes would explain the log above, but it would be best to eliminate
> the possibility.
> 
We can reproduce the problem using latest SeaBIOS too. :(


Regards,
-Gonglei
diff mbox

Patch

--- a/src/fw/smp.c
+++ b/src/fw/smp.c
@@ -125,6 +125,7 @@  smp_setup(void)
 
     // Wait for other CPUs to process the SIPI.
     u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
+    dprintf(1, "cmos_smp_count=%d\n", cmos_smp_count);
     while (cmos_smp_count != CountCPUs)
         asm volatile(
             // Release lock and allow other processors to use the stack.
@@ -136,6 +137,7 @@  smp_setup(void)
             "  jc 1b\n"
             : "+m" (SMPLock), "+m" (SMPStack)
             : : "cc", "memory");
+    dprintf(1, "finish smp\n");
     yield();
 
     // Restore memory.