
E5-2620v2 - emulation stop error

Message ID jpgvbi7v4tl.fsf@redhat.com
State New

Commit Message

Bandan Das March 11, 2015, 5:09 p.m. UTC
"Kevin O'Connor" <kevin@koconnor.net> writes:
...
>
> Something is very odd here.  When I run the above command (on an older
> AMD machine) I get:
>
> Found 128 cpu(s) max supported 128 cpu(s)
>
> That first value (1 vs 128) comes from QEMU (via cmos index 0x5f).
> That is, during smp init, SeaBIOS expects QEMU to tell it how many
> cpus are active, and SeaBIOS waits until that many CPUs check in from
> its SIPI request before proceeding.
>
> I wonder if QEMU reported only 1 active cpu via that cmos register,
> but more were actually active.  If that was the case, it could

I was daring enough to try this and I don't see the crash :)


So, the while loop results in a race somehow?

Bandan
> certainly explain the failure - as multiple cpus could be running
> without the SIPI trampoline in place.
>
> What does the log look like on a non-failure case?
>
> -Kevin
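
For orientation, the wait Kevin describes reduces to roughly the following logic in smp_setup() (a simplified sketch based on the hunk quoted further down, not the literal SeaBIOS source; the lock-release asm inside the loop is omitted):

    /* Sketch of the SIPI wait described above (simplified, not verbatim).
     * QEMU reports the CPU count via CMOS index 0x5f (CMOS_BIOS_SMP_COUNT);
     * SeaBIOS adds one for the boot CPU and spins until that many CPUs have
     * incremented CountCPUs from handle_smp(). */
    u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;  /* expected CPUs */
    while (cmos_smp_count != CountCPUs)
        ;  /* wait for each AP to check in via the SIPI trampoline */

If the reported count is too low, this loop falls through immediately, which is the scenario Kevin suspects above.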

Comments

Kevin O'Connor March 11, 2015, 5:32 p.m. UTC | #1
On Wed, Mar 11, 2015 at 01:09:42PM -0400, Bandan Das wrote:
> "Kevin O'Connor" <kevin@koconnor.net> writes:
> ...
> >
> > Something is very odd here.  When I run the above command (on an older
> > AMD machine) I get:
> >
> > Found 128 cpu(s) max supported 128 cpu(s)
> >
> > That first value (1 vs 128) comes from QEMU (via cmos index 0x5f).
> > That is, during smp init, SeaBIOS expects QEMU to tell it how many
> > cpus are active, and SeaBIOS waits until that many CPUs check in from
> > its SIPI request before proceeding.
> >
> > I wonder if QEMU reported only 1 active cpu via that cmos register,
> > but more were actually active.  If that was the case, it could
> 
> I was daring enough to try this and I don't see the crash :)
> 
> diff --git a/src/fw/smp.c b/src/fw/smp.c
> index a466ea6..a346d46 100644
> --- a/src/fw/smp.c
> +++ b/src/fw/smp.c
> @@ -49,6 +49,7 @@ int apic_id_is_present(u8 apic_id)
>  void VISIBLE32FLAT
>  handle_smp(void)
>  {
> +  dprintf(DEBUG_HDL_smp, "Calling handle_smp\n");
>      if (!CONFIG_QEMU)
>          return;
>  
> @@ -128,6 +129,8 @@ smp_setup(void)
>  
>      // Wait for other CPUs to process the SIPI.
>      u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
> +    while (cmos_smp_count == 1)
> +        cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;

That would loop forever if you had "-smp 1".

>      while (cmos_smp_count != CountCPUs)
>          asm volatile(
>              // Release lock and allow other processors to use the stack.
> 
> So, the while loop results in a race somehow?

No, the problem is that loop doesn't run at all, and as a result the
other cpus end up running random code.  SeaBIOS sets up an entry
vector for multiple cpus, wakes up the cpus, then tears down the entry
vector.  If it tears down the entry vector before all the cpus have
run through it, then the other cpus can end up running random code.

-Kevin
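
In outline, the sequence Kevin describes looks like this (pseudocode; the helper names are illustrative, and only CountCPUs and CMOS_BIOS_SMP_COUNT come from the patch itself):

    install_entry_vector();                 /* trampoline the APs will jump through */
    wake_up_aps();                          /* INIT/SIPI broadcast */
    u8 expected = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
    while (expected != CountCPUs)           /* each AP bumps CountCPUs in handle_smp() */
        ;
    remove_entry_vector();                  /* only safe once every AP has checked in */

If "expected" is too small (for example, 1), the wait exits at once and the entry vector is torn down while APs are still on their way to it, so they end up executing whatever bytes are left there.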
Bandan Das March 11, 2015, 6:01 p.m. UTC | #2
"Kevin O'Connor" <kevin@koconnor.net> writes:

> On Wed, Mar 11, 2015 at 01:09:42PM -0400, Bandan Das wrote:
>> "Kevin O'Connor" <kevin@koconnor.net> writes:
>> ...
>> >
>> > Something is very odd here.  When I run the above command (on an older
>> > AMD machine) I get:
>> >
>> > Found 128 cpu(s) max supported 128 cpu(s)
>> >
>> > That first value (1 vs 128) comes from QEMU (via cmos index 0x5f).
>> > That is, during smp init, SeaBIOS expects QEMU to tell it how many
>> > cpus are active, and SeaBIOS waits until that many CPUs check in from
>> > its SIPI request before proceeding.
>> >
>> > I wonder if QEMU reported only 1 active cpu via that cmos register,
>> > but more were actually active.  If that was the case, it could
>> 
>> I was daring enough to try this and I don't see the crash :)
>> 
>> diff --git a/src/fw/smp.c b/src/fw/smp.c
>> index a466ea6..a346d46 100644
>> --- a/src/fw/smp.c
>> +++ b/src/fw/smp.c
>> @@ -49,6 +49,7 @@ int apic_id_is_present(u8 apic_id)
>>  void VISIBLE32FLAT
>>  handle_smp(void)
>>  {
>> +  dprintf(DEBUG_HDL_smp, "Calling handle_smp\n");
>>      if (!CONFIG_QEMU)
>>          return;
>>  
>> @@ -128,6 +129,8 @@ smp_setup(void)
>>  
>>      // Wait for other CPUs to process the SIPI.
>>      u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
>> +    while (cmos_smp_count == 1)
>> +        cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
>
> That would loop forever if you had "-smp 1".

Sorry, I should have been clearer. What I meant is that if I run
with "-smp 128" (from Dave's original reproducer), adding this
while loop avoids the crash. So, rtc_read eventually returns the
right number (127), as the above while loop keeps polling.

Bandan

>>      while (cmos_smp_count != CountCPUs)
>>          asm volatile(
>>              // Release lock and allow other processors to use the stack.
>> 
>> So, the while loop results in a race somehow?
>
> No, the problem is that loop doesn't run at all, and as a result the
> other cpus end up running random code.  SeaBIOS sets up an entry
> vector for multiple cpus, wakes up the cpus, then tears down the entry
> vector.  If it tears down the entry vector before all the cpus have
> run through it, then the other cpus can end up running random code.
>
> -Kevin
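
Kevin's "-smp 1" objection is that with a single CPU the added loop never terminates: rtc_read(CMOS_BIOS_SMP_COUNT) legitimately returns 0, so cmos_smp_count stays at 1 forever. Purely as an illustration, a bounded poll would sidestep that (timer_calc()/timer_check() are assumed to be the usual SeaBIOS timer helpers, and whether polling is the right fix at all is exactly what is being debated here):

    /* Hypothetical bounded poll -- illustration only, not a proposed patch. */
    u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
    u32 end = timer_calc(100);              /* allow ~100ms for the count to appear */
    while (cmos_smp_count == 1 && !timer_check(end))
        cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;

Either way, Bandan's observation implies the first rtc_read() returns something other than 127 on this host and only later reads return the expected value.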

Patch

diff --git a/src/fw/smp.c b/src/fw/smp.c
index a466ea6..a346d46 100644
--- a/src/fw/smp.c
+++ b/src/fw/smp.c
@@ -49,6 +49,7 @@  int apic_id_is_present(u8 apic_id)
 void VISIBLE32FLAT
 handle_smp(void)
 {
+  dprintf(DEBUG_HDL_smp, "Calling handle_smp\n");
     if (!CONFIG_QEMU)
         return;
 
@@ -128,6 +129,8 @@  smp_setup(void)
 
     // Wait for other CPUs to process the SIPI.
     u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
+    while (cmos_smp_count == 1)
+        cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
     while (cmos_smp_count != CountCPUs)
         asm volatile(
             // Release lock and allow other processors to use the stack.