
E5-2620v2 - emulation stop error

Message ID jpgvbi7v4tl.fsf@redhat.com
State New

Commit Message

Bandan Das March 11, 2015, 5:09 p.m. UTC
"Kevin O'Connor" <kevin@koconnor.net> writes:
...
>
> Something is very odd here.  When I run the above command (on an older
> AMD machine) I get:
>
> Found 128 cpu(s) max supported 128 cpu(s)
>
> That first value (1 vs 128) comes from QEMU (via cmos index 0x5f).
> That is, during smp init, SeaBIOS expects QEMU to tell it how many
> cpus are active, and SeaBIOS waits until that many CPUs check in from
> its SIPI request before proceeding.
>
> I wonder if QEMU reported only 1 active cpu via that cmos register,
> but more were actually active.  If that was the case, it could

I was daring enough to try this and I don't see the crash :)


So, the while loop results in a race somehow?

Bandan
> certainly explain the failure - as multiple cpus could be running
> without the SIPI trampoline in place.
>
> What does the log look like on a non-failure case?
>
> -Kevin
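
For orientation, the wait Kevin describes reduces to roughly the following logic in smp_setup() (a simplified sketch based on the hunk quoted further down, not the literal SeaBIOS source; the lock-release asm inside the loop is omitted):

    /* Sketch of the SIPI wait described above (simplified, not verbatim).
     * QEMU reports the CPU count via CMOS index 0x5f (CMOS_BIOS_SMP_COUNT);
     * SeaBIOS adds one for the boot CPU and spins until that many CPUs have
     * incremented CountCPUs from handle_smp(). */
    u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;  /* expected CPUs */
    while (cmos_smp_count != CountCPUs)
        ;  /* wait for each AP to check in via the SIPI trampoline */

If the reported count is too low, this loop falls through immediately, which is the scenario Kevin suspects above.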

Comments

Kevin O'Connor March 11, 2015, 5:32 p.m. UTC | #1
On Wed, Mar 11, 2015 at 01:09:42PM -0400, Bandan Das wrote:
> "Kevin O'Connor" <kevin@koconnor.net> writes:
> ...
> >
> > Something is very odd here.  When I run the above command (on an older
> > AMD machine) I get:
> >
> > Found 128 cpu(s) max supported 128 cpu(s)
> >
> > That first value (1 vs 128) comes from QEMU (via cmos index 0x5f).
> > That is, during smp init, SeaBIOS expects QEMU to tell it how many
> > cpus are active, and SeaBIOS waits until that many CPUs check in from
> > its SIPI request before proceeding.
> >
> > I wonder if QEMU reported only 1 active cpu via that cmos register,
> > but more were actually active.  If that was the case, it could
> 
> I was daring enough to try this and I don't see the crash :)
> 
> diff --git a/src/fw/smp.c b/src/fw/smp.c
> index a466ea6..a346d46 100644
> --- a/src/fw/smp.c
> +++ b/src/fw/smp.c
> @@ -49,6 +49,7 @@ int apic_id_is_present(u8 apic_id)
>  void VISIBLE32FLAT
>  handle_smp(void)
>  {
> +  dprintf(DEBUG_HDL_smp, "Calling handle_smp\n");
>      if (!CONFIG_QEMU)
>          return;
>  
> @@ -128,6 +129,8 @@ smp_setup(void)
>  
>      // Wait for other CPUs to process the SIPI.
>      u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
> +    while (cmos_smp_count == 1)
> +        cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;

That would loop forever if you had "-smp 1".

>      while (cmos_smp_count != CountCPUs)
>          asm volatile(
>              // Release lock and allow other processors to use the stack.
> 
> So, the while loop results in a race somehow?

No, the problem is that loop doesn't run at all, and as a result the
other cpus end up running random code.  SeaBIOS sets up an entry
vector for multiple cpus, wakes up the cpus, then tears down the entry
vector.  If it tears down the entry vector before all the cpus have
run through it, then the other cpus can end up running random code.

-Kevin
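
In outline, the sequence Kevin describes looks like this (pseudocode; the helper names are illustrative, and only CountCPUs and CMOS_BIOS_SMP_COUNT come from the patch itself):

    install_entry_vector();                 /* trampoline the APs will jump through */
    wake_up_aps();                          /* INIT/SIPI broadcast */
    u8 expected = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
    while (expected != CountCPUs)           /* each AP bumps CountCPUs in handle_smp() */
        ;
    remove_entry_vector();                  /* only safe once every AP has checked in */

If "expected" is too small (for example, 1), the wait exits at once and the entry vector is torn down while APs are still on their way to it, so they end up executing whatever bytes are left there.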
Bandan Das March 11, 2015, 6:01 p.m. UTC | #2
"Kevin O'Connor" <kevin@koconnor.net> writes:

> On Wed, Mar 11, 2015 at 01:09:42PM -0400, Bandan Das wrote:
>> "Kevin O'Connor" <kevin@koconnor.net> writes:
>> ...
>> >
>> > Something is very odd here.  When I run the above command (on an older
>> > AMD machine) I get:
>> >
>> > Found 128 cpu(s) max supported 128 cpu(s)
>> >
>> > That first value (1 vs 128) comes from QEMU (via cmos index 0x5f).
>> > That is, during smp init, SeaBIOS expects QEMU to tell it how many
>> > cpus are active, and SeaBIOS waits until that many CPUs check in from
>> > its SIPI request before proceeding.
>> >
>> > I wonder if QEMU reported only 1 active cpu via that cmos register,
>> > but more were actually active.  If that was the case, it could
>> 
>> I was daring enough to try this and I don't see the crash :)
>> 
>> diff --git a/src/fw/smp.c b/src/fw/smp.c
>> index a466ea6..a346d46 100644
>> --- a/src/fw/smp.c
>> +++ b/src/fw/smp.c
>> @@ -49,6 +49,7 @@ int apic_id_is_present(u8 apic_id)
>>  void VISIBLE32FLAT
>>  handle_smp(void)
>>  {
>> +  dprintf(DEBUG_HDL_smp, "Calling handle_smp\n");
>>      if (!CONFIG_QEMU)
>>          return;
>>  
>> @@ -128,6 +129,8 @@ smp_setup(void)
>>  
>>      // Wait for other CPUs to process the SIPI.
>>      u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
>> +    while (cmos_smp_count == 1)
>> +        cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
>
> That would loop forever if you had "-smp 1".

Sorry, I should have been clearer. What I meant is that if I run
with "-smp 128" (from Dave's original reproducer), adding this
while loop avoids the crash. So, rtc_read eventually returns the
right number (127), as the above while loop keeps polling.

Bandan

>>      while (cmos_smp_count != CountCPUs)
>>          asm volatile(
>>              // Release lock and allow other processors to use the stack.
>> 
>> So, the while loop results in a race somehow?
>
> No, the problem is that loop doesn't run at all, and as a result the
> other cpus end up running random code.  SeaBIOS sets up an entry
> vector for multiple cpus, wakes up the cpus, then tears down the entry
> vector.  If it tears down the entry vector before all the cpus have
> run through it, then the other cpus can end up running random code.
>
> -Kevin
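
Kevin's "-smp 1" objection is that with a single CPU the added loop never terminates: rtc_read(CMOS_BIOS_SMP_COUNT) legitimately returns 0, so cmos_smp_count stays at 1 forever. Purely as an illustration, a bounded poll would sidestep that (timer_calc()/timer_check() are assumed to be the usual SeaBIOS timer helpers, and whether polling is the right fix at all is exactly what is being debated here):

    /* Hypothetical bounded poll -- illustration only, not a proposed patch. */
    u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
    u32 end = timer_calc(100);              /* allow ~100ms for the count to appear */
    while (cmos_smp_count == 1 && !timer_check(end))
        cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;

Either way, Bandan's observation implies the first rtc_read() returns something other than 127 on this host and only later reads return the expected value.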

Patch

diff --git a/src/fw/smp.c b/src/fw/smp.c
index a466ea6..a346d46 100644
--- a/src/fw/smp.c
+++ b/src/fw/smp.c
@@ -49,6 +49,7 @@  int apic_id_is_present(u8 apic_id)
 void VISIBLE32FLAT
 handle_smp(void)
 {
+  dprintf(DEBUG_HDL_smp, "Calling handle_smp\n");
     if (!CONFIG_QEMU)
         return;
 
@@ -128,6 +129,8 @@  smp_setup(void)
 
     // Wait for other CPUs to process the SIPI.
     u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
+    while (cmos_smp_count == 1)
+        cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
     while (cmos_smp_count != CountCPUs)
         asm volatile(
             // Release lock and allow other processors to use the stack.