diff mbox

E5-2620v2 - emulation stop error

Message ID 20150311163739.GA29522@morn.localdomain
State New
Headers show

Commit Message

Kevin O'Connor March 11, 2015, 4:37 p.m. UTC
On Wed, Mar 11, 2015 at 03:53:07PM +0000, Dr. David Alan Gilbert wrote:
> * Kevin O'Connor (kevin@koconnor.net) wrote:
> > On Wed, Mar 11, 2015 at 01:45:57PM +0000, Dr. David Alan Gilbert wrote:
> > > * Bandan Das (bsd@redhat.com) wrote:
> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> > > > > while true; do (sleep 5; echo -e '\001cq\n')|/opt/qemu-try-world3/bin/qemu-system-x86_64 -machine pc-i440fx-2.0,accel=kvm -m 1024 -smp 128 -nographic -device sga 2>&1 | tee /tmp/qemu.op; grep "internal error" /tmp/qemu.op -q && break; done

That is a truly impressive command line, BTW.

> > > > > [root@virtlab413 qemu-world3]# git bisect bad
> > > > > 21f5826a04d38e19488f917e1eef22751490c769 is the first bad commit
> > > > 
> > > > I can reproduce this on E5-2620 v2 with  David's "while true" test.
> > > > (The emulation failure I mean, not the suberror 2 that Andrey is seeing)
> > > > The commit that seems to have introduced this is -
> > > > 
> > > > commit 0673b7870063a3affbad9046fb6d385a4e734c19
> > > > Author: Kevin O'Connor <kevin@koconnor.net>
> > > > Date:   Sat May 24 10:49:50 2014 -0400
> > > > 
> > > >     smp: Replace QEMU SMP init assembler code with C; run only in 32bit mode.
> > [...]
> > > Turning on debug logging
> > > ( -chardev file,id=log,path=/tmp/debugcon.$$ -device isa-debugcon,chardev=log,iobase=0x402 )
> > > 
> > > SeaBIOS (version rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org)
> > [...]
> > > Found 1 cpu(s) max supported 128 cpu(s)
> > 
> > Something is very odd here.  When I run the above command (on an older
> > AMD machine) I get:
> > 
> > Found 128 cpu(s) max supported 128 cpu(s)
> > 
> > That first value (1 vs 128) comes from QEMU (via cmos index 0x5f).
> > That is, during smp init, SeaBIOS expects QEMU to tell it how many
> > cpus are active, and SeaBIOS waits until that many CPUs check in from
> > its SIPI request before proceeding.
> > 
> > I wonder if QEMU reported only 1 active cpu via that cmos register,
> > but more were actually active.  If that was the case, it could
> > certainly explain the failure - as multiple cpus could be running
> > without the sipi trapoline in place.
> > 
> > What does the log look like on a non-failure case?
> 
> I had to drop down from 128 to get a working run with debug; here
> are two runs with -smp 20   the first one worked, the second one
> failed.
[...]
> =========== Working ===========
> 
> SeaBIOS (version rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org)
[...]
> Found 20 cpu(s) max supported 20 cpu(s)
[...]
> =========== Broken ===========
> 
> SeaBIOS (version rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org)
[...]
> Found 1 cpu(s) max supported 20 cpu(s)

So, I couldn't get this to fail on my older AMD machine at all with
the default SeaBIOS code.  But, when I change the code with the patch
below, it failed right away.

KVM internal error. Suberror: 1
emulation failure
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000fd2b8
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=000fd2c1 EFL=00000007 [-----PC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008300 DPL=0 TSS16-busy
GDT=     000f6a50 00000037
IDT=     000f6a8e 00000000
CR0=60000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=66 ba b8 d2 0f 00 e9 a2 fe f3 90 f0 0f ba 2d 04 ff fb 3f 00 <72> f3 8b 25 00 ff fb 3f e8 d2 65 ff ff c7 05 04 ff fb 3f 00 00 00 00 f4 eb fd fa fc 66 b8

And the failed debug output looks like:

SeaBIOS (version rel-1.8.0-7-gd23eba6-dirty-20150311_121819-morn.localdomain)
[...]
cmos_smp_count0=20
[...]
cmos_smp_count=1
cmos_smp_count2=1/20
Found 1 cpu(s) max supported 20 cpu(s)

I'm going to check the assembly for a compiler error, but is it
possible QEMU is returning incorrect data in cmos index 0x5f?

David, any chance you can recompile seabios and double check your
output?

-Kevin

Comments

Dr. David Alan Gilbert March 11, 2015, 4:52 p.m. UTC | #1
* Kevin O'Connor (kevin@koconnor.net) wrote:
> On Wed, Mar 11, 2015 at 03:53:07PM +0000, Dr. David Alan Gilbert wrote:
> > * Kevin O'Connor (kevin@koconnor.net) wrote:
> > > On Wed, Mar 11, 2015 at 01:45:57PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Bandan Das (bsd@redhat.com) wrote:
> > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:
> > > > > > while true; do (sleep 5; echo -e '\001cq\n')|/opt/qemu-try-world3/bin/qemu-system-x86_64 -machine pc-i440fx-2.0,accel=kvm -m 1024 -smp 128 -nographic -device sga 2>&1 | tee /tmp/qemu.op; grep "internal error" /tmp/qemu.op -q && break; done
> 
> That is a truly impressive command line, BTW.

Thanks :-)

> > > > > > [root@virtlab413 qemu-world3]# git bisect bad
> > > > > > 21f5826a04d38e19488f917e1eef22751490c769 is the first bad commit
> > > > > 
> > > > > I can reproduce this on E5-2620 v2 with  David's "while true" test.
> > > > > (The emulation failure I mean, not the suberror 2 that Andrey is seeing)
> > > > > The commit that seems to have introduced this is -
> > > > > 
> > > > > commit 0673b7870063a3affbad9046fb6d385a4e734c19
> > > > > Author: Kevin O'Connor <kevin@koconnor.net>
> > > > > Date:   Sat May 24 10:49:50 2014 -0400
> > > > > 
> > > > >     smp: Replace QEMU SMP init assembler code with C; run only in 32bit mode.
> > > [...]
> > > > Turning on debug logging
> > > > ( -chardev file,id=log,path=/tmp/debugcon.$$ -device isa-debugcon,chardev=log,iobase=0x402 )
> > > > 
> > > > SeaBIOS (version rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org)
> > > [...]
> > > > Found 1 cpu(s) max supported 128 cpu(s)
> > > 
> > > Something is very odd here.  When I run the above command (on an older
> > > AMD machine) I get:
> > > 
> > > Found 128 cpu(s) max supported 128 cpu(s)
> > > 
> > > That first value (1 vs 128) comes from QEMU (via cmos index 0x5f).
> > > That is, during smp init, SeaBIOS expects QEMU to tell it how many
> > > cpus are active, and SeaBIOS waits until that many CPUs check in from
> > > its SIPI request before proceeding.
> > > 
> > > I wonder if QEMU reported only 1 active cpu via that cmos register,
> > > but more were actually active.  If that was the case, it could
> > > certainly explain the failure - as multiple cpus could be running
> > > without the sipi trapoline in place.
> > > 
> > > What does the log look like on a non-failure case?
> > 
> > I had to drop down from 128 to get a working run with debug; here
> > are two runs with -smp 20   the first one worked, the second one
> > failed.
> [...]
> > =========== Working ===========
> > 
> > SeaBIOS (version rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org)
> [...]
> > Found 20 cpu(s) max supported 20 cpu(s)
> [...]
> > =========== Broken ===========
> > 
> > SeaBIOS (version rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org)
> [...]
> > Found 1 cpu(s) max supported 20 cpu(s)
> 
> So, I couldn't get this to fail on my older AMD machine at all with
> the default SeaBIOS code.  But, when I change the code with the patch
> below, it failed right away.
> 
> KVM internal error. Suberror: 1
> emulation failure
> EAX=00000000 EBX=00000000 ECX=00000000 EDX=000fd2b8
> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> EIP=000fd2c1 EFL=00000007 [-----PC] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
> SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
> LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 00000000 0000ffff 00008300 DPL=0 TSS16-busy
> GDT=     000f6a50 00000037
> IDT=     000f6a8e 00000000
> CR0=60000011 CR2=00000000 CR3=00000000 CR4=00000000
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=66 ba b8 d2 0f 00 e9 a2 fe f3 90 f0 0f ba 2d 04 ff fb 3f 00 <72> f3 8b 25 00 ff fb 3f e8 d2 65 ff ff c7 05 04 ff fb 3f 00 00 00 00 f4 eb fd fa fc 66 b8
> 
> And the failed debug output looks like:
> 
> SeaBIOS (version rel-1.8.0-7-gd23eba6-dirty-20150311_121819-morn.localdomain)
> [...]
> cmos_smp_count0=20
> [...]
> cmos_smp_count=1
> cmos_smp_count2=1/20
> Found 1 cpu(s) max supported 20 cpu(s)
> 
> I'm going to check the assembly for a compiler error, but is it
> possible QEMU is returning incorrect data in cmos index 0x5f?
> 
> David, any chance you can recompile seabios and double check your
> output?

Done;

=========== Working ===========
SeaBIOS (version rel-1.8.0-0-g4c59f5d-dirty-20150311_164408-dgilbert-t530)
No Xen hypervisor found.
Running on QEMU (i440fx)
Running on KVM
RamSize: 0x40000000 [cmos]
Relocating init from 0x000df4e0 to 0x3ffaf570 (size 68048)
Found QEMU fw_cfg
RamBlock: addr 0x0000000000000000 len 0x0000000040000000 [e820]
Moving pm_base to 0x600
CPU Mhz=2116
cmos_smp_count0=20
=== PCI bus & bridge init ===
PCI: pci_bios_init_bus_rec bus = 0x0
=== PCI device probing ===
Found 6 PCI devices (max PCI bus is 00)
=== PCI new allocation pass #1 ===
PCI: check devices
=== PCI new allocation pass #2 ===
PCI: IO: c000 - c04f
PCI: 32: 0000000080000000 - 00000000fec00000
PCI: map device bdf=00:03.0  bar 1, addr 0000c000, size 00000040 [io]
PCI: map device bdf=00:01.1  bar 4, addr 0000c040, size 00000010 [io]
PCI: map device bdf=00:03.0  bar 6, addr feb80000, size 00040000 [mem]
PCI: map device bdf=00:03.0  bar 0, addr febc0000, size 00020000 [mem]
PCI: map device bdf=00:02.0  bar 6, addr febe0000, size 00010000 [mem]
PCI: map device bdf=00:02.0  bar 2, addr febf0000, size 00001000 [mem]
PCI: map device bdf=00:02.0  bar 0, addr fd000000, size 01000000 [prefmem]
PCI: init bdf=00:00.0 id=8086:1237
PCI: init bdf=00:01.0 id=8086:7000
PIIX3/PIIX4 init: elcr=00 0c
PCI: init bdf=00:01.1 id=8086:7010
PCI: init bdf=00:01.3 id=8086:7113
Using pmtimer, ioport 0x608
PCI: init bdf=00:02.0 id=1234:1111
PCI: init bdf=00:03.0 id=8086:100e
PCI: Using 00:02.0 for primary VGA
cmos_smp_count=20
handle_smp: apic_id=8
handle_smp: apic_id=4
handle_smp: apic_id=2
handle_smp: apic_id=15
handle_smp: apic_id=16
handle_smp: apic_id=10
handle_smp: apic_id=3
handle_smp: apic_id=12
handle_smp: apic_id=19
handle_smp: apic_id=14
handle_smp: apic_id=11
handle_smp: apic_id=7
handle_smp: apic_id=17
handle_smp: apic_id=6
handle_smp: apic_id=18
handle_smp: apic_id=13
handle_smp: apic_id=9
handle_smp: apic_id=5
handle_smp: apic_id=1
cmos_smp_count2=20/20
Found 20 cpu(s) max supported 20 cpu(s)
Copying PIR from 0x3ffbfc9c to 0x000f6720
Copying MPTABLE from 0x00006de8/3ffa64a0 to 0x000f64c0
Copying SMBIOS entry point from 0x00006de8 to 0x000f64a0
Scan for VGA option rom
Running option rom at c000:0003
Start SeaVGABIOS (version rel-1.8.0-0-g4c59f5d-20150219_092912-nilsson.home.kraxel.org)
enter vga_post:
   a=00000010  b=0000ffff  c=00000000  d=0000ffff ds=0000 es=f000 ss=0000
  si=00000000 di=00006af0 bp=00000000 sp=00006d66 cs=f000 ip=d244  f=0000
VBE DISPI: bdf 00:02.0, bar 0
VBE DISPI: lfb_addr=fd000000, size 16 MB
Attempting to allocate VGA stack via pmm call to f000:d2f6
pmm call arg1=0
VGA stack allocated at ef430
Running option rom at c980:0003
Turning on vga text mode console
set VGA mode 3
SeaBIOS (version rel-1.8.0-0-g4c59f5d-dirty-20150311_164408-dgilbert-t530)
All threads complete.
Found 1 lpt ports
Found 1 serial ports
Searching bootorder for: /pci@i0cf8/isa@1/fdc@03f0/floppy@0
ATA controller 1 at 1f0/3f4/0 (irq 14 dev 9)
ATA controller 2 at 170/374/0 (irq 15 dev 9)
DVD/CD [ata1-0: QEMU DVD-ROM ATAPI-4 DVD/CD]
Searching bootorder for: /pci@i0cf8/*@1,1/drive@1/disk@0
PS2 keyboard initialized
All threads complete.
Scan for option roms
Running option rom at ca80:0003
pmm call arg1=1
pmm call arg1=0
pmm call arg1=1
pmm call arg1=0
Searching bootorder for: /pci@i0cf8/*@3
Searching bootorder for: /rom@genroms/kvmvapic.bin
Searching bootorder for: HALT
Running option rom at cb80:0003
Space available for UMB: ce000-ee800, f5fa0-f63c0
Returned 126976 bytes of ZoneHigh
e820 map has 7 items:
  0: 0000000000000000 - 000000000009f800 = 1 RAM
  1: 000000000009f800 - 00000000000a0000 = 2 RESERVED
  2: 00000000000f0000 - 0000000000100000 = 2 RESERVED
  3: 0000000000100000 - 000000003ffdf000 = 1 RAM
  4: 000000003ffdf000 - 0000000040000000 = 2 RESERVED
  5: 00000000feffc000 - 00000000ff000000 = 2 RESERVED
  6: 00000000fffc0000 - 0000000100000000 = 2 RESERVED
enter handle_19:
  NULL
Booting from Hard Disk...
Boot failed: could not read the boot disk

enter handle_18:
  NULL
Booting from Floppy...
floppy error: 40 00 00 00 00 01 02
Boot failed: could not read the boot disk

enter handle_18:
  NULL
Booting from DVD/CD...
Device reports MEDIUM NOT PRESENT
scsi_is_ready returned -1
Boot failed: Could not read from CDROM (code 0003)
enter handle_18:
  NULL
Booting from ROM...
Booting from ca80:0361

=========== Broken ===========
SeaBIOS (version rel-1.8.0-0-g4c59f5d-dirty-20150311_164408-dgilbert-t530)
No Xen hypervisor found.
Running on QEMU (i440fx)
Running on KVM
RamSize: 0x40000000 [cmos]
Relocating init from 0x000df4e0 to 0x3ffaf570 (size 68048)
Found QEMU fw_cfg
RamBlock: addr 0x0000000000000000 len 0x0000000040000000 [e820]
Moving pm_base to 0x600
CPU Mhz=2115
cmos_smp_count0=20
=== PCI bus & bridge init ===
PCI: pci_bios_init_bus_rec bus = 0x0
=== PCI device probing ===
Found 6 PCI devices (max PCI bus is 00)
=== PCI new allocation pass #1 ===
PCI: check devices
=== PCI new allocation pass #2 ===
PCI: IO: c000 - c04f
PCI: 32: 0000000080000000 - 00000000fec00000
PCI: map device bdf=00:03.0  bar 1, addr 0000c000, size 00000040 [io]
PCI: map device bdf=00:01.1  bar 4, addr 0000c040, size 00000010 [io]
PCI: map device bdf=00:03.0  bar 6, addr feb80000, size 00040000 [mem]
PCI: map device bdf=00:03.0  bar 0, addr febc0000, size 00020000 [mem]
PCI: map device bdf=00:02.0  bar 6, addr febe0000, size 00010000 [mem]
PCI: map device bdf=00:02.0  bar 2, addr febf0000, size 00001000 [mem]
PCI: map device bdf=00:02.0  bar 0, addr fd000000, size 01000000 [prefmem]
PCI: init bdf=00:00.0 id=8086:1237
PCI: init bdf=00:01.0 id=8086:7000
PIIX3/PIIX4 init: elcr=00 0c
PCI: init bdf=00:01.1 id=8086:7010
PCI: init bdf=00:01.3 id=8086:7113
Using pmtimer, ioport 0x608
PCI: init bdf=00:02.0 id=1234:1111
PCI: init bdf=00:03.0 id=8086:100e
PCI: Using 00:02.0 for primary VGA
cmos_smp_count=1
cmos_smp_count2=1/20
Found 1 cpu(s) max supported 20 cpu(s)
Copying PIR from 0x3ffbfc9c to 0x000f6720
Copying MPTABLE from 0x00006de8/3ffa64a0 to 0x000f64c0
Copying SMBIOS entry point from 0x00006de8 to 0x000f64a0
Scan for VGA option rom
Running option rom at c000:0003
Start SeaVGABIOS (version rel-1.8.0-0-g4c59f5d-20150219_092912-nilsson.home.kraxel.org)
enter vga_post:
   a=00000010  b=0000ffff  c=00000000  d=0000ffff ds=0000 es=f000 ss=0000
  si=00000000 di=00006af0 bp=00000000 sp=00006d66 cs=f000 ip=d244  f=0000
VBE DISPI: bdf 00:02.0, bar 0
VBE DISPI: lfb_addr=fd000000, size 16 MB
Attempting to allocate VGA stack via pmm call to f000:d2f6
pmm call arg1=0
VGA stack allocated at ef430
Running option rom at c980:0003
Turning on vga text mode console
set VGA mode 3
SeaBIOS (version rel-1.8.0-0-g4c59f5d-dirty-20150311_164408-dgilbert-t530)
All threads complete.
Found 1 lpt ports
Found 1 serial ports
Searching bootorder for: /pci@i0cf8/isa@1/fdc@03f0/floppy@0
ATA controller 1 at 1f0/3f4/0 (irq 14 dev 9)
ATA controller 2 at 170/374/0 (irq 15 dev 9)
DVD/CD [ata1-0: QEMU DVD-ROM ATAPI-4 DVD/CD]
Searching bootorder for: /pci@i0cf8/*@1,1/drive@1/disk@0
PS2 keyboard initialized
All threads complete.
Scan for option roms
Running option rom at ca80:0003
pmm call arg1=1
pmm call arg1=0
pmm call arg1=1
pmm call arg1=0
Searching bootorder for: /pci@i0cf8/*@3
Searching bootorder for: /rom@genroms/kvmvapic.bin
Searching bootorder for: HALT
Running option rom at cb80:0003
Space available for UMB: ce000-ee800, f5fa0-f63c0
Returned 126976 bytes of ZoneHigh
e820 map has 7 items:
  0: 0000000000000000 - 000000000009f800 = 1 RAM
  1: 000000000009f800 - 00000000000a0000 = 2 RESERVED
  2: 00000000000f0000 - 0000000000100000 = 2 RESERVED
  3: 0000000000100000 - 000000003ffdf000 = 1 RAM
  4: 000000003ffdf000 - 0000000040000000 = 2 RESERVED
  5: 00000000feffc000 - 00000000ff000000 = 2 RESERVED
  6: 00000000fffc0000 - 0000000100000000 = 2 RESERVED

> -Kevin
> 
> 
> --- a/src/fw/smp.c
> +++ b/src/fw/smp.c
> @@ -128,6 +128,7 @@ smp_setup(void)
>  
>      // Wait for other CPUs to process the SIPI.
>      u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
> +    dprintf(1, "cmos_smp_count=%d\n", cmos_smp_count);
>      while (cmos_smp_count != CountCPUs)
>          asm volatile(
>              // Release lock and allow other processors to use the stack.
> @@ -140,6 +141,8 @@ smp_setup(void)
>              : "+m" (SMPLock), "+m" (SMPStack)
>              : : "cc", "memory");
>      yield();
> +    dprintf(1, "cmos_smp_count2=%d/%d\n", cmos_smp_count
> +            , rtc_read(CMOS_BIOS_SMP_COUNT) + 1);
>  
>      // Restore memory.
>      *(u64*)BUILD_AP_BOOT_ADDR = old;
> diff --git a/src/post.c b/src/post.c
> index 9ea5620..dc11c72 100644
> --- a/src/post.c
> +++ b/src/post.c
> @@ -170,6 +170,7 @@ platform_hardware_setup(void)
>      clock_setup();
>  
>      // Platform specific setup
> +    dprintf(1, "cmos_smp_count0=%d\n", rtc_read(CMOS_BIOS_SMP_COUNT) + 1);
>      qemu_platform_setup();
>      coreboot_platform_setup();
>  }
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Kevin O'Connor March 11, 2015, 5:37 p.m. UTC | #2
On Wed, Mar 11, 2015 at 04:52:03PM +0000, Dr. David Alan Gilbert wrote:
> * Kevin O'Connor (kevin@koconnor.net) wrote:
> > So, I couldn't get this to fail on my older AMD machine at all with
> > the default SeaBIOS code.  But, when I change the code with the patch
> > below, it failed right away.
[...]
> > And the failed debug output looks like:
> > 
> > SeaBIOS (version rel-1.8.0-7-gd23eba6-dirty-20150311_121819-morn.localdomain)
> > [...]
> > cmos_smp_count0=20
> > [...]
> > cmos_smp_count=1
> > cmos_smp_count2=1/20
> > Found 1 cpu(s) max supported 20 cpu(s)
> > 
> > I'm going to check the assembly for a compiler error, but is it
> > possible QEMU is returning incorrect data in cmos index 0x5f?

I checked the SeaBIOS assembler and it looks sane.  So, I think the
question is, why is QEMU sometimes returning a 0 instead of 127 from
cmos 0x5f.

> > David, any chance you can recompile seabios and double check your
> > output?
> 
> Done;
> 
> =========== Working ===========
> SeaBIOS (version rel-1.8.0-0-g4c59f5d-dirty-20150311_164408-dgilbert-t530)
[...]
> cmos_smp_count0=20
> cmos_smp_count=20
> cmos_smp_count2=20/20
> Found 20 cpu(s) max supported 20 cpu(s)
[...]
> =========== Broken ===========
> SeaBIOS (version rel-1.8.0-0-g4c59f5d-dirty-20150311_164408-dgilbert-t530)
[...]
> cmos_smp_count0=20
> cmos_smp_count=1
> cmos_smp_count2=1/20
> Found 1 cpu(s) max supported 20 cpu(s)

That's the same pattern I see.

-Kevin
Paolo Bonzini March 11, 2015, 5:41 p.m. UTC | #3
On 11/03/2015 18:37, Kevin O'Connor wrote:
> > I'm going to check the assembly for a compiler error, but is it
> > possible QEMU is returning incorrect data in cmos index 0x5f?
> 
> I checked the SeaBIOS assembler and it looks sane.  So, I think the
> question is, why is QEMU sometimes returning a 0 instead of 127 from
> cmos 0x5f.

Dave, can you get a KVM trace (trace-cmd record -e kvm/*
qemu-system-x86_64 ...) and send it to me privately?

Paolo
Dr. David Alan Gilbert March 11, 2015, 5:59 p.m. UTC | #4
* Kevin O'Connor (kevin@koconnor.net) wrote:
> On Wed, Mar 11, 2015 at 04:52:03PM +0000, Dr. David Alan Gilbert wrote:
> > * Kevin O'Connor (kevin@koconnor.net) wrote:
> > > So, I couldn't get this to fail on my older AMD machine at all with
> > > the default SeaBIOS code.  But, when I change the code with the patch
> > > below, it failed right away.
> [...]
> > > And the failed debug output looks like:
> > > 
> > > SeaBIOS (version rel-1.8.0-7-gd23eba6-dirty-20150311_121819-morn.localdomain)
> > > [...]
> > > cmos_smp_count0=20
> > > [...]
> > > cmos_smp_count=1
> > > cmos_smp_count2=1/20
> > > Found 1 cpu(s) max supported 20 cpu(s)
> > > 
> > > I'm going to check the assembly for a compiler error, but is it
> > > possible QEMU is returning incorrect data in cmos index 0x5f?
> 
> I checked the SeaBIOS assembler and it looks sane.  So, I think the
> question is, why is QEMU sometimes returning a 0 instead of 127 from
> cmos 0x5f.

My reading of the logs I've just created is that qemu doesn't think
it's ever being asked to read 5f in the failed case:

good:

pc_cmos_init 5f setting smp_cpus=20
cmos: read index=0x0f val=0x00
cmos: read index=0x34 val=0x00
cmos: read index=0x35 val=0x3f
cmos: read index=0x38 val=0x30
cmos: read index=0x3d val=0x12
cmos: read index=0x38 val=0x30
cmos: read index=0x0b val=0x02
cmos: read index=0x0d val=0x80
cmos: read index=0x5f val=0x13  Yeh!
cmos: read index=0x0f val=0x00
cmos: read index=0x0f val=0x00
cmos: read index=0x0f val=0x00

bad:
pc_cmos_init 5f setting smp_cpus=20
cmos: read index=0x0f val=0x00
cmos: read index=0x34 val=0x00
cmos: read index=0x35 val=0x3f
cmos: read index=0x38 val=0x30
cmos: read index=0x3d val=0x12
cmos: read index=0x38 val=0x30
cmos: read index=0x0b val=0x02
cmos: read index=0x0d val=0x80  Oh!
cmos: read index=0x0f val=0x00
cmos: read index=0x0f val=0x00
cmos: read index=0x0f val=0x00

Dave

> 
> > > David, any chance you can recompile seabios and double check your
> > > output?
> > 
> > Done;
> > 
> > =========== Working ===========
> > SeaBIOS (version rel-1.8.0-0-g4c59f5d-dirty-20150311_164408-dgilbert-t530)
> [...]
> > cmos_smp_count0=20
> > cmos_smp_count=20
> > cmos_smp_count2=20/20
> > Found 20 cpu(s) max supported 20 cpu(s)
> [...]
> > =========== Broken ===========
> > SeaBIOS (version rel-1.8.0-0-g4c59f5d-dirty-20150311_164408-dgilbert-t530)
> [...]
> > cmos_smp_count0=20
> > cmos_smp_count=1
> > cmos_smp_count2=1/20
> > Found 1 cpu(s) max supported 20 cpu(s)
> 
> That's the same pattern I see.
> 
> -Kevin
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Bandan Das March 11, 2015, 6:24 p.m. UTC | #5
"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Kevin O'Connor (kevin@koconnor.net) wrote:
>> On Wed, Mar 11, 2015 at 04:52:03PM +0000, Dr. David Alan Gilbert wrote:
>> > * Kevin O'Connor (kevin@koconnor.net) wrote:
>> > > So, I couldn't get this to fail on my older AMD machine at all with
>> > > the default SeaBIOS code.  But, when I change the code with the patch
>> > > below, it failed right away.
>> [...]
>> > > And the failed debug output looks like:
>> > > 
>> > > SeaBIOS (version rel-1.8.0-7-gd23eba6-dirty-20150311_121819-morn.localdomain)
>> > > [...]
>> > > cmos_smp_count0=20
>> > > [...]
>> > > cmos_smp_count=1
>> > > cmos_smp_count2=1/20
>> > > Found 1 cpu(s) max supported 20 cpu(s)
>> > > 
>> > > I'm going to check the assembly for a compiler error, but is it
>> > > possible QEMU is returning incorrect data in cmos index 0x5f?
>> 
>> I checked the SeaBIOS assembler and it looks sane.  So, I think the
>> question is, why is QEMU sometimes returning a 0 instead of 127 from
>> cmos 0x5f.
>
> My reading of the logs I've just created is that qemu doesn't think
> it's ever being asked to read 5f in the failed case:
>
> good:
>
> pc_cmos_init 5f setting smp_cpus=20
> cmos: read index=0x0f val=0x00
> cmos: read index=0x34 val=0x00
> cmos: read index=0x35 val=0x3f
> cmos: read index=0x38 val=0x30
> cmos: read index=0x3d val=0x12
> cmos: read index=0x38 val=0x30
> cmos: read index=0x0b val=0x02
> cmos: read index=0x0d val=0x80
> cmos: read index=0x5f val=0x13  Yeh!
> cmos: read index=0x0f val=0x00
> cmos: read index=0x0f val=0x00
> cmos: read index=0x0f val=0x00
>
> bad:
> pc_cmos_init 5f setting smp_cpus=20
> cmos: read index=0x0f val=0x00
> cmos: read index=0x34 val=0x00
> cmos: read index=0x35 val=0x3f
> cmos: read index=0x38 val=0x30
> cmos: read index=0x3d val=0x12
> cmos: read index=0x38 val=0x30
> cmos: read index=0x0b val=0x02
> cmos: read index=0x0d val=0x80  Oh!
> cmos: read index=0x0f val=0x00
> cmos: read index=0x0f val=0x00
> cmos: read index=0x0f val=0x00

rtc_read in hw/rt.c is:
u8
rtc_read(u8 index)
{
    index |= NMI_DISABLE_BIT;
    outb(index, PORT_CMOS_INDEX);
    return inb(PORT_CMOS_DATA);
}

Is it possible that the read would happen before the index has been committed ?

> Dave
>
diff mbox

Patch

--- a/src/fw/smp.c
+++ b/src/fw/smp.c
@@ -128,6 +128,7 @@  smp_setup(void)
 
     // Wait for other CPUs to process the SIPI.
     u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
+    dprintf(1, "cmos_smp_count=%d\n", cmos_smp_count);
     while (cmos_smp_count != CountCPUs)
         asm volatile(
             // Release lock and allow other processors to use the stack.
@@ -140,6 +141,8 @@  smp_setup(void)
             : "+m" (SMPLock), "+m" (SMPStack)
             : : "cc", "memory");
     yield();
+    dprintf(1, "cmos_smp_count2=%d/%d\n", cmos_smp_count
+            , rtc_read(CMOS_BIOS_SMP_COUNT) + 1);
 
     // Restore memory.
     *(u64*)BUILD_AP_BOOT_ADDR = old;
diff --git a/src/post.c b/src/post.c
index 9ea5620..dc11c72 100644
--- a/src/post.c
+++ b/src/post.c
@@ -170,6 +170,7 @@  platform_hardware_setup(void)
     clock_setup();
 
     // Platform specific setup
+    dprintf(1, "cmos_smp_count0=%d\n", rtc_read(CMOS_BIOS_SMP_COUNT) + 1);
     qemu_platform_setup();
     coreboot_platform_setup();
 }