From patchwork Wed Mar 11 16:37:39 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kevin O'Connor
X-Patchwork-Id: 449077
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])
 (using TLSv1 with cipher AES256-SHA (256/256 bits))
 (No client certificate requested)
 by ozlabs.org (Postfix) with ESMTPS id A5D8C14016B
 for ; Thu, 12 Mar 2015 03:38:05 +1100 (AEDT)
Received: from localhost ([::1]:55840 helo=lists.gnu.org)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1YVjdz-0000zD-TD
 for incoming@patchwork.ozlabs.org; Wed, 11 Mar 2015 12:38:03 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:48771)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1YVjdh-0000f8-Rq
 for qemu-devel@nongnu.org; Wed, 11 Mar 2015 12:37:47 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1YVjde-0004BQ-CU
 for qemu-devel@nongnu.org; Wed, 11 Mar 2015 12:37:45 -0400
Received: from mail-vc0-f181.google.com ([209.85.220.181]:35612)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1YVjde-0004BH-7c
 for qemu-devel@nongnu.org; Wed, 11 Mar 2015 12:37:42 -0400
Received: by mail-vc0-f181.google.com with SMTP id hq12so3369312vcb.12
 for ; Wed, 11 Mar 2015 09:37:41 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:references
 :mime-version:content-type:content-disposition:in-reply-to
 :user-agent;
 bh=PmJ82hAZk/SE3mqKVUOvjJLDFt7vni4LlwyTENFBLBc=;
 b=BjosU21xSDzcAdI1GQQzBjTyj/rOA+3KGHWp1EFLuuN4DgI1yjG+vsuLxvpiYTAinT
 s9H3mwvh9wLQQVGIdmrMOZJ6kaBl4h9tzhOiTHsMAScnQzrqiMeeY3lF0IeFIGSygwZr
 kc/kdtR+nY5W4wDivuIjSJNH6Cp9El89et3xoYC8SDGZfZ8B143MISdxGXdMzpCvIp1K
 nqwKXEdxbWt3tUHE+s7/r/t+zsHdvK2eCokgSp3t0VfiUgqVU6W2oLQhSYtj94oDI8aS
 N8ZdFa2RqNv7o0adM0OehCvTkgwjlexDzcNWb+zw5lpynDg6GVEppct4OrT8byqiy+sV
 bLnA==
X-Gm-Message-State: ALoCoQna3AnJXaoWZJ6evqAew3BoAO1DAtpD39WZm405rxFnSeKV6DNXBpNyy6y+LjtWaqpyFINn
X-Received: by 10.55.20.213 with SMTP id 82mr55326903qku.46.1426091861548;
 Wed, 11 Mar 2015 09:37:41 -0700 (PDT)
Received: from localhost (207-172-170-53.c3-0.avec-ubr1.nyr-avec.ny.cable.rcn.com. [207.172.170.53])
 by mx.google.com with ESMTPSA id o7sm2890855qge.8.2015.03.11.09.37.40
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 11 Mar 2015 09:37:40 -0700 (PDT)
Date: Wed, 11 Mar 2015 12:37:39 -0400
From: Kevin O'Connor
To: "Dr. David Alan Gilbert"
Message-ID: <20150311163739.GA29522@morn.localdomain>
References: <20150310165755.GL2338@work-vm> <54FF337A.1010202@redhat.com>
 <54FF4541.9080608@redhat.com> <20150310202958.GR2338@work-vm>
 <20150311134556.GH2334@work-vm> <20150311154220.GA26463@morn.localdomain>
 <20150311155306.GK2334@work-vm>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20150311155306.GK2334@work-vm>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy]
X-Received-From: 209.85.220.181
Cc: Andrey Korolyov, "kvm@vger.kernel.org", "qemu-devel@nongnu.org",
 Bandan Das, kraxel@redhat.com, Paolo Bonzini
Subject: Re: [Qemu-devel] E5-2620v2 - emulation stop error
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org

On Wed, Mar 11, 2015 at 03:53:07PM +0000, Dr. David Alan Gilbert wrote:
> * Kevin O'Connor (kevin@koconnor.net) wrote:
> > On Wed, Mar 11, 2015 at 01:45:57PM +0000, Dr. David Alan Gilbert wrote:
> > > * Bandan Das (bsd@redhat.com) wrote:
> > > > "Dr. David Alan Gilbert" writes:
> > > > > while true; do (sleep 5; echo -e '\001cq\n')|/opt/qemu-try-world3/bin/qemu-system-x86_64 -machine pc-i440fx-2.0,accel=kvm -m 1024 -smp 128 -nographic -device sga 2>&1 | tee /tmp/qemu.op; grep "internal error" /tmp/qemu.op -q && break; done

That is a truly impressive command line, BTW.

> > > > > [root@virtlab413 qemu-world3]# git bisect bad
> > > > > 21f5826a04d38e19488f917e1eef22751490c769 is the first bad commit
> > > >
> > > > I can reproduce this on E5-2620 v2 with David's "while true" test.
> > > > (The emulation failure I mean, not the suberror 2 that Andrey is seeing)
> > > > The commit that seems to have introduced this is -
> > > >
> > > > commit 0673b7870063a3affbad9046fb6d385a4e734c19
> > > > Author: Kevin O'Connor
> > > > Date: Sat May 24 10:49:50 2014 -0400
> > > >
> > > >     smp: Replace QEMU SMP init assembler code with C; run only in 32bit mode.
> > [...]
> > > Turning on debug logging
> > > ( -chardev file,id=log,path=/tmp/debugcon.$$ -device isa-debugcon,chardev=log,iobase=0x402 )
> > >
> > > SeaBIOS (version rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org)
> > [...]
> > > Found 1 cpu(s) max supported 128 cpu(s)
> >
> > Something is very odd here. When I run the above command (on an older
> > AMD machine) I get:
> >
> > Found 128 cpu(s) max supported 128 cpu(s)
> >
> > That first value (1 vs 128) comes from QEMU (via cmos index 0x5f).
> > That is, during smp init, SeaBIOS expects QEMU to tell it how many
> > cpus are active, and SeaBIOS waits until that many CPUs check in from
> > its SIPI request before proceeding.
> >
> > I wonder if QEMU reported only 1 active cpu via that cmos register,
> > but more were actually active. If that was the case, it could
> > certainly explain the failure - as multiple cpus could be running
> > without the sipi trampoline in place.
> >
> > What does the log look like on a non-failure case?
>
> I had to drop down from 128 to get a working run with debug; here
> are two runs with -smp 20 the first one worked, the second one
> failed.

[...]
> =========== Working ===========
>
> SeaBIOS (version rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org)
[...]
> Found 20 cpu(s) max supported 20 cpu(s)
[...]
> =========== Broken ===========
>
> SeaBIOS (version rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org)
[...]
> Found 1 cpu(s) max supported 20 cpu(s)

So, I couldn't get this to fail on my older AMD machine at all with
the default SeaBIOS code. But, when I changed the code with the patch
below, it failed right away.

KVM internal error. Suberror: 1
emulation failure
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000fd2b8
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=000fd2c1 EFL=00000007 [-----PC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008300 DPL=0 TSS16-busy
GDT=     000f6a50 00000037
IDT=     000f6a8e 00000000
CR0=60000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=66 ba b8 d2 0f 00 e9 a2 fe f3 90 f0 0f ba 2d 04 ff fb 3f 00 <72> f3 8b 25 00 ff fb 3f e8 d2 65 ff ff c7 05 04 ff fb 3f 00 00 00 00 f4 eb fd fa fc 66 b8

And the failed debug output looks like:

SeaBIOS (version rel-1.8.0-7-gd23eba6-dirty-20150311_121819-morn.localdomain)
[...]
cmos_smp_count0=20
[...]
cmos_smp_count=1
cmos_smp_count2=1/20
Found 1 cpu(s) max supported 20 cpu(s)

I'm going to check the assembly for a compiler error, but is it
possible QEMU is returning incorrect data in cmos index 0x5f?

David, any chance you can recompile seabios and double check your
output?

-Kevin


--- a/src/fw/smp.c
+++ b/src/fw/smp.c
@@ -128,6 +128,7 @@ smp_setup(void)
 
     // Wait for other CPUs to process the SIPI.
     u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
+    dprintf(1, "cmos_smp_count=%d\n", cmos_smp_count);
     while (cmos_smp_count != CountCPUs)
         asm volatile(
             // Release lock and allow other processors to use the stack.
@@ -140,6 +141,8 @@ smp_setup(void)
             : "+m" (SMPLock), "+m" (SMPStack)
             : : "cc", "memory");
     yield();
+    dprintf(1, "cmos_smp_count2=%d/%d\n", cmos_smp_count
+            , rtc_read(CMOS_BIOS_SMP_COUNT) + 1);
 
     // Restore memory.
     *(u64*)BUILD_AP_BOOT_ADDR = old;
diff --git a/src/post.c b/src/post.c
index 9ea5620..dc11c72 100644
--- a/src/post.c
+++ b/src/post.c
@@ -170,6 +170,7 @@ platform_hardware_setup(void)
     clock_setup();
 
     // Platform specific setup
+    dprintf(1, "cmos_smp_count0=%d\n", rtc_read(CMOS_BIOS_SMP_COUNT) + 1);
     qemu_platform_setup();
     coreboot_platform_setup();
 }