From patchwork Thu Nov 7 21:12:31 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Laszlo Ersek X-Patchwork-Id: 289482 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id EB4962C008F for ; Fri, 8 Nov 2013 08:11:03 +1100 (EST) Received: from localhost ([::1]:42216 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VeWqz-0000lw-SS for incoming@patchwork.ozlabs.org; Thu, 07 Nov 2013 16:11:01 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:49588) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VeWqA-0000Wh-7Y for qemu-devel@nongnu.org; Thu, 07 Nov 2013 16:10:14 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1VeWq2-0003B9-Ha for qemu-devel@nongnu.org; Thu, 07 Nov 2013 16:10:09 -0500 Received: from mx1.redhat.com ([209.132.183.28]:45840) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VeWq2-00034b-2X for qemu-devel@nongnu.org; Thu, 07 Nov 2013 16:10:02 -0500 Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id rA7L9mI5023583 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 7 Nov 2013 16:09:48 -0500 Received: from lacos-laptop.usersys.redhat.com (vpn1-6-189.ams2.redhat.com [10.36.6.189]) by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id rA7L9ipj004169; Thu, 7 Nov 2013 16:09:45 -0500 Message-ID: <527C023F.2060506@redhat.com> Date: Thu, 07 Nov 2013 22:12:31 +0100 From: Laszlo Ersek User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130912 Thunderbird/17.0.9 MIME-Version: 1.0 To: qemu-devel References: <1383511723-11228-1-git-send-email-marcel.a@redhat.com> In-Reply-To: X-Scanned-By: MIMEDefang 2.68 on 10.5.11.25 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 209.132.183.28 Cc: Peter Maydell , "edk2-devel@lists.sourceforge.net" , "Michael S. Tsirkin" , Jan Kiszka , Marcel Apfelbaum , Anthony Liguori , Paolo Bonzini , Jordan Justen , afaerber , rth@twiddle.net Subject: Re: [Qemu-devel] [PATCH] exec: fix regression by making system-memory region UINT64_MAX size X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org This is a QEMU bug report, only disguised as an edk2-devel followup. The problem in a nutshell is that the OVMF binary, placed into pflash (read-only KVM memslot) used to run in qemu-1.6, but it fails to start in qemu-1.7. The OVMF reset vector reads as 0xFF bytes. (Jordan and myself started writing these emails in parallel.) On 11/07/13 21:27, Jordan Justen wrote: > On Sun, Nov 3, 2013 at 12:48 PM, Marcel Apfelbaum > wrote: >> The commit: >> >> Commit: a53ae8e934cd54686875b5bcfc2f434244ee55d6 >> Author: Marcel Apfelbaum >> Date: Mon Sep 16 11:21:16 2013 +0300 >> >> hw/pci: partially handle pci master abort >> >> introduced a regression on make check: > > Laszlo pointed out that my OVMF flash support series was not working > with QEMU master. It was working with QEMU 1.6.0. I then bisected the > issue to this commit. It seems this commit regresses -pflash support > for both KVM and non-KVM modes. > > Can you reproduce the issue this with command? > x86_64-softmmu/qemu-system-x86_64 -pflash pc-bios/bios.bin > (with or without adding -enable-kvm) > > I tried adding this patch ("exec: fix regression by making > system-memory region UINT64_MAX size") and it did not help the -pflash > regression. From the edk2-devel discussion: On 11/07/13 19:07, Laszlo Ersek wrote: > On 11/07/13 17:28, Laszlo Ersek wrote: >> On 11/06/13 23:29, Jordan Justen wrote: >>> https://github.com/jljusten/edk2.git ovmf-nvvars-v2 >>> >>> This series implements support for QEMU's emulated >>> system flash. >>> >>> This allows for persistent UEFI non-volatile variables. >>> >>> Previously we attempted to emulate non-volatile >>> variables in a few ways, but each of them would fail >>> in particular situations. >>> >>> To support flash based variable storage, we: >>> * Reserve space in the firmware device image >>> * Add a new flash firmware volume block driver >>> * Disable EmuVariableFvbRuntimeDxe (at runtime) if QEMU >>> flash is available. >>> >>> To use: >>> * QEMU version 1.1 or newer is required without KVM >>> * KVM support requires Linux 3.7 and QEMU 1.6 >>> * Run QEMU with -pflash OVMF.fd instead of -L or -bios >>> or use OvmfPkg/build.sh --enable-flash qemu ... >>> * If QEMU is 1.6 or newer, then OvmfPkg/build.sh will >>> automatically enable flash when running QEMU, so in >>> that case --enable-flash is not required. >>> >>> See also: >>> * http://wiki.qemu.org/Features/PC_System_Flash >>> >>> v2: >>> * Replace >>> "OvmfPkg/AcpiPlatformDxe/Qemu: Allow high runtime memory regions" >>> with >>> "OvmfPkg/AcpiPlatformDxe/Qemu: Decrease upper limit for PCI window 32" >>> * Separate portions of >>> "OvmfPkg/build.sh: Support --enable-flash switch" >>> out into a new patch >>> "OvmfPkg/build.sh: Enable flash for QEMU >= 1.6" >>> * Add "OvmfPkg/README: Add information about OVMF flash layout" >>> * Update "OvmfPkg: Add NV Variable storage within FD" to also change the >>> size of PcdVariableStoreSize >>> * Update commit messages on a couple patches for better clarity >> >> Tested in the following configurations: >> >> (1) RHEL-6 host (no KVM support, no qemu support -- that is, >> regression test): RHEL-6, Fedora 19, Windows 2008 R2 (needs CSM), >> Windows 2012 R2 boot tests work OK. >> >> (2) 3.10-based host kernel, qemu v1.7.0-rc0, Xeon W3550 host CPU: >> Unfortunately qemu dies with the following KVM trace: >> >> KVM internal error. Suberror: 1 >> emulation failure >> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000623 >> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 >> EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 >> ES =0000 00000000 0000ffff 00009300 >> CS =f000 ffff0000 0000ffff 00009b00 >> SS =0000 00000000 0000ffff 00009300 >> DS =0000 00000000 0000ffff 00009300 >> FS =0000 00000000 0000ffff 00009300 >> GS =0000 00000000 0000ffff 00009300 >> LDT=0000 00000000 0000ffff 00008200 >> TR =0000 00000000 0000ffff 00008b00 >> GDT= 00000000 0000ffff >> IDT= 00000000 0000ffff >> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000 >> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 >> DR6=00000000ffff0ff0 DR7=0000000000000400 >> EFER=0000000000000000 >> Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >> >> I'm quite unsure, but the CS:IP value seems to point at the reset >> vector, no? However, the Code=... log only shows 0xFF bytes. >> >> (3) 3.10.17 host kernel, qemu v1.7.0-rc0, Athlon II X2 B22 host CPU: >> almost identical KVM error, with the following difference: >> >> --- vmx 2013-11-07 17:23:45.612973935 +0100 >> +++ svm 2013-11-07 17:24:17.237973059 +0100 >> @@ -1,6 +1,6 @@ >> KVM internal error. Suberror: 1 >> emulation failure >> -EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000623 >> +EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000663 >> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 >> EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 >> ES =0000 00000000 0000ffff 00009300 >> >> Any ideas? > > I. > This is a QEMU regression (somewhere between v1.6.0 and v1.7.0-rc0), > I'll have to bisect it. This bug is "caused" by the following commit: a53ae8e934cd54686875b5bcfc2f434244ee55d6 is the first bad commit commit a53ae8e934cd54686875b5bcfc2f434244ee55d6 Author: Marcel Apfelbaum Date: Mon Sep 16 11:21:16 2013 +0300 hw/pci: partially handle pci master abort A MemoryRegion with negative priority was created and it spans over all the pci address space. It "intercepts" the accesses to unassigned pci address space and will follow the pci spec: 1. returns -1 on read 2. does nothing on write Note: setting the RECEIVED MASTER ABORT bit in the STATUS register of the device that initiated the transaction will be implemented in another series Signed-off-by: Marcel Apfelbaum Acked-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin The patch was posted as patch 3/3 of the series [Qemu-devel] [PATCH v5 0/3] pci: partially implement master abort protocol http://thread.gmane.org/gmane.comp.emulators.qemu/233798/focus=233801 Basically, the series adds a "background" memory region that covers all "unassigned" PCI addresses, and patch 3/3 specifically makes sure that writes to this region are dropped, and reads all return -1 (0xFFFFFFFF). The read implementation (master_abort_mem_read(), -1) is consistent with the KVM dump in the quoted part above. For some reason, this "background" region pushes into the "foreground" when it comes to the pflash region just below 4GB (in guest-phys address space). Or, hm, supposing we start to run in real mode at FFFF:0000, the problem could be with the "isa-bios" region too. So I think we have a bug in qemu, which is likely one of the three below: (1) This commit is wrong. Or, (2) the pflash implementation is wrong, because it doesn't register a memory region (with appropriate priority) that would catch the access. (3) Both this commit and the pflash implementations are right, but this is an unusual situation that the memory infrastructure doesn't handle well. (I doubt that the problem is in KVM.) When the bug hits, the "info qtree" command has the following to say about the flash device: dev: cfi.pflash01, id "" drive = pflash0 num-blocks = 512 sector-length = 4096 width = 1 big-endian = 0 id0 = 0 id1 = 0 id2 = 0 id3 = 0 name = "system.flash" irq 0 mmio 00000000ffe00000/0000000000200000 The "info mtree" command lists: memory 0000000000000000-7ffffffffffffffe (prio 0, RW): system [...] 0000000060000000-00000000ffffffff (prio 0, RW): alias pci-hole @pci 0000000060000000-00000000ffffffff [...] 00000000ffe00000-00000000ffffffff (prio 0, R-): system.flash [...] pci <-------- referred to as "rom_memory" in the source (pci is enabled) 0000000000000000-7ffffffffffffffe (prio 0, RW): pci 0000000000000000-7ffffffffffffffe (prio -2147483648, RW): pci-master-abort [...] 00000000000e0000-00000000000fffff (prio 1, R-): isa-bios For some reason, the range called "pci-master-abort" takes priority over "isa-bios" (and/or "system.flash"). I wrote the attached debug patch (for v1.7.0-rc0). It produces quite a bit of output, but grepping it for isa-bios|system\.flash|pci-master-abort|pci-hole results in the following messages: adding subregion 'system.flash' to region 'system' at offset ffe00000 subregion 'system.flash' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096) subregion 'system.flash' (prio: 0) wins over sibling subregion 'ram-below-4g' (prio: 0) adding subregion 'isa-bios' to region 'pci' at offset e0000 subregion 'isa-bios' (prio: 1) inserted at tail subregion 'pc.rom' (prio: 1) wins over sibling subregion 'isa-bios' (prio: 1) adding subregion 'pci-master-abort' to region 'pci' at offset 0 subregion 'pci-master-abort' (prio: -2147483648) loses to sibling subregion 'pc.rom' (prio: 1) subregion 'pci-master-abort' (prio: -2147483648) loses to sibling subregion 'isa-bios' (prio: 1) subregion 'pci-master-abort' (prio: -2147483648) inserted at tail adding subregion 'pci-hole' to region 'system' at offset 60000000 warning: subregion collision 60000000/a0000000 (pci-hole) vs ffe00000/200000 (system.flash) subregion 'pci-hole' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096) subregion 'pci-hole' (prio: 0) wins over sibling subregion 'system.flash' (prio: 0) adding subregion 'pci-hole64' to region 'system' at offset 100000000 subregion 'pci-hole64' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096) subregion 'pci-hole64' (prio: 0) wins over sibling subregion 'pci-hole' (prio: 0) subregion 'smram-region' (prio: 1) wins over sibling subregion 'pci-hole64' (prio: 0) warning: subregion collision fec00000/1000 (kvm-ioapic) vs 60000000/a0000000 (pci-hole) subregion 'kvm-ioapic' (prio: 0) wins over sibling subregion 'pci-hole64' (prio: 0) warning: subregion collision fed00000/400 (hpet) vs 60000000/a0000000 (pci-hole) I think I know what's going on... There's even a warning above, printed by original (albeit disabled) qemu source code. The "pci-hole" subregion, which is an alias, takes priority over "system.flash". And, unfortunately, "pci-hole" provides a window into "pci-master-abort". I think this should be fixable by raising the priority of "system.flash" to 2048. This way the relationship between "system.flash" and any other region will not change, except it's going to be reversed with "pci-hole". The 2nd attached patch seems to solve the problem for me. I'll resubmit it as a standalone patch if it is deemed good. With it in place, I can run OVMF just fine. And: @@ -28,170 +28,224 @@ adding subregion 'pci-conf-data' to region 'io' at offset cfc subregion 'pci-conf-data' (prio: 0) wins over sibling subregion 'pci-conf-idx' (prio: 0) adding subregion 'pci-hole' to region 'system' at offset 60000000 -warning: subregion collision 60000000/a0000000 (pci-hole) vs ffe00000/200000 (system.flash) subregion 'pci-hole' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096) -subregion 'pci-hole' (prio: 0) wins over sibling subregion 'system.flash' (prio: 0) +subregion 'pci-hole' (prio: 0) loses to sibling subregion 'system.flash' (prio: 2048) +subregion 'pci-hole' (prio: 0) wins over sibling subregion 'ram-below-4g' (prio: 0) According to the debug patch, the flash region starts to win over quite a few unrelated other regions as well. But in practice I could see no adverse effects -- the priority matters only when the addresses actually overlap, and "system.flash" should not overlap with anything but "pci-hole". I'm attaching the following files as well: - the "info mtree" output before and after the patch, - the full output of the debug patch (before and after). Thanks, Laszlo From 44854609c3134ec174593a890dfb1a9340b714f7 Mon Sep 17 00:00:00 2001 From: Laszlo Ersek Date: Thu, 7 Nov 2013 21:19:07 +0100 Subject: [PATCH 1/2] debug messages for memory_region_add_subregion_common() Signed-off-by: Laszlo Ersek --- memory.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/memory.c b/memory.c index 28f6449..5e60a6c 100644 --- a/memory.c +++ b/memory.c @@ -1427,6 +1427,10 @@ static void memory_region_add_subregion_common(MemoryRegion *mr, memory_region_ref(subregion); subregion->parent = mr; subregion->addr = offset; + + fprintf(stderr, "adding subregion '%s' to region '%s' at offset %llx\n", + subregion->name, mr->name, (unsigned long long)offset); + QTAILQ_FOREACH(other, &mr->subregions, subregions_link) { if (subregion->may_overlap || other->may_overlap) { continue; @@ -1437,8 +1441,8 @@ static void memory_region_add_subregion_common(MemoryRegion *mr, int128_make64(other->addr))) { continue; } -#if 0 - printf("warning: subregion collision %llx/%llx (%s) " +#if 1 + fprintf(stderr, "warning: subregion collision %llx/%llx (%s) " "vs %llx/%llx (%s)\n", (unsigned long long)offset, (unsigned long long)int128_get64(subregion->size), @@ -1448,12 +1452,21 @@ static void memory_region_add_subregion_common(MemoryRegion *mr, other->name); #endif } + QTAILQ_FOREACH(other, &mr->subregions, subregions_link) { if (subregion->priority >= other->priority) { + fprintf(stderr, "subregion '%s' (prio: %d) wins over sibling " + "subregion '%s' (prio: %d)\n", subregion->name, + subregion->priority, other->name, other->priority); QTAILQ_INSERT_BEFORE(other, subregion, subregions_link); goto done; } + fprintf(stderr, "subregion '%s' (prio: %d) loses to sibling " + "subregion '%s' (prio: %d)\n", subregion->name, + subregion->priority, other->name, other->priority); } + fprintf(stderr, "subregion '%s' (prio: %d) inserted at tail\n", + subregion->name, subregion->priority); QTAILQ_INSERT_TAIL(&mr->subregions, subregion, subregions_link); done: memory_region_update_pending |= mr->enabled && subregion->enabled;