Message ID | 527C023F.2060506@redhat.com |
---|---|
State | New |
Headers | show |
Il 07/11/2013 22:12, Laszlo Ersek ha scritto: > 0000000000000000-7ffffffffffffffe (prio 0, RW): system > [...] > 0000000060000000-00000000ffffffff (prio 0, RW): alias pci-hole @pci 0000000060000000-00000000ffffffff > [...] > 00000000ffe00000-00000000ffffffff (prio 0, R-): system.flash > [...] Priorities are not "transitive" across aliases; once you use an alias to map a region, the alias's priority counts, not the target region's priority. So the INT_MIN priority for pci-master-abort counts *within the alias*, but the choice between pci-hole and system.flash is only affected by the priorities of pci-hole and system.flash. You could give a smaller priority (-1 or INT_MIN) to pci-hole and just let it occupy the whole address space, from 0 to UINT64_MAX. Or perhaps the pci-hole alias is too large and it should end before the system flash area. Both solutions should work. Paolo
On Thu, 2013-11-07 at 22:12 +0100, Laszlo Ersek wrote: > This is a QEMU bug report, only disguised as an edk2-devel followup. > > The problem in a nutshell is that the OVMF binary, placed into pflash > (read-only KVM memslot) used to run in qemu-1.6, but it fails to start > in qemu-1.7. The OVMF reset vector reads as 0xFF bytes. > > (Jordan and myself started writing these emails in parallel.) > > On 11/07/13 21:27, Jordan Justen wrote: > > On Sun, Nov 3, 2013 at 12:48 PM, Marcel Apfelbaum > > <marcel.a@redhat.com> wrote: > >> The commit: > >> > >> Commit: a53ae8e934cd54686875b5bcfc2f434244ee55d6 > >> Author: Marcel Apfelbaum <marcel.a@redhat.com> > >> Date: Mon Sep 16 11:21:16 2013 +0300 > >> > >> hw/pci: partially handle pci master abort > >> > >> introduced a regression on make check: > > > > Laszlo pointed out that my OVMF flash support series was not working > > with QEMU master. It was working with QEMU 1.6.0. I then bisected the > > issue to this commit. It seems this commit regresses -pflash support > > for both KVM and non-KVM modes. > > > > Can you reproduce the issue this with command? > > x86_64-softmmu/qemu-system-x86_64 -pflash pc-bios/bios.bin > > (with or without adding -enable-kvm) > > > > I tried adding this patch ("exec: fix regression by making > > system-memory region UINT64_MAX size") and it did not help the -pflash > > regression. > > > From the edk2-devel discussion: > <http://thread.gmane.org/gmane.comp.bios.tianocore.devel/4787/focus=4812> > > On 11/07/13 19:07, Laszlo Ersek wrote: > > On 11/07/13 17:28, Laszlo Ersek wrote: > >> On 11/06/13 23:29, Jordan Justen wrote: > >>> https://github.com/jljusten/edk2.git ovmf-nvvars-v2 > >>> > >>> This series implements support for QEMU's emulated > >>> system flash. > >>> > >>> This allows for persistent UEFI non-volatile variables. > >>> > >>> Previously we attempted to emulate non-volatile > >>> variables in a few ways, but each of them would fail > >>> in particular situations. > >>> > >>> To support flash based variable storage, we: > >>> * Reserve space in the firmware device image > >>> * Add a new flash firmware volume block driver > >>> * Disable EmuVariableFvbRuntimeDxe (at runtime) if QEMU > >>> flash is available. > >>> > >>> To use: > >>> * QEMU version 1.1 or newer is required without KVM > >>> * KVM support requires Linux 3.7 and QEMU 1.6 > >>> * Run QEMU with -pflash OVMF.fd instead of -L or -bios > >>> or use OvmfPkg/build.sh --enable-flash qemu ... > >>> * If QEMU is 1.6 or newer, then OvmfPkg/build.sh will > >>> automatically enable flash when running QEMU, so in > >>> that case --enable-flash is not required. > >>> > >>> See also: > >>> * http://wiki.qemu.org/Features/PC_System_Flash > >>> > >>> v2: > >>> * Replace > >>> "OvmfPkg/AcpiPlatformDxe/Qemu: Allow high runtime memory regions" > >>> with > >>> "OvmfPkg/AcpiPlatformDxe/Qemu: Decrease upper limit for PCI window 32" > >>> * Separate portions of > >>> "OvmfPkg/build.sh: Support --enable-flash switch" > >>> out into a new patch > >>> "OvmfPkg/build.sh: Enable flash for QEMU >= 1.6" > >>> * Add "OvmfPkg/README: Add information about OVMF flash layout" > >>> * Update "OvmfPkg: Add NV Variable storage within FD" to also change the > >>> size of PcdVariableStoreSize > >>> * Update commit messages on a couple patches for better clarity > >> > >> Tested in the following configurations: > >> > >> (1) RHEL-6 host (no KVM support, no qemu support -- that is, > >> regression test): RHEL-6, Fedora 19, Windows 2008 R2 (needs CSM), > >> Windows 2012 R2 boot tests work OK. > >> > >> (2) 3.10-based host kernel, qemu v1.7.0-rc0, Xeon W3550 host CPU: > >> Unfortunately qemu dies with the following KVM trace: > >> > >> KVM internal error. Suberror: 1 > >> emulation failure > >> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000623 > >> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 > >> EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 > >> ES =0000 00000000 0000ffff 00009300 > >> CS =f000 ffff0000 0000ffff 00009b00 > >> SS =0000 00000000 0000ffff 00009300 > >> DS =0000 00000000 0000ffff 00009300 > >> FS =0000 00000000 0000ffff 00009300 > >> GS =0000 00000000 0000ffff 00009300 > >> LDT=0000 00000000 0000ffff 00008200 > >> TR =0000 00000000 0000ffff 00008b00 > >> GDT= 00000000 0000ffff > >> IDT= 00000000 0000ffff > >> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000 > >> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 > >> DR6=00000000ffff0ff0 DR7=0000000000000400 > >> EFER=0000000000000000 > >> Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff <ff> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > >> > >> I'm quite unsure, but the CS:IP value seems to point at the reset > >> vector, no? However, the Code=... log only shows 0xFF bytes. > >> > >> (3) 3.10.17 host kernel, qemu v1.7.0-rc0, Athlon II X2 B22 host CPU: > >> almost identical KVM error, with the following difference: > >> > >> --- vmx 2013-11-07 17:23:45.612973935 +0100 > >> +++ svm 2013-11-07 17:24:17.237973059 +0100 > >> @@ -1,6 +1,6 @@ > >> KVM internal error. Suberror: 1 > >> emulation failure > >> -EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000623 > >> +EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000663 > >> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000 > >> EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 > >> ES =0000 00000000 0000ffff 00009300 > >> > >> Any ideas? > > > > I. > > This is a QEMU regression (somewhere between v1.6.0 and v1.7.0-rc0), > > I'll have to bisect it. > > This bug is "caused" by the following commit: > > a53ae8e934cd54686875b5bcfc2f434244ee55d6 is the first bad commit > commit a53ae8e934cd54686875b5bcfc2f434244ee55d6 > Author: Marcel Apfelbaum <marcel.a@redhat.com> > Date: Mon Sep 16 11:21:16 2013 +0300 > > hw/pci: partially handle pci master abort > > A MemoryRegion with negative priority was created and > it spans over all the pci address space. > It "intercepts" the accesses to unassigned pci > address space and will follow the pci spec: > 1. returns -1 on read > 2. does nothing on write > > Note: setting the RECEIVED MASTER ABORT bit in the STATUS register > of the device that initiated the transaction will be > implemented in another series > > Signed-off-by: Marcel Apfelbaum <marcel.a@redhat.com> > Acked-by: Michael S. Tsirkin <mst@redhat.com> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com> > > The patch was posted as patch 3/3 of the series > > [Qemu-devel] [PATCH v5 0/3] pci: partially implement master abort > protocol > > http://thread.gmane.org/gmane.comp.emulators.qemu/233798/focus=233801 > > Basically, the series adds a "background" memory region that covers all > "unassigned" PCI addresses, and patch 3/3 specifically makes sure that > writes to this region are dropped, and reads all return -1 (0xFFFFFFFF). > > The read implementation (master_abort_mem_read(), -1) is consistent with > the KVM dump in the quoted part above. > > For some reason, this "background" region pushes into the "foreground" > when it comes to the pflash region just below 4GB (in guest-phys address > space). > > Or, hm, supposing we start to run in real mode at FFFF:0000, the problem > could be with the "isa-bios" region too. > > So I think we have a bug in qemu, which is likely one of the three > below: > > (1) This commit is wrong. Or, > (2) the pflash implementation is wrong, because it doesn't register a > memory region (with appropriate priority) that would catch the > access. > (3) Both this commit and the pflash implementations are right, but this > is an unusual situation that the memory infrastructure doesn't > handle well. > > (I doubt that the problem is in KVM.) > > When the bug hits, the "info qtree" command has the following to say > about the flash device: > > dev: cfi.pflash01, id "" > drive = pflash0 > num-blocks = 512 > sector-length = 4096 > width = 1 > big-endian = 0 > id0 = 0 > id1 = 0 > id2 = 0 > id3 = 0 > name = "system.flash" > irq 0 > mmio 00000000ffe00000/0000000000200000 > > The "info mtree" command lists: > > memory > 0000000000000000-7ffffffffffffffe (prio 0, RW): system > [...] > 0000000060000000-00000000ffffffff (prio 0, RW): alias pci-hole @pci 0000000060000000-00000000ffffffff > [...] > 00000000ffe00000-00000000ffffffff (prio 0, R-): system.flash > [...] > pci <-------- referred to as "rom_memory" in the source (pci is enabled) > 0000000000000000-7ffffffffffffffe (prio 0, RW): pci > 0000000000000000-7ffffffffffffffe (prio -2147483648, RW): pci-master-abort > [...] > 00000000000e0000-00000000000fffff (prio 1, R-): isa-bios > > For some reason, the range called "pci-master-abort" takes priority over > "isa-bios" (and/or "system.flash"). > > I wrote the attached debug patch (for v1.7.0-rc0). It produces quite a > bit of output, but grepping it for > > isa-bios|system\.flash|pci-master-abort|pci-hole > > results in the following messages: > > adding subregion 'system.flash' to region 'system' at offset ffe00000 > subregion 'system.flash' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096) > subregion 'system.flash' (prio: 0) wins over sibling subregion 'ram-below-4g' (prio: 0) > > adding subregion 'isa-bios' to region 'pci' at offset e0000 > subregion 'isa-bios' (prio: 1) inserted at tail > subregion 'pc.rom' (prio: 1) wins over sibling subregion 'isa-bios' (prio: 1) > > adding subregion 'pci-master-abort' to region 'pci' at offset 0 > subregion 'pci-master-abort' (prio: -2147483648) loses to sibling subregion 'pc.rom' (prio: 1) > subregion 'pci-master-abort' (prio: -2147483648) loses to sibling subregion 'isa-bios' (prio: 1) > subregion 'pci-master-abort' (prio: -2147483648) inserted at tail > > adding subregion 'pci-hole' to region 'system' at offset 60000000 > warning: subregion collision 60000000/a0000000 (pci-hole) vs ffe00000/200000 (system.flash) Thank you Laszlo for the detailed info! I think the problem is right above. Why pci-hole and system.flash collide? IMHO we should not play with priorities here, better solve the collision. Thanks, Marcel > subregion 'pci-hole' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096) > subregion 'pci-hole' (prio: 0) wins over sibling subregion 'system.flash' (prio: 0) > > adding subregion 'pci-hole64' to region 'system' at offset 100000000 > subregion 'pci-hole64' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096) > subregion 'pci-hole64' (prio: 0) wins over sibling subregion 'pci-hole' (prio: 0) > subregion 'smram-region' (prio: 1) wins over sibling subregion 'pci-hole64' (prio: 0) > warning: subregion collision fec00000/1000 (kvm-ioapic) vs 60000000/a0000000 (pci-hole) > subregion 'kvm-ioapic' (prio: 0) wins over sibling subregion 'pci-hole64' (prio: 0) > warning: subregion collision fed00000/400 (hpet) vs 60000000/a0000000 (pci-hole) > > I think I know what's going on... There's even a warning above, printed > by original (albeit disabled) qemu source code. > > The "pci-hole" subregion, which is an alias, takes priority over > "system.flash". And, unfortunately, "pci-hole" provides a window into > "pci-master-abort". > > I think this should be fixable by raising the priority of "system.flash" > to 2048. This way the relationship between "system.flash" and any other > region will not change, except it's going to be reversed with > "pci-hole". > > The 2nd attached patch seems to solve the problem for me. I'll resubmit > it as a standalone patch if it is deemed good. > > With it in place, I can run OVMF just fine. And: > > @@ -28,170 +28,224 @@ > adding subregion 'pci-conf-data' to region 'io' at offset cfc > subregion 'pci-conf-data' (prio: 0) wins over sibling subregion 'pci-conf-idx' (prio: 0) > adding subregion 'pci-hole' to region 'system' at offset 60000000 > -warning: subregion collision 60000000/a0000000 (pci-hole) vs ffe00000/200000 (system.flash) > subregion 'pci-hole' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096) > -subregion 'pci-hole' (prio: 0) wins over sibling subregion 'system.flash' (prio: 0) > +subregion 'pci-hole' (prio: 0) loses to sibling subregion 'system.flash' (prio: 2048) > +subregion 'pci-hole' (prio: 0) wins over sibling subregion 'ram-below-4g' (prio: 0) > > According to the debug patch, the flash region starts to win over quite > a few unrelated other regions as well. But in practice I could see no > adverse effects -- the priority matters only when the addresses actually > overlap, and "system.flash" should not overlap with anything but > "pci-hole". > > I'm attaching the following files as well: > - the "info mtree" output before and after the patch, > - the full output of the debug patch (before and after). > > Thanks, > Laszlo
Il 07/11/2013 22:24, Marcel Apfelbaum ha scritto: > Thank you Laszlo for the detailed info! > I think the problem is right above. Why pci-hole and system.flash collide? > IMHO we should not play with priorities here, better solve the collision. We need to audit all the other boards that support PCI... I'll take a look tomorrow since you guys are off. Paolo
On 11/07/13 22:21, Paolo Bonzini wrote: > Il 07/11/2013 22:12, Laszlo Ersek ha scritto: >> 0000000000000000-7ffffffffffffffe (prio 0, RW): system >> [...] >> 0000000060000000-00000000ffffffff (prio 0, RW): alias pci-hole @pci 0000000060000000-00000000ffffffff >> [...] >> 00000000ffe00000-00000000ffffffff (prio 0, R-): system.flash >> [...] > > Priorities are not "transitive" across aliases; once you use an alias to > map a region, the alias's priority counts, not the target region's > priority. So the INT_MIN priority for pci-master-abort counts *within > the alias*, but the choice between pci-hole and system.flash is only > affected by the priorities of pci-hole and system.flash. Right. It's also documented in docs/memory.txt -- Peter's recent addition I think? > You could give a smaller priority (-1 or INT_MIN) to pci-hole and just > let it occupy the whole address space, from 0 to UINT64_MAX. Or perhaps > the pci-hole alias is too large and it should end before the system > flash area. Both solutions should work. I did reorder pci-hole and system.flash, but rather than lowering pci-hole, I raised system.flash. I have no preference. Thanks Laszlo
On Thu, 2013-11-07 at 22:31 +0100, Paolo Bonzini wrote: > Il 07/11/2013 22:24, Marcel Apfelbaum ha scritto: > > Thank you Laszlo for the detailed info! > > I think the problem is right above. Why pci-hole and system.flash collide? > > IMHO we should not play with priorities here, better solve the collision. > > We need to audit all the other boards that support PCI... I'll take a > look tomorrow since you guys are off. Thanks Paolo, Let me just point out what I know (or I think I know): 1. Not all architectures have the behavior: "Address space that is not RAM(and friends) is for sure PCI". Only x86 behaves like this (I think). That means that you cannot have a 64bit wide pci-hole with lower priority that catches all accesses that are not for RAM(and firends). 2. If the above is right, and making pci-hole 64 bit wide is not an option, playing with pci-holes/other-region priorities it would be just wrong, it would be only to "fight" with the locality of the memory region's priority. That being said, I am looking forward for your findings. Thanks! Marcel > > Paolo
On 11/07/13 22:24, Marcel Apfelbaum wrote: > On Thu, 2013-11-07 at 22:12 +0100, Laszlo Ersek wrote: >> adding subregion 'pci-hole' to region 'system' at offset 60000000 >> warning: subregion collision 60000000/a0000000 (pci-hole) vs ffe00000/200000 (system.flash) > Thank you Laszlo for the detailed info! > I think the problem is right above. Why pci-hole and system.flash collide? > IMHO we should not play with priorities here, better solve the collision. pc_init1() pc_memory_init() pc_system_firmware_init() pc_system_flash_init() <---- sets base address to 0x100000000ULL - flash_size pflash_cfi01_register() sysbus_mmio_map() sysbus_mmio_map_common() memory_region_add_subregion() i440fx_init() memory_region_init_alias("pci-hole") pc_init1() passes 0x100000000ULL - below_4g_mem_size to i440fx_init() as "pci_hole_size", which is then used as the size of the "pci-hole" alias. We should probably subtract the size of the flash from this, but I don't know how to do that "elegantly". Yet another (output) parameter for pc_memory_init()? Blech. Or look up the end address of "system.flash" by name? Thanks Laszlo
On 7 November 2013 21:38, Marcel Apfelbaum <marcel.a@redhat.com> wrote: > Thanks Paolo, > Let me just point out what I know (or I think I know): > 1. Not all architectures have the behavior: "Address space that is not RAM(and friends) is for sure PCI". > Only x86 behaves like this (I think). More specifically, the x86 pc behaves like this. Other x86 based systems could in theory behave differently (not that we actually model any, I think). > That means that you cannot have a 64bit wide pci-hole > with lower priority that catches all accesses that are not for RAM(and firends). ...but this conclusion is wrong, because the pci-hole region is created by the pc model. So we should create it at the correct priority and size to give the behaviour relative to other devices in the pc model that is required (ie that the hardware has). This doesn't affect any other target architecture or board. That said, I don't know enough about the PC to know what the exact details of the pci-hole are, so I'm not making a statement about what the correct model is. I'm just saying that what you do with the pci-hole and the container it lives in and the other devices in that container is not going to change the behaviour of any other target board. > 2. If the above is right, and making pci-hole 64 bit wide is not an option, > playing with pci-holes/other-region priorities it would be just wrong, > it would be only to "fight" with the locality of the memory region's priority. I have no idea what you mean by "fighting" here. MR priorities apply only within a specific container region[*], to set which of that container's children appears 'above' another. They're totally local to the container (which in this case is part of the PC model, not the generic PCI code) and so the PC model can freely set them to whatever makes most sense. [*] if you didn't already know this, see the recently committed updates to doc/memory.txt for a more detailed explanation. -- PMM
On Thu, 2013-11-07 at 21:51 +0000, Peter Maydell wrote: > On 7 November 2013 21:38, Marcel Apfelbaum <marcel.a@redhat.com> wrote: > > Thanks Paolo, > > Let me just point out what I know (or I think I know): > > 1. Not all architectures have the behavior: "Address space that is not RAM(and friends) is for sure PCI". > > Only x86 behaves like this (I think). > > More specifically, the x86 pc behaves like this. Other > x86 based systems could in theory behave differently > (not that we actually model any, I think). > > > That means that you cannot have a 64bit wide pci-hole > > with lower priority that catches all accesses that are not for RAM(and firends). > > ...but this conclusion is wrong, because the pci-hole > region is created by the pc model. So we should create Yes... It think you are right. I was thinking for some reason of master-abort region belonging to the the pci bus that is not specific to x86 pc... > it at the correct priority and size to give the behaviour > relative to other devices in the pc model that is required > (ie that the hardware has). This doesn't affect any other > target architecture or board. > > That said, I don't know enough about the PC to know what > the exact details of the pci-hole are, so I'm not making a > statement about what the correct model is. I'm just saying > that what you do with the pci-hole and the container it lives > in and the other devices in that container is not going to > change the behaviour of any other target board. > > > 2. If the above is right, and making pci-hole 64 bit wide is not an option, > > playing with pci-holes/other-region priorities it would be just wrong, > > it would be only to "fight" with the locality of the memory region's priority. > > I have no idea what you mean by "fighting" here. MR priorities By "fighting" I was referring exactly to the fact that because priorities are container specific we need to use them twice to get the wanted behavior (master abort being with the lowest priority) 1. master-abort region priority for pci-address-space 2. pci-hole priority for system-address-space But this is as designed... Thanks, Marcel > apply only within a specific container region[*], to set which of > that container's children appears 'above' another. They're > totally local to the container (which in this case is part of the > PC model, not the generic PCI code) and so the PC model > can freely set them to whatever makes most sense. > > [*] if you didn't already know this, see the recently committed > updates to doc/memory.txt for a more detailed explanation. > > -- PMM >
On Thu, 2013-11-07 at 22:48 +0100, Laszlo Ersek wrote: > On 11/07/13 22:24, Marcel Apfelbaum wrote: > > On Thu, 2013-11-07 at 22:12 +0100, Laszlo Ersek wrote: > > >> adding subregion 'pci-hole' to region 'system' at offset 60000000 > >> warning: subregion collision 60000000/a0000000 (pci-hole) vs ffe00000/200000 (system.flash) > > Thank you Laszlo for the detailed info! > > I think the problem is right above. Why pci-hole and system.flash collide? > > IMHO we should not play with priorities here, better solve the collision. > > pc_init1() > pc_memory_init() > pc_system_firmware_init() > pc_system_flash_init() <---- sets base address to > 0x100000000ULL - flash_size > pflash_cfi01_register() > sysbus_mmio_map() > sysbus_mmio_map_common() > memory_region_add_subregion() > i440fx_init() > memory_region_init_alias("pci-hole") > > pc_init1() passes > > 0x100000000ULL - below_4g_mem_size > > to i440fx_init() as "pci_hole_size", which is then used as the size of > the "pci-hole" alias. > > We should probably subtract the size of the flash from this, but I don't > know how to do that "elegantly". Yet another (output) parameter for > pc_memory_init()? Blech. > > Or look up the end address of "system.flash" by name? Both seems ugly to me... As Peter Maydell pointed out, the pci-hole belongs to the pc model, so we can actually change its priority without affecting other architectures. > > Thanks > Laszlo >
On 11/07/13 22:24, Marcel Apfelbaum wrote: > Why pci-hole and system.flash collide? IMHO we should not play with > priorities here, better solve the collision. What about this "beautiful" series? It produces memory 0000000000000000-000fffffffffffff (prio 0, RW): system [...] 0000000060000000-00000000ffdfffff (prio 0, RW): alias pci-hole @pci 0000000060000000-00000000ffdfffff [...] 00000000ffe00000-00000000ffffffff (prio 0, R-): system.flash and I can run OVMF with it. It also stays within i386/pc. Re 2/2, note that "below_4g_mem_size" never exceeds 0xe0000000 in pc_init1(), so the subtraction is safe. Thanks Laszlo Laszlo Ersek (2): i386/pc: propagate flash size from pc_system_flash_init() to pc_init1() i386/pc_piix: the pci-hole should end where the system flash starts include/hw/i386/pc.h | 6 ++++-- hw/i386/pc.c | 5 +++-- hw/i386/pc_piix.c | 5 +++-- hw/i386/pc_q35.c | 3 ++- hw/i386/pc_sysfw.c | 10 +++++++--- 5 files changed, 19 insertions(+), 10 deletions(-)
Il 07/11/2013 22:51, Peter Maydell ha scritto: >> > 1. Not all architectures have the behavior: "Address space that is not RAM(and friends) >> > is for sure PCI". Only x86 behaves like this (I think). > > More specifically, the x86 pc behaves like this. Other > x86 based systems could in theory behave differently > (not that we actually model any, I think). After Marcel's patch, we have changed behavior for at least all boards that pass get_system_memory() to pci_register_bus or pci_bus_new: * mips/gt64xxx_pci.c * pci-host/bonito.c * pci-host/ppce500.c * ppc/ppc4xx_pci.c * sh4/sh_pci.c These now will not go anymore through unassigned_mem_ops, which is a behavioral change for MIPS boards (gt64xxx_pci and bonito) at least. Furthermore, the default behavior of the memory API _is_ read all-ones/ignore writes, so I'm not sure what's the benefit of adding a separate region for master abort... Paolo
Il 07/11/2013 23:23, Laszlo Ersek ha scritto: > On 11/07/13 22:24, Marcel Apfelbaum wrote: >> Why pci-hole and system.flash collide? IMHO we should not play with >> priorities here, better solve the collision. > > What about this "beautiful" series? It produces > > memory > 0000000000000000-000fffffffffffff (prio 0, RW): system > [...] > 0000000060000000-00000000ffdfffff (prio 0, RW): alias pci-hole @pci > 0000000060000000-00000000ffdfffff > [...] > 00000000ffe00000-00000000ffffffff (prio 0, R-): system.flash > > and I can run OVMF with it. It also stays within i386/pc. This definitely works, and make sense. But I think for 1.7 we should just revert the commit. It can be done later in 1.8. And it should have a qtest testcase that shows the effect of the patch. Other patches can still be applied to 1.7, but the change definitely had much bigger ramifications than anticipated. The patches are [PATCH for-1.7 v2 5/8] pci: fix address space size for bridge [PATCH for-1.7 v2 7/8] pc: s/INT64_MAX/UINT64_MAX/ [PATCH for-1.7 v2 8/8] spapr_pci: s/INT64_MAX/UINT64_MAX/ There's also [PATCH 1/2] split definitions for exec.c and translate-all.c radix trees [PATCH 2/2] exec: make address spaces 64-bit wide which however has a 2% perf hit for TCG. In 1.8 we can apply it and recover the hit with other optimizations. Paolo
On 8 November 2013 08:05, Paolo Bonzini <pbonzini@redhat.com> wrote: > Il 07/11/2013 22:51, Peter Maydell ha scritto: >>> > 1. Not all architectures have the behavior: "Address space that is not RAM(and friends) >>> > is for sure PCI". Only x86 behaves like this (I think). >> >> More specifically, the x86 pc behaves like this. Other >> x86 based systems could in theory behave differently >> (not that we actually model any, I think). > > After Marcel's patch, we have changed behavior for at least all boards > that pass get_system_memory() to pci_register_bus or pci_bus_new: > > * mips/gt64xxx_pci.c > > * pci-host/bonito.c > > * pci-host/ppce500.c > > * ppc/ppc4xx_pci.c > > * sh4/sh_pci.c Oh, right. Ideally those boards should not do that (I fixed the versatile pci controller not to do that a while back) but it's a long standing behaviour so it would be better had we not broken it. I think it should in general be fixable by just having those pci controllers create an empty memory region to pass in, like versatile: memory_region_init(&s->pci_io_space, OBJECT(s), "pci_io", 1ULL << 32); memory_region_init(&s->pci_mem_space, OBJECT(s), "pci_mem", 1ULL << 32); pci_bus_new_inplace(&s->pci_bus, sizeof(s->pci_bus), DEVICE(obj), "pci", &s->pci_mem_space, &s->pci_io_space, PCI_DEVFN(11, 0), TYPE_PCI_BUS); That doesn't affect where PCI DMA goes. It might also require mapping an alias into that region at wherever the pci hole is on those boards. Kind of awkward to do and test at this point in the release cycle though. > These now will not go anymore through unassigned_mem_ops, which is a > behavioral change for MIPS boards (gt64xxx_pci and bonito) at least. > > Furthermore, the default behavior of the memory API _is_ read > all-ones/ignore writes, so I'm not sure what's the benefit of adding a > separate region for master abort... It gives you a place set the appropriate PCI controller or device register status bits on abort by a PCI device access, IIRC. -- PMM
Il 08/11/2013 11:44, Peter Maydell ha scritto: > On 8 November 2013 08:05, Paolo Bonzini <pbonzini@redhat.com> wrote: >> Il 07/11/2013 22:51, Peter Maydell ha scritto: >>>>> 1. Not all architectures have the behavior: "Address space that is not RAM(and friends) >>>>> is for sure PCI". Only x86 behaves like this (I think). >>> >>> More specifically, the x86 pc behaves like this. Other >>> x86 based systems could in theory behave differently >>> (not that we actually model any, I think). >> >> After Marcel's patch, we have changed behavior for at least all boards >> that pass get_system_memory() to pci_register_bus or pci_bus_new: >> >> * mips/gt64xxx_pci.c >> >> * pci-host/bonito.c >> >> * pci-host/ppce500.c >> >> * ppc/ppc4xx_pci.c >> >> * sh4/sh_pci.c > > Oh, right. Ideally those boards should not do that (I fixed > the versatile pci controller not to do that a while back) but > it's a long standing behaviour so it would be better had we > not broken it. > > I think it should in general be fixable by just having those > pci controllers create an empty memory region to pass in, > like versatile: > > memory_region_init(&s->pci_io_space, OBJECT(s), "pci_io", 1ULL << 32); > memory_region_init(&s->pci_mem_space, OBJECT(s), "pci_mem", 1ULL << 32); > > pci_bus_new_inplace(&s->pci_bus, sizeof(s->pci_bus), DEVICE(obj), "pci", > &s->pci_mem_space, &s->pci_io_space, > PCI_DEVFN(11, 0), TYPE_PCI_BUS); > > That doesn't affect where PCI DMA goes. It might also > require mapping an alias into that region at wherever > the pci hole is on those boards. > > Kind of awkward to do and test at this point in the release cycle though. Yes, for now we should just revert the patch. BTW I tried optimizing MMIO dispatch and managed to save ~30 cycles so I don't think we should be afraid of increasing the depth of the radix tree. All stuff for 1.8 though. Paolo Paolo
On Fri, 2013-11-08 at 09:05 +0100, Paolo Bonzini wrote: > Il 07/11/2013 22:51, Peter Maydell ha scritto: > >> > 1. Not all architectures have the behavior: "Address space that is not RAM(and friends) > >> > is for sure PCI". Only x86 behaves like this (I think). > > > > More specifically, the x86 pc behaves like this. Other > > x86 based systems could in theory behave differently > > (not that we actually model any, I think). > > After Marcel's patch, we have changed behavior for at least all boards > that pass get_system_memory() to pci_register_bus or pci_bus_new: > > * mips/gt64xxx_pci.c > > * pci-host/bonito.c > > * pci-host/ppce500.c > > * ppc/ppc4xx_pci.c > > * sh4/sh_pci.c > > These now will not go anymore through unassigned_mem_ops, which is a > behavioral change for MIPS boards (gt64xxx_pci and bonito) at least. > > Furthermore, the default behavior of the memory API _is_ read > all-ones/ignore writes, so I'm not sure what's the benefit of adding a > separate region for master abort... Actually, as I see, the default behavior of "system" memory region is to use unassigned_mem_ops that has valid.accepts method returning false (no read/write methods). I don't see that read all-ones/ignore writes is implemented. This was the main reason I submitted this patch. I had *no* clue that it would impact so much the system... I still think the patch is needed ans the guests will benefit from more accurate PCI spec emulation. Thanks, Marcel > > Paolo
On Fri, 2013-11-08 at 10:44 +0000, Peter Maydell wrote: > On 8 November 2013 08:05, Paolo Bonzini <pbonzini@redhat.com> wrote: > > Il 07/11/2013 22:51, Peter Maydell ha scritto: > >>> > 1. Not all architectures have the behavior: "Address space that is not RAM(and friends) > >>> > is for sure PCI". Only x86 behaves like this (I think). > >> > >> More specifically, the x86 pc behaves like this. Other > >> x86 based systems could in theory behave differently > >> (not that we actually model any, I think). > > > > After Marcel's patch, we have changed behavior for at least all boards > > that pass get_system_memory() to pci_register_bus or pci_bus_new: > > > > * mips/gt64xxx_pci.c > > > > * pci-host/bonito.c > > > > * pci-host/ppce500.c > > > > * ppc/ppc4xx_pci.c > > > > * sh4/sh_pci.c > > Oh, right. Ideally those boards should not do that (I fixed > the versatile pci controller not to do that a while back) but > it's a long standing behaviour so it would be better had we > not broken it. > > I think it should in general be fixable by just having those > pci controllers create an empty memory region to pass in, > like versatile: > > memory_region_init(&s->pci_io_space, OBJECT(s), "pci_io", 1ULL << 32); > memory_region_init(&s->pci_mem_space, OBJECT(s), "pci_mem", 1ULL << 32); > > pci_bus_new_inplace(&s->pci_bus, sizeof(s->pci_bus), DEVICE(obj), "pci", > &s->pci_mem_space, &s->pci_io_space, > PCI_DEVFN(11, 0), TYPE_PCI_BUS); > > That doesn't affect where PCI DMA goes. It might also > require mapping an alias into that region at wherever > the pci hole is on those boards. > > Kind of awkward to do and test at this point in the release cycle though. > > > These now will not go anymore through unassigned_mem_ops, which is a > > behavioral change for MIPS boards (gt64xxx_pci and bonito) at least. > > > > Furthermore, the default behavior of the memory API _is_ read > > all-ones/ignore writes, so I'm not sure what's the benefit of adding a > > separate region for master abort... > > It gives you a place set the appropriate PCI controller or device > register status bits on abort by a PCI device access, IIRC. Right, thanks for pointing this out. This was indeed the patches direction. But even the first patch has meaning by itself and not just a preparation. As I mentioned in other mail in this thread "read all-ones/ignore writes" is not implemented yet asbeacuse unassigned_mem_ops has no read/write methods and valid.accepts returns false. Thanks, Marcel > > -- PMM >
Il 08/11/2013 16:08, Marcel Apfelbaum ha scritto: > Actually, as I see, the default behavior of "system" memory region > is to use unassigned_mem_ops that has valid.accepts method returning > false (no read/write methods). I don't see that read all-ones/ignore > writes is implemented. Right, it's read-all-zeroes (see unassigned_mem_read). It was read-all-ones for a brief time, then it got changed back to read-all-zeroes. > This was the main reason I submitted this patch. I had *no* clue that > it would impact so much the system... Yeah, it's not your fault. But it should have been held back until 1.8. Do you agree with reverting it? > I still think the patch is needed ans the guests will benefit from > more accurate PCI spec emulation. Only buggy guests :) but yes, I agree it's a good thing to have. Paolo
On Fri, 2013-11-08 at 17:12 +0100, Paolo Bonzini wrote: > Il 08/11/2013 16:08, Marcel Apfelbaum ha scritto: > > Actually, as I see, the default behavior of "system" memory region > > is to use unassigned_mem_ops that has valid.accepts method returning > > false (no read/write methods). I don't see that read all-ones/ignore > > writes is implemented. > > Right, it's read-all-zeroes (see unassigned_mem_read). It was > read-all-ones for a brief time, then it got changed back to read-all-zeroes. I remember something, Jan's patches got reverted because the modification was system wide and not only for pci address space. > > > This was the main reason I submitted this patch. I had *no* clue that > > it would impact so much the system... > > Yeah, it's not your fault. But it should have been held back until 1.8. > Do you agree with reverting it? Given the actual impact on version's stability, I agree it is the right thing to do. > > > I still think the patch is needed ans the guests will benefit from > > more accurate PCI spec emulation. > > Only buggy guests :) but yes, I agree it's a good thing to have. Yes, it may be a driver there making some a wrong decision and "reading-all-ones" may give it a clue. (the same for writes) Thanks, Marcel > > Paolo
On Thu, 7 Nov 2013 23:23:57 +0100 Laszlo Ersek <lersek@redhat.com> wrote: > On 11/07/13 22:24, Marcel Apfelbaum wrote: > > > Why pci-hole and system.flash collide? IMHO we should not play with > > priorities here, better solve the collision. > > What about this "beautiful" series? It produces Laszlo, there is patch that removes PCI-hole aliases at all mapping PCI address space with -1 priority in system address space. so all access flash region range will go to it sice it's priority 0 > > memory > 0000000000000000-000fffffffffffff (prio 0, RW): system > [...] > 0000000060000000-00000000ffdfffff (prio 0, RW): alias pci-hole @pci > 0000000060000000-00000000ffdfffff > [...] > 00000000ffe00000-00000000ffffffff (prio 0, R-): system.flash > > and I can run OVMF with it. It also stays within i386/pc. > > Re 2/2, note that "below_4g_mem_size" never exceeds 0xe0000000 in > pc_init1(), so the subtraction is safe. > > Thanks > Laszlo > > Laszlo Ersek (2): > i386/pc: propagate flash size from pc_system_flash_init() to > pc_init1() > i386/pc_piix: the pci-hole should end where the system flash starts > > include/hw/i386/pc.h | 6 ++++-- > hw/i386/pc.c | 5 +++-- > hw/i386/pc_piix.c | 5 +++-- > hw/i386/pc_q35.c | 3 ++- > hw/i386/pc_sysfw.c | 10 +++++++--- > 5 files changed, 19 insertions(+), 10 deletions(-) > > -- > 1.8.3.1 > >
diff --git a/memory.c b/memory.c index 28f6449..5e60a6c 100644 --- a/memory.c +++ b/memory.c @@ -1427,6 +1427,10 @@ static void memory_region_add_subregion_common(MemoryRegion *mr, memory_region_ref(subregion); subregion->parent = mr; subregion->addr = offset; + + fprintf(stderr, "adding subregion '%s' to region '%s' at offset %llx\n", + subregion->name, mr->name, (unsigned long long)offset); + QTAILQ_FOREACH(other, &mr->subregions, subregions_link) { if (subregion->may_overlap || other->may_overlap) { continue; @@ -1437,8 +1441,8 @@ static void memory_region_add_subregion_common(MemoryRegion *mr, int128_make64(other->addr))) { continue; } -#if 0 - printf("warning: subregion collision %llx/%llx (%s) " +#if 1 + fprintf(stderr, "warning: subregion collision %llx/%llx (%s) " "vs %llx/%llx (%s)\n", (unsigned long long)offset, (unsigned long long)int128_get64(subregion->size), @@ -1448,12 +1452,21 @@ static void memory_region_add_subregion_common(MemoryRegion *mr, other->name); #endif } + QTAILQ_FOREACH(other, &mr->subregions, subregions_link) { if (subregion->priority >= other->priority) { + fprintf(stderr, "subregion '%s' (prio: %d) wins over sibling " + "subregion '%s' (prio: %d)\n", subregion->name, + subregion->priority, other->name, other->priority); QTAILQ_INSERT_BEFORE(other, subregion, subregions_link); goto done; } + fprintf(stderr, "subregion '%s' (prio: %d) loses to sibling " + "subregion '%s' (prio: %d)\n", subregion->name, + subregion->priority, other->name, other->priority); } + fprintf(stderr, "subregion '%s' (prio: %d) inserted at tail\n", + subregion->name, subregion->priority); QTAILQ_INSERT_TAIL(&mr->subregions, subregion, subregions_link); done: memory_region_update_pending |= mr->enabled && subregion->enabled;