Patchwork exec: fix regression by making system-memory region UINT64_MAX size

login
register
mail settings
Submitter Laszlo Ersek
Date Nov. 7, 2013, 9:12 p.m.
Message ID <527C023F.2060506@redhat.com>
Download mbox | patch
Permalink /patch/289482/
State New
Headers show

Comments

Laszlo Ersek - Nov. 7, 2013, 9:12 p.m.
This is a QEMU bug report, only disguised as an edk2-devel followup.

The problem in a nutshell is that the OVMF binary, placed into pflash
(read-only KVM memslot) used to run in qemu-1.6, but it fails to start
in qemu-1.7. The OVMF reset vector reads as 0xFF bytes.

(Jordan and myself started writing these emails in parallel.)

On 11/07/13 21:27, Jordan Justen wrote:
> On Sun, Nov 3, 2013 at 12:48 PM, Marcel Apfelbaum
> <marcel.a@redhat.com> wrote:
>> The commit:
>>
>> Commit: a53ae8e934cd54686875b5bcfc2f434244ee55d6
>> Author: Marcel Apfelbaum <marcel.a@redhat.com>
>> Date:   Mon Sep 16 11:21:16 2013 +0300
>>
>>     hw/pci: partially handle pci master abort
>>
>> introduced a regression on make check:
>
> Laszlo pointed out that my OVMF flash support series was not working
> with QEMU master. It was working with QEMU 1.6.0. I then bisected the
> issue to this commit. It seems this commit regresses -pflash support
> for both KVM and non-KVM modes.
>
> Can you reproduce the issue this with command?
> x86_64-softmmu/qemu-system-x86_64 -pflash pc-bios/bios.bin
> (with or without adding -enable-kvm)
>
> I tried adding this patch ("exec: fix regression by making
> system-memory region UINT64_MAX size") and it did not help the -pflash
> regression.


From the edk2-devel discussion:
<http://thread.gmane.org/gmane.comp.bios.tianocore.devel/4787/focus=4812>

On 11/07/13 19:07, Laszlo Ersek wrote:
> On 11/07/13 17:28, Laszlo Ersek wrote:
>> On 11/06/13 23:29, Jordan Justen wrote:
>>> https://github.com/jljusten/edk2.git ovmf-nvvars-v2
>>>
>>> This series implements support for QEMU's emulated
>>> system flash.
>>>
>>> This allows for persistent UEFI non-volatile variables.
>>>
>>> Previously we attempted to emulate non-volatile
>>> variables in a few ways, but each of them would fail
>>> in particular situations.
>>>
>>> To support flash based variable storage, we:
>>>  * Reserve space in the firmware device image
>>>  * Add a new flash firmware volume block driver
>>>  * Disable EmuVariableFvbRuntimeDxe (at runtime) if QEMU
>>>    flash is available.
>>>
>>> To use:
>>>  * QEMU version 1.1 or newer is required without KVM
>>>  * KVM support requires Linux 3.7 and QEMU 1.6
>>>  * Run QEMU with -pflash OVMF.fd instead of -L or -bios
>>>    or use OvmfPkg/build.sh --enable-flash qemu ...
>>>  * If QEMU is 1.6 or newer, then OvmfPkg/build.sh will
>>>    automatically enable flash when running QEMU, so in
>>>    that case --enable-flash is not required.
>>>
>>> See also:
>>>  * http://wiki.qemu.org/Features/PC_System_Flash
>>>
>>> v2:
>>>  * Replace
>>>      "OvmfPkg/AcpiPlatformDxe/Qemu: Allow high runtime memory regions"
>>>    with
>>>      "OvmfPkg/AcpiPlatformDxe/Qemu: Decrease upper limit for PCI window 32"
>>>  * Separate portions of
>>>      "OvmfPkg/build.sh: Support --enable-flash switch"
>>>    out into a new patch
>>>      "OvmfPkg/build.sh: Enable flash for QEMU >= 1.6"
>>>  * Add "OvmfPkg/README: Add information about OVMF flash layout"
>>>  * Update "OvmfPkg: Add NV Variable storage within FD" to also change the
>>>    size of PcdVariableStoreSize
>>>  * Update commit messages on a couple patches for better clarity
>>
>> Tested in the following configurations:
>>
>> (1) RHEL-6 host (no KVM support, no qemu support -- that is,
>> regression test): RHEL-6, Fedora 19, Windows 2008 R2 (needs CSM),
>> Windows 2012 R2 boot tests work OK.
>>
>> (2) 3.10-based host kernel, qemu v1.7.0-rc0, Xeon W3550 host CPU:
>> Unfortunately qemu dies with the following KVM trace:
>>
>>   KVM internal error. Suberror: 1
>>   emulation failure
>>   EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000623
>>   ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
>>   EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>   ES =0000 00000000 0000ffff 00009300
>>   CS =f000 ffff0000 0000ffff 00009b00
>>   SS =0000 00000000 0000ffff 00009300
>>   DS =0000 00000000 0000ffff 00009300
>>   FS =0000 00000000 0000ffff 00009300
>>   GS =0000 00000000 0000ffff 00009300
>>   LDT=0000 00000000 0000ffff 00008200
>>   TR =0000 00000000 0000ffff 00008b00
>>   GDT=     00000000 0000ffff
>>   IDT=     00000000 0000ffff
>>   CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
>>   DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
>>   DR6=00000000ffff0ff0 DR7=0000000000000400
>>   EFER=0000000000000000
>>   Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff <ff> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>
>> I'm quite unsure, but the CS:IP value seems to point at the reset
>> vector, no? However, the Code=... log only shows 0xFF bytes.
>>
>> (3) 3.10.17 host kernel, qemu v1.7.0-rc0, Athlon II X2 B22 host CPU:
>> almost identical KVM error, with the following difference:
>>
>>   --- vmx 2013-11-07 17:23:45.612973935 +0100
>>   +++ svm 2013-11-07 17:24:17.237973059 +0100
>>   @@ -1,6 +1,6 @@
>>    KVM internal error. Suberror: 1
>>    emulation failure
>>   -EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000623
>>   +EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000663
>>    ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
>>    EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>    ES =0000 00000000 0000ffff 00009300
>>
>> Any ideas?
>
> I.
> This is a QEMU regression (somewhere between v1.6.0 and v1.7.0-rc0),
> I'll have to bisect it.

This bug is "caused" by the following commit:

  a53ae8e934cd54686875b5bcfc2f434244ee55d6 is the first bad commit
  commit a53ae8e934cd54686875b5bcfc2f434244ee55d6
  Author: Marcel Apfelbaum <marcel.a@redhat.com>
  Date:   Mon Sep 16 11:21:16 2013 +0300

      hw/pci: partially handle pci master abort

      A MemoryRegion with negative priority was created and
      it spans over all the pci address space.
      It "intercepts" the accesses to unassigned pci
      address space and will follow the pci spec:
       1. returns -1 on read
       2. does nothing on write

      Note: setting the RECEIVED MASTER ABORT bit in the STATUS register
            of the device that initiated the transaction will be
            implemented in another series

      Signed-off-by: Marcel Apfelbaum <marcel.a@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

The patch was posted as patch 3/3 of the series

  [Qemu-devel] [PATCH v5 0/3] pci: partially implement master abort
  protocol

  http://thread.gmane.org/gmane.comp.emulators.qemu/233798/focus=233801

Basically, the series adds a "background" memory region that covers all
"unassigned" PCI addresses, and patch 3/3 specifically makes sure that
writes to this region are dropped, and reads all return -1 (0xFFFFFFFF).

The read implementation (master_abort_mem_read(), -1) is consistent with
the KVM dump in the quoted part above.

For some reason, this "background" region pushes into the "foreground"
when it comes to the pflash region just below 4GB (in guest-phys address
space).

Or, hm, supposing we start to run in real mode at FFFF:0000, the problem
could be with the "isa-bios" region too.

So I think we have a bug in qemu, which is likely one of the three
below:

(1) This commit is wrong. Or,
(2) the pflash implementation is wrong, because it doesn't register a
    memory region (with appropriate priority) that would catch the
    access.
(3) Both this commit and the pflash implementations are right, but this
    is an unusual situation that the memory infrastructure doesn't
    handle well.

(I doubt that the problem is in KVM.)

When the bug hits, the "info qtree" command has the following to say
about the flash device:

  dev: cfi.pflash01, id ""
    drive = pflash0
    num-blocks = 512
    sector-length = 4096
    width = 1
    big-endian = 0
    id0 = 0
    id1 = 0
    id2 = 0
    id3 = 0
    name = "system.flash"
    irq 0
    mmio 00000000ffe00000/0000000000200000

The "info mtree" command lists:

memory
0000000000000000-7ffffffffffffffe (prio 0, RW): system
  [...]
  0000000060000000-00000000ffffffff (prio 0, RW): alias pci-hole @pci 0000000060000000-00000000ffffffff
  [...]
  00000000ffe00000-00000000ffffffff (prio 0, R-): system.flash
[...]
pci <-------- referred to as "rom_memory" in the source (pci is enabled)
0000000000000000-7ffffffffffffffe (prio 0, RW): pci
  0000000000000000-7ffffffffffffffe (prio -2147483648, RW): pci-master-abort
  [...]
  00000000000e0000-00000000000fffff (prio 1, R-): isa-bios

For some reason, the range called "pci-master-abort" takes priority over
"isa-bios" (and/or "system.flash").

I wrote the attached debug patch (for v1.7.0-rc0). It produces quite a
bit of output, but grepping it for

  isa-bios|system\.flash|pci-master-abort|pci-hole

results in the following messages:

  adding subregion 'system.flash' to region 'system' at offset ffe00000
  subregion 'system.flash' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096)
  subregion 'system.flash' (prio: 0) wins over sibling subregion 'ram-below-4g' (prio: 0)

  adding subregion 'isa-bios' to region 'pci' at offset e0000
  subregion 'isa-bios' (prio: 1) inserted at tail
  subregion 'pc.rom' (prio: 1) wins over sibling subregion 'isa-bios' (prio: 1)

  adding subregion 'pci-master-abort' to region 'pci' at offset 0
  subregion 'pci-master-abort' (prio: -2147483648) loses to sibling subregion 'pc.rom' (prio: 1)
  subregion 'pci-master-abort' (prio: -2147483648) loses to sibling subregion 'isa-bios' (prio: 1)
  subregion 'pci-master-abort' (prio: -2147483648) inserted at tail

  adding subregion 'pci-hole' to region 'system' at offset 60000000
  warning: subregion collision 60000000/a0000000 (pci-hole) vs ffe00000/200000 (system.flash)
  subregion 'pci-hole' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096)
  subregion 'pci-hole' (prio: 0) wins over sibling subregion 'system.flash' (prio: 0)

  adding subregion 'pci-hole64' to region 'system' at offset 100000000
  subregion 'pci-hole64' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096)
  subregion 'pci-hole64' (prio: 0) wins over sibling subregion 'pci-hole' (prio: 0)
  subregion 'smram-region' (prio: 1) wins over sibling subregion 'pci-hole64' (prio: 0)
  warning: subregion collision fec00000/1000 (kvm-ioapic) vs 60000000/a0000000 (pci-hole)
  subregion 'kvm-ioapic' (prio: 0) wins over sibling subregion 'pci-hole64' (prio: 0)
  warning: subregion collision fed00000/400 (hpet) vs 60000000/a0000000 (pci-hole)

I think I know what's going on... There's even a warning above, printed
by original (albeit disabled) qemu source code.

The "pci-hole" subregion, which is an alias, takes priority over
"system.flash". And, unfortunately, "pci-hole" provides a window into
"pci-master-abort".

I think this should be fixable by raising the priority of "system.flash"
to 2048. This way the relationship between "system.flash" and any other
region will not change, except it's going to be reversed with
"pci-hole".

The 2nd attached patch seems to solve the problem for me. I'll resubmit
it as a standalone patch if it is deemed good.

With it in place, I can run OVMF just fine. And:

  @@ -28,170 +28,224 @@
   adding subregion 'pci-conf-data' to region 'io' at offset cfc
   subregion 'pci-conf-data' (prio: 0) wins over sibling subregion 'pci-conf-idx' (prio: 0)
   adding subregion 'pci-hole' to region 'system' at offset 60000000
  -warning: subregion collision 60000000/a0000000 (pci-hole) vs ffe00000/200000 (system.flash)
   subregion 'pci-hole' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096)
  -subregion 'pci-hole' (prio: 0) wins over sibling subregion 'system.flash' (prio: 0)
  +subregion 'pci-hole' (prio: 0) loses to sibling subregion 'system.flash' (prio: 2048)
  +subregion 'pci-hole' (prio: 0) wins over sibling subregion 'ram-below-4g' (prio: 0)

According to the debug patch, the flash region starts to win over quite
a few unrelated other regions as well. But in practice I could see no
adverse effects -- the priority matters only when the addresses actually
overlap, and "system.flash" should not overlap with anything but
"pci-hole".

I'm attaching the following files as well:
- the "info mtree" output before and after the patch,
- the full output of the debug patch (before and after).

Thanks,
Laszlo
From 44854609c3134ec174593a890dfb1a9340b714f7 Mon Sep 17 00:00:00 2001
From: Laszlo Ersek <lersek@redhat.com>
Date: Thu, 7 Nov 2013 21:19:07 +0100
Subject: [PATCH 1/2] debug messages for memory_region_add_subregion_common()

Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 memory.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)
Paolo Bonzini - Nov. 7, 2013, 9:21 p.m.
Il 07/11/2013 22:12, Laszlo Ersek ha scritto:
> 0000000000000000-7ffffffffffffffe (prio 0, RW): system
>   [...]
>   0000000060000000-00000000ffffffff (prio 0, RW): alias pci-hole @pci 0000000060000000-00000000ffffffff
>   [...]
>   00000000ffe00000-00000000ffffffff (prio 0, R-): system.flash
> [...]

Priorities are not "transitive" across aliases; once you use an alias to
map a region, the alias's priority counts, not the target region's
priority.  So the INT_MIN priority for pci-master-abort counts *within
the alias*, but the choice between pci-hole and system.flash is only
affected by the priorities of pci-hole and system.flash.

You could give a smaller priority (-1 or INT_MIN) to pci-hole and just
let it occupy the whole address space, from 0 to UINT64_MAX.  Or perhaps
the pci-hole alias is too large and it should end before the system
flash area.  Both solutions should work.

Paolo
Marcel Apfelbaum - Nov. 7, 2013, 9:24 p.m.
On Thu, 2013-11-07 at 22:12 +0100, Laszlo Ersek wrote:
> This is a QEMU bug report, only disguised as an edk2-devel followup.
> 
> The problem in a nutshell is that the OVMF binary, placed into pflash
> (read-only KVM memslot) used to run in qemu-1.6, but it fails to start
> in qemu-1.7. The OVMF reset vector reads as 0xFF bytes.
> 
> (Jordan and myself started writing these emails in parallel.)
> 
> On 11/07/13 21:27, Jordan Justen wrote:
> > On Sun, Nov 3, 2013 at 12:48 PM, Marcel Apfelbaum
> > <marcel.a@redhat.com> wrote:
> >> The commit:
> >>
> >> Commit: a53ae8e934cd54686875b5bcfc2f434244ee55d6
> >> Author: Marcel Apfelbaum <marcel.a@redhat.com>
> >> Date:   Mon Sep 16 11:21:16 2013 +0300
> >>
> >>     hw/pci: partially handle pci master abort
> >>
> >> introduced a regression on make check:
> >
> > Laszlo pointed out that my OVMF flash support series was not working
> > with QEMU master. It was working with QEMU 1.6.0. I then bisected the
> > issue to this commit. It seems this commit regresses -pflash support
> > for both KVM and non-KVM modes.
> >
> > Can you reproduce the issue this with command?
> > x86_64-softmmu/qemu-system-x86_64 -pflash pc-bios/bios.bin
> > (with or without adding -enable-kvm)
> >
> > I tried adding this patch ("exec: fix regression by making
> > system-memory region UINT64_MAX size") and it did not help the -pflash
> > regression.
> 
> 
> From the edk2-devel discussion:
> <http://thread.gmane.org/gmane.comp.bios.tianocore.devel/4787/focus=4812>
> 
> On 11/07/13 19:07, Laszlo Ersek wrote:
> > On 11/07/13 17:28, Laszlo Ersek wrote:
> >> On 11/06/13 23:29, Jordan Justen wrote:
> >>> https://github.com/jljusten/edk2.git ovmf-nvvars-v2
> >>>
> >>> This series implements support for QEMU's emulated
> >>> system flash.
> >>>
> >>> This allows for persistent UEFI non-volatile variables.
> >>>
> >>> Previously we attempted to emulate non-volatile
> >>> variables in a few ways, but each of them would fail
> >>> in particular situations.
> >>>
> >>> To support flash based variable storage, we:
> >>>  * Reserve space in the firmware device image
> >>>  * Add a new flash firmware volume block driver
> >>>  * Disable EmuVariableFvbRuntimeDxe (at runtime) if QEMU
> >>>    flash is available.
> >>>
> >>> To use:
> >>>  * QEMU version 1.1 or newer is required without KVM
> >>>  * KVM support requires Linux 3.7 and QEMU 1.6
> >>>  * Run QEMU with -pflash OVMF.fd instead of -L or -bios
> >>>    or use OvmfPkg/build.sh --enable-flash qemu ...
> >>>  * If QEMU is 1.6 or newer, then OvmfPkg/build.sh will
> >>>    automatically enable flash when running QEMU, so in
> >>>    that case --enable-flash is not required.
> >>>
> >>> See also:
> >>>  * http://wiki.qemu.org/Features/PC_System_Flash
> >>>
> >>> v2:
> >>>  * Replace
> >>>      "OvmfPkg/AcpiPlatformDxe/Qemu: Allow high runtime memory regions"
> >>>    with
> >>>      "OvmfPkg/AcpiPlatformDxe/Qemu: Decrease upper limit for PCI window 32"
> >>>  * Separate portions of
> >>>      "OvmfPkg/build.sh: Support --enable-flash switch"
> >>>    out into a new patch
> >>>      "OvmfPkg/build.sh: Enable flash for QEMU >= 1.6"
> >>>  * Add "OvmfPkg/README: Add information about OVMF flash layout"
> >>>  * Update "OvmfPkg: Add NV Variable storage within FD" to also change the
> >>>    size of PcdVariableStoreSize
> >>>  * Update commit messages on a couple patches for better clarity
> >>
> >> Tested in the following configurations:
> >>
> >> (1) RHEL-6 host (no KVM support, no qemu support -- that is,
> >> regression test): RHEL-6, Fedora 19, Windows 2008 R2 (needs CSM),
> >> Windows 2012 R2 boot tests work OK.
> >>
> >> (2) 3.10-based host kernel, qemu v1.7.0-rc0, Xeon W3550 host CPU:
> >> Unfortunately qemu dies with the following KVM trace:
> >>
> >>   KVM internal error. Suberror: 1
> >>   emulation failure
> >>   EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000623
> >>   ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> >>   EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> >>   ES =0000 00000000 0000ffff 00009300
> >>   CS =f000 ffff0000 0000ffff 00009b00
> >>   SS =0000 00000000 0000ffff 00009300
> >>   DS =0000 00000000 0000ffff 00009300
> >>   FS =0000 00000000 0000ffff 00009300
> >>   GS =0000 00000000 0000ffff 00009300
> >>   LDT=0000 00000000 0000ffff 00008200
> >>   TR =0000 00000000 0000ffff 00008b00
> >>   GDT=     00000000 0000ffff
> >>   IDT=     00000000 0000ffff
> >>   CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> >>   DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> >>   DR6=00000000ffff0ff0 DR7=0000000000000400
> >>   EFER=0000000000000000
> >>   Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff <ff> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >>
> >> I'm quite unsure, but the CS:IP value seems to point at the reset
> >> vector, no? However, the Code=... log only shows 0xFF bytes.
> >>
> >> (3) 3.10.17 host kernel, qemu v1.7.0-rc0, Athlon II X2 B22 host CPU:
> >> almost identical KVM error, with the following difference:
> >>
> >>   --- vmx 2013-11-07 17:23:45.612973935 +0100
> >>   +++ svm 2013-11-07 17:24:17.237973059 +0100
> >>   @@ -1,6 +1,6 @@
> >>    KVM internal error. Suberror: 1
> >>    emulation failure
> >>   -EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000623
> >>   +EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000663
> >>    ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
> >>    EIP=0000fff0 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> >>    ES =0000 00000000 0000ffff 00009300
> >>
> >> Any ideas?
> >
> > I.
> > This is a QEMU regression (somewhere between v1.6.0 and v1.7.0-rc0),
> > I'll have to bisect it.
> 
> This bug is "caused" by the following commit:
> 
>   a53ae8e934cd54686875b5bcfc2f434244ee55d6 is the first bad commit
>   commit a53ae8e934cd54686875b5bcfc2f434244ee55d6
>   Author: Marcel Apfelbaum <marcel.a@redhat.com>
>   Date:   Mon Sep 16 11:21:16 2013 +0300
> 
>       hw/pci: partially handle pci master abort
> 
>       A MemoryRegion with negative priority was created and
>       it spans over all the pci address space.
>       It "intercepts" the accesses to unassigned pci
>       address space and will follow the pci spec:
>        1. returns -1 on read
>        2. does nothing on write
> 
>       Note: setting the RECEIVED MASTER ABORT bit in the STATUS register
>             of the device that initiated the transaction will be
>             implemented in another series
> 
>       Signed-off-by: Marcel Apfelbaum <marcel.a@redhat.com>
>       Acked-by: Michael S. Tsirkin <mst@redhat.com>
>       Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> The patch was posted as patch 3/3 of the series
> 
>   [Qemu-devel] [PATCH v5 0/3] pci: partially implement master abort
>   protocol
> 
>   http://thread.gmane.org/gmane.comp.emulators.qemu/233798/focus=233801
> 
> Basically, the series adds a "background" memory region that covers all
> "unassigned" PCI addresses, and patch 3/3 specifically makes sure that
> writes to this region are dropped, and reads all return -1 (0xFFFFFFFF).
> 
> The read implementation (master_abort_mem_read(), -1) is consistent with
> the KVM dump in the quoted part above.
> 
> For some reason, this "background" region pushes into the "foreground"
> when it comes to the pflash region just below 4GB (in guest-phys address
> space).
> 
> Or, hm, supposing we start to run in real mode at FFFF:0000, the problem
> could be with the "isa-bios" region too.
> 
> So I think we have a bug in qemu, which is likely one of the three
> below:
> 
> (1) This commit is wrong. Or,
> (2) the pflash implementation is wrong, because it doesn't register a
>     memory region (with appropriate priority) that would catch the
>     access.
> (3) Both this commit and the pflash implementations are right, but this
>     is an unusual situation that the memory infrastructure doesn't
>     handle well.
> 
> (I doubt that the problem is in KVM.)
> 
> When the bug hits, the "info qtree" command has the following to say
> about the flash device:
> 
>   dev: cfi.pflash01, id ""
>     drive = pflash0
>     num-blocks = 512
>     sector-length = 4096
>     width = 1
>     big-endian = 0
>     id0 = 0
>     id1 = 0
>     id2 = 0
>     id3 = 0
>     name = "system.flash"
>     irq 0
>     mmio 00000000ffe00000/0000000000200000
> 
> The "info mtree" command lists:
> 
> memory
> 0000000000000000-7ffffffffffffffe (prio 0, RW): system
>   [...]
>   0000000060000000-00000000ffffffff (prio 0, RW): alias pci-hole @pci 0000000060000000-00000000ffffffff
>   [...]
>   00000000ffe00000-00000000ffffffff (prio 0, R-): system.flash
> [...]
> pci <-------- referred to as "rom_memory" in the source (pci is enabled)
> 0000000000000000-7ffffffffffffffe (prio 0, RW): pci
>   0000000000000000-7ffffffffffffffe (prio -2147483648, RW): pci-master-abort
>   [...]
>   00000000000e0000-00000000000fffff (prio 1, R-): isa-bios
> 
> For some reason, the range called "pci-master-abort" takes priority over
> "isa-bios" (and/or "system.flash").
> 
> I wrote the attached debug patch (for v1.7.0-rc0). It produces quite a
> bit of output, but grepping it for
> 
>   isa-bios|system\.flash|pci-master-abort|pci-hole
> 
> results in the following messages:
> 
>   adding subregion 'system.flash' to region 'system' at offset ffe00000
>   subregion 'system.flash' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096)
>   subregion 'system.flash' (prio: 0) wins over sibling subregion 'ram-below-4g' (prio: 0)
> 
>   adding subregion 'isa-bios' to region 'pci' at offset e0000
>   subregion 'isa-bios' (prio: 1) inserted at tail
>   subregion 'pc.rom' (prio: 1) wins over sibling subregion 'isa-bios' (prio: 1)
> 
>   adding subregion 'pci-master-abort' to region 'pci' at offset 0
>   subregion 'pci-master-abort' (prio: -2147483648) loses to sibling subregion 'pc.rom' (prio: 1)
>   subregion 'pci-master-abort' (prio: -2147483648) loses to sibling subregion 'isa-bios' (prio: 1)
>   subregion 'pci-master-abort' (prio: -2147483648) inserted at tail
> 
>   adding subregion 'pci-hole' to region 'system' at offset 60000000
>   warning: subregion collision 60000000/a0000000 (pci-hole) vs ffe00000/200000 (system.flash)
Thank you Laszlo for the detailed info!
I think the problem is right above. Why pci-hole and system.flash collide?
IMHO we should not play with priorities here, better solve the collision.

Thanks,
Marcel 

>   subregion 'pci-hole' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096)
>   subregion 'pci-hole' (prio: 0) wins over sibling subregion 'system.flash' (prio: 0)
> 
>   adding subregion 'pci-hole64' to region 'system' at offset 100000000
>   subregion 'pci-hole64' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096)
>   subregion 'pci-hole64' (prio: 0) wins over sibling subregion 'pci-hole' (prio: 0)
>   subregion 'smram-region' (prio: 1) wins over sibling subregion 'pci-hole64' (prio: 0)
>   warning: subregion collision fec00000/1000 (kvm-ioapic) vs 60000000/a0000000 (pci-hole)
>   subregion 'kvm-ioapic' (prio: 0) wins over sibling subregion 'pci-hole64' (prio: 0)
>   warning: subregion collision fed00000/400 (hpet) vs 60000000/a0000000 (pci-hole)
> 
> I think I know what's going on... There's even a warning above, printed
> by original (albeit disabled) qemu source code.
> 
> The "pci-hole" subregion, which is an alias, takes priority over
> "system.flash". And, unfortunately, "pci-hole" provides a window into
> "pci-master-abort".
> 
> I think this should be fixable by raising the priority of "system.flash"
> to 2048. This way the relationship between "system.flash" and any other
> region will not change, except it's going to be reversed with
> "pci-hole".
> 
> The 2nd attached patch seems to solve the problem for me. I'll resubmit
> it as a standalone patch if it is deemed good.
> 
> With it in place, I can run OVMF just fine. And:
> 
>   @@ -28,170 +28,224 @@
>    adding subregion 'pci-conf-data' to region 'io' at offset cfc
>    subregion 'pci-conf-data' (prio: 0) wins over sibling subregion 'pci-conf-idx' (prio: 0)
>    adding subregion 'pci-hole' to region 'system' at offset 60000000
>   -warning: subregion collision 60000000/a0000000 (pci-hole) vs ffe00000/200000 (system.flash)
>    subregion 'pci-hole' (prio: 0) loses to sibling subregion 'icc-apic-container' (prio: 4096)
>   -subregion 'pci-hole' (prio: 0) wins over sibling subregion 'system.flash' (prio: 0)
>   +subregion 'pci-hole' (prio: 0) loses to sibling subregion 'system.flash' (prio: 2048)
>   +subregion 'pci-hole' (prio: 0) wins over sibling subregion 'ram-below-4g' (prio: 0)
> 
> According to the debug patch, the flash region starts to win over quite
> a few unrelated other regions as well. But in practice I could see no
> adverse effects -- the priority matters only when the addresses actually
> overlap, and "system.flash" should not overlap with anything but
> "pci-hole".
> 
> I'm attaching the following files as well:
> - the "info mtree" output before and after the patch,
> - the full output of the debug patch (before and after).
> 
> Thanks,
> Laszlo
Paolo Bonzini - Nov. 7, 2013, 9:31 p.m.
Il 07/11/2013 22:24, Marcel Apfelbaum ha scritto:
> Thank you Laszlo for the detailed info!
> I think the problem is right above. Why pci-hole and system.flash collide?
> IMHO we should not play with priorities here, better solve the collision.

We need to audit all the other boards that support PCI...  I'll take a
look tomorrow since you guys are off.

Paolo
Laszlo Ersek - Nov. 7, 2013, 9:32 p.m.
On 11/07/13 22:21, Paolo Bonzini wrote:
> Il 07/11/2013 22:12, Laszlo Ersek ha scritto:
>> 0000000000000000-7ffffffffffffffe (prio 0, RW): system
>>   [...]
>>   0000000060000000-00000000ffffffff (prio 0, RW): alias pci-hole @pci 0000000060000000-00000000ffffffff
>>   [...]
>>   00000000ffe00000-00000000ffffffff (prio 0, R-): system.flash
>> [...]
> 
> Priorities are not "transitive" across aliases; once you use an alias to
> map a region, the alias's priority counts, not the target region's
> priority.  So the INT_MIN priority for pci-master-abort counts *within
> the alias*, but the choice between pci-hole and system.flash is only
> affected by the priorities of pci-hole and system.flash.

Right. It's also documented in docs/memory.txt -- Peter's recent
addition I think?

> You could give a smaller priority (-1 or INT_MIN) to pci-hole and just
> let it occupy the whole address space, from 0 to UINT64_MAX.  Or perhaps
> the pci-hole alias is too large and it should end before the system
> flash area.  Both solutions should work.

I did reorder pci-hole and system.flash, but rather than lowering
pci-hole, I raised system.flash. I have no preference.

Thanks
Laszlo
Marcel Apfelbaum - Nov. 7, 2013, 9:38 p.m.
On Thu, 2013-11-07 at 22:31 +0100, Paolo Bonzini wrote:
> Il 07/11/2013 22:24, Marcel Apfelbaum ha scritto:
> > Thank you Laszlo for the detailed info!
> > I think the problem is right above. Why pci-hole and system.flash collide?
> > IMHO we should not play with priorities here, better solve the collision.
> 
> We need to audit all the other boards that support PCI...  I'll take a
> look tomorrow since you guys are off.
Thanks Paolo,
Let me just point out what I know (or I think I know):
1. Not all architectures have the behavior: "Address space that is not RAM(and friends) is for sure PCI".
   Only x86 behaves like this (I think). That means that you cannot have a 64bit wide pci-hole
   with lower priority that catches all accesses that are not for RAM(and firends).
2. If the above is right, and making pci-hole 64 bit wide is not an option,
   playing with pci-holes/other-region priorities it would be just wrong,
   it would be only to "fight" with the locality of the memory region's priority.

That being said, I am looking forward for your findings.
Thanks!
Marcel
  
> 
> Paolo
Laszlo Ersek - Nov. 7, 2013, 9:48 p.m.
On 11/07/13 22:24, Marcel Apfelbaum wrote:
> On Thu, 2013-11-07 at 22:12 +0100, Laszlo Ersek wrote:

>>   adding subregion 'pci-hole' to region 'system' at offset 60000000
>>   warning: subregion collision 60000000/a0000000 (pci-hole) vs ffe00000/200000 (system.flash)
> Thank you Laszlo for the detailed info!
> I think the problem is right above. Why pci-hole and system.flash collide?
> IMHO we should not play with priorities here, better solve the collision.

pc_init1()
  pc_memory_init()
    pc_system_firmware_init()
      pc_system_flash_init() <---- sets base address to
                                   0x100000000ULL - flash_size
        pflash_cfi01_register()
          sysbus_mmio_map()
            sysbus_mmio_map_common()
              memory_region_add_subregion()
  i440fx_init()
    memory_region_init_alias("pci-hole")

pc_init1() passes

    0x100000000ULL - below_4g_mem_size

to i440fx_init() as "pci_hole_size", which is then used as the size of
the "pci-hole" alias.

We should probably subtract the size of the flash from this, but I don't
know how to do that "elegantly". Yet another (output) parameter for
pc_memory_init()? Blech.

Or look up the end address of "system.flash" by name?

Thanks
Laszlo
Peter Maydell - Nov. 7, 2013, 9:51 p.m.
On 7 November 2013 21:38, Marcel Apfelbaum <marcel.a@redhat.com> wrote:
> Thanks Paolo,
> Let me just point out what I know (or I think I know):
> 1. Not all architectures have the behavior: "Address space that is not RAM(and friends) is for sure PCI".
>    Only x86 behaves like this (I think).

More specifically, the x86 pc behaves like this. Other
x86 based systems could in theory behave differently
(not that we actually model any, I think).

> That means that you cannot have a 64bit wide pci-hole
>    with lower priority that catches all accesses that are not for RAM(and firends).

...but this conclusion is wrong, because the pci-hole
region is created by the pc model. So we should create
it at the correct priority and size to give the behaviour
relative to other devices in the pc model that is required
(ie that the hardware has). This doesn't affect any other
target architecture or board.

That said, I don't know enough about the PC to know what
the exact details of the pci-hole are, so I'm not making a
statement about what the correct model is. I'm just saying
that what you do with the pci-hole and the container it lives
in and the other devices in that container is not going to
change the behaviour of any other target board.

> 2. If the above is right, and making pci-hole 64 bit wide is not an option,
>    playing with pci-holes/other-region priorities it would be just wrong,
>    it would be only to "fight" with the locality of the memory region's priority.

I have no idea what you mean by "fighting" here. MR priorities
apply only within a specific container region[*], to set which of
that container's children appears 'above' another. They're
totally local to the container (which in this case is part of the
PC model, not the generic PCI code) and so the PC model
can freely set them to whatever makes most sense.

[*] if you didn't already know this, see the recently committed
updates to doc/memory.txt for a more detailed explanation.

-- PMM
Marcel Apfelbaum - Nov. 7, 2013, 10:06 p.m.
On Thu, 2013-11-07 at 21:51 +0000, Peter Maydell wrote:
> On 7 November 2013 21:38, Marcel Apfelbaum <marcel.a@redhat.com> wrote:
> > Thanks Paolo,
> > Let me just point out what I know (or I think I know):
> > 1. Not all architectures have the behavior: "Address space that is not RAM(and friends) is for sure PCI".
> >    Only x86 behaves like this (I think).
> 
> More specifically, the x86 pc behaves like this. Other
> x86 based systems could in theory behave differently
> (not that we actually model any, I think).
> 
> > That means that you cannot have a 64bit wide pci-hole
> >    with lower priority that catches all accesses that are not for RAM(and firends).
> 
> ...but this conclusion is wrong, because the pci-hole
> region is created by the pc model. So we should create
Yes... It think you are right. I was thinking for some
reason of master-abort region belonging to the the pci bus
that is not specific to x86 pc...

> it at the correct priority and size to give the behaviour
> relative to other devices in the pc model that is required
> (ie that the hardware has). This doesn't affect any other
> target architecture or board.
> 
> That said, I don't know enough about the PC to know what
> the exact details of the pci-hole are, so I'm not making a
> statement about what the correct model is. I'm just saying
> that what you do with the pci-hole and the container it lives
> in and the other devices in that container is not going to
> change the behaviour of any other target board.
> 
> > 2. If the above is right, and making pci-hole 64 bit wide is not an option,
> >    playing with pci-holes/other-region priorities it would be just wrong,
> >    it would be only to "fight" with the locality of the memory region's priority.
> 
> I have no idea what you mean by "fighting" here. MR priorities
By "fighting" I was referring exactly to the fact that because 
priorities are container specific we need to use them twice to
get the wanted behavior (master abort being with the lowest priority)
1. master-abort region priority for pci-address-space
2. pci-hole priority for system-address-space
But this is as designed...

Thanks,
Marcel

> apply only within a specific container region[*], to set which of
> that container's children appears 'above' another. They're
> totally local to the container (which in this case is part of the
> PC model, not the generic PCI code) and so the PC model
> can freely set them to whatever makes most sense.
> 
> [*] if you didn't already know this, see the recently committed
> updates to doc/memory.txt for a more detailed explanation.
> 
> -- PMM
>
Marcel Apfelbaum - Nov. 7, 2013, 10:09 p.m.
On Thu, 2013-11-07 at 22:48 +0100, Laszlo Ersek wrote:
> On 11/07/13 22:24, Marcel Apfelbaum wrote:
> > On Thu, 2013-11-07 at 22:12 +0100, Laszlo Ersek wrote:
> 
> >>   adding subregion 'pci-hole' to region 'system' at offset 60000000
> >>   warning: subregion collision 60000000/a0000000 (pci-hole) vs ffe00000/200000 (system.flash)
> > Thank you Laszlo for the detailed info!
> > I think the problem is right above. Why pci-hole and system.flash collide?
> > IMHO we should not play with priorities here, better solve the collision.
> 
> pc_init1()
>   pc_memory_init()
>     pc_system_firmware_init()
>       pc_system_flash_init() <---- sets base address to
>                                    0x100000000ULL - flash_size
>         pflash_cfi01_register()
>           sysbus_mmio_map()
>             sysbus_mmio_map_common()
>               memory_region_add_subregion()
>   i440fx_init()
>     memory_region_init_alias("pci-hole")
> 
> pc_init1() passes
> 
>     0x100000000ULL - below_4g_mem_size
> 
> to i440fx_init() as "pci_hole_size", which is then used as the size of
> the "pci-hole" alias.
> 
> We should probably subtract the size of the flash from this, but I don't
> know how to do that "elegantly". Yet another (output) parameter for
> pc_memory_init()? Blech.
> 
> Or look up the end address of "system.flash" by name?
Both seems ugly to me...
As Peter Maydell pointed out, the pci-hole belongs to the pc model,
so we can actually change its priority without affecting other
architectures.
 
> 
> Thanks
> Laszlo
>
Laszlo Ersek - Nov. 7, 2013, 10:23 p.m.
On 11/07/13 22:24, Marcel Apfelbaum wrote:

> Why pci-hole and system.flash collide? IMHO we should not play with
> priorities here, better solve the collision.

What about this "beautiful" series? It produces

memory
0000000000000000-000fffffffffffff (prio 0, RW): system
  [...]
  0000000060000000-00000000ffdfffff (prio 0, RW): alias pci-hole @pci
  0000000060000000-00000000ffdfffff
  [...]
  00000000ffe00000-00000000ffffffff (prio 0, R-): system.flash

and I can run OVMF with it. It also stays within i386/pc.

Re 2/2, note that "below_4g_mem_size" never exceeds 0xe0000000 in
pc_init1(), so the subtraction is safe.

Thanks
Laszlo

Laszlo Ersek (2):
  i386/pc: propagate flash size from pc_system_flash_init() to
    pc_init1()
  i386/pc_piix: the pci-hole should end where the system flash starts

 include/hw/i386/pc.h |  6 ++++--
 hw/i386/pc.c         |  5 +++--
 hw/i386/pc_piix.c    |  5 +++--
 hw/i386/pc_q35.c     |  3 ++-
 hw/i386/pc_sysfw.c   | 10 +++++++---
 5 files changed, 19 insertions(+), 10 deletions(-)
Paolo Bonzini - Nov. 8, 2013, 8:05 a.m.
Il 07/11/2013 22:51, Peter Maydell ha scritto:
>> > 1. Not all architectures have the behavior: "Address space that is not RAM(and friends)
>> >     is for sure PCI". Only x86 behaves like this (I think).
> 
> More specifically, the x86 pc behaves like this. Other
> x86 based systems could in theory behave differently
> (not that we actually model any, I think).

After Marcel's patch, we have changed behavior for at least all boards
that pass get_system_memory() to pci_register_bus or pci_bus_new:

* mips/gt64xxx_pci.c

* pci-host/bonito.c

* pci-host/ppce500.c

* ppc/ppc4xx_pci.c

* sh4/sh_pci.c

These now will not go anymore through unassigned_mem_ops, which is a
behavioral change for MIPS boards (gt64xxx_pci and bonito) at least.

Furthermore, the default behavior of the memory API _is_ read
all-ones/ignore writes, so I'm not sure what's the benefit of adding a
separate region for master abort...

Paolo
Paolo Bonzini - Nov. 8, 2013, 10:14 a.m.
Il 07/11/2013 23:23, Laszlo Ersek ha scritto:
> On 11/07/13 22:24, Marcel Apfelbaum wrote:
>> Why pci-hole and system.flash collide? IMHO we should not play with
>> priorities here, better solve the collision.
> 
> What about this "beautiful" series? It produces
> 
> memory
> 0000000000000000-000fffffffffffff (prio 0, RW): system
>   [...]
>   0000000060000000-00000000ffdfffff (prio 0, RW): alias pci-hole @pci
>   0000000060000000-00000000ffdfffff
>   [...]
>   00000000ffe00000-00000000ffffffff (prio 0, R-): system.flash
> 
> and I can run OVMF with it. It also stays within i386/pc.

This definitely works, and make sense.

But I think for 1.7 we should just revert the commit.  It can be done
later in 1.8.  And it should have a qtest testcase that shows the effect
of the patch.

Other patches can still be applied to 1.7, but the change definitely had
much bigger ramifications than anticipated.  The patches are

[PATCH for-1.7 v2 5/8] pci: fix address space size for bridge
[PATCH for-1.7 v2 7/8] pc: s/INT64_MAX/UINT64_MAX/
[PATCH for-1.7 v2 8/8] spapr_pci: s/INT64_MAX/UINT64_MAX/

There's also

[PATCH 1/2] split definitions for exec.c and translate-all.c radix trees
[PATCH 2/2] exec: make address spaces 64-bit wide

which however has a 2% perf hit for TCG.  In 1.8 we can apply it and
recover the hit with other optimizations.

Paolo
Peter Maydell - Nov. 8, 2013, 10:44 a.m.
On 8 November 2013 08:05, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 07/11/2013 22:51, Peter Maydell ha scritto:
>>> > 1. Not all architectures have the behavior: "Address space that is not RAM(and friends)
>>> >     is for sure PCI". Only x86 behaves like this (I think).
>>
>> More specifically, the x86 pc behaves like this. Other
>> x86 based systems could in theory behave differently
>> (not that we actually model any, I think).
>
> After Marcel's patch, we have changed behavior for at least all boards
> that pass get_system_memory() to pci_register_bus or pci_bus_new:
>
> * mips/gt64xxx_pci.c
>
> * pci-host/bonito.c
>
> * pci-host/ppce500.c
>
> * ppc/ppc4xx_pci.c
>
> * sh4/sh_pci.c

Oh, right. Ideally those boards should not do that (I fixed
the versatile pci controller not to do that a while back) but
it's a long standing behaviour so it would be better had we
not broken it.

I think it should in general be fixable by just having those
pci controllers create an empty memory region to pass in,
like versatile:

    memory_region_init(&s->pci_io_space, OBJECT(s), "pci_io", 1ULL << 32);
    memory_region_init(&s->pci_mem_space, OBJECT(s), "pci_mem", 1ULL << 32);

    pci_bus_new_inplace(&s->pci_bus, sizeof(s->pci_bus), DEVICE(obj), "pci",
                        &s->pci_mem_space, &s->pci_io_space,
                        PCI_DEVFN(11, 0), TYPE_PCI_BUS);

That doesn't affect where PCI DMA goes. It might also
require mapping an alias into that region at wherever
the pci hole is on those boards.

Kind of awkward to do and test at this point in the release cycle though.

> These now will not go anymore through unassigned_mem_ops, which is a
> behavioral change for MIPS boards (gt64xxx_pci and bonito) at least.
>
> Furthermore, the default behavior of the memory API _is_ read
> all-ones/ignore writes, so I'm not sure what's the benefit of adding a
> separate region for master abort...

It gives you a place set the appropriate PCI controller or device
register status bits on abort by a PCI device access, IIRC.

-- PMM
Paolo Bonzini - Nov. 8, 2013, 11 a.m.
Il 08/11/2013 11:44, Peter Maydell ha scritto:
> On 8 November 2013 08:05, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> Il 07/11/2013 22:51, Peter Maydell ha scritto:
>>>>> 1. Not all architectures have the behavior: "Address space that is not RAM(and friends)
>>>>>     is for sure PCI". Only x86 behaves like this (I think).
>>>
>>> More specifically, the x86 pc behaves like this. Other
>>> x86 based systems could in theory behave differently
>>> (not that we actually model any, I think).
>>
>> After Marcel's patch, we have changed behavior for at least all boards
>> that pass get_system_memory() to pci_register_bus or pci_bus_new:
>>
>> * mips/gt64xxx_pci.c
>>
>> * pci-host/bonito.c
>>
>> * pci-host/ppce500.c
>>
>> * ppc/ppc4xx_pci.c
>>
>> * sh4/sh_pci.c
> 
> Oh, right. Ideally those boards should not do that (I fixed
> the versatile pci controller not to do that a while back) but
> it's a long standing behaviour so it would be better had we
> not broken it.
> 
> I think it should in general be fixable by just having those
> pci controllers create an empty memory region to pass in,
> like versatile:
> 
>     memory_region_init(&s->pci_io_space, OBJECT(s), "pci_io", 1ULL << 32);
>     memory_region_init(&s->pci_mem_space, OBJECT(s), "pci_mem", 1ULL << 32);
> 
>     pci_bus_new_inplace(&s->pci_bus, sizeof(s->pci_bus), DEVICE(obj), "pci",
>                         &s->pci_mem_space, &s->pci_io_space,
>                         PCI_DEVFN(11, 0), TYPE_PCI_BUS);
> 
> That doesn't affect where PCI DMA goes. It might also
> require mapping an alias into that region at wherever
> the pci hole is on those boards.
> 
> Kind of awkward to do and test at this point in the release cycle though.

Yes, for now we should just revert the patch.

BTW I tried optimizing MMIO dispatch and managed to save ~30 cycles so I
don't think we should be afraid of increasing the depth of the radix
tree.  All stuff for 1.8 though.

Paolo

Paolo
Marcel Apfelbaum - Nov. 8, 2013, 3:08 p.m.
On Fri, 2013-11-08 at 09:05 +0100, Paolo Bonzini wrote:
> Il 07/11/2013 22:51, Peter Maydell ha scritto:
> >> > 1. Not all architectures have the behavior: "Address space that is not RAM(and friends)
> >> >     is for sure PCI". Only x86 behaves like this (I think).
> > 
> > More specifically, the x86 pc behaves like this. Other
> > x86 based systems could in theory behave differently
> > (not that we actually model any, I think).
> 
> After Marcel's patch, we have changed behavior for at least all boards
> that pass get_system_memory() to pci_register_bus or pci_bus_new:
> 
> * mips/gt64xxx_pci.c
> 
> * pci-host/bonito.c
> 
> * pci-host/ppce500.c
> 
> * ppc/ppc4xx_pci.c
> 
> * sh4/sh_pci.c
> 
> These now will not go anymore through unassigned_mem_ops, which is a
> behavioral change for MIPS boards (gt64xxx_pci and bonito) at least.
> 
> Furthermore, the default behavior of the memory API _is_ read
> all-ones/ignore writes, so I'm not sure what's the benefit of adding a
> separate region for master abort...
Actually, as I see, the default behavior of "system" memory region
is to use unassigned_mem_ops that has valid.accepts method returning
false (no read/write methods). I don't see that read all-ones/ignore
writes is implemented.
This was the main reason I submitted this patch. I had *no* clue that
it would impact so much the system...
I still think the patch is needed ans the guests will benefit from
more accurate PCI spec emulation.

Thanks,
Marcel

> 
> Paolo
Marcel Apfelbaum - Nov. 8, 2013, 3:16 p.m.
On Fri, 2013-11-08 at 10:44 +0000, Peter Maydell wrote:
> On 8 November 2013 08:05, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > Il 07/11/2013 22:51, Peter Maydell ha scritto:
> >>> > 1. Not all architectures have the behavior: "Address space that is not RAM(and friends)
> >>> >     is for sure PCI". Only x86 behaves like this (I think).
> >>
> >> More specifically, the x86 pc behaves like this. Other
> >> x86 based systems could in theory behave differently
> >> (not that we actually model any, I think).
> >
> > After Marcel's patch, we have changed behavior for at least all boards
> > that pass get_system_memory() to pci_register_bus or pci_bus_new:
> >
> > * mips/gt64xxx_pci.c
> >
> > * pci-host/bonito.c
> >
> > * pci-host/ppce500.c
> >
> > * ppc/ppc4xx_pci.c
> >
> > * sh4/sh_pci.c
> 
> Oh, right. Ideally those boards should not do that (I fixed
> the versatile pci controller not to do that a while back) but
> it's a long standing behaviour so it would be better had we
> not broken it.
> 
> I think it should in general be fixable by just having those
> pci controllers create an empty memory region to pass in,
> like versatile:
> 
>     memory_region_init(&s->pci_io_space, OBJECT(s), "pci_io", 1ULL << 32);
>     memory_region_init(&s->pci_mem_space, OBJECT(s), "pci_mem", 1ULL << 32);
> 
>     pci_bus_new_inplace(&s->pci_bus, sizeof(s->pci_bus), DEVICE(obj), "pci",
>                         &s->pci_mem_space, &s->pci_io_space,
>                         PCI_DEVFN(11, 0), TYPE_PCI_BUS);
> 
> That doesn't affect where PCI DMA goes. It might also
> require mapping an alias into that region at wherever
> the pci hole is on those boards.
> 
> Kind of awkward to do and test at this point in the release cycle though.
> 
> > These now will not go anymore through unassigned_mem_ops, which is a
> > behavioral change for MIPS boards (gt64xxx_pci and bonito) at least.
> >
> > Furthermore, the default behavior of the memory API _is_ read
> > all-ones/ignore writes, so I'm not sure what's the benefit of adding a
> > separate region for master abort...
> 
> It gives you a place set the appropriate PCI controller or device
> register status bits on abort by a PCI device access, IIRC.

Right, thanks for pointing this out. This was indeed the patches direction.

But even the first patch has meaning by itself and not just a preparation.
As I mentioned in other mail in this thread "read all-ones/ignore writes"
is not implemented yet asbeacuse unassigned_mem_ops has no read/write methods
and valid.accepts returns false.

Thanks,
Marcel

> 
> -- PMM
>
Paolo Bonzini - Nov. 8, 2013, 4:12 p.m.
Il 08/11/2013 16:08, Marcel Apfelbaum ha scritto:
> Actually, as I see, the default behavior of "system" memory region
> is to use unassigned_mem_ops that has valid.accepts method returning
> false (no read/write methods). I don't see that read all-ones/ignore
> writes is implemented.

Right, it's read-all-zeroes (see unassigned_mem_read).  It was
read-all-ones for a brief time, then it got changed back to read-all-zeroes.

> This was the main reason I submitted this patch. I had *no* clue that
> it would impact so much the system...

Yeah, it's not your fault.  But it should have been held back until 1.8.
 Do you agree with reverting it?

> I still think the patch is needed ans the guests will benefit from
> more accurate PCI spec emulation.

Only buggy guests :) but yes, I agree it's a good thing to have.

Paolo
Marcel Apfelbaum - Nov. 8, 2013, 4:19 p.m.
On Fri, 2013-11-08 at 17:12 +0100, Paolo Bonzini wrote:
> Il 08/11/2013 16:08, Marcel Apfelbaum ha scritto:
> > Actually, as I see, the default behavior of "system" memory region
> > is to use unassigned_mem_ops that has valid.accepts method returning
> > false (no read/write methods). I don't see that read all-ones/ignore
> > writes is implemented.
> 
> Right, it's read-all-zeroes (see unassigned_mem_read).  It was
> read-all-ones for a brief time, then it got changed back to read-all-zeroes.
I remember something, Jan's patches got reverted because the modification
was system wide and not only for pci address space.
> 
> > This was the main reason I submitted this patch. I had *no* clue that
> > it would impact so much the system...
> 
> Yeah, it's not your fault.  But it should have been held back until 1.8.
>  Do you agree with reverting it?
Given the actual impact on version's stability, I agree
it is the right thing to do.

> 
> > I still think the patch is needed ans the guests will benefit from
> > more accurate PCI spec emulation.
> 
> Only buggy guests :) but yes, I agree it's a good thing to have.
Yes, it may be a driver there making some a wrong decision and
"reading-all-ones" may give it a clue. (the same for writes)

Thanks,
Marcel

> 
> Paolo
Igor Mammedov - Nov. 8, 2013, 4:37 p.m.
On Thu,  7 Nov 2013 23:23:57 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

> On 11/07/13 22:24, Marcel Apfelbaum wrote:
> 
> > Why pci-hole and system.flash collide? IMHO we should not play with
> > priorities here, better solve the collision.
> 
> What about this "beautiful" series? It produces
Laszlo,
 there is patch that removes PCI-hole aliases at all
 mapping PCI address space with -1 priority in system address space.
 so all access flash region range will go to it sice it's priority 0 

> 
> memory
> 0000000000000000-000fffffffffffff (prio 0, RW): system
>   [...]
>   0000000060000000-00000000ffdfffff (prio 0, RW): alias pci-hole @pci
>   0000000060000000-00000000ffdfffff
>   [...]
>   00000000ffe00000-00000000ffffffff (prio 0, R-): system.flash
> 
> and I can run OVMF with it. It also stays within i386/pc.
> 
> Re 2/2, note that "below_4g_mem_size" never exceeds 0xe0000000 in
> pc_init1(), so the subtraction is safe.
> 
> Thanks
> Laszlo
> 
> Laszlo Ersek (2):
>   i386/pc: propagate flash size from pc_system_flash_init() to
>     pc_init1()
>   i386/pc_piix: the pci-hole should end where the system flash starts
> 
>  include/hw/i386/pc.h |  6 ++++--
>  hw/i386/pc.c         |  5 +++--
>  hw/i386/pc_piix.c    |  5 +++--
>  hw/i386/pc_q35.c     |  3 ++-
>  hw/i386/pc_sysfw.c   | 10 +++++++---
>  5 files changed, 19 insertions(+), 10 deletions(-)
> 
> -- 
> 1.8.3.1
> 
>

Patch

diff --git a/memory.c b/memory.c
index 28f6449..5e60a6c 100644
--- a/memory.c
+++ b/memory.c
@@ -1427,6 +1427,10 @@  static void memory_region_add_subregion_common(MemoryRegion *mr,
     memory_region_ref(subregion);
     subregion->parent = mr;
     subregion->addr = offset;
+
+    fprintf(stderr, "adding subregion '%s' to region '%s' at offset %llx\n",
+            subregion->name, mr->name, (unsigned long long)offset);
+
     QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
         if (subregion->may_overlap || other->may_overlap) {
             continue;
@@ -1437,8 +1441,8 @@  static void memory_region_add_subregion_common(MemoryRegion *mr,
                          int128_make64(other->addr))) {
             continue;
         }
-#if 0
-        printf("warning: subregion collision %llx/%llx (%s) "
+#if 1
+        fprintf(stderr, "warning: subregion collision %llx/%llx (%s) "
                "vs %llx/%llx (%s)\n",
                (unsigned long long)offset,
                (unsigned long long)int128_get64(subregion->size),
@@ -1448,12 +1452,21 @@  static void memory_region_add_subregion_common(MemoryRegion *mr,
                other->name);
 #endif
     }
+
     QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
         if (subregion->priority >= other->priority) {
+            fprintf(stderr, "subregion '%s' (prio: %d) wins over sibling "
+                    "subregion '%s' (prio: %d)\n", subregion->name,
+                    subregion->priority, other->name, other->priority);
             QTAILQ_INSERT_BEFORE(other, subregion, subregions_link);
             goto done;
         }
+        fprintf(stderr, "subregion '%s' (prio: %d) loses to sibling "
+                "subregion '%s' (prio: %d)\n", subregion->name,
+                subregion->priority, other->name, other->priority);
     }
+    fprintf(stderr, "subregion '%s' (prio: %d) inserted at tail\n",
+            subregion->name, subregion->priority);
     QTAILQ_INSERT_TAIL(&mr->subregions, subregion, subregions_link);
 done:
     memory_region_update_pending |= mr->enabled && subregion->enabled;