diff mbox

[edk2,edk2,0/1] OvmfPkg: grab ACPI tables from QEMU

Message ID 528B5A2B.3060100@redhat.com
State New
Headers show

Commit Message

Laszlo Ersek Nov. 19, 2013, 12:31 p.m. UTC
On 11/19/13 09:54, Gerd Hoffmann wrote:
>   Hi,
>
>>   ACPI PciWindow32: Base=0xA0000000 End=0xFEEFFFFF Length=0x5EF00000
>
>>   begin32=c0000000 end32=fec00000 begin64=0 end64=0
>
> qemu & seabios pick a 32bit window size which allows to cover it with
> a single mtrr entry.  Size must be a power of two for that to work.
>
>   [root@fedora ~]# cat /proc/mtrr
>   reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
>
> So we have three cases for piix (as you've figured in the source code
> already).  Start at 0x80000000 (2G window), 0xc0000000 (1G window) and
> 0xe0000000 (512M window).
>
> btw: q35 has a fixed 1G window, max low ram addr is 0xb000000, the
> remaining address space (0xb0000000 -> 0xc0000000) is used for
> mmconfig.
>
>> I guess the range starting at 0xc0000000 is somehow "incompatible"
>> with the EFI memory map. (I can't actually explain this idea because,
>> again, this second range is a proper subset of the former, and its
>> size is still 1004MB.)
>
> Probably efi places the gfx memory bar somewhere between 0xa0000000
> and 0xc0000000.  Sets up efifb accordingly.  Then the linux kernel
> loads and figures "Oh, that bar is outside the 0xc0000000+ window" and
> goes remap it -> efifb writes go into nowhere.

Thank you for the explanation.

How do I fix it though? :) I could change the MMIO HOBs in PEI
(OvmfPkg/PlatformPei/Platform.c):

  //
  // Video memory + Legacy BIOS region
  //
  AddIoMemoryRangeHob (0x0A0000, BASE_1MB);

  //
  // address       purpose   size
  // ------------  --------  -------------------------
  // max(top, 2g)  PCI MMIO  0xFC000000 - max(top, 2g)
  // 0xFC000000    gap                           44 MB
  // 0xFEC00000    IO-APIC                        4 KB
  // 0xFEC01000    gap                         1020 KB
  // 0xFED00000    HPET                           1 KB
  // 0xFED00400    gap                         1023 KB
  // 0xFEE00000    LAPIC                          1 MB
  //
  AddIoMemoryRangeHob (TopOfMemory < BASE_2GB ? BASE_2GB : TopOfMemory, 0xFC000000);
  AddIoMemoryBaseSizeHob (0xFEC00000, SIZE_4KB);
  AddIoMemoryBaseSizeHob (0xFED00000, SIZE_1KB);
  AddIoMemoryBaseSizeHob (PcdGet32(PcdCpuLocalApicBaseAddress), SIZE_1MB);

to imitate the same three cases. The HOB with the lowest address
produced here would affect the BAR placement as well.

... Yes, I tested the attached patch, and it makes the display work in
both 2560MB guests.

However, I don't like the idea of hardwiring those boundaries here. (The
current values are also hardwired, but they are better: they are not the
consequence of "SeaBIOS has done it like this forever" -- inter-firmware
dependency, especially when they aren't each other's payloads, is bad
IMHO.) We'd need something dynamic here, like a memory map from qemu.

... Which puts us in the same boat with Wei :)

Thanks
Laszlo
From 9cf2af82399d7d7a9717ff6ac17860b66c705a64 Mon Sep 17 00:00:00 2001
From: Laszlo Ersek <lersek@redhat.com>
Date: Tue, 19 Nov 2013 13:07:41 +0100
Subject: [PATCH] OvmfPkg/PlatformPei: follow SeaBIOS tradition with 32-bit PCI
 hole placement

Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
---
 OvmfPkg/PlatformPei/Platform.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Michael S. Tsirkin Nov. 19, 2013, 8:54 p.m. UTC | #1
On Tue, Nov 19, 2013 at 01:31:39PM +0100, Laszlo Ersek wrote:
> On 11/19/13 09:54, Gerd Hoffmann wrote:
> >   Hi,
> >
> >>   ACPI PciWindow32: Base=0xA0000000 End=0xFEEFFFFF Length=0x5EF00000
> >
> >>   begin32=c0000000 end32=fec00000 begin64=0 end64=0
> >
> > qemu & seabios pick a 32bit window size which allows to cover it with
> > a single mtrr entry.  Size must be a power of two for that to work.
> >
> >   [root@fedora ~]# cat /proc/mtrr
> >   reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
> >
> > So we have three cases for piix (as you've figured in the source code
> > already).  Start at 0x80000000 (2G window), 0xc0000000 (1G window) and
> > 0xe0000000 (512M window).
> >
> > btw: q35 has a fixed 1G window, max low ram addr is 0xb000000, the
> > remaining address space (0xb0000000 -> 0xc0000000) is used for
> > mmconfig.

Is it really fixed? My understanding is it's programmed by firmware,
you can put mmconfig whereever you want.

> >> I guess the range starting at 0xc0000000 is somehow "incompatible"
> >> with the EFI memory map. (I can't actually explain this idea because,
> >> again, this second range is a proper subset of the former, and its
> >> size is still 1004MB.)
> >
> > Probably efi places the gfx memory bar somewhere between 0xa0000000
> > and 0xc0000000.  Sets up efifb accordingly.  Then the linux kernel
> > loads and figures "Oh, that bar is outside the 0xc0000000+ window" and
> > goes remap it -> efifb writes go into nowhere.
> 
> Thank you for the explanation.
> 
> How do I fix it though? :) I could change the MMIO HOBs in PEI
> (OvmfPkg/PlatformPei/Platform.c):
> 
>   //
>   // Video memory + Legacy BIOS region
>   //
>   AddIoMemoryRangeHob (0x0A0000, BASE_1MB);
> 
>   //
>   // address       purpose   size
>   // ------------  --------  -------------------------
>   // max(top, 2g)  PCI MMIO  0xFC000000 - max(top, 2g)
>   // 0xFC000000    gap                           44 MB
>   // 0xFEC00000    IO-APIC                        4 KB
>   // 0xFEC01000    gap                         1020 KB
>   // 0xFED00000    HPET                           1 KB
>   // 0xFED00400    gap                         1023 KB
>   // 0xFEE00000    LAPIC                          1 MB
>   //
>   AddIoMemoryRangeHob (TopOfMemory < BASE_2GB ? BASE_2GB : TopOfMemory, 0xFC000000);
>   AddIoMemoryBaseSizeHob (0xFEC00000, SIZE_4KB);
>   AddIoMemoryBaseSizeHob (0xFED00000, SIZE_1KB);
>   AddIoMemoryBaseSizeHob (PcdGet32(PcdCpuLocalApicBaseAddress), SIZE_1MB);
> 
> to imitate the same three cases. The HOB with the lowest address
> produced here would affect the BAR placement as well.
> 
> ... Yes, I tested the attached patch, and it makes the display work in
> both 2560MB guests.
> 
> However, I don't like the idea of hardwiring those boundaries here. (The
> current values are also hardwired, but they are better: they are not the
> consequence of "SeaBIOS has done it like this forever" -- inter-firmware
> dependency, especially when they aren't each other's payloads, is bad
> IMHO.) We'd need something dynamic here, like a memory map from qemu.
> 
> ... Which puts us in the same boat with Wei :)
> 
> Thanks
> Laszlo

> From 9cf2af82399d7d7a9717ff6ac17860b66c705a64 Mon Sep 17 00:00:00 2001
> From: Laszlo Ersek <lersek@redhat.com>
> Date: Tue, 19 Nov 2013 13:07:41 +0100
> Subject: [PATCH] OvmfPkg/PlatformPei: follow SeaBIOS tradition with 32-bit PCI
>  hole placement
> 
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
>  OvmfPkg/PlatformPei/Platform.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/OvmfPkg/PlatformPei/Platform.c b/OvmfPkg/PlatformPei/Platform.c
> index fb56e99..5bc0d74 100644
> --- a/OvmfPkg/PlatformPei/Platform.c
> +++ b/OvmfPkg/PlatformPei/Platform.c
> @@ -197,7 +197,7 @@ MemMapInitialization (
>    //
>    // address       purpose   size
>    // ------------  --------  -------------------------
> -  // max(top, 2g)  PCI MMIO  0xFC000000 - max(top, 2g)
> +  // 2G/3G/3.5G    PCI MMIO    0xFC000000 - 2G/3G/3.5G
>    // 0xFC000000    gap                           44 MB
>    // 0xFEC00000    IO-APIC                        4 KB
>    // 0xFEC01000    gap                         1020 KB
> @@ -205,7 +205,9 @@ MemMapInitialization (
>    // 0xFED00400    gap                         1023 KB
>    // 0xFEE00000    LAPIC                          1 MB
>    //
> -  AddIoMemoryRangeHob (TopOfMemory < BASE_2GB ? BASE_2GB : TopOfMemory, 0xFC000000);
> +  AddIoMemoryRangeHob (TopOfMemory <= 0x80000000 ? 0x80000000 :
> +                       TopOfMemory <= 0xC0000000 ? 0xC0000000 :
> +                       0xE0000000, 0xFC000000);
>    AddIoMemoryBaseSizeHob (0xFEC00000, SIZE_4KB);
>    AddIoMemoryBaseSizeHob (0xFED00000, SIZE_1KB);
>    AddIoMemoryBaseSizeHob (PcdGet32(PcdCpuLocalApicBaseAddress), SIZE_1MB);
> -- 
> 1.8.3.1
>
Michael S. Tsirkin Nov. 19, 2013, 9:13 p.m. UTC | #2
On Tue, Nov 19, 2013 at 01:31:39PM +0100, Laszlo Ersek wrote:
> On 11/19/13 09:54, Gerd Hoffmann wrote:
> >   Hi,
> >
> >>   ACPI PciWindow32: Base=0xA0000000 End=0xFEEFFFFF Length=0x5EF00000
> >
> >>   begin32=c0000000 end32=fec00000 begin64=0 end64=0
> >
> > qemu & seabios pick a 32bit window size which allows to cover it with
> > a single mtrr entry.  Size must be a power of two for that to work.
> >
> >   [root@fedora ~]# cat /proc/mtrr
> >   reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
> >
> > So we have three cases for piix (as you've figured in the source code
> > already).  Start at 0x80000000 (2G window), 0xc0000000 (1G window) and
> > 0xe0000000 (512M window).
> >
> > btw: q35 has a fixed 1G window, max low ram addr is 0xb000000, the
> > remaining address space (0xb0000000 -> 0xc0000000) is used for
> > mmconfig.
> >
> >> I guess the range starting at 0xc0000000 is somehow "incompatible"
> >> with the EFI memory map. (I can't actually explain this idea because,
> >> again, this second range is a proper subset of the former, and its
> >> size is still 1004MB.)
> >
> > Probably efi places the gfx memory bar somewhere between 0xa0000000
> > and 0xc0000000.  Sets up efifb accordingly.  Then the linux kernel
> > loads and figures "Oh, that bar is outside the 0xc0000000+ window" and
> > goes remap it -> efifb writes go into nowhere.
> 
> Thank you for the explanation.
> 
> How do I fix it though? :) I could change the MMIO HOBs in PEI
> (OvmfPkg/PlatformPei/Platform.c):
> 
>   //
>   // Video memory + Legacy BIOS region
>   //
>   AddIoMemoryRangeHob (0x0A0000, BASE_1MB);
> 
>   //
>   // address       purpose   size
>   // ------------  --------  -------------------------
>   // max(top, 2g)  PCI MMIO  0xFC000000 - max(top, 2g)
>   // 0xFC000000    gap                           44 MB
>   // 0xFEC00000    IO-APIC                        4 KB
>   // 0xFEC01000    gap                         1020 KB
>   // 0xFED00000    HPET                           1 KB
>   // 0xFED00400    gap                         1023 KB
>   // 0xFEE00000    LAPIC                          1 MB
>   //
>   AddIoMemoryRangeHob (TopOfMemory < BASE_2GB ? BASE_2GB : TopOfMemory, 0xFC000000);
>   AddIoMemoryBaseSizeHob (0xFEC00000, SIZE_4KB);
>   AddIoMemoryBaseSizeHob (0xFED00000, SIZE_1KB);
>   AddIoMemoryBaseSizeHob (PcdGet32(PcdCpuLocalApicBaseAddress), SIZE_1MB);
> 
> to imitate the same three cases. The HOB with the lowest address
> produced here would affect the BAR placement as well.
> 
> ... Yes, I tested the attached patch, and it makes the display work in
> both 2560MB guests.
> 
> However, I don't like the idea of hardwiring those boundaries here. (The
> current values are also hardwired, but they are better: they are not the
> consequence of "SeaBIOS has done it like this forever" -- inter-firmware
> dependency, especially when they aren't each other's payloads, is bad
> IMHO.) We'd need something dynamic here, like a memory map from qemu.
> 
> ... Which puts us in the same boat with Wei :)
> 
> Thanks
> Laszlo

So the bug is really qemu being too conservative with it's ACPI
tables, isn't it?
The MTRRs in BIOS don't affect much: AFAIK they are completely ignored
if it's exiting to the host, but I'm not sure about assigned
devices.
Alex, can you remind us all what happens there?


> From 9cf2af82399d7d7a9717ff6ac17860b66c705a64 Mon Sep 17 00:00:00 2001
> From: Laszlo Ersek <lersek@redhat.com>
> Date: Tue, 19 Nov 2013 13:07:41 +0100
> Subject: [PATCH] OvmfPkg/PlatformPei: follow SeaBIOS tradition with 32-bit PCI
>  hole placement
> 
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> ---
>  OvmfPkg/PlatformPei/Platform.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/OvmfPkg/PlatformPei/Platform.c b/OvmfPkg/PlatformPei/Platform.c
> index fb56e99..5bc0d74 100644
> --- a/OvmfPkg/PlatformPei/Platform.c
> +++ b/OvmfPkg/PlatformPei/Platform.c
> @@ -197,7 +197,7 @@ MemMapInitialization (
>    //
>    // address       purpose   size
>    // ------------  --------  -------------------------
> -  // max(top, 2g)  PCI MMIO  0xFC000000 - max(top, 2g)
> +  // 2G/3G/3.5G    PCI MMIO    0xFC000000 - 2G/3G/3.5G
>    // 0xFC000000    gap                           44 MB
>    // 0xFEC00000    IO-APIC                        4 KB
>    // 0xFEC01000    gap                         1020 KB
> @@ -205,7 +205,9 @@ MemMapInitialization (
>    // 0xFED00400    gap                         1023 KB
>    // 0xFEE00000    LAPIC                          1 MB
>    //
> -  AddIoMemoryRangeHob (TopOfMemory < BASE_2GB ? BASE_2GB : TopOfMemory, 0xFC000000);
> +  AddIoMemoryRangeHob (TopOfMemory <= 0x80000000 ? 0x80000000 :
> +                       TopOfMemory <= 0xC0000000 ? 0xC0000000 :
> +                       0xE0000000, 0xFC000000);
>    AddIoMemoryBaseSizeHob (0xFEC00000, SIZE_4KB);
>    AddIoMemoryBaseSizeHob (0xFED00000, SIZE_1KB);
>    AddIoMemoryBaseSizeHob (PcdGet32(PcdCpuLocalApicBaseAddress), SIZE_1MB);
> -- 
> 1.8.3.1
>
Laszlo Ersek Nov. 19, 2013, 9:39 p.m. UTC | #3
On 11/19/13 22:13, Michael S. Tsirkin wrote:
> On Tue, Nov 19, 2013 at 01:31:39PM +0100, Laszlo Ersek wrote:
>> On 11/19/13 09:54, Gerd Hoffmann wrote:
>>>   Hi,
>>>
>>>>   ACPI PciWindow32: Base=0xA0000000 End=0xFEEFFFFF Length=0x5EF00000
>>>
>>>>   begin32=c0000000 end32=fec00000 begin64=0 end64=0
>>>
>>> qemu & seabios pick a 32bit window size which allows to cover it with
>>> a single mtrr entry.  Size must be a power of two for that to work.
>>>
>>>   [root@fedora ~]# cat /proc/mtrr
>>>   reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
>>>
>>> So we have three cases for piix (as you've figured in the source code
>>> already).  Start at 0x80000000 (2G window), 0xc0000000 (1G window) and
>>> 0xe0000000 (512M window).
>>>
>>> btw: q35 has a fixed 1G window, max low ram addr is 0xb000000, the
>>> remaining address space (0xb0000000 -> 0xc0000000) is used for
>>> mmconfig.
>>>
>>>> I guess the range starting at 0xc0000000 is somehow "incompatible"
>>>> with the EFI memory map. (I can't actually explain this idea because,
>>>> again, this second range is a proper subset of the former, and its
>>>> size is still 1004MB.)
>>>
>>> Probably efi places the gfx memory bar somewhere between 0xa0000000
>>> and 0xc0000000.  Sets up efifb accordingly.  Then the linux kernel
>>> loads and figures "Oh, that bar is outside the 0xc0000000+ window" and
>>> goes remap it -> efifb writes go into nowhere.
>>
>> Thank you for the explanation.
>>
>> How do I fix it though? :) I could change the MMIO HOBs in PEI
>> (OvmfPkg/PlatformPei/Platform.c):
>>
>>   //
>>   // Video memory + Legacy BIOS region
>>   //
>>   AddIoMemoryRangeHob (0x0A0000, BASE_1MB);
>>
>>   //
>>   // address       purpose   size
>>   // ------------  --------  -------------------------
>>   // max(top, 2g)  PCI MMIO  0xFC000000 - max(top, 2g)
>>   // 0xFC000000    gap                           44 MB
>>   // 0xFEC00000    IO-APIC                        4 KB
>>   // 0xFEC01000    gap                         1020 KB
>>   // 0xFED00000    HPET                           1 KB
>>   // 0xFED00400    gap                         1023 KB
>>   // 0xFEE00000    LAPIC                          1 MB
>>   //
>>   AddIoMemoryRangeHob (TopOfMemory < BASE_2GB ? BASE_2GB : TopOfMemory, 0xFC000000);
>>   AddIoMemoryBaseSizeHob (0xFEC00000, SIZE_4KB);
>>   AddIoMemoryBaseSizeHob (0xFED00000, SIZE_1KB);
>>   AddIoMemoryBaseSizeHob (PcdGet32(PcdCpuLocalApicBaseAddress), SIZE_1MB);
>>
>> to imitate the same three cases. The HOB with the lowest address
>> produced here would affect the BAR placement as well.
>>
>> ... Yes, I tested the attached patch, and it makes the display work in
>> both 2560MB guests.
>>
>> However, I don't like the idea of hardwiring those boundaries here. (The
>> current values are also hardwired, but they are better: they are not the
>> consequence of "SeaBIOS has done it like this forever" -- inter-firmware
>> dependency, especially when they aren't each other's payloads, is bad
>> IMHO.) We'd need something dynamic here, like a memory map from qemu.
>>
>> ... Which puts us in the same boat with Wei :)
>>
>> Thanks
>> Laszlo
> 
> So the bug is really qemu being too conservative with it's ACPI
> tables, isn't it?

Here's what I think happens (and I'll shamelessly steal Gerd's explanation):

(1) OVMF configures its own memory map very early (in PEI). The ranges
configured are quite static. The only variable that goes into this
configuration is "top of RAM below 4G", which is capped at 3.5GB by qemu
(the variable comes from the CMOS):

  TopOfMemory <=  2G ---> PCI hole starts at 2G
  TopOfMemory >   2G ---> PCI hole starts at TopOfMemory
  TopOfMemory > 3.5G ---> not possible

The low bound of the 32-bit PCI hole determined thus further affects two
things (in a later phase (DXE)) of OVMF:

(2) PCI resource allocation -- in practice the video framebuffer gets
allocated right at the start of the hole,
(3) the value that OVMF's own ACPI table population code advertises in
the PCI bridge's _CRS.


              > (2) PCI resource alloc
             /
  (1) memmap
            \
             > (3) ACPI
                       \
                        > (4) guest OS access -- match (2)

So, until this patch, everything was in sync. With this patch, the low
bound exported over ACPI -- in (3) -- got replaced (whole-sale) by
qemu's ACPI table. Therefore (1) only determined (2), and a conflict
ensued between (2) and (3). (Which is where I refer back to Gerd's
explanation -- the guest OS tries to access the framebuffer according to
(3), the value in the ACPI table, but that's not where the framebuffer
has actually been configured in (2), which had keyed off (1).)

              > (2) PCI resource alloc
             /
  (1) memmap

   (X) qemu
           \
            > (3) ACPI
                      \
                       > (4) guest OS access -- should match (2),
                                                but doesn't


The *real* bug is that OVMF's memory map configuration (1) is not
*dynamically* synchronized with qemu's ACPI table population logic (X).
My patch that I attached earlier to this thread fixes this (by simply
duplicating the same "quantization" logic), but it's ugly:

              > (2) PCI resource alloc
             /
  (1) memmap
        ||
   (X) qemu
           \
            > (3) ACPI
                      \
                       > (4) guest OS access -- matches (2) by way of
                                                code duplication
                                                between (1) and (X)

What we really should implement is this:

                          > (2) PCI resource alloc
                         /
            > (1) memmap
           /
   (X) qemu
           \
            > (3) ACPI
                       \
                        > (4) guest OS access -- matches (2) by (X)
                                                 being the common root
                                                 of everything

That is, OVMF should compose its own memmap in (1) based on dynamic
information from qemu (importantly, the start address of the 32-bit PCI
hole that qemu is going to advertise later in the ACPI tables).

> The MTRRs in BIOS don't affect much: AFAIK they are completely ignored
> if it's exiting to the host, but I'm not sure about assigned
> devices.
> Alex, can you remind us all what happens there?

If you simply relax (drop) the stage-wise hole-setting, that will at
best create a situation where OVMF and qemu duplicate the logic. This
info needs to be passed down to OVMF (maybe it already is, somewhere in
fw_cfg!), in some form that's more approachable than an ACPI table. (We
absolutely don't want to parse ACPI (AML) in OVMF.)

Thanks!
Laszlo
Michael S. Tsirkin Nov. 19, 2013, 10:03 p.m. UTC | #4
On Tue, Nov 19, 2013 at 10:39:06PM +0100, Laszlo Ersek wrote:
> On 11/19/13 22:13, Michael S. Tsirkin wrote:
> > On Tue, Nov 19, 2013 at 01:31:39PM +0100, Laszlo Ersek wrote:
> >> On 11/19/13 09:54, Gerd Hoffmann wrote:
> >>>   Hi,
> >>>
> >>>>   ACPI PciWindow32: Base=0xA0000000 End=0xFEEFFFFF Length=0x5EF00000
> >>>
> >>>>   begin32=c0000000 end32=fec00000 begin64=0 end64=0
> >>>
> >>> qemu & seabios pick a 32bit window size which allows to cover it with
> >>> a single mtrr entry.  Size must be a power of two for that to work.
> >>>
> >>>   [root@fedora ~]# cat /proc/mtrr
> >>>   reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
> >>>
> >>> So we have three cases for piix (as you've figured in the source code
> >>> already).  Start at 0x80000000 (2G window), 0xc0000000 (1G window) and
> >>> 0xe0000000 (512M window).
> >>>
> >>> btw: q35 has a fixed 1G window, max low ram addr is 0xb000000, the
> >>> remaining address space (0xb0000000 -> 0xc0000000) is used for
> >>> mmconfig.
> >>>
> >>>> I guess the range starting at 0xc0000000 is somehow "incompatible"
> >>>> with the EFI memory map. (I can't actually explain this idea because,
> >>>> again, this second range is a proper subset of the former, and its
> >>>> size is still 1004MB.)
> >>>
> >>> Probably efi places the gfx memory bar somewhere between 0xa0000000
> >>> and 0xc0000000.  Sets up efifb accordingly.  Then the linux kernel
> >>> loads and figures "Oh, that bar is outside the 0xc0000000+ window" and
> >>> goes remap it -> efifb writes go into nowhere.
> >>
> >> Thank you for the explanation.
> >>
> >> How do I fix it though? :) I could change the MMIO HOBs in PEI
> >> (OvmfPkg/PlatformPei/Platform.c):
> >>
> >>   //
> >>   // Video memory + Legacy BIOS region
> >>   //
> >>   AddIoMemoryRangeHob (0x0A0000, BASE_1MB);
> >>
> >>   //
> >>   // address       purpose   size
> >>   // ------------  --------  -------------------------
> >>   // max(top, 2g)  PCI MMIO  0xFC000000 - max(top, 2g)
> >>   // 0xFC000000    gap                           44 MB
> >>   // 0xFEC00000    IO-APIC                        4 KB
> >>   // 0xFEC01000    gap                         1020 KB
> >>   // 0xFED00000    HPET                           1 KB
> >>   // 0xFED00400    gap                         1023 KB
> >>   // 0xFEE00000    LAPIC                          1 MB
> >>   //
> >>   AddIoMemoryRangeHob (TopOfMemory < BASE_2GB ? BASE_2GB : TopOfMemory, 0xFC000000);
> >>   AddIoMemoryBaseSizeHob (0xFEC00000, SIZE_4KB);
> >>   AddIoMemoryBaseSizeHob (0xFED00000, SIZE_1KB);
> >>   AddIoMemoryBaseSizeHob (PcdGet32(PcdCpuLocalApicBaseAddress), SIZE_1MB);
> >>
> >> to imitate the same three cases. The HOB with the lowest address
> >> produced here would affect the BAR placement as well.
> >>
> >> ... Yes, I tested the attached patch, and it makes the display work in
> >> both 2560MB guests.
> >>
> >> However, I don't like the idea of hardwiring those boundaries here. (The
> >> current values are also hardwired, but they are better: they are not the
> >> consequence of "SeaBIOS has done it like this forever" -- inter-firmware
> >> dependency, especially when they aren't each other's payloads, is bad
> >> IMHO.) We'd need something dynamic here, like a memory map from qemu.
> >>
> >> ... Which puts us in the same boat with Wei :)
> >>
> >> Thanks
> >> Laszlo
> > 
> > So the bug is really qemu being too conservative with it's ACPI
> > tables, isn't it?
> 
> Here's what I think happens (and I'll shamelessly steal Gerd's explanation):
> 
> (1) OVMF configures its own memory map very early (in PEI). The ranges
> configured are quite static. The only variable that goes into this
> configuration is "top of RAM below 4G", which is capped at 3.5GB by qemu
> (the variable comes from the CMOS):
> 
>   TopOfMemory <=  2G ---> PCI hole starts at 2G
>   TopOfMemory >   2G ---> PCI hole starts at TopOfMemory
>   TopOfMemory > 3.5G ---> not possible
> 
> The low bound of the 32-bit PCI hole determined thus further affects two
> things (in a later phase (DXE)) of OVMF:
> 
> (2) PCI resource allocation -- in practice the video framebuffer gets
> allocated right at the start of the hole,
> (3) the value that OVMF's own ACPI table population code advertises in
> the PCI bridge's _CRS.
> 
> 
>               > (2) PCI resource alloc
>              /
>   (1) memmap
>             \
>              > (3) ACPI
>                        \
>                         > (4) guest OS access -- match (2)
> 
> So, until this patch, everything was in sync. With this patch, the low
> bound exported over ACPI -- in (3) -- got replaced (whole-sale) by
> qemu's ACPI table. Therefore (1) only determined (2), and a conflict
> ensued between (2) and (3). (Which is where I refer back to Gerd's
> explanation -- the guest OS tries to access the framebuffer according to
> (3), the value in the ACPI table, but that's not where the framebuffer
> has actually been configured in (2), which had keyed off (1).)
> 
>               > (2) PCI resource alloc
>              /
>   (1) memmap
> 
>    (X) qemu
>            \
>             > (3) ACPI
>                       \
>                        > (4) guest OS access -- should match (2),
>                                                 but doesn't
> 
> 
> The *real* bug is that OVMF's memory map configuration (1) is not
> *dynamically* synchronized with qemu's ACPI table population logic (X).
> My patch that I attached earlier to this thread fixes this (by simply
> duplicating the same "quantization" logic), but it's ugly:
> 
>               > (2) PCI resource alloc
>              /
>   (1) memmap
>         ||
>    (X) qemu
>            \
>             > (3) ACPI
>                       \
>                        > (4) guest OS access -- matches (2) by way of
>                                                 code duplication
>                                                 between (1) and (X)
> 
> What we really should implement is this:
> 
>                           > (2) PCI resource alloc
>                          /
>             > (1) memmap
>            /
>    (X) qemu
>            \
>             > (3) ACPI
>                        \
>                         > (4) guest OS access -- matches (2) by (X)
>                                                  being the common root
>                                                  of everything
> 
> That is, OVMF should compose its own memmap in (1) based on dynamic
> information from qemu (importantly, the start address of the 32-bit PCI
> hole that qemu is going to advertise later in the ACPI tables).
> 
> > The MTRRs in BIOS don't affect much: AFAIK they are completely ignored
> > if it's exiting to the host, but I'm not sure about assigned
> > devices.
> > Alex, can you remind us all what happens there?
> 
> If you simply relax (drop) the stage-wise hole-setting, that will at
> best create a situation where OVMF and qemu duplicate the logic. This
> info needs to be passed down to OVMF (maybe it already is, somewhere in
> fw_cfg!), in some form that's more approachable than an ACPI table. (We
> absolutely don't want to parse ACPI (AML) in OVMF.)
> 
> Thanks!
> Laszlo


So the argument of whether we should make qemu control
PCI BAR mapping for guest is a completely separate issue
I think.
Even if we should, I think making ACPI depend on guest
obeying this logic is a mistake.

I think the whole confusion comes from the pci hole terminology.
There's no PCI hole, at least not on PIIX.
On PIIX what's not memory apic etc, is PCI.
So guest can use everything that isn't memory for PCI.

The only issue is _CRS in our ACPI is kind of seabios specific.
So let's just make it generic, make it really describe hardware:


Two possible fixes, both of them in QEMU:
1. declare all available regions (that resolve to PCI) in _CRS
2. detect where did guest put PCI devices and declare the result in _CRS

2 is exactly what we did for 64 bit BARs.

The only issue I worry about is MTRRs.
Maybe we are lucky and they do not matter for KVM.
Gerd Hoffmann Nov. 20, 2013, 8:18 a.m. UTC | #5
Hi,

> > > btw: q35 has a fixed 1G window, max low ram addr is 0xb000000, the
> > > remaining address space (0xb0000000 -> 0xc0000000) is used for
> > > mmconfig.
> 
> Is it really fixed? My understanding is it's programmed by firmware,
> you can put mmconfig whereever you want.

Not really fixed, but it is a most useful configuration.  You can place
large bars (up to 512M) @ 0xc0000000 then, and you have a single region
for your bars which makes management easy.  If you place the mmconfig
somewhere else it only gets more complicated.

cheers,
  Gerd
Gerd Hoffmann Nov. 20, 2013, 8:36 a.m. UTC | #6
Hi,

> > If you simply relax (drop) the stage-wise hole-setting, that will at
> > best create a situation where OVMF and qemu duplicate the logic. This
> > info needs to be passed down to OVMF (maybe it already is, somewhere in
> > fw_cfg!), in some form that's more approachable than an ACPI table. (We
> > absolutely don't want to parse ACPI (AML) in OVMF.)

No need to parse aml.  The acpi tables are not fixed, qemu tweaks them
according to the hardware programming done by the guest.

> Two possible fixes, both of them in QEMU:
> 1. declare all available regions (that resolve to PCI) in _CRS
> 2. detect where did guest put PCI devices and declare the result in _CRS
> 
> 2 is exactly what we did for 64 bit BARs.

Well, it's actually a mix, isn't it?  64bit window base address is
derived from guest pci bar mapping, 64bit window size is configurable to
leave room for hotplug (or maybe that is the future plan still ...).

I think we should simply declare end-of-lowram -> ioapic base as 32bit
window.  Looking at the pci bar locations programmed by the guest will
only make this smaller, leaving less space for hotplug, for no good
reason.

> The only issue I worry about is MTRRs.
> Maybe we are lucky and they do not matter for KVM.

I think they are largely cosmetic in virtual machines.  And if ovmf
wants use 0xa000000+ as pci io window it still has the option to create
two mtrr entries to cover the region (one 512m @ 0xa0000000 and one 1G @
0xc0000000).

cheers,
  Gerd
diff mbox

Patch

diff --git a/OvmfPkg/PlatformPei/Platform.c b/OvmfPkg/PlatformPei/Platform.c
index fb56e99..5bc0d74 100644
--- a/OvmfPkg/PlatformPei/Platform.c
+++ b/OvmfPkg/PlatformPei/Platform.c
@@ -197,7 +197,7 @@  MemMapInitialization (
   //
   // address       purpose   size
   // ------------  --------  -------------------------
-  // max(top, 2g)  PCI MMIO  0xFC000000 - max(top, 2g)
+  // 2G/3G/3.5G    PCI MMIO    0xFC000000 - 2G/3G/3.5G
   // 0xFC000000    gap                           44 MB
   // 0xFEC00000    IO-APIC                        4 KB
   // 0xFEC01000    gap                         1020 KB
@@ -205,7 +205,9 @@  MemMapInitialization (
   // 0xFED00400    gap                         1023 KB
   // 0xFEE00000    LAPIC                          1 MB
   //
-  AddIoMemoryRangeHob (TopOfMemory < BASE_2GB ? BASE_2GB : TopOfMemory, 0xFC000000);
+  AddIoMemoryRangeHob (TopOfMemory <= 0x80000000 ? 0x80000000 :
+                       TopOfMemory <= 0xC0000000 ? 0xC0000000 :
+                       0xE0000000, 0xFC000000);
   AddIoMemoryBaseSizeHob (0xFEC00000, SIZE_4KB);
   AddIoMemoryBaseSizeHob (0xFED00000, SIZE_1KB);
   AddIoMemoryBaseSizeHob (PcdGet32(PcdCpuLocalApicBaseAddress), SIZE_1MB);