diff mbox

[Seabios] Over 4GB address ranges for 64bit PCI BARs

Message ID 1288923218.28768.67.camel@nzhmlwks0057
State New
Headers show

Commit Message

Alexey Korolev Nov. 5, 2010, 2:13 a.m. UTC
Hi,

We have seen some issues with 64bit PCI address and large BAR region support in 
seabios. 

On attempting to register a 64bit BAR of 1GB size we had some strange failures
in qemu. After some debugging we find out that source of the issue was in 
seabios PCI enumeration code. 

The issue takes place because of u32 overflow in pciinit.c. 
….........
             if (ALIGN(*paddr, size) + size >= BUILD_PCIPREFMEM_END) {
                 dprintf(1,
                         "prefmem region of (bdf 0x%x bar %d) can't be mapped. "
                         "decrease BUILD_PCIMEM_SIZE and recompile. size %x\n",
                         bdf, region_num, BUILD_PCIPREFMEM_SIZE);
                 size = 0;
             }
…...........
       if (size > 0) {
            *paddr = ALIGN(*paddr, size);
            pci_set_io_region_addr(bdf, region_num, *paddr);
            *paddr += size;
       }

If size is greater than 0xFFFFFFFF – BUILD_PCIPREFMEM_START (256MB), 
this call ALIGN(*paddr, size) will always return 0. The protection test fails, size remains > 0, and guest memory mapping will be corrupted.

We have found that a very similar problem was discussed previously:

http://www.mail-archive.com/qemu-devel@nongnu.org/msg38090.html

The discussion also touches on the question of 64bit addressing for PCI BARs.
In the current implementation regardless of whether the PCI BAR type is 64bit or not
the addressable range will be limited to the first 4GB. We also want to have "true" 64bit
addressable range for PCI BARs.

So to solve these issues some changes were made. 
Tracking for overflows has been added as well as support for 64bit range. 
To support the full 64bit address range a pci_bios_64bit_addr variable has been added. 
The 64bit range can be used only if BAR has PCI_BASE_ADDRESS_MEM_TYPE_64
attribute and the given region doesn't fit in first 4GB.  The patch has been
tested on Linux and BSD guest OS's and it appears to work fine. 



Signed-off-by Alexey Korolev <Alexey.Korolev@endace.com>
Signed-off-by Stephen Donelly <Stephen.Donelly@endace.com>
---
 pciinit.c |   83 ++++++++++++++++++++++++++++++++------------------------------
 1 file changed, 43 insertions(+), 40 deletions(-)

Comments

Isaku Yamahata Nov. 5, 2010, 4:16 a.m. UTC | #1
Hi.
The current BAR allocation doesn't check overflow and some patches
are floating around which aren't merged yet.
There are several issues.

- overflow check
  This should be fixed.
  Some patches are proposed. None hasn't been merged yet.
  Your patch also addresses this issue.
  http://www.seabios.org/pipermail/seabios/2010-July/000794.html
  http://www.seabios.org/pipermail/seabios/2010-October/001089.html

- abstraction of pci bar allocation 
  This is very coding detail. This should be addressed.
  The current PCI BAR allocation uses primitives. So overflow check becomes
  ugly. So it would desirable to abstract it.
  Kevin suggested to use the existing PMM framework. But I concluded PMM
  doesn't suit for it, so introduced new api. But I haven't got any feedback
  from him yet.

- >4GB 64bit bar allocation
  Your patche tries to address this issue. But it breaks PCI-to-PCI
  bridge filtering support.

  If the BAR size is huge (or there are too many BARs), the bar can't
  be allocated under 4G. So several persons want seabios to allocate
  such BARs at >4GB area complaining that OS can't use BARs that seabios
  didn't assigned.

  Others think such BAR can be left unallocated.
  Seabios role is to setup minimal basic environment for bootloader
  to boot OS, 64bit bar allocation is beyond it's role.
  bootloader/rombios usually doesn't handle BARs that is allocated
  beyond 4GB, and Modern OSes can re-arrange PCI bar allocation itself.
  So 64bit bar allocation support wouldn't be needed.

  I'm not sure if there is enough demand to support 64bit BAR allocation
  and if Kevin will accept it or not. Consensus is needed.
  What OS are you using?

- bridge filtering
  64bit BAR allocation must be aware of PCI-to-PCI bridge filtering.
  I suppose one more pci bus walk (or two) would be necessary.
  i.e. make pci bar allocation two (or three) pass.  

thanks,

On Fri, Nov 05, 2010 at 03:13:38PM +1300, Alexey Korolev wrote:
> Hi,
> 
> We have seen some issues with 64bit PCI address and large BAR region support in 
> seabios. 
> 
> On attempting to register a 64bit BAR of 1GB size we had some strange failures
> in qemu. After some debugging we find out that source of the issue was in 
> seabios PCI enumeration code. 
> 
> The issue takes place because of u32 overflow in pciinit.c. 
> ….........
>              if (ALIGN(*paddr, size) + size >= BUILD_PCIPREFMEM_END) {
>                  dprintf(1,
>                          "prefmem region of (bdf 0x%x bar %d) can't be mapped. "
>                          "decrease BUILD_PCIMEM_SIZE and recompile. size %x\n",
>                          bdf, region_num, BUILD_PCIPREFMEM_SIZE);
>                  size = 0;
>              }
> …...........
>        if (size > 0) {
>             *paddr = ALIGN(*paddr, size);
>             pci_set_io_region_addr(bdf, region_num, *paddr);
>             *paddr += size;
>        }
> 
> If size is greater than 0xFFFFFFFF ??? BUILD_PCIPREFMEM_START (256MB), 
> this call ALIGN(*paddr, size) will always return 0. The protection test fails, size remains > 0, and guest memory mapping will be corrupted.
> 
> We have found that a very similar problem was discussed previously:
> 
> http://www.mail-archive.com/qemu-devel@nongnu.org/msg38090.html
> 
> The discussion also touches on the question of 64bit addressing for PCI BARs.
> In the current implementation regardless of whether the PCI BAR type is 64bit or not
> the addressable range will be limited to the first 4GB. We also want to have "true" 64bit
> addressable range for PCI BARs.
> 
> So to solve these issues some changes were made. 
> Tracking for overflows has been added as well as support for 64bit range. 
> To support the full 64bit address range a pci_bios_64bit_addr variable has been added. 
> The 64bit range can be used only if BAR has PCI_BASE_ADDRESS_MEM_TYPE_64
> attribute and the given region doesn't fit in first 4GB.  The patch has been
> tested on Linux and BSD guest OS's and it appears to work fine. 
> 
> 
> 
> Signed-off-by Alexey Korolev <Alexey.Korolev@endace.com>
> Signed-off-by Stephen Donelly <Stephen.Donelly@endace.com>
> ---
>  pciinit.c |   83 ++++++++++++++++++++++++++++++++------------------------------
>  1 file changed, 43 insertions(+), 40 deletions(-)
> 
> diff --git a/src/pciinit.c b/src/pciinit.c
> index 0346423..86c8137 100644
> --- a/src/pciinit.c
> +++ b/src/pciinit.c
> @@ -20,6 +20,7 @@ static void pci_bios_init_device_in_bus(int bus);
>  static u32 pci_bios_io_addr;
>  static u32 pci_bios_mem_addr;
>  static u32 pci_bios_prefmem_addr;
> +static u64 pci_bios_64bit_addr;
>  /* host irqs corresponding to PCI irqs A-D */
>  const u8 pci_irqs[4] = {
>      10, 10, 11, 11
> @@ -56,9 +57,11 @@ static int pci_bios_allocate_region(u16 bdf, int region_num)
>  {
>      u32 *paddr;
>      u32 ofs = pci_bar(bdf, region_num);
> -
>      u32 old = pci_config_readl(bdf, ofs);
>      u32 mask;
> +    u32 limit_addr;
> +    int is_64bit;
> +
>      if (region_num == PCI_ROM_SLOT) {
>          mask = PCI_ROM_ADDRESS_MASK;
>          pci_config_writel(bdf, ofs, mask);
> @@ -73,50 +76,49 @@ static int pci_bios_allocate_region(u16 bdf, int region_num)
>      pci_config_writel(bdf, ofs, old);
>  
>      u32 size = (~(val & mask)) + 1;
> -    if (val != 0) {
> -        if (val & PCI_BASE_ADDRESS_SPACE_IO) {
> -            paddr = &pci_bios_io_addr;
> -            if (ALIGN(*paddr, size) + size >= 64 * 1024) {
> -                dprintf(1,
> -                        "io region of (bdf 0x%x bar %d) can't be mapped.\n",
> -                        bdf, region_num);
> -                size = 0;
> -            }
> -        } else if ((val & PCI_BASE_ADDRESS_MEM_PREFETCH) &&
> -                 /* keep behaviour on bus = 0 */
> -                 pci_bdf_to_bus(bdf) != 0 &&
> -                 /* If pci_bios_prefmem_addr == 0, keep old behaviour */
> -                 pci_bios_prefmem_addr != 0) {
> -            paddr = &pci_bios_prefmem_addr;
> -            if (ALIGN(*paddr, size) + size >= BUILD_PCIPREFMEM_END) {
> -                dprintf(1,
> -                        "prefmem region of (bdf 0x%x bar %d) can't be mapped. "
> -                        "decrease BUILD_PCIMEM_SIZE and recompile. size %x\n",
> -                        bdf, region_num, BUILD_PCIPREFMEM_SIZE);
> -                size = 0;
> -            }
> -        } else {
> -            paddr = &pci_bios_mem_addr;
> -            if (ALIGN(*paddr, size) + size >= BUILD_PCIMEM_END) {
> +    if (!val || !size)
> +        return 0;
> +
> +    is_64bit = !(val & PCI_BASE_ADDRESS_SPACE_IO) &&
> +        (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64;
> +
> +    if (val & PCI_BASE_ADDRESS_SPACE_IO) {
> +        paddr = &pci_bios_io_addr;
> +        limit_addr = 64 * 1024;
> +    } else if ((val & PCI_BASE_ADDRESS_MEM_PREFETCH) &&
> +             /* keep behaviour on bus = 0 */
> +             pci_bdf_to_bus(bdf) != 0 &&
> +             /* If pci_bios_prefmem_addr == 0, keep old behaviour */
> +             pci_bios_prefmem_addr != 0) {
> +        paddr = &pci_bios_prefmem_addr;
> +        limit_addr = BUILD_PCIPREFMEM_END;
> +    } else {
> +        paddr = &pci_bios_mem_addr;
> +        limit_addr = BUILD_PCIMEM_END;
> +    }
> +
> +    /* The region is out of allowed 32bit mapping range */
> +    if ((ALIGN(*paddr, size) + size >= limit_addr) ||
> +        (ALIGN(*paddr, size) + size <= *paddr)) {
> +            if (!is_64bit) {
>                  dprintf(1,
>                          "mem region of (bdf 0x%x bar %d) can't be mapped. "
> -                        "increase BUILD_PCIMEM_SIZE and recompile. size %x\n",
> -                        bdf, region_num, BUILD_PCIMEM_SIZE);
> -                size = 0;
> +                        "Region size %x\n", bdf, region_num, size);
> +                return 0;
>              }
> -        }
> -        if (size > 0) {
> -            *paddr = ALIGN(*paddr, size);
> -            pci_set_io_region_addr(bdf, region_num, *paddr);
> -            *paddr += size;
> -        }
> +            pci_bios_64bit_addr = ALIGN(pci_bios_64bit_addr, size);
> +            pci_set_io_region_addr(bdf, region_num, (u32)pci_bios_64bit_addr);
> +            pci_set_io_region_addr(bdf, region_num + 1,
> +                                   (u32)(pci_bios_64bit_addr >> 32));
> +            pci_bios_64bit_addr += size;
> +            return is_64bit;
>      }
>  
> -    int is_64bit = !(val & PCI_BASE_ADDRESS_SPACE_IO) &&
> -        (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64;
> -    if (is_64bit && size > 0) {
> -        pci_config_writel(bdf, ofs + 4, 0);
> -    }
> +    *paddr = ALIGN(*paddr, size);
> +    pci_set_io_region_addr(bdf, region_num, *paddr);
> +    if (is_64bit)
> +        pci_set_io_region_addr(bdf, region_num + 1, 0);
> +    *paddr += size;
>      return is_64bit;
>  }
>  
> @@ -409,6 +411,7 @@ pci_setup(void)
>      pci_bios_io_addr = 0xc000;
>      pci_bios_mem_addr = BUILD_PCIMEM_START;
>      pci_bios_prefmem_addr = BUILD_PCIPREFMEM_START;
> +    pci_bios_64bit_addr = ALIGN(RamSizeOver4G + 0x100000000ull, 0x100000000ull);
>  
>      pci_bios_init_bus();
>  
> 
>
Alexey Korolev Nov. 8, 2010, 3:35 a.m. UTC | #2
Hi,
> Hi.
> The current BAR allocation doesn't check overflow and some patches
> are floating around which aren't merged yet.
> There are several issues.
> 
> - overflow check
>   This should be fixed.
>   Some patches are proposed. None hasn't been merged yet.
>   Your patch also addresses this issue.
>   http://www.seabios.org/pipermail/seabios/2010-July/000794.html
>   http://www.seabios.org/pipermail/seabios/2010-October/001089.html

Right. It would be great if the fix get included in seabios/qemu. 

Update: I've seen today's message from Kevin. He is going to merge your
patch. So it will be good news for us. 


> - >4GB 64bit bar allocation
>   Your patche tries to address this issue. But it breaks PCI-to-PCI
>   bridge filtering support.
Hmm, it is quite possible, as we don't know a lot about seabios PCI-to-PCI bridge filtering support.
Just out of curiosity: what is the issue?

>   If the BAR size is huge (or there are too many BARs), the bar can't
>   be allocated under 4G. So several persons want seabios to allocate
>   such BARs at >4GB area complaining that OS can't use BARs that seabios
>   didn't assigned.
> 
>   Others think such BAR can be left unallocated.
>   Seabios role is to setup minimal basic environment for bootloader
>   to boot OS, 64bit bar allocation is beyond it's role.
>   bootloader/rombios usually doesn't handle BARs that is allocated
>   beyond 4GB, and Modern OSes can re-arrange PCI bar allocation itself.
>   So 64bit bar allocation support wouldn't be needed.
> 
>   I'm not sure if there is enough demand to support 64bit BAR allocation
>   and if Kevin will accept it or not. Consensus is needed.
>   What OS are you using?
> 
For us >4GB allocation is welcome but not critical, because we mainly
use Linux versions 2.6.18 and newer. We've tested the seabios without
assignment of the regions which do not fit in first 32bit and it appears
to work fine. So for us 64bit bar allocation support wouldn't be needed.

It is possible that people will use an ancient version of Linux, but the
probability of this event is very low.


Thanks,
Alexey
Isaku Yamahata Nov. 8, 2010, 4:11 a.m. UTC | #3
On Mon, Nov 08, 2010 at 04:35:38PM +1300, Alexey Korolev wrote:
> > - >4GB 64bit bar allocation
> >   Your patche tries to address this issue. But it breaks PCI-to-PCI
> >   bridge filtering support.
> Hmm, it is quite possible, as we don't know a lot about seabios PCI-to-PCI bridge filtering support.
> Just out of curiosity: what is the issue?

It's pci_bios_init_device_bridge() in pciinit.c.
The function touches pci_bios_io_addr, pci_bios_mem_addr, and
pci_bios_prefmem_addr.
So we need to modify, not only pci_bios_allocate_region(),
but also pci_bios_init_device_bridge().

The function programs the P2P bridge to forward IO/memory access
on primary pci bus to secondary pci bus.
It needs to be aware of 64bit BAR allocation.


> >   If the BAR size is huge (or there are too many BARs), the bar can't
> >   be allocated under 4G. So several persons want seabios to allocate
> >   such BARs at >4GB area complaining that OS can't use BARs that seabios
> >   didn't assigned.
> > 
> >   Others think such BAR can be left unallocated.
> >   Seabios role is to setup minimal basic environment for bootloader
> >   to boot OS, 64bit bar allocation is beyond it's role.
> >   bootloader/rombios usually doesn't handle BARs that is allocated
> >   beyond 4GB, and Modern OSes can re-arrange PCI bar allocation itself.
> >   So 64bit bar allocation support wouldn't be needed.
> > 
> >   I'm not sure if there is enough demand to support 64bit BAR allocation
> >   and if Kevin will accept it or not. Consensus is needed.
> >   What OS are you using?
> > 
> For us >4GB allocation is welcome but not critical, because we mainly
> use Linux versions 2.6.18 and newer. We've tested the seabios without
> assignment of the regions which do not fit in first 32bit and it appears
> to work fine. So for us 64bit bar allocation support wouldn't be needed.
> 
> It is possible that people will use an ancient version of Linux, but the
> probability of this event is very low.

My position is same to yours. Welcome, but not critical.
So the issue is, who will finish it.
diff mbox

Patch

diff --git a/src/pciinit.c b/src/pciinit.c
index 0346423..86c8137 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -20,6 +20,7 @@  static void pci_bios_init_device_in_bus(int bus);
 static u32 pci_bios_io_addr;
 static u32 pci_bios_mem_addr;
 static u32 pci_bios_prefmem_addr;
+static u64 pci_bios_64bit_addr;
 /* host irqs corresponding to PCI irqs A-D */
 const u8 pci_irqs[4] = {
     10, 10, 11, 11
@@ -56,9 +57,11 @@  static int pci_bios_allocate_region(u16 bdf, int region_num)
 {
     u32 *paddr;
     u32 ofs = pci_bar(bdf, region_num);
-
     u32 old = pci_config_readl(bdf, ofs);
     u32 mask;
+    u32 limit_addr;
+    int is_64bit;
+
     if (region_num == PCI_ROM_SLOT) {
         mask = PCI_ROM_ADDRESS_MASK;
         pci_config_writel(bdf, ofs, mask);
@@ -73,50 +76,49 @@  static int pci_bios_allocate_region(u16 bdf, int region_num)
     pci_config_writel(bdf, ofs, old);
 
     u32 size = (~(val & mask)) + 1;
-    if (val != 0) {
-        if (val & PCI_BASE_ADDRESS_SPACE_IO) {
-            paddr = &pci_bios_io_addr;
-            if (ALIGN(*paddr, size) + size >= 64 * 1024) {
-                dprintf(1,
-                        "io region of (bdf 0x%x bar %d) can't be mapped.\n",
-                        bdf, region_num);
-                size = 0;
-            }
-        } else if ((val & PCI_BASE_ADDRESS_MEM_PREFETCH) &&
-                 /* keep behaviour on bus = 0 */
-                 pci_bdf_to_bus(bdf) != 0 &&
-                 /* If pci_bios_prefmem_addr == 0, keep old behaviour */
-                 pci_bios_prefmem_addr != 0) {
-            paddr = &pci_bios_prefmem_addr;
-            if (ALIGN(*paddr, size) + size >= BUILD_PCIPREFMEM_END) {
-                dprintf(1,
-                        "prefmem region of (bdf 0x%x bar %d) can't be mapped. "
-                        "decrease BUILD_PCIMEM_SIZE and recompile. size %x\n",
-                        bdf, region_num, BUILD_PCIPREFMEM_SIZE);
-                size = 0;
-            }
-        } else {
-            paddr = &pci_bios_mem_addr;
-            if (ALIGN(*paddr, size) + size >= BUILD_PCIMEM_END) {
+    if (!val || !size)
+        return 0;
+
+    is_64bit = !(val & PCI_BASE_ADDRESS_SPACE_IO) &&
+        (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64;
+
+    if (val & PCI_BASE_ADDRESS_SPACE_IO) {
+        paddr = &pci_bios_io_addr;
+        limit_addr = 64 * 1024;
+    } else if ((val & PCI_BASE_ADDRESS_MEM_PREFETCH) &&
+             /* keep behaviour on bus = 0 */
+             pci_bdf_to_bus(bdf) != 0 &&
+             /* If pci_bios_prefmem_addr == 0, keep old behaviour */
+             pci_bios_prefmem_addr != 0) {
+        paddr = &pci_bios_prefmem_addr;
+        limit_addr = BUILD_PCIPREFMEM_END;
+    } else {
+        paddr = &pci_bios_mem_addr;
+        limit_addr = BUILD_PCIMEM_END;
+    }
+
+    /* The region is out of allowed 32bit mapping range */
+    if ((ALIGN(*paddr, size) + size >= limit_addr) ||
+        (ALIGN(*paddr, size) + size <= *paddr)) {
+            if (!is_64bit) {
                 dprintf(1,
                         "mem region of (bdf 0x%x bar %d) can't be mapped. "
-                        "increase BUILD_PCIMEM_SIZE and recompile. size %x\n",
-                        bdf, region_num, BUILD_PCIMEM_SIZE);
-                size = 0;
+                        "Region size %x\n", bdf, region_num, size);
+                return 0;
             }
-        }
-        if (size > 0) {
-            *paddr = ALIGN(*paddr, size);
-            pci_set_io_region_addr(bdf, region_num, *paddr);
-            *paddr += size;
-        }
+            pci_bios_64bit_addr = ALIGN(pci_bios_64bit_addr, size);
+            pci_set_io_region_addr(bdf, region_num, (u32)pci_bios_64bit_addr);
+            pci_set_io_region_addr(bdf, region_num + 1,
+                                   (u32)(pci_bios_64bit_addr >> 32));
+            pci_bios_64bit_addr += size;
+            return is_64bit;
     }
 
-    int is_64bit = !(val & PCI_BASE_ADDRESS_SPACE_IO) &&
-        (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64;
-    if (is_64bit && size > 0) {
-        pci_config_writel(bdf, ofs + 4, 0);
-    }
+    *paddr = ALIGN(*paddr, size);
+    pci_set_io_region_addr(bdf, region_num, *paddr);
+    if (is_64bit)
+        pci_set_io_region_addr(bdf, region_num + 1, 0);
+    *paddr += size;
     return is_64bit;
 }
 
@@ -409,6 +411,7 @@  pci_setup(void)
     pci_bios_io_addr = 0xc000;
     pci_bios_mem_addr = BUILD_PCIMEM_START;
     pci_bios_prefmem_addr = BUILD_PCIPREFMEM_START;
+    pci_bios_64bit_addr = ALIGN(RamSizeOver4G + 0x100000000ull, 0x100000000ull);
 
     pci_bios_init_bus();