Patchwork RFC: delay pci_update_mappings for 64-bit BARs

login
register
mail settings
Submitter Cam Macdonell
Date Dec. 13, 2010, 10:43 p.m.
Message ID <1292280224-5119-1-git-send-email-cam@cs.ualberta.ca>
Download mbox | patch
Permalink /patch/75432/
State New
Headers show

Comments

Cam Macdonell - Dec. 13, 2010, 10:43 p.m.
Do not call pci_update_mappings on the lower 32-bits of a 64-bit bar.  Wait for the upper 32 or else Qemu will try to map on just the lower 32 which is probably going to corrupt memory.

I was encountering crashes when mapping certain PCI region sizes.  The problem turns out that pci_update_mappings is being called without all 64-bits in the BAR.  For example when mapping to 0x180000000, once the lower 32-bits were written the remapping happened (mapping to 0x8000000) which would overwrite something.

I'm not certain if this is completely correct, I'm simply testing the lower 4-bits to only be MEM_TYPE_64 flag.  Upper 32-bit address parts can be values like 0xff which is tricky to test against.

Cam
---
 hw/pci.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)
Isaku Yamahata - Dec. 14, 2010, 3 a.m.
On Mon, Dec 13, 2010 at 03:43:44PM -0700, Cam Macdonell wrote:
> Do not call pci_update_mappings on the lower 32-bits of a 64-bit bar.  Wait for the upper 32 or else Qemu will try to map on just the lower 32 which is probably going to corrupt memory.
> 
> I was encountering crashes when mapping certain PCI region sizes.  The problem turns out that pci_update_mappings is being called without all 64-bits in the BAR.  For example when mapping to 0x180000000, once the lower 32-bits were written the remapping happened (mapping to 0x8000000) which would overwrite something.
> 
> I'm not certain if this is completely correct, I'm simply testing the lower 4-bits to only be MEM_TYPE_64 flag.  Upper 32-bit address parts can be values like 0xff which is tricky to test against.

You're assuming that guest OS always write lower 32bit and them upper 32bit.
Is the assumption correct?
I found Linux does, but I don't know about other OSes.
And I couldn't find any sentence about how to update (64bit) BAR in the specs.
(Please correct me if I missed it)

Some work around would be necessary regardless of 32bit-or-64bit.
because qemu doesn't emulate bus accurately at the moment.
How about the followings?
If BAR overlaps with RAM, don't map BAR.
If BAR overlaps with other BARs, record the overlapping and
when updating one of the BARs, update all the overlapping BARs.
Which BAR wins depends on the order of updating, it doesn't matter because
it's anomaly case.

This way, 32bit BAR case is also covered.

thanks,

> 
> Cam
> ---
>  hw/pci.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 438c0d1..3b81792 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1000,6 +1000,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
>  {
>      int i, was_irq_disabled = pci_irq_disabled(d);
>      uint32_t config_size = pci_config_size(d);
> +    int is_64 = 0;
> +
> +    is_64 = ((val & 0xf) == PCI_BASE_ADDRESS_MEM_TYPE_64);
>  
>      for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
>          uint8_t wmask = d->wmask[addr + i];
> @@ -1008,7 +1011,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
>          d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
>          d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
>      }
> -    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
> +    if ((ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) && (!is_64)) ||
>          ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
>          ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
>          range_covers_byte(addr, l, PCI_COMMAND))
> -- 
> 1.7.0.4
> 
>
Cam Macdonell - Dec. 14, 2010, 5:42 p.m.
On Mon, Dec 13, 2010 at 8:00 PM, Isaku Yamahata <yamahata@valinux.co.jp> wrote:
> On Mon, Dec 13, 2010 at 03:43:44PM -0700, Cam Macdonell wrote:
>> Do not call pci_update_mappings on the lower 32-bits of a 64-bit bar.  Wait for the upper 32 or else Qemu will try to map on just the lower 32 which is probably going to corrupt memory.
>>
>> I was encountering crashes when mapping certain PCI region sizes.  The problem turns out that pci_update_mappings is being called without all 64-bits in the BAR.  For example when mapping to 0x180000000, once the lower 32-bits were written the remapping happened (mapping to 0x8000000) which would overwrite something.
>>
>> I'm not certain if this is completely correct, I'm simply testing the lower 4-bits to only be MEM_TYPE_64 flag.  Upper 32-bit address parts can be values like 0xff which is tricky to test against.
>
> You're assuming that guest OS always write lower 32bit and them upper 32bit.
> Is the assumption correct?
> I found Linux does, but I don't know about other OSes.
> And I couldn't find any sentence about how to update (64bit) BAR in the specs.
> (Please correct me if I missed it)

I think you're right, we probably can't assume the order.

>
> Some work around would be necessary regardless of 32bit-or-64bit.
> because qemu doesn't emulate bus accurately at the moment.
> How about the followings?
> If BAR overlaps with RAM, don't map BAR.
> If BAR overlaps with other BARs, record the overlapping and
> when updating one of the BARs, update all the overlapping BARs.
> Which BAR wins depends on the order of updating, it doesn't matter because
> it's anomaly case.

But the addresses in the BARs may not overlap.  For example, Linux
allocates memory from top down, so I recently had the mapping of a BAR
to address 0xffc0000000

So BAR 0x18 sees 0xc0000004
Then BAR 0x1c sees 0xff

So if I understand what you mean by overlapping BARs, 0xc0000000 and
0xffc0000000 will not be detected as overlapping and so we can't
record it.  But, we can allow harmless mappings of the incomplete
lower-32 to proceed and then get remapped when the upper bits are
written.  (This is what happens currently, but fails when the lower-32
overwrite RAM).

Case of writing upper-then-lower (non-Linux case):
The addresses in the upper 32-bits are going to be limited to 16-bits
(at most 48-bit addresses currently) and so those shouldn't update
mappings because they will overlap with RAM.  When the lower-bits are
written, we have the full 64-bit address and can update mappings.

Case of writing lower-then-upper:
If the lower 32-bit BAR address doesn't conflict with RAM, map it.
When the upper bits are written, update to the correct mapping.

We would just have to ensure the first mapping is indeed harmless.

Would that work?
Cam

>
> This way, 32bit BAR case is also covered.
>
> thanks,
>
>>
>> Cam
>> ---
>>  hw/pci.c |    5 ++++-
>>  1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git a/hw/pci.c b/hw/pci.c
>> index 438c0d1..3b81792 100644
>> --- a/hw/pci.c
>> +++ b/hw/pci.c
>> @@ -1000,6 +1000,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
>>  {
>>      int i, was_irq_disabled = pci_irq_disabled(d);
>>      uint32_t config_size = pci_config_size(d);
>> +    int is_64 = 0;
>> +
>> +    is_64 = ((val & 0xf) == PCI_BASE_ADDRESS_MEM_TYPE_64);
>>
>>      for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
>>          uint8_t wmask = d->wmask[addr + i];
>> @@ -1008,7 +1011,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
>>          d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
>>          d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
>>      }
>> -    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
>> +    if ((ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) && (!is_64)) ||
>>          ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
>>          ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
>>          range_covers_byte(addr, l, PCI_COMMAND))
>> --
>> 1.7.0.4
>>
>>
>
> --
> yamahata
>
>
Isaku Yamahata - Dec. 16, 2010, 7:25 a.m.
On Tue, Dec 14, 2010 at 10:42:43AM -0700, Cam Macdonell wrote:
> On Mon, Dec 13, 2010 at 8:00 PM, Isaku Yamahata <yamahata@valinux.co.jp> wrote:
> > On Mon, Dec 13, 2010 at 03:43:44PM -0700, Cam Macdonell wrote:
> >> Do not call pci_update_mappings on the lower 32-bits of a 64-bit bar. ?Wait for the upper 32 or else Qemu will try to map on just the lower 32 which is probably going to corrupt memory.
> >>
> >> I was encountering crashes when mapping certain PCI region sizes. ?The problem turns out that pci_update_mappings is being called without all 64-bits in the BAR. ?For example when mapping to 0x180000000, once the lower 32-bits were written the remapping happened (mapping to 0x8000000) which would overwrite something.
> >>
> >> I'm not certain if this is completely correct, I'm simply testing the lower 4-bits to only be MEM_TYPE_64 flag. ?Upper 32-bit address parts can be values like 0xff which is tricky to test against.
> >
> > You're assuming that guest OS always write lower 32bit and them upper 32bit.
> > Is the assumption correct?
> > I found Linux does, but I don't know about other OSes.
> > And I couldn't find any sentence about how to update (64bit) BAR in the specs.
> > (Please correct me if I missed it)
> 
> I think you're right, we probably can't assume the order.
> 
> >
> > Some work around would be necessary regardless of 32bit-or-64bit.
> > because qemu doesn't emulate bus accurately at the moment.
> > How about the followings?
> > If BAR overlaps with RAM, don't map BAR.
> > If BAR overlaps with other BARs, record the overlapping and
> > when updating one of the BARs, update all the overlapping BARs.
> > Which BAR wins depends on the order of updating, it doesn't matter because
> > it's anomaly case.
> 
> But the addresses in the BARs may not overlap.  For example, Linux
> allocates memory from top down, so I recently had the mapping of a BAR
> to address 0xffc0000000
> 
> So BAR 0x18 sees 0xc0000004
> Then BAR 0x1c sees 0xff
> 
> So if I understand what you mean by overlapping BARs, 0xc0000000 and
> 0xffc0000000 will not be detected as overlapping and so we can't
> record it.  But, we can allow harmless mappings of the incomplete
> lower-32 to proceed and then get remapped when the upper bits are
> written.  (This is what happens currently, but fails when the lower-32
> overwrite RAM).
> 
> Case of writing upper-then-lower (non-Linux case):
> The addresses in the upper 32-bits are going to be limited to 16-bits
> (at most 48-bit addresses currently) and so those shouldn't update
> mappings because they will overlap with RAM.  When the lower-bits are
> written, we have the full 64-bit address and can update mappings.
> 
> Case of writing lower-then-upper:
> If the lower 32-bit BAR address doesn't conflict with RAM, map it.
> When the upper bits are written, update to the correct mapping.
> 
> We would just have to ensure the first mapping is indeed harmless.

Agreed. When only lower 32-bit is set, it may conflicts with other BAR.
So when upper 32-bit is written, the conflicted BAR also needs update.
It might be acceptable to update all pci devices instead of conflicted ones.
Although it would be quite slow, setting BAR is not performance critical.


> Would that work?

I hope so. Anyway it's just work around.


> Cam
> 
> >
> > This way, 32bit BAR case is also covered.
> >
> > thanks,
> >
> >>
> >> Cam
> >> ---
> >> ?hw/pci.c | ? ?5 ++++-
> >> ?1 files changed, 4 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/hw/pci.c b/hw/pci.c
> >> index 438c0d1..3b81792 100644
> >> --- a/hw/pci.c
> >> +++ b/hw/pci.c
> >> @@ -1000,6 +1000,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
> >> ?{
> >> ? ? ?int i, was_irq_disabled = pci_irq_disabled(d);
> >> ? ? ?uint32_t config_size = pci_config_size(d);
> >> + ? ?int is_64 = 0;
> >> +
> >> + ? ?is_64 = ((val & 0xf) == PCI_BASE_ADDRESS_MEM_TYPE_64);
> >>
> >> ? ? ?for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
> >> ? ? ? ? ?uint8_t wmask = d->wmask[addr + i];
> >> @@ -1008,7 +1011,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
> >> ? ? ? ? ?d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
> >> ? ? ? ? ?d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
> >> ? ? ?}
> >> - ? ?if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
> >> + ? ?if ((ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) && (!is_64)) ||
> >> ? ? ? ? ?ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
> >> ? ? ? ? ?ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
> >> ? ? ? ? ?range_covers_byte(addr, l, PCI_COMMAND))
> >> --
> >> 1.7.0.4
> >>
> >>
> >
> > --
> > yamahata
> >
> >
>

Patch

diff --git a/hw/pci.c b/hw/pci.c
index 438c0d1..3b81792 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1000,6 +1000,9 @@  void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 {
     int i, was_irq_disabled = pci_irq_disabled(d);
     uint32_t config_size = pci_config_size(d);
+    int is_64 = 0;
+
+    is_64 = ((val & 0xf) == PCI_BASE_ADDRESS_MEM_TYPE_64);
 
     for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
         uint8_t wmask = d->wmask[addr + i];
@@ -1008,7 +1011,7 @@  void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
         d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
         d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */
     }
-    if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) ||
+    if ((ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) && (!is_64)) ||
         ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) ||
         ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) ||
         range_covers_byte(addr, l, PCI_COMMAND))