diff mbox

[v2] pci: Introduce helper to retrieve a PCI device's DMA address space

Message ID 5204C0D8.5010205@ozlabs.ru
State New
Headers show

Commit Message

Alexey Kardashevskiy Aug. 9, 2013, 10:13 a.m. UTC
On 08/09/2013 07:53 PM, Paolo Bonzini wrote:
> Il 09/08/2013 11:48, Alexey Kardashevskiy ha scritto:
>> On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
>>> Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
>>>> A PCI device's DMA address space (possibly an IOMMU) is returned by a
>>>> method on the PCIBus.  At the moment that only has one caller, so the
>>>> method is simply open coded.  We'll need another caller for VFIO, so
>>>> this patch introduces a helper/wrapper function.
>>>>
>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>>> [aik: added inheritance from parent if iommu is not set for the current bus]
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>
>>>> ---
>>>> Changes:
>>>> v2:
>>>> * added inheritance, needed for a pci-bridge on spapr-ppc64
>>>> * pci_iommu_as renamed to pci_device_iommu_address_space
>>>> ---
>>>>  hw/pci/pci.c         | 22 ++++++++++++++++------
>>>>  include/hw/pci/pci.h |  1 +
>>>>  2 files changed, 17 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>> index 4c004f5..0072b54 100644
>>>> --- a/hw/pci/pci.c
>>>> +++ b/hw/pci/pci.c
>>>> @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>>>>      }
>>>>  
>>>>      pci_dev->bus = bus;
>>>> -    if (bus->iommu_fn) {
>>>> -        dma_as = bus->iommu_fn(bus, bus->iommu_opaque, devfn);
>>>> -    } else {
>>>> -        /* FIXME: inherit memory region from bus creator */
>>>> -        dma_as = &address_space_memory;
>>>> -    }
>>>> +    dma_as = pci_device_iommu_address_space(pci_dev);
>>>>  
>>>>      memory_region_init_alias(&pci_dev->bus_master_enable_region,
>>>>                               OBJECT(pci_dev), "bus master",
>>>> @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
>>>>      k->props = pci_props;
>>>>  }
>>>>  
>>>> +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>>>> +{
>>>> +    PCIBus *bus = PCI_BUS(dev->bus);
>>>> +
>>>> +    if (bus->iommu_fn) {
>>>> +        return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
>>>> +    }
>>>> +
>>>> +    if (bus->parent_dev) {
>>>> +        return pci_device_iommu_address_space(bus->parent_dev);
>>>> +    }
>>>
>>> No, this would fail if bus->parent_dev is not NULL but not a PCI device
>>> either.
>>
>> parent_dev is of the PCIDevice* type, how can it be not a PCI device? :-/
> 
> Doh, I misread the code, I thought it was the "parent" field in
> BusState.  Why do we have parent_dev at all?

The code is too old? Don't know.


>>> You can use object_dynamic_cast to convert the parent_dev to
>>> PCIDevice, and if the cast succeeds you call the new function.
>>>
>>> Perhaps you could make the new function take a PCIBus instead.
>>> Accessing the PCIDevice's IOMMU address space (as opposed to the
>>> bus-master address space) doesn't make much sense, VFIO is really a
>>> special case.  Putting the new API on the bus side instead looks better.
>>>
>>> (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA for
>>> devices sitting on the secondary bus?)
>>
>> It happens naturally I guess when linux enables devices.
> 
> Yes, but then using the IOMMU address space would be wrong; you would
> have to use the bus-master address space as a base for the child's
> bus-master address space.


Like this? Works too.

Comments

Paolo Bonzini Aug. 9, 2013, 10:19 a.m. UTC | #1
Il 09/08/2013 12:13, Alexey Kardashevskiy ha scritto:
> On 08/09/2013 07:53 PM, Paolo Bonzini wrote:
>> Il 09/08/2013 11:48, Alexey Kardashevskiy ha scritto:
>>> On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
>>>> Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
>>>>> A PCI device's DMA address space (possibly an IOMMU) is returned by a
>>>>> method on the PCIBus.  At the moment that only has one caller, so the
>>>>> method is simply open coded.  We'll need another caller for VFIO, so
>>>>> this patch introduces a helper/wrapper function.
>>>>>
>>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>>>> [aik: added inheritance from parent if iommu is not set for the current bus]
>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>
>>>>> ---
>>>>> Changes:
>>>>> v2:
>>>>> * added inheritance, needed for a pci-bridge on spapr-ppc64
>>>>> * pci_iommu_as renamed to pci_device_iommu_address_space
>>>>> ---
>>>>>  hw/pci/pci.c         | 22 ++++++++++++++++------
>>>>>  include/hw/pci/pci.h |  1 +
>>>>>  2 files changed, 17 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>> index 4c004f5..0072b54 100644
>>>>> --- a/hw/pci/pci.c
>>>>> +++ b/hw/pci/pci.c
>>>>> @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>>>>>      }
>>>>>  
>>>>>      pci_dev->bus = bus;
>>>>> -    if (bus->iommu_fn) {
>>>>> -        dma_as = bus->iommu_fn(bus, bus->iommu_opaque, devfn);
>>>>> -    } else {
>>>>> -        /* FIXME: inherit memory region from bus creator */
>>>>> -        dma_as = &address_space_memory;
>>>>> -    }
>>>>> +    dma_as = pci_device_iommu_address_space(pci_dev);
>>>>>  
>>>>>      memory_region_init_alias(&pci_dev->bus_master_enable_region,
>>>>>                               OBJECT(pci_dev), "bus master",
>>>>> @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
>>>>>      k->props = pci_props;
>>>>>  }
>>>>>  
>>>>> +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>>>>> +{
>>>>> +    PCIBus *bus = PCI_BUS(dev->bus);
>>>>> +
>>>>> +    if (bus->iommu_fn) {
>>>>> +        return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
>>>>> +    }
>>>>> +
>>>>> +    if (bus->parent_dev) {
>>>>> +        return pci_device_iommu_address_space(bus->parent_dev);
>>>>> +    }
>>>>
>>>> No, this would fail if bus->parent_dev is not NULL but not a PCI device
>>>> either.
>>>
>>> parent_dev is of the PCIDevice* type, how can it be not a PCI device? :-/
>>
>> Doh, I misread the code, I thought it was the "parent" field in
>> BusState.  Why do we have parent_dev at all?
> 
> The code is too old? Don't know.
> 
> 
>>>> You can use object_dynamic_cast to convert the parent_dev to
>>>> PCIDevice, and if the cast succeeds you call the new function.
>>>>
>>>> Perhaps you could make the new function take a PCIBus instead.
>>>> Accessing the PCIDevice's IOMMU address space (as opposed to the
>>>> bus-master address space) doesn't make much sense, VFIO is really a
>>>> special case.  Putting the new API on the bus side instead looks better.
>>>>
>>>> (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA for
>>>> devices sitting on the secondary bus?)
>>>
>>> It happens naturally I guess when linux enables devices.
>>
>> Yes, but then using the IOMMU address space would be wrong; you would
>> have to use the bus-master address space as a base for the child's
>> bus-master address space.
> 
> 
> Like this? Works too.
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 8bdcedc..a4c70e6 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2247,23 +2247,23 @@ AddressSpace
> *pci_device_iommu_address_space(PCIDevice *dev)
>  {
>      PCIBus *bus = PCI_BUS(dev->bus);
> 
>      if (bus->iommu_fn) {
>          return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
>      }
> 
>      if (bus->parent_dev) {
>          return pci_device_iommu_address_space(bus->parent_dev);
>      }
> 
> -    return &address_space_memory;
> +    return &dev->bus_master_as;
>  }

I was thinking more like this:

     if (bus->parent_dev) {
-        return pci_device_iommu_address_space(bus->parent_dev);
+        /* Take parent device's bus-master enable bit into account.  */
+        return pci_get_address_space(bus->parent_dev);
     }

+    /* Not a secondary bus and no IOMMU.  Use system memory.  */
     return &address_space_memory;

Paolo

> 
> 
>> Also, we would have to fix the x86 firmware.
>>
>> Paolo
>>
> 
>
Alexey Kardashevskiy Aug. 9, 2013, 10:58 a.m. UTC | #2
On 08/09/2013 08:19 PM, Paolo Bonzini wrote:
> Il 09/08/2013 12:13, Alexey Kardashevskiy ha scritto:
>> On 08/09/2013 07:53 PM, Paolo Bonzini wrote:
>>> Il 09/08/2013 11:48, Alexey Kardashevskiy ha scritto:
>>>> On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
>>>>> Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
>>>>>> A PCI device's DMA address space (possibly an IOMMU) is returned by a
>>>>>> method on the PCIBus.  At the moment that only has one caller, so the
>>>>>> method is simply open coded.  We'll need another caller for VFIO, so
>>>>>> this patch introduces a helper/wrapper function.
>>>>>>
>>>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>>>>> [aik: added inheritance from parent if iommu is not set for the current bus]
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>
>>>>>> ---
>>>>>> Changes:
>>>>>> v2:
>>>>>> * added inheritance, needed for a pci-bridge on spapr-ppc64
>>>>>> * pci_iommu_as renamed to pci_device_iommu_address_space
>>>>>> ---
>>>>>>  hw/pci/pci.c         | 22 ++++++++++++++++------
>>>>>>  include/hw/pci/pci.h |  1 +
>>>>>>  2 files changed, 17 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>> index 4c004f5..0072b54 100644
>>>>>> --- a/hw/pci/pci.c
>>>>>> +++ b/hw/pci/pci.c
>>>>>> @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>>>>>>      }
>>>>>>  
>>>>>>      pci_dev->bus = bus;
>>>>>> -    if (bus->iommu_fn) {
>>>>>> -        dma_as = bus->iommu_fn(bus, bus->iommu_opaque, devfn);
>>>>>> -    } else {
>>>>>> -        /* FIXME: inherit memory region from bus creator */
>>>>>> -        dma_as = &address_space_memory;
>>>>>> -    }
>>>>>> +    dma_as = pci_device_iommu_address_space(pci_dev);
>>>>>>  
>>>>>>      memory_region_init_alias(&pci_dev->bus_master_enable_region,
>>>>>>                               OBJECT(pci_dev), "bus master",
>>>>>> @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
>>>>>>      k->props = pci_props;
>>>>>>  }
>>>>>>  
>>>>>> +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>>>>>> +{
>>>>>> +    PCIBus *bus = PCI_BUS(dev->bus);
>>>>>> +
>>>>>> +    if (bus->iommu_fn) {
>>>>>> +        return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
>>>>>> +    }
>>>>>> +
>>>>>> +    if (bus->parent_dev) {
>>>>>> +        return pci_device_iommu_address_space(bus->parent_dev);
>>>>>> +    }
>>>>>
>>>>> No, this would fail if bus->parent_dev is not NULL but not a PCI device
>>>>> either.
>>>>
>>>> parent_dev is of the PCIDevice* type, how can it be not a PCI device? :-/
>>>
>>> Doh, I misread the code, I thought it was the "parent" field in
>>> BusState.  Why do we have parent_dev at all?
>>
>> The code is too old? Don't know.
>>
>>
>>>>> You can use object_dynamic_cast to convert the parent_dev to
>>>>> PCIDevice, and if the cast succeeds you call the new function.
>>>>>
>>>>> Perhaps you could make the new function take a PCIBus instead.
>>>>> Accessing the PCIDevice's IOMMU address space (as opposed to the
>>>>> bus-master address space) doesn't make much sense, VFIO is really a
>>>>> special case.  Putting the new API on the bus side instead looks better.
>>>>>
>>>>> (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA for
>>>>> devices sitting on the secondary bus?)
>>>>
>>>> It happens naturally I guess when linux enables devices.
>>>
>>> Yes, but then using the IOMMU address space would be wrong; you would
>>> have to use the bus-master address space as a base for the child's
>>> bus-master address space.
>>
>>
>> Like this? Works too.
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index 8bdcedc..a4c70e6 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -2247,23 +2247,23 @@ AddressSpace
>> *pci_device_iommu_address_space(PCIDevice *dev)
>>  {
>>      PCIBus *bus = PCI_BUS(dev->bus);
>>
>>      if (bus->iommu_fn) {
>>          return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
>>      }
>>
>>      if (bus->parent_dev) {
>>          return pci_device_iommu_address_space(bus->parent_dev);
>>      }
>>
>> -    return &address_space_memory;
>> +    return &dev->bus_master_as;
>>  }
> 
> I was thinking more like this:
> 
>      if (bus->parent_dev) {
> -        return pci_device_iommu_address_space(bus->parent_dev);
> +        /* Take parent device's bus-master enable bit into account.  */
> +        return pci_get_address_space(bus->parent_dev);
>      }
> 
> +    /* Not a secondary bus and no IOMMU.  Use system memory.  */
>      return &address_space_memory;

Ok.

Oh. BTW. This "bus master" thing now breaks VFIO's check for all devices
being in the same address space as every single device has its own "bus
master" AddressSpace.


> 
> Paolo
> 
>>
>>
>>> Also, we would have to fix the x86 firmware.

btw what would it mean? :)


>>>
>>> Paolo
>>>
>>
>>
>
Paolo Bonzini Aug. 9, 2013, 11:07 a.m. UTC | #3
Il 09/08/2013 12:58, Alexey Kardashevskiy ha scritto:
> On 08/09/2013 08:19 PM, Paolo Bonzini wrote:
>> Il 09/08/2013 12:13, Alexey Kardashevskiy ha scritto:
>>> On 08/09/2013 07:53 PM, Paolo Bonzini wrote:
>>>> Il 09/08/2013 11:48, Alexey Kardashevskiy ha scritto:
>>>>> On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
>>>>>> Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
>>>>>>> A PCI device's DMA address space (possibly an IOMMU) is returned by a
>>>>>>> method on the PCIBus.  At the moment that only has one caller, so the
>>>>>>> method is simply open coded.  We'll need another caller for VFIO, so
>>>>>>> this patch introduces a helper/wrapper function.
>>>>>>>
>>>>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>>>>>> [aik: added inheritance from parent if iommu is not set for the current bus]
>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>
>>>>>>> ---
>>>>>>> Changes:
>>>>>>> v2:
>>>>>>> * added inheritance, needed for a pci-bridge on spapr-ppc64
>>>>>>> * pci_iommu_as renamed to pci_device_iommu_address_space
>>>>>>> ---
>>>>>>>  hw/pci/pci.c         | 22 ++++++++++++++++------
>>>>>>>  include/hw/pci/pci.h |  1 +
>>>>>>>  2 files changed, 17 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>>> index 4c004f5..0072b54 100644
>>>>>>> --- a/hw/pci/pci.c
>>>>>>> +++ b/hw/pci/pci.c
>>>>>>> @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>>>>>>>      }
>>>>>>>  
>>>>>>>      pci_dev->bus = bus;
>>>>>>> -    if (bus->iommu_fn) {
>>>>>>> -        dma_as = bus->iommu_fn(bus, bus->iommu_opaque, devfn);
>>>>>>> -    } else {
>>>>>>> -        /* FIXME: inherit memory region from bus creator */
>>>>>>> -        dma_as = &address_space_memory;
>>>>>>> -    }
>>>>>>> +    dma_as = pci_device_iommu_address_space(pci_dev);
>>>>>>>  
>>>>>>>      memory_region_init_alias(&pci_dev->bus_master_enable_region,
>>>>>>>                               OBJECT(pci_dev), "bus master",
>>>>>>> @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
>>>>>>>      k->props = pci_props;
>>>>>>>  }
>>>>>>>  
>>>>>>> +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>>>>>>> +{
>>>>>>> +    PCIBus *bus = PCI_BUS(dev->bus);
>>>>>>> +
>>>>>>> +    if (bus->iommu_fn) {
>>>>>>> +        return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    if (bus->parent_dev) {
>>>>>>> +        return pci_device_iommu_address_space(bus->parent_dev);
>>>>>>> +    }
>>>>>>
>>>>>> No, this would fail if bus->parent_dev is not NULL but not a PCI device
>>>>>> either.
>>>>>
>>>>> parent_dev is of the PCIDevice* type, how can it be not a PCI device? :-/
>>>>
>>>> Doh, I misread the code, I thought it was the "parent" field in
>>>> BusState.  Why do we have parent_dev at all?
>>>
>>> The code is too old? Don't know.
>>>
>>>
>>>>>> You can use object_dynamic_cast to convert the parent_dev to
>>>>>> PCIDevice, and if the cast succeeds you call the new function.
>>>>>>
>>>>>> Perhaps you could make the new function take a PCIBus instead.
>>>>>> Accessing the PCIDevice's IOMMU address space (as opposed to the
>>>>>> bus-master address space) doesn't make much sense, VFIO is really a
>>>>>> special case.  Putting the new API on the bus side instead looks better.
>>>>>>
>>>>>> (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA for
>>>>>> devices sitting on the secondary bus?)
>>>>>
>>>>> It happens naturally I guess when linux enables devices.
>>>>
>>>> Yes, but then using the IOMMU address space would be wrong; you would
>>>> have to use the bus-master address space as a base for the child's
>>>> bus-master address space.
>>>
>>>
>>> Like this? Works too.
>>>
>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>> index 8bdcedc..a4c70e6 100644
>>> --- a/hw/pci/pci.c
>>> +++ b/hw/pci/pci.c
>>> @@ -2247,23 +2247,23 @@ AddressSpace
>>> *pci_device_iommu_address_space(PCIDevice *dev)
>>>  {
>>>      PCIBus *bus = PCI_BUS(dev->bus);
>>>
>>>      if (bus->iommu_fn) {
>>>          return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
>>>      }
>>>
>>>      if (bus->parent_dev) {
>>>          return pci_device_iommu_address_space(bus->parent_dev);
>>>      }
>>>
>>> -    return &address_space_memory;
>>> +    return &dev->bus_master_as;
>>>  }
>>
>> I was thinking more like this:
>>
>>      if (bus->parent_dev) {
>> -        return pci_device_iommu_address_space(bus->parent_dev);
>> +        /* Take parent device's bus-master enable bit into account.  */
>> +        return pci_get_address_space(bus->parent_dev);
>>      }
>>
>> +    /* Not a secondary bus and no IOMMU.  Use system memory.  */
>>      return &address_space_memory;
> 
> Ok.
> 
> Oh. BTW. This "bus master" thing now breaks VFIO's check for all devices
> being in the same address space as every single device has its own "bus
> master" AddressSpace.

Yes, fixing that check is another good use of the new API you introduced.

But after Ben's answer, I guess the above change is not really needed.
It would add complication for VFIO, too.  Proper emulation would require
QEMU to trap writes to the device's bus-master bit.  QEMU would have to
take of the value that the guest writes, AND it with the bus-master
bit(s) of all bridges between the host bridge and the VFIO device, and
write the result to the passed-through device.  This is because the
bridges are emulated, and do not exist in real hardware.

>>>> Also, we would have to fix the x86 firmware.
> 
> btw what would it mean? :)

Firmware would have to look for bridges and enable bus master DMA on
them.  Otherwise, for example a SCSI controller below a bridge (not
virtio-scsi) would fail to boot.

Paolo
Alexey Kardashevskiy Aug. 9, 2013, 11:21 a.m. UTC | #4
On 08/09/2013 09:07 PM, Paolo Bonzini wrote:
> Il 09/08/2013 12:58, Alexey Kardashevskiy ha scritto:
>> On 08/09/2013 08:19 PM, Paolo Bonzini wrote:
>>> Il 09/08/2013 12:13, Alexey Kardashevskiy ha scritto:
>>>> On 08/09/2013 07:53 PM, Paolo Bonzini wrote:
>>>>> Il 09/08/2013 11:48, Alexey Kardashevskiy ha scritto:
>>>>>> On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
>>>>>>> Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
>>>>>>>> A PCI device's DMA address space (possibly an IOMMU) is returned by a
>>>>>>>> method on the PCIBus.  At the moment that only has one caller, so the
>>>>>>>> method is simply open coded.  We'll need another caller for VFIO, so
>>>>>>>> this patch introduces a helper/wrapper function.
>>>>>>>>
>>>>>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>>>>>>> [aik: added inheritance from parent if iommu is not set for the current bus]
>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>
>>>>>>>> ---
>>>>>>>> Changes:
>>>>>>>> v2:
>>>>>>>> * added inheritance, needed for a pci-bridge on spapr-ppc64
>>>>>>>> * pci_iommu_as renamed to pci_device_iommu_address_space
>>>>>>>> ---
>>>>>>>>  hw/pci/pci.c         | 22 ++++++++++++++++------
>>>>>>>>  include/hw/pci/pci.h |  1 +
>>>>>>>>  2 files changed, 17 insertions(+), 6 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>>>> index 4c004f5..0072b54 100644
>>>>>>>> --- a/hw/pci/pci.c
>>>>>>>> +++ b/hw/pci/pci.c
>>>>>>>> @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>>>>>>>>      }
>>>>>>>>  
>>>>>>>>      pci_dev->bus = bus;
>>>>>>>> -    if (bus->iommu_fn) {
>>>>>>>> -        dma_as = bus->iommu_fn(bus, bus->iommu_opaque, devfn);
>>>>>>>> -    } else {
>>>>>>>> -        /* FIXME: inherit memory region from bus creator */
>>>>>>>> -        dma_as = &address_space_memory;
>>>>>>>> -    }
>>>>>>>> +    dma_as = pci_device_iommu_address_space(pci_dev);
>>>>>>>>  
>>>>>>>>      memory_region_init_alias(&pci_dev->bus_master_enable_region,
>>>>>>>>                               OBJECT(pci_dev), "bus master",
>>>>>>>> @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
>>>>>>>>      k->props = pci_props;
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>>>>>>>> +{
>>>>>>>> +    PCIBus *bus = PCI_BUS(dev->bus);
>>>>>>>> +
>>>>>>>> +    if (bus->iommu_fn) {
>>>>>>>> +        return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (bus->parent_dev) {
>>>>>>>> +        return pci_device_iommu_address_space(bus->parent_dev);
>>>>>>>> +    }
>>>>>>>
>>>>>>> No, this would fail if bus->parent_dev is not NULL but not a PCI device
>>>>>>> either.
>>>>>>
>>>>>> parent_dev is of the PCIDevice* type, how can it be not a PCI device? :-/
>>>>>
>>>>> Doh, I misread the code, I thought it was the "parent" field in
>>>>> BusState.  Why do we have parent_dev at all?
>>>>
>>>> The code is too old? Don't know.
>>>>
>>>>
>>>>>>> You can use object_dynamic_cast to convert the parent_dev to
>>>>>>> PCIDevice, and if the cast succeeds you call the new function.
>>>>>>>
>>>>>>> Perhaps you could make the new function take a PCIBus instead.
>>>>>>> Accessing the PCIDevice's IOMMU address space (as opposed to the
>>>>>>> bus-master address space) doesn't make much sense, VFIO is really a
>>>>>>> special case.  Putting the new API on the bus side instead looks better.
>>>>>>>
>>>>>>> (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA for
>>>>>>> devices sitting on the secondary bus?)
>>>>>>
>>>>>> It happens naturally I guess when linux enables devices.
>>>>>
>>>>> Yes, but then using the IOMMU address space would be wrong; you would
>>>>> have to use the bus-master address space as a base for the child's
>>>>> bus-master address space.
>>>>
>>>>
>>>> Like this? Works too.
>>>>
>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>> index 8bdcedc..a4c70e6 100644
>>>> --- a/hw/pci/pci.c
>>>> +++ b/hw/pci/pci.c
>>>> @@ -2247,23 +2247,23 @@ AddressSpace
>>>> *pci_device_iommu_address_space(PCIDevice *dev)
>>>>  {
>>>>      PCIBus *bus = PCI_BUS(dev->bus);
>>>>
>>>>      if (bus->iommu_fn) {
>>>>          return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
>>>>      }
>>>>
>>>>      if (bus->parent_dev) {
>>>>          return pci_device_iommu_address_space(bus->parent_dev);
>>>>      }
>>>>
>>>> -    return &address_space_memory;
>>>> +    return &dev->bus_master_as;
>>>>  }
>>>
>>> I was thinking more like this:
>>>
>>>      if (bus->parent_dev) {
>>> -        return pci_device_iommu_address_space(bus->parent_dev);
>>> +        /* Take parent device's bus-master enable bit into account.  */
>>> +        return pci_get_address_space(bus->parent_dev);
>>>      }
>>>
>>> +    /* Not a secondary bus and no IOMMU.  Use system memory.  */
>>>      return &address_space_memory;
>>
>> Ok.
>>
>> Oh. BTW. This "bus master" thing now breaks VFIO's check for all devices
>> being in the same address space as every single device has its own "bus
>> master" AddressSpace.
> 
> Yes, fixing that check is another good use of the new API you introduced.
> 
> But after Ben's answer, I guess the above change is not really needed.
> It would add complication for VFIO, too.  Proper emulation would require
> QEMU to trap writes to the device's bus-master bit.  QEMU would have to
> take of the value that the guest writes, AND it with the bus-master
> bit(s) of all bridges between the host bridge and the VFIO device, and
> write the result to the passed-through device.  This is because the
> bridges are emulated, and do not exist in real hardware.


So does this mean that we go with the original patch and ignore bus master
address space here?


I guess I could overcome that VFIO check by comparing not just AddressSpace
but AddressSpace->root if AddressSpace is different but it does not make a
lot of sense.

> 
>>>>> Also, we would have to fix the x86 firmware.
>>
>> btw what would it mean? :)
> 
> Firmware would have to look for bridges and enable bus master DMA on
> them.  Otherwise, for example a SCSI controller below a bridge (not
> virtio-scsi) would fail to boot.
> 
> Paolo
>
Paolo Bonzini Aug. 9, 2013, 11:30 a.m. UTC | #5
Il 09/08/2013 13:21, Alexey Kardashevskiy ha scritto:
> On 08/09/2013 09:07 PM, Paolo Bonzini wrote:
>> Il 09/08/2013 12:58, Alexey Kardashevskiy ha scritto:
>>> On 08/09/2013 08:19 PM, Paolo Bonzini wrote:
>>>> Il 09/08/2013 12:13, Alexey Kardashevskiy ha scritto:
>>>>> On 08/09/2013 07:53 PM, Paolo Bonzini wrote:
>>>>>> Il 09/08/2013 11:48, Alexey Kardashevskiy ha scritto:
>>>>>>> On 08/09/2013 07:40 PM, Paolo Bonzini wrote:
>>>>>>>> Il 09/08/2013 10:49, Alexey Kardashevskiy ha scritto:
>>>>>>>>> A PCI device's DMA address space (possibly an IOMMU) is returned by a
>>>>>>>>> method on the PCIBus.  At the moment that only has one caller, so the
>>>>>>>>> method is simply open coded.  We'll need another caller for VFIO, so
>>>>>>>>> this patch introduces a helper/wrapper function.
>>>>>>>>>
>>>>>>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>>>>>>>> [aik: added inheritance from parent if iommu is not set for the current bus]
>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>
>>>>>>>>> ---
>>>>>>>>> Changes:
>>>>>>>>> v2:
>>>>>>>>> * added inheritance, needed for a pci-bridge on spapr-ppc64
>>>>>>>>> * pci_iommu_as renamed to pci_device_iommu_address_space
>>>>>>>>> ---
>>>>>>>>>  hw/pci/pci.c         | 22 ++++++++++++++++------
>>>>>>>>>  include/hw/pci/pci.h |  1 +
>>>>>>>>>  2 files changed, 17 insertions(+), 6 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>>>>> index 4c004f5..0072b54 100644
>>>>>>>>> --- a/hw/pci/pci.c
>>>>>>>>> +++ b/hw/pci/pci.c
>>>>>>>>> @@ -812,12 +812,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
>>>>>>>>>      }
>>>>>>>>>  
>>>>>>>>>      pci_dev->bus = bus;
>>>>>>>>> -    if (bus->iommu_fn) {
>>>>>>>>> -        dma_as = bus->iommu_fn(bus, bus->iommu_opaque, devfn);
>>>>>>>>> -    } else {
>>>>>>>>> -        /* FIXME: inherit memory region from bus creator */
>>>>>>>>> -        dma_as = &address_space_memory;
>>>>>>>>> -    }
>>>>>>>>> +    dma_as = pci_device_iommu_address_space(pci_dev);
>>>>>>>>>  
>>>>>>>>>      memory_region_init_alias(&pci_dev->bus_master_enable_region,
>>>>>>>>>                               OBJECT(pci_dev), "bus master",
>>>>>>>>> @@ -2239,6 +2234,21 @@ static void pci_device_class_init(ObjectClass *klass, void *data)
>>>>>>>>>      k->props = pci_props;
>>>>>>>>>  }
>>>>>>>>>  
>>>>>>>>> +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>>>>>>>>> +{
>>>>>>>>> +    PCIBus *bus = PCI_BUS(dev->bus);
>>>>>>>>> +
>>>>>>>>> +    if (bus->iommu_fn) {
>>>>>>>>> +        return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    if (bus->parent_dev) {
>>>>>>>>> +        return pci_device_iommu_address_space(bus->parent_dev);
>>>>>>>>> +    }
>>>>>>>>
>>>>>>>> No, this would fail if bus->parent_dev is not NULL but not a PCI device
>>>>>>>> either.
>>>>>>>
>>>>>>> parent_dev is of the PCIDevice* type, how can it be not a PCI device? :-/
>>>>>>
>>>>>> Doh, I misread the code, I thought it was the "parent" field in
>>>>>> BusState.  Why do we have parent_dev at all?
>>>>>
>>>>> The code is too old? Don't know.
>>>>>
>>>>>
>>>>>>>> You can use object_dynamic_cast to convert the parent_dev to
>>>>>>>> PCIDevice, and if the cast succeeds you call the new function.
>>>>>>>>
>>>>>>>> Perhaps you could make the new function take a PCIBus instead.
>>>>>>>> Accessing the PCIDevice's IOMMU address space (as opposed to the
>>>>>>>> bus-master address space) doesn't make much sense, VFIO is really a
>>>>>>>> special case.  Putting the new API on the bus side instead looks better.
>>>>>>>>
>>>>>>>> (BTW, do you need to enable bus-master DMA on PCI bridges, to do DMA for
>>>>>>>> devices sitting on the secondary bus?)
>>>>>>>
>>>>>>> It happens naturally I guess when linux enables devices.
>>>>>>
>>>>>> Yes, but then using the IOMMU address space would be wrong; you would
>>>>>> have to use the bus-master address space as a base for the child's
>>>>>> bus-master address space.
>>>>>
>>>>>
>>>>> Like this? Works too.
>>>>>
>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>> index 8bdcedc..a4c70e6 100644
>>>>> --- a/hw/pci/pci.c
>>>>> +++ b/hw/pci/pci.c
>>>>> @@ -2247,23 +2247,23 @@ AddressSpace
>>>>> *pci_device_iommu_address_space(PCIDevice *dev)
>>>>>  {
>>>>>      PCIBus *bus = PCI_BUS(dev->bus);
>>>>>
>>>>>      if (bus->iommu_fn) {
>>>>>          return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
>>>>>      }
>>>>>
>>>>>      if (bus->parent_dev) {
>>>>>          return pci_device_iommu_address_space(bus->parent_dev);
>>>>>      }
>>>>>
>>>>> -    return &address_space_memory;
>>>>> +    return &dev->bus_master_as;
>>>>>  }
>>>>
>>>> I was thinking more like this:
>>>>
>>>>      if (bus->parent_dev) {
>>>> -        return pci_device_iommu_address_space(bus->parent_dev);
>>>> +        /* Take parent device's bus-master enable bit into account.  */
>>>> +        return pci_get_address_space(bus->parent_dev);
>>>>      }
>>>>
>>>> +    /* Not a secondary bus and no IOMMU.  Use system memory.  */
>>>>      return &address_space_memory;
>>>
>>> Ok.
>>>
>>> Oh. BTW. This "bus master" thing now breaks VFIO's check for all devices
>>> being in the same address space as every single device has its own "bus
>>> master" AddressSpace.
>>
>> Yes, fixing that check is another good use of the new API you introduced.
>>
>> But after Ben's answer, I guess the above change is not really needed.
>> It would add complication for VFIO, too.  Proper emulation would require
>> QEMU to trap writes to the device's bus-master bit.  QEMU would have to
>> take of the value that the guest writes, AND it with the bus-master
>> bit(s) of all bridges between the host bridge and the VFIO device, and
>> write the result to the passed-through device.  This is because the
>> bridges are emulated, and do not exist in real hardware.
> 
> So does this mean that we go with the original patch and ignore bus master
> address space here?

Yes.  Just add a comment that we are ignoring the bus master DMA bit of
the bridge.

> I guess I could overcome that VFIO check by comparing not just AddressSpace
> but AddressSpace->root if AddressSpace is different but it does not make a
> lot of sense.

Paolo
diff mbox

Patch

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 8bdcedc..a4c70e6 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2247,23 +2247,23 @@  AddressSpace
*pci_device_iommu_address_space(PCIDevice *dev)
 {
     PCIBus *bus = PCI_BUS(dev->bus);

     if (bus->iommu_fn) {
         return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
     }

     if (bus->parent_dev) {
         return pci_device_iommu_address_space(bus->parent_dev);
     }

-    return &address_space_memory;
+    return &dev->bus_master_as;
 }



> Also, we would have to fix the x86 firmware.
> 
> Paolo
>