
[qemu,2/2] spapr_pci: Advertise 16M IOMMU pages when available

Message ID 20161222052212.49006-3-aik@ozlabs.ru
State New

Commit Message

Alexey Kardashevskiy Dec. 22, 2016, 5:22 a.m. UTC
On sPAPR, the IOMMU page size varies; if QEMU is running with RAM
backed by hugepages, we can advertise a matching huge IOMMU page size
to the guest as well, which is what this patch does.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_pci.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

David Gibson Jan. 2, 2017, 11:41 p.m. UTC | #1
On Thu, Dec 22, 2016 at 04:22:12PM +1100, Alexey Kardashevskiy wrote:
> On sPAPR, the IOMMU page size varies; if QEMU is running with RAM
> backed by hugepages, we can advertise a matching huge IOMMU page size
> to the guest as well, which is what this patch does.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/ppc/spapr_pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index fd6fc1d953..09244056fc 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1505,6 +1505,9 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>      }
>  
>      /* DMA setup */
> +    /* This allows huge pages for IOMMU when guest is backed with huge pages */
> +    sphb->page_size_mask |= qemu_getrampagesize();

This doesn't look right - you're unconditionally enabling the host ram
page size, regardless of anything else.  Instead the backing page size
should be used to filter out those sizes which are possible from the
list of those supported by the guest hardware.  This patch will give
particularly odd results if you ran it on x86 with hugepages for
example: it would advertise a 2M IOMMU page size, which could never
exist on native POWER.

Except... come to think of it, why is the backing RAM page size
relevant at all?  Or rather.. I think VFIO should be able to cope with
any guest IOMMU page size which is larger than the host ram page size
(although if it's much larger it could get expensive in the host
tables).  This case would already be routine for ppc64 on x86, where
the guest IOMMU page size is 64kiB, but the host page size is 4 kiB.
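
For illustration, a minimal standalone sketch of the filtering described
above: keep only the IOMMU page sizes that are no larger than the host page
size backing guest RAM. The function name, the main()/stdio harness and the
example mask are assumptions made for this sketch, not code taken from QEMU:

    #include <stdint.h>
    #include <stdio.h>

    /* Keep only those IOMMU page sizes that are not larger than the host
     * page size backing guest RAM; anything bigger cannot be backed by a
     * single host page and so should not be advertised to the guest. */
    static uint64_t filter_iommu_pagesizes(uint64_t page_size_mask,
                                           uint64_t host_page_size)
    {
        uint64_t filtered = 0;

        for (uint64_t bit = 1; bit; bit <<= 1) {
            if ((page_size_mask & bit) && bit <= host_page_size) {
                filtered |= bit;
            }
        }
        return filtered;
    }

    int main(void)
    {
        /* 4k | 64k | 16M offered by the PHB, guest RAM backed by 64k pages */
        uint64_t mask = (1ULL << 12) | (1ULL << 16) | (1ULL << 24);

        printf("0x%llx\n",
               (unsigned long long)filter_iommu_pagesizes(mask, 1ULL << 16));
        /* prints 0x11000: 16M is dropped, 4k and 64k are kept */
        return 0;
    }
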
Alexey Kardashevskiy Jan. 9, 2017, 2:06 a.m. UTC | #2
On 03/01/17 10:41, David Gibson wrote:
> On Thu, Dec 22, 2016 at 04:22:12PM +1100, Alexey Kardashevskiy wrote:
>> On sPAPR, the IOMMU page size varies; if QEMU is running with RAM
>> backed by hugepages, we can advertise a matching huge IOMMU page size
>> to the guest as well, which is what this patch does.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  hw/ppc/spapr_pci.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index fd6fc1d953..09244056fc 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -1505,6 +1505,9 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>>      }
>>  
>>      /* DMA setup */
>> +    /* This allows huge pages for IOMMU when guest is backed with huge pages */
>> +    sphb->page_size_mask |= qemu_getrampagesize();
> 
> This doesn't look right - you're unconditionally enabling the host ram
> page size, regardless of anything else.  Instead the backing page size
> should be used to filter out those sizes which are possible from the
> list of those supported by the guest hardware.  This patch will give
> particularly odd results if you ran it on x86 with hugepages for
> example: it would advertise a 2M IOMMU page size, which could never
> exist on native POWER.

Ok, I'll filter 16M out if it is passed to the PHB but not supported by the host.


> Except... come to think of it, why is the backing RAM page size
> relevant at all? 

Because this is just an optimization/acceleration, and I would think the user
wants to know whether it is actually accelerated or not. If I always allow 16M
pages and QEMU is not backed with hugepages, then all H_PUT_TCE calls will go
via the slow path and consume as much memory for TCEs as without hugepages,
and this will only be visible to the user if TCE tracepoints are enabled.

> Or rather.. I think VFIO should be able to cope with
> any guest IOMMU page size which is larger than the host ram page size

It could, I just do not see much benefit in it. A pseries guest can negotiate
4k, 64k and 16M pages, and this seems to cover everything we want; why would
we want to emulate an IOMMU page size?

> (although if it's much larger it could get expensive in the host
> tables).  This case would already be routine for ppc64 on x86, where
> the guest IOMMU page size is 64kiB, but the host page size is 4 kiB.
David Gibson Jan. 12, 2017, 5:09 a.m. UTC | #3
On Mon, Jan 09, 2017 at 01:06:03PM +1100, Alexey Kardashevskiy wrote:
> On 03/01/17 10:41, David Gibson wrote:
> > On Thu, Dec 22, 2016 at 04:22:12PM +1100, Alexey Kardashevskiy wrote:
> >> On sPAPR, the IOMMU page size varies; if QEMU is running with RAM
> >> backed by hugepages, we can advertise a matching huge IOMMU page size
> >> to the guest as well, which is what this patch does.
> >>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >>  hw/ppc/spapr_pci.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> >> index fd6fc1d953..09244056fc 100644
> >> --- a/hw/ppc/spapr_pci.c
> >> +++ b/hw/ppc/spapr_pci.c
> >> @@ -1505,6 +1505,9 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
> >>      }
> >>  
> >>      /* DMA setup */
> >> +    /* This allows huge pages for IOMMU when guest is backed with huge pages */
> >> +    sphb->page_size_mask |= qemu_getrampagesize();
> > 
> > This doesn't look right - you're unconditionally enabling the host ram
> > page size, regardless of anything else.  Instead the backing page size
> > should be used to filter out those sizes which are possible from the
> > list of those supported by the guest hardware.  This patch will give
> > particularly odd results if you ran it on x86 with hugepages for
> > example: it would advertise a 2M IOMMU page size, which could never
> > exist on native POWER.
> 
> Ok, I'll filter 16M out if it is passed to the PHB but not supported by the host.
> 
> 
> > Except... come to think of it, why is the backing RAM page size
> > relevant at all? 
> 
> Because this is just an optimization/acceleration, and I would think the
> user wants to know whether it is actually accelerated or not. If I always
> allow 16M pages and QEMU is not backed with hugepages, then all H_PUT_TCE
> calls will go via the slow path and consume as much memory for TCEs as
> without hugepages, and this will only be visible to the user if TCE
> tracepoints are enabled.

Hm, ok, fair enough.

> > Or rather.. I think VFIO should be able to cope with
> > any guest IOMMU page size which is larger than the host ram page size
> 
> It could, I just do not see much benefit in it. A pseries guest can
> negotiate 4k, 64k and 16M pages, and this seems to cover everything we
> want; why would we want to emulate an IOMMU page size?

Just for testing or debugging, I suppose.

> 
> > (although if it's much larger it could get expensive in the host
> > tables).  This case would already be routine for ppc64 on x86, where
> > the guest IOMMU page size is 64kiB, but the host page size is 4 kiB.

Patch

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index fd6fc1d953..09244056fc 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1505,6 +1505,9 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
     }
 
     /* DMA setup */
+    /* This allows huge pages for IOMMU when guest is backed with huge pages */
+    sphb->page_size_mask |= qemu_getrampagesize();
+
     for (i = 0; i < windows_supported; ++i) {
         tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn[i]);
         if (!tcet) {
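
For comparison, a hypothetical respin of the hunk along the lines discussed
in the thread, filtering the advertised mask against the host page size
instead of extending it. This is only a sketch of the idea, not the actual
follow-up patch:

    /* DMA setup */
    /*
     * Advertise only the IOMMU page sizes that the host RAM backing can
     * honour, e.g. keep 16M in the mask only when guest RAM really is
     * backed by 16M pages; anything above qemu_getrampagesize() is dropped.
     */
    sphb->page_size_mask &= ((uint64_t)qemu_getrampagesize() << 1) - 1;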