[qemu] vfio/spapr: Allow fallback to SPAPR TCE IOMMU v1

Message ID 20171122051552.3529-1-aik@ozlabs.ru
State New
Series [qemu] vfio/spapr: Allow fallback to SPAPR TCE IOMMU v1

Commit Message

Alexey Kardashevskiy Nov. 22, 2017, 5:15 a.m. UTC
The vfio_iommu_spapr_tce driver always advertises v1 and v2 IOMMU support;
however, PR KVM (a special version of KVM designed to work in
a paravirtualized system; these days used for nested virtualization) only
supports the "pseries" platform which does not support v2. Since there is
no way to choose the IOMMU version in QEMU, it fails to start.

This adds a fallback to the v1 IOMMU if v2 cannot be used.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/vfio/common.c | 5 +++++
 1 file changed, 5 insertions(+)
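The retry logic in the patch can be isolated and exercised without VFIO hardware. The following is a minimal sketch, not QEMU code: `set_iommu_fn`, `IOMMU_TCE_V1`/`IOMMU_TCE_V2`, `connect_with_fallback()` and the two example backends are hypothetical stand-ins for `ioctl(fd, VFIO_SET_IOMMU, ...)`, `VFIO_SPAPR_TCE_IOMMU`/`VFIO_SPAPR_TCE_v2_IOMMU` and the relevant part of `vfio_connect_container()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VFIO_SPAPR_TCE_IOMMU and VFIO_SPAPR_TCE_v2_IOMMU. */
enum iommu_type { IOMMU_TCE_V1, IOMMU_TCE_V2 };

/* Hypothetical stand-in for ioctl(fd, VFIO_SET_IOMMU, type):
 * returns 0 on success, or -1 with errno set on failure. */
typedef int (*set_iommu_fn)(enum iommu_type type);

/*
 * Try v2 if *v2 is set and fall back to v1 if that fails.  On success
 * returns 0, stores the type in use in *chosen and updates *v2 to match;
 * on failure returns -errno from the last attempt.
 */
static int connect_with_fallback(set_iommu_fn set_iommu, bool *v2,
                                 enum iommu_type *chosen)
{
    assert(set_iommu != NULL);

    *chosen = *v2 ? IOMMU_TCE_V2 : IOMMU_TCE_V1;
    int ret = set_iommu(*chosen);
    if (ret && *chosen == IOMMU_TCE_V2) {
        /* v2 was advertised but could not be set: retry once with v1. */
        *chosen = IOMMU_TCE_V1;
        *v2 = false;
        ret = set_iommu(*chosen);
    }
    if (ret) {
        return -errno;
    }
    return 0;
}

/* Example backend: a host where only v1 can actually be set. */
static int backend_v1_only(enum iommu_type type)
{
    if (type == IOMMU_TCE_V1) {
        return 0;
    }
    errno = EINVAL;
    return -1;
}

/* Example backend: a host where neither type can be set. */
static int backend_none(enum iommu_type type)
{
    (void)type;
    errno = ENODEV;
    return -1;
}
```

With `v2 = true` and `backend_v1_only`, `connect_with_fallback()` returns 0, leaves `chosen` as `IOMMU_TCE_V1` and clears `v2`; with `backend_none` it returns `-ENODEV`. Unlike the patch, which issues the v1 call again even when v1 was the first attempt, this sketch only retries after a failed v2 attempt, to avoid repeating an identical call.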

Comments

David Gibson Nov. 22, 2017, 1:39 p.m. UTC | #1
On Wed, Nov 22, 2017 at 04:15:52PM +1100, Alexey Kardashevskiy wrote:
> The vfio_iommu_spapr_tce driver always advertises v1 and v2 IOMMU support;
> however, PR KVM (a special version of KVM designed to work in
> a paravirtualized system; these days used for nested virtualization) only
> supports the "pseries" platform which does not support v2. Since there is
> no way to choose the IOMMU version in QEMU, it fails to start.
> 
> This adds a fallback to the v1 IOMMU if v2 cannot be used.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

The fallback itself isn't a bad idea, but your commit message contains
several inaccuracies.  KVM PR is not particularly designed to work in
a paravirtualized system, and it doesn't only support the pseries
platform (as guest *or* host).  It's actually a lot more general than
KVM HV - just slow, not that well tested and missing a number of
features that no-one's bothered to port to it.

Alexey Kardashevskiy Nov. 23, 2017, 1:58 a.m. UTC | #2
On 23/11/17 00:39, David Gibson wrote:
> On Wed, Nov 22, 2017 at 04:15:52PM +1100, Alexey Kardashevskiy wrote:
>> The vfio_iommu_spapr_tce driver always advertises v1 and v2 IOMMU support;
>> however, PR KVM (a special version of KVM designed to work in
>> a paravirtualized system; these days used for nested virtualization) only
>> supports the "pseries" platform which does not support v2. Since there is
>> no way to choose the IOMMU version in QEMU, it fails to start.
>>
>> This adds a fallback to the v1 IOMMU if v2 cannot be used.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> The fallback itself isn't a bad idea, but your commit message contains
> several inaccuracies.  KVM PR is not particularly designed to work in
> a paravirtualized system, and it doesn't only support the pseries
> platform (as guest *or* host).  It's actually a lot more general than
> KVM HV - just slow, not that well tested and missing a number of
> features that no-one's bothered to port to it.

Well, true. I was trying to give an example of where this may be useful and
exaggerated a bit, plus some ignorance on my part :) I'll repost unless Alex
has other objections.

Alex Williamson Nov. 29, 2017, 3:26 p.m. UTC | #3
On Wed, 22 Nov 2017 16:15:52 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> The vfio_iommu_spapr_tce driver always advertises v1 and v2 IOMMU support;
> however, PR KVM (a special version of KVM designed to work in
> a paravirtualized system; these days used for nested virtualization) only
> supports the "pseries" platform which does not support v2. Since there is
> no way to choose the IOMMU version in QEMU, it fails to start.

Seems like the bug is then in advertising v2 support when it doesn't
exist.  Isn't that a kernel bug?  Otherwise I think this is a long-standing
bug, present since QEMU 2.7, and probably not 2.11 material this late
into the freeze.  A "Fixes" tag would probably also be appropriate,
identifying the commit where this was introduced (318f67ce1371) if it
is a QEMU bug.  Thanks,

Alex

Alexey Kardashevskiy Nov. 30, 2017, 4:43 a.m. UTC | #4
On 30/11/17 02:26, Alex Williamson wrote:
> On Wed, 22 Nov 2017 16:15:52 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> The vfio_iommu_spapr_tce driver always advertises v1 and v2 IOMMU support;
>> however, PR KVM (a special version of KVM designed to work in
>> a paravirtualized system; these days used for nested virtualization) only
>> supports the "pseries" platform which does not support v2. Since there is
>> no way to choose the IOMMU version in QEMU, it fails to start.
> 
> Seems like the bug is then in advertising v2 support when it doesn't
> exist. 

The way to distinguish v1 IOMMUs from v2 IOMMUs in the powerpc code is to
get an IOMMU group's ops pointer and check whether the DMA window callbacks
are defined. So, at least in theory, a system can have both types of IOMMU,
and VFIO_CHECK_EXTENSION is called before any group is attached to a
container, i.e. before the kernel knows which type the container will need.

Well, I could walk through all IOMMU groups in the system (possibly
hundreds), work out the maximum common set of IOMMU types and only
advertise those. Falling back to v1 seems nicer, no?
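The kernel-side alternative described above, advertising only the types every group can support, amounts to intersecting per-group capability sets. This is a minimal sketch under stated assumptions: `struct group_ops_model`, `struct group_model`, the two window callbacks, `stub_window_op()` and the `TYPE_TCE_*` bits are hypothetical simplifications, not the kernel's actual powerpc structures. It assumes, as described above, that a group counts as v2-capable only when its DMA window callbacks are set, and (as an additional assumption) that every group supports v1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits for the two SPAPR TCE IOMMU flavours. */
#define TYPE_TCE_V1 (1u << 0)
#define TYPE_TCE_V2 (1u << 1)

/* Hypothetical, simplified per-group ops table: in this model only
 * v2-capable groups provide the DMA window callbacks. */
struct group_ops_model {
    int (*set_window)(void *group);
    int (*unset_window)(void *group);
};

struct group_model {
    const struct group_ops_model *ops;
};

/* Placeholder callback so v2-capable groups can be modelled. */
static int stub_window_op(void *group)
{
    (void)group;
    return 0;
}

/* Types a single group supports: v1 always (an assumption of this
 * model), v2 only if both window callbacks are present. */
static unsigned group_types(const struct group_model *g)
{
    unsigned types = TYPE_TCE_V1;
    if (g->ops && g->ops->set_window && g->ops->unset_window) {
        types |= TYPE_TCE_V2;
    }
    return types;
}

/* Largest set of types supported by every group.  With no groups the
 * result is both types, i.e. the current "always advertise" behaviour. */
static unsigned common_types(const struct group_model *groups, size_t n)
{
    assert(groups != NULL || n == 0);

    unsigned mask = TYPE_TCE_V1 | TYPE_TCE_V2;
    for (size_t i = 0; i < n; i++) {
        mask &= group_types(&groups[i]);
    }
    return mask;
}
```

This is the walk weighed against the userspace fallback: it has to visit every IOMMU group on the host, possibly hundreds, while the fallback costs at most one extra `VFIO_SET_IOMMU` call.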

Alex Williamson Nov. 30, 2017, 7:34 p.m. UTC | #5
On Thu, 30 Nov 2017 15:43:08 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 30/11/17 02:26, Alex Williamson wrote:
> > On Wed, 22 Nov 2017 16:15:52 +1100
> > Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >   
> >> The vfio_iommu_spapr_tce driver always advertises v1 and v2 IOMMU support;
> >> however, PR KVM (a special version of KVM designed to work in
> >> a paravirtualized system; these days used for nested virtualization) only
> >> supports the "pseries" platform which does not support v2. Since there is
> >> no way to choose the IOMMU version in QEMU, it fails to start.  
> > 
> > Seems like the bug is then in advertising v2 support when it doesn't
> > exist.   
> 
> 
> The way to distinguish v1 IOMMUs from v2 IOMMU in the powerpc code is: get
> an IOMMU group ops pointer and see if DMA window callbacks are defined. So
> at least in theory it is still possible to have both types of IOMMUs and
> VFIO_CHECK_EXTENSION is called before any group is attached to a container.
> 
> Well, I could walk through all IOMMU groups in the system (possibly
> hundreds), work out the maximum common set of IOMMU types and only
> advertise those. Doing fallback to v1 seems nicer, no?

It depends how you define "nicer".  As a user I'd think it's nicer if
the kernel doesn't advertise extensions that it can't support.  From
your perspective, it's nicer that userspace can avoid the issue with
less code than a kernel fix would need.  Looking for precedent, I see
that we already have a similar issue with VFIO_TYPE1_NESTING_IOMMU:
checking the capability doesn't verify whether we can actually set the
domain attribute, it only indicates that the kernel code is capable of
supporting nesting if the IOMMU domain does as well.  So I suppose
that's the definition we use.  I'd still like to see a Fixes tag, but
it's clearly not 2.11 material.  Thanks,

Alex
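The precedent above concerns what `VFIO_CHECK_EXTENSION` actually promises. This is a minimal sketch, assuming a Linux host with `<linux/vfio.h>` installed: `struct spapr_caps` and `probe_spapr_caps()` are hypothetical helpers, while `VFIO_GET_API_VERSION`, `VFIO_API_VERSION`, `VFIO_CHECK_EXTENSION` and the two SPAPR TCE constants come from the VFIO UAPI header. A positive answer only says the kernel's backend code recognises the type; whether `VFIO_SET_IOMMU` succeeds still depends on the groups attached to the container afterwards, so userspace has to handle that failure regardless.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/* Hypothetical result type: which SPAPR TCE flavours the kernel advertises. */
struct spapr_caps {
    bool v1;
    bool v2;
};

/*
 * Open a VFIO container node (normally /dev/vfio/vfio) and ask which
 * SPAPR TCE IOMMU types the kernel advertises.  Returns 0 on success or
 * a negative errno value.  No group is attached at this point, so the
 * answer cannot reflect what the host's IOMMU groups actually support.
 */
static int probe_spapr_caps(const char *path, struct spapr_caps *caps)
{
    assert(path != NULL && caps != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return -errno;
    }
    if (ioctl(fd, VFIO_GET_API_VERSION) != VFIO_API_VERSION) {
        close(fd);
        return -EINVAL;
    }
    /* A positive result means some registered IOMMU backend recognises the type. */
    caps->v1 = ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) > 0;
    caps->v2 = ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU) > 0;
    close(fd);
    return 0;
}
```

On a host without VFIO, or when given a path that does not exist, the helper simply returns a negative errno value such as `-ENOENT`.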


Patch

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7b2924c..cd81cc9 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1040,6 +1040,11 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
             v2 ? VFIO_SPAPR_TCE_v2_IOMMU : VFIO_SPAPR_TCE_IOMMU;
         ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type);
         if (ret) {
+            container->iommu_type = VFIO_SPAPR_TCE_IOMMU;
+            v2 = false;
+            ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type);
+        }
+        if (ret) {
             error_setg_errno(errp, errno, "failed to set iommu for container");
             ret = -errno;
             goto free_container_exit;