[guest,kernel] vfio/powerpc/spapr_tce: Enforce IOMMU type compatibility check

Message ID 20170324064406.17076-1-aik@ozlabs.ru
State Accepted
Headers show

Commit Message

Alexey Kardashevskiy March 24, 2017, 6:44 a.m.
The existing SPAPR TCE driver advertises both VFIO_SPAPR_TCE_IOMMU and
VFIO_SPAPR_TCE_v2_IOMMU types to the userspace and the userspace usually
picks the v2.

Normally the userspace would create a container, attach an IOMMU group
to it and only then set the IOMMU type (which would normally be v2).

However a specific IOMMU group may not support v2, in other words
it may not implement set_window/unset_window/take_ownership/
release_ownership and such a group should not be attached to
a v2 container.

This adds extra checks that a new group can do what the selected IOMMU
type suggests. The userspace can then test the return value from
ioctl(VFIO_SET_IOMMU, VFIO_SPAPR_TCE_v2_IOMMU) and try
VFIO_SPAPR_TCE_IOMMU.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

This is one of the patches needed to do nested VFIO - for either
second level guest or DPDK running in a guest.
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Alex Williamson March 24, 2017, 8:29 p.m. | #1
On Fri, 24 Mar 2017 17:44:06 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> The existing SPAPR TCE driver advertises both VFIO_SPAPR_TCE_IOMMU and
> VFIO_SPAPR_TCE_v2_IOMMU types to the userspace and the userspace usually
> picks the v2.
> 
> Normally the userspace would create a container, attach an IOMMU group
> to it and only then set the IOMMU type (which would normally be v2).
> 
> However a specific IOMMU group may not support v2, in other words
> it may not implement set_window/unset_window/take_ownership/
> release_ownership and such a group should not be attached to
> a v2 container.
> 
> This adds extra checks that a new group can do what the selected IOMMU
> type suggests. The userspace can then test the return value from
> ioctl(VFIO_SET_IOMMU, VFIO_SPAPR_TCE_v2_IOMMU) and try
> VFIO_SPAPR_TCE_IOMMU.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> This is one of the patches needed to do nested VFIO - for either
> second level guest or DPDK running in a guest.
> ---
>  drivers/vfio/vfio_iommu_spapr_tce.c | 8 ++++++++
>  1 file changed, 8 insertions(+)

I'm not sure I understand why you're labeling this "guest kernel", is a
VM the only case where we can have combinations that only a subset of
the groups might support v2?  What terrible things happen when such a
combination is created?  The fix itself seems sane, but I'm trying to
figure out whether it should be marked for stable, should go in for
v4.11, or be queued for v4.12.  Thanks,

Alex

> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index cf3de91fbfe7..a7d811524092 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -1335,8 +1335,16 @@ static int tce_iommu_attach_group(void *iommu_data,
>  
>  	if (!table_group->ops || !table_group->ops->take_ownership ||
>  			!table_group->ops->release_ownership) {
> +		if (container->v2) {
> +			ret = -EPERM;
> +			goto unlock_exit;
> +		}
>  		ret = tce_iommu_take_ownership(container, table_group);
>  	} else {
> +		if (!container->v2) {
> +			ret = -EPERM;
> +			goto unlock_exit;
> +		}
>  		ret = tce_iommu_take_ownership_ddw(container, table_group);
>  		if (!tce_groups_attached(container) && !container->tables[0])
>  			container->def_window_pending = true;

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexey Kardashevskiy March 25, 2017, 12:25 p.m. | #2
On 25/03/17 07:29, Alex Williamson wrote:
> On Fri, 24 Mar 2017 17:44:06 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> The existing SPAPR TCE driver advertises both VFIO_SPAPR_TCE_IOMMU and
>> VFIO_SPAPR_TCE_v2_IOMMU types to the userspace and the userspace usually
>> picks the v2.
>>
>> Normally the userspace would create a container, attach an IOMMU group
>> to it and only then set the IOMMU type (which would normally be v2).
>>
>> However a specific IOMMU group may not support v2, in other words
>> it may not implement set_window/unset_window/take_ownership/
>> release_ownership and such a group should not be attached to
>> a v2 container.
>>
>> This adds extra checks that a new group can do what the selected IOMMU
>> type suggests. The userspace can then test the return value from
>> ioctl(VFIO_SET_IOMMU, VFIO_SPAPR_TCE_v2_IOMMU) and try
>> VFIO_SPAPR_TCE_IOMMU.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> This is one of the patches needed to do nested VFIO - for either
>> second level guest or DPDK running in a guest.
>> ---
>>  drivers/vfio/vfio_iommu_spapr_tce.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
> 
> I'm not sure I understand why you're labeling this "guest kernel", is a


That is my script :)

> VM the only case where we can have combinations that only a subset of
> the groups might support v2?  

powernv (non-virtualized, and it runs HV KVM) host provides v2-capable
groups, they all the same, and a pseries host (which normally runs as a
guest but it can do nested KVM as well - it is called PR KVM) can do only
v1 (after this patch, without it - no vfio at all).

> What terrible things happen when such a
> combination is created?

There is no mixture at the moment, I just needed a way to tell userspace
that a group cannot do v2.

> The fix itself seems sane, but I'm trying to
> figure out whether it should be marked for stable, should go in for
> v4.11, or be queued for v4.12.  Thanks,

No need for stable.


> 
> Alex
> 
>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
>> index cf3de91fbfe7..a7d811524092 100644
>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>> @@ -1335,8 +1335,16 @@ static int tce_iommu_attach_group(void *iommu_data,
>>  
>>  	if (!table_group->ops || !table_group->ops->take_ownership ||
>>  			!table_group->ops->release_ownership) {
>> +		if (container->v2) {
>> +			ret = -EPERM;
>> +			goto unlock_exit;
>> +		}
>>  		ret = tce_iommu_take_ownership(container, table_group);
>>  	} else {
>> +		if (!container->v2) {
>> +			ret = -EPERM;
>> +			goto unlock_exit;
>> +		}
>>  		ret = tce_iommu_take_ownership_ddw(container, table_group);
>>  		if (!tce_groups_attached(container) && !container->tables[0])
>>  			container->def_window_pending = true;
>
Alexey Kardashevskiy April 4, 2017, 10:12 a.m. | #3
On 25/03/17 23:25, Alexey Kardashevskiy wrote:
> On 25/03/17 07:29, Alex Williamson wrote:
>> On Fri, 24 Mar 2017 17:44:06 +1100
>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>
>>> The existing SPAPR TCE driver advertises both VFIO_SPAPR_TCE_IOMMU and
>>> VFIO_SPAPR_TCE_v2_IOMMU types to the userspace and the userspace usually
>>> picks the v2.
>>>
>>> Normally the userspace would create a container, attach an IOMMU group
>>> to it and only then set the IOMMU type (which would normally be v2).
>>>
>>> However a specific IOMMU group may not support v2, in other words
>>> it may not implement set_window/unset_window/take_ownership/
>>> release_ownership and such a group should not be attached to
>>> a v2 container.
>>>
>>> This adds extra checks that a new group can do what the selected IOMMU
>>> type suggests. The userspace can then test the return value from
>>> ioctl(VFIO_SET_IOMMU, VFIO_SPAPR_TCE_v2_IOMMU) and try
>>> VFIO_SPAPR_TCE_IOMMU.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>
>>> This is one of the patches needed to do nested VFIO - for either
>>> second level guest or DPDK running in a guest.
>>> ---
>>>  drivers/vfio/vfio_iommu_spapr_tce.c | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>
>> I'm not sure I understand why you're labeling this "guest kernel", is a
> 
> 
> That is my script :)
> 
>> VM the only case where we can have combinations that only a subset of
>> the groups might support v2?  
> 
> powernv (non-virtualized, and it runs HV KVM) host provides v2-capable
> groups, they all the same, and a pseries host (which normally runs as a
> guest but it can do nested KVM as well - it is called PR KVM) can do only
> v1 (after this patch, without it - no vfio at all).
> 
>> What terrible things happen when such a
>> combination is created?
> 
> There is no mixture at the moment, I just needed a way to tell userspace
> that a group cannot do v2.
> 
>> The fix itself seems sane, but I'm trying to
>> figure out whether it should be marked for stable, should go in for
>> v4.11, or be queued for v4.12.  Thanks,
> 
> No need for stable.


So what is the next step with this patch?


> 
> 
>>
>> Alex
>>
>>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
>>> index cf3de91fbfe7..a7d811524092 100644
>>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>>> @@ -1335,8 +1335,16 @@ static int tce_iommu_attach_group(void *iommu_data,
>>>  
>>>  	if (!table_group->ops || !table_group->ops->take_ownership ||
>>>  			!table_group->ops->release_ownership) {
>>> +		if (container->v2) {
>>> +			ret = -EPERM;
>>> +			goto unlock_exit;
>>> +		}
>>>  		ret = tce_iommu_take_ownership(container, table_group);
>>>  	} else {
>>> +		if (!container->v2) {
>>> +			ret = -EPERM;
>>> +			goto unlock_exit;
>>> +		}
>>>  		ret = tce_iommu_take_ownership_ddw(container, table_group);
>>>  		if (!tce_groups_attached(container) && !container->tables[0])
>>>  			container->def_window_pending = true;
>>
> 
>
Alex Williamson April 4, 2017, 3:36 p.m. | #4
On Tue, 4 Apr 2017 20:12:45 +1000
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 25/03/17 23:25, Alexey Kardashevskiy wrote:
> > On 25/03/17 07:29, Alex Williamson wrote:  
> >> On Fri, 24 Mar 2017 17:44:06 +1100
> >> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >>  
> >>> The existing SPAPR TCE driver advertises both VFIO_SPAPR_TCE_IOMMU and
> >>> VFIO_SPAPR_TCE_v2_IOMMU types to the userspace and the userspace usually
> >>> picks the v2.
> >>>
> >>> Normally the userspace would create a container, attach an IOMMU group
> >>> to it and only then set the IOMMU type (which would normally be v2).
> >>>
> >>> However a specific IOMMU group may not support v2, in other words
> >>> it may not implement set_window/unset_window/take_ownership/
> >>> release_ownership and such a group should not be attached to
> >>> a v2 container.
> >>>
> >>> This adds extra checks that a new group can do what the selected IOMMU
> >>> type suggests. The userspace can then test the return value from
> >>> ioctl(VFIO_SET_IOMMU, VFIO_SPAPR_TCE_v2_IOMMU) and try
> >>> VFIO_SPAPR_TCE_IOMMU.
> >>>
> >>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>> ---
> >>>
> >>> This is one of the patches needed to do nested VFIO - for either
> >>> second level guest or DPDK running in a guest.
> >>> ---
> >>>  drivers/vfio/vfio_iommu_spapr_tce.c | 8 ++++++++
> >>>  1 file changed, 8 insertions(+)  
> >>
> >> I'm not sure I understand why you're labeling this "guest kernel", is a  
> > 
> > 
> > That is my script :)
> >   
> >> VM the only case where we can have combinations that only a subset of
> >> the groups might support v2?    
> > 
> > powernv (non-virtualized, and it runs HV KVM) host provides v2-capable
> > groups, they all the same, and a pseries host (which normally runs as a
> > guest but it can do nested KVM as well - it is called PR KVM) can do only
> > v1 (after this patch, without it - no vfio at all).
> >   
> >> What terrible things happen when such a
> >> combination is created?  
> > 
> > There is no mixture at the moment, I just needed a way to tell userspace
> > that a group cannot do v2.
> >   
> >> The fix itself seems sane, but I'm trying to
> >> figure out whether it should be marked for stable, should go in for
> >> v4.11, or be queued for v4.12.  Thanks,  
> > 
> > No need for stable.  
> 
> 
> So what is the next step with this patch?

Unless there are objections or further comments, I'll put this in my
next branch for v4.12, probably this week.  Thanks,

Alex

> >>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> >>> index cf3de91fbfe7..a7d811524092 100644
> >>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> >>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >>> @@ -1335,8 +1335,16 @@ static int tce_iommu_attach_group(void *iommu_data,
> >>>  
> >>>  	if (!table_group->ops || !table_group->ops->take_ownership ||
> >>>  			!table_group->ops->release_ownership) {
> >>> +		if (container->v2) {
> >>> +			ret = -EPERM;
> >>> +			goto unlock_exit;
> >>> +		}
> >>>  		ret = tce_iommu_take_ownership(container, table_group);
> >>>  	} else {
> >>> +		if (!container->v2) {
> >>> +			ret = -EPERM;
> >>> +			goto unlock_exit;
> >>> +		}
> >>>  		ret = tce_iommu_take_ownership_ddw(container, table_group);
> >>>  		if (!tce_groups_attached(container) && !container->tables[0])
> >>>  			container->def_window_pending = true;  
> >>  
> > 
> >   
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index cf3de91fbfe7..a7d811524092 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -1335,8 +1335,16 @@  static int tce_iommu_attach_group(void *iommu_data,
 
 	if (!table_group->ops || !table_group->ops->take_ownership ||
 			!table_group->ops->release_ownership) {
+		if (container->v2) {
+			ret = -EPERM;
+			goto unlock_exit;
+		}
 		ret = tce_iommu_take_ownership(container, table_group);
 	} else {
+		if (!container->v2) {
+			ret = -EPERM;
+			goto unlock_exit;
+		}
 		ret = tce_iommu_take_ownership_ddw(container, table_group);
 		if (!tce_groups_attached(container) && !container->tables[0])
 			container->def_window_pending = true;