[kernel,2/3] vfio_pci: Allow regions to add own capabilities

Message ID 20181015094233.1324-3-aik@ozlabs.ru
State New
Series
  • vfio/spapr_tce: Reworks for NVIDIA V100 + P9 passthrough (part 3)

Commit Message

Alexey Kardashevskiy Oct. 15, 2018, 9:42 a.m.
VFIO regions already support region capabilities with a limited set of
fields. However, a subdriver might have to report additional bits to
userspace.

This adds an add_capability() hook to vfio_pci_regops.

This targets Witherspoon POWER9 machines, which have multiple
interconnected NVIDIA V100 GPUs with coherent RAM; each GPU's RAM
is mapped to the system bus and to each GPU's internal system bus, and
the GPUs use this for DMA routing, as DMA traffic can go via any
of many NVLink2 links (GPU-GPU or GPU-CPU) or even stay local within a
GPU.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

This is based on top of "vfio_pci: Allow mapping extra regions"
---
 drivers/vfio/pci/vfio_pci_private.h | 3 +++
 drivers/vfio/pci/vfio_pci.c         | 6 ++++++
 2 files changed, 9 insertions(+)
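
For readers wondering what "report to the userspace additional bits" looks like from the other side, here is a minimal userspace sketch of walking a capability chain such as the one returned alongside region info. It is an illustration only: `struct cap_header` is a local mirror of the header layout (real code would use `struct vfio_info_cap_header` from `<linux/vfio.h>`), and `find_cap()` is a hypothetical helper, not part of any VFIO library. It assumes each `next` field is an offset from the start of the info buffer and that 0 terminates the chain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Local mirror of the capability header layout (assumption for this
 * sketch; real code would use struct vfio_info_cap_header).
 */
struct cap_header {
	uint16_t id;		/* capability type */
	uint16_t version;	/* version of this capability */
	uint32_t next;		/* offset of next header from buffer start, 0 = end */
};

/*
 * Hypothetical helper: search the chain starting at first_off for a
 * capability with the given id. Returns its offset within buf, or 0 if
 * the capability is absent or the chain is malformed.
 */
static size_t find_cap(const uint8_t *buf, size_t len,
		       uint32_t first_off, uint16_t id)
{
	size_t off = first_off;
	size_t steps = 0;

	while (off != 0) {
		struct cap_header hdr;

		if (off > len || len - off < sizeof(hdr))
			return 0;	/* header would run past the buffer */
		if (++steps > len)
			return 0;	/* more hops than bytes: chain loops */

		/* memcpy avoids unaligned access into the byte buffer */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.id == id)
			return off;
		off = hdr.next;
	}
	return 0;
}
```

A caller would pass the buffer filled in by the region-info ioctl, its size, and the first capability offset reported in it; a capability appended by a region's add_capability() hook would then be found like any other entry in the chain.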

Comments

David Gibson Nov. 8, 2018, 6:21 a.m. | #1
On Mon, Oct 15, 2018 at 08:42:32PM +1100, Alexey Kardashevskiy wrote:
> VFIO regions already support region capabilities with a limited set of
> fields. However, a subdriver might have to report additional bits to
> userspace.
>
> This adds an add_capability() hook to vfio_pci_regops.
>
> This targets Witherspoon POWER9 machines, which have multiple
> interconnected NVIDIA V100 GPUs with coherent RAM; each GPU's RAM
> is mapped to the system bus and to each GPU's internal system bus, and
> the GPUs use this for DMA routing, as DMA traffic can go via any
> of many NVLink2 links (GPU-GPU or GPU-CPU) or even stay local within a
> GPU.

This description doesn't really make clear how per-region capabilities
are relevant to these devices.

> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> This is based on top of "vfio_pci: Allow mapping extra regions"
> ---
>  drivers/vfio/pci/vfio_pci_private.h | 3 +++
>  drivers/vfio/pci/vfio_pci.c         | 6 ++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> index 86aab05..93c1738 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -62,6 +62,9 @@ struct vfio_pci_regops {
>  	int	(*mmap)(struct vfio_pci_device *vdev,
>  			struct vfio_pci_region *region,
>  			struct vm_area_struct *vma);
> +	int	(*add_capability)(struct vfio_pci_device *vdev,
> +				  struct vfio_pci_region *region,
> +				  struct vfio_info_cap *caps);
>  };
>  
>  struct vfio_pci_region {
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 7923314..4a3b93e 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -759,6 +759,12 @@ static long vfio_pci_ioctl(void *device_data,
>  			if (ret)
>  				return ret;
>  
> +			if (vdev->region[i].ops->add_capability) {
> +				ret = vdev->region[i].ops->add_capability(vdev,
> +						&vdev->region[i], &caps);
> +				if (ret)
> +					return ret;
> +			}
>  		}
>  		}
>
Alexey Kardashevskiy Nov. 8, 2018, 6:48 a.m. | #2
On 08/11/2018 17:21, David Gibson wrote:
> On Mon, Oct 15, 2018 at 08:42:32PM +1100, Alexey Kardashevskiy wrote:
>> VFIO regions already support region capabilities with a limited set of
>> fields. However, a subdriver might have to report additional bits to
>> userspace.
>>
>> This adds an add_capability() hook to vfio_pci_regops.
>>
>> This targets Witherspoon POWER9 machines, which have multiple
>> interconnected NVIDIA V100 GPUs with coherent RAM; each GPU's RAM
>> is mapped to the system bus and to each GPU's internal system bus, and
>> the GPUs use this for DMA routing, as DMA traffic can go via any
>> of many NVLink2 links (GPU-GPU or GPU-CPU) or even stay local within a
>> GPU.
> 
> This description doesn't really make clear how per-region capabilities
> are relevant to these devices.


I am confused. This patch just adds a hook, and the device specifics are
explained in the next patch where they are used...


> 
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> This is based on top of "vfio_pci: Allow mapping extra regions"
>> ---
>>  drivers/vfio/pci/vfio_pci_private.h | 3 +++
>>  drivers/vfio/pci/vfio_pci.c         | 6 ++++++
>>  2 files changed, 9 insertions(+)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
>> index 86aab05..93c1738 100644
>> --- a/drivers/vfio/pci/vfio_pci_private.h
>> +++ b/drivers/vfio/pci/vfio_pci_private.h
>> @@ -62,6 +62,9 @@ struct vfio_pci_regops {
>>  	int	(*mmap)(struct vfio_pci_device *vdev,
>>  			struct vfio_pci_region *region,
>>  			struct vm_area_struct *vma);
>> +	int	(*add_capability)(struct vfio_pci_device *vdev,
>> +				  struct vfio_pci_region *region,
>> +				  struct vfio_info_cap *caps);
>>  };
>>  
>>  struct vfio_pci_region {
>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>> index 7923314..4a3b93e 100644
>> --- a/drivers/vfio/pci/vfio_pci.c
>> +++ b/drivers/vfio/pci/vfio_pci.c
>> @@ -759,6 +759,12 @@ static long vfio_pci_ioctl(void *device_data,
>>  			if (ret)
>>  				return ret;
>>  
>> +			if (vdev->region[i].ops->add_capability) {
>> +				ret = vdev->region[i].ops->add_capability(vdev,
>> +						&vdev->region[i], &caps);
>> +				if (ret)
>> +					return ret;
>> +			}
>>  		}
>>  		}
>>  
>
David Gibson Nov. 8, 2018, 7:16 a.m. | #3
On Thu, Nov 08, 2018 at 05:48:58PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 08/11/2018 17:21, David Gibson wrote:
> > On Mon, Oct 15, 2018 at 08:42:32PM +1100, Alexey Kardashevskiy wrote:
> >> VFIO regions already support region capabilities with a limited set of
> >> fields. However, a subdriver might have to report additional bits to
> >> userspace.
> >>
> >> This adds an add_capability() hook to vfio_pci_regops.
> >>
> >> This targets Witherspoon POWER9 machines, which have multiple
> >> interconnected NVIDIA V100 GPUs with coherent RAM; each GPU's RAM
> >> is mapped to the system bus and to each GPU's internal system bus, and
> >> the GPUs use this for DMA routing, as DMA traffic can go via any
> >> of many NVLink2 links (GPU-GPU or GPU-CPU) or even stay local within a
> >> GPU.
> > 
> > This description doesn't really make clear how per-region capabilities
> > are relevant to these devices.
> 
> 
> I am confused. This patch just adds a hook, and the device specifics are
> explained in the next patch where they are used...

Well, my point is the last paragraph of the commit message appears to
be a rationale for this change in terms of what's needed for the GPU
devices.  But how those described properties of the GPU mean that
region capabilities are useful / necessary isn't made clear.  If it's
not meant to be a rationale, I'm not sure what it's doing there at
all.

Patch

diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 86aab05..93c1738 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -62,6 +62,9 @@  struct vfio_pci_regops {
 	int	(*mmap)(struct vfio_pci_device *vdev,
 			struct vfio_pci_region *region,
 			struct vm_area_struct *vma);
+	int	(*add_capability)(struct vfio_pci_device *vdev,
+				  struct vfio_pci_region *region,
+				  struct vfio_info_cap *caps);
 };
 
 struct vfio_pci_region {
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 7923314..4a3b93e 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -759,6 +759,12 @@  static long vfio_pci_ioctl(void *device_data,
 			if (ret)
 				return ret;
 
+			if (vdev->region[i].ops->add_capability) {
+				ret = vdev->region[i].ops->add_capability(vdev,
+						&vdev->region[i], &caps);
+				if (ret)
+					return ret;
+			}
 		}
 		}
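
The control flow of the second hunk can be modelled outside the kernel. The following is a minimal standalone sketch, not the driver code: `struct cap_chain`, `struct region`, `struct region_ops`, `demo_add_capability()` and `fill_region_caps()` are all hypothetical stand-ins invented for illustration. It shows the two properties the hunk relies on: the hook is optional (a NULL pointer is skipped), and a non-zero return from the hook is propagated to the caller unchanged.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the capability chain being built. */
struct cap_chain {
	int count;	/* number of capabilities appended so far */
};

struct region;

/* Hypothetical stand-in for the ops table with the new optional hook. */
struct region_ops {
	/* May be NULL when a region has nothing extra to report. */
	int (*add_capability)(struct region *region, struct cap_chain *caps);
};

struct region {
	const struct region_ops *ops;
	int fail;	/* demo knob: force the hook to return an error */
};

/* Example hook: appends one capability, or fails if asked to. */
static int demo_add_capability(struct region *region, struct cap_chain *caps)
{
	if (region->fail)
		return -ENOMEM;
	caps->count++;
	return 0;
}

static const struct region_ops ops_with_hook = {
	.add_capability = demo_add_capability,
};

static const struct region_ops ops_without_hook = {
	.add_capability = NULL,
};

/*
 * Same shape as the hunk: after the generic code has added its own
 * capability, call the region's hook if it exists and return its error.
 */
static int fill_region_caps(struct region *region, struct cap_chain *caps)
{
	int ret;

	caps->count++;	/* stands in for whatever the preceding code adds */

	if (region->ops->add_capability) {
		ret = region->ops->add_capability(region, caps);
		if (ret)
			return ret;
	}
	return 0;
}
```

With `ops_with_hook` the chain ends up with two entries, with `ops_without_hook` only the generic one, and with `fail` set the caller sees `-ENOMEM`. Note that, like the hunk, the sketch dereferences `ops` unconditionally, so it assumes every region of this kind has an ops table.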