diff mbox series

[v6,04/13] drm/amdkfd: add SPM support for SVM

Message ID 20210813063150.2938-5-alex.sierra@amd.com
State Superseded
Headers show
Series Support DEVICE_GENERIC memory in migrate_vma_* | expand

Commit Message

Sierra Guiza, Alejandro (Alex) Aug. 13, 2021, 6:31 a.m. UTC
When CPU is connected throug XGMI, it has coherent
access to VRAM resource. In this case that resource
is taken from a table in the device gmc aperture base.
This resource is used along with the device type, which could
be DEVICE_PRIVATE or DEVICE_GENERIC to create the device
page map region.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Comments

Christoph Hellwig Aug. 15, 2021, 9:10 a.m. UTC | #1
> @@ -880,17 +881,22 @@ int svm_migrate_init(struct amdgpu_device *adev)
>  	 * should remove reserved size
>  	 */
>  	size = ALIGN(adev->gmc.real_vram_size, 2ULL << 20);
> -	res = devm_request_free_mem_region(adev->dev, &iomem_resource, size);
> +	if (xgmi_connected_to_cpu)
> +		res = lookup_resource(&iomem_resource, adev->gmc.aper_base);
> +	else
> +		res = devm_request_free_mem_region(adev->dev, &iomem_resource, size);
> +

Can you explain what the point of the lookup_resource is here? res->start
is obviously identical to the start value you pass in.  So this is used
as a way to query the length, but I'm pretty sure the driver must
already know that as it inserted the resource itself, right?

On a slightly higher level comment svm_migrate_init is a bit of a mess
with all the if/else already, and with the above addressed will become
a bit more.  I think splitting it into a device private and device
generic case would probably help people finding it to understand the code
much better later on.  Even more so with a useful comment.
Felix Kuehling Aug. 16, 2021, 6:54 p.m. UTC | #2
Am 2021-08-15 um 5:10 a.m. schrieb Christoph Hellwig:
>> @@ -880,17 +881,22 @@ int svm_migrate_init(struct amdgpu_device *adev)
>>  	 * should remove reserved size
>>  	 */
>>  	size = ALIGN(adev->gmc.real_vram_size, 2ULL << 20);
>> -	res = devm_request_free_mem_region(adev->dev, &iomem_resource, size);
>> +	if (xgmi_connected_to_cpu)
>> +		res = lookup_resource(&iomem_resource, adev->gmc.aper_base);
>> +	else
>> +		res = devm_request_free_mem_region(adev->dev, &iomem_resource, size);
>> +
> Can you explain what the point of the lookup_resource is here? res->start
> is obviously identical to the start value you pass in.  So this is used
> as a way to query the length, but I'm pretty sure the driver must
> already know that as it inserted the resource itself, right?

I think you're right. We only need the start and end address from
lookup_resource and we already know that anyway. It means we can drop
patch 3 from the series.

Just to be sure, we'll confirm that the end address determined by our
driver matches the one from lookup_resource (coming from the system
address map in the system BIOS). If there were a mismatch, it would
probably be a bug (in the driver or the BIOS) that we'd need to fix anyway.


>
> On a slightly higher level comment svm_migrate_init is a bit of a mess
> with all the if/else already, and with the above addressed will become
> a bit more.  I think splitting it into a device private and device
> generic case would probably help people finding it to understand the code
> much better later on.  Even more so with a useful comment.

I don't really see the "mess" you're talking about. Including the above,
there are only 3 conditional statements in that function that are not
error-handling related:

        /* Page migration works on Vega10 or newer */
        if (kfddev->device_info->asic_family < CHIP_VEGA10)
                return -EINVAL;
...
        if (xgmi_connected_to_cpu)
                res = lookup_resource(&iomem_resource, adev->gmc.aper_base);
        else
                res = devm_request_free_mem_region(adev->dev, &iomem_resource, size);
...
        pgmap->type = xgmi_connected_to_cpu ?
                                MEMORY_DEVICE_GENERIC : MEMORY_DEVICE_PRIVATE;


Regards,
  Felix
Christoph Hellwig Aug. 17, 2021, 5:47 a.m. UTC | #3
On Mon, Aug 16, 2021 at 02:54:30PM -0400, Felix Kuehling wrote:
> I think you're right. We only need the start and end address from
> lookup_resource and we already know that anyway. It means we can drop
> patch 3 from the series.
> 
> Just to be sure, we'll confirm that the end address determined by our
> driver matches the one from lookup_resource (coming from the system
> address map in the system BIOS). If there were a mismatch, it would
> probably be a bug (in the driver or the BIOS) that we'd need to fix anyway.

Or rather that the driver claimed area is smaller or the same as the
bios range.  No harm (except for potential peformance implications) when
you don't use all of it.

> I don't really see the "mess" you're talking about. Including the above,
> there are only 3 conditional statements in that function that are not
> error-handling related:
> 
>         /* Page migration works on Vega10 or newer */
>         if (kfddev->device_info->asic_family < CHIP_VEGA10)
>                 return -EINVAL;
> ...
>         if (xgmi_connected_to_cpu)
>                 res = lookup_resource(&iomem_resource, adev->gmc.aper_base);
>         else
>                 res = devm_request_free_mem_region(adev->dev, &iomem_resource, size);
> ...
>         pgmap->type = xgmi_connected_to_cpu ?
>                                 MEMORY_DEVICE_GENERIC : MEMORY_DEVICE_PRIVATE;
> 

Plus the devm_release_mem_region error handling that is currently missing.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index dab290a4d19d..24a8b6d4f947 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -868,6 +868,7 @@  int svm_migrate_init(struct amdgpu_device *adev)
 	struct resource *res;
 	unsigned long size;
 	void *r;
+	bool xgmi_connected_to_cpu = adev->gmc.xgmi.connected_to_cpu;
 
 	/* Page migration works on Vega10 or newer */
 	if (kfddev->device_info->asic_family < CHIP_VEGA10)
@@ -880,17 +881,22 @@  int svm_migrate_init(struct amdgpu_device *adev)
 	 * should remove reserved size
 	 */
 	size = ALIGN(adev->gmc.real_vram_size, 2ULL << 20);
-	res = devm_request_free_mem_region(adev->dev, &iomem_resource, size);
+	if (xgmi_connected_to_cpu)
+		res = lookup_resource(&iomem_resource, adev->gmc.aper_base);
+	else
+		res = devm_request_free_mem_region(adev->dev, &iomem_resource, size);
+
 	if (IS_ERR(res))
 		return -ENOMEM;
 
-	pgmap->type = MEMORY_DEVICE_PRIVATE;
 	pgmap->nr_range = 1;
 	pgmap->range.start = res->start;
 	pgmap->range.end = res->end;
+	pgmap->type = xgmi_connected_to_cpu ?
+				MEMORY_DEVICE_GENERIC : MEMORY_DEVICE_PRIVATE;
 	pgmap->ops = &svm_migrate_pgmap_ops;
 	pgmap->owner = SVM_ADEV_PGMAP_OWNER(adev);
-	pgmap->flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
+	pgmap->flags = 0;
 	r = devm_memremap_pages(adev->dev, pgmap);
 	if (IS_ERR(r)) {
 		pr_err("failed to register HMM device memory\n");
@@ -914,6 +920,7 @@  void svm_migrate_fini(struct amdgpu_device *adev)
 	struct dev_pagemap *pgmap = &adev->kfd.dev->pgmap;
 
 	devm_memunmap_pages(adev->dev, pgmap);
-	devm_release_mem_region(adev->dev, pgmap->range.start,
-				pgmap->range.end - pgmap->range.start + 1);
+	if (pgmap->type == MEMORY_DEVICE_PRIVATE)
+		devm_release_mem_region(adev->dev, pgmap->range.start,
+					pgmap->range.end - pgmap->range.start + 1);
 }