[1/2] powernv/npu: Do a PID GPU TLB flush when invalidating a large address range

Message ID 20180417091129.23069-1-alistair@popple.id.au (mailing list archive)
State Accepted
Commit d0cf9b561ca97d5245bb9e0c4774b7fadd897d67
Headers show
Series [1/2] powernv/npu: Do a PID GPU TLB flush when invalidating a large address range

Commit Message

Alistair Popple April 17, 2018, 9:11 a.m. UTC
The NPU has a limited number of address translation shootdown (ATSD)
registers and the GPU has limited bandwidth to process ATSDs. This can
result in contention of ATSD registers leading to soft lockups on some
threads, particularly when invalidating a large address range in
pnv_npu2_mn_invalidate_range().

At some threshold it becomes more efficient to flush the entire GPU TLB for
the given MM context (PID) than individually flushing each address in the
range. This patch will result in ranges greater than 2MB being converted
from 32+ ATSDs into a single ATSD which will flush the TLB for the given
PID on each GPU.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
---
 arch/powerpc/platforms/powernv/npu-dma.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

Comments

Balbir Singh April 17, 2018, 9:17 a.m. UTC | #1
On Tue, Apr 17, 2018 at 7:11 PM, Alistair Popple <alistair@popple.id.au> wrote:
> The NPU has a limited number of address translation shootdown (ATSD)
> registers and the GPU has limited bandwidth to process ATSDs. This can
> result in contention of ATSD registers leading to soft lockups on some
> threads, particularly when invalidating a large address range in
> pnv_npu2_mn_invalidate_range().
>
> At some threshold it becomes more efficient to flush the entire GPU TLB for
> the given MM context (PID) than individually flushing each address in the
> range. This patch will result in ranges greater than 2MB being converted
> from 32+ ATSDs into a single ATSD which will flush the TLB for the given
> PID on each GPU.
>
> Signed-off-by: Alistair Popple <alistair@popple.id.au>
> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index 94801d8e7894..dc34662e9df9 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -40,6 +40,13 @@
>  DEFINE_SPINLOCK(npu_context_lock);
>
>  /*
> + * When an address shootdown range exceeds this threshold we invalidate the
> + * entire TLB on the GPU for the given PID rather than each specific address in
> + * the range.
> + */
> +#define ATSD_THRESHOLD (2*1024*1024)
> +
> +/*
>   * Other types of TCE cache invalidation are not functional in the
>   * hardware.
>   */
> @@ -675,11 +682,19 @@ static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
>         struct npu_context *npu_context = mn_to_npu_context(mn);
>         unsigned long address;
>
> -       for (address = start; address < end; address += PAGE_SIZE)
> -               mmio_invalidate(npu_context, 1, address, false);
> +       if (end - start > ATSD_THRESHOLD) {

I'm nitpicking, but (end - start) > ATSD_THRESHOLD is clearer

> +               /*
> +                * Just invalidate the entire PID if the address range is too
> +                * large.
> +                */
> +               mmio_invalidate(npu_context, 0, 0, true);
> +       } else {
> +               for (address = start; address < end; address += PAGE_SIZE)
> +                       mmio_invalidate(npu_context, 1, address, false);
>
> -       /* Do the flush only on the final address == end */
> -       mmio_invalidate(npu_context, 1, address, true);
> +               /* Do the flush only on the final address == end */
> +               mmio_invalidate(npu_context, 1, address, true);
> +       }
>  }
>

Acked-by: Balbir Singh <bsingharora@gmail.com>
Balbir Singh April 17, 2018, 10:25 p.m. UTC | #2
On Tue, Apr 17, 2018 at 7:17 PM, Balbir Singh <bsingharora@gmail.com> wrote:
> On Tue, Apr 17, 2018 at 7:11 PM, Alistair Popple <alistair@popple.id.au> wrote:
>> The NPU has a limited number of address translation shootdown (ATSD)
>> registers and the GPU has limited bandwidth to process ATSDs. This can
>> result in contention of ATSD registers leading to soft lockups on some
>> threads, particularly when invalidating a large address range in
>> pnv_npu2_mn_invalidate_range().
>>
>> At some threshold it becomes more efficient to flush the entire GPU TLB for
>> the given MM context (PID) than individually flushing each address in the
>> range. This patch will result in ranges greater than 2MB being converted
>> from 32+ ATSDs into a single ATSD which will flush the TLB for the given
>> PID on each GPU.
>>
>> Signed-off-by: Alistair Popple <alistair@popple.id.au>
>> +       }
>>  }
>>
>
> Acked-by: Balbir Singh <bsingharora@gmail.com>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Alistair Popple April 20, 2018, 3:51 a.m. UTC | #3
Sorry, forgot to include:

Fixes: 1ab66d1fbada ("powerpc/powernv: Introduce address translation services for Nvlink2")

Thanks

On Tuesday, 17 April 2018 7:11:28 PM AEST Alistair Popple wrote:
> The NPU has a limited number of address translation shootdown (ATSD)
> registers and the GPU has limited bandwidth to process ATSDs. This can
> result in contention of ATSD registers leading to soft lockups on some
> threads, particularly when invalidating a large address range in
> pnv_npu2_mn_invalidate_range().
> 
> At some threshold it becomes more efficient to flush the entire GPU TLB for
> the given MM context (PID) than individually flushing each address in the
> range. This patch will result in ranges greater than 2MB being converted
> from 32+ ATSDs into a single ATSD which will flush the TLB for the given
> PID on each GPU.
> 
> Signed-off-by: Alistair Popple <alistair@popple.id.au>
> ---
>  arch/powerpc/platforms/powernv/npu-dma.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index 94801d8e7894..dc34662e9df9 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -40,6 +40,13 @@
>  DEFINE_SPINLOCK(npu_context_lock);
>  
>  /*
> + * When an address shootdown range exceeds this threshold we invalidate the
> + * entire TLB on the GPU for the given PID rather than each specific address in
> + * the range.
> + */
> +#define ATSD_THRESHOLD (2*1024*1024)
> +
> +/*
>   * Other types of TCE cache invalidation are not functional in the
>   * hardware.
>   */
> @@ -675,11 +682,19 @@ static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
>  	struct npu_context *npu_context = mn_to_npu_context(mn);
>  	unsigned long address;
>  
> -	for (address = start; address < end; address += PAGE_SIZE)
> -		mmio_invalidate(npu_context, 1, address, false);
> +	if (end - start > ATSD_THRESHOLD) {
> +		/*
> +		 * Just invalidate the entire PID if the address range is too
> +		 * large.
> +		 */
> +		mmio_invalidate(npu_context, 0, 0, true);
> +	} else {
> +		for (address = start; address < end; address += PAGE_SIZE)
> +			mmio_invalidate(npu_context, 1, address, false);
>  
> -	/* Do the flush only on the final address == end */
> -	mmio_invalidate(npu_context, 1, address, true);
> +		/* Do the flush only on the final address == end */
> +		mmio_invalidate(npu_context, 1, address, true);
> +	}
>  }
>  
>  static const struct mmu_notifier_ops nv_nmmu_notifier_ops = {
>
Michael Ellerman April 24, 2018, 3:48 a.m. UTC | #4
On Tue, 2018-04-17 at 09:11:28 UTC, Alistair Popple wrote:
> The NPU has a limited number of address translation shootdown (ATSD)
> registers and the GPU has limited bandwidth to process ATSDs. This can
> result in contention of ATSD registers leading to soft lockups on some
> threads, particularly when invalidating a large address range in
> pnv_npu2_mn_invalidate_range().
> 
> At some threshold it becomes more efficient to flush the entire GPU TLB for
> the given MM context (PID) than individually flushing each address in the
> range. This patch will result in ranges greater than 2MB being converted
> from 32+ ATSDs into a single ATSD which will flush the TLB for the given
> PID on each GPU.
> 
> Signed-off-by: Alistair Popple <alistair@popple.id.au>
> Acked-by: Balbir Singh <bsingharora@gmail.com>
> Tested-by: Balbir Singh <bsingharora@gmail.com>

Patch 1 applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/d0cf9b561ca97d5245bb9e0c4774b7

cheers

Patch

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 94801d8e7894..dc34662e9df9 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -40,6 +40,13 @@ 
 DEFINE_SPINLOCK(npu_context_lock);
 
 /*
+ * When an address shootdown range exceeds this threshold we invalidate the
+ * entire TLB on the GPU for the given PID rather than each specific address in
+ * the range.
+ */
+#define ATSD_THRESHOLD (2*1024*1024)
+
+/*
  * Other types of TCE cache invalidation are not functional in the
  * hardware.
  */
@@ -675,11 +682,19 @@  static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
 	struct npu_context *npu_context = mn_to_npu_context(mn);
 	unsigned long address;
 
-	for (address = start; address < end; address += PAGE_SIZE)
-		mmio_invalidate(npu_context, 1, address, false);
+	if (end - start > ATSD_THRESHOLD) {
+		/*
+		 * Just invalidate the entire PID if the address range is too
+		 * large.
+		 */
+		mmio_invalidate(npu_context, 0, 0, true);
+	} else {
+		for (address = start; address < end; address += PAGE_SIZE)
+			mmio_invalidate(npu_context, 1, address, false);
 
-	/* Do the flush only on the final address == end */
-	mmio_invalidate(npu_context, 1, address, true);
+		/* Do the flush only on the final address == end */
+		mmio_invalidate(npu_context, 1, address, true);
+	}
 }
 
 static const struct mmu_notifier_ops nv_nmmu_notifier_ops = {