diff mbox series

[2/2,nvptx,PR83589] Workaround for branch-around-nothing JIT bug

Message ID 34fb1d00-dc5d-04f2-d601-ee6fe710ac3b@mentor.com
State New
Headers show
Series None | expand

Commit Message

Tom de Vries Jan. 24, 2018, 10:41 a.m. UTC
Hi,

this patch adds a workaround for the nvptx target JIT bug PR83589 - 
"[nvptx] mode-transitions.c and private-variables.{c,f90} execution 
FAILs at GOMP_NVPTX_JIT=-O0".


When compiling a branch-around-nothing (where the branch is warp 
neutering, so it's a divergent branch):
...
   .reg .pred %r36;
   {
     .reg .u32 %x;
     mov.u32 %x,%tid.x;
     setp.ne.u32 %r36,%x,0;
   }

   @ %r36 bra $L5;
   $L5:
...

The JIT fails to generate a convergence point here:
...
          /*0128*/               @P0 BRA `(.L_1);
.L_1:
...

Consequently, we execute subsequent code in divergent mode, and when 
executing a shfl.idx a bit later we run into the undefined behaviour 
that shfl.idx has when executing in divergent mode.

The workaround detects branch-around-nothing, and inserts a ptx 
operation that does nothing (I'm calling it a fake nop, I haven't been 
able to come up with a better term yet):
...
   @ %r36 bra $L5;
     {
       .reg .u32 %nop_src;
       .reg .u32 %nop_dst;
       mov.u32 %nop_dst, %nop_src;
     }
   $L5:
...
which makes the test pass, because then we generate a convergence point 
here at .L1:
...
         /*0128*/                   SSY `(.L_1);
         /*0130*/               @P0 SYNC (*"TARGET= .L_1 "*);
         /*0138*/                   SYNC (*"TARGET= .L_1 "*);
.L_1:
...

The workaround is not minimal given that it inserts the fake nop in all 
branch-around-nothings it detects, not just the warp neutering ones, but 
I think this is more robust than trying to identify the warp neutering 
branches. Furthermore, I'm not going for optimality here anyway. The 
optimal way to fix this is making sure we don't generate 
branch-around-nothing, but that's for stage1.

Build and reg-tested on x86_64 with nvptx accelerator.

I'd like to commit in stage4, but I'd appreciate a review of the code. 
Does the patch look OK?

Thanks,
- Tom

Comments

Richard Biener Jan. 24, 2018, 11 a.m. UTC | #1
On Wed, 24 Jan 2018, Tom de Vries wrote:

> Hi,
> 
> this patch adds a workaround for the nvptx target JIT bug PR83589 - "[nvptx]
> mode-transitions.c and private-variables.{c,f90} execution FAILs at
> GOMP_NVPTX_JIT=-O0".
> 
> 
> When compiling a branch-around-nothing (where the branch is warp neutering, so
> it's a divergent branch):
> ...
>   .reg .pred %r36;
>   {
>     .reg .u32 %x;
>     mov.u32 %x,%tid.x;
>     setp.ne.u32 %r36,%x,0;
>   }
> 
>   @ %r36 bra $L5;
>   $L5:
> ...
> 
> The JIT fails to generate a convergence point here:
> ...
>          /*0128*/               @P0 BRA `(.L_1);
> .L_1:
> ...
> 
> Consequently, we execute subsequent code in divergent mode, and when executing
> a shfl.idx a bit later we run into the undefined behaviour that shfl.idx has
> when executing in divergent mode.
> 
> The workaround detects branch-around-nothing, and inserts a ptx operation that
> does nothing (I'm calling it a fake nop, I haven't been able to come up with a
> better term yet):
> ...
>   @ %r36 bra $L5;
>     {
>       .reg .u32 %nop_src;
>       .reg .u32 %nop_dst;
>       mov.u32 %nop_dst, %nop_src;
>     }
>   $L5:
> ...
> which makes the test pass, because then we generate a convergence point here
> at .L1:
> ...
>         /*0128*/                   SSY `(.L_1);
>         /*0130*/               @P0 SYNC (*"TARGET= .L_1 "*);
>         /*0138*/                   SYNC (*"TARGET= .L_1 "*);
> .L_1:
> ...
> 
> The workaround is not minimal given that it inserts the fake nop in all
> branch-around-nothings it detects, not just the warp neutering ones, but I
> think this is more robust than trying to identify the warp neutering branches.
> Furthermore, I'm not going for optimality here anyway. The optimal way to fix
> this is making sure we don't generate branch-around-nothing, but that's for
> stage1.
> 
> Build and reg-tested on x86_64 with nvptx accelerator.
> 
> I'd like to commit in stage4, but I'd appreciate a review of the code. Does
> the patch look OK?

Ok for stage4, but this isn't a review ;)

Richard.

> Thanks,
> - Tom
>
Jakub Jelinek Jan. 24, 2018, 11:03 a.m. UTC | #2
On Wed, Jan 24, 2018 at 11:41:45AM +0100, Tom de Vries wrote:
> +/* Insert a dummy ptx insn when encountering a branch to a label with no ptx
> +   insn inbetween the branch and the label.  This works around a JIT bug
> +   observed at driver version 384.111, at -O0 for sm_50.  */
> +
> +static void
> +prevent_branch_around_nothing (void)
> +{
> +  rtx_insn *seen_label = 0;
> +    for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
> +      {
> +	if (seen_label == 0)
> +	  {
> +	    if (INSN_P (insn) && condjump_p (insn))
> +	      seen_label = label_ref_label (nvptx_condjump_label (insn, false));
> +
> +	    continue;
> +	  }
> +
> +	if (NOTE_P (insn))
> +	  continue;

I'm afraid for review I don't know the backend enough.
I'd just suggest using NULL instead of 0 for pointers, i.e. clearing
seen_label or comparisons of seen_label against NULL, and wonder if
DEBUG_INSNs are guaranteed not to appear here.  If not, you'd need to
skip them too.

	Jakub
Tom de Vries Jan. 24, 2018, 1:56 p.m. UTC | #3
On 01/24/2018 12:03 PM, Jakub Jelinek wrote:
> On Wed, Jan 24, 2018 at 11:41:45AM +0100, Tom de Vries wrote:
>> +/* Insert a dummy ptx insn when encountering a branch to a label with no ptx
>> +   insn inbetween the branch and the label.  This works around a JIT bug
>> +   observed at driver version 384.111, at -O0 for sm_50.  */
>> +
>> +static void
>> +prevent_branch_around_nothing (void)
>> +{
>> +  rtx_insn *seen_label = 0;
>> +    for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
>> +      {
>> +	if (seen_label == 0)
>> +	  {
>> +	    if (INSN_P (insn) && condjump_p (insn))
>> +	      seen_label = label_ref_label (nvptx_condjump_label (insn, false));
>> +
>> +	    continue;
>> +	  }
>> +
>> +	if (NOTE_P (insn))
>> +	  continue;
> 
> I'm afraid for review I don't know the backend enough. > I'd just suggest using NULL instead of 0 for pointers, i.e. clearing
> seen_label or comparisons of seen_label against NULL,

Done.

> and wonder if
> DEBUG_INSNs are guaranteed not to appear here.  If not, you'd need to
> skip them too.
> 

Done.

Retested and committed as attached.

Thanks,
- Tom
[nvptx, PR83589] Workaround for branch-around-nothing JIT bug

2018-01-23  Tom de Vries  <tom@codesourcery.com>

	PR target/83589
	* config/nvptx/nvptx.c (WORKAROUND_PTXJIT_BUG_2): Define to 1.
	(nvptx_pc_set, nvptx_condjump_label): New function. Copy from jump.c.
	Add strict parameter.
	(prevent_branch_around_nothing): Insert dummy insn between branch to
	label and label with no ptx insn inbetween.
	* config/nvptx/nvptx.md (define_insn "fake_nop"): New insn.

	* testsuite/libgomp.oacc-c-c++-common/pr83589.c: New test.

---
 gcc/config/nvptx/nvptx.c                           | 92 ++++++++++++++++++++++
 gcc/config/nvptx/nvptx.md                          |  9 +++
 .../testsuite/libgomp.oacc-c-c++-common/pr83589.c  | 21 +++++
 3 files changed, 122 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 3516740..d848412 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -78,6 +78,7 @@
 #include "target-def.h"
 
 #define WORKAROUND_PTXJIT_BUG 1
+#define WORKAROUND_PTXJIT_BUG_2 1
 
 /* The various PTX memory areas an object might reside in.  */
 enum nvptx_data_area
@@ -4363,6 +4364,93 @@ nvptx_neuter_pars (parallel *par, unsigned modes, unsigned outer)
     nvptx_neuter_pars (par->next, modes, outer);
 }
 
+#if WORKAROUND_PTXJIT_BUG_2
+/* Variant of pc_set that only requires JUMP_P (INSN) if STRICT.  This variant
+   is needed in the nvptx target because the branches generated for
+   parititioning are NONJUMP_INSN_P, not JUMP_P.  */
+
+static rtx
+nvptx_pc_set (const rtx_insn *insn, bool strict = true)
+{
+  rtx pat;
+  if ((strict && !JUMP_P (insn))
+      || (!strict && !INSN_P (insn)))
+    return NULL_RTX;
+  pat = PATTERN (insn);
+
+  /* The set is allowed to appear either as the insn pattern or
+     the first set in a PARALLEL.  */
+  if (GET_CODE (pat) == PARALLEL)
+    pat = XVECEXP (pat, 0, 0);
+  if (GET_CODE (pat) == SET && GET_CODE (SET_DEST (pat)) == PC)
+    return pat;
+
+  return NULL_RTX;
+}
+
+/* Variant of condjump_label that only requires JUMP_P (INSN) if STRICT.  */
+
+static rtx
+nvptx_condjump_label (const rtx_insn *insn, bool strict = true)
+{
+  rtx x = nvptx_pc_set (insn, strict);
+
+  if (!x)
+    return NULL_RTX;
+  x = SET_SRC (x);
+  if (GET_CODE (x) == LABEL_REF)
+    return x;
+  if (GET_CODE (x) != IF_THEN_ELSE)
+    return NULL_RTX;
+  if (XEXP (x, 2) == pc_rtx && GET_CODE (XEXP (x, 1)) == LABEL_REF)
+    return XEXP (x, 1);
+  if (XEXP (x, 1) == pc_rtx && GET_CODE (XEXP (x, 2)) == LABEL_REF)
+    return XEXP (x, 2);
+  return NULL_RTX;
+}
+
+/* Insert a dummy ptx insn when encountering a branch to a label with no ptx
+   insn inbetween the branch and the label.  This works around a JIT bug
+   observed at driver version 384.111, at -O0 for sm_50.  */
+
+static void
+prevent_branch_around_nothing (void)
+{
+  rtx_insn *seen_label = NULL;
+    for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
+      {
+	if (seen_label == NULL)
+	  {
+	    if (INSN_P (insn) && condjump_p (insn))
+	      seen_label = label_ref_label (nvptx_condjump_label (insn, false));
+
+	    continue;
+	  }
+
+	if (NOTE_P (insn) || DEBUG_INSN_P (insn))
+	  continue;
+
+	if (INSN_P (insn))
+	  switch (recog_memoized (insn))
+	    {
+	    case CODE_FOR_nvptx_fork:
+	    case CODE_FOR_nvptx_forked:
+	    case CODE_FOR_nvptx_joining:
+	    case CODE_FOR_nvptx_join:
+	      continue;
+	    default:
+	      seen_label = NULL;
+	      continue;
+	    }
+
+	if (LABEL_P (insn) && insn == seen_label)
+	  emit_insn_before (gen_fake_nop (), insn);
+
+	seen_label = NULL;
+      }
+  }
+#endif
+
 /* PTX-specific reorganization
    - Split blocks at fork and join instructions
    - Compute live registers
@@ -4442,6 +4530,10 @@ nvptx_reorg (void)
   if (TARGET_UNIFORM_SIMT)
     nvptx_reorg_uniform_simt ();
 
+#if WORKAROUND_PTXJIT_BUG_2
+  prevent_branch_around_nothing ();
+#endif
+
   regstat_free_n_sets_and_refs ();
 
   df_finish_pass (true);
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 135479b..4f4453d 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -999,6 +999,15 @@
   ""
   "exit;")
 
+(define_insn "fake_nop"
+  [(const_int 2)]
+  ""
+  "{
+     .reg .u32 %%nop_src;
+     .reg .u32 %%nop_dst;
+     mov.u32 %%nop_dst, %%nop_src;
+   }")
+
 (define_insn "return"
   [(return)]
   ""
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr83589.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr83589.c
new file mode 100644
index 0000000..a6ed5cf
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr83589.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-set-target-env-var GOMP_NVPTX_JIT "-O0" } */
+
+#define n 32
+
+int
+main (void)
+{
+  int arr_a[n];
+
+#pragma acc parallel copyout(arr_a) num_gangs(1) num_workers(1) vector_length(32)
+  {
+    #pragma acc loop vector
+    for (int m = 0; m < 32; m++)
+      ;
+
+    #pragma acc loop vector
+    for (int m = 0; m < 32; m++)
+      arr_a[m] = 0;
+  }
+}
Jakub Jelinek Jan. 24, 2018, 2:07 p.m. UTC | #4
On Wed, Jan 24, 2018 at 02:56:28PM +0100, Tom de Vries wrote:
> +#if WORKAROUND_PTXJIT_BUG_2
> +/* Variant of pc_set that only requires JUMP_P (INSN) if STRICT.  This variant
> +   is needed in the nvptx target because the branches generated for
> +   parititioning are NONJUMP_INSN_P, not JUMP_P.  */
> +
> +static rtx
> +nvptx_pc_set (const rtx_insn *insn, bool strict = true)
> +{
> +  rtx pat;
> +  if ((strict && !JUMP_P (insn))
> +      || (!strict && !INSN_P (insn)))
> +    return NULL_RTX;
> +  pat = PATTERN (insn);
> +
> +  /* The set is allowed to appear either as the insn pattern or
> +     the first set in a PARALLEL.  */
> +  if (GET_CODE (pat) == PARALLEL)
> +    pat = XVECEXP (pat, 0, 0);

This could have been single_set.

> +  if (!x)
> +    return NULL_RTX;
> +  x = SET_SRC (x);
> +  if (GET_CODE (x) == LABEL_REF)
> +    return x;
> +  if (GET_CODE (x) != IF_THEN_ELSE)
> +    return NULL_RTX;
> +  if (XEXP (x, 2) == pc_rtx && GET_CODE (XEXP (x, 1)) == LABEL_REF)
> +    return XEXP (x, 1);
> +  if (XEXP (x, 1) == pc_rtx && GET_CODE (XEXP (x, 2)) == LABEL_REF)
> +    return XEXP (x, 2);
> +  return NULL_RTX;

And this looks like condjump_label.  What are the nvptx conditional jumps
that aren't JUMP_INSN and why?  That looks like a bad idea.
Otherwise, there is also JUMP_LABEL (insn)...

	Jakub
Tom de Vries Jan. 24, 2018, 3:36 p.m. UTC | #5
On 01/24/2018 03:07 PM, Jakub Jelinek wrote:
> On Wed, Jan 24, 2018 at 02:56:28PM +0100, Tom de Vries wrote:
>> +#if WORKAROUND_PTXJIT_BUG_2
>> +/* Variant of pc_set that only requires JUMP_P (INSN) if STRICT.  This variant
>> +   is needed in the nvptx target because the branches generated for
>> +   parititioning are NONJUMP_INSN_P, not JUMP_P.  */
>> +
>> +static rtx
>> +nvptx_pc_set (const rtx_insn *insn, bool strict = true)
>> +{
>> +  rtx pat;
>> +  if ((strict && !JUMP_P (insn))
>> +      || (!strict && !INSN_P (insn)))
>> +    return NULL_RTX;
>> +  pat = PATTERN (insn);
>> +
>> +  /* The set is allowed to appear either as the insn pattern or
>> +     the first set in a PARALLEL.  */
>> +  if (GET_CODE (pat) == PARALLEL)
>> +    pat = XVECEXP (pat, 0, 0);
> 
> This could have been single_set.
> 

This is just a copy of pc_set in jump.c, with the strict parameter added.

It's possible that we can use single_set in pc_set in jump.c. But there 
are subtle differences:
- current pc_set allows a second non-dead set in parallel
- single_set doesn't allow second non-dead set in parallel

I don't know whether this difference is significant or not.

>> +  if (!x)
>> +    return NULL_RTX;
>> +  x = SET_SRC (x);
>> +  if (GET_CODE (x) == LABEL_REF)
>> +    return x;
>> +  if (GET_CODE (x) != IF_THEN_ELSE)
>> +    return NULL_RTX;
>> +  if (XEXP (x, 2) == pc_rtx && GET_CODE (XEXP (x, 1)) == LABEL_REF)
>> +    return XEXP (x, 1);
>> +  if (XEXP (x, 1) == pc_rtx && GET_CODE (XEXP (x, 2)) == LABEL_REF)
>> +    return XEXP (x, 2);
>> +  return NULL_RTX;
> 
> And this looks like condjump_label.  

This is just a copy of condjump_label in jump.c, with the strict 
parameter added.

> What are the nvptx conditional jumps
> that aren't JUMP_INSN and why?  That looks like a bad idea.

OpenACC has different execution modes:
- gang redundant vs gang partitioned
- worker single vs worker partitioned
- vector single vs vector partitioned

The transitions between the different modes are represented by:
- nvptx_fork
- nvptx_forked
- nvptx_join
- nvptx_joined
until pass_machine_reorg.

In pass_machine_reorg, they are expanded into more detailed operations 
implementing state propagation and neutering code for single mode.

The neutering code consists of branch-around code, which uses these 
conditional jumps that are not JUMP_INSN.

My assumption is that this is done in order to make the compiler behave 
conservatively with these jumps.  I'm not sure if this is related to one 
or more passes after reorg, or if this is just defensive programming.

I could try to change these into JUMP_INSN in stage1, and see how that goes.

> Otherwise, there is also JUMP_LABEL (insn)...

Right, that one requires a JUMP_INSN.

Thanks,
- Tom
Thomas Schwinge May 17, 2018, 6:34 a.m. UTC | #6
Hi!

On Wed, 24 Jan 2018 14:56:28 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 01/24/2018 12:03 PM, Jakub Jelinek wrote:
> > On Wed, Jan 24, 2018 at 11:41:45AM +0100, Tom de Vries wrote:
> >> +/* Insert a dummy ptx insn when encountering a branch to a label with no ptx
> >> +   insn inbetween the branch and the label.  This works around a JIT bug
> >> +   observed at driver version 384.111, at -O0 for sm_50.  */

> [...] committed as attached.

I pushed this to openacc-gcc-7-branch, including its later bug fix
"[nvptx] Fix prevent_branch_around_nothing":

commit 4cad9fa6b7a85e01da260d0b9e20de30d53f1881
Author: Tom de Vries <tom@codesourcery.com>
Date:   Wed Jan 24 13:52:12 2018 +0000

    [nvptx, PR83589] Workaround for branch-around-nothing JIT bug
    
            gcc/
            PR target/83589
            * config/nvptx/nvptx.c (WORKAROUND_PTXJIT_BUG_2): Define to 1.
            (nvptx_pc_set, nvptx_condjump_label): New function. Copy from jump.c.
            Add strict parameter.
            (prevent_branch_around_nothing): Insert dummy insn between branch to
            label and label with no ptx insn inbetween.
            * config/nvptx/nvptx.md (define_insn "fake_nop"): New insn.
            PR target/84954
            * config/nvptx/nvptx.c (prevent_branch_around_nothing): Also update
            seen_label if seen_label is already set.
            libgomp/
            PR target/83589
            * testsuite/libgomp.oacc-c-c++-common/pr83589.c: New test.
    
    (cherry picked from trunk r257016 and r258674)
---
 gcc/ChangeLog.openacc                              | 16 ++++
 gcc/config/nvptx/nvptx.c                           | 93 ++++++++++++++++++++++
 gcc/config/nvptx/nvptx.md                          |  9 +++
 libgomp/ChangeLog.openacc                          |  5 ++
 .../testsuite/libgomp.oacc-c-c++-common/pr83589.c  | 21 +++++
 5 files changed, 144 insertions(+)

diff --git gcc/ChangeLog.openacc gcc/ChangeLog.openacc
index 172f1fc..4d13080 100644
--- gcc/ChangeLog.openacc
+++ gcc/ChangeLog.openacc
@@ -1,3 +1,19 @@
+2018-03-20  Tom de Vries  <tom@codesourcery.com>
+
+	PR target/84954
+	* config/nvptx/nvptx.c (prevent_branch_around_nothing): Also update
+	seen_label if seen_label is already set.
+
+2018-01-24  Tom de Vries  <tom@codesourcery.com>
+
+	PR target/83589
+	* config/nvptx/nvptx.c (WORKAROUND_PTXJIT_BUG_2): Define to 1.
+	(nvptx_pc_set, nvptx_condjump_label): New function. Copy from jump.c.
+	Add strict parameter.
+	(prevent_branch_around_nothing): Insert dummy insn between branch to
+	label and label with no ptx insn inbetween.
+	* config/nvptx/nvptx.md (define_insn "fake_nop"): New insn.
+
 2018-05-09  Tom de Vries  <tom@codesourcery.com>
 
 	backport from trunk:
diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
index d659ab4..f636d8d 100644
--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -77,6 +77,7 @@
 #include "target-def.h"
 
 #define WORKAROUND_PTXJIT_BUG 1
+#define WORKAROUND_PTXJIT_BUG_2 1
 #define WORKAROUND_PTXJIT_BUG_3 1
 
 /* Define dimension sizes for known hardware.  */
@@ -4636,6 +4637,94 @@ populate_offload_attrs (offload_attrs *oa)
     oa->max_workers = oa->num_workers;
 }
 
+#if WORKAROUND_PTXJIT_BUG_2
+/* Variant of pc_set that only requires JUMP_P (INSN) if STRICT.  This variant
+   is needed in the nvptx target because the branches generated for
+   parititioning are NONJUMP_INSN_P, not JUMP_P.  */
+
+static rtx
+nvptx_pc_set (const rtx_insn *insn, bool strict = true)
+{
+  rtx pat;
+  if ((strict && !JUMP_P (insn))
+      || (!strict && !INSN_P (insn)))
+    return NULL_RTX;
+  pat = PATTERN (insn);
+
+  /* The set is allowed to appear either as the insn pattern or
+     the first set in a PARALLEL.  */
+  if (GET_CODE (pat) == PARALLEL)
+    pat = XVECEXP (pat, 0, 0);
+  if (GET_CODE (pat) == SET && GET_CODE (SET_DEST (pat)) == PC)
+    return pat;
+
+  return NULL_RTX;
+}
+
+/* Variant of condjump_label that only requires JUMP_P (INSN) if STRICT.  */
+
+static rtx
+nvptx_condjump_label (const rtx_insn *insn, bool strict = true)
+{
+  rtx x = nvptx_pc_set (insn, strict);
+
+  if (!x)
+    return NULL_RTX;
+  x = SET_SRC (x);
+  if (GET_CODE (x) == LABEL_REF)
+    return x;
+  if (GET_CODE (x) != IF_THEN_ELSE)
+    return NULL_RTX;
+  if (XEXP (x, 2) == pc_rtx && GET_CODE (XEXP (x, 1)) == LABEL_REF)
+    return XEXP (x, 1);
+  if (XEXP (x, 1) == pc_rtx && GET_CODE (XEXP (x, 2)) == LABEL_REF)
+    return XEXP (x, 2);
+  return NULL_RTX;
+}
+
+/* Insert a dummy ptx insn when encountering a branch to a label with no ptx
+   insn inbetween the branch and the label.  This works around a JIT bug
+   observed at driver version 384.111, at -O0 for sm_50.  */
+
+static void
+prevent_branch_around_nothing (void)
+{
+  rtx_insn *seen_label = NULL;
+    for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
+      {
+	if (INSN_P (insn) && condjump_p (insn))
+	  {
+	    seen_label = label_ref_label (nvptx_condjump_label (insn, false));
+	    continue;
+	  }
+
+	if (seen_label == NULL)
+	  continue;
+
+	if (NOTE_P (insn) || DEBUG_INSN_P (insn))
+	  continue;
+
+	if (INSN_P (insn))
+	  switch (recog_memoized (insn))
+	    {
+	    case CODE_FOR_nvptx_fork:
+	    case CODE_FOR_nvptx_forked:
+	    case CODE_FOR_nvptx_joining:
+	    case CODE_FOR_nvptx_join:
+	      continue;
+	    default:
+	      seen_label = NULL;
+	      continue;
+	    }
+
+	if (LABEL_P (insn) && insn == seen_label)
+	  emit_insn_before (gen_fake_nop (), insn);
+
+	seen_label = NULL;
+      }
+  }
+#endif
+
 #ifdef WORKAROUND_PTXJIT_BUG_3
 /* Insert two membar.cta insns inbetween two subsequent bar.sync insns.  This
    works around a hang observed at driver version 390.48 for sm_50.  */
@@ -4754,6 +4843,10 @@ nvptx_reorg (void)
   if (TARGET_UNIFORM_SIMT)
     nvptx_reorg_uniform_simt ();
 
+#if WORKAROUND_PTXJIT_BUG_2
+  prevent_branch_around_nothing ();
+#endif
+
 #ifdef WORKAROUND_PTXJIT_BUG_3
   workaround_barsyncs ();
 #endif
diff --git gcc/config/nvptx/nvptx.md gcc/config/nvptx/nvptx.md
index ab63cf8..9e3001c 100644
--- gcc/config/nvptx/nvptx.md
+++ gcc/config/nvptx/nvptx.md
@@ -981,6 +981,15 @@
   ""
   "exit;")
 
+(define_insn "fake_nop"
+  [(const_int 2)]
+  ""
+  "{
+     .reg .u32 %%nop_src;
+     .reg .u32 %%nop_dst;
+     mov.u32 %%nop_dst, %%nop_src;
+   }")
+
 (define_insn "return"
   [(return)]
   ""
diff --git libgomp/ChangeLog.openacc libgomp/ChangeLog.openacc
index d1cc107..add3b24 100644
--- libgomp/ChangeLog.openacc
+++ libgomp/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2018-01-24  Tom de Vries  <tom@codesourcery.com>
+
+	PR target/83589
+	* testsuite/libgomp.oacc-c-c++-common/pr83589.c: New test.
+
 2018-05-09  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* libgomp.oacc-fortran/deviceptr-1.f90: Remove xfail for -O2 and -O3.
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/pr83589.c libgomp/testsuite/libgomp.oacc-c-c++-common/pr83589.c
new file mode 100644
index 0000000..a6ed5cf
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/pr83589.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-set-target-env-var GOMP_NVPTX_JIT "-O0" } */
+
+#define n 32
+
+int
+main (void)
+{
+  int arr_a[n];
+
+#pragma acc parallel copyout(arr_a) num_gangs(1) num_workers(1) vector_length(32)
+  {
+    #pragma acc loop vector
+    for (int m = 0; m < 32; m++)
+      ;
+
+    #pragma acc loop vector
+    for (int m = 0; m < 32; m++)
+      arr_a[m] = 0;
+  }
+}


Grüße
 Thomas
diff mbox series

Patch

[nvptx, PR83589] Workaround for branch-around-nothing JIT bug

2018-01-23  Tom de Vries  <tom@codesourcery.com>

	PR target/83589
	* config/nvptx/nvptx.c (WORKAROUND_PTXJIT_BUG_2): Define to 1.
	(nvptx_pc_set, nvptx_condjump_label): New function. Copy from jump.c.
	Add strict parameter.
	(prevent_branch_around_nothing): Insert dummy insn between branch to
	label and label with no ptx insn inbetween.
	* config/nvptx/nvptx.md (define_insn "fake_nop"): New insn.

	* testsuite/libgomp.oacc-c-c++-common/pr83589.c: New test.

---
 gcc/config/nvptx/nvptx.c                           | 92 ++++++++++++++++++++++
 gcc/config/nvptx/nvptx.md                          |  9 +++
 .../testsuite/libgomp.oacc-c-c++-common/pr83589.c  | 21 +++++
 3 files changed, 122 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 3516740..e55b426 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -78,6 +78,7 @@ 
 #include "target-def.h"
 
 #define WORKAROUND_PTXJIT_BUG 1
+#define WORKAROUND_PTXJIT_BUG_2 1
 
 /* The various PTX memory areas an object might reside in.  */
 enum nvptx_data_area
@@ -4363,6 +4364,93 @@  nvptx_neuter_pars (parallel *par, unsigned modes, unsigned outer)
     nvptx_neuter_pars (par->next, modes, outer);
 }
 
+#if WORKAROUND_PTXJIT_BUG_2
+/* Variant of pc_set that only requires JUMP_P (INSN) if STRICT.  This variant
+   is needed in the nvptx target because the branches generated for
+   parititioning are NONJUMP_INSN_P, not JUMP_P.  */
+
+static rtx
+nvptx_pc_set (const rtx_insn *insn, bool strict = true)
+{
+  rtx pat;
+  if ((strict && !JUMP_P (insn))
+      || (!strict && !INSN_P (insn)))
+    return NULL_RTX;
+  pat = PATTERN (insn);
+
+  /* The set is allowed to appear either as the insn pattern or
+     the first set in a PARALLEL.  */
+  if (GET_CODE (pat) == PARALLEL)
+    pat = XVECEXP (pat, 0, 0);
+  if (GET_CODE (pat) == SET && GET_CODE (SET_DEST (pat)) == PC)
+    return pat;
+
+  return NULL_RTX;
+}
+
+/* Variant of condjump_label that only requires JUMP_P (INSN) if STRICT.  */
+
+static rtx
+nvptx_condjump_label (const rtx_insn *insn, bool strict = true)
+{
+  rtx x = nvptx_pc_set (insn, strict);
+
+  if (!x)
+    return NULL_RTX;
+  x = SET_SRC (x);
+  if (GET_CODE (x) == LABEL_REF)
+    return x;
+  if (GET_CODE (x) != IF_THEN_ELSE)
+    return NULL_RTX;
+  if (XEXP (x, 2) == pc_rtx && GET_CODE (XEXP (x, 1)) == LABEL_REF)
+    return XEXP (x, 1);
+  if (XEXP (x, 1) == pc_rtx && GET_CODE (XEXP (x, 2)) == LABEL_REF)
+    return XEXP (x, 2);
+  return NULL_RTX;
+}
+
+/* Insert a dummy ptx insn when encountering a branch to a label with no ptx
+   insn inbetween the branch and the label.  This works around a JIT bug
+   observed at driver version 384.111, at -O0 for sm_50.  */
+
+static void
+prevent_branch_around_nothing (void)
+{
+  rtx_insn *seen_label = 0;
+    for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
+      {
+	if (seen_label == 0)
+	  {
+	    if (INSN_P (insn) && condjump_p (insn))
+	      seen_label = label_ref_label (nvptx_condjump_label (insn, false));
+
+	    continue;
+	  }
+
+	if (NOTE_P (insn))
+	  continue;
+
+	if (INSN_P (insn))
+	  switch (recog_memoized (insn))
+	    {
+	    case CODE_FOR_nvptx_fork:
+	    case CODE_FOR_nvptx_forked:
+	    case CODE_FOR_nvptx_joining:
+	    case CODE_FOR_nvptx_join:
+	      continue;
+	    default:
+	      seen_label = 0;
+	      continue;
+	    }
+
+	if (LABEL_P (insn) && insn == seen_label)
+	  emit_insn_before (gen_fake_nop (), insn);
+
+	seen_label = 0;
+      }
+  }
+#endif
+
 /* PTX-specific reorganization
    - Split blocks at fork and join instructions
    - Compute live registers
@@ -4442,6 +4530,10 @@  nvptx_reorg (void)
   if (TARGET_UNIFORM_SIMT)
     nvptx_reorg_uniform_simt ();
 
+#if WORKAROUND_PTXJIT_BUG_2
+  prevent_branch_around_nothing ();
+#endif
+
   regstat_free_n_sets_and_refs ();
 
   df_finish_pass (true);
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 135479b..4f4453d 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -999,6 +999,15 @@ 
   ""
   "exit;")
 
+(define_insn "fake_nop"
+  [(const_int 2)]
+  ""
+  "{
+     .reg .u32 %%nop_src;
+     .reg .u32 %%nop_dst;
+     mov.u32 %%nop_dst, %%nop_src;
+   }")
+
 (define_insn "return"
   [(return)]
   ""
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr83589.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr83589.c
new file mode 100644
index 0000000..a6ed5cf
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr83589.c
@@ -0,0 +1,21 @@ 
+/* { dg-do run } */
+/* { dg-set-target-env-var GOMP_NVPTX_JIT "-O0" } */
+
+#define n 32
+
+int
+main (void)
+{
+  int arr_a[n];
+
+#pragma acc parallel copyout(arr_a) num_gangs(1) num_workers(1) vector_length(32)
+  {
+    #pragma acc loop vector
+    for (int m = 0; m < 32; m++)
+      ;
+
+    #pragma acc loop vector
+    for (int m = 0; m < 32; m++)
+      arr_a[m] = 0;
+  }
+}