diff mbox series

[nvptx,PR84954,committed] Fix prevent_branch_around_nothing

Message ID 5ad5142d-7b99-9f1b-8513-92f77c25d823@mentor.com
State New
Headers show
Series [nvptx,PR84954,committed] Fix prevent_branch_around_nothing | expand

Commit Message

Tom de Vries March 20, 2018, 9:17 a.m. UTC
[ was: Re: [PATCH, 2/2][nvptx, PR83589] Workaround for 
branch-around-nothing JIT bug ]

On 01/24/2018 11:41 AM, Tom de Vries wrote:
> Hi,
> 
> this patch adds a workaround for the nvptx target JIT bug PR83589 - 
> "[nvptx] mode-transitions.c and private-variables.{c,f90} execution 
> FAILs at GOMP_NVPTX_JIT=-O0".
> 
> 
> When compiling a branch-around-nothing (where the branch is warp 
> neutering, so it's a divergent branch):
> ...
>    .reg .pred %r36;
>    {
>      .reg .u32 %x;
>      mov.u32 %x,%tid.x;
>      setp.ne.u32 %r36,%x,0;
>    }
> 
>    @ %r36 bra $L5;
>    $L5:
> ...
> 
> The JIT fails to generate a convergence point here:
> ...
>           /*0128*/               @P0 BRA `(.L_1);
> .L_1:
> ...
> 
> Consequently, we execute subsequent code in divergent mode, and when 
> executing a shfl.idx a bit later we run into the undefined behaviour 
> that shfl.idx has when executing in divergent mode.
> 
> The workaround detects branch-around-nothing, and inserts a ptx 
> operation that does nothing (I'm calling it a fake nop, I haven't been 
> able to come up with a better term yet):
> ...
>    @ %r36 bra $L5;
>      {
>        .reg .u32 %nop_src;
>        .reg .u32 %nop_dst;
>        mov.u32 %nop_dst, %nop_src;
>      }
>    $L5:
> ...
> which makes the test pass, because then we generate a convergence point 
> here at .L1:
> ...
>          /*0128*/                   SSY `(.L_1);
>          /*0130*/               @P0 SYNC (*"TARGET= .L_1 "*);
>          /*0138*/                   SYNC (*"TARGET= .L_1 "*);
> .L_1:
> ...
> 
> The workaround is not minimal given that it inserts the fake nop in all 
> branch-around-nothings it detects, not just the warp neutering ones, but 
> I think this is more robust than trying to identify the warp neutering 
> branches. Furthermore, I'm not going for optimality here anyway. The 
> optimal way to fix this is making sure we don't generate 
> branch-around-nothing, but that's for stage1.
> 
> Build and reg-tested on x86_64 with nvptx accelerator.
> 
> I'd like to commit in stage4, but I'd appreciate a review of the code. 
> Does the patch look OK?
> 
> Thanks,
> - Tom
> 
> 0002-nvptx-PR83589-Workaround-for-branch-around-nothing-JIT-bug.patch
> 
> 
> [nvptx, PR83589] Workaround for branch-around-nothing JIT bug
> 
> 2018-01-23  Tom de Vries  <tom@codesourcery.com>
> 
> 	PR target/83589
> 	* config/nvptx/nvptx.c (WORKAROUND_PTXJIT_BUG_2): Define to 1.
> 	(nvptx_pc_set, nvptx_condjump_label): New function. Copy from jump.c.
> 	Add strict parameter.
> 	(prevent_branch_around_nothing): Insert dummy insn between branch to
> 	label and label with no ptx insn inbetween.
> 	* config/nvptx/nvptx.md (define_insn "fake_nop"): New insn.
> 
> 	* testsuite/libgomp.oacc-c-c++-common/pr83589.c: New test.
> 
> ---
>   gcc/config/nvptx/nvptx.c                           | 92 ++++++++++++++++++++++
>   gcc/config/nvptx/nvptx.md                          |  9 +++
>   .../testsuite/libgomp.oacc-c-c++-common/pr83589.c  | 21 +++++
>   3 files changed, 122 insertions(+)
> 

> +/* Insert a dummy ptx insn when encountering a branch to a label with no ptx
> +   insn inbetween the branch and the label.  This works around a JIT bug
> +   observed at driver version 384.111, at -O0 for sm_50.  */
> +
> +static void
> +prevent_branch_around_nothing (void)
> +{
> +  rtx_insn *seen_label = 0;
> +    for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
> +      {
> +	if (seen_label == 0)
> +	  {
> +	    if (INSN_P (insn) && condjump_p (insn))
> +	      seen_label = label_ref_label (nvptx_condjump_label (insn, false));
> +
> +	    continue;
> +	  }
> +
> +	if (NOTE_P (insn))
> +	  continue;
> +
> +	if (INSN_P (insn))
> +	  switch (recog_memoized (insn))
> +	    {
> +	    case CODE_FOR_nvptx_fork:
> +	    case CODE_FOR_nvptx_forked:
> +	    case CODE_FOR_nvptx_joining:
> +	    case CODE_FOR_nvptx_join:
> +	      continue;
> +	    default:
> +	      seen_label = 0;
> +	      continue;
> +	    }
> +
> +	if (LABEL_P (insn) && insn == seen_label)
> +	  emit_insn_before (gen_fake_nop (), insn);
> +
> +	seen_label = 0;
> +      }
> +  }

Consider testcase:
...
int
main (void)
{
   int a[10];
#pragma acc parallel loop worker
   for (int i = 0; i < 10; i++)
     a[i] = i;

   return 0;
}
...

At -O2, we generate this, and fail to generate a fake nop:
...
   @ %r34 bra.uni $L8;
   @ %r33 bra $L9;
   // join 2;
  $L9:
  $L8:
...

What is happening in prevent_branch_around_nothing is:
- seen_label is NULL
- we process "@ %r34 bra.uni $L8" and seen_label becomes $L8
- we process "@ %r33 bra $L9" and since seen_label != NULL, we end up in
   the default case in the switch and reset seen_label to NULL
- we process the labels, seen_label remains NULL, and no fake nop is
   generated

What we want to happen instead, is that when processing "@ %r33 bra 
$L9", seen_label is updated to $L9. Patch below implements that.

Build and reg-tested on x86_64 with nvptx accelerator.

Committed to stage4 trunk.

Thanks,
- Tom
diff mbox series

Patch

[nvptx] Fix prevent_branch_around_nothing

2018-03-20  Tom de Vries  <tom@codesourcery.com>

	PR target/84954
	* config/nvptx/nvptx.c (prevent_branch_around_nothing): Also update
	seen_label if seen_label is already set.

---
 gcc/config/nvptx/nvptx.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index a6f4443..7b0b182 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4419,14 +4419,15 @@  prevent_branch_around_nothing (void)
   rtx_insn *seen_label = NULL;
     for (rtx_insn *insn = get_insns (); insn; insn = NEXT_INSN (insn))
       {
-	if (seen_label == NULL)
+	if (INSN_P (insn) && condjump_p (insn))
 	  {
-	    if (INSN_P (insn) && condjump_p (insn))
-	      seen_label = label_ref_label (nvptx_condjump_label (insn, false));
-
+	    seen_label = label_ref_label (nvptx_condjump_label (insn, false));
 	    continue;
 	  }
 
+	if (seen_label == NULL)
+	  continue;
+
 	if (NOTE_P (insn) || DEBUG_INSN_P (insn))
 	  continue;