diff mbox

[nvptx,committed,PR81069] Insert diverging jump alap in nvptx_single

Message ID 55a722a5-4452-ed93-267e-e44b1d0572ed@mentor.com
State New
Headers show

Commit Message

Tom de Vries July 17, 2017, 8:41 a.m. UTC
Hi,

Consider nvptx_single:
...
/* Single neutering according to MASK.  FROM is the incoming block and
    TO is the outgoing block.  These may be the same block. Insert at
    start of FROM:

      if (tid.<axis>) goto end.

    and insert before ending branch of TO (if there is such an insn):

      end:
      <possibly-broadcast-cond>
      <branch>

    We currently only use differnt FROM and TO when skipping an entire
    loop.  We could do more if we detected superblocks.  */

static void
nvptx_single (unsigned mask, basic_block from, basic_block to)
...

When compiling libgomp.oacc-fortran/nested-function-1.f90 at -O1, we 
observed the following pattern:
...
<bb2>:
      goto bb3;

<bb3>: (with single predecessor)
      <some code>
      <cond-branch>
...

which was translated by nvptx_single into:
...
<bb2>
      if (tid.<axis>) goto end.
      goto bb3;

<bb3>:
      <some code>
      end:
      <broadcast-cond>
      <cond-branch>
...

There is no benefit to be gained from doing the goto bb3 in neutered 
mode, and there is no need to, so we might as well insert the neutering 
branch as late as possible:
...
<bb2>
      goto bb3;

<bb3>:
      if (tid.<axis>) goto end.
      <some code>
      end:
      <broadcast-cond>
      <cond-branch>
...

This patch implements inserting the neutering branch as late as possible.

[ As it happens, the actual code for 
libgomp.oacc-fortran/nested-function-1.f90 at -O1 was more complicated: 
there were other bbs inbetween bb2 and bb3. While this doesn't change 
anything from a control flow graph point of view, it did trigger a bug 
in the ptx JIT compiler where it inserts the synchronization point for 
the diverging branch later than the immediate post-dominator point at 
the end label. Consequently, the condition broadcast was executed in 
divergent mode (which is known to give undefined results), resulting in 
a hang.
This patch also works around this ptx JIT compiler bug, for this 
test-case. ]

Build and tested on x86_64 with nvptx accelerator.

Committed.

Thanks,
- Tom

Comments

Tom de Vries July 18, 2017, 12:37 p.m. UTC | #1
[ was: Re: [nvptx, committed, PR81069] Insert diverging jump alap in 
nvptx_single ]

On 07/17/2017 10:41 AM, Tom de Vries wrote:
> Hi,
> 
> Consider nvptx_single:
> ...
> /* Single neutering according to MASK.  FROM is the incoming block and
>     TO is the outgoing block.  These may be the same block. Insert at
>     start of FROM:
> 
>       if (tid.<axis>) goto end.
> 
>     and insert before ending branch of TO (if there is such an insn):
> 
>       end:
>       <possibly-broadcast-cond>
>       <branch>
> 
>     We currently only use differnt FROM and TO when skipping an entire
>     loop.  We could do more if we detected superblocks.  */
> 
> static void
> nvptx_single (unsigned mask, basic_block from, basic_block to)
> ...
> 
> When compiling libgomp.oacc-fortran/nested-function-1.f90 at -O1, we 
> observed the following pattern:
> ...
> <bb2>:
>       goto bb3;
> 
> <bb3>: (with single predecessor)
>       <some code>
>       <cond-branch>
> ...
> 
> which was translated by nvptx_single into:
> ...
> <bb2>
>       if (tid.<axis>) goto end.
>       goto bb3;
> 
> <bb3>:
>       <some code>
>       end:
>       <broadcast-cond>
>       <cond-branch>
> ...
> 
> There is no benefit to be gained from doing the goto bb3 in neutered 
> mode, and there is no need to, so we might as well insert the neutering 
> branch as late as possible:
> ...
> <bb2>
>       goto bb3;
> 
> <bb3>:
>       if (tid.<axis>) goto end.
>       <some code>
>       end:
>       <broadcast-cond>
>       <cond-branch>
> ...
> 
> This patch implements inserting the neutering branch as late as possible.
> 
> [ As it happens, the actual code for 
> libgomp.oacc-fortran/nested-function-1.f90 at -O1 was more complicated: 
> there were other bbs inbetween bb2 and bb3. While this doesn't change 
> anything from a control flow graph point of view, it did trigger a bug 
> in the ptx JIT compiler where it inserts the synchronization point for 
> the diverging branch later than the immediate post-dominator point at 
> the end label. Consequently, the condition broadcast was executed in 
> divergent mode (which is known to give undefined results), resulting in 
> a hang.
> This patch also works around this ptx JIT compiler bug, for this 
> test-case. ]
> 
> Build and tested on x86_64 with nvptx accelerator.
> 
> Committed.

Jakub,

I'd like to backport this nvptx patch to the gcc-7-branch.

The patch doesn't trivially fit into the category of regression or 
documentation fix.

Without this patch, when building an nvptx offloading compiler and 
running the libgomp testsuite for the gcc-7-branch, the GPU hangs, and 
I've had a report from a colleague who experienced system crashes 
because of it.

However, in principle gcc is not doing anything wrong: the generated 
code is according to the ptx spec. It's just that the patch makes it 
less likely to run into a ptx JIT bug.

Then again, it's an nvptx patch, neither a primary nor secondary target.

I'll commit the backport some time this week, unless there are objections.

Thanks,
- Tom

> 0001-Insert-diverging-jump-alap-in-nvptx_single.patch
> 
> 
> Insert diverging jump alap in nvptx_single
> 
> 2017-07-17  Tom de Vries  <tom@codesourcery.com>
> 
> 	PR target/81069
> 	* config/nvptx/nvptx.c (nvptx_single): Insert diverging branch as late
> 	as possible.
> 
> ---
>   gcc/config/nvptx/nvptx.c | 22 +++++++++++++++++++---
>   1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
> index daeec27..cb11686 100644
> --- a/gcc/config/nvptx/nvptx.c
> +++ b/gcc/config/nvptx/nvptx.c
> @@ -3866,9 +3866,25 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
>     rtx_insn *tail = BB_END (to);
>     unsigned skip_mask = mask;
>   
> -  /* Find first insn of from block */
> -  while (head != BB_END (from) && !INSN_P (head))
> -    head = NEXT_INSN (head);
> +  while (true)
> +    {
> +      /* Find first insn of from block.  */
> +      while (head != BB_END (from) && !INSN_P (head))
> +	head = NEXT_INSN (head);
> +
> +      if (from == to)
> +	break;
> +
> +      if (!(JUMP_P (head) && single_succ_p (from)))
> +	break;
> +
> +      basic_block jump_target = single_succ (from);
> +      if (!single_pred_p (jump_target))
> +	break;
> +
> +      from = jump_target;
> +      head = BB_HEAD (from);
> +    }
>   
>     /* Find last insn of to block */
>     rtx_insn *limit = from == to ? head : BB_HEAD (to);
>
Jakub Jelinek July 18, 2017, 12:39 p.m. UTC | #2
On Tue, Jul 18, 2017 at 02:37:38PM +0200, Tom de Vries wrote:
> I'd like to backport this nvptx patch to the gcc-7-branch.
> 
> The patch doesn't trivially fit into the category of regression or
> documentation fix.
> 
> Without this patch, when building an nvptx offloading compiler and running
> the libgomp testsuite for the gcc-7-branch, the GPU hangs, and I've had a
> report from a colleague who experienced system crashes because of it.
> 
> However, in principle gcc is not doing anything wrong: the generated code is
> according to the ptx spec. It's just that the patch makes it less likely to
> run into a ptx JIT bug.
> 
> Then again, it's an nvptx patch, neither a primary nor secondary target.
> 
> I'll commit the backport some time this week, unless there are objections.

Ok, thanks.

> > 0001-Insert-diverging-jump-alap-in-nvptx_single.patch
> > 
> > 
> > Insert diverging jump alap in nvptx_single
> > 
> > 2017-07-17  Tom de Vries  <tom@codesourcery.com>
> > 
> > 	PR target/81069
> > 	* config/nvptx/nvptx.c (nvptx_single): Insert diverging branch as late
> > 	as possible.

	Jakub
diff mbox

Patch

Insert diverging jump alap in nvptx_single

2017-07-17  Tom de Vries  <tom@codesourcery.com>

	PR target/81069
	* config/nvptx/nvptx.c (nvptx_single): Insert diverging branch as late
	as possible.

---
 gcc/config/nvptx/nvptx.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index daeec27..cb11686 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -3866,9 +3866,25 @@  nvptx_single (unsigned mask, basic_block from, basic_block to)
   rtx_insn *tail = BB_END (to);
   unsigned skip_mask = mask;
 
-  /* Find first insn of from block */
-  while (head != BB_END (from) && !INSN_P (head))
-    head = NEXT_INSN (head);
+  while (true)
+    {
+      /* Find first insn of from block.  */
+      while (head != BB_END (from) && !INSN_P (head))
+	head = NEXT_INSN (head);
+
+      if (from == to)
+	break;
+
+      if (!(JUMP_P (head) && single_succ_p (from)))
+	break;
+
+      basic_block jump_target = single_succ (from);
+      if (!single_pred_p (jump_target))
+	break;
+
+      from = jump_target;
+      head = BB_HEAD (from);
+    }
 
   /* Find last insn of to block */
   rtx_insn *limit = from == to ? head : BB_HEAD (to);