diff mbox

[PR81832] Skip copying loop header if inner loop is distributed

Message ID DB5PR0801MB27423F8E089681655CD3BB34E78D0@DB5PR0801MB2742.eurprd08.prod.outlook.com
State New
Headers show

Commit Message

Bin Cheng Aug. 15, 2017, 9:28 a.m. UTC
Hi,
This patch fixes PR81832.  Root cause for the ICE is:
  1) Loop has distributed inner loop.
  2) The guarding function call IFN_LOOP_DIST_CALL happens to be in loop's header.
  3) IFN_LOOP_DIST_CALL (int loop's header) is duplicated by pass_ch_vect thus
     not eliminated.

Given pass_ch_vect copies loop header to enable more vectorization, we should
skip loop in this case because distributed inner loop means this loop can not
be vectorized anyway.  One point to mention is name inner_loop_distributed_p
is a little misleading.  The name indicates that each basic block is checked,
but the patch only checks loop's header for simplicity/efficiency's purpose.
Any comment?
Bootstrap and test on x86_64.

Thanks,
bin
2017-08-15  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/81832
	* tree-ssa-loop-ch.c (inner_loop_distributed_p): New function.
	(pass_ch_vect::process_loop_p): Call above function.

gcc/testsuite
2017-08-15  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/81832
	* gcc.dg/tree-ssa/pr81832.c: New test.

Comments

Richard Biener Aug. 15, 2017, 10:40 a.m. UTC | #1
On Tue, Aug 15, 2017 at 11:28 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> This patch fixes PR81832.  Root cause for the ICE is:
>   1) Loop has distributed inner loop.
>   2) The guarding function call IFN_LOOP_DIST_CALL happens to be in loop's header.
>   3) IFN_LOOP_DIST_CALL (int loop's header) is duplicated by pass_ch_vect thus
>      not eliminated.
>
> Given pass_ch_vect copies loop header to enable more vectorization, we should
> skip loop in this case because distributed inner loop means this loop can not
> be vectorized anyway.  One point to mention is name inner_loop_distributed_p
> is a little misleading.  The name indicates that each basic block is checked,
> but the patch only checks loop's header for simplicity/efficiency's purpose.
> Any comment?

My comment would be to question pass_ch_vect placement -- what was the
reason to place it so late?

I also see GRAPHITE runs inbetween loop distribution and vectorization --
what prevents GRAPHITE from messing up things here?  Or autopar?

The patch itself shows should_duplicate_loop_header_p should
handle this IFN specially (somehow all IFNs are considered "inexpensive").

So can you please adjust should_duplicate_loop_header_p instead and/or
gimple_inexpensive_call_p?  Since we have IFNs for stuff like EXP10 I'm not
sure if that default is so good.

Thanks,
Richard.

> Bootstrap and test on x86_64.
>
> Thanks,
> bin
> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>
>         PR tree-optimization/81832
>         * tree-ssa-loop-ch.c (inner_loop_distributed_p): New function.
>         (pass_ch_vect::process_loop_p): Call above function.
>
> gcc/testsuite
> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>
>         PR tree-optimization/81832
>         * gcc.dg/tree-ssa/pr81832.c: New test.
Richard Sandiford Aug. 15, 2017, 5:33 p.m. UTC | #2
Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Aug 15, 2017 at 11:28 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>> Hi,
>> This patch fixes PR81832.  Root cause for the ICE is:
>>   1) Loop has distributed inner loop.
>>   2) The guarding function call IFN_LOOP_DIST_CALL happens to be in loop's header.
>>   3) IFN_LOOP_DIST_CALL (int loop's header) is duplicated by pass_ch_vect thus
>>      not eliminated.
>>
>> Given pass_ch_vect copies loop header to enable more vectorization, we should
>> skip loop in this case because distributed inner loop means this loop can not
>> be vectorized anyway.  One point to mention is name inner_loop_distributed_p
>> is a little misleading.  The name indicates that each basic block is checked,
>> but the patch only checks loop's header for simplicity/efficiency's purpose.
>> Any comment?
>
> My comment would be to question pass_ch_vect placement -- what was the
> reason to place it so late?
>
> I also see GRAPHITE runs inbetween loop distribution and vectorization --
> what prevents GRAPHITE from messing up things here?  Or autopar?
>
> The patch itself shows should_duplicate_loop_header_p should
> handle this IFN specially (somehow all IFNs are considered "inexpensive").
>
> So can you please adjust should_duplicate_loop_header_p instead and/or
> gimple_inexpensive_call_p?  Since we have IFNs for stuff like EXP10 I'm not
> sure if that default is so good.

I think the default itself is OK: we only use IFNs for libm functions
like exp10 if an underlying optab exists (unlike __builtin_exp10, which
is useful as the non-errno setting version of exp10), so the target must
have something that it thinks is worth open-coding.  Also, we currently
treat all MD built-ins as "simple" and thus "inexpensive", so the IFN
handling is consistent with that.

Maybe there are some IFNs that are worth special-casing as expensive,
but IMO doing that to solve this problem would be a bit hacky.  It seems
like "inexpensive" should be more of a cost thing than a correctness thing.

Thanks,
Richard

> Thanks,
> Richard.
>
>> Bootstrap and test on x86_64.
>>
>> Thanks,
>> bin
>> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR tree-optimization/81832
>>         * tree-ssa-loop-ch.c (inner_loop_distributed_p): New function.
>>         (pass_ch_vect::process_loop_p): Call above function.
>>
>> gcc/testsuite
>> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR tree-optimization/81832
>>         * gcc.dg/tree-ssa/pr81832.c: New test.
diff mbox

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c b/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c
new file mode 100644
index 0000000..893124e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c
@@ -0,0 +1,22 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int a, b, *c;
+void d(void)
+{
+    int **e;
+    for(;;)
+        for(int f = 1; f <= 6; f++)
+        {
+            b = 0;
+            if(a)
+g:
+                while(a++);
+            if (**e);
+            else
+            {
+                *c = a;
+                goto g;
+            }
+        }
+}
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index 14cc6d8d..3c217d4 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -143,6 +143,27 @@  should_duplicate_loop_header_p (basic_block header, struct loop *loop,
   return true;
 }
 
+/* Return TRUE if LOOP's inner loop is versioned by loop distribution and
+   the guarding internal function call happens to be in LOOP's header.
+   Given loop distribution is placed between pass_ch and pass_ch_vect,
+   this function only returns true in pass_ch_vect.  When it returns TRUE,
+   it's known that copying LOOP's header is meaningless.  */
+
+static bool
+inner_loop_distributed_p (struct loop *loop)
+{
+  gimple *stmt = last_stmt (loop->header);
+  if (stmt == NULL || gimple_code (stmt) != GIMPLE_COND)
+    return false;
+
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  gsi_prev (&gsi);
+  if (gsi_end_p (gsi))
+    return false;
+
+  return (gimple_call_internal_p (gsi_stmt (gsi), IFN_LOOP_DIST_ALIAS));
+}
+
 /* Checks whether LOOP is a do-while style loop.  */
 
 static bool
@@ -442,6 +463,9 @@  pass_ch_vect::process_loop_p (struct loop *loop)
   if (loop->dont_vectorize)
     return false;
 
+  if (inner_loop_distributed_p (loop))
+    return false;
+
   if (!do_while_loop_p (loop))
     return true;