diff mbox

[PR81832] Skip copying loop header if inner loop is distributed

Message ID CAHFci28nuXRzo2EJ1JFhH2mqzHWKrB5JfH9mcZdRW_ga7esnKw@mail.gmail.com
State New
Headers show

Commit Message

Bin.Cheng Aug. 16, 2017, 9:22 a.m. UTC
On Tue, Aug 15, 2017 at 6:33 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Tue, Aug 15, 2017 at 11:28 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>> Hi,
>>> This patch fixes PR81832.  Root cause for the ICE is:
>>>   1) Loop has distributed inner loop.
>>>   2) The guarding function call IFN_LOOP_DIST_CALL happens to be in loop's header.
>>>   3) IFN_LOOP_DIST_CALL (int loop's header) is duplicated by pass_ch_vect thus
>>>      not eliminated.
>>>
>>> Given pass_ch_vect copies loop header to enable more vectorization, we should
>>> skip loop in this case because distributed inner loop means this loop can not
>>> be vectorized anyway.  One point to mention is name inner_loop_distributed_p
>>> is a little misleading.  The name indicates that each basic block is checked,
>>> but the patch only checks loop's header for simplicity/efficiency's purpose.
>>> Any comment?
>>
>> My comment would be to question pass_ch_vect placement -- what was the
>> reason to place it so late?
>>
>> I also see GRAPHITE runs inbetween loop distribution and vectorization --
>> what prevents GRAPHITE from messing up things here?  Or autopar?
>>
>> The patch itself shows should_duplicate_loop_header_p should
>> handle this IFN specially (somehow all IFNs are considered "inexpensive").
>>
>> So can you please adjust should_duplicate_loop_header_p instead and/or
>> gimple_inexpensive_call_p?  Since we have IFNs for stuff like EXP10 I'm not
>> sure if that default is so good.
>
> I think the default itself is OK: we only use IFNs for libm functions
> like exp10 if an underlying optab exists (unlike __builtin_exp10, which
> is useful as the non-errno setting version of exp10), so the target must
> have something that it thinks is worth open-coding.  Also, we currently
> treat all MD built-ins as "simple" and thus "inexpensive", so the IFN
> handling is consistent with that.
>
> Maybe there are some IFNs that are worth special-casing as expensive,
> but IMO doing that to solve this problem would be a bit hacky.  It seems
> like "inexpensive" should be more of a cost thing than a correctness thing.
Hi all,
This is updated patch.  It only adds a single check on IFN_LOOP_DIST_ALIAS
in function should_duplicate_loop_header_p.  As for gimple_inexpensive_call_p,
I think it's natural to consider functions like IFN_LOOP_VECTORIZED and
IFN_LOOP_DIST_ALIAS as expensive because they are only meant to indicate
temporary arrangement of optimizations and are never used in code generation.
I will send a standalone patch for that.  Another thing is this patch
doesn't check
IFN_LOOP_VECTORIZED because it's impossible to have it with current order
of passes.
Bootstrap and test ongoing.  Further comments?

Thanks,
bin
2017-08-15  Bin Cheng  <bin.cheng@arm.com>

    PR tree-optimization/81832
    * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Don't
    copy loop header which has IFN_LOOP_DIST_ALIAS call.

gcc/testsuite
2017-08-15  Bin Cheng  <bin.cheng@arm.com>

    PR tree-optimization/81832
    * gcc.dg/tree-ssa/pr81832.c: New test.

>
> Thanks,
> Richard
>
>> Thanks,
>> Richard.
>>
>>> Bootstrap and test on x86_64.
>>>
>>> Thanks,
>>> bin
>>> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>         PR tree-optimization/81832
>>>         * tree-ssa-loop-ch.c (inner_loop_distributed_p): New function.
>>>         (pass_ch_vect::process_loop_p): Call above function.
>>>
>>> gcc/testsuite
>>> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>         PR tree-optimization/81832
>>>         * gcc.dg/tree-ssa/pr81832.c: New test.

Comments

Richard Sandiford Aug. 16, 2017, 9:31 a.m. UTC | #1
"Bin.Cheng" <amker.cheng@gmail.com> writes:
> On Tue, Aug 15, 2017 at 6:33 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Tue, Aug 15, 2017 at 11:28 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>> Hi,
>>>> This patch fixes PR81832.  Root cause for the ICE is:
>>>>   1) Loop has distributed inner loop.
>>>>   2) The guarding function call IFN_LOOP_DIST_CALL happens to be in loop's header.
>>>>   3) IFN_LOOP_DIST_CALL (int loop's header) is duplicated by pass_ch_vect thus
>>>>      not eliminated.
>>>>
>>>> Given pass_ch_vect copies loop header to enable more vectorization, we should
>>>> skip loop in this case because distributed inner loop means this loop can not
>>>> be vectorized anyway.  One point to mention is name inner_loop_distributed_p
>>>> is a little misleading.  The name indicates that each basic block is checked,
>>>> but the patch only checks loop's header for simplicity/efficiency's purpose.
>>>> Any comment?
>>>
>>> My comment would be to question pass_ch_vect placement -- what was the
>>> reason to place it so late?
>>>
>>> I also see GRAPHITE runs inbetween loop distribution and vectorization --
>>> what prevents GRAPHITE from messing up things here?  Or autopar?
>>>
>>> The patch itself shows should_duplicate_loop_header_p should
>>> handle this IFN specially (somehow all IFNs are considered "inexpensive").
>>>
>>> So can you please adjust should_duplicate_loop_header_p instead and/or
>>> gimple_inexpensive_call_p?  Since we have IFNs for stuff like EXP10 I'm not
>>> sure if that default is so good.
>>
>> I think the default itself is OK: we only use IFNs for libm functions
>> like exp10 if an underlying optab exists (unlike __builtin_exp10, which
>> is useful as the non-errno setting version of exp10), so the target must
>> have something that it thinks is worth open-coding.  Also, we currently
>> treat all MD built-ins as "simple" and thus "inexpensive", so the IFN
>> handling is consistent with that.
>>
>> Maybe there are some IFNs that are worth special-casing as expensive,
>> but IMO doing that to solve this problem would be a bit hacky.  It seems
>> like "inexpensive" should be more of a cost thing than a correctness thing.
> Hi all,

> This is updated patch.  It only adds a single check on IFN_LOOP_DIST_ALIAS
> in function should_duplicate_loop_header_p.

Thanks.

> As for gimple_inexpensive_call_p, I think it's natural to consider
> functions like IFN_LOOP_VECTORIZED and IFN_LOOP_DIST_ALIAS as
> expensive because they are only meant to indicate temporary
> arrangement of optimizations and are never used in code generation.
> I will send a standalone patch for that.

Is that enough to consider them expensive though?  To me, "expensive"
should mean that they cost a lot in terms of size or speed (whichever
is most important in context).  Both functions are really cheap in
that sense, since they eventually expand to constants.

Thanks,
Richard

> Another thing is this patch doesn't check IFN_LOOP_VECTORIZED because
> it's impossible to have it with current order of passes.  Bootstrap
> and test ongoing.  Further comments?
>
> Thanks,
> bin
> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>
>     PR tree-optimization/81832
>     * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Don't
>     copy loop header which has IFN_LOOP_DIST_ALIAS call.
>
> gcc/testsuite
> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>
>     PR tree-optimization/81832
>     * gcc.dg/tree-ssa/pr81832.c: New test.
>
>>
>> Thanks,
>> Richard
>>
>>> Thanks,
>>> Richard.
>>>
>>>> Bootstrap and test on x86_64.
>>>>
>>>> Thanks,
>>>> bin
>>>> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>         PR tree-optimization/81832
>>>>         * tree-ssa-loop-ch.c (inner_loop_distributed_p): New function.
>>>>         (pass_ch_vect::process_loop_p): Call above function.
>>>>
>>>> gcc/testsuite
>>>> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>         PR tree-optimization/81832
>>>>         * gcc.dg/tree-ssa/pr81832.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c b/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c
> new file mode 100644
> index 0000000..893124e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +int a, b, *c;
> +void d(void)
> +{
> +    int **e;
> +    for(;;)
> +        for(int f = 1; f <= 6; f++)
> +        {
> +            b = 0;
> +            if(a)
> +g:
> +                while(a++);
> +            if (**e);
> +            else
> +            {
> +                *c = a;
> +                goto g;
> +            }
> +        }
> +}
> diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
> index 14cc6d8d..6bb0220 100644
> --- a/gcc/tree-ssa-loop-ch.c
> +++ b/gcc/tree-ssa-loop-ch.c
> @@ -119,7 +119,10 @@ should_duplicate_loop_header_p (basic_block header, struct loop *loop,
>  	continue;
>  
>        if (gimple_code (last) == GIMPLE_CALL
> -	  && !gimple_inexpensive_call_p (as_a <gcall *> (last)))
> +	  && (!gimple_inexpensive_call_p (as_a <gcall *> (last))
> +	      /* IFN_LOOP_DIST_ALIAS means that inner loop is distributed
> +		 at current loop's header.  Don't copy in this case.  */
> +	      || gimple_call_internal_p (last, IFN_LOOP_DIST_ALIAS)))
>  	{
>  	  if (dump_file && (dump_flags & TDF_DETAILS))
>  	    fprintf (dump_file,
Bin.Cheng Aug. 16, 2017, 9:40 a.m. UTC | #2
On Wed, Aug 16, 2017 at 10:31 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> "Bin.Cheng" <amker.cheng@gmail.com> writes:
>> On Tue, Aug 15, 2017 at 6:33 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> On Tue, Aug 15, 2017 at 11:28 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>>> Hi,
>>>>> This patch fixes PR81832.  Root cause for the ICE is:
>>>>>   1) Loop has distributed inner loop.
>>>>>   2) The guarding function call IFN_LOOP_DIST_CALL happens to be in loop's header.
>>>>>   3) IFN_LOOP_DIST_CALL (int loop's header) is duplicated by pass_ch_vect thus
>>>>>      not eliminated.
>>>>>
>>>>> Given pass_ch_vect copies loop header to enable more vectorization, we should
>>>>> skip loop in this case because distributed inner loop means this loop can not
>>>>> be vectorized anyway.  One point to mention is name inner_loop_distributed_p
>>>>> is a little misleading.  The name indicates that each basic block is checked,
>>>>> but the patch only checks loop's header for simplicity/efficiency's purpose.
>>>>> Any comment?
>>>>
>>>> My comment would be to question pass_ch_vect placement -- what was the
>>>> reason to place it so late?
>>>>
>>>> I also see GRAPHITE runs inbetween loop distribution and vectorization --
>>>> what prevents GRAPHITE from messing up things here?  Or autopar?
>>>>
>>>> The patch itself shows should_duplicate_loop_header_p should
>>>> handle this IFN specially (somehow all IFNs are considered "inexpensive").
>>>>
>>>> So can you please adjust should_duplicate_loop_header_p instead and/or
>>>> gimple_inexpensive_call_p?  Since we have IFNs for stuff like EXP10 I'm not
>>>> sure if that default is so good.
>>>
>>> I think the default itself is OK: we only use IFNs for libm functions
>>> like exp10 if an underlying optab exists (unlike __builtin_exp10, which
>>> is useful as the non-errno setting version of exp10), so the target must
>>> have something that it thinks is worth open-coding.  Also, we currently
>>> treat all MD built-ins as "simple" and thus "inexpensive", so the IFN
>>> handling is consistent with that.
>>>
>>> Maybe there are some IFNs that are worth special-casing as expensive,
>>> but IMO doing that to solve this problem would be a bit hacky.  It seems
>>> like "inexpensive" should be more of a cost thing than a correctness thing.
>> Hi all,
>
>> This is updated patch.  It only adds a single check on IFN_LOOP_DIST_ALIAS
>> in function should_duplicate_loop_header_p.
>
> Thanks.
>
>> As for gimple_inexpensive_call_p, I think it's natural to consider
>> functions like IFN_LOOP_VECTORIZED and IFN_LOOP_DIST_ALIAS as
>> expensive because they are only meant to indicate temporary
>> arrangement of optimizations and are never used in code generation.
>> I will send a standalone patch for that.
>
> Is that enough to consider them expensive though?  To me, "expensive"
Not sure.  Or the other hand,  "Expensive" is a measurement of the cost of
generated code.  For internal function calls in discussion, maybe we should
not ask the question in the first.  Even these function calls are expanded to
constant, IMHO, we can't simply consider it's cheap either because there is
high level side-effects along with expanding, i.e, undoing loop versioning.
such high level transformation is not (and should not be) covered by
gimple_inexpensive_call_p.

Thanks,
bin
> should mean that they cost a lot in terms of size or speed (whichever
> is most important in context).  Both functions are really cheap in
> that sense, since they eventually expand to constants.
>
> Thanks,
> Richard
>
>> Another thing is this patch doesn't check IFN_LOOP_VECTORIZED because
>> it's impossible to have it with current order of passes.  Bootstrap
>> and test ongoing.  Further comments?
>>
>> Thanks,
>> bin
>> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>>
>>     PR tree-optimization/81832
>>     * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Don't
>>     copy loop header which has IFN_LOOP_DIST_ALIAS call.
>>
>> gcc/testsuite
>> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>>
>>     PR tree-optimization/81832
>>     * gcc.dg/tree-ssa/pr81832.c: New test.
>>
>>>
>>> Thanks,
>>> Richard
>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> Bootstrap and test on x86_64.
>>>>>
>>>>> Thanks,
>>>>> bin
>>>>> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>>>>>
>>>>>         PR tree-optimization/81832
>>>>>         * tree-ssa-loop-ch.c (inner_loop_distributed_p): New function.
>>>>>         (pass_ch_vect::process_loop_p): Call above function.
>>>>>
>>>>> gcc/testsuite
>>>>> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>>>>>
>>>>>         PR tree-optimization/81832
>>>>>         * gcc.dg/tree-ssa/pr81832.c: New test.
>>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c b/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c
>> new file mode 100644
>> index 0000000..893124e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c
>> @@ -0,0 +1,22 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +
>> +int a, b, *c;
>> +void d(void)
>> +{
>> +    int **e;
>> +    for(;;)
>> +        for(int f = 1; f <= 6; f++)
>> +        {
>> +            b = 0;
>> +            if(a)
>> +g:
>> +                while(a++);
>> +            if (**e);
>> +            else
>> +            {
>> +                *c = a;
>> +                goto g;
>> +            }
>> +        }
>> +}
>> diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
>> index 14cc6d8d..6bb0220 100644
>> --- a/gcc/tree-ssa-loop-ch.c
>> +++ b/gcc/tree-ssa-loop-ch.c
>> @@ -119,7 +119,10 @@ should_duplicate_loop_header_p (basic_block header, struct loop *loop,
>>       continue;
>>
>>        if (gimple_code (last) == GIMPLE_CALL
>> -       && !gimple_inexpensive_call_p (as_a <gcall *> (last)))
>> +       && (!gimple_inexpensive_call_p (as_a <gcall *> (last))
>> +           /* IFN_LOOP_DIST_ALIAS means that inner loop is distributed
>> +              at current loop's header.  Don't copy in this case.  */
>> +           || gimple_call_internal_p (last, IFN_LOOP_DIST_ALIAS)))
>>       {
>>         if (dump_file && (dump_flags & TDF_DETAILS))
>>           fprintf (dump_file,
Richard Biener Aug. 16, 2017, 10:47 a.m. UTC | #3
On Wed, Aug 16, 2017 at 11:22 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, Aug 15, 2017 at 6:33 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Tue, Aug 15, 2017 at 11:28 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>>>> Hi,
>>>> This patch fixes PR81832.  Root cause for the ICE is:
>>>>   1) Loop has distributed inner loop.
>>>>   2) The guarding function call IFN_LOOP_DIST_CALL happens to be in loop's header.
>>>>   3) IFN_LOOP_DIST_CALL (int loop's header) is duplicated by pass_ch_vect thus
>>>>      not eliminated.
>>>>
>>>> Given pass_ch_vect copies loop header to enable more vectorization, we should
>>>> skip loop in this case because distributed inner loop means this loop can not
>>>> be vectorized anyway.  One point to mention is name inner_loop_distributed_p
>>>> is a little misleading.  The name indicates that each basic block is checked,
>>>> but the patch only checks loop's header for simplicity/efficiency's purpose.
>>>> Any comment?
>>>
>>> My comment would be to question pass_ch_vect placement -- what was the
>>> reason to place it so late?
>>>
>>> I also see GRAPHITE runs inbetween loop distribution and vectorization --
>>> what prevents GRAPHITE from messing up things here?  Or autopar?
>>>
>>> The patch itself shows should_duplicate_loop_header_p should
>>> handle this IFN specially (somehow all IFNs are considered "inexpensive").
>>>
>>> So can you please adjust should_duplicate_loop_header_p instead and/or
>>> gimple_inexpensive_call_p?  Since we have IFNs for stuff like EXP10 I'm not
>>> sure if that default is so good.
>>
>> I think the default itself is OK: we only use IFNs for libm functions
>> like exp10 if an underlying optab exists (unlike __builtin_exp10, which
>> is useful as the non-errno setting version of exp10), so the target must
>> have something that it thinks is worth open-coding.  Also, we currently
>> treat all MD built-ins as "simple" and thus "inexpensive", so the IFN
>> handling is consistent with that.
>>
>> Maybe there are some IFNs that are worth special-casing as expensive,
>> but IMO doing that to solve this problem would be a bit hacky.  It seems
>> like "inexpensive" should be more of a cost thing than a correctness thing.
> Hi all,
> This is updated patch.  It only adds a single check on IFN_LOOP_DIST_ALIAS
> in function should_duplicate_loop_header_p.  As for gimple_inexpensive_call_p,
> I think it's natural to consider functions like IFN_LOOP_VECTORIZED and
> IFN_LOOP_DIST_ALIAS as expensive because they are only meant to indicate
> temporary arrangement of optimizations and are never used in code generation.
> I will send a standalone patch for that.  Another thing is this patch
> doesn't check
> IFN_LOOP_VECTORIZED because it's impossible to have it with current order
> of passes.
> Bootstrap and test ongoing.  Further comments?

Ok.

Thanks,
Richard.

> Thanks,
> bin
> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>
>     PR tree-optimization/81832
>     * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Don't
>     copy loop header which has IFN_LOOP_DIST_ALIAS call.
>
> gcc/testsuite
> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>
>     PR tree-optimization/81832
>     * gcc.dg/tree-ssa/pr81832.c: New test.
>
>>
>> Thanks,
>> Richard
>>
>>> Thanks,
>>> Richard.
>>>
>>>> Bootstrap and test on x86_64.
>>>>
>>>> Thanks,
>>>> bin
>>>> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>         PR tree-optimization/81832
>>>>         * tree-ssa-loop-ch.c (inner_loop_distributed_p): New function.
>>>>         (pass_ch_vect::process_loop_p): Call above function.
>>>>
>>>> gcc/testsuite
>>>> 2017-08-15  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>         PR tree-optimization/81832
>>>>         * gcc.dg/tree-ssa/pr81832.c: New test.
diff mbox

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c b/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c
new file mode 100644
index 0000000..893124e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr81832.c
@@ -0,0 +1,22 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int a, b, *c;
+void d(void)
+{
+    int **e;
+    for(;;)
+        for(int f = 1; f <= 6; f++)
+        {
+            b = 0;
+            if(a)
+g:
+                while(a++);
+            if (**e);
+            else
+            {
+                *c = a;
+                goto g;
+            }
+        }
+}
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index 14cc6d8d..6bb0220 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -119,7 +119,10 @@  should_duplicate_loop_header_p (basic_block header, struct loop *loop,
 	continue;
 
       if (gimple_code (last) == GIMPLE_CALL
-	  && !gimple_inexpensive_call_p (as_a <gcall *> (last)))
+	  && (!gimple_inexpensive_call_p (as_a <gcall *> (last))
+	      /* IFN_LOOP_DIST_ALIAS means that inner loop is distributed
+		 at current loop's header.  Don't copy in this case.  */
+	      || gimple_call_internal_p (last, IFN_LOOP_DIST_ALIAS)))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file,