diff mbox series

[PR90106] Builtin call transformation changes in cdce pass

Message ID fb8eff07-3445-9a76-5d03-fe02e2966ef3@linux.alibaba.com
State New
Headers show
Series [PR90106] Builtin call transformation changes in cdce pass | expand

Commit Message

JunMa May 8, 2019, 10:09 a.m. UTC
Hi

As PR90106 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106),
when gcc meets builtin function call like:

   y = sqrt (x);

The cdce pass tries to transform the call into an internal function
call and conditionally executes call with a simple range check on the
arguments which can detect most cases and the errno does not need
to be set. It looks like:

   y = IFN_SQRT (x);
   if (__builtin_isless (x, 0))
     sqrt (x);

However, If the call is in tail position, for example:

   y =  sqrt (x);
   return y;

will become:

   y = IFN_SQRT (x);
   if (__builtin_isless (x, 0))
     sqrt (x);
   return y;

This transformation breaks tailcall pattern, and prevents
later tailcall optimizations.

So This patch transform builtin call with return value into
if-then-else part, which looks like:

   y =  sqrt (x);
    ==>
   if (__builtin_isless (x, 0))
     y = sqrt (x);
   else
     y = IFN_SQRT (x);

BTW, y = sqrt (x) can also transform like:

   y = IFN_SQRT (x);
   if (__builtin_isless (x, 0))
     y = sqrt (x);

We don‘t choose this pattern because it emits worse assemble
code(more move instruction and use more registers) in x86_64.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

Regards
JunMa


gcc/ChangeLog

2019-05-07  Jun Ma <JunMa@linux.alibaba.com>

     PR tree-optimization/90106
     * tree-call-cdce.c (shrink_wrap_one_built_in_call_with_conds): Add
     new parameter as new internal function call, also move it to new
     basic block.
     (use_internal_fn): Pass internal function call to
     shrink_wrap_one_built_in_call_with_conds.

gcc/testsuite/ChangeLog

2019-05-07  Jun Ma <JunMa@linux.alibaba.com>

     PR tree-optimization/90106
     * gcc.dg/cdce1.c: Check tailcall code generation after cdce pass.
     * gcc.dg/cdce2.c: Likewise.
---
 gcc/testsuite/gcc.dg/cdce1.c |  3 +-
 gcc/testsuite/gcc.dg/cdce2.c |  3 +-
 gcc/tree-call-cdce.c         | 90 +++++++++++++++++++++++++++++++++-----------
 3 files changed, 71 insertions(+), 25 deletions(-)

Comments

Richard Biener May 8, 2019, 12:28 p.m. UTC | #1
On Wed, May 8, 2019 at 12:09 PM JunMa <JunMa@linux.alibaba.com> wrote:
>
> Hi
>
> As PR90106 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106),
> when gcc meets builtin function call like:
>
>    y = sqrt (x);
>
> The cdce pass tries to transform the call into an internal function
> call and conditionally executes call with a simple range check on the
> arguments which can detect most cases and the errno does not need
> to be set. It looks like:
>
>    y = IFN_SQRT (x);
>    if (__builtin_isless (x, 0))
>      sqrt (x);
>
> However, If the call is in tail position, for example:
>
>    y =  sqrt (x);
>    return y;
>
> will become:
>
>    y = IFN_SQRT (x);
>    if (__builtin_isless (x, 0))
>      sqrt (x);
>    return y;
>
> This transformation breaks tailcall pattern, and prevents
> later tailcall optimizations.
>
> So This patch transform builtin call with return value into
> if-then-else part, which looks like:
>
>    y =  sqrt (x);
>     ==>
>    if (__builtin_isless (x, 0))
>      y = sqrt (x);
>    else
>      y = IFN_SQRT (x);
>
> BTW, y = sqrt (x) can also transform like:
>
>    y = IFN_SQRT (x);
>    if (__builtin_isless (x, 0))
>      y = sqrt (x);
>
> We don‘t choose this pattern because it emits worse assemble
> code(more move instruction and use more registers) in x86_64.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

Thanks,
Richard.


> Regards
> JunMa
>
>
> gcc/ChangeLog
>
> 2019-05-07  Jun Ma <JunMa@linux.alibaba.com>
>
>      PR tree-optimization/90106
>      * tree-call-cdce.c (shrink_wrap_one_built_in_call_with_conds): Add
>      new parameter as new internal function call, also move it to new
>      basic block.
>      (use_internal_fn): Pass internal function call to
>      shrink_wrap_one_built_in_call_with_conds.
>
> gcc/testsuite/ChangeLog
>
> 2019-05-07  Jun Ma <JunMa@linux.alibaba.com>
>
>      PR tree-optimization/90106
>      * gcc.dg/cdce1.c: Check tailcall code generation after cdce pass.
>      * gcc.dg/cdce2.c: Likewise.
>
Jeff Law May 8, 2019, 9:31 p.m. UTC | #2
On 5/8/19 6:28 AM, Richard Biener wrote:
> On Wed, May 8, 2019 at 12:09 PM JunMa <JunMa@linux.alibaba.com> wrote:
>>
>> Hi
>>
>> As PR90106 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106),
>> when gcc meets builtin function call like:
>>
>>    y = sqrt (x);
>>
>> The cdce pass tries to transform the call into an internal function
>> call and conditionally executes call with a simple range check on the
>> arguments which can detect most cases and the errno does not need
>> to be set. It looks like:
>>
>>    y = IFN_SQRT (x);
>>    if (__builtin_isless (x, 0))
>>      sqrt (x);
>>
>> However, If the call is in tail position, for example:
>>
>>    y =  sqrt (x);
>>    return y;
>>
>> will become:
>>
>>    y = IFN_SQRT (x);
>>    if (__builtin_isless (x, 0))
>>      sqrt (x);
>>    return y;
>>
>> This transformation breaks tailcall pattern, and prevents
>> later tailcall optimizations.
>>
>> So This patch transform builtin call with return value into
>> if-then-else part, which looks like:
>>
>>    y =  sqrt (x);
>>     ==>
>>    if (__builtin_isless (x, 0))
>>      y = sqrt (x);
>>    else
>>      y = IFN_SQRT (x);
>>
>> BTW, y = sqrt (x) can also transform like:
>>
>>    y = IFN_SQRT (x);
>>    if (__builtin_isless (x, 0))
>>      y = sqrt (x);
>>
>> We don‘t choose this pattern because it emits worse assemble
>> code(more move instruction and use more registers) in x86_64.
>>
>> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> OK.
JunMa -- do you have a copyright assignment on file and write access to
the repository?  If not we should take care of that before proceeding
further.

Jeff
Bin.Cheng May 9, 2019, 1:20 a.m. UTC | #3
On Thu, May 9, 2019 at 5:31 AM Jeff Law <law@redhat.com> wrote:
>
> On 5/8/19 6:28 AM, Richard Biener wrote:
> > On Wed, May 8, 2019 at 12:09 PM JunMa <JunMa@linux.alibaba.com> wrote:
> >>
> >> Hi
> >>
> >> As PR90106 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106),
> >> when gcc meets builtin function call like:
> >>
> >>    y = sqrt (x);
> >>
> >> The cdce pass tries to transform the call into an internal function
> >> call and conditionally executes call with a simple range check on the
> >> arguments which can detect most cases and the errno does not need
> >> to be set. It looks like:
> >>
> >>    y = IFN_SQRT (x);
> >>    if (__builtin_isless (x, 0))
> >>      sqrt (x);
> >>
> >> However, If the call is in tail position, for example:
> >>
> >>    y =  sqrt (x);
> >>    return y;
> >>
> >> will become:
> >>
> >>    y = IFN_SQRT (x);
> >>    if (__builtin_isless (x, 0))
> >>      sqrt (x);
> >>    return y;
> >>
> >> This transformation breaks tailcall pattern, and prevents
> >> later tailcall optimizations.
> >>
> >> So This patch transform builtin call with return value into
> >> if-then-else part, which looks like:
> >>
> >>    y =  sqrt (x);
> >>     ==>
> >>    if (__builtin_isless (x, 0))
> >>      y = sqrt (x);
> >>    else
> >>      y = IFN_SQRT (x);
> >>
> >> BTW, y = sqrt (x) can also transform like:
> >>
> >>    y = IFN_SQRT (x);
> >>    if (__builtin_isless (x, 0))
> >>      y = sqrt (x);
> >>
> >> We don‘t choose this pattern because it emits worse assemble
> >> code(more move instruction and use more registers) in x86_64.
> >>
> >> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> >
> > OK.
> JunMa -- do you have a copyright assignment on file and write access to
> the repository?  If not we should take care of that before proceeding
> further.
Hi Jeff,
Thanks very much for helping.
Yes, he is under Alibaba's copyright assignment.  What else should we
do other than noticing in this mailing list message?
BTW, I think JunMa needs to apply write-access though.

Thanks,
bin
>
> Jeff
JunMa May 9, 2019, 2:25 a.m. UTC | #4
在 2019/5/9 上午9:20, Bin.Cheng 写道:
> On Thu, May 9, 2019 at 5:31 AM Jeff Law <law@redhat.com> wrote:
>> On 5/8/19 6:28 AM, Richard Biener wrote:
>>> On Wed, May 8, 2019 at 12:09 PM JunMa <JunMa@linux.alibaba.com> wrote:
>>>> Hi
>>>>
>>>> As PR90106 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106),
>>>> when gcc meets builtin function call like:
>>>>
>>>>     y = sqrt (x);
>>>>
>>>> The cdce pass tries to transform the call into an internal function
>>>> call and conditionally executes call with a simple range check on the
>>>> arguments which can detect most cases and the errno does not need
>>>> to be set. It looks like:
>>>>
>>>>     y = IFN_SQRT (x);
>>>>     if (__builtin_isless (x, 0))
>>>>       sqrt (x);
>>>>
>>>> However, If the call is in tail position, for example:
>>>>
>>>>     y =  sqrt (x);
>>>>     return y;
>>>>
>>>> will become:
>>>>
>>>>     y = IFN_SQRT (x);
>>>>     if (__builtin_isless (x, 0))
>>>>       sqrt (x);
>>>>     return y;
>>>>
>>>> This transformation breaks tailcall pattern, and prevents
>>>> later tailcall optimizations.
>>>>
>>>> So This patch transform builtin call with return value into
>>>> if-then-else part, which looks like:
>>>>
>>>>     y =  sqrt (x);
>>>>      ==>
>>>>     if (__builtin_isless (x, 0))
>>>>       y = sqrt (x);
>>>>     else
>>>>       y = IFN_SQRT (x);
>>>>
>>>> BTW, y = sqrt (x) can also transform like:
>>>>
>>>>     y = IFN_SQRT (x);
>>>>     if (__builtin_isless (x, 0))
>>>>       y = sqrt (x);
>>>>
>>>> We don‘t choose this pattern because it emits worse assemble
>>>> code(more move instruction and use more registers) in x86_64.
>>>>
>>>> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>>> OK.
>> JunMa -- do you have a copyright assignment on file and write access to
>> the repository?  If not we should take care of that before proceeding
>> further.
> Hi Jeff,
> Thanks very much for helping.
> Yes, he is under Alibaba's copyright assignment.  What else should we
> do other than noticing in this mailing list message?
> BTW, I think JunMa needs to apply write-access though.

Yes, I do not have write access, FYI.

Thanks
JunMa


> Thanks,
> bin
>> Jeff
Jeff Law May 9, 2019, 2:54 p.m. UTC | #5
On 5/8/19 7:20 PM, Bin.Cheng wrote:
> On Thu, May 9, 2019 at 5:31 AM Jeff Law <law@redhat.com> wrote:
>>
>> On 5/8/19 6:28 AM, Richard Biener wrote:
>>> On Wed, May 8, 2019 at 12:09 PM JunMa <JunMa@linux.alibaba.com> wrote:
>>>>
>>>> Hi
>>>>
>>>> As PR90106 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106),
>>>> when gcc meets builtin function call like:
>>>>
>>>>    y = sqrt (x);
>>>>
>>>> The cdce pass tries to transform the call into an internal function
>>>> call and conditionally executes call with a simple range check on the
>>>> arguments which can detect most cases and the errno does not need
>>>> to be set. It looks like:
>>>>
>>>>    y = IFN_SQRT (x);
>>>>    if (__builtin_isless (x, 0))
>>>>      sqrt (x);
>>>>
>>>> However, If the call is in tail position, for example:
>>>>
>>>>    y =  sqrt (x);
>>>>    return y;
>>>>
>>>> will become:
>>>>
>>>>    y = IFN_SQRT (x);
>>>>    if (__builtin_isless (x, 0))
>>>>      sqrt (x);
>>>>    return y;
>>>>
>>>> This transformation breaks tailcall pattern, and prevents
>>>> later tailcall optimizations.
>>>>
>>>> So This patch transform builtin call with return value into
>>>> if-then-else part, which looks like:
>>>>
>>>>    y =  sqrt (x);
>>>>     ==>
>>>>    if (__builtin_isless (x, 0))
>>>>      y = sqrt (x);
>>>>    else
>>>>      y = IFN_SQRT (x);
>>>>
>>>> BTW, y = sqrt (x) can also transform like:
>>>>
>>>>    y = IFN_SQRT (x);
>>>>    if (__builtin_isless (x, 0))
>>>>      y = sqrt (x);
>>>>
>>>> We don‘t choose this pattern because it emits worse assemble
>>>> code(more move instruction and use more registers) in x86_64.
>>>>
>>>> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>>>
>>> OK.
>> JunMa -- do you have a copyright assignment on file and write access to
>> the repository?  If not we should take care of that before proceeding
>> further.
> Hi Jeff,
> Thanks very much for helping.
> Yes, he is under Alibaba's copyright assignment.  What else should we
> do other than noticing in this mailing list message?
> BTW, I think JunMa needs to apply write-access though.
If he's covered under a corporate assignment, then we're fine from that
standpoint.

For write access, we just need to go through the usual procedure.  JunMa
just needs to fill out this form, I'm happy to sponsor him.

https://sourceware.org/cgi-bin/pdw/ps_form.cgi

Jeff
Jeff Law May 9, 2019, 8 p.m. UTC | #6
On 5/8/19 8:25 PM, JunMa wrote:
> 在 2019/5/9 上午9:20, Bin.Cheng 写道:
>> On Thu, May 9, 2019 at 5:31 AM Jeff Law <law@redhat.com> wrote:
>>> On 5/8/19 6:28 AM, Richard Biener wrote:
>>>> On Wed, May 8, 2019 at 12:09 PM JunMa <JunMa@linux.alibaba.com> wrote:
>>>>> Hi
>>>>>
>>>>> As PR90106 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106),
>>>>> when gcc meets builtin function call like:
>>>>>
>>>>>     y = sqrt (x);
>>>>>
>>>>> The cdce pass tries to transform the call into an internal function
>>>>> call and conditionally executes call with a simple range check on the
>>>>> arguments which can detect most cases and the errno does not need
>>>>> to be set. It looks like:
>>>>>
>>>>>     y = IFN_SQRT (x);
>>>>>     if (__builtin_isless (x, 0))
>>>>>       sqrt (x);
>>>>>
>>>>> However, If the call is in tail position, for example:
>>>>>
>>>>>     y =  sqrt (x);
>>>>>     return y;
>>>>>
>>>>> will become:
>>>>>
>>>>>     y = IFN_SQRT (x);
>>>>>     if (__builtin_isless (x, 0))
>>>>>       sqrt (x);
>>>>>     return y;
>>>>>
>>>>> This transformation breaks tailcall pattern, and prevents
>>>>> later tailcall optimizations.
>>>>>
>>>>> So This patch transform builtin call with return value into
>>>>> if-then-else part, which looks like:
>>>>>
>>>>>     y =  sqrt (x);
>>>>>      ==>
>>>>>     if (__builtin_isless (x, 0))
>>>>>       y = sqrt (x);
>>>>>     else
>>>>>       y = IFN_SQRT (x);
>>>>>
>>>>> BTW, y = sqrt (x) can also transform like:
>>>>>
>>>>>     y = IFN_SQRT (x);
>>>>>     if (__builtin_isless (x, 0))
>>>>>       y = sqrt (x);
>>>>>
>>>>> We don‘t choose this pattern because it emits worse assemble
>>>>> code(more move instruction and use more registers) in x86_64.
>>>>>
>>>>> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>>>> OK.
>>> JunMa -- do you have a copyright assignment on file and write access to
>>> the repository?  If not we should take care of that before proceeding
>>> further.
>> Hi Jeff,
>> Thanks very much for helping.
>> Yes, he is under Alibaba's copyright assignment.  What else should we
>> do other than noticing in this mailing list message?
>> BTW, I think JunMa needs to apply write-access though.
> 
> Yes, I do not have write access, FYI.
OK.  So in my response to Bin I gave you a link to the form to fill out
to get write access.  List me as your sponsor.  Once that's all taken
care of and working, go ahead and commit this change to the trunk.

Jeff
JunMa May 10, 2019, 1:04 a.m. UTC | #7
在 2019/5/10 上午4:00, Jeff Law 写道:
> On 5/8/19 8:25 PM, JunMa wrote:
>> 在 2019/5/9 上午9:20, Bin.Cheng 写道:
>>> On Thu, May 9, 2019 at 5:31 AM Jeff Law <law@redhat.com> wrote:
>>>> On 5/8/19 6:28 AM, Richard Biener wrote:
>>>>> On Wed, May 8, 2019 at 12:09 PM JunMa <JunMa@linux.alibaba.com> wrote:
>>>>>> Hi
>>>>>>
>>>>>> As PR90106 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106),
>>>>>> when gcc meets builtin function call like:
>>>>>>
>>>>>>      y = sqrt (x);
>>>>>>
>>>>>> The cdce pass tries to transform the call into an internal function
>>>>>> call and conditionally executes call with a simple range check on the
>>>>>> arguments which can detect most cases and the errno does not need
>>>>>> to be set. It looks like:
>>>>>>
>>>>>>      y = IFN_SQRT (x);
>>>>>>      if (__builtin_isless (x, 0))
>>>>>>        sqrt (x);
>>>>>>
>>>>>> However, If the call is in tail position, for example:
>>>>>>
>>>>>>      y =  sqrt (x);
>>>>>>      return y;
>>>>>>
>>>>>> will become:
>>>>>>
>>>>>>      y = IFN_SQRT (x);
>>>>>>      if (__builtin_isless (x, 0))
>>>>>>        sqrt (x);
>>>>>>      return y;
>>>>>>
>>>>>> This transformation breaks tailcall pattern, and prevents
>>>>>> later tailcall optimizations.
>>>>>>
>>>>>> So This patch transform builtin call with return value into
>>>>>> if-then-else part, which looks like:
>>>>>>
>>>>>>      y =  sqrt (x);
>>>>>>       ==>
>>>>>>      if (__builtin_isless (x, 0))
>>>>>>        y = sqrt (x);
>>>>>>      else
>>>>>>        y = IFN_SQRT (x);
>>>>>>
>>>>>> BTW, y = sqrt (x) can also transform like:
>>>>>>
>>>>>>      y = IFN_SQRT (x);
>>>>>>      if (__builtin_isless (x, 0))
>>>>>>        y = sqrt (x);
>>>>>>
>>>>>> We don‘t choose this pattern because it emits worse assemble
>>>>>> code(more move instruction and use more registers) in x86_64.
>>>>>>
>>>>>> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>>>>> OK.
>>>> JunMa -- do you have a copyright assignment on file and write access to
>>>> the repository?  If not we should take care of that before proceeding
>>>> further.
>>> Hi Jeff,
>>> Thanks very much for helping.
>>> Yes, he is under Alibaba's copyright assignment.  What else should we
>>> do other than noticing in this mailing list message?
>>> BTW, I think JunMa needs to apply write-access though.
>> Yes, I do not have write access, FYI.
> OK.  So in my response to Bin I gave you a link to the form to fill out
> to get write access.  List me as your sponsor.  Once that's all taken
> care of and working, go ahead and commit this change to the trunk.
Thanks!
JunMa
> Jeff
Jakub Jelinek May 16, 2019, 9:39 p.m. UTC | #8
On Wed, May 08, 2019 at 06:09:06PM +0800, JunMa wrote:
> 2019-05-07  Jun Ma <JunMa@linux.alibaba.com>
> 
>     PR tree-optimization/90106
>     * gcc.dg/cdce1.c: Check tailcall code generation after cdce pass.
>     * gcc.dg/cdce2.c: Likewise.

This is wrong and results in UNSUPPORTED failures.
Both tests are dg-do run, so you can't scan-assembler for them, assembly
isn't generated at all (it could be if -save-temps is in dg-options).
Furthermore, I don't any target actually spells the tailcall in the assembly
as jmp space pow/log, rather than say on x86_64/i686 jmp tab pow/log, but
on most other targets completely differently (and some targets don't support
tail calls at all, or have various limitations for them etc.).

One possibility is to add -fdump-tree-optimized and scan for
/* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
resp.
/* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */

> diff --git a/gcc/testsuite/gcc.dg/cdce1.c b/gcc/testsuite/gcc.dg/cdce1.c
> index b23ad63..424d80f 100644
> --- a/gcc/testsuite/gcc.dg/cdce1.c
> +++ b/gcc/testsuite/gcc.dg/cdce1.c
> @@ -1,7 +1,8 @@
>  /* { dg-do  run  } */
>  /* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
>  /* { dg-require-effective-target int32plus } */
> -/* { dg-final { scan-tree-dump  "cdce1.c:16: .* function call is shrink-wrapped into error conditions\."  "cdce" } } */
> +/* { dg-final { scan-tree-dump  "cdce1.c:17: .* function call is shrink-wrapped into error conditions\."  "cdce" } } */
> +/* { dg-final { scan-assembler     "jmp pow" } } */
>  /* { dg-require-effective-target large_double } */
>  
>  #include <stdlib.h>
> diff --git a/gcc/testsuite/gcc.dg/cdce2.c b/gcc/testsuite/gcc.dg/cdce2.c
> index 30e7cb1..2af2893 100644
> --- a/gcc/testsuite/gcc.dg/cdce2.c
> +++ b/gcc/testsuite/gcc.dg/cdce2.c
> @@ -1,7 +1,8 @@
>  /* { dg-do  run  } */
>  /* { dg-skip-if "doubles are floats" { "avr-*-*" } } */
>  /* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
> -/* { dg-final { scan-tree-dump  "cdce2.c:15: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
> +/* { dg-final { scan-tree-dump  "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
> +/* { dg-final { scan-assembler "jmp log" } } */
>   
>  #include <stdlib.h>
>  #include <math.h>

	Jakub
Jakub Jelinek May 16, 2019, 10:04 p.m. UTC | #9
On Thu, May 16, 2019 at 11:39:38PM +0200, Jakub Jelinek wrote:
> One possibility is to add -fdump-tree-optimized and scan for
> /* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
> resp.
> /* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */

Here it is in patch form.

That said, I'm not convinced your patch does what you wanted, because
comparing a month old trunk with today's trunk generates the same assembly
except for .ident, generates as many [tail call] lines in *.optimized dump
as before, emits the same number of jmp\tpow and jmp\tlog instructions as
before (one in a separate routine).

But at least the tests aren't UNSUPPORTED anymore.

2019-05-16  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/90106
	* gcc.dg/cdce1.c: Don't scan-assembler, instead -fdump-tree-optimized
	and scan-tree-dump for tail call.
	* gcc.dg/cdce2.c: Likewise.

--- gcc/testsuite/gcc.dg/cdce1.c.jj	2019-05-16 11:28:22.750177582 +0200
+++ gcc/testsuite/gcc.dg/cdce1.c	2019-05-16 23:50:23.618450891 +0200
@@ -1,9 +1,9 @@
-/* { dg-do  run  } */
-/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
+/* { dg-do run } */
+/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized -lm" } */
 /* { dg-require-effective-target int32plus } */
-/* { dg-final { scan-tree-dump  "cdce1.c:17: .* function call is shrink-wrapped into error conditions\."  "cdce" } } */
-/* { dg-final { scan-assembler     "jmp pow" } } */
 /* { dg-require-effective-target large_double } */
+/* { dg-final { scan-tree-dump "cdce1.c:17: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
+/* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
 
 #include <stdlib.h>
 #include <math.h>
--- gcc/testsuite/gcc.dg/cdce2.c.jj	2019-05-16 11:28:22.781177075 +0200
+++ gcc/testsuite/gcc.dg/cdce2.c	2019-05-16 23:50:58.505880845 +0200
@@ -1,8 +1,8 @@
-/* { dg-do  run  } */
+/* { dg-do run } */
 /* { dg-skip-if "doubles are floats" { "avr-*-*" } } */
-/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
-/* { dg-final { scan-tree-dump  "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
-/* { dg-final { scan-assembler "jmp log" } } */
+/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized -lm" } */
+/* { dg-final { scan-tree-dump "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
+/* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
  
 #include <stdlib.h>
 #include <math.h>


	Jakub
JunMa May 17, 2019, 3:09 a.m. UTC | #10
在 2019/5/17 上午6:04, Jakub Jelinek 写道:
> On Thu, May 16, 2019 at 11:39:38PM +0200, Jakub Jelinek wrote:
>> One possibility is to add -fdump-tree-optimized and scan for
>> /* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
>> resp.
>> /* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
> Here it is in patch form.
>
> That said, I'm not convinced your patch does what you wanted, because
> comparing a month old trunk with today's trunk generates the same assembly
> except for .ident, generates as many [tail call] lines in *.optimized dump
> as before, emits the same number of jmp\tpow and jmp\tlog instructions as
> before (one in a separate routine).


Thanks for point out the mistake and fix it.

For these two tests, cdce pass doesn't transform the builtin math 
functions in foo
with or without the patch because they cannot use internal functions.

I'll add another testcase to verify the patch.

Regards
Jun


> But at least the tests aren't UNSUPPORTED anymore.
>
> 2019-05-16  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR tree-optimization/90106
> 	* gcc.dg/cdce1.c: Don't scan-assembler, instead -fdump-tree-optimized
> 	and scan-tree-dump for tail call.
> 	* gcc.dg/cdce2.c: Likewise.
>
> --- gcc/testsuite/gcc.dg/cdce1.c.jj	2019-05-16 11:28:22.750177582 +0200
> +++ gcc/testsuite/gcc.dg/cdce1.c	2019-05-16 23:50:23.618450891 +0200
> @@ -1,9 +1,9 @@
> -/* { dg-do  run  } */
> -/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized -lm" } */
>   /* { dg-require-effective-target int32plus } */
> -/* { dg-final { scan-tree-dump  "cdce1.c:17: .* function call is shrink-wrapped into error conditions\."  "cdce" } } */
> -/* { dg-final { scan-assembler     "jmp pow" } } */
>   /* { dg-require-effective-target large_double } */
> +/* { dg-final { scan-tree-dump "cdce1.c:17: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
> +/* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
>   
>   #include <stdlib.h>
>   #include <math.h>
> --- gcc/testsuite/gcc.dg/cdce2.c.jj	2019-05-16 11:28:22.781177075 +0200
> +++ gcc/testsuite/gcc.dg/cdce2.c	2019-05-16 23:50:58.505880845 +0200
> @@ -1,8 +1,8 @@
> -/* { dg-do  run  } */
> +/* { dg-do run } */
>   /* { dg-skip-if "doubles are floats" { "avr-*-*" } } */
> -/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
> -/* { dg-final { scan-tree-dump  "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
> -/* { dg-final { scan-assembler "jmp log" } } */
> +/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized -lm" } */
> +/* { dg-final { scan-tree-dump "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
> +/* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
>    
>   #include <stdlib.h>
>   #include <math.h>
>
>
> 	Jakub
JunMa May 17, 2019, 6:24 a.m. UTC | #11
在 2019/5/17 上午11:09, JunMa 写道:
> 在 2019/5/17 上午6:04, Jakub Jelinek 写道:
>> On Thu, May 16, 2019 at 11:39:38PM +0200, Jakub Jelinek wrote:
>>> One possibility is to add -fdump-tree-optimized and scan for
>>> /* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail 
>>> call\\\]" "optimized" } } */
>>> resp.
>>> /* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail 
>>> call\\\]" "optimized" } } */
>> Here it is in patch form.
>>
>> That said, I'm not convinced your patch does what you wanted, because
>> comparing a month old trunk with today's trunk generates the same 
>> assembly
>> except for .ident, generates as many [tail call] lines in *.optimized 
>> dump
>> as before, emits the same number of jmp\tpow and jmp\tlog 
>> instructions as
>> before (one in a separate routine).
>
>
> Thanks for point out the mistake and fix it.
>
> For these two tests, cdce pass doesn't transform the builtin math 
> functions in foo
> with or without the patch because they cannot use internal functions.
>
> I'll add another testcase to verify the patch.


Here is the new testcase.

The sqrtf function call keeps as tailcall with the patch
but not without the patch. For more details, you can read
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

Regards
Jun


gcc/testsuite/ChangeLog

2019-05-17  Jun Ma <JunMa@linux.alibaba.com>

     PR tree-optimization/90106
     * gcc.dg/cdce3.c: New test.


>
> Regards
> Jun
>
>
>> But at least the tests aren't UNSUPPORTED anymore.
>>
>> 2019-05-16  Jakub Jelinek  <jakub@redhat.com>
>>
>>     PR tree-optimization/90106
>>     * gcc.dg/cdce1.c: Don't scan-assembler, instead 
>> -fdump-tree-optimized
>>     and scan-tree-dump for tail call.
>>     * gcc.dg/cdce2.c: Likewise.
>>
>> --- gcc/testsuite/gcc.dg/cdce1.c.jj    2019-05-16 11:28:22.750177582 
>> +0200
>> +++ gcc/testsuite/gcc.dg/cdce1.c    2019-05-16 23:50:23.618450891 +0200
>> @@ -1,9 +1,9 @@
>> -/* { dg-do  run  } */
>> -/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -lm" } */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details 
>> -fdump-tree-optimized -lm" } */
>>   /* { dg-require-effective-target int32plus } */
>> -/* { dg-final { scan-tree-dump  "cdce1.c:17: .* function call is 
>> shrink-wrapped into error conditions\."  "cdce" } } */
>> -/* { dg-final { scan-assembler     "jmp pow" } } */
>>   /* { dg-require-effective-target large_double } */
>> +/* { dg-final { scan-tree-dump "cdce1.c:17: .* function call is 
>> shrink-wrapped into error conditions\." "cdce" } } */
>> +/* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail 
>> call\\\]" "optimized" } } */
>>     #include <stdlib.h>
>>   #include <math.h>
>> --- gcc/testsuite/gcc.dg/cdce2.c.jj    2019-05-16 11:28:22.781177075 
>> +0200
>> +++ gcc/testsuite/gcc.dg/cdce2.c    2019-05-16 23:50:58.505880845 +0200
>> @@ -1,8 +1,8 @@
>> -/* { dg-do  run  } */
>> +/* { dg-do run } */
>>   /* { dg-skip-if "doubles are floats" { "avr-*-*" } } */
>> -/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -lm" } */
>> -/* { dg-final { scan-tree-dump  "cdce2.c:16: .* function call is 
>> shrink-wrapped into error conditions\." "cdce" } } */
>> -/* { dg-final { scan-assembler "jmp log" } } */
>> +/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details 
>> -fdump-tree-optimized -lm" } */
>> +/* { dg-final { scan-tree-dump "cdce2.c:16: .* function call is 
>> shrink-wrapped into error conditions\." "cdce" } } */
>> +/* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail 
>> call\\\]" "optimized" } } */
>>      #include <stdlib.h>
>>   #include <math.h>
>>
>>
>>     Jakub
>
---
 gcc/testsuite/gcc.dg/cdce3.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/cdce3.c

diff --git a/gcc/testsuite/gcc.dg/cdce3.c b/gcc/testsuite/gcc.dg/cdce3.c
new file mode 100644
index 0000000..0062c4f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/cdce3.c
@@ -0,0 +1,12 @@
+/* { dg-do  compile } */
+/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized  -lm" } */
+/* { dg-final { scan-tree-dump "cdce3.c:10: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
+/* { dg-final { scan-tree-dump "sqrtf \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
+
+#include <math.h>
+
+float foo ( float x )
+{
+  return sqrtf( x );
+}
+
Jakub Jelinek May 17, 2019, 6:34 a.m. UTC | #12
On Fri, May 17, 2019 at 02:24:22PM +0800, JunMa wrote:
> 2019-05-17  Jun Ma <JunMa@linux.alibaba.com>

Two spaces before < rather than one.

>     PR tree-optimization/90106
>     * gcc.dg/cdce3.c: New test.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/cdce3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do  compile } */

Just use one space instead of two.

> +/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized  -lm" } */

For compile time test, no need to add "  -lm" (well, no need to add it even
for link/run tests).

> +/* { dg-final { scan-tree-dump "cdce3.c:10: .* function call is shrink-wrapped into error conditions\." "cdce" } } */

Please use \[^\n\r]* instead of .*, you don't want newlines matched in
there.

> +/* { dg-final { scan-tree-dump "sqrtf \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
> +
> +#include <math.h>

Wouldn't it be better to just declare it yourself:
float sqrtf (float);
?
You really don't know what the target math.h includes.

> +
> +float foo ( float x )
> +{
> +  return sqrtf( x );
> +}
> +
> -- 
> 1.8.3.1
> 

	Jakub
JunMa May 17, 2019, 7:02 a.m. UTC | #13
在 2019/5/17 下午2:34, Jakub Jelinek 写道:
> On Fri, May 17, 2019 at 02:24:22PM +0800, JunMa wrote:
>> 2019-05-17  Jun Ma <JunMa@linux.alibaba.com>
> Two spaces before < rather than one.
>
>>      PR tree-optimization/90106
>>      * gcc.dg/cdce3.c: New test.
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/cdce3.c
>> @@ -0,0 +1,12 @@
>> +/* { dg-do  compile } */
> Just use one space instead of two.
>
>> +/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized  -lm" } */
> For compile time test, no need to add "  -lm" (well, no need to add it even
> for link/run tests).
>
>> +/* { dg-final { scan-tree-dump "cdce3.c:10: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
> Please use \[^\n\r]* instead of .*, you don't want newlines matched in
> there.
>
>> +/* { dg-final { scan-tree-dump "sqrtf \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
>> +
>> +#include <math.h>
> Wouldn't it be better to just declare it yourself:
> float sqrtf (float);
> ?
> You really don't know what the target math.h includes.
>
>> +
>> +float foo ( float x )
>> +{
>> +  return sqrtf( x );
>> +}
>> +
>> -- 
>> 1.8.3.1
>>

Thanks for review so carefully. Update the patch based on your suggestion.

Regards
Jun

gcc/testsuite/ChangeLog

2019-05-17  Jun Ma  <JunMa@linux.alibaba.com>

     PR tree-optimization/90106
     * gcc.dg/cdce3.c: New test.


> 	Jakub
---
 gcc/testsuite/gcc.dg/cdce3.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/cdce3.c

diff --git a/gcc/testsuite/gcc.dg/cdce3.c b/gcc/testsuite/gcc.dg/cdce3.c
new file mode 100644
index 0000000..8a74379
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/cdce3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump "cdce3.c:9: \[^\n\r]* function call is shrink-wrapped into error conditions\." "cdce" } } */
+/* { dg-final { scan-tree-dump "sqrtf \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
+
+float sqrtf (float);
+float foo (float x)
+{
+  return sqrtf (x);
+}
+
Jakub Jelinek May 17, 2019, 7:44 a.m. UTC | #14
On Fri, May 17, 2019 at 03:02:51PM +0800, JunMa wrote:
> 2019-05-17  Jun Ma  <JunMa@linux.alibaba.com>
> 
>     PR tree-optimization/90106
>     * gcc.dg/cdce3.c: New test.

Ok, thanks.

	Jakub
Jakub Jelinek May 24, 2019, 10:10 a.m. UTC | #15
On Fri, May 17, 2019 at 12:04:16AM +0200, Jakub Jelinek wrote:
> On Thu, May 16, 2019 at 11:39:38PM +0200, Jakub Jelinek wrote:
> > One possibility is to add -fdump-tree-optimized and scan for
> > /* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
> > resp.
> > /* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
> 
> Here it is in patch form.
> 
> That said, I'm not convinced your patch does what you wanted, because
> comparing a month old trunk with today's trunk generates the same assembly
> except for .ident, generates as many [tail call] lines in *.optimized dump
> as before, emits the same number of jmp\tpow and jmp\tlog instructions as
> before (one in a separate routine).
> 
> But at least the tests aren't UNSUPPORTED anymore.
> 
> 2019-05-16  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/90106
> 	* gcc.dg/cdce1.c: Don't scan-assembler, instead -fdump-tree-optimized
> 	and scan-tree-dump for tail call.
> 	* gcc.dg/cdce2.c: Likewise.

I'd like to ping this patch.  A new cdce3.c testcase has been added in the
mean time, so the above "That said," paragraph is resolved through that.

> --- gcc/testsuite/gcc.dg/cdce1.c.jj	2019-05-16 11:28:22.750177582 +0200
> +++ gcc/testsuite/gcc.dg/cdce1.c	2019-05-16 23:50:23.618450891 +0200
> @@ -1,9 +1,9 @@
> -/* { dg-do  run  } */
> -/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized -lm" } */
>  /* { dg-require-effective-target int32plus } */
> -/* { dg-final { scan-tree-dump  "cdce1.c:17: .* function call is shrink-wrapped into error conditions\."  "cdce" } } */
> -/* { dg-final { scan-assembler     "jmp pow" } } */
>  /* { dg-require-effective-target large_double } */
> +/* { dg-final { scan-tree-dump "cdce1.c:17: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
> +/* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
>  
>  #include <stdlib.h>
>  #include <math.h>
> --- gcc/testsuite/gcc.dg/cdce2.c.jj	2019-05-16 11:28:22.781177075 +0200
> +++ gcc/testsuite/gcc.dg/cdce2.c	2019-05-16 23:50:58.505880845 +0200
> @@ -1,8 +1,8 @@
> -/* { dg-do  run  } */
> +/* { dg-do run } */
>  /* { dg-skip-if "doubles are floats" { "avr-*-*" } } */
> -/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
> -/* { dg-final { scan-tree-dump  "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
> -/* { dg-final { scan-assembler "jmp log" } } */
> +/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized -lm" } */
> +/* { dg-final { scan-tree-dump "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
> +/* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
>   
>  #include <stdlib.h>
>  #include <math.h>
> 

	Jakub
Richard Biener May 24, 2019, 10:13 a.m. UTC | #16
On Fri, 24 May 2019, Jakub Jelinek wrote:

> On Fri, May 17, 2019 at 12:04:16AM +0200, Jakub Jelinek wrote:
> > On Thu, May 16, 2019 at 11:39:38PM +0200, Jakub Jelinek wrote:
> > > One possibility is to add -fdump-tree-optimized and scan for
> > > /* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
> > > resp.
> > > /* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
> > 
> > Here it is in patch form.
> > 
> > That said, I'm not convinced your patch does what you wanted, because
> > comparing a month old trunk with today's trunk generates the same assembly
> > except for .ident, generates as many [tail call] lines in *.optimized dump
> > as before, emits the same number of jmp\tpow and jmp\tlog instructions as
> > before (one in a separate routine).
> > 
> > But at least the tests aren't UNSUPPORTED anymore.
> > 
> > 2019-05-16  Jakub Jelinek  <jakub@redhat.com>
> > 
> > 	PR tree-optimization/90106
> > 	* gcc.dg/cdce1.c: Don't scan-assembler, instead -fdump-tree-optimized
> > 	and scan-tree-dump for tail call.
> > 	* gcc.dg/cdce2.c: Likewise.
> 
> I'd like to ping this patch.  A new cdce3.c testcase has been added in the
> mean time, so the above "That said," paragraph is resolved through that.

OK

> > --- gcc/testsuite/gcc.dg/cdce1.c.jj	2019-05-16 11:28:22.750177582 +0200
> > +++ gcc/testsuite/gcc.dg/cdce1.c	2019-05-16 23:50:23.618450891 +0200
> > @@ -1,9 +1,9 @@
> > -/* { dg-do  run  } */
> > -/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
> > +/* { dg-do run } */
> > +/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized -lm" } */
> >  /* { dg-require-effective-target int32plus } */
> > -/* { dg-final { scan-tree-dump  "cdce1.c:17: .* function call is shrink-wrapped into error conditions\."  "cdce" } } */
> > -/* { dg-final { scan-assembler     "jmp pow" } } */
> >  /* { dg-require-effective-target large_double } */
> > +/* { dg-final { scan-tree-dump "cdce1.c:17: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
> > +/* { dg-final { scan-tree-dump "pow \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
> >  
> >  #include <stdlib.h>
> >  #include <math.h>
> > --- gcc/testsuite/gcc.dg/cdce2.c.jj	2019-05-16 11:28:22.781177075 +0200
> > +++ gcc/testsuite/gcc.dg/cdce2.c	2019-05-16 23:50:58.505880845 +0200
> > @@ -1,8 +1,8 @@
> > -/* { dg-do  run  } */
> > +/* { dg-do run } */
> >  /* { dg-skip-if "doubles are floats" { "avr-*-*" } } */
> > -/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
> > -/* { dg-final { scan-tree-dump  "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
> > -/* { dg-final { scan-assembler "jmp log" } } */
> > +/* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details -fdump-tree-optimized -lm" } */
> > +/* { dg-final { scan-tree-dump "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
> > +/* { dg-final { scan-tree-dump "log \\(\[^\n\r]*\\); \\\[tail call\\\]" "optimized" } } */
> >   
> >  #include <stdlib.h>
> >  #include <math.h>
> > 
> 
> 	Jakub
>
diff mbox series

Patch

diff --git a/gcc/testsuite/gcc.dg/cdce1.c b/gcc/testsuite/gcc.dg/cdce1.c
index b23ad63..424d80f 100644
--- a/gcc/testsuite/gcc.dg/cdce1.c
+++ b/gcc/testsuite/gcc.dg/cdce1.c
@@ -1,7 +1,8 @@ 
 /* { dg-do  run  } */
 /* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
 /* { dg-require-effective-target int32plus } */
-/* { dg-final { scan-tree-dump  "cdce1.c:16: .* function call is shrink-wrapped into error conditions\."  "cdce" } } */
+/* { dg-final { scan-tree-dump  "cdce1.c:17: .* function call is shrink-wrapped into error conditions\."  "cdce" } } */
+/* { dg-final { scan-assembler     "jmp pow" } } */
 /* { dg-require-effective-target large_double } */
 
 #include <stdlib.h>
diff --git a/gcc/testsuite/gcc.dg/cdce2.c b/gcc/testsuite/gcc.dg/cdce2.c
index 30e7cb1..2af2893 100644
--- a/gcc/testsuite/gcc.dg/cdce2.c
+++ b/gcc/testsuite/gcc.dg/cdce2.c
@@ -1,7 +1,8 @@ 
 /* { dg-do  run  } */
 /* { dg-skip-if "doubles are floats" { "avr-*-*" } } */
 /* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details  -lm" } */
-/* { dg-final { scan-tree-dump  "cdce2.c:15: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
+/* { dg-final { scan-tree-dump  "cdce2.c:16: .* function call is shrink-wrapped into error conditions\." "cdce" } } */
+/* { dg-final { scan-assembler "jmp log" } } */
  
 #include <stdlib.h>
 #include <math.h>
diff --git a/gcc/tree-call-cdce.c b/gcc/tree-call-cdce.c
index 2e482b3..9e3372f 100644
--- a/gcc/tree-call-cdce.c
+++ b/gcc/tree-call-cdce.c
@@ -93,10 +93,10 @@  along with GCC; see the file COPYING3.  If not see
 
 	y = sqrt (x);
      ==>
-	y = IFN_SQRT (x);
 	if (__builtin_isless (x, 0))
-	    sqrt (x);
-
+	  y =  sqrt (x);
+	else
+	  y = IFN_SQRT (x);
      In the vast majority of cases we should then never need to call sqrt.
 
    Note that library functions are not supposed to clear errno to zero without
@@ -793,14 +793,16 @@  gen_shrink_wrap_conditions (gcall *bi_call, vec<gimple *> conds,
 }
 
 /* Shrink-wrap BI_CALL so that it is only called when one of the NCONDS
-   conditions in CONDS is false.  */
+   conditions in CONDS is false.  Also move BI_NEWCALL to a new basic block
+   when it is non-null, it is called while all of the CONDS are true.  */
 
 static void
 shrink_wrap_one_built_in_call_with_conds (gcall *bi_call, vec <gimple *> conds,
-					  unsigned int nconds)
+					  unsigned int nconds,
+					  gcall *bi_newcall = NULL)
 {
   gimple_stmt_iterator bi_call_bsi;
-  basic_block bi_call_bb, join_tgt_bb, guard_bb;
+  basic_block bi_call_bb, bi_newcall_bb, join_tgt_bb, guard_bb;
   edge join_tgt_in_edge_from_call, join_tgt_in_edge_fall_thru;
   edge bi_call_in_edge0, guard_bb_in_edge;
   unsigned tn_cond_stmts;
@@ -809,27 +811,26 @@  shrink_wrap_one_built_in_call_with_conds (gcall *bi_call, vec <gimple *> conds,
   gimple *cond_expr_start;
 
   /* The cfg we want to create looks like this:
-
-	   [guard n-1]         <- guard_bb (old block)
-	     |    \
-	     | [guard n-2]                   }
-	     |    / \                        }
-	     |   /  ...                      } new blocks
-	     |  /  [guard 0]                 }
-	     | /    /   |                    }
-	    [ call ]    |     <- bi_call_bb  }
-	     | \        |
-	     |  \       |
-	     |   [ join ]     <- join_tgt_bb (old iff call must end bb)
-	     |
+          [guard n-1]         <- guard_bb (old block)
+            |    \
+            | [guard n-2]                   }
+            |    / \                        }
+            |   /  ...                      } new blocks
+            |  /  [guard 0]                 }
+            | /  /    |                     }
+           [call]     |      <- bi_call_bb  }
+             \    [newcall]  <-bi_newcall_bb}
+              \       |
+                [join]       <- join_tgt_bb (old iff call must end bb)
 	 possible EH edges (only if [join] is old)
 
      When [join] is new, the immediate dominators for these blocks are:
 
      1. [guard n-1]: unchanged
      2. [call]: [guard n-1]
-     3. [guard m]: [guard m+1] for 0 <= m <= n-2
-     4. [join]: [guard n-1]
+     3. [newcall]: [guard 0]
+     4. [guard m]: [guard m+1] for 0 <= m <= n-2
+     5. [join]: [guard n-1]
 
      We punt for the more complex case case of [join] being old and
      simply free the dominance info.  We also punt on postdominators,
@@ -927,6 +928,47 @@  shrink_wrap_one_built_in_call_with_conds (gcall *bi_call, vec <gimple *> conds,
       edges.quick_push (edge_pair (bi_call_in_edge, guard_bb_in_edge));
     }
 
+  /* Move BI_NEWCALL to new basic block when it is non-null.  */
+  if (bi_newcall)
+    {
+      /* Get bi_newcall_bb by split join_tgt_in_edge_fall_thru edge,
+         and move BI_NEWCALL to bi_newcall_bb.  */
+      bi_newcall_bb = split_edge (join_tgt_in_edge_fall_thru);
+      gimple_stmt_iterator to_gsi = gsi_start_bb (bi_newcall_bb);
+      gimple_stmt_iterator from_gsi = gsi_for_stmt (bi_newcall);
+      gsi_move_before (&from_gsi, &to_gsi);
+      join_tgt_in_edge_fall_thru = EDGE_SUCC (bi_newcall_bb, 0);
+      join_tgt_bb = join_tgt_in_edge_fall_thru->dest;
+
+      tree bi_newcall_lhs = gimple_call_lhs (bi_newcall);
+      tree bi_call_lhs = gimple_call_lhs (bi_call);
+      if (!bi_call_lhs)
+        {
+          bi_call_lhs = copy_ssa_name (bi_newcall_lhs);
+          gimple_call_set_lhs (bi_call, bi_call_lhs);
+          SSA_NAME_DEF_STMT (bi_call_lhs) = bi_call;
+        }
+
+      /* Create phi node for lhs of BI_CALL and BI_NEWCALL.  */
+      gphi *new_phi = create_phi_node (copy_ssa_name (bi_newcall_lhs),
+				       join_tgt_bb);
+      SSA_NAME_OCCURS_IN_ABNORMAL_PHI (PHI_RESULT (new_phi))
+        = SSA_NAME_OCCURS_IN_ABNORMAL_PHI (bi_newcall_lhs);
+      add_phi_arg (new_phi, bi_call_lhs, join_tgt_in_edge_from_call,
+                   gimple_location (bi_call));
+      add_phi_arg (new_phi, bi_newcall_lhs, join_tgt_in_edge_fall_thru,
+                   gimple_location (bi_newcall));
+
+      /* Replace all use of original return value with result of phi node.  */
+      use_operand_p use_p;
+      gimple *use_stmt;
+      imm_use_iterator iterator;
+      FOR_EACH_IMM_USE_STMT (use_stmt, iterator, bi_newcall_lhs)
+        if (use_stmt != new_phi)
+	  FOR_EACH_IMM_USE_ON_STMT (use_p, iterator)
+	    SET_USE (use_p, PHI_RESULT (new_phi));
+    }
+
   /* Now update the probability and profile information, processing the
      guards in order of execution.
 
@@ -1030,9 +1072,11 @@  use_internal_fn (gcall *call)
 
   unsigned nconds = 0;
   auto_vec<gimple *, 12> conds;
+  bool is_arg_conds = false;
   if (can_test_argument_range (call))
     {
       gen_shrink_wrap_conditions (call, conds, &nconds);
+      is_arg_conds = true;
       gcc_assert (nconds != 0);
     }
   else
@@ -1082,8 +1126,8 @@  use_internal_fn (gcall *call)
 	  call = new_call;
 	}
     }
-
-  shrink_wrap_one_built_in_call_with_conds (call, conds, nconds);
+  shrink_wrap_one_built_in_call_with_conds (call, conds, nconds,
+					    is_arg_conds ? new_call : NULL);
 }
 
 /* The top level function for conditional dead code shrink