Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837)

Message ID 20200415084755.72653-1-luoxhu@linux.ibm.com
State New
Series Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837)

Commit Message

Li, Pan2 via Gcc-patches April 15, 2020, 8:47 a.m. UTC
From: Xionghu Luo <luoxhu@linux.ibm.com>

This "subtract/extend/add" sequence has existed for a long time and still
annoys us (PR37451, PR61837) when converting from 32 bits to 64 bits, as the
ctr register is 64 bits wide on powerpc64.  Andrew Pinski had a patch, but it
caused some issues and was reverted by Joseph S. Myers (PR37451, PR37782).

Andrew:
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
Joseph:
https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html

However, the doloop code has improved a lot over the years.
gcc.c-torture/execute/doloop-1.c is no longer a simple loop with constant
desc->niter_expr since r125:SI#0 is not SImode, so it is not a valid doloop
and doloop no longer transforms it.  Thus we can simplify
"subtract/extend/add" to just the extend, as the condition in doloop will
never be false, given the optimization done by loop ch.
What's more, this patch differs slightly from Andrew's implementation:
the check for ZERO_EXT and SImode guards against the count being changed
from char/short, so the cases that timed out on slow platforms before are
not affected.
Any comments?  Thanks.
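
To make the arithmetic concrete, here is a minimal C sketch (plain C, not GCC
code; the function names are made up for illustration).  It models the 32-bit
subtract, the zero-extend to 64 bits and the add, and compares that with the
plain zero-extend.  The two forms agree for every non-zero input and differ
only when the 32-bit value is zero:

```c
#include <assert.h>
#include <stdint.h>

/* Count as computed today: zero_extend (x - 1) + 1, with the
   subtraction done in 32 bits.  */
static uint64_t count_sub_ext_add (uint32_t x)
{
  uint32_t niter = x - 1u;          /* wraps to 0xffffffff when x == 0 */
  return (uint64_t) niter + 1u;
}

/* Count after the proposed fold: zero_extend (x).  */
static uint64_t count_ext_only (uint32_t x)
{
  return (uint64_t) x;
}

/* Illustrative self-check of the cases discussed above.  */
static void check_fold (void)
{
  assert (count_sub_ext_add (1) == count_ext_only (1));
  assert (count_sub_ext_add (0xffffffffu) == count_ext_only (0xffffffffu));
  /* x == 0 is the one input where the two forms differ.  */
  assert (count_sub_ext_add (0) == 0x100000000ull);
  assert (count_ext_only (0) == 0);
}
```

So the fold is only valid if the operand of the subtract can be shown to be
non-zero.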

doloop-1.c.257r.loop2_doloop
...
12: [r129:DI]=r123:SI
  REG_DEAD r129:DI
  REG_DEAD r123:SI
13: r125:SI=r120:DI#0-0x1
  REG_DEAD r120:DI
14: r120:DI=zero_extend(r125:SI#0)
  REG_DEAD r125:SI
16: r126:CC=cmp(r120:DI,0)
17: pc={(r126:CC!=0)?L43:pc}
  REG_DEAD r126:CC
...

Bootstrapped and regression tested on Power8-LE.

gcc/ChangeLog

	2020-04-15  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	PR rtl-optimization/37451, PR target/61837
	loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
	to zero_ext.
---
 gcc/loop-doloop.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

Comments

Richard Sandiford April 15, 2020, 9:18 a.m. UTC | #1
luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> From: Xionghu Luo <luoxhu@linux.ibm.com>
>
> This "subtract/extend/add" existed for a long time and still annoying us
> (PR37451, PR61837) when converting from 32bits to 64bits, as the ctr
> register is used as 64bits on powerpc64, Andraw Pinski had a patch but
> caused some issue and reverted by Joseph S. Myers(PR37451, PR37782).
>
> Andraw:
> http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
> http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
> Joseph:
> https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html
>
> However, the doloop code improved a lot since so many years passed,
> gcc.c-torture/execute/doloop-1.c is no longer a simple loop with constant
> desc->niter_expr since r125:SI#0 is not SImode, so it is not a valid doloop
> and no transform done in doloop again.  Thus we can do the simplification
> from "subtract/extend/add" to only extend as the condition in doloop will
> never be false based on loop ch's optimization.
> What's more, this patch is slightly different with Andrw's implementation,
> the check of ZERO_EXT and SImode will guard the count won't be changed
> from char/short caused cases not time out on slow platforms before.
> Any comments?  Thanks.
>
> doloop-1.c.257r.loop2_doloop
> ...
> 12: [r129:DI]=r123:SI
>   REG_DEAD r129:DI
>   REG_DEAD r123:SI
> 13: r125:SI=r120:DI#0-0x1
>   REG_DEAD r120:DI
> 14: r120:DI=zero_extend(r125:SI#0)
>   REG_DEAD r125:SI
> 16: r126:CC=cmp(r120:DI,0)
> 17: pc={(r126:CC!=0)?L43:pc}
>   REG_DEAD r126:CC
> ...
>
> Bootstrap and regression tested pass on Power8-LE.
>
> gcc/ChangeLog
>
> 	2020-04-15  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>
> 	PR rtl-optimization/37451, PR target/61837
> 	loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
> 	to zero_ext.
> ---
>  gcc/loop-doloop.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
> index db6a014e43d..9f967fa3a0b 100644
> --- a/gcc/loop-doloop.c
> +++ b/gcc/loop-doloop.c
> @@ -477,7 +477,31 @@ doloop_modify (class loop *loop, class niter_desc *desc,
>      }
>  
>    if (increment_count)
> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    {
> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
> +	 is never zero, as gimple pass loop ch will do optimization to simplify
> +	 the loop to NO loop for loop condition is false.  */

IMO the code needs to prove this, rather than just assume that previous
passes have made it so.

Thanks,
Richard

> +      bool simplify_zext = false;
> +      rtx extop0 = XEXP (count, 0);
> +      if (mode == E_DImode
> +	  && GET_CODE (count) == ZERO_EXTEND
> +	  && GET_CODE (extop0) == PLUS)
> +	{
> +	  rtx addop0 = XEXP (extop0, 0);
> +	  rtx addop1 = XEXP (extop0, 1);
> +	  if (CONST_SCALAR_INT_P (addop1)
> +	      && GET_MODE (addop0) == E_SImode
> +	      && addop1 == GEN_INT (-1))
> +	    {
> +	      count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
> +					  GET_MODE (addop0));
> +	      simplify_zext = true;
> +	    }
> +	}
> +
> +      if (!simplify_zext)
> +	count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    }
>  
>    /* Insert initialization of the count register into the loop header.  */
>    start_sequence ();
Jakub Jelinek April 15, 2020, 9:37 a.m. UTC | #2
On Wed, Apr 15, 2020 at 03:47:55AM -0500, luoxhu--- via Gcc-patches wrote:
> 	2020-04-15  Xiong Hu Luo  <luoxhu@linux.ibm.com>
> 
> 	PR rtl-optimization/37451, PR target/61837
> 	loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)

"* " missing before loop-doloop.c

> 	to zero_ext.
> ---
>  gcc/loop-doloop.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
> index db6a014e43d..9f967fa3a0b 100644
> --- a/gcc/loop-doloop.c
> +++ b/gcc/loop-doloop.c
> @@ -477,7 +477,31 @@ doloop_modify (class loop *loop, class niter_desc *desc,
>      }
>  
>    if (increment_count)
> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    {
> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
> +	 is never zero, as gimple pass loop ch will do optimization to simplify
> +	 the loop to NO loop for loop condition is false.  */

There is no guarantee the loop ch pass was run, one can do
-fno-tree-loop-ch, -fdisable-tree-ch, -fdisable-tree-ch=foobar or
perhaps the zext(x-1)+1 has been introduced after it (either the loop
appeared post that at GIMPLE or later)?
IMHO if you want something like that, you need to prove at the RTL level
that addop0 must be non-zero, perhaps using saved VRP info from the GIMPLE
stuff if there is REG_EXPR mapping it to a SSA_NAME with one (though even
that might not be safe if things changed during RTL opts).

> +      bool simplify_zext = false;
> +      rtx extop0 = XEXP (count, 0);
> +      if (mode == E_DImode

Why hardcode DImode and SImode?  In generic code, shouldn't it work with
something more generic, like word_mode and MODE_INT mode twice as big as
the word_mode (or do you want MODE_INT mode with half the size of word_mode
and word_mode)?

> +	  && GET_CODE (count) == ZERO_EXTEND
> +	  && GET_CODE (extop0) == PLUS)
> +	{
> +	  rtx addop0 = XEXP (extop0, 0);
> +	  rtx addop1 = XEXP (extop0, 1);
> +	  if (CONST_SCALAR_INT_P (addop1)
> +	      && GET_MODE (addop0) == E_SImode
> +	      && addop1 == GEN_INT (-1))
> +	    {
> +	      count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
> +					  GET_MODE (addop0));
> +	      simplify_zext = true;
> +	    }
> +	}
> +
> +      if (!simplify_zext)
> +	count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    }

	Jakub
Segher Boessenkool April 17, 2020, 1:21 a.m. UTC | #3
On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
> luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> > +    {
> > +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
> > +	 is never zero, as gimple pass loop ch will do optimization to simplify
> > +	 the loop to NO loop for loop condition is false.  */
> 
> IMO the code needs to prove this, rather than just assume that previous
> passes have made it so.

Well, it should gcc_assert it, probably.

It is the left-hand side of a+b...  it cannot be 0, because niter always
is simplified!


Segher
Segher Boessenkool April 17, 2020, 4:32 p.m. UTC | #4
On Thu, Apr 16, 2020 at 08:21:40PM -0500, Segher Boessenkool wrote:
> On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
> > luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> > > +    {
> > > +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
> > > +	 is never zero, as gimple pass loop ch will do optimization to simplify
> > > +	 the loop to NO loop for loop condition is false.  */
> > 
> > IMO the code needs to prove this, rather than just assume that previous
> > passes have made it so.
> 
> Well, it should gcc_assert it, probably.
> 
> It is the left-hand side of a+b...  it cannot be 0, because niter always
> is simplified!

Scratch that...  it cannot be *constant* 0, but that isn't the issue here.


Segher
Xionghu Luo April 20, 2020, 8:21 a.m. UTC | #5
Hi,

On 2020/4/18 00:32, Segher Boessenkool wrote:
> On Thu, Apr 16, 2020 at 08:21:40PM -0500, Segher Boessenkool wrote:
>> On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
>>> luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
>>>> +    {
>>>> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
>>>> +	 is never zero, as gimple pass loop ch will do optimization to simplify
>>>> +	 the loop to NO loop for loop condition is false.  */
>>>
>>> IMO the code needs to prove this, rather than just assume that previous
>>> passes have made it so.
>>
>> Well, it should gcc_assert it, probably.
>>
>> It is the left-hand side of a+b...  it cannot be 0, because niter always
>> is simplified!
> 
> Scratch that...  it cannot be *constant* 0, but that isn't the issue here.

Sorry, my comment in the code is a bit misleading; it is actually not
related to loop-ch at all.  The instruction sequence at 255r.loop2_invariant:

   25: NOTE_INSN_BASIC_BLOCK 5
   26: r133:SI=r123:DI#0-0x1
      REG_DEAD r123:DI
   27: r123:DI=zero_extend(r133:SI)
      REG_DEAD r133:SI
   28: r124:DI=r124:DI+0x4
   30: r134:CC=cmp(r123:DI,0)
   31: pc={(r134:CC!=0)?L69:pc}

And at 257r.loop2_doloop (#72, #73 and #74 are inserted, and #31 is replaced by #71):

;; Determined upper bound -1.
Loop 2 is simple:
  simple exit 6 -> 7
  number of iterations: (plus:SI (subreg:SI (reg:DI 123 [ doloop.6 ]) 0)
    (const_int -1 [0xffffffffffffffff]))
  upper bound: 2147483646
  likely upper bound: 2147483646
  realistic bound: -1
...
   72: r144:SI=r123:DI#0-0x1
   73: r143:DI=zero_extend(r144:SI)
   74: r142:DI=r143:DI+0x1
...
   25: NOTE_INSN_BASIC_BLOCK 5
   26: r133:SI=r123:DI#0-0x1
      REG_DEAD r123:DI
   27: r123:DI=zero_extend(r133:SI)
      REG_DEAD r133:SI
   28: r124:DI=r124:DI+0x4
   30: r134:CC=cmp(r123:DI,0)
   71: {pc={(r142:DI!=0x1)?L69:pc};r142:DI=r142:DI-0x1;clobber scratch;clobber scratch;}

increment_count being true ensures the condition is (NE const1_rtx).  r123:DI#0-0x1 is the
number of loop iterations in doloop, so it is definitely >= 0, and r123:DI#0 also shouldn't be
zero, as the loop upper bound is 2147483646 (0x7ffffffe)?
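
As a rough model of the two dumps above (plain C, not GCC code; all function
names are hypothetical), the original latch decrements in 32 bits and loops
while the result is non-zero, whereas insn 71 tests the counter against 1 and
then decrements it.  With the counter initialised to zero_extend (x - 1) + 1,
both shapes run the body the same number of times:

```c
#include <assert.h>
#include <stdint.h>

/* Original shape (insns 26, 27, 30, 31): decrement in 32 bits,
   zero-extend, branch back while the result is non-zero.  */
static uint64_t trips_original (uint64_t r123)
{
  uint64_t trips = 0;
  do
    {
      trips++;                                  /* loop body */
      r123 = (uint32_t) ((uint32_t) r123 - 1u);
    }
  while (r123 != 0);
  return trips;
}

/* Initial counter as inserted by insns 72-74.  */
static uint64_t doloop_count (uint64_t r123)
{
  return (uint64_t) (uint32_t) ((uint32_t) r123 - 1u) + 1u;
}

/* Doloop shape (insn 71): branch back if the old counter != 1,
   and decrement the counter either way.  */
static uint64_t trips_doloop (uint64_t ctr)
{
  uint64_t trips = 0;
  for (;;)
    {
      trips++;                                  /* loop body */
      int branch_back = (ctr != 1);             /* reads the old value */
      ctr--;
      if (!branch_back)
        break;
    }
  return trips;
}

/* Illustrative self-check; only small values, since a zero input
   would iterate 2^32 times.  */
static void check_trip_counts (void)
{
  for (uint64_t x = 1; x <= 8; x++)
    assert (trips_original (x) == trips_doloop (doloop_count (x)));
}
```

For a zero input the original shape wraps around and iterates 2^32 times, and
doloop_count (0) is 0x100000000 to match; a counter of plain zero_extend (0)
would not match, which is why the non-zero question matters.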

Since this simplification is in doloop_modify, there are already some doloop form checks in
doloop_valid_p, such as !desc->simple_p || desc->assumptions || desc->infinite, so it seems
unnecessary to repeat those checks here?
Maybe we just need to check that the loop upper bound is LEU 0x7ffffffe, to avoid an
overflow in instruction #26?
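
The arithmetic behind that bound check can be sketched as follows (plain C,
not GCC code; bound_excludes_zero is a hypothetical helper).  If the 32-bit
value is zero, the iteration count x - 1 wraps to 0xffffffff, so any upper
bound strictly below that rules out zero.  This assumes the recorded bound
really does bound that 32-bit expression:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when an upper bound on the 32-bit
   iteration count x - 1 rules out x == 0.  */
static bool bound_excludes_zero (uint64_t upper_bound)
{
  /* The iteration count that x == 0 would produce: 0xffffffff.  */
  const uint32_t niter_if_zero = (uint32_t) 0 - 1u;
  return upper_bound < niter_if_zero;
}

/* Illustrative self-check using the bound printed in the dump.  */
static void check_bound (void)
{
  assert (bound_excludes_zero (2147483646u));   /* 0x7ffffffe */
  assert (!bound_excludes_zero (0xffffffffu));
}
```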


Updated patch, thanks:


This "subtract/extend/add" sequence has existed for a long time and still
annoys us (PR37451, PR61837) when converting from 32 bits to 64 bits, as the
ctr register is 64 bits wide on powerpc64.  Andrew Pinski had a patch, but it
caused some issues and was reverted by Joseph S. Myers (PR37451, PR37782).

Andrew:
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
Joseph:
https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html

However, the doloop code has improved a lot over the years.
gcc.c-torture/execute/doloop-1.c is no longer a simple loop with constant
desc->niter_expr since r125:SI#0 is not SImode, so it is not a valid doloop
and doloop no longer transforms it.  Thus we can simplify
"subtract/extend/add" to just the extend when the loop upper bound is known
to be less than or equal to SINT_MAX-1 (the simplification is not done when
r120:DI#0-0x1 would overflow).

Bootstrapped and regression tested on Power8-LE.

gcc/ChangeLog

	2020-04-20  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	PR rtl-optimization/37451, PR target/61837
	* loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
	to zero_ext.
---
 gcc/loop-doloop.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
index db6a014e43d..da537aff60f 100644
--- a/gcc/loop-doloop.c
+++ b/gcc/loop-doloop.c
@@ -477,7 +477,46 @@ doloop_modify (class loop *loop, class niter_desc *desc,
     }
 
   if (increment_count)
-    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    {
+      /* Fold (add -1; zero_ext; add +1) operations to zero_ext. i.e:
+
+	 73: r145:SI=r123:DI#0-0x1
+	 74: r144:DI=zero_extend(r145:SI)
+	 75: r143:DI=r144:DI+0x1
+	 ...
+	 31: r135:CC=cmp(r123:DI,0)
+	 72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;clobber
+	 scratch;clobber scratch;}
+
+	 r123:DI#0-0x1 is the loop iterations be GE than 0, r143 is the loop
+	 count be saved to ctr, if this loop's upper bound is known, r123:DI#0
+	 won't be zero, then the expressions could be simplified to zero_extend
+	 only.  */
+      bool simplify_zext = false;
+      rtx extop0 = XEXP (count, 0);
+      if (loop->any_upper_bound
+	  && wi::leu_p (loop->nb_iterations_upper_bound, 0x7ffffffe)
+	  && GET_CODE (count) == ZERO_EXTEND
+	  && GET_CODE (extop0) == PLUS)
+	{
+	  rtx addop0 = XEXP (extop0, 0);
+	  rtx addop1 = XEXP (extop0, 1);
+
+	  unsigned int_mode
+	    = GET_MODE_PRECISION (as_a<scalar_int_mode> GET_MODE (addop0));
+	  if (CONST_SCALAR_INT_P (addop1)
+	      && GET_MODE_PRECISION (mode) == int_mode * 2
+	      && addop1 == GEN_INT (-1))
+	    {
+	      count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
+					  GET_MODE (addop0));
+	      simplify_zext = true;
+	    }
+	}
+
+      if (!simplify_zext)
+	count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    }
 
   /* Insert initialization of the count register into the loop header.  */
   start_sequence ();
Xionghu Luo April 20, 2020, 9:05 a.m. UTC | #6
Tiny update to accommodate unsigned int compare.

On 2020/4/20 16:21, luoxhu via Gcc-patches wrote:
> Hi,
> 
> On 2020/4/18 00:32, Segher Boessenkool wrote:
>> On Thu, Apr 16, 2020 at 08:21:40PM -0500, Segher Boessenkool wrote:
>>> On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
>>>> luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>>> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
>>>>> +    {
>>>>> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
>>>>> +	 is never zero, as gimple pass loop ch will do optimization to simplify
>>>>> +	 the loop to NO loop for loop condition is false.  */
>>>>
>>>> IMO the code needs to prove this, rather than just assume that previous
>>>> passes have made it so.
>>>
>>> Well, it should gcc_assert it, probably.
>>>
>>> It is the left-hand side of a+b...  it cannot be 0, because niter always
>>> is simplified!
>>
>> Scratch that...  it cannot be *constant* 0, but that isn't the issue here.
> 
> Sorry that my comments in the code is a bit misleading, it is actually not
> related to loop-ch at all.  The instruction sequence at 255r.loop2_invariant:
> 
>     25: NOTE_INSN_BASIC_BLOCK 5
>     26: r133:SI=r123:DI#0-0x1
>        REG_DEAD r123:DI
>     27: r123:DI=zero_extend(r133:SI)
>        REG_DEAD r133:SI
>     28: r124:DI=r124:DI+0x4
>     30: r134:CC=cmp(r123:DI,0)
>     31: pc={(r134:CC!=0)?L69:pc}
> 
> And 257r.loop2_doloop (inserted #72,#73,#74, and #31 is replaced by #71):
> 
> ;; Determined upper bound -1.
> Loop 2 is simple:
>    simple exit 6 -> 7
>    number of iterations: (plus:SI (subreg:SI (reg:DI 123 [ doloop.6 ]) 0)
>      (const_int -1 [0xffffffffffffffff]))
>    upper bound: 2147483646
>    likely upper bound: 2147483646
>    realistic bound: -1
> ...
>     72: r144:SI=r123:DI#0-0x1
>     73: r143:DI=zero_extend(r144:SI)
>     74: r142:DI=r143:DI+0x1
> ...
>     25: NOTE_INSN_BASIC_BLOCK 5
>     26: r133:SI=r123:DI#0-0x1
>        REG_DEAD r123:DI
>     27: r123:DI=zero_extend(r133:SI)
>        REG_DEAD r133:SI
>     28: r124:DI=r124:DI+0x4
>     30: r134:CC=cmp(r123:DI,0)
>     71: {pc={(r142:DI!=0x1)?L69:pc};r142:DI=r142:DI-0x1;clobber scratch;clobber scratch;}
> 
> increment_count is true ensures the (condition NE const1_rtx), r123:DI#0-0x1 is the loop number
> of iterations in doloop, it is definitely >= 0, and r123:DI#0 also shouldn't be zero as the
> loop upper bound is 2147483646(0x7fffffffe)???
> 
> Since this simplification is in doloop-modify,  there is already some doloop form check like
> !desc->simple_p || desc->assumptions|| desc->infinite in doloop_valid_p, so it seems
> not necessary to repeat check it here again?
> Maybe we just need check the loop upper bound is LEU than 0x7fffffffe to avoid if
> instruction #26 overflow?
0xfffffffe

> 
> 
> Updated patch, thanks:
> 
> 
> This "subtract/extend/add" existed for a long time and still annoying us
> (PR37451, PR61837) when converting from 32bits to 64bits, as the ctr
> register is used as 64bits on powerpc64, Andraw Pinski had a patch but
> caused some issue and reverted by Joseph S. Myers(PR37451, PR37782).
> 
> Andraw:
> http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
> http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
> Joseph:
> https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html
> 
> However, the doloop code improved a lot since so many years passed,
> gcc.c-torture/execute/doloop-1.c is no longer a simple loop with constant
> desc->niter_expr since r125:SI#0 is not SImode, so it is not a valid doloop
> and no transform done in doloop again.  Thus we can do the simplification
> from "subtract/extend/add" to only extend when loop upper_bound is known
> to be LE than SINT_MAX-1(not do simplify when r120:DI#0-0x1 overflow).
UINT_MAX-1

> 
> Bootstrap and regression tested pass on Power8-LE.
> 
> gcc/ChangeLog
> 
> 	2020-04-20  Xiong Hu Luo  <luoxhu@linux.ibm.com>
> 
> 	PR rtl-optimization/37451, PR target/61837
> 	* loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
> 	to zero_ext.
> ---
>   gcc/loop-doloop.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
> index db6a014e43d..da537aff60f 100644
> --- a/gcc/loop-doloop.c
> +++ b/gcc/loop-doloop.c
> @@ -477,7 +477,46 @@ doloop_modify (class loop *loop, class niter_desc *desc,
>       }
> 
>     if (increment_count)
> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    {
> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext. i.e:
> +
> +	 73: r145:SI=r123:DI#0-0x1
> +	 74: r144:DI=zero_extend(r145:SI)
> +	 75: r143:DI=r144:DI+0x1
> +	 ...
> +	 31: r135:CC=cmp(r123:DI,0)
> +	 72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;clobber
> +	 scratch;clobber scratch;}
> +
> +	 r123:DI#0-0x1 is the loop iterations be GE than 0, r143 is the loop
> +	 count be saved to ctr, if this loop's upper bound is known, r123:DI#0
> +	 won't be zero, then the expressions could be simplified to zero_extend
> +	 only.  */
> +      bool simplify_zext = false;
> +      rtx extop0 = XEXP (count, 0);
> +      if (loop->any_upper_bound
> +	  && wi::leu_p (loop->nb_iterations_upper_bound, 0x7ffffffe)

Should be 4294967294(0xfffffffe) here.

Xionghu

> +	  && GET_CODE (count) == ZERO_EXTEND
> +	  && GET_CODE (extop0) == PLUS)
> +	{
> +	  rtx addop0 = XEXP (extop0, 0);
> +	  rtx addop1 = XEXP (extop0, 1);
> +
> +	  unsigned int_mode
> +	    = GET_MODE_PRECISION (as_a<scalar_int_mode> GET_MODE (addop0));
> +	  if (CONST_SCALAR_INT_P (addop1)
> +	      && GET_MODE_PRECISION (mode) == int_mode * 2
> +	      && addop1 == GEN_INT (-1))
> +	    {
> +	      count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
> +					  GET_MODE (addop0));
> +	      simplify_zext = true;
> +	    }
> +	}
> +
> +      if (!simplify_zext)
> +	count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    }
> 
>     /* Insert initialization of the count register into the loop header.  */
>     start_sequence ();
>
Patch

diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
index db6a014e43d..9f967fa3a0b 100644
--- a/gcc/loop-doloop.c
+++ b/gcc/loop-doloop.c
@@ -477,7 +477,31 @@  doloop_modify (class loop *loop, class niter_desc *desc,
     }
 
   if (increment_count)
-    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    {
+      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
+	 is never zero, as gimple pass loop ch will do optimization to simplify
+	 the loop to NO loop for loop condition is false.  */
+      bool simplify_zext = false;
+      rtx extop0 = XEXP (count, 0);
+      if (mode == E_DImode
+	  && GET_CODE (count) == ZERO_EXTEND
+	  && GET_CODE (extop0) == PLUS)
+	{
+	  rtx addop0 = XEXP (extop0, 0);
+	  rtx addop1 = XEXP (extop0, 1);
+	  if (CONST_SCALAR_INT_P (addop1)
+	      && GET_MODE (addop0) == E_SImode
+	      && addop1 == GEN_INT (-1))
+	    {
+	      count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
+					  GET_MODE (addop0));
+	      simplify_zext = true;
+	    }
+	}
+
+      if (!simplify_zext)
+	count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    }
 
   /* Insert initialization of the count register into the loop header.  */
   start_sequence ();