Message ID | 20200415084755.72653-1-luoxhu@linux.ibm.com |
---|---|
State | New |
Series | Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837) |
luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> From: Xionghu Luo <luoxhu@linux.ibm.com>
>
> This "subtract/extend/add" existed for a long time and still annoying us
> (PR37451, PR61837) when converting from 32bits to 64bits, as the ctr
> register is used as 64bits on powerpc64, Andrew Pinski had a patch but
> caused some issue and reverted by Joseph S. Myers(PR37451, PR37782).
>
> Andrew:
> http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
> http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
> Joseph:
> https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html
>
> However, the doloop code improved a lot since so many years passed,
> gcc.c-torture/execute/doloop-1.c is no longer a simple loop with constant
> desc->niter_expr since r125:SI#0 is not SImode, so it is not a valid doloop
> and no transform done in doloop again.  Thus we can do the simplification
> from "subtract/extend/add" to only extend as the condition in doloop will
> never be false based on loop ch's optimization.
> What's more, this patch is slightly different with Andrew's implementation,
> the check of ZERO_EXT and SImode will guard the count won't be changed
> from char/short caused cases not time out on slow platforms before.
> Any comments?  Thanks.
>
> doloop-1.c.257r.loop2_doloop
> ...
>    12: [r129:DI]=r123:SI
>       REG_DEAD r129:DI
>       REG_DEAD r123:SI
>    13: r125:SI=r120:DI#0-0x1
>       REG_DEAD r120:DI
>    14: r120:DI=zero_extend(r125:SI#0)
>       REG_DEAD r125:SI
>    16: r126:CC=cmp(r120:DI,0)
>    17: pc={(r126:CC!=0)?L43:pc}
>       REG_DEAD r126:CC
> ...
>
> Bootstrap and regression tested pass on Power8-LE.
>
> gcc/ChangeLog
>
> 2020-04-15  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>
>     PR rtl-optimization/37451, PR target/61837
>     loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
>     to zero_ext.
> ---
>  gcc/loop-doloop.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
> index db6a014e43d..9f967fa3a0b 100644
> --- a/gcc/loop-doloop.c
> +++ b/gcc/loop-doloop.c
> @@ -477,7 +477,31 @@ doloop_modify (class loop *loop, class niter_desc *desc,
>      }
>
>    if (increment_count)
> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    {
> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
> +         is never zero, as gimple pass loop ch will do optimization to simplify
> +         the loop to NO loop for loop condition is false.  */

IMO the code needs to prove this, rather than just assume that previous
passes have made it so.

Thanks,
Richard

> +      bool simplify_zext = false;
> +      rtx extop0 = XEXP (count, 0);
> +      if (mode == E_DImode
> +          && GET_CODE (count) == ZERO_EXTEND
> +          && GET_CODE (extop0) == PLUS)
> +        {
> +          rtx addop0 = XEXP (extop0, 0);
> +          rtx addop1 = XEXP (extop0, 1);
> +          if (CONST_SCALAR_INT_P (addop1)
> +              && GET_MODE (addop0) == E_SImode
> +              && addop1 == GEN_INT (-1))
> +            {
> +              count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
> +                                          GET_MODE (addop0));
> +              simplify_zext = true;
> +            }
> +        }
> +
> +      if (!simplify_zext)
> +        count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    }
>
>    /* Insert initialization of the count register into the loop header.  */
>    start_sequence ();
On Wed, Apr 15, 2020 at 03:47:55AM -0500, luoxhu--- via Gcc-patches wrote:
> 2020-04-15  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>
>     PR rtl-optimization/37451, PR target/61837
>     loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)

"* " missing before loop-doloop.c

>     to zero_ext.
> ---
>  gcc/loop-doloop.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
> index db6a014e43d..9f967fa3a0b 100644
> --- a/gcc/loop-doloop.c
> +++ b/gcc/loop-doloop.c
> @@ -477,7 +477,31 @@ doloop_modify (class loop *loop, class niter_desc *desc,
>      }
>
>    if (increment_count)
> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    {
> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
> +         is never zero, as gimple pass loop ch will do optimization to simplify
> +         the loop to NO loop for loop condition is false.  */

There is no guarantee the loop ch pass was run, one can do
-fno-tree-loop-ch, -fdisable-tree-ch, -fdisable-tree-ch=foobar
or perhaps the zext(x-1)+1 has been introduced after it (either the loop
appeared post that at GIMPLE or later)?
IMHO if you want something like that, you need to prove at the RTL level
that addop0 must be non-zero, perhaps using saved VRP info from the GIMPLE
stuff if there is REG_EXPR mapping it to a SSA_NAME with one (though even
that might not be safe if things changed during RTL opts).

> +      bool simplify_zext = false;
> +      rtx extop0 = XEXP (count, 0);
> +      if (mode == E_DImode

Why hardcode DImode and SImode?  In generic code, shouldn't it work with
something more generic, like word_mode and MODE_INT mode twice as big as the
word_mode (or do you want MODE_INT mode with half the size of word_mode
and word_mode)?

> +          && GET_CODE (count) == ZERO_EXTEND
> +          && GET_CODE (extop0) == PLUS)
> +        {
> +          rtx addop0 = XEXP (extop0, 0);
> +          rtx addop1 = XEXP (extop0, 1);
> +          if (CONST_SCALAR_INT_P (addop1)
> +              && GET_MODE (addop0) == E_SImode
> +              && addop1 == GEN_INT (-1))
> +            {
> +              count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
> +                                          GET_MODE (addop0));
> +              simplify_zext = true;
> +            }
> +        }
> +
> +      if (!simplify_zext)
> +        count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    }

    Jakub
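Editorial illustration of the concern raised in the two reviews above: a minimal C sketch, assuming a 32-bit narrow mode and a 64-bit count register, of why zext(x-1)+1 can only be replaced by zext(x) when x is known to be non-zero. The function names are hypothetical and none of this is GCC code; it only models the arithmetic.

```c
#include <stdint.h>

/* Count as the pass computes it without the fold:
   32-bit subtract, zero-extend to 64 bits, then add 1.  */
static uint64_t count_unfolded (uint32_t x)
{
  uint32_t niter = x - 1u;        /* wraps to 0xffffffff when x == 0 */
  return (uint64_t) niter + 1u;   /* the addition happens in 64 bits */
}

/* Count after the proposed fold: a plain zero extension of x.  */
static uint64_t count_folded (uint32_t x)
{
  return (uint64_t) x;
}

/* Nonzero when both forms give the same count for this x.  */
static int fold_is_safe (uint32_t x)
{
  return count_unfolded (x) == count_folded (x);
}
```

In this model the two forms differ only for x == 0, where the unfolded count is 0x100000000 and the folded count is 0. That is the case the reviewers ask the patch to rule out at the RTL level.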
On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
> luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> > +    {
> > +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
> > +         is never zero, as gimple pass loop ch will do optimization to simplify
> > +         the loop to NO loop for loop condition is false.  */
>
> IMO the code needs to prove this, rather than just assume that previous
> passes have made it so.

Well, it should gcc_assert it, probably.

It is the left-hand side of a+b...  it cannot be 0, because niter always
is simplified!


Segher
On Thu, Apr 16, 2020 at 08:21:40PM -0500, Segher Boessenkool wrote:
> On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
> > luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> > > +    {
> > > +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
> > > +         is never zero, as gimple pass loop ch will do optimization to simplify
> > > +         the loop to NO loop for loop condition is false.  */
> >
> > IMO the code needs to prove this, rather than just assume that previous
> > passes have made it so.
>
> Well, it should gcc_assert it, probably.
>
> It is the left-hand side of a+b...  it cannot be 0, because niter always
> is simplified!

Scratch that...  it cannot be *constant* 0, but that isn't the issue here.


Segher
Hi,

On 2020/4/18 00:32, Segher Boessenkool wrote:
> On Thu, Apr 16, 2020 at 08:21:40PM -0500, Segher Boessenkool wrote:
>> On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
>>> luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
>>>> +    {
>>>> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
>>>> +         is never zero, as gimple pass loop ch will do optimization to simplify
>>>> +         the loop to NO loop for loop condition is false.  */
>>>
>>> IMO the code needs to prove this, rather than just assume that previous
>>> passes have made it so.
>>
>> Well, it should gcc_assert it, probably.
>>
>> It is the left-hand side of a+b...  it cannot be 0, because niter always
>> is simplified!
>
> Scratch that...  it cannot be *constant* 0, but that isn't the issue here.

Sorry that my comments in the code are a bit misleading, it is actually not
related to loop-ch at all.  The instruction sequence at 255r.loop2_invariant:

   25: NOTE_INSN_BASIC_BLOCK 5
   26: r133:SI=r123:DI#0-0x1
      REG_DEAD r123:DI
   27: r123:DI=zero_extend(r133:SI)
      REG_DEAD r133:SI
   28: r124:DI=r124:DI+0x4
   30: r134:CC=cmp(r123:DI,0)
   31: pc={(r134:CC!=0)?L69:pc}

And 257r.loop2_doloop (inserted #72,#73,#74, and #31 is replaced by #71):

;; Determined upper bound -1.
Loop 2 is simple:
  simple exit 6 -> 7
  number of iterations: (plus:SI (subreg:SI (reg:DI 123 [ doloop.6 ]) 0)
    (const_int -1 [0xffffffffffffffff]))
  upper bound: 2147483646
  likely upper bound: 2147483646
  realistic bound: -1
...
   72: r144:SI=r123:DI#0-0x1
   73: r143:DI=zero_extend(r144:SI)
   74: r142:DI=r143:DI+0x1
...
   25: NOTE_INSN_BASIC_BLOCK 5
   26: r133:SI=r123:DI#0-0x1
      REG_DEAD r123:DI
   27: r123:DI=zero_extend(r133:SI)
      REG_DEAD r133:SI
   28: r124:DI=r124:DI+0x4
   30: r134:CC=cmp(r123:DI,0)
   71: {pc={(r142:DI!=0x1)?L69:pc};r142:DI=r142:DI-0x1;clobber scratch;clobber scratch;}

increment_count is true ensures the (condition NE const1_rtx), r123:DI#0-0x1 is the loop number
of iterations in doloop, it is definitely >= 0, and r123:DI#0 also shouldn't be zero as the
loop upper bound is 2147483646(0x7ffffffe)???

Since this simplification is in doloop-modify, there is already some doloop form check like
!desc->simple_p || desc->assumptions|| desc->infinite in doloop_valid_p, so it seems
not necessary to repeat check it here again?
Maybe we just need check the loop upper bound is LEU than 0x7ffffffe to avoid if
instruction #26 overflow?


Updated patch, thanks:


This "subtract/extend/add" existed for a long time and still annoying us
(PR37451, PR61837) when converting from 32bits to 64bits, as the ctr
register is used as 64bits on powerpc64, Andrew Pinski had a patch but
caused some issue and reverted by Joseph S. Myers(PR37451, PR37782).

Andrew:
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
Joseph:
https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html

However, the doloop code improved a lot since so many years passed,
gcc.c-torture/execute/doloop-1.c is no longer a simple loop with constant
desc->niter_expr since r125:SI#0 is not SImode, so it is not a valid doloop
and no transform done in doloop again.  Thus we can do the simplification
from "subtract/extend/add" to only extend when loop upper_bound is known
to be LE than SINT_MAX-1(not do simplify when r120:DI#0-0x1 overflow).

Bootstrap and regression tested pass on Power8-LE.

gcc/ChangeLog

2020-04-20  Xiong Hu Luo  <luoxhu@linux.ibm.com>

    PR rtl-optimization/37451, PR target/61837
    * loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
    to zero_ext.
---
 gcc/loop-doloop.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
index db6a014e43d..da537aff60f 100644
--- a/gcc/loop-doloop.c
+++ b/gcc/loop-doloop.c
@@ -477,7 +477,46 @@ doloop_modify (class loop *loop, class niter_desc *desc,
     }

   if (increment_count)
-    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    {
+      /* Fold (add -1; zero_ext; add +1) operations to zero_ext. i.e:
+
+         73: r145:SI=r123:DI#0-0x1
+         74: r144:DI=zero_extend(r145:SI)
+         75: r143:DI=r144:DI+0x1
+         ...
+         31: r135:CC=cmp(r123:DI,0)
+         72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;clobber
+         scratch;clobber scratch;}
+
+         r123:DI#0-0x1 is the loop iterations be GE than 0, r143 is the loop
+         count be saved to ctr, if this loop's upper bound is known, r123:DI#0
+         won't be zero, then the expressions could be simplified to zero_extend
+         only.  */
+      bool simplify_zext = false;
+      rtx extop0 = XEXP (count, 0);
+      if (loop->any_upper_bound
+          && wi::leu_p (loop->nb_iterations_upper_bound, 0x7ffffffe)
+          && GET_CODE (count) == ZERO_EXTEND
+          && GET_CODE (extop0) == PLUS)
+        {
+          rtx addop0 = XEXP (extop0, 0);
+          rtx addop1 = XEXP (extop0, 1);
+
+          unsigned int_mode
+            = GET_MODE_PRECISION (as_a<scalar_int_mode> GET_MODE (addop0));
+          if (CONST_SCALAR_INT_P (addop1)
+              && GET_MODE_PRECISION (mode) == int_mode * 2
+              && addop1 == GEN_INT (-1))
+            {
+              count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
+                                          GET_MODE (addop0));
+              simplify_zext = true;
+            }
+        }
+
+      if (!simplify_zext)
+        count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    }

   /* Insert initialization of the count register into the loop header.  */
   start_sequence ();
Tiny update to accommodate unsigned int compare.

On 2020/4/20 16:21, luoxhu via Gcc-patches wrote:
> Hi,
>
> On 2020/4/18 00:32, Segher Boessenkool wrote:
>> On Thu, Apr 16, 2020 at 08:21:40PM -0500, Segher Boessenkool wrote:
>>> On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
>>>> luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>>> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
>>>>> +    {
>>>>> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
>>>>> +         is never zero, as gimple pass loop ch will do optimization to simplify
>>>>> +         the loop to NO loop for loop condition is false.  */
>>>>
>>>> IMO the code needs to prove this, rather than just assume that previous
>>>> passes have made it so.
>>>
>>> Well, it should gcc_assert it, probably.
>>>
>>> It is the left-hand side of a+b...  it cannot be 0, because niter always
>>> is simplified!
>>
>> Scratch that...  it cannot be *constant* 0, but that isn't the issue here.
>
> Sorry that my comments in the code are a bit misleading, it is actually not
> related to loop-ch at all.  The instruction sequence at 255r.loop2_invariant:
>
>    25: NOTE_INSN_BASIC_BLOCK 5
>    26: r133:SI=r123:DI#0-0x1
>       REG_DEAD r123:DI
>    27: r123:DI=zero_extend(r133:SI)
>       REG_DEAD r133:SI
>    28: r124:DI=r124:DI+0x4
>    30: r134:CC=cmp(r123:DI,0)
>    31: pc={(r134:CC!=0)?L69:pc}
>
> And 257r.loop2_doloop (inserted #72,#73,#74, and #31 is replaced by #71):
>
> ;; Determined upper bound -1.
> Loop 2 is simple:
>   simple exit 6 -> 7
>   number of iterations: (plus:SI (subreg:SI (reg:DI 123 [ doloop.6 ]) 0)
>     (const_int -1 [0xffffffffffffffff]))
>   upper bound: 2147483646
>   likely upper bound: 2147483646
>   realistic bound: -1
> ...
>    72: r144:SI=r123:DI#0-0x1
>    73: r143:DI=zero_extend(r144:SI)
>    74: r142:DI=r143:DI+0x1
> ...
>    25: NOTE_INSN_BASIC_BLOCK 5
>    26: r133:SI=r123:DI#0-0x1
>       REG_DEAD r123:DI
>    27: r123:DI=zero_extend(r133:SI)
>       REG_DEAD r133:SI
>    28: r124:DI=r124:DI+0x4
>    30: r134:CC=cmp(r123:DI,0)
>    71: {pc={(r142:DI!=0x1)?L69:pc};r142:DI=r142:DI-0x1;clobber scratch;clobber scratch;}
>
> increment_count is true ensures the (condition NE const1_rtx), r123:DI#0-0x1 is the loop number
> of iterations in doloop, it is definitely >= 0, and r123:DI#0 also shouldn't be zero as the
> loop upper bound is 2147483646(0x7ffffffe)???
>
> Since this simplification is in doloop-modify, there is already some doloop form check like
> !desc->simple_p || desc->assumptions|| desc->infinite in doloop_valid_p, so it seems
> not necessary to repeat check it here again?
> Maybe we just need check the loop upper bound is LEU than 0x7ffffffe to avoid if
> instruction #26 overflow?

0xfffffffe

>
>
> Updated patch, thanks:
>
>
> This "subtract/extend/add" existed for a long time and still annoying us
> (PR37451, PR61837) when converting from 32bits to 64bits, as the ctr
> register is used as 64bits on powerpc64, Andrew Pinski had a patch but
> caused some issue and reverted by Joseph S. Myers(PR37451, PR37782).
>
> Andrew:
> http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
> http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
> Joseph:
> https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html
>
> However, the doloop code improved a lot since so many years passed,
> gcc.c-torture/execute/doloop-1.c is no longer a simple loop with constant
> desc->niter_expr since r125:SI#0 is not SImode, so it is not a valid doloop
> and no transform done in doloop again.  Thus we can do the simplification
> from "subtract/extend/add" to only extend when loop upper_bound is known
> to be LE than SINT_MAX-1(not do simplify when r120:DI#0-0x1 overflow).

UINT_MAX-1

>
> Bootstrap and regression tested pass on Power8-LE.
>
> gcc/ChangeLog
>
> 2020-04-20  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>
>     PR rtl-optimization/37451, PR target/61837
>     * loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
>     to zero_ext.
> ---
>  gcc/loop-doloop.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
> index db6a014e43d..da537aff60f 100644
> --- a/gcc/loop-doloop.c
> +++ b/gcc/loop-doloop.c
> @@ -477,7 +477,46 @@ doloop_modify (class loop *loop, class niter_desc *desc,
>      }
>
>    if (increment_count)
> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    {
> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext. i.e:
> +
> +         73: r145:SI=r123:DI#0-0x1
> +         74: r144:DI=zero_extend(r145:SI)
> +         75: r143:DI=r144:DI+0x1
> +         ...
> +         31: r135:CC=cmp(r123:DI,0)
> +         72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;clobber
> +         scratch;clobber scratch;}
> +
> +         r123:DI#0-0x1 is the loop iterations be GE than 0, r143 is the loop
> +         count be saved to ctr, if this loop's upper bound is known, r123:DI#0
> +         won't be zero, then the expressions could be simplified to zero_extend
> +         only.  */
> +      bool simplify_zext = false;
> +      rtx extop0 = XEXP (count, 0);
> +      if (loop->any_upper_bound
> +          && wi::leu_p (loop->nb_iterations_upper_bound, 0x7ffffffe)

Should be 4294967294(0xfffffffe) here.

Xionghu

> +          && GET_CODE (count) == ZERO_EXTEND
> +          && GET_CODE (extop0) == PLUS)
> +        {
> +          rtx addop0 = XEXP (extop0, 0);
> +          rtx addop1 = XEXP (extop0, 1);
> +
> +          unsigned int_mode
> +            = GET_MODE_PRECISION (as_a<scalar_int_mode> GET_MODE (addop0));
> +          if (CONST_SCALAR_INT_P (addop1)
> +              && GET_MODE_PRECISION (mode) == int_mode * 2
> +              && addop1 == GEN_INT (-1))
> +            {
> +              count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
> +                                          GET_MODE (addop0));
> +              simplify_zext = true;
> +            }
> +        }
> +
> +      if (!simplify_zext)
> +        count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    }
>
>    /* Insert initialization of the count register into the loop header.  */
>    start_sequence ();
>
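Editorial illustration of the bound being discussed: a small C sketch, assuming a 32-bit narrow mode, of why an iteration count no larger than UINT_MAX-1 (0xfffffffe) means the "+1" cannot wrap in the narrow mode. `NITER_BOUND`, `bound_allows_fold` and `fold_matches` are hypothetical names; the sketch models only the arithmetic and makes no claim about whether `nb_iterations_upper_bound` is the right quantity to test inside GCC.

```c
#include <stdbool.h>
#include <stdint.h>

/* UINT_MAX - 1 for a 32-bit narrow mode (the corrected constant).  */
#define NITER_BOUND 0xfffffffeu

/* Models the guard: the fold is attempted only when the known upper
   bound on the iteration count fits below the wrap-around value.  */
static bool bound_allows_fold (uint64_t upper_bound)
{
  return upper_bound <= NITER_BOUND;
}

/* For a given niter, compare "zero-extend, then add 1 in 64 bits"
   with "add 1 in 32 bits, then zero-extend".  */
static bool fold_matches (uint32_t niter)
{
  uint64_t unfolded = (uint64_t) niter + 1u;
  uint64_t folded = (uint64_t) (uint32_t) (niter + 1u);
  return unfolded == folded;
}
```

In this model the two forms disagree only for niter == 0xffffffff, which any bound up to 0xfffffffe excludes. The earlier constant 0x7ffffffe also passes the guard, so it is merely a more conservative bound.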
diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
index db6a014e43d..9f967fa3a0b 100644
--- a/gcc/loop-doloop.c
+++ b/gcc/loop-doloop.c
@@ -477,7 +477,31 @@ doloop_modify (class loop *loop, class niter_desc *desc,
     }

   if (increment_count)
-    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    {
+      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
+         is never zero, as gimple pass loop ch will do optimization to simplify
+         the loop to NO loop for loop condition is false.  */
+      bool simplify_zext = false;
+      rtx extop0 = XEXP (count, 0);
+      if (mode == E_DImode
+          && GET_CODE (count) == ZERO_EXTEND
+          && GET_CODE (extop0) == PLUS)
+        {
+          rtx addop0 = XEXP (extop0, 0);
+          rtx addop1 = XEXP (extop0, 1);
+          if (CONST_SCALAR_INT_P (addop1)
+              && GET_MODE (addop0) == E_SImode
+              && addop1 == GEN_INT (-1))
+            {
+              count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
+                                          GET_MODE (addop0));
+              simplify_zext = true;
+            }
+        }
+
+      if (!simplify_zext)
+        count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    }

   /* Insert initialization of the count register into the loop header.  */
   start_sequence ();
From: Xionghu Luo <luoxhu@linux.ibm.com>

This "subtract/extend/add" existed for a long time and still annoying us
(PR37451, PR61837) when converting from 32bits to 64bits, as the ctr
register is used as 64bits on powerpc64, Andrew Pinski had a patch but
caused some issue and reverted by Joseph S. Myers(PR37451, PR37782).

Andrew:
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
Joseph:
https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html

However, the doloop code improved a lot since so many years passed,
gcc.c-torture/execute/doloop-1.c is no longer a simple loop with constant
desc->niter_expr since r125:SI#0 is not SImode, so it is not a valid doloop
and no transform done in doloop again.  Thus we can do the simplification
from "subtract/extend/add" to only extend as the condition in doloop will
never be false based on loop ch's optimization.
What's more, this patch is slightly different with Andrew's implementation,
the check of ZERO_EXT and SImode will guard the count won't be changed
from char/short caused cases not time out on slow platforms before.
Any comments?  Thanks.

doloop-1.c.257r.loop2_doloop
...
   12: [r129:DI]=r123:SI
      REG_DEAD r129:DI
      REG_DEAD r123:SI
   13: r125:SI=r120:DI#0-0x1
      REG_DEAD r120:DI
   14: r120:DI=zero_extend(r125:SI#0)
      REG_DEAD r125:SI
   16: r126:CC=cmp(r120:DI,0)
   17: pc={(r126:CC!=0)?L43:pc}
      REG_DEAD r126:CC
...

Bootstrap and regression tested pass on Power8-LE.

gcc/ChangeLog

2020-04-15  Xiong Hu Luo  <luoxhu@linux.ibm.com>

    PR rtl-optimization/37451, PR target/61837
    loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
    to zero_ext.
---
 gcc/loop-doloop.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)