Submitted by Uros Bizjak on Sept. 27, 2012, 6:04 p.m.

Message ID | CAFULd4YiqSJPDNzg3XTyWx=mBphzCWSpuaQiQ7SbO17Bgc7SSA@mail.gmail.com |
---|---|

State | New |

Headers | show |

On Sep 27, 2012, at 2:04 PM, Uros Bizjak wrote: > > > >>>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is >>>>> a good transformation, but why do we need to handle as special >>>>> the case where the subreg is itself the operand of a plus or minus? >>>>> I think it should happen regardless of where the subreg occurs. >>>> >>>> Don't we need to restrict this to the low part though? >>> >>> ... > > After some off-line discussion with Richard, attached is v2 of the patch. > > 2012-09-27 Uros Bizjak <ubizjak@gmail.com> > > PR rtl-optimization/54457 > * simplify-rtx.c (simplify_subreg): > Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) > to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). > ... Is it just specific to DI -> SI, or is it for any large mode -> smaller mode, like SI -> HI? paul

On Thu, Sep 27, 2012 at 8:08 PM, <Paul_Koning@dell.com> wrote: >>>>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is >>>>>> a good transformation, but why do we need to handle as special >>>>>> the case where the subreg is itself the operand of a plus or minus? >>>>>> I think it should happen regardless of where the subreg occurs. >>>>> >>>>> Don't we need to restrict this to the low part though? >>>> >>>> ... >> >> After some off-line discussion with Richard, attached is v2 of the patch. >> >> 2012-09-27 Uros Bizjak <ubizjak@gmail.com> >> >> PR rtl-optimization/54457 >> * simplify-rtx.c (simplify_subreg): >> Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) >> to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). >> ... > > Is it just specific to DI -> SI, or is it for any large mode -> smaller mode, like SI -> HI? Oh, I just copied v1 ChangeLog. The patch converts all modes where size of mode M < size of mode N. Updated ChangeLog reads: 2012-09-27 Uros Bizjak <ubizjak@gmail.com> PR rtl-optimization/54457 * simplify-rtx.c (simplify_subreg): Simplify (subreg:M (op:N ((x:N) (y:N)), 0) to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where the outer subreg is effectively a truncation to the original mode M. testsuite/ChangeLog: 2012-09-27 Uros Bizjak <ubizjak@gmail.com> PR rtl-optimization/54457 * gcc.target/i386/pr54457.c: New test. Uros.

On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote: > After some off-line discussion with Richard, attached is v2 of the patch. > > 2012-09-27 Uros Bizjak <ubizjak@gmail.com> > > PR rtl-optimization/54457 > * simplify-rtx.c (simplify_subreg): > Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) > to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). Is that a good idea even for WORD_REGISTER_OPERATIONS targets? > --- simplify-rtx.c (revision 191808) > +++ simplify-rtx.c (working copy) > @@ -5689,6 +5689,28 @@ simplify_subreg (enum machine_mode outermode, rtx > return CONST0_RTX (outermode); > } > > + /* Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) > + to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where > + the outer subreg is effectively a truncation to the original mode. */ > + if ((GET_CODE (op) == PLUS > + || GET_CODE (op) == MINUS > + || GET_CODE (op) == MULT) > + && SCALAR_INT_MODE_P (outermode) > + && SCALAR_INT_MODE_P (innermode) > + && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode) > + && byte == subreg_lowpart_offset (outermode, innermode)) > + { > + rtx op0 = simplify_gen_subreg (outermode, XEXP (op, 0), > + innermode, byte); > + if (op0) > + { > + rtx op1 = simplify_gen_subreg (outermode, XEXP (op, 1), > + innermode, byte); > + if (op1) > + return simplify_gen_binary (GET_CODE (op), outermode, op0, op1); > + } > + } > + > /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into > to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and > the outer subreg is effectively a truncation to the original mode. */ Jakub

Jakub Jelinek <jakub@redhat.com> writes: > On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote: >> After some off-line discussion with Richard, attached is v2 of the patch. >> >> 2012-09-27 Uros Bizjak <ubizjak@gmail.com> >> >> PR rtl-optimization/54457 >> * simplify-rtx.c (simplify_subreg): >> Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) >> to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). > > Is that a good idea even for WORD_REGISTER_OPERATIONS targets? I think so. Admittedly it means I'm going to have to change the mips.md BADDU patterns from: (define_insn "*baddu_si_el" [(set (match_operand:SI 0 "register_operand" "=d") (zero_extend:SI (subreg:QI (plus:SI (match_operand:SI 1 "register_operand" "d") (match_operand:SI 2 "register_operand" "d")) 0)))] "ISA_HAS_BADDU && !BYTES_BIG_ENDIAN" "baddu\\t%0,%1,%2" [(set_attr "alu_type" "add")]) to: (define_insn "*baddu_si_el" [(set (match_operand:SI 0 "register_operand" "=d") (zero_extend:SI (plus:QI (match_operand:QI 1 "register_operand" "d") (match_operand:QI 2 "register_operand" "d"))))] "ISA_HAS_BADDU" "baddu\\t%0,%1,%2" [(set_attr "alu_type" "add")]) But really, that's a better representation even on MIPS, not least because it gets rid of the ugly endianness condition. There will probably be fallout on other targets too. But the upside is that we get rid of more subregs from .mds. I think a few of the i386.md define_peephole2s could go too. E.g. the second two of: (define_peephole2 [(set (match_operand:DI 0 "register_operand") (zero_extend:DI (plus:SI (match_operand:SI 1 "register_operand") (match_operand:SI 2 "nonmemory_operand"))))] "TARGET_64BIT && !TARGET_OPT_AGU && REGNO (operands[0]) == REGNO (operands[1]) && peep2_regno_dead_p (0, FLAGS_REG)" [(parallel [(set (match_dup 0) (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2)))) (clobber (reg:CC FLAGS_REG))])]) (define_peephole2 [(set (match_operand:DI 0 "register_operand") (zero_extend:DI (plus:SI (match_operand:SI 1 "nonmemory_operand") (match_operand:SI 2 "register_operand"))))] "TARGET_64BIT && !TARGET_OPT_AGU && REGNO (operands[0]) == REGNO (operands[2]) && peep2_regno_dead_p (0, FLAGS_REG)" [(parallel [(set (match_dup 0) (zero_extend:DI (plus:SI (match_dup 2) (match_dup 1)))) (clobber (reg:CC FLAGS_REG))])]) (define_peephole2 [(set (match_operand:DI 0 "register_operand") (zero_extend:DI (subreg:SI (plus:DI (match_dup 0) (match_operand:DI 1 "nonmemory_operand")) 0)))] "TARGET_64BIT && !TARGET_OPT_AGU && peep2_regno_dead_p (0, FLAGS_REG)" [(parallel [(set (match_dup 0) (zero_extend:DI (plus:SI (match_dup 2) (match_dup 1)))) (clobber (reg:CC FLAGS_REG))])] { operands[1] = gen_lowpart (SImode, operands[1]); operands[2] = gen_lowpart (SImode, operands[0]); }) (define_peephole2 [(set (match_operand:DI 0 "register_operand") (zero_extend:DI (subreg:SI (plus:DI (match_operand:DI 1 "nonmemory_operand") (match_dup 0)) 0)))] "TARGET_64BIT && !TARGET_OPT_AGU && peep2_regno_dead_p (0, FLAGS_REG)" [(parallel [(set (match_dup 0) (zero_extend:DI (plus:SI (match_dup 2) (match_dup 1)))) (clobber (reg:CC FLAGS_REG))])] { operands[1] = gen_lowpart (SImode, operands[1]); operands[2] = gen_lowpart (SImode, operands[0]); }) where we should now always generate the first two forms. There's no semantic difference between the two rtxes, and I think it would be confusing to have different canonical forms on different targets. If the caller really wants a word-mode operation on WORD_REGISTER_OPERATIONS targets, then I think it's asking for the wrong thing by taking this subreg. Richard

On Thu, Sep 27, 2012 at 8:20 PM, Jakub Jelinek <jakub@redhat.com> wrote: > On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote: >> After some off-line discussion with Richard, attached is v2 of the patch. >> >> 2012-09-27 Uros Bizjak <ubizjak@gmail.com> >> >> PR rtl-optimization/54457 >> * simplify-rtx.c (simplify_subreg): >> Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) >> to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). > > Is that a good idea even for WORD_REGISTER_OPERATIONS targets? I have bootstrapped and regtested [1] the patch on alphaev68-pc-linux-gnu, a WORD_REGISTER_OPERATIONS target, and there were no additional failures. [1] http://gcc.gnu.org/ml/gcc-testresults/2012-09/msg02828.html Uros.

Uros Bizjak <ubizjak@gmail.com> writes: > On Thu, Sep 27, 2012 at 8:20 PM, Jakub Jelinek <jakub@redhat.com> wrote: >> On Thu, Sep 27, 2012 at 08:04:58PM +0200, Uros Bizjak wrote: >>> After some off-line discussion with Richard, attached is v2 of the patch. >>> >>> 2012-09-27 Uros Bizjak <ubizjak@gmail.com> >>> >>> PR rtl-optimization/54457 >>> * simplify-rtx.c (simplify_subreg): >>> Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) >>> to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). >> >> Is that a good idea even for WORD_REGISTER_OPERATIONS targets? > > I have bootstrapped and regtested [1] the patch on > alphaev68-pc-linux-gnu, a WORD_REGISTER_OPERATIONS target, and there > were no additional failures. Thanks. Given Jakub's question/concern, I'd really prefer a third opinion rather than approving it myself, but... OK if no-one objects within 24hrs. Richard

On Thu, Sep 27, 2012 at 11:13 AM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Thu, Sep 27, 2012 at 8:08 PM, <Paul_Koning@dell.com> wrote: > >>>>>>> I agree (subreg:M (op:N A C) 0) to (op:M (subreg:N (A 0)) C) is >>>>>>> a good transformation, but why do we need to handle as special >>>>>>> the case where the subreg is itself the operand of a plus or minus? >>>>>>> I think it should happen regardless of where the subreg occurs. >>>>>> >>>>>> Don't we need to restrict this to the low part though? >>>>> >>>>> ... >>> >>> After some off-line discussion with Richard, attached is v2 of the patch. >>> >>> 2012-09-27 Uros Bizjak <ubizjak@gmail.com> >>> >>> PR rtl-optimization/54457 >>> * simplify-rtx.c (simplify_subreg): >>> Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) >>> to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)). >>> ... >> >> Is it just specific to DI -> SI, or is it for any large mode -> smaller mode, like SI -> HI? > > Oh, I just copied v1 ChangeLog. The patch converts all modes where > size of mode M < size of mode N. Updated ChangeLog reads: > > 2012-09-27 Uros Bizjak <ubizjak@gmail.com> > > PR rtl-optimization/54457 > * simplify-rtx.c (simplify_subreg): > Simplify (subreg:M (op:N ((x:N) (y:N)), 0) > to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where > the outer subreg is effectively a truncation to the original mode M. When I was doing something similar on our internal toolchain at Cavium. I found doing this caused a regression on MIPS64 n32 in gcc.c-torture/execute/20040709-1.c Where: (insn 15 14 16 2 (set (reg/v:DI 200 [ y ]) (reg:DI 2 $2)) t.c:16 301 {*movdi_64bit} (expr_list:REG_DEAD (reg:DI 2 $2) (nil))) (insn 16 15 17 2 (set (reg:DI 210) (zero_extract:DI (reg/v:DI 200 [ y ]) (const_int 29 [0x1d]) (const_int 0 [0]))) t.c:16 249 {extzvdi} (expr_list:REG_DEAD (reg/v:DI 200 [ y ]) (nil))) (insn 17 16 23 2 (set (reg:SI 211) (truncate:SI (reg:DI 210))) t.c:16 175 {truncdisi2} (expr_list:REG_DEAD (reg:DI 210) (nil))) Gets converted to: (insn 23 17 26 2 (set (reg/i:SI 2 $2) (and:SI (reg:SI 2 $2 [+4 ]) (const_int 536870911 [0x1fffffff]))) t.c:18 156 {*andsi3} (nil)) Which is considered an ext instruction And with the Octeon simulator which causes undefined arguments to 32bit word operations to come out as 0xDEADBEEF which showed the regression. I fixed it by changing it to produce TRUNCATE instead of the subreg. I did the simplification on ior/and rather than plus/minus/mult so the issue is only when expanding to this to and/ior. Thanks, Andrew Pinski > > testsuite/ChangeLog: > > 2012-09-27 Uros Bizjak <ubizjak@gmail.com> > > PR rtl-optimization/54457 > * gcc.target/i386/pr54457.c: New test. > > Uros.

Andrew Pinski <andrew.pinski@caviumnetworks.com> writes: > On Thu, Sep 27, 2012 at 11:13 AM, Uros Bizjak <ubizjak@gmail.com> wrote: >> 2012-09-27 Uros Bizjak <ubizjak@gmail.com> >> >> PR rtl-optimization/54457 >> * simplify-rtx.c (simplify_subreg): >> Simplify (subreg:M (op:N ((x:N) (y:N)), 0) >> to (op:M (subreg:M (x:N) 0) (subreg:M (x:N) 0)), where >> the outer subreg is effectively a truncation to the original mode M. > > > When I was doing something similar on our internal toolchain at > Cavium. I found doing this caused a regression on MIPS64 n32 in > gcc.c-torture/execute/20040709-1.c Where: > > > (insn 15 14 16 2 (set (reg/v:DI 200 [ y ]) > (reg:DI 2 $2)) t.c:16 301 {*movdi_64bit} > (expr_list:REG_DEAD (reg:DI 2 $2) > (nil))) > > (insn 16 15 17 2 (set (reg:DI 210) > (zero_extract:DI (reg/v:DI 200 [ y ]) > (const_int 29 [0x1d]) > (const_int 0 [0]))) t.c:16 249 {extzvdi} > (expr_list:REG_DEAD (reg/v:DI 200 [ y ]) > (nil))) > > (insn 17 16 23 2 (set (reg:SI 211) > (truncate:SI (reg:DI 210))) t.c:16 175 {truncdisi2} > (expr_list:REG_DEAD (reg:DI 210) > (nil))) > > Gets converted to: > (insn 23 17 26 2 (set (reg/i:SI 2 $2) > (and:SI (reg:SI 2 $2 [+4 ]) > (const_int 536870911 [0x1fffffff]))) t.c:18 156 {*andsi3} > (nil)) > > Which is considered an ext instruction > > And with the Octeon simulator which causes undefined arguments to > 32bit word operations to come out as 0xDEADBEEF which showed the > regression. I fixed it by changing it to produce TRUNCATE instead of > the subreg. > > I did the simplification on ior/and rather than plus/minus/mult so the > issue is only when expanding to this to and/ior. Hmm, hadn't thought of that. I think some of the existing subreg optimisations suffer the same problem. I.e. we can't assume that subreg truncations of nested operands are OK just because the outer subreg is OK. I've got a patch I'm testing. BTW, I haven't forgotten about your other ext patch. Was hoping to see whether we could finally take the opportunity to parameterise the ext* patterns by mode, but got distracted with other patches. Maybe I'll just have to admit I won't get time to try it for 4.8... Richard

Index: simplify-rtx.c =================================================================== --- simplify-rtx.c (revision 191808) +++ simplify-rtx.c (working copy) @@ -5689,6 +5689,28 @@ simplify_subreg (enum machine_mode outermode, rtx return CONST0_RTX (outermode); } + /* Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0) + to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where + the outer subreg is effectively a truncation to the original mode. */ + if ((GET_CODE (op) == PLUS + || GET_CODE (op) == MINUS + || GET_CODE (op) == MULT) + && SCALAR_INT_MODE_P (outermode) + && SCALAR_INT_MODE_P (innermode) + && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode) + && byte == subreg_lowpart_offset (outermode, innermode)) + { + rtx op0 = simplify_gen_subreg (outermode, XEXP (op, 0), + innermode, byte); + if (op0) + { + rtx op1 = simplify_gen_subreg (outermode, XEXP (op, 1), + innermode, byte); + if (op1) + return simplify_gen_binary (GET_CODE (op), outermode, op0, op1); + } + } + /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and the outer subreg is effectively a truncation to the original mode. */ Index: testsuite/gcc.target/i386/pr54457.c =================================================================== --- testsuite/gcc.target/i386/pr54457.c (revision 0) +++ testsuite/gcc.target/i386/pr54457.c (working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile { target { ! { ia32 } } } } */ +/* { dg-options "-O2 -mx32 -maddress-mode=short" } */ + +extern char array[40]; + +char foo (long long position) +{ + return array[position + 1]; +} + +/* { dg-final { scan-assembler-not "add\[lq\]?\[^\n\]*1" } } */