Message ID | CAFULd4bd2XDzAF4ECtb4berWMU2Dxh9ZWyZP-22cgteY617R-A@mail.gmail.com |
---|---|
State | New |
Headers | show |
Hi, On Fri, 2012-08-17 at 20:34 +0200, Uros Bizjak wrote: > On Fri, Aug 17, 2012 at 6:36 PM, Uros Bizjak <ubizjak@gmail.com> wrote: > > > 2012-08-17 Uros Bizjak <ubizjak@gmail.com> > > > > PR rtl-optimization/46829 > > * combine.c (recog_for_combine): Check operand constraints > > in case hard registers were propagater into insn pattern. > > > > testsuite/ChangeLog: > > > > 2012-08-17 Uros Bizjak <ubizjak@gmail.com> > > > > PR rtl-optimization/46829 > > * gcc.target/i386/pr46829.c: New test. > > > > Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}. > > Oh ... > > This part: > > + && rtx_equal_p (recog_data.operand[j], > + recog_data.operand[op_alt[j].matches])) > > should read: > > + && rtx_equal_p (recog_data.operand[i], > + recog_data.operand[op_alt[j].matches])) > > Note the "j" vs "i" array index difference in the first line. > > Correct patch attached, bootstrapped and re-tested on > x86_64-pc-linux-gnu {,-m32}. > I've briefly checked your patch against rev 190459 for the SH target. The CSiBE result-size for '-m4 -ml -O2 -mpretend-cmove' shows quite some movements and improvements. However, the additional restrictions you've added effectively disable some of the combine patterns I've been adding recently to improve code quality on SH and I see some SH specific failures (haven't run the full test suite though): FAIL: gcc.target/sh/pr49263.c scan-assembler-not and FAIL: gcc.target/sh/pr51244-1.c scan-assembler-not movt|tst|negc|extu FAIL: gcc.target/sh/pr51244-10.c scan-assembler-not shll|subc|and FAIL: gcc.target/sh/pr51244-7.c scan-assembler-not cmp/hi FAIL: gcc.target/sh/pr51244-7.c scan-assembler-not mov\t#0 FAIL: gcc.target/sh/pr51244-8.c scan-assembler-not shad|neg FAIL: gcc.target/sh/pr51244-9.c scan-assembler-not mov\t#0 FAIL: gcc.target/sh/pr52933-1.c scan-assembler-times div0s 25 FAIL: gcc.target/sh/pr52933-2.c scan-assembler-times div0s 25 FAIL: gcc.target/sh/pr54236-1.c scan-assembler-times addc 3 FAIL: gcc.target/sh/pr54236-1.c scan-assembler-times subc 3 FAIL: gcc.target/sh/pr54236-1.c scan-assembler-times sett 4 FAIL: gcc.target/sh/pr54236-1.c scan-assembler-not movt I've not checked every single case, but the first two seem to be the core issues. 1) FAIL: gcc.target/sh/pr49263.c scan-assembler-not and Test function: int test (uint8_t x) { return x & 15 ? -40 : -9; } Pattern in sh.md that would originally match: (define_insn "tstsi_t_zero_extract_eq" [(set (reg:SI T_REG) (eq:SI (zero_extract:SI (match_operand 0 "logical_operand" "z") (match_operand:SI 1 "const_int_operand") (match_operand:SI 2 "const_int_operand")) (const_int 0)))] ...) ("z" is a constraint for the r0 register on SH) Combine now says: Trying 11 -> 12: Successfully matched this instruction: (set (reg:SI 147 t) (eq:SI (zero_extract:SI (reg:SI 4 r4 [ x ]) (const_int 4 [0x4]) (const_int 0 [0])) (const_int 0 [0]))) Operand failed to match constraints: (reg:SI 4 r4 [ x ]) The problem in this case is that on SH (and other targets, too) function args are passed in regs. In this case it's the hard reg r4, which is sort of pre-allocated already. Before your patch, the pattern would match and reload would stuff in a 'mov r4,r0' to satisfy the "z" constraint. 2) FAIL: gcc.target/sh/pr51244-1.c scan-assembler-not movt|tst|negc|extu Test function: int testfunc_01 (int a, int b, int c, int d) { return (a == b || a == d) ? b : c; } Pattern in sh.md that would originally match: (define_insn_and_split "nott" [(set (reg:SI T_REG) (xor:SI (match_operand:SI 0 "t_reg_operand" "") (const_int 1)))] "TARGET_SH1" ...) Combine now says: Trying 14 -> 15: Successfully matched this instruction: (set (reg:SI 147 t) (xor:SI (reg:SI 147 t) (const_int 1 [0x1]))) Operand failed to match constraints: (reg:SI 147 t) In this case the T_REG matching is already in the predicate (similar to the thing that's being done on x86 you've mentioned before). I've started using this kind of matching pattern a while ago to get simpler and better matching combine patterns around SH's T_REG. (T_REG is a fixed 1 bit hard reg on SH, treated as SImode throughout the MD. It is used to hold comparison results, carry/borrow bits etc). I'm not sure what would be the possible solutions for those cases. For 1) maybe pre-allocated regs of args should be moved to pseudos before combine? For 2) I could probably try to come up with some matching that would satisfy the new restrictions in combine and fix up the affected patterns throughout the MD. In the worst case, would it be an option to make this restriction in combine optional (target hook or something)? Cheers, Oleg
On Fri, Aug 17, 2012 at 9:58 PM, Oleg Endo <oleg.endo@t-online.de> wrote: >> > 2012-08-17 Uros Bizjak <ubizjak@gmail.com> >> > >> > PR rtl-optimization/46829 >> > * combine.c (recog_for_combine): Check operand constraints >> > in case hard registers were propagater into insn pattern. >> > >> > testsuite/ChangeLog: >> > >> > 2012-08-17 Uros Bizjak <ubizjak@gmail.com> >> > >> > PR rtl-optimization/46829 >> > * gcc.target/i386/pr46829.c: New test. >> > >> > Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}. >> >> Oh ... >> >> This part: >> >> + && rtx_equal_p (recog_data.operand[j], >> + recog_data.operand[op_alt[j].matches])) >> >> should read: >> >> + && rtx_equal_p (recog_data.operand[i], >> + recog_data.operand[op_alt[j].matches])) >> >> Note the "j" vs "i" array index difference in the first line. >> >> Correct patch attached, bootstrapped and re-tested on >> x86_64-pc-linux-gnu {,-m32}. >> > > I've briefly checked your patch against rev 190459 for the SH target. > The CSiBE result-size for '-m4 -ml -O2 -mpretend-cmove' shows quite some > movements and improvements. However, the additional restrictions you've > added effectively disable some of the combine patterns I've been adding > recently to improve code quality on SH and I see some SH specific > failures (haven't run the full test suite though): > FAIL: gcc.target/sh/pr49263.c scan-assembler-not and > FAIL: gcc.target/sh/pr51244-1.c scan-assembler-not movt|tst|negc|extu > FAIL: gcc.target/sh/pr51244-10.c scan-assembler-not shll|subc|and > FAIL: gcc.target/sh/pr51244-7.c scan-assembler-not cmp/hi > FAIL: gcc.target/sh/pr51244-7.c scan-assembler-not mov\t#0 > FAIL: gcc.target/sh/pr51244-8.c scan-assembler-not shad|neg > FAIL: gcc.target/sh/pr51244-9.c scan-assembler-not mov\t#0 > FAIL: gcc.target/sh/pr52933-1.c scan-assembler-times div0s 25 > FAIL: gcc.target/sh/pr52933-2.c scan-assembler-times div0s 25 > FAIL: gcc.target/sh/pr54236-1.c scan-assembler-times addc 3 > FAIL: gcc.target/sh/pr54236-1.c scan-assembler-times subc 3 > FAIL: gcc.target/sh/pr54236-1.c scan-assembler-times sett 4 > FAIL: gcc.target/sh/pr54236-1.c scan-assembler-not movt > > I've not checked every single case, but the first two seem to be the > core issues. > > 1) FAIL: gcc.target/sh/pr49263.c scan-assembler-not and > > Test function: > > int test (uint8_t x) > { > return x & 15 ? -40 : -9; > } > > Pattern in sh.md that would originally match: > > (define_insn "tstsi_t_zero_extract_eq" > [(set (reg:SI T_REG) > (eq:SI (zero_extract:SI (match_operand 0 "logical_operand" "z") > (match_operand:SI 1 "const_int_operand") > (match_operand:SI 2 "const_int_operand")) > (const_int 0)))] > ...) > > ("z" is a constraint for the r0 register on SH) > > Combine now says: > > Trying 11 -> 12: > Successfully matched this instruction: > (set (reg:SI 147 t) > (eq:SI (zero_extract:SI (reg:SI 4 r4 [ x ]) > (const_int 4 [0x4]) > (const_int 0 [0])) > (const_int 0 [0]))) > Operand failed to match constraints: > (reg:SI 4 r4 [ x ]) > > The problem in this case is that on SH (and other targets, too) function > args are passed in regs. In this case it's the hard reg r4, which is > sort of pre-allocated already. Before your patch, the pattern would > match and reload would stuff in a 'mov r4,r0' to satisfy the "z" > constraint. Yes, x86_64 also has register passing convention. So, i.e.: --cut here-- int foo (int a, int b, int c) { return a + b + c; } --cut here-- expands to: (note 6 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (insn 2 6 3 2 (set (reg/v:SI 62 [ a ]) (reg:SI 5 di [ a ])) t.c:2 -1 (nil)) (insn 3 2 4 2 (set (reg/v:SI 63 [ b ]) (reg:SI 4 si [ b ])) t.c:2 -1 (nil)) (insn 4 3 5 2 (set (reg/v:SI 64 [ c ]) (reg:SI 1 dx [ c ])) t.c:2 -1 (nil)) (note 5 4 8 2 NOTE_INSN_FUNCTION_BEG) (insn 8 5 9 2 (parallel [ (set (reg:SI 66 [ D.1722 ]) (plus:SI (reg/v:SI 62 [ a ]) (reg/v:SI 63 [ b ]))) (clobber (reg:CC 17 flags)) ]) t.c:3 -1 (nil)) (...) So, we expand function arguments to pseudos first, and these survive up to the combine pass. If sh expanded func arguments to pseudos, then invalid hard register won't be propagated to insn pattern, leaving reload with much more freedom to choose correct register. I believe that code improvements you are seeing come from this added reload freedom. > 2) FAIL: gcc.target/sh/pr51244-1.c scan-assembler-not movt|tst|negc|extu > > Test function: > > int > testfunc_01 (int a, int b, int c, int d) > { > return (a == b || a == d) ? b : c; > } > > Pattern in sh.md that would originally match: > > (define_insn_and_split "nott" > [(set (reg:SI T_REG) > (xor:SI (match_operand:SI 0 "t_reg_operand" "") (const_int 1)))] > "TARGET_SH1" > ...) > > Combine now says: > > Trying 14 -> 15: > Successfully matched this instruction: > (set (reg:SI 147 t) > (xor:SI (reg:SI 147 t) > (const_int 1 [0x1]))) > Operand failed to match constraints: > (reg:SI 147 t) > > In this case the T_REG matching is already in the predicate (similar to > the thing that's being done on x86 you've mentioned before). I've > started using this kind of matching pattern a while ago to get simpler > and better matching combine patterns around SH's T_REG. > (T_REG is a fixed 1 bit hard reg on SH, treated as SImode throughout the > MD. It is used to hold comparison results, carry/borrow bits etc). Hm, I'd write this instruction as: (define_insn_and_split "nott" [(set (match_operand:SI 0 "t_reg_operand" "t") (xor:SI (match_operand:SI 1 "t_reg_operand" "0") (const_int 1)))] "TARGET_SH1" ...) This would match new combine limitation without problems, although empty constraint should be matched as well. Can you perhaps investigate a bit, why op_alt[j].anything.ok in + if (op_alt[j].anything_ok + || (op_alt[j].matches != -1 + && rtx_equal_p (recog_data.operand[i], + recog_data.operand[op_alt[j].matches])) + || (reg_fits_class_p (op, op_alt[j].cl, offset, mode))) does not trigger for this operation? Maybe we need to add another condition to allow empty constraint. > I'm not sure what would be the possible solutions for those cases. > For 1) maybe pre-allocated regs of args should be moved to pseudos > before combine? x86_64 implement this approach. > For 2) I could probably try to come up with some matching that would > satisfy the new restrictions in combine and fix up the affected patterns > throughout the MD. No, empty operand constraints should be universally matched. > In the worst case, would it be an option to make this restriction in > combine optional (target hook or something)? IMO, a patch like this that touches many targets won't be applied over night. Rest assured that there will be a discussion about it. Thanks for testing the patch, I look forward to comments. BR, Uros.
On Fri, 2012-08-17 at 22:32 +0200, Uros Bizjak wrote: > Yes, x86_64 also has register passing convention. So, i.e.: > > --cut here-- > int foo (int a, int b, int c) > { > return a + b + c; > } > --cut here-- > > expands to: > > (note 6 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) > (insn 2 6 3 2 (set (reg/v:SI 62 [ a ]) > (reg:SI 5 di [ a ])) t.c:2 -1 > (nil)) > (insn 3 2 4 2 (set (reg/v:SI 63 [ b ]) > (reg:SI 4 si [ b ])) t.c:2 -1 > (nil)) > (insn 4 3 5 2 (set (reg/v:SI 64 [ c ]) > (reg:SI 1 dx [ c ])) t.c:2 -1 > (nil)) > (note 5 4 8 2 NOTE_INSN_FUNCTION_BEG) > (insn 8 5 9 2 (parallel [ > (set (reg:SI 66 [ D.1722 ]) > (plus:SI (reg/v:SI 62 [ a ]) > (reg/v:SI 63 [ b ]))) > (clobber (reg:CC 17 flags)) > ]) t.c:3 -1 > (nil)) > (...) > > So, we expand function arguments to pseudos first, and these survive > up to the combine pass. If sh expanded func arguments to pseudos, then > invalid hard register won't be propagated to insn pattern, leaving > reload with much more freedom to choose correct register. I believe > that code improvements you are seeing come from this added reload > freedom. It seems on SH func args are expanded to pseudos, sorry for not checking this properly. int test (uint8_t x) { return x & 15 ? -40 : -9; } expands to: (note 1 0 7 NOTE_INSN_DELETED) (note 7 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (insn 2 7 3 2 (set (reg:SI 164) (reg:SI 4 r4 [ x ])) sh_tmp.cpp:9 -1 (nil)) (insn 3 2 4 2 (set (reg/v:SI 163 [ x ]) (zero_extend:SI (subreg:QI (reg:SI 164) 0))) sh_tmp.cpp:9 -1 (nil)) (note 4 3 9 2 NOTE_INSN_FUNCTION_BEG) (insn 9 4 10 2 (set (reg:SI 165) (and:SI (reg/v:SI 163 [ x ]) (const_int 15 [0xf]))) sh_tmp.cpp:10 -1 (nil)) ... Then in combine it goes like: Successfully matched this instruction: (set (reg/v:SI 163 [ x ]) (zero_extend:SI (reg:QI 4 r4 [ x ]))) deferring deletion of insn with uid = 2. modifying insn i3 3 r163:SI=zero_extend(r4:QI) REG_DEAD: r4:SI deferring rescan insn with uid = 3. [[note: the matched pattern above is "*zero_extendqisi2_compact" ]] Trying 3 -> 9: Successfully matched this instruction: (set (reg:SI 165) (and:SI (reg:SI 4 r4 [ x ]) (const_int 15 [0xf]))) deferring deletion of insn with uid = 3. modifying insn i3 9 r165:SI=r4:SI&0xf REG_DEAD: r4:SI deferring rescan insn with uid = 9. Trying 9 -> 11: Successfully matched this instruction: (set (reg:SI 167) (and:SI (reg:SI 4 r4 [ x ]) (const_int 15 [0xf]))) deferring deletion of insn with uid = 9. modifying insn i3 11 r167:SI=r4:SI&0xf REG_DEAD: r4:SI deferring rescan insn with uid = 11. Trying 11 -> 12: Successfully matched this instruction: (set (reg:SI 147 t) (eq:SI (zero_extract:SI (reg:SI 4 r4 [ x ]) (const_int 4 [0x4]) (const_int 0 [0])) (const_int 0 [0]))) Operand failed to match constraints: (reg:SI 4 r4 [ x ]) > Hm, I'd write this instruction as: > > (define_insn_and_split "nott" > [(set (match_operand:SI 0 "t_reg_operand" "t") > (xor:SI (match_operand:SI 1 "t_reg_operand" "0") (const_int 1)))] > "TARGET_SH1" > ...) > > This would match new combine limitation without problems, although > empty constraint should be matched as well. Can you perhaps > investigate a bit, why op_alt[j].anything.ok in > > + if (op_alt[j].anything_ok > + || (op_alt[j].matches != -1 > + && rtx_equal_p (recog_data.operand[i], > + recog_data.operand[op_alt[j].matches])) > + || (reg_fits_class_p (op, op_alt[j].cl, offset, mode))) > > does not trigger for this operation? Maybe we need to add another > condition to allow empty constraint. It seems that in this case there are no alternatives in recog_data at all. The 'for' never runs, 'win' remains 'false'. Doing - if (!win) + if (!win && recog_data.n_alternatives > 0) seems to help. > IMO, a patch like this that touches many targets won't be applied over > night. Rest assured that there will be a discussion about it. > > Thanks for testing the patch, I look forward to comments. Sure, no problem. Cheers, Oleg
Index: combine.c =================================================================== --- combine.c (revision 190480) +++ combine.c (working copy) @@ -10507,6 +10507,7 @@ recog_for_combine (rtx *pnewpat, rtx insn, rtx *pn int i; rtx notes = 0; rtx old_notes, old_pat; + int old_icode; /* If PAT is a PARALLEL, check to see if it contains the CLOBBER we use to indicate that something didn't match. If we find such a @@ -10566,6 +10567,7 @@ recog_for_combine (rtx *pnewpat, rtx insn, rtx *pn print_rtl_single (dump_file, pat); } } + PATTERN (insn) = old_pat; REG_NOTES (insn) = old_notes; @@ -10607,6 +10609,86 @@ recog_for_combine (rtx *pnewpat, rtx insn, rtx *pn pat = newpat; } + old_pat = PATTERN (insn); + old_notes = REG_NOTES (insn); + old_icode = INSN_CODE (insn); + PATTERN (insn) = pat; + REG_NOTES (insn) = notes; + + /* Check operand constraints in case hard registers were propagated + into insn pattern. This check prevents combine pass from + generating insn patterns with invalid hard register operands. + These invalid insns can eventually confuse reload to error out + with a spill failure. See also PR 46829. */ + if (insn_code_number >= 0 + && insn_code_number != NOOP_MOVE_INSN_CODE + && (INSN_CODE (insn) = recog (PATTERN (insn), insn, 0)) >= 0) + { + extract_insn (insn); + preprocess_constraints (); + + for (i = 0; i < recog_data.n_operands; i++) + { + rtx op = recog_data.operand[i]; + enum machine_mode mode = GET_MODE (op); + struct operand_alternative *op_alt; + int offset = 0; + bool win; + int j; + + /* A unary operator may be accepted by the predicate, but it + is irrelevant for matching constraints. */ + if (UNARY_P (op)) + op = XEXP (op, 0); + + if (GET_CODE (op) == SUBREG) + { + if (REG_P (SUBREG_REG (op)) + && REGNO (SUBREG_REG (op)) < FIRST_PSEUDO_REGISTER) + offset = subreg_regno_offset (REGNO (SUBREG_REG (op)), + GET_MODE (SUBREG_REG (op)), + SUBREG_BYTE (op), + GET_MODE (op)); + op = SUBREG_REG (op); + } + + if (!(REG_P (op) && HARD_REGISTER_P (op))) + continue; + + op_alt = recog_op_alt[i]; + + win = false; + for (j = 0; j < recog_data.n_alternatives; j++) + { + if (op_alt[j].anything_ok + || (op_alt[j].matches != -1 + && rtx_equal_p (recog_data.operand[i], + recog_data.operand[op_alt[j].matches])) + || (reg_fits_class_p (op, op_alt[j].cl, offset, mode))) + { + win = true; + break; + } + } + + if (!win) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fputs ("Operand failed to match constraints:\n", + dump_file); + print_rtl_single (dump_file, op); + } + insn_code_number = -1; + break; + } + } + } + + PATTERN (insn) = old_pat; + REG_NOTES (insn) = old_notes; + INSN_CODE (insn) = old_icode; + *pnewpat = pat; *pnotes = notes; Index: testsuite/gcc.target/i386/pr46829.c =================================================================== --- testsuite/gcc.target/i386/pr46829.c (revision 0) +++ testsuite/gcc.target/i386/pr46829.c (working copy) @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fschedule-insns" } */ + +struct S +{ + int i, j; +}; + +extern struct S s[]; + +extern void bar (int, ...); + +void +foo (int n) +{ + while (s[n].i) + bar (0, n, s[n].j, s, s[n].i / s[n].j); +}