Message ID: 5429A102.4000602@arm.com
State: New
On 09/29/2014 11:12 AM, Jiong Wang wrote:
> +inline rtx single_set_no_clobber_use (const rtx_insn *insn)
> +{
> +  if (!INSN_P (insn))
> +    return NULL_RTX;
> +
> +  if (GET_CODE (PATTERN (insn)) == SET)
> +    return PATTERN (insn);
> +
> +  /* Defer to the more expensive case, and return NULL_RTX if there is
> +     USE or CLOBBER.  */
> +  return single_set_2 (insn, PATTERN (insn), true);
> }

What more expensive case?  If you're disallowing USE and CLOBBER, then
single_set is just GET_CODE == SET.

I think this function is somewhat useless, and should not be added.

An adjustment to move_insn_for_shrink_wrap may be reasonable though.  I
haven't tried to understand the miscompilation yet.  I can imagine that this
would disable quite a bit of shrink wrapping for x86 though.  Can we do
better in understanding when the clobbered register is live at the location
to which we'd like to move the insns?

r~
On 29/09/14 19:32, Richard Henderson wrote:
> On 09/29/2014 11:12 AM, Jiong Wang wrote:
>> +inline rtx single_set_no_clobber_use (const rtx_insn *insn)
>> +{
>> +  if (!INSN_P (insn))
>> +    return NULL_RTX;
>> +
>> +  if (GET_CODE (PATTERN (insn)) == SET)
>> +    return PATTERN (insn);
>> +
>> +  /* Defer to the more expensive case, and return NULL_RTX if there is
>> +     USE or CLOBBER.  */
>> +  return single_set_2 (insn, PATTERN (insn), true);
>> }

Richard,

thanks for the review.

> What more expensive case?

single_set_no_clobber_use is just a clone of single_set; I copied the
comments with only minor modifications.  I think the "more expensive case"
here means the case where there is a PARALLEL whose inner rtxes we need to
check.

> If you're disallowing USE and CLOBBER, then single_set is just
> GET_CODE == SET.
>
> I think this function is somewhat useless, and should not be added.
>
> An adjustment to move_insn_for_shrink_wrap may be reasonable though.  I
> haven't tried to understand the miscompilation yet.  I can imagine that
> this would disable quite a bit of shrink wrapping for x86 though.

I don't think so.  From the x86-64 bootstrap, there is no regression in the
number of functions shrink-wrapped.  Actually, previously only a single
"mov dest, src" was handled, so disallowing USE/CLOBBER will not reject any
shrink-wrap opportunity that was allowed before.

And I am afraid that if we don't reuse single_set_2, we would need another
loop to check all those inner rtxes, which single_set_2 already does.  So,
IMHO, just modifying single_set_2 is more efficient.

> Can we do better in understanding when the clobbered register is live at
> the location to which we'd like to move the insns?

Currently, the generic code in move_insn_for_shrink_wrap only handles a
single-register dest/src, so if there is a CLOBBER or USE we might need to
check multiple registers, which would require a few more modifications.  I
think that is better done after all the single dest/src issues are fixed.
--
Jiong
On Mon, 2014-09-29 at 20:24 +0100, Jiong Wang wrote:
> On 29/09/14 19:32, Richard Henderson wrote:
>> What more expensive case?
>
> single_set_no_clobber_use is just a clone of single_set; I copied the
> comments with only minor modifications.

I introduced that comment to single_set in r215089, when making single_set
into an inline function (so that it could check that it received an
rtx_insn *, rather than an rtx).

> I think the "more expensive case" here means the case where there is a
> PARALLEL whose inner rtxes we need to check.

My comment may have been misleading, sorry.

IIRC, what I was thinking was that the old implementation had split
single_set into a macro and a function.  This was by Honza (CCed), 14 years
ago to the day, back in r36664 (on 2000-09-29):
https://gcc.gnu.org/ml/gcc-patches/2000-09/msg00893.html

/* Single set is implemented as macro for performance reasons.  */
#define single_set(I) (INSN_P (I)					\
		       ? (GET_CODE (PATTERN (I)) == SET			\
			  ? PATTERN (I) : single_set_1 (I))		\
		       : NULL_RTX)

I think by "the more expensive case" I meant having to make a function call
to handle the less-common cases (which indeed covers the PARALLEL case),
rather than having the logic inline; preserving that inlined vs.
not-inlined split was one of my aims for r215089.

Perhaps it should be rewritten to "Defer to a function call to handle the
less common cases", or somesuch?

[...snip rest of post...]

Dave
On 09/29/14 13:24, Jiong Wang wrote:
>
> I don't think so.  From the x86-64 bootstrap, there is no regression in
> the number of functions shrink-wrapped.  Actually, previously only a
> single "mov dest, src" was handled, so disallowing USE/CLOBBER will not
> reject any shrink-wrap opportunity that was allowed before.

This is the key, of course.  Shrink-wrapping is very restrictive in its
ability to sink insns.  The only forms it'll currently sink are simple
moves; arithmetic, logicals, etc. are left alone.  Thus disallowing
USE/CLOBBER does not impact the x86 port in any significant way.

I do agree with Richard that it would be useful to see the insns that are
incorrectly sunk and the surrounding context.

Jeff
On 29/09/14 19:32, Richard Henderson wrote:
> [...]
> An adjustment to move_insn_for_shrink_wrap may be reasonable though.  I
> haven't tried to understand the miscompilation yet.  I can imagine that
> this would disable quite a bit of shrink wrapping for x86 though.  Can we
> do better in understanding when the clobbered register is live at the
> location to which we'd like to move the insns?
>
> r~

I think part of the problem is in the naming of single_set().  From the
name it's not entirely obvious to users that this includes insns that
clobber registers, or that write other registers which are unused after
that point.  I've previously had to fix a bug where this assumption was
made (e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300).

Most uses of single_set prior to register allocation are probably safe, but
later uses are fraught with potential problems of this nature and may well
be bugs waiting to happen.

R.
On 30/09/14 05:21, Jeff Law wrote:
> On 09/29/14 13:24, Jiong Wang wrote:
>> I don't think so.  From the x86-64 bootstrap, there is no regression in
>> the number of functions shrink-wrapped.
> This is the key, of course.  Shrink-wrapping is very restrictive in its
> ability to sink insns.  The only forms it'll currently sink are simple
> moves.  Arithmetic, logicals, etc. are left alone.  Thus disallowing
> USE/CLOBBER does not impact the x86 port in any significant way.

Yes, and we get +1.22% (2567 compared with 2536) functions shrink-wrapped
after sinking more insns beyond the simple "mov dest, src" on an x86-64
bootstrap, and I remember a similar percentage on a glibc build.

On aarch64, the number of functions shrink-wrapped increased by +25% on
some programs; maybe we could see similar gains on other RISC backends.

> I do agree with Richard that it would be useful to see the insns that are
> incorrectly sunk and the surrounding context.

Insns 14 and 182 are sunk incorrectly.  Below are the details.
(insn 14 173 174 2 (parallel [
            (set (reg:QI 37 r8 [orig:86 D.32480 ] [86])
                (lshiftrt:QI (reg:QI 37 r8 [orig:86 D.32480 ] [86])
                    (const_int 2 [0x2])))
            (clobber (reg:CC 17 flags))
        ]) /home/andi/lsrc/linux/block/blk-flush2.c:50 547 {*lshrqi3_1}
     (expr_list:REG_EQUAL (lshiftrt:QI (mem:QI (plus:DI (reg/v/f:DI 43 r14 [orig:85 q ] [85])
                    (const_int 1612 [0x64c])) [20 *q_7+1612 S1 A32])
            (const_int 2 [0x2]))
        (nil)))

(insn 174 14 182 2 (set (reg:QI 44 r15 [orig:86 D.32480 ] [86])
        (reg:QI 37 r8 [orig:86 D.32480 ] [86])) /home/andi/lsrc/linux/block/blk-flush2.c:50 93 {*movqi_internal}
     (nil))

(insn 182 174 16 2 (parallel [
            (set (reg:SI 44 r15 [orig:86 D.32480 ] [86])
                (and:SI (reg:SI 44 r15 [orig:86 D.32480 ] [86])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) /home/andi/lsrc/linux/block/blk-flush2.c:50 376 {*andsi_1}
     (nil))
On 09/30/14 08:37, Jiong Wang wrote:
> yes, and we could get +1.22% (2567 compared with 2536) functions
> shrink-wrapped after sinking more insns beyond the simple "mov dest, src"
> on an x86-64 bootstrap, and I remember a similar percentage on a glibc
> build.
>
> On aarch64, the number of functions shrink-wrapped increased by +25% on
> some programs; maybe we could see similar gains on other RISC backends.

So I must be missing something.  I thought the shrink-wrapping code
wouldn't sink arithmetic/logical insns like we see with insn 14 and insn
182; I thought it was limited to reg-reg copies and constant
initializations.

Jeff
On 09/30/14 08:15, Richard Earnshaw wrote:
> I think part of the problem is in the naming of single_set().  From the
> name it's not entirely obvious to users that this includes insns that
> clobber registers, or that write other registers which are unused after
> that point.  I've previously had to fix a bug where this assumption was
> made (e.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54300).
>
> Most uses of single_set prior to register allocation are probably safe,
> but later uses are fraught with potential problems of this nature and
> may well be bugs waiting to happen.

Very possibly.  There's a bit of a natural tension here in that often we
don't much care about the additional CLOBBERs, but when we get it wrong,
obviously it's bad.

I haven't done any research, but I suspect the change to ignore clobbers in
single_set came in as part of exposing the CC register, to avoid
regressions all over the place as a result.

I wonder what would happen if we ignored them prior to register allocation,
then rejected insns with those CLOBBERs once register allocation has
started.

Jeff
On 30/09/14 17:45, Jeff Law wrote:
> On 09/30/14 08:15, Richard Earnshaw wrote:
>> Most uses of single_set prior to register allocation are probably safe,
>> but later uses are fraught with potential problems of this nature and
>> may well be bugs waiting to happen.
> Very possibly.  There's a bit of a natural tension here in that often we
> don't much care about the additional CLOBBERs, but when we get it wrong,
> obviously it's bad.
>
> I haven't done any research, but I suspect the change to ignore clobbers
> in single_set came in as part of exposing the CC register, to avoid
> regressions all over the place as a result.

It's not just clobbers; it also ignores patterns like

  (parallel [(set (a) (...))
             (set (b) (...))])
  [(reg_note (reg_unused (b)))]

which is probably fine before register allocation but definitely something
you have to think about afterwards.

> I wonder what would happen if we ignored them prior to register
> allocation, then rejected insns with those CLOBBERs once register
> allocation started.

Might work, but it might miss some useful cases as well...

R.
2014-09-30 17:30 GMT+01:00 Jeff Law <law@redhat.com>:
> On 09/30/14 08:37, Jiong Wang wrote:
>> yes, and we could get +1.22% (2567 compared with 2536) functions
>> shrink-wrapped after sinking more insns beyond the simple
>> "mov dest, src" on an x86-64 bootstrap, and I remember a similar
>> percentage on a glibc build.
>>
>> On aarch64, the number of functions shrink-wrapped increased by +25% on
>> some programs; maybe we could see similar gains on other RISC backends.
>
> So I must be missing something.  I thought the shrink-wrapping code
> wouldn't sink arithmetic/logical insns like we see with insn 14 and insn
> 182.  I thought it was limited to reg-reg copies and constant
> initializations.

Yes, it was limited to reg-reg copies.  My previous sink improvement aimed
to sink any rtx that:

  A: is a single_set;
  B: has a src operand that is any combination of no more than one
     register and no non-constant objects;

while some operators, like shift, may have side effects.
IMHO, all side effects are reflected in the RTX.  Together with this
fail_on_clobber_use modification, the rtx returned by
single_set_no_clobber_use is safe to sink if it meets limit B above and
passes the later register use/def check in move_insn_for_shrink_wrap?

Regards,
Jiong
On Tue, Sep 30, 2014 at 6:51 PM, Richard Earnshaw wrote:
> It's not just clobbers; it also ignores patterns like
>
>   (parallel [(set (a) (...))
>              (set (b) (...))])
>   [(reg_note (reg_unused (b)))]
>
> which is probably fine before register allocation but definitely
> something you have to think about afterwards.

Even before RA this isn't always fine.  We have checks for !multiple_sets
for this.

Ciao!
Steven
diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
index 9337be1..09d3c4a 100644
--- a/gcc/config/ia64/ia64.c
+++ b/gcc/config/ia64/ia64.c
@@ -7172,7 +7172,7 @@ ia64_single_set (rtx_insn *insn)
       break;
 
     default:
-      ret = single_set_2 (insn, x);
+      ret = single_set_2 (insn, x, false);
       break;
     }
 
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e73f731..7c40d5a 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2797,7 +2797,7 @@ extern void set_insn_deleted (rtx);
 
 /* Functions in rtlanal.c */
 
-extern rtx single_set_2 (const rtx_insn *, const_rtx);
+extern rtx single_set_2 (const rtx_insn *, const_rtx, bool fail_on_clobber_use);
 
 /* Handle the cheap and common cases inline for performance.  */
 
@@ -2810,7 +2810,20 @@ inline rtx single_set (const rtx_insn *insn)
     return PATTERN (insn);
 
   /* Defer to the more expensive case.  */
-  return single_set_2 (insn, PATTERN (insn));
+  return single_set_2 (insn, PATTERN (insn), false);
+}
+
+inline rtx single_set_no_clobber_use (const rtx_insn *insn)
+{
+  if (!INSN_P (insn))
+    return NULL_RTX;
+
+  if (GET_CODE (PATTERN (insn)) == SET)
+    return PATTERN (insn);
+
+  /* Defer to the more expensive case, and return NULL_RTX if there is
+     USE or CLOBBER.  */
+  return single_set_2 (insn, PATTERN (insn), true);
 }
 
 extern enum machine_mode get_address_mode (rtx mem);
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 3063458..cb5e36a 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1182,7 +1182,7 @@ record_hard_reg_uses (rtx *px, void *data)
    will not be used, which we ignore.  */
 
 rtx
-single_set_2 (const rtx_insn *insn, const_rtx pat)
+single_set_2 (const rtx_insn *insn, const_rtx pat, bool fail_on_clobber_use)
 {
   rtx set = NULL;
   int set_verified = 1;
@@ -1197,6 +1197,8 @@ single_set_2 (const rtx_insn *insn, const_rtx pat)
 	{
 	case USE:
 	case CLOBBER:
+	  if (fail_on_clobber_use)
+	    return NULL_RTX;
 	  break;
 
 	case SET:
diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index b1ff8a2..5624ef7 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -177,7 +177,7 @@ move_insn_for_shrink_wrap (basic_block bb, rtx_insn *insn,
   edge live_edge;
 
   /* Look for a simple register copy.  */
-  set = single_set (insn);
+  set = single_set_no_clobber_use (insn);
   if (!set)
     return false;
   src = SET_SRC (set);