Message ID | CAFULd4ZLes9A0mFFYvnv-CDC+n9w7hFw67PsP7Q9UTwz2eGUVA@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Wed, May 10, 2017 at 9:05 PM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Wed, May 10, 2017 at 5:18 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >> On Wed, May 10, 2017 at 4:27 PM, Jakub Jelinek <jakub@redhat.com> wrote: >>> On Tue, May 09, 2017 at 06:06:47PM +0200, Uros Bizjak wrote: >>>> Attached patch enables post-reload compare elimination pass by >>>> providing expected patterns (duplicates of existing patterns with >>>> setters of reg and flags switched in the parallel) for flag setting >>>> arithmetic instructions. >>>> >>>> The merge triggers more than 3000 times during the gcc bootstrap, >>>> mostly in cases where intervening memory load or store prevents >>>> combine from merging the arithmetic insn and the following compare. >>>> >>>> Also, some recent linux x86_64 defconfig build results in ~200 merges, >>>> removing ~200 test/cmp insns. Not much, but I think the results still >>>> warrant the pass to be enabled. >>> >>> Isn't the right fix instead to change the compare-elim.c pass to either >>> accept both reg vs. flags orderings in parallel, or both depending >>> on some target hook, or change it to the order i386.md and most other >>> major targets use and just fix up mn10300/rx (and aarch64?) to use the same >>> order? > > Attached patch changes compare-elim.c order to what i386.md expects. BTW: This patch now catches 417 cases (instead of 200+) in linux build, including e.g.: (parallel [ (set (reg:CCZ 17 flags) (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93]) (const_int 1 [0x1])) (const_int 0 [0]))) (set (reg:DI 4 si) (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93]) (const_int 1 [0x1])))) ]) Uros.
On Wed, May 10, 2017 at 09:57:56PM +0200, Uros Bizjak wrote: > BTW: This patch now catches 417 cases (instead of 200+) in linux > build, including e.g.: > > (parallel [ > (set (reg:CCZ 17 flags) > (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93]) > (const_int 1 [0x1])) > (const_int 0 [0]))) > (set (reg:DI 4 si) > (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93]) > (const_int 1 [0x1])))) > ]) That looks nice. So, I think we need analysis on what order which targets use. I have looked at mn10300.md, I see {add,sub}si3_flags patterns that would need PARALLEL reordering for this compare-elim.c change and then cmp_liw vs. liw_cmp patterns I have no clue what they do and whether compare-elim.c would care about those or not (they have UNSPECs). Jeff/Alex? In rx.md I see {add,sub}si3_flags too, then ssaddsi3 and 2 peepholes that would need changing. In visium.md I see flags_subst_{logic,arith} define_substs, *{add,sub}<mode>3_insn_set_{carry,overflow} {add,sub}si3_insn_set_{carry,overflow}, negsi2_insn_set_carry and *neg<mode>2_insn_set_overflow that would need changing. aarch64 is the only remaining compare-elim.c enabled target (one that defines TARGET_FLAGS_REGNUM), and that one seems to use the same parallel order as i386.md, so compare-elim.c most likely just doesn't work there at all. So all in all, sounds like we need to change at least 17 patterns on 3 not very widely used targets. Jakub
On 05/10/2017 02:27 PM, Jakub Jelinek wrote: > On Wed, May 10, 2017 at 09:57:56PM +0200, Uros Bizjak wrote: >> BTW: This patch now catches 417 cases (instead of 200+) in linux >> build, including e.g.: >> >> (parallel [ >> (set (reg:CCZ 17 flags) >> (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93]) >> (const_int 1 [0x1])) >> (const_int 0 [0]))) >> (set (reg:DI 4 si) >> (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93]) >> (const_int 1 [0x1])))) >> ]) > > That looks nice. So, I think we need analysis on what order which targets > use. I have looked at mn10300.md, I see {add,sub}si3_flags patterns that > would need PARALLEL reordering for this compare-elim.c change and then > cmp_liw vs. liw_cmp patterns I have no clue what they do and whether > compare-elim.c would care about those or not (they have UNSPECs). Jeff/Alex? No idea what the cmp_liw and liw_cmp patterns are -- they were added after my involvement with the mn103 ended. I know it's been discussed before, but all that's needed here is to put the assignment to CC_REG first in patterns were CC_REG is assigned a useful value, right? For patterns were CC_REG is clobbered, we leave them as-is, right? Does this impact logicals or just arithmetic? If it hits logicals, there's a few more patterns in the mn103 port and likely the rx port as well. But the changes look highly mechanical. Furthermore, it's trivial for me to build this port through newlib. So I could build before the change, stash away the resulting libc.a, then build after the change and compare the contents of the resulting objects. In theory they should be identical unless I'm missing something. I would say don't let the mn103 stop this work. If the work goes forward, I'll own cleaning up the mn103. I don't mind assigning the cleanup of rx to Nick to be done after the generic changes are done. That just leaves visium which Eric B. owns. Note that I can test visium in the same way I can test mn103 and rx, so if someone does the visium work I can test it. Jeff
On 05/10/2017 01:05 PM, Uros Bizjak wrote: > On Wed, May 10, 2017 at 5:18 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >> On Wed, May 10, 2017 at 4:27 PM, Jakub Jelinek <jakub@redhat.com> wrote: >>> On Tue, May 09, 2017 at 06:06:47PM +0200, Uros Bizjak wrote: >>>> Attached patch enables post-reload compare elimination pass by >>>> providing expected patterns (duplicates of existing patterns with >>>> setters of reg and flags switched in the parallel) for flag setting >>>> arithmetic instructions. >>>> >>>> The merge triggers more than 3000 times during the gcc bootstrap, >>>> mostly in cases where intervening memory load or store prevents >>>> combine from merging the arithmetic insn and the following compare. >>>> >>>> Also, some recent linux x86_64 defconfig build results in ~200 merges, >>>> removing ~200 test/cmp insns. Not much, but I think the results still >>>> warrant the pass to be enabled. >>> >>> Isn't the right fix instead to change the compare-elim.c pass to either >>> accept both reg vs. flags orderings in parallel, or both depending >>> on some target hook, or change it to the order i386.md and most other >>> major targets use and just fix up mn10300/rx (and aarch64?) to use the same >>> order? > > Attached patch changes compare-elim.c order to what i386.md expects. > > Thoughts? Haven't looked at the patch itself. But I do have the necessary bits to convert the mn103 port. It was slightly more than just fixing the md file, but nothing significant or time consuming. The net result is 100% identical code for newlib before your patch vs after your patch w/mn103 converted. Hell, it was easy enough, I'll take a cut at the rx port. jeff
On Wed, 10 May 2017, Jakub Jelinek wrote: > On Wed, May 10, 2017 at 09:57:56PM +0200, Uros Bizjak wrote: > > BTW: This patch now catches 417 cases (instead of 200+) in linux > > build, including e.g.: > > > > (parallel [ > > (set (reg:CCZ 17 flags) > > (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93]) > > (const_int 1 [0x1])) > > (const_int 0 [0]))) > > (set (reg:DI 4 si) > > (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93]) > > (const_int 1 [0x1])))) > > ]) > > That looks nice. So, I think we need analysis on what order which targets > use. I have looked at mn10300.md, I see {add,sub}si3_flags patterns that > would need PARALLEL reordering for this compare-elim.c change and then > cmp_liw vs. liw_cmp patterns I have no clue what they do and whether > compare-elim.c would care about those or not (they have UNSPECs). Jeff/Alex? > > In rx.md I see {add,sub}si3_flags too, then ssaddsi3 and 2 peepholes that > would need changing. The canonical order of insns affecting condition-codes has been discussed in the past too. I was then arguing that the compare should go last, i.e. (parallel [(set (reg) (op...)) (set (ccreg) (compare (op...) (const_int 0)))]) for simplicity of processing together with an alternative that clobbers (i.e. same location in the parallel), as the canonical order of clobbers in a parallel is that they always go last. (IIRC from that distant past, transforming a cc0 target to "ccreg" required at least three variants: the "naked" insn (pre-reload?), the parallel with a clobber of ccreg, and the actual use of ccreg; as above.) Anyway, people seem to drift towards the ccreg-last variant for some reason or another every time this is brought up; this time it seems the single reason is that some existing pass expects or generates this, using the order that causes the minimal fallout. (Maybe it's the same pass...) Not sure that's a good base for decisions. Whatever you do in the end, *please document the canonical order* in the RTL documentation. brgds, H-P
On 05/11/2017 03:29 PM, Hans-Peter Nilsson wrote: > On Wed, 10 May 2017, Jakub Jelinek wrote: > >> On Wed, May 10, 2017 at 09:57:56PM +0200, Uros Bizjak wrote: >>> BTW: This patch now catches 417 cases (instead of 200+) in linux >>> build, including e.g.: >>> >>> (parallel [ >>> (set (reg:CCZ 17 flags) >>> (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93]) >>> (const_int 1 [0x1])) >>> (const_int 0 [0]))) >>> (set (reg:DI 4 si) >>> (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93]) >>> (const_int 1 [0x1])))) >>> ]) >> >> That looks nice. So, I think we need analysis on what order which targets >> use. I have looked at mn10300.md, I see {add,sub}si3_flags patterns that >> would need PARALLEL reordering for this compare-elim.c change and then >> cmp_liw vs. liw_cmp patterns I have no clue what they do and whether >> compare-elim.c would care about those or not (they have UNSPECs). Jeff/Alex? >> >> In rx.md I see {add,sub}si3_flags too, then ssaddsi3 and 2 peepholes that >> would need changing. > > The canonical order of insns affecting condition-codes has been > discussed in the past too. > > I was then arguing that the compare should go last, i.e. > (parallel [(set (reg) (op...)) > (set (ccreg) (compare (op...) (const_int 0)))]) > for simplicity of processing together with an alternative that > clobbers (i.e. same location in the parallel), as the canonical > order of clobbers in a parallel is that they always go last. > (IIRC from that distant past, transforming a cc0 target to > "ccreg" required at least three variants: the "naked" insn > (pre-reload?), the parallel with a clobber of ccreg, and the > actual use of ccreg; as above.) > > Anyway, people seem to drift towards the ccreg-last variant for > some reason or another every time this is brought up; this time > it seems the single reason is that some existing pass expects or > generates this, using the order that causes the minimal fallout. > (Maybe it's the same pass...) Not sure that's a good base for > decisions. > > Whatever you do in the end, *please document the canonical > order* in the RTL documentation. I think what's driving the decision is more a desire not to muck with x86 and aarch64 and instead place the burden to change on the less popular ports. Though I must admit i prefer the cc setting last since that mirrors how we typically do things with clobbers of the cc register. But yes, we definitely should document the final canonical ordering. jeff
(To-list pruned, my correction doesn't need attention.) On Thu, 11 May 2017, Hans-Peter Nilsson wrote: > On Wed, 10 May 2017, Jakub Jelinek wrote: > > > On Wed, May 10, 2017 at 09:57:56PM +0200, Uros Bizjak wrote: > > > BTW: This patch now catches 417 cases (instead of 200+) in linux > > > build, including e.g.: > > > > > > (parallel [ > > > (set (reg:CCZ 17 flags) > > > (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93]) > > > (const_int 1 [0x1])) > > > (const_int 0 [0]))) > > > (set (reg:DI 4 si) > > > (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93]) > > > (const_int 1 [0x1])))) > > > ]) > Anyway, people seem to drift towards the ccreg-last variant JFTR, I miswrote that; I meant "towards the variant with ccreg-first" as in Uros' example kept above and as opposed to my example. brgds, H-P
On Fri, May 12, 2017 at 05:42:59AM -0400, Hans-Peter Nilsson wrote: > (To-list pruned, my correction doesn't need attention.) > > On Thu, 11 May 2017, Hans-Peter Nilsson wrote: > > On Wed, 10 May 2017, Jakub Jelinek wrote: > > > > > On Wed, May 10, 2017 at 09:57:56PM +0200, Uros Bizjak wrote: > > > > BTW: This patch now catches 417 cases (instead of 200+) in linux > > > > build, including e.g.: > > > > > > > > (parallel [ > > > > (set (reg:CCZ 17 flags) > > > > (compare:CCZ (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93]) > > > > (const_int 1 [0x1])) > > > > (const_int 0 [0]))) > > > > (set (reg:DI 4 si) > > > > (zero_extend:DI (lshiftrt:SI (reg:SI 4 si [orig:93 _10 ] [93]) > > > > (const_int 1 [0x1])))) > > > > ]) > > > Anyway, people seem to drift towards the ccreg-last variant > > JFTR, I miswrote that; I meant "towards the variant with > ccreg-first" as in Uros' example kept above and as opposed to my > example. If I skim the current primary/secondary targets, then i386/x86_64, rs6000, aarch64, arm, sparc, s390* all use the ccreg-first order, in mips I couldn't find either of the orders. Looking at other targets, in alpha, avr, bfin, c6x, cr16, cris, fr30, ft32, h8300, ia64, iq2000, lm32, m32c, m32r, m68k, mcore, microblaze, mmix, moxie, nds32, nios2, nvptx, pa, pdp11, riscv, rl78, sh, spu, stormy16, tilegx, tilepro, v850, vax, xtensa I can't find any order, arc, epiphany are ccreg-first, frv is some weird mixture of ccreg-first (e.g. *combo_intop_compare2) and ccreg-last (e.g. adddi3_lower), mn10300, rx and visium are ccreg-last that we want to switch over now to ccreg-first. It was all from quickly skimming (mostly) the main config/<cpu>/<cpu>.md looking for \(compare and/or \(reg.*CC and looking if there are patterns that have the same arithmetics inside a compare or something similar vs. the operation repeated on a set, so it is possible I've missed something. But if the above is roughly true, then it is of course preferable to change 3 less used targets than 6 primary/secondary ones + 2 further ones. Jakub
On 05/10/2017 01:05 PM, Uros Bizjak wrote: > On Wed, May 10, 2017 at 5:18 PM, Uros Bizjak <ubizjak@gmail.com> wrote: >> On Wed, May 10, 2017 at 4:27 PM, Jakub Jelinek <jakub@redhat.com> wrote: >>> On Tue, May 09, 2017 at 06:06:47PM +0200, Uros Bizjak wrote: >>>> Attached patch enables post-reload compare elimination pass by >>>> providing expected patterns (duplicates of existing patterns with >>>> setters of reg and flags switched in the parallel) for flag setting >>>> arithmetic instructions. >>>> >>>> The merge triggers more than 3000 times during the gcc bootstrap, >>>> mostly in cases where intervening memory load or store prevents >>>> combine from merging the arithmetic insn and the following compare. >>>> >>>> Also, some recent linux x86_64 defconfig build results in ~200 merges, >>>> removing ~200 test/cmp insns. Not much, but I think the results still >>>> warrant the pass to be enabled. >>> >>> Isn't the right fix instead to change the compare-elim.c pass to either >>> accept both reg vs. flags orderings in parallel, or both depending >>> on some target hook, or change it to the order i386.md and most other >>> major targets use and just fix up mn10300/rx (and aarch64?) to use the same >>> order? > > Attached patch changes compare-elim.c order to what i386.md expects. > > Thoughts? I think with an appropriate change to the canonicalization rules in the manual this is fine. I've got the visium, rx and mn103 patches handy to ensure they don't regress. aarch64 doesn't seem to be affected either way but I didn't investigate why -- I expected it to improve with your change. I'll write up a ChangeLog and commit the mn103, rx and visium changes after you commit the compare-elim & documentation bits. jeff
On Fri, 12 May 2017, Jeff Law wrote: > On 05/11/2017 03:29 PM, Hans-Peter Nilsson wrote: > > The canonical order of insns affecting condition-codes has been > > discussed in the past too. > > > > I was then arguing that the compare should go last, i.e. > > (parallel [(set (reg) (op...)) > > (set (ccreg) (compare (op...) (const_int 0)))]) > > for simplicity of processing together with an alternative that > > clobbers (i.e. same location in the parallel), as the canonical > > order of clobbers in a parallel is that they always go last. > > (IIRC from that distant past, transforming a cc0 target to > > "ccreg" required at least three variants: the "naked" insn > > (pre-reload?), the parallel with a clobber of ccreg, and the > > actual use of ccreg; as above.) > > > > Anyway, people seem to drift towards the ccreg-last variant for (miswrite: ccreg-*first*, as opposed to the above) > > some reason or another every time this is brought up; this time > > it seems the single reason is that some existing pass expects or > > generates this, using the order that causes the minimal fallout. > > (Maybe it's the same pass...) Not sure that's a good base for > > decisions. > > > > Whatever you do in the end, *please document the canonical > > order* in the RTL documentation. > I think what's driving the decision is more a desire not to muck with x86 and > aarch64 and instead place the burden to change on the less popular ports. > > Though I must admit i prefer the cc setting last since that mirrors how we > typically do things with clobbers of the cc register. IOW. > But yes, we definitely should document the final canonical ordering. Is that about to also happen? I foresee in another half-a-dozen years, and *this* iteration is forgotten, someone bothered enough to argue eloquently coming around, doing rtl-level-maintenance, maybe a new pass (ok maybe not a *new RTL-pass* :) sees that order as wrong for the reason listed above, and does the legwork to switch the order around. It will be ok to change it again then, because the order just happened this time because of minimal-edit-reasons, right? Noone can argue that it was a thoughtful deliberate change that we bothered to document, to stay consistent? ;) brgds, H-P
On Wed, May 17, 2017 at 4:45 AM, Hans-Peter Nilsson <hp@bitrange.com> wrote: >> But yes, we definitely should document the final canonical ordering. > > Is that about to also happen? > > I foresee in another half-a-dozen years, and *this* iteration is > forgotten, someone bothered enough to argue eloquently coming > around, doing rtl-level-maintenance, maybe a new pass (ok maybe > not a *new RTL-pass* :) sees that order as wrong for the reason > listed above, and does the legwork to switch the order around. > It will be ok to change it again then, because the order just > happened this time because of minimal-edit-reasons, right? > Noone can argue that it was a thoughtful deliberate change that > we bothered to document, to stay consistent? ;) The proposed doc patch is wiating for review at [1]. [1] https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01073.html Uros.
On Wed, 17 May 2017, Uros Bizjak wrote: > On Wed, May 17, 2017 at 4:45 AM, Hans-Peter Nilsson <hp@bitrange.com> wrote: > >> But yes, we definitely should document the final canonical ordering. > > > > Is that about to also happen? > The proposed doc patch is wiating for review at [1]. > > [1] https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01073.html Thanks. LGTM. brgds, H-P
On 05/17/2017 12:33 AM, Uros Bizjak wrote: > On Wed, May 17, 2017 at 4:45 AM, Hans-Peter Nilsson <hp@bitrange.com> wrote: > >>> But yes, we definitely should document the final canonical ordering. >> >> Is that about to also happen? >> >> I foresee in another half-a-dozen years, and *this* iteration is >> forgotten, someone bothered enough to argue eloquently coming >> around, doing rtl-level-maintenance, maybe a new pass (ok maybe >> not a *new RTL-pass* :) sees that order as wrong for the reason >> listed above, and does the legwork to switch the order around. >> It will be ok to change it again then, because the order just >> happened this time because of minimal-edit-reasons, right? >> Noone can argue that it was a thoughtful deliberate change that >> we bothered to document, to stay consistent? ;) > > The proposed doc patch is wiating for review at [1]. > > [1] https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01073.html OK for the trunk. Sorry for the delay, jeff
Index: compare-elim.c =================================================================== --- compare-elim.c (revision 247850) +++ compare-elim.c (working copy) @@ -45,9 +45,9 @@ (3) If an insn of form (2) can usefully set the flags, there is another pattern of the form - [(set (reg) (operation) - (set (reg:CCM) (compare:CCM (operation) (immediate)))] - + [(set (reg:CCM) (compare:CCM (operation) (immediate))) + (set (reg) (operation)] + The mode CCM will be chosen as if by SELECT_CC_MODE. Note that unlike NOTICE_UPDATE_CC, we do not handle memory operands. @@ -582,7 +582,7 @@ static bool try_eliminate_compare (struct comparison *cmp) { - rtx x, flags, in_a, in_b, cmp_src; + rtx flags, in_a, in_b, cmp_src; /* We must have found an interesting "clobber" preceding the compare. */ if (cmp->prev_clobber == NULL) @@ -628,7 +628,8 @@ Validate that PREV_CLOBBER itself does in fact refer to IN_A. Do recall that we've already validated the shape of PREV_CLOBBER. */ rtx_insn *insn = cmp->prev_clobber; - x = XVECEXP (PATTERN (insn), 0, 0); + + rtx x = XVECEXP (PATTERN (insn), 0, 0); if (rtx_equal_p (SET_DEST (x), in_a)) cmp_src = SET_SRC (x); @@ -666,16 +667,30 @@ flags = gen_rtx_REG (cmp->orig_mode, targetm.flags_regnum); /* Generate a new comparison for installation in the setter. */ - x = copy_rtx (cmp_src); - x = gen_rtx_COMPARE (GET_MODE (flags), x, in_b); - x = gen_rtx_SET (flags, x); + rtx y = copy_rtx (cmp_src); + y = gen_rtx_COMPARE (GET_MODE (flags), y, in_b); + y = gen_rtx_SET (flags, y); + /* Canonicalize instruction to: + [(set (reg:CCM) (compare:CCM (operation) (immediate))) + (set (reg) (operation)] + */ + + rtvec v = rtvec_alloc (2); + RTVEC_ELT (v, 0) = y; + RTVEC_ELT (v, 1) = x; + + rtx pat = gen_rtx_PARALLEL (VOIDmode, v); + /* Succeed if the new instruction is valid. Note that we may have started a change group within maybe_select_cc_mode, therefore we must continue. */ - validate_change (insn, &XVECEXP (PATTERN (insn), 0, 1), x, true); + validate_change (insn, &PATTERN (insn), pat, true); if (!apply_change_group ()) return false; + printf ("TRIGGERED\n"); + debug_rtx (pat); + /* Success. Delete the compare insn... */ delete_insn (cmp->insn); Index: config/i386/i386.c =================================================================== --- config/i386/i386.c (revision 247850) +++ config/i386/i386.c (working copy) @@ -52043,6 +52043,8 @@ #undef TARGET_ADDRESS_COST #define TARGET_ADDRESS_COST ix86_address_cost +#undef TARGET_FLAGS_REGNUM +#define TARGET_FLAGS_REGNUM FLAGS_REG #undef TARGET_FIXED_CONDITION_CODE_REGS #define TARGET_FIXED_CONDITION_CODE_REGS ix86_fixed_condition_code_regs #undef TARGET_CC_MODES_COMPATIBLE