Message ID | CAEwic4Ycv_asbomwC2gpcjDLQmJ9LO023PBFtrmKA=9BXf58ag@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Wed, Jun 25, 2014 at 3:35 PM, Kai Tietz <ktietz70@googlemail.com> wrote: > Hello, > > so there seems to be a fallout caused by moving peephole2 pass. See PR/61608. > So we need indeed 2 peephole2 passes. > > ChangeLog > > 2014-06-25 Kai Tietz <ktietz@redhat.com> > > PR rtl-optimization/61608 > * passes.def (peephole2): Readd peephole2 pass > before if-after-reload pass. > > Tested for arm*-none-*, i686-w64-cygwin, x86_64-unknown-linux-gnu. Ok > for apply? Uh, every time I try to improve compile-time by removing passes somebody else adds other passes back. Please try to avoid this. Thanks, Richard. > Regards, > Kai > > Index: passes.def > =================================================================== > --- passes.def (Revision 211971) > +++ passes.def (Arbeitskopie) > @@ -396,6 +396,7 @@ along with GCC; see the file COPYING3. If not see > NEXT_PASS (pass_rtl_dse2); > NEXT_PASS (pass_stack_adjustments); > NEXT_PASS (pass_jump2); > + NEXT_PASS (pass_peephole2); > NEXT_PASS (pass_if_after_reload); > NEXT_PASS (pass_regrename); > NEXT_PASS (pass_cprop_hardreg); > @@ -407,8 +408,7 @@ along with GCC; see the file COPYING3. If not see > We have a single indirect branch in the entire function > before duplicate-compute-gotos pass. This vastly reduces > the size of the CFG. > - For preventing to run peephole2 pass twice, its run after > - the jump2 got removed. */ > + We need to run peephole2 pass twice. See PR/61608. */ > NEXT_PASS (pass_peephole2); > NEXT_PASS (pass_branch_target_load_optimize2); > NEXT_PASS (pass_leaf_regs);
On 06/25/14 07:35, Kai Tietz wrote: > Hello, > > so there seems to be a fallout caused by moving peephole2 pass. See PR/61608. > So we need indeed 2 peephole2 passes. > > ChangeLog > > 2014-06-25 Kai Tietz <ktietz@redhat.com> > > PR rtl-optimization/61608 > * passes.def (peephole2): Readd peephole2 pass > before if-after-reload pass. So why is the peephole not working in its current location? Jeff
2014-06-25 16:04 GMT+02:00 Jeff Law <law@redhat.com>: > So why is the peephole not working in its current location? > > Jeff Hi Jeff, that is what I read out of dumps: If peephole2 is executed early we see following pattern transformation: (insn 12 11 13 2 (set (reg:CC_NOOV 100 cc) (compare:CC_NOOV (zero_extract:SI (reg:SI 4 r4 [orig:110 D.1414 ] [110] ) (const_int 1 [0x1]) (const_int 2 [0x2])) (const_int 0 [0]))) arm_epilog-1.c:11 84 {*zeroextractsi_compare0_scratch} (expr_list:REG_DEAD (reg:SI 4 r4 [orig:110 D.1414 ] [110]) (nil))) (jump_insn 13 12 14 2 (set (pc) (if_then_else (eq (reg:CC_NOOV 100 cc) (const_int 0 [0])) (label_ref:SI 16) (pc))) arm_epilog-1.c:11 236 {arm_cond_branch} (expr_list:REG_DEAD (reg:CC_NOOV 100 cc) (int_list:REG_BR_PROB 3900 (nil))) -> 16) which gets replaced to: (insn 62 11 63 2 (parallel [ (set (reg:CC_NOOV 100 cc) (compare:CC_NOOV (ashift:SI (reg:SI 4 r4 [orig:110 D.1414 ] [110]) (const_int 29 [0x1d])) (const_int 0 [0]))) (clobber (reg:SI 4 r4)) ]) arm_epilog-1.c:11 -1 (nil)) (jump_insn 63 62 14 2 (set (pc) (if_then_else (ge (reg:CC_NOOV 100 cc) (const_int 0 [0])) (label_ref:SI 16) (pc))) arm_epilog-1.c:11 -1 (nil) -> 16) If we run peephole2 pass late we see instead the following pattern, which isn't handled by peephole2 pass anymore. (insn 12 11 15 2 (set (reg:CC_NOOV 100 cc) (compare:CC_NOOV (zero_extract:SI (reg:SI 4 r4 [orig:110 D.1414 ] [110]) (const_int 1 [0x1]) (const_int 2 [0x2])) (const_int 0 [0]))) arm_epilog-1_.c:11 84 {*zeroextractsi_compare0_s cratch} (expr_list:REG_DEAD (reg:SI 4 r4 [orig:110 D.1414 ] [110]) (nil))) (insn 15 12 22 2 (cond_exec (ne (reg:CC_NOOV 100 cc) (const_int 0 [0])) (set (reg/v:SI 2 r2 [orig:115 c ] [115]) (plus:SI (reg/v:SI 2 r2 [orig:115 c ] [115]) (const_int 1 [0x1])))) arm_epilog-1_.c:11 3229 {*p *arm_addsi3} (expr_list:REG_DEAD (reg:CC_NOOV 100 cc) (nil))) So this issue might be solvable by extending ARM's peephole2 patterns? An ARM maintainer might be able to tell. Kai
On 06/25/14 09:04, Kai Tietz wrote: > 2014-06-25 16:04 GMT+02:00 Jeff Law <law@redhat.com>: >> So why is the peephole not working in its current location? >> >> Jeff > > Hi Jeff, > > that is what I read out of dumps: > > If peephole2 is executed early we see following pattern transformation: [ ... ] Ask an ARM maintainer if the new code is actually better than the old code. It appears that with the peep2 pass moved that we actually if-convert the fall-thru path of the conditional and eliminate the conditional. Which, on the surface seems like a good thing. It may be the case that we need to twiddle the test. Not sure yet. Jeff
[Apologies about the duplicates to folks - resending to make this hit the lists] On 25/06/14 16:28, Jeff Law wrote: > On 06/25/14 09:04, Kai Tietz wrote: >> 2014-06-25 16:04 GMT+02:00 Jeff Law <law@redhat.com>: >>> So why is the peephole not working in its current location? >>> >>> Jeff >> >> Hi Jeff, >> >> that is what I read out of dumps: >> >> If peephole2 is executed early we see following pattern transformation: > [ ... ] > Ask an ARM maintainer if the new code is actually better than the old code. > > It appears that with the peep2 pass moved that we actually if-convert > the fall-thru path of the conditional and eliminate the conditional. > Which, on the surface seems like a good thing. It may be the case that > we need to twiddle the test. Not sure yet. No, we don't need to twiddle the test. The test is good as it is today as we caught a potential size regression. (Prima-facie the test itself doesn't show an actual size regression because of padding but you can see the replacement of a 2 byte instruction with a 4 byte instruction) These peepholes that were in there gave 0.1% in CSiBe for Thumb2 as per the original patch (https://gcc.gnu.org/ml/gcc-patches/2010-06/msg02518.html). The if-conversion happened anyway afterwards regardless of where the pattern was but the key here was the conversion of the 4 byte instruction to a 2 byte instruction. Looks like the all the peephole2's that have if_then_else's (5 patterns today) also need to develop a cond_exec variant where possible in the ARM backend. It seems unfortunate that the backend has to handle each of the possible pattern classes that may be conditionalized by the cond-exec pass and we potentially maintain a list of patterns in sync with anything that can be conditionalized. At first glance this looks ugly to "work around" in the backend. regards Ramana > > > Jeff > > >
On 06/25/2014 08:28 AM, Jeff Law wrote: > Ask an ARM maintainer if the new code is actually better than the old code. It isn't. > It appears that with the peep2 pass moved that we actually if-convert the > fall-thru path of the conditional and eliminate the conditional. Which, on the > surface seems like a good thing. It may be the case that we need to twiddle > the test. Not sure yet. Looking at the final code in the pr, it looks like we if-convert both ways, just with changed condition on the test. It looks like there are at least 3 peepholes in the arm backend that match compare+branch with a scratch register that are affected by this. I don't think it's reasonable to expect a peephole to match compare + n cond_exec insns, so running peep2 before if-conversion would seem to be called for. Kai, why does your indirect jump peephole require pass_reorder_blocks? It seems to me that instead of moving peephole2 down, we could move duplicate_computed_gotos up. r~
2014-06-25 17:50 GMT+02:00 Richard Henderson <rth@redhat.com>: > On 06/25/2014 08:28 AM, Jeff Law wrote: >> Ask an ARM maintainer if the new code is actually better than the old code. > > It isn't. > >> It appears that with the peep2 pass moved that we actually if-convert the >> fall-thru path of the conditional and eliminate the conditional. Which, on the >> surface seems like a good thing. It may be the case that we need to twiddle >> the test. Not sure yet. > > Looking at the final code in the pr, it looks like we if-convert both ways, > just with changed condition on the test. > > It looks like there are at least 3 peepholes in the arm backend that match > compare+branch with a scratch register that are affected by this. I don't > think it's reasonable to expect a peephole to match compare + n cond_exec > insns, so running peep2 before if-conversion would seem to be called for. > > Kai, why does your indirect jump peephole require pass_reorder_blocks? It > seems to me that instead of moving peephole2 down, we could move > duplicate_computed_gotos up. > > > r~ We need the peephole2 pass after reorder_blocks due we move in that pass the jump ... (code_label 54 52 55 5 8 "" [0 uses]) (note 55 54 56 5 [bb 5] NOTE_INSN_BASIC_BLOCK) (jump_insn 56 55 57 5 (set (pc) (reg/f:DI 0 ax [orig:83 D.3469 ] [83])) 638 {*indirect_jump} (expr_list:REG_DEAD (reg/f:DI 0 ax [orig:83 D.3469 ] [83]) (nil))) (barrier 57 56 58) .... into each bb, so it can be resolved to an indirect jump on memory. Testcase to reproduce that is: // PR 51840/ #include <stdint.h> #include <stdio.h> #include <stdlib.h> enum { S_atop = 0, S_atop_f = 1, S_atop_t = 2, S_limit = 3 }; enum { I_a_dec = 0, I_a_non_zero_p = 1, I_jmp_if_true = 2, I_exit = 3, I_limit = 4 }; typedef struct { uint64_t a0; } vm_state_t; __attribute__((noinline, noclone)) void exec_code(vm_state_t *state, uint8_t *code) { static void (* volatile const vm_dispatch[S_limit][I_limit]) = { //dispatch for [S_atop = 0][..] with four unique destination labels {&&atop__a_dec, &&atop__a_non_zero_p, &&fixme, &&atop__exit}, //dispatch for [S_atop_f = 1][..] with two unique destination labels {&&fixme, &&fixme, &&atop_f__jmp_if_true, &&fixme}, //dispatch for [S_atop_t = 2][..] with two unique destination labels {&&fixme, &&fixme, &&atop_t__jmp_if_true, &&fixme} }; printf("atop__a_dec: %p\n", &&atop__a_dec); printf("atop__a_non_zero_p: %p\n", &&atop__a_non_zero_p); printf("atop__exit: %p\n", &&atop__exit); printf("atop_f__jmp_if_true: %p\n", &&atop_f__jmp_if_true); printf("atop_t__jmp_if_true: %p\n", &&atop_t__jmp_if_true); printf("fixme: %p\n", &&fixme); volatile uint64_t atop = state->a0; volatile int64_t inst_offset=0; goto *(vm_dispatch[S_atop][code[inst_offset]]); atop__a_dec: { printf("atop = %ld\n", atop); --atop; volatile uint64_t next_inst = code[inst_offset + 1]; inst_offset += 1; goto *(vm_dispatch[S_atop][next_inst]); } atop__a_non_zero_p: { volatile uint64_t next_inst = code[inst_offset + 1]; inst_offset += 1; if (atop != 0) { goto *(vm_dispatch[S_atop_t][next_inst]); } else { goto *(vm_dispatch[S_atop_f][next_inst]); } } atop_f__jmp_if_true: { volatile uint64_t next_inst = code[inst_offset + 2]; inst_offset += 2; goto *(vm_dispatch[S_atop][next_inst]); } atop_t__jmp_if_true: { int64_t offset = (int8_t) code[inst_offset + 1]; uint64_t next_inst = code[inst_offset + offset]; inst_offset += offset; goto *(vm_dispatch[S_atop][next_inst]); } atop__exit: { state->a0 = atop; return; } fixme: { printf("Dispatch logic ERROR\n"); exit(EXIT_FAILURE); } } int main(void) { vm_state_t state; state.a0 = 10; uint8_t code[] = {I_a_dec, //decrement top of stack I_a_non_zero_p, //true if top of stack is non-zero I_jmp_if_true, -2, //if true jump back to decrement I_exit}; //otherwise exit exec_code(&state, code); return EXIT_SUCCESS; } Regards, Kai
On 06/25/14 10:02, Kai Tietz wrote: > 2014-06-25 17:50 GMT+02:00 Richard Henderson <rth@redhat.com>: >> On 06/25/2014 08:28 AM, Jeff Law wrote: >>> Ask an ARM maintainer if the new code is actually better than the old code. >> >> It isn't. >> >>> It appears that with the peep2 pass moved that we actually if-convert the >>> fall-thru path of the conditional and eliminate the conditional. Which, on the >>> surface seems like a good thing. It may be the case that we need to twiddle >>> the test. Not sure yet. >> >> Looking at the final code in the pr, it looks like we if-convert both ways, >> just with changed condition on the test. >> >> It looks like there are at least 3 peepholes in the arm backend that match >> compare+branch with a scratch register that are affected by this. I don't >> think it's reasonable to expect a peephole to match compare + n cond_exec >> insns, so running peep2 before if-conversion would seem to be called for. >> >> Kai, why does your indirect jump peephole require pass_reorder_blocks? It >> seems to me that instead of moving peephole2 down, we could move >> duplicate_computed_gotos up. >> >> >> r~ > > We need the peephole2 pass after reorder_blocks due we move in that > pass the jump > > ... > (code_label 54 52 55 5 8 "" [0 uses]) > (note 55 54 56 5 [bb 5] NOTE_INSN_BASIC_BLOCK) > (jump_insn 56 55 57 5 (set (pc) > (reg/f:DI 0 ax [orig:83 D.3469 ] [83])) 638 {*indirect_jump} > (expr_list:REG_DEAD (reg/f:DI 0 ax [orig:83 D.3469 ] [83]) > (nil))) > (barrier 57 56 58) > .... > > into each bb, so it can be resolved to an indirect jump on memory. But I think Richard's question was can we move up duplicate_computed_gotos since that's the code which I'd expect to make this transformation. Then you run peep2 immediately after duplicate_computed_gotos. Jeff
On 06/25/14 09:50, Richard Henderson wrote: > On 06/25/2014 08:28 AM, Jeff Law wrote: >> Ask an ARM maintainer if the new code is actually better than the old code. > > It isn't. > >> It appears that with the peep2 pass moved that we actually if-convert the >> fall-thru path of the conditional and eliminate the conditional. Which, on the >> surface seems like a good thing. It may be the case that we need to twiddle >> the test. Not sure yet. > > Looking at the final code in the pr, it looks like we if-convert both ways, > just with changed condition on the test. Yea, I came to the same conclusion when I looked back at the code in the BZ. I was too focused on Kai's message and the RTL contained within when I wrote my comment. > > It looks like there are at least 3 peepholes in the arm backend that match > compare+branch with a scratch register that are affected by this. I don't > think it's reasonable to expect a peephole to match compare + n cond_exec > insns, so running peep2 before if-conversion would seem to be called for. Agreed. jeff
2014-06-25 19:15 GMT+02:00 Jeff Law <law@redhat.com>: > On 06/25/14 10:02, Kai Tietz wrote: >> >> 2014-06-25 17:50 GMT+02:00 Richard Henderson <rth@redhat.com>: >>> >>> On 06/25/2014 08:28 AM, Jeff Law wrote: >>>> >>>> Ask an ARM maintainer if the new code is actually better than the old >>>> code. >>> >>> >>> It isn't. >>> >>>> It appears that with the peep2 pass moved that we actually if-convert >>>> the >>>> fall-thru path of the conditional and eliminate the conditional. Which, >>>> on the >>>> surface seems like a good thing. It may be the case that we need to >>>> twiddle >>>> the test. Not sure yet. >>> >>> >>> Looking at the final code in the pr, it looks like we if-convert both >>> ways, >>> just with changed condition on the test. >>> >>> It looks like there are at least 3 peepholes in the arm backend that >>> match >>> compare+branch with a scratch register that are affected by this. I >>> don't >>> think it's reasonable to expect a peephole to match compare + n cond_exec >>> insns, so running peep2 before if-conversion would seem to be called for. >>> >>> Kai, why does your indirect jump peephole require pass_reorder_blocks? >>> It >>> seems to me that instead of moving peephole2 down, we could move >>> duplicate_computed_gotos up. >>> >>> >>> r~ >> >> >> We need the peephole2 pass after reorder_blocks due we move in that >> pass the jump >> >> ... >> (code_label 54 52 55 5 8 "" [0 uses]) >> (note 55 54 56 5 [bb 5] NOTE_INSN_BASIC_BLOCK) >> (jump_insn 56 55 57 5 (set (pc) >> (reg/f:DI 0 ax [orig:83 D.3469 ] [83])) 638 {*indirect_jump} >> (expr_list:REG_DEAD (reg/f:DI 0 ax [orig:83 D.3469 ] [83]) >> (nil))) >> (barrier 57 56 58) >> .... >> >> into each bb, so it can be resolved to an indirect jump on memory. > > But I think Richard's question was can we move up duplicate_computed_gotos > since that's the code which I'd expect to make this transformation. Then > you run peep2 immediately after duplicate_computed_gotos. > > > Jeff > The pass of question on ARM isn't the duplicate_compute_gotos pass. It is the if-after-reload pass. I tested it by moving it. Kai
Index: passes.def =================================================================== --- passes.def (Revision 211971) +++ passes.def (Arbeitskopie) @@ -396,6 +396,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_rtl_dse2); NEXT_PASS (pass_stack_adjustments); NEXT_PASS (pass_jump2); + NEXT_PASS (pass_peephole2); NEXT_PASS (pass_if_after_reload); NEXT_PASS (pass_regrename); NEXT_PASS (pass_cprop_hardreg); @@ -407,8 +408,7 @@ along with GCC; see the file COPYING3. If not see We have a single indirect branch in the entire function before duplicate-compute-gotos pass. This vastly reduces the size of the CFG. - For preventing to run peephole2 pass twice, its run after - the jump2 got removed. */ + We need to run peephole2 pass twice. See PR/61608. */ NEXT_PASS (pass_peephole2); NEXT_PASS (pass_branch_target_load_optimize2); NEXT_PASS (pass_leaf_regs);