Message ID: 001001d765ea$286661f0$793325d0$@nextmovesoftware.com
State: New
Series: [x86_64] PR target/11877: Use xor to write zero to memory with -Os
On Sun, Jun 20, 2021 at 5:37 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> The following patch attempts to resolve PR target/11877 (without
> triggering PR target/23102).  On x86_64, writing an SImode or DImode
> zero to memory uses an instruction encoding that is larger than first
> clearing a register (using xor) and then writing that register to
> memory.  Hence, after reload, the peephole2 pass can determine if
> there is a suitable free register, and if so, use it to shrink the
> code size with -Os.
>
> To improve code size, and to avoid inserting a large number of xor
> instructions (PR target/23102), this patch makes use of peephole2's
> efficient pattern matching to use a single temporary for a run of
> consecutive writes.  In theory, one could do better still with a
> new target-specific pass, gated on -Os, to shrink these instructions
> (like stv), but that's probably overkill for the little remaining
> space savings.

Agreed.  The peephole2 pass runs before sched2 (and x86 targets do not
use the sched1 pass), so there is a good chance these instructions stay
together until the new peephole2 converts them.

> Evaluating this patch on the CSiBE benchmark (v2.1.1) results in a
> 0.26% code size improvement (3715273 bytes down to 3705477) on x86_64
> with -Os [saving 1 byte every 400].  549 of 894 tests improve, and two
> tests grow larger.  Analysis of these two pathological cases reveals
> that although peephole2's match_scratch prefers to use a call-clobbered
> register (to avoid requiring a new stack frame), very rarely this
> interacts with GCC's shrink-wrapping optimization, which may previously
> have avoided saving/restoring a call-clobbered register, such as %eax,
> in the calling function.
>
> This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap
> and make -k check with no new failures.
>
> Ok for mainline?
>
>
> 2021-06-20  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> 	PR target/11877
> 	* config/i386/i386.md: New define_peephole2s to shrink writing
> 	1, 2 or 4 consecutive zeros to memory when optimizing for size.
>
> gcc/testsuite/ChangeLog
> 	PR target/11877
> 	* gcc.target/i386/pr11877.c: New test case.

OK.

Thanks,
Uros.
On Mon, Jun 21, 2021 at 09:18:28AM +0200, Uros Bizjak via Gcc-patches wrote:
> > 2021-06-20  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> > 	PR target/11877
> > 	* config/i386/i386.md: New define_peephole2s to shrink writing
> > 	1, 2 or 4 consecutive zeros to memory when optimizing for size.
> >
> > gcc/testsuite/ChangeLog
> > 	PR target/11877
> > 	* gcc.target/i386/pr11877.c: New test case.
>
> OK.

It unfortunately doesn't extend well to larger memory clearing.
Consider e.g.

void
foo (int *p)
{
  p[0] = 0;
  p[7] = 0;
  p[23] = 0;
  p[41] = 0;
  p[48] = 0;
  p[59] = 0;
  p[69] = 0;
  p[78] = 0;
  p[83] = 0;
  p[89] = 0;
  p[98] = 0;
  p[121] = 0;
  p[132] = 0;
  p[143] = 0;
  p[154] = 0;
}

where with the patch we emit:

	xorl	%eax, %eax
	xorl	%edx, %edx
	xorl	%ecx, %ecx
	xorl	%esi, %esi
	xorl	%r8d, %r8d
	movl	%eax, (%rdi)
	movl	%eax, 28(%rdi)
	movl	%eax, 92(%rdi)
	movl	%eax, 164(%rdi)
	movl	%edx, 192(%rdi)
	movl	%edx, 236(%rdi)
	movl	%edx, 276(%rdi)
	movl	%edx, 312(%rdi)
	movl	%ecx, 332(%rdi)
	movl	%ecx, 356(%rdi)
	movl	%ecx, 392(%rdi)
	movl	%ecx, 484(%rdi)
	movl	%esi, 528(%rdi)
	movl	%esi, 572(%rdi)
	movl	%r8d, 616(%rdi)

Here is an incremental (so far untested) patch that emits:

	xorl	%eax, %eax
	movl	%eax, (%rdi)
	movl	%eax, 28(%rdi)
	movl	%eax, 92(%rdi)
	movl	%eax, 164(%rdi)
	movl	%eax, 192(%rdi)
	movl	%eax, 236(%rdi)
	movl	%eax, 276(%rdi)
	movl	%eax, 312(%rdi)
	movl	%eax, 332(%rdi)
	movl	%eax, 356(%rdi)
	movl	%eax, 392(%rdi)
	movl	%eax, 484(%rdi)
	movl	%eax, 528(%rdi)
	movl	%eax, 572(%rdi)
	movl	%eax, 616(%rdi)

instead:

2021-06-21  Jakub Jelinek  <jakub@redhat.com>

	PR target/11877
	* config/i386/i386-protos.h (ix86_zero_stores_peep2_p): Declare.
	* config/i386/i386.c (ix86_zero_stores_peep2_p): New function.
	* config/i386/i386.md (peephole2s for 1/2/4 stores of const0_rtx):
	Remove "" from match_operand.  Add peephole2s for 1/2/4 stores
	of const0_rtx following previous successful peep2s.
--- gcc/config/i386/i386-protos.h.jj	2021-06-07 09:24:57.696690116 +0200
+++ gcc/config/i386/i386-protos.h	2021-06-21 10:21:05.428887980 +0200
@@ -111,6 +111,7 @@ extern bool ix86_use_lea_for_mov (rtx_in
 extern bool ix86_avoid_lea_for_addr (rtx_insn *, rtx[]);
 extern void ix86_split_lea_for_addr (rtx_insn *, rtx[], machine_mode);
 extern bool ix86_lea_for_add_ok (rtx_insn *, rtx[]);
+extern bool ix86_zero_stores_peep2_p (rtx_insn *, rtx);
 extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high);
 extern bool ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn);
 extern bool ix86_agi_dependent (rtx_insn *set_insn, rtx_insn *use_insn);
--- gcc/config/i386/i386.c.jj	2021-06-21 09:39:21.622487840 +0200
+++ gcc/config/i386/i386.c	2021-06-21 10:21:12.389794740 +0200
@@ -15186,6 +15186,33 @@ ix86_lea_for_add_ok (rtx_insn *insn, rtx
   return ix86_lea_outperforms (insn, regno0, regno1, regno2, 0, false);
 }
 
+/* Return true if insns before FIRST_INSN (which is of the form
+   (set (memory) (zero_operand)) are all also either in the
+   same form, or (set (zero_operand) (const_int 0)).  */
+
+bool
+ix86_zero_stores_peep2_p (rtx_insn *first_insn, rtx zero_operand)
+{
+  rtx_insn *insn = first_insn;
+  for (int count = 0; count < 512; count++)
+    {
+      insn = prev_nonnote_nondebug_insn_bb (insn);
+      if (!insn)
+	return false;
+      rtx set = single_set (insn);
+      if (!set)
+	return false;
+      if (SET_SRC (set) == const0_rtx
+	  && rtx_equal_p (SET_DEST (set), zero_operand))
+	return true;
+      if (set != PATTERN (insn)
+	  || !rtx_equal_p (SET_SRC (set), zero_operand)
+	  || !memory_operand (SET_DEST (set), VOIDmode))
+	return false;
+    }
+  return false;
+}
+
 /* Return true if destination reg of SET_BODY is shift count of
    USE_BODY.  */
 
--- gcc/config/i386/i386.md.jj	2021-06-21 09:42:04.086303699 +0200
+++ gcc/config/i386/i386.md	2021-06-21 10:21:31.932532964 +0200
@@ -19360,10 +19360,10 @@ (define_peephole2
 ;; When optimizing for size, zeroing memory should use a register.
 (define_peephole2
   [(match_scratch:SWI48 0 "r")
-   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))
-   (set (match_operand:SWI48 2 "memory_operand" "") (const_int 0))
-   (set (match_operand:SWI48 3 "memory_operand" "") (const_int 0))
-   (set (match_operand:SWI48 4 "memory_operand" "") (const_int 0))]
+   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 3 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 4 "memory_operand") (const_int 0))]
   "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
   [(set (match_dup 1) (match_dup 0))
    (set (match_dup 2) (match_dup 0))
@@ -19375,8 +19375,8 @@ (define_peephole2
 
 (define_peephole2
   [(match_scratch:SWI48 0 "r")
-   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))
-   (set (match_operand:SWI48 2 "memory_operand" "") (const_int 0))]
+   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand") (const_int 0))]
   "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
   [(set (match_dup 1) (match_dup 0))
    (set (match_dup 2) (match_dup 0))]
@@ -19386,13 +19386,48 @@ (define_peephole2
 
 (define_peephole2
   [(match_scratch:SWI48 0 "r")
-   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))]
+   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))]
   "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
   [(set (match_dup 1) (match_dup 0))]
 {
   ix86_expand_clear (operands[0]);
 })
 
+(define_peephole2
+  [(set (match_operand:SWI48 5 "memory_operand")
+	(match_operand:SWI48 0 "general_reg_operand"))
+   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 3 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 4 "memory_operand") (const_int 0))]
+  "optimize_insn_for_size_p ()
+   && ix86_zero_stores_peep2_p (peep2_next_insn (0), operands[0])"
+  [(set (match_dup 5) (match_dup 0))
+   (set (match_dup 1) (match_dup 0))
+   (set (match_dup 2) (match_dup 0))
+   (set (match_dup 3) (match_dup 0))
+   (set (match_dup 4) (match_dup 0))])
+
+(define_peephole2
+  [(set (match_operand:SWI48 3 "memory_operand")
+	(match_operand:SWI48 0 "general_reg_operand"))
+   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand") (const_int 0))]
+  "optimize_insn_for_size_p ()
+   && ix86_zero_stores_peep2_p (peep2_next_insn (0), operands[0])"
+  [(set (match_dup 3) (match_dup 0))
+   (set (match_dup 1) (match_dup 0))
+   (set (match_dup 2) (match_dup 0))])
+
+(define_peephole2
+  [(set (match_operand:SWI48 2 "memory_operand")
+	(match_operand:SWI48 0 "general_reg_operand"))
+   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))]
+  "optimize_insn_for_size_p ()
+   && ix86_zero_stores_peep2_p (peep2_next_insn (0), operands[0])"
+  [(set (match_dup 2) (match_dup 0))
+   (set (match_dup 1) (match_dup 0))])
+
 ;; Reload dislikes loading constants directly into class_likely_spilled
 ;; hard registers.  Try to tidy things up here.
 (define_peephole2

	Jakub
On Mon, Jun 21, 2021 at 10:37 AM Jakub Jelinek via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
> It unfortunately doesn't extend well to larger memory clearing.
> [...]
> +bool
> +ix86_zero_stores_peep2_p (rtx_insn *first_insn, rtx zero_operand)
> +{
> +  rtx_insn *insn = first_insn;
> +  for (int count = 0; count < 512; count++)

Can't the peephole add a note (reg_equal?) that the
SET_SRC of the previously matched store is zero?
That would avoid the need to walk here.
On Mon, Jun 21, 2021 at 11:19:12AM +0200, Richard Biener wrote:
> > +bool
> > +ix86_zero_stores_peep2_p (rtx_insn *first_insn, rtx zero_operand)
> > +{
> > +  rtx_insn *insn = first_insn;
> > +  for (int count = 0; count < 512; count++)
>
> Can't the peephole add a note (reg_equal?) that the
> SET_SRC of the previously matched store is zero?

I think REG_EQUAL is not valid, the documentation says that it should
be used on SET of a REG, which is not the case here - we have a MEM.

> That would avoid the need to walk here.

But we could do what I've done in
r11-7694-gd55ce33a34a8e33d17285228b32cf1e564241a70
- have int ix86_last_zero_store_uid;
set to INSN_UID of the last store emitted by the peephole2s and
then check that INSN_UID against the var.

	Jakub
On Mon, Jun 21, 2021 at 11:59 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Jun 21, 2021 at 11:19:12AM +0200, Richard Biener wrote:
> > Can't the peephole add a note (reg_equal?) that the
> > SET_SRC of the previously matched store is zero?
>
> I think REG_EQUAL is not valid, the documentation says that it should
> be used on SET of a REG, which is not the case here - we have a MEM.
>
> > That would avoid the need to walk here.
>
> But we could do what I've done in
> r11-7694-gd55ce33a34a8e33d17285228b32cf1e564241a70
> - have int ix86_last_zero_store_uid;
> set to INSN_UID of the last store emitted by the peephole2s and
> then check that INSN_UID against the var.

Hmm, or have reg_nonzero_bits_for_peephole2 () and maintain
that somehow ... (conservatively drop it when a SET is seen).

> 	Jakub
On Mon, Jun 21, 2021 at 12:14:09PM +0200, Richard Biener wrote:
> > But we could do what I've done in
> > r11-7694-gd55ce33a34a8e33d17285228b32cf1e564241a70
> > - have int ix86_last_zero_store_uid;
> > set to INSN_UID of the last store emitted by the peephole2s and
> > then check that INSN_UID against the var.
>
> Hmm, or have reg_nonzero_bits_for_peephole2 () and maintain
> that somehow ... (conservatively drop it when a SET is seen).

Maintaining something in peephole2 wouldn't be that easy because
of peephole2's rolling window, plus it would need to be done
in the generic code even when nothing but a single target in a
specific case needs that.

The following seems to work.

2021-06-21  Jakub Jelinek  <jakub@redhat.com>

	PR target/11877
	* config/i386/i386-protos.h (ix86_last_zero_store_uid): Declare.
	* config/i386/i386-expand.c (ix86_last_zero_store_uid): New variable.
	* config/i386/i386.c (ix86_expand_prologue): Clear it.
	* config/i386/i386.md (peephole2s for 1/2/4 stores of const0_rtx):
	Remove "" from match_operand.  Emit new insns using emit_move_insn
	and set ix86_last_zero_store_uid to INSN_UID of the last store.
	Add peephole2s for 1/2/4 stores of const0_rtx following previous
	successful peep2s.
--- gcc/config/i386/i386-protos.h.jj	2021-06-21 11:59:16.769693735 +0200
+++ gcc/config/i386/i386-protos.h	2021-06-21 12:01:47.875691930 +0200
@@ -111,6 +111,7 @@ extern bool ix86_use_lea_for_mov (rtx_in
 extern bool ix86_avoid_lea_for_addr (rtx_insn *, rtx[]);
 extern void ix86_split_lea_for_addr (rtx_insn *, rtx[], machine_mode);
 extern bool ix86_lea_for_add_ok (rtx_insn *, rtx[]);
+extern int ix86_last_zero_store_uid;
 extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high);
 extern bool ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn);
 extern bool ix86_agi_dependent (rtx_insn *set_insn, rtx_insn *use_insn);
--- gcc/config/i386/i386-expand.c.jj	2021-06-21 09:39:21.604488082 +0200
+++ gcc/config/i386/i386-expand.c	2021-06-21 12:21:33.017977951 +0200
@@ -1316,6 +1316,9 @@ find_nearest_reg_def (rtx_insn *insn, in
   return false;
 }
 
+/* INSN_UID of the last insn emitted by zero store peephole2s.  */
+int ix86_last_zero_store_uid;
+
 /* Split lea instructions into a sequence of instructions
    which are executed on ALU to avoid AGU stalls.
    It is assumed that it is allowed to clobber flags register
--- gcc/config/i386/i386.c.jj	2021-06-21 09:39:21.622487840 +0200
+++ gcc/config/i386/i386.c	2021-06-21 12:06:54.049634337 +0200
@@ -8196,6 +8196,7 @@ ix86_expand_prologue (void)
   bool save_stub_call_needed;
   rtx static_chain = NULL_RTX;
 
+  ix86_last_zero_store_uid = 0;
   if (ix86_function_naked (current_function_decl))
     {
       if (flag_stack_usage_info)
--- gcc/config/i386/i386.md.jj	2021-06-21 09:42:04.086303699 +0200
+++ gcc/config/i386/i386.md	2021-06-21 12:14:10.411847549 +0200
@@ -19360,37 +19360,96 @@ (define_peephole2
 ;; When optimizing for size, zeroing memory should use a register.
 (define_peephole2
   [(match_scratch:SWI48 0 "r")
-   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))
-   (set (match_operand:SWI48 2 "memory_operand" "") (const_int 0))
-   (set (match_operand:SWI48 3 "memory_operand" "") (const_int 0))
-   (set (match_operand:SWI48 4 "memory_operand" "") (const_int 0))]
+   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 3 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 4 "memory_operand") (const_int 0))]
   "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
-  [(set (match_dup 1) (match_dup 0))
-   (set (match_dup 2) (match_dup 0))
-   (set (match_dup 3) (match_dup 0))
-   (set (match_dup 4) (match_dup 0))]
+  [(const_int 0)]
 {
   ix86_expand_clear (operands[0]);
+  emit_move_insn (operands[1], operands[0]);
+  emit_move_insn (operands[2], operands[0]);
+  emit_move_insn (operands[3], operands[0]);
+  ix86_last_zero_store_uid
+    = INSN_UID (emit_move_insn (operands[4], operands[0]));
+  DONE;
 })
 
 (define_peephole2
   [(match_scratch:SWI48 0 "r")
-   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))
-   (set (match_operand:SWI48 2 "memory_operand" "") (const_int 0))]
+   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand") (const_int 0))]
   "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
-  [(set (match_dup 1) (match_dup 0))
-   (set (match_dup 2) (match_dup 0))]
+  [(const_int 0)]
 {
   ix86_expand_clear (operands[0]);
+  emit_move_insn (operands[1], operands[0]);
+  ix86_last_zero_store_uid
+    = INSN_UID (emit_move_insn (operands[2], operands[0]));
+  DONE;
 })
 
 (define_peephole2
   [(match_scratch:SWI48 0 "r")
-   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))]
+   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))]
   "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
-  [(set (match_dup 1) (match_dup 0))]
+  [(const_int 0)]
 {
   ix86_expand_clear (operands[0]);
+  ix86_last_zero_store_uid
+    = INSN_UID (emit_move_insn (operands[1], operands[0]));
+  DONE;
+})
+
+(define_peephole2
+  [(set (match_operand:SWI48 5 "memory_operand")
+	(match_operand:SWI48 0 "general_reg_operand"))
+   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 3 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 4 "memory_operand") (const_int 0))]
+  "optimize_insn_for_size_p ()
+   && INSN_UID (peep2_next_insn (0)) == ix86_last_zero_store_uid"
+  [(const_int 0)]
+{
+  emit_move_insn (operands[5], operands[0]);
+  emit_move_insn (operands[1], operands[0]);
+  emit_move_insn (operands[2], operands[0]);
+  emit_move_insn (operands[3], operands[0]);
+  ix86_last_zero_store_uid
+    = INSN_UID (emit_move_insn (operands[4], operands[0]));
+  DONE;
+})
+
+(define_peephole2
+  [(set (match_operand:SWI48 3 "memory_operand")
+	(match_operand:SWI48 0 "general_reg_operand"))
+   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand") (const_int 0))]
+  "optimize_insn_for_size_p ()
+   && INSN_UID (peep2_next_insn (0)) == ix86_last_zero_store_uid"
+  [(const_int 0)]
+{
+  emit_move_insn (operands[3], operands[0]);
+  emit_move_insn (operands[1], operands[0]);
+  ix86_last_zero_store_uid
+    = INSN_UID (emit_move_insn (operands[2], operands[0]));
+  DONE;
+})
+
+(define_peephole2
+  [(set (match_operand:SWI48 2 "memory_operand")
+	(match_operand:SWI48 0 "general_reg_operand"))
+   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))]
+  "optimize_insn_for_size_p ()
+   && INSN_UID (peep2_next_insn (0)) == ix86_last_zero_store_uid"
+  [(const_int 0)]
+{
+  emit_move_insn (operands[2], operands[0]);
+  ix86_last_zero_store_uid
+    = INSN_UID (emit_move_insn (operands[1], operands[0]));
+  DONE;
 })
 
 ;; Reload dislikes loading constants directly into class_likely_spilled

	Jakub
On Mon, Jun 21, 2021 at 12:28 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Jun 21, 2021 at 12:14:09PM +0200, Richard Biener wrote:
> > > But we could do what I've done in
> > > r11-7694-gd55ce33a34a8e33d17285228b32cf1e564241a70
> > > - have int ix86_last_zero_store_uid;
> > > set to INSN_UID of the last store emitted by the peephole2s and
> > > then check that INSN_UID against the var.
> >
> > Hmm, or have reg_nonzero_bits_for_peephole2 () and maintain
> > that somehow ... (conservatively drop it when a SET is seen).
>
> Maintaining something in peephole2 wouldn't be that easy because
> of peephole2's rolling window, plus it would need to be done
> in the generic code even when nothing but a single target in a
> specific case needs that.
>
> The following seems to work.
>
> 2021-06-21  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR target/11877
> 	* config/i386/i386-protos.h (ix86_last_zero_store_uid): Declare.
> 	* config/i386/i386-expand.c (ix86_last_zero_store_uid): New variable.
> 	* config/i386/i386.c (ix86_expand_prologue): Clear it.
> 	* config/i386/i386.md (peephole2s for 1/2/4 stores of const0_rtx):
> 	Remove "" from match_operand.  Emit new insns using emit_move_insn
> 	and set ix86_last_zero_store_uid to INSN_UID of the last store.
> 	Add peephole2s for 1/2/4 stores of const0_rtx following previous
> 	successful peep2s.

LGTM.

Thanks,
Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 48532eb..2333261 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -19357,6 +19357,42 @@
   ix86_expand_clear (operands[1]);
 })
 
+;; When optimizing for size, zeroing memory should use a register.
+(define_peephole2
+  [(match_scratch:SWI48 0 "r")
+   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand" "") (const_int 0))
+   (set (match_operand:SWI48 3 "memory_operand" "") (const_int 0))
+   (set (match_operand:SWI48 4 "memory_operand" "") (const_int 0))]
+  "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(set (match_dup 1) (match_dup 0))
+   (set (match_dup 2) (match_dup 0))
+   (set (match_dup 3) (match_dup 0))
+   (set (match_dup 4) (match_dup 0))]
+{
+  ix86_expand_clear (operands[0]);
+})
+
+(define_peephole2
+  [(match_scratch:SWI48 0 "r")
+   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand" "") (const_int 0))]
+  "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(set (match_dup 1) (match_dup 0))
+   (set (match_dup 2) (match_dup 0))]
+{
+  ix86_expand_clear (operands[0]);
+})
+
+(define_peephole2
+  [(match_scratch:SWI48 0 "r")
+   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))]
+  "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
+  [(set (match_dup 1) (match_dup 0))]
+{
+  ix86_expand_clear (operands[0]);
+})
+
 ;; Reload dislikes loading constants directly into class_likely_spilled
 ;; hard registers.  Try to tidy things up here.
 (define_peephole2