Message ID | 20160401131407.GT3017@tucnak.redhat.com |
---|---|
State | New |
Headers | show |
On 04/01/2016 03:14 PM, Jakub Jelinek wrote: > As the testcase below shows, we can end up with lots of useless > instructions from multi-word arithmetics. > simplify-rtx.c can optimize x {&,|,^}= {0,-1}, but while > x &= 0 or x {|,^}= -1 are optimized into constants and CSE can handle those > fine, we keep x &= -1 and x {|,^}= 0 in the IL until expansion if x > is a MEM. There are two issues, one is that cse_insn has for a few years > code that wants to prevent partially overlapping MEM->MEM moves, > but actually doesn't realize that fully overlapping MEM->MEM noop moves > are fine. And the second one is that on most backends, there are no > MEM->MEM move instructions, so we need to delete the useless insns instead, > because it can't match. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux. > Is this something we want for 6.x or defer for stage1? It seems like a stage1 thing to me unless it's a regression. But you're in a better position to make that call. > + /* Similarly, lots of targets don't allow no-op > + (set (mem x) (mem x)) moves. */ > + else if (n_sets == 1 > + && MEM_P (trial) > + && MEM_P (dest) > + && rtx_equal_p (trial, dest) > + && !side_effects_p (dest) > + && (cfun->can_delete_dead_exceptions > + || insn_nothrow_p (insn))) Looks like this block of code is practically duplicated - I'd prefer a helper function set_of_equal_mems_removable_p or something. Ok with that change. Bernd
On Fri, Apr 01, 2016 at 03:35:19PM +0200, Bernd Schmidt wrote: > On 04/01/2016 03:14 PM, Jakub Jelinek wrote: > >As the testcase below shows, we can end up with lots of useless > >instructions from multi-word arithmetics. > >simplify-rtx.c can optimize x {&,|,^}= {0,-1}, but while > >x &= 0 or x {|,^}= -1 are optimized into constants and CSE can handle those > >fine, we keep x &= -1 and x {|,^}= 0 in the IL until expansion if x > >is a MEM. There are two issues, one is that cse_insn has for a few years > >code that wants to prevent partially overlapping MEM->MEM moves, > >but actually doesn't realize that fully overlapping MEM->MEM noop moves > >are fine. And the second one is that on most backends, there are no > >MEM->MEM move instructions, so we need to delete the useless insns instead, > >because it can't match. > > > >Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux. > >Is this something we want for 6.x or defer for stage1? > > It seems like a stage1 thing to me unless it's a regression. But you're in a > better position to make that call. I guess it can wait for stage1. > >+ /* Similarly, lots of targets don't allow no-op > >+ (set (mem x) (mem x)) moves. */ > >+ else if (n_sets == 1 > >+ && MEM_P (trial) > >+ && MEM_P (dest) > >+ && rtx_equal_p (trial, dest) > >+ && !side_effects_p (dest) > >+ && (cfun->can_delete_dead_exceptions > >+ || insn_nothrow_p (insn))) > > Looks like this block of code is practically duplicated - I'd prefer a > helper function set_of_equal_mems_removable_p or something. Ok with that > change. Perhaps instead just set a bool in the second hunk and just test that at the third hunk's condition? Jakub
On 04/01/2016 04:51 PM, Jakub Jelinek wrote: > On Fri, Apr 01, 2016 at 03:35:19PM +0200, Bernd Schmidt wrote: >> On 04/01/2016 03:14 PM, Jakub Jelinek wrote: >>> As the testcase below shows, we can end up with lots of useless >>> instructions from multi-word arithmetics. >>> simplify-rtx.c can optimize x {&,|,^}= {0,-1}, but while >>> x &= 0 or x {|,^}= -1 are optimized into constants and CSE can handle those >>> fine, we keep x &= -1 and x {|,^}= 0 in the IL until expansion if x >>> is a MEM. There are two issues, one is that cse_insn has for a few years >>> code that wants to prevent partially overlapping MEM->MEM moves, >>> but actually doesn't realize that fully overlapping MEM->MEM noop moves >>> are fine. And the second one is that on most backends, there are no >>> MEM->MEM move instructions, so we need to delete the useless insns instead, >>> because it can't match. >>> >>> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux. >>> Is this something we want for 6.x or defer for stage1? >> >> It seems like a stage1 thing to me unless it's a regression. But you're in a >> better position to make that call. > > I guess it can wait for stage1. > >>> + /* Similarly, lots of targets don't allow no-op >>> + (set (mem x) (mem x)) moves. */ >>> + else if (n_sets == 1 >>> + && MEM_P (trial) >>> + && MEM_P (dest) >>> + && rtx_equal_p (trial, dest) >>> + && !side_effects_p (dest) >>> + && (cfun->can_delete_dead_exceptions >>> + || insn_nothrow_p (insn))) >> >> Looks like this block of code is practically duplicated - I'd prefer a >> helper function set_of_equal_mems_removable_p or something. Ok with that >> change. > > Perhaps instead just set a bool in the second hunk and just test that at the > third hunk's condition? Also works for me. Or maybe set trial and dest to pc_rtx and merge the new case with the preexisting one. As long as we don't get a big block of duplicated conditions. Bernd
--- gcc/cse.c.jj 2016-04-01 12:12:22.630602270 +0200 +++ gcc/cse.c 2016-04-01 12:18:03.509947871 +0200 @@ -5166,7 +5166,7 @@ cse_insn (rtx_insn *insn) } /* Avoid creation of overlapping memory moves. */ - if (MEM_P (trial) && MEM_P (SET_DEST (sets[i].rtl))) + if (MEM_P (trial) && MEM_P (dest) && !rtx_equal_p (trial, dest)) { rtx src, dest; @@ -5277,6 +5277,20 @@ cse_insn (rtx_insn *insn) break; } + /* Similarly, lots of targets don't allow no-op + (set (mem x) (mem x)) moves. */ + else if (n_sets == 1 + && MEM_P (trial) + && MEM_P (dest) + && rtx_equal_p (trial, dest) + && !side_effects_p (dest) + && (cfun->can_delete_dead_exceptions + || insn_nothrow_p (insn))) + { + SET_SRC (sets[i].rtl) = trial; + break; + } + /* Reject certain invalid forms of CONST that we create. */ else if (CONSTANT_P (trial) && GET_CODE (trial) == CONST @@ -5495,6 +5509,21 @@ cse_insn (rtx_insn *insn) sets[i].rtl = 0; } + /* Similarly for no-op MEM moves. */ + else if (n_sets == 1 + && MEM_P (SET_DEST (sets[i].rtl)) + && MEM_P (SET_SRC (sets[i].rtl)) + && rtx_equal_p (SET_DEST (sets[i].rtl), + SET_SRC (sets[i].rtl)) + && !side_effects_p (SET_DEST (sets[i].rtl)) + && (cfun->can_delete_dead_exceptions || insn_nothrow_p (insn))) + { + if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn)) + cse_cfg_altered = true; + delete_insn_and_edges (insn); + sets[i].rtl = 0; + } + /* If this SET is now setting PC to a label, we know it used to be a conditional or computed branch. */ else if (dest == pc_rtx && GET_CODE (src) == LABEL_REF --- gcc/testsuite/gcc.target/i386/pr70467-1.c.jj 2016-04-01 12:13:06.945997183 +0200 +++ gcc/testsuite/gcc.target/i386/pr70467-1.c 2016-04-01 12:17:12.187648630 +0200 @@ -0,0 +1,55 @@ +/* PR rtl-optimization/70467 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mno-sse" } */ + +void foo (unsigned long long *); + +void +bar (void) +{ + unsigned long long a; + foo (&a); + a &= 0x7fffffffffffffffULL; + foo (&a); + a &= 0xffffffff7fffffffULL; + foo (&a); + a &= 0x7fffffff00000000ULL; + foo (&a); + a &= 0x000000007fffffffULL; + foo (&a); + a &= 0x00000000ffffffffULL; + foo (&a); + a &= 0xffffffff00000000ULL; + foo (&a); + a |= 0x7fffffffffffffffULL; + foo (&a); + a |= 0xffffffff7fffffffULL; + foo (&a); + a |= 0x7fffffff00000000ULL; + foo (&a); + a |= 0x000000007fffffffULL; + foo (&a); + a |= 0x00000000ffffffffULL; + foo (&a); + a |= 0xffffffff00000000ULL; + foo (&a); + a ^= 0x7fffffffffffffffULL; + foo (&a); + a ^= 0xffffffff7fffffffULL; + foo (&a); + a ^= 0x7fffffff00000000ULL; + foo (&a); + a ^= 0x000000007fffffffULL; + foo (&a); + a ^= 0x00000000ffffffffULL; + foo (&a); + a ^= 0xffffffff00000000ULL; + foo (&a); +} + +/* { dg-final { scan-assembler-not "andl\[ \t\]*.-1," { target ia32 } } } */ +/* { dg-final { scan-assembler-not "andl\[ \t\]*.0," { target ia32 } } } */ +/* { dg-final { scan-assembler-not "orl\[ \t\]*.-1," { target ia32 } } } */ +/* { dg-final { scan-assembler-not "orl\[ \t\]*.0," { target ia32 } } } */ +/* { dg-final { scan-assembler-not "xorl\[ \t\]*.-1," { target ia32 } } } */ +/* { dg-final { scan-assembler-not "xorl\[ \t\]*.0," { target ia32 } } } */