| Message ID | 20200714083055.GL2363@tucnak |
|---|---|
| State | New |
| Series | builtins: Avoid useless char/short -> int promotions before atomics [PR96176] |
On Tue, 14 Jul 2020, Jakub Jelinek wrote:

> Hi!
>
> As mentioned in the PR, we generate a useless movzbl insn before lock cmpxchg.
> The problem is that the builtin for the char/short cases has the arguments
> promoted to int and combine gives up, because the instructions have
> MEM_VOLATILE_P arguments and recog in that case doesn't recognize anything
> when volatile_ok is false, and nothing afterwards optimizes the
> (reg:SI a) = (zero_extend:SI (reg:QI a))
> ... (subreg:QI (reg:SI a) 0) ...

So the above isn't fixable?  Because it would probably be the more
generic fix, right?

> The following patch fixes it at expansion time, we already have a function
> that is meant to undo the promotion, so this just adds the very common case
> to that.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2020-07-14  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR target/96176
> 	* builtins.c: Include gimple-ssa.h, tree-ssa-live.h and
> 	tree-outof-ssa.h.
> 	(expand_expr_force_mode): If exp is a SSA_NAME with different mode
> 	from MODE and get_gimple_for_ssa_name is a cast from MODE, use the
> 	cast's rhs.
>
> 	* gcc.target/i386/pr96176.c: New test.
>
> --- gcc/builtins.c.jj	2020-06-22 10:59:15.000000000 +0200
> +++ gcc/builtins.c	2020-07-13 12:37:56.519653940 +0200
> @@ -73,6 +73,9 @@ along with GCC; see the file COPYING3.
>  #include "gomp-constants.h"
>  #include "omp-general.h"
>  #include "tree-dfa.h"
> +#include "gimple-ssa.h"
> +#include "tree-ssa-live.h"
> +#include "tree-outof-ssa.h"
>
>  struct target_builtins default_target_builtins;
>  #if SWITCHABLE_TARGET
> @@ -6671,6 +6674,21 @@ expand_expr_force_mode (tree exp, machin
>    rtx val;
>    machine_mode old_mode;
>
> +  if (TREE_CODE (exp) == SSA_NAME
> +      && TYPE_MODE (TREE_TYPE (exp)) != mode)
> +    {
> +      /* Undo argument promotion if possible, as combine might not
> +	 be able to do it later due to MEM_VOLATILE_P uses in the
> +	 patterns.  */
> +      gimple *g = get_gimple_for_ssa_name (exp);
> +      if (g && gimple_assign_cast_p (g))
> +	{
> +	  tree rhs = gimple_assign_rhs1 (g);
> +	  if (TYPE_MODE (TREE_TYPE (rhs)) == mode)

Is this really enough to check?  What if the cast was truncating?
The gimple_assign_cast_p predicate also allows for FIX_TRUNC_EXPR
and VIEW_CONVERT_EXPR where for the latter the rhs is the
VIEW_CONVERT_EXPR itself.

Richard.

> +	    exp = rhs;
> +	}
> +    }
> +
>    val = expand_expr (exp, NULL_RTX, mode, EXPAND_NORMAL);
>    /* If VAL is promoted to a wider mode, convert it back to MODE.  Take care
>       of CONST_INTs, where we know the old_mode only from the call argument.  */
> --- gcc/testsuite/gcc.target/i386/pr96176.c.jj	2020-07-13 12:44:15.940149701 +0200
> +++ gcc/testsuite/gcc.target/i386/pr96176.c	2020-07-13 12:45:07.822396980 +0200
> @@ -0,0 +1,13 @@
> +/* PR target/96176 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "\tmovzbl\t" } } */
> +
> +unsigned char v;
> +
> +void
> +foo (unsigned char *x, unsigned char y, unsigned char z)
> +{
> +  __atomic_compare_exchange_n (x, &y, z, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
> +  v = y;
> +}
>
> 	Jakub
On Tue, Jul 14, 2020 at 02:33:41PM +0200, Richard Biener wrote:
> > As mentioned in the PR, we generate a useless movzbl insn before lock cmpxchg.
> > The problem is that the builtin for the char/short cases has the arguments
> > promoted to int and combine gives up, because the instructions have
> > MEM_VOLATILE_P arguments and recog in that case doesn't recognize anything
> > when volatile_ok is false, and nothing afterwards optimizes the
> > (reg:SI a) = (zero_extend:SI (reg:QI a))
> > ... (subreg:QI (reg:SI a) 0) ...
>
> So the above isn't fixable?  Because it would probably be the more
> generic fix, right?

I'm afraid it is not, CCing Segher on that.  The question is how to
differentiate between "combine didn't do anything dangerous to this
MEM_VOLATILE_P and just kept it in the pattern as is" vs.
"combine changed it in a dangerous way unsuitable for volatile accesses".
Even if we wanted to handle just the case where i3 is the only insn
that originally contains MEM_VOLATILE_P and check that the MEM stayed as
is, there is the difficulty that we change the insns in place, so remembering
how they looked before is hard.  And then before trying to recognize the
result we'd carefully need to check that nothing else is volatile before
temporarily enabling volatile_ok.

> > +  if (TREE_CODE (exp) == SSA_NAME
> > +      && TYPE_MODE (TREE_TYPE (exp)) != mode)
> > +    {
> > +      /* Undo argument promotion if possible, as combine might not
> > +	 be able to do it later due to MEM_VOLATILE_P uses in the
> > +	 patterns.  */
> > +      gimple *g = get_gimple_for_ssa_name (exp);
> > +      if (g && gimple_assign_cast_p (g))
> > +	{
> > +	  tree rhs = gimple_assign_rhs1 (g);
> > +	  if (TYPE_MODE (TREE_TYPE (rhs)) == mode)
>
> Is this really enough to check?  What if the cast was truncating?
> The gimple_assign_cast_p predicate also allows for FIX_TRUNC_EXPR
> and VIEW_CONVERT_EXPR where for the latter the rhs is the
> VIEW_CONVERT_EXPR itself.

I don't know how those could happen.
mode is the result of get_builtin_sync_mode or
int_mode_for_size (BITS_PER_UNIT * size, 0).require () (the former is
int_mode_for_size (...).require ()) and thus an integral mode.
The checks verify that the SSA_NAME doesn't have that mode, but rhs1
of the statement does have the right mode.
As mode is integral, I think that rules out FIX_TRUNC_EXPR,
which would need a real mode on the argument, and for VCE, if the VCE is
in the operand itself, then it ought to have the same mode as the lhs.
I believe the only possibility where exp doesn't already have the desired
mode is the char/short to int promotion; otherwise all the builtins are
prototyped and thus one shouldn't see some unrelated mode.

But if you want some extra checks just to be sure, I can add them, whether
it means checking only for CONVERT_EXPR_CODE_P instead of all 4 rhs codes,
and/or checking that both types are integral scalar and the conversion is an
extension (by comparing type precisions).

	Jakub
Hi!

On Tue, Jul 14, 2020 at 03:26:50PM +0200, Jakub Jelinek wrote:
> On Tue, Jul 14, 2020 at 02:33:41PM +0200, Richard Biener wrote:
> > > As mentioned in the PR, we generate a useless movzbl insn before lock cmpxchg.
> > > The problem is that the builtin for the char/short cases has the arguments
> > > promoted to int and combine gives up, because the instructions have
> > > MEM_VOLATILE_P arguments and recog in that case doesn't recognize anything
> > > when volatile_ok is false, and nothing afterwards optimizes the
> > > (reg:SI a) = (zero_extend:SI (reg:QI a))
> > > ... (subreg:QI (reg:SI a) 0) ...
> >
> > So the above isn't fixable?  Because it would probably be the more
> > generic fix, right?
>
> I'm afraid it is not, CCing Segher on that.  The question is how to
> differentiate between "combine didn't do anything dangerous to this
> MEM_VOLATILE_P and just kept it in the pattern as is" vs.
> "combine changed it in a dangerous way unsuitable for volatile accesses".

This can be handled by having a predicate that allows volatile memory.
Various targets do this already (rs6000 is the most prominent).  You then
have to make sure that every insn using one of those predicates actually
*does* everything that needs to be done for volatile memory ;-)  If every
variable is aligned (say the ABI requires that), you are 98% there already.
If not, it is much harder.

> Even if we wanted to handle just the case where i3 is the only insn
> that originally contains MEM_VOLATILE_P and check that the MEM stayed as
> is, there is the difficulty that we change the insns in place, so remembering
> how they looked before is hard.  And then before trying to recognize the
> result we'd carefully need to check that nothing else is volatile before
> temporarily enabling volatile_ok.

Yeah, don't, this won't work :-)


Segher
On Tue, 14 Jul 2020, Jakub Jelinek wrote:

> On Tue, Jul 14, 2020 at 02:33:41PM +0200, Richard Biener wrote:
> > > As mentioned in the PR, we generate a useless movzbl insn before lock cmpxchg.
> > > The problem is that the builtin for the char/short cases has the arguments
> > > promoted to int and combine gives up, because the instructions have
> > > MEM_VOLATILE_P arguments and recog in that case doesn't recognize anything
> > > when volatile_ok is false, and nothing afterwards optimizes the
> > > (reg:SI a) = (zero_extend:SI (reg:QI a))
> > > ... (subreg:QI (reg:SI a) 0) ...
> >
> > So the above isn't fixable?  Because it would probably be the more
> > generic fix, right?
>
> I'm afraid it is not, CCing Segher on that.  The question is how to
> differentiate between "combine didn't do anything dangerous to this
> MEM_VOLATILE_P and just kept it in the pattern as is" vs.
> "combine changed it in a dangerous way unsuitable for volatile accesses".
> Even if we wanted to handle just the case where i3 is the only insn
> that originally contains MEM_VOLATILE_P and check that the MEM stayed as
> is, there is the difficulty that we change the insns in place, so remembering
> how they looked before is hard.  And then before trying to recognize the
> result we'd carefully need to check that nothing else is volatile before
> temporarily enabling volatile_ok.
>
> > > +  if (TREE_CODE (exp) == SSA_NAME
> > > +      && TYPE_MODE (TREE_TYPE (exp)) != mode)
> > > +    {
> > > +      /* Undo argument promotion if possible, as combine might not
> > > +	 be able to do it later due to MEM_VOLATILE_P uses in the
> > > +	 patterns.  */
> > > +      gimple *g = get_gimple_for_ssa_name (exp);
> > > +      if (g && gimple_assign_cast_p (g))
> > > +	{
> > > +	  tree rhs = gimple_assign_rhs1 (g);
> > > +	  if (TYPE_MODE (TREE_TYPE (rhs)) == mode)
> >
> > Is this really enough to check?  What if the cast was truncating?
> > The gimple_assign_cast_p predicate also allows for FIX_TRUNC_EXPR
> > and VIEW_CONVERT_EXPR where for the latter the rhs is the
> > VIEW_CONVERT_EXPR itself.
>
> I don't know how those could happen.
> mode is the result of get_builtin_sync_mode or
> int_mode_for_size (BITS_PER_UNIT * size, 0).require () (the former is
> int_mode_for_size (...).require ()) and thus an integral mode.
> The checks verify that the SSA_NAME doesn't have that mode, but rhs1
> of the statement does have the right mode.
> As mode is integral, I think that rules out FIX_TRUNC_EXPR,
> which would need a real mode on the argument, and for VCE, if the VCE is
> in the operand itself, then it ought to have the same mode as the lhs.
> I believe the only possibility where exp doesn't already have the desired
> mode is the char/short to int promotion; otherwise all the builtins are
> prototyped and thus one shouldn't see some unrelated mode.
>
> But if you want some extra checks just to be sure, I can add them, whether
> it means checking only for CONVERT_EXPR_CODE_P instead of all 4 rhs codes,
> and/or checking that both types are integral scalar and the conversion is an
> extension (by comparing type precisions).

I see the function is only used from atomics expansion, but still
I would feel more comfortable with checking for just CONVERT_EXPR_CODE_P
and both integral types plus the precision bits.  You never know how
these functions end up re-used ...

OK with those changes.

Thanks,
Richard.
--- gcc/builtins.c.jj	2020-06-22 10:59:15.000000000 +0200
+++ gcc/builtins.c	2020-07-13 12:37:56.519653940 +0200
@@ -73,6 +73,9 @@ along with GCC; see the file COPYING3.
 #include "gomp-constants.h"
 #include "omp-general.h"
 #include "tree-dfa.h"
+#include "gimple-ssa.h"
+#include "tree-ssa-live.h"
+#include "tree-outof-ssa.h"

 struct target_builtins default_target_builtins;
 #if SWITCHABLE_TARGET
@@ -6671,6 +6674,21 @@ expand_expr_force_mode (tree exp, machin
   rtx val;
   machine_mode old_mode;

+  if (TREE_CODE (exp) == SSA_NAME
+      && TYPE_MODE (TREE_TYPE (exp)) != mode)
+    {
+      /* Undo argument promotion if possible, as combine might not
+	 be able to do it later due to MEM_VOLATILE_P uses in the
+	 patterns.  */
+      gimple *g = get_gimple_for_ssa_name (exp);
+      if (g && gimple_assign_cast_p (g))
+	{
+	  tree rhs = gimple_assign_rhs1 (g);
+	  if (TYPE_MODE (TREE_TYPE (rhs)) == mode)
+	    exp = rhs;
+	}
+    }
+
   val = expand_expr (exp, NULL_RTX, mode, EXPAND_NORMAL);
   /* If VAL is promoted to a wider mode, convert it back to MODE.  Take care
      of CONST_INTs, where we know the old_mode only from the call argument.  */
--- gcc/testsuite/gcc.target/i386/pr96176.c.jj	2020-07-13 12:44:15.940149701 +0200
+++ gcc/testsuite/gcc.target/i386/pr96176.c	2020-07-13 12:45:07.822396980 +0200
@@ -0,0 +1,13 @@
+/* PR target/96176 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "\tmovzbl\t" } } */
+
+unsigned char v;
+
+void
+foo (unsigned char *x, unsigned char y, unsigned char z)
+{
+  __atomic_compare_exchange_n (x, &y, z, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+  v = y;
+}