
builtins: Avoid useless char/short -> int promotions before atomics [PR96176]

Message ID 20200714083055.GL2363@tucnak
State New
Series builtins: Avoid useless char/short -> int promotions before atomics [PR96176]

Commit Message

Jakub Jelinek July 14, 2020, 8:30 a.m. UTC
Hi!

As mentioned in the PR, we generate a useless movzbl insn before lock cmpxchg.
The problem is that the builtin for the char/short cases has the arguments
promoted to int and combine gives up, because the instructions have
MEM_VOLATILE_P arguments and recog in that case doesn't recognize anything
when volatile_ok is false, and nothing afterwards optimizes the
(reg:SI a) = (zero_extend:SI (reg:QI a))
... (subreg:QI (reg:SI a) 0) ...

The following patch fixes it at expansion time: we already have a function
that is meant to undo the promotion, so this just adds the very common case
to that function.
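
For illustration, the same promotion shows up with other narrow atomics
as well; in the following (an illustrative example, not part of the patch
or its testsuite), delta is promoted to int on the way into the builtin
and the promotion could survive as a useless extension in front of the
lock'd instruction:

unsigned short counter;

unsigned short
bump (unsigned short delta)
{
  return __atomic_fetch_add (&counter, delta, __ATOMIC_RELAXED);
}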

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-07-14  Jakub Jelinek  <jakub@redhat.com>

	PR target/96176
	* builtins.c: Include gimple-ssa.h, tree-ssa-live.h and
	tree-outof-ssa.h.
	(expand_expr_force_mode): If exp is a SSA_NAME with different mode
	from MODE and get_gimple_for_ssa_name is a cast from MODE, use the
	cast's rhs.

	* gcc.target/i386/pr96176.c: New test.


	Jakub

Comments

Richard Biener July 14, 2020, 12:33 p.m. UTC | #1
On Tue, 14 Jul 2020, Jakub Jelinek wrote:

> Hi!
> 
> As mentioned in the PR, we generate a useless movzbl insn before lock cmpxchg.
> The problem is that the builtin for the char/short cases has the arguments
> promoted to int and combine gives up, because the instructions have
> MEM_VOLATILE_P arguments and recog in that case doesn't recognize anything
> when volatile_ok is false, and nothing afterwards optimizes the
> (reg:SI a) = (zero_extend:SI (reg:QI a))
> ... (subreg:QI (reg:SI a) 0) ...

So the above isn't fixable?  Because it would probably be the more
generic fix, right?

> The following patch fixes it at expansion time: we already have a function
> that is meant to undo the promotion, so this just adds the very common case
> to that function.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2020-07-14  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/96176
> 	* builtins.c: Include gimple-ssa.h, tree-ssa-live.h and
> 	tree-outof-ssa.h.
> 	(expand_expr_force_mode): If exp is a SSA_NAME with different mode
> 	from MODE and get_gimple_for_ssa_name is a cast from MODE, use the
> 	cast's rhs.
> 
> 	* gcc.target/i386/pr96176.c: New test.
> 
> --- gcc/builtins.c.jj	2020-06-22 10:59:15.000000000 +0200
> +++ gcc/builtins.c	2020-07-13 12:37:56.519653940 +0200
> @@ -73,6 +73,9 @@ along with GCC; see the file COPYING3.
>  #include "gomp-constants.h"
>  #include "omp-general.h"
>  #include "tree-dfa.h"
> +#include "gimple-ssa.h"
> +#include "tree-ssa-live.h"
> +#include "tree-outof-ssa.h"
>  
>  struct target_builtins default_target_builtins;
>  #if SWITCHABLE_TARGET
> @@ -6671,6 +6674,21 @@ expand_expr_force_mode (tree exp, machin
>    rtx val;
>    machine_mode old_mode;
>  
> +  if (TREE_CODE (exp) == SSA_NAME
> +      && TYPE_MODE (TREE_TYPE (exp)) != mode)
> +    {
> +      /* Undo argument promotion if possible, as combine might not
> +	 be able to do it later due to MEM_VOLATILE_P uses in the
> +	 patterns.  */
> +      gimple *g = get_gimple_for_ssa_name (exp);
> +      if (g && gimple_assign_cast_p (g))
> +	{
> +	  tree rhs = gimple_assign_rhs1 (g);
> +	  if (TYPE_MODE (TREE_TYPE (rhs)) == mode)

Is this really enough to check?  What if the cast was truncating?
The gimple_assign_cast_p predicate also allows for FIX_TRUNC_EXPR
and VIEW_CONVERT_EXPR where for the latter the rhs is the 
VIEW_CONVERT_EXPR itself.
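
(For instance, illustrative only, a float-to-integer conversion such as

int
trunc_it (double d)
{
  return (int) d;  /* gimplifies to a FIX_TRUNC_EXPR */
}

would satisfy gimple_assign_cast_p as well.)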

Richard.

> +	    exp = rhs;
> +	}
> +    }
> +
>    val = expand_expr (exp, NULL_RTX, mode, EXPAND_NORMAL);
>    /* If VAL is promoted to a wider mode, convert it back to MODE.  Take care
>       of CONST_INTs, where we know the old_mode only from the call argument.  */
> --- gcc/testsuite/gcc.target/i386/pr96176.c.jj	2020-07-13 12:44:15.940149701 +0200
> +++ gcc/testsuite/gcc.target/i386/pr96176.c	2020-07-13 12:45:07.822396980 +0200
> @@ -0,0 +1,13 @@
> +/* PR target/96176 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "\tmovzbl\t" } } */
> +
> +unsigned char v;
> +
> +void
> +foo (unsigned char *x, unsigned char y, unsigned char z)
> +{
> +  __atomic_compare_exchange_n (x, &y, z, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
> +  v = y;
> +}
> 
> 	Jakub
> 
>
Jakub Jelinek July 14, 2020, 1:26 p.m. UTC | #2
On Tue, Jul 14, 2020 at 02:33:41PM +0200, Richard Biener wrote:
> > As mentioned in the PR, we generate a useless movzbl insn before lock cmpxchg.
> > The problem is that the builtin for the char/short cases has the arguments
> > promoted to int and combine gives up, because the instructions have
> > MEM_VOLATILE_P arguments and recog in that case doesn't recognize anything
> > when volatile_ok is false, and nothing afterwards optimizes the
> > (reg:SI a) = (zero_extend:SI (reg:QI a))
> > ... (subreg:QI (reg:SI a) 0) ...
> 
> So the above isn't fixable?  Because it would probably be the more
> generic fix, right?

I'm afraid it is not, CCing Segher on that.  The question is how to
differentiate between "combine didn't do anything dangerous to this
MEM_VOLATILE_P and just kept it in the pattern as is" vs.
"combine changed it in a dangerous way unsuitable for volatile accesses".
Even if we wanted to handle just the case where i3 is the only insn
that originally contains MEM_VOLATILE_P, and to check that the MEM stayed
as is, there is the difficulty that we change the insns in place, so
remembering how they looked before is hard.  And then, before trying to
recognize the result, we'd need to check carefully that nothing else is
volatile before temporarily enabling volatile_ok.

> > +  if (TREE_CODE (exp) == SSA_NAME
> > +      && TYPE_MODE (TREE_TYPE (exp)) != mode)
> > +    {
> > +      /* Undo argument promotion if possible, as combine might not
> > +	 be able to do it later due to MEM_VOLATILE_P uses in the
> > +	 patterns.  */
> > +      gimple *g = get_gimple_for_ssa_name (exp);
> > +      if (g && gimple_assign_cast_p (g))
> > +	{
> > +	  tree rhs = gimple_assign_rhs1 (g);
> > +	  if (TYPE_MODE (TREE_TYPE (rhs)) == mode)
> 
> Is this really enough to check?  What if the cast was truncating?
> The gimple_assign_cast_p predicate also allows for FIX_TRUNC_EXPR
> and VIEW_CONVERT_EXPR where for the latter the rhs is the 
> VIEW_CONVERT_EXPR itself.

I don't know how those could happen.
mode is the result of get_builtin_sync_mode or
int_mode_for_size (BITS_PER_UNIT * size, 0).require () (the former is
int_mode_for_size (...).require ()) and thus an integral mode.
The checks verify that the SSA_NAME doesn't have that mode, but that rhs1
of the statement does have the right mode.
As mode is integral, I think that rules out FIX_TRUNC_EXPR, which
would need a real mode on the argument, and as for VIEW_CONVERT_EXPR,
if the VCE is in the operand itself, then it ought to have the same
mode as the lhs.
I believe the only case where exp doesn't already have the desired
mode is the char/short to int promotion; otherwise all the builtins are
prototyped and thus one shouldn't see some unrelated mode.

But if you want some extra checks just to be sure, I can add them, whether
that means checking only for CONVERT_EXPR_CODE_P instead of all 4 rhs codes,
and/or checking that both types are integral scalars and that the conversion
is an extension (by comparing type precisions).
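
Concretely, that tightened check might look along these lines (a sketch
only, the version actually committed may differ in details):

  if (TREE_CODE (exp) == SSA_NAME
      && TYPE_MODE (TREE_TYPE (exp)) != mode)
    {
      /* Undo argument promotion if possible, as combine might not
         be able to do it later due to MEM_VOLATILE_P uses in the
         patterns.  Only look through widening casts between scalar
         integer types.  */
      gimple *g = get_gimple_for_ssa_name (exp);
      if (g
          && is_gimple_assign (g)
          && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (g)))
        {
          tree rhs = gimple_assign_rhs1 (g);
          if (INTEGRAL_TYPE_P (TREE_TYPE (exp))
              && INTEGRAL_TYPE_P (TREE_TYPE (rhs))
              && (TYPE_PRECISION (TREE_TYPE (exp))
                  > TYPE_PRECISION (TREE_TYPE (rhs)))
              && TYPE_MODE (TREE_TYPE (rhs)) == mode)
            exp = rhs;
        }
    }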

	Jakub
Segher Boessenkool July 14, 2020, 2:04 p.m. UTC | #3
Hi!

On Tue, Jul 14, 2020 at 03:26:50PM +0200, Jakub Jelinek wrote:
> On Tue, Jul 14, 2020 at 02:33:41PM +0200, Richard Biener wrote:
> > > As mentioned in the PR, we generate a useless movzbl insn before lock cmpxchg.
> > > The problem is that the builtin for the char/short cases has the arguments
> > > promoted to int and combine gives up, because the instructions have
> > > MEM_VOLATILE_P arguments and recog in that case doesn't recognize anything
> > > when volatile_ok is false, and nothing afterwards optimizes the
> > > (reg:SI a) = (zero_extend:SI (reg:QI a))
> > > ... (subreg:QI (reg:SI a) 0) ...
> > 
> > So the above isn't fixable?  Because it would probably be the more
> > generic fix, right?
> 
> I'm afraid it is not, CCing Segher on that.  The question is how to
> differentiate between "combine didn't do anything dangerous to this
> MEM_VOLATILE_P and just kept it in the pattern as is" vs.
> "combine changed it in a dangerous way unsuitable for volatile accesses".

This can be handled by having a predicate that allows volatile memory.
Various targets do this already (rs6000 is the most prominent).  You then
have to make sure that every insn using one of those predicates actually
*does* do everything that needs to be done for volatile memory ;-)

If every variable is aligned (say the ABI requires that), you are 98%
there already.  If not, it is much harder.
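
In C terms the generated predicate then boils down to something like
this (a sketch of the idea only, not verbatim from rs6000 or any other
target):

static bool
any_mem_operand (rtx op, machine_mode mode)
{
  /* Accept volatile MEMs too, regardless of volatile_ok; every insn
     using this predicate must then itself be safe for volatile
     accesses.  */
  return (MEM_P (op)
          && GET_MODE (op) == mode
          && memory_address_p (mode, XEXP (op, 0)));
}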

> Even if we wanted to handle just the case where i3 is the only insn
> that originally contains MEM_VOLATILE_P, and to check that the MEM stayed
> as is, there is the difficulty that we change the insns in place, so
> remembering how they looked before is hard.  And then, before trying to
> recognize the result, we'd need to check carefully that nothing else is
> volatile before temporarily enabling volatile_ok.

Yeah, don't, this won't work :-)


Segher
Richard Biener July 14, 2020, 2:11 p.m. UTC | #4
On Tue, 14 Jul 2020, Jakub Jelinek wrote:

> On Tue, Jul 14, 2020 at 02:33:41PM +0200, Richard Biener wrote:
> > > As mentioned in the PR, we generate a useless movzbl insn before lock cmpxchg.
> > > The problem is that the builtin for the char/short cases has the arguments
> > > promoted to int and combine gives up, because the instructions have
> > > MEM_VOLATILE_P arguments and recog in that case doesn't recognize anything
> > > when volatile_ok is false, and nothing afterwards optimizes the
> > > (reg:SI a) = (zero_extend:SI (reg:QI a))
> > > ... (subreg:QI (reg:SI a) 0) ...
> > 
> > So the above isn't fixable?  Because it would probably be the more
> > generic fix, right?
> 
> I'm afraid it is not, CCing Segher on that.  The question is how to
> differentiate between "combine didn't do anything dangerous to this
> MEM_VOLATILE_P and just kept it in the pattern as is" vs.
> "combine changed it in a dangerous way unsuitable for volatile accesses".
> Even if we wanted to handle just the case where i3 is the only insn
> that originally contains MEM_VOLATILE_P, and to check that the MEM stayed
> as is, there is the difficulty that we change the insns in place, so
> remembering how they looked before is hard.  And then, before trying to
> recognize the result, we'd need to check carefully that nothing else is
> volatile before temporarily enabling volatile_ok.
> 
> > > +  if (TREE_CODE (exp) == SSA_NAME
> > > +      && TYPE_MODE (TREE_TYPE (exp)) != mode)
> > > +    {
> > > +      /* Undo argument promotion if possible, as combine might not
> > > +	 be able to do it later due to MEM_VOLATILE_P uses in the
> > > +	 patterns.  */
> > > +      gimple *g = get_gimple_for_ssa_name (exp);
> > > +      if (g && gimple_assign_cast_p (g))
> > > +	{
> > > +	  tree rhs = gimple_assign_rhs1 (g);
> > > +	  if (TYPE_MODE (TREE_TYPE (rhs)) == mode)
> > 
> > Is this really enough to check?  What if the cast was truncating?
> > The gimple_assign_cast_p predicate also allows for FIX_TRUNC_EXPR
> > and VIEW_CONVERT_EXPR where for the latter the rhs is the 
> > VIEW_CONVERT_EXPR itself.
> 
> I don't know how those could happen.
> mode is the result of get_builtin_sync_mode or
> int_mode_for_size (BITS_PER_UNIT * size, 0).require () (the former is
> int_mode_for_size (...).require ()) and thus an integral mode.
> The checks verify that the SSA_NAME doesn't have that mode, but that rhs1
> of the statement does have the right mode.
> As mode is integral, I think that rules out FIX_TRUNC_EXPR, which
> would need a real mode on the argument, and as for VIEW_CONVERT_EXPR,
> if the VCE is in the operand itself, then it ought to have the same
> mode as the lhs.
> I believe the only case where exp doesn't already have the desired
> mode is the char/short to int promotion; otherwise all the builtins are
> prototyped and thus one shouldn't see some unrelated mode.
> 
> But if you want some extra checks just to be sure, I can add them, whether
> that means checking only for CONVERT_EXPR_CODE_P instead of all 4 rhs codes,
> and/or checking that both types are integral scalars and that the conversion
> is an extension (by comparing type precisions).

I see the function is only used from atomics expansion but still
I would feel more comfortable with checking for just
CONVERT_EXPR_CODE_P and both integral types plus the precision bits.
You never know how these functions end up re-used ...

OK with those changes.

Thanks,
Richard.

Patch

--- gcc/builtins.c.jj	2020-06-22 10:59:15.000000000 +0200
+++ gcc/builtins.c	2020-07-13 12:37:56.519653940 +0200
@@ -73,6 +73,9 @@ along with GCC; see the file COPYING3.
 #include "gomp-constants.h"
 #include "omp-general.h"
 #include "tree-dfa.h"
+#include "gimple-ssa.h"
+#include "tree-ssa-live.h"
+#include "tree-outof-ssa.h"
 
 struct target_builtins default_target_builtins;
 #if SWITCHABLE_TARGET
@@ -6671,6 +6674,21 @@ expand_expr_force_mode (tree exp, machin
   rtx val;
   machine_mode old_mode;
 
+  if (TREE_CODE (exp) == SSA_NAME
+      && TYPE_MODE (TREE_TYPE (exp)) != mode)
+    {
+      /* Undo argument promotion if possible, as combine might not
+	 be able to do it later due to MEM_VOLATILE_P uses in the
+	 patterns.  */
+      gimple *g = get_gimple_for_ssa_name (exp);
+      if (g && gimple_assign_cast_p (g))
+	{
+	  tree rhs = gimple_assign_rhs1 (g);
+	  if (TYPE_MODE (TREE_TYPE (rhs)) == mode)
+	    exp = rhs;
+	}
+    }
+
   val = expand_expr (exp, NULL_RTX, mode, EXPAND_NORMAL);
   /* If VAL is promoted to a wider mode, convert it back to MODE.  Take care
      of CONST_INTs, where we know the old_mode only from the call argument.  */
--- gcc/testsuite/gcc.target/i386/pr96176.c.jj	2020-07-13 12:44:15.940149701 +0200
+++ gcc/testsuite/gcc.target/i386/pr96176.c	2020-07-13 12:45:07.822396980 +0200
@@ -0,0 +1,13 @@
+/* PR target/96176 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "\tmovzbl\t" } } */
+
+unsigned char v;
+
+void
+foo (unsigned char *x, unsigned char y, unsigned char z)
+{
+  __atomic_compare_exchange_n (x, &y, z, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+  v = y;
+}