diff mbox series

lower-subreg: Fix ROTATE handling [PR114211]

Message ID ZebGequZsnqC9wQt@tucnak
State New
Headers show
Series lower-subreg: Fix ROTATE handling [PR114211] | expand

Commit Message

Jakub Jelinek March 5, 2024, 7:15 a.m. UTC
Hi!

On the following testcase, we have
(insn 10 7 11 2 (set (reg/v:TI 106 [ h ])
        (rotate:TI (reg/v:TI 106 [ h ])
            (const_int 64 [0x40]))) "pr114211.c":8:5 1042 {rotl64ti2_doubleword}
     (nil))
before subreg1 and the pass decides to use
(reg:DI 127 [ h ]) / (reg:DI 128 [ h+8 ])
register pair instead of (reg/v:TI 106 [ h ]).
resolve_operand_for_swap_move_operator implements it by pretending it is
an assignment from
(concatn (reg:DI 127 [ h ]) (reg:DI 128 [ h+8 ]))
to
(concatn (reg:DI 128 [ h+8 ]) (reg:DI 127 [ h ]))
The problem is that if the rotate argument is the same as destination or
if there is even an overlap between the first half of the destination with
second half of the source we emit incorrect code, because the store to
(reg:DI 128 [ h+8 ]) overwrites what we need for source of the second
move.  THe following patch detects that case and uses a temporary pseudo
to hold the original (reg:DI 128 [ h+8 ]) value across the first store.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-05  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/114211
	* lower-subreg.cc (resolve_simple_move): For double-word
	rotates by BITS_PER_WORD if there is overlap between source
	and destination use a temporary.

	* gcc.dg/pr114211.c: New test.


	Jakub

Comments

Richard Biener March 5, 2024, 8:29 a.m. UTC | #1
On Tue, 5 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> On the following testcase, we have
> (insn 10 7 11 2 (set (reg/v:TI 106 [ h ])
>         (rotate:TI (reg/v:TI 106 [ h ])
>             (const_int 64 [0x40]))) "pr114211.c":8:5 1042 {rotl64ti2_doubleword}
>      (nil))
> before subreg1 and the pass decides to use
> (reg:DI 127 [ h ]) / (reg:DI 128 [ h+8 ])
> register pair instead of (reg/v:TI 106 [ h ]).
> resolve_operand_for_swap_move_operator implements it by pretending it is
> an assignment from
> (concatn (reg:DI 127 [ h ]) (reg:DI 128 [ h+8 ]))
> to
> (concatn (reg:DI 128 [ h+8 ]) (reg:DI 127 [ h ]))
> The problem is that if the rotate argument is the same as destination or
> if there is even an overlap between the first half of the destination with
> second half of the source we emit incorrect code, because the store to
> (reg:DI 128 [ h+8 ]) overwrites what we need for source of the second
> move.  THe following patch detects that case and uses a temporary pseudo
> to hold the original (reg:DI 128 [ h+8 ]) value across the first store.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

I wonder if we need to care about extra temporaries on RTL before
RA, thus whether always using a temporary would be OK?

Thanks,
Richard.

> 2024-03-05  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR rtl-optimization/114211
> 	* lower-subreg.cc (resolve_simple_move): For double-word
> 	rotates by BITS_PER_WORD if there is overlap between source
> 	and destination use a temporary.
> 
> 	* gcc.dg/pr114211.c: New test.
> 
> --- gcc/lower-subreg.cc.jj	2024-01-03 11:51:33.713700906 +0100
> +++ gcc/lower-subreg.cc	2024-03-04 20:29:13.911428988 +0100
> @@ -927,6 +927,21 @@ resolve_simple_move (rtx set, rtx_insn *
>  	     SRC's operator.  */
>  	  dest = resolve_operand_for_swap_move_operator (dest);
>  	  src = src_op;
> +	  if (resolve_reg_p (src))
> +	    {
> +	      gcc_assert (GET_CODE (src) == CONCATN);
> +	      if (reg_overlap_mentioned_p (XVECEXP (dest, 0, 0),
> +					   XVECEXP (src, 0, 1)))
> +		{
> +		  /* If there is overlap betwee the first half of the
> +		     destination and what will be stored to the second one,
> +		     use a temporary pseudo.  See PR114211.  */
> +		  rtx tem = gen_reg_rtx (GET_MODE (XVECEXP (src, 0, 1)));
> +		  emit_move_insn (tem, XVECEXP (src, 0, 1));
> +		  src = copy_rtx (src);
> +		  XVECEXP (src, 0, 1) = tem;
> +		}
> +	    }
>  	}
>        else if (resolve_reg_p (src_op))
>  	{
> --- gcc/testsuite/gcc.dg/pr114211.c.jj	2024-03-04 20:37:58.735339443 +0100
> +++ gcc/testsuite/gcc.dg/pr114211.c	2024-03-04 20:37:33.666678077 +0100
> @@ -0,0 +1,23 @@
> +/* PR rtl-optimization/114211 */
> +/* { dg-do run { target int128 } } */
> +/* { dg-options "-O -fno-tree-coalesce-vars -Wno-psabi" } */
> +
> +typedef unsigned __int128 V __attribute__((__vector_size__ (16)));
> +unsigned int u;
> +V v;
> +
> +V
> +foo (unsigned __int128 h)
> +{
> +  h = h << 64 | h >> 64;
> +  h *= ~u;
> +  return h + v;
> +}
> +
> +int
> +main ()
> +{
> +  V x = foo (1);
> +  if (x[0] != (unsigned __int128) 0xffffffff << 64)
> +    __builtin_abort ();
> +}
> 
> 	Jakub
> 
>
Jakub Jelinek March 5, 2024, 8:45 a.m. UTC | #2
On Tue, Mar 05, 2024 at 09:29:38AM +0100, Richard Biener wrote:
> I wonder if we need to care about extra temporaries on RTL before
> RA, thus whether always using a temporary would be OK?

I'd still need to do the resolve_reg_p check, otherwise if it is e.g. a
memory, the copying to temporary would be significantly harder because
src isn't a CONCATN in that case.
I think the reg_overlap_mentioned_p is worth it, because the normal
case is that there is no overlap, usually expansion forces distinct
pseudos.  If this was more common problem, it wouldn't take so long
to discover it and wouldn't need -fno-tree-coalesce-vars.
And the extra pseudo could result in different RA decisions.

	Jakub
diff mbox series

Patch

--- gcc/lower-subreg.cc.jj	2024-01-03 11:51:33.713700906 +0100
+++ gcc/lower-subreg.cc	2024-03-04 20:29:13.911428988 +0100
@@ -927,6 +927,21 @@  resolve_simple_move (rtx set, rtx_insn *
 	     SRC's operator.  */
 	  dest = resolve_operand_for_swap_move_operator (dest);
 	  src = src_op;
+	  if (resolve_reg_p (src))
+	    {
+	      gcc_assert (GET_CODE (src) == CONCATN);
+	      if (reg_overlap_mentioned_p (XVECEXP (dest, 0, 0),
+					   XVECEXP (src, 0, 1)))
+		{
+		  /* If there is overlap betwee the first half of the
+		     destination and what will be stored to the second one,
+		     use a temporary pseudo.  See PR114211.  */
+		  rtx tem = gen_reg_rtx (GET_MODE (XVECEXP (src, 0, 1)));
+		  emit_move_insn (tem, XVECEXP (src, 0, 1));
+		  src = copy_rtx (src);
+		  XVECEXP (src, 0, 1) = tem;
+		}
+	    }
 	}
       else if (resolve_reg_p (src_op))
 	{
--- gcc/testsuite/gcc.dg/pr114211.c.jj	2024-03-04 20:37:58.735339443 +0100
+++ gcc/testsuite/gcc.dg/pr114211.c	2024-03-04 20:37:33.666678077 +0100
@@ -0,0 +1,23 @@ 
+/* PR rtl-optimization/114211 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O -fno-tree-coalesce-vars -Wno-psabi" } */
+
+typedef unsigned __int128 V __attribute__((__vector_size__ (16)));
+unsigned int u;
+V v;
+
+V
+foo (unsigned __int128 h)
+{
+  h = h << 64 | h >> 64;
+  h *= ~u;
+  return h + v;
+}
+
+int
+main ()
+{
+  V x = foo (1);
+  if (x[0] != (unsigned __int128) 0xffffffff << 64)
+    __builtin_abort ();
+}