diff mbox series

[x86_64] PR target/113690: Fix-up MULT REG_EQUAL notes in STV.

Message ID 00b701da57c9$b7444a80$25ccdf80$@nextmovesoftware.com
State New
Headers show
Series [x86_64] PR target/113690: Fix-up MULT REG_EQUAL notes in STV. | expand

Commit Message

Roger Sayle Feb. 5, 2024, 12:24 a.m. UTC
This patch fixes PR target/113690, an ICE-on-valid regression on x86_64
that exhibits with a specific combination of command line options.  The
cause is that x86's scalar-to-vector pass converts a chain of instructions
from TImode to V1TImode, but fails to appropriately update the attached
REG_EQUAL note.  Given that multiplication isn't supported in V1TImode,
the REG_NOTE handling code wasn't expecting to see a MULT.  Easily solved
with additional handling for other binary operators that may potentially
(in future) have an immediate constant as the second operand that needs
handling.  For convenience, this code (re)factors the logic to convert
a TImode constant into a V1TImode constant vector into a subroutine and
reuses it.

For the record, STV is actually doing something useful in this strange
testcase,  GCC with -O2 -fno-dce -fno-forward-propagate
-fno-split-wide-types
-funroll-loops generates:

foo:    movl    $v, %eax
        pxor    %xmm0, %xmm0
        movaps  %xmm0, 48(%rax)
        movaps  %xmm0, (%rax)
        movaps  %xmm0, 16(%rax)
        movaps  %xmm0, 32(%rax)
        ret

With the addition of -mno-stv (to disable the patched code) it gives:

foo:    movl    $v, %eax
        movq    $0, 48(%rax)
        movq    $0, 56(%rax)
        movq    $0, (%rax)
        movq    $0, 8(%rax)
        movq    $0, 16(%rax)
        movq    $0, 24(%rax)
        movq    $0, 32(%rax)
        movq    $0, 40(%rax)
        ret


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-02-05  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR target/113690
        * config/i386/i386-features.cc (timode_convert_cst): New helper
        function to convert a TImode CONST_SCALAR_INT_P to a V1TImode
        CONST_VECTOR.
        (timode_scalar_chain::convert_op): Use timode_convert_cst.
        (timode_scalar_chain::convert_insn): If a REG_EQUAL note contains
        a binary operator where the second operand is an immediate integer
        constant, convert it to V1TImode using timode_convert_cst.
        Use timode_convert_cst.

gcc/testsuite/ChangeLog
        PR target/113690
        * gcc.target/i386/pr113690.c: New test case.


Thanks in advance,
Roger
--

Comments

Uros Bizjak Feb. 5, 2024, 8:06 a.m. UTC | #1
On Mon, Feb 5, 2024 at 1:24 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch fixes PR target/113690, an ICE-on-valid regression on x86_64
> that exhibits with a specific combination of command line options.  The
> cause is that x86's scalar-to-vector pass converts a chain of instructions
> from TImode to V1TImode, but fails to appropriately update the attached
> REG_EQUAL note.  Given that multiplication isn't supported in V1TImode,
> the REG_NOTE handling code wasn't expecting to see a MULT.  Easily solved
> with additional handling for other binary operators that may potentially
> (in future) have an immediate constant as the second operand that needs
> handling.  For convenience, this code (re)factors the logic to convert
> a TImode constant into a V1TImode constant vector into a subroutine and
> reuses it.
>
> For the record, STV is actually doing something useful in this strange
> testcase,  GCC with -O2 -fno-dce -fno-forward-propagate
> -fno-split-wide-types
> -funroll-loops generates:
>
> foo:    movl    $v, %eax
>         pxor    %xmm0, %xmm0
>         movaps  %xmm0, 48(%rax)
>         movaps  %xmm0, (%rax)
>         movaps  %xmm0, 16(%rax)
>         movaps  %xmm0, 32(%rax)
>         ret
>
> With the addition of -mno-stv (to disable the patched code) it gives:
>
> foo:    movl    $v, %eax
>         movq    $0, 48(%rax)
>         movq    $0, 56(%rax)
>         movq    $0, (%rax)
>         movq    $0, 8(%rax)
>         movq    $0, 16(%rax)
>         movq    $0, 24(%rax)
>         movq    $0, 32(%rax)
>         movq    $0, 40(%rax)
>         ret
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-02-05  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/113690
>         * config/i386/i386-features.cc (timode_convert_cst): New helper
>         function to convert a TImode CONST_SCALAR_INT_P to a V1TImode
>         CONST_VECTOR.
>         (timode_scalar_chain::convert_op): Use timode_convert_cst.
>         (timode_scalar_chain::convert_insn): If a REG_EQUAL note contains
>         a binary operator where the second operand is an immediate integer
>         constant, convert it to V1TImode using timode_convert_cst.
>         Use timode_convert_cst.
>
> gcc/testsuite/ChangeLog
>         PR target/113690
>         * gcc.target/i386/pr113690.c: New test case.

OK.

Thanks,
Uros.

>
>
> Thanks in advance,
> Roger
> --
>
Uros Bizjak Feb. 5, 2024, 8:16 a.m. UTC | #2
On Mon, Feb 5, 2024 at 9:06 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Feb 5, 2024 at 1:24 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
> >
> >
> > This patch fixes PR target/113690, an ICE-on-valid regression on x86_64
> > that exhibits with a specific combination of command line options.  The
> > cause is that x86's scalar-to-vector pass converts a chain of instructions
> > from TImode to V1TImode, but fails to appropriately update the attached
> > REG_EQUAL note.  Given that multiplication isn't supported in V1TImode,
> > the REG_NOTE handling code wasn't expecting to see a MULT.  Easily solved
> > with additional handling for other binary operators that may potentially
> > (in future) have an immediate constant as the second operand that needs
> > handling.  For convenience, this code (re)factors the logic to convert
> > a TImode constant into a V1TImode constant vector into a subroutine and
> > reuses it.
> >
> > For the record, STV is actually doing something useful in this strange
> > testcase,  GCC with -O2 -fno-dce -fno-forward-propagate
> > -fno-split-wide-types
> > -funroll-loops generates:
> >
> > foo:    movl    $v, %eax
> >         pxor    %xmm0, %xmm0
> >         movaps  %xmm0, 48(%rax)
> >         movaps  %xmm0, (%rax)
> >         movaps  %xmm0, 16(%rax)
> >         movaps  %xmm0, 32(%rax)
> >         ret
> >
> > With the addition of -mno-stv (to disable the patched code) it gives:
> >
> > foo:    movl    $v, %eax
> >         movq    $0, 48(%rax)
> >         movq    $0, 56(%rax)
> >         movq    $0, (%rax)
> >         movq    $0, 8(%rax)
> >         movq    $0, 16(%rax)
> >         movq    $0, 24(%rax)
> >         movq    $0, 32(%rax)
> >         movq    $0, 40(%rax)
> >         ret
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2024-02-05  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR target/113690
> >         * config/i386/i386-features.cc (timode_convert_cst): New helper
> >         function to convert a TImode CONST_SCALAR_INT_P to a V1TImode
> >         CONST_VECTOR.
> >         (timode_scalar_chain::convert_op): Use timode_convert_cst.
> >         (timode_scalar_chain::convert_insn): If a REG_EQUAL note contains
> >         a binary operator where the second operand is an immediate integer
> >         constant, convert it to V1TImode using timode_convert_cst.
> >         Use timode_convert_cst.
> >
> > gcc/testsuite/ChangeLog
> >         PR target/113690
> >         * gcc.target/i386/pr113690.c: New test case.
>
> OK.

OTOH, how about we follow the approach from
general_scalar_chain::convert_insn and just kill the note?

Uros.
diff mbox series

Patch

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 4020b27..90ada7d 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -1749,6 +1749,19 @@  timode_scalar_chain::fix_debug_reg_uses (rtx reg)
     }
 }
 
+/* Helper function to convert immediate constant X to V1TImode.  */
+static rtx
+timode_convert_cst (rtx x)
+{
+  /* Prefer all ones vector in case of -1.  */
+  if (constm1_operand (x, TImode))
+    return CONSTM1_RTX (V1TImode);
+
+  rtx *v = XALLOCAVEC (rtx, 1);
+  v[0] = x;
+  return gen_rtx_CONST_VECTOR (V1TImode, gen_rtvec_v (1, v));
+}
+
 /* Convert operand OP in INSN from TImode to V1TImode.  */
 
 void
@@ -1775,18 +1788,8 @@  timode_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
     }
   else if (CONST_SCALAR_INT_P (*op))
     {
-      rtx vec_cst;
       rtx tmp = gen_reg_rtx (V1TImode);
-
-      /* Prefer all ones vector in case of -1.  */
-      if (constm1_operand (*op, TImode))
-	vec_cst = CONSTM1_RTX (V1TImode);
-      else
-	{
-	  rtx *v = XALLOCAVEC (rtx, 1);
-	  v[0] = *op;
-	  vec_cst = gen_rtx_CONST_VECTOR (V1TImode, gen_rtvec_v (1, v));
-	}
+      rtx vec_cst = timode_convert_cst (*op);
 
       if (!standard_sse_constant_p (vec_cst, V1TImode))
 	{
@@ -1830,12 +1833,28 @@  timode_scalar_chain::convert_insn (rtx_insn *insn)
 	  tmp = find_reg_equal_equiv_note (insn);
 	  if (tmp)
 	    {
-	      if (GET_MODE (XEXP (tmp, 0)) == TImode)
-		PUT_MODE (XEXP (tmp, 0), V1TImode);
-	      else if (CONST_SCALAR_INT_P (XEXP (tmp, 0)))
-		XEXP (tmp, 0)
-		  = gen_rtx_CONST_VECTOR (V1TImode,
-					  gen_rtvec (1, XEXP (tmp, 0)));
+	      rtx expr = XEXP (tmp, 0);
+	      if (GET_MODE (expr) == TImode)
+		{
+		  PUT_MODE (expr, V1TImode);
+		  switch (GET_CODE (expr))
+		    {
+		    case PLUS:
+		    case MINUS:
+		    case MULT:
+		    case AND:
+		    case IOR:
+		    case XOR:
+		      if (CONST_SCALAR_INT_P (XEXP (expr, 1)))
+			XEXP (expr, 1) = timode_convert_cst (XEXP (expr, 1));
+		      break;
+
+		    default:
+		      break;
+		    }
+		}
+	      else if (CONST_SCALAR_INT_P (expr))
+		XEXP (tmp, 0) = timode_convert_cst (expr);
 	    }
 	}
       break;
@@ -1876,7 +1895,7 @@  timode_scalar_chain::convert_insn (rtx_insn *insn)
 	    }
 	  else
 	    {
-	      src = gen_rtx_CONST_VECTOR (V1TImode, gen_rtvec (1, src));
+	      src = timode_convert_cst (src);
 	      src = validize_mem (force_const_mem (V1TImode, src));
 	      use_move = MEM_P (dst);
 	    }
diff --git a/gcc/testsuite/gcc.target/i386/pr113690.c b/gcc/testsuite/gcc.target/i386/pr113690.c
new file mode 100644
index 0000000..23a1108
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr113690.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -fno-dce -fno-forward-propagate -fno-split-wide-types -funroll-loops" } */
+int i;
+__attribute__((__vector_size__(64))) __int128 v;
+
+void
+foo(void)
+{
+  v <<= 127;
+  __builtin_mul_overflow(0, i, &v[3]);
+  v *= 6;
+}