diff mbox series

PR tree-optimization/105668: Provide RTL expansion for VEC_COND_EXPR.

Message ID 011301d86e70$814ba340$83e2e9c0$@nextmovesoftware.com
State New
Headers show
Series PR tree-optimization/105668: Provide RTL expansion for VEC_COND_EXPR. | expand

Commit Message

Roger Sayle May 23, 2022, 6:44 a.m. UTC
This resolves PR tree-optimization/105668, a P1 ice-on-valid regression
triggered by my recent patch to add a vec_cmpeqv1tiv1ti define_expand
to the i386 backend.  The existence of this optab currently leads GCC
to incorrectly assume the existence of a corresponding vcond_mask for
V1TImode.

I believe the best solution (of the three possible fixes) is to allow
gimple_expand_vec_cond_expr to fail (return NULL) when a suitable optab
to generate a IFN_VCOND_MASK isn't available, but instead allow RTL
expansion to provide a default implementation using vector mode logic
operations.  On x86_64, the equivalent of a pblend can be generated in
three instructions using pand, pandn and pxor.  In fact, this fallback
implementation is already used in ix86_expand_sse_movcc when the -march
doesn't provide a suitable instruction.  This patch provides that
functionality to all targets in the middle-end, allowing the vectorizer(s)
to safely assume support for VEC_COND_EXPR (when the target has suitable
vector logic instructions).

I should point out (for the record) that the new expand_vec_cond_expr
function in expr.cc is very different from the function of the same name
removed by Matin Liska in June 2020.
https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547097.html
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=502d63b6d6141597bb18fd23c8
7736a1b384cf8f
That function simply expanded the vcond_mask optab and failed if it
wasn't available, which is currently the task of the gimple-isel pass.
The implementation here is a traditional RTL expander, sythesizing the
desired vector conditional move using bit-wise XOR and AND instructions
of the mask vector.

At some point in the future, gimple-isel could be enhanced to consider
alternative vector modes, as a V1TI blend/vec_cond_expr may be implemented
using V2DI, V4SI, V8HI or V16QI.  Alas, I couldn't figure out how to
conveniently iterate over the desired modes, so this enhancement is left
for a follow-up patch.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32} with
no new failures.  Ok for mainline?


2022-05-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR tree-optimization/105668
	* expr.cc (expand_vec_cond_expr): New function to expand
	VEC_COND_EXPR using vector mode logical instructions.
	(expand_expr_real_2) <case VEC_COND_EXPR>: Call the above.
	* gimple-isel.cc (gimple_expand_vec_cond_expr): Instead of
	asserting, retun NULL when the target's get_vcond_mask_icode
	returns CODE_FOR_nothing.

gcc/testsuite/ChangeLog
	PR tree-optimization/105668
	* gcc.target/i386/pr105668.c: New test case.

Roger
--

Comments

Richard Biener May 23, 2022, 12:35 p.m. UTC | #1
On Mon, May 23, 2022 at 8:44 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This resolves PR tree-optimization/105668, a P1 ice-on-valid regression
> triggered by my recent patch to add a vec_cmpeqv1tiv1ti define_expand
> to the i386 backend.  The existence of this optab currently leads GCC
> to incorrectly assume the existence of a corresponding vcond_mask for
> V1TImode.
>
> I believe the best solution (of the three possible fixes) is to allow
> gimple_expand_vec_cond_expr to fail (return NULL) when a suitable optab
> to generate a IFN_VCOND_MASK isn't available, but instead allow RTL
> expansion to provide a default implementation using vector mode logic
> operations.  On x86_64, the equivalent of a pblend can be generated in
> three instructions using pand, pandn and pxor.  In fact, this fallback
> implementation is already used in ix86_expand_sse_movcc when the -march
> doesn't provide a suitable instruction.  This patch provides that
> functionality to all targets in the middle-end, allowing the vectorizer(s)
> to safely assume support for VEC_COND_EXPR (when the target has suitable
> vector logic instructions).
>
> I should point out (for the record) that the new expand_vec_cond_expr
> function in expr.cc is very different from the function of the same name
> removed by Matin Liska in June 2020.
> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547097.html
> https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=502d63b6d6141597bb18fd23c8
> 7736a1b384cf8f
> That function simply expanded the vcond_mask optab and failed if it
> wasn't available, which is currently the task of the gimple-isel pass.
> The implementation here is a traditional RTL expander, sythesizing the
> desired vector conditional move using bit-wise XOR and AND instructions
> of the mask vector.
>
> At some point in the future, gimple-isel could be enhanced to consider
> alternative vector modes, as a V1TI blend/vec_cond_expr may be implemented
> using V2DI, V4SI, V8HI or V16QI.  Alas, I couldn't figure out how to
> conveniently iterate over the desired modes, so this enhancement is left
> for a follow-up patch.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32} with
> no new failures.  Ok for mainline?

No, first of all the purpose of ISEL is to get rid of _all_ VEC_COND_EXPRs.
So iff then this fallback would have to reside in the ISEL pass, replacing the
GIMPLE with target supported GIMPLE.

But then it is the task of tree-vect-generic.cc to turn not target supported
GIMPLE into target supported GIMPLE - and the issue in the PR in question
is that at its point we basically have _1 < _2 ? _3 : _4 which _is_ supported
by the target but passes inbetween vector lowering and ISEL hide
_1 < _2 via a PHI node and so the GIMPLE is no longer target supported.

That would be the thing to fix - I'll note we put us into the corner of
needing to keep the SSA def of the VEC_COND_EXPR condition
"next" (as in SSA def) to the VEC_COND_EXPR, something that's
difficult to maintain, especially when so man passes run inbetween.

So the solution might be to somehow move the two closer together,
maybe as much as merging the VEC_COND_EXPR part into
vector lowering itself (with the disadvantage of more difficult
to deal with IL).  Or alternatively have vectors lowered earlier
for those produced by user code and have "final" lowering done
as part of RTL expansion (so in ISEL then).

Richard.

>
> 2022-05-23  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR tree-optimization/105668
>         * expr.cc (expand_vec_cond_expr): New function to expand
>         VEC_COND_EXPR using vector mode logical instructions.
>         (expand_expr_real_2) <case VEC_COND_EXPR>: Call the above.
>         * gimple-isel.cc (gimple_expand_vec_cond_expr): Instead of
>         asserting, retun NULL when the target's get_vcond_mask_icode
>         returns CODE_FOR_nothing.
>
> gcc/testsuite/ChangeLog
>         PR tree-optimization/105668
>         * gcc.target/i386/pr105668.c: New test case.
>
> Roger
> --
>
diff mbox series

Patch

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 7197996..d130c4f 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -8931,6 +8931,53 @@  expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0,
   return expand_divmod (mod_p, code, mode, op0, op1, target, unsignedp);
 }
 
+/* Expand VEC_COND_EXPR TARGET = OP0 ? OP1 : OP2 as a sequence of
+   logical operations, typically (OP0 & OP1) | (~OP0 & OP2).  */
+
+static rtx
+expand_vec_cond_expr (machine_mode mode, rtx op0, rtx op1, rtx op2,
+		      rtx target)
+{
+  if (rtx_equal_p (op1, op2))
+    return op1;
+
+  if (op0 == CONST0_RTX (GET_MODE (op0)))
+    return op2;
+  if (vector_all_ones_operand (op0, GET_MODE (op0)))
+    return op1;
+
+  if (mode != GET_MODE (op0))
+    op0 = convert_modes (mode, GET_MODE (op0), op0, true);
+
+  if (op2 == CONST0_RTX (mode))
+    {
+      if (vector_all_ones_operand (op1, mode))
+	return op0;
+      return expand_simple_binop (mode, AND, op0, op1,
+				  target, 1, OPTAB_DIRECT);
+    }
+  else if (op1 == CONST0_RTX (mode))
+    {
+      rtx tmp = expand_simple_unop (mode, NOT, op0, target, 1);
+      if (vector_all_ones_operand (op2, mode))
+	return tmp;
+      return expand_simple_binop (mode, AND, tmp, op2,
+				  target, 1, OPTAB_DIRECT);
+    }
+  else if (vector_all_ones_operand (op1, mode))
+    return expand_simple_binop (mode, IOR, op0, op2,
+				target, 1, OPTAB_DIRECT);
+
+  rtx tmp1 = expand_simple_binop (mode, XOR, op1, op2,
+				  NULL_RTX, 1, OPTAB_DIRECT);
+  rtx tmp2 = expand_simple_binop (mode, AND, op0, tmp1,
+				  NULL_RTX, 1, OPTAB_DIRECT);
+  return expand_simple_binop (mode, XOR, tmp2, op2,
+			      target, 1, OPTAB_DIRECT);
+}
+
+/* sepops variant of expand_expr_real.  */
+
 rtx
 expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 		    enum expand_modifier modifier)
@@ -10302,6 +10349,12 @@  expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
       expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, modifier);
       return expand_vec_series_expr (mode, op0, op1, target);
 
+    case VEC_COND_EXPR:
+      op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
+      op1 = expand_expr (treeop1, NULL_RTX, VOIDmode, modifier);
+      op2 = expand_expr (treeop2, NULL_RTX, VOIDmode, modifier);
+      return expand_vec_cond_expr (mode, op0, op1, op2, target);
+
     case BIT_INSERT_EXPR:
       {
 	unsigned bitpos = tree_to_uhwi (treeop2);
diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 44d086d..62b8ba7 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -279,10 +279,11 @@  gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
 	}
 
       gcc_assert (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (op0))
-		  && can_compute_op0
-		  && (get_vcond_mask_icode (mode, TYPE_MODE (TREE_TYPE (op0)))
-		      != CODE_FOR_nothing));
-      return gimple_build_call_internal (IFN_VCOND_MASK, 3, op0, op1, op2);
+		  && can_compute_op0);
+      if (get_vcond_mask_icode (mode, TYPE_MODE (TREE_TYPE (op0)))
+	  != CODE_FOR_nothing)
+        return gimple_build_call_internal (IFN_VCOND_MASK, 3, op0, op1, op2);
+      return NULL;
     }
 
   tree tcode_tree = build_int_cst (integer_type_node, tcode);
diff --git a/gcc/testsuite/gcc.target/i386/pr105668.c b/gcc/testsuite/gcc.target/i386/pr105668.c
new file mode 100644
index 0000000..359c2b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105668.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O -ftracer -fno-tree-fre" } */
+
+typedef __int128 __attribute__((__vector_size__ (16))) V;
+
+int i;
+
+V
+foo (_Complex float f)
+{
+  (void) __builtin_atanhf (i);
+  V v = i != (V) { };
+  i ^= f && 8;
+  v %= 5;
+  return v;
+}