[AArch64] Add special case when expanding vcond with arms {-1, -1}, {0, 0}.

Submitter James Greenhalgh
Date April 30, 2013, 3:06 p.m.
Message ID <1367334387-22610-1-git-send-email-james.greenhalgh@arm.com>
Permalink /patch/240638/
State New

Comments

James Greenhalgh - April 30, 2013, 3:06 p.m.
If the end goal of a VEC_COND_EXPR is to pick between
{-1, -1, -1, -1} and {0, 0, 0, 0},
then we do not need to do a bit select; this is just
a move of the generated mask to the result operand.

This patch checks for this case and emits the
appropriate instructions. This can save us loading the
two constant masks and performing a bsl.
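
For illustration (an example of mine, not part of the patch), GCC's
generic vector extensions produce exactly this pattern, since a vector
comparison already yields -1 in each true lane and 0 in each false lane:

typedef int v4si __attribute__ ((vector_size (16)));

/* Should compile to a single cmge; with this patch the comparison
   mask is moved straight to the result, with no movi constants and
   no bsl.  */
v4si
mask_ge (v4si a, v4si b)
{
  return a >= b;
}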

The midend turns GE_EXPR-style tree codes into VEC_COND_EXPRs,
so performing this folding earlier in the compiler is not helpful.
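
Schematically, the expander therefore sees something like the following
(illustrative GIMPLE; the SSA names are invented):

  _1 = VEC_COND_EXPR <a >= b, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;

so the back end is the natural place to catch the all-ones/all-zeros arms.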

Regression tested on aarch64-none-elf with no regressions.

OK?

Thanks,
James

---
gcc/

2013-04-30  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64-simd.md
	(vcond<mode>_internal): Handle special cases for constant masks.
	(vcond<mode><mode>): Allow nonmemory_operands for outcome vectors.
	(vcondu<mode><mode>): Likewise.
	(vcond<v_cmp_result><mode>): New.
Marcus Shawcroft - April 30, 2013, 5:22 p.m.
OK
/Marcus

Marc Glisse - April 30, 2013, 9:06 p.m.
On Tue, 30 Apr 2013, James Greenhalgh wrote:

> If the end goal of a VEC_COND_EXPR is to pick between
> {-1, -1, -1, -1} and {0, 0, 0, 0},
> then we do not need to do a bit select; this is just
> a move of the generated mask to the result operand.
>
> This patch checks for this case and emits the
> appropriate instructions. This can save us loading the
> two constant masks and performing a bsl.

Some other targets (x86) optimize:
(a cmp b) ? x : 0
to
(a cmp b) & x
and then I think they rely on the generic simplify-rtx to optimize AND 
with -1.

(it is also possible to replace (a cmp b) ? -1 : x with (a cmp b) | x)
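
For illustration, both rewrites are easy to express with the generic
vector extensions, assuming lane-wise 0/-1 comparison results (these
functions are sketches, not part of any patch):

typedef int v4si __attribute__ ((vector_size (16)));

/* (a cmp b) ? x : 0  can become  (a cmp b) & x.  */
v4si
cond_and (v4si a, v4si b, v4si x)
{
  return (a > b) & x;
}

/* (a cmp b) ? -1 : x  can become  (a cmp b) | x.  */
v4si
cond_ior (v4si a, v4si b, v4si x)
{
  return (a > b) | x;
}

With x == {-1, -1, -1, -1} the AND then disappears entirely, giving the
same mask move as this patch.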

Since several targets are in the same situation, it might make sense to 
have those optimizations in the generic code with a way for targets to 
advertise that their comparisons return 0/-1 the same size as the 
arguments (using VECTOR_STORE_FLAG_VALUE?).
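
As a sketch (not something this patch adds), such a target could
advertise that property along these lines in its target header:

/* Each true element of a vector comparison result is all ones,
   i.e. -1 in the element mode.  */
#define VECTOR_STORE_FLAG_VALUE(MODE) CONSTM1_RTX (GET_MODE_INNER (MODE))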

(to be clear: I am not at all arguing against this patch)

Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 3893444..dfe4acb 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1683,11 +1683,13 @@ 
 	  (match_operator 3 "comparison_operator"
 	    [(match_operand:VDQ 4 "register_operand")
 	     (match_operand:VDQ 5 "nonmemory_operand")])
-	  (match_operand:VDQ 1 "register_operand")
-	  (match_operand:VDQ 2 "register_operand")))]
+	  (match_operand:VDQ 1 "nonmemory_operand")
+	  (match_operand:VDQ 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
   int inverse = 0, has_zero_imm_form = 0;
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
   rtx mask = gen_reg_rtx (<MODE>mode);
 
   switch (GET_CODE (operands[3]))
@@ -1746,11 +1748,26 @@ 
     }
 
   if (inverse)
-    emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], mask, operands[2],
-				    operands[1]));
-  else
-    emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], mask, operands[1],
-				    operands[2]));
+    {
+      op1 = operands[2];
+      op2 = operands[1];
+    }
+
+  /* If we have (a = (b CMP c) ? -1 : 0), then we can simply
+     move the generated mask.  */
+
+  if (op1 == CONSTM1_RTX (<V_cmp_result>mode)
+      && op2 == CONST0_RTX (<V_cmp_result>mode))
+    emit_move_insn (operands[0], mask);
+  else
+    {
+      if (!REG_P (op1))
+        op1 = force_reg (<MODE>mode, op1);
+      if (!REG_P (op2))
+        op2 = force_reg (<MODE>mode, op2);
+      emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], mask,
+                                             op1, op2));
+    }
 
   DONE;
 })
@@ -1761,13 +1778,15 @@ 
 	  (match_operator 3 "comparison_operator"
 	    [(match_operand:VDQF 4 "register_operand")
 	     (match_operand:VDQF 5 "nonmemory_operand")])
-	  (match_operand:VDQF 1 "register_operand")
-	  (match_operand:VDQF 2 "register_operand")))]
+	  (match_operand:VDQF 1 "nonmemory_operand")
+	  (match_operand:VDQF 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
   int inverse = 0;
   int use_zero_form = 0;
   int swap_bsl_operands = 0;
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
   rtx mask = gen_reg_rtx (<V_cmp_result>mode);
   rtx tmp = gen_reg_rtx (<V_cmp_result>mode);
 
@@ -1912,11 +1931,27 @@ 
     }
 
   if (swap_bsl_operands)
-    emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], mask, operands[2],
-				    operands[1]));
-  else
-    emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], mask, operands[1],
-				    operands[2]));
+    {
+      op1 = operands[2];
+      op2 = operands[1];
+    }
+
+  /* If we have (a = (b CMP c) ? -1 : 0), then we can simply
+     move the generated mask.  */
+
+  if (op1 == CONSTM1_RTX (<V_cmp_result>mode)
+      && op2 == CONST0_RTX (<V_cmp_result>mode))
+    emit_move_insn (operands[0], mask);
+  else
+    {
+      if (!REG_P (op1))
+        op1 = force_reg (<MODE>mode, op1);
+      if (!REG_P (op2))
+        op2 = force_reg (<MODE>mode, op2);
+      emit_insn (gen_aarch64_simd_bsl<mode> (operands[0], mask,
+                                             op1, op2));
+    }
+
   DONE;
 })
 
@@ -1926,8 +1961,8 @@ 
 	  (match_operator 3 "comparison_operator"
 	    [(match_operand:VALL 4 "register_operand")
 	     (match_operand:VALL 5 "nonmemory_operand")])
-	  (match_operand:VALL 1 "register_operand")
-	  (match_operand:VALL 2 "register_operand")))]
+	  (match_operand:VALL 1 "nonmemory_operand")
+	  (match_operand:VALL 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
   emit_insn (gen_aarch64_vcond_internal<mode> (operands[0], operands[1],
@@ -1936,6 +1971,22 @@ 
   DONE;
 })
 
+(define_expand "vcond<v_cmp_result><mode>"
+  [(set (match_operand:<V_cmp_result> 0 "register_operand")
+	(if_then_else:<V_cmp_result>
+	  (match_operator 3 "comparison_operator"
+	    [(match_operand:VDQF 4 "register_operand")
+	     (match_operand:VDQF 5 "nonmemory_operand")])
+	  (match_operand:<V_cmp_result> 1 "nonmemory_operand")
+	  (match_operand:<V_cmp_result> 2 "nonmemory_operand")))]
+  "TARGET_SIMD"
+{
+  emit_insn (gen_aarch64_vcond_internal<v_cmp_result> (
+						operands[0], operands[1],
+						operands[2], operands[3],
+						operands[4], operands[5]));
+  DONE;
+})
 
 (define_expand "vcondu<mode><mode>"
   [(set (match_operand:VDQ 0 "register_operand")
@@ -1943,8 +1994,8 @@ 
 	  (match_operator 3 "comparison_operator"
 	    [(match_operand:VDQ 4 "register_operand")
 	     (match_operand:VDQ 5 "nonmemory_operand")])
-	  (match_operand:VDQ 1 "register_operand")
-	  (match_operand:VDQ 2 "register_operand")))]
+	  (match_operand:VDQ 1 "nonmemory_operand")
+	  (match_operand:VDQ 2 "nonmemory_operand")))]
   "TARGET_SIMD"
 {
   emit_insn (gen_aarch64_vcond_internal<mode> (operands[0], operands[1],