Two RTL CC tweaks for SVE pmore/plast conditions

Message ID mptpnhsym1p.fsf@arm.com
State New
Series
  • Two RTL CC tweaks for SVE pmore/plast conditions

Commit Message

Richard Sandiford Nov. 16, 2019, 1:42 p.m. UTC
SVE has two composite conditions:

  pmore == at least one bit set && last bit clear
  plast == no bits set || last bit set

So in general we generate them from:

  A: CC = test bits
  B: reg1 = first condition
  C: CC = test bits
  D: reg2 = second condition
  E: result = (reg1 op reg2)   where op is || or &&

To fold all this into a single test, we need to be able to remove
the redundant C (the cse.c patch) and then fold B, D and E down to
a single condition (the simplify-rtx.c patch).

The underlying conditions are unsigned, so the simplify-rtx.c part needs
to support both unsigned comparisons and AND.  However, to avoid opening
the can of worms that is ANDing FP comparisons for unordered inputs,
I've restricted the new AND handling to cases in which NaNs can be
ignored.  I think this is still a strict extension of what we have now;
it just doesn't go as far as it could.  Going further would need an
entirely different set of testcases, so I think it would make more sense
as separate work.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* cse.c (cse_insn): Delete no-op register moves too.
	* simplify-rtx.c (comparison_to_mask): Handle unsigned comparisons.
	Take a second comparison to control the value for NE.
	(mask_to_comparison): Handle unsigned comparisons.
	(simplify_logical_relational_operation): Likewise.  Update call
	to comparison_to_mask.  Handle AND if !HONOR_NANS.
	(simplify_binary_operation_1): Call the above for AND too.

gcc/testsuite/
	* gcc.target/aarch64/sve/acle/asm/ptest_pmore.c: New test.

Comments

Jeff Law Nov. 17, 2019, 4:58 p.m. UTC | #1
On 11/16/19 6:42 AM, Richard Sandiford wrote:
> SVE has two composite conditions:
> 
>   pmore == at least one bit set && last bit clear
>   plast == no bits set || last bit set
> 
> So in general we generate them from:
> 
>   A: CC = test bits
>   B: reg1 = first condition
>   C: CC = test bits
>   D: reg2 = second condition
>   E: result = (reg1 op reg2)   where op is || or &&
> 
> To fold all this into a single test, we need to be able to remove
> the redundant C (the cse.c patch) and then fold B, D and E down to
> a single condition (the simplify-rtx.c patch).
> 
> The underlying conditions are unsigned, so the simplify-rtx.c part needs
> to support both unsigned comparisons and AND.  However, to avoid opening
> the can of worms that is ANDing FP comparisons for unordered inputs,
> I've restricted the new AND handling to cases in which NaNs can be
> ignored.  I think this is still a strict extension of what we have now;
> it just doesn't go as far as it could.  Going further would need an
> entirely different set of testcases, so I think it would make more sense
> as separate work.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2019-11-16  Richard Sandiford  <richard.sandiford@arm.com>
> 
> gcc/
> 	* cse.c (cse_insn): Delete no-op register moves too.
> 	* simplify-rtx.c (comparison_to_mask): Handle unsigned comparisons.
> 	Take a second comparison to control the value for NE.
> 	(mask_to_comparison): Handle unsigned comparisons.
> 	(simplify_logical_relational_operation): Likewise.  Update call
> 	to comparison_to_mask.  Handle AND if !HONOR_NANS.
> 	(simplify_binary_operation_1): Call the above for AND too.
> 
> gcc/testsuite/
> 	* gcc.target/aarch64/sve/acle/asm/ptest_pmore.c: New test.
OK
jeff

Patch

Index: gcc/cse.c
===================================================================
--- gcc/cse.c	2019-11-16 13:42:09.000000000 +0000
+++ gcc/cse.c	2019-11-16 13:42:09.653773983 +0000
@@ -4625,7 +4625,7 @@  cse_insn (rtx_insn *insn)
   for (i = 0; i < n_sets; i++)
     {
       bool repeat = false;
-      bool mem_noop_insn = false;
+      bool noop_insn = false;
       rtx src, dest;
       rtx src_folded;
       struct table_elt *elt = 0, *p;
@@ -5324,17 +5324,17 @@  cse_insn (rtx_insn *insn)
 	    }
 
 	  /* Similarly, lots of targets don't allow no-op
-	     (set (mem x) (mem x)) moves.  */
+	     (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))
+	     might be impossible for certain registers (like CC registers).  */
 	  else if (n_sets == 1
-		   && MEM_P (trial)
-		   && MEM_P (dest)
+		   && (MEM_P (trial) || REG_P (trial))
 		   && rtx_equal_p (trial, dest)
 		   && !side_effects_p (dest)
 		   && (cfun->can_delete_dead_exceptions
 		       || insn_nothrow_p (insn)))
 	    {
 	      SET_SRC (sets[i].rtl) = trial;
-	      mem_noop_insn = true;
+	      noop_insn = true;
 	      break;
 	    }
 
@@ -5562,8 +5562,8 @@  cse_insn (rtx_insn *insn)
 	  sets[i].rtl = 0;
 	}
 
-      /* Similarly for no-op MEM moves.  */
-      else if (mem_noop_insn)
+      /* Similarly for no-op moves.  */
+      else if (noop_insn)
 	{
 	  if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn))
 	    cse_cfg_altered = true;
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2019-11-16 13:42:09.000000000 +0000
+++ gcc/simplify-rtx.c	2019-11-16 13:42:09.653773983 +0000
@@ -2125,12 +2125,17 @@  simplify_associative_operation (enum rtx
   return 0;
 }
 
-/* Return a mask describing the COMPARISON.  */
+/* Return a mask describing the COMPARISON.  Treat NE as unsigned
+   if OTHER_COMPARISON is.  */
 static int
-comparison_to_mask (enum rtx_code comparison)
+comparison_to_mask (rtx_code comparison, rtx_code other_comparison)
 {
   switch (comparison)
     {
+    case LTU:
+      return 32;
+    case GTU:
+      return 16;
     case LT:
       return 8;
     case GT:
@@ -2140,6 +2145,10 @@  comparison_to_mask (enum rtx_code compar
     case UNORDERED:
       return 1;
 
+    case LEU:
+      return 34;
+    case GEU:
+      return 18;
     case LTGT:
       return 12;
     case LE:
@@ -2156,7 +2165,10 @@  comparison_to_mask (enum rtx_code compar
     case ORDERED:
       return 14;
     case NE:
-      return 13;
+      return (other_comparison == LTU
+	      || other_comparison == LEU
+	      || other_comparison == GTU
+	      || other_comparison == GEU ? 48 : 13);
     case UNLE:
       return 11;
     case UNGE:
@@ -2173,6 +2185,10 @@  mask_to_comparison (int mask)
 {
   switch (mask)
     {
+    case 32:
+      return LTU;
+    case 16:
+      return GTU;
     case 8:
       return LT;
     case 4:
@@ -2182,6 +2198,10 @@  mask_to_comparison (int mask)
     case 1:
       return UNORDERED;
 
+    case 34:
+      return LEU;
+    case 18:
+      return GEU;
     case 12:
       return LTGT;
     case 10:
@@ -2197,6 +2217,7 @@  mask_to_comparison (int mask)
 
     case 14:
       return ORDERED;
+    case 48:
     case 13:
       return NE;
     case 11:
@@ -2216,8 +2237,9 @@  mask_to_comparison (int mask)
 simplify_logical_relational_operation (enum rtx_code code, machine_mode mode,
 				       rtx op0, rtx op1)
 {
-  /* We only handle IOR of two relational operations.  */
-  if (code != IOR)
+  /* We only handle AND if we can ignore unordered cases.  */
+  bool honor_nans_p = HONOR_NANS (GET_MODE (op0));
+  if (code != IOR && (code != AND || honor_nans_p))
     return 0;
 
   if (!(COMPARISON_P (op0) && COMPARISON_P (op1)))
@@ -2230,18 +2252,20 @@  simplify_logical_relational_operation (e
   enum rtx_code code0 = GET_CODE (op0);
   enum rtx_code code1 = GET_CODE (op1);
 
-  /* We don't handle unsigned comparisons currently.  */
-  if (code0 == LTU || code0 == GTU || code0 == LEU || code0 == GEU)
-    return 0;
-  if (code1 == LTU || code1 == GTU || code1 == LEU || code1 == GEU)
-    return 0;
+  int mask0 = comparison_to_mask (code0, code1);
+  int mask1 = comparison_to_mask (code1, code0);
+
+  /* Reject combinations of signed and unsigned comparisons,
+     with ORDERED being signed.  */
+  if (((mask0 & 13) && (mask1 & 48)) || ((mask1 & 13) && (mask0 & 48)))
+    return NULL_RTX;
 
-  int mask0 = comparison_to_mask (code0);
-  int mask1 = comparison_to_mask (code1);
+  int mask = (code == IOR ? mask0 | mask1 : mask0 & mask1);
 
-  int mask = mask0 | mask1;
+  if (mask == 0)
+    return const0_rtx;
 
-  if (mask == 15)
+  if (mask == 50 || mask == 15)
     return const_true_rtx;
 
   code = mask_to_comparison (mask);
@@ -3450,6 +3474,10 @@  simplify_binary_operation_1 (enum rtx_co
       tem = simplify_associative_operation (code, mode, op0, op1);
       if (tem)
 	return tem;
+
+      tem = simplify_logical_relational_operation (code, mode, op0, op1);
+      if (tem)
+	return tem;
       break;
 
     case UDIV:
Index: gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ptest_pmore.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/acle/asm/ptest_pmore.c	2019-11-16 13:42:09.653773983 +0000
@@ -0,0 +1,77 @@ 
+/* { dg-additional-options "-msve-vector-bits=scalable" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+#include "test_sve_acle.h"
+#include <stdbool.h>
+
+/*
+** test_bool_pmore:
+**	ptest	p0, p1\.b
+**	cset	[wx]0, pmore
+**	ret
+*/
+TEST_PTEST (test_bool_pmore, bool,
+	    x0 = svptest_any (p0, p1) & !svptest_last (p0, p1));
+
+/*
+** test_bool_plast:
+**	ptest	p0, p1\.b
+**	cset	[wx]0, plast
+**	ret
+*/
+TEST_PTEST (test_bool_plast, bool,
+	    x0 = !svptest_any (p0, p1) | svptest_last (p0, p1));
+
+/*
+** test_int_pmore:
+**	ptest	p0, p1\.b
+**	cset	[wx]0, pmore
+**	ret
+*/
+TEST_PTEST (test_int_pmore, int,
+	    x0 = svptest_any (p0, p1) & !svptest_last (p0, p1));
+
+/*
+** test_int_plast:
+**	ptest	p0, p1\.b
+**	cset	[wx]0, plast
+**	ret
+*/
+TEST_PTEST (test_int_plast, int,
+	    x0 = !svptest_any (p0, p1) | svptest_last (p0, p1));
+
+/*
+** test_int64_t_pmore:
+**	ptest	p0, p1\.b
+**	cset	[wx]0, pmore
+**	ret
+*/
+TEST_PTEST (test_int64_t_pmore, int64_t,
+	    x0 = svptest_any (p0, p1) & !svptest_last (p0, p1));
+
+/*
+** test_int64_t_plast:
+**	ptest	p0, p1\.b
+**	cset	[wx]0, plast
+**	ret
+*/
+TEST_PTEST (test_int64_t_plast, int64_t,
+	    x0 = !svptest_any (p0, p1) | svptest_last (p0, p1));
+
+/*
+** sel_pmore:
+**	ptest	p0, p1\.b
+**	csel	x0, (x0, x1, pmore|x1, x0, plast)
+**	ret
+*/
+TEST_PTEST (sel_pmore, int64_t,
+	    x0 = svptest_any (p0, p1) & !svptest_last (p0, p1) ? x0 : x1);
+
+/*
+** sel_plast:
+**	ptest	p0, p1\.b
+**	csel	x0, (x0, x1, plast|x1, x0, pmore)
+**	ret
+*/
+TEST_PTEST (sel_plast, int64_t,
+	    x0 = !svptest_any (p0, p1) | svptest_last (p0, p1) ? x0 : x1);