Patchwork [rtl] combine a vec_concat of 2 vec_selects from the same vector

login
register
mail settings
Submitter Marc Glisse
Date Sept. 9, 2012, 11:13 a.m.
Message ID <alpine.DEB.2.02.1209091258520.5537@stedding.saclay.inria.fr>
Download mbox | patch
Permalink /patch/182613/
State New
Headers show

Comments

Marc Glisse - Sept. 9, 2012, 11:13 a.m.
Hello,

this patch lets the compiler try to rewrite:

(vec_concat (vec_select x [a]) (vec_select x [b]))

as:

vec_select x [a b]

or even just "x" if appropriate.

In a first iteration I was restricting it to b-a==1, but it seemed better 
not to: it helps for {v[1],v[0]} and doesn't change anything for unknown 
patterns.

Note that I am planning to do a similar optimization at tree level, but it 
shouldn't make this one useless because such patterns can be created 
during rtl passes. The testcase may need an additional -fno-tree-xxx to 
still be useful at that point though.


bootstrap+testsuite on x86_64-linux-gnu.

2012-09-09  Marc Glisse  <marc.glisse@inria.fr>

gcc/
 	* simplify-rtx.c (simplify_binary_operation_1): Handle vec_concat
 	of vec_selects from the same vector.

gcc/testsuite/
 	* gcc.target/i386/vect-rebuild.c: New testcase.
Marc Glisse - Sept. 11, 2012, 3:55 p.m.
On Sun, 9 Sep 2012, Marc Glisse wrote:

> Hello,
>
> this patch lets the compiler try to rewrite:
>
> (vec_concat (vec_select x [a]) (vec_select x [b]))
>
> as:
>
> vec_select x [a b]
>
> or even just "x" if appropriate.
>
> In a first iteration I was restricting it to b-a==1, but it seemed better not 
> to: it helps for {v[1],v[0]} and doesn't change anything for unknown 
> patterns.
>
> Note that I am planning to do a similar optimization at tree level, but it 
> shouldn't make this one useless because such patterns can be created during 
> rtl passes. The testcase may need an additional -fno-tree-xxx to still be 
> useful at that point though.

Since the tree-ssa patch was reviewed faster, assume there is a 
-fno-tree-forwprop in dg-options for the testcase.

> bootstrap+testsuite on x86_64-linux-gnu.
>
> 2012-09-09  Marc Glisse  <marc.glisse@inria.fr>
>
> gcc/
> 	* simplify-rtx.c (simplify_binary_operation_1): Handle vec_concat
> 	of vec_selects from the same vector.
>
> gcc/testsuite/
> 	* gcc.target/i386/vect-rebuild.c: New testcase.

http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00540.html
Marc Glisse - Sept. 26, 2012, 2:52 p.m.
Hello,

RTL optimizers looks like a promising category, and Eric like a nice 
maintainer for it, adding in Cc: ;-)

(I bothered Richard Sandiford last time)

On Tue, 11 Sep 2012, Marc Glisse wrote:

> On Sun, 9 Sep 2012, Marc Glisse wrote:
>
>> Hello,
>> 
>> this patch lets the compiler try to rewrite:
>> 
>> (vec_concat (vec_select x [a]) (vec_select x [b]))
>> 
>> as:
>> 
>> vec_select x [a b]
>> 
>> or even just "x" if appropriate.
>> 
>> In a first iteration I was restricting it to b-a==1, but it seemed better 
>> not to: it helps for {v[1],v[0]} and doesn't change anything for unknown 
>> patterns.
>> 
>> Note that I am planning to do a similar optimization at tree level, but it 
>> shouldn't make this one useless because such patterns can be created during 
>> rtl passes. The testcase may need an additional -fno-tree-xxx to still be 
>> useful at that point though.
>
> Since the tree-ssa patch was reviewed faster, assume there is a 
> -fno-tree-forwprop in dg-options for the testcase.
>
>> bootstrap+testsuite on x86_64-linux-gnu.
>> 
>> 2012-09-09  Marc Glisse  <marc.glisse@inria.fr>
>> 
>> gcc/
>> 	* simplify-rtx.c (simplify_binary_operation_1): Handle vec_concat
>> 	of vec_selects from the same vector.
>> 
>> gcc/testsuite/
>> 	* gcc.target/i386/vect-rebuild.c: New testcase.
>
> http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00540.html
Eric Botcazou - Sept. 29, 2012, 1:07 p.m.
> this patch lets the compiler try to rewrite:
> 
> (vec_concat (vec_select x [a]) (vec_select x [b]))
> 
> as:
> 
> vec_select x [a b]
> 
> or even just "x" if appropriate.
> 
> In a first iteration I was restricting it to b-a==1, but it seemed better
> not to: it helps for {v[1],v[0]} and doesn't change anything for unknown
> patterns.
> 
> Note that I am planning to do a similar optimization at tree level, but it
> shouldn't make this one useless because such patterns can be created
> during rtl passes. The testcase may need an additional -fno-tree-xxx to
> still be useful at that point though.
> 
> 
> bootstrap+testsuite on x86_64-linux-gnu.
> 
> 2012-09-09  Marc Glisse  <marc.glisse@inria.fr>
> 
> gcc/
>  	* simplify-rtx.c (simplify_binary_operation_1): Handle vec_concat
>  	of vec_selects from the same vector.

	* simplify-rtx.c (simplify_binary_operation_1) <VEC_CONCAT>: Handle
	VEC_SELECTs from the same vector.

> gcc/testsuite/
>  	* gcc.target/i386/vect-rebuild.c: New testcase.

OK, but:

 1) Always add a comment describing the simplification when you add one,

 2) The condition:

> +           if (GET_MODE (XEXP (trueop0, 0)) == mode
> +               && INTVAL (XVECEXP (XEXP (trueop1, 1), 0, 0))
> +                  - INTVAL (XVECEXP (XEXP (trueop0, 1), 0, 0)) == 1)
> +             return XEXP (trueop0, 0);

can be simplified: if GET_MODE (XEXP (trueop0, 0)) == mode, then XEXP 
(trueop0, 0) is a 2-element vector so the only possible case is (0,1).
That would probably even be more correct since you don't test CONST_INT_P for 
the indices, while the test is done in the VEC_SELECT case.

Why not generalizing to all kinds of VEC_SELECTs instead of just scalar ones?

Patch

Index: simplify-rtx.c

===================================================================
--- simplify-rtx.c	(revision 191106)

+++ simplify-rtx.c	(working copy)

@@ -3357,20 +3357,38 @@  simplify_binary_operation_1 (enum rtx_co

 		    if (!VECTOR_MODE_P (op1_mode))
 		      RTVEC_ELT (v, i) = trueop1;
 		    else
 		      RTVEC_ELT (v, i) = CONST_VECTOR_ELT (trueop1,
 							   i - in_n_elts);
 		  }
 	      }
 
 	    return gen_rtx_CONST_VECTOR (mode, v);
 	  }
+

+	if (GET_CODE (trueop0) == VEC_SELECT

+	    && GET_CODE (trueop1) == VEC_SELECT

+	    && !VECTOR_MODE_P (op0_mode)

+	    && !VECTOR_MODE_P (op1_mode)

+	    && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)))

+	  {

+	    if (GET_MODE (XEXP (trueop0, 0)) == mode

+		&& INTVAL (XVECEXP (XEXP (trueop1, 1), 0, 0))

+		   - INTVAL (XVECEXP (XEXP (trueop0, 1), 0, 0)) == 1)

+	      return XEXP (trueop0, 0);

+

+	    rtvec vec = rtvec_alloc (2);

+	    RTVEC_ELT (vec, 0) = XVECEXP (XEXP (trueop0, 1), 0, 0);

+	    RTVEC_ELT (vec, 1) = XVECEXP (XEXP (trueop1, 1), 0, 0);

+	    return simplify_gen_binary (VEC_SELECT, mode, XEXP (trueop0, 0),

+					gen_rtx_PARALLEL (VOIDmode, vec));

+	  }

       }
       return 0;
 
     default:
       gcc_unreachable ();
     }
 
   return 0;
 }
 
Index: testsuite/gcc.target/i386/vect-rebuild.c

===================================================================
--- testsuite/gcc.target/i386/vect-rebuild.c	(revision 0)

+++ testsuite/gcc.target/i386/vect-rebuild.c	(revision 0)

@@ -0,0 +1,27 @@ 

+/* { dg-do compile } */

+/* { dg-options "-O -mavx" } */

+

+typedef double v2df __attribute__ ((__vector_size__ (16)));

+typedef double v4df __attribute__ ((__vector_size__ (32)));

+

+v2df f (v2df x)

+{

+  v2df xx = { x[0], x[1] };

+  return xx;

+}

+

+v2df g (v2df x)

+{

+  v2df xx = { x[1], x[0] };

+  return xx;

+}

+

+v2df h (v4df x)

+{

+  v2df xx = { x[2], x[3] };

+  return xx;

+}

+

+/* { dg-final { scan-assembler-not "unpck" } } */

+/* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */

+/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */