Patchwork [rtl] combine a vec_concat of 2 vec_selects from the same vector

login
register
mail settings
Submitter Marc Glisse
Date Sept. 30, 2012, 4:29 p.m.
Message ID <alpine.DEB.2.02.1209300954110.4012@stedding.saclay.inria.fr>
Download mbox | patch
Permalink /patch/188171/
State New
Headers show

Comments

Marc Glisse - Sept. 30, 2012, 4:29 p.m.
On Sat, 29 Sep 2012, Eric Botcazou wrote:

>> this patch lets the compiler try to rewrite:
>>
>> (vec_concat (vec_select x [a]) (vec_select x [b]))
>>
>> as:
>>
>> vec_select x [a b]
>>
>> or even just "x" if appropriate.
[...]
> OK, but:
>
> 1) Always add a comment describing the simplification when you add one,
>
> 2) The condition:
>
>> +           if (GET_MODE (XEXP (trueop0, 0)) == mode
>> +               && INTVAL (XVECEXP (XEXP (trueop1, 1), 0, 0))
>> +                  - INTVAL (XVECEXP (XEXP (trueop0, 1), 0, 0)) == 1)
>> +             return XEXP (trueop0, 0);
>
> can be simplified: if GET_MODE (XEXP (trueop0, 0)) == mode, then XEXP
> (trueop0, 0) is a 2-element vector so the only possible case is (0,1).
> That would probably even be more correct since you don't test CONST_INT_P for
> the indices, while the test is done in the VEC_SELECT case.

It looks like I was trying to be clever by replacing 2 understandable 
tests with a single more obscure one, bad idea.

> Why not generalizing to all kinds of VEC_SELECTs instead of just scalar ones?

Ok, I changed the patch a bit to handle arbitrary VEC_SELECTs, and moved 
the identity recognition to VEC_SELECT handling (where it belonged). 
Testing with non-scalar VEC_SELECTs was limited though, because they are 
not that easy to generate. Also, the identity case is the only one where 
it actually optimized. To handle more cases, I'd have to look through 
several layers of VEC_SELECTs, which gets a bit complicated (for instance, 
the permutation 0,1,3,2 will appear as a vec_concat of a 
vec_select(v,[0,1]) and a vec_select(vec_select(v,[2,3]),[1,0]), or worse 
with a vec_concat in the middle). It also didn't optimize 3,2,3,2, 
possibly because that meant substituting the same rtx twice (I didn't go 
that far in gdb). Then there is also the vec_duplicate case (I should try 
to replace vec_duplicate with vec_concat in simplify-rtx to see what 
happens...). Still, the identity case is nice to have.

Thanks for your comments.

bootstrap+testsuite on x86_64-linux-gnu with default languages.

2012-09-09  Marc Glisse  <marc.glisse@inria.fr>

gcc/
 	* simplify-rtx.c (simplify_binary_operation_1) <VEC_SELECT>:
 	Detect the identity.
 	<VEC_CONCAT>: Handle VEC_SELECTs from the same vector.

gcc/testsuite/
  	* gcc.target/i386/vect-rebuild.c: New testcase.
Eric Botcazou - Oct. 1, 2012, 7:56 a.m.
> 2012-09-09  Marc Glisse  <marc.glisse@inria.fr>
> 
> gcc/
>  	* simplify-rtx.c (simplify_binary_operation_1) <VEC_SELECT>:
>  	Detect the identity.
>  	<VEC_CONCAT>: Handle VEC_SELECTs from the same vector.
> 
> gcc/testsuite/
>   	* gcc.target/i386/vect-rebuild.c: New testcase.

OK if you adjust the above date and add the missing space at the end of:

/* Try to merge 2 VEC_SELECTs from the same vector into a single one. */
Marc Glisse - Oct. 1, 2012, 8:24 a.m.
On Mon, 1 Oct 2012, Eric Botcazou wrote:

>> 2012-09-09  Marc Glisse  <marc.glisse@inria.fr>
>>
>> gcc/
>>  	* simplify-rtx.c (simplify_binary_operation_1) <VEC_SELECT>:
>>  	Detect the identity.
>>  	<VEC_CONCAT>: Handle VEC_SELECTs from the same vector.
>>
>> gcc/testsuite/
>>   	* gcc.target/i386/vect-rebuild.c: New testcase.
>
> OK if you adjust the above date and add the missing space at the end of:
>
> /* Try to merge 2 VEC_SELECTs from the same vector into a single one. */

I was trying to avoid splitting in 2 lines, but ok I'll split.

Thank you for the quick reply,
Eric Botcazou - Oct. 1, 2012, 8:27 a.m.
> > /* Try to merge 2 VEC_SELECTs from the same vector into a single one. */
> 
> I was trying to avoid splitting in 2 lines, but ok I'll split.

Indeed.  Then you can remove the '2' above, it doesn't add much.
Marc Glisse - March 15, 2013, 10:33 p.m.
On Sun, 30 Sep 2012, Marc Glisse wrote:

> On Sat, 29 Sep 2012, Eric Botcazou wrote:
>
>>> this patch lets the compiler try to rewrite:
>>> 
>>> (vec_concat (vec_select x [a]) (vec_select x [b]))
>>> 
>>> as:
>>> 
>>> vec_select x [a b]
>>> 
>>> or even just "x" if appropriate.
[...]
>> Why not generalizing to all kinds of VEC_SELECTs instead of just scalar 
>> ones?
>
> Ok, I changed the patch a bit to handle arbitrary VEC_SELECTs, and moved the 
> identity recognition to VEC_SELECT handling (where it belonged). Testing with 
> non-scalar VEC_SELECTs was limited though, because they are not that easy to 
> generate. Also, the identity case is the only one where it actually 
> optimized. To handle more cases, I'd have to look through several layers of 
> VEC_SELECTs, which gets a bit complicated (for instance, the permutation 
> 0,1,3,2 will appear as a vec_concat of a vec_select(v,[0,1]) and a 
> vec_select(vec_select(v,[2,3]),[1,0]), or worse with a vec_concat in the 
> middle). It also didn't optimize 3,2,3,2, possibly because that meant 
> substituting the same rtx twice (I didn't go that far in gdb). Then there is 
> also the vec_duplicate case (I should try to replace vec_duplicate with 
> vec_concat in simplify-rtx to see what happens...). Still, the identity case 
> is nice to have.

I think I may have been a bit too hasty removing the restriction to the 
scalar case (and even that version was possibly wrong for some targets). 
For instance, if I have a vec_concat of 2 permutations of a vector, this 
will generate a vec_select with a result twice the size of its input, 
which will likely fail to be recognized. For now I have only managed to 
reach this situation in combine, where the unrecognizable expression is 
simply ignored. But I think simplify-rtx is used in less forgiving places 
(and even in combine it could cause optimizations to be missed).

My current understanding of simplify-rtx is that we should only do "safe" 
optimizations in it (make sure we only create expressions that every 
target will recognize), and if I want more advanced optimizations, I 
should do them elsewhere (not sure where). So I should probably at least 
restrict this one to the case where the result and XEXP (trueop0, 0) have 
the same mode.

Does that make sense? Or is the current code ok and I am worrying for 
nothing?

For reference, the conversation started here:
http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00540.html
and I include a copy of the relevant part of the patch that was committed:

> 2012-09-09  Marc Glisse  <marc.glisse@inria.fr>
>
> gcc/
> 	* simplify-rtx.c (simplify_binary_operation_1)
> 	<VEC_CONCAT>: Handle VEC_SELECTs from the same vector.

+       /* Try to merge VEC_SELECTs from the same vector into a single one.  */
+       if (GET_CODE (trueop0) == VEC_SELECT
+           && GET_CODE (trueop1) == VEC_SELECT
+           && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)))
+         {
+           rtx par0 = XEXP (trueop0, 1);
+           rtx par1 = XEXP (trueop1, 1);
+           int len0 = XVECLEN (par0, 0);
+           int len1 = XVECLEN (par1, 0);
+           rtvec vec = rtvec_alloc (len0 + len1);
+           for (int i = 0; i < len0; i++)
+             RTVEC_ELT (vec, i) = XVECEXP (par0, 0, i);
+           for (int i = 0; i < len1; i++)
+             RTVEC_ELT (vec, len0 + i) = XVECEXP (par1, 0, i);
+           return simplify_gen_binary (VEC_SELECT, mode, XEXP (trueop0, 0),
+                                       gen_rtx_PARALLEL (VOIDmode, vec));
+         }
Eric Botcazou - March 16, 2013, 9:30 a.m.
> My current understanding of simplify-rtx is that we should only do "safe"
> optimizations in it (make sure we only create expressions that every
> target will recognize), and if I want more advanced optimizations, I
> should do them elsewhere (not sure where). So I should probably at least
> restrict this one to the case where the result and XEXP (trueop0, 0) have
> the same mode.

Yes, simplify-rtx should only canonicalize or simplify expressions.  Merging 2 
VEC_SELECTs into a single one seems to fall into the second category, but if 
no target can take advantage of it, that's probably a bit questionable indeed.

Patch

Index: gcc/testsuite/gcc.target/i386/vect-rebuild.c

===================================================================
--- gcc/testsuite/gcc.target/i386/vect-rebuild.c	(revision 0)

+++ gcc/testsuite/gcc.target/i386/vect-rebuild.c	(revision 0)

@@ -0,0 +1,33 @@ 

+/* { dg-do compile } */

+/* { dg-options "-O -mavx -fno-tree-forwprop" } */

+

+typedef double v2df __attribute__ ((__vector_size__ (16)));

+typedef double v4df __attribute__ ((__vector_size__ (32)));

+

+v2df f1 (v2df x)

+{

+  v2df xx = { x[0], x[1] };

+  return xx;

+}

+

+v4df f2 (v4df x)

+{

+  v4df xx = { x[0], x[1], x[2], x[3] };

+  return xx;

+}

+

+v2df g (v2df x)

+{

+  v2df xx = { x[1], x[0] };

+  return xx;

+}

+

+v2df h (v4df x)

+{

+  v2df xx = { x[2], x[3] };

+  return xx;

+}

+

+/* { dg-final { scan-assembler-not "unpck" } } */

+/* { dg-final { scan-assembler-times "\tv?permilpd\[ \t\]" 1 } } */

+/* { dg-final { scan-assembler-times "\tv?extractf128\[ \t\]" 1 } } */


Property changes on: gcc/testsuite/gcc.target/i386/vect-rebuild.c
___________________________________________________________________
Added: svn:keywords
   + Author Date Id Revision URL
Added: svn:eol-style
   + native

Index: gcc/simplify-rtx.c

===================================================================
--- gcc/simplify-rtx.c	(revision 191865)

+++ gcc/simplify-rtx.c	(working copy)

@@ -3239,20 +3239,37 @@  simplify_binary_operation_1 (enum rtx_co

 		  rtx x = XVECEXP (trueop1, 0, i);
 
 		  gcc_assert (CONST_INT_P (x));
 		  RTVEC_ELT (v, i) = CONST_VECTOR_ELT (trueop0,
 						       INTVAL (x));
 		}
 
 	      return gen_rtx_CONST_VECTOR (mode, v);
 	    }
 
+	  /* Recognize the identity.  */

+	  if (GET_MODE (trueop0) == mode)

+	    {

+	      bool maybe_ident = true;

+	      for (int i = 0; i < XVECLEN (trueop1, 0); i++)

+		{

+		  rtx j = XVECEXP (trueop1, 0, i);

+		  if (!CONST_INT_P (j) || INTVAL (j) != i)

+		    {

+		      maybe_ident = false;

+		      break;

+		    }

+		}

+	      if (maybe_ident)

+		return trueop0;

+	    }

+

 	  /* If we build {a,b} then permute it, build the result directly.  */
 	  if (XVECLEN (trueop1, 0) == 2
 	      && CONST_INT_P (XVECEXP (trueop1, 0, 0))
 	      && CONST_INT_P (XVECEXP (trueop1, 0, 1))
 	      && GET_CODE (trueop0) == VEC_CONCAT
 	      && GET_CODE (XEXP (trueop0, 0)) == VEC_CONCAT
 	      && GET_MODE (XEXP (trueop0, 0)) == mode
 	      && GET_CODE (XEXP (trueop0, 1)) == VEC_CONCAT
 	      && GET_MODE (XEXP (trueop0, 1)) == mode)
 	    {
@@ -3364,20 +3381,38 @@  simplify_binary_operation_1 (enum rtx_co

 		    if (!VECTOR_MODE_P (op1_mode))
 		      RTVEC_ELT (v, i) = trueop1;
 		    else
 		      RTVEC_ELT (v, i) = CONST_VECTOR_ELT (trueop1,
 							   i - in_n_elts);
 		  }
 	      }
 
 	    return gen_rtx_CONST_VECTOR (mode, v);
 	  }
+

+	/* Try to merge 2 VEC_SELECTs from the same vector into a single one. */

+	if (GET_CODE (trueop0) == VEC_SELECT

+	    && GET_CODE (trueop1) == VEC_SELECT

+	    && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop1, 0)))

+	  {

+	    rtx par0 = XEXP (trueop0, 1);

+	    rtx par1 = XEXP (trueop1, 1);

+	    int len0 = XVECLEN (par0, 0);

+	    int len1 = XVECLEN (par1, 0);

+	    rtvec vec = rtvec_alloc (len0 + len1);

+	    for (int i = 0; i < len0; i++)

+	      RTVEC_ELT (vec, i) = XVECEXP (par0, 0, i);

+	    for (int i = 0; i < len1; i++)

+	      RTVEC_ELT (vec, len0 + i) = XVECEXP (par1, 0, i);

+	    return simplify_gen_binary (VEC_SELECT, mode, XEXP (trueop0, 0),

+					gen_rtx_PARALLEL (VOIDmode, vec));

+	  }

       }
       return 0;
 
     default:
       gcc_unreachable ();
     }
 
   return 0;
 }