Patchwork [rtl] combine concat+shuffle

login
register
mail settings
Submitter Marc Glisse
Date May 7, 2012, 8:16 p.m.
Message ID <alpine.DEB.2.02.1205072120220.3230@laptop-mg.saclay.inria.fr>
Download mbox | patch
Permalink /patch/157411/
State New
Headers show

Comments

Marc Glisse - May 7, 2012, 8:16 p.m.
Hello,

this patch combines for vectors a concat and a shuffle. An example on x86 
would be:

__m128d f(double d){
   __m128d x=_mm_setr_pd(-d,d);
   return _mm_shuffle_pd(x,x,1);
}

which was compiled as:

 	vmovsd	.LC0(%rip), %xmm1
 	vxorpd	%xmm0, %xmm1, %xmm1
 	vunpcklpd	%xmm0, %xmm1, %xmm0
 	vshufpd	$1, %xmm0, %xmm0, %xmm0

and with the patch:

 	vmovsd	.LC0(%rip), %xmm1
 	vxorpd	%xmm0, %xmm1, %xmm1
 	vunpcklpd	%xmm1, %xmm0, %xmm0

This happens a lot in my code, for interval arithmetics, where I have a 
number d, build an interval (-d,d) from it, then subtract that interval 
from an other one, and subtraction is implemented as shufpd+addpd.

The patch is quite specialized, but I guessed I could start there, and it 
can always be generalized later.

For the testsuite, since the patch is not in a particular target, it would 
be better to have a generic test (in gcc.dg?), but I don't really know how 
to write a generic one, so would a test in gcc.target/i386 that scans 
the asm for shuf or perm be ok?

Ah, and if I use __builtin_shuffle instead of _mm_shuffle_pd, the patch 
works without -mavx, but -mavx uses vpermilpd (ie a vec_select:V2DF 
(reg:V2DF) ...) instead of a vshufpd, so I'll probably want to handle that 
too later. I thought about doing a general transformation from 
vec_select(vec_concat(x,x),*) to vec_select(x,*) (reducing the indexes in 
* so they fit), but that seemed way too dangerous.
Richard Sandiford - May 8, 2012, 9:39 a.m.
Looks like a good idea.

Marc Glisse <marc.glisse@inria.fr> writes:
> For the testsuite, since the patch is not in a particular target, it would 
> be better to have a generic test (in gcc.dg?), but I don't really know how 
> to write a generic one, so would a test in gcc.target/i386 that scans 
> the asm for shuf or perm be ok?

Tree-level vectorisation tests tend to go in gcc.dg/vector, but it's
hard to generalise rtl-level transforms like these.  An x86-only test
sounds good to me FWIW.

> Index: simplify-rtx.c
> ===================================================================
> --- simplify-rtx.c	(revision 187228)
> +++ simplify-rtx.c	(working copy)
> @@ -3268,10 +3268,32 @@ simplify_binary_operation_1 (enum rtx_co
>
>  	  if (GET_MODE (vec) == mode)
>  	    return vec;
>  	}
> 
> +      /* If we build {a,b} then permute it, build the result directly.  */
> +      if (XVECLEN (trueop1, 0) == 2
> +	  && CONST_INT_P (XVECEXP (trueop1, 0, 0))
> +	  && CONST_INT_P (XVECEXP (trueop1, 0, 1))
> +	  && GET_CODE (trueop0) == VEC_CONCAT
> +	  && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop0, 1))
> +	  && GET_CODE (XEXP (trueop0, 0)) == VEC_CONCAT
> +	  && GET_MODE (XEXP (trueop0, 0)) == mode)
> +	{
> +	  int offset0 = INTVAL (XVECEXP (trueop1, 0, 0)) % 2;
> +	  int offset1 = INTVAL (XVECEXP (trueop1, 0, 1)) % 2;
> +	  rtx baseop  = XEXP (trueop0, 0);
> +	  rtx baseop0 = XEXP (baseop , 0);
> +	  rtx baseop1 = XEXP (baseop , 1);
> +	  baseop0 = avoid_constant_pool_reference (baseop0);
> +	  baseop1 = avoid_constant_pool_reference (baseop1);
> +
> +	  return simplify_gen_binary (VEC_CONCAT, mode,
> +	     offset0 ? baseop1 : baseop0,
> +	     offset1 ? baseop1 : baseop0);
> +	}
> +

I know you said that generalising it could be done later,
and that's fine, but it looks in some ways like it would
be easier to go straight for the more general:

	  && GET_CODE (trueop0) == VEC_CONCAT
	  && GET_CODE (XEXP (trueop0, 0)) == VEC_CONCAT
	  && GET_MODE (XEXP (trueop0, 0)) == mode
	  && GET_CODE (XEXP (trueop0, 1)) == VEC_CONCAT
	  && GET_MODE (XEXP (trueop0, 1)) == mode)
	{
	  unsigned int i0 = INTVAL (XVECEXP (trueop1, 0, 0));
	  unsigned int i1 = INTVAL (XVECEXP (trueop1, 0, 1));
	  rtx op0, op1;

	  gcc_assert (i0 < 4 && i1 < 4);
	  op0 = XEXP (XEXP (trueop0, i0 / 2), i0 % 2);
	  op1 = XEXP (XEXP (trueop0, i1 / 2), i1 % 2);

	  return simplify_gen_binary (VEC_CONCAT, mode, op0, op1);
	}

(completely untested).  avoid_constant_pool_reference shouldn't be
called here.

Very minor, but this code probably belongs in the else part of the
if (!VECTOR_MODE_P (mode)) block.

Richard
Marc Glisse - May 8, 2012, 10:30 a.m.
On Tue, 8 May 2012, Richard Sandiford wrote:

> I know you said that generalising it could be done later,
> and that's fine, but it looks in some ways like it would
> be easier to go straight for the more general:
>
> 	  && GET_CODE (trueop0) == VEC_CONCAT
> 	  && GET_CODE (XEXP (trueop0, 0)) == VEC_CONCAT
> 	  && GET_MODE (XEXP (trueop0, 0)) == mode
> 	  && GET_CODE (XEXP (trueop0, 1)) == VEC_CONCAT
> 	  && GET_MODE (XEXP (trueop0, 1)) == mode)
> 	{
> 	  unsigned int i0 = INTVAL (XVECEXP (trueop1, 0, 0));
> 	  unsigned int i1 = INTVAL (XVECEXP (trueop1, 0, 1));
> 	  rtx op0, op1;
>
> 	  gcc_assert (i0 < 4 && i1 < 4);
> 	  op0 = XEXP (XEXP (trueop0, i0 / 2), i0 % 2);
> 	  op1 = XEXP (XEXP (trueop0, i1 / 2), i1 % 2);
>
> 	  return simplify_gen_binary (VEC_CONCAT, mode, op0, op1);
> 	}

Yes, I hesitated.

> (completely untested).  avoid_constant_pool_reference shouldn't be
> called here.

I wasn't quite sure what it was for, but it looked safer to call it for 
nothing than to forget it ;-)

> Very minor, but this code probably belongs in the else part of the
> if (!VECTOR_MODE_P (mode)) block.

Thanks, I'll update the patch with your comments, add a testcase and 
ChangeLog and re-send it here.
Richard Sandiford - May 8, 2012, 11:08 a.m.
Marc Glisse <marc.glisse@inria.fr> writes:
> On Tue, 8 May 2012, Richard Sandiford wrote:
>> I know you said that generalising it could be done later,
>> and that's fine, but it looks in some ways like it would
>> be easier to go straight for the more general:
>>
>> 	  && GET_CODE (trueop0) == VEC_CONCAT
>> 	  && GET_CODE (XEXP (trueop0, 0)) == VEC_CONCAT
>> 	  && GET_MODE (XEXP (trueop0, 0)) == mode
>> 	  && GET_CODE (XEXP (trueop0, 1)) == VEC_CONCAT
>> 	  && GET_MODE (XEXP (trueop0, 1)) == mode)
>> 	{
>> 	  unsigned int i0 = INTVAL (XVECEXP (trueop1, 0, 0));
>> 	  unsigned int i1 = INTVAL (XVECEXP (trueop1, 0, 1));
>> 	  rtx op0, op1;
>>
>> 	  gcc_assert (i0 < 4 && i1 < 4);
>> 	  op0 = XEXP (XEXP (trueop0, i0 / 2), i0 % 2);
>> 	  op1 = XEXP (XEXP (trueop0, i1 / 2), i1 % 2);
>>
>> 	  return simplify_gen_binary (VEC_CONCAT, mode, op0, op1);
>> 	}
>
> Yes, I hesitated.

Realised afterwards that both versions need to check
GET_MODE_NUNITS (mode) == 2, because we're requiring OP0 and OP1
to be scalar.  Sorry for not noticing first time.

Richard
Marc Glisse - May 8, 2012, 12:08 p.m.
On Tue, 8 May 2012, Richard Sandiford wrote:

> Marc Glisse <marc.glisse@inria.fr> writes:
>> On Tue, 8 May 2012, Richard Sandiford wrote:
>>> I know you said that generalising it could be done later,
>>> and that's fine, but it looks in some ways like it would
>>> be easier to go straight for the more general:
>>>
>>> 	  && GET_CODE (trueop0) == VEC_CONCAT
>>> 	  && GET_CODE (XEXP (trueop0, 0)) == VEC_CONCAT
>>> 	  && GET_MODE (XEXP (trueop0, 0)) == mode
>>> 	  && GET_CODE (XEXP (trueop0, 1)) == VEC_CONCAT
>>> 	  && GET_MODE (XEXP (trueop0, 1)) == mode)
>>> 	{
>>> 	  unsigned int i0 = INTVAL (XVECEXP (trueop1, 0, 0));
>>> 	  unsigned int i1 = INTVAL (XVECEXP (trueop1, 0, 1));
>>> 	  rtx op0, op1;
>>>
>>> 	  gcc_assert (i0 < 4 && i1 < 4);
>>> 	  op0 = XEXP (XEXP (trueop0, i0 / 2), i0 % 2);
>>> 	  op1 = XEXP (XEXP (trueop0, i1 / 2), i1 % 2);
>>>
>>> 	  return simplify_gen_binary (VEC_CONCAT, mode, op0, op1);
>>> 	}
>>
>> Yes, I hesitated.
>
> Realised afterwards that both versions need to check
> GET_MODE_NUNITS (mode) == 2, because we're requiring OP0 and OP1
> to be scalar.  Sorry for not noticing first time.

I thought that was a consequence of

XVECLEN (trueop1, 0) == 2

(in the lines before your first &&)
ie the result (which has mode "mode") is a vec_select of 2 objects.
Richard Sandiford - May 8, 2012, 12:56 p.m.
Marc Glisse <marc.glisse@inria.fr> writes:
>> Realised afterwards that both versions need to check
>> GET_MODE_NUNITS (mode) == 2, because we're requiring OP0 and OP1
>> to be scalar.  Sorry for not noticing first time.
>
> I thought that was a consequence of
>
> XVECLEN (trueop1, 0) == 2
>
> (in the lines before your first &&)
> ie the result (which has mode "mode") is a vec_select of 2 objects.

Yeah, you're right of course.

Richard

Patch

Index: simplify-rtx.c

===================================================================
--- simplify-rtx.c	(revision 187228)

+++ simplify-rtx.c	(working copy)

@@ -3268,10 +3268,32 @@  simplify_binary_operation_1 (enum rtx_co

 
 	  if (GET_MODE (vec) == mode)
 	    return vec;
 	}
 
+      /* If we build {a,b} then permute it, build the result directly.  */

+      if (XVECLEN (trueop1, 0) == 2

+	  && CONST_INT_P (XVECEXP (trueop1, 0, 0))

+	  && CONST_INT_P (XVECEXP (trueop1, 0, 1))

+	  && GET_CODE (trueop0) == VEC_CONCAT

+	  && rtx_equal_p (XEXP (trueop0, 0), XEXP (trueop0, 1))

+	  && GET_CODE (XEXP (trueop0, 0)) == VEC_CONCAT

+	  && GET_MODE (XEXP (trueop0, 0)) == mode)

+	{

+	  int offset0 = INTVAL (XVECEXP (trueop1, 0, 0)) % 2;

+	  int offset1 = INTVAL (XVECEXP (trueop1, 0, 1)) % 2;

+	  rtx baseop  = XEXP (trueop0, 0);

+	  rtx baseop0 = XEXP (baseop , 0);

+	  rtx baseop1 = XEXP (baseop , 1);

+	  baseop0 = avoid_constant_pool_reference (baseop0);

+	  baseop1 = avoid_constant_pool_reference (baseop1);

+

+	  return simplify_gen_binary (VEC_CONCAT, mode,

+	     offset0 ? baseop1 : baseop0,

+	     offset1 ? baseop1 : baseop0);

+	}

+

       return 0;
     case VEC_CONCAT:
       {
 	enum machine_mode op0_mode = (GET_MODE (trueop0) != VOIDmode
 				      ? GET_MODE (trueop0)