diff mbox

Fix up VEC_INTERLEAVE_*_EXPR folding and expansion for big endian (PR tree-optimization/51074)

Message ID 20111122143056.GR27242@tyan-ft48-01.lab.bos.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek Nov. 22, 2011, 2:30 p.m. UTC
Hi!

VEC_INTERLEAVE_*_EXPR trees are unfortunately dependent on BYTES_BIG_ENDIAN,
what is HIGH vs. LOW is different based on endianity.
The only place that creates these in the IL is:
          if (BYTES_BIG_ENDIAN)
            {
              high_code = VEC_INTERLEAVE_HIGH_EXPR;
              low_code = VEC_INTERLEAVE_LOW_EXPR;
            }
          else
            {
              low_code = VEC_INTERLEAVE_HIGH_EXPR;
              high_code = VEC_INTERLEAVE_LOW_EXPR;
            }
          perm_stmt = gimple_build_assign_with_ops (high_code, perm_dest,
                                                    vect1, vect2);
...
so either folding (and expansion if only vec_perm* is supported) needs to
be adjusted as done in the patch below, or we'd need to rename them
to VEC_INTERLEAVE_{FIRST,SECOND}_EXPR or similar and adjust all the patterns
etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, tested on the
testcase using powerpc cross.  Ok for trunk?

2011-11-22  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/51074
	* fold-const.c (fold_binary_loc): Fix up VEC_INTERLEAVE_*_EXPR
	handling for BYTES_BIG_ENDIAN.
	* optabs.c (can_vec_perm_for_code_p): Likewise.

	* gcc.dg/vect/pr51074.c: New test.


	Jakub

Comments

Richard Henderson Nov. 22, 2011, 5:37 p.m. UTC | #1
On 11/22/2011 06:30 AM, Jakub Jelinek wrote:
> 	PR tree-optimization/51074
> 	* fold-const.c (fold_binary_loc): Fix up VEC_INTERLEAVE_*_EXPR
> 	handling for BYTES_BIG_ENDIAN.
> 	* optabs.c (can_vec_perm_for_code_p): Likewise.
> 
> 	* gcc.dg/vect/pr51074.c: New test.

As we discussed on IRC, I think this is ok for the short term.

However, this does mean that the representation of vec_interleave_*_expr is wrong on *all* big-endian targets, and the only reason we havn't seen this before now is that we don't generally get to fold them.

The only good way to clean up this mixup is to remove vec_interleave_*_expr entirely, and always use vec_perm_expr, whose semantics are not in doubt.  This may sound like it's too heavy-weight for stage3, but the situation at the moment is that we're relying on rtl-folding not to occur for these targets.

If we don't change the vectorizer and the targets for gcc 4.7, then I think we should disable rtl folding of vec_select and vec_merge, as these are the rtl patterns used to implement vec_interleave.

Thoughts?


r~
Richard Biener Dec. 1, 2011, 9:53 a.m. UTC | #2
On Tue, 22 Nov 2011, Jakub Jelinek wrote:

> Hi!
> 
> VEC_INTERLEAVE_*_EXPR trees are unfortunately dependent on BYTES_BIG_ENDIAN,
> what is HIGH vs. LOW is different based on endianity.

Huh, that looks bogus.  Both tree codes operate on registers and no
other codes care about "endianess" of vector registers.
(What about VEC_WIDEN_LSHIFT_{HI,LO}_EXPR?)

Can't we simply push the differece to expansion time?  Or even later?

Richard.

> The only place that creates these in the IL is:
>           if (BYTES_BIG_ENDIAN)
>             {
>               high_code = VEC_INTERLEAVE_HIGH_EXPR;
>               low_code = VEC_INTERLEAVE_LOW_EXPR;
>             }
>           else
>             {
>               low_code = VEC_INTERLEAVE_HIGH_EXPR;
>               high_code = VEC_INTERLEAVE_LOW_EXPR;
>             }
>           perm_stmt = gimple_build_assign_with_ops (high_code, perm_dest,
>                                                     vect1, vect2);
> ...
> so either folding (and expansion if only vec_perm* is supported) needs to
> be adjusted as done in the patch below, or we'd need to rename them
> to VEC_INTERLEAVE_{FIRST,SECOND}_EXPR or similar and adjust all the patterns
> etc.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, tested on the
> testcase using powerpc cross.  Ok for trunk?
> 
> 2011-11-22  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/51074
> 	* fold-const.c (fold_binary_loc): Fix up VEC_INTERLEAVE_*_EXPR
> 	handling for BYTES_BIG_ENDIAN.
> 	* optabs.c (can_vec_perm_for_code_p): Likewise.
> 
> 	* gcc.dg/vect/pr51074.c: New test.
> 
> --- gcc/fold-const.c.jj	2011-11-21 16:22:02.000000000 +0100
> +++ gcc/fold-const.c	2011-11-22 09:59:15.606739333 +0100
> @@ -13483,10 +13483,12 @@ fold_binary_loc (location_t loc,
>  		sel[i] = i * 2 + 1;
>  		break;
>  	      case VEC_INTERLEAVE_HIGH_EXPR:
> -		sel[i] = (i + nelts) / 2 + ((i & 1) ? nelts : 0);
> +		sel[i] = (i + (BYTES_BIG_ENDIAN ? 0 : nelts)) / 2
> +			 + ((i & 1) ? nelts : 0);
>  		break;
>  	      case VEC_INTERLEAVE_LOW_EXPR:
> -		sel[i] = i / 2 + ((i & 1) ? nelts : 0);
> +		sel[i] = (i + (BYTES_BIG_ENDIAN ? nelts : 0)) / 2
> +			 + ((i & 1) ? nelts : 0);
>  		break;
>  	      default:
>  		gcc_unreachable ();
> --- gcc/optabs.c.jj	2011-11-21 16:22:02.000000000 +0100
> +++ gcc/optabs.c	2011-11-22 10:17:04.820399126 +0100
> @@ -6932,9 +6932,9 @@ can_vec_perm_for_code_p (enum tree_code
>  	  break;
>  
>  	case VEC_INTERLEAVE_HIGH_EXPR:
> -	  alt = nelt / 2;
> -	  /* FALLTHRU */
>  	case VEC_INTERLEAVE_LOW_EXPR:
> +	  if ((BYTES_BIG_ENDIAN != 0) ^ (code == VEC_INTERLEAVE_HIGH_EXPR))
> +	    alt = nelt / 2;
>  	  for (i = 0; i < nelt / 2; ++i)
>  	    {
>  	      data[i * 2] = i + alt;
> --- gcc/testsuite/gcc.dg/vect/pr51074.c.jj	2011-11-22 10:22:44.247377928 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr51074.c	2011-11-22 10:22:16.000000000 +0100
> @@ -0,0 +1,24 @@
> +/* PR tree-optimization/51074 */
> +
> +#include "tree-vect.h"
> +
> +struct S { int a, b; } s[8];
> +
> +int
> +main ()
> +{
> +  int i;
> +  check_vect ();
> +  for (i = 0; i < 8; i++)
> +    {
> +      s[i].b = 0;
> +      s[i].a = i;
> +    }
> +  asm volatile ("" : : : "memory");
> +  for (i = 0; i < 8; i++)
> +    if (s[i].b != 0 || s[i].a != i)
> +      abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> 
> 	Jakub
> 
>
Jakub Jelinek Dec. 1, 2011, 10:07 a.m. UTC | #3
On Thu, Dec 01, 2011 at 10:53:57AM +0100, Richard Guenther wrote:
> On Tue, 22 Nov 2011, Jakub Jelinek wrote:
> > VEC_INTERLEAVE_*_EXPR trees are unfortunately dependent on BYTES_BIG_ENDIAN,
> > what is HIGH vs. LOW is different based on endianity.
> 
> Huh, that looks bogus.  Both tree codes operate on registers and no
> other codes care about "endianess" of vector registers.
> (What about VEC_WIDEN_LSHIFT_{HI,LO}_EXPR?)
> 
> Can't we simply push the differece to expansion time?  Or even later?

As RTH said, the best fix is to remove VEC_INTERLEAVE_*_EXPR altogether
and just use VEC_PERM_EXPR always, it is redundant with that.  But that
might be too invasive for 4.8.

	Jakub
Richard Biener Dec. 1, 2011, 11:21 a.m. UTC | #4
On Thu, 1 Dec 2011, Jakub Jelinek wrote:

> On Thu, Dec 01, 2011 at 10:53:57AM +0100, Richard Guenther wrote:
> > On Tue, 22 Nov 2011, Jakub Jelinek wrote:
> > > VEC_INTERLEAVE_*_EXPR trees are unfortunately dependent on BYTES_BIG_ENDIAN,
> > > what is HIGH vs. LOW is different based on endianity.
> > 
> > Huh, that looks bogus.  Both tree codes operate on registers and no
> > other codes care about "endianess" of vector registers.
> > (What about VEC_WIDEN_LSHIFT_{HI,LO}_EXPR?)
> > 
> > Can't we simply push the differece to expansion time?  Or even later?
> 
> As RTH said, the best fix is to remove VEC_INTERLEAVE_*_EXPR altogether
> and just use VEC_PERM_EXPR always, it is redundant with that.  But that
> might be too invasive for 4.8.

Yes, sorry - I'm recovering from a 3 week e-mail lag ;)  I agree
using VEC_PERM_EXPR would be best - but that would also affect
backend patterns.  Can we have a middle-ground that leaves those
untouched?  We're still in stage 3, so fixing the bug with using
VEC_PERM_EXPR sounds appealing to me ;)

Thanks,
Richard.
Richard Henderson Dec. 1, 2011, 3:57 p.m. UTC | #5
On 12/01/2011 03:21 AM, Richard Guenther wrote:
> Yes, sorry - I'm recovering from a 3 week e-mail lag ;)  I agree
> using VEC_PERM_EXPR would be best - but that would also affect
> backend patterns.  Can we have a middle-ground that leaves those
> untouched?  We're still in stage 3, so fixing the bug with using
> VEC_PERM_EXPR sounds appealing to me ;)

If we agree that we want to fix this with vec_perm_expr, then we need a relatively minor patch to the  vectorizer, and cleanups in the targets.

In particular, powerpc, spu, and ia64 will need to recognize various constant permutations so that they  can continue using the specialized instructions for interleave.  This shouldn't be particularly difficult; a few testcases added to make sure we don't regress to full permutation wouldn't be amiss.

The x86 port is the only one that really does aggressive constant permutation pattern recognition atm.  That is, of course, because the ISA support for permutation there is all over the map and we had no choice.

I've already zapped the target patterns that expanded interleave/even_odd back into a permuation operation.

If we think this is ok for stage3, we can certainly give it a whack.  I'll take care of the backends if Jakub takes care of the vectorizer?


r~
diff mbox

Patch

--- gcc/fold-const.c.jj	2011-11-21 16:22:02.000000000 +0100
+++ gcc/fold-const.c	2011-11-22 09:59:15.606739333 +0100
@@ -13483,10 +13483,12 @@  fold_binary_loc (location_t loc,
 		sel[i] = i * 2 + 1;
 		break;
 	      case VEC_INTERLEAVE_HIGH_EXPR:
-		sel[i] = (i + nelts) / 2 + ((i & 1) ? nelts : 0);
+		sel[i] = (i + (BYTES_BIG_ENDIAN ? 0 : nelts)) / 2
+			 + ((i & 1) ? nelts : 0);
 		break;
 	      case VEC_INTERLEAVE_LOW_EXPR:
-		sel[i] = i / 2 + ((i & 1) ? nelts : 0);
+		sel[i] = (i + (BYTES_BIG_ENDIAN ? nelts : 0)) / 2
+			 + ((i & 1) ? nelts : 0);
 		break;
 	      default:
 		gcc_unreachable ();
--- gcc/optabs.c.jj	2011-11-21 16:22:02.000000000 +0100
+++ gcc/optabs.c	2011-11-22 10:17:04.820399126 +0100
@@ -6932,9 +6932,9 @@  can_vec_perm_for_code_p (enum tree_code
 	  break;
 
 	case VEC_INTERLEAVE_HIGH_EXPR:
-	  alt = nelt / 2;
-	  /* FALLTHRU */
 	case VEC_INTERLEAVE_LOW_EXPR:
+	  if ((BYTES_BIG_ENDIAN != 0) ^ (code == VEC_INTERLEAVE_HIGH_EXPR))
+	    alt = nelt / 2;
 	  for (i = 0; i < nelt / 2; ++i)
 	    {
 	      data[i * 2] = i + alt;
--- gcc/testsuite/gcc.dg/vect/pr51074.c.jj	2011-11-22 10:22:44.247377928 +0100
+++ gcc/testsuite/gcc.dg/vect/pr51074.c	2011-11-22 10:22:16.000000000 +0100
@@ -0,0 +1,24 @@ 
+/* PR tree-optimization/51074 */
+
+#include "tree-vect.h"
+
+struct S { int a, b; } s[8];
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+  for (i = 0; i < 8; i++)
+    {
+      s[i].b = 0;
+      s[i].a = i;
+    }
+  asm volatile ("" : : : "memory");
+  for (i = 0; i < 8; i++)
+    if (s[i].b != 0 || s[i].a != i)
+      abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */