
[rs6000] Be careful with special permute masks for little endian, take 2

Message ID 1382400200.6275.170.camel@gnopaine
State New

Commit Message

Bill Schmidt Oct. 22, 2013, 12:03 a.m. UTC
Hi,

This is a revision of my earlier patch on the subject, expanded to catch
a few more cases and with some attendant test-case adjustments:

In altivec_expand_vec_perm_const, we look for special masks that match
the behavior of specific instructions, so we can use those instructions
rather than load a constant control vector and perform a permute.  Some
of the masks must be treated differently for little endian mode.

The masks that represent merge-high and merge-low operations have
reversed meanings in little-endian, because of the reversed ordering of
the vector elements.  

The masks that represent vector-pack operations remain correct when the
mode of the input operands matches the natural mode of the instruction,
but not otherwise.  This is because the pack instructions always select
the rightmost, low-order bits of the vector element.  There are cases
where we use this, for example, with a V8HI vector matching a vpkuwum
mask in order to select the odd-numbered elements of the vector.  In
little endian mode, this instruction will get us the even-numbered
elements instead.  There is no alternative instruction with the desired
behavior, so I've just disabled use of those masks for little endian
when the mode isn't natural.

This requires adjusting the altivec-perm-1.c test case.  The vector pack
tests are moved to a new altivec-perm-3.c test, which is restricted to
big-endian targets.

These changes fix 49 failures in the test suite for little endian mode
(9 vector failures left to go!).  Bootstrapped and tested on
powerpc64{,le}-unknown-linux-gnu with no new failures.  Is this ok for
trunk?

Thanks,
Bill


gcc:

2013-10-21  Bill Schmidt  <wschmidt@vnet.ibm.com>

	* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse
	meaning of merge-high and merge-low masks for little endian; avoid
	use of vector-pack masks for little endian for mismatched modes.

gcc/testsuite:

2013-10-21  Bill Schmidt  <wschmidt@vnet.ibm.com>

	* gcc.target/powerpc/altivec-perm-1.c: Move the two vector pack
	tests into...
	* gcc.target/powerpc/altivec-perm-3.c: ...this new test, which is
	restricted to big-endian targets.

Comments

David Edelsohn Oct. 22, 2013, 2:22 p.m. UTC | #1
On Mon, Oct 21, 2013 at 8:03 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> [quoted patch description and ChangeLog entries snipped]

Okay.

Thanks, David

Patch

Index: gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c	(revision 0)
@@ -0,0 +1,23 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
+/* { dg-options "-O -maltivec -mno-vsx" } */
+
+typedef unsigned char V __attribute__((vector_size(16)));
+
+V p2(V x, V y)
+{
+  return __builtin_shuffle(x, y,
+	(V){ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 });
+
+}
+
+V p4(V x, V y)
+{
+  return __builtin_shuffle(x, y,
+	(V){ 2,  3,  6,  7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 });
+}
+
+/* { dg-final { scan-assembler-not "vperm" } } */
+/* { dg-final { scan-assembler "vpkuhum" } } */
+/* { dg-final { scan-assembler "vpkuwum" } } */
Index: gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c	(revision 203792)
+++ gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c	(working copy)
@@ -19,19 +19,6 @@  V b4(V x)
   return __builtin_shuffle(x, (V){ 4,5,6,7, 4,5,6,7, 4,5,6,7, 4,5,6,7, });
 }
 
-V p2(V x, V y)
-{
-  return __builtin_shuffle(x, y,
-	(V){ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 });
-
-}
-
-V p4(V x, V y)
-{
-  return __builtin_shuffle(x, y,
-	(V){ 2,  3,  6,  7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 });
-}
-
 V h1(V x, V y)
 {
   return __builtin_shuffle(x, y,
@@ -72,5 +59,3 @@  V l4(V x, V y)
 /* { dg-final { scan-assembler "vspltb" } } */
 /* { dg-final { scan-assembler "vsplth" } } */
 /* { dg-final { scan-assembler "vspltw" } } */
-/* { dg-final { scan-assembler "vpkuhum" } } */
-/* { dg-final { scan-assembler "vpkuwum" } } */
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 203792)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -28837,17 +28838,23 @@  altivec_expand_vec_perm_const (rtx operands[4])
       {  1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } },
     { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum,
       {  2,  3,  6,  7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } },
-    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb,
+    { OPTION_MASK_ALTIVEC, 
+      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb,
       {  0, 16,  1, 17,  2, 18,  3, 19,  4, 20,  5, 21,  6, 22,  7, 23 } },
-    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh,
+    { OPTION_MASK_ALTIVEC,
+      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh,
       {  0,  1, 16, 17,  2,  3, 18, 19,  4,  5, 20, 21,  6,  7, 22, 23 } },
-    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw,
+    { OPTION_MASK_ALTIVEC,
+      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw : CODE_FOR_altivec_vmrglw,
       {  0,  1,  2,  3, 16, 17, 18, 19,  4,  5,  6,  7, 20, 21, 22, 23 } },
-    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb,
+    { OPTION_MASK_ALTIVEC,
+      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb : CODE_FOR_altivec_vmrghb,
       {  8, 24,  9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } },
-    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh,
+    { OPTION_MASK_ALTIVEC,
+      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh : CODE_FOR_altivec_vmrghh,
       {  8,  9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } },
-    { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw,
+    { OPTION_MASK_ALTIVEC,
+      BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw : CODE_FOR_altivec_vmrghw,
       {  8,  9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } },
     { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew,
       {  0,  1,  2,  3, 16, 17, 18, 19,  8,  9, 10, 11, 24, 25, 26, 27 } },
@@ -28980,6 +28987,26 @@  altivec_expand_vec_perm_const (rtx operands[4])
 	  enum machine_mode omode = insn_data[icode].operand[0].mode;
 	  enum machine_mode imode = insn_data[icode].operand[1].mode;
 
+	  /* For little-endian, don't use vpkuwum and vpkuhum if the
+	     underlying vector type is not V4SI and V8HI, respectively.
+	     For example, using vpkuwum with a V8HI picks up the even
+	     halfwords (BE numbering) when the even halfwords (LE
+	     numbering) are what we need.  */
+	  if (!BYTES_BIG_ENDIAN
+	      && icode == CODE_FOR_altivec_vpkuwum
+	      && ((GET_CODE (op0) == REG
+		   && GET_MODE (op0) != V4SImode)
+		  || (GET_CODE (op0) == SUBREG
+		      && GET_MODE (XEXP (op0, 0)) != V4SImode)))
+	    continue;
+	  if (!BYTES_BIG_ENDIAN
+	      && icode == CODE_FOR_altivec_vpkuhum
+	      && ((GET_CODE (op0) == REG
+		   && GET_MODE (op0) != V8HImode)
+		  || (GET_CODE (op0) == SUBREG
+		      && GET_MODE (XEXP (op0, 0)) != V8HImode)))
+	    continue;
+
           /* For little-endian, the two input operands must be swapped
              (or swapped back) to ensure proper right-to-left numbering
              from 0 to 2N-1.  */