diff mbox

, Add support for PowerPC ISA 3.0 vector byte reverse instructions

Message ID 20170118015008.GA28903@ibm-tiger.the-meissners.org
State New
Headers show

Commit Message

Michael Meissner Jan. 18, 2017, 1:50 a.m. UTC
This patch adds support for adding built-in functions for the ISA 3.0 vector
byte reverse instructions (XXBR{Q,D,W,H}).

The vec_revb built-in function follows the specifications in the OpenPOWER ABI
for Linux Supplement Power Architecture 64-Bit ELF V2 ABI and reverses the
bytes in each vector element.  I added a GCC extension, so that vec_revb of a
vector unsigned char, vector signed char, or vector char argument would act the
same as a vector __int128_t or vector __uint128_t (i.e. reverse all of the
bytes).

In the course of working on this patch, I tried to get the code in the rs6000.c
function altivec_expand_vec_perm_const to generate these instructions, but
there doesn't seem to be any caller of it to implement the permutes.  I will
open a bug on this altivec_expand_vec_perm_const, and if these patches are
checked in, I will try to add support for the instructions.

I have checked this on a little endian power8 system (64-bit only), a big
endian power8 system (64-bit only), and a big endian power7 system (both 32-bit
and 64-bit), and there were no regressions.  Can I check this into the trunk?

[gcc]
2017-01-17  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
	__builtin_vec_revb builtins.
	* config/rs6000/rs6000-builtins.def (P9V_BUILTIN_XXBRQ_V16QI): Add
	built-in functions to support generation of the ISA 3.0 XXBR<x>
	vector byte reverse instructions.
	(P9V_BUILTIN_XXBRQ_V1TI): Likewise.
	(P9V_BUILTIN_XXBRD_V2DI): Likewise.
	(P9V_BUILTIN_XXBRD_V2DF): Likewise.
	(P9V_BUILTIN_XXBGW_V4SI): Likewise.
	(P9V_BUILTIN_XXBGW_V4SF): Likewise.
	(P9V_BUILTIN_XXBGH_V8HI): Likewise.
	(P9V_BUILTIN_VEC_REVB): Likewise.
	* config/rs6000/vsx.md (p9_xxbrq_v1ti): New insns/expanders to
	generate the ISA 3.0 XXBR<x> vector byte reverse instructions.
	(p9_xxbrq_v16qi): Likewise.
	(p9_xxbrd_<mode>, VSX_D iterator): Likewise.
	(p9_xxbrw_<mode>, VSX_W iterator): Likewise.
	(p9_xxbrh_v8hi): Likewise.
	* config/rs6000/altivec.h (vec_revb): Define if ISA 3.0.
	* doc/extend.texi (RS/6000 Altivec Built-ins): Document the
	vec_revb built-in functions.

[gcc/testsuite]
2017-01-13  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/p9-xxbr-1.c: New test.
	* gcc.target/powerpc/p9-xxbr-2.c: Likewise.

Comments

Segher Boessenkool Jan. 18, 2017, 5:21 p.m. UTC | #1
On Tue, Jan 17, 2017 at 08:50:08PM -0500, Michael Meissner wrote:
> I have checked this on a little endian power8 system (64-bit only), a big
> endian power8 system (64-bit only), and a big endian power7 system (both 32-bit
> and 64-bit), and there were no regressions.  Can I check this into the trunk?

Yes please.

One comment:

> +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V16QI,
> +    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
> +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V16QI,
> +    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
> +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V1TI,
> +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0, 0 },
> +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V1TI,
> +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0, 0 },
> +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRD_V2DI,
> +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
> +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRD_V2DI,
> +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
> +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRD_V2DF,
> +    RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 },
> +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRW_V4SI,
> +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
> +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRW_V4SI,
> +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
> +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRW_V4SF,
> +    RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0, 0 },
> +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRH_V8HI,
> +    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
> +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRH_V8HI,
> +    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },

This is much harder to review (and read in general) than needed, because
it uses alphapetical ordering instead of something logical (like, all
integer together and ordered by size; all float together and ordered by
size).


Segher
Michael Meissner Jan. 18, 2017, 7:04 p.m. UTC | #2
On Wed, Jan 18, 2017 at 11:21:40AM -0600, Segher Boessenkool wrote:
> On Tue, Jan 17, 2017 at 08:50:08PM -0500, Michael Meissner wrote:
> > I have checked this on a little endian power8 system (64-bit only), a big
> > endian power8 system (64-bit only), and a big endian power7 system (both 32-bit
> > and 64-bit), and there were no regressions.  Can I check this into the trunk?
> 
> Yes please.
> 
> One comment:
> 
> > +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V16QI,
> > +    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
> > +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V16QI,
> > +    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
> > +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V1TI,
> > +    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0, 0 },
> > +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V1TI,
> > +    RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0, 0 },
> > +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRD_V2DI,
> > +    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
> > +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRD_V2DI,
> > +    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
> > +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRD_V2DF,
> > +    RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 },
> > +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRW_V4SI,
> > +    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
> > +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRW_V4SI,
> > +    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
> > +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRW_V4SF,
> > +    RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0, 0 },
> > +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRH_V8HI,
> > +    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
> > +  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRH_V8HI,
> > +    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
> 
> This is much harder to review (and read in general) than needed, because
> it uses alphapetical ordering instead of something logical (like, all
> integer together and ordered by size; all float together and ordered by
> size).

I ordered it by the size of the vector element being swapped, i.e. XXBRQ comes
first, because it is swapping the bytes of one element; XXBRD comes next
swapping bytes in vector long/vector double; XXBRW comes next swapping bytes in
vector int/vector short, and finally XXBRH swapping bytes in vector short.
diff mbox

Patch

Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 244382)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -5016,6 +5016,31 @@  const struct altivec_builtin_types altiv
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
     RS6000_BTI_unsigned_V16QI, 0 },
 
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRQ_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_V1TI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRD_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRD_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRD_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRW_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRW_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRW_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRH_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0, 0 },
+  { P9V_BUILTIN_VEC_REVB, P9V_BUILTIN_XXBRH_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 },
+
   /* Crypto builtins.  */
   { CRYPTO_BUILTIN_VPERMXOR, CRYPTO_BUILTIN_VPERMXOR_V16QI,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
Index: gcc/config/rs6000/rs6000-builtin.def
===================================================================
--- gcc/config/rs6000/rs6000-builtin.def	(revision 244382)
+++ gcc/config/rs6000/rs6000-builtin.def	(working copy)
@@ -1932,6 +1932,14 @@  BU_P9V_64BIT_VSX_1 (VSESDP,	"scalar_extr
 BU_P9V_VSX_1 (VSTDCNDP,	"scalar_test_neg_dp",	CONST,	xststdcnegdp)
 BU_P9V_VSX_1 (VSTDCNSP,	"scalar_test_neg_sp",	CONST,	xststdcnegsp)
 
+BU_P9V_VSX_1 (XXBRQ_V16QI,	"xxbrq_v16qi",	CONST,	p9_xxbrq_v16qi)
+BU_P9V_VSX_1 (XXBRQ_V1TI,	"xxbrq_v1ti",	CONST,	p9_xxbrq_v1ti)
+BU_P9V_VSX_1 (XXBRD_V2DI,	"xxbrd_v2di",	CONST,	p9_xxbrd_v2di)
+BU_P9V_VSX_1 (XXBRD_V2DF,	"xxbrd_v2df",	CONST,	p9_xxbrd_v2df)
+BU_P9V_VSX_1 (XXBRW_V4SI,	"xxbrw_v4si",	CONST,	p9_xxbrw_v4si)
+BU_P9V_VSX_1 (XXBRW_V4SF,	"xxbrw_v4sf",	CONST,	p9_xxbrw_v4sf)
+BU_P9V_VSX_1 (XXBRH_V8HI,	"xxbrh_v8hi",	CONST,	p9_xxbrh_v8hi)
+
 /* 2 argument vsx scalar functions added in ISA 3.0 (power9).  */
 BU_P9V_64BIT_VSX_2 (VSIEDP,	"scalar_insert_exp",	CONST,	xsiexpdp)
 
@@ -1951,6 +1959,8 @@  BU_P9V_OVERLOAD_1 (VSTDCN,	"scalar_test_
 BU_P9V_OVERLOAD_1 (VSTDCNDP,	"scalar_test_neg_dp")
 BU_P9V_OVERLOAD_1 (VSTDCNSP,	"scalar_test_neg_sp")
 
+BU_P9V_OVERLOAD_1 (REVB,	"revb")
+
 /* ISA 3.0 vector scalar overloaded 2 argument functions.  */
 BU_P9V_OVERLOAD_2 (VSIEDP,	"scalar_insert_exp")
 
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 244382)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -3899,6 +3899,51 @@  (define_insn "*vinsert4b_di_internal"
   [(set_attr "type" "vecperm")])
 
 
+;; Support for ISA 3.0 vector byte reverse
+
+;; Swap all bytes with in a vector
+(define_insn "p9_xxbrq_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")
+	(bswap:V1TI (match_operand:V1TI 1 "vsx_register_operand" "wa")))]
+  "TARGET_P9_VECTOR"
+  "xxbrq %x0,%x1"
+  [(set_attr "type" "vecperm")])
+
+(define_expand "p9_xxbrq_v16qi"
+  [(use (match_operand:V16QI 0 "vsx_register_operand" "=wa"))
+   (use (match_operand:V16QI 1 "vsx_register_operand" "=wa"))]
+  "TARGET_P9_VECTOR"
+{
+  rtx op0 = gen_lowpart (V1TImode, operands[0]);
+  rtx op1 = gen_lowpart (V1TImode, operands[1]);
+  emit_insn (gen_p9_xxbrq_v1ti (op0, op1));
+  DONE;
+})
+
+;; Swap all bytes in each 64-bit element
+(define_insn "p9_xxbrd_<mode>"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(bswap:VSX_D (match_operand:VSX_D 1 "vsx_register_operand" "wa")))]
+  "TARGET_P9_VECTOR"
+  "xxbrd %x0,%x1"
+  [(set_attr "type" "vecperm")])
+
+;; Swap all bytes in each 32-bit element
+(define_insn "p9_xxbrw_<mode>"
+  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
+	(bswap:VSX_W (match_operand:VSX_W 1 "vsx_register_operand" "wa")))]
+  "TARGET_P9_VECTOR"
+  "xxbrw %x0,%x1"
+  [(set_attr "type" "vecperm")])
+
+;; Swap all bytes in each 16-bit element
+(define_insn "p9_xxbrh_v8hi"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+	(bswap:V8HI (match_operand:V8HI 1 "vsx_register_operand" "wa")))]
+  "TARGET_P9_VECTOR"
+  "xxbrh %x0,%x1"
+  [(set_attr "type" "vecperm")])
+
 
 ;; Operand numbers for the following peephole2
 (define_constants
Index: gcc/config/rs6000/altivec.h
===================================================================
--- gcc/config/rs6000/altivec.h	(revision 244382)
+++ gcc/config/rs6000/altivec.h	(working copy)
@@ -440,6 +440,8 @@ 
 
 #define vec_xlx __builtin_vec_vextulx
 #define vec_xrx __builtin_vec_vexturx
+
+#define vec_revb __builtin_vec_revb
 #endif
 
 /* Predicates.
Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 244382)
+++ gcc/doc/extend.texi	(working copy)
@@ -18179,6 +18179,34 @@  If any of the enabled test conditions is
 in the result vector is -1.  Otherwise (all of the enabled test
 conditions are false), the corresponding entry of the result vector is 0.
 
+If the ISA 3.0 instruction set additions (@option{-mcpu=power9})
+are available:
+@smallexample
+vector signed char vec_revb (vector signed char);
+vector unsigned char vec_revb (vector unsigned char);
+vector short vec_revb (vector short);
+vector unsigned short vec_revb (vector unsigned short);
+vector int vec_revb (vector int);
+vector unsigned int vec_revb (vector unsigned int);
+vector float vec_revb (vector float);
+vector long long vec_revb (vector long long);
+vector unsigned long long vec_revb (vector unsigned long long);
+vector double vec_revb (vector double);
+@end smallexample
+
+On 64-bit targets, if the ISA 3.0 additions (@option{-mcpu=power9})
+are available:
+@smallexample
+vector long vec_revb (vector long);
+vector unsigned long vec_revb (vector unsigned long);
+vector __int128_t vec_revb (vector __int128_t);
+vector __uint128_t vec_revb (vector __uint128_t);
+@end smallexample
+
+The @code{vec_revb} built-in function reverses the bytes on an element
+by element basis.  A vector of @code{vector unsigned char} or
+@code{vector signed char} reverses the bytes in the whole word.
+
 If the cryptographic instructions are enabled (@option{-mcrypto} or
 @option{-mcpu=power8}), the following builtins are enabled.
 
Index: gcc/testsuite/gcc.target/powerpc/p9-xxbr-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p9-xxbr-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p9-xxbr-1.c	(revision 0)
@@ -0,0 +1,67 @@ 
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O3" } */
+
+#include <altivec.h>
+
+/* Verify P9 vec_revb builtin generates the XXBR{Q,D,W,H} instructions.  */
+
+vector char
+rev_char (vector char a)
+{
+  return vec_revb (a);		/* XXBRQ.  */
+}
+
+vector signed char
+rev_schar (vector signed char a)
+{
+  return vec_revb (a);		/* XXBRQ.  */
+}
+
+vector unsigned char
+rev_uchar (vector unsigned char a)
+{
+  return vec_revb (a);		/* XXBRQ.  */
+}
+
+vector short
+rev_short (vector short a)
+{
+  return vec_revb (a);		/* XXBRH.  */
+}
+
+vector unsigned short
+rev_ushort (vector unsigned short a)
+{
+  return vec_revb (a);		/* XXBRH.  */
+}
+
+vector int
+rev_int (vector int a)
+{
+  return vec_revb (a);		/* XXBRW.  */
+}
+
+vector unsigned int
+rev_uint (vector unsigned int a)
+{
+  return vec_revb (a);		/* XXBRW.  */
+}
+
+vector float
+rev_float (vector float a)
+{
+  return vec_revb (a);		/* XXBRW.  */
+}
+
+vector double
+rev_double (vector double a)
+{
+  return vec_revb (a);		/* XXBRD.  */
+}
+
+/* { dg-final { scan-assembler-count "xxbrd" 1 } } */
+/* { dg-final { scan-assembler-count "xxbrh" 2 } } */
+/* { dg-final { scan-assembler-count "xxbrq" 3 } } */
+/* { dg-final { scan-assembler-count "xxbrw" 3 } } */
Index: gcc/testsuite/gcc.target/powerpc/p9-xxbr-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/p9-xxbr-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/p9-xxbr-2.c	(revision 0)
@@ -0,0 +1,36 @@ 
+/* { dg-do compile { target { powerpc64le-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O3" } */
+
+#include <altivec.h>
+
+/* Verify P9 vec_revb builtin generates the XXBR{Q,D,W,H} instructions.  This
+   test only tests the vector types that need a 64-bit environment.  */
+
+vector long
+rev_long (vector long a)
+{
+  return vec_revb (a);		/* XXBRD.  */
+}
+
+vector unsigned long
+rev_ulong (vector unsigned long a)
+{
+  return vec_revb (a);		/* XXBRD.  */
+}
+
+vector __int128_t
+rev_int128 (vector __int128_t a)
+{
+  return vec_revb (a);		/* XXBRQ.  */
+}
+
+vector __uint128_t
+rev_uint128 (vector __uint128_t a)
+{
+  return vec_revb (a);		/* XXBRQ.  */
+}
+
+/* { dg-final { scan-assembler-count "xxbrd" 2 } } */
+/* { dg-final { scan-assembler-count "xxbrq" 2 } } */