diff mbox series

rs6000: Add new __builtin_vsx_build_pair and __builtin_mma_build_acc built-ins

Message ID 17bba0c5-28b9-6085-f70b-630d881062f1@linux.ibm.com
State New
Headers show
Series rs6000: Add new __builtin_vsx_build_pair and __builtin_mma_build_acc built-ins | expand

Commit Message

Peter Bergner June 9, 2021, 9:05 p.m. UTC
The __builtin_vsx_assemble_pair and __builtin_mma_assemble_acc built-ins
currently assign their first source operand to the first VSX register
in a pair/quad, their second operand to the second register in a pair/quad, etc.
This is not endian friendly and forces the user to generate different calls
depending on endianness.  In agreement with the POWER LLVM team, we've
decided to lightly deprecate the assemble built-ins and replace them with
"build" built-ins that automatically handle endianness so the same built-in
call and be used for both little-endian and big-endian compiles.  We are not
removing the assemble built-ins, since there is code in the wild that use
them, but we are removing their documentation to encourage the use of the
new "build" variants.

This passed bootstrap and regtesting on both powerpc64-linux and
powerpc64le-linux with no regressions.  I also verified that we continue
to generate the same code as before on some small unit tests and that we
agree with what LLVM generates.  Ok for trunk?

I'd also like to backport this in time for gcc 11.2.  Ok for the release
branch after it has baked on trunk for a few days?

Peter


gcc/
	* config/rs6000/rs6000-builtin.def (build_pair): New built-in.
	(build_acc): Likewise.
	* config/rs6000/rs6000-call.c (mma_expand_builtin): Swap assemble
	source operands in little-endian mode.
	(rs6000_gimple_fold_mma_builtin): Handle VSX_BUILTIN_BUILD_PAIR.
	(mma_init_builtins): Likewise.
	* config/rs6000/rs6000.c (rs6000_split_multireg_move): Handle endianness
	ordering for the MMA assemble and build source operands.
	* doc/extend.texi (__builtin_vsx_build_acc, __builtin_mma_build_pair):
	Document.
	(__builtin_mma_assemble_acc, __builtin_mma_assemble_pair): Remove
	documentation.

gcc/testsuite
	* gcc.target/powerpc/mma-builtin-4.c (__builtin_vsx_build_pair): Add
	tests.  Update expected counts.
	* gcc.target/powerpc/mma-builtin-5.c (__builtin_mma_build_acc): Add
	tests.  Update expected counts.

Comments

Segher Boessenkool June 9, 2021, 9:38 p.m. UTC | #1
Hi!

On Wed, Jun 09, 2021 at 04:05:37PM -0500, Peter Bergner wrote:
> The __builtin_vsx_assemble_pair and __builtin_mma_assemble_acc built-ins
> currently assign their first source operand to the first VSX register
> in a pair/quad, their second operand to the second register in a pair/quad, etc.
> This is not endian friendly and forces the user to generate different calls
> depending on endianness.  In agreement with the POWER LLVM team, we've
> decided to lightly deprecate the assemble built-ins and replace them with
> "build" built-ins that automatically handle endianness so the same built-in
> call and be used for both little-endian and big-endian compiles.  We are not
> removing the assemble built-ins, since there is code in the wild that use
> them, but we are removing their documentation to encourage the use of the
> new "build" variants.

It is better if you *do* document the old names, but say "use the new
stuff", I think?  Or is there so little material with the old names
out there that no one will notice?

> +      /* The ASSEMBLE builtin source operands are reversed in little-endian
> +	 mode, so reorder them.  */
> +      if (fcode == VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL && !WORDS_BIG_ENDIAN)
> +	pat = GEN_FCN (icode) (op[0], op[2], op[1]);
> +      else
> +	pat = GEN_FCN (icode) (op[0], op[1], op[2]);

I think this reads simpler as
      /* The ASSEMBLE builtin source operands are reversed in little-endian
	 mode, so reorder them.  */
      if (fcode == VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL && !WORDS_BIG_ENDIAN)
	std::swap (op[1], op[2]);
      pat = GEN_FCN (icode) (op[0], op[1], op[2]);
do you agree?

> +      /* The ASSEMBLE builtin source operands are reversed in little-endian
> +	 mode, so reorder them.  */
> +      if (fcode == MMA_BUILTIN_ASSEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN)
> +	pat = GEN_FCN (icode) (op[0], op[4], op[3], op[2], op[1]);
> +      else
> +	pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);

And
      /* The ASSEMBLE builtin source operands are reversed in little-endian
	 mode, so reorder them.  */
      if (fcode == MMA_BUILTIN_AS> +SEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN)
	{
	  std::swap (op[1], [op[4]);
	  std::swap (op[2], [op[3]);
	}
      pat = GEN_FCN (icode) (op> +[0], op[1], op[2], op[3], op[4]);
for that then of course.

> @@ -14151,7 +14161,8 @@ mma_init_builtins (void)
>  	      if (gimple_func && mode == XOmode)
>  		op[nopnds++] = build_pointer_type (vector_quad_type_node);
>  	      else if (gimple_func && mode == OOmode

Pleae write the
   && mode == OOmode
on a new line as well then?

> +	      int index = WORDS_BIG_ENDIAN ? i: nvecs - 1 - i;

Space before colon.

Okay for trunk and 11 with at least that space fixed.  Thanks!


Segher
Peter Bergner June 10, 2021, 3:43 p.m. UTC | #2
On 6/9/21 4:38 PM, Segher Boessenkool wrote:
> It is better if you *do* document the old names, but say "use the new
> stuff", I think?  Or is there so little material with the old names
> out there that no one will notice?

The latter.  There is only one user, but we want all new uses to use the
new built-in.  Plus, LLVM will be removing their documentation for that
built-in too, so we want to be consistent.



>> +      /* The ASSEMBLE builtin source operands are reversed in little-endian
>> +	 mode, so reorder them.  */
>> +      if (fcode == VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL && !WORDS_BIG_ENDIAN)
>> +	pat = GEN_FCN (icode) (op[0], op[2], op[1]);
>> +      else
>> +	pat = GEN_FCN (icode) (op[0], op[1], op[2]);
> 
> I think this reads simpler as
>       /* The ASSEMBLE builtin source operands are reversed in little-endian
> 	 mode, so reorder them.  */
>       if (fcode == VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL && !WORDS_BIG_ENDIAN)
> 	std::swap (op[1], op[2]);
>       pat = GEN_FCN (icode) (op[0], op[1], op[2]);
> do you agree?

Do I think C++ is simpler than plain C?  Is this a trick question? ;-)
If this is what you want, I'm fine with it.



> And
>       /* The ASSEMBLE builtin source operands are reversed in little-endian
> 	 mode, so reorder them.  */
>       if (fcode == MMA_BUILTIN_AS> +SEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN)
> 	{
> 	  std::swap (op[1], [op[4]);
> 	  std::swap (op[2], [op[3]);
> 	}
>       pat = GEN_FCN (icode) (op> +[0], op[1], op[2], op[3], op[4]);
> for that then of course.

Ok.


>> @@ -14151,7 +14161,8 @@ mma_init_builtins (void)
>>  	      if (gimple_func && mode == XOmode)
>>  		op[nopnds++] = build_pointer_type (vector_quad_type_node);
>>  	      else if (gimple_func && mode == OOmode
> 
> Pleae write the
>    && mode == OOmode
> on a new line as well then?

Will do.


>> +	      int index = WORDS_BIG_ENDIAN ? i: nvecs - 1 - i;
> 
> Space before colon.

Ah, thanks for catching that!



> Okay for trunk and 11 with at least that space fixed.  Thanks!

Ok, I'll retest with the above (but still removing the assemble
built-in documentation) and push it if it's clean.  Thanks!

Peter
Peter Bergner June 10, 2021, 6:58 p.m. UTC | #3
On 6/10/21 10:43 AM, Peter Bergner wrote:
> On 6/9/21 4:38 PM, Segher Boessenkool wrote:
>> Okay for trunk and 11 with at least that space fixed.  Thanks!
> 
> Ok, I'll retest with the above (but still removing the assemble
> built-in documentation) and push it if it's clean.  Thanks!

Testing was clean on the updated patch so I pushed it to trunk.
I'll let it bake a day or two before backporting.  Thanks!

Peter
Segher Boessenkool June 10, 2021, 8:40 p.m. UTC | #4
On Thu, Jun 10, 2021 at 10:43:05AM -0500, Peter Bergner wrote:
> On 6/9/21 4:38 PM, Segher Boessenkool wrote:
> > I think this reads simpler as
> >       /* The ASSEMBLE builtin source operands are reversed in little-endian
> > 	 mode, so reorder them.  */
> >       if (fcode == VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL && !WORDS_BIG_ENDIAN)
> > 	std::swap (op[1], op[2]);
> >       pat = GEN_FCN (icode) (op[0], op[1], op[2]);
> > do you agree?
> 
> Do I think C++ is simpler than plain C?  Is this a trick question? ;-)

I *like* std::swap.  It is one of the few C++ things that is clearly
nice: it gives nice, unsurprising semantics, for nice, unsurprising
syntax, and it does exactly what you expect, and this can be described
in a few words, without needing any special concepts.  It does something
that cannot be done easily some other way.  It is utterly local.

Yup, I like it.

I often do not like C++ because it *is* surprising, often has things
spread out over different parts of your program, and distracts from the
one kind of programming style that C++ is well suited for: imperative
programming.  That is also the style that most of most compilers are
written in, for good reasons.

The main consumers of programs are *humans*.  So programs have to be
clear to people.  Modern C style helps that a lot.  Most C++ is the
opposite.  Perhaps this is because C++ gives you so many ways to write
unclear programs, because it has so many features that can be overused,
I do not know what it is exactly.

</rant>

Either way, I asked if you like my suggestion, you do not have to take
it if you don't :-)

> >> +	      int index = WORDS_BIG_ENDIAN ? i: nvecs - 1 - i;
> > 
> > Space before colon.
> 
> Ah, thanks for catching that!

Yeah I do not miss the important problems!

> Ok, I'll retest with the above (but still removing the assemble
> built-in documentation) and push it if it's clean.  Thanks!

Great, thank you!


Segher
diff mbox series

Patch

diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 609bebdfd74..4043e14ed3f 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -3207,6 +3207,7 @@  BU_MMA_2 (DISASSEMBLE_ACC, "disassemble_acc",	QUAD, mma_disassemble_acc)
 BU_MMA_V2 (DISASSEMBLE_PAIR, "disassemble_pair", PAIR, vsx_disassemble_pair)
 BU_COMPAT (VSX_BUILTIN_DISASSEMBLE_PAIR, "mma_disassemble_pair")
 
+BU_MMA_V3 (BUILD_PAIR,	    "build_pair",	MISC, vsx_assemble_pair)
 BU_MMA_V3 (ASSEMBLE_PAIR,   "assemble_pair",	MISC, vsx_assemble_pair)
 BU_COMPAT (VSX_BUILTIN_ASSEMBLE_PAIR, "mma_assemble_pair")
 BU_MMA_3 (XVBF16GER2,	    "xvbf16ger2",	MISC, mma_xvbf16ger2)
@@ -3239,6 +3240,7 @@  BU_MMA_3 (XVI8GER4SPP,	    "xvi8ger4spp",      QUAD, mma_xvi8ger4spp)
 BU_MMA_3 (XVI16GER2PP,	    "xvi16ger2pp",      QUAD, mma_xvi16ger2pp)
 BU_MMA_3 (XVI16GER2SPP,	    "xvi16ger2spp",     QUAD, mma_xvi16ger2spp)
 
+BU_MMA_5 (BUILD_ACC,	    "build_acc",	MISC, mma_assemble_acc)
 BU_MMA_5 (ASSEMBLE_ACC,     "assemble_acc",	MISC, mma_assemble_acc)
 BU_MMA_5 (PMXVF32GER,	    "pmxvf32ger",       MISC, mma_pmxvf32ger)
 BU_MMA_5 (PMXVF64GER,	    "pmxvf64ger",       PAIR, mma_pmxvf64ger)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index b4e13af4dc6..8deb9a3d207 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -10118,13 +10118,23 @@  mma_expand_builtin (tree exp, rtx target, bool *expandedp)
       pat = GEN_FCN (icode) (op[0], op[1]);
       break;
     case 3:
-      pat = GEN_FCN (icode) (op[0], op[1], op[2]);
+      /* The ASSEMBLE builtin source operands are reversed in little-endian
+	 mode, so reorder them.  */
+      if (fcode == VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL && !WORDS_BIG_ENDIAN)
+	pat = GEN_FCN (icode) (op[0], op[2], op[1]);
+      else
+	pat = GEN_FCN (icode) (op[0], op[1], op[2]);
       break;
     case 4:
       pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]);
       break;
     case 5:
-      pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
+      /* The ASSEMBLE builtin source operands are reversed in little-endian
+	 mode, so reorder them.  */
+      if (fcode == MMA_BUILTIN_ASSEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN)
+	pat = GEN_FCN (icode) (op[0], op[4], op[3], op[2], op[1]);
+      else
+	pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
       break;
     case 6:
       pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5]);
@@ -11835,7 +11845,7 @@  rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi)
       gcc_unreachable ();
     }
 
-  if (fncode == VSX_BUILTIN_ASSEMBLE_PAIR)
+  if (fncode == VSX_BUILTIN_BUILD_PAIR || fncode == VSX_BUILTIN_ASSEMBLE_PAIR)
     lhs = make_ssa_name (vector_pair_type_node);
   else
     lhs = make_ssa_name (vector_quad_type_node);
@@ -14151,7 +14161,8 @@  mma_init_builtins (void)
 	      if (gimple_func && mode == XOmode)
 		op[nopnds++] = build_pointer_type (vector_quad_type_node);
 	      else if (gimple_func && mode == OOmode
-		       && d->code == VSX_BUILTIN_ASSEMBLE_PAIR)
+		       && (d->code == VSX_BUILTIN_BUILD_PAIR
+			   || d->code == VSX_BUILTIN_ASSEMBLE_PAIR))
 		op[nopnds++] = build_pointer_type (vector_pair_type_node);
 	      else
 		/* MMA uses unsigned types.  */
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b01bb5c8191..35b5d53cc9e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16807,9 +16807,11 @@  rs6000_split_multireg_move (rtx dst, rtx src)
 	    gcc_assert (VSX_REGNO_P (REGNO (dst)));
 
 	  reg_mode = GET_MODE (XVECEXP (src, 0, 0));
-	  for (int i = 0; i < XVECLEN (src, 0); i++)
+	  int nvecs = XVECLEN (src, 0);
+	  for (int i = 0; i < nvecs; i++)
 	    {
-	      rtx dst_i = gen_rtx_REG (reg_mode, reg + i);
+	      int index = WORDS_BIG_ENDIAN ? i: nvecs - 1 - i;
+	      rtx dst_i = gen_rtx_REG (reg_mode, reg + index);
 	      emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i)));
 	    }
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 22f9e93073f..2bce9ab996f 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20536,10 +20536,10 @@  void __builtin_mma_xxmtacc (__vector_quad *);
 void __builtin_mma_xxmfacc (__vector_quad *);
 void __builtin_mma_xxsetaccz (__vector_quad *);
 
-void __builtin_mma_assemble_acc (__vector_quad *, vec_t, vec_t, vec_t, vec_t);
+void __builtin_mma_build_acc (__vector_quad *, vec_t, vec_t, vec_t, vec_t);
 void __builtin_mma_disassemble_acc (void *, __vector_quad *);
 
-void __builtin_vsx_assemble_pair (__vector_pair *, vec_t, vec_t);
+void __builtin_vsx_build_pair (__vector_pair *, vec_t, vec_t);
 void __builtin_vsx_disassemble_pair (void *, __vector_pair *);
 
 vec_t __builtin_vsx_xvcvspbf16 (vec_t);
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c
index 3bedf531de0..a9fb0107d12 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c
@@ -20,6 +20,14 @@  foo2 (__vector_pair *dst, vec_t *src)
   *dst = pair;
 }
 
+void
+foo3 (__vector_pair *dst, vec_t *src)
+{
+  __vector_pair pair;
+  __builtin_vsx_build_pair (&pair, src[4], src[0]);
+  *dst = pair;
+}
+
 void
 bar (vec_t *dst, __vector_pair *src)
 {
@@ -54,8 +62,12 @@  bar2 (vec_t *dst, __vector_pair *src)
 #  error "__has_builtin (__builtin_mma_disassemble_pair) failed"
 #endif
 
-/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */
+#if !__has_builtin (__builtin_vsx_build_pair)
+#  error "__has_builtin (__builtin_vsx_build_pair) failed"
+#endif
+
+/* { dg-final { scan-assembler-times {\mlxv\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mstxv\M} 4 } } */
-/* { dg-final { scan-assembler-times {\mstxvp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 3 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c
index 43b6d3ac91e..00503b7343d 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c
@@ -12,6 +12,14 @@  foo (__vector_quad *dst, vec_t *src)
   *dst = acc;
 }
 
+void
+foo2 (__vector_quad *dst, vec_t *src)
+{
+  __vector_quad acc;
+  __builtin_mma_build_acc (&acc, src[12], src[8], src[4], src[0]);
+  *dst = acc;
+}
+
 void
 bar (vec_t *dst, __vector_quad *src)
 {
@@ -23,9 +31,17 @@  bar (vec_t *dst, __vector_quad *src)
   dst[12] = res[3];
 }
 
-/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */
+#if !__has_builtin (__builtin_mma_assemble_acc)
+#  error "__has_builtin (__builtin_mma_assemble_acc) failed"
+#endif
+
+#if !__has_builtin (__builtin_mma_build_acc)
+#  error "__has_builtin (__builtin_mma_build_acc) failed"
+#endif
+
+/* { dg-final { scan-assembler-times {\mlxv\M} 8 } } */
 /* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mstxv\M} 4 } } */
-/* { dg-final { scan-assembler-times {\mstxvp\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mxxmfacc\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mxxmtacc\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxmfacc\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mxxmtacc\M} 3 } } */