diff mbox series

rs6000: MMA test case emits wrong code when building a vector pair

Message ID 24684412-042a-0842-3a5e-1e62c0ae3691@linux.ibm.com
State New
Headers show
Series rs6000: MMA test case emits wrong code when building a vector pair | expand

Commit Message

Peter Bergner Oct. 28, 2021, 1:37 a.m. UTC
PR102976 shows a test case where we generate wrong code when building
a vector pair from 2 vector registers.  The bug here is that with unlucky
register assignments, we can clobber one of the input operands before
we write both registers of the output operand.  The solution is to use
early-clobbers in the assemble pair and accumulator patterns.

This passed bootstrap and regtesting with no regressions and our
OpenBLAS team has confirmed it fixes the issues they reported.
Ok for mainline?

Ok for GCC 11 too after a few days on trunk?

Peter


gcc/
	PR target/102976
	* config/rs6000/mma.md (*vsx_assemble_pair): Add early-clobber for
	output operand.
	(*mma_assemble_acc): Likewise.

gcc/testsuite/
	PR target/102976
	* gcc.target/powerpc/pr102976.c: New test.

Comments

Peter Bergner Nov. 12, 2021, 7:49 p.m. UTC | #1
I'd like to ping the following patch.

Peter


On 10/27/21 8:37 PM, Peter Bergner via Gcc-patches wrote:
> PR102976 shows a test case where we generate wrong code when building
> a vector pair from 2 vector registers.  The bug here is that with unlucky
> register assignments, we can clobber one of the input operands before
> we write both registers of the output operand.  The solution is to use
> early-clobbers in the assemble pair and accumulator patterns.
> 
> This passed bootstrap and regtesting with no regressions and our
> OpenBLAS team has confirmed it fixes the issues they reported.
> Ok for mainline?
> 
> Ok for GCC 11 too after a few days on trunk?
> 
> Peter
> 
> 
> gcc/
> 	PR target/102976
> 	* config/rs6000/mma.md (*vsx_assemble_pair): Add early-clobber for
> 	output operand.
> 	(*mma_assemble_acc): Likewise.
> 
> gcc/testsuite/
> 	PR target/102976
> 	* gcc.target/powerpc/pr102976.c: New test.
> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index 1990a2183f6..f0ea99963f7 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -339,7 +339,7 @@ (define_expand "vsx_assemble_pair"
>  })
>  
>  (define_insn_and_split "*vsx_assemble_pair"
> -  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
> +  [(set (match_operand:OO 0 "vsx_register_operand" "=&wa")
>  	(unspec:OO [(match_operand:V16QI 1 "mma_assemble_input_operand" "mwa")
>  		    (match_operand:V16QI 2 "mma_assemble_input_operand" "mwa")]
>  		    UNSPEC_MMA_ASSEMBLE))]
> @@ -405,7 +405,7 @@ (define_expand "mma_assemble_acc"
>  })
>  
>  (define_insn_and_split "*mma_assemble_acc"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> +  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
>  	(unspec:XO [(match_operand:V16QI 1 "mma_assemble_input_operand" "mwa")
>  		    (match_operand:V16QI 2 "mma_assemble_input_operand" "mwa")
>  		    (match_operand:V16QI 3 "mma_assemble_input_operand" "mwa")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr102976.c b/gcc/testsuite/gcc.target/powerpc/pr102976.c
> new file mode 100644
> index 00000000000..a8de8f056f1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr102976.c
> @@ -0,0 +1,14 @@
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> +
> +#include <altivec.h>
> +void
> +bug (__vector_pair *dst)
> +{
> +  register vector unsigned char vec0 asm ("vs44");
> +  register vector unsigned char vec1 asm ("vs32");
> +  __builtin_vsx_build_pair (dst, vec0, vec1);
> +}
> +
> +/* { dg-final { scan-assembler-times {xxlor[^,]*,44,44} 1 } } */
> +/* { dg-final { scan-assembler-times {xxlor[^,]*,32,32} 1 } } */
>
Segher Boessenkool Nov. 13, 2021, 1:25 p.m. UTC | #2
On Wed, Oct 27, 2021 at 08:37:57PM -0500, Peter Bergner wrote:
> PR102976 shows a test case where we generate wrong code when building
> a vector pair from 2 vector registers.  The bug here is that with unlucky
> register assignments, we can clobber one of the input operands before
> we write both registers of the output operand.  The solution is to use
> early-clobbers in the assemble pair and accumulator patterns.

Because of what insns there are after the split.  Aha.

Please add a comment explaining this, near the earlyclobber itself.

A usually nicer way of doing it is by special casing the split code for
this situation.  But with the comment in place the way you do it might
even be preferable here :-)

> +/* { dg-final { scan-assembler-times {xxlor[^,]*,44,44} 1 } } */
> +/* { dg-final { scan-assembler-times {xxlor[^,]*,32,32} 1 } } */

Bracket expressions using ^ match newlines as well, unless you use
(partial) newline-sensitive matching.  Partial is almost always what you
want, so start the regex with (?p) ?  You also want to add some \m and
\M btw.  For example, as written his will match xxlorc insns as well.
Not a super big deal, but :-)

You can just write this as {\mxxlor \d+,44,44\M} etc., that will be
simplest I think.

Okay for trunk with comments added near the earlyclobber, and the RE
improved.  Also fine for 11 after some burn-in.  Thanks!


Segher
Peter Bergner Nov. 16, 2021, 6:18 p.m. UTC | #3
On 11/13/21 7:25 AM, Segher Boessenkool wrote:
> On Wed, Oct 27, 2021 at 08:37:57PM -0500, Peter Bergner wrote:
>> PR102976 shows a test case where we generate wrong code when building
>> a vector pair from 2 vector registers.  The bug here is that with unlucky
>> register assignments, we can clobber one of the input operands before
>> we write both registers of the output operand.  The solution is to use
>> early-clobbers in the assemble pair and accumulator patterns.
> 
> Because of what insns there are after the split.  Aha.
> 
> Please add a comment explaining this, near the earlyclobber itself.

Done for both patterns.



> You can just write this as {\mxxlor \d+,44,44\M} etc., that will be
> simplest I think.

Done and tested that it still works.


> Okay for trunk with comments added near the earlyclobber, and the RE
> improved.  Also fine for 11 after some burn-in.  Thanks!

Ok, I pushed with both changes.  I'll push a change to GCC11 in a few days.
Thanks!

Peter
diff mbox series

Patch

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 1990a2183f6..f0ea99963f7 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -339,7 +339,7 @@  (define_expand "vsx_assemble_pair"
 })
 
 (define_insn_and_split "*vsx_assemble_pair"
-  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+  [(set (match_operand:OO 0 "vsx_register_operand" "=&wa")
 	(unspec:OO [(match_operand:V16QI 1 "mma_assemble_input_operand" "mwa")
 		    (match_operand:V16QI 2 "mma_assemble_input_operand" "mwa")]
 		    UNSPEC_MMA_ASSEMBLE))]
@@ -405,7 +405,7 @@  (define_expand "mma_assemble_acc"
 })
 
 (define_insn_and_split "*mma_assemble_acc"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
 	(unspec:XO [(match_operand:V16QI 1 "mma_assemble_input_operand" "mwa")
 		    (match_operand:V16QI 2 "mma_assemble_input_operand" "mwa")
 		    (match_operand:V16QI 3 "mma_assemble_input_operand" "mwa")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102976.c b/gcc/testsuite/gcc.target/powerpc/pr102976.c
new file mode 100644
index 00000000000..a8de8f056f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102976.c
@@ -0,0 +1,14 @@ 
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#include <altivec.h>
+void
+bug (__vector_pair *dst)
+{
+  register vector unsigned char vec0 asm ("vs44");
+  register vector unsigned char vec1 asm ("vs32");
+  __builtin_vsx_build_pair (dst, vec0, vec1);
+}
+
+/* { dg-final { scan-assembler-times {xxlor[^,]*,44,44} 1 } } */
+/* { dg-final { scan-assembler-times {xxlor[^,]*,32,32} 1 } } */