Message ID | 17bba0c5-28b9-6085-f70b-630d881062f1@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | rs6000: Add new __builtin_vsx_build_pair and __builtin_mma_build_acc built-ins | expand |
Hi! On Wed, Jun 09, 2021 at 04:05:37PM -0500, Peter Bergner wrote: > The __builtin_vsx_assemble_pair and __builtin_mma_assemble_acc built-ins > currently assign their first source operand to the first VSX register > in a pair/quad, their second operand to the second register in a pair/quad, etc. > This is not endian friendly and forces the user to generate different calls > depending on endianness. In agreement with the POWER LLVM team, we've > decided to lightly deprecate the assemble built-ins and replace them with > "build" built-ins that automatically handle endianness so the same built-in > call and be used for both little-endian and big-endian compiles. We are not > removing the assemble built-ins, since there is code in the wild that use > them, but we are removing their documentation to encourage the use of the > new "build" variants. It is better if you *do* document the old names, but say "use the new stuff", I think? Or is there so little material with the old names out there that no one will notice? > + /* The ASSEMBLE builtin source operands are reversed in little-endian > + mode, so reorder them. */ > + if (fcode == VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL && !WORDS_BIG_ENDIAN) > + pat = GEN_FCN (icode) (op[0], op[2], op[1]); > + else > + pat = GEN_FCN (icode) (op[0], op[1], op[2]); I think this reads simpler as /* The ASSEMBLE builtin source operands are reversed in little-endian mode, so reorder them. */ if (fcode == VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL && !WORDS_BIG_ENDIAN) std::swap (op[1], op[2]); pat = GEN_FCN (icode) (op[0], op[1], op[2]); do you agree? > + /* The ASSEMBLE builtin source operands are reversed in little-endian > + mode, so reorder them. */ > + if (fcode == MMA_BUILTIN_ASSEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN) > + pat = GEN_FCN (icode) (op[0], op[4], op[3], op[2], op[1]); > + else > + pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]); And /* The ASSEMBLE builtin source operands are reversed in little-endian mode, so reorder them. */ if (fcode == MMA_BUILTIN_AS> +SEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN) { std::swap (op[1], [op[4]); std::swap (op[2], [op[3]); } pat = GEN_FCN (icode) (op> +[0], op[1], op[2], op[3], op[4]); for that then of course. > @@ -14151,7 +14161,8 @@ mma_init_builtins (void) > if (gimple_func && mode == XOmode) > op[nopnds++] = build_pointer_type (vector_quad_type_node); > else if (gimple_func && mode == OOmode Pleae write the && mode == OOmode on a new line as well then? > + int index = WORDS_BIG_ENDIAN ? i: nvecs - 1 - i; Space before colon. Okay for trunk and 11 with at least that space fixed. Thanks! Segher
On 6/9/21 4:38 PM, Segher Boessenkool wrote: > It is better if you *do* document the old names, but say "use the new > stuff", I think? Or is there so little material with the old names > out there that no one will notice? The latter. There is only one user, but we want all new uses to use the new built-in. Plus, LLVM will be removing their documentation for that built-in too, so we want to be consistent. >> + /* The ASSEMBLE builtin source operands are reversed in little-endian >> + mode, so reorder them. */ >> + if (fcode == VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL && !WORDS_BIG_ENDIAN) >> + pat = GEN_FCN (icode) (op[0], op[2], op[1]); >> + else >> + pat = GEN_FCN (icode) (op[0], op[1], op[2]); > > I think this reads simpler as > /* The ASSEMBLE builtin source operands are reversed in little-endian > mode, so reorder them. */ > if (fcode == VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL && !WORDS_BIG_ENDIAN) > std::swap (op[1], op[2]); > pat = GEN_FCN (icode) (op[0], op[1], op[2]); > do you agree? Do I think C++ is simpler than plain C? Is this a trick question? ;-) If this is what you want, I'm fine with it. > And > /* The ASSEMBLE builtin source operands are reversed in little-endian > mode, so reorder them. */ > if (fcode == MMA_BUILTIN_AS> +SEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN) > { > std::swap (op[1], [op[4]); > std::swap (op[2], [op[3]); > } > pat = GEN_FCN (icode) (op> +[0], op[1], op[2], op[3], op[4]); > for that then of course. Ok. >> @@ -14151,7 +14161,8 @@ mma_init_builtins (void) >> if (gimple_func && mode == XOmode) >> op[nopnds++] = build_pointer_type (vector_quad_type_node); >> else if (gimple_func && mode == OOmode > > Pleae write the > && mode == OOmode > on a new line as well then? Will do. >> + int index = WORDS_BIG_ENDIAN ? i: nvecs - 1 - i; > > Space before colon. Ah, thanks for catching that! > Okay for trunk and 11 with at least that space fixed. Thanks! Ok, I'll retest with the above (but still removing the assemble built-in documentation) and push it if it's clean. Thanks! Peter
On 6/10/21 10:43 AM, Peter Bergner wrote: > On 6/9/21 4:38 PM, Segher Boessenkool wrote: >> Okay for trunk and 11 with at least that space fixed. Thanks! > > Ok, I'll retest with the above (but still removing the assemble > built-in documentation) and push it if it's clean. Thanks! Testing was clean on the updated patch so I pushed it to trunk. I'll let it bake a day or two before backporting. Thanks! Peter
On Thu, Jun 10, 2021 at 10:43:05AM -0500, Peter Bergner wrote: > On 6/9/21 4:38 PM, Segher Boessenkool wrote: > > I think this reads simpler as > > /* The ASSEMBLE builtin source operands are reversed in little-endian > > mode, so reorder them. */ > > if (fcode == VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL && !WORDS_BIG_ENDIAN) > > std::swap (op[1], op[2]); > > pat = GEN_FCN (icode) (op[0], op[1], op[2]); > > do you agree? > > Do I think C++ is simpler than plain C? Is this a trick question? ;-) I *like* std::swap. It is one of the few C++ things that is clearly nice: it gives nice, unsurprising semantics, for nice, unsurprising syntax, and it does exactly what you expect, and this can be described in a few words, without needing any special concepts. It does something that cannot be done easily some other way. It is utterly local. Yup, I like it. I often do not like C++ because it *is* surprising, often has things spread out over different parts of your program, and distracts from the one kind of programming style that C++ is well suited for: imperative programming. That is also the style that most of most compilers are written in, for good reasons. The main consumers of programs are *humans*. So programs have to be clear to people. Modern C style helps that a lot. Most C++ is the opposite. Perhaps this is because C++ gives you so many ways to write unclear programs, because it has so many features that can be overused, I do not know what it is exactly. </rant> Either way, I asked if you like my suggestion, you do not have to take it if you don't :-) > >> + int index = WORDS_BIG_ENDIAN ? i: nvecs - 1 - i; > > > > Space before colon. > > Ah, thanks for catching that! Yeah I do not miss the important problems! > Ok, I'll retest with the above (but still removing the assemble > built-in documentation) and push it if it's clean. Thanks! Great, thank you! Segher
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def index 609bebdfd74..4043e14ed3f 100644 --- a/gcc/config/rs6000/rs6000-builtin.def +++ b/gcc/config/rs6000/rs6000-builtin.def @@ -3207,6 +3207,7 @@ BU_MMA_2 (DISASSEMBLE_ACC, "disassemble_acc", QUAD, mma_disassemble_acc) BU_MMA_V2 (DISASSEMBLE_PAIR, "disassemble_pair", PAIR, vsx_disassemble_pair) BU_COMPAT (VSX_BUILTIN_DISASSEMBLE_PAIR, "mma_disassemble_pair") +BU_MMA_V3 (BUILD_PAIR, "build_pair", MISC, vsx_assemble_pair) BU_MMA_V3 (ASSEMBLE_PAIR, "assemble_pair", MISC, vsx_assemble_pair) BU_COMPAT (VSX_BUILTIN_ASSEMBLE_PAIR, "mma_assemble_pair") BU_MMA_3 (XVBF16GER2, "xvbf16ger2", MISC, mma_xvbf16ger2) @@ -3239,6 +3240,7 @@ BU_MMA_3 (XVI8GER4SPP, "xvi8ger4spp", QUAD, mma_xvi8ger4spp) BU_MMA_3 (XVI16GER2PP, "xvi16ger2pp", QUAD, mma_xvi16ger2pp) BU_MMA_3 (XVI16GER2SPP, "xvi16ger2spp", QUAD, mma_xvi16ger2spp) +BU_MMA_5 (BUILD_ACC, "build_acc", MISC, mma_assemble_acc) BU_MMA_5 (ASSEMBLE_ACC, "assemble_acc", MISC, mma_assemble_acc) BU_MMA_5 (PMXVF32GER, "pmxvf32ger", MISC, mma_pmxvf32ger) BU_MMA_5 (PMXVF64GER, "pmxvf64ger", PAIR, mma_pmxvf64ger) diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c index b4e13af4dc6..8deb9a3d207 100644 --- a/gcc/config/rs6000/rs6000-call.c +++ b/gcc/config/rs6000/rs6000-call.c @@ -10118,13 +10118,23 @@ mma_expand_builtin (tree exp, rtx target, bool *expandedp) pat = GEN_FCN (icode) (op[0], op[1]); break; case 3: - pat = GEN_FCN (icode) (op[0], op[1], op[2]); + /* The ASSEMBLE builtin source operands are reversed in little-endian + mode, so reorder them. */ + if (fcode == VSX_BUILTIN_ASSEMBLE_PAIR_INTERNAL && !WORDS_BIG_ENDIAN) + pat = GEN_FCN (icode) (op[0], op[2], op[1]); + else + pat = GEN_FCN (icode) (op[0], op[1], op[2]); break; case 4: pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]); break; case 5: - pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]); + /* The ASSEMBLE builtin source operands are reversed in little-endian + mode, so reorder them. */ + if (fcode == MMA_BUILTIN_ASSEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN) + pat = GEN_FCN (icode) (op[0], op[4], op[3], op[2], op[1]); + else + pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]); break; case 6: pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5]); @@ -11835,7 +11845,7 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi) gcc_unreachable (); } - if (fncode == VSX_BUILTIN_ASSEMBLE_PAIR) + if (fncode == VSX_BUILTIN_BUILD_PAIR || fncode == VSX_BUILTIN_ASSEMBLE_PAIR) lhs = make_ssa_name (vector_pair_type_node); else lhs = make_ssa_name (vector_quad_type_node); @@ -14151,7 +14161,8 @@ mma_init_builtins (void) if (gimple_func && mode == XOmode) op[nopnds++] = build_pointer_type (vector_quad_type_node); else if (gimple_func && mode == OOmode - && d->code == VSX_BUILTIN_ASSEMBLE_PAIR) + && (d->code == VSX_BUILTIN_BUILD_PAIR + || d->code == VSX_BUILTIN_ASSEMBLE_PAIR)) op[nopnds++] = build_pointer_type (vector_pair_type_node); else /* MMA uses unsigned types. */ diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index b01bb5c8191..35b5d53cc9e 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -16807,9 +16807,11 @@ rs6000_split_multireg_move (rtx dst, rtx src) gcc_assert (VSX_REGNO_P (REGNO (dst))); reg_mode = GET_MODE (XVECEXP (src, 0, 0)); - for (int i = 0; i < XVECLEN (src, 0); i++) + int nvecs = XVECLEN (src, 0); + for (int i = 0; i < nvecs; i++) { - rtx dst_i = gen_rtx_REG (reg_mode, reg + i); + int index = WORDS_BIG_ENDIAN ? i: nvecs - 1 - i; + rtx dst_i = gen_rtx_REG (reg_mode, reg + index); emit_insn (gen_rtx_SET (dst_i, XVECEXP (src, 0, i))); } diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 22f9e93073f..2bce9ab996f 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -20536,10 +20536,10 @@ void __builtin_mma_xxmtacc (__vector_quad *); void __builtin_mma_xxmfacc (__vector_quad *); void __builtin_mma_xxsetaccz (__vector_quad *); -void __builtin_mma_assemble_acc (__vector_quad *, vec_t, vec_t, vec_t, vec_t); +void __builtin_mma_build_acc (__vector_quad *, vec_t, vec_t, vec_t, vec_t); void __builtin_mma_disassemble_acc (void *, __vector_quad *); -void __builtin_vsx_assemble_pair (__vector_pair *, vec_t, vec_t); +void __builtin_vsx_build_pair (__vector_pair *, vec_t, vec_t); void __builtin_vsx_disassemble_pair (void *, __vector_pair *); vec_t __builtin_vsx_xvcvspbf16 (vec_t); diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c index 3bedf531de0..a9fb0107d12 100644 --- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c +++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c @@ -20,6 +20,14 @@ foo2 (__vector_pair *dst, vec_t *src) *dst = pair; } +void +foo3 (__vector_pair *dst, vec_t *src) +{ + __vector_pair pair; + __builtin_vsx_build_pair (&pair, src[4], src[0]); + *dst = pair; +} + void bar (vec_t *dst, __vector_pair *src) { @@ -54,8 +62,12 @@ bar2 (vec_t *dst, __vector_pair *src) # error "__has_builtin (__builtin_mma_disassemble_pair) failed" #endif -/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */ +#if !__has_builtin (__builtin_vsx_build_pair) +# error "__has_builtin (__builtin_vsx_build_pair) failed" +#endif + +/* { dg-final { scan-assembler-times {\mlxv\M} 6 } } */ /* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */ /* { dg-final { scan-assembler-times {\mstxv\M} 4 } } */ -/* { dg-final { scan-assembler-times {\mstxvp\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 3 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c index 43b6d3ac91e..00503b7343d 100644 --- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c +++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c @@ -12,6 +12,14 @@ foo (__vector_quad *dst, vec_t *src) *dst = acc; } +void +foo2 (__vector_quad *dst, vec_t *src) +{ + __vector_quad acc; + __builtin_mma_build_acc (&acc, src[12], src[8], src[4], src[0]); + *dst = acc; +} + void bar (vec_t *dst, __vector_quad *src) { @@ -23,9 +31,17 @@ bar (vec_t *dst, __vector_quad *src) dst[12] = res[3]; } -/* { dg-final { scan-assembler-times {\mlxv\M} 4 } } */ +#if !__has_builtin (__builtin_mma_assemble_acc) +# error "__has_builtin (__builtin_mma_assemble_acc) failed" +#endif + +#if !__has_builtin (__builtin_mma_build_acc) +# error "__has_builtin (__builtin_mma_build_acc) failed" +#endif + +/* { dg-final { scan-assembler-times {\mlxv\M} 8 } } */ /* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */ /* { dg-final { scan-assembler-times {\mstxv\M} 4 } } */ -/* { dg-final { scan-assembler-times {\mstxvp\M} 2 } } */ -/* { dg-final { scan-assembler-times {\mxxmfacc\M} 2 } } */ -/* { dg-final { scan-assembler-times {\mxxmtacc\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mstxvp\M} 4 } } */ +/* { dg-final { scan-assembler-times {\mxxmfacc\M} 3 } } */ +/* { dg-final { scan-assembler-times {\mxxmtacc\M} 3 } } */