Message ID | 3b4976974ca4a9e481c462ef2b9a4892f1d4174f.camel@vnet.ibm.com |
---|---|
State | New |
Headers | show |
Series | rs6000: RFC/Update support for addg6s instruction. PR100693 | expand |
Hi! On Wed, Mar 16, 2022 at 12:20:18PM -0500, will schmidt wrote: > For PR100693, we currently provide an addg6s builtin using unsigned > int arguments, but we are missing an unsigned long long argument > equivalent. This patch adds an overload to provide the long long > version of the builtin. > > unsigned long long __builtin_addg6s (unsigned long long, unsigned long long); > > RFC/concerns: This patch works, but looking briefly at intermediate stages > is not behaving quite as I expected. Looking at the intermediate dumps, I > see in pr100693.original that calls I expect to be routed to the internal > __builtin_addg6s_si() that uses (unsigned int) arguments are instead being > handled by __builtin_addg6s_di() with casts that convert the arguments to > (unsigned long long). Did you test with actual 32-bit variables, instead of just function arguments? Function arguments are always passed in (sign-extended) registers. Like, unsigned int f(unsigned int *a, unsigned int *b) { return __builtin_addg6s(*a, *b); } > As a test, I see if I swap the order of the builtins in rs6000-overload.def > I end up with code casting the ULL values to UI, which provides truncated > results, and is similar to what occurs today without this patch. > > All that said, this patch seems to work. OK for next stage 1? > Tested on power8BE as well as LE power8,power9,power10. Please ask again when stage 1 has started? > gcc/ > PR target/100693 > * config/rs6000/rs600-builtins.def: Remove entry for __builtin_addgs() > and add entries for __builtin_addg6s_di() and __builtin_addg6s_si(). Indent of second and further lines should be at the "*", not two spaces after that. > - UNSPEC_ADDG6S > + UNSPEC_ADDG6S_SI > + UNSPEC_ADDG6S_DI You do not need multiple unspec numbers. You can differentiate them based on the modes of the arguments, already :-) > ;; Miscellaneous ISA 2.06 (power7) instructions > -(define_insn "addg6s" > +(define_insn "addg6s_si" > [(set (match_operand:SI 0 "register_operand" "=r") > (unspec:SI [(match_operand:SI 1 "register_operand" "r") > (match_operand:SI 2 "register_operand" "r")] > - UNSPEC_ADDG6S))] > + UNSPEC_ADDG6S_SI))] > + "TARGET_POPCNTD" > + "addg6s %0,%1,%2" > + [(set_attr "type" "integer")]) > + > +(define_insn "addg6s_di" > + [(set (match_operand:DI 0 "register_operand" "=r") > + (unspec:DI [(match_operand:DI 1 "register_operand" "r") > + (match_operand:DI 2 "register_operand" "r")] > + UNSPEC_ADDG6S_DI))] > "TARGET_POPCNTD" > "addg6s %0,%1,%2" > [(set_attr "type" "integer")]) (define_insn "addg6s" [(set (match_operand:GPR 0 "register_operand" "=r") (unspec:GPR [(match_operand:GPR 1 "register_operand" "r") (match_operand:GPR 2 "register_operand" "r")] UNSPEC_ADDG6S))] "TARGET_POPCNTD" "addg6s %0,%1,%2" [(set_attr "type" "integer")]) We do not want DI (here, and in most places) for -m32! > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr100693.c > @@ -0,0 +1,68 @@ > +/* { dg-do compile { target { powerpc*-*-linux* } } } */ Why only on Linux? > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ Why not on Darwin? And why skip it anyway, given the previous line :-) > +/* { dg-require-effective-target powerpc_vsx_ok } */ That is the wrong requirement. You want to test for Power7, not for VSX. I realise you probably copied this from elsewhere :-( (If from another addg6s testcase, just keep it). Segher
On Wed, 2022-03-16 at 13:12 -0500, Segher Boessenkool wrote: > Hi! > > On Wed, Mar 16, 2022 at 12:20:18PM -0500, will schmidt wrote: > > For PR100693, we currently provide an addg6s builtin using unsigned > > int arguments, but we are missing an unsigned long long argument > > equivalent. This patch adds an overload to provide the long long > > version of the builtin. > > > > unsigned long long __builtin_addg6s (unsigned long long, unsigned > > long long); > > > > RFC/concerns: This patch works, but looking briefly at intermediate > > stages > > is not behaving quite as I expected. Looking at the intermediate > > dumps, I > > see in pr100693.original that calls I expect to be routed to the > > internal > > __builtin_addg6s_si() that uses (unsigned int) arguments are > > instead being > > handled by __builtin_addg6s_di() with casts that convert the > > arguments to > > (unsigned long long). > > Did you test with actual 32-bit variables, instead of just function > arguments? Function arguments are always passed in (sign-extended) > registers. > > Like, > > unsigned int f(unsigned int *a, unsigned int *b) > { > return __builtin_addg6s(*a, *b); > } I perhaps missed that subtlety. I'll investigate that further. > > > As a test, I see if I swap the order of the builtins in rs6000- > > overload.def > > I end up with code casting the ULL values to UI, which provides > > truncated > > results, and is similar to what occurs today without this patch. > > > > All that said, this patch seems to work. OK for next stage 1? > > Tested on power8BE as well as LE power8,power9,power10. > > Please ask again when stage 1 has started? > > > gcc/ > > PR target/100693 > > * config/rs6000/rs600-builtins.def: Remove entry for > > __builtin_addgs() > > and add entries for __builtin_addg6s_di() and > > __builtin_addg6s_si(). > > Indent of second and further lines should be at the "*", not two > spaces > after that. > > > - UNSPEC_ADDG6S > > + UNSPEC_ADDG6S_SI > > + UNSPEC_ADDG6S_DI > > You do not need multiple unspec numbers. You can differentiate them > based on the modes of the arguments, already :-) > > > ;; Miscellaneous ISA 2.06 (power7) instructions > > -(define_insn "addg6s" > > +(define_insn "addg6s_si" > > [(set (match_operand:SI 0 "register_operand" "=r") > > (unspec:SI [(match_operand:SI 1 "register_operand" "r") > > (match_operand:SI 2 "register_operand" "r")] > > - UNSPEC_ADDG6S))] > > + UNSPEC_ADDG6S_SI))] > > + "TARGET_POPCNTD" > > + "addg6s %0,%1,%2" > > + [(set_attr "type" "integer")]) > > + > > +(define_insn "addg6s_di" > > + [(set (match_operand:DI 0 "register_operand" "=r") > > + (unspec:DI [(match_operand:DI 1 "register_operand" "r") > > + (match_operand:DI 2 "register_operand" "r")] > > + UNSPEC_ADDG6S_DI))] > > "TARGET_POPCNTD" > > "addg6s %0,%1,%2" > > [(set_attr "type" "integer")]) > > (define_insn "addg6s" > [(set (match_operand:GPR 0 "register_operand" "=r") > (unspec:GPR [(match_operand:GPR 1 "register_operand" "r") > (match_operand:GPR 2 "register_operand" "r")] > UNSPEC_ADDG6S))] > "TARGET_POPCNTD" > "addg6s %0,%1,%2" > [(set_attr "type" "integer")]) > You do not need multiple unspec numbers. You can differentiate them > based on the modes of the arguments, already :-) Yeah, Thats what I thought, which is a big part of why I posted this with RFC. :-) When I attempted this there was an issue with multiple <mode>s (behind the GPR predicate) versus the singular "addg6s" define_insn. It's possible I had something else wrong there, but I'll go back to that attempt and work in that direction. > > We do not want DI (here, and in most places) for -m32! > > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/powerpc/pr100693.c > > @@ -0,0 +1,68 @@ > > +/* { dg-do compile { target { powerpc*-*-linux* } } } */ > > Why only on Linux? > > > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ > > Why not on Darwin? And why skip it anyway, given the previous line > :-) > > > +/* { dg-require-effective-target powerpc_vsx_ok } */ > > That is the wrong requirement. You want to test for Power7, not for > VSX. I realise you probably copied this from elsewhere :-( (If from > another addg6s testcase, just keep it). Because reasons. :-) The stanzas are copied from the nearby bcd-1.c testcase that has a simpler test for addg6s. Given the input I'll try to correct the stanzas here and limit how much error I carry along. Thanks for the feedback and review. I'll investigate further, and resubmit at stage1. Thanks, -Will > > > Segher
On Wed, Mar 16, 2022 at 03:06:42PM -0500, will schmidt wrote: > On Wed, 2022-03-16 at 13:12 -0500, Segher Boessenkool wrote: > > (define_insn "addg6s" > > [(set (match_operand:GPR 0 "register_operand" "=r") > > (unspec:GPR [(match_operand:GPR 1 "register_operand" "r") > > (match_operand:GPR 2 "register_operand" "r")] > > UNSPEC_ADDG6S))] > > "TARGET_POPCNTD" > > "addg6s %0,%1,%2" > > [(set_attr "type" "integer")]) > > You do not need multiple unspec numbers. You can differentiate > them > > based on the modes of the arguments, already :-) > > Yeah, Thats what I thought, which is a big part of why I posted this > with RFC. :-) When I attempted this there was an issue with multiple > <mode>s (behind the GPR predicate) versus the singular "addg6s" > define_insn. > It's possible I had something else wrong there, but I'll > go back to that attempt and work in that direction. No, that is still there. One way out is to make this an unnamed pattern (like "*addg6s"). Another is to put the mode in the name, and then you probably want to make it a parameterized name as well, which then hides all that again: (define_insn "@addg6s" ... > > > +/* { dg-require-effective-target powerpc_vsx_ok } */ > > > > That is the wrong requirement. You want to test for Power7, not for > > VSX. I realise you probably copied this from elsewhere :-( (If from > > another addg6s testcase, just keep it). > > Because reasons. :-) The stanzas are copied from the nearby bcd-1.c > testcase that has a simpler test for addg6s. Given the input I'll > try to correct the stanzas here and limit how much error I carry along. If you do not improve the existing ones it may be best to just keep this the same in the new testcases as well. Consistency is a good thing. Consistent wrong is not so great of course :-) Segher
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index ae2760c33389..4c23cac26932 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -1993,12 +1993,16 @@ XXSPLTD_V2DI vsx_xxspltd_v2di {} ; Power7 builtins (ISA 2.06). [power7] - const unsigned int __builtin_addg6s (unsigned int, unsigned int); - ADDG6S addg6s {} + const unsigned long long __builtin_addg6s_di (unsigned long long, \ + unsigned long long); + ADDG6S_DI addg6s_di {} + + const unsigned int __builtin_addg6s_si (unsigned int, unsigned int); + ADDG6S_SI addg6s_si {} const signed long __builtin_bpermd (signed long, signed long); BPERMD bpermd_di {32bit} const unsigned int __builtin_cbcdtd (unsigned int); diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index 44e2945aaa0e..931f85b738c5 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ b/gcc/config/rs6000/rs6000-overload.def @@ -76,10 +76,15 @@ ; Blank lines may be used as desired in this file between the lines as ; defined above; that is, you can introduce as many extra newlines as you ; like after a required newline, but nowhere else. Lines beginning with ; a semicolon are also treated as blank lines. +[ADDG6S, __builtin_i_addg6s, __builtin_addg6s] + unsigned long long __builtin_addg6s_di (signed long long, unsigned long long); + ADDG6S_DI + unsigned int __builtin_addg6s_si (unsigned int, unsigned int); + ADDG6S_SI [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd] vsq __builtin_vec_bcdadd (vsq, vsq, const int); BCDADD_V1TI vuc __builtin_vec_bcdadd (vuc, vuc, const int); diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index fdfbc6566a5c..d040f127eb55 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -122,11 +122,12 @@ (define_c_enum "unspec" UNSPEC_P8V_MTVSRWZ UNSPEC_P8V_RELOAD_FROM_GPR UNSPEC_P8V_MTVSRD UNSPEC_P8V_XXPERMDI UNSPEC_P8V_RELOAD_FROM_VSX - UNSPEC_ADDG6S + UNSPEC_ADDG6S_SI + UNSPEC_ADDG6S_DI UNSPEC_CDTBCD UNSPEC_CBCDTD UNSPEC_DIVE UNSPEC_DIVEU UNSPEC_UNPACK_128BIT @@ -14495,15 +14496,24 @@ (define_peephole2 operands[5] = change_address (mem, <ALTIVEC_DFORM:MODE>mode, new_addr); }) ;; Miscellaneous ISA 2.06 (power7) instructions -(define_insn "addg6s" +(define_insn "addg6s_si" [(set (match_operand:SI 0 "register_operand" "=r") (unspec:SI [(match_operand:SI 1 "register_operand" "r") (match_operand:SI 2 "register_operand" "r")] - UNSPEC_ADDG6S))] + UNSPEC_ADDG6S_SI))] + "TARGET_POPCNTD" + "addg6s %0,%1,%2" + [(set_attr "type" "integer")]) + +(define_insn "addg6s_di" + [(set (match_operand:DI 0 "register_operand" "=r") + (unspec:DI [(match_operand:DI 1 "register_operand" "r") + (match_operand:DI 2 "register_operand" "r")] + UNSPEC_ADDG6S_DI))] "TARGET_POPCNTD" "addg6s %0,%1,%2" [(set_attr "type" "integer")]) (define_insn "cdtbcd" diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 0dc752e8aadd..9eeb962f7363 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -18072,10 +18072,11 @@ addition to the @option{-maltivec}, @option{-mpopcntd}, and @option{-mvsx} options. The following basic built-in functions require @option{-mpopcntd}: @smallexample unsigned int __builtin_addg6s (unsigned int, unsigned int); +unsigned long long __builtin_addg6s (unsigned long long, unsigned long long); long long __builtin_bpermd (long long, long long); unsigned int __builtin_cbcdtd (unsigned int); unsigned int __builtin_cdtbcd (unsigned int); long long __builtin_divde (long long, long long); unsigned long long __builtin_divdeu (unsigned long long, unsigned long long); diff --git a/gcc/testsuite/gcc.target/powerpc/pr100693.c b/gcc/testsuite/gcc.target/powerpc/pr100693.c new file mode 100644 index 000000000000..31fd118ee0d9 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr100693.c @@ -0,0 +1,68 @@ +/* { dg-do compile { target { powerpc*-*-linux* } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-mdejagnu-cpu=power7 -O2" } */ +/* { dg-final { scan-assembler-times {\maddg6s\M} 4 } } */ +/* { dg-final { scan-assembler-not "bl __builtin" } } */ + +/* Test case for the addg6s builtin, exercising both + * unsigned int and unsigned long long arguments. See also bcd-1.c. */ + +#include <stdio.h> + +unsigned int test1 (unsigned int a, unsigned int b) +{ + return __builtin_addg6s (a, b); +} + +unsigned long long test2 (unsigned long long a, unsigned long long b) +{ + return __builtin_addg6s (a, b); +} + +/* Expected values, Not a full pattern, these are tuned + * to match the sparse iterations as seen below. */ +unsigned int exp_int[] = { +0x00000000, +0x00000006, +0x00000666, +0x00006666, +0x00666666, +0x06666666, +0x77777777 +}; + +unsigned long long exp_longlong[] = { +0x0000000000000000, +0x0000000000000006, +0x0000000000000666, +0x0000000000006666, +0x0000000000666666, +0x0000000006666666, +0x0000000666666666, +0x0000006666666666, +0x0000666666666666, +0x0006666666666666, +0x0666666666666666, +0x7777777777777777 +}; + +int main() { + unsigned long long z; + unsigned int ux; + unsigned long long uxl; + int idx; + + for (z=0,idx=0 ; z<=31; z+=6,idx++) { + ux = test1(0x01<<z,0xffffffff); + if (exp_int[idx]!=ux) + printf("%08lx i:%d %08lx \n",ux,idx,exp_int[idx]); + } + + for (z=0,idx=0 ; z<=63; z+=6,idx++) { + uxl = test2(0x01LL<<z,0xffffffffffffffff); + if (exp_longlong[idx]!=uxl) + printf("%016llx i:%d %016llx \n",uxl,idx,exp_longlong[idx]); + } +} +