Message ID | 1596832552-111518-1-git-send-email-tnggil@protonmail.com |
---|---|
State | New |
Headers | show |
Series | [RFC,v1,1/1] PPC64: Implement POWER Architecture Vector Function ABI. | expand |
On Fri, Aug 07, 2020 at 08:35:52PM +0000, Bert Tenjy via Gcc-patches wrote: > The document describing POWER Architecture Vector Function interface is > tentatively at: https://sourceware.org/glibc/wiki/Homepage?action=AttachFile&do=view&target=powerarchvectfuncabi.html Doesn't exist. > + if (TARGET_VSX) > + { > + clonei->vecsize_mangle = 'b'; > + ret = 1; > + } > + clonei->mask_mode = VOIDmode; > + switch (clonei->vecsize_mangle) > + { > + case 'b': > + clonei->vecsize_int = 128; > + clonei->vecsize_float = 128; > + break; > + case 'c': > + clonei->vecsize_int = 128; > + clonei->vecsize_float = 128; So what is this 'c' case here for? > + break; > + default: > + gcc_unreachable (); If (!TARGET_VSX), this will abort (as you only set vecsize_mangle to 'b' if TARGET_VSX, otherwise nothing sets it (so it remains 0). The way this works is that the caller calls the target hook with 0 as last argument, and the hook either returns 0 (means not supported at all), or returns number of supported variants and provides the first one. If more than one are supported, the caller will keep iterating on those. E.g. on x86, we support 4 modes (b, c, d, e) for exported functions and for non-exported ones just choose a single one based on the ISA, because in that case it is not a matter of ABI. For PowerPC, if all you want to support is b which requires VSX, then the right thing is for !TREE_PUBLIC functions return 0 if !TARGET_VSX and otherwise set vecsize_mangle to 'b' and in the end return 1, for exported functions always set it to 'b' (and in the end return 1). Then ensure that the 'b' variants of function definitions get target ("vsx") attribute added if !TARGET_VSX. That way, e.g. on powerpc64 big endian, the declare simd variants will be provided for exported functions, but code will only use them if the caller is TARGET_VSX compiled. Jakub
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Friday, August 7, 2020 4:59 PM, Jakub Jelinek <jakub@redhat.com> wrote: > On Fri, Aug 07, 2020 at 08:35:52PM +0000, Bert Tenjy via Gcc-patches wrote: > > > The document describing POWER Architecture Vector Function interface is > > tentatively at: https://sourceware.org/glibc/wiki/Homepage?action=AttachFile&do=view&target=powerarchvectfuncabi.html > > Doesn't exist. I can't tell what the issue is with the link as given above. But the one below does actually open the page for me. https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncabi.html > > > - if (TARGET_VSX) > > - { > > - clonei->vecsize_mangle = 'b'; > > > > > > - ret = 1; > > > > > > - } > > - clonei->mask_mode = VOIDmode; > > - switch (clonei->vecsize_mangle) > > - { > > - case 'b': > > - clonei->vecsize_int = 128; > > > > > > - clonei->vecsize_float = 128; > > > > > > - break; > > > > > > - case 'c': > > - clonei->vecsize_int = 128; > > > > > > - clonei->vecsize_float = 128; > > > > > > So what is this 'c' case here for? > I should have removed it. It will be deleted in the next version. > > - break; > > > > > > - default: > > - gcc_unreachable (); > > > > > > If (!TARGET_VSX), this will abort (as you only set vecsize_mangle to 'b' > if TARGET_VSX, otherwise nothing sets it (so it remains 0). > > The way this works is that the caller calls the target hook with 0 > as last argument, and the hook either returns 0 (means not supported at > all), or returns number of supported variants and provides the first one. > If more than one are supported, the caller will keep iterating on those. > E.g. on x86, we support 4 modes (b, c, d, e) for exported functions and > for non-exported ones just choose a single one based on the ISA, because in > that case it is not a matter of ABI. > > For PowerPC, if all you want to support is b which requires VSX, then the > right thing is for !TREE_PUBLIC functions return 0 if !TARGET_VSX and > otherwise set vecsize_mangle to 'b' and in the end return 1, for exported > functions always set it to 'b' (and in the end return 1). > Then ensure that the 'b' variants of function definitions get target ("vsx") > attribute added if !TARGET_VSX. > So setting attribute "vsx" for 'b' variants of function definitions is what should go in function rs6000_simd_clone_usable? Bert.
On Mon, Aug 10, 2020 at 05:29:49PM +0000, GT wrote: > > For PowerPC, if all you want to support is b which requires VSX, then the > > right thing is for !TREE_PUBLIC functions return 0 if !TARGET_VSX and > > otherwise set vecsize_mangle to 'b' and in the end return 1, for exported > > functions always set it to 'b' (and in the end return 1). > > Then ensure that the 'b' variants of function definitions get target ("vsx") > > attribute added if !TARGET_VSX. > > > > So setting attribute "vsx" for 'b' variants of function definitions is what > should go in function rs6000_simd_clone_usable? No. That function should say if the particular clone ('b' in this case) is usable from some caller, and the answer for your 'b' is TARGET_VSX is required to be non-zero. The adjustment should go into the simd_clone_adjust target hook, see what ix86_simd_clone_adjust does (though, that one has more variants to handle). Jakub
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Monday, August 10, 2020 2:07 PM, Jakub Jelinek <jakub@redhat.com> wrote: > On Mon, Aug 10, 2020 at 05:29:49PM +0000, GT wrote: > > > > For PowerPC, if all you want to support is b which requires VSX, then the > > > right thing is for !TREE_PUBLIC functions return 0 if !TARGET_VSX and > > > otherwise set vecsize_mangle to 'b' and in the end return 1, for exported > > > functions always set it to 'b' (and in the end return 1). > > > Then ensure that the 'b' variants of function definitions get target ("vsx") > > > attribute added if !TARGET_VSX. > > > > So setting attribute "vsx" for 'b' variants of function definitions is what > > should go in function rs6000_simd_clone_usable? > > No. That function should say if the particular clone ('b' in this case) is > usable from some caller, and the answer for your 'b' is TARGET_VSX is > required to be non-zero. > > The adjustment should go into the simd_clone_adjust target hook, see > what ix86_simd_clone_adjust does (though, that one has more variants to > handle). I'm looking at ix86_simd_clone_adjust and also aarch64_simd_clone_adjust. The latter is much simpler and I see how I would add PPC attribute "vsx" similarly. If I was to follow the ix86_simd_clone_adjust organization, then ix86_valid_target_attribute_p called near the end of the function is a problem. Because it in turn calls ix86_valid_target_attribute_tree and this last function doesn't have a similarly named function in PPC code. Also, once the attribute "vsx" is added, where is it used? I mean that in the sense of where is execution conditioned on the definition of say, the "sse2" string in x86_64? Bert.
On Thu, Aug 13, 2020 at 08:40:22PM +0000, GT wrote: > I'm looking at ix86_simd_clone_adjust and also aarch64_simd_clone_adjust. The latter is > much simpler and I see how I would add PPC attribute "vsx" similarly. If I was to follow > the ix86_simd_clone_adjust organization, then ix86_valid_target_attribute_p called near > the end of the function is a problem. Because it in turn calls > ix86_valid_target_attribute_tree and this last function doesn't have a similarly named > function in PPC code. > > Also, once the attribute "vsx" is added, where is it used? I mean that in the sense of > where is execution conditioned on the definition of say, the "sse2" string in x86_64? You need to trigger what will the middle-end and backend do if you use explicit __attribute__((target ("vsx"))) on the function, so in the end it needs to do some parsing, create a TARGET_OPTION_NODE with the right option changes and put it to the function. Jakub
Hi! This is about the Power binding to some OpenMP API, right? It has nothing to do with "vector" or "ABI" -- we have vectors already, and we have ABIs already, more than enough of each. It is very very VERY hard to review this without being told the proper setting here. On Fri, Aug 07, 2020 at 08:35:52PM +0000, Bert Tenjy wrote: > This patch adds functionality to enable use of POWER Architecture's > VSX extensions to speed up certain code sequences. It does? Oh, to implement some OpenMP stuff? > The document describing POWER Architecture Vector Function interface is > tentatively at: https://sourceware.org/glibc/wiki/Homepage?action=AttachFile& > do=view&target=powerarchvectfuncabi.html "This page does not exist yet. You can create a new empty page, or use one of the page templates." > 4. Changes to files vect-simd-clone-{1,4,5,8}.c are needed since > PPC64 has only 128bit-wide vector bus. x86_64 for which the tests were > initially written has buses wider than that for AVX and higher architectures. There is no "vector bus". All Power vector registers are 128 bits, yes. > 5. Per Segher's response to v0, we still need to agree a name for the > guiding document whose name is currently 'POWER Architecture Vector Function > ABI'. Not just the document title. You should use terminology that agrees with everything else, that isn't usiing the same words for different things, that isn't super confusing, throughout the patch :-) > +/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN. */ The documentation for this hook says ((lack of) line wraps verbatim): @deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int}) This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float} fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also @var{simdlen} field if it was previously 0. The hook should return 0 if SIMD clones shouldn't be emitted, or number of @var{vecsize_mangle} variants that should be emitted. @end deftypefn so I have no idea what this hook should do. Two of the four arguments are left completely undefined, to start with. > + > +static int > +rs6000_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, > + struct cgraph_simd_clone *clonei, > + tree base_type, int num) Indent is wrong here, btw (should use tabs, and everything should align to that first "struct"...) You need to be a bit more creative here, maybe use shorter function names to begin with? And use names that say what the functions are *for*, or giving some context. "simd" means nothing here. More than half of our backend is SIMD stuff. That has only sideways to do with what you call "simd" here :-/ > + tree t; > + int i; Declare things at first use, *in* the first use if you can (you usually can). > + bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE); This isn't a predicate, don't call it _p please. It's a boolean, and "decl_arg" isn't very meaningful either. You might want to factor this differently altogether. > + for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0; > + t && t != void_list_node; t = TREE_CHAIN (t), i++) Do all that "i" stuff not in the "for" expression itself (but in the body). If a for expression doesn't fit on one line, it usually is more readable if you put all three parts on separate lines. Complex initialisation like here is more readable if you do it *before* the loop. > + case E_QImode: > + case E_HImode: > + case E_SImode: > + case E_DImode: > + case E_SFmode: > + case E_DFmode: > + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, > + "unsupported argument type %qT for simd", arg_type); That isn't all types. But you do ISA 2.07? Put that in a comment somewhere then please. > + if (TARGET_VSX) > + { > + clonei->vecsize_mangle = 'b'; > + ret = 1; > + } That is ISA 2.06 (Power 7), which you do not support I think? > + switch (clonei->vecsize_mangle) I don't know what this is. > +void > +rs6000_simd_clone_adjust (struct cgraph_node *node) > +{ > +} Don't define it if it doesn't do anything? Or is it required to exist even if it doesn't ever do anything? > +static int > +rs6000_simd_clone_usable (struct cgraph_node *node) > +{ > + switch (node->simdclone->vecsize_mangle) > + { > + case 'b': (wrong indentation, "case" should align with "{") > + if (!TARGET_VSX) > + return -1; > + return 0; > + default: > + gcc_unreachable (); > + } > +} Please don't use switch statements where a simple "if" would do. static int rs6000_simd_clone_usable (struct cgraph_node *node) { gcc_assert (node->simdclone->vecsize_mangle == 'b'); if (TARGET_VSX) return 0; return -1; } (If it looks too complicated, it probably is; don't remove whitelines to make it shorter, that only makes things worse). Anyway, please give some context in the proposed commit message: like pointing to *what* it implements, and where that is described, and where the binding is described. And no dead links please ;-) Segher
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Thursday, August 13, 2020 6:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote: > Hi! > > This is about the Power binding to some OpenMP API, right? It has > nothing to do with "vector" or "ABI" -- we have vectors already, and > we have ABIs already, more than enough of each. > > It is very very VERY hard to review this without being told the proper > setting here. > What this is about: David Edelsohn wanted to have new library functions, one for each of these 6 single-precision functions: sinf, cosf, sincosf, expf, logf, powf; and these 6 double-precision functions: sin, cos, sincos, exp, log, and pow. For the single-precision functions, the corresponding new functions would compute 4 results simulatneously. For the double-precision functions, the new ones would compute 2 results simultaneously. x86_64 has already done something very similar so I thought I would adapt as much of their documentation and implementation as I could for PPC64. Let's start with that. Comments so far? Bert.
On Mon, Aug 17, 2020 at 05:44:46PM +0000, GT wrote: > > This is about the Power binding to some OpenMP API, right? It has > > nothing to do with "vector" or "ABI" -- we have vectors already, and > > we have ABIs already, more than enough of each. > > > > It is very very VERY hard to review this without being told the proper > > setting here. > > What this is about: > > David Edelsohn wanted to have new library functions, one for each of these 6 single-precision functions: > sinf, cosf, sincosf, expf, logf, powf; and these 6 double-precision functions: > sin, cos, sincos, exp, log, and pow. > > For the single-precision functions, the corresponding new functions would compute 4 results > simulatneously. For the double-precision functions, the new ones would compute 2 results > simultaneously. > > x86_64 has already done something very similar so I thought I would adapt as much of their > documentation and implementation as I could for PPC64. > > Let's start with that. Comments so far? That sounds like libmvec? I still don't know what this is. Segher
The Power Vector ABI is available at https://github.com/power8-abi-doc/vector-function-abi It apparently did not attach correctly to the sourceware wiki or the filename is different. Thanks, David On Mon, Aug 17, 2020 at 1:44 PM GT <tnggil@protonmail.com> wrote: > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ > On Thursday, August 13, 2020 6:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote: > > > Hi! > > > > This is about the Power binding to some OpenMP API, right? It has > > nothing to do with "vector" or "ABI" -- we have vectors already, and > > we have ABIs already, more than enough of each. > > > > It is very very VERY hard to review this without being told the proper > > setting here. > > > > What this is about: > > David Edelsohn wanted to have new library functions, one for each of these 6 single-precision functions: > sinf, cosf, sincosf, expf, logf, powf; and these 6 double-precision functions: > sin, cos, sincos, exp, log, and pow. > > For the single-precision functions, the corresponding new functions would compute 4 results > simulatneously. For the double-precision functions, the new ones would compute 2 results > simultaneously. > > x86_64 has already done something very similar so I thought I would adapt as much of their > documentation and implementation as I could for PPC64. > > Let's start with that. Comments so far? > > Bert.
On Mon, Aug 17, 2020 at 06:05:09PM -0400, David Edelsohn wrote: > The Power Vector ABI is available at > > https://github.com/power8-abi-doc/vector-function-abi > > It apparently did not attach correctly to the sourceware wiki or the > filename is different. Thanks! Segher
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Monday, August 17, 2020 5:28 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote: > On Mon, Aug 17, 2020 at 05:44:46PM +0000, GT wrote: > > > > This is about the Power binding to some OpenMP API, right? It has > > > nothing to do with "vector" or "ABI" -- we have vectors already, and > > > we have ABIs already, more than enough of each. > > > It is very very VERY hard to review this without being told the proper > > > setting here. > > > > What this is about: > > David Edelsohn wanted to have new library functions, one for each of these 6 single-precision functions: > > sinf, cosf, sincosf, expf, logf, powf; and these 6 double-precision functions: > > sin, cos, sincos, exp, log, and pow. > > For the single-precision functions, the corresponding new functions would compute 4 results > > simulatneously. For the double-precision functions, the new ones would compute 2 results > > simultaneously. > > x86_64 has already done something very similar so I thought I would adapt as much of their > > documentation and implementation as I could for PPC64. > > Let's start with that. Comments so far? > > That sounds like libmvec? > > I still don't know what this is. > Yes, it is libmvec. Now look at what GCC does to the code in Examples 1 and 2 at this link: https://sourceware.org/glibc/wiki/libmvec x86_64 added functionality to GCC so such code uses the new functions without the user having to re-write the loops and explicitly call the new functions. We are aiming to provide that same capability for PPC64 in GCC. Bert.
On Tue, Aug 18, 2020 at 07:14:19PM +0000, GT wrote: > > That sounds like libmvec? > > > > I still don't know what this is. > > Yes, it is libmvec. > > Now look at what GCC does to the code in Examples 1 and 2 at this link: > https://sourceware.org/glibc/wiki/libmvec > > x86_64 added functionality to GCC so such code uses the new functions without the user > having to re-write the loops and explicitly call the new functions. > > We are aiming to provide that same capability for PPC64 in GCC. Great! Please repost with what I already pointed out fixed, that explanation added, and working links to the documentation? Thanks in advance, Segher
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Tuesday, August 18, 2020 5:32 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote: > On Tue, Aug 18, 2020 at 07:14:19PM +0000, GT wrote: > > > > That sounds like libmvec? > > > I still don't know what this is. > > > > Yes, it is libmvec. > > Now look at what GCC does to the code in Examples 1 and 2 at this link: > > https://sourceware.org/glibc/wiki/libmvec > > x86_64 added functionality to GCC so such code uses the new functions without the user > > having to re-write the loops and explicitly call the new functions. > > We are aiming to provide that same capability for PPC64 in GCC. > > Great! Please repost with what I already pointed out fixed, that > explanation added, and working links to the documentation? > Are you ok with the titles of the patch and this document? https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncabi.html Bert.
On Thu, Aug 20, 2020 at 04:19:36PM +0000, GT wrote: > > Great! Please repost with what I already pointed out fixed, that > > explanation added, and working links to the documentation? > > Are you ok with the titles of the patch and this document? > > https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncabi.html It is very misleading. You can undo some of the damage in the first lines of the commit message, but you can also just fix the title itself, so that anyone can see what this is about even before reading the message (which is what a mail subject is *for*!) Segher
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Thursday, August 13, 2020 5:00 PM, Jakub Jelinek <jakub@redhat.com> wrote: > On Thu, Aug 13, 2020 at 08:40:22PM +0000, GT wrote: > > > I'm looking at ix86_simd_clone_adjust and also aarch64_simd_clone_adjust. The latter is > > much simpler and I see how I would add PPC attribute "vsx" similarly. If I was to follow > > the ix86_simd_clone_adjust organization, then ix86_valid_target_attribute_p called near > > the end of the function is a problem. Because it in turn calls > > ix86_valid_target_attribute_tree and this last function doesn't have a similarly named > > function in PPC code. > > Also, once the attribute "vsx" is added, where is it used? I mean that in the sense of > > where is execution conditioned on the definition of say, the "sse2" string in x86_64? > > You need to trigger what will the middle-end and backend do if you use > explicit attribute((target ("vsx"))) on the function, so in the end it > needs to do some parsing, create a TARGET_OPTION_NODE with the right option > changes and put it to the function. > > Jakub I'm still trying to understand why we need attribute((target("vsx"))). https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes The documentation for target(string) states that the purpose is to allow a function to be compiled with different target options than were specified on the command line. Why would we want the vector version of say sinf to be compiled for a different target than the scalar sinf? Bert.
On Thu, Aug 20, 2020 at 07:31:50PM +0000, GT wrote: > I'm still trying to understand why we need attribute((target("vsx"))). > > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes > > The documentation for target(string) states that the purpose is to allow a function to be > compiled with different target options than were specified on the command line. Why would we > want the vector version of say sinf to be compiled for a different target than the scalar sinf? The scalar sinf (well, better let's talk about some arbitrary user function, sinf or at least the vector versions thereof will be often implemented in assembly) can be compiled with whatever ISA options the user chooses. But, the SIMD ABI says that the 'b' char variants are VSX only, need to pass arguments in certain registers that might not be even available without that ISA, similarly for returns, and per the ABI the callers will ensure such functions are only used from code which assumes that ISA. Have a look at the x86_64 case, we have several different variants there, e.g. SSE2 that passes in registers that are only available in SSE1 and later, then AVX variants which use AVX registers, AVX2 variants which use the same registers but different in he way how integral vectors are passed, and finally AVX512F variants that use AVX512F registers. So, on the caller side among the other desirability checks (if the function is declare simd at all, what vectorization factor is available, whether it is inbranch or notinbranch or both, whether arguments are vectors or uniform or linear etc.), the compiler also checks the ISA. And in loops that aren't even SSE2 (-mno-sse, -msse1) just won't use any of the simd variants and will use always scalar version, in code that isn't -mavx or later will only use SSE2 (or scalar), in code that isn't -mavx2 will only use AVX or SSE2 or scalar, etc. Either one wouldn't be even able to pass the arguments or read the return values without those, or if it would, the ABI still says one can assume such ISA level. This is all implemented by: 1) having the target hook used during the vectorization decision (simd_clone_usable) decide based on the current ISA level 2) on the side of generation of the functions, it checks if the ISA is already available (e.g. from command line), if it is, nothing needs to be done, otherwise target attribute is added. Jakub
On Thu, Aug 20, 2020 at 07:31:50PM +0000, GT wrote:
> I'm still trying to understand why we need attribute((target("vsx"))).
You need Power8, even! "vsx" alone is not enough (that only guarantees
Power7). Your minimum version ("b") requires Power8.
Segher
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Thursday, August 20, 2020 1:48 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote: > On Thu, Aug 20, 2020 at 04:19:36PM +0000, GT wrote: > > > > Great! Please repost with what I already pointed out fixed, that > > > explanation added, and working links to the documentation? > > > > Are you ok with the titles of the patch and this document? > > https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncabi.html > > It is very misleading. You can undo some of the damage in the first > lines of the commit message, but you can also just fix the title itself, > so that anyone can see what this is about even before reading the > message (which is what a mail subject is for!) > > Segher I have: 1. Changed the title of this document. 2. Removed all references within the document describing itself as an ABI. 3. Added a new introductory paragraph that should hopefully make clearer the doc's purpose. Use the new link below. There's has been a name change in the link as well. https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncspec.html Bert.
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index d26a18f3ece..8216a7d7c2c 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1273,6 +1273,142 @@ static const struct attribute_spec rs6000_attribute_table[] = #endif { NULL, 0, 0, false, false, false, false, NULL, NULL } }; + +/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN. */ + +static int +rs6000_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, + struct cgraph_simd_clone *clonei, + tree base_type, int num) +{ + int ret = 1; + + if (clonei->simdlen + && (clonei->simdlen < 2 + || clonei->simdlen > 128 + || (clonei->simdlen & (clonei->simdlen - 1)) != 0)) + { + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "unsupported simdlen %d", clonei->simdlen); + return 0; + } + + tree ret_type = TREE_TYPE (TREE_TYPE (node->decl)); + if (TREE_CODE (ret_type) != VOID_TYPE) + switch (TYPE_MODE (ret_type)) + { + case E_QImode: + case E_HImode: + case E_SImode: + case E_DImode: + case E_SFmode: + case E_DFmode: + if (!AGGREGATE_TYPE_P (ret_type)) + break; + /* FALLTHRU */ + default: + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "unsupported return type %qT for simd", ret_type); + return 0; + } + + tree t; + int i; + tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl)); + bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE); + + for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0; + t && t != void_list_node; t = TREE_CHAIN (t), i++) + { + tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t); + switch (TYPE_MODE (arg_type)) + { + case E_QImode: + case E_HImode: + case E_SImode: + case E_DImode: + case E_SFmode: + case E_DFmode: + if (!AGGREGATE_TYPE_P (arg_type)) + break; + /* FALLTHRU */ + default: + if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM) + break; + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "unsupported argument type %qT for simd", arg_type); + return 0; + } + } + + if (TARGET_VSX) + { + clonei->vecsize_mangle = 'b'; + ret = 1; + } + clonei->mask_mode = VOIDmode; + switch (clonei->vecsize_mangle) + { + case 'b': + clonei->vecsize_int = 128; + clonei->vecsize_float = 128; + break; + case 'c': + clonei->vecsize_int = 128; + clonei->vecsize_float = 128; + break; + default: + gcc_unreachable (); + } + if (clonei->simdlen == 0) + { + if (SCALAR_INT_MODE_P (TYPE_MODE (base_type))) + clonei->simdlen = clonei->vecsize_int; + else + clonei->simdlen = clonei->vecsize_float; + clonei->simdlen /= GET_MODE_BITSIZE (TYPE_MODE (base_type)); + } + else + { + tree ctype = ret_type; + if (TREE_CODE (ret_type) == VOID_TYPE) + ctype = base_type; + int cnt = GET_MODE_BITSIZE (TYPE_MODE (ctype)) * clonei->simdlen; + if (SCALAR_INT_MODE_P (TYPE_MODE (ctype))) + cnt /= clonei->vecsize_int; + else + cnt /= clonei->vecsize_float; + if (cnt > 8) + { + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "unsupported simdlen %d", clonei->simdlen); + return 0; + } + } + return ret; +} + +/* Add target attribute to SIMD clone NODE if needed. */ + +void +rs6000_simd_clone_adjust (struct cgraph_node *node) +{ +} + +static int +rs6000_simd_clone_usable (struct cgraph_node *node) +{ + switch (node->simdclone->vecsize_mangle) + { + case 'b': + if (!TARGET_VSX) + return -1; + return 0; + default: + gcc_unreachable (); + } +} + #ifndef TARGET_PROFILE_KERNEL #define TARGET_PROFILE_KERNEL 0 @@ -1281,6 +1417,17 @@ static const struct attribute_spec rs6000_attribute_table[] = /* Initialize the GCC target structure. */ #undef TARGET_ATTRIBUTE_TABLE #define TARGET_ATTRIBUTE_TABLE rs6000_attribute_table + +#undef TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN +#define TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN \ + rs6000_simd_clone_compute_vecsize_and_simdlen + +#undef TARGET_SIMD_CLONE_ADJUST +#define TARGET_SIMD_CLONE_ADJUST rs6000_simd_clone_adjust + +#undef TARGET_SIMD_CLONE_USABLE +#define TARGET_SIMD_CLONE_USABLE rs6000_simd_clone_usable + #undef TARGET_SET_DEFAULT_TYPE_ATTRIBUTES #define TARGET_SET_DEFAULT_TYPE_ATTRIBUTES rs6000_set_default_type_attributes #undef TARGET_ATTRIBUTE_TAKES_IDENTIFIER_P diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c index 50429049500..13f20aaa16a 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c @@ -10,10 +10,22 @@ int array[N]; +#ifdef __powerpc__ + +#pragma omp declare simd simdlen(2) notinbranch +#pragma omp declare simd simdlen(2) notinbranch uniform(b) linear(c:3) +#pragma omp declare simd simdlen(4) notinbranch +#pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3) + +#else + #pragma omp declare simd simdlen(4) notinbranch #pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3) #pragma omp declare simd simdlen(8) notinbranch #pragma omp declare simd simdlen(8) notinbranch uniform(b) linear(c:3) + +#endif + __attribute__((noinline)) int foo (int a, int b, int c) { diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c index debbe77b79d..c440582776e 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c @@ -12,7 +12,11 @@ float d[N]; int e[N]; unsigned short f[N]; +#ifdef __powerpc__ +#pragma omp declare simd simdlen(4) notinbranch uniform(b) +#else #pragma omp declare simd simdlen(8) notinbranch uniform(b) +#endif __attribute__((noinline)) float foo (float a, float b, float c) { diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c index 6a098d9a51a..d9dc792b81e 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c @@ -10,7 +10,11 @@ int d[N], e[N]; +#ifdef __powerpc__ +#pragma omp declare simd simdlen(2) notinbranch uniform(b) linear(c:3) +#else #pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3) +#endif __attribute__((noinline)) long long int foo (int a, int b, int c) { diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c index 1bfd19dc8ab..4ed3da47449 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c @@ -12,14 +12,22 @@ int a[N], b[N]; long int c[N]; unsigned char d[N]; +#ifdef __powerpc__ +#pragma omp declare simd simdlen(2) notinbranch +#else #pragma omp declare simd simdlen(8) notinbranch +#endif __attribute__((noinline)) int foo (long int a, int b, int c) { return a + b + c; } +#ifdef __powerpc__ +#pragma omp declare simd simdlen(2) notinbranch +#else #pragma omp declare simd simdlen(8) notinbranch +#endif __attribute__((noinline)) long int bar (int a, int b, long int c) { diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index e79015b4d54..48edbc8e415 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3499,7 +3499,8 @@ proc check_effective_target_vect_simd_clones { } { return [check_cached_effective_target_indexed vect_simd_clones { expr { (([istarget i?86-*-*] || [istarget x86_64-*-*]) && [check_effective_target_avx512f]) - || [istarget amdgcn-*-*] }}] + || [istarget amdgcn-*-*] + || ([istarget powerpc*-*-*] && [check_p8vector_hw_available]) }}] } # Return 1 if this is a AArch64 target supporting big endian
From: Bert Tenjy <tnggil@gmail.com> This patch adds functionality to enable use of POWER Architecture's VSX extensions to speed up certain code sequences. Typically 2 or 4 consecutive loop iterations that each make a single scalar function call will be combined into one iteration making a single vector function call. The patch organization follows that used in x86_64 and Aarch64 implementations of the same vectorization functionality. The x86_64 document describing their vector function interface is at https://software.intel.com/sites/default/files/managed/b4/c8/Intel- Vector-Function-ABI.pdf Aarch64's document is at: https://developer.arm.com/docs/101129/latest The document describing POWER Architecture Vector Function interface is tentatively at: https://sourceware.org/glibc/wiki/Homepage?action=AttachFile& do=view&target=powerarchvectfuncabi.html The major test of this patch autovectorizes math functions and thus requires libmvec. PPC64 libmvec functionality is only available on GLIBC development branch tuliom/libmvec. Until that branch is merged into the main development branch, testing vector functions will require building and installing from branch tuliom/libmvec into a non-system directory. The development GCC with this patch will similarly have to be built and installed into a non-standard location. The DejaGnu vector SIMD tests have been enabled for PPC64. After necessary changes to the 4 vect-simd-clone-{1,4,5,8}.c files, 45 tests change status from UNSUPPORTED to PASS. --- gcc/config/rs6000/rs6000.c | 147 ++++++++++++++++++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c | 12 ++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c | 4 + gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c | 4 + gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c | 8 + gcc/testsuite/lib/target-supports.exp | 3 +- 6 files changed, 177 insertions(+), 1 deletion(-) A few notes: 1. Version v0 of this patch is at: https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540426.html 2. The code changes in rs6000.c are in response to Bill Schmidt's feedback to v0. 3. Vector SIMD testsuite files are now enabled for POWER8 and higher PPC64 processors. 4. Changes to files vect-simd-clone-{1,4,5,8}.c are needed since PPC64 has only 128bit-wide vector bus. x86_64 for which the tests were initially written has buses wider than that for AVX and higher architectures. 5. Per Segher's response to v0, we still need to agree a name for the guiding document whose name is currently 'POWER Architecture Vector Function ABI'.