diff mbox series

[RFC,v1,1/1] PPC64: Implement POWER Architecture Vector Function ABI.

Message ID 1596832552-111518-1-git-send-email-tnggil@protonmail.com
State New
Headers show
Series [RFC,v1,1/1] PPC64: Implement POWER Architecture Vector Function ABI. | expand

Commit Message

GT Aug. 7, 2020, 8:35 p.m. UTC
From: Bert Tenjy <tnggil@gmail.com>

This patch adds functionality to enable use of POWER Architecture's
VSX extensions to speed up certain code sequences. Typically
2 or 4 consecutive loop iterations that each make a single scalar
function call will be combined into one iteration making a single
vector function call.

The patch organization follows that used in x86_64 and Aarch64
implementations of the same vectorization functionality.

The x86_64 document describing their vector function interface is at
https://software.intel.com/sites/default/files/managed/b4/c8/Intel-
Vector-Function-ABI.pdf

Aarch64's document is at: https://developer.arm.com/docs/101129/latest

The document describing POWER Architecture Vector Function interface is
tentatively at: https://sourceware.org/glibc/wiki/Homepage?action=AttachFile&
do=view&target=powerarchvectfuncabi.html

The major test of this patch autovectorizes math functions and thus requires
libmvec. PPC64 libmvec functionality is only available on GLIBC development
branch tuliom/libmvec. Until that branch is merged into the main development
branch, testing vector functions will require building and installing from
branch tuliom/libmvec into a non-system directory. The development GCC
with this patch will similarly have to be built and installed into a non-standard
location.

The DejaGnu vector SIMD tests have been enabled for PPC64.
After necessary changes to the 4 vect-simd-clone-{1,4,5,8}.c files,
45 tests change status from UNSUPPORTED to PASS.
---
 gcc/config/rs6000/rs6000.c                    | 147 ++++++++++++++++++
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c |  12 ++
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c |   4 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c |   4 +
 gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c |   8 +
 gcc/testsuite/lib/target-supports.exp         |   3 +-
 6 files changed, 177 insertions(+), 1 deletion(-)

A few notes:
1. Version v0 of this patch is at:
https://gcc.gnu.org/pipermail/gcc-patches/2020-February/540426.html

2. The code changes in rs6000.c are in response to Bill Schmidt's 
feedback to v0.

3. Vector SIMD testsuite files are now enabled for POWER8 and higher 
PPC64 processors.

4. Changes to files vect-simd-clone-{1,4,5,8}.c are needed since 
PPC64 has only 128bit-wide vector bus. x86_64 for which the tests were
initially written has buses wider than that for AVX and higher architectures.

5. Per Segher's response to v0, we still need to agree a name for the 
guiding document whose name is currently 'POWER Architecture Vector Function 
ABI'.

Comments

Jakub Jelinek Aug. 7, 2020, 8:59 p.m. UTC | #1
On Fri, Aug 07, 2020 at 08:35:52PM +0000, Bert Tenjy via Gcc-patches wrote:
> The document describing POWER Architecture Vector Function interface is
> tentatively at: https://sourceware.org/glibc/wiki/Homepage?action=AttachFile&do=view&target=powerarchvectfuncabi.html

Doesn't exist.

> +  if (TARGET_VSX)
> +    {
> +      clonei->vecsize_mangle = 'b';
> +      ret = 1;
> +    }
> +  clonei->mask_mode = VOIDmode;
> +  switch (clonei->vecsize_mangle)
> +    {
> +    case 'b':
> +      clonei->vecsize_int = 128;
> +      clonei->vecsize_float = 128;
> +      break;
> +    case 'c':
> +      clonei->vecsize_int = 128;
> +      clonei->vecsize_float = 128;

So what is this 'c' case here for?

> +      break;
> +    default:
> +      gcc_unreachable ();

If (!TARGET_VSX), this will abort (as you only set vecsize_mangle to 'b'
if TARGET_VSX, otherwise nothing sets it (so it remains 0).

The way this works is that the caller calls the target hook with 0
as last argument, and the hook either returns 0 (means not supported at
all), or returns number of supported variants and provides the first one.
If more than one are supported, the caller will keep iterating on those.
E.g. on x86, we support 4 modes (b, c, d, e) for exported functions and
for non-exported ones just choose a single one based on the ISA, because in
that case it is not a matter of ABI.

For PowerPC, if all you want to support is b which requires VSX, then the
right thing is for !TREE_PUBLIC functions return 0 if !TARGET_VSX and
otherwise set vecsize_mangle to 'b' and in the end return 1, for exported
functions always set it to 'b' (and in the end return 1).
Then ensure that the 'b' variants of function definitions get target ("vsx")
attribute added if !TARGET_VSX.

That way, e.g. on powerpc64 big endian, the declare simd variants will be
provided for exported functions, but code will only use them if the caller
is TARGET_VSX compiled.

	Jakub
GT Aug. 10, 2020, 5:29 p.m. UTC | #2
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, August 7, 2020 4:59 PM, Jakub Jelinek <jakub@redhat.com> wrote:

> On Fri, Aug 07, 2020 at 08:35:52PM +0000, Bert Tenjy via Gcc-patches wrote:
>
> > The document describing POWER Architecture Vector Function interface is
> > tentatively at: https://sourceware.org/glibc/wiki/Homepage?action=AttachFile&do=view&target=powerarchvectfuncabi.html
>
> Doesn't exist.

I can't tell what the issue is with the link as given above. But the one below does actually open
the page for me.

https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncabi.html


>
> > -   if (TARGET_VSX)
> > -   {
> > -        clonei->vecsize_mangle = 'b';
> >
> >
> > -        ret = 1;
> >
> >
> > -   }
> > -   clonei->mask_mode = VOIDmode;
> > -   switch (clonei->vecsize_mangle)
> > -   {
> > -   case 'b':
> > -        clonei->vecsize_int = 128;
> >
> >
> > -        clonei->vecsize_float = 128;
> >
> >
> > -        break;
> >
> >
> > -   case 'c':
> > -        clonei->vecsize_int = 128;
> >
> >
> > -        clonei->vecsize_float = 128;
> >
> >
>
> So what is this 'c' case here for?
>

I should have removed it. It will be deleted in the next version.

> > -        break;
> >
> >
> > -   default:
> > -        gcc_unreachable ();
> >
> >
>
> If (!TARGET_VSX), this will abort (as you only set vecsize_mangle to 'b'
> if TARGET_VSX, otherwise nothing sets it (so it remains 0).
>
> The way this works is that the caller calls the target hook with 0
> as last argument, and the hook either returns 0 (means not supported at
> all), or returns number of supported variants and provides the first one.
> If more than one are supported, the caller will keep iterating on those.
> E.g. on x86, we support 4 modes (b, c, d, e) for exported functions and
> for non-exported ones just choose a single one based on the ISA, because in
> that case it is not a matter of ABI.
>
> For PowerPC, if all you want to support is b which requires VSX, then the
> right thing is for !TREE_PUBLIC functions return 0 if !TARGET_VSX and
> otherwise set vecsize_mangle to 'b' and in the end return 1, for exported
> functions always set it to 'b' (and in the end return 1).
> Then ensure that the 'b' variants of function definitions get target ("vsx")
> attribute added if !TARGET_VSX.
>

So setting attribute "vsx" for 'b' variants of function definitions is what
should go in function rs6000_simd_clone_usable?

Bert.
Jakub Jelinek Aug. 10, 2020, 6:07 p.m. UTC | #3
On Mon, Aug 10, 2020 at 05:29:49PM +0000, GT wrote:
> > For PowerPC, if all you want to support is b which requires VSX, then the
> > right thing is for !TREE_PUBLIC functions return 0 if !TARGET_VSX and
> > otherwise set vecsize_mangle to 'b' and in the end return 1, for exported
> > functions always set it to 'b' (and in the end return 1).
> > Then ensure that the 'b' variants of function definitions get target ("vsx")
> > attribute added if !TARGET_VSX.
> >
> 
> So setting attribute "vsx" for 'b' variants of function definitions is what
> should go in function rs6000_simd_clone_usable?

No.  That function should say if the particular clone ('b' in this case) is
usable from some caller, and the answer for your 'b' is TARGET_VSX is
required to be non-zero.

The adjustment should go into the simd_clone_adjust target hook, see
what ix86_simd_clone_adjust does (though, that one has more variants to
handle).

	Jakub
GT Aug. 13, 2020, 8:40 p.m. UTC | #4
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, August 10, 2020 2:07 PM, Jakub Jelinek <jakub@redhat.com> wrote:

> On Mon, Aug 10, 2020 at 05:29:49PM +0000, GT wrote:
>
> > > For PowerPC, if all you want to support is b which requires VSX, then the
> > > right thing is for !TREE_PUBLIC functions return 0 if !TARGET_VSX and
> > > otherwise set vecsize_mangle to 'b' and in the end return 1, for exported
> > > functions always set it to 'b' (and in the end return 1).
> > > Then ensure that the 'b' variants of function definitions get target ("vsx")
> > > attribute added if !TARGET_VSX.
> >
> > So setting attribute "vsx" for 'b' variants of function definitions is what
> > should go in function rs6000_simd_clone_usable?
>
> No. That function should say if the particular clone ('b' in this case) is
> usable from some caller, and the answer for your 'b' is TARGET_VSX is
> required to be non-zero.
>
> The adjustment should go into the simd_clone_adjust target hook, see
> what ix86_simd_clone_adjust does (though, that one has more variants to
> handle).

I'm looking at ix86_simd_clone_adjust and also aarch64_simd_clone_adjust. The latter is
much simpler and I see how I would add PPC attribute "vsx" similarly. If I was to follow
the ix86_simd_clone_adjust organization, then ix86_valid_target_attribute_p called near
the end of the function is a problem. Because it in turn calls
ix86_valid_target_attribute_tree and this last function doesn't have a similarly named
function in PPC code.

Also, once the attribute "vsx" is added, where is it used? I mean that in the sense of
where is execution conditioned on the definition of say, the "sse2" string in x86_64?

Bert.
Jakub Jelinek Aug. 13, 2020, 9 p.m. UTC | #5
On Thu, Aug 13, 2020 at 08:40:22PM +0000, GT wrote:
> I'm looking at ix86_simd_clone_adjust and also aarch64_simd_clone_adjust. The latter is
> much simpler and I see how I would add PPC attribute "vsx" similarly. If I was to follow
> the ix86_simd_clone_adjust organization, then ix86_valid_target_attribute_p called near
> the end of the function is a problem. Because it in turn calls
> ix86_valid_target_attribute_tree and this last function doesn't have a similarly named
> function in PPC code.
> 
> Also, once the attribute "vsx" is added, where is it used? I mean that in the sense of
> where is execution conditioned on the definition of say, the "sse2" string in x86_64?

You need to trigger what will the middle-end and backend do if you use
explicit __attribute__((target ("vsx"))) on the function, so in the end it
needs to do some parsing, create a TARGET_OPTION_NODE with the right option
changes and put it to the function.

	Jakub
Segher Boessenkool Aug. 13, 2020, 10:49 p.m. UTC | #6
Hi!

This is about the Power binding to some OpenMP API, right?  It has
nothing to do with "vector" or "ABI" -- we have vectors already, and
we have ABIs already, more than enough of each.

It is very very VERY hard to review this without being told the proper
setting here.


On Fri, Aug 07, 2020 at 08:35:52PM +0000, Bert Tenjy wrote:
> This patch adds functionality to enable use of POWER Architecture's
> VSX extensions to speed up certain code sequences.

It does?  Oh, to implement some OpenMP stuff?

> The document describing POWER Architecture Vector Function interface is
> tentatively at: https://sourceware.org/glibc/wiki/Homepage?action=AttachFile&
> do=view&target=powerarchvectfuncabi.html

"This page does not exist yet. You can create a new empty page, or use
one of the page templates."

> 4. Changes to files vect-simd-clone-{1,4,5,8}.c are needed since 
> PPC64 has only 128bit-wide vector bus. x86_64 for which the tests were
> initially written has buses wider than that for AVX and higher architectures.

There is no "vector bus".  All Power vector registers are 128 bits, yes.

> 5. Per Segher's response to v0, we still need to agree a name for the 
> guiding document whose name is currently 'POWER Architecture Vector Function 
> ABI'.

Not just the document title.  You should use terminology that agrees with
everything else, that isn't usiing the same words for different things,
that isn't super confusing, throughout the patch :-)

> +/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN.  */

The documentation for this hook says ((lack of) line wraps verbatim):

@deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int})
This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
@var{simdlen} field if it was previously 0.
The hook should return 0 if SIMD clones shouldn't be emitted,
or number of @var{vecsize_mangle} variants that should be emitted.
@end deftypefn

so I have no idea what this hook should do.  Two of the four arguments
are left completely undefined, to start with.

> +
> +static int
> +rs6000_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
> +                                        struct cgraph_simd_clone *clonei,
> +                                        tree base_type, int num)

Indent is wrong here, btw (should use tabs, and everything should align
to that first "struct"...)  You need to be a bit more creative here,
maybe use shorter function names to begin with?  And use names that say
what the functions are *for*, or giving some context.

"simd" means nothing here.  More than  half of our backend is SIMD
stuff.  That has only sideways to do with what you call "simd" here :-/

> +  tree t;
> +  int i;

Declare things at first use, *in* the first use if you can (you usually
can).

> +  bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE);

This isn't a predicate, don't call it _p please.  It's a boolean, and
"decl_arg" isn't very meaningful either.

You might want to factor this differently altogether.

> +  for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
> +       t && t != void_list_node; t = TREE_CHAIN (t), i++)

Do all that "i" stuff not in the "for" expression itself (but in the
body).  If a for expression doesn't fit on one line, it usually is more
readable if you put all three parts on separate lines.  Complex
initialisation like here is more readable if you do it *before* the
loop.

> +	case E_QImode:
> +	case E_HImode:
> +	case E_SImode:
> +	case E_DImode:
> +	case E_SFmode:
> +	case E_DFmode:

> +	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> +		      "unsupported argument type %qT for simd", arg_type);

That isn't all types.  But you do ISA 2.07?  Put that in a comment
somewhere then please.

> +  if (TARGET_VSX)
> +    {
> +      clonei->vecsize_mangle = 'b';
> +      ret = 1;
> +    }

That is ISA 2.06 (Power 7), which you do not support I think?

> +  switch (clonei->vecsize_mangle)

I don't know what this is.

> +void
> +rs6000_simd_clone_adjust (struct cgraph_node *node)
> +{
> +}

Don't define it if it doesn't do anything?  Or is it required to exist
even if it doesn't ever do anything?

> +static int
> +rs6000_simd_clone_usable (struct cgraph_node *node)
> +{
> +  switch (node->simdclone->vecsize_mangle)
> +    {
> +      case 'b':

(wrong indentation, "case" should align with "{")

> +        if (!TARGET_VSX)
> +          return -1;
> +        return 0;
> +      default:
> +        gcc_unreachable ();
> +    }
> +}

Please don't use switch statements where a simple "if" would do.

static int
rs6000_simd_clone_usable (struct cgraph_node *node)
{
  gcc_assert (node->simdclone->vecsize_mangle == 'b');

  if (TARGET_VSX)
    return 0;

  return -1;
}

(If it looks too complicated, it probably is; don't remove whitelines to
make it shorter, that only makes things worse).


Anyway, please give some context in the proposed commit message: like
pointing to *what* it implements, and where that is described, and where
the binding is described.  And no dead links please ;-)


Segher
GT Aug. 17, 2020, 5:44 p.m. UTC | #7
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, August 13, 2020 6:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> Hi!
>
> This is about the Power binding to some OpenMP API, right? It has
> nothing to do with "vector" or "ABI" -- we have vectors already, and
> we have ABIs already, more than enough of each.
>
> It is very very VERY hard to review this without being told the proper
> setting here.
>

What this is about:

David Edelsohn wanted to have new library functions, one for each of these 6 single-precision functions:
sinf, cosf, sincosf, expf, logf, powf; and these 6 double-precision functions:
sin, cos, sincos, exp, log, and pow.

For the single-precision functions, the corresponding new functions would compute 4 results
simulatneously. For the double-precision functions, the new ones would compute 2 results
simultaneously.

x86_64 has already done something very similar so I thought I would adapt as much of their
documentation and implementation as I could for PPC64.

Let's start with that. Comments so far?

Bert.
Segher Boessenkool Aug. 17, 2020, 9:28 p.m. UTC | #8
On Mon, Aug 17, 2020 at 05:44:46PM +0000, GT wrote:
> > This is about the Power binding to some OpenMP API, right? It has
> > nothing to do with "vector" or "ABI" -- we have vectors already, and
> > we have ABIs already, more than enough of each.
> >
> > It is very very VERY hard to review this without being told the proper
> > setting here.
> 
> What this is about:
> 
> David Edelsohn wanted to have new library functions, one for each of these 6 single-precision functions:
> sinf, cosf, sincosf, expf, logf, powf; and these 6 double-precision functions:
> sin, cos, sincos, exp, log, and pow.
> 
> For the single-precision functions, the corresponding new functions would compute 4 results
> simulatneously. For the double-precision functions, the new ones would compute 2 results
> simultaneously.
> 
> x86_64 has already done something very similar so I thought I would adapt as much of their
> documentation and implementation as I could for PPC64.
> 
> Let's start with that. Comments so far?

That sounds like libmvec?

I still don't know what this is.


Segher
David Edelsohn Aug. 17, 2020, 10:05 p.m. UTC | #9
The Power Vector ABI is available at

https://github.com/power8-abi-doc/vector-function-abi

It apparently did not attach correctly to the sourceware wiki or the
filename is different.

Thanks, David

On Mon, Aug 17, 2020 at 1:44 PM GT <tnggil@protonmail.com> wrote:
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, August 13, 2020 6:49 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>
> > Hi!
> >
> > This is about the Power binding to some OpenMP API, right? It has
> > nothing to do with "vector" or "ABI" -- we have vectors already, and
> > we have ABIs already, more than enough of each.
> >
> > It is very very VERY hard to review this without being told the proper
> > setting here.
> >
>
> What this is about:
>
> David Edelsohn wanted to have new library functions, one for each of these 6 single-precision functions:
> sinf, cosf, sincosf, expf, logf, powf; and these 6 double-precision functions:
> sin, cos, sincos, exp, log, and pow.
>
> For the single-precision functions, the corresponding new functions would compute 4 results
> simulatneously. For the double-precision functions, the new ones would compute 2 results
> simultaneously.
>
> x86_64 has already done something very similar so I thought I would adapt as much of their
> documentation and implementation as I could for PPC64.
>
> Let's start with that. Comments so far?
>
> Bert.
Segher Boessenkool Aug. 17, 2020, 11:06 p.m. UTC | #10
On Mon, Aug 17, 2020 at 06:05:09PM -0400, David Edelsohn wrote:
> The Power Vector ABI is available at
> 
> https://github.com/power8-abi-doc/vector-function-abi
> 
> It apparently did not attach correctly to the sourceware wiki or the
> filename is different.

Thanks!


Segher
GT Aug. 18, 2020, 7:14 p.m. UTC | #11
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, August 17, 2020 5:28 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Mon, Aug 17, 2020 at 05:44:46PM +0000, GT wrote:
>
> > > This is about the Power binding to some OpenMP API, right? It has
> > > nothing to do with "vector" or "ABI" -- we have vectors already, and
> > > we have ABIs already, more than enough of each.
> > > It is very very VERY hard to review this without being told the proper
> > > setting here.
> >
> > What this is about:
> > David Edelsohn wanted to have new library functions, one for each of these 6 single-precision functions:
> > sinf, cosf, sincosf, expf, logf, powf; and these 6 double-precision functions:
> > sin, cos, sincos, exp, log, and pow.
> > For the single-precision functions, the corresponding new functions would compute 4 results
> > simulatneously. For the double-precision functions, the new ones would compute 2 results
> > simultaneously.
> > x86_64 has already done something very similar so I thought I would adapt as much of their
> > documentation and implementation as I could for PPC64.
> > Let's start with that. Comments so far?
>
> That sounds like libmvec?
>
> I still don't know what this is.
>

Yes, it is libmvec.

Now look at what GCC does to the code in Examples 1 and 2 at this link:
https://sourceware.org/glibc/wiki/libmvec

x86_64 added functionality to GCC so such code uses the new functions without the user
having to re-write the loops and explicitly call the new functions.

We are aiming to provide that same capability for PPC64 in GCC.

Bert.
Segher Boessenkool Aug. 18, 2020, 9:32 p.m. UTC | #12
On Tue, Aug 18, 2020 at 07:14:19PM +0000, GT wrote:
> > That sounds like libmvec?
> >
> > I still don't know what this is.
> 
> Yes, it is libmvec.
> 
> Now look at what GCC does to the code in Examples 1 and 2 at this link:
> https://sourceware.org/glibc/wiki/libmvec
> 
> x86_64 added functionality to GCC so such code uses the new functions without the user
> having to re-write the loops and explicitly call the new functions.
> 
> We are aiming to provide that same capability for PPC64 in GCC.

Great!  Please repost with what I already pointed out fixed, that
explanation added, and working links to the documentation?

Thanks in advance,


Segher
GT Aug. 20, 2020, 4:19 p.m. UTC | #13
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, August 18, 2020 5:32 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Tue, Aug 18, 2020 at 07:14:19PM +0000, GT wrote:
>
> > > That sounds like libmvec?
> > > I still don't know what this is.
> >
> > Yes, it is libmvec.
> > Now look at what GCC does to the code in Examples 1 and 2 at this link:
> > https://sourceware.org/glibc/wiki/libmvec
> > x86_64 added functionality to GCC so such code uses the new functions without the user
> > having to re-write the loops and explicitly call the new functions.
> > We are aiming to provide that same capability for PPC64 in GCC.
>
> Great! Please repost with what I already pointed out fixed, that
> explanation added, and working links to the documentation?
>

Are you ok with the titles of the patch and this document?

https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncabi.html

Bert.
Segher Boessenkool Aug. 20, 2020, 5:48 p.m. UTC | #14
On Thu, Aug 20, 2020 at 04:19:36PM +0000, GT wrote:
> > Great! Please repost with what I already pointed out fixed, that
> > explanation added, and working links to the documentation?
> 
> Are you ok with the titles of the patch and this document?
> 
> https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncabi.html

It is very misleading.  You can undo some of the damage in the first
lines of the commit message, but you can also just fix the title itself,
so that anyone can see what this is about even before reading the
message (which is what a mail subject is *for*!)


Segher
GT Aug. 20, 2020, 7:31 p.m. UTC | #15
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, August 13, 2020 5:00 PM, Jakub Jelinek <jakub@redhat.com> wrote:

> On Thu, Aug 13, 2020 at 08:40:22PM +0000, GT wrote:
>
> > I'm looking at ix86_simd_clone_adjust and also aarch64_simd_clone_adjust. The latter is
> > much simpler and I see how I would add PPC attribute "vsx" similarly. If I was to follow
> > the ix86_simd_clone_adjust organization, then ix86_valid_target_attribute_p called near
> > the end of the function is a problem. Because it in turn calls
> > ix86_valid_target_attribute_tree and this last function doesn't have a similarly named
> > function in PPC code.
> > Also, once the attribute "vsx" is added, where is it used? I mean that in the sense of
> > where is execution conditioned on the definition of say, the "sse2" string in x86_64?
>
> You need to trigger what will the middle-end and backend do if you use
> explicit attribute((target ("vsx"))) on the function, so in the end it
> needs to do some parsing, create a TARGET_OPTION_NODE with the right option
> changes and put it to the function.
>
> Jakub

I'm still trying to understand why we need attribute((target("vsx"))).

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes

The documentation for target(string) states that the purpose is to allow a function to be
compiled with different target options than were specified on the command line. Why would we
want the vector version of say sinf to be compiled for a different target than the scalar sinf?

Bert.
Jakub Jelinek Aug. 20, 2020, 8:04 p.m. UTC | #16
On Thu, Aug 20, 2020 at 07:31:50PM +0000, GT wrote:
> I'm still trying to understand why we need attribute((target("vsx"))).
> 
> https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
> 
> The documentation for target(string) states that the purpose is to allow a function to be
> compiled with different target options than were specified on the command line. Why would we
> want the vector version of say sinf to be compiled for a different target than the scalar sinf?

The scalar sinf (well, better let's talk about some arbitrary user function,
sinf or at least the vector versions thereof will be often implemented in
assembly) can be compiled with whatever ISA options the user chooses.
But, the SIMD ABI says that the 'b' char variants are VSX only, need to pass
arguments in certain registers that might not be even available without that
ISA, similarly for returns, and per the ABI the callers will ensure such
functions are only used from code which assumes that ISA.

Have a look at the x86_64 case, we have several different variants there,
e.g. SSE2 that passes in registers that are only available in SSE1 and
later, then AVX variants which use AVX registers, AVX2 variants which use
the same registers but different in he way how integral vectors are passed,
and finally AVX512F variants that use AVX512F registers.
So, on the caller side among the other desirability checks (if the function
is declare simd at all, what vectorization factor is available, whether it
is inbranch or notinbranch or both, whether arguments are vectors or uniform
or linear etc.), the compiler also checks the ISA.  And in loops that aren't
even SSE2 (-mno-sse, -msse1) just won't use any of the simd variants and
will use always scalar version, in code that isn't -mavx or later will only
use SSE2 (or scalar), in code that isn't -mavx2 will only use AVX or SSE2 or
scalar, etc.  Either one wouldn't be even able to pass the arguments or read
the return values without those, or if it would, the ABI still says one can
assume such ISA level.
This is all implemented by:
1) having the target hook used during the vectorization decision
(simd_clone_usable) decide based on the current ISA level
2) on the side of generation of the functions, it checks if the ISA is
already available (e.g. from command line), if it is, nothing needs to be
done, otherwise target attribute is added.

	Jakub
Segher Boessenkool Aug. 20, 2020, 8:29 p.m. UTC | #17
On Thu, Aug 20, 2020 at 07:31:50PM +0000, GT wrote:
> I'm still trying to understand why we need attribute((target("vsx"))).

You need Power8, even!  "vsx" alone is not enough (that only guarantees
Power7).  Your minimum version ("b") requires Power8.


Segher
GT Dec. 4, 2020, 6:28 p.m. UTC | #18
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, August 20, 2020 1:48 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Thu, Aug 20, 2020 at 04:19:36PM +0000, GT wrote:
>
> > > Great! Please repost with what I already pointed out fixed, that
> > > explanation added, and working links to the documentation?
> >
> > Are you ok with the titles of the patch and this document?
> > https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncabi.html
>
> It is very misleading. You can undo some of the damage in the first
> lines of the commit message, but you can also just fix the title itself,
> so that anyone can see what this is about even before reading the
> message (which is what a mail subject is for!)
>
> Segher

I have:

1. Changed the title of this document.
2. Removed all references within the document describing itself as an ABI.
3. Added a new introductory paragraph that should hopefully make clearer the doc's purpose.

Use the new link below. There's has been a name change in the link as well.

https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncspec.html

Bert.
diff mbox series

Patch

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d26a18f3ece..8216a7d7c2c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1273,6 +1273,142 @@  static const struct attribute_spec rs6000_attribute_table[] =
 #endif
   { NULL,        0, 0, false, false, false, false, NULL, NULL }
 };
+
+/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN.  */
+
+static int
+rs6000_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
+                                        struct cgraph_simd_clone *clonei,
+                                        tree base_type, int num)
+{
+  int ret = 1;
+
+  if (clonei->simdlen
+      && (clonei->simdlen < 2
+	  || clonei->simdlen > 128
+	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
+    {
+      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		  "unsupported simdlen %d", clonei->simdlen);
+      return 0;
+    }
+
+  tree ret_type = TREE_TYPE (TREE_TYPE (node->decl));
+  if (TREE_CODE (ret_type) != VOID_TYPE)
+    switch (TYPE_MODE (ret_type))
+      {
+      case E_QImode:
+      case E_HImode:
+      case E_SImode:
+      case E_DImode:
+      case E_SFmode:
+      case E_DFmode:
+	if (!AGGREGATE_TYPE_P (ret_type))
+	  break;
+	/* FALLTHRU */
+      default:
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		    "unsupported return type %qT for simd", ret_type);
+	return 0;
+      }
+
+  tree t;
+  int i;
+  tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
+  bool decl_arg_p = (node->definition || type_arg_types == NULL_TREE);
+
+  for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
+       t && t != void_list_node; t = TREE_CHAIN (t), i++)
+    {
+      tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
+      switch (TYPE_MODE (arg_type))
+	{
+	case E_QImode:
+	case E_HImode:
+	case E_SImode:
+	case E_DImode:
+	case E_SFmode:
+	case E_DFmode:
+	  if (!AGGREGATE_TYPE_P (arg_type))
+	    break;
+	  /* FALLTHRU */
+	default:
+	  if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM)
+	    break;
+	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		      "unsupported argument type %qT for simd", arg_type);
+	  return 0;
+	}
+    }
+
+  if (TARGET_VSX)
+    {
+      clonei->vecsize_mangle = 'b';
+      ret = 1;
+    }
+  clonei->mask_mode = VOIDmode;
+  switch (clonei->vecsize_mangle)
+    {
+    case 'b':
+      clonei->vecsize_int = 128;
+      clonei->vecsize_float = 128;
+      break;
+    case 'c':
+      clonei->vecsize_int = 128;
+      clonei->vecsize_float = 128;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  if (clonei->simdlen == 0)
+    {
+      if (SCALAR_INT_MODE_P (TYPE_MODE (base_type)))
+	clonei->simdlen = clonei->vecsize_int;
+      else
+	clonei->simdlen = clonei->vecsize_float;
+      clonei->simdlen /= GET_MODE_BITSIZE (TYPE_MODE (base_type));
+    }
+  else
+    {
+      tree ctype = ret_type;
+      if (TREE_CODE (ret_type) == VOID_TYPE)
+	ctype = base_type;
+      int cnt = GET_MODE_BITSIZE (TYPE_MODE (ctype)) * clonei->simdlen;
+      if (SCALAR_INT_MODE_P (TYPE_MODE (ctype)))
+	cnt /= clonei->vecsize_int;
+      else
+	cnt /= clonei->vecsize_float;
+      if (cnt > 8)
+	{
+	  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		      "unsupported simdlen %d", clonei->simdlen);
+	  return 0;
+	}
+      }
+  return ret;
+}
+
+/* Add target attribute to SIMD clone NODE if needed.  */
+
+void
+rs6000_simd_clone_adjust (struct cgraph_node *node)
+{
+}
+
+static int
+rs6000_simd_clone_usable (struct cgraph_node *node)
+{
+  switch (node->simdclone->vecsize_mangle)
+    {
+      case 'b':
+        if (!TARGET_VSX)
+          return -1;
+        return 0;
+      default:
+        gcc_unreachable ();
+    }
+}
+
 
 #ifndef TARGET_PROFILE_KERNEL
 #define TARGET_PROFILE_KERNEL 0
@@ -1281,6 +1417,17 @@  static const struct attribute_spec rs6000_attribute_table[] =
 /* Initialize the GCC target structure.  */
 #undef TARGET_ATTRIBUTE_TABLE
 #define TARGET_ATTRIBUTE_TABLE rs6000_attribute_table
+
+#undef TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
+#define TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN \
+  rs6000_simd_clone_compute_vecsize_and_simdlen
+
+#undef TARGET_SIMD_CLONE_ADJUST
+#define TARGET_SIMD_CLONE_ADJUST rs6000_simd_clone_adjust
+
+#undef TARGET_SIMD_CLONE_USABLE
+#define TARGET_SIMD_CLONE_USABLE rs6000_simd_clone_usable
+
 #undef TARGET_SET_DEFAULT_TYPE_ATTRIBUTES
 #define TARGET_SET_DEFAULT_TYPE_ATTRIBUTES rs6000_set_default_type_attributes
 #undef TARGET_ATTRIBUTE_TAKES_IDENTIFIER_P
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
index 50429049500..13f20aaa16a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c
@@ -10,10 +10,22 @@ 
 
 int array[N];
 
+#ifdef __powerpc__
+
+#pragma omp declare simd simdlen(2) notinbranch
+#pragma omp declare simd simdlen(2) notinbranch uniform(b) linear(c:3)
+#pragma omp declare simd simdlen(4) notinbranch
+#pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3)
+
+#else
+
 #pragma omp declare simd simdlen(4) notinbranch
 #pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3)
 #pragma omp declare simd simdlen(8) notinbranch
 #pragma omp declare simd simdlen(8) notinbranch uniform(b) linear(c:3)
+
+#endif
+
 __attribute__((noinline)) int
 foo (int a, int b, int c)
 {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
index debbe77b79d..c440582776e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c
@@ -12,7 +12,11 @@  float d[N];
 int e[N];
 unsigned short f[N];
 
+#ifdef __powerpc__
+#pragma omp declare simd simdlen(4) notinbranch uniform(b)
+#else
 #pragma omp declare simd simdlen(8) notinbranch uniform(b)
+#endif
 __attribute__((noinline)) float
 foo (float a, float b, float c)
 {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
index 6a098d9a51a..d9dc792b81e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c
@@ -10,7 +10,11 @@ 
 
 int d[N], e[N];
 
+#ifdef __powerpc__
+#pragma omp declare simd simdlen(2) notinbranch uniform(b) linear(c:3)
+#else
 #pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3)
+#endif
 __attribute__((noinline)) long long int
 foo (int a, int b, int c)
 {
diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
index 1bfd19dc8ab..4ed3da47449 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c
@@ -12,14 +12,22 @@  int a[N], b[N];
 long int c[N];
 unsigned char d[N];
 
+#ifdef __powerpc__
+#pragma omp declare simd simdlen(2) notinbranch
+#else
 #pragma omp declare simd simdlen(8) notinbranch
+#endif
 __attribute__((noinline)) int
 foo (long int a, int b, int c)
 {
   return a + b + c;
 }
 
+#ifdef __powerpc__
+#pragma omp declare simd simdlen(2) notinbranch
+#else
 #pragma omp declare simd simdlen(8) notinbranch
+#endif
 __attribute__((noinline)) long int
 bar (int a, int b, long int c)
 {
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index e79015b4d54..48edbc8e415 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3499,7 +3499,8 @@  proc check_effective_target_vect_simd_clones { } {
     return [check_cached_effective_target_indexed vect_simd_clones {
       expr { (([istarget i?86-*-*] || [istarget x86_64-*-*])
 	      && [check_effective_target_avx512f])
-	     || [istarget amdgcn-*-*] }}]
+	     || [istarget amdgcn-*-*]
+	     || ([istarget powerpc*-*-*] && [check_p8vector_hw_available]) }}]
 }
 
 # Return 1 if this is a AArch64 target supporting big endian