diff mbox series

[RFC] Allow functions with target_clones attribute to be inlined

Message ID 20240914191709.648227-1-chenyangyu@isrc.iscas.ac.cn
State New
Headers show
Series [RFC] Allow functions with target_clones attribute to be inlined | expand

Commit Message

Yangyu Chen Sept. 14, 2024, 7:17 p.m. UTC
I recently found that target_clones functions cannot inline even when
the caller has exactly the same target. However, if we only use target
attributes in C++ and let the compiler generate IFUNC for us, the
functions with the same target will be inlined.

For example, the following code compiled on x86-64 target with -O3 will
generate IFUNC for foo and bar and inline foo into the bar:

```cpp
__attribute__((target("default")))
int foo(int *arr) {
    int sum = 0;
    for (int i=0;i<16;i++) sum += arr[i];
    return sum;
}

__attribute__((target("avx2")))
int foo(int *arr) {
    int sum = 0;
    for (int i=0;i<16;i++) sum += arr[i];
    return sum;
}

__attribute__((target("default")))
int bar(int *arr) {
    return foo(arr);
}

__attribute__((target("avx2")))
int bar(int *arr) {
    return foo(arr);
}
```

However, if we use target_clones attribute, the target_clones functions
will not be inlined:

```cpp
__attribute__((target_clones("default","avx2")))
int foo(int *arr) {
    int sum = 0;
    for (int i=0;i<16;i++) sum += arr[i];
    return sum;
}

__attribute__((target_clones("default","avx2")))
int bar(int *arr) {
    return foo(arr);
}
```

This behavior may negatively impact performance since the target_clones
functions are not inlined. And since we didn't jump to the target_clones
functions based on PLT but used the same target as the caller's target.
I think it's better to allow the target_clones functions to be inlined.

gcc/ada/ChangeLog:

        * gcc-interface/utils.cc (handle_target_clones_attribute):
        Allow functions with target_clones attribute to be inlined.

gcc/c-family/ChangeLog:

        * c-attribs.cc (handle_target_clones_attribute):
        Allow functions with target_clones attribute to be inlined.

gcc/d/ChangeLog:

        * d-attribs.cc (d_handle_target_clones_attribute):
        Allow functions with target_clones attribute to be inlined.

Signed-off-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn>
---
 gcc/ada/gcc-interface/utils.cc | 5 +----
 gcc/c-family/c-attribs.cc      | 3 ---
 gcc/d/d-attribs.cc             | 5 -----
 3 files changed, 1 insertion(+), 12 deletions(-)

Comments

Richard Sandiford Sept. 18, 2024, 8:46 a.m. UTC | #1
Yangyu Chen <chenyangyu@isrc.iscas.ac.cn> writes:
> I recently found that target_clones functions cannot inline even when
> the caller has exactly the same target. However, if we only use target
> attributes in C++ and let the compiler generate IFUNC for us, the
> functions with the same target will be inlined.
>
> For example, the following code compiled on x86-64 target with -O3 will
> generate IFUNC for foo and bar and inline foo into the bar:
>
> ```cpp
> __attribute__((target("default")))
> int foo(int *arr) {
>     int sum = 0;
>     for (int i=0;i<16;i++) sum += arr[i];
>     return sum;
> }
>
> __attribute__((target("avx2")))
> int foo(int *arr) {
>     int sum = 0;
>     for (int i=0;i<16;i++) sum += arr[i];
>     return sum;
> }
>
> __attribute__((target("default")))
> int bar(int *arr) {
>     return foo(arr);
> }
>
> __attribute__((target("avx2")))
> int bar(int *arr) {
>     return foo(arr);
> }
> ```
>
> However, if we use target_clones attribute, the target_clones functions
> will not be inlined:
>
> ```cpp
> __attribute__((target_clones("default","avx2")))
> int foo(int *arr) {
>     int sum = 0;
>     for (int i=0;i<16;i++) sum += arr[i];
>     return sum;
> }
>
> __attribute__((target_clones("default","avx2")))
> int bar(int *arr) {
>     return foo(arr);
> }
> ```
>
> This behavior may negatively impact performance since the target_clones
> functions are not inlined. And since we didn't jump to the target_clones
> functions based on PLT but used the same target as the caller's target.
> I think it's better to allow the target_clones functions to be inlined.
>
> gcc/ada/ChangeLog:
>
>         * gcc-interface/utils.cc (handle_target_clones_attribute):
>         Allow functions with target_clones attribute to be inlined.
>
> gcc/c-family/ChangeLog:
>
>         * c-attribs.cc (handle_target_clones_attribute):
>         Allow functions with target_clones attribute to be inlined.
>
> gcc/d/ChangeLog:
>
>         * d-attribs.cc (d_handle_target_clones_attribute):
>         Allow functions with target_clones attribute to be inlined.

What I'm about to say applies to both sequences above, but:

Before inlining avx2 foo into avx2 bar, don't we need to be careful about
making sure that foo would still pick the avx2 version if called normally?
E.g. if foo had an avx512 version, direct calls to foo would presumably
pick that on avx512 targets, but still pick the avx2 version of bar.
It would then seem strange for the avx2 version of bar to inline the
avx2 version of foo, both for performance and ODR reasons.

Thanks,
Richard

>
> Signed-off-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn>
> ---
>  gcc/ada/gcc-interface/utils.cc | 5 +----
>  gcc/c-family/c-attribs.cc      | 3 ---
>  gcc/d/d-attribs.cc             | 5 -----
>  3 files changed, 1 insertion(+), 12 deletions(-)
>
> diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
> index 60f36b1e50d..d010b684177 100644
> --- a/gcc/ada/gcc-interface/utils.cc
> +++ b/gcc/ada/gcc-interface/utils.cc
> @@ -7299,10 +7299,7 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
>  			  int ARG_UNUSED (flags), bool *no_add_attrs)
>  {
>    /* Ensure we have a function type.  */
> -  if (TREE_CODE (*node) == FUNCTION_DECL)
> -    /* Do not inline functions with multiple clone targets.  */
> -    DECL_UNINLINABLE (*node) = 1;
> -  else
> +  if (TREE_CODE (*node) != FUNCTION_DECL)
>      {
>        warning (OPT_Wattributes, "%qE attribute ignored", name);
>        *no_add_attrs = true;
> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> index 4dd2eecbea5..f8759bb1908 100644
> --- a/gcc/c-family/c-attribs.cc
> +++ b/gcc/c-family/c-attribs.cc
> @@ -6105,9 +6105,6 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
>  		   "single %<target_clones%> attribute is ignored");
>  	  *no_add_attrs = true;
>  	}
> -      else
> -      /* Do not inline functions with multiple clone targets.  */
> -	DECL_UNINLINABLE (*node) = 1;
>      }
>    else
>      {
> diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
> index 0f7ca10e017..9f67415adb1 100644
> --- a/gcc/d/d-attribs.cc
> +++ b/gcc/d/d-attribs.cc
> @@ -788,11 +788,6 @@ d_handle_target_clones_attribute (tree *node, tree name, tree, int,
>        warning (OPT_Wattributes, "%qE attribute ignored", name);
>        *no_add_attrs = true;
>      }
> -  else
> -    {
> -      /* Do not inline functions with multiple clone targets.  */
> -      DECL_UNINLINABLE (*node) = 1;
> -    }
>  
>    return NULL_TREE;
>  }
Yangyu Chen Sept. 18, 2024, 3:25 p.m. UTC | #2
> On Sep 18, 2024, at 16:46, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Yangyu Chen <chenyangyu@isrc.iscas.ac.cn> writes:
>> I recently found that target_clones functions cannot inline even when
>> the caller has exactly the same target. However, if we only use target
>> attributes in C++ and let the compiler generate IFUNC for us, the
>> functions with the same target will be inlined.
>> 
>> For example, the following code compiled on x86-64 target with -O3 will
>> generate IFUNC for foo and bar and inline foo into the bar:
>> 
>> ```cpp
>> __attribute__((target("default")))
>> int foo(int *arr) {
>>    int sum = 0;
>>    for (int i=0;i<16;i++) sum += arr[i];
>>    return sum;
>> }
>> 
>> __attribute__((target("avx2")))
>> int foo(int *arr) {
>>    int sum = 0;
>>    for (int i=0;i<16;i++) sum += arr[i];
>>    return sum;
>> }
>> 
>> __attribute__((target("default")))
>> int bar(int *arr) {
>>    return foo(arr);
>> }
>> 
>> __attribute__((target("avx2")))
>> int bar(int *arr) {
>>    return foo(arr);
>> }
>> ```
>> 
>> However, if we use target_clones attribute, the target_clones functions
>> will not be inlined:
>> 
>> ```cpp
>> __attribute__((target_clones("default","avx2")))
>> int foo(int *arr) {
>>    int sum = 0;
>>    for (int i=0;i<16;i++) sum += arr[i];
>>    return sum;
>> }
>> 
>> __attribute__((target_clones("default","avx2")))
>> int bar(int *arr) {
>>    return foo(arr);
>> }
>> ```
>> 
>> This behavior may negatively impact performance since the target_clones
>> functions are not inlined. And since we didn't jump to the target_clones
>> functions based on PLT but used the same target as the caller's target.
>> I think it's better to allow the target_clones functions to be inlined.
>> 
>> gcc/ada/ChangeLog:
>> 
>>        * gcc-interface/utils.cc (handle_target_clones_attribute):
>>        Allow functions with target_clones attribute to be inlined.
>> 
>> gcc/c-family/ChangeLog:
>> 
>>        * c-attribs.cc (handle_target_clones_attribute):
>>        Allow functions with target_clones attribute to be inlined.
>> 
>> gcc/d/ChangeLog:
>> 
>>        * d-attribs.cc (d_handle_target_clones_attribute):
>>        Allow functions with target_clones attribute to be inlined.
> 
> What I'm about to say applies to both sequences above, but:
> 
> Before inlining avx2 foo into avx2 bar, don't we need to be careful about
> making sure that foo would still pick the avx2 version if called normally?
> E.g. if foo had an avx512 version, direct calls to foo would presumably
> pick that on avx512 targets, but still pick the avx2 version of bar.
> It would then seem strange for the avx2 version of bar to inline the
> avx2 version of foo, both for performance and ODR reasons.
> 

Yes. For now, the GCC will emit code to call the avx2 callee directly,
which will be worse than inlining. This should be fixed in the
future.

> Thanks,
> Richard
> 
>> 
>> Signed-off-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn>
>> ---
>> gcc/ada/gcc-interface/utils.cc | 5 +----
>> gcc/c-family/c-attribs.cc      | 3 ---
>> gcc/d/d-attribs.cc             | 5 -----
>> 3 files changed, 1 insertion(+), 12 deletions(-)
>> 
>> diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
>> index 60f36b1e50d..d010b684177 100644
>> --- a/gcc/ada/gcc-interface/utils.cc
>> +++ b/gcc/ada/gcc-interface/utils.cc
>> @@ -7299,10 +7299,7 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
>>   int ARG_UNUSED (flags), bool *no_add_attrs)
>> {
>>   /* Ensure we have a function type.  */
>> -  if (TREE_CODE (*node) == FUNCTION_DECL)
>> -    /* Do not inline functions with multiple clone targets.  */
>> -    DECL_UNINLINABLE (*node) = 1;
>> -  else
>> +  if (TREE_CODE (*node) != FUNCTION_DECL)
>>     {
>>       warning (OPT_Wattributes, "%qE attribute ignored", name);
>>       *no_add_attrs = true;
>> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
>> index 4dd2eecbea5..f8759bb1908 100644
>> --- a/gcc/c-family/c-attribs.cc
>> +++ b/gcc/c-family/c-attribs.cc
>> @@ -6105,9 +6105,6 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
>>    "single %<target_clones%> attribute is ignored");
>>   *no_add_attrs = true;
>> }
>> -      else
>> -      /* Do not inline functions with multiple clone targets.  */
>> - DECL_UNINLINABLE (*node) = 1;
>>     }
>>   else
>>     {
>> diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
>> index 0f7ca10e017..9f67415adb1 100644
>> --- a/gcc/d/d-attribs.cc
>> +++ b/gcc/d/d-attribs.cc
>> @@ -788,11 +788,6 @@ d_handle_target_clones_attribute (tree *node, tree name, tree, int,
>>       warning (OPT_Wattributes, "%qE attribute ignored", name);
>>       *no_add_attrs = true;
>>     }
>> -  else
>> -    {
>> -      /* Do not inline functions with multiple clone targets.  */
>> -      DECL_UNINLINABLE (*node) = 1;
>> -    }
>> 
>>   return NULL_TREE;
>> }
Andrew Carlotti Sept. 18, 2024, 3:36 p.m. UTC | #3
On Wed, Sep 18, 2024 at 09:46:15AM +0100, Richard Sandiford wrote:
> Yangyu Chen <chenyangyu@isrc.iscas.ac.cn> writes:
> > I recently found that target_clones functions cannot inline even when
> > the caller has exactly the same target. However, if we only use target
> > attributes in C++ and let the compiler generate IFUNC for us, the
> > functions with the same target will be inlined.
> >
> > For example, the following code compiled on x86-64 target with -O3 will
> > generate IFUNC for foo and bar and inline foo into the bar:
> >
> > ```cpp
> > __attribute__((target("default")))
> > int foo(int *arr) {
> >     int sum = 0;
> >     for (int i=0;i<16;i++) sum += arr[i];
> >     return sum;
> > }
> >
> > __attribute__((target("avx2")))
> > int foo(int *arr) {
> >     int sum = 0;
> >     for (int i=0;i<16;i++) sum += arr[i];
> >     return sum;
> > }
> >
> > __attribute__((target("default")))
> > int bar(int *arr) {
> >     return foo(arr);
> > }
> >
> > __attribute__((target("avx2")))
> > int bar(int *arr) {
> >     return foo(arr);
> > }
> > ```
> >
> > However, if we use target_clones attribute, the target_clones functions
> > will not be inlined:
> >
> > ```cpp
> > __attribute__((target_clones("default","avx2")))
> > int foo(int *arr) {
> >     int sum = 0;
> >     for (int i=0;i<16;i++) sum += arr[i];
> >     return sum;
> > }
> >
> > __attribute__((target_clones("default","avx2")))
> > int bar(int *arr) {
> >     return foo(arr);
> > }
> > ```
> >
> > This behavior may negatively impact performance since the target_clones
> > functions are not inlined. And since we didn't jump to the target_clones
> > functions based on PLT but used the same target as the caller's target.
> > I think it's better to allow the target_clones functions to be inlined.
> >
> > gcc/ada/ChangeLog:
> >
> >         * gcc-interface/utils.cc (handle_target_clones_attribute):
> >         Allow functions with target_clones attribute to be inlined.
> >
> > gcc/c-family/ChangeLog:
> >
> >         * c-attribs.cc (handle_target_clones_attribute):
> >         Allow functions with target_clones attribute to be inlined.
> >
> > gcc/d/ChangeLog:
> >
> >         * d-attribs.cc (d_handle_target_clones_attribute):
> >         Allow functions with target_clones attribute to be inlined.
> 
> What I'm about to say applies to both sequences above, but:
> 
> Before inlining avx2 foo into avx2 bar, don't we need to be careful about
> making sure that foo would still pick the avx2 version if called normally?
> E.g. if foo had an avx512 version, direct calls to foo would presumably
> pick that on avx512 targets, but still pick the avx2 version of bar.
> It would then seem strange for the avx2 version of bar to inline the
> avx2 version of foo, both for performance and ODR reasons.
> 
> Thanks,
> Richard

This is actually an existing bug in the indirection elimination code for the
'target' attribute.  AArch64's 'target_version' attribute is not affected by
this bug because the code in redirect_to_specific_clone only checks the target
attribute (however, AArch64 is affected by a more convoluted bug in cases where
we use both target and target_version attributes at the same time).

I think it's correct to mark functions with the target_clones attribute
as non-inlinable by the usual inlining mechanisms, since we don't know which
version will be called at runtime without doing further work to check this.  If
we can remove the indirection, then the individual function versions might then
become inlinable, but we don't have these individual versions available until
after the target clones pass.

> >
> > Signed-off-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn>
> > ---
> >  gcc/ada/gcc-interface/utils.cc | 5 +----
> >  gcc/c-family/c-attribs.cc      | 3 ---
> >  gcc/d/d-attribs.cc             | 5 -----
> >  3 files changed, 1 insertion(+), 12 deletions(-)
> >
> > diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
> > index 60f36b1e50d..d010b684177 100644
> > --- a/gcc/ada/gcc-interface/utils.cc
> > +++ b/gcc/ada/gcc-interface/utils.cc
> > @@ -7299,10 +7299,7 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
> >  			  int ARG_UNUSED (flags), bool *no_add_attrs)
> >  {
> >    /* Ensure we have a function type.  */
> > -  if (TREE_CODE (*node) == FUNCTION_DECL)
> > -    /* Do not inline functions with multiple clone targets.  */
> > -    DECL_UNINLINABLE (*node) = 1;
> > -  else
> > +  if (TREE_CODE (*node) != FUNCTION_DECL)
> >      {
> >        warning (OPT_Wattributes, "%qE attribute ignored", name);
> >        *no_add_attrs = true;
> > diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> > index 4dd2eecbea5..f8759bb1908 100644
> > --- a/gcc/c-family/c-attribs.cc
> > +++ b/gcc/c-family/c-attribs.cc
> > @@ -6105,9 +6105,6 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
> >  		   "single %<target_clones%> attribute is ignored");
> >  	  *no_add_attrs = true;
> >  	}
> > -      else
> > -      /* Do not inline functions with multiple clone targets.  */
> > -	DECL_UNINLINABLE (*node) = 1;
> >      }
> >    else
> >      {
> > diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
> > index 0f7ca10e017..9f67415adb1 100644
> > --- a/gcc/d/d-attribs.cc
> > +++ b/gcc/d/d-attribs.cc
> > @@ -788,11 +788,6 @@ d_handle_target_clones_attribute (tree *node, tree name, tree, int,
> >        warning (OPT_Wattributes, "%qE attribute ignored", name);
> >        *no_add_attrs = true;
> >      }
> > -  else
> > -    {
> > -      /* Do not inline functions with multiple clone targets.  */
> > -      DECL_UNINLINABLE (*node) = 1;
> > -    }
> >  
> >    return NULL_TREE;
> >  }
Yangyu Chen Sept. 18, 2024, 5:01 p.m. UTC | #4
> On Sep 18, 2024, at 23:36, Andrew Carlotti <andrew.carlotti@arm.com> wrote:
> 
> On Wed, Sep 18, 2024 at 09:46:15AM +0100, Richard Sandiford wrote:
>> Yangyu Chen <chenyangyu@isrc.iscas.ac.cn> writes:
>>> I recently found that target_clones functions cannot inline even when
>>> the caller has exactly the same target. However, if we only use target
>>> attributes in C++ and let the compiler generate IFUNC for us, the
>>> functions with the same target will be inlined.
>>> 
>>> For example, the following code compiled on x86-64 target with -O3 will
>>> generate IFUNC for foo and bar and inline foo into the bar:
>>> 
>>> ```cpp
>>> __attribute__((target("default")))
>>> int foo(int *arr) {
>>>    int sum = 0;
>>>    for (int i=0;i<16;i++) sum += arr[i];
>>>    return sum;
>>> }
>>> 
>>> __attribute__((target("avx2")))
>>> int foo(int *arr) {
>>>    int sum = 0;
>>>    for (int i=0;i<16;i++) sum += arr[i];
>>>    return sum;
>>> }
>>> 
>>> __attribute__((target("default")))
>>> int bar(int *arr) {
>>>    return foo(arr);
>>> }
>>> 
>>> __attribute__((target("avx2")))
>>> int bar(int *arr) {
>>>    return foo(arr);
>>> }
>>> ```
>>> 
>>> However, if we use target_clones attribute, the target_clones functions
>>> will not be inlined:
>>> 
>>> ```cpp
>>> __attribute__((target_clones("default","avx2")))
>>> int foo(int *arr) {
>>>    int sum = 0;
>>>    for (int i=0;i<16;i++) sum += arr[i];
>>>    return sum;
>>> }
>>> 
>>> __attribute__((target_clones("default","avx2")))
>>> int bar(int *arr) {
>>>    return foo(arr);
>>> }
>>> ```
>>> 
>>> This behavior may negatively impact performance since the target_clones
>>> functions are not inlined. And since we didn't jump to the target_clones
>>> functions based on PLT but used the same target as the caller's target.
>>> I think it's better to allow the target_clones functions to be inlined.
>>> 
>>> gcc/ada/ChangeLog:
>>> 
>>>        * gcc-interface/utils.cc (handle_target_clones_attribute):
>>>        Allow functions with target_clones attribute to be inlined.
>>> 
>>> gcc/c-family/ChangeLog:
>>> 
>>>        * c-attribs.cc (handle_target_clones_attribute):
>>>        Allow functions with target_clones attribute to be inlined.
>>> 
>>> gcc/d/ChangeLog:
>>> 
>>>        * d-attribs.cc (d_handle_target_clones_attribute):
>>>        Allow functions with target_clones attribute to be inlined.
>> 
>> What I'm about to say applies to both sequences above, but:
>> 
>> Before inlining avx2 foo into avx2 bar, don't we need to be careful about
>> making sure that foo would still pick the avx2 version if called normally?
>> E.g. if foo had an avx512 version, direct calls to foo would presumably
>> pick that on avx512 targets, but still pick the avx2 version of bar.
>> It would then seem strange for the avx2 version of bar to inline the
>> avx2 version of foo, both for performance and ODR reasons.
>> 
>> Thanks,
>> Richard
> 
> This is actually an existing bug in the indirection elimination code for the
> 'target' attribute.  AArch64's 'target_version' attribute is not affected by
> this bug because the code in redirect_to_specific_clone only checks the target
> attribute (however, AArch64 is affected by a more convoluted bug in cases where
> we use both target and target_version attributes at the same time).
> 
> I think it's correct to mark functions with the target_clones attribute
> as non-inlinable by the usual inlining mechanisms, since we don't know which
> version will be called at runtime without doing further work to check this.  

I think the best case might be allowing inline when the top priority
callee function has a subset of the caller's ISA. When the callee
has some target ISA extension that does not exist in the caller's,
we call the function through PLT. However, this solution also needs
to allow the target_clones callee to be able to inline.

> If
> we can remove the indirection, then the individual function versions might then
> become inlinable, but we don't have these individual versions available until
> after the target clones pass.
> 
>>> 
>>> Signed-off-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn>
>>> ---
>>> gcc/ada/gcc-interface/utils.cc | 5 +----
>>> gcc/c-family/c-attribs.cc      | 3 ---
>>> gcc/d/d-attribs.cc             | 5 -----
>>> 3 files changed, 1 insertion(+), 12 deletions(-)
>>> 
>>> diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
>>> index 60f36b1e50d..d010b684177 100644
>>> --- a/gcc/ada/gcc-interface/utils.cc
>>> +++ b/gcc/ada/gcc-interface/utils.cc
>>> @@ -7299,10 +7299,7 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
>>>   int ARG_UNUSED (flags), bool *no_add_attrs)
>>> {
>>>   /* Ensure we have a function type.  */
>>> -  if (TREE_CODE (*node) == FUNCTION_DECL)
>>> -    /* Do not inline functions with multiple clone targets.  */
>>> -    DECL_UNINLINABLE (*node) = 1;
>>> -  else
>>> +  if (TREE_CODE (*node) != FUNCTION_DECL)
>>>     {
>>>       warning (OPT_Wattributes, "%qE attribute ignored", name);
>>>       *no_add_attrs = true;
>>> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
>>> index 4dd2eecbea5..f8759bb1908 100644
>>> --- a/gcc/c-family/c-attribs.cc
>>> +++ b/gcc/c-family/c-attribs.cc
>>> @@ -6105,9 +6105,6 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
>>>    "single %<target_clones%> attribute is ignored");
>>>   *no_add_attrs = true;
>>> }
>>> -      else
>>> -      /* Do not inline functions with multiple clone targets.  */
>>> - DECL_UNINLINABLE (*node) = 1;
>>>     }
>>>   else
>>>     {
>>> diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
>>> index 0f7ca10e017..9f67415adb1 100644
>>> --- a/gcc/d/d-attribs.cc
>>> +++ b/gcc/d/d-attribs.cc
>>> @@ -788,11 +788,6 @@ d_handle_target_clones_attribute (tree *node, tree name, tree, int,
>>>       warning (OPT_Wattributes, "%qE attribute ignored", name);
>>>       *no_add_attrs = true;
>>>     }
>>> -  else
>>> -    {
>>> -      /* Do not inline functions with multiple clone targets.  */
>>> -      DECL_UNINLINABLE (*node) = 1;
>>> -    }
>>> 
>>>   return NULL_TREE;
>>> }
Andrew Carlotti Sept. 18, 2024, 6:40 p.m. UTC | #5
On Thu, Sep 19, 2024 at 01:01:39AM +0800, Yangyu Chen wrote:
> 
> 
> > On Sep 18, 2024, at 23:36, Andrew Carlotti <andrew.carlotti@arm.com> wrote:
> > 
> > On Wed, Sep 18, 2024 at 09:46:15AM +0100, Richard Sandiford wrote:
> >> Yangyu Chen <chenyangyu@isrc.iscas.ac.cn> writes:
> >>> I recently found that target_clones functions cannot inline even when
> >>> the caller has exactly the same target. However, if we only use target
> >>> attributes in C++ and let the compiler generate IFUNC for us, the
> >>> functions with the same target will be inlined.
> >>> 
> >>> For example, the following code compiled on x86-64 target with -O3 will
> >>> generate IFUNC for foo and bar and inline foo into the bar:
> >>> 
> >>> ```cpp
> >>> __attribute__((target("default")))
> >>> int foo(int *arr) {
> >>>    int sum = 0;
> >>>    for (int i=0;i<16;i++) sum += arr[i];
> >>>    return sum;
> >>> }
> >>> 
> >>> __attribute__((target("avx2")))
> >>> int foo(int *arr) {
> >>>    int sum = 0;
> >>>    for (int i=0;i<16;i++) sum += arr[i];
> >>>    return sum;
> >>> }
> >>> 
> >>> __attribute__((target("default")))
> >>> int bar(int *arr) {
> >>>    return foo(arr);
> >>> }
> >>> 
> >>> __attribute__((target("avx2")))
> >>> int bar(int *arr) {
> >>>    return foo(arr);
> >>> }
> >>> ```
> >>> 
> >>> However, if we use target_clones attribute, the target_clones functions
> >>> will not be inlined:
> >>> 
> >>> ```cpp
> >>> __attribute__((target_clones("default","avx2")))
> >>> int foo(int *arr) {
> >>>    int sum = 0;
> >>>    for (int i=0;i<16;i++) sum += arr[i];
> >>>    return sum;
> >>> }
> >>> 
> >>> __attribute__((target_clones("default","avx2")))
> >>> int bar(int *arr) {
> >>>    return foo(arr);
> >>> }
> >>> ```
> >>> 
> >>> This behavior may negatively impact performance since the target_clones
> >>> functions are not inlined. And since we didn't jump to the target_clones
> >>> functions based on PLT but used the same target as the caller's target.
> >>> I think it's better to allow the target_clones functions to be inlined.
> >>> 
> >>> gcc/ada/ChangeLog:
> >>> 
> >>>        * gcc-interface/utils.cc (handle_target_clones_attribute):
> >>>        Allow functions with target_clones attribute to be inlined.
> >>> 
> >>> gcc/c-family/ChangeLog:
> >>> 
> >>>        * c-attribs.cc (handle_target_clones_attribute):
> >>>        Allow functions with target_clones attribute to be inlined.
> >>> 
> >>> gcc/d/ChangeLog:
> >>> 
> >>>        * d-attribs.cc (d_handle_target_clones_attribute):
> >>>        Allow functions with target_clones attribute to be inlined.
> >> 
> >> What I'm about to say applies to both sequences above, but:
> >> 
> >> Before inlining avx2 foo into avx2 bar, don't we need to be careful about
> >> making sure that foo would still pick the avx2 version if called normally?
> >> E.g. if foo had an avx512 version, direct calls to foo would presumably
> >> pick that on avx512 targets, but still pick the avx2 version of bar.
> >> It would then seem strange for the avx2 version of bar to inline the
> >> avx2 version of foo, both for performance and ODR reasons.
> >> 
> >> Thanks,
> >> Richard
> > 
> > This is actually an existing bug in the indirection elimination code for the
> > 'target' attribute.  AArch64's 'target_version' attribute is not affected by
> > this bug because the code in redirect_to_specific_clone only checks the target
> > attribute (however, AArch64 is affected by a more convoluted bug in cases where
> > we use both target and target_version attributes at the same time).
> > 
> > I think it's correct to mark functions with the target_clones attribute
> > as non-inlinable by the usual inlining mechanisms, since we don't know which
> > version will be called at runtime without doing further work to check this.  
> 
> I think the best case might be allowing inline when the top priority
> callee function has a subset of the caller's ISA. When the callee
> has some target ISA extension that does not exist in the caller's,
> we call the function through PLT. However, this solution also needs
> to allow the target_clones callee to be able to inline.

If we redirect to a specific version in the target_clone pass, then presumably
that means that it's safe to inline that version?  I'm not sure whether the
existing code in redirect_to_specific_clone checks whether this is safe before
removing the indirection, which might be a bug in itself.

If we know it's safe to inline, then we could mark the function as inlineable
in redirect_to_specific_clone, which would allow the later ipa_inline pass
toinline the function.

> > If
> > we can remove the indirection, then the individual function versions might then
> > become inlinable, but we don't have these individual versions available until
> > after the target clones pass.
> > 
> >>> 
> >>> Signed-off-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn>
> >>> ---
> >>> gcc/ada/gcc-interface/utils.cc | 5 +----
> >>> gcc/c-family/c-attribs.cc      | 3 ---
> >>> gcc/d/d-attribs.cc             | 5 -----
> >>> 3 files changed, 1 insertion(+), 12 deletions(-)
> >>> 
> >>> diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
> >>> index 60f36b1e50d..d010b684177 100644
> >>> --- a/gcc/ada/gcc-interface/utils.cc
> >>> +++ b/gcc/ada/gcc-interface/utils.cc
> >>> @@ -7299,10 +7299,7 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
> >>>   int ARG_UNUSED (flags), bool *no_add_attrs)
> >>> {
> >>>   /* Ensure we have a function type.  */
> >>> -  if (TREE_CODE (*node) == FUNCTION_DECL)
> >>> -    /* Do not inline functions with multiple clone targets.  */
> >>> -    DECL_UNINLINABLE (*node) = 1;
> >>> -  else
> >>> +  if (TREE_CODE (*node) != FUNCTION_DECL)
> >>>     {
> >>>       warning (OPT_Wattributes, "%qE attribute ignored", name);
> >>>       *no_add_attrs = true;
> >>> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> >>> index 4dd2eecbea5..f8759bb1908 100644
> >>> --- a/gcc/c-family/c-attribs.cc
> >>> +++ b/gcc/c-family/c-attribs.cc
> >>> @@ -6105,9 +6105,6 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
> >>>    "single %<target_clones%> attribute is ignored");
> >>>   *no_add_attrs = true;
> >>> }
> >>> -      else
> >>> -      /* Do not inline functions with multiple clone targets.  */
> >>> - DECL_UNINLINABLE (*node) = 1;
> >>>     }
> >>>   else
> >>>     {
> >>> diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
> >>> index 0f7ca10e017..9f67415adb1 100644
> >>> --- a/gcc/d/d-attribs.cc
> >>> +++ b/gcc/d/d-attribs.cc
> >>> @@ -788,11 +788,6 @@ d_handle_target_clones_attribute (tree *node, tree name, tree, int,
> >>>       warning (OPT_Wattributes, "%qE attribute ignored", name);
> >>>       *no_add_attrs = true;
> >>>     }
> >>> -  else
> >>> -    {
> >>> -      /* Do not inline functions with multiple clone targets.  */
> >>> -      DECL_UNINLINABLE (*node) = 1;
> >>> -    }
> >>> 
> >>>   return NULL_TREE;
> >>> }
> 
>
Richard Biener Sept. 19, 2024, 6:12 a.m. UTC | #6
On Wed, Sep 18, 2024 at 8:41 PM Andrew Carlotti <andrew.carlotti@arm.com> wrote:
>
> On Thu, Sep 19, 2024 at 01:01:39AM +0800, Yangyu Chen wrote:
> >
> >
> > > On Sep 18, 2024, at 23:36, Andrew Carlotti <andrew.carlotti@arm.com> wrote:
> > >
> > > On Wed, Sep 18, 2024 at 09:46:15AM +0100, Richard Sandiford wrote:
> > >> Yangyu Chen <chenyangyu@isrc.iscas.ac.cn> writes:
> > >>> I recently found that target_clones functions cannot inline even when
> > >>> the caller has exactly the same target. However, if we only use target
> > >>> attributes in C++ and let the compiler generate IFUNC for us, the
> > >>> functions with the same target will be inlined.
> > >>>
> > >>> For example, the following code compiled on x86-64 target with -O3 will
> > >>> generate IFUNC for foo and bar and inline foo into the bar:
> > >>>
> > >>> ```cpp
> > >>> __attribute__((target("default")))
> > >>> int foo(int *arr) {
> > >>>    int sum = 0;
> > >>>    for (int i=0;i<16;i++) sum += arr[i];
> > >>>    return sum;
> > >>> }
> > >>>
> > >>> __attribute__((target("avx2")))
> > >>> int foo(int *arr) {
> > >>>    int sum = 0;
> > >>>    for (int i=0;i<16;i++) sum += arr[i];
> > >>>    return sum;
> > >>> }
> > >>>
> > >>> __attribute__((target("default")))
> > >>> int bar(int *arr) {
> > >>>    return foo(arr);
> > >>> }
> > >>>
> > >>> __attribute__((target("avx2")))
> > >>> int bar(int *arr) {
> > >>>    return foo(arr);
> > >>> }
> > >>> ```
> > >>>
> > >>> However, if we use target_clones attribute, the target_clones functions
> > >>> will not be inlined:
> > >>>
> > >>> ```cpp
> > >>> __attribute__((target_clones("default","avx2")))
> > >>> int foo(int *arr) {
> > >>>    int sum = 0;
> > >>>    for (int i=0;i<16;i++) sum += arr[i];
> > >>>    return sum;
> > >>> }
> > >>>
> > >>> __attribute__((target_clones("default","avx2")))
> > >>> int bar(int *arr) {
> > >>>    return foo(arr);
> > >>> }
> > >>> ```
> > >>>
> > >>> This behavior may negatively impact performance since the target_clones
> > >>> functions are not inlined. And since we didn't jump to the target_clones
> > >>> functions based on PLT but used the same target as the caller's target.
> > >>> I think it's better to allow the target_clones functions to be inlined.
> > >>>
> > >>> gcc/ada/ChangeLog:
> > >>>
> > >>>        * gcc-interface/utils.cc (handle_target_clones_attribute):
> > >>>        Allow functions with target_clones attribute to be inlined.
> > >>>
> > >>> gcc/c-family/ChangeLog:
> > >>>
> > >>>        * c-attribs.cc (handle_target_clones_attribute):
> > >>>        Allow functions with target_clones attribute to be inlined.
> > >>>
> > >>> gcc/d/ChangeLog:
> > >>>
> > >>>        * d-attribs.cc (d_handle_target_clones_attribute):
> > >>>        Allow functions with target_clones attribute to be inlined.
> > >>
> > >> What I'm about to say applies to both sequences above, but:
> > >>
> > >> Before inlining avx2 foo into avx2 bar, don't we need to be careful about
> > >> making sure that foo would still pick the avx2 version if called normally?
> > >> E.g. if foo had an avx512 version, direct calls to foo would presumably
> > >> pick that on avx512 targets, but still pick the avx2 version of bar.
> > >> It would then seem strange for the avx2 version of bar to inline the
> > >> avx2 version of foo, both for performance and ODR reasons.
> > >>
> > >> Thanks,
> > >> Richard
> > >
> > > This is actually an existing bug in the indirection elimination code for the
> > > 'target' attribute.  AArch64's 'target_version' attribute is not affected by
> > > this bug because the code in redirect_to_specific_clone only checks the target
> > > attribute (however, AArch64 is affected by a more convoluted bug in cases where
> > > we use both target and target_version attributes at the same time).
> > >
> > > I think it's correct to mark functions with the target_clones attribute
> > > as non-inlinable by the usual inlining mechanisms, since we don't know which
> > > version will be called at runtime without doing further work to check this.
> >
> > I think the best case might be allowing inline when the top priority
> > callee function has a subset of the caller's ISA. When the callee
> > has some target ISA extension that does not exist in the caller's,
> > we call the function through PLT. However, this solution also needs
> > to allow the target_clones callee to be able to inline.
>
> If we redirect to a specific version in the target_clone pass, then presumably
> that means that it's safe to inline that version?

I would say so.  I think marking the resolver decl as not inlineable
is correct but
the individual target clones should not be marked that way (are they?)

>  I'm not sure whether the
> existing code in redirect_to_specific_clone checks whether this is safe before
> removing the indirection, which might be a bug in itself.

I think the code should make the call inlineable already, unless the functions
are marked as not inlineable - you'd need to debug what happens.

RIchard.

>
> If we know it's safe to inline, then we could mark the function as inlineable
> in redirect_to_specific_clone, which would allow the later ipa_inline pass
> toinline the function.
>
> > > If
> > > we can remove the indirection, then the individual function versions might then
> > > become inlinable, but we don't have these individual versions available until
> > > after the target clones pass.
> > >
> > >>>
> > >>> Signed-off-by: Yangyu Chen <chenyangyu@isrc.iscas.ac.cn>
> > >>> ---
> > >>> gcc/ada/gcc-interface/utils.cc | 5 +----
> > >>> gcc/c-family/c-attribs.cc      | 3 ---
> > >>> gcc/d/d-attribs.cc             | 5 -----
> > >>> 3 files changed, 1 insertion(+), 12 deletions(-)
> > >>>
> > >>> diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
> > >>> index 60f36b1e50d..d010b684177 100644
> > >>> --- a/gcc/ada/gcc-interface/utils.cc
> > >>> +++ b/gcc/ada/gcc-interface/utils.cc
> > >>> @@ -7299,10 +7299,7 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
> > >>>   int ARG_UNUSED (flags), bool *no_add_attrs)
> > >>> {
> > >>>   /* Ensure we have a function type.  */
> > >>> -  if (TREE_CODE (*node) == FUNCTION_DECL)
> > >>> -    /* Do not inline functions with multiple clone targets.  */
> > >>> -    DECL_UNINLINABLE (*node) = 1;
> > >>> -  else
> > >>> +  if (TREE_CODE (*node) != FUNCTION_DECL)
> > >>>     {
> > >>>       warning (OPT_Wattributes, "%qE attribute ignored", name);
> > >>>       *no_add_attrs = true;
> > >>> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> > >>> index 4dd2eecbea5..f8759bb1908 100644
> > >>> --- a/gcc/c-family/c-attribs.cc
> > >>> +++ b/gcc/c-family/c-attribs.cc
> > >>> @@ -6105,9 +6105,6 @@ handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
> > >>>    "single %<target_clones%> attribute is ignored");
> > >>>   *no_add_attrs = true;
> > >>> }
> > >>> -      else
> > >>> -      /* Do not inline functions with multiple clone targets.  */
> > >>> - DECL_UNINLINABLE (*node) = 1;
> > >>>     }
> > >>>   else
> > >>>     {
> > >>> diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
> > >>> index 0f7ca10e017..9f67415adb1 100644
> > >>> --- a/gcc/d/d-attribs.cc
> > >>> +++ b/gcc/d/d-attribs.cc
> > >>> @@ -788,11 +788,6 @@ d_handle_target_clones_attribute (tree *node, tree name, tree, int,
> > >>>       warning (OPT_Wattributes, "%qE attribute ignored", name);
> > >>>       *no_add_attrs = true;
> > >>>     }
> > >>> -  else
> > >>> -    {
> > >>> -      /* Do not inline functions with multiple clone targets.  */
> > >>> -      DECL_UNINLINABLE (*node) = 1;
> > >>> -    }
> > >>>
> > >>>   return NULL_TREE;
> > >>> }
> >
> >
diff mbox series

Patch

diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index 60f36b1e50d..d010b684177 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -7299,10 +7299,7 @@  handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
 			  int ARG_UNUSED (flags), bool *no_add_attrs)
 {
   /* Ensure we have a function type.  */
-  if (TREE_CODE (*node) == FUNCTION_DECL)
-    /* Do not inline functions with multiple clone targets.  */
-    DECL_UNINLINABLE (*node) = 1;
-  else
+  if (TREE_CODE (*node) != FUNCTION_DECL)
     {
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 4dd2eecbea5..f8759bb1908 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -6105,9 +6105,6 @@  handle_target_clones_attribute (tree *node, tree name, tree ARG_UNUSED (args),
 		   "single %<target_clones%> attribute is ignored");
 	  *no_add_attrs = true;
 	}
-      else
-      /* Do not inline functions with multiple clone targets.  */
-	DECL_UNINLINABLE (*node) = 1;
     }
   else
     {
diff --git a/gcc/d/d-attribs.cc b/gcc/d/d-attribs.cc
index 0f7ca10e017..9f67415adb1 100644
--- a/gcc/d/d-attribs.cc
+++ b/gcc/d/d-attribs.cc
@@ -788,11 +788,6 @@  d_handle_target_clones_attribute (tree *node, tree name, tree, int,
       warning (OPT_Wattributes, "%qE attribute ignored", name);
       *no_add_attrs = true;
     }
-  else
-    {
-      /* Do not inline functions with multiple clone targets.  */
-      DECL_UNINLINABLE (*node) = 1;
-    }
 
   return NULL_TREE;
 }