diff mbox series

i386: Relax inline requirement for functions with different target attrs

Message ID 20230626023408.33758-1-hongyu.wang@intel.com
State New
Series i386: Relax inline requirement for functions with different target attrs

Commit Message

Hongyu Wang June 26, 2023, 2:34 a.m. UTC
Hi,

For functions with different target attributes, the current logic
refuses to inline the callee whenever arch or tune is mismatched. Relax
the condition to honor just prefer_vector_width_type and the other flags
that may cause safety issues, so the caller gets more optimization
opportunities.

Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}

Ok for trunk?

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_can_inline_p): Do not check arch or
	tune directly, just check prefer_vector_width_type and make sure
	not to inline if they mismatch.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/inline-target-attr.c: New test.
---
 gcc/config/i386/i386.cc                       | 11 +++++----
 .../gcc.target/i386/inline-target-attr.c      | 24 +++++++++++++++++++
 2 files changed, 30 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/inline-target-attr.c

Comments

Uros Bizjak June 27, 2023, 9:16 a.m. UTC | #1
On Mon, Jun 26, 2023 at 4:36 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> Hi,
>
> For function with different target attributes, current logic rejects to
> inline the callee when any arch or tune is mismatched. Relax the
> condition to honor just prefer_vecotr_width_type and other flags that
> may cause safety issue so caller can get more optimization opportunity.

I don't think this is desirable. If we inline something with different
ISAs, we get some strange mix of ISAs when the function is inlined.
OTOH - we already inline with mismatched tune flags if the function is
marked with always_inline.

Uros.

Hongyu Wang June 28, 2023, 1:49 a.m. UTC | #2
> I don't think this is desirable. If we inline something with different
> ISAs, we get some strange mix of ISAs when the function is inlined.
> OTOH - we already inline with mismatched tune flags if the function is
> marked with always_inline.

ix86_can_inline_p already has

if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
     != callee_opts->x_ix86_isa_flags)
    || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
        != callee_opts->x_ix86_isa_flags2))
  ret = false;

It makes sure the caller's ISA is a superset of the callee's, and the
inlined code should follow the caller's ISA specification.

I cannot come up with a real example where the caller's performance is
harmed after inlining. I added the PVW check since a callee may want to
limit its vector size while the caller has a larger preferred vector
size. At least with the current change we get more optimization
opportunities for different target_clones.

But I agree the tuning setting may be a factor that affects
performance. One possible choice: if the tune for the callee is
unspecified or default, just inline it into the caller with the
specified arch and tune.
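
To make the superset test above concrete, here is a minimal standalone
sketch. The ISA_* bit values and the helper name isa_subset_p are made
up for illustration; they are not GCC's real OPTION_MASK_ISA_* masks or
any function in i386.cc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ISA bits, for illustration only.  */
#define ISA_SSE2 (UINT64_C (1) << 0)
#define ISA_SSE3 (UINT64_C (1) << 1)
#define ISA_AVX2 (UINT64_C (1) << 2)

/* Return true if every ISA bit enabled for the callee is also enabled
   for the caller, i.e. the callee's set is a subset of the caller's.
   This mirrors the shape of the (caller & callee) != callee test.  */
static bool
isa_subset_p (uint64_t caller_isa, uint64_t callee_isa)
{
  return (caller_isa & callee_isa) == callee_isa;
}
```

With these assumed bits, a caller with SSE2|SSE3 passes the test for an
SSE2-only callee, while an SSE2-only caller fails it for an SSE2|SSE3
callee.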

Uros Bizjak June 28, 2023, 6:42 a.m. UTC | #3
On Wed, Jun 28, 2023 at 3:56 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
>
> > I don't think this is desirable. If we inline something with different
> > ISAs, we get some strange mix of ISAs when the function is inlined.
> > OTOH - we already inline with mismatched tune flags if the function is
> > marked with always_inline.
>
> Previously ix86_can_inline_p has
>
> if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
>      != callee_opts->x_ix86_isa_flags)
>     || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
>         != callee_opts->x_ix86_isa_flags2))
>   ret = false;
>
> It make sure caller ISA is a super set of callee, and the inlined one
> should follow caller's ISA specification.
>
> IMHO I cannot give a real example that after inline the caller's
> performance get harmed, I added PVW since there might
> be some callee want to limit its vector size and caller may have
> larger preferred vector size. At least with current change
> we get more optimization opportunity for different target_clones.
>
> But I agree the tuning setting may be a factor that affect the
> performance. One possible choice is that if the
> tune for callee is unspecified or default, just inline it to the
> caller with specified arch and tune.

If the user specified a different arch for the callee than for the
caller, then the compiler will switch on different ISAs (-march is just
a shortcut for different ISA packs), and the programmer is aware that
inlining isn't intended here (we have -mtune, which is not as strong
as -march, but even functions with different -mtune are not inlined
without the always_inline attribute). This is documented as:

--q--
On the x86, the inliner does not inline a function that has different
target options than the caller, unless the callee has a subset of the
target options of the caller. For example a function declared with
target("sse3") can inline a function with target("sse2"), since -msse3
implies -msse2.
--/q--

I don't think arch=skylake can be considered as a subset of arch=icelake-server.

I agree that the compiler should reject functions with different PVW.
This is also in accordance with the documentation.
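
As a rough standalone model of the PVW condition in the posted patch
(refuse inlining only when both sides specify a width and the widths
differ), something like the following could be used. The enum and the
function name are stand-ins invented for this sketch, not the GCC
definitions.

```c
#include <stdbool.h>

/* Local stand-in for the preferred-vector-width setting.  The NONE
   value means the function did not specify a preference.  */
enum pvw_model
{
  PVW_MODEL_NONE,
  PVW_MODEL_128,
  PVW_MODEL_256,
  PVW_MODEL_512
};

/* Return true when inlining should be refused: both caller and callee
   specify a preferred vector width and the two widths differ.  */
static bool
pvw_mismatch_p (enum pvw_model caller, enum pvw_model callee)
{
  return caller != PVW_MODEL_NONE
         && callee != PVW_MODEL_NONE
         && caller != callee;
}
```

Note that under this model a callee with no stated preference never
blocks inlining, which is what the PVW_NONE tests in the patch express.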

Uros.

Hongyu Wang June 28, 2023, 8:13 a.m. UTC | #4
> If the user specified a different arch for callee than the caller,
> then the compiler will switch on different ISAs (-march is just a
> shortcut for different ISA packs), and the programmer is aware that
> inlining isn't intended here (we have -mtune, which is not as strong
> as -march, but even functions with different -mtune are not inlined
> without always_inline attribute). This is documented as:

The original issue comes from a case like

float callee (float a, float b, float c, float d,
            float e, float f, float g, float h)
{
    return a * b + c * d + e * f + g + h + a * c + b * c
            + a * d + b * e + a * f + c * h +
            b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
}

__attribute__((target_clones("default","arch=icelake-server")))
void caller (int n, float *a,
            float c1, float c2, float c3,
            float c4, float c5, float c6,
            float c7)
{
  for (int i = 0; i < n; i++)
  {
    a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
  }
}

With current GCC, the .icelake_server clone fails to inline callee due
to a target-specific option mismatch, while the .default clone inlines
it and the loop gets vectorized. I think it is not reasonable that the
clone for the higher arch cannot produce better code. So I think we can
at least decide to inline callees without any arch/tune specified, but
for now they are rejected by the strict arch= and tune= check.

Uros Bizjak June 28, 2023, 8:39 a.m. UTC | #5
On Wed, Jun 28, 2023 at 10:20 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
>
> > If the user specified a different arch for callee than the caller,
> > then the compiler will switch on different ISAs (-march is just a
> > shortcut for different ISA packs), and the programmer is aware that
> > inlining isn't intended here (we have -mtune, which is not as strong
> > as -march, but even functions with different -mtune are not inlined
> > without always_inline attribute). This is documented as:
>
> The original issue comes from a case like
>
> float callee (float a, float b, float c, float d,
>             float e, float f, float g, float h)
> {
>     return a * b + c * d + e * f + g + h + a * c + b * c
>             + a * d + b * e + a * f + c * h +
>             b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
> }
>
> __attribute__((target_clones("default","arch=icelake-server")))
> void caller (int n, float *a,
>             float c1, float c2, float c3,
>             float c4, float c5, float c6,
>             float c7)
> {
>   for (int i = 0; i < n; i++)
>   {
>     a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
>   }
> }
>
> For current gcc, the .icelake_server clone fails to inline callee due
> to target specific option mismatch, while the .default clone
> succeeded and the loop get vectorized. I think it is not reasonable
> that the specific clone with higher arch cannot produce better code.
> So I think at least we can decide to inline those callee without any
> arch/tune specified, but for now they are rejected by the strict arch=
> and tune= check.

Yes, I think it is reasonable to inline a callee without an arch/tune
specified. We expect a "default" callee to have properties that allow
inlining it into all callers, independent of the caller's arch/tune
target attribute.
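
One way to picture that rule is the sketch below. It models only the
arch/tune part of the decision and assumes the ISA subset and other
checks still run separately. The struct, the NULL-means-unspecified
convention, and both function names are hypothetical, not what i386.cc
does or will do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a function's target settings.  A NULL field
   means no arch= or tune= was specified for that function.  */
struct target_model
{
  const char *arch;
  const char *tune;
};

/* An unspecified callee setting is compatible with any caller;
   otherwise the caller must specify the same value.  */
static bool
same_or_unset_p (const char *caller, const char *callee)
{
  return callee == NULL
         || (caller != NULL && strcmp (caller, callee) == 0);
}

/* Return true if the arch/tune settings alone do not forbid inlining
   the callee into the caller.  */
static bool
arch_tune_allows_inline_p (const struct target_model *caller,
                           const struct target_model *callee)
{
  return same_or_unset_p (caller->arch, callee->arch)
         && same_or_unset_p (caller->tune, callee->tune);
}
```

In this model a default callee is accepted by an arch=icelake-server
caller, while an arch=skylake callee is still refused.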

Uros.

Patch

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 0761965344b..1d86384ac06 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -605,11 +605,12 @@  ix86_can_inline_p (tree caller, tree callee)
 	       != (callee_opts->x_target_flags & ~always_inline_safe_mask))
     ret = false;
 
-  /* See if arch, tune, etc. are the same.  */
-  else if (caller_opts->arch != callee_opts->arch)
-    ret = false;
-
-  else if (!always_inline && caller_opts->tune != callee_opts->tune)
+  /* Do not inline when the specified prefer-vector-width is mismatched
+     between callee and caller.  */
+  else if ((callee_opts->x_prefer_vector_width_type != PVW_NONE
+	   && caller_opts->x_prefer_vector_width_type != PVW_NONE)
+	   && callee_opts->x_prefer_vector_width_type
+	      != caller_opts->x_prefer_vector_width_type)
     ret = false;
 
   else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
diff --git a/gcc/testsuite/gcc.target/i386/inline-target-attr.c b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
new file mode 100644
index 00000000000..995502165f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "call\[ \t\]callee" } } */
+
+__attribute__((target("arch=skylake")))
+int callee (int n)
+{
+  int sum = 0;
+  for (int i = 0; i < n; i++)
+    {
+      if (i % 2 == 0)
+	sum +=i;
+      else
+	sum += (i - 1);
+    }
+  return sum + n;
+}
+
+__attribute__((target("arch=icelake-server")))
+int caller (int n)
+{
+  return callee (n) + n;
+}
+