Message ID | 20230704031244.1074834-1-hongyu.wang@intel.com |
---|---|
State | New |
Series | [V2] i386: Inline function with default arch/tune to caller |
On Tue, Jul 4, 2023 at 5:12 AM Hongyu Wang <hongyu.wang@intel.com> wrote:
>
> Hi,
>
> For functions with different target attributes, the current logic refuses to
> inline the callee when any arch or tune is mismatched. Relax the
> condition to allow a callee with default arch/tune to be inlined.
>
> Bootstrapped/regtested on x86-64-linux-gnu{-m32,}.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         * config/i386/i386.cc (ix86_can_inline_p): If callee has
>         default arch=x86-64 and tune=generic, do not block the
>         inlining to its caller.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/inline_target_clones.c: New test.

OK.

In a follow-up patch, can you please document inlining rules involving
-march and -mtune in the "x86 Function Attributes" section? Currently, the
inlining rules at the end of the "target function attribute" section do
not even mention -march and -mtune. Maybe a subsubsection "Inlining
rules" should be added (like AArch64 has) to mention that only default
arch and tune are inlined by default (but inline can be forced with
always_inline for different mtune flags).

Looking at the above, perhaps inlining of different arches can also be
forced with always_inline? This would allow developers some control of
inlining, and would not be surprising.

Thanks,
Uros.

Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Tue, Jul 4, 2023 at 14:19:
> In a follow-up patch, can you please document inlining rules involving
> -march and -mtune in the "x86 Function Attributes" section? Currently, the
> inlining rules at the end of the "target function attribute" section do
> not even mention -march and -mtune. Maybe a subsubsection "Inlining
> rules" should be added (like AArch64 has) to mention that only default
> arch and tune are inlined by default (but inline can be forced with
> always_inline for different mtune flags).

The document has the text below at the end of the 'target (OPTIONS)' section:

     On the x86, the inliner does not inline a function that has
     different target options than the caller, unless the callee
     has a subset of the target options of the caller. For example
     a function declared with 'target("sse3")' can inline a
     function with 'target("sse2")', since '-msse3' implies
     '-msse2'.

Do we need to move this part to a new section and combine it with the
-march and -mtune rule description in the new subsubsection?

> Looking at the above, perhaps inlining of different arches can also be
> forced with always_inline? This would allow developers some control of
> inlining, and would not be surprising.

If so, I'd like to add the always_inline change on arch to the current
patch and leave the documentation change for the next patch.
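
The subset rule in the documentation excerpt quoted above ('target("sse3")' may inline 'target("sse2")') can be pictured as a bitmask test. The following is only an illustrative sketch under stated assumptions: the flag values and the helper names `isa_with_implied` and `isa_subset_allows_inline` are hypothetical and are not GCC's real ISA masks or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, chosen for illustration only.  */
enum
{
  ISA_SSE  = 1 << 0,
  ISA_SSE2 = 1 << 1,
  ISA_SSE3 = 1 << 2,
  ISA_ALL  = ISA_SSE | ISA_SSE2 | ISA_SSE3
};

/* Enabling an extension also enables the extensions it implies,
   mirroring the statement that -msse3 implies -msse2.  */
static uint32_t
isa_with_implied (uint32_t isa)
{
  assert ((isa & ~(uint32_t) ISA_ALL) == 0); /* only the bits modeled here */
  if (isa & ISA_SSE3)
    isa |= ISA_SSE2;
  if (isa & ISA_SSE2)
    isa |= ISA_SSE;
  return isa;
}

/* The callee may be inlined when every ISA bit it needs is also
   enabled in the caller, i.e. the callee's set is a subset.  */
static bool
isa_subset_allows_inline (uint32_t caller_isa, uint32_t callee_isa)
{
  caller_isa = isa_with_implied (caller_isa);
  callee_isa = isa_with_implied (callee_isa);
  return (caller_isa & callee_isa) == callee_isa;
}
```

In this model an sse3 caller accepts an sse2 callee, while an sse2 caller rejects an sse3 callee. The -march and -mtune comparisons discussed in this thread are a separate check and are not covered by the sketch.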

On Tue, Jul 4, 2023 at 10:32 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
>
> > In a follow-up patch, can you please document inlining rules involving
> > -march and -mtune in the "x86 Function Attributes" section? Currently, the
> > inlining rules at the end of the "target function attribute" section do
> > not even mention -march and -mtune. Maybe a subsubsection "Inlining
> > rules" should be added (like AArch64 has) to mention that only default
> > arch and tune are inlined by default (but inline can be forced with
> > always_inline for different mtune flags).
>
> The document has the text below at the end of the 'target (OPTIONS)' section:
>
>      On the x86, the inliner does not inline a function that has
>      different target options than the caller, unless the callee
>      has a subset of the target options of the caller. For example
>      a function declared with 'target("sse3")' can inline a
>      function with 'target("sse2")', since '-msse3' implies
>      '-msse2'.
>
> Do we need to move this part to a new section and combine it with the
> -march and -mtune rule description in the new subsubsection?
>
> > Looking at the above, perhaps inlining of different arches can also be
> > forced with always_inline? This would allow developers some control of
> > inlining, and would not be surprising.
>
> If so, I'd like to add the always_inline change on arch to the current
> patch and leave the documentation change for the next patch.

Yes, this is OK.

Thanks,
Uros.

Thanks, this is the updated patch I'm going to check in.

Uros Bizjak <ubizjak@gmail.com> wrote on Tue, Jul 4, 2023 at 16:57:
>
> On Tue, Jul 4, 2023 at 10:32 AM Hongyu Wang <wwwhhhyyy333@gmail.com> wrote:
> >
> > > Looking at the above, perhaps inlining of different arches can also be
> > > forced with always_inline? This would allow developers some control of
> > > inlining, and would not be surprising.
> >
> > If so, I'd like to add the always_inline change on arch to the current
> > patch and leave the documentation change for the next patch.
>
> Yes, this is OK.
>
> Thanks,
> Uros.

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 8989985700a..4741c9b5364 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -605,13 +605,6 @@ ix86_can_inline_p (tree caller, tree callee)
               != (callee_opts->x_target_flags & ~always_inline_safe_mask))
     ret = false;
 
-  /* See if arch, tune, etc. are the same. */
-  else if (caller_opts->arch != callee_opts->arch)
-    ret = false;
-
-  else if (!always_inline && caller_opts->tune != callee_opts->tune)
-    ret = false;
-
   else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
            /* If the calle doesn't use FP expressions differences in
               ix86_fpmath can be ignored. We are called from FEs
@@ -622,6 +615,21 @@ ix86_can_inline_p (tree caller, tree callee)
                || ipa_fn_summaries->get (callee_node)->fp_expressions))
     ret = false;
 
+  /* At this point we cannot identify whether arch or tune setting
+     comes from target attribute or not. So the most conservative way
+     is to allow the callee that uses default arch and tune string to
+     be inlined. */
+  else if (!strcmp (callee_opts->x_ix86_arch_string, "x86-64")
+           && !strcmp (callee_opts->x_ix86_tune_string, "generic"))
+    ret = true;
+
+  /* See if arch, tune, etc. are the same. */
+  else if (caller_opts->arch != callee_opts->arch)
+    ret = false;
+
+  else if (!always_inline && caller_opts->tune != callee_opts->tune)
+    ret = false;
+
   else if (!always_inline
            && caller_opts->branch_cost != callee_opts->branch_cost)
     ret = false;
diff --git a/gcc/testsuite/gcc.target/i386/inline_target_clones.c b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
new file mode 100644
index 00000000000..53db1600ce5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O3 -march=x86-64" } */
+/* { dg-final { scan-assembler-not "call\[ \t\]+callee" } } */
+
+float callee (float a, float b, float c, float d,
+              float e, float f, float g, float h)
+{
+  return a * b + c * d + e * f + g + h + a * c + b * c
+    + a * d + b * e + a * f + c * h +
+    b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
+}
+
+__attribute__((target_clones("default","arch=icelake-server")))
+void caller (int n, float *a,
+             float c1, float c2, float c3,
+             float c4, float c5, float c6,
+             float c7)
+{
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
+    }
+}
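
As a reading aid, the order of the arch/tune checks that this diff introduces can be modeled in a few lines of C. This is a simplified sketch and not GCC code: `struct fn_opts` and `arch_tune_allow_inline` are hypothetical names, arch and tune are compared as strings here rather than through the `arch` and `tune` fields that `ix86_can_inline_p` compares, and every other check in that function (target flags, fpmath, branch cost and so on) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for a function's target options.  */
struct fn_opts
{
  const char *arch_string; /* e.g. "x86-64" or "icelake-server" */
  const char *tune_string; /* e.g. "generic" */
};

/* Models only the arch/tune part of the decision, in the order used
   by the diff: default callee first, then arch, then tune.  */
static bool
arch_tune_allow_inline (const struct fn_opts *caller,
                        const struct fn_opts *callee,
                        bool always_inline)
{
  assert (caller != NULL && callee != NULL);

  /* A callee with the default arch and tune strings is accepted
     regardless of what the caller uses.  */
  if (!strcmp (callee->arch_string, "x86-64")
      && !strcmp (callee->tune_string, "generic"))
    return true;

  /* Otherwise the arch must match; always_inline does not help here.  */
  if (strcmp (caller->arch_string, callee->arch_string) != 0)
    return false;

  /* A tune mismatch blocks inlining unless always_inline is set.  */
  if (!always_inline
      && strcmp (caller->tune_string, callee->tune_string) != 0)
    return false;

  return true;
}
```

In this model a default callee is accepted by a caller built for arch=icelake-server, which is the situation the new test exercises, while the reverse direction is still rejected. As in the diff shown here, always_inline only relaxes the tune comparison; the follow-up agreed in the thread (letting always_inline also cover arch) is not modeled.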