Patchwork [i386] Cannot inline sse*.* functions into avx functions

login
register
mail settings
Submitter Sriraman Tallam
Date June 28, 2013, 10:55 p.m.
Message ID <CAAs8HmywEfzR3CxOUQSQ3z1LC2xvucR--acW0e3Bavy0jO9J1w@mail.gmail.com>
Download mbox | patch
Permalink /patch/255695/
State New
Headers show

Comments

Sriraman Tallam - June 28, 2013, 10:55 p.m.
Hi,

Inlining sse* functions into avx is broken?  Here is an example:

__attribute__((always_inline,target("sse3")))
inline int callee ()
{
  return 0;
}

__attribute__((target("avx")))
inline int caller ()
{
  return callee ();
}

main ()
{
  return caller ();
}

$ g++ -O2 foo.cc
error: inlining failed in call to always_inline 'int callee()': target
specific option mismatch

This patch fixes the problem:

Setting avx as ISA has side-effects on some target flags and directly
comparing them seems incorrect.  This is also breaking the usage of
mmintrinsic headers. An intrinsic in avxintrin.h is calling an
intrinsic in another header.  The call is not inlinable for the same
reason.

Is this ok?

Thanks
Sri
Uros Bizjak - June 30, 2013, 9:31 a.m.
On Sat, Jun 29, 2013 at 12:55 AM, Sriraman Tallam <tmsriram@google.com> wrote:

> Inlining sse* functions into avx is broken?  Here is an example:
>
> __attribute__((always_inline,target("sse3")))
> inline int callee ()
> {
>   return 0;
> }
>
> __attribute__((target("avx")))
> inline int caller ()
> {
>   return callee ();
> }
>
> main ()
> {
>   return caller ();
> }
>
> $ g++ -O2 foo.cc
> error: inlining failed in call to always_inline 'int callee()': target
> specific option mismatch
>
> This patch fixes the problem:
>
> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c (revision 200497)
> +++ gcc/config/i386/i386.c (working copy)
> @@ -4520,8 +4520,9 @@ ix86_can_inline_p (tree caller, tree callee)
>    != callee_opts->x_ix86_isa_flags)
>   ret = false;
>
> -      /* See if we have the same non-isa options.  */
> -      else if (caller_opts->x_target_flags != callee_opts->x_target_flags)
> +      /* Callee's non-isa options should be a subset of the caller's.  */
> +      else if ((caller_opts->x_target_flags & callee_opts->x_target_flags)
> +                 != callee_opts->x_target_flags)
>           ret = false;
>
>
> Setting avx as ISA has side-effects on some target flags and directly
> comparing them seems incorrect.  This is also breaking the usage of
> mmintrinsic headers. An intrinsic in avxintrin.h is calling an
> intrinsic in another header.  The call is not inlinable for the same
> reason.
>
> Is this ok?

This looks correct to me.

Do we need to backport this change?

Uros.
Jan Hubicka - June 30, 2013, 9:43 a.m.
> On Sat, Jun 29, 2013 at 12:55 AM, Sriraman Tallam <tmsriram@google.com> wrote:
> 
> > Inlining sse* functions into avx is broken?  Here is an example:
> >
> > __attribute__((always_inline,target("sse3")))
> > inline int callee ()
> > {
> >   return 0;
> > }
> >
> > __attribute__((target("avx")))
> > inline int caller ()
> > {
> >   return callee ();
> > }
> >
> > main ()
> > {
> >   return caller ();
> > }
> >
> > $ g++ -O2 foo.cc
> > error: inlining failed in call to always_inline 'int callee()': target
> > specific option mismatch
> >
> > This patch fixes the problem:
> >
> > Index: gcc/config/i386/i386.c
> > ===================================================================
> > --- gcc/config/i386/i386.c (revision 200497)
> > +++ gcc/config/i386/i386.c (working copy)
> > @@ -4520,8 +4520,9 @@ ix86_can_inline_p (tree caller, tree callee)
> >    != callee_opts->x_ix86_isa_flags)
> >   ret = false;
> >
> > -      /* See if we have the same non-isa options.  */
> > -      else if (caller_opts->x_target_flags != callee_opts->x_target_flags)
> > +      /* Callee's non-isa options should be a subset of the caller's.  */
> > +      else if ((caller_opts->x_target_flags & callee_opts->x_target_flags)
> > +                 != callee_opts->x_target_flags)
> >           ret = false;
> >
> >
> > Setting avx as ISA has side-effects on some target flags and directly
> > comparing them seems incorrect.  This is also breaking the usage of

What target flags are enabled by AVX? Assumming that all target flags are
positive seems incorrect to me (like -mno-red-zone function can not be inlined
into -mred-zone).  Does those conditionally enabled AVX codegen flags have any
effect when AVX is disabled?  Perhaps we can set them unconditionally?

Honza
> > mmintrinsic headers. An intrinsic in avxintrin.h is calling an
> > intrinsic in another header.  The call is not inlinable for the same
> > reason.
> >
> > Is this ok?
> 
> This looks correct to me.
> 
> Do we need to backport this change?
> 
> Uros.
Jan Hubicka - June 30, 2013, 9:47 a.m.
> 
> What target flags are enabled by AVX? Assumming that all target flags are
> positive seems incorrect to me (like -mno-red-zone function can not be inlined
> into -mred-zone).  Does those conditionally enabled AVX codegen flags have any

Actually -mred-zone seems right, but stuff like -msseregparm will probably break?

Honza

> effect when AVX is disabled?  Perhaps we can set them unconditionally?
> 
> Honza
> > > mmintrinsic headers. An intrinsic in avxintrin.h is calling an
> > > intrinsic in another header.  The call is not inlinable for the same
> > > reason.
> > >
> > > Is this ok?
> > 
> > This looks correct to me.
> > 
> > Do we need to backport this change?
> > 
> > Uros.
Uros Bizjak - June 30, 2013, 9:56 a.m.
On Sun, Jun 30, 2013 at 11:47 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>
>> What target flags are enabled by AVX? Assumming that all target flags are
>> positive seems incorrect to me (like -mno-red-zone function can not be inlined
>> into -mred-zone).  Does those conditionally enabled AVX codegen flags have any
>
> Actually -mred-zone seems right, but stuff like -msseregparm will probably break?
>
> Honza
>
>> effect when AVX is disabled?  Perhaps we can set them unconditionally?

The issue is with (config/i386.c, ix86_option_override_internal):

  if (TARGET_AVX)
    {
      /* When not optimize for size, enable vzeroupper optimization for
     TARGET_AVX with -fexpensive-optimizations and split 32-byte
     AVX unaligned load/store.  */
      if (!optimize_size)
    {
      if (flag_expensive_optimizations
          && !(target_flags_explicit & MASK_VZEROUPPER))
        target_flags |= MASK_VZEROUPPER;
      if ((x86_avx256_split_unaligned_load & ix86_tune_mask)
          && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
        target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
      if ((x86_avx256_split_unaligned_store & ix86_tune_mask)
          && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
        target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
      /* Enable 128-bit AVX instruction generation
         for the auto-vectorizer.  */
      if (TARGET_AVX128_OPTIMAL
          && !(target_flags_explicit & MASK_PREFER_AVX128))
        target_flags |= MASK_PREFER_AVX128;
    }

These are all tuning flags that are applicable to AVX only. They
depend on AVX, so can be probably enabled unconditionally.

Uros.

Patch

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c (revision 200497)
+++ gcc/config/i386/i386.c (working copy)
@@ -4520,8 +4520,9 @@  ix86_can_inline_p (tree caller, tree callee)
   != callee_opts->x_ix86_isa_flags)
  ret = false;

-      /* See if we have the same non-isa options.  */
-      else if (caller_opts->x_target_flags != callee_opts->x_target_flags)
+      /* Callee's non-isa options should be a subset of the caller's.  */
+      else if ((caller_opts->x_target_flags & callee_opts->x_target_flags)
+                 != callee_opts->x_target_flags)
          ret = false;