
[rs6000] folding of vector stores in GIMPLE

Message ID 1506027369.26707.65.camel@brimstone.rchland.ibm.com
State New
Series: [rs6000] folding of vector stores in GIMPLE

Commit Message

Will Schmidt Sept. 21, 2017, 8:56 p.m. UTC
Hi, 
   
Folding of vector stores in GIMPLE.
    
- Add code to handle gimple folding for the vec_st (vector store) builtins.
- Remove the now obsoleted folding code for vec_st from rs6000-c.c.
    
There are two spots that I could use some feedback on. 

First - 
 An early exit remains in place to prevent folding of statements that do not
 have a LHS.  To allow folding of the stores to get past that check, I have
 added a helper function (rs6000_builtin_valid_without_lhs) that allows
 those store intrinsics to proceed.  I'm not sure the approach (or the name
 I chose) is the best one, so I'll defer to recommendations on how to
 improve that. :-)

Second - 
 This code (as-is) is subject to a TBAA-related issue (similar to what was
 noticed in the gimple folding code for loads).  As-is, a testcase such as:

void testst_struct1b (vector double vd1, long long ll1, struct S *p)
    {
        vec_st (vd1, ll1, (vector double *)p);
    }

 will generate gimple that looks like:
    MEM[(struct S *)D.3218] = vd1;

If I rework the code, changing the setting of arg2_type from
  tree arg2_type = TREE_TYPE (arg2);
to
  tree arg2_type = ptr_type_node;

the generated gimple then looks like
  MEM[(void *)D.3218] = vd1;

Which is probably OK, but I cannot say for certain.  The generated .s content is at least equivalent.

The resulting code is verified by the testcases powerpc/fold-vec-st-*.c, which
have been posted separately.

Regtest looks clean on power6 and newer.

Pending feedback, OK for trunk?
    
Thanks,
-Will
    
    [gcc]
    
    2017-09-21  Will Schmidt  <will_schmidt@vnet.ibm.com>
    
        * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
        for early folding of vector stores (ALTIVEC_BUILTIN_STVX_*).
        (rs6000_builtin_valid_without_lhs): New helper function.
        * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
        Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_ST.

Comments

Richard Biener Sept. 22, 2017, 10:27 a.m. UTC | #1
On Thu, Sep 21, 2017 at 10:56 PM, Will Schmidt
<will_schmidt@vnet.ibm.com> wrote:
> Hi,
>
> Folding of vector stores in GIMPLE.
>
> - Add code to handle gimple folding for the vec_st (vector store) builtins.
> - Remove the now obsoleted folding code for vec_st from rs6000-c-c.
>
> There are two spots that I could use some feedback on.
>
> First -
>  An early exit remains in place prevent folding of statements that do not
>  have a LHS.  To allow folding of the stores to get past the check, I have
>  added a helper function (rs6000_builtin_valid_without_lhs) that allows
>  those store intrinsics to proceed.  I'm not sure the approach (or the name I chose)
>  is the best choice, so I'll defer to recommendations on how to improve that. :-)
>
> Second -
>  This code (as-is) is subject to a TBAA related issue (similar to what was noticed
>  in the gimple folding code for loads.   As-is, with a testcase such as :
>
> void testst_struct1b (vector double vd1, long long ll1, struct S *p)
>     {
>         vec_st (vd1, ll1, (vector double *)p);
>     }
>
>  will generate gimple that looks like:
>     MEM[(struct S *)D.3218] = vd1;
>
> If I rework the code, setting arg2_type to be ptr_type_node, i.e.
> + tree arg2_type = TREE_TYPE (arg2);
> to:
> + tree arg2_type = ptr_type_node;
>
> the generated gimple then looks like
>   MEM[(void *)D.3218] = vd1;
>
> Which is probably OK, but I cannot say for certain.  The generated .s content is at least equivalent.

It looks safe.

The question I had is whether vec_st (vd1, ll1, (vector double *)p) is
equivalent to *(vector double *)((char *)p + ll1) = vd1; from a TBAA
perspective.  Thus whether the type of the third argument to vec_st
defines the type of the access
(in C standards meaning).  If so then what we do now is pessimizing (but
as you say previously (long long *) and (long *) were aliased together and
you got wrong-code with aliasing with regular stores of the "wrong" same type).

A proper fix would be to transition this type as seen from the frontend to
GIMPLE, for example in a similar way we do for MEM_REFs by piggy-backing
that on an extra argument, a constant zero pointer of the alias
pointer type to use
(which would also serve as a type indicator of the store itself).  I'd use a
target specific internal function for this (not sure if we can have those target
specific, but I guess if it's folded away then that's fine) and get away with
the overload set.

Richard.

> The resulting code is verified by testcases powerpc/fold-vec-st-*.c, which
> has been posted separately.
>
> regtest looks clean on power6 and newer.
>
> pending feedback, OK for trunk?
>
> Thanks,
> -Will
>
>     [gcc]
>
>     2017-09-21  Will Schmidt  <will_schmidt@vnet.ibm.com>
>
>         * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
>         for early folding of vector stores (ALTIVEC_BUILTIN_ST_*).
>         (rs6000_builtin_valid_without_lhs): helper function.
>         * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>         Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_ST.
>
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index a49db97..4a363a1 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -6470,82 +6470,10 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>                      convert (TREE_TYPE (stmt), arg0));
>        stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>        return stmt;
>      }
>
> -  /* Expand vec_st into an expression that masks the address and
> -     performs the store.  We need to expand this early to allow
> -     the best aliasing, as by the time we get into RTL we no longer
> -     are able to honor __restrict__, for example.  We may want to
> -     consider this for all memory access built-ins.
> -
> -     When -maltivec=be is specified, or the wrong number of arguments
> -     is provided, simply punt to existing built-in processing.  */
> -
> -  if (fcode == ALTIVEC_BUILTIN_VEC_ST
> -      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
> -      && nargs == 3)
> -    {
> -      tree arg0 = (*arglist)[0];
> -      tree arg1 = (*arglist)[1];
> -      tree arg2 = (*arglist)[2];
> -
> -      /* Construct the masked address.  Let existing error handling take
> -        over if we don't have a constant offset.  */
> -      arg1 = fold (arg1);
> -
> -      if (TREE_CODE (arg1) == INTEGER_CST)
> -       {
> -         if (!ptrofftype_p (TREE_TYPE (arg1)))
> -           arg1 = build1 (NOP_EXPR, sizetype, arg1);
> -
> -         tree arg2_type = TREE_TYPE (arg2);
> -         if (TREE_CODE (arg2_type) == ARRAY_TYPE && c_dialect_cxx ())
> -           {
> -             /* Force array-to-pointer decay for C++.  */
> -             arg2 = default_conversion (arg2);
> -             arg2_type = TREE_TYPE (arg2);
> -           }
> -
> -         /* Find the built-in to make sure a compatible one exists; if not
> -            we fall back to default handling to get the error message.  */
> -         for (desc = altivec_overloaded_builtins;
> -              desc->code && desc->code != fcode; desc++)
> -           continue;
> -
> -         for (; desc->code == fcode; desc++)
> -           if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)
> -               && rs6000_builtin_type_compatible (TREE_TYPE (arg1), desc->op2)
> -               && rs6000_builtin_type_compatible (TREE_TYPE (arg2),
> -                                                  desc->op3))
> -             {
> -               tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg2_type,
> -                                            arg2, arg1);
> -               tree aligned
> -                 = fold_build2_loc (loc, BIT_AND_EXPR, arg2_type,
> -                                    addr, build_int_cst (arg2_type, -16));
> -
> -               tree arg0_type = TREE_TYPE (arg0);
> -               if (TYPE_MODE (arg0_type) == V2DImode)
> -                 /* Type-based aliasing analysis thinks vector long
> -                    and vector long long are different and will put them
> -                    in distinct alias classes.  Force our address type
> -                    to be a may-alias type to avoid this.  */
> -                 arg0_type
> -                   = build_pointer_type_for_mode (arg0_type, Pmode,
> -                                                  true/*can_alias_all*/);
> -               else
> -                 arg0_type = build_pointer_type (arg0_type);
> -               aligned = build1 (NOP_EXPR, arg0_type, aligned);
> -               tree stg = build_indirect_ref (loc, aligned, RO_NULL);
> -               tree retval = build2 (MODIFY_EXPR, TREE_TYPE (stg), stg,
> -                                     convert (TREE_TYPE (stg), arg0));
> -               return retval;
> -             }
> -       }
> -    }
> -
>    for (n = 0;
>         !VOID_TYPE_P (TREE_VALUE (fnargs)) && n < nargs;
>         fnargs = TREE_CHAIN (fnargs), n++)
>      {
>        tree decl_type = TREE_VALUE (fnargs);
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 1978634..ef41534 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -16155,10 +16155,29 @@ rs6000_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
>  #else
>    return NULL_TREE;
>  #endif
>  }
>
> +/*  Helper function to sort out which built-ins may be valid without having
> +    a LHS.  */
> +bool
> +rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
> +{
> +   switch (fn_code)
> +      {
> +       case ALTIVEC_BUILTIN_STVX_V16QI:
> +       case ALTIVEC_BUILTIN_STVX_V8HI:
> +       case ALTIVEC_BUILTIN_STVX_V4SI:
> +       case ALTIVEC_BUILTIN_STVX_V4SF:
> +       case ALTIVEC_BUILTIN_STVX_V2DI:
> +       case ALTIVEC_BUILTIN_STVX_V2DF:
> +         return true;
> +       default:
> +         return false;
> +      }
> +}
> +
>  /* Fold a machine-dependent built-in in GIMPLE.  (For folding into
>     a constant, use rs6000_fold_builtin.)  */
>
>  bool
>  rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
> @@ -16182,12 +16201,13 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>                fn_code, fn_name1, fn_name2);
>
>    if (!rs6000_fold_gimple)
>      return false;
>
> -  /* Generic solution to prevent gimple folding of code without a LHS.  */
> -  if (!gimple_call_lhs (stmt))
> +  /* Prevent gimple folding for code that does not have a LHS, unless it is
> +   allowed per the rs6000_builtin_valid_without_lhs helper function.  */
> +  if (!gimple_call_lhs (stmt) && !rs6000_builtin_valid_without_lhs (fn_code))
>      return false;
>
>    switch (fn_code)
>      {
>      /* Flavors of vec_add.  We deliberately don't expand
> @@ -16585,11 +16605,48 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>                                                 build_int_cst (arg1_type, 0)));
>          gimple_set_location (g, loc);
>          gsi_replace (gsi, g, true);
>          return true;
>        }
> -
> +    /* Vector stores.  */
> +    case ALTIVEC_BUILTIN_STVX_V16QI:
> +    case ALTIVEC_BUILTIN_STVX_V8HI:
> +    case ALTIVEC_BUILTIN_STVX_V4SI:
> +    case ALTIVEC_BUILTIN_STVX_V4SF:
> +    case ALTIVEC_BUILTIN_STVX_V2DI:
> +    case ALTIVEC_BUILTIN_STVX_V2DF:
> +      {
> +        /* Do not fold for -maltivec=be on LE targets.  */
> +        if (VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN)
> +           return false;
> +        arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
> +        arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
> +        tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
> +        location_t loc = gimple_location (stmt);
> +        tree arg0_type = TREE_TYPE (arg0);
> +        tree arg2_type = TREE_TYPE (arg2);
> +        /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
> +           the tree using the value from arg0.  The resulting type will match
> +           the type of arg2.  */
> +        gimple_seq stmts = NULL;
> +        tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
> +        tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
> +                                      arg2_type, arg2, temp_offset);
> +        /* Mask off any lower bits from the address.  */
> +        tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
> +                                         arg2_type, temp_addr,
> +                                         build_int_cst (arg2_type, -16));
> +        gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +       /* The desired gimple result should be similar to:
> +        MEM[(__vector floatD.1407 *)_1] = vf1D.2697;  */
> +        gimple *g;
> +        g = gimple_build_assign (build2 (MEM_REF, arg0_type, aligned_addr,
> +                                          build_int_cst (arg2_type, 0)), arg0);
> +        gimple_set_location (g, loc);
> +        gsi_replace (gsi, g, true);
> +        return true;
> +      }
>      default:
>         if (TARGET_DEBUG_BUILTIN)
>            fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
>                     fn_code, fn_name1, fn_name2);
>        break;
>
>
Bill Schmidt Sept. 22, 2017, 7:58 p.m. UTC | #2
On Sep 22, 2017, at 5:27 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> On Thu, Sep 21, 2017 at 10:56 PM, Will Schmidt
> <will_schmidt@vnet.ibm.com> wrote:
>> Hi,
>> 
>> Folding of vector stores in GIMPLE.
>> 
>> - Add code to handle gimple folding for the vec_st (vector store) builtins.
>> - Remove the now obsoleted folding code for vec_st from rs6000-c-c.
>> 
>> There are two spots that I could use some feedback on.
>> 
>> First -
>> An early exit remains in place prevent folding of statements that do not
>> have a LHS.  To allow folding of the stores to get past the check, I have
>> added a helper function (rs6000_builtin_valid_without_lhs) that allows
>> those store intrinsics to proceed.  I'm not sure the approach (or the name I chose)
>> is the best choice, so I'll defer to recommendations on how to improve that. :-)

It's fine, but please make the helper function static.

>> 
>> Second -
>> This code (as-is) is subject to a TBAA related issue (similar to what was noticed
>> in the gimple folding code for loads.   As-is, with a testcase such as :
>> 
>> void testst_struct1b (vector double vd1, long long ll1, struct S *p)
>>    {
>>        vec_st (vd1, ll1, (vector double *)p);
>>    }
>> 
>> will generate gimple that looks like:
>>    MEM[(struct S *)D.3218] = vd1;
>> 
>> If I rework the code, setting arg2_type to be ptr_type_node, i.e.
>> + tree arg2_type = TREE_TYPE (arg2);
>> to:
>> + tree arg2_type = ptr_type_node;
>> 
>> the generated gimple then looks like
>>  MEM[(void *)D.3218] = vd1;
>> 
>> Which is probably OK, but I cannot say for certain.  The generated .s content is at least equivalent.
> 
> It looks safe.
> 
> The question I had is whether vec_st (vd1, ll1, (vector double *)p) is
> equivalent
> to *(vector double *)((char *)p + ll1) = vd1; from a TBAA perspective.  Thus
> whether the type of the tird argument to vec_st defines the type of the access
> (in C standards meaning).  If so then what we do now is pessimizing (but
> as you say previously (long long *) and (long *) were aliased together and
> you got wrong-code with aliasing with regular stores of the "wrong" same type).
> 
> A proper fix would be to transition this type as seen from the frontend to
> GIMPLE, for example in a similar way we do for MEM_REFs by piggy-backing
> that on an extra argument, a constant zero pointer of the alias
> pointer type to use
> (which would also serve as a type indicator of the store itself).  I'd use a
> target specific internal function for this (not sure if we can have those target
> specific, but I guess if it's folded away then that's fine) and get away with
> the overload set.

OK.  Will, for now let's again use the (void *) solution, and add commentary
recording this improvement as future work.  The same would
go for the vec_vsx_ld/st variations once you get to those.

Otherwise the patch looks ok to me.  I'll defer to Segher for approval, of course.

Thanks,
Bill
> 
> Richard.
> 
>> The resulting code is verified by testcases powerpc/fold-vec-st-*.c, which
>> has been posted separately.
>> 
>> regtest looks clean on power6 and newer.
>> 
>> pending feedback, OK for trunk?
>> 
>> Thanks,
>> -Will
>> 
>>    [gcc]
>> 
>>    2017-09-21  Will Schmidt  <will_schmidt@vnet.ibm.com>
>> 
>>        * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
>>        for early folding of vector stores (ALTIVEC_BUILTIN_ST_*).
>>        (rs6000_builtin_valid_without_lhs): helper function.
>>        * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>>        Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_ST.
>> 
>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
>> index a49db97..4a363a1 100644
>> --- a/gcc/config/rs6000/rs6000-c.c
>> +++ b/gcc/config/rs6000/rs6000-c.c
>> @@ -6470,82 +6470,10 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>>                     convert (TREE_TYPE (stmt), arg0));
>>       stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>>       return stmt;
>>     }
>> 
>> -  /* Expand vec_st into an expression that masks the address and
>> -     performs the store.  We need to expand this early to allow
>> -     the best aliasing, as by the time we get into RTL we no longer
>> -     are able to honor __restrict__, for example.  We may want to
>> -     consider this for all memory access built-ins.
>> -
>> -     When -maltivec=be is specified, or the wrong number of arguments
>> -     is provided, simply punt to existing built-in processing.  */
>> -
>> -  if (fcode == ALTIVEC_BUILTIN_VEC_ST
>> -      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
>> -      && nargs == 3)
>> -    {
>> -      tree arg0 = (*arglist)[0];
>> -      tree arg1 = (*arglist)[1];
>> -      tree arg2 = (*arglist)[2];
>> -
>> -      /* Construct the masked address.  Let existing error handling take
>> -        over if we don't have a constant offset.  */
>> -      arg1 = fold (arg1);
>> -
>> -      if (TREE_CODE (arg1) == INTEGER_CST)
>> -       {
>> -         if (!ptrofftype_p (TREE_TYPE (arg1)))
>> -           arg1 = build1 (NOP_EXPR, sizetype, arg1);
>> -
>> -         tree arg2_type = TREE_TYPE (arg2);
>> -         if (TREE_CODE (arg2_type) == ARRAY_TYPE && c_dialect_cxx ())
>> -           {
>> -             /* Force array-to-pointer decay for C++.  */
>> -             arg2 = default_conversion (arg2);
>> -             arg2_type = TREE_TYPE (arg2);
>> -           }
>> -
>> -         /* Find the built-in to make sure a compatible one exists; if not
>> -            we fall back to default handling to get the error message.  */
>> -         for (desc = altivec_overloaded_builtins;
>> -              desc->code && desc->code != fcode; desc++)
>> -           continue;
>> -
>> -         for (; desc->code == fcode; desc++)
>> -           if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)
>> -               && rs6000_builtin_type_compatible (TREE_TYPE (arg1), desc->op2)
>> -               && rs6000_builtin_type_compatible (TREE_TYPE (arg2),
>> -                                                  desc->op3))
>> -             {
>> -               tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg2_type,
>> -                                            arg2, arg1);
>> -               tree aligned
>> -                 = fold_build2_loc (loc, BIT_AND_EXPR, arg2_type,
>> -                                    addr, build_int_cst (arg2_type, -16));
>> -
>> -               tree arg0_type = TREE_TYPE (arg0);
>> -               if (TYPE_MODE (arg0_type) == V2DImode)
>> -                 /* Type-based aliasing analysis thinks vector long
>> -                    and vector long long are different and will put them
>> -                    in distinct alias classes.  Force our address type
>> -                    to be a may-alias type to avoid this.  */
>> -                 arg0_type
>> -                   = build_pointer_type_for_mode (arg0_type, Pmode,
>> -                                                  true/*can_alias_all*/);
>> -               else
>> -                 arg0_type = build_pointer_type (arg0_type);
>> -               aligned = build1 (NOP_EXPR, arg0_type, aligned);
>> -               tree stg = build_indirect_ref (loc, aligned, RO_NULL);
>> -               tree retval = build2 (MODIFY_EXPR, TREE_TYPE (stg), stg,
>> -                                     convert (TREE_TYPE (stg), arg0));
>> -               return retval;
>> -             }
>> -       }
>> -    }
>> -
>>   for (n = 0;
>>        !VOID_TYPE_P (TREE_VALUE (fnargs)) && n < nargs;
>>        fnargs = TREE_CHAIN (fnargs), n++)
>>     {
>>       tree decl_type = TREE_VALUE (fnargs);
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 1978634..ef41534 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -16155,10 +16155,29 @@ rs6000_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
>> #else
>>   return NULL_TREE;
>> #endif
>> }
>> 
>> +/*  Helper function to sort out which built-ins may be valid without having
>> +    a LHS.  */
>> +bool
>> +rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
>> +{
>> +   switch (fn_code)
>> +      {
>> +       case ALTIVEC_BUILTIN_STVX_V16QI:
>> +       case ALTIVEC_BUILTIN_STVX_V8HI:
>> +       case ALTIVEC_BUILTIN_STVX_V4SI:
>> +       case ALTIVEC_BUILTIN_STVX_V4SF:
>> +       case ALTIVEC_BUILTIN_STVX_V2DI:
>> +       case ALTIVEC_BUILTIN_STVX_V2DF:
>> +         return true;
>> +       default:
>> +         return false;
>> +      }
>> +}
>> +
>> /* Fold a machine-dependent built-in in GIMPLE.  (For folding into
>>    a constant, use rs6000_fold_builtin.)  */
>> 
>> bool
>> rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>> @@ -16182,12 +16201,13 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>>               fn_code, fn_name1, fn_name2);
>> 
>>   if (!rs6000_fold_gimple)
>>     return false;
>> 
>> -  /* Generic solution to prevent gimple folding of code without a LHS.  */
>> -  if (!gimple_call_lhs (stmt))
>> +  /* Prevent gimple folding for code that does not have a LHS, unless it is
>> +   allowed per the rs6000_builtin_valid_without_lhs helper function.  */
>> +  if (!gimple_call_lhs (stmt) && !rs6000_builtin_valid_without_lhs (fn_code))
>>     return false;
>> 
>>   switch (fn_code)
>>     {
>>     /* Flavors of vec_add.  We deliberately don't expand
>> @@ -16585,11 +16605,48 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>>                                                build_int_cst (arg1_type, 0)));
>>         gimple_set_location (g, loc);
>>         gsi_replace (gsi, g, true);
>>         return true;
>>       }
>> -
>> +    /* Vector stores.  */
>> +    case ALTIVEC_BUILTIN_STVX_V16QI:
>> +    case ALTIVEC_BUILTIN_STVX_V8HI:
>> +    case ALTIVEC_BUILTIN_STVX_V4SI:
>> +    case ALTIVEC_BUILTIN_STVX_V4SF:
>> +    case ALTIVEC_BUILTIN_STVX_V2DI:
>> +    case ALTIVEC_BUILTIN_STVX_V2DF:
>> +      {
>> +        /* Do not fold for -maltivec=be on LE targets.  */
>> +        if (VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN)
>> +           return false;
>> +        arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
>> +        arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
>> +        tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
>> +        location_t loc = gimple_location (stmt);
>> +        tree arg0_type = TREE_TYPE (arg0);
>> +        tree arg2_type = TREE_TYPE (arg2);
>> +        /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
>> +           the tree using the value from arg0.  The resulting type will match
>> +           the type of arg2.  */
>> +        gimple_seq stmts = NULL;
>> +        tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
>> +        tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>> +                                      arg2_type, arg2, temp_offset);
>> +        /* Mask off any lower bits from the address.  */
>> +        tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
>> +                                         arg2_type, temp_addr,
>> +                                         build_int_cst (arg2_type, -16));
>> +        gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> +       /* The desired gimple result should be similar to:
>> +        MEM[(__vector floatD.1407 *)_1] = vf1D.2697;  */
>> +        gimple *g;
>> +        g = gimple_build_assign (build2 (MEM_REF, arg0_type, aligned_addr,
>> +                                          build_int_cst (arg2_type, 0)), arg0);
>> +        gimple_set_location (g, loc);
>> +        gsi_replace (gsi, g, true);
>> +        return true;
>> +      }
>>     default:
>>        if (TARGET_DEBUG_BUILTIN)
>>           fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
>>                    fn_code, fn_name1, fn_name2);
>>       break;
Segher Boessenkool Sept. 22, 2017, 11:44 p.m. UTC | #3
On Fri, Sep 22, 2017 at 02:58:54PM -0500, Bill Schmidt wrote:
> OK.  Will, for now, let's again use the (void *) solution for the time being, and
> add commentary recording this improvement for future work.  Same would
> go for the vec_vsx_ld/st variations once you get to those.
> 
> Otherwise the patch looks ok to me.  I'll defer to Segher for approval, of course.

It's okay for trunk with the suggested improvements.  Thanks for the
review!


Segher

Patch

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index a49db97..4a363a1 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -6470,82 +6470,10 @@  altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
 		     convert (TREE_TYPE (stmt), arg0));
       stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
       return stmt;
     }
 
-  /* Expand vec_st into an expression that masks the address and
-     performs the store.  We need to expand this early to allow
-     the best aliasing, as by the time we get into RTL we no longer
-     are able to honor __restrict__, for example.  We may want to
-     consider this for all memory access built-ins.
-
-     When -maltivec=be is specified, or the wrong number of arguments
-     is provided, simply punt to existing built-in processing.  */
-
-  if (fcode == ALTIVEC_BUILTIN_VEC_ST
-      && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
-      && nargs == 3)
-    {
-      tree arg0 = (*arglist)[0];
-      tree arg1 = (*arglist)[1];
-      tree arg2 = (*arglist)[2];
-
-      /* Construct the masked address.  Let existing error handling take
-	 over if we don't have a constant offset.  */
-      arg1 = fold (arg1);
-
-      if (TREE_CODE (arg1) == INTEGER_CST)
-	{
-	  if (!ptrofftype_p (TREE_TYPE (arg1)))
-	    arg1 = build1 (NOP_EXPR, sizetype, arg1);
-
-	  tree arg2_type = TREE_TYPE (arg2);
-	  if (TREE_CODE (arg2_type) == ARRAY_TYPE && c_dialect_cxx ())
-	    {
-	      /* Force array-to-pointer decay for C++.  */
-	      arg2 = default_conversion (arg2);
-	      arg2_type = TREE_TYPE (arg2);
-	    }
-
-	  /* Find the built-in to make sure a compatible one exists; if not
-	     we fall back to default handling to get the error message.  */
-	  for (desc = altivec_overloaded_builtins;
-	       desc->code && desc->code != fcode; desc++)
-	    continue;
-
-	  for (; desc->code == fcode; desc++)
-	    if (rs6000_builtin_type_compatible (TREE_TYPE (arg0), desc->op1)
-		&& rs6000_builtin_type_compatible (TREE_TYPE (arg1), desc->op2)
-		&& rs6000_builtin_type_compatible (TREE_TYPE (arg2),
-						   desc->op3))
-	      {
-		tree addr = fold_build2_loc (loc, POINTER_PLUS_EXPR, arg2_type,
-					     arg2, arg1);
-		tree aligned
-		  = fold_build2_loc (loc, BIT_AND_EXPR, arg2_type,
-				     addr, build_int_cst (arg2_type, -16));
-
-		tree arg0_type = TREE_TYPE (arg0);
-		if (TYPE_MODE (arg0_type) == V2DImode)
-		  /* Type-based aliasing analysis thinks vector long
-		     and vector long long are different and will put them
-		     in distinct alias classes.  Force our address type
-		     to be a may-alias type to avoid this.  */
-		  arg0_type
-		    = build_pointer_type_for_mode (arg0_type, Pmode,
-						   true/*can_alias_all*/);
-		else
-		  arg0_type = build_pointer_type (arg0_type);
-		aligned = build1 (NOP_EXPR, arg0_type, aligned);
-		tree stg = build_indirect_ref (loc, aligned, RO_NULL);
-		tree retval = build2 (MODIFY_EXPR, TREE_TYPE (stg), stg,
-				      convert (TREE_TYPE (stg), arg0));
-		return retval;
-	      }
-	}
-    }
-
   for (n = 0;
        !VOID_TYPE_P (TREE_VALUE (fnargs)) && n < nargs;
        fnargs = TREE_CHAIN (fnargs), n++)
     {
       tree decl_type = TREE_VALUE (fnargs);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1978634..ef41534 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16155,10 +16155,29 @@  rs6000_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
 #else
   return NULL_TREE;
 #endif
 }
 
+/*  Helper function to sort out which built-ins may be valid without having
+    a LHS.  */
+bool
+rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
+{
+   switch (fn_code)
+      {
+       case ALTIVEC_BUILTIN_STVX_V16QI:
+       case ALTIVEC_BUILTIN_STVX_V8HI:
+       case ALTIVEC_BUILTIN_STVX_V4SI:
+       case ALTIVEC_BUILTIN_STVX_V4SF:
+	case ALTIVEC_BUILTIN_STVX_V2DI:
+	case ALTIVEC_BUILTIN_STVX_V2DF:
+	  return true;
+	default:
+	  return false;
+      }
+}
+
 /* Fold a machine-dependent built-in in GIMPLE.  (For folding into
    a constant, use rs6000_fold_builtin.)  */
 
 bool
 rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
@@ -16182,12 +16201,13 @@  rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 	       fn_code, fn_name1, fn_name2);
 
   if (!rs6000_fold_gimple)
     return false;
 
-  /* Generic solution to prevent gimple folding of code without a LHS.  */
-  if (!gimple_call_lhs (stmt))
+  /* Prevent gimple folding for code that does not have a LHS, unless it is
+   allowed per the rs6000_builtin_valid_without_lhs helper function.  */
+  if (!gimple_call_lhs (stmt) && !rs6000_builtin_valid_without_lhs (fn_code))
     return false;
 
   switch (fn_code)
     {
     /* Flavors of vec_add.  We deliberately don't expand
@@ -16585,11 +16605,48 @@  rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 						build_int_cst (arg1_type, 0)));
 	 gimple_set_location (g, loc);
 	 gsi_replace (gsi, g, true);
 	 return true;
       }
-
+    /* Vector stores.  */
+    case ALTIVEC_BUILTIN_STVX_V16QI:
+    case ALTIVEC_BUILTIN_STVX_V8HI:
+    case ALTIVEC_BUILTIN_STVX_V4SI:
+    case ALTIVEC_BUILTIN_STVX_V4SF:
+    case ALTIVEC_BUILTIN_STVX_V2DI:
+    case ALTIVEC_BUILTIN_STVX_V2DF:
+      {
+	 /* Do not fold for -maltivec=be on LE targets.  */
+	 if (VECTOR_ELT_ORDER_BIG && !BYTES_BIG_ENDIAN)
+	    return false;
+	 arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
+	 arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
+	 tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
+	 location_t loc = gimple_location (stmt);
+	 tree arg0_type = TREE_TYPE (arg0);
+	 tree arg2_type = TREE_TYPE (arg2);
+	 /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
+	    the tree using the value from arg0.  The resulting type will match
+	    the type of arg2.  */
+	 gimple_seq stmts = NULL;
+	 tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
+	 tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
+				       arg2_type, arg2, temp_offset);
+	 /* Mask off any lower bits from the address.  */
+	 tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
+					  arg2_type, temp_addr,
+					  build_int_cst (arg2_type, -16));
+	 gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	/* The desired gimple result should be similar to:
+	 MEM[(__vector floatD.1407 *)_1] = vf1D.2697;  */
+	 gimple *g;
+	 g = gimple_build_assign (build2 (MEM_REF, arg0_type, aligned_addr,
+					   build_int_cst (arg2_type, 0)), arg0);
+	 gimple_set_location (g, loc);
+	 gsi_replace (gsi, g, true);
+	 return true;
+      }
     default:
 	if (TARGET_DEBUG_BUILTIN)
 	   fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
 		    fn_code, fn_name1, fn_name2);
       break;