
[PR50527] Don't assume alignment of vla-related allocas.

Message ID 4E8CC712.3000602@mentor.com
State New

Commit Message

Tom de Vries Oct. 5, 2011, 9:07 p.m. UTC
On 10/05/2011 10:46 AM, Richard Guenther wrote:
> On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 10/04/2011 03:03 PM, Richard Guenther wrote:
>>> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>> On 10/01/2011 05:46 PM, Tom de Vries wrote:
>>>>> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>>>>>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>>>>>>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> I got a patch for PR50527.
>>>>>>>>>
>>>>>>>>> The patch prevents the alignment of vla-related allocas from being set to
>>>>>>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out to be smaller after
>>>>>>>>> folding the alloca.
>>>>>>>>>
>>>>>>>>> Bootstrapped and regtested on x86_64.
>>>>>>>>>
>>>>>>>>> OK for trunk?
>>>>>>>>
>>>>>>>> Hmm.  As gfortran with -fstack-arrays uses VLAs, it's probably bad that
>>>>>>>> the vectorizer will then no longer see that the arrays are properly aligned.
>>>>>>>>
>>>>>>>> I'm not sure what the best thing to do is here, other than trying to record
>>>>>>>> the alignment requirement of the VLA somewhere.
>>>>>>>>
>>>>>>>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>>>>>>>> has the issue that it will force stack-realignment which isn't free (and the
>>>>>>>> point was to make the decl cheaper than the alloca).  But that might
>>>>>>>> possibly be the better choice.
>>>>>>>>
>>>>>>>> Any other thoughts?
>>>>>>>
>>>>>>> How about the approach in this (untested) patch? Using the DECL_ALIGN of the vla
>>>>>>> for the new array prevents stack realignment for folded vla-allocas, also for
>>>>>>> large vlas.
>>>>>>>
>>>>>>> This will not help in vectorizing large folded vla-allocas, but I think it's not
>>>>>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
>>>>>>> been the case up until we started to fold). If you want to trigger vectorization
>>>>>>> for a vla, you can still use the aligned attribute on the declaration.
>>>>>>>
>>>>>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, even without
>>>>>>> using an attribute on the decl. This patch exploits that by setting it at the
>>>>>>> end of the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective
>>>>>>> for propagation though: although the ptr_info of the lhs is propagated via
>>>>>>> copy_prop afterwards, it is no longer propagated via ccp.
>>>>>>>
>>>>>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and
>>>>>>> not fold during ccp3.
>>>>>>
>>>>>> Ugh, somehow I like this the least ;)
>>>>>>
>>>>>> How about lowering VLAs to
>>>>>>
>>>>>>   p = __builtin_alloca (...);
>>>>>>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>>>>>
>>>>>> and not assume anything for alloca itself if it feeds a
>>>>>> __builtin_assume_aligned?
>>>>>>
>>>>>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>>>>>
>>>>>>  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>>>>>
>>>>>> that's less awkward to use?
>>>>>>
>>>>>> Sorry for not having a clear plan here ;)
>>>>>>
>>>>>
>>>>> Using assume_aligned is a more orthogonal way to represent this in gimple, but
>>>>> indeed harder to use.
>>>>>
>>>>> Another possibility is to add a 'tree vla_decl' field to struct
>>>>> gimple_statement_call, which is probably the easiest to implement.
>>>>>
>>>>> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
>>>>> decided to try this one. The attached patch implements my first stab at this
>>>>> (now testing on x86_64).
>>>>>
>>>>> Is this an acceptable approach?
>>>>>
>>>>
>>>> bootstrapped and reg-tested (including ada) on x86_64.
>>>>
>>>> Ok for trunk?
>>>
>>> The idea is ok I think.  But
>>>
>>>      case BUILT_IN_ALLOCA:
>>> +    case BUILT_IN_ALLOCA_WITH_ALIGN:
>>>        /* If the allocation stems from the declaration of a variable-sized
>>>          object, it cannot accumulate.  */
>>>        target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
>>>        if (target)
>>>         return target;
>>> +      if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
>>> +         == BUILT_IN_ALLOCA_WITH_ALIGN)
>>> +       {
>>> +         tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
>>> +                                              built_in_decls[BUILT_IN_ALLOCA],
>>> +                                              1, CALL_EXPR_ARG (exp, 0));
>>> +         CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
>>> +         exp = new_call;
>>> +       }
>>>
>>> Ick.  Why can't the rest of the compiler handle
>>> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA?
>>> (thus, arrange things so the assembler name of alloca-with-align is alloca?)
>>>
>>
>> We can set the assembler name in the local_define_builtin call. But that will
>> still create a call alloca (12, 4). How do we deal with the second argument?
>>
>>> I don't see why you still need the special late CCP pass.
>>>
>>
>> For alloca_with_align, the alignment will be at least the 2nd argument. This is
>> independent of folding, and we can propagate this information in every ccp pass.
>>
>> If the alloca_with_align is not folded and will no longer be folded (something
>> we know at the earliest after the propagation phase of the last ccp),
>> an alignment of BIGGEST_ALIGNMENT is guaranteed, because we expand it like a
>> normal alloca. We try to exploit this by improving the ptr_info->align.
> 
> I'd just assume the (lower) constant alignment of the argument always.
> 
>> Well, that was the idea. But now I wonder, isn't it better to do this in
>> expand_builtin_alloca:
>> ...
>>   /* Compute the argument.  */
>>   op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
>>
>> +  align =
>> +    (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) ==
>> +     BUILT_IN_ALLOCA_WITH_ALIGN)
>> +    ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
>> +    : BIGGEST_ALIGNMENT;
>> +
>> +
>>   /* Allocate the desired space.  */
>> -  result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
>> -                                        cannot_accumulate);
>> +  result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);
>>   result = convert_memory_address (ptr_mode, result);
> 
> Yes, that's probably a good idea anyway - it will reduce excess
> stack re-alignment for VLAs.
> 
>>   return result;
>>
>> ...
>>
>> This way, we lower the alignment requirement for vlas in general, not only when
>> folding (unless we expand to an actual alloca call).
>>
>> If we do that, we don't know whether BIGGEST_ALIGNMENT is going to be applicable
>> until expand, so we can't exploit that in the middle-end, so we lose the need
>> for ccp_last.
> 
> Good ;)
> 

Handled this way now.

> Thanks,
> Richard.
> 
>>> -static tree
>>> -fold_builtin_alloca_for_var (gimple stmt)
>>> +static bool
>>> +builtin_alloca_with_align_p (gimple stmt)
>>>  {
>>> -  unsigned HOST_WIDE_INT size, threshold, n_elem;
>>> -  tree lhs, arg, block, var, elem_type, array_type;
>>> -  unsigned int align;
>>> +  tree fndecl;
>>> +  if (!is_gimple_call (stmt))
>>> +    return false;
>>> +
>>> +  fndecl = gimple_call_fndecl (stmt);
>>> +
>>> +  return (fndecl != NULL_TREE
>>> +         && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN);
>>> +}
>>>
>>> equivalent to gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN).
>>>
>>> +  DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
>>>
>>> Now, users might call __builtin_alloca_with_align (8, align), thus use
>>> a non-constant alignment argument.  You either need to reject this
>>> in c-family/c-common.c:check_builtin_function_arguments with an
>>> error or fall back to BIGGEST_ALIGNMENT (similarly during propagation
>>> I suppose).
>>>
>>> +DEF_BUILTIN_STUB       (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align")
>>>
>>> Oh, I see you are not exposing this to users, so ignore the comments
>>> about non-constant align.

>>> Can you move this
>>> DEF down, near the other DEF_BUILTIN_STUBs and add a comment
>>> that this is used for VLAs?

Done.

>>>
>>> +                                      build_int_cstu (size_type_node,
>>> +                                                      DECL_ALIGN (parm)));
>>>
>>> size_int (DECL_ALIGN (parm)), similar in other places.
>>>

Done.

>>> Let's see if there are comments from others on this - which I like.
>>>
>>
>> Thanks,
>> - Tom
>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> - Tom
>>>>
>>
>>

Also, the alloca_with_align (a,b) now appears in assembly as alloca (a, b), as
discussed in http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00307.html.

Bootstrapped and regtested (including ada) on x86_64.

OK for trunk?

Thanks,
- Tom

2011-10-05  Tom de Vries  <tom@codesourcery.com>

	PR middle-end/50527
	* tree.c (build_common_builtin_nodes): Add local_define_builtin for
	BUILT_IN_ALLOCA_WITH_ALIGN.  Mark that BUILT_IN_ALLOCA_WITH_ALIGN can
	throw.
	* builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN
	arglist.  Set align for BUILT_IN_ALLOCA_WITH_ALIGN.
	(expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	(is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	* tree-ssa-ccp.c (evaluate_stmt): Set align for
	BUILT_IN_ALLOCA_WITH_ALIGN.
	(fold_builtin_alloca_for_var): Rename to ...
	(fold_builtin_alloca_with_align): Set DECL_ALIGN from 2nd
	BUILT_IN_ALLOCA_WITH_ALIGN argument.
	(ccp_fold_stmt): Try folding BUILT_IN_ALLOCA_WITH_ALIGN using
	fold_builtin_alloca_with_align.
	(optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	* builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
	DEF_BUILTIN_STUB.
	* ipa-pure-const.c (special_builtin_state): Handle
	BUILT_IN_ALLOCA_WITH_ALIGN.
	* tree-ssa-alias.c (ref_maybe_used_by_call_p_1)
	(call_may_clobber_ref_p_1): Same.
	* function.c (gimplify_parameters): Lower vla to
	BUILT_IN_ALLOCA_WITH_ALIGN.
	* gimplify.c (gimplify_vla_decl): Same.
	* cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	* tree-mudflap.c (mf_xform_statements): Same.
	* tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
	(mark_all_reaching_defs_necessary_1, propagate_necessity): Same.
	* varasm.c (incorporeal_function_p): Same.
	* tree-object-size.c (alloc_object_size): Same.
	* gimple.c (gimple_build_call_from_tree): Same.

	* gcc.dg/pr50527.c: New test.

Comments

Richard Biener Oct. 6, 2011, 9:48 a.m. UTC | #1
On Wed, Oct 5, 2011 at 11:07 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> Also, the alloca_with_align (a,b) now appears in assembly as alloca (a, b), as
> discussed in http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00307.html.
>
> Bootstrapped and regtested (including ada) on x86_64.
>
> OK for trunk?

Ok.

Thanks,
Richard.


Patch

Index: gcc/tree.c
===================================================================
--- gcc/tree.c (revision 179210)
+++ gcc/tree.c (working copy)
@@ -9483,9 +9483,18 @@  build_common_builtin_nodes (void)
 			    "alloca", ECF_MALLOC | ECF_NOTHROW | ECF_LEAF);
     }
 
+  ftype = build_function_type_list (ptr_type_node, size_type_node,
+				    size_type_node, NULL_TREE);
+  local_define_builtin ("__builtin_alloca_with_align", ftype,
+			BUILT_IN_ALLOCA_WITH_ALIGN, "alloca",
+			ECF_MALLOC | ECF_NOTHROW | ECF_LEAF);
+
   /* If we're checking the stack, `alloca' can throw.  */
   if (flag_stack_check)
-    TREE_NOTHROW (built_in_decls[BUILT_IN_ALLOCA]) = 0;
+    {
+      TREE_NOTHROW (built_in_decls[BUILT_IN_ALLOCA]) = 0;
+      TREE_NOTHROW (built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN]) = 0;
+    }
 
   ftype = build_function_type_list (void_type_node,
 				    ptr_type_node, ptr_type_node,
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c (revision 179210)
+++ gcc/builtins.c (working copy)
@@ -4516,20 +4516,33 @@  expand_builtin_alloca (tree exp, bool ca
 {
   rtx op0;
   rtx result;
+  bool valid_arglist;
+  unsigned int align;
+  bool alloca_with_align = (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
+			    == BUILT_IN_ALLOCA_WITH_ALIGN);
 
   /* Emit normal call if marked not-inlineable.  */
   if (CALL_CANNOT_INLINE_P (exp))
     return NULL_RTX;
 
-  if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
+  valid_arglist
+    = (alloca_with_align
+       ? validate_arglist (exp, INTEGER_TYPE, INTEGER_TYPE, VOID_TYPE)
+       : validate_arglist (exp, INTEGER_TYPE, VOID_TYPE));
+
+  if (!valid_arglist)
     return NULL_RTX;
 
   /* Compute the argument.  */
   op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
 
+  /* Compute the alignment.  */
+  align = (alloca_with_align
+	   ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
+	   : BIGGEST_ALIGNMENT);
+
   /* Allocate the desired space.  */
-  result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
-					 cannot_accumulate);
+  result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);
   result = convert_memory_address (ptr_mode, result);
 
   return result;
@@ -5304,6 +5317,7 @@  expand_builtin (tree exp, rtx target, rt
       && !called_as_built_in (fndecl)
       && DECL_ASSEMBLER_NAME_SET_P (fndecl)
       && fcode != BUILT_IN_ALLOCA
+      && fcode != BUILT_IN_ALLOCA_WITH_ALIGN
       && fcode != BUILT_IN_FREE)
     return expand_call (exp, target, ignore);
 
@@ -5559,6 +5573,7 @@  expand_builtin (tree exp, rtx target, rt
 	return XEXP (DECL_RTL (DECL_RESULT (current_function_decl)), 0);
 
     case BUILT_IN_ALLOCA:
+    case BUILT_IN_ALLOCA_WITH_ALIGN:
       /* If the allocation stems from the declaration of a variable-sized
 	 object, it cannot accumulate.  */
       target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
@@ -13561,6 +13576,7 @@  is_inexpensive_builtin (tree decl)
       {
       case BUILT_IN_ABS:
       case BUILT_IN_ALLOCA:
+      case BUILT_IN_ALLOCA_WITH_ALIGN:
       case BUILT_IN_BSWAP32:
       case BUILT_IN_BSWAP64:
       case BUILT_IN_CLZ:
Index: gcc/tree-ssa-ccp.c
===================================================================
--- gcc/tree-ssa-ccp.c (revision 179210)
+++ gcc/tree-ssa-ccp.c (working copy)
@@ -1492,6 +1492,7 @@  evaluate_stmt (gimple stmt)
   tree simplified = NULL_TREE;
   ccp_lattice_t likelyvalue = likely_value (stmt);
   bool is_constant = false;
+  unsigned int align;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -1632,10 +1633,14 @@  evaluate_stmt (gimple stmt)
 	      break;
 
 	    case BUILT_IN_ALLOCA:
+	    case BUILT_IN_ALLOCA_WITH_ALIGN:
+	      align = (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN
+		       ? TREE_INT_CST_LOW (gimple_call_arg (stmt, 1))
+		       : BIGGEST_ALIGNMENT);
 	      val.lattice_val = CONSTANT;
 	      val.value = build_int_cst (TREE_TYPE (gimple_get_lhs (stmt)), 0);
 	      val.mask = shwi_to_double_int
-		  	   (~(((HOST_WIDE_INT) BIGGEST_ALIGNMENT)
+		  	   (~(((HOST_WIDE_INT) align)
 			      / BITS_PER_UNIT - 1));
 	      break;
 
@@ -1685,15 +1690,15 @@  evaluate_stmt (gimple stmt)
   return val;
 }
 
-/* Detects a vla-related alloca with a constant argument.  Declares fixed-size
-   array and return the address, if found, otherwise returns NULL_TREE.  */
+/* Detects a __builtin_alloca_with_align with constant size argument.  Declares
+   fixed-size array and returns the address, if found, otherwise returns
+   NULL_TREE.  */
 
 static tree
-fold_builtin_alloca_for_var (gimple stmt)
+fold_builtin_alloca_with_align (gimple stmt)
 {
   unsigned HOST_WIDE_INT size, threshold, n_elem;
   tree lhs, arg, block, var, elem_type, array_type;
-  unsigned int align;
 
   /* Get lhs.  */
   lhs = gimple_call_lhs (stmt);
@@ -1709,10 +1714,10 @@  fold_builtin_alloca_for_var (gimple stmt
 
   size = TREE_INT_CST_LOW (arg);
 
-  /* Heuristic: don't fold large vlas.  */
+  /* Heuristic: don't fold large allocas.  */
   threshold = (unsigned HOST_WIDE_INT)PARAM_VALUE (PARAM_LARGE_STACK_FRAME);
-  /* In case a vla is declared at function scope, it has the same lifetime as a
-     declared array, so we allow a larger size.  */
+  /* In case the alloca is located at function entry, it has the same lifetime
+     as a declared array, so we allow a larger size.  */
   block = gimple_block (stmt);
   if (!(cfun->after_inlining
         && TREE_CODE (BLOCK_SUPERCONTEXT (block)) == FUNCTION_DECL))
@@ -1723,12 +1728,9 @@  fold_builtin_alloca_for_var (gimple stmt
   /* Declare array.  */
   elem_type = build_nonstandard_integer_type (BITS_PER_UNIT, 1);
   n_elem = size * 8 / BITS_PER_UNIT;
-  align = MIN (size * 8, BIGGEST_ALIGNMENT);
-  if (align < BITS_PER_UNIT)
-    align = BITS_PER_UNIT;
   array_type = build_array_type_nelts (elem_type, n_elem);
   var = create_tmp_var (array_type, NULL);
-  DECL_ALIGN (var) = align;
+  DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
   {
     struct ptr_info_def *pi = SSA_NAME_PTR_INFO (lhs);
     if (pi != NULL && !pi->pt.anything)
@@ -1813,12 +1815,12 @@  ccp_fold_stmt (gimple_stmt_iterator *gsi
 	if (gimple_call_internal_p (stmt))
 	  return false;
 
-        /* The heuristic of fold_builtin_alloca_for_var differs before and after
-           inlining, so we don't require the arg to be changed into a constant
-           for folding, but just to be constant.  */
-        if (gimple_call_alloca_for_var_p (stmt))
+        /* The heuristic of fold_builtin_alloca_with_align differs before and
+	   after inlining, so we don't require the arg to be changed into a
+	   constant for folding, but just to be constant.  */
+        if (gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN))
           {
-            tree new_rhs = fold_builtin_alloca_for_var (stmt);
+            tree new_rhs = fold_builtin_alloca_with_align (stmt);
             if (new_rhs)
 	      {
 		bool res = update_call_from_tree (gsi, new_rhs);
@@ -2093,7 +2095,8 @@  optimize_stack_restore (gimple_stmt_iter
       if (!callee
 	  || DECL_BUILT_IN_CLASS (callee) != BUILT_IN_NORMAL
 	  /* All regular builtins are ok, just obviously not alloca.  */
-	  || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA)
+	  || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA
+	  || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA_WITH_ALIGN)
 	return NULL_TREE;
 
       if (DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE)
Index: gcc/builtins.def
===================================================================
--- gcc/builtins.def (revision 179210)
+++ gcc/builtins.def (working copy)
@@ -738,6 +738,7 @@  DEF_BUILTIN_STUB (BUILT_IN_SETJMP_RECEIV
 /* Implementing variable sized local variables.  */
 DEF_BUILTIN_STUB (BUILT_IN_STACK_SAVE, "__builtin_stack_save")
 DEF_BUILTIN_STUB (BUILT_IN_STACK_RESTORE, "__builtin_stack_restore")
+DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_ALIGN, "__builtin_alloca_with_align")
 
 /* Object size checking builtins.  */
 DEF_GCC_BUILTIN	       (BUILT_IN_OBJECT_SIZE, "object_size", BT_FN_SIZE_CONST_PTR_INT, ATTR_PURE_NOTHROW_LEAF_LIST)
Index: gcc/ipa-pure-const.c
===================================================================
--- gcc/ipa-pure-const.c (revision 179210)
+++ gcc/ipa-pure-const.c (working copy)
@@ -437,6 +437,7 @@  special_builtin_state (enum pure_const_s
 	case BUILT_IN_RETURN:
 	case BUILT_IN_UNREACHABLE:
 	case BUILT_IN_ALLOCA:
+	case BUILT_IN_ALLOCA_WITH_ALIGN:
 	case BUILT_IN_STACK_SAVE:
 	case BUILT_IN_STACK_RESTORE:
 	case BUILT_IN_EH_POINTER:
Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c (revision 179210)
+++ gcc/tree-ssa-alias.c (working copy)
@@ -1231,6 +1231,7 @@  ref_maybe_used_by_call_p_1 (gimple call,
 	case BUILT_IN_MALLOC:
 	case BUILT_IN_CALLOC:
 	case BUILT_IN_ALLOCA:
+	case BUILT_IN_ALLOCA_WITH_ALIGN:
 	case BUILT_IN_STACK_SAVE:
 	case BUILT_IN_STACK_RESTORE:
 	case BUILT_IN_MEMSET:
@@ -1513,6 +1514,7 @@  call_may_clobber_ref_p_1 (gimple call, a
 	  return false;
 	case BUILT_IN_STACK_SAVE:
 	case BUILT_IN_ALLOCA:
+	case BUILT_IN_ALLOCA_WITH_ALIGN:
 	case BUILT_IN_ASSUME_ALIGNED:
 	  return false;
 	/* Freeing memory kills the pointed-to memory.  More importantly
Index: gcc/function.c
===================================================================
--- gcc/function.c (revision 179210)
+++ gcc/function.c (working copy)
@@ -3638,8 +3638,10 @@  gimplify_parameters (void)
 		  DECL_IGNORED_P (addr) = 0;
 		  local = build_fold_indirect_ref (addr);
 
-		  t = built_in_decls[BUILT_IN_ALLOCA];
-		  t = build_call_expr (t, 1, DECL_SIZE_UNIT (parm));
+		  t = built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN];
+		  t = build_call_expr (t, 2, DECL_SIZE_UNIT (parm),
+				       size_int (DECL_ALIGN (parm)));
+
 		  /* The call has been built for a variable-sized object.  */
 		  CALL_ALLOCA_FOR_VAR_P (t) = 1;
 		  t = fold_convert (ptr_type, t);
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c (revision 179210)
+++ gcc/gimplify.c (working copy)
@@ -1329,8 +1329,9 @@  gimplify_vla_decl (tree decl, gimple_seq
   SET_DECL_VALUE_EXPR (decl, t);
   DECL_HAS_VALUE_EXPR_P (decl) = 1;
 
-  t = built_in_decls[BUILT_IN_ALLOCA];
-  t = build_call_expr (t, 1, DECL_SIZE_UNIT (decl));
+  t = built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN];
+  t = build_call_expr (t, 2, DECL_SIZE_UNIT (decl),
+		       size_int (DECL_ALIGN (decl)));
   /* The call has been built for a variable-sized object.  */
   CALL_ALLOCA_FOR_VAR_P (t) = 1;
   t = fold_convert (ptr_type, t);
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c (revision 179210)
+++ gcc/cfgexpand.c (working copy)
@@ -1858,7 +1858,8 @@  expand_call_stmt (gimple stmt)
   CALL_EXPR_RETURN_SLOT_OPT (exp) = gimple_call_return_slot_opt_p (stmt);
   if (decl
       && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL
-      && DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA)
+      && (DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA
+	  || DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA_WITH_ALIGN))
     CALL_ALLOCA_FOR_VAR_P (exp) = gimple_call_alloca_for_var_p (stmt);
   else
     CALL_FROM_THUNK_P (exp) = gimple_call_from_thunk_p (stmt);
Index: gcc/tree-mudflap.c
===================================================================
--- gcc/tree-mudflap.c (revision 179210)
+++ gcc/tree-mudflap.c (working copy)
@@ -969,7 +969,9 @@  mf_xform_statements (void)
             case GIMPLE_CALL:
               {
                 tree fndecl = gimple_call_fndecl (s);
-                if (fndecl && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA))
+                if (fndecl && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA
+			       || (DECL_FUNCTION_CODE (fndecl)
+				   == BUILT_IN_ALLOCA_WITH_ALIGN)))
                   gimple_call_set_cannot_inline (s, true);
               }
               break;
Index: gcc/tree-ssa-dce.c
===================================================================
--- gcc/tree-ssa-dce.c (revision 179210)
+++ gcc/tree-ssa-dce.c (working copy)
@@ -308,6 +308,7 @@  mark_stmt_if_obviously_necessary (gimple
 	    case BUILT_IN_MALLOC:
 	    case BUILT_IN_CALLOC:
 	    case BUILT_IN_ALLOCA:
+	    case BUILT_IN_ALLOCA_WITH_ALIGN:
 	      return;
 
 	    default:;
@@ -639,6 +640,7 @@  mark_all_reaching_defs_necessary_1 (ao_r
 	  case BUILT_IN_MALLOC:
 	  case BUILT_IN_CALLOC:
 	  case BUILT_IN_ALLOCA:
+	  case BUILT_IN_ALLOCA_WITH_ALIGN:
 	  case BUILT_IN_FREE:
 	    return false;
 
@@ -890,6 +892,8 @@  propagate_necessity (struct edge_list *e
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_FREE
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_VA_END
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA
+		      || (DECL_FUNCTION_CODE (callee)
+			  == BUILT_IN_ALLOCA_WITH_ALIGN)
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_SAVE
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE
 		      || DECL_FUNCTION_CODE (callee) == BUILT_IN_ASSUME_ALIGNED))
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c (revision 179210)
+++ gcc/varasm.c (working copy)
@@ -2104,7 +2104,8 @@  incorporeal_function_p (tree decl)
       const char *name;
 
       if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL
-	  && DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA)
+	  && (DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA
+	      || DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA_WITH_ALIGN))
 	return true;
 
       name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
Index: gcc/tree-object-size.c
===================================================================
--- gcc/tree-object-size.c (revision 179210)
+++ gcc/tree-object-size.c (working copy)
@@ -411,6 +411,7 @@  alloc_object_size (const_gimple call, in
 	/* fall through */
       case BUILT_IN_MALLOC:
       case BUILT_IN_ALLOCA:
+      case BUILT_IN_ALLOCA_WITH_ALIGN:
 	arg1 = 0;
       default:
 	break;
Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c (revision 179210)
+++ gcc/gimple.c (working copy)
@@ -374,7 +374,8 @@  gimple_build_call_from_tree (tree t)
   gimple_call_set_return_slot_opt (call, CALL_EXPR_RETURN_SLOT_OPT (t));
   if (fndecl
       && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
-      && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA)
+      && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA
+	  || DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN))
     gimple_call_set_alloca_for_var (call, CALL_ALLOCA_FOR_VAR_P (t));
   else
     gimple_call_set_from_thunk (call, CALL_FROM_THUNK_P (t));
Index: gcc/testsuite/gcc.dg/pr50527.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr50527.c (revision 0)
@@ -0,0 +1,46 @@ 
+/* { dg-do run } */
+/* { dg-options "-Os --param large-stack-frame=30" } */
+
+extern void abort (void);
+
+void __attribute__((noinline))
+bar (char *a)
+{
+}
+
+void __attribute__((noinline))
+foo (char *a, int b)
+{
+}
+
+void __attribute__((noinline))
+test_align (char *p, int aligned, unsigned int mask)
+{
+  int p_aligned = ((unsigned long int)p & mask) == 0;
+  if (aligned != p_aligned)
+    abort ();
+}
+
+int
+main ()
+{
+  const int kIterations = 4;
+  char results[kIterations];
+  int i;
+  unsigned int mask;
+
+  mask = 0xf;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x7;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x3;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x1;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+
+  bar (results);
+  for (i = 0; i < kIterations; i++)
+    foo ("%d ", results[i]);
+
+  return 0;
+}