From patchwork Wed Oct 5 21:07:30 2011
X-Patchwork-Submitter: Tom de Vries
X-Patchwork-Id: 117916
Message-ID: <4E8CC712.3000602@mentor.com>
Date: Wed, 5 Oct 2011 23:07:30 +0200
From: Tom de Vries
To: Richard Guenther
CC: "gcc-patches >> \"gcc-patches@gcc.gnu.org\"" , Richard Henderson
Subject: Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
References: <4E82EA3E.4050000@mentor.com> <4E846F6F.7070002@mentor.com> <4E8735DD.5010701@mentor.com> <4E8AB925.8010305@mentor.com> <4E8B344A.7010502@mentor.com>

On 10/05/2011 10:46 AM, Richard Guenther wrote:
> On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries wrote:
>> On 10/04/2011 03:03 PM, Richard Guenther wrote:
>>> On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries wrote:
>>>> On 10/01/2011 05:46 PM, Tom de Vries wrote:
>>>>> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>>>>>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries wrote:
>>>>>>> On 09/28/2011 11:53 AM, Richard Guenther wrote:
>>>>>>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> I got a patch for PR50527.
>>>>>>>>>
>>>>>>>>> The patch prevents the alignment of vla-related allocas to be set to
>>>>>>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding
>>>>>>>>> the alloca.
>>>>>>>>>
>>>>>>>>> Bootstrapped and regtested on x86_64.
>>>>>>>>>
>>>>>>>>> OK for trunk?
>>>>>>>>
>>>>>>>> Hmm. As gfortran with -fstack-arrays uses VLAs it's probably bad that
>>>>>>>> the vectorizer then will no longer see that the arrays are properly aligned.
>>>>>>>>
>>>>>>>> I'm not sure what the best thing to do is here, other than trying to record
>>>>>>>> the alignment requirement of the VLA somewhere.
>>>>>>>>
>>>>>>>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT
>>>>>>>> has the issue that it will force stack-realignment which isn't free (and the
>>>>>>>> point was to make the decl cheaper than the alloca). But that might
>>>>>>>> possibly be the better choice.
>>>>>>>>
>>>>>>>> Any other thoughts?
>>>>>>>
>>>>>>> How about the approach in this (untested) patch?
>>>>>>> Using the DECL_ALIGN of the vla
>>>>>>> for the new array prevents stack realignment for folded vla-allocas, also for
>>>>>>> large vlas.
>>>>>>>
>>>>>>> This will not help in vectorizing large folded vla-allocas, but I think it's not
>>>>>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that has
>>>>>>> been the case up until we started to fold). If you want to trigger vectorization
>>>>>>> for a vla, you can still use the aligned attribute on the declaration.
>>>>>>>
>>>>>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without using
>>>>>>> an attribute on the decl. This patch exploits this by setting it at the end of
>>>>>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in
>>>>>>> propagation though, because although the ptr_info of the lhs is propagated via
>>>>>>> copy_prop afterwards, it's not propagated anymore via ccp.
>>>>>>>
>>>>>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of ccp2 and
>>>>>>> not fold during ccp3.
>>>>>>
>>>>>> Ugh, somehow I like this the least ;)
>>>>>>
>>>>>> How about lowering VLAs to
>>>>>>
>>>>>>   p = __builtin_alloca (...);
>>>>>>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>>>>>
>>>>>> and not assume anything for alloca itself if it feeds a
>>>>>> __builtin_assume_aligned?
>>>>>>
>>>>>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>>>>>
>>>>>>   p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>>>>>
>>>>>> that's less awkward to use?
>>>>>>
>>>>>> Sorry for not having a clear plan here ;)
>>>>>>
>>>>>
>>>>> Using assume_aligned is a more orthogonal way to represent this in gimple, but
>>>>> indeed harder to use.
>>>>>
>>>>> Another possibility is to add a 'tree vla_decl' field to struct
>>>>> gimple_statement_call, which is probably the easiest to implement.
>>>>>
>>>>> But I think __builtin_alloca_with_align might have a use beyond vlas, so I
>>>>> decided to try this one.
>>>>> Attached patch implements my first stab at this (now
>>>>> testing on x86_64).
>>>>>
>>>>> Is this an acceptable approach?
>>>>>
>>>>
>>>> bootstrapped and reg-tested (including ada) on x86_64.
>>>>
>>>> Ok for trunk?
>>>
>>> The idea is ok I think. But
>>>
>>>      case BUILT_IN_ALLOCA:
>>> +    case BUILT_IN_ALLOCA_WITH_ALIGN:
>>>        /* If the allocation stems from the declaration of a variable-sized
>>>           object, it cannot accumulate.  */
>>>        target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
>>>        if (target)
>>>          return target;
>>> +      if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
>>> +          == BUILT_IN_ALLOCA_WITH_ALIGN)
>>> +        {
>>> +          tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
>>> +                                               built_in_decls[BUILT_IN_ALLOCA],
>>> +                                               1, CALL_EXPR_ARG (exp, 0));
>>> +          CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
>>> +          exp = new_call;
>>> +        }
>>>
>>> Ick. Why can't the rest of the compiler handle
>>> BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA?
>>> (thus, arrange things so the assembler name of alloca-with-align is alloca?)
>>>
>>
>> We can set the assembler name in the local_define_builtin call. But that will
>> still create a call alloca (12, 4). How do we deal with the second argument?
>>
>>> I don't see why you still need the special late CCP pass.
>>>
>>
>> For alloca_with_align, the align will minimally be the 2nd argument. This is
>> independent of folding, and we can propagate this information in every ccp.
>>
>> If the alloca_with_align is not folded and will not be folded anymore (something
>> we know at the earliest after the propagation phase of the last ccp),
>> the alignment of BIGGEST_ALIGNMENT is guaranteed, because we expand it like a
>> normal alloca. We try to exploit this by improving the ptr_info->align.
>
> I'd just assume the (lower) constant alignment of the argument always.
>
>> Well, that was the idea. But now I wonder, isn't it better to do this in
>> expand_builtin_alloca:
>> ...
>>   /* Compute the argument.  */
>>   op0 = expand_normal (CALL_EXPR_ARG (exp, 0));
>>
>> +  align =
>> +    (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) ==
>> +     BUILT_IN_ALLOCA_WITH_ALIGN)
>> +    ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
>> +    : BIGGEST_ALIGNMENT;
>> +
>> +
>>   /* Allocate the desired space.  */
>> -  result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
>> -                                         cannot_accumulate);
>> +  result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);
>>   result = convert_memory_address (ptr_mode, result);
>
> Yes, that's probably a good idea anyway - it will reduce excess
> stack re-alignment for VLAs.
>
>>   return result;
>>
>> ...
>>
>> This way, we lower the alignment requirement for vlas in general, not only when
>> folding (unless we expand to an actual alloca call).
>>
>> If we do that, we don't know whether BIGGEST_ALIGNMENT is going to be applicable
>> until expand, so we can't exploit that in the middle-end, so we lose the need
>> for ccp_last.
>
> Good ;)
>

Handled this way now.

> Thanks,
> Richard.
>
>>> -static tree
>>> -fold_builtin_alloca_for_var (gimple stmt)
>>> +static bool
>>> +builtin_alloca_with_align_p (gimple stmt)
>>>  {
>>> -  unsigned HOST_WIDE_INT size, threshold, n_elem;
>>> -  tree lhs, arg, block, var, elem_type, array_type;
>>> -  unsigned int align;
>>> +  tree fndecl;
>>> +  if (!is_gimple_call (stmt))
>>> +    return false;
>>> +
>>> +  fndecl = gimple_call_fndecl (stmt);
>>> +
>>> +  return (fndecl != NULL_TREE
>>> +          && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN);
>>> +}
>>>
>>> equivalent to gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN).
>>>
>>> +  DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
>>>
>>> Now, users might call __builtin_alloca_with_align (8, align), thus use
>>> a non-constant alignment argument.
>>> You either need to reject this
>>> in c-family/c-common.c:check_builtin_function_arguments with an
>>> error or fall back to BIGGEST_ALIGNMENT (similarly during propagation
>>> I suppose).
>>>
>>> +DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align")
>>>
>>> Oh, I see you are not exposing this to users, so ignore the comments
>>> about non-constant align.
>>> Can you move this
>>> DEF down, near the other DEF_BUILTIN_STUBs and add a comment
>>> that this is used for VLAs?

Done.

>>>
>>> +                       build_int_cstu (size_type_node,
>>> +                                       DECL_ALIGN (parm)));
>>>
>>> size_int (DECL_ALIGN (parm)), similar in other places.
>>>

Done.

>>> Let's see if there are comments from others on this - which I like.
>>>
>>
>> Thanks,
>> - Tom
>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> - Tom
>>>>
>>

Also, the alloca_with_align (a,b) now appears in assembly as alloca (a, b), as
discussed in http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00307.html.

Bootstrapped and regtested (including ada) on x86_64.

OK for trunk?

Thanks,
- Tom

2011-10-05  Tom de Vries

	PR middle-end/50527
	* tree.c (build_common_builtin_nodes): Add local_define_builtin for
	BUILT_IN_ALLOCA_WITH_ALIGN.  Mark that BUILT_IN_ALLOCA_WITH_ALIGN
	can throw.
	* builtins.c (expand_builtin_alloca): Handle
	BUILT_IN_ALLOCA_WITH_ALIGN arglist.  Set align for
	BUILT_IN_ALLOCA_WITH_ALIGN.
	(expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	(is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	* tree-ssa-ccp.c (evaluate_stmt): Set align for
	BUILT_IN_ALLOCA_WITH_ALIGN.
	(fold_builtin_alloca_for_var): Rename to ...
	(fold_builtin_alloca_with_align): Set DECL_ALIGN from 2nd
	BUILT_IN_ALLOCA_WITH_ALIGN argument.
	(ccp_fold_stmt): Try folding BUILT_IN_ALLOCA_WITH_ALIGN using
	fold_builtin_alloca_with_align.
	(optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	* builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using
	DEF_BUILTIN_STUB.
	* ipa-pure-const.c (special_builtin_state): Handle
	BUILT_IN_ALLOCA_WITH_ALIGN.
	* tree-ssa-alias.c (ref_maybe_used_by_call_p_1)
	(call_may_clobber_ref_p_1): Same.
	* function.c (gimplify_parameters): Lower vla to
	BUILT_IN_ALLOCA_WITH_ALIGN.
	* gimplify.c (gimplify_vla_decl): Same.
	* cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN.
	* tree-mudflap.c (mf_xform_statements): Same.
	* tree-ssa-dce.c (mark_stmt_if_obviously_necessary)
	(mark_all_reaching_defs_necessary_1, propagate_necessity): Same.
	* varasm.c (incorporeal_function_p): Same.
	* tree-object-size.c (alloc_object_size): Same.
	* gimple.c (gimple_build_call_from_tree): Same.

	* gcc.dg/pr50527.c: New test.

Index: gcc/tree.c
===================================================================
--- gcc/tree.c	(revision 179210)
+++ gcc/tree.c	(working copy)
@@ -9483,9 +9483,18 @@ build_common_builtin_nodes (void)
 			    "alloca", ECF_MALLOC | ECF_NOTHROW | ECF_LEAF);
     }

+  ftype = build_function_type_list (ptr_type_node, size_type_node,
+				    size_type_node, NULL_TREE);
+  local_define_builtin ("__builtin_alloca_with_align", ftype,
+			BUILT_IN_ALLOCA_WITH_ALIGN, "alloca",
+			ECF_MALLOC | ECF_NOTHROW | ECF_LEAF);
+
   /* If we're checking the stack, `alloca' can throw.  */
   if (flag_stack_check)
-    TREE_NOTHROW (built_in_decls[BUILT_IN_ALLOCA]) = 0;
+    {
+      TREE_NOTHROW (built_in_decls[BUILT_IN_ALLOCA]) = 0;
+      TREE_NOTHROW (built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN]) = 0;
+    }

   ftype = build_function_type_list (void_type_node,
				     ptr_type_node, ptr_type_node,

Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	(revision 179210)
+++ gcc/builtins.c	(working copy)
@@ -4516,20 +4516,33 @@ expand_builtin_alloca (tree exp, bool ca
 {
   rtx op0;
   rtx result;
+  bool valid_arglist;
+  unsigned int align;
+  bool alloca_with_align = (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
+			    == BUILT_IN_ALLOCA_WITH_ALIGN);

   /* Emit normal call if marked not-inlineable.
      */
   if (CALL_CANNOT_INLINE_P (exp))
     return NULL_RTX;

-  if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
+  valid_arglist
+    = (alloca_with_align
+       ? validate_arglist (exp, INTEGER_TYPE, INTEGER_TYPE, VOID_TYPE)
+       : validate_arglist (exp, INTEGER_TYPE, VOID_TYPE));
+
+  if (!valid_arglist)
     return NULL_RTX;

   /* Compute the argument.  */
   op0 = expand_normal (CALL_EXPR_ARG (exp, 0));

+  /* Compute the alignment.  */
+  align = (alloca_with_align
+	   ? TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1))
+	   : BIGGEST_ALIGNMENT);
+
   /* Allocate the desired space.  */
-  result = allocate_dynamic_stack_space (op0, 0, BIGGEST_ALIGNMENT,
-					 cannot_accumulate);
+  result = allocate_dynamic_stack_space (op0, 0, align, cannot_accumulate);
   result = convert_memory_address (ptr_mode, result);

   return result;
@@ -5304,6 +5317,7 @@ expand_builtin (tree exp, rtx target, rt
       && !called_as_built_in (fndecl)
       && DECL_ASSEMBLER_NAME_SET_P (fndecl)
       && fcode != BUILT_IN_ALLOCA
+      && fcode != BUILT_IN_ALLOCA_WITH_ALIGN
       && fcode != BUILT_IN_FREE)
     return expand_call (exp, target, ignore);
@@ -5559,6 +5573,7 @@ expand_builtin (tree exp, rtx target, rt
       return XEXP (DECL_RTL (DECL_RESULT (current_function_decl)), 0);

     case BUILT_IN_ALLOCA:
+    case BUILT_IN_ALLOCA_WITH_ALIGN:
       /* If the allocation stems from the declaration of a variable-sized
	  object, it cannot accumulate.
	  */
       target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
@@ -13561,6 +13576,7 @@ is_inexpensive_builtin (tree decl)
       {
       case BUILT_IN_ABS:
       case BUILT_IN_ALLOCA:
+      case BUILT_IN_ALLOCA_WITH_ALIGN:
       case BUILT_IN_BSWAP32:
       case BUILT_IN_BSWAP64:
       case BUILT_IN_CLZ:

Index: gcc/tree-ssa-ccp.c
===================================================================
--- gcc/tree-ssa-ccp.c	(revision 179210)
+++ gcc/tree-ssa-ccp.c	(working copy)
@@ -1492,6 +1492,7 @@ evaluate_stmt (gimple stmt)
   tree simplified = NULL_TREE;
   ccp_lattice_t likelyvalue = likely_value (stmt);
   bool is_constant = false;
+  unsigned int align;

   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -1632,10 +1633,14 @@ evaluate_stmt (gimple stmt)
	      break;

	    case BUILT_IN_ALLOCA:
+	    case BUILT_IN_ALLOCA_WITH_ALIGN:
+	      align = (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN
+		       ? TREE_INT_CST_LOW (gimple_call_arg (stmt, 1))
+		       : BIGGEST_ALIGNMENT);
	      val.lattice_val = CONSTANT;
	      val.value = build_int_cst (TREE_TYPE (gimple_get_lhs (stmt)), 0);
	      val.mask = shwi_to_double_int
-			   (~(((HOST_WIDE_INT) BIGGEST_ALIGNMENT)
+			   (~(((HOST_WIDE_INT) align)
			      / BITS_PER_UNIT - 1));
	      break;

@@ -1685,15 +1690,15 @@ evaluate_stmt (gimple stmt)
   return val;
 }

-/* Detects a vla-related alloca with a constant argument.  Declares fixed-size
-   array and return the address, if found, otherwise returns NULL_TREE.  */
+/* Detects a __builtin_alloca_with_align with constant size argument.  Declares
+   fixed-size array and returns the address, if found, otherwise returns
+   NULL_TREE.  */

 static tree
-fold_builtin_alloca_for_var (gimple stmt)
+fold_builtin_alloca_with_align (gimple stmt)
 {
   unsigned HOST_WIDE_INT size, threshold, n_elem;
   tree lhs, arg, block, var, elem_type, array_type;
-  unsigned int align;

   /* Get lhs.  */
   lhs = gimple_call_lhs (stmt);
@@ -1709,10 +1714,10 @@ fold_builtin_alloca_for_var (gimple stmt

   size = TREE_INT_CST_LOW (arg);

-  /* Heuristic: don't fold large vlas.  */
+  /* Heuristic: don't fold large allocas.
      */
   threshold = (unsigned HOST_WIDE_INT)PARAM_VALUE (PARAM_LARGE_STACK_FRAME);
-  /* In case a vla is declared at function scope, it has the same lifetime as a
-     declared array, so we allow a larger size.  */
+  /* In case the alloca is located at function entry, it has the same lifetime
+     as a declared array, so we allow a larger size.  */
   block = gimple_block (stmt);
   if (!(cfun->after_inlining
	 && TREE_CODE (BLOCK_SUPERCONTEXT (block)) == FUNCTION_DECL))
@@ -1723,12 +1728,9 @@ fold_builtin_alloca_for_var (gimple stmt
   /* Declare array.  */
   elem_type = build_nonstandard_integer_type (BITS_PER_UNIT, 1);
   n_elem = size * 8 / BITS_PER_UNIT;
-  align = MIN (size * 8, BIGGEST_ALIGNMENT);
-  if (align < BITS_PER_UNIT)
-    align = BITS_PER_UNIT;
   array_type = build_array_type_nelts (elem_type, n_elem);
   var = create_tmp_var (array_type, NULL);
-  DECL_ALIGN (var) = align;
+  DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
   {
     struct ptr_info_def *pi = SSA_NAME_PTR_INFO (lhs);
     if (pi != NULL && !pi->pt.anything)
@@ -1813,12 +1815,12 @@ ccp_fold_stmt (gimple_stmt_iterator *gsi
       if (gimple_call_internal_p (stmt))
	 return false;

-      /* The heuristic of fold_builtin_alloca_for_var differs before and after
-	 inlining, so we don't require the arg to be changed into a constant
-	 for folding, but just to be constant.  */
-      if (gimple_call_alloca_for_var_p (stmt))
+      /* The heuristic of fold_builtin_alloca_with_align differs before and
+	 after inlining, so we don't require the arg to be changed into a
+	 constant for folding, but just to be constant.  */
+      if (gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN))
	 {
-	   tree new_rhs = fold_builtin_alloca_for_var (stmt);
+	   tree new_rhs = fold_builtin_alloca_with_align (stmt);
	   if (new_rhs)
	     {
	       bool res = update_call_from_tree (gsi, new_rhs);
@@ -2093,7 +2095,8 @@ optimize_stack_restore (gimple_stmt_iter
       if (!callee
	   || DECL_BUILT_IN_CLASS (callee) != BUILT_IN_NORMAL
	   /* All regular builtins are ok, just obviously not alloca.
	      */
-	   || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA)
+	   || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA
+	   || DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA_WITH_ALIGN)
	 return NULL_TREE;

       if (DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE)

Index: gcc/builtins.def
===================================================================
--- gcc/builtins.def	(revision 179210)
+++ gcc/builtins.def	(working copy)
@@ -738,6 +738,7 @@ DEF_BUILTIN_STUB (BUILT_IN_SETJMP_RECEIV
 /* Implementing variable sized local variables.  */
 DEF_BUILTIN_STUB (BUILT_IN_STACK_SAVE, "__builtin_stack_save")
 DEF_BUILTIN_STUB (BUILT_IN_STACK_RESTORE, "__builtin_stack_restore")
+DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_ALIGN, "__builtin_alloca_with_align")

 /* Object size checking builtins.  */
 DEF_GCC_BUILTIN (BUILT_IN_OBJECT_SIZE, "object_size", BT_FN_SIZE_CONST_PTR_INT,
		  ATTR_PURE_NOTHROW_LEAF_LIST)

Index: gcc/ipa-pure-const.c
===================================================================
--- gcc/ipa-pure-const.c	(revision 179210)
+++ gcc/ipa-pure-const.c	(working copy)
@@ -437,6 +437,7 @@ special_builtin_state (enum pure_const_s
       case BUILT_IN_RETURN:
       case BUILT_IN_UNREACHABLE:
       case BUILT_IN_ALLOCA:
+      case BUILT_IN_ALLOCA_WITH_ALIGN:
       case BUILT_IN_STACK_SAVE:
       case BUILT_IN_STACK_RESTORE:
       case BUILT_IN_EH_POINTER:

Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c	(revision 179210)
+++ gcc/tree-ssa-alias.c	(working copy)
@@ -1231,6 +1231,7 @@ ref_maybe_used_by_call_p_1 (gimple call,
	case BUILT_IN_MALLOC:
	case BUILT_IN_CALLOC:
	case BUILT_IN_ALLOCA:
+	case BUILT_IN_ALLOCA_WITH_ALIGN:
	case BUILT_IN_STACK_SAVE:
	case BUILT_IN_STACK_RESTORE:
	case BUILT_IN_MEMSET:
@@ -1513,6 +1514,7 @@ call_may_clobber_ref_p_1 (gimple call, a
	  return false;
	case BUILT_IN_STACK_SAVE:
	case BUILT_IN_ALLOCA:
+	case BUILT_IN_ALLOCA_WITH_ALIGN:
	case BUILT_IN_ASSUME_ALIGNED:
	  return false;
	/* Freeing memory kills the pointed-to memory.
	   More importantly

Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 179210)
+++ gcc/function.c	(working copy)
@@ -3638,8 +3638,10 @@ gimplify_parameters (void)
	      DECL_IGNORED_P (addr) = 0;
	      local = build_fold_indirect_ref (addr);

-	      t = built_in_decls[BUILT_IN_ALLOCA];
-	      t = build_call_expr (t, 1, DECL_SIZE_UNIT (parm));
+	      t = built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN];
+	      t = build_call_expr (t, 2, DECL_SIZE_UNIT (parm),
+				   size_int (DECL_ALIGN (parm)));
+
	      /* The call has been built for a variable-sized object.  */
	      CALL_ALLOCA_FOR_VAR_P (t) = 1;
	      t = fold_convert (ptr_type, t);

Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	(revision 179210)
+++ gcc/gimplify.c	(working copy)
@@ -1329,8 +1329,9 @@ gimplify_vla_decl (tree decl, gimple_seq
   SET_DECL_VALUE_EXPR (decl, t);
   DECL_HAS_VALUE_EXPR_P (decl) = 1;

-  t = built_in_decls[BUILT_IN_ALLOCA];
-  t = build_call_expr (t, 1, DECL_SIZE_UNIT (decl));
+  t = built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN];
+  t = build_call_expr (t, 2, DECL_SIZE_UNIT (decl),
+		       size_int (DECL_ALIGN (decl)));
   /* The call has been built for a variable-sized object.
      */
   CALL_ALLOCA_FOR_VAR_P (t) = 1;
   t = fold_convert (ptr_type, t);

Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	(revision 179210)
+++ gcc/cfgexpand.c	(working copy)
@@ -1858,7 +1858,8 @@ expand_call_stmt (gimple stmt)
   CALL_EXPR_RETURN_SLOT_OPT (exp) = gimple_call_return_slot_opt_p (stmt);
   if (decl
       && DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL
-      && DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA)
+      && (DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA
+	  || DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA_WITH_ALIGN))
     CALL_ALLOCA_FOR_VAR_P (exp) = gimple_call_alloca_for_var_p (stmt);
   else
     CALL_FROM_THUNK_P (exp) = gimple_call_from_thunk_p (stmt);

Index: gcc/tree-mudflap.c
===================================================================
--- gcc/tree-mudflap.c	(revision 179210)
+++ gcc/tree-mudflap.c	(working copy)
@@ -969,7 +969,9 @@ mf_xform_statements (void)
	      case GIMPLE_CALL:
		{
		  tree fndecl = gimple_call_fndecl (s);
-		  if (fndecl && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA))
+		  if (fndecl && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA
+				 || (DECL_FUNCTION_CODE (fndecl)
+				     == BUILT_IN_ALLOCA_WITH_ALIGN)))
		    gimple_call_set_cannot_inline (s, true);
		}
	      break;

Index: gcc/tree-ssa-dce.c
===================================================================
--- gcc/tree-ssa-dce.c	(revision 179210)
+++ gcc/tree-ssa-dce.c	(working copy)
@@ -308,6 +308,7 @@ mark_stmt_if_obviously_necessary (gimple
	  case BUILT_IN_MALLOC:
	  case BUILT_IN_CALLOC:
	  case BUILT_IN_ALLOCA:
+	  case BUILT_IN_ALLOCA_WITH_ALIGN:
	    return;

	  default:;
@@ -639,6 +640,7 @@ mark_all_reaching_defs_necessary_1 (ao_r
	  case BUILT_IN_MALLOC:
	  case BUILT_IN_CALLOC:
	  case BUILT_IN_ALLOCA:
+	  case BUILT_IN_ALLOCA_WITH_ALIGN:
	  case BUILT_IN_FREE:
	    return false;

@@ -890,6 +892,8 @@ propagate_necessity (struct edge_list *e
		|| DECL_FUNCTION_CODE (callee) == BUILT_IN_FREE
		|| DECL_FUNCTION_CODE (callee) == BUILT_IN_VA_END
		|| DECL_FUNCTION_CODE (callee) == BUILT_IN_ALLOCA
+		||
		   (DECL_FUNCTION_CODE (callee)
+		    == BUILT_IN_ALLOCA_WITH_ALIGN)
		|| DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_SAVE
		|| DECL_FUNCTION_CODE (callee) == BUILT_IN_STACK_RESTORE
		|| DECL_FUNCTION_CODE (callee) == BUILT_IN_ASSUME_ALIGNED))

Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 179210)
+++ gcc/varasm.c	(working copy)
@@ -2104,7 +2104,8 @@ incorporeal_function_p (tree decl)
       const char *name;

       if (DECL_BUILT_IN_CLASS (decl) == BUILT_IN_NORMAL
-	  && DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA)
+	  && (DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA
+	      || DECL_FUNCTION_CODE (decl) == BUILT_IN_ALLOCA_WITH_ALIGN))
	 return true;

       name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));

Index: gcc/tree-object-size.c
===================================================================
--- gcc/tree-object-size.c	(revision 179210)
+++ gcc/tree-object-size.c	(working copy)
@@ -411,6 +411,7 @@ alloc_object_size (const_gimple call, in
	/* fall through */
      case BUILT_IN_MALLOC:
      case BUILT_IN_ALLOCA:
+     case BUILT_IN_ALLOCA_WITH_ALIGN:
	arg1 = 0;
      default:
	break;

Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c	(revision 179210)
+++ gcc/gimple.c	(working copy)
@@ -374,7 +374,8 @@ gimple_build_call_from_tree (tree t)
   gimple_call_set_return_slot_opt (call, CALL_EXPR_RETURN_SLOT_OPT (t));
   if (fndecl
       && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
-      && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA)
+      && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA
+	  || DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN))
     gimple_call_set_alloca_for_var (call, CALL_ALLOCA_FOR_VAR_P (t));
   else
     gimple_call_set_from_thunk (call, CALL_FROM_THUNK_P (t));

Index: gcc/testsuite/gcc.dg/pr50527.c
===================================================================
--- /dev/null	(new file)
+++ gcc/testsuite/gcc.dg/pr50527.c	(revision 0)
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options
"-Os --param large-stack-frame=30" } */
+extern void abort (void);
+
+void __attribute__((noinline))
+bar (char *a)
+{
+}
+
+void __attribute__((noinline))
+foo (char *a, int b)
+{
+}
+
+void __attribute__((noinline))
+test_align (char *p, int aligned, unsigned int mask)
+{
+  int p_aligned = ((unsigned long int)p & mask) == 0;
+  if (aligned != p_aligned)
+    abort ();
+}
+
+int
+main ()
+{
+  const int kIterations = 4;
+  char results[kIterations];
+  int i;
+  unsigned int mask;
+
+  mask = 0xf;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x7;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x3;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x1;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+
+  bar (results);
+  for (i = 0; i < kIterations; i++)
+    foo ("%d ", results[i]);
+
+  return 0;
+}