From patchwork Mon Nov 11 23:55:17 2013
X-Patchwork-Submitter: Marc Glisse
X-Patchwork-Id: 290523
Date: Tue, 12 Nov 2013 00:55:17 +0100 (CET)
From: Marc Glisse
To: Richard Biener
cc: GCC Patches
Subject: Re: [RFC] replace malloc with a decl on the stack

On Mon, 11 Nov 2013, Richard Biener wrote:

> On Sun, Nov 10, 2013 at 4:27 PM, Marc Glisse wrote:
>> Hello,
>>
>> I am posting this patch to get some feedback on the approach. The goal
>> is to replace malloc+free with a stack allocation (a decl actually)
>> when the size is a small constant.
>>
>> For testing, I hijacked the "leaf" attribute, but it isn't right; I'll
>> remove it from the list (not sure what I'll do for the testcases
>> then). What I'd want instead is a "returns" attribute meaning that the
>> function will return (either by "return" or an exception), as opposed
>> to having an infinite loop, calling exit or longjmp, etc.
>> (sched-deps.c has a related call_may_noreturn_p).
>> The situation I am trying to avoid is:
>>   p = malloc (12);
>>   f (p);
>>   free (p);
>> where f somewhere contains:
>>   free (p); exit (42);
>> (or longjmp, or something else that takes the regular call to free out
>> of the execution flow).
>
> Instead of a new attribute IPA should be used to compute this
> call-graph property.

Could be both, no?

Any idea where I could store this information? Space is so short that
pure and const don't even have their bits in the same place. Then there
is the question of exactly which properties should be looked at:
- may leave the function via longjmp
- must leave the function via return or an exception
- may free memory it didn't allocate itself
- ?

> An infinite loop wouldn't be an issue, but the real issue is that it
> shouldn't free the object, which would only be valid if the free after
> the call isn't reached.

So the infinite loop is an issue in exactly the same way calling exit is
an issue, right?

> IPA pure-const is used to compute similar properties (may loop, may
> throw).
>
> You also have to make sure to properly align the replacement.

Thanks; for some reason I lost this line when I copied from
fold_builtin_alloca_with_align.

>> The size above which the malloc->stack transformation is not applied
>> should depend on a parameter; I don't know if it should get its own or
>> reuse an existing one. In any case, I'd like to avoid ending up with a
>> ridiculously low threshold (my programs use GMP, which internally uses
>> alloca for up to 65536 bytes, even in recursive functions that have a
>> dozen allocations, so I don't want gcc to tell me that 50 bytes are
>> too much).
>>
>> A program with a double free may, with this patch, end up crashing on
>> the first free instead of the second, but that's an invalid program
>> anyway.
>>
>> I don't know if tree-ssa-forwprop is a good place for it, but it was
>> convenient for a prototype. I like the idea of running it several
>> times: replacing malloc with a decl early can help other
>> optimizations, and other optimizations can make this one possible
>> later.
>
> We have a similar transform in CCP (fold_builtin_alloca_with_align)
> which is there because CCP is a good place where size arguments may
> become constant (the case you handle - you don't seem to handle the
> variable malloc -> alloca transform, which would be possible if for
> example VRP figures out an acceptable upper bound for the size).

I added something for the case where VRP gives an upper bound on the
size, but I am still using a DECL and not alloca. Using alloca has some
drawbacks: it shouldn't be done inside a loop (unless we know the memory
is deallocated in the same iteration and we can save/restore the stack
à la VLA, but that gets complicated), it should be tried only in late
passes so as not to hinder inlining, etc. I guess I could try, but I
fear suitable bounds on the size of allocations will be quite rare.

I am currently having trouble removing the free calls. I am using
gsi_replace to put a clobber in their place (is that useful?), but I
also tried gsi_remove together with various combinations of
release_defs, unlink_stmt_vdef and update_stmt, without success. When I
compile heapstack-1.c, the sink pass crashes because it looks at the
uses of p=&array and ends up on __builtin_free(p), which was removed
several passes earlier and no longer has a bb. What is the right way to
get rid of those statements?
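For concreteness, here is a minimal source-level example of the pattern
that blocks the transformation (hypothetical code, not part of the
patch; f and g are made-up names):

  #include <stdlib.h>

  /* f frees its argument and then exits, so the free in g below is
     never reached.  p is freed exactly once, so the program is valid,
     but if g's malloc were replaced by a stack decl, f would end up
     freeing stack memory.  */
  void f (void *p)
  {
    free (p);
    exit (42);
  }

  void g (void)
  {
    void *p = malloc (12);
    f (p);
    free (p); /* dead, but only because f calls exit */
  }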
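And a rough before/after sketch of the VRP-bounded case (made-up
function names, and source-level only for illustration; the real
transformation works on GIMPLE):

  /* Before: VRP proves 0 <= s <= 3, so calloc (s, 4) needs at most
     12 bytes.  */
  void before (int s)
  {
    if (s < 0 || s > 3)
      __builtin_unreachable ();
    void *p = __builtin_calloc (s, 4);
    /* ... code that provably reaches the free on every path ... */
    __builtin_free (p);
  }

  /* After: a local array sized by the upper bound replaces the calloc.
     The whole bound is zeroed (a superset of what calloc guarantees),
     and the free becomes a clobber ending the array's lifetime.  */
  void after (int s)
  {
    if (s < 0 || s > 3)
      __builtin_unreachable ();
    unsigned char buf[12] __attribute__ ((aligned)); /* BIGGEST_ALIGNMENT */
    void *p = buf;
    __builtin_memset (p, 0, 12);
    /* ... */
  }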
Index: testsuite/g++.dg/tree-ssa/heapstack-1.C
===================================================================
--- testsuite/g++.dg/tree-ssa/heapstack-1.C	(revision 0)
+++ testsuite/g++.dg/tree-ssa/heapstack-1.C	(working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+struct A {
+  void*p;
+  A(void*q) : p(q) {}
+  ~A(){ __builtin_free(p); }
+};
+void f(void*)__attribute__((__leaf__));
+void h(void*)__attribute__((__leaf__,__nothrow__));
+void g(){
+  void*p=__builtin_malloc(12);
+  A a(p);
+  f(p);
+}
+
+void i(){
+  void*p=__builtin_malloc(12);
+  h(p);
+  __builtin_free(p);
+}
+
+/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */

Property changes on: testsuite/g++.dg/tree-ssa/heapstack-1.C
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property

Index: testsuite/g++.dg/tree-ssa/heapstack-2.C
===================================================================
--- testsuite/g++.dg/tree-ssa/heapstack-2.C	(revision 0)
+++ testsuite/g++.dg/tree-ssa/heapstack-2.C	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+void f(void*)__attribute__((__leaf__));
+void g(){
+  void*p=__builtin_malloc(12);
+  f(p);
+  __builtin_free(p);
+}
+
+/* { dg-final { scan-tree-dump-times "malloc" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */

Property changes on: testsuite/g++.dg/tree-ssa/heapstack-2.C
___________________________________________________________________
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property

Index: testsuite/gcc.dg/tree-ssa/heapstack-1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/heapstack-1.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/heapstack-1.c	(working copy)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+void* f(void*,int)__attribute__((__pure__));
+void * volatile q;
+void g(int m,int n,int s){
+  int i;
+  if (s<0||s>3)__builtin_unreachable();
+  void*p=__builtin_calloc(s,4);
+  switch(n){
+    case 1:
+      for (i=0; i[...]

[... rest of this testcase and the beginning of the gcc/tree-ssa-ccp.c
diff missing from the archived copy; it resumes mid-hunk in
ccp_finalize ...]

Index: gcc/tree-ssa-ccp.c
===================================================================
[...]
 	      nonzero_bits = nonzero_bits | tree_to_double_int (val->value);
 	      nonzero_bits &= get_nonzero_bits (name);
 	      set_nonzero_bits (name, nonzero_bits);
 	    }
 	}
 
   /* Perform substitutions based on the known constant values.  */
   something_changed = substitute_and_fold (get_constant_value,
 					   ccp_fold_stmt, true);
 
+  BITMAP_FREE (malloc_studied);
   free (const_val);
   const_val = NULL;
   return something_changed;;
 }
 
 /* Compute the meet operator between *VAL1 and *VAL2.  Store the result
    in VAL1.
 
 		any  M UNDEFINED   = any
@@ -1910,20 +1918,161 @@ fold_builtin_alloca_with_align (gimple s
 	  singleton_p = pt_solution_singleton_p (&pi->pt, &uid);
 	  gcc_assert (singleton_p);
 	  SET_DECL_PT_UID (var, uid);
 	}
     }
 
   /* Fold alloca to the address of the array.  */
   return fold_convert (TREE_TYPE (lhs), build_fold_addr_expr (var));
 }
 
+/* Returns a constant upper bound if possible, or the variable if not.  */
+static tree
+get_upper_bound (tree var)
+{
+  tree cst = get_constant_value (var);
+  if (cst)
+    return cst;
+  double_int min, max;
+  if (TREE_CODE (var) != SSA_NAME
+      || get_range_info (var, &min, &max) != VR_RANGE)
+    return var;
+  return double_int_to_tree (TREE_TYPE (var), max);
+}
+
+/* We want to be sure that free (VAR) can't be called in another function,
+   and the easiest way to ensure that is to prove that it is called in this
+   function.  The hardest part is avoiding some call to a function that may
+   not return because of exit, an infinite loop, setjmp, etc, where it might
+   call free on VAR.  */
+
+static bool
+malloc_safe_on_stack (gimple stmt, vec<gimple> *list_of_frees)
+{
+  tree var = gimple_call_lhs (stmt);
+  if (!bitmap_set_bit (malloc_studied, SSA_NAME_VERSION (var)))
+    return false;
+  basic_block bb = gimple_bb (stmt);
+  stack_vec<basic_block, ...> bb_to_visit;
+  bitmap bb_visited = BITMAP_ALLOC (NULL);
+  bitmap_set_bit (bb_visited, bb->index);
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  enum tree_code code;
+
+next_stmt:
+  gsi_next_nondebug (&gsi);
+
+handle_stmt:
+  if (gsi_end_p (gsi))
+    {
+      edge e;
+      edge_iterator ei;
+      FOR_EACH_EDGE (e, ei, bb->succs)
+	bb_to_visit.safe_push (e->dest);
+      goto next_bb;
+    }
+  stmt = gsi_stmt (gsi);
+  if (stmt_can_throw_external (stmt))
+    /* We could give up only if VAR has escaped (same for return), but that
+       means there is a memory leak, so don't bother.  */
+    goto unsafe;
+
+  switch (gimple_code (stmt))
+    // TODO: GIMPLE_ASM, EH related gimples?
+    {
+    /* We leave the function without calling free.  */
+    case GIMPLE_RETURN:
+      goto unsafe;
+
+    case GIMPLE_COND:
+      code = gimple_cond_code (stmt);
+      /* If we replace malloc by an array on the stack, it can't be NULL.  */
+      if ((code == EQ_EXPR || code == NE_EXPR)
+	  && var == gimple_cond_lhs (stmt)
+	  && integer_zerop (gimple_cond_rhs (stmt)))
+	{
+	  edge e;
+	  edge_iterator ei;
+	  FOR_EACH_EDGE (e, ei, bb->succs)
+	    if (e->flags
+		& ((code == NE_EXPR) ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE))
+	      bb_to_visit.safe_push (e->dest);
+	  goto next_bb;
+	}
+      /* Fallthrough.  */
+
+    /* Last stmt of the bb, handled by looking at the outgoing edges.  */
+    case GIMPLE_GOTO:
+    case GIMPLE_SWITCH:
+    /* Statements that are irrelevant.  */
+    case GIMPLE_ASSIGN:
+    case GIMPLE_LABEL:
+    case GIMPLE_NOP:
+      goto next_stmt;
+
+    case GIMPLE_CALL:
+      {
+	tree callee = gimple_call_fndecl (stmt);
+	if (callee != NULL_TREE
+	    && DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL)
+	  switch (DECL_FUNCTION_CODE (callee))
+	    {
+	    case BUILT_IN_FREE:
+	      if (gimple_call_arg (stmt, 0) != var)
+		goto next_stmt;
+	      list_of_frees->safe_push (stmt);
+	      /* Fallthrough.  */
+
+	    case BUILT_IN_ABORT:
+	    case BUILT_IN_EXIT:
+	    case BUILT_IN__EXIT:
+	    case BUILT_IN__EXIT2:
+	    case BUILT_IN_TRAP:
+	    case BUILT_IN_UNREACHABLE:
+	      /* Nothing could throw an exception earlier in the block,
+		 so we don't need to check the EH edges.  */
+	      goto next_bb;
+	    default:;
+	    }
+
+#if 0
+	// For Joost
+	goto next_stmt;
+#endif
+	int flags = gimple_call_flags (stmt);
+	// FIXME: leaf is actually not safe, we need some new ECF_* flags.
+	// ??? In fortran any call is safe?
+	if (flags & (ECF_CONST|ECF_PURE|ECF_LEAF|ECF_NOVOPS))
+	  goto next_stmt;
+	goto unsafe;
+      }
+    default:
+      goto unsafe;
+    }
+
+next_bb:
+  if (bb_to_visit.is_empty ())
+    goto safe;
+  bb = bb_to_visit.last ();
+  bb_to_visit.pop ();
+  if (!bitmap_set_bit (bb_visited, bb->index))
+    goto next_bb;
+  gsi = gsi_start_nondebug_bb (bb);
+  goto handle_stmt;
+
+unsafe:
+  BITMAP_FREE (bb_visited);
+  return false;
+safe:
+  BITMAP_FREE (bb_visited);
+  return true;
+}
+
 /* Fold the stmt at *GSI with CCP specific information that propagating
    and regular folding does not catch.  */
 
 static bool
 ccp_fold_stmt (gimple_stmt_iterator *gsi)
 {
   gimple stmt = gsi_stmt (*gsi);
 
   switch (gimple_code (stmt))
     {
@@ -1998,20 +2147,90 @@ ccp_fold_stmt (gimple_stmt_iterator *gsi
 	if (new_rhs)
 	  {
 	    bool res = update_call_from_tree (gsi, new_rhs);
 	    tree var = TREE_OPERAND (TREE_OPERAND (new_rhs, 0), 0);
 	    gcc_assert (res);
 	    insert_clobbers_for_var (*gsi, var);
 	    return true;
 	  }
       }
 
+      /* Replace malloc+free with a stack allocation.  */
+      if (gimple_call_builtin_p (stmt, BUILT_IN_FREE))
+	{
+	  bool is_calloc = false;
+	  tree ptr = gimple_call_arg (stmt, 0);
+	  if (TREE_CODE (ptr) != SSA_NAME)
+	    return false;
+	  gimple stmt1 = SSA_NAME_DEF_STMT (ptr);
+	  if ((is_calloc = gimple_call_builtin_p (stmt1, BUILT_IN_CALLOC))
+	      || gimple_call_builtin_p (stmt1, BUILT_IN_MALLOC))
+	    {
+	      gcc_checking_assert (gimple_call_lhs (stmt1) == ptr);
+	      tree size = gimple_call_arg (stmt1, 0);
+	      size = get_upper_bound (size);
+	      if (is_calloc)
+		{
+		  tree siz2 = gimple_call_arg (stmt1, 1);
+		  siz2 = get_upper_bound (siz2);
+		  size = fold_binary (MULT_EXPR, size_type_node, size, siz2);
+		}
+	      if (!size || !host_integerp (size, 1)
+		  /* param_value(...) / 10? Some new parameter? */
+		  || TREE_INT_CST_LOW (size)
+		     > (unsigned HOST_WIDE_INT)
+		       PARAM_VALUE (PARAM_LARGE_STACK_FRAME))
+		return false;
+	      stack_vec<gimple, ...> list_of_frees;
+	      if (!malloc_safe_on_stack (stmt1, &list_of_frees))
+		return false;
+	      /* Declare array.  */
+	      tree elem_type =
+		build_nonstandard_integer_type (BITS_PER_UNIT, 1);
+	      tree array_type =
+		build_array_type_nelts (elem_type, TREE_INT_CST_LOW (size));
+	      tree var = create_tmp_var (array_type, NULL);
+	      DECL_ALIGN (var) = BIGGEST_ALIGNMENT;
+	      tree p = fold_convert (TREE_TYPE (ptr),
+				     build_fold_addr_expr (var));
+	      /* Replace free with a clobber.  */
+	      int i;
+	      gimple free_stmt;
+	      FOR_EACH_VEC_ELT (list_of_frees, i, free_stmt)
+		{
+#if 0
+		  unlink_stmt_vdef (free_stmt);
+		  gimple_stmt_iterator gsi_free = gsi_for_stmt (free_stmt);
+		  gsi_remove (&gsi_free, true);
+		  release_defs (free_stmt);
+#else
+		  tree clobber = build_constructor (array_type, NULL);
+		  TREE_THIS_VOLATILE (clobber) = 1;
+		  gimple clobber_stmt = gimple_build_assign (var, clobber);
+		  gimple_stmt_iterator gsi_free = gsi_for_stmt (free_stmt);
+		  gsi_replace (&gsi_free, clobber_stmt, false);
+#endif
+		}
+	      /* Replace malloc with the address of the variable.  */
+	      gimple_stmt_iterator gsi1 = gsi_for_stmt (stmt1);
+	      update_call_from_tree (&gsi1, p);
+	      if (is_calloc)
+		{
+		  tree memset_decl = builtin_decl_explicit (BUILT_IN_MEMSET);
+		  gimple zero = gimple_build_call (memset_decl, 3, ptr,
+						   integer_zero_node, size);
+		  gsi_insert_after (&gsi1, zero, GSI_SAME_STMT);
+		}
+	      return true;
+	    }
+	}
+
       /* Propagate into the call arguments.  Compared to replace_uses_in
 	 this can use the argument slot types for type verification
 	 instead of the current argument type.  We also can safely
 	 drop qualifiers here as we are dealing with constants anyway.  */
       argt = TYPE_ARG_TYPES (gimple_call_fntype (stmt));
       for (i = 0; i < gimple_call_num_args (stmt) && argt;
 	   ++i, argt = TREE_CHAIN (argt))
 	{
 	  tree arg = gimple_call_arg (stmt, i);
 	  if (TREE_CODE (arg) == SSA_NAME