From patchwork Tue May 22 02:31:37 2012
X-Patchwork-Submitter: Aldy Hernandez
X-Patchwork-Id: 160525
Message-ID: <4FBAFA89.8050706@redhat.com>
Date: Mon, 21 May 2012 21:31:37 -0500
From: Aldy Hernandez
To: Richard Guenther
CC: Andrew MacLeod, "Boehm, Hans", gcc-patches, Torvald Riegel
Subject: Re: [PR tree-optimization/52558]: RFC: questions on store data race
References: <4F875324.7060908@redhat.com> <4F96E635.2040808@redhat.com>
 <4F9880C9.2070704@redhat.com> <4FA85518.4070106@redhat.com>

On 05/16/12 07:53, Richard Guenther wrote:
> On Mon, 7 May 2012, Aldy Hernandez wrote:

[Sorry for the delay; I was on vacation.]

I am forgoing the load avoidance code altogether to simplify things.
Thanks.

> + /* Emit the load code into the latch, so that we are sure it will
> +    be processed after all dependencies. */
> + latch_edge = loop_latch_edge (loop);
>
> inserting code into the latch block is bad - the vectorizer will
> be confused by non-empty latches.

The code as it is on trunk inserts the load on the latch; that is why
I also inserted the supporting flag-checking code there.  Do you want
me to put the supporting code somewhere else?

> Your ChangeLog mentions changes that are no longer necessary
> (the target hook).

Whoops.  Fixed.

> I didn't quite get the store order issue - we _do_ preserve store
> ordering right now, do we?  So how come introducing the flags
> change that?

The current scheme on trunk works because it inserts one load with
gsi_insert_on_edge () and subsequent ones get appended correctly,
whereas my patch has to split the edge (which happens at the top of
the block), so subsequent code insertions happen in reverse order (at
the top).  If I remember correctly, that is...  I can look again and
report back if it's still unclear.
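To make the ordering point concrete, here is a plain-C sketch
(illustrative variable names, not the GIMPLE the pass actually builds)
of what the flag-based sinking in the patch below is meant to produce
when two stores from the same loop are sunk to the same exit.  As the
?? comment in execute_sm_if_changed notes, the guarded stores have to
be appended in original program order because the references may
alias:

  /* Before: two conditional stores; *p and *q may refer to the same
     location.  */
  void
  before (int *p, int *q, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      {
	if (i & 1)
	  *p = i;		/* first store */
	if (i & 2)
	  *q = i;		/* second store */
      }
  }

  /* After flag-based store motion, conceptually: each sunk store is
     guarded by its own flag, and the guards appear in the original
     store order.  Swapping the two ifs below would change the result
     when p == q.  */
  void
  after (int *p, int *q, int n)
  {
    int i, p_lsm, q_lsm;
    _Bool p_lsm_flag = 0, q_lsm_flag = 0;

    p_lsm = *p;
    q_lsm = *q;
    for (i = 0; i < n; i++)
      {
	if (i & 1)
	  { p_lsm = i; p_lsm_flag = 1; }
	if (i & 2)
	  { q_lsm = i; q_lsm_flag = 1; }
      }
    if (p_lsm_flag)	/* appended first, keeping original order */
      *p = p_lsm;
    if (q_lsm_flag)
      *q = q_lsm;
  }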
New patch attached.  Tested on x86-64 Linux.  No regressions.

Thanks.
Aldy

	* cfg.c (alloc_aux_for_edge): Fix comment.
	(alloc_aux_for_edge): Remove static.
	* basic-block.h (alloc_aux_for_edge): Protoize.
	* tree-ssa-loop-im.c (execute_sm_if_changed): New.
	(execute_sm_if_changed_flag): New.
	(execute_sm_if_changed_flag_set): New.
	(execute_sm): Do not generate data races unless requested.
	(tree_ssa_lim_initialize): Call alloc_aux_for_edges.
	(tree_ssa_lim_finalize): Call free_aux_for_edges.
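For reference, here is roughly what the new testcases check for,
written out as source-level C.  The count_lsm/count_lsm_flag names
follow the _lsm/_lsm_flag naming convention that the scan-tree-dump
patterns match; the actual transformation of course happens on GIMPLE,
so this is only a sketch:

  int count;

  struct obj {
    int data;
    struct obj *next;
  } *q;

  /* func() from pr52558-1.c, approximately as it behaves after the
     flag-based store motion: the load is still hoisted, but the store
     back to `count' happens only if the loop actually stored to it.  */
  void
  func (void)
  {
    struct obj *p;
    int count_lsm = count;	/* load hoisted out of the loop */
    _Bool count_lsm_flag = 0;

    for (p = q; p; p = p->next)
      if (p->data > 0)
	{
	  count_lsm++;
	  count_lsm_flag = 1;	/* remember that a store happened */
	}

    if (count_lsm_flag)		/* store sunk to the exit, now guarded */
      count = count_lsm;
  }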
Index: testsuite/gcc.dg/tm/reg-promotion.c
===================================================================
--- testsuite/gcc.dg/tm/reg-promotion.c	(revision 0)
+++ testsuite/gcc.dg/tm/reg-promotion.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-fgnu-tm -O2 -fdump-tree-lim1" } */
+
+/* Test that `count' is not written to unless p->data>0.  */
+
+int count;
+
+struct obj {
+  int data;
+  struct obj *next;
+} *q;
+
+void func()
+{
+  struct obj *p;
+  __transaction_atomic {
+    for (p = q; p; p = p->next)
+      if (p->data > 0)
+	count++;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "MEM count_lsm.. count_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: testsuite/gcc.dg/pr52558-1.c
===================================================================
--- testsuite/gcc.dg/pr52558-1.c	(revision 0)
+++ testsuite/gcc.dg/pr52558-1.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "--param allow-store-data-races=0 -O2 -fdump-tree-lim1" } */
+
+/* Test that `count' is not written to unless p->data > 0.  */
+
+int count;
+
+struct obj {
+  int data;
+  struct obj *next;
+} *q;
+
+void func()
+{
+  struct obj *p;
+  for (p = q; p; p = p->next)
+    if (p->data > 0)
+      count++;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM count_lsm.. count_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: testsuite/gcc.dg/pr52558-2.c
===================================================================
--- testsuite/gcc.dg/pr52558-2.c	(revision 0)
+++ testsuite/gcc.dg/pr52558-2.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "--param allow-store-data-races=0 -O2 -fdump-tree-lim1" } */
+
+/* Test that g_2 is not written to unless !g_1.  */
+
+int g_1 = 1;
+int g_2 = 0;
+
+int func_1(void)
+{
+  int l;
+  for (l = 0; l < 1234; l++)
+    {
+      if (g_1)
+	return l;
+      else
+	g_2 = 0;
+    }
+  return 999;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM.*g_2_lsm_flag" 1 "lim1" } } */
+/* { dg-final { cleanup-tree-dump "lim1" } } */
Index: basic-block.h
===================================================================
--- basic-block.h	(revision 187729)
+++ basic-block.h	(working copy)
@@ -802,6 +802,7 @@ extern basic_block alloc_block (void);
 extern void alloc_aux_for_blocks (int);
 extern void clear_aux_for_blocks (void);
 extern void free_aux_for_blocks (void);
+extern void alloc_aux_for_edge (edge, int);
 extern void alloc_aux_for_edges (int);
 extern void clear_aux_for_edges (void);
 extern void free_aux_for_edges (void);
Index: tree-ssa-loop-im.c
===================================================================
--- tree-ssa-loop-im.c	(revision 187729)
+++ tree-ssa-loop-im.c	(working copy)
@@ -52,7 +52,7 @@ along with GCC; see the file COPYING3.
 	 }
      }
 
-   Where COND and INV are is invariants, but evaluating INV may trap or be
+   Where COND and INV are invariants, but evaluating INV may trap or be
    invalid from some other reason if !COND.  This may be transformed to
 
    if (cond)
@@ -1626,6 +1626,7 @@ gather_mem_refs_stmt (struct loop *loop,
 	  fprintf (dump_file, "\n");
 	}
     }
+
   if (is_stored)
     mark_ref_stored (ref, loop);
 
@@ -1956,6 +1957,173 @@ get_lsm_tmp_name (tree ref, unsigned n)
   return lsm_tmp_name;
 }
 
+struct prev_flag_edges {
+  /* Edge to insert new flag comparison code.  */
+  edge append_cond_position;
+
+  /* Edge for fall through from previous flag comparison.  */
+  edge last_cond_fallthru;
+};
+
+/* Helper function for execute_sm.  Emit code to store TMP_VAR into
+   MEM along edge EX.
+
+   The store is only done if MEM has changed.  We do this so no
+   changes to MEM occur on code paths that did not originally store
+   into it.
+
+   The common case for execute_sm will transform:
+
+     for (...) {
+       if (foo)
+         stuff;
+       else
+         MEM = TMP_VAR;
+     }
+
+   into:
+
+     lsm = MEM;
+     for (...) {
+       if (foo)
+         stuff;
+       else
+         lsm = TMP_VAR;
+     }
+     MEM = lsm;
+
+   This function will generate:
+
+     lsm = MEM;
+
+     lsm_flag = false;
+     ...
+     for (...) {
+       if (foo)
+         stuff;
+       else {
+         lsm = TMP_VAR;
+         lsm_flag = true;
+       }
+     }
+     if (lsm_flag)	<--
+       MEM = lsm;	<--
+*/
+
+static void
+execute_sm_if_changed (edge ex, tree mem, tree tmp_var, tree flag)
+{
+  basic_block new_bb, then_bb, old_dest;
+  bool loop_has_only_one_exit;
+  edge then_old_edge, orig_ex = ex;
+  gimple_stmt_iterator gsi;
+  gimple stmt;
+  struct prev_flag_edges *prev_edges = (struct prev_flag_edges *) ex->aux;
+
+  /* ?? Insert store after previous store if applicable.  See note
+     below.  */
+  if (prev_edges)
+    ex = prev_edges->append_cond_position;
+
+  loop_has_only_one_exit = single_pred_p (ex->dest);
+
+  if (loop_has_only_one_exit)
+    ex = split_block_after_labels (ex->dest);
+
+  old_dest = ex->dest;
+  new_bb = split_edge (ex);
+  then_bb = create_empty_bb (new_bb);
+  if (current_loops && new_bb->loop_father)
+    add_bb_to_loop (then_bb, new_bb->loop_father);
+
+  gsi = gsi_start_bb (new_bb);
+  stmt = gimple_build_cond (NE_EXPR, flag, boolean_false_node,
+			    NULL_TREE, NULL_TREE);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+  gsi = gsi_start_bb (then_bb);
+  /* Insert actual store.  */
+  stmt = gimple_build_assign (unshare_expr (mem), tmp_var);
+  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+  make_edge (new_bb, then_bb, EDGE_TRUE_VALUE);
+  make_edge (new_bb, old_dest, EDGE_FALSE_VALUE);
+  then_old_edge = make_edge (then_bb, old_dest, EDGE_FALLTHRU);
+
+  set_immediate_dominator (CDI_DOMINATORS, then_bb, new_bb);
+
+  if (prev_edges)
+    {
+      basic_block prevbb = prev_edges->last_cond_fallthru->src;
+      redirect_edge_succ (prev_edges->last_cond_fallthru, new_bb);
+      set_immediate_dominator (CDI_DOMINATORS, new_bb, prevbb);
+      set_immediate_dominator (CDI_DOMINATORS, old_dest,
+			       recompute_dominator (CDI_DOMINATORS, old_dest));
+    }
+
+  /* ?? Because stores may alias, they must happen in the exact
+     sequence they originally happened.  Save the position right after
+     the (_lsm) store we just created so we can continue appending after
+     it and maintain the original order.  */
+  {
+    struct prev_flag_edges *p;
+
+    if (orig_ex->aux)
+      orig_ex->aux = NULL;
+    alloc_aux_for_edge (orig_ex, sizeof (struct prev_flag_edges));
+    p = (struct prev_flag_edges *) orig_ex->aux;
+    p->append_cond_position = then_old_edge;
+    p->last_cond_fallthru = find_edge (new_bb, old_dest);
+    orig_ex->aux = (void *) p;
+  }
+
+  if (!loop_has_only_one_exit)
+    for (gsi = gsi_start_phis (old_dest); !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gimple phi = gsi_stmt (gsi);
+	unsigned i;
+
+	for (i = 0; i < gimple_phi_num_args (phi); i++)
+	  if (gimple_phi_arg_edge (phi, i)->src == new_bb)
+	    {
+	      tree arg = gimple_phi_arg_def (phi, i);
+	      add_phi_arg (phi, arg, then_old_edge, UNKNOWN_LOCATION);
+	      update_stmt (phi);
+	    }
+      }
+  /* Remove the original fall through edge.  This was the
+     single_succ_edge (new_bb).  */
+  EDGE_SUCC (new_bb, 0)->flags &= ~EDGE_FALLTHRU;
+}
+
+/* Helper function for execute_sm.  On every location where REF is
+   set, set an appropriate flag indicating the store.  */
+
+static tree
+execute_sm_if_changed_flag_set (struct loop *loop, mem_ref_p ref)
+{
+  unsigned i;
+  mem_ref_loc_p loc;
+  tree flag;
+  VEC (mem_ref_loc_p, heap) *locs = NULL;
+  char *str = get_lsm_tmp_name (ref->mem, ~0);
+
+  lsm_tmp_name_add ("_flag");
+  flag = make_rename_temp (boolean_type_node, str);
+  get_all_locs_in_loop (loop, ref, &locs);
+  FOR_EACH_VEC_ELT (mem_ref_loc_p, locs, i, loc)
+    {
+      gimple_stmt_iterator gsi;
+      gimple stmt;
+
+      gsi = gsi_for_stmt (loc->stmt);
+      stmt = gimple_build_assign (flag, boolean_true_node);
+      gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+    }
+  VEC_free (mem_ref_loc_p, heap, locs);
+  return flag;
+}
+
 /* Executes store motion of memory reference REF from LOOP.
    Exits from the LOOP are stored in EXITS.  The initialization of the
    temporary variable is put to the preheader of the loop, and assignments
@@ -1964,12 +2132,16 @@ get_lsm_tmp_name (tree ref, unsigned n)
 static void
 execute_sm (struct loop *loop, VEC (edge, heap) *exits, mem_ref_p ref)
 {
-  tree tmp_var;
+  tree tmp_var, store_flag;
   unsigned i;
-  gimple load, store;
+  gimple load;
   struct fmt_data fmt_data;
-  edge ex;
+  edge ex, latch_edge;
   struct lim_aux_data *lim_data;
+  bool multi_threaded_model_p = false;
+  /* ?? FIXME TESTING TESTING ?? */
+  multi_threaded_model_p=true;
+  /* ?? FIXME TESTING TESTING ?? */
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -1985,23 +2157,47 @@ execute_sm (struct loop *loop, VEC (edge
   fmt_data.orig_loop = loop;
   for_each_index (&ref->mem, force_move_till, &fmt_data);
 
+  if ((flag_tm && loop_preheader_edge (loop)->src->flags & BB_IN_TRANSACTION)
+      || !PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES))
+    multi_threaded_model_p = true;
+
+  if (multi_threaded_model_p)
+    store_flag = execute_sm_if_changed_flag_set (loop, ref);
+
   rewrite_mem_refs (loop, ref, tmp_var);
 
-  /* Emit the load & stores.  */
+  /* Emit the load code into the latch, so that we are sure it will
+     be processed after all dependencies.  */
+  latch_edge = loop_latch_edge (loop);
+
+  /* FIXME/TODO: For the multi-threaded variant, we could avoid this
+     load altogether, since the store is predicated by a flag.  We
+     could do the load only if it was originally in the loop.  */
   load = gimple_build_assign (tmp_var, unshare_expr (ref->mem));
   lim_data = init_lim_data (load);
   lim_data->max_loop = loop;
   lim_data->tgt_loop = loop;
+  gsi_insert_on_edge (latch_edge, load);
 
-  /* Put this into the latch, so that we are sure it will be processed after
-     all dependencies.  */
-  gsi_insert_on_edge (loop_latch_edge (loop), load);
-
-  FOR_EACH_VEC_ELT (edge, exits, i, ex)
+  if (multi_threaded_model_p)
     {
-      store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
-      gsi_insert_on_edge (ex, store);
+      load = gimple_build_assign (store_flag, boolean_false_node);
+      lim_data = init_lim_data (load);
+      lim_data->max_loop = loop;
+      lim_data->tgt_loop = loop;
+      gsi_insert_on_edge (latch_edge, load);
     }
+
+  /* Sink the store to every exit from the loop.  */
+  FOR_EACH_VEC_ELT (edge, exits, i, ex)
+    if (!multi_threaded_model_p)
+      {
+	gimple store;
+	store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
+	gsi_insert_on_edge (ex, store);
+      }
+    else
+      execute_sm_if_changed (ex, ref->mem, tmp_var, store_flag);
 }
 
 /* Hoists memory references MEM_REFS out of LOOP.  EXITS is the list of exit
@@ -2410,6 +2606,8 @@ tree_ssa_lim_initialize (void)
 
   if (flag_tm)
     compute_transaction_bits ();
+
+  alloc_aux_for_edges (0);
 }
 
 /* Cleans up after the invariant motion pass.  */
 
@@ -2421,6 +2619,8 @@ tree_ssa_lim_finalize (void)
   unsigned i;
   bitmap b;
 
+  free_aux_for_edges ();
+
   FOR_EACH_BB (bb)
     SET_ALWAYS_EXECUTED_IN (bb, NULL);
Index: cfg.c
===================================================================
--- cfg.c	(revision 187729)
+++ cfg.c	(working copy)
@@ -814,10 +814,10 @@ free_aux_for_blocks (void)
   clear_aux_for_blocks ();
 }
 
-/* Allocate a memory edge of SIZE as BB->aux.  The obstack must
+/* Allocate a memory edge of SIZE as E->aux.  The obstack must
    be first initialized by alloc_aux_for_edges.  */
 
-static void
+void
 alloc_aux_for_edge (edge e, int size)
 {
   /* Verify that aux field is clear.  */
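
For completeness, here is a minimal sketch of the kind of race this
avoids (a hypothetical driver; it is not part of the patch or the
testsuite).  With an unconditional sunk store, thread A writes `count'
on loop exit even though the abstract machine never stores to it, and
can clobber thread B's update; with the flag-guarded store it cannot:

  #include <pthread.h>
  #include <stdio.h>

  int count;

  struct obj { int data; struct obj *next; } *q;

  /* Runs the loop from pr52558-1.c; with all data <= 0 the abstract
     machine never writes `count'.  */
  static void *
  thread_a (void *arg)
  {
    struct obj *p;
    for (p = q; p; p = p->next)
      if (p->data > 0)
	count++;
    return 0;
  }

  static void *
  thread_b (void *arg)
  {
    count = 42;	/* can be lost if A stores `count' unconditionally */
    return 0;
  }

  int
  main (void)
  {
    static struct obj nodes[2] = { { -1, &nodes[1] }, { -1, 0 } };
    pthread_t a, b;

    q = &nodes[0];
    pthread_create (&a, 0, thread_a, 0);
    pthread_create (&b, 0, thread_b, 0);
    pthread_join (a, 0);
    pthread_join (b, 0);

    /* Must print 42 when no store data race is introduced.  */
    printf ("count = %d\n", count);
    return 0;
  }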