From patchwork Mon Dec 13 20:35:35 2010
X-Patchwork-Submitter: "Fang, Changpeng"
X-Patchwork-Id: 75402
From: "Fang, Changpeng"
To: "gcc-patches@gcc.gnu.org"
Date: Mon, 13 Dec 2010 14:35:35 -0600
Subject: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops

Hi,

The attached patch adds logic to disable certain loop optimizations on
pre-/post-loops.

Some loop optimizations (auto-vectorization, loop unrolling, etc.) may peel
a few iterations of a loop to form pre- and/or post-loops for various
purposes (alignment, loop bounds, etc.).  Currently, the GCC loop optimizer
is unable to recognize that such loops roll only a few iterations, and it
still performs optimizations on them.  While this does not hurt performance
in general, it may significantly increase compilation time and code size
without any performance benefit.

This patch adds logic for the loop optimizer to recognize pre- and/or
post-loops and to disable prefetching, unswitching, and loop unrolling on
them.  On polyhedron with -Ofast -funroll-loops -march=amdfam10, the patch
reduces compilation time by 28% on average and binary size by 20% on
average (see the attached data).
Note that while the small speed improvement (0.5% on average) could be
noise, the code size reduction could possibly improve performance in some
cases (I-cache improvement?).

The patch passed bootstrap and GCC regression tests on
x86_64-unknown-linux-gnu.

Is it OK to commit to trunk?

Thanks,

Changpeng


Effect of the pre-/post-loop patch on polyhedron
option: gfortran -Ofast -funroll-loops -march=amdfam10

            compilation    code size    speed
            time           reduction    improvement
            reduction (%)  (%)          (%)
-----------------------------------------------------------
ac            -20.54        -17.15        0
aermod        -15.93        -10.15        2.51
air            -5.74         -5.45       -0.09
capacita      -31.35        -18.27        0.08
channel       -11.32        -10.24        1.22
doduc          -4.52         -6.12        0.82
fatigue       -34.51        -15.94        0
gas_dyn       -45.56        -28.66        2.31
induct         -3.1          -1.91        0.05
linpk         -25.55        -27.5         0.26
mdbx          -24.06        -19.74        1.27
nf            -60.85        -48.92       -0.77
protein       -44.73        -24.02       -0.19
rnflow        -50.55        -36.69        0.47
test_fpu      -52.49        -41.35        1.18
tfft          -24.83        -18.29        0.39
-----------------------------------------------------------
average       -28.48        -20.65        0.59

From e8636e80de4d6de8ba2dbc8f08bd2daddd02edc3 Mon Sep 17 00:00:00 2001
From: Changpeng Fang
Date: Mon, 13 Dec 2010 12:01:49 -0800
Subject: [PATCH] Don't perform certain loop optimizations on pre/post loops

	* basic-block.h (bb_flags): Add a new flag BB_PRE_POST_LOOP_HEADER.

	* cfg.c (clear_bb_flags): Keep BB_PRE_POST_LOOP_HEADER marker.

	* cfgloop.h (mark_pre_or_post_loop): New function declaration.
	(pre_or_post_loop_p): New function declaration.

	* loop-unroll.c (decide_unroll_runtime_iterations): Do not unroll
	a pre- or post-loop.

	* loop-unswitch.c (unswitch_single_loop): Do not unswitch a pre-
	or post-loop.

	* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Mark the
	post-loop.

	* tree-ssa-loop-niter.c (mark_pre_or_post_loop): Implement the new
	function.
	(pre_or_post_loop_p): Implement the new function.

	* tree-ssa-loop-prefetch.c (loop_prefetch_arrays): Don't prefetch
	a pre- or post-loop.
	* tree-ssa-loop-unswitch.c (tree_ssa_unswitch_loops): Do not
	unswitch a pre- or post-loop.

	* tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Mark
	the post-loop.
	(vect_do_peeling_for_alignment): Mark the pre-loop.
---
 gcc/basic-block.h            |    6 +++++-
 gcc/cfg.c                    |    7 ++++---
 gcc/cfgloop.h                |    2 ++
 gcc/loop-unroll.c            |    7 +++++++
 gcc/loop-unswitch.c          |    8 ++++++++
 gcc/tree-ssa-loop-manip.c    |    3 +++
 gcc/tree-ssa-loop-niter.c    |   20 ++++++++++++++++++++
 gcc/tree-ssa-loop-prefetch.c |    7 +++++++
 gcc/tree-ssa-loop-unswitch.c |    8 ++++++++
 gcc/tree-vect-loop-manip.c   |    8 ++++++++
 10 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/gcc/basic-block.h b/gcc/basic-block.h
index be0a1d1..78552fd 100644
--- a/gcc/basic-block.h
+++ b/gcc/basic-block.h
@@ -245,7 +245,11 @@ enum bb_flags
 
   /* Set on blocks that cannot be threaded through.
      Only used in cfgcleanup.c.  */
-  BB_NONTHREADABLE_BLOCK = 1 << 11
+  BB_NONTHREADABLE_BLOCK = 1 << 11,
+
+  /* Set on blocks that are headers of pre- or post-loops.  */
+  BB_PRE_POST_LOOP_HEADER = 1 << 12
+
 };
 
 /* Dummy flag for convenience in the hot/cold partitioning code.  */
diff --git a/gcc/cfg.c b/gcc/cfg.c
index c8ef799..e9b394a 100644
--- a/gcc/cfg.c
+++ b/gcc/cfg.c
@@ -425,8 +425,8 @@ redirect_edge_pred (edge e, basic_block new_pred)
   connect_src (e);
 }
 
-/* Clear all basic block flags, with the exception of partitioning and
-   setjmp_target.  */
+/* Clear all basic block flags, with the exception of partitioning,
+   setjmp_target, and the pre/post loop marker.  */
 void
 clear_bb_flags (void)
 {
@@ -434,7 +434,8 @@ clear_bb_flags (void)
 
   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
     bb->flags = (BB_PARTITION (bb)
-		 | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET)));
+		 | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET
+				 + BB_PRE_POST_LOOP_HEADER)));
 }
 
 /* Check the consistency of profile information.  We can't do that
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index bf2614e..ce848cc 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -279,6 +279,8 @@ extern rtx doloop_condition_get (rtx);
 void estimate_numbers_of_iterations_loop (struct loop *, bool);
 HOST_WIDE_INT estimated_loop_iterations_int (struct loop *, bool);
 bool estimated_loop_iterations (struct loop *, bool, double_int *);
+void mark_pre_or_post_loop (struct loop *);
+bool pre_or_post_loop_p (struct loop *);
 
 /* Loop manipulation.  */
 extern bool can_duplicate_loop_p (const struct loop *loop);
diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 67d6ea0..6f095f6 100644
--- a/gcc/loop-unroll.c
+++ b/gcc/loop-unroll.c
@@ -857,6 +857,13 @@ decide_unroll_runtime_iterations (struct loop *loop, int flags)
 	fprintf (dump_file, ";; Loop iterates constant times\n");
       return;
     }
+
+  if (pre_or_post_loop_p (loop))
+    {
+      if (dump_file)
+	fprintf (dump_file, ";; Not unrolling, a pre- or post-loop\n");
+      return;
+    }
 
   /* If we have profile feedback, check whether the loop rolls.  */
   if (loop->header->count
      && expected_loop_iterations (loop) < 2 * nunroll)
diff --git a/gcc/loop-unswitch.c b/gcc/loop-unswitch.c
index 77524d8..59373bf 100644
--- a/gcc/loop-unswitch.c
+++ b/gcc/loop-unswitch.c
@@ -276,6 +276,14 @@ unswitch_single_loop (struct loop *loop, rtx cond_checked, int num)
       return;
     }
 
+  /* Pre- or post-loops usually roll just a few iterations.  */
+  if (pre_or_post_loop_p (loop))
+    {
+      if (dump_file)
+	fprintf (dump_file, ";; Not unswitching, a pre- or post-loop\n");
+      return;
+    }
+
   /* We must be able to duplicate loop body.  */
   if (!can_duplicate_loop_p (loop))
     {
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 87b2c0d..f8ddbab 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -931,6 +931,9 @@ tree_transform_and_unroll_loop (struct loop *loop, unsigned factor,
   gcc_assert (new_loop != NULL);
   update_ssa (TODO_update_ssa);
 
+  /* NEW_LOOP is a post-loop.  */
+  mark_pre_or_post_loop (new_loop);
+
   /* Determine the probability of the exit edge of the unrolled loop.  */
   new_est_niter = est_niter / factor;
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index ee85f6f..33e8cc3 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -3011,6 +3011,26 @@ estimate_numbers_of_iterations (bool use_undefined_p)
   fold_undefer_and_ignore_overflow_warnings ();
 }
 
+/* Mark LOOP as a pre- or post-loop.  */
+
+void
+mark_pre_or_post_loop (struct loop *loop)
+{
+  gcc_assert (loop && loop->header);
+  loop->header->flags |= BB_PRE_POST_LOOP_HEADER;
+}
+
+/* Return true if LOOP is a pre- or post-loop.  */
+
+bool
+pre_or_post_loop_p (struct loop *loop)
+{
+  int masked_flags;
+  gcc_assert (loop && loop->header);
+  masked_flags = (loop->header->flags & BB_PRE_POST_LOOP_HEADER);
+  return (masked_flags != 0);
+}
+
 /* Returns true if statement S1 dominates statement S2.  */
 
 bool
diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index 59c65d3..5c9f640 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -1793,6 +1793,13 @@ loop_prefetch_arrays (struct loop *loop)
       return false;
     }
 
+  if (pre_or_post_loop_p (loop))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "  Not prefetching -- pre- or post-loop\n");
+      return false;
+    }
+
   /* FIXME: the time should be weighted by the probabilities of the blocks
      in the loop body.  */
   time = tree_num_loop_insns (loop, &eni_time_weights);
diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
index b6b32dc..f3b8108 100644
--- a/gcc/tree-ssa-loop-unswitch.c
+++ b/gcc/tree-ssa-loop-unswitch.c
@@ -88,6 +88,14 @@ tree_ssa_unswitch_loops (void)
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, ";; Considering loop %d\n", loop->num);
 
+      /* Do not unswitch a pre- or post-loop.  */
+      if (pre_or_post_loop_p (loop))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, ";; Not unswitching, a pre- or post-loop\n");
+	  continue;
+	}
+
       /* Do not unswitch in cold regions.  */
       if (optimize_loop_for_size_p (loop))
 	{
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 6ecd304..9a63f7e 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1938,6 +1938,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio,
 				    cond_expr, cond_expr_stmt_list);
   gcc_assert (new_loop);
   gcc_assert (loop_num == loop->num);
+
+  /* NEW_LOOP is a post-loop.  */
+  mark_pre_or_post_loop (new_loop);
+
 #ifdef ENABLE_CHECKING
   slpeel_verify_cfg_after_peeling (loop, new_loop);
 #endif
@@ -2191,6 +2195,10 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo)
 					    th, true, NULL_TREE, NULL);
 
   gcc_assert (new_loop);
+
+  /* NEW_LOOP is a pre-loop.  */
+  mark_pre_or_post_loop (new_loop);
+
 #ifdef ENABLE_CHECKING
   slpeel_verify_cfg_after_peeling (new_loop, loop);
 #endif
-- 
1.6.3.3