From patchwork Mon Sep 9 17:46:07 2013
X-Patchwork-Submitter: Wei Mi
X-Patchwork-Id: 273626
Date: Mon, 9 Sep 2013 10:46:07 -0700
Subject: Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
From: Wei Mi
To: Alexander Monakov
Cc: Steven Bosscher, GCC Patches, David Li

Add a testcase. bootstrap and regression ok for the patch in last mail.

2013-09-09  Wei Mi

        * gcc/testsuite/gcc.dg/macro-fusion-1.c: New.

On Fri, Sep 6, 2013 at 10:39 AM, Wei Mi wrote:
> SCHED_GROUP works after I add chain_to_prev_insn after
> add_branch_dependences, in order to chain control dependences to prev
> insn for sched group. Here is the new patch. Testing is going on.
>
> Thanks,
> Wei Mi.
>
> 2013-09-06  Wei Mi
>
>         * config/i386/i386.c (ix86_macro_fusion_p): New function.
>         (ix86_macro_fusion_pair_p): Ditto.
>         * config/i386/x86-tune.def (DEF_TUNE): Add m_COREI7 for
>         X86_TUNE_FUSE_CMP_AND_BRANCH.
>         * sched-deps.c (group_insns_for_macro_fusion): New function.
>         (sched_analyze_insn): Call group_insns_for_macro_fusion.
>         (chain_to_prev_insn): Change it from static to extern.
>         (chain_to_prev_insn_p): Ditto.
>         * doc/tm.texi: Generated.
>         * doc/tm.texi.in: Ditto.
>         * sched-int.h: New declarations.
>         * sched-rgn.c (add_branch_dependences): Chain control
>         dependences to prev insn for sched group.
>         * target.def: Add macro_fusion_p and macro_fusion_pair_p.
>
> Index: config/i386/i386.c
> ===================================================================
> --- config/i386/i386.c (revision 201963)
> +++ config/i386/i386.c (working copy)
> @@ -24850,6 +24850,99 @@ ia32_multipass_dfa_lookahead (void)
>      }
>  }
>
> +/* Return true if target platform supports macro-fusion. */
> +
> +static bool
> +ix86_macro_fusion_p ()
> +{
> +  if (TARGET_FUSE_CMP_AND_BRANCH)
> +    return true;
> +  else
> +    return false;
> +}
> +
> +/* Check whether current microarchitecture support macro fusion
> +   for insn pair "CONDGEN + CONDJMP". Refer to
> +   "Intel Architectures Optimization Reference Manual". */
> +
> +static bool
> +ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp)
> +{
> +  rtx src;
> +  if (!strcmp (ix86_tune_string, "corei7"))
> +    {
> +      /* For Nehalem. */
> +      rtx single_set = single_set (condgen);
> +      /* Nehalem doesn't support macro-fusion for add/sub+jmp. */
> +      if (single_set == NULL_RTX)
> +        return false;
> +
> +      src = SET_SRC (single_set);
> +      if (GET_CODE (src) != COMPARE)
> +        return false;
> +
> +      /* Nehalem doesn't support macro-fusion for cmp/test MEM-IMM
> +         insn pattern. */
> +      if ((MEM_P (XEXP (src, 0))
> +           && CONST_INT_P (XEXP (src, 1)))
> +          || (MEM_P (XEXP (src, 1))
> +              && CONST_INT_P (XEXP (src, 0))))
> +        return false;
> +
> +      /* Nehalem doesn't support macro-fusion for add/sub/dec/inc + jmp. */
> +      if (get_attr_type (condgen) != TYPE_TEST
> +          && get_attr_type (condgen) != TYPE_ICMP)
> +        return false;
> +      return true;
> +    }
> +  else if (!strcmp (ix86_tune_string, "corei7-avx"))
> +    {
> +      /* For Sandybridge. */
> +      enum rtx_code ccode;
> +      rtx compare_set = NULL_RTX, test_if, cond;
> +      rtx single_set = single_set (condgen);
> +      if (single_set != NULL_RTX)
> +        compare_set = single_set;
> +      else
> +        {
> +          int i;
> +          rtx pat = PATTERN (condgen);
> +          for (i = 0; i < XVECLEN (pat, 0); i++)
> +            if (GET_CODE (XVECEXP (pat, 0, i)) == SET
> +                && GET_CODE (SET_SRC (XVECEXP (pat, 0, i))) == COMPARE)
> +              compare_set = XVECEXP (pat, 0, i);
> +        }
> +
> +      if (compare_set == NULL_RTX)
> +        return false;
> +      src = SET_SRC (compare_set);
> +      if (GET_CODE (src) != COMPARE)
> +        return false;
> +
> +      /* Sandybridge doesn't support macro-fusion for cmp/test MEM-IMM
> +         insn pattern. */
> +      if ((MEM_P (XEXP (src, 0))
> +           && CONST_INT_P (XEXP (src, 1)))
> +          || (MEM_P (XEXP (src, 1))
> +              && CONST_INT_P (XEXP (src, 0))))
> +        return false;
> +
> +      /* Sandybridge doesn't support macro-fusion for inc/dec +
> +         unsigned comparison jmp. */
> +      test_if = SET_SRC (pc_set (condjmp));
> +      cond = XEXP (test_if, 0);
> +      ccode = GET_CODE (cond);
> +      if (get_attr_type (condgen) == TYPE_INCDEC
> +          && (ccode == GEU
> +              || ccode == GTU
> +              || ccode == LEU
> +              || ccode == LTU))
> +        return false;
> +      return true;
> +    }
> +  return false;
> +}
> +
>  /* Try to reorder ready list to take advantage of Atom pipelined IMUL
>     execution. It is applied if
>     (1) IMUL instruction is on the top of list;
> @@ -42982,6 +43075,10 @@ ix86_memmodel_check (unsigned HOST_WIDE_
>  #undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD
>  #define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD \
>    ia32_multipass_dfa_lookahead
> +#undef TARGET_SCHED_MACRO_FUSION_P
> +#define TARGET_SCHED_MACRO_FUSION_P ix86_macro_fusion_p
> +#undef TARGET_SCHED_MACRO_FUSION_PAIR_P
> +#define TARGET_SCHED_MACRO_FUSION_PAIR_P ix86_macro_fusion_pair_p
>
>  #undef TARGET_FUNCTION_OK_FOR_SIBCALL
>  #define TARGET_FUNCTION_OK_FOR_SIBCALL ix86_function_ok_for_sibcall
> Index: config/i386/x86-tune.def
> ===================================================================
> --- config/i386/x86-tune.def (revision 201963)
> +++ config/i386/x86-tune.def (working copy)
> @@ -196,7 +196,8 @@ DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS,
>  /* X86_TUNE_FUSE_CMP_AND_BRANCH: Fuse a compare or test instruction
>     with a subsequent conditional jump instruction into a single
>     compare-and-branch uop. */
> -DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH, "fuse_cmp_and_branch", m_BDVER)
> +DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH, "fuse_cmp_and_branch",
> +          m_COREI7 | m_BDVER)
>  /* X86_TUNE_OPT_AGU: Optimize for Address Generation Unit. This flag
>     will impact LEA instruction selection. */
>  DEF_TUNE (X86_TUNE_OPT_AGU, "opt_agu", m_ATOM | m_SLM)
> Index: sched-deps.c
> ===================================================================
> --- sched-deps.c (revision 201963)
> +++ sched-deps.c (working copy)
> @@ -487,7 +487,6 @@ static void add_dependence_list (rtx, rt
>  static void add_dependence_list_and_free (struct deps_desc *, rtx,
>                                            rtx *, int, enum reg_note, bool);
>  static void delete_all_dependences (rtx);
> -static void chain_to_prev_insn (rtx);
>
>  static void flush_pending_lists (struct deps_desc *, rtx, int, int);
>  static void sched_analyze_1 (struct deps_desc *, rtx, rtx);
> @@ -1660,7 +1659,7 @@ delete_all_dependences (rtx insn)
>     chains backwards. Then we add the dependencies for the group to
>     the previous nonnote insn. */
>
> -static void
> +void
>  chain_to_prev_insn (rtx insn)
>  {
>    sd_iterator_def sd_it;
> @@ -2821,6 +2820,35 @@ sched_analyze_2 (struct deps_desc *deps,
>      sched_deps_info->finish_rhs ();
>  }
>
> +/* If the last cond jump and the cond register defining insn are consecutive
> +   before scheduling, we want them to be in a schedule group. This is good
> +   for performance on microarchitectures supporting macro-fusion. */
> +
> +static void
> +group_insns_for_macro_fusion (rtx insn)
> +{
> +  unsigned int condreg1, condreg2;
> +  rtx cc_reg_1;
> +  rtx prev;
> +
> +  targetm.fixed_condition_code_regs (&condreg1, &condreg2);
> +  cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
> +  prev = prev_nonnote_nondebug_insn (insn);
> +  if (!any_condjump_p (insn)
> +      || !reg_referenced_p (cc_reg_1, PATTERN (insn))
> +      || !prev
> +      || !modified_in_p (cc_reg_1, prev))
> +    return;
> +
> +  /* Different microarchitectures support macro fusions for different
> +     combinations of insn pairs. */
> +  if (!targetm.sched.macro_fusion_pair_p
> +      || !targetm.sched.macro_fusion_pair_p (prev, insn))
> +    return;
> +
> +  SCHED_GROUP_P (insn) = 1;
> +}
> +
>  /* Analyze an INSN with pattern X to find all dependencies. */
>  static void
>  sched_analyze_insn (struct deps_desc *deps, rtx x, rtx insn)
> @@ -2844,6 +2872,10 @@ sched_analyze_insn (struct deps_desc *de
>    can_start_lhs_rhs_p = (NONJUMP_INSN_P (insn)
>                           && code == SET);
>
> +  if (targetm.sched.macro_fusion_p
> +      && targetm.sched.macro_fusion_p ())
> +    group_insns_for_macro_fusion (insn);
> +
>    if (may_trap_p (x))
>      /* Avoid moving trapping instructions across function calls that might
>         not always return. */
> @@ -3504,7 +3536,7 @@ call_may_noreturn_p (rtx insn)
>     group, and if all INSN's dependencies should be moved to the first
>     instruction of that group. */
>
> -static bool
> +bool
>  chain_to_prev_insn_p (rtx insn)
>  {
>    rtx prev, x;
> Index: doc/tm.texi
> ===================================================================
> --- doc/tm.texi (revision 201963)
> +++ doc/tm.texi (working copy)
> @@ -6553,6 +6553,17 @@ scheduling one insn causes other insns t
>  cycle. These other insns can then be taken into account properly.
>  @end deftypefn
>
> +@deftypefn {Target Hook} bool TARGET_SCHED_MACRO_FUSION_P (void)
> +This hook is used to check whether target platform supports macro fusion.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} bool TARGET_SCHED_MACRO_FUSION_PAIR_P (rtx @var{condgen}, rtx @var{condjmp})
> +This hook is used to check whether two insns could be macro fused for
> +target microarchitecture. If this hook returns true for the given insn pair
> +(@var{condgen} and @var{condjmp}), scheduler will put them into a sched
> +group, and they will not be scheduled apart.
> +@end deftypefn
> +
>  @deftypefn {Target Hook} void TARGET_SCHED_DEPENDENCIES_EVALUATION_HOOK (rtx @var{head}, rtx @var{tail})
>  This hook is called after evaluation forward dependencies of insns in
>  chain given by two parameter values (@var{head} and @var{tail}
> Index: doc/tm.texi.in
> ===================================================================
> --- doc/tm.texi.in (revision 201963)
> +++ doc/tm.texi.in (working copy)
> @@ -4940,6 +4940,10 @@ them: try the first ones in this list fi
>
>  @hook TARGET_SCHED_REORDER2
>
> +@hook TARGET_SCHED_MACRO_FUSION_P
> +
> +@hook TARGET_SCHED_MACRO_FUSION_PAIR_P
> +
>  @hook TARGET_SCHED_DEPENDENCIES_EVALUATION_HOOK
>
>  @hook TARGET_SCHED_INIT
> Index: sched-int.h
> ===================================================================
> --- sched-int.h (revision 201963)
> +++ sched-int.h (working copy)
> @@ -1302,6 +1302,8 @@ extern void finish_deps_global (void);
>  extern void deps_analyze_insn (struct deps_desc *, rtx);
>  extern void remove_from_deps (struct deps_desc *, rtx);
>  extern void init_insn_reg_pressure_info (rtx);
> +extern bool chain_to_prev_insn_p (rtx insn);
> +extern void chain_to_prev_insn (rtx);
>
>  extern dw_t get_dep_weak (ds_t, ds_t);
>  extern ds_t set_dep_weak (ds_t, ds_t, dw_t);
> Index: sched-rgn.c
> ===================================================================
> --- sched-rgn.c (revision 201963)
> +++ sched-rgn.c (working copy)
> @@ -2507,7 +2507,7 @@ add_branch_dependences (rtx head, rtx ta
>      }
>
>    if (!targetm.have_conditional_execution ())
> -    return;
> +    goto chain_to_prev_insn;
>
>    /* Finally, if the block ends in a jump, and we are doing intra-block
>       scheduling, make sure that the branch depends on any COND_EXEC insns
> @@ -2543,7 +2543,7 @@ add_branch_dependences (rtx head, rtx ta
>       could remove always-true predicates. */
>
>    if (!reload_completed || ! (JUMP_P (tail) || JUMP_TABLE_DATA_P (tail)))
> -    return;
> +    goto chain_to_prev_insn;
>
>    insn = tail;
>    while (insn != head)
> @@ -2557,6 +2557,23 @@ add_branch_dependences (rtx head, rtx ta
>        if (INSN_P (insn) && GET_CODE (PATTERN (insn)) == COND_EXEC)
>          add_dependence (tail, insn, REG_DEP_ANTI);
>      }
> +
> + chain_to_prev_insn:
> +  /* Control dependences also need to be chained to the prev insn
> +     for sched group. */
> +  insn = tail;
> +  while (insn != head)
> +    {
> +      /* Fixup the dependencies in the sched group. */
> +      if (JUMP_P (insn)
> +          && chain_to_prev_insn_p (insn)
> +          && !sel_sched_p ())
> +        chain_to_prev_insn (insn);
> +
> +      insn = PREV_INSN (insn);
> +    }
> +
> +  return;
>  }
>
>  /* Data structures for the computation of data dependences in a regions. We
> Index: target.def
> ===================================================================
> --- target.def (revision 201963)
> +++ target.def (working copy)
> @@ -1041,6 +1041,19 @@ scheduling one insn causes other insns t
>  cycle. These other insns can then be taken into account properly.",
>   int, (FILE *file, int verbose, rtx *ready, int *n_readyp, int clock), NULL)
>
> +DEFHOOK
> +(macro_fusion_p,
> + "This hook is used to check whether target platform supports macro fusion.",
> + bool, (void), NULL)
> +
> +DEFHOOK
> +(macro_fusion_pair_p,
> + "This hook is used to check whether two insns could be macro fused for\n\
> +target microarchitecture. If this hook returns true for the given insn pair\n\
> +(@var{condgen} and @var{condjmp}), scheduler will put them into a sched\n\
> +group, and they will not be scheduled apart.",
> + bool, (rtx condgen, rtx condjmp), NULL)
> +
>  /* The following member value is a pointer to a function called
>     after evaluation forward dependencies of insns in chain given
>     by two parameter values (head and tail correspondingly). */

Index: gcc/testsuite/gcc.dg/macro-fusion-1.c
===================================================================
--- gcc/testsuite/gcc.dg/macro-fusion-1.c (revision 0)
+++ gcc/testsuite/gcc.dg/macro-fusion-1.c (revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mtune=corei7 -fdump-rtl-sched2" } */
+/* { dg-final { scan-rtl-dump-not "compare.*insn.*jump_insn.*jump_insn" "sched2" } } */
+
+int a[100];
+
+double bar (double sum)
+{
+  int i;
+  for (i = 0; i < 1000000; i++)
+    sum += (0.5 + (a[i%100] - 128));
+  return sum;
+}
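
For anyone who wants to experiment with the pairing rules outside of GCC, the decisions made by ix86_macro_fusion_pair_p in the quoted patch can be modelled roughly as below. This is only a sketch and not GCC code: struct condgen_model, enum gen_type, enum cond_code, is_unsigned_cond and model_fusion_pair_p are hypothetical names, and the RTL inspection (single_set, the PARALLEL scan, get_attr_type) is collapsed into three precomputed fields. In particular it assumes the Nehalem single_set requirement is already implied by the TEST/ICMP type check.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the insn "type" attribute of the
   flag-setting insn (CONDGEN in the patch).  */
enum gen_type { GEN_TEST, GEN_ICMP, GEN_INCDEC, GEN_ALU };

/* Hypothetical stand-in for the rtx code of the jump condition.  */
enum cond_code { COND_EQ, COND_NE, COND_LT, COND_GE,
                 COND_LTU, COND_GEU, COND_GTU, COND_LEU };

/* Simplified view of CONDGEN; the real hook derives this from RTL.  */
struct condgen_model
{
  enum gen_type type;
  bool has_compare;       /* a SET whose source is a COMPARE was found */
  bool mem_imm_operands;  /* the COMPARE has a MEM and a CONST_INT operand */
};

static bool
is_unsigned_cond (enum cond_code c)
{
  return c == COND_LTU || c == COND_GEU || c == COND_GTU || c == COND_LEU;
}

/* Model of the pair check: TUNE plays the role of ix86_tune_string.  */
static bool
model_fusion_pair_p (const char *tune, const struct condgen_model *gen,
                     enum cond_code jcc)
{
  assert (tune != NULL && gen != NULL);

  /* Both tunings in the patch reject a missing COMPARE and the
     cmp/test MEM-IMM form.  */
  if (!gen->has_compare || gen->mem_imm_operands)
    return false;

  /* Nehalem: only cmp/test fuse with the jump.  */
  if (!strcmp (tune, "corei7"))
    return gen->type == GEN_TEST || gen->type == GEN_ICMP;

  /* Sandybridge: everything with a COMPARE fuses, except inc/dec
     followed by an unsigned-comparison jump.  */
  if (!strcmp (tune, "corei7-avx"))
    return !(gen->type == GEN_INCDEC && is_unsigned_cond (jcc));

  /* Any other tuning string: no fusion, as in the patch.  */
  return false;
}
```

With this model, a register-register cmp followed by jne is accepted for both "corei7" and "corei7-avx", while dec followed by jne is accepted only for "corei7-avx" and dec followed by an unsigned jump is rejected for both.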