From patchwork Wed Nov 27 22:31:00 2013
X-Patchwork-Submitter: Wei Mi
X-Patchwork-Id: 294679
Date: Wed, 27 Nov 2013 14:31:00 -0800
Subject: Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
From: Wei Mi
To: Jeff Law
Cc: "H.J. Lu", Jan Hubicka, Alexander Monakov, Steven Bosscher, GCC Patches, David Li, Kirill Yukhin, Vladimir Makarov

>> Hmm, maybe attack from the other direction?
>> -- could we clear SCHED_GROUP_P
>> for each insn at the start of this loop in sched_analyze?
>>
>> It's not as clean in the sense that SCHED_GROUP_P "escapes" the scheduler,
>> but it might be an option.
>>
>>    for (insn = head;; insn = NEXT_INSN (insn))
>>      {
>>
>>        if (INSN_P (insn))
>>          {
>>            /* And initialize deps_lists.  */
>>            sd_init_insn (insn);
>>          }
>>
>>        deps_analyze_insn (deps, insn);
>>
>>        if (insn == tail)
>>          {
>>            if (sched_deps_info->use_cselib)
>>              cselib_finish ();
>>            return;
>>          }
>>      }
>> Jeff
>>>
>>>
>>
>
> Thanks for the suggestion. It looks workable. Then I need to move the
> SCHED_GROUP_P setting for macrofusion from sched_init to a place
> inside sched_analyze after the SCHED_GROUP_P cleanup. It will be more
> consistent with the settings for cc0 setter-user group and call group,
> which are both inside sched_analyze.
> I am trying this method...
>
> Thanks,
> Wei.

Here is the patch. It clears SCHED_GROUP_P in sched_analyze before
deps_analyze_insn sets SCHED_GROUP_P and chains the insn to the previous
insns. It also moves the try_group_insn call for macro-fusion from
sched_init to sched_analyze_insn.

Bootstrap and regression testing pass on x86_64-linux-gnu. Is it OK?

Thanks,
Wei.

2013-11-27  Wei Mi

	PR rtl-optimization/59020
	* sched-deps.c (try_group_insn): Move it from haifa-sched.c to here.
	(sched_analyze_insn): Call try_group_insn.
	(sched_analyze): Clean up SCHED_GROUP_P before starting the analysis.
	* haifa-sched.c (try_group_insn): Moved to sched-deps.c.
	(group_insns_for_macro_fusion): Removed.
	(sched_init): Remove call to group_insns_for_macro_fusion.

2013-11-27  Wei Mi

	PR rtl-optimization/59020
	* testsuite/gcc.dg/pr59020.c: New.
	* testsuite/gcc.dg/macro-fusion-1.c: New.
	* testsuite/gcc.dg/macro-fusion-2.c: New.
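The effect of the sched_analyze change can be modelled outside GCC. The sketch below is a toy under stated assumptions, not GCC code: `struct toy_insn`, `toy_try_group_insn` and `toy_sched_analyze` are hypothetical names, an insn is reduced to a kind plus a flag standing in for SCHED_GROUP_P, and the target hook macro_fusion_pair_p is assumed to accept every compare/jump pair. It shows why the cleanup comes first: a flag left over from an earlier pass is discarded, and only a jump that currently follows its compare is grouped again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy stand-in for an RTL insn; real insns are far richer.  */
enum toy_kind { TOY_OTHER, TOY_COMPARE, TOY_CONDJUMP };

struct toy_insn
{
  enum toy_kind kind;
  bool sched_group_p;   /* Models SCHED_GROUP_P: stay glued to the previous insn.  */
};

/* Models try_group_insn: group a conditional jump with the insn right
   before it when that insn is a compare.  The target hook
   macro_fusion_pair_p is assumed to accept every such pair.  */
static void
toy_try_group_insn (struct toy_insn *insns, size_t i)
{
  if (insns[i].kind != TOY_CONDJUMP || i == 0)
    return;
  if (insns[i - 1].kind == TOY_COMPARE)
    insns[i].sched_group_p = true;
}

/* Models the patched sched_analyze loop: each insn first drops any flag
   left by an earlier scheduling pass, then the grouping check (which the
   patch calls from sched_analyze_insn) may set it again.  */
void
toy_sched_analyze (struct toy_insn *insns, size_t n)
{
  assert (insns != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      insns[i].sched_group_p = false;
      toy_try_group_insn (insns, i);
    }
}
```

For example, a block laid out as compare, other insn, conditional jump, whose jump still carries a stale flag, comes out of `toy_sched_analyze` with no group at all, while a compare immediately followed by the jump is grouped.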
Index: sched-deps.c
===================================================================
--- sched-deps.c	(revision 204923)
+++ sched-deps.c	(working copy)
@@ -2820,6 +2820,37 @@ sched_analyze_2 (struct deps_desc *deps,
     sched_deps_info->finish_rhs ();
 }
 
+/* Try to group comparison and the following conditional jump INSN if
+   they're already adjacent. This is to prevent scheduler from scheduling
+   them apart.  */
+
+static void
+try_group_insn (rtx insn)
+{
+  unsigned int condreg1, condreg2;
+  rtx cc_reg_1;
+  rtx prev;
+
+  if (!any_condjump_p (insn))
+    return;
+
+  targetm.fixed_condition_code_regs (&condreg1, &condreg2);
+  cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
+  prev = prev_nonnote_nondebug_insn (insn);
+  if (!reg_referenced_p (cc_reg_1, PATTERN (insn))
+      || !prev
+      || !modified_in_p (cc_reg_1, prev))
+    return;
+
+  /* Different microarchitectures support macro fusions for different
+     combinations of insn pairs.  */
+  if (!targetm.sched.macro_fusion_pair_p
+      || !targetm.sched.macro_fusion_pair_p (prev, insn))
+    return;
+
+  SCHED_GROUP_P (insn) = 1;
+}
+
 /* Analyze an INSN with pattern X to find all dependencies.  */
 static void
 sched_analyze_insn (struct deps_desc *deps, rtx x, rtx insn)
@@ -2843,6 +2874,11 @@ sched_analyze_insn (struct deps_desc *de
   can_start_lhs_rhs_p = (NONJUMP_INSN_P (insn)
                          && code == SET);
 
+  /* Group compare and branch insns for macro-fusion.  */
+  if (targetm.sched.macro_fusion_p
+      && targetm.sched.macro_fusion_p ())
+    try_group_insn (insn);
+
   if (may_trap_p (x))
     /* Avoid moving trapping instructions across function calls that might
        not always return.  */
@@ -3733,6 +3769,10 @@ sched_analyze (struct deps_desc *deps, r
         {
           /* And initialize deps_lists.  */
           sd_init_insn (insn);
+          /* Clean up SCHED_GROUP_P which may have been set by last
+             scheduler pass.  */
+          if (SCHED_GROUP_P (insn))
+            SCHED_GROUP_P (insn) = 0;
         }
 
       deps_analyze_insn (deps, insn);
Index: haifa-sched.c
===================================================================
--- haifa-sched.c	(revision 204923)
+++ haifa-sched.c	(working copy)
@@ -6554,50 +6554,6 @@ setup_sched_dump (void)
                 ? stderr : dump_file);
 }
 
-/* Try to group comparison and the following conditional jump INSN if
-   they're already adjacent. This is to prevent scheduler from scheduling
-   them apart.  */
-
-static void
-try_group_insn (rtx insn)
-{
-  unsigned int condreg1, condreg2;
-  rtx cc_reg_1;
-  rtx prev;
-
-  if (!any_condjump_p (insn))
-    return;
-
-  targetm.fixed_condition_code_regs (&condreg1, &condreg2);
-  cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
-  prev = prev_nonnote_nondebug_insn (insn);
-  if (!reg_referenced_p (cc_reg_1, PATTERN (insn))
-      || !prev
-      || !modified_in_p (cc_reg_1, prev))
-    return;
-
-  /* Different microarchitectures support macro fusions for different
-     combinations of insn pairs.  */
-  if (!targetm.sched.macro_fusion_pair_p
-      || !targetm.sched.macro_fusion_pair_p (prev, insn))
-    return;
-
-  SCHED_GROUP_P (insn) = 1;
-}
-
-/* If the last cond jump and the cond register defining insn are consecutive
-   before scheduling, we want them to be in a schedule group. This is good
-   for performance on microarchitectures supporting macro-fusion.  */
-
-static void
-group_insns_for_macro_fusion ()
-{
-  basic_block bb;
-
-  FOR_EACH_BB (bb)
-    try_group_insn (BB_END (bb));
-}
-
 /* Initialize some global state for the scheduler.  This function works
    with the common data shared between all the schedulers.  It is called
    from the scheduler specific initialization routine.  */
@@ -6726,11 +6682,6 @@ sched_init (void)
     }
 
   curr_state = xmalloc (dfa_state_size);
-
-  /* Group compare and branch insns for macro-fusion.  */
-  if (targetm.sched.macro_fusion_p
-      && targetm.sched.macro_fusion_p ())
-    group_insns_for_macro_fusion ();
 }
 
 static void haifa_init_only_bb (basic_block, basic_block);
Index: testsuite/gcc.dg/macro-fusion-1.c
===================================================================
--- testsuite/gcc.dg/macro-fusion-1.c	(revision 0)
+++ testsuite/gcc.dg/macro-fusion-1.c	(revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mtune=corei7 -fdump-rtl-sched2" } */
+/* { dg-final { scan-rtl-dump-not "compare.*insn.*jump_insn.*jump_insn" "sched2" } } */
+
+int a[100];
+
+double bar (double sum)
+{
+  int i;
+  for (i = 0; i < 1000000; i++)
+   sum += (0.5 + (a[i%100] - 128));
+  return sum;
+}
Index: testsuite/gcc.dg/macro-fusion-2.c
===================================================================
--- testsuite/gcc.dg/macro-fusion-2.c	(revision 0)
+++ testsuite/gcc.dg/macro-fusion-2.c	(revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mtune=corei7-avx -fdump-rtl-sched2" } */
+/* { dg-final { scan-rtl-dump-not "compare.*insn.*jump_insn.*jump_insn" "sched2" } } */
+
+int a[100];
+
+double bar (double sum)
+{
+  int i = 100000;
+  while (i != 0)
+    {
+      sum += (0.5 + (a[i%100] - 128));
+      i--;
+    }
+  return sum;
+}
Index: testsuite/gcc.dg/pr59020.c
===================================================================
--- testsuite/gcc.dg/pr59020.c	(revision 0)
+++ testsuite/gcc.dg/pr59020.c	(revision 0)
@@ -0,0 +1,15 @@
+/* PR rtl-optimization/59020 */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fmodulo-sched -fno-inline -march=corei7" } */
+
+int a, b, d;
+unsigned c;
+
+void f()
+{
+  unsigned q;
+  for(; a; a++)
+    if(((c %= d && 1) ? : 1) & 1)
+      for(; b; q++);
+}
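Per its comment, try_group_insn exists to prevent the scheduler from scheduling the compare and the conditional jump apart. The toy below, which is not GCC's scheduler, illustrates that effect under simplifying assumptions: `toy_schedule` and `struct toy_sched_insn` are hypothetical, insns are picked purely by an invented priority with data dependencies ignored, and the unit holding the block's final branch is always issued last. Without the group flag a high-priority compare drifts to the top of the block; with the flag set on the jump, the compare is issued immediately before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy insn: higher priority issues earlier; group_with_prev
   models SCHED_GROUP_P.  Data dependencies are ignored in this sketch.  */
struct toy_sched_insn
{
  const char *name;
  int priority;
  bool group_with_prev;
};

#define TOY_MAX_INSNS 16

/* Toy list scheduler for one basic block that ends in a branch.  An insn
   with group_with_prev set is issued right after its predecessor, so the
   pair moves as one unit, and the unit holding the final branch is issued
   last.  Stores the issue order (indices into INSNS) in ORDER.  */
void
toy_schedule (const struct toy_sched_insn *insns, size_t n, size_t *order)
{
  bool done[TOY_MAX_INSNS] = { false };
  size_t out = 0;

  assert (n > 0 && n <= TOY_MAX_INSNS && !insns[0].group_with_prev);

  /* Index of the first insn of the unit that ends the block.  */
  size_t last_unit = n - 1;
  while (insns[last_unit].group_with_prev)
    last_unit--;

  for (;;)
    {
      /* Pick the unissued unit head with the highest priority.  */
      size_t best = n;
      for (size_t i = 0; i < last_unit; i++)
        if (!done[i] && !insns[i].group_with_prev
            && (best == n || insns[i].priority > insns[best].priority))
          best = i;
      if (best == n)
        break;

      /* Issue the head together with every insn grouped behind it.  */
      size_t i = best;
      do
        {
          done[i] = true;
          order[out++] = i;
          i++;
        }
      while (i < last_unit && insns[i].group_with_prev);
    }

  /* The unit containing the branch always goes last.  */
  for (size_t i = last_unit; i < n; i++)
    order[out++] = i;
}
```

With load, add, cmp and jcc at priorities 1, 2, 3 and 0, the ungrouped block issues as cmp, add, load, jcc, while setting the flag on jcc yields add, load, cmp, jcc.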