From patchwork Fri Dec 12 07:27:17 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zhenqiang Chen X-Patchwork-Id: 420374 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 5C95F1400A0 for ; Fri, 12 Dec 2014 18:27:48 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:references:in-reply-to:subject:date:message-id :mime-version:content-type:content-transfer-encoding; q=dns; s= default; b=pqAGvy1h178mUulWxSPpMhW7m0yUcFsM8oS6LjdP6mHQvVYqdo/I/ zUlUimHjvJBgdUmQrpnHeFqxYW+QkVLCtXILrHOrfxEihsod1PpORLoNG5MmMYf2 JDVFDkN319idgogixxO4gP7rDvrGx9t8zQbEAebh/GUcbfA50hwids= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:references:in-reply-to:subject:date:message-id :mime-version:content-type:content-transfer-encoding; s=default; bh=eHgH8v0CRB87/ypyC7aZZDEyjNE=; b=A5JaVDCyl/yJr8tTQAeN24z0APeW kOoQqBMARxwPFk8NQGcMDiXu4xjjVbbfhOtJu9W3zaUDXQ6Y7UDTtxathoYSKkc/ K2DiXsfN26GTbs0dDscPqgUvIc6afoleQQNp57MYpof9nR5C1DOuhwa0oymdECnl x6PR4APCEvkJJx4= Received: (qmail 12927 invoked by alias); 12 Dec 2014 07:27:39 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 12916 invoked by uid 89); 12 Dec 2014 07:27:38 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL, BAYES_00, SPF_PASS autolearn=ham version=3.3.2 X-HELO: service87.mimecast.com Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 12 Dec 2014 07:27:34 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by service87.mimecast.com; Fri, 12 Dec 2014 07:27:31 +0000 Received: from shawin003 ([10.164.2.24]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Fri, 12 Dec 2014 07:27:28 +0000 From: "Zhenqiang Chen" To: "'Jeff Law'" , "Segher Boessenkool" Cc: References: <53C73EBD.9070909@redhat.com> <547CE75B.6090709@redhat.com> <000001d00f9e$64f43600$2edca200$@arm.com> <54861817.4030409@redhat.com> <000301d01395$675d5890$361809b0$@arm.com> <20141209190744.GA8314@gate.crashing.org> <54874A52.3030304@redhat.com> In-Reply-To: <54874A52.3030304@redhat.com> Subject: RE: [PATCH] Fix PR 61225 Date: Fri, 12 Dec 2014 15:27:17 +0800 Message-ID: <000101d015dd$1042e6c0$30c8b440$@arm.com> MIME-Version: 1.0 X-MC-Unique: 114121207273100701 X-IsSubscribed: yes > -----Original Message----- > From: Jeff Law [mailto:law@redhat.com] > Sent: Wednesday, December 10, 2014 3:16 AM > To: Segher Boessenkool; Zhenqiang Chen > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Fix PR 61225 > > On 12/09/14 12:07, Segher Boessenkool wrote: > > On Tue, Dec 09, 2014 at 05:49:18PM +0800, Zhenqiang Chen wrote: > >>> Do you need to verify SETA and SETB satisfy single_set? Or has that > >>> already been done elsewhere? > >> > >> A is NEXT_INSN (insn) > >> B is prev_nonnote_nondebug_insn (insn), > >> > >> For I1 -> I2 -> B; I2 -> A; > >> LOG_LINK can make sure I1 and I2 are single_set, > > > > It cannot, not anymore anyway. LOG_LINKs can point to an insn with > > multiple SETs; multiple LOG_LINKs can point to such an insn. > So let's go ahead and put a single_set test in this function. > > >>>> Is this fragment really needed? Does it ever trigger? I'd think > >>>> that > >>> for > 2 uses punting would be fine. Do we really commonly have > >>> cases with > 2 uses, but where they're all in SETA and SETB? > > > > Can't you just check for a death note on the second insn? Together > > with reg_used_between_p? > Yea, that'd accomplish the same thing I think Zhenqiang is trying to catch and > is simpler than walking the lists. Updated. Check for a death note is enough since b is prev_nonnote_nondebug_insn (a). > > > >>>> + /* Try to combine a compare insn that sets CC > >>>> + with a preceding insn that can set CC, and maybe with its > >>>> + logical predecessor as well. > >>>> + We need this special code because data flow connections > >>>> + do not get entered in LOG_LINKS. */ > > > > I think you mean "not _all_ data flow connections"? > I almost said something about this comment, but figured I was nitpicking too > much :-) Updated. > >>> So you've got two new combine cases here, but I think the testcase > >>> only tests one of them. Can you include a testcase for both of hte > >>> major paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN) > >> > >> pr61225.c is the case to cover I1->I2->I3; I2->insn. > >> > >> For I2 -> I3; I2 -> insn, I tried my test cases and found peephole2 > >> can also handle them. So I removed the code from the patch. > > > > Why? The simpler case has much better chances of being used. > The question does it actually catch anything not already handled? I guess you > could argue that doing it in combine is better than peep2 and I'd agree with > that. > > > > > In fact, there are many more cases you could handle: > > > > You handle > > > > I1 -> I2 -> I3; I2 -> insn > > I2 -> I3; I2 -> insn > > > > but there are also > > > > I1,I2 -> I3; I2 -> insn > > > > and the many 4-insn combos, too. > Yes, but I wonder how much of this is really necessary in practice. We > could do exhaustive testing here, but I suspect the payoff isn't all > that great. Thus I'm comfortable with faulting in the cases we actually > find are useful in practice. > > > > >> +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2. > >> + It returns TRUE, if reg1 == reg2, and no other refer of reg1 > >> + except A and B. */ > > > > That sound like the only correct inputs are such a compare etc., but the > > routine tests whether that is true. > Correct, the RTL has to have a specific form and that is tested for. > Comment updates can't hurt. Updated. > > > >> +static bool > >> +can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b) > >> +{ > >> + rtx seta = single_set (a); > >> + rtx setb = single_set (b); > >> + > >> + if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b) > > > > Neither the comment nor the function name mention this. This test is > > better placed in the caller of this function, anyway. > Didn't consider it terribly important. Moving it to the caller doesn't > change anything significantly, though I would agree it's martinally cleaner. Updated. > > > >> @@ -3323,7 +3396,11 @@ try_combine (rtx_insn *i3, rtx_insn *i2, > rtx_insn > >> *i1, rtx_insn *i0, > >> rtx old = newpat; > >> total_sets = 1 + extra_sets; > >> newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets)); > >> - XVECEXP (newpat, 0, 0) = old; > >> + > >> + if (to_combined_insn) > >> + XVECEXP (newpat, 0, --total_sets) = old; > >> + else > >> + XVECEXP (newpat, 0, 0) = old; > >> } > > > > Is this correct? If so, it needs a big fat comment, because it is > > not exactly obvious :-) > > > > Also, it doesn't handle at all the case where the new pattern already is > > a PARALLEL; can that never happen? > I'd convinced myself it was. But yes, a comment here would be good. The following comments are added. + /* This is a hack to match i386 instruction pattern, which + is like + (parallel [ + (set (reg:CCZ 17 flags) + ...) + (set ...)}) + we have to swap the newpat order of I3 and TO_COMBINED_INSN. */ > Presumably you're thinking about a PARALLEL that satisfies single_set_p? No. It has nothing to do with single_set_p. I just want to reuse the code to match the instruction pattern. In common, the new PARALLEL is like Parallel newpat from I3 newpat from I2 // if have newpat from I1 // if have newpat from I0 // if have For to_combined_insn, i0 is NULL and there should have no newpat from I1 When handling I1->I2->I3, with normal order, it will get Parallel newpat from I3 After I2-> to_combined_insn, the parallel will be Parallel newpat from I3 newpat from to_combined_insn. But this can not match the insn pattern. So I swap the order to. Parallel newpat from to_combined_insn. newpat from I3 Here is the updated patch. +/* { dg-final { cleanup-rtl-dump "combine" } } */ diff --git a/gcc/combine.c b/gcc/combine.c index 9ed03be..bbdddfb 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -431,7 +431,7 @@ static int can_combine_p (rtx_insn *, rtx_insn *, rtx_insn *, rtx_insn *, static int combinable_i3pat (rtx_insn *, rtx *, rtx, rtx, rtx, int, int, rtx *); static int contains_muldiv (rtx); static rtx_insn *try_combine (rtx_insn *, rtx_insn *, rtx_insn *, rtx_insn *, - int *, rtx_insn *); + int *, rtx_insn *, rtx_insn *); static void undo_all (void); static void undo_commit (void); static rtx *find_split_point (rtx *, rtx_insn *, bool); @@ -1120,6 +1120,32 @@ insn_a_feeds_b (rtx_insn *a, rtx_insn *b) #endif return false; } + +/* The function checks whether it is possible to use B to set the CC flag + in A. It returns TRUE, if A is a compare (reg1, 0), B is a SINGLE_SET + which SET_SRC is reg2, reg1 == reg2, and no other refer of reg1 + except A and B. */ + +static bool +can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b) +{ + rtx seta = single_set (a); + rtx setb = single_set (b); + + if (!seta || !setb) + return false; + + if (GET_CODE (SET_SRC (seta)) != COMPARE + || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC + || !REG_P (XEXP (SET_SRC (seta), 0)) + || XEXP (SET_SRC (seta), 1) != CONST0_RTX (GET_MODE (SET_SRC (seta))) + || !REG_P (SET_SRC (setb)) + || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0))) + return false; + + return find_reg_note (a, REG_DEAD, XEXP (SET_SRC (seta), 0)); +} + /* Main entry point for combiner. F is the first insn of the function. NREGS is the first unused pseudo-reg number. @@ -1129,10 +1155,7 @@ insn_a_feeds_b (rtx_insn *a, rtx_insn *b) static int combine_instructions (rtx_insn *f, unsigned int nregs) { - rtx_insn *insn, *next; -#ifdef HAVE_cc0 - rtx_insn *prev; -#endif + rtx_insn *insn, *next, *prev; struct insn_link *links, *nextlinks; rtx_insn *first; basic_block last_bb; @@ -1279,7 +1302,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) FOR_EACH_LOG_LINK (links, insn) if ((next = try_combine (insn, links->insn, NULL, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) { statistics_counter_event (cfun, "two-insn combine", 1); goto retry; @@ -1300,7 +1323,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) FOR_EACH_LOG_LINK (nextlinks, link) if ((next = try_combine (insn, link, nextlinks->insn, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) { statistics_counter_event (cfun, "three-insn combine", 1); goto retry; @@ -1322,13 +1345,13 @@ combine_instructions (rtx_insn *f, unsigned int nregs) { if ((next = try_combine (insn, prev, NULL, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) goto retry; FOR_EACH_LOG_LINK (nextlinks, prev) if ((next = try_combine (insn, prev, nextlinks->insn, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) goto retry; } @@ -1342,13 +1365,13 @@ combine_instructions (rtx_insn *f, unsigned int nregs) { if ((next = try_combine (insn, prev, NULL, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) goto retry; FOR_EACH_LOG_LINK (nextlinks, prev) if ((next = try_combine (insn, prev, nextlinks->insn, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) goto retry; } @@ -1364,7 +1387,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) && sets_cc0_p (PATTERN (prev)) && (next = try_combine (insn, links->insn, prev, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) goto retry; #endif @@ -1377,7 +1400,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) if ((next = try_combine (insn, links->insn, nextlinks->insn, NULL, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) { statistics_counter_event (cfun, "three-insn combine", 1); @@ -1406,7 +1429,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1417,7 +1440,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1434,7 +1457,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1444,7 +1467,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) if ((next = try_combine (insn, link, link1, nextlinks->insn, &new_direct_jump_p, - last_combined_insn)) != 0) + last_combined_insn, NULL)) != 0) { statistics_counter_event (cfun, "four-insn combine", 1); goto retry; @@ -1452,6 +1475,38 @@ combine_instructions (rtx_insn *f, unsigned int nregs) } } + /* Try to combine a compare insn that sets CC + with a preceding insn that can set CC, and maybe with its + logical predecessor as well. + We need this special code because not all data flow connections + get entered in LOG_LINKS. */ + if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX + && BLOCK_FOR_INSN (insn) == BLOCK_FOR_INSN (prev) + && can_reuse_cc_set_p (insn, prev) + && max_combine >= 4) + { + struct insn_link *next1; + FOR_EACH_LOG_LINK (next1, prev) + { + rtx_insn *link1 = next1->insn; + if (NOTE_P (link1)) + continue; + /* I1 -> I2 -> I3; I2 -> insn; + output parallel (insn, I3). */ + FOR_EACH_LOG_LINK (nextlinks, link1) + if ((next = try_combine (prev, link1, + nextlinks->insn, NULL, + &new_direct_jump_p, + last_combined_insn, insn)) != 0) + + { + delete_insn (insn); + insn = next; + statistics_counter_event (cfun, "four-insn combine", 1); + goto retry; + } + } + } /* Try this insn with each REG_EQUAL note it links back to. */ FOR_EACH_LOG_LINK (links, insn) { @@ -1477,7 +1532,7 @@ combine_instructions (rtx_insn *f, unsigned int nregs) i2mod_new_rhs = copy_rtx (note); next = try_combine (insn, i2mod, NULL, NULL, &new_direct_jump_p, - last_combined_insn); + last_combined_insn, NULL); i2mod = NULL; if (next) { @@ -2535,11 +2590,15 @@ can_split_parallel_of_n_reg_sets (rtx_insn *insn, int n) LAST_COMBINED_INSN is either I3, or some insn after I3 that has been I3 passed to an earlier try_combine within the same basic - block. */ + block. + + TO_COMBINED_INSN is an insn after I3 that sets CC flags. It is used to + combine with I3 to make a new insn. */ static rtx_insn * try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, - int *new_direct_jump_p, rtx_insn *last_combined_insn) + int *new_direct_jump_p, rtx_insn *last_combined_insn, + rtx_insn *to_combined_insn) { /* New patterns for I3 and I2, respectively. */ rtx newpat, newi2pat = 0; @@ -2632,6 +2691,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, || cant_combine_insn_p (i2) || (i1 && cant_combine_insn_p (i1)) || (i0 && cant_combine_insn_p (i0)) + || (to_combined_insn && cant_combine_insn_p (to_combined_insn)) || likely_spilled_retval_p (i3)) return 0; @@ -3325,7 +3385,18 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, rtx old = newpat; total_sets = 1 + extra_sets; newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets)); - XVECEXP (newpat, 0, 0) = old; + + if (to_combined_insn) + /* This is a hack to match i386 instruction pattern, which + is like + (parallel [ + (set (reg:CCZ 17 flags) + ...) + (set ...)}) + we have to swap the newpat order of I3 and TO_COMBINED_INSN. */ + XVECEXP (newpat, 0, --total_sets) = old; + else + XVECEXP (newpat, 0, 0) = old; } if (added_sets_0) @@ -3348,6 +3419,21 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, if ((i0_feeds_i1_n && i1_feeds_i2_n) || i0_feeds_i2_n) t = subst (t, i0dest, i0src_copy2 ? i0src_copy2 : i0src, 0, 0, 0); + if (to_combined_insn) + { + rtx todest = NULL_RTX, tosrc = NULL_RTX; + if (can_combine_p (i2, to_combined_insn, NULL, NULL, + i3, NULL, &todest, &tosrc)) + { + rtx pat = copy_rtx (PATTERN (to_combined_insn)); + t = subst (pat, todest, tosrc, 0, 0, 0); + } + else + { + undo_all (); + return 0; + } + } XVECEXP (newpat, 0, --total_sets) = t; } } diff --git a/gcc/testsuite/gcc.target/i386/pr61225.c b/gcc/testsuite/gcc.target/i386/pr61225.c new file mode 100644 index 0000000..9c08102 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr61225.c @@ -0,0 +1,16 @@ +/* PR rtl-optimization/61225 */ +/* { dg-do compile } */ +/* { dg-options "-Os -fdump-rtl-combine-details" } */ + +void foo (void *); + +int * +f1 (int *x) +{ + if (!--*x) + foo (x); + return x; +} + +/* { dg-final { scan-rtl-dump "Successfully matched this instruction" "combine" } } */