diff mbox

Fix PR 61225

Message ID 000101d015dd$1042e6c0$30c8b440$@arm.com
State New
Headers show

Commit Message

Zhenqiang Chen Dec. 12, 2014, 7:27 a.m. UTC
> -----Original Message-----
> From: Jeff Law [mailto:law@redhat.com]
> Sent: Wednesday, December 10, 2014 3:16 AM
> To: Segher Boessenkool; Zhenqiang Chen
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Fix PR 61225
> 
> On 12/09/14 12:07, Segher Boessenkool wrote:
> > On Tue, Dec 09, 2014 at 05:49:18PM +0800, Zhenqiang Chen wrote:
> >>> Do you need to verify SETA and SETB satisfy single_set?  Or has that
> >>> already been done elsewhere?
> >>
> >> A is NEXT_INSN (insn)
> >> B is prev_nonnote_nondebug_insn (insn),
> >>
> >> For I1 -> I2 -> B; I2 -> A;
> >> LOG_LINK can make sure I1 and I2 are single_set,
> >
> > It cannot, not anymore anyway.  LOG_LINKs can point to an insn with
> > multiple SETs; multiple LOG_LINKs can point to such an insn.
> So let's go ahead and put a single_set test in this function.
> 
> >>>> Is this fragment really needed?  Does it ever trigger?  I'd think
> >>>> that
> >>> for > 2 uses punting would be fine.  Do we really commonly have
> >>> cases with > 2 uses, but where they're all in SETA and SETB?
> >
> > Can't you just check for a death note on the second insn?  Together
> > with reg_used_between_p?
> Yea, that'd accomplish the same thing I think Zhenqiang is trying to catch
and
> is simpler than walking the lists.

Updated. Check for a death note is enough since b is
prev_nonnote_nondebug_insn (a). 
 
> >
> >>>> +	  /* Try to combine a compare insn that sets CC
> >>>> +	     with a preceding insn that can set CC, and maybe with
its
> >>>> +	     logical predecessor as well.
> >>>> +	     We need this special code because data flow connections
> >>>> +	     do not get entered in LOG_LINKS.  */
> >
> > I think you mean "not _all_ data flow connections"?
> I almost said something about this comment, but figured I was nitpicking
too
> much :-)

Updated. 

> >>> So you've got two new combine cases here, but I think the testcase
> >>> only tests one of them.  Can you include a testcase for both of hte
> >>> major paths above (I1->I2->I3; I2->insn and I2->I3; I2->INSN)
> >>
> >> pr61225.c is the case to cover I1->I2->I3; I2->insn.
> >>
> >> For I2 -> I3; I2 -> insn, I tried my test cases and found peephole2
> >> can also handle them. So I removed the code from the patch.
> >
> > Why?  The simpler case has much better chances of being used.
> The question does it actually catch anything not already handled?  I guess
you
> could argue that doing it in combine is better than peep2 and I'd agree
with
> that.
> 
> >
> > In fact, there are many more cases you could handle:
> >
> > You handle
> >
> > I1 -> I2 -> I3; I2 -> insn
> >        I2 -> I3; I2 -> insn
> >
> > but there are also
> >
> >     I1,I2 -> I3; I2 -> insn
> >
> > and the many 4-insn combos, too.
> Yes, but I wonder how much of this is really necessary in practice.  We
> could do exhaustive testing here, but I suspect the payoff isn't all
> that great.  Thus I'm comfortable with faulting in the cases we actually
> find are useful in practice.
> 
> >
> >> +/* A is a compare (reg1, 0) and B is SINGLE_SET which SET_SRC is reg2.
> >> +   It returns TRUE, if reg1 == reg2, and no other refer of reg1
> >> +   except A and B.  */
> >
> > That sound like the only correct inputs are such a compare etc., but the
> > routine tests whether that is true.
> Correct, the RTL has to have a specific form and that is tested for.
> Comment updates can't hurt.
 
Updated.
 
> >
> >> +static bool
> >> +can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b)
> >> +{
> >> +  rtx seta = single_set (a);
> >> +  rtx setb = single_set (b);
> >> +
> >> +  if (BLOCK_FOR_INSN (a) != BLOCK_FOR_INSN (b)
> >
> > Neither the comment nor the function name mention this.  This test is
> > better placed in the caller of this function, anyway.
> Didn't consider it terribly important.  Moving it to the caller doesn't
> change anything significantly, though I would agree it's martinally
cleaner.

Updated.
 
> >
> >> @@ -3323,7 +3396,11 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
> rtx_insn
> >> *i1, rtx_insn *i0,
> >>   	  rtx old = newpat;
> >>   	  total_sets = 1 + extra_sets;
> >>   	  newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets));
> >> -	  XVECEXP (newpat, 0, 0) = old;
> >> +
> >> +	  if (to_combined_insn)
> >> +	    XVECEXP (newpat, 0, --total_sets) = old;
> >> +	  else
> >> +	    XVECEXP (newpat, 0, 0) = old;
> >>   	}
> >
> > Is this correct?  If so, it needs a big fat comment, because it is
> > not exactly obvious :-)
> >
> > Also, it doesn't handle at all the case where the new pattern already is
> > a PARALLEL; can that never happen?
> I'd convinced myself it was.  But yes, a comment here would be good.

The following comments are added.

+	    /* This is a hack to match i386 instruction pattern, which
+		is like
+		    (parallel [
+			(set (reg:CCZ 17 flags)
+				...)
+			(set ...)})
+		we have to swap the newpat order of I3 and TO_COMBINED_INSN.
*/

> Presumably you're thinking about a PARALLEL that satisfies single_set_p?

No. It has nothing to do with single_set_p. I just want to reuse the code to
match the instruction pattern.

In common, the new PARALLEL is like

  Parallel
    newpat from I3
    newpat from I2 // if have
    newpat from I1 // if have
    newpat from I0 // if have

For to_combined_insn, i0 is NULL and there should have no

    newpat from I1

When handling I1->I2->I3, with normal order, it will get
  Parallel
    newpat from I3

After I2-> to_combined_insn, the parallel will be
  Parallel
    newpat from I3
    newpat from to_combined_insn.

But this can not match the insn pattern. So I swap the order to.
  Parallel
    newpat from to_combined_insn.
    newpat from I3

Here is the updated patch.

+/* { dg-final { cleanup-rtl-dump "combine" } } */

Comments

Segher Boessenkool Dec. 12, 2014, 11:08 a.m. UTC | #1
On Fri, Dec 12, 2014 at 03:27:17PM +0800, Zhenqiang Chen wrote:
> > Presumably you're thinking about a PARALLEL that satisfies single_set_p?
> 
> No. It has nothing to do with single_set_p. I just want to reuse the code to
> match the instruction pattern.
> 
> In common, the new PARALLEL is like
> 
>   Parallel
>     newpat from I3
>     newpat from I2 // if have
>     newpat from I1 // if have
>     newpat from I0 // if have
> 
> For to_combined_insn, i0 is NULL and there should have no
> 
>     newpat from I1
> 
> When handling I1->I2->I3, with normal order, it will get
>   Parallel
>     newpat from I3
> 
> After I2-> to_combined_insn, the parallel will be
>   Parallel
>     newpat from I3
>     newpat from to_combined_insn.
> 
> But this can not match the insn pattern. So I swap the order to.
>   Parallel
>     newpat from to_combined_insn.
>     newpat from I3

Maybe I wasn't clear, sorry.  My concern is you only handle a SET as
newpat, not a PARALLEL.  It can be a PARALLEL just fine, even if it
satisfies single_set (it can have a clobber, it can have multiple sets,
all but one dead).


Thanks for the other changes, much appreciated.


Segher
diff mbox

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index 9ed03be..bbdddfb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -431,7 +431,7 @@  static int can_combine_p (rtx_insn *, rtx_insn *,
rtx_insn *, rtx_insn *,
 static int combinable_i3pat (rtx_insn *, rtx *, rtx, rtx, rtx, int, int,
rtx *);
 static int contains_muldiv (rtx);
 static rtx_insn *try_combine (rtx_insn *, rtx_insn *, rtx_insn *, rtx_insn
*,
-			      int *, rtx_insn *);
+			      int *, rtx_insn *, rtx_insn *);
 static void undo_all (void);
 static void undo_commit (void);
 static rtx *find_split_point (rtx *, rtx_insn *, bool);
@@ -1120,6 +1120,32 @@  insn_a_feeds_b (rtx_insn *a, rtx_insn *b)
 #endif
   return false;
 }
+
+/* The function checks whether it is possible to use B to set the CC flag
+   in A.  It returns TRUE, if A is a compare (reg1, 0), B is a SINGLE_SET
+   which SET_SRC is reg2, reg1 == reg2, and no other refer of reg1
+   except A and B.  */
+
+static bool
+can_reuse_cc_set_p (rtx_insn *a, rtx_insn *b)
+{
+  rtx seta = single_set (a);
+  rtx setb = single_set (b);
+
+  if (!seta || !setb)
+    return false;
+
+  if (GET_CODE (SET_SRC (seta)) != COMPARE
+      || GET_MODE_CLASS (GET_MODE (SET_DEST (seta))) != MODE_CC
+      || !REG_P (XEXP (SET_SRC (seta), 0))
+      || XEXP (SET_SRC (seta), 1) != CONST0_RTX (GET_MODE (SET_SRC (seta)))
+      || !REG_P (SET_SRC (setb))
+      || REGNO (SET_SRC (setb)) != REGNO (XEXP (SET_SRC (seta), 0)))
+    return false;
+
+  return find_reg_note (a, REG_DEAD, XEXP (SET_SRC (seta), 0));
+}
+
 

 /* Main entry point for combiner.  F is the first insn of the function.
    NREGS is the first unused pseudo-reg number.
@@ -1129,10 +1155,7 @@  insn_a_feeds_b (rtx_insn *a, rtx_insn *b)
 static int
 combine_instructions (rtx_insn *f, unsigned int nregs)
 {
-  rtx_insn *insn, *next;
-#ifdef HAVE_cc0
-  rtx_insn *prev;
-#endif
+  rtx_insn *insn, *next, *prev;
   struct insn_link *links, *nextlinks;
   rtx_insn *first;
   basic_block last_bb;
@@ -1279,7 +1302,7 @@  combine_instructions (rtx_insn *f, unsigned int nregs)
 	  FOR_EACH_LOG_LINK (links, insn)
 	    if ((next = try_combine (insn, links->insn, NULL,
 				     NULL, &new_direct_jump_p,
-				     last_combined_insn)) != 0)
+				     last_combined_insn, NULL)) != 0)
 	      {
 		statistics_counter_event (cfun, "two-insn combine", 1);
 		goto retry;
@@ -1300,7 +1323,7 @@  combine_instructions (rtx_insn *f, unsigned int nregs)
 		FOR_EACH_LOG_LINK (nextlinks, link)
 		  if ((next = try_combine (insn, link, nextlinks->insn,
 					   NULL, &new_direct_jump_p,
-					   last_combined_insn)) != 0)
+					   last_combined_insn, NULL)) != 0)
 		    {
 		      statistics_counter_event (cfun, "three-insn combine",
1);
 		      goto retry;
@@ -1322,13 +1345,13 @@  combine_instructions (rtx_insn *f, unsigned int
nregs)
 	    {
 	      if ((next = try_combine (insn, prev, NULL, NULL,
 				       &new_direct_jump_p,
-				       last_combined_insn)) != 0)
+				       last_combined_insn, NULL)) != 0)
 		goto retry;
 
 	      FOR_EACH_LOG_LINK (nextlinks, prev)
 		  if ((next = try_combine (insn, prev, nextlinks->insn,
 					   NULL, &new_direct_jump_p,
-					   last_combined_insn)) != 0)
+					   last_combined_insn, NULL)) != 0)
 		    goto retry;
 	    }
 
@@ -1342,13 +1365,13 @@  combine_instructions (rtx_insn *f, unsigned int
nregs)
 	    {
 	      if ((next = try_combine (insn, prev, NULL, NULL,
 				       &new_direct_jump_p,
-				       last_combined_insn)) != 0)
+				       last_combined_insn, NULL)) != 0)
 		goto retry;
 
 	      FOR_EACH_LOG_LINK (nextlinks, prev)
 		  if ((next = try_combine (insn, prev, nextlinks->insn,
 					   NULL, &new_direct_jump_p,
-					   last_combined_insn)) != 0)
+					   last_combined_insn, NULL)) != 0)
 		    goto retry;
 	    }
 
@@ -1364,7 +1387,7 @@  combine_instructions (rtx_insn *f, unsigned int nregs)
 		  && sets_cc0_p (PATTERN (prev))
 		  && (next = try_combine (insn, links->insn,
 					  prev, NULL, &new_direct_jump_p,
-					  last_combined_insn)) != 0)
+					  last_combined_insn, NULL)) != 0)
 		goto retry;
 #endif
 
@@ -1377,7 +1400,7 @@  combine_instructions (rtx_insn *f, unsigned int nregs)
 		if ((next = try_combine (insn, links->insn,
 					 nextlinks->insn, NULL,
 					 &new_direct_jump_p,
-					 last_combined_insn)) != 0)
+					 last_combined_insn, NULL)) != 0)
 
 		  {
 		    statistics_counter_event (cfun, "three-insn combine",
1);
@@ -1406,7 +1429,7 @@  combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) !=
0)
 			{
 			  statistics_counter_event (cfun, "four-insn
combine", 1);
 			  goto retry;
@@ -1417,7 +1440,7 @@  combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) !=
0)
 			{
 			  statistics_counter_event (cfun, "four-insn
combine", 1);
 			  goto retry;
@@ -1434,7 +1457,7 @@  combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) !=
0)
 			{
 			  statistics_counter_event (cfun, "four-insn
combine", 1);
 			  goto retry;
@@ -1444,7 +1467,7 @@  combine_instructions (rtx_insn *f, unsigned int nregs)
 		      if ((next = try_combine (insn, link, link1,
 					       nextlinks->insn,
 					       &new_direct_jump_p,
-					       last_combined_insn)) != 0)
+					       last_combined_insn, NULL)) !=
0)
 			{
 			  statistics_counter_event (cfun, "four-insn
combine", 1);
 			  goto retry;
@@ -1452,6 +1475,38 @@  combine_instructions (rtx_insn *f, unsigned int
nregs)
 		  }
 	      }
 
+	  /* Try to combine a compare insn that sets CC
+	     with a preceding insn that can set CC, and maybe with its
+	     logical predecessor as well.
+	     We need this special code because not all data flow connections
+	     get entered in LOG_LINKS.  */
+	  if ((prev = prev_nonnote_nondebug_insn (insn)) != NULL_RTX
+	      && BLOCK_FOR_INSN (insn) == BLOCK_FOR_INSN (prev)
+	      && can_reuse_cc_set_p (insn, prev)
+	      && max_combine >= 4)
+	    {
+		struct insn_link *next1;
+		FOR_EACH_LOG_LINK (next1, prev)
+		  {
+		    rtx_insn *link1 = next1->insn;
+		    if (NOTE_P (link1))
+		      continue;
+		    /* I1 -> I2 -> I3; I2 -> insn;
+		       output parallel (insn, I3).  */
+		    FOR_EACH_LOG_LINK (nextlinks, link1)
+		      if ((next = try_combine (prev, link1,
+					       nextlinks->insn, NULL,
+					       &new_direct_jump_p,
+					       last_combined_insn, insn)) !=
0)
+
+			{
+			  delete_insn (insn);
+			  insn = next;
+			  statistics_counter_event (cfun, "four-insn
combine", 1);
+			  goto retry;
+			}
+		  }
+	    }
 	  /* Try this insn with each REG_EQUAL note it links back to.  */
 	  FOR_EACH_LOG_LINK (links, insn)
 	    {
@@ -1477,7 +1532,7 @@  combine_instructions (rtx_insn *f, unsigned int nregs)
 		  i2mod_new_rhs = copy_rtx (note);
 		  next = try_combine (insn, i2mod, NULL, NULL,
 				      &new_direct_jump_p,
-				      last_combined_insn);
+				      last_combined_insn, NULL);
 		  i2mod = NULL;
 		  if (next)
 		    {
@@ -2535,11 +2590,15 @@  can_split_parallel_of_n_reg_sets (rtx_insn *insn,
int n)
 
    LAST_COMBINED_INSN is either I3, or some insn after I3 that has
    been I3 passed to an earlier try_combine within the same basic
-   block.  */
+   block.
+
+   TO_COMBINED_INSN is an insn after I3 that sets CC flags.  It is used to
+   combine with I3 to make a new insn.  */
 
 static rtx_insn *
 try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
-	     int *new_direct_jump_p, rtx_insn *last_combined_insn)
+	     int *new_direct_jump_p, rtx_insn *last_combined_insn,
+	     rtx_insn *to_combined_insn)
 {
   /* New patterns for I3 and I2, respectively.  */
   rtx newpat, newi2pat = 0;
@@ -2632,6 +2691,7 @@  try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1,
rtx_insn *i0,
       || cant_combine_insn_p (i2)
       || (i1 && cant_combine_insn_p (i1))
       || (i0 && cant_combine_insn_p (i0))
+      || (to_combined_insn && cant_combine_insn_p (to_combined_insn))
       || likely_spilled_retval_p (i3))
     return 0;
 
@@ -3325,7 +3385,18 @@  try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
*i1, rtx_insn *i0,
 	  rtx old = newpat;
 	  total_sets = 1 + extra_sets;
 	  newpat = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (total_sets));
-	  XVECEXP (newpat, 0, 0) = old;
+
+	  if (to_combined_insn)
+	    /* This is a hack to match i386 instruction pattern, which
+		is like
+		    (parallel [
+			(set (reg:CCZ 17 flags)
+				...)
+			(set ...)})
+		we have to swap the newpat order of I3 and TO_COMBINED_INSN.
*/
+	    XVECEXP (newpat, 0, --total_sets) = old;
+	  else
+	    XVECEXP (newpat, 0, 0) = old;
 	}
 
       if (added_sets_0)
@@ -3348,6 +3419,21 @@  try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
*i1, rtx_insn *i0,
 	  if ((i0_feeds_i1_n && i1_feeds_i2_n) || i0_feeds_i2_n)
 	    t = subst (t, i0dest, i0src_copy2 ? i0src_copy2 : i0src, 0, 0,
0);
 
+	  if (to_combined_insn)
+	    {
+	      rtx todest = NULL_RTX, tosrc = NULL_RTX;
+	      if (can_combine_p (i2, to_combined_insn, NULL, NULL,
+				 i3, NULL, &todest, &tosrc))
+		{
+		  rtx pat = copy_rtx (PATTERN (to_combined_insn));
+		  t = subst (pat, todest, tosrc, 0, 0, 0);
+		}
+	      else
+		{
+		  undo_all ();
+		  return 0;
+		}
+	    }
 	  XVECEXP (newpat, 0, --total_sets) = t;
 	}
     }
diff --git a/gcc/testsuite/gcc.target/i386/pr61225.c
b/gcc/testsuite/gcc.target/i386/pr61225.c
new file mode 100644
index 0000000..9c08102
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr61225.c
@@ -0,0 +1,16 @@ 
+/* PR rtl-optimization/61225 */
+/* { dg-do compile } */
+/* { dg-options "-Os -fdump-rtl-combine-details" } */
+
+void foo (void *);
+
+int *
+f1 (int *x)
+{
+  if (!--*x)
+    foo (x);
+  return x;
+}
+
+/* { dg-final { scan-rtl-dump "Successfully matched this instruction"
"combine" } } */