Patchwork RFA: Fix OP_INOUT handling of web.c:union_match_dups

login
register
mail settings
Submitter Joern Rennecke
Date Oct. 2, 2012, 3:49 p.m.
Message ID <20121002114919.zb2fcoyr404w40sc-nzlynne@webmail.spamcop.net>
Download mbox | patch
Permalink /patch/188560/
State New
Headers show

Comments

Joern Rennecke - Oct. 2, 2012, 3:49 p.m.
Similar to PR43742, the ARCompact port gets an ICE from the current mainline
version of web.c:union_match_dups for its zero overhead loop pattern.
My first attempt at rectifying this was equivalent in effect to the patch
from comment #1 from this patch; that seemed to work well enough.
Later I stumbled across PR43742, which made me take a second look at the
problem.

Unlike the SH, the ARCompact architecture as an actual zero overhead loop
mechanism, which uses a dedicated loop counter register.  Although it can
be used in most contexts that a general-purpose register can, this causes
a lot of pipeline stalls, so if we changed the match_dup into a matching
constraint, and reload inserted reg-reg copies to fix up matching consatraints
for just a small fraction of the zero overhead loops, the performance penalty
of these stalls would wipe out any benefit gained from having any compiler
generated zero overhead loops.

Looking at md.texi, you could be excused thinking that match_dups have to
follow the operand that they match only for define_expand.
However, when you try to scramble the order in a define_insn_and_split, you
get an error:

/home/amylaar/synopsys/arc_gnu_4.8/unisrc/gcc/config/arc/arc.md:5516:  
operand 0 duplicated before defined

which is emitted by validate_pattern in genrecog.c .  This code is from 2004,
so I'd say there is a good chance that more code that actually relies on this.

Some patterns might be made to conform both to the match_dup ordering
constraint and avoid the web.c SEGV by reordering the pattern, although
at times at other infrastructure, e.g. when every place that tries to
recognize zero overhead loop patterns has to be amended to look for multiple
forms.  But other patterns intrinsically need one of these strictures
removed.
Consider an instruction that atomically exchanges the contents of
two registers and/or memory locations:  The source of the first set must
match the destination of the second set.  So, if we have to make the first
occurencence a match_operand, we must tag the "+" constraint on this input.

Therefore, web.c:union_match_dups should handle "+" constraints on inputs
tied with a match_dup to a later mentioned output.

The problem here is that the current version of this function only searches
the match_dup location in the use_link array, but for an OP_INPUT operand,
the location will be in the def_link array.

When I originally implemented this, I put some asserts there to make sure
we now handle all the *dupref == NULL cases; however, this lead to ICEs for
i686-pc-linux-gnu.  As mentioned in the new comment, the
DF_REF_LOC (use_link[n]) points to the register part of a memory address,
wheras recog_data.dup_loc[m] points to an enclosing MEM.
This is really a separate problem, so I choose to leave the behaviour in this
case alone, i.e. just continue in the loop without creating the  
def/use union, even though the comment at the top of the function says  
that it should create
that.

Bootstrapped and regtested on i686-pc-linux-gnu (baseline: revision 191817) .

2012-10-02  Joern Rennecke  <joern.rennecke@embecosm.com>

         * web.c (union_match_dups): Properly handle OP_INOUT match_dups.

Patch

Index: web.c
===================================================================
--- web.c	(revision 191817)
+++ web.c	(working copy)
@@ -96,6 +96,7 @@  union_match_dups (rtx insn, struct web_e
   struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
   df_ref *use_link = DF_INSN_INFO_USES (insn_info);
   df_ref *def_link = DF_INSN_INFO_DEFS (insn_info);
+  struct web_entry *dup_entry;
   int i;
 
   extract_insn (insn);
@@ -107,10 +108,24 @@  union_match_dups (rtx insn, struct web_e
       df_ref *ref, *dupref;
       struct web_entry *entry;
 
-      for (dupref = use_link; *dupref; dupref++)
+      for (dup_entry = use_entry, dupref = use_link; *dupref; dupref++)
 	if (DF_REF_LOC (*dupref) == recog_data.dup_loc[i])
 	  break;
 
+      if (*dupref == NULL && type == OP_INOUT)
+	{
+
+	  for (dup_entry = def_entry, dupref = def_link; *dupref; dupref++)
+	    if (DF_REF_LOC (*dupref) == recog_data.dup_loc[i])
+	      break;
+	}
+      /* ??? *DUPREF can still be zero, because when an operand matches
+	 a memory, DF_REF_LOC (use_link[n]) points to the register part
+	 of the address, whereas recog_data.dup_loc[m] points to the
+	 entire memory ref, thus we fail to find the duplicate entry,
+         even though it is there.
+         Example: i686-pc-linux-gnu gcc.c-torture/compile/950607-1.c
+		  -O3 -fomit-frame-pointer -funroll-loops  */
       if (*dupref == NULL
 	  || DF_REF_REGNO (*dupref) < FIRST_PSEUDO_REGISTER)
 	continue;
@@ -121,7 +136,15 @@  union_match_dups (rtx insn, struct web_e
 	if (DF_REF_LOC (*ref) == recog_data.operand_loc[op])
 	  break;
 
-      (*fun) (use_entry + DF_REF_ID (*dupref), entry + DF_REF_ID (*ref));
+      if (!*ref && type == OP_INOUT)
+	{
+	  for (ref = use_link, entry = use_entry; *ref; ref++)
+	    if (DF_REF_LOC (*ref) == recog_data.operand_loc[op])
+	      break;
+	}
+
+      gcc_assert (*ref);
+      (*fun) (dup_entry + DF_REF_ID (*dupref), entry + DF_REF_ID (*ref));
     }
 }