From patchwork Tue Oct 2 15:49:19 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: RFA: Fix OP_INOUT handling of web.c:union_match_dups Date: Tue, 02 Oct 2012 05:49:19 -0000 From: Joern Rennecke X-Patchwork-Id: 188560 Message-Id: <20121002114919.zb2fcoyr404w40sc-nzlynne@webmail.spamcop.net> To: gcc-patches@gcc.gnu.org Similar to PR43742, the ARCompact port gets an ICE from the current mainline version of web.c:union_match_dups for its zero overhead loop pattern. My first attempt at rectifying this was equivalent in effect to the patch from comment #1 from this patch; that seemed to work well enough. Later I stumbled across PR43742, which made me take a second look at the problem. Unlike the SH, the ARCompact architecture as an actual zero overhead loop mechanism, which uses a dedicated loop counter register. Although it can be used in most contexts that a general-purpose register can, this causes a lot of pipeline stalls, so if we changed the match_dup into a matching constraint, and reload inserted reg-reg copies to fix up matching consatraints for just a small fraction of the zero overhead loops, the performance penalty of these stalls would wipe out any benefit gained from having any compiler generated zero overhead loops. Looking at md.texi, you could be excused thinking that match_dups have to follow the operand that they match only for define_expand. However, when you try to scramble the order in a define_insn_and_split, you get an error: /home/amylaar/synopsys/arc_gnu_4.8/unisrc/gcc/config/arc/arc.md:5516: operand 0 duplicated before defined which is emitted by validate_pattern in genrecog.c . This code is from 2004, so I'd say there is a good chance that more code that actually relies on this. Some patterns might be made to conform both to the match_dup ordering constraint and avoid the web.c SEGV by reordering the pattern, although at times at other infrastructure, e.g. when every place that tries to recognize zero overhead loop patterns has to be amended to look for multiple forms. But other patterns intrinsically need one of these strictures removed. Consider an instruction that atomically exchanges the contents of two registers and/or memory locations: The source of the first set must match the destination of the second set. So, if we have to make the first occurencence a match_operand, we must tag the "+" constraint on this input. Therefore, web.c:union_match_dups should handle "+" constraints on inputs tied with a match_dup to a later mentioned output. The problem here is that the current version of this function only searches the match_dup location in the use_link array, but for an OP_INPUT operand, the location will be in the def_link array. When I originally implemented this, I put some asserts there to make sure we now handle all the *dupref == NULL cases; however, this lead to ICEs for i686-pc-linux-gnu. As mentioned in the new comment, the DF_REF_LOC (use_link[n]) points to the register part of a memory address, wheras recog_data.dup_loc[m] points to an enclosing MEM. This is really a separate problem, so I choose to leave the behaviour in this case alone, i.e. just continue in the loop without creating the def/use union, even though the comment at the top of the function says that it should create that. Bootstrapped and regtested on i686-pc-linux-gnu (baseline: revision 191817) . 2012-10-02 Joern Rennecke * web.c (union_match_dups): Properly handle OP_INOUT match_dups. Index: web.c =================================================================== --- web.c (revision 191817) +++ web.c (working copy) @@ -96,6 +96,7 @@ union_match_dups (rtx insn, struct web_e struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn); df_ref *use_link = DF_INSN_INFO_USES (insn_info); df_ref *def_link = DF_INSN_INFO_DEFS (insn_info); + struct web_entry *dup_entry; int i; extract_insn (insn); @@ -107,10 +108,24 @@ union_match_dups (rtx insn, struct web_e df_ref *ref, *dupref; struct web_entry *entry; - for (dupref = use_link; *dupref; dupref++) + for (dup_entry = use_entry, dupref = use_link; *dupref; dupref++) if (DF_REF_LOC (*dupref) == recog_data.dup_loc[i]) break; + if (*dupref == NULL && type == OP_INOUT) + { + + for (dup_entry = def_entry, dupref = def_link; *dupref; dupref++) + if (DF_REF_LOC (*dupref) == recog_data.dup_loc[i]) + break; + } + /* ??? *DUPREF can still be zero, because when an operand matches + a memory, DF_REF_LOC (use_link[n]) points to the register part + of the address, whereas recog_data.dup_loc[m] points to the + entire memory ref, thus we fail to find the duplicate entry, + even though it is there. + Example: i686-pc-linux-gnu gcc.c-torture/compile/950607-1.c + -O3 -fomit-frame-pointer -funroll-loops */ if (*dupref == NULL || DF_REF_REGNO (*dupref) < FIRST_PSEUDO_REGISTER) continue; @@ -121,7 +136,15 @@ union_match_dups (rtx insn, struct web_e if (DF_REF_LOC (*ref) == recog_data.operand_loc[op]) break; - (*fun) (use_entry + DF_REF_ID (*dupref), entry + DF_REF_ID (*ref)); + if (!*ref && type == OP_INOUT) + { + for (ref = use_link, entry = use_entry; *ref; ref++) + if (DF_REF_LOC (*ref) == recog_data.operand_loc[op]) + break; + } + + gcc_assert (*ref); + (*fun) (dup_entry + DF_REF_ID (*dupref), entry + DF_REF_ID (*ref)); } }