| Message ID | 1294691517-19580-29-git-send-email-rth@redhat.com |
|---|---|
| State | New |
On Mon, Jan 10, 2011 at 12:31:57PM -0800, Richard Henderson wrote:
> +static unsigned int
> +execute_compare_elim_after_reload (void)
> +{
> +  /* Eliminate comparisons that are redundant with flags computation.  */
> +  for (i = 0; VEC_iterate (comparison_struct_p, all_compares, i, cmp); ++i)

Please use FOR_EACH_VEC_ELT here.

-Nathan
On 01/11/2011 10:30 AM, Nathan Froyd wrote:
> On Mon, Jan 10, 2011 at 12:31:57PM -0800, Richard Henderson wrote:
>> +static unsigned int
>> +execute_compare_elim_after_reload (void)
>> +{
>> +  /* Eliminate comparisons that are redundant with flags computation.  */
>> +  for (i = 0; VEC_iterate (comparison_struct_p, all_compares, i, cmp); ++i)
>
> Please use FOR_EACH_VEC_ELT here.

Fixed locally, thanks.


r~
> +  /* Find the DEF of the flags register.  It must be there.  */
> +  for (use_rec = DF_INSN_DEFS (insn); ; use_rec++)
> +    {
> +      use = *use_rec;
> +      if (DF_REF_TYPE (use) == DF_REF_REG_DEF
> +          && DF_REF_REGNO (use) == targetm.flags_regnum)
> +        break;
> +    }

Can you rename these to def/def_rec?  For a moment I thought you were
not using def-use chains because I found only DF_REF_CHAIN (use).
(My observations below actually override this comment.)

> +  /* Note that df does not create use chains that cross basic blocks.

I don't think this is correct, as this is the very thing that is
responsible for the chains problem's potential quadratic behavior.
Have you seen it in practice (the chain that doesn't cross basic
blocks, not the quadratic behavior)?

This is basically the only case in which you'd rely on non-singleton
chains, and you're solving it by punting anyway (i.e. via
missing_uses).  This means this pass could be done just as easily
without def-use chains.  During the forward walk you can record the
last definition of CC in the basic block (which could start at
last_cmp for an extended basic block).  Then, when you walk each
insn's uses, look for a CC use.  If you find it, you know what its
last definition is.  That is, find_flags_uses_in_bb becomes something
like maybe_record_flags_use, and you would call it for all
instructions, passing the last CC def.

> +  if (DF_REF_CHAIN (use) == NULL)
> +    return false;
> +
> +  def = DF_REF_CHAIN (use)->ref;

Here you should probably bail out if the use has multiple reaching
definitions (i.e. DF_REF_CHAIN (use)->next != NULL).  It probably
won't happen given how cbranch/cstore splitters work, but you never
know.

> +     Note that this doesn't follow the USE-DEF chain from X, but
> +     since we already have to search for the previous clobber of
> +     the flags register, this wouldn't really be a problem.  */
> +
> +  /* Make sure that the flags are not clobbered in between the two
> +     instructions.  Unfortunately, we don't get DEF-DEF chains, so
> +     do this the old fashioned way.  */

Again, this is probably handled better without the use-def chains.
(Chains are in the DF framework, but are rarely the best
solution---especially if you're not limiting them to a region, e.g. a
loop.)  For a full-blown solution, we could generalize the
singleton-use-def-chains code from fwprop and reuse it here.

Actually, I don't think you plan to move instructions in compare-elim,
just like NOTICE_UPDATE_CC didn't.  So, the forward scan can remember,
together with the last comparison insn, the last instruction that
clobbered the flags.  Then whenever you find a comparison, you can
already look at the shape of the last clobber; you then store whether
the comparison instruction is "mergeable", and if so with which insn.
Non-mergeable comparisons would still be recorded for the sake of
eliminating duplicates.

> +  /* Succeed if the new instruction is valid.  */
> +  validate_change (dinsn, &XVECEXP (dpat, 0, 1), x, true);
> +  if (!apply_change_group ())
> +    return false;

Here you can test the return value of validate_change directly.

Thanks very much for this work, it is a very nice and readable pass,
and probably one of the biggest steps towards eradication of cc0.

HTH,

Paolo
On 01/12/2011 12:31 AM, Paolo Bonzini wrote:
>> +  /* Note that df does not create use chains that cross basic blocks.
>
> I don't think this is correct, as this is the very thing that is
> responsible for the chains problem's potential quadratic behavior.
> Have you seen it in practice (the chain that doesn't cross basic
> blocks, not the quadratic behavior)?

No, I thought I'd correctly read the code to determine that.

Of course, with the code that I'm currently generating from mn10300
we'll never see a cross-block occurrence, but I expect other targets
to easily do so.  For instance, in the RX target I expect that we'll
generate e.g. UNLT with an UNORDERED branch and a LT branch, and that
we'll only generate one compare out of the post-reload splitter to do
so.

> During the forward walk you can record the last definition of CC in
> the basic block (which could start at last_cmp for an extended basic
> block).  Then, when you walk each insn's uses and look for a CC use.
> If you find it, you know what it's last definition is.  That is,
> find_flags_uses_in_bb becomes something like maybe_record_flags_use
> and you would call it for all instructions, passing the last CC def.

An interesting idea.

>> +  if (DF_REF_CHAIN (use) == NULL)
>> +    return false;
>> +
>> +  def = DF_REF_CHAIN (use)->ref;
>
> Here you should probably bail out if the use has multiple reaching
> definitions (i.e. DF_REF_CHAIN (use)->next != NULL).  It probably
> won't happen given how cbranch/cstore splitters work, but you never
> know.

Good idea, though this has nothing to do with cbranch/cstore.  This
is finding the DEF for the register use inside the compare.

>> +     Note that this doesn't follow the USE-DEF chain from X, but
>> +     since we already have to search for the previous clobber of
>> +     the flags register, this wouldn't really be a problem.  */
>> +
>> +  /* Make sure that the flags are not clobbered in between the two
>> +     instructions.  Unfortunately, we don't get DEF-DEF chains, so
>> +     do this the old fashioned way.  */
>
> Again, this is probably handled better without the use-def chains.
> (Chains are in the DF framework, but are rarely the best
> solution---especially if you're not limiting them to a region, e.g. a
> loop).

Err, I'm not using use-def chains for this.  I'm using reg_set_between_p.
I'm a bit confused about your statement here.

>> +  /* Succeed if the new instruction is valid.  */
>> +  validate_change (dinsn, &XVECEXP (dpat, 0, 1), x, true);
>> +  if (!apply_change_group ())
>> +    return false;
>
> Here you can test the return value of validate_change directly.

True enough.

I'm a bit confused about the stance in regards to chains.  Should I
attempt to avoid them entirely, so that I never add to the problem?

I guess I can implement try_eliminate_compare by:

 * In find_comparisons_in_bb, remember the previous insn that
   clobbers CC_REG.  Record that in struct comparison.prev_clobber.

 * In try_eliminate_compare, if prev_clobber is not set, then we
   don't know where CC_REG was previously set, so we cannot eliminate.

 * The SET in prev_clobber must be of the comparison.in_a register.
   If it isn't, then we cannot eliminate.

 * Use reg_set_between_p to verify that comparison.in_a is not
   modified between prev_clobber and comparison.insn.

It's that last step that seems a bit... odd, but not wrong exactly.
But is it odd enough to warrant using chains?

r~
On 01/12/2011 08:44 PM, Richard Henderson wrote:
> On 01/12/2011 12:31 AM, Paolo Bonzini wrote:
>>> +  /* Note that df does not create use chains that cross basic blocks.
>>
>> I don't think this is correct, as this is the very thing that is
>> responsible for the chains problem's potential quadratic behavior.
>> Have you seen it in practice (the chain that doesn't cross basic
>> blocks, not the quadratic behavior)?
>
> No, I thought I'd correctly read the code to determine that.

Chains are based on the output of reaching definitions; they do not
care about whether the defs are in the same basic block or not.

>>> +  if (DF_REF_CHAIN (use) == NULL)
>>> +    return false;
>>> +
>>> +  def = DF_REF_CHAIN (use)->ref;
>>
>> Here you should probably bail out if the use has multiple reaching
>> definitions (i.e. DF_REF_CHAIN (use)->next != NULL).  It probably
>> won't happen given how cbranch/cstore splitters work, but you never
>> know.
>
> Good idea, though this has nothing to do with cbranch/cstore.  This
> is finding the DEF for the register use inside the compare.

Oops, right.

>>> +     Note that this doesn't follow the USE-DEF chain from X, but
>>> +     since we already have to search for the previous clobber of
>>> +     the flags register, this wouldn't really be a problem.  */
>>> +
>>> +  /* Make sure that the flags are not clobbered in between the two
>>> +     instructions.  Unfortunately, we don't get DEF-DEF chains, so
>>> +     do this the old fashioned way.  */
>>
>> Again, this is probably handled better without the use-def chains.
>> (Chains are in the DF framework, but are rarely the best
>> solution---especially if you're not limiting them to a region, e.g. a
>> loop).
>
> Err, I'm not using use-def chains for this.  I'm using reg_set_between_p.
> I'm a bit confused about your statement here.

You're right; what I meant was "since you are anyway resorting to the
old fashioned way, you can probably handle everything better without
the use-def chains".

> I'm a bit confused about the stance in regards to chains.  Should I
> attempt to avoid them entirely, so that I never add the problem?

It's an expensive problem when applied to the entire function, so it
is better to avoid it.  I'm not saying this cannot be delayed to 4.7
if the pass goes in now (it's almost a target-specific pass now, so it
probably could).

> I guess I can implement try_eliminate_compare by:
>
> * In find_comparisons_in_bb, remember the previous insn that
>   clobbers CC_REG.  Record that in struct comparison.prev_clobber.
>
> * In try_eliminate_compare, if prev_clobber is not set, then we
>   don't know where CC_REG was previously set, so we cannot eliminate.
>
> * The SET in prev_clobber must be the comparison.in_a register.
>   If it isn't, then we cannot eliminate.
>
> * Use reg_set_between_p to verify that comparison.in_a is not
>   modified between prev_clobber and comparison.insn.

Exactly.

> It's that last step that seems a bit... odd, but not wrong exactly.
> But is it odd enough to warrant using chains?

I'd say no, because you're doing the same right now.  USE-DEF chains
get you to a clobber, and then reg_set_between_p confirms whether that
clobber is the prev_clobber.

If you really want to remove the reg_set_between_p, you can of course
record the SET of prev_clobber, and reset prev_clobber if an insn sets
that register.  But I think it's more clumsy and not enough more
efficient than what you outlined.

Paolo
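The scheme agreed above can be modeled outside GCC.  The following is a
minimal, self-contained sketch (not part of the patch; all names are
invented for illustration): a forward scan records the last
flags-clobbering "arithmetic" insn as prev_clobber, and a compare is
eliminable only if that insn sets the compared register and nothing in
between modifies it -- the reg_set_between_p step.

```c
/* Toy model of the prev_clobber scheme over a flattened insn stream.
   ARITH_CLOBBER stands for an arithmetic insn that clobbers the flags
   as a side effect; COMPARE for a conforming compare; OTHER for any
   insn that neither compares nor clobbers the flags.  */
#include <assert.h>

enum kind { ARITH_CLOBBER, COMPARE, OTHER };

struct insn
{
  enum kind kind;
  int dest;     /* register set by the insn, or -1 */
  int cmp_in;   /* register compared (COMPARE only), or -1 */
};

/* Return the index of the insn whose flags result makes the compare
   INSNS[CI] redundant, or -1 if it cannot be eliminated.  */
static int
eliminable_against (const struct insn *insns, int n, int ci)
{
  int prev_clobber = -1;
  int i;

  assert (ci < n && insns[ci].kind == COMPARE);

  /* Forward scan: remember the last amenable flags clobber.  */
  for (i = 0; i < ci; ++i)
    if (insns[i].kind == ARITH_CLOBBER)
      prev_clobber = i;
    else if (insns[i].kind == COMPARE)
      prev_clobber = -1;        /* a compare sets the flags itself */

  if (prev_clobber < 0)
    return -1;

  /* The clobbering insn must set the compared register...  */
  if (insns[prev_clobber].dest != insns[ci].cmp_in)
    return -1;

  /* ...and nothing in between may modify it (reg_set_between_p).  */
  for (i = prev_clobber + 1; i < ci; ++i)
    if (insns[i].dest == insns[ci].cmp_in)
      return -1;

  return prev_clobber;
}
```

The point of the sketch is that only one backward question remains after
the single forward scan, which is why the full use-def chain machinery
is unnecessary here.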
Version 2, based on Paolo's comments from v1.  Changes:

(0) The USE-DEF and DEF-USE chains are gone.

(1) A PREV_CLOBBER insn is recorded during the single scan of the
function.  This insn is the insn previous to the compare which (A)
clobbers the flags and (B) is of a shape amenable to compare
elimination.  If the insn previous to the compare that clobbers the
flags is not amenable to elimination -- e.g. a call -- we don't bother
recording it, and later know that we shouldn't try.

(2) Uses of the compare are also found during that single scan.  Some,
ahem, highly questionable assumptions that had been made about how
DEF-USE chains are or are not created are gone.  I check the
DF_LIVE_BB_INFO now, combined with the shape of the CFG, to determine
if there's a use of the flags that escapes the extended basic block.

(3) try_eliminate_compare has been rewritten along the lines we
discussed previously, although instead of reg_set_between_p...

(4) Porting the RX target to the compare-elimination pass was very
helpful.  It showed that there's a phase ordering problem that
fortunately can be worked around with a little bit of effort.

Consider libgcc _absvdi3.o, as of split2:

(insn 76 75 74 4 (parallel [
            (set (reg:SI 4 r4 [38])
                (minus:SI (minus:SI (reg:SI 4 r4 [38])
                        (reg:SI 2 r2 [orig:37 a+4 ] [37]))
                    (geu:SI (reg:CC 16 cc)
                        (const_int 0 [0]))))
            (clobber (reg:CC 16 cc))
        ]) libgcc2.c:265 76 {sbb_internal}
     (nil))

(insn 74 76 77 4 (set (reg:SI 2 r2 [orig:33 w+4 ] [33])
        (reg:SI 4 r4 [38])) libgcc2.c:265 22 {*movsi_internal}
     (nil))

(insn 77 74 78 4 (set (reg:CC 16 cc)
        (compare:CC (reg:SI 2 r2 [orig:33 w+4 ] [33])
            (const_int 0 [0]))) libgcc2.c:267 1 {*cmpsi}
     (nil))

For RX, subtract-with-borrow (sbb) is a two-address insn, and so RA
has little choice but to add an output reload (insn 74) in order to
satisfy the matching constraint and put the result into the proper
register.
If we were to run this pass after pass_cprop_hardreg, this sequence
would be tidied up such that the compare (insn 77) would have R4 as an
input, and so we would immediately see the corresponding arithmetic
(insn 76) and perform the elimination.

However, pass_cprop_hardreg currently runs much later; in particular,
after pass_peephole2.  Now, peep2 is where we would like to make
transformations like "mov 0,r" -> "xor r,r" iff the flags are not
already live at that location.  In order to satisfy both, we'd have to
shuffle lots of passes around, with the attendant problem of
determining if we've pessimized something else.

Instead, simply look for copy insns while we're stepping back making
sure that the input to the compare isn't changed since PREV_CLOBBER.

Comments?


r~

/* Post-reload compare elimination.
   Copyright (C) 2010, 2011
   Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any later
version.

GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */

/* There is a set of targets whose general-purpose move or addition
   instructions clobber the flags.  These targets cannot split their
   CBRANCH/CSTORE etc patterns before reload is complete, lest reload
   itself insert instructions in between the flags setter and user.
   Because these targets cannot split the compare from the use, they
   cannot make use of the comparison elimination offered by the combine
   pass.
   This is a small pass intended to provide comparison elimination
   similar to what is available via NOTICE_UPDATE_CC for cc0 targets.
   This should help encourage cc0 targets to convert to an explicit
   post-reload representation of the flags.

   This pass assumes:

   (0) CBRANCH/CSTORE etc have been split in pass_split_after_reload.

   (1) All comparison patterns are represented as

	[(set (reg:CC) (compare:CC (reg) (immediate)))]

   (2) All insn patterns that modify the flags are represented as

	[(set (reg) (operation)
	 (clobber (reg:CC))]

   (3) If an insn of form (2) can usefully set the flags, there is
       another pattern of the form

	[(set (reg) (operation)
	 (set (reg:CCM) (compare:CCM (operation) (immediate)))]

   The mode CCM will be chosen as if by SELECT_CC_MODE.  */

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"
#include "rtl.h"
#include "tm_p.h"
#include "insn-config.h"
#include "recog.h"
#include "flags.h"
#include "basic-block.h"
#include "tree-pass.h"
#include "target.h"
#include "df.h"
#include "domwalk.h"


/* These structures describe a comparison and how it is used.  */

/* The choice of maximum 3 uses comes from wanting to eliminate the
   two duplicate compares from a three-way branch on the sign of a
   value.  This is also sufficient to eliminate the duplicate compare
   against the high-part of a double-word comparison.  */
#define MAX_CMP_USE 3

struct comparison_use
{
  /* The instruction in which the result of the compare is used.  */
  rtx insn;
  /* The location of the flags register within the use.  */
  rtx *loc;
  /* The comparison code applied against the flags register.  */
  enum rtx_code code;
};

struct comparison
{
  /* The comparison instruction.  */
  rtx insn;

  /* The insn prior to the comparison insn that clobbers the flags.  */
  rtx prev_clobber;

  /* The two values being compared.  These will be either REGs or
     constants.  */
  rtx in_a, in_b;

  /* Information about how this comparison is used.  */
  struct comparison_use uses[MAX_CMP_USE];

  /* The original CC_MODE for this comparison.  */
  enum machine_mode orig_mode;

  /* The number of uses identified for this comparison.  */
  unsigned short n_uses;

  /* True if not all uses of this comparison have been identified.
     This can happen either for overflowing the array above, or if
     the flags register is used in some unusual context.  */
  bool missing_uses;

  /* True if its inputs are still valid at the end of the block.  */
  bool inputs_valid;
};

typedef struct comparison *comparison_struct_p;
DEF_VEC_P(comparison_struct_p);
DEF_VEC_ALLOC_P(comparison_struct_p, heap);

static VEC(comparison_struct_p, heap) *all_compares;

/* Look for a "conforming" comparison, as defined above.  If valid,
   return the rtx for the COMPARE itself.  */

static rtx
conforming_compare (rtx insn)
{
  rtx set, src, dest;

  set = single_set (insn);
  if (set == NULL)
    return NULL;

  src = SET_SRC (set);
  if (GET_CODE (src) != COMPARE)
    return NULL;

  dest = SET_DEST (set);
  if (!REG_P (dest) || REGNO (dest) != targetm.flags_regnum)
    return NULL;

  if (REG_P (XEXP (src, 0))
      && REG_P (XEXP (src, 0))
      && (REG_P (XEXP (src, 1)) || CONSTANT_P (XEXP (src, 1))))
    return src;

  return NULL;
}

/* Look for a pattern of the "correct" form for an insn with a flags
   clobber for which we may be able to eliminate a compare later.
   We're not looking to validate any inputs at this time, merely see
   that the basic shape is correct.  The term "arithmetic" may be
   somewhat misleading...
   */

static bool
arithmetic_flags_clobber_p (rtx insn)
{
  rtx pat, x;

  if (!NONJUMP_INSN_P (insn))
    return false;
  pat = PATTERN (insn);
  if (extract_asm_operands (pat))
    return false;

  if (GET_CODE (pat) == PARALLEL && XVECLEN (pat, 0) == 2)
    {
      x = XVECEXP (pat, 0, 0);
      if (GET_CODE (x) != SET)
	return false;
      x = SET_DEST (x);
      if (!REG_P (x))
	return false;

      x = XVECEXP (pat, 0, 1);
      if (GET_CODE (x) == CLOBBER)
	{
	  x = XEXP (x, 0);
	  if (REG_P (x) && REGNO (x) == targetm.flags_regnum)
	    return true;
	}
    }

  return false;
}

/* Look for uses of FLAGS in INSN.  If we find one we can analyze,
   record it in CMP; otherwise indicate that we've missed a use.  */

static void
find_flags_uses_in_insn (struct comparison *cmp, rtx insn)
{
  df_ref *use_rec, use;

  /* If we've already lost track of uses, don't bother collecting more.  */
  if (cmp->missing_uses)
    return;

  /* Find a USE of the flags register.  */
  for (use_rec = DF_INSN_USES (insn); (use = *use_rec) != NULL; use_rec++)
    if (DF_REF_REGNO (use) == targetm.flags_regnum)
      {
	rtx x, *loc;

	/* If this is an unusual use, quit.  */
	if (DF_REF_TYPE (use) != DF_REF_REG_USE)
	  goto fail;

	/* If we've run out of slots to record uses, quit.  */
	if (cmp->n_uses == MAX_CMP_USE)
	  goto fail;

	/* Unfortunately, the location of the flags register, while
	   present in the reference structure, doesn't help.  We need
	   to find the comparison code that is outer to the actual
	   flags use.  */
	loc = DF_REF_LOC (use);
	x = PATTERN (insn);
	if (GET_CODE (x) == PARALLEL)
	  x = XVECEXP (x, 0, 0);
	x = SET_SRC (x);
	if (GET_CODE (x) == IF_THEN_ELSE)
	  x = XEXP (x, 0);
	if (COMPARISON_P (x)
	    && loc == &XEXP (x, 0)
	    && XEXP (x, 1) == const0_rtx)
	  {
	    /* We've found a use of the flags that we understand.  */
	    struct comparison_use *cuse = &cmp->uses[cmp->n_uses++];
	    cuse->insn = insn;
	    cuse->loc = loc;
	    cuse->code = GET_CODE (x);
	  }
	else
	  goto fail;
      }
  return;

 fail:
  /* We failed to recognize this use of the flags register.  */
  cmp->missing_uses = true;
}

/* Identify comparison instructions within BB.
   If the last compare in the BB is valid at the end of the block,
   install it in BB->AUX.  Called via walk_dominators_tree.  */

static void
find_comparisons_in_bb (struct dom_walk_data *data ATTRIBUTE_UNUSED,
			basic_block bb)
{
  struct comparison *last_cmp;
  rtx insn, next, last_clobber;
  bool last_cmp_valid;
  bitmap killed;

  killed = BITMAP_ALLOC (NULL);

  /* The last comparison that was made.  Will be reset to NULL
     once the flags are clobbered.  */
  last_cmp = NULL;

  /* True iff the last comparison has not been clobbered, nor
     have its inputs.  Used to eliminate duplicate compares.  */
  last_cmp_valid = false;

  /* The last insn that clobbered the flags, if that insn is of
     a form that may be valid for eliminating a following compare.
     To be reset to NULL once the flags are set otherwise.  */
  last_clobber = NULL;

  /* Propagate the last live comparison throughout the extended basic
     block.  */
  if (single_pred_p (bb))
    {
      last_cmp = (struct comparison *) single_pred (bb)->aux;
      if (last_cmp)
	last_cmp_valid = last_cmp->inputs_valid;
    }

  for (insn = BB_HEAD (bb); insn; insn = next)
    {
      rtx src;

      next = (insn == BB_END (bb) ? NULL_RTX : NEXT_INSN (insn));
      if (!NONDEBUG_INSN_P (insn))
	continue;

      src = conforming_compare (insn);
      if (src)
	{
	  /* Eliminate a compare that's redundant with the previous.  */
	  if (last_cmp_valid
	      && rtx_equal_p (last_cmp->in_a, XEXP (src, 0))
	      && rtx_equal_p (last_cmp->in_b, XEXP (src, 1)))
	    {
	      delete_insn (insn);
	      continue;
	    }

	  last_cmp = XCNEW (struct comparison);
	  last_cmp->insn = insn;
	  last_cmp->prev_clobber = last_clobber;
	  last_cmp->in_a = XEXP (src, 0);
	  last_cmp->in_b = XEXP (src, 1);
	  last_cmp->orig_mode = GET_MODE (SET_DEST (single_set (insn)));
	  VEC_safe_push (comparison_struct_p, heap, all_compares, last_cmp);

	  /* It's unusual, but be prepared for comparison patterns that
	     also clobber an input, or perhaps a scratch.  */
	  last_clobber = NULL;
	  last_cmp_valid = true;
	  bitmap_clear (killed);
	  df_simulate_find_defs (insn, killed);
	}
      else
	{
	  /* Notice if this instruction kills the flags register.  */
	  bitmap_clear (killed);
	  df_simulate_find_defs (insn, killed);
	  if (bitmap_bit_p (killed, targetm.flags_regnum))
	    {
	      /* See if this insn could be the "clobber" that eliminates
		 a future comparison.  */
	      if (arithmetic_flags_clobber_p (insn))
		last_clobber = insn;
	      else
		last_clobber = NULL;

	      /* In either case, the previous compare is no longer valid.  */
	      last_cmp = NULL;
	      last_cmp_valid = false;
	      continue;
	    }

	  /* Notice if this insn uses the flags register.  */
	  if (last_cmp)
	    find_flags_uses_in_insn (last_cmp, insn);
	}

      /* Notice if any of the inputs to the comparison have changed.  */
      if (last_cmp
	  && (bitmap_bit_p (killed, REGNO (last_cmp->in_a))
	      || (REG_P (last_cmp->in_b)
		  && bitmap_bit_p (killed, REGNO (last_cmp->in_b)))))
	last_cmp_valid = false;
    }

  BITMAP_FREE (killed);

  /* Remember the live comparison for subsequent members of the
     extended basic block.  */
  if (last_cmp)
    {
      bb->aux = last_cmp;
      last_cmp->inputs_valid = last_cmp_valid;

      /* Look to see if the flags register is live outgoing here, and
	 incoming to any successor not part of the extended basic
	 block.  */
      if (bitmap_bit_p (&DF_LIVE_BB_INFO (bb)->out, targetm.flags_regnum))
	{
	  edge e;
	  edge_iterator ei;

	  FOR_EACH_EDGE (e, ei, bb->succs)
	    {
	      basic_block dest = e->dest;
	      if (bitmap_bit_p (&DF_LIVE_BB_INFO (dest)->in,
				targetm.flags_regnum)
		  && !single_pred_p (dest))
		{
		  last_cmp->missing_uses = true;
		  break;
		}
	    }
	}
    }
}

/* Find all comparisons in the function.
   */

static void
find_comparisons (void)
{
  struct dom_walk_data data;

  memset (&data, 0, sizeof(data));
  data.dom_direction = CDI_DOMINATORS;
  data.before_dom_children = find_comparisons_in_bb;

  calculate_dominance_info (CDI_DOMINATORS);

  init_walk_dominator_tree (&data);
  walk_dominator_tree (&data, ENTRY_BLOCK_PTR);
  fini_walk_dominator_tree (&data);

  clear_aux_for_blocks ();
  free_dominance_info (CDI_DOMINATORS);
}

/* Select an alternate CC_MODE for a comparison insn comparing A and B.
   Note that inputs are almost certainly different than the IN_A and
   IN_B stored in CMP -- we're called while attempting to eliminate the
   compare after all.  Return the new FLAGS rtx if successful, else
   return NULL.  */

static rtx
maybe_select_cc_mode (struct comparison *cmp, rtx a, rtx b)
{
  enum machine_mode sel_mode;
  const int n = cmp->n_uses;
  rtx flags = NULL;

#ifndef SELECT_CC_MODE
  /* Minimize code differences when this target macro is undefined.  */
  return NULL;
#define SELECT_CC_MODE(A,B,C) (gcc_unreachable (), VOIDmode)
#endif

  /* If we don't have access to all of the uses, we can't validate.  */
  if (cmp->missing_uses || n == 0)
    return NULL;

  /* Find a new mode that works for all of the uses.  Special case the
     common case of exactly one use.  */
  if (n == 1)
    {
      sel_mode = SELECT_CC_MODE (cmp->uses[0].code, a, b);
      if (sel_mode != cmp->orig_mode)
	{
	  flags = gen_rtx_REG (sel_mode, targetm.flags_regnum);
	  validate_change (cmp->uses[0].insn, cmp->uses[0].loc, flags, true);
	}
    }
  else
    {
      int i;

      sel_mode = SELECT_CC_MODE (cmp->uses[0].code, a, b);
      for (i = 1; i < n; ++i)
	{
	  enum machine_mode new_mode;
	  new_mode = SELECT_CC_MODE (cmp->uses[i].code, a, b);
	  if (new_mode != sel_mode)
	    {
	      sel_mode = targetm.cc_modes_compatible (sel_mode, new_mode);
	      if (sel_mode == VOIDmode)
		return NULL;
	    }
	}

      if (sel_mode != cmp->orig_mode)
	{
	  flags = gen_rtx_REG (sel_mode, targetm.flags_regnum);
	  for (i = 0; i < n; ++i)
	    validate_change (cmp->uses[i].insn, cmp->uses[i].loc, flags, true);
	}
    }

  return flags;
}

/* Attempt to find an instruction prior to CMP that can be used to
   compute the same flags value as the comparison itself.
   Return true if successful.  */

static bool
try_eliminate_compare (struct comparison *cmp)
{
  rtx x, insn, bb_head, flags, in_a, cmp_src;

  /* We must have found an interesting "clobber" preceding the
     compare.  */
  if (cmp->prev_clobber == NULL)
    return false;

  /* ??? For the moment we don't handle comparisons for which IN_B
     is a register.  We accepted these during initial comparison
     recognition in order to eliminate duplicate compares.
     An improvement here would be to handle x = a - b; if (a < b).  */
  if (!CONSTANT_P (cmp->in_b))
    return false;

  /* Verify that PREV_CLOBBER defines IN_A, and that IN_A is not
     clobbered in between.  Given that this target requires this pass,
     we can assume that most insns do clobber the flags, and so the
     distance between the compare and the clobber is likely to be
     small.  */
  /* ??? This is one point at which one could argue that DF_REF_CHAIN
     would be useful, but it is thought to be too heavy-weight a
     solution here.
     */
  in_a = cmp->in_a;
  insn = cmp->insn;
  bb_head = BB_HEAD (BLOCK_FOR_INSN (insn));
  for (insn = PREV_INSN (insn);
       insn != cmp->prev_clobber;
       insn = PREV_INSN (insn))
    {
      const int abnormal_flags
	= (DF_REF_CONDITIONAL | DF_REF_PARTIAL | DF_REF_MAY_CLOBBER
	   | DF_REF_MUST_CLOBBER | DF_REF_SIGN_EXTRACT
	   | DF_REF_ZERO_EXTRACT | DF_REF_STRICT_LOW_PART
	   | DF_REF_PRE_POST_MODIFY);
      df_ref *def_rec, def;

      /* Note that the BB_HEAD is always either a note or a label,
	 but in any case it means that IN_A is defined outside the
	 block.  */
      if (insn == bb_head)
	return false;
      if (NOTE_P (insn) || DEBUG_INSN_P (insn))
	continue;

      /* Find a possible def of IN_A in INSN.  */
      for (def_rec = DF_INSN_DEFS (insn); (def = *def_rec) != NULL; def_rec++)
	if (DF_REF_REGNO (def) == REGNO (in_a))
	  break;

      /* No definitions of IN_A; continue searching.  */
      if (def == NULL)
	continue;

      /* Bail if this is not a totally normal set of IN_A.  */
      if (DF_REF_IS_ARTIFICIAL (def))
	return false;
      if (DF_REF_FLAGS (def) & abnormal_flags)
	return false;

      /* We've found an insn between the compare and the clobber that
	 sets IN_A.  Given that pass_cprop_hardreg has not yet run, we
	 still find situations in which we can usefully look through a
	 copy insn.  */
      x = single_set (insn);
      if (x == NULL)
	return false;
      in_a = SET_SRC (x);
      if (!REG_P (in_a))
	return false;
    }

  /* We've reached PREV_CLOBBER without finding a modification of IN_A.
     Validate that PREV_CLOBBER itself does in fact refer to IN_A.  Do
     recall that we've already validated the shape of PREV_CLOBBER.  */
  x = XVECEXP (PATTERN (insn), 0, 0);
  if (!rtx_equal_p (SET_DEST (x), in_a))
    return false;
  cmp_src = SET_SRC (x);

  /* Determine if we ought to use a different CC_MODE here.  */
  flags = maybe_select_cc_mode (cmp, cmp_src, cmp->in_b);
  if (flags == NULL)
    flags = gen_rtx_REG (cmp->orig_mode, targetm.flags_regnum);

  /* Generate a new comparison for installation in the setter.  */
  x = copy_rtx (cmp_src);
  x = gen_rtx_COMPARE (GET_MODE (flags), x, cmp->in_b);
  x = gen_rtx_SET (VOIDmode, flags, x);

  /* Succeed if the new instruction is valid.  */
  validate_change (insn, &XVECEXP (PATTERN (insn), 0, 1), x, true);
  if (!apply_change_group ())
    return false;

  /* Success.  Delete the compare insn...  */
  delete_insn (cmp->insn);

  /* ... and any notes that are now irrelevant due to multi-stores.  */
  x = find_regno_note (insn, REG_UNUSED, targetm.flags_regnum);
  if (x)
    remove_note (insn, x);
  x = find_reg_note (insn, REG_EQUAL, NULL);
  if (x)
    remove_note (insn, x);
  x = find_reg_note (insn, REG_EQUIV, NULL);
  if (x)
    remove_note (insn, x);

  return true;
}

/* Main entry point to the pass.  */

static unsigned int
execute_compare_elim_after_reload (void)
{
  df_set_flags (DF_DEFER_INSN_RESCAN);
  df_live_add_problem ();
  df_analyze ();

  gcc_checking_assert (all_compares == NULL);

  /* Locate all comparisons and their uses, and eliminate duplicates.  */
  find_comparisons ();
  if (all_compares)
    {
      struct comparison *cmp;
      size_t i;

      /* Eliminate comparisons that are redundant with flags
	 computation.  */
      FOR_EACH_VEC_ELT (comparison_struct_p, all_compares, i, cmp)
	{
	  try_eliminate_compare (cmp);
	  XDELETE (cmp);
	}

      VEC_free (comparison_struct_p, heap, all_compares);
      all_compares = NULL;

      df_analyze ();
    }

  return 0;
}

static bool
gate_compare_elim_after_reload (void)
{
  return (flag_compare_elim_after_reload
	  && targetm.flags_regnum != INVALID_REGNUM);
}

struct rtl_opt_pass pass_compare_elim_after_reload =
{
 {
  RTL_PASS,
  "cmpelim",				/* name */
  gate_compare_elim_after_reload,	/* gate */
  execute_compare_elim_after_reload,	/* execute */
  NULL,					/* sub */
  NULL,					/* next */
  0,					/* static_pass_number */
  TV_NONE,				/* tv_id */
  0,					/* properties_required */
  0,					/* properties_provided */
  0,					/* properties_destroyed */
  0,					/* todo_flags_start */
  TODO_df_finish | TODO_df_verify
  | TODO_verify_rtl_sharing
  | TODO_dump_func
  | TODO_ggc_collect			/* todo_flags_finish */
 }
};
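The backward walk in try_eliminate_compare above, including the
look-through of copy insns that motivates change (3)/(4), can be
isolated into a small self-contained model (not GCC code; all names
here are invented): starting from the compare, step back toward
PREV_CLOBBER, substituting the tracked register through any simple
reg-reg copy, and bail if a non-copy modifies it.

```c
/* Toy model of v2's backward walk: INSNS[0] plays PREV_CLOBBER
   (setting register CLOBBER_DEST); INSNS[1..n-1] lie between it and a
   compare of register IN_A.  */
#include <assert.h>

struct toy_insn
{
  int is_copy;    /* simple reg-reg copy?  */
  int dest;       /* register set by the insn */
  int src;        /* source register for a copy, or -1 */
};

/* Return 1 if the compare of IN_A could be eliminated against the
   flags produced by the clobbering insn, 0 otherwise.  */
static int
reaches_clobber (const struct toy_insn *insns, int n,
                 int clobber_dest, int in_a)
{
  int i;

  /* Walk backward over the insns between the compare and the
     clobber.  */
  for (i = n - 1; i >= 1; --i)
    {
      if (insns[i].dest != in_a)
        continue;               /* does not touch the tracked reg */
      if (!insns[i].is_copy)
        return 0;               /* tracked reg modified: bail */
      in_a = insns[i].src;      /* look through the copy */
    }

  /* PREV_CLOBBER itself must compute the tracked register.  */
  return clobber_dest == in_a;
}
```

This mirrors the RX example from the cover letter: the sbb sets r4, an
output-reload copy moves r4 into r2, and the compare tests r2; looking
through the copy lets the walk arrive back at the sbb.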
On 01/17/2011 11:26 PM, Richard Henderson wrote:
> Version 2, based on Paolo's comments from v1.
>
> Comments?

Just a few nits:

1) A comment above maybe_select_cc_mode and above its call, stating
that the function starts a change group, would be nice.  I missed this
in the previous review.  Given the very nice comments in the code, it
would be a nice touch.

2) Here:

>       src = conforming_compare (insn);
>       if (src)
> 	{
> 	  ...
> 	  bitmap_clear (killed);
> 	  df_simulate_find_defs (insn, killed);
> 	}
>       else
> 	{
> 	  /* Notice if this instruction kills the flags register.  */
> 	  bitmap_clear (killed);
> 	  df_simulate_find_defs (insn, killed);

the bitmap_clear/df_simulate_find_defs can be hoisted above the if.
The only case where we don't use the result is when we delete a
redundant compare, and it's definitely not the common case.  This
would make it clearer that KILLED is not used for any kind of real
"simulation", but only as a quick way to query the array of DEFs for
the presence of a register.

3) Just to take back that clock cycle lost in the above change, here:

>       if (last_cmp
> 	  && (bitmap_bit_p (killed, REGNO (last_cmp->in_a))
> 	      || (REG_P (last_cmp->in_b)
> 		  && bitmap_bit_p (killed, REGNO (last_cmp->in_b)))))
> 	last_cmp_valid = false;

you can change LAST_CMP to LAST_CMP_VALID.  LAST_CMP will never be
NULL as long as LAST_CMP_VALID is true, but you may save some bitmap
accesses.

4) Not a task for now, but perhaps some DF functions could be provided
also in a form that operates on HARD_REG_SETs for improved
performance.  Just a reminder to myself.

Besides this, "seems to be perfect".

Paolo
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index de3bde9..f421d5a 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1205,6 +1205,7 @@ OBJS-common = \
 	cfgrtl.o \
 	combine.o \
 	combine-stack-adj.o \
+	compare-elim.o \
 	convert.o \
 	coverage.o \
 	cse.o \
@@ -3352,6 +3353,9 @@ combine-stack-adj.o : combine-stack-adj.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) $(RTL_H) insn-config.h $(TIMEVAR_H) $(TREE_PASS_H) \
    $(RECOG_H) output.h $(REGS_H) hard-reg-set.h $(FLAGS_H) $(FUNCTION_H) \
    $(EXPR_H) $(BASIC_BLOCK_H) $(DIAGNOSTIC_CORE_H) $(TM_P_H) $(DF_H) $(EXCEPT_H) reload.h
+compare-elim.o : compare-elim.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+	$(TM_H) $(RTL_H) $(TM_P_H) $(RECOG_H) $(FLAGS_H) $(BASIC_BLOCK_H) \
+	$(TREE_PASS_H) $(DF_H)
 ddg.o : ddg.c $(DDG_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TARGET_H) \
    $(DIAGNOSTIC_CORE_H) $(RTL_H) $(TM_P_H) $(REGS_H) $(FUNCTION_H) \
    $(FLAGS_H) insn-config.h $(INSN_ATTR_H) $(EXCEPT_H) $(RECOG_H) \
diff --git a/gcc/common.opt b/gcc/common.opt
index 32df6fc..5b01363 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -853,6 +853,10 @@ fcompare-debug-second
 Common Driver RejectNegative Var(flag_compare_debug)
 Run only the second compilation of -fcompare-debug
 
+fcompare-elim
+Common Report Var(flag_compare_elim_after_reload) Optimization
+Perform comparison elimination after register allocation has finished
+
 fconserve-stack
 Common Var(flag_conserve_stack) Optimization
 Do not perform optimizations increasing noticeably stack usage
diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
new file mode 100644
index 0000000..a1d91cc
--- /dev/null
+++ b/gcc/compare-elim.c
@@ -0,0 +1,519 @@
+/* Post-reload compare elimination.
+   Copyright (C) 2010, 2011
+   Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* There is a set of targets whose general-purpose move or addition
+   instructions clobber the flags.  These targets cannot split their
+   CBRANCH/CSTORE etc patterns before reload is complete, lest reload
+   itself insert instructions in between the flags setter and user.
+   Because these targets cannot split the compare from the use, they
+   cannot make use of the comparison elimination offered by the combine pass.
+
+   This is a small pass intended to provide comparison elimination similar to
+   what is available via NOTICE_UPDATE_CC for cc0 targets.  This should help
+   encourage cc0 targets to convert to an explicit post-reload representation
+   of the flags.
+
+   This pass assumes:
+
+   (0) CBRANCH/CSTORE etc have been split in pass_split_after_reload.
+
+   (1) All comparison patterns are represented as
+
+	[(set (reg:CC) (compare:CC (reg) (immediate)))]
+
+   (2) All insn patterns that modify the flags are represented as
+
+	[(set (reg) (operation))
+	 (clobber (reg:CC))]
+
+   (3) If an insn of form (2) can usefully set the flags, there is
+       another pattern of the form
+
+	[(set (reg) (operation))
+	 (set (reg:CCM) (compare:CCM (operation) (immediate)))]
+
+       The mode CCM will be chosen as if by SELECT_CC_MODE.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "insn-config.h"
+#include "recog.h"
+#include "flags.h"
+#include "basic-block.h"
+#include "tree-pass.h"
+#include "target.h"
+#include "df.h"
+#include "domwalk.h"
+
+
+/* These structures describe a comparison and how it is used.
+   */
+
+/* The choice of maximum 3 uses comes from wanting to eliminate the two
+   duplicate compares from a three-way branch on the sign of a value.
+   This is also sufficient to eliminate the duplicate compare against the
+   high-part of a double-word comparison.  */
+#define MAX_CMP_USE 3
+
+struct comparison_use
+{
+  /* The instruction in which the result of the compare is used.  */
+  rtx insn;
+  /* The location of the flags register within the use.  */
+  rtx *loc;
+  /* The comparison code applied against the flags register.  */
+  enum rtx_code code;
+};
+
+struct comparison
+{
+  /* The comparison instruction.  */
+  rtx insn;
+
+  /* The two values being compared.  These will be either REGs or
+     constants.  */
+  rtx in_a, in_b;
+
+  /* Information about how this comparison is used.  */
+  struct comparison_use uses[MAX_CMP_USE];
+
+  /* The original CC_MODE for this comparison.  */
+  enum machine_mode orig_mode;
+
+  /* The number of uses identified for this comparison.  */
+  unsigned short n_uses;
+
+  /* True if not all uses of this comparison have been identified.
+     This can happen either for overflowing the array above, or if
+     the flags register is used in some unusual context.  */
+  bool missing_uses;
+};
+
+typedef struct comparison *comparison_struct_p;
+DEF_VEC_P(comparison_struct_p);
+DEF_VEC_ALLOC_P(comparison_struct_p, heap);
+
+static VEC(comparison_struct_p, heap) *all_compares;
+
+static void
+find_flags_uses_in_bb (struct comparison *cmp,
+		       basic_block bb ATTRIBUTE_UNUSED, rtx insn)
+{
+  df_ref *use_rec, use;
+  struct df_link *chain;
+
+  /* If we've already lost track of uses, don't bother collecting more.  */
+  if (cmp->missing_uses)
+    return;
+
+  /* Find the DEF of the flags register.  It must be there.
+     */
+  for (use_rec = DF_INSN_DEFS (insn); ; use_rec++)
+    {
+      use = *use_rec;
+      if (DF_REF_TYPE (use) == DF_REF_REG_DEF
+	  && DF_REF_REGNO (use) == targetm.flags_regnum)
+	break;
+    }
+
+  for (chain = DF_REF_CHAIN (use); chain ; chain = chain->next)
+    {
+      rtx x, uinsn, *uloc;
+
+      /* If we've run out of slots to record uses, quit.  */
+      if (cmp->n_uses == MAX_CMP_USE)
+	{
+	  cmp->missing_uses = true;
+	  return;
+	}
+
+      /* Unfortunately, the location of the flags register, while present in
+	 the reference structure, doesn't help.  We need to find the comparison
+	 code that is outer to the actual flags use.  */
+      uinsn = DF_REF_INSN (chain->ref);
+      uloc = DF_REF_LOC (chain->ref);
+      x = PATTERN (uinsn);
+      if (GET_CODE (x) == PARALLEL)
+	x = XVECEXP (x, 0, 0);
+      x = SET_SRC (x);
+      if (GET_CODE (x) == IF_THEN_ELSE)
+	x = XEXP (x, 0);
+      if (COMPARISON_P (x)
+	  && uloc == &XEXP (x, 0)
+	  && XEXP (x, 1) == const0_rtx)
+	{
+	  struct comparison_use *cuse = &cmp->uses[cmp->n_uses++];
+	  cuse->insn = uinsn;
+	  cuse->loc = uloc;
+	  cuse->code = GET_CODE (x);
+	}
+      else
+	{
+	  /* We failed to recognize this use of the flags register.  */
+	  cmp->missing_uses = true;
+	  return;
+	}
+    }
+}
+
+/* Identify comparison instructions within BB.  If the last compare in the BB
+   is valid at the end of the block, install it in BB->AUX.  Called via
+   walk_dominators_tree.  */
+
+static void
+find_comparisons_in_bb (struct dom_walk_data *data ATTRIBUTE_UNUSED,
+			basic_block bb)
+{
+  struct comparison *last_cmp = NULL;
+  bitmap killed;
+  rtx insn, next;
+
+  killed = BITMAP_ALLOC (NULL);
+
+  /* Propagate the last live comparison throughout the extended basic
+     block.  */
+  if (single_pred_p (bb))
+    last_cmp = (struct comparison *) single_pred (bb)->aux;
+
+  for (insn = BB_HEAD (bb); insn; insn = next)
+    {
+      rtx set, src, dst;
+
+      next = (insn == BB_END (bb) ?
+	      NULL_RTX : NEXT_INSN (insn));
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+
+      if ((set = single_set (insn)) != NULL
+	  && (src = SET_SRC (set), GET_CODE (src) == COMPARE)
+	  && (dst = SET_DEST (set), REG_P (dst))
+	  && REGNO (dst) == targetm.flags_regnum
+	  && REG_P (XEXP (src, 0))
+	  && (REG_P (XEXP (src, 1)) || CONSTANT_P (XEXP (src, 1))))
+	{
+	  /* Eliminate a compare that's redundant with the previous.  */
+	  if (last_cmp
+	      && rtx_equal_p (last_cmp->in_a, XEXP (src, 0))
+	      && rtx_equal_p (last_cmp->in_b, XEXP (src, 1)))
+	    {
+	      find_flags_uses_in_bb (last_cmp, bb, insn);
+	      delete_insn (insn);
+	      continue;
+	    }
+
+	  last_cmp = XCNEW (struct comparison);
+	  last_cmp->insn = insn;
+	  last_cmp->in_a = XEXP (src, 0);
+	  last_cmp->in_b = XEXP (src, 1);
+	  last_cmp->orig_mode = GET_MODE (dst);
+	  VEC_safe_push (comparison_struct_p, heap, all_compares, last_cmp);
+
+	  find_flags_uses_in_bb (last_cmp, bb, insn);
+
+	  /* It's unusual, but be prepared for comparison patterns that
+	     also clobber an input, or perhaps a scratch.  */
+	  bitmap_clear (killed);
+	  df_simulate_find_defs (insn, killed);
+	}
+      else if (last_cmp)
+	{
+	  /* Notice if this instruction kills the flags register.  If so,
+	     any comparison we're holding is no longer valid.  */
+	  bitmap_clear (killed);
+	  df_simulate_find_defs (insn, killed);
+	  if (bitmap_bit_p (killed, targetm.flags_regnum))
+	    {
+	      last_cmp = NULL;
+	      continue;
+	    }
+	}
+      else
+	continue;
+
+      /* Notice if any of the inputs to the comparison have changed.  */
+      if (bitmap_bit_p (killed, REGNO (last_cmp->in_a)))
+	last_cmp = NULL;
+      else if (REG_P (last_cmp->in_b)
+	       && bitmap_bit_p (killed, REGNO (last_cmp->in_b)))
+	last_cmp = NULL;
+    }
+
+  BITMAP_FREE (killed);
+
+  /* Remember the live comparison for subsequent members of
+     the extended basic block.  */
+  if (last_cmp)
+    {
+      bb->aux = last_cmp;
+
+      /* Note that df does not create use chains that cross basic blocks.
+	 Therefore, given that we've not yet eliminated duplicate comparisons,
+	 if the flags register is live (in the sense of being used) then we're
+	 not going to be able to find all uses.  Given that this target needed
+	 this pass in the first place, this should be exceedingly rare, caused
+	 by a post-reload splitter creating new basic blocks.  */
+      if (bitmap_bit_p (DF_LR_OUT (bb), targetm.flags_regnum))
+	last_cmp->missing_uses = true;
+    }
+}
+
+/* Find all comparisons in the function.  */
+
+static void
+find_comparisons (void)
+{
+  struct dom_walk_data data;
+
+  memset (&data, 0, sizeof(data));
+  data.dom_direction = CDI_DOMINATORS;
+  data.before_dom_children = find_comparisons_in_bb;
+
+  calculate_dominance_info (CDI_DOMINATORS);
+
+  init_walk_dominator_tree (&data);
+  walk_dominator_tree (&data, ENTRY_BLOCK_PTR);
+  fini_walk_dominator_tree (&data);
+
+  clear_aux_for_blocks ();
+  free_dominance_info (CDI_DOMINATORS);
+}
+
+/* Select an alternate CC_MODE for a comparison insn comparing A and B.
+   Note that inputs are almost certainly different than the IN_A and IN_B
+   stored in CMP -- we're called while attempting to eliminate the compare
+   after all.  Return the new FLAGS rtx if successful, else return NULL.  */
+
+static rtx
+maybe_select_cc_mode (struct comparison *cmp, rtx a, rtx b)
+{
+  enum machine_mode sel_mode;
+  const int n = cmp->n_uses;
+  rtx flags = NULL;
+
+#ifndef SELECT_CC_MODE
+  /* Minimize code differences when this target macro is undefined.  */
+  return NULL;
+#define SELECT_CC_MODE(A,B,C) (gcc_unreachable (), VOIDmode)
+#endif
+
+  /* If we don't have access to all of the uses, we can't validate.  */
+  if (cmp->missing_uses || n == 0)
+    return NULL;
+
+  /* Find a new mode that works for all of the uses.  Special case the
+     common case of exactly one use.
+     */
+  if (n == 1)
+    {
+      sel_mode = SELECT_CC_MODE (cmp->uses[0].code, a, b);
+      if (sel_mode != cmp->orig_mode)
+	{
+	  flags = gen_rtx_REG (sel_mode, targetm.flags_regnum);
+	  validate_change (cmp->uses[0].insn, cmp->uses[0].loc, flags, true);
+	}
+    }
+  else
+    {
+      int i;
+
+      sel_mode = SELECT_CC_MODE (cmp->uses[0].code, a, b);
+      for (i = 1; i < n; ++i)
+	{
+	  enum machine_mode new_mode;
+	  new_mode = SELECT_CC_MODE (cmp->uses[i].code, a, b);
+	  if (new_mode != sel_mode)
+	    {
+	      sel_mode = targetm.cc_modes_compatible (sel_mode, new_mode);
+	      if (sel_mode == VOIDmode)
+		return NULL;
+	    }
+	}
+
+      if (sel_mode != cmp->orig_mode)
+	{
+	  flags = gen_rtx_REG (sel_mode, targetm.flags_regnum);
+	  for (i = 0; i < n; ++i)
+	    validate_change (cmp->uses[i].insn, cmp->uses[i].loc, flags, true);
+	}
+    }
+
+  return flags;
+}
+
+/* Attempt to find an instruction prior to CMP that can be used to compute the
+   same flags value as the comparison itself.  Return true if successful.  */
+
+static bool
+try_eliminate_compare (struct comparison *cmp)
+{
+  df_ref *use_rec, use, def;
+  rtx dinsn, dpat, x, flags, cmp_src;
+
+  /* ??? For the moment we don't handle comparisons for which IN_B
+     is a register.  We accepted these during initial comparison
+     recognition in order to eliminate duplicate compares.
+     An improvement here would be to handle
+	x = a - b; if (a < b)
+     Note that this doesn't follow the USE-DEF chain from X, but
+     since we already have to search for the previous clobber of
+     the flags register, this wouldn't really be a problem.  */
+  if (!CONSTANT_P (cmp->in_b))
+    return false;
+
+  /* The relevant instruction is the definition for the use in the
+     comparison instruction.
+     */
+  for (use_rec = DF_INSN_USES (cmp->insn); ; use_rec++)
+    {
+      use = *use_rec;
+      if (DF_REF_REGNO (use) == REGNO (cmp->in_a))
+	break;
+    }
+  if (DF_REF_CHAIN (use) == NULL)
+    return false;
+
+  def = DF_REF_CHAIN (use)->ref;
+  gcc_assert (DF_REF_REG_DEF_P (def));
+
+  /* If the definition comes from a parameter, or such, there's no insn.  */
+  if (DF_REF_IS_ARTIFICIAL (def))
+    return false;
+
+  /* Make sure that the defining instruction is of the proper form.
+     That is, a single set plus a clobber of the flags register.  */
+  dinsn = DF_REF_INSN (def);
+  if (!NONJUMP_INSN_P (dinsn))
+    return false;
+  dpat = PATTERN (dinsn);
+  if (GET_CODE (dpat) != PARALLEL || XVECLEN (dpat, 0) != 2)
+    return false;
+  x = XVECEXP (dpat, 0, 0);
+  if (GET_CODE (x) != SET || DF_REF_LOC (def) != &SET_DEST (x))
+    return false;
+  x = XVECEXP (dpat, 0, 1);
+  if (GET_CODE (x) != CLOBBER)
+    return false;
+  flags = XEXP (x, 0);
+  if (!REG_P (flags) || REGNO (flags) != targetm.flags_regnum)
+    return false;
+
+  /* Make sure that the flags are not clobbered in between the two
+     instructions.  Unfortunately, we don't get DEF-DEF chains, so
+     do this the old fashioned way.  */
+  if (reg_set_between_p (flags, dinsn, cmp->insn))
+    return false;
+
+  /* Determine if we ought to use a different CC_MODE here.  */
+  cmp_src = SET_SRC (XVECEXP (dpat, 0, 0));
+  x = maybe_select_cc_mode (cmp, cmp_src, cmp->in_b);
+  if (x != NULL)
+    flags = x;
+
+  /* Generate a new comparison for installation in the setter.  */
+  x = copy_rtx (cmp_src);
+  x = gen_rtx_COMPARE (GET_MODE (flags), x, cmp->in_b);
+  x = gen_rtx_SET (VOIDmode, flags, x);
+
+  /* Succeed if the new instruction is valid.  */
+  validate_change (dinsn, &XVECEXP (dpat, 0, 1), x, true);
+  if (!apply_change_group ())
+    return false;
+
+  /* Success.  Delete the compare insn...  */
+  delete_insn (cmp->insn);
+
+  /* ... and any notes that are now irrelevant due to multi-stores.
+     */
+  x = find_regno_note (dinsn, REG_UNUSED, targetm.flags_regnum);
+  if (x)
+    remove_note (dinsn, x);
+  x = find_reg_note (dinsn, REG_EQUAL, NULL);
+  if (x)
+    remove_note (dinsn, x);
+  x = find_reg_note (dinsn, REG_EQUIV, NULL);
+  if (x)
+    remove_note (dinsn, x);
+
+  return true;
+}
+
+/* Main entry point to the pass.  */
+
+static unsigned int
+execute_compare_elim_after_reload (void)
+{
+  df_set_flags (DF_DEFER_INSN_RESCAN);
+  df_live_add_problem ();
+  df_chain_add_problem (DF_DU_CHAIN + DF_UD_CHAIN);
+  df_analyze ();
+
+  gcc_checking_assert (all_compares == NULL);
+
+  /* Locate all comparisons and their uses, and eliminate duplicates.  */
+  find_comparisons ();
+  if (all_compares)
+    {
+      struct comparison *cmp;
+      size_t i;
+
+      /* Eliminate comparisons that are redundant with flags computation.  */
+      for (i = 0; VEC_iterate (comparison_struct_p, all_compares, i, cmp); ++i)
+	{
+	  try_eliminate_compare (cmp);
+	  XDELETE (cmp);
+	}
+
+      VEC_free (comparison_struct_p, heap, all_compares);
+      all_compares = NULL;
+
+      df_remove_problem (df_chain);
+      df_analyze ();
+    }
+
+  return 0;
+}
+
+static bool
+gate_compare_elim_after_reload (void)
+{
+  return (flag_compare_elim_after_reload
+	  && targetm.flags_regnum != INVALID_REGNUM);
+}
+
+struct rtl_opt_pass pass_compare_elim_after_reload =
+{
+ {
+  RTL_PASS,
+  "cmpelim",				/* name */
+  gate_compare_elim_after_reload,	/* gate */
+  execute_compare_elim_after_reload,	/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_NONE,				/* tv_id */
+  0,					/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_df_finish
+  | TODO_df_verify
+  | TODO_verify_rtl_sharing
+  | TODO_dump_func
+  | TODO_ggc_collect			/* todo_flags_finish */
+ }
+};
diff --git a/gcc/config/mn10300/mn10300.c b/gcc/config/mn10300/mn10300.c
index 25bfa0d..e5bfdae 100644
--- a/gcc/config/mn10300/mn10300.c
+++ b/gcc/config/mn10300/mn10300.c
@@ -3023,4 +3023,7 @@
mn10300_md_asm_clobbers (tree outputs ATTRIBUTE_UNUSED,
 #undef TARGET_MD_ASM_CLOBBERS
 #define TARGET_MD_ASM_CLOBBERS mn10300_md_asm_clobbers
 
+#undef TARGET_FLAGS_REGNUM
+#define TARGET_FLAGS_REGNUM CC_REG
+
 struct gcc_target targetm = TARGET_INITIALIZER;
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 800c592..e50de64 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -337,7 +337,7 @@ Objective-C and Objective-C++ Dialects}.
 -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
 -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
 -fcheck-data-deps -fcombine-stack-adjustments -fconserve-stack @gol
--fcprop-registers -fcrossjumping @gol
+-fcompare-elim -fcprop-registers -fcrossjumping @gol
 -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
 -fcx-limited-range @gol
 -fdata-sections -fdce -fdce @gol
@@ -5870,6 +5870,7 @@ compilation time.
 @option{-O} turns on the following optimization flags:
 @gccoptlist{
 -fauto-inc-dec @gol
+-fcompare-elim @gol
 -fcprop-registers @gol
 -fdce @gol
 -fdefer-pop @gol
@@ -7680,6 +7681,18 @@ use hidden visibility) is similar to @code{-fwhole-program}.  See
 Enabled by default when LTO support in GCC is enabled and GCC was
 compiled with linker supporting plugins (GNU ld or @code{gold}).
 
+@item -fcompare-elim
+@opindex fcompare-elim
+After register allocation and post-register allocation instruction splitting,
+identify arithmetic instructions that compute processor flags similar to a
+comparison operation based on that arithmetic.  If possible, eliminate the
+explicit comparison operation.
+
+This pass only applies to certain targets that cannot explicitly represent
+the comparison operation before register allocation is complete.
+
+Enabled at levels @option{-O}, @option{-O2}, @option{-O3}, @option{-Os}.
+
 @item -fcprop-registers
 @opindex fcprop-registers
 After register allocation and post-register allocation instruction splitting,
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 139c5a7..5dfd100 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4345,6 +4345,10 @@ to return a nonzero value when it is required, the compiler will run out of
 spill registers and print a fatal error message.
 @end deftypefn
 
+@deftypevr {Target Hook} {unsigned int} TARGET_FLAGS_REGNUM
+If the target has a dedicated flags register, and it needs to use the
+post-reload comparison elimination pass, then this value should be set
+appropriately.
+@end deftypevr
+
 @node Scalar Return
 @subsection How Scalar Function Values Are Returned
 @cindex return values in registers
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 2085816..bd49012 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4333,6 +4333,8 @@ to return a nonzero value when it is required, the compiler will run out of
 spill registers and print a fatal error message.
 @end deftypefn
 
+@hook TARGET_FLAGS_REGNUM
+
 @node Scalar Return
 @subsection How Scalar Function Values Are Returned
 @cindex return values in registers
diff --git a/gcc/opts.c b/gcc/opts.c
index 42d53f4..7001122 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -456,6 +456,7 @@ static const struct default_options default_options_table[] =
     { OPT_LEVELS_1_PLUS, OPT_ftree_sink, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 },
+    { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 },
 
     /* -O2 optimizations.
     */
     { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
diff --git a/gcc/passes.c b/gcc/passes.c
index 804ac9f..5aba987 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -1021,6 +1021,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_gcse2);
 	  NEXT_PASS (pass_split_after_reload);
 	  NEXT_PASS (pass_implicit_zee);
+	  NEXT_PASS (pass_compare_elim_after_reload);
 	  NEXT_PASS (pass_branch_target_load_optimize1);
 	  NEXT_PASS (pass_thread_prologue_and_epilogue);
 	  NEXT_PASS (pass_rtl_dse2);
diff --git a/gcc/recog.h b/gcc/recog.h
index 217c6e5..534d2c9 100644
--- a/gcc/recog.h
+++ b/gcc/recog.h
@@ -18,6 +18,9 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+#ifndef GCC_RECOG_H
+#define GCC_RECOG_H
+
 /* Random number that should be large enough for all purposes.  */
 #define MAX_RECOG_ALTERNATIVES 30
 
@@ -305,3 +308,5 @@ struct insn_data_d
 extern const struct insn_data_d insn_data[];
 
 extern int peep2_current_count;
+
+#endif /* GCC_RECOG_H */
diff --git a/gcc/target.def b/gcc/target.def
index 9d96e65..29d2a51 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1627,6 +1627,15 @@ DEFHOOK
 bool, (enum machine_mode mode),
 hook_bool_mode_false)
 
+/* Register number for a flags register.  Only needs to be defined if the
+   target is constrained to use post-reload comparison elimination.  */
+DEFHOOKPOD
+(flags_regnum,
+ "If the target has a dedicated flags register, and it needs to use the\
+ post-reload comparison elimination pass, then this value should be set\
+ appropriately.",
+ unsigned int, INVALID_REGNUM)
+
 /* Compute a (partial) cost for rtx X.  Return true if the complete
    cost has been computed, and false if subexpressions should be
     scanned.  In either case, *TOTAL contains the cost result.
     */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 32d8f40..dd82288 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -551,6 +551,7 @@ extern struct rtl_opt_pass pass_reorder_blocks;
 extern struct rtl_opt_pass pass_branch_target_load_optimize2;
 extern struct rtl_opt_pass pass_leaf_regs;
 extern struct rtl_opt_pass pass_split_before_sched2;
+extern struct rtl_opt_pass pass_compare_elim_after_reload;
 extern struct rtl_opt_pass pass_sched2;
 extern struct rtl_opt_pass pass_stack_regs;
 extern struct rtl_opt_pass pass_stack_regs_run;
From: Richard Henderson <rth@twiddle.net>

Handles arithmetic redundant with compare, as well as compare redundant
with compare.  The latter happens very often with the use of double-word
arithmetic.
---
 gcc/Makefile.in              |    4 +
 gcc/common.opt               |    4 +
 gcc/compare-elim.c           |  519 ++++++++++++++++++++++++++++++++++++++++++
 gcc/config/mn10300/mn10300.c |    3 +
 gcc/doc/invoke.texi          |   15 ++-
 gcc/doc/tm.texi              |    4 +
 gcc/doc/tm.texi.in           |    2 +
 gcc/opts.c                   |    1 +
 gcc/passes.c                 |    1 +
 gcc/recog.h                  |    5 +
 gcc/target.def               |    9 +
 gcc/tree-pass.h              |    1 +
 12 files changed, 567 insertions(+), 1 deletions(-)