
[28/28] New -fcompare-elim pass.

Message ID 1294691517-19580-29-git-send-email-rth@redhat.com
State New

Commit Message

Richard Henderson Jan. 10, 2011, 8:31 p.m. UTC
From: Richard Henderson <rth@twiddle.net>

Handles arithmetic redundant with compare, as well as compare
redundant with compare.  The latter happens very often with the
use of double-word arithmetic.
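
For example, given source like

	int f (int a, int b)
	{
	  int x = a - b;
	  return x == 0 ? 1 : x;
	}

on a target whose subtraction clobbers the flags, the subtraction already
computes everything the following test needs, so the explicit compare of X
against zero can be deleted.  (Illustrative sketch only; the exact code
depends on the target's patterns.)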
---
 gcc/Makefile.in              |    4 +
 gcc/common.opt               |    4 +
 gcc/compare-elim.c           |  519 ++++++++++++++++++++++++++++++++++++++++++
 gcc/config/mn10300/mn10300.c |    3 +
 gcc/doc/invoke.texi          |   15 ++-
 gcc/doc/tm.texi              |    4 +
 gcc/doc/tm.texi.in           |    2 +
 gcc/opts.c                   |    1 +
 gcc/passes.c                 |    1 +
 gcc/recog.h                  |    5 +
 gcc/target.def               |    9 +
 gcc/tree-pass.h              |    1 +
 12 files changed, 567 insertions(+), 1 deletions(-)
 create mode 100644 gcc/compare-elim.c

Comments

Nathan Froyd Jan. 11, 2011, 6:30 p.m. UTC | #1
On Mon, Jan 10, 2011 at 12:31:57PM -0800, Richard Henderson wrote:
> +static unsigned int
> +execute_compare_elim_after_reload (void)
> +{
> +      /* Eliminate comparisons that are redundant with flags computation.  */
> +      for (i = 0; VEC_iterate (comparison_struct_p, all_compares, i, cmp); ++i)

Please use FOR_EACH_VEC_ELT here.

-Nathan
Richard Henderson Jan. 11, 2011, 7 p.m. UTC | #2
On 01/11/2011 10:30 AM, Nathan Froyd wrote:
> On Mon, Jan 10, 2011 at 12:31:57PM -0800, Richard Henderson wrote:
>> +static unsigned int
>> +execute_compare_elim_after_reload (void)
>> +{
>> +      /* Eliminate comparisons that are redundant with flags computation.  */
>> +      for (i = 0; VEC_iterate (comparison_struct_p, all_compares, i, cmp); ++i)
> 
> Please use FOR_EACH_VEC_ELT here.

Fixed locally, thanks.


r~
Paolo Bonzini Jan. 12, 2011, 8:31 a.m. UTC | #3
> +  /* Find the DEF of the flags register.  It must be there.  */
> +  for (use_rec = DF_INSN_DEFS (insn); ; use_rec++)
> +    {
> +      use = *use_rec;
> +      if (DF_REF_TYPE (use) == DF_REF_REG_DEF
> +	  && DF_REF_REGNO (use) == targetm.flags_regnum)
> +	break;
> +    }

Can you rename these to def/def_rec?  For a moment I thought you were 
not using def-use chains because I found only DF_REF_CHAIN (use).

(My observations below actually override this comment).

> +      /* Note that df does not create use chains that cross basic blocks.

I don't think this is correct, as this is the very thing that is 
responsible for the chains problem's potential quadratic behavior.  Have 
you seen it in practice (the chain that doesn't cross basic blocks, not 
the quadratic behavior)?

This is basically the only case in which you'd rely on non-singleton 
chains, and you're solving it by punting anyway (i.e. via missing_uses). 
  This means this pass could be done just as easily without def-use chains.

During the forward walk you can record the last definition of CC in the 
basic block (which could start at last_cmp for an extended basic block). 
Then, when you walk each insn's uses, look for a CC use.  If you
find it, you know what its last definition is.  That is,
find_flags_uses_in_bb becomes something like maybe_record_flags_use and 
you would call it for all instructions, passing the last CC def.
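
Something along these lines (completely untested, names invented just to
show the shape):

static void
maybe_record_flags_use (struct comparison *cmp, rtx insn, rtx last_cc_def)
{
  df_ref *use_rec, use;

  if (cmp == NULL || cmp->missing_uses)
    return;

  for (use_rec = DF_INSN_USES (insn); (use = *use_rec) != NULL; use_rec++)
    if (DF_REF_REGNO (use) == targetm.flags_regnum)
      {
	/* The reaching definition of this use within the extended basic
	   block is LAST_CC_DEF; if that is not the comparison we are
	   tracking, we cannot account for this use.  */
	if (last_cc_def != cmp->insn)
	  cmp->missing_uses = true;
	else
	  {
	    /* ... record the use as find_flags_uses_in_bb does now ... */
	  }
	return;
      }
}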

> +  if (DF_REF_CHAIN (use) == NULL)
> +    return false;
> +
> +  def = DF_REF_CHAIN (use)->ref;

Here you should probably bail out if the use has multiple reaching 
definitions (i.e. DF_REF_CHAIN (use)->next != NULL).  It probably won't 
happen given how cbranch/cstore splitters work, but you never know.

> +     Note that this doesn't follow the USE-DEF chain from X, but
> +     since we already have to search for the previous clobber of
> +     the flags register, this wouldn't really be a problem.  */
> +
> +  /* Make sure that the flags are not clobbered in between the two
> +     instructions.  Unfortunately, we don't get DEF-DEF chains, so
> +     do this the old fashioned way.  */

Again, this is probably handled better without the use-def chains. 
(Chains are in the DF framework, but are rarely the best 
solution---especially if you're not limiting them to a region, e.g. a loop).

For a full-blown solution, we could generalize the 
singleton-use-def-chains code from fwprop and reuse it here.  Actually, 
I don't think you plan to move instructions in compare-elim, just like 
NOTICE_UPDATE_CC didn't.  So, the forward scan can remember, together 
with the last comparison insn, the last instruction that clobbered the 
flags.  Then whenever you find a comparison, you can already look at the 
shape of the last clobber; you then store if the comparison instruction 
is "mergeable", and if so with which insn.

Non-mergeable comparisons would still be recorded for the sake of 
eliminating duplicates.

> +  /* Succeed if the new instruction is valid.  */
> +  validate_change (dinsn, &XVECEXP (dpat, 0, 1), x, true);
> +  if (!apply_change_group ())
> +    return false;

Here you can test the return value of validate_change directly.

Thanks very much for this work, it is a very nice and readable pass, and 
probably one of the biggest steps towards eradication of cc0.

HTH,

Paolo
Richard Henderson Jan. 12, 2011, 7:44 p.m. UTC | #4
On 01/12/2011 12:31 AM, Paolo Bonzini wrote:
>> +      /* Note that df does not create use chains that cross basic blocks.
> 
> I don't think this is correct, as this is the very thing that is
> responsible for the chains problem's potential quadratic behavior.
> Have you seen it in practice (the chain that doesn't cross basic
> blocks, not the quadratic behavior)?

No, I thought I'd correctly read the code to determine that.  Of course,
with the code that I'm currently generating from mn10300 we'll never see
a cross-block occurrence, but I expect other targets to easily do so.

For instance, in the RX target I expect that we'll generate e.g. UNLT
with an UNORDERED branch and a LT branch, and that we'll only generate
one compare out of the post-reload splitter to do so.

> During the forward walk you can record the last definition of CC in
> the basic block (which could start at last_cmp for an extended basic
> block). Then, when you walk each insn's uses and look for a CC use.
> If you find it, you know what it's last definition is. That is,
> find_flags_uses_in_bb becomes something like maybe_record_flags_use
> and you would call it for all instructions, passing the last CC def.

An interesting idea.

>> +  if (DF_REF_CHAIN (use) == NULL)
>> +    return false;
>> +
>> +  def = DF_REF_CHAIN (use)->ref;
> 
> Here you should probably bail out if the use has multiple reaching
> definitions (i.e. DF_REF_CHAIN (use)->next != NULL). It probably
> won't happen given how cbranch/cstore splitters work, but you never
> know.

Good idea, though this has nothing to do with cbranch/cstore.  This
is finding the DEF for the register use inside the compare.

>> +     Note that this doesn't follow the USE-DEF chain from X, but
>> +     since we already have to search for the previous clobber of
>> +     the flags register, this wouldn't really be a problem.  */
>> +
>> +  /* Make sure that the flags are not clobbered in between the two
>> +     instructions.  Unfortunately, we don't get DEF-DEF chains, so
>> +     do this the old fashioned way.  */
> 
> Again, this is probably handled better without the use-def chains.
> (Chains are in the DF framework, but are rarely the best
> solution---especially if you're not limiting them to a region, e.g. a
> loop).

Err, I'm not using use-def chains for this.  I'm using reg_set_between_p.
I'm a bit confused about your statement here.

>> +  /* Succeed if the new instruction is valid.  */
>> +  validate_change (dinsn, &XVECEXP (dpat, 0, 1), x, true);
>> +  if (!apply_change_group ())
>> +    return false;
> 
> Here you can test the return value of validate_change directly.

True enough.

I'm a bit confused about the stance in regards to chains.  Should I 
attempt to avoid them entirely, so that I never add the problem?

I guess I can implement try_eliminate_compare by:

  * In find_comparisons_in_bb, remember the previous insn that
    clobbers CC_REG.  Record that in struct comparison.prev_clobber.

  * In try_eliminate_compare, if prev_clobber is not set, then we
    don't know where CC_REG was previously set, so we cannot eliminate.

  * The SET in prev_clobber must be the comparison.in_a register.
    If it isn't, then we cannot eliminate.

  * Use reg_set_between_p to verify that comparison.in_a is not
    modified between prev_clobber and comparison.insn.

It's that last step that seems a bit... odd, but not wrong exactly.
But is it odd enough to warrant using chains?
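
For concreteness, the outline above comes out roughly like so (untested
sketch; the CC_MODE selection and the actual rewrite of PREV_CLOBBER are
elided):

static bool
try_eliminate_compare (struct comparison *cmp)
{
  rtx x;

  /* We must have seen a usable clobber before the compare.  */
  if (cmp->prev_clobber == NULL)
    return false;

  /* The SET inside PREV_CLOBBER must set IN_A...  */
  x = XVECEXP (PATTERN (cmp->prev_clobber), 0, 0);
  if (!rtx_equal_p (SET_DEST (x), cmp->in_a))
    return false;

  /* ... and IN_A must not be modified in between.  */
  if (reg_set_between_p (cmp->in_a, cmp->prev_clobber, cmp->insn))
    return false;

  /* Here we would select a CC mode, install the flags computation
     into PREV_CLOBBER, and delete CMP->insn.  */
  return true;
}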



r~
Paolo Bonzini Jan. 13, 2011, 8:34 a.m. UTC | #5
On 01/12/2011 08:44 PM, Richard Henderson wrote:
> On 01/12/2011 12:31 AM, Paolo Bonzini wrote:
>>> +      /* Note that df does not create use chains that cross basic blocks.
>>
>> I don't think this is correct, as this is the very thing that is
>> responsible for the chains problem's potential quadratic behavior.
>> Have you seen it in practice (the chain that doesn't cross basic
>> blocks, not the quadratic behavior)?
>
> No, I thought I'd correctly read the code to determine that.

Chains are based on the output of reaching definitions; they do not care 
about whether the defs are in the same basic block or not.

>>> +  if (DF_REF_CHAIN (use) == NULL)
>>> +    return false;
>>> +
>>> +  def = DF_REF_CHAIN (use)->ref;
>>
>> Here you should probably bail out if the use has multiple reaching
>> definitions (i.e. DF_REF_CHAIN (use)->next != NULL). It probably
>> won't happen given how cbranch/cstore splitters work, but you never
>> know.
>
> Good idea, though this has nothing to do with cbranch/cstore.  This
> is finding the DEF for the register use inside the compare.

Oops, right.

>>> +     Note that this doesn't follow the USE-DEF chain from X, but
>>> +     since we already have to search for the previous clobber of
>>> +     the flags register, this wouldn't really be a problem.  */
>>> +
>>> +  /* Make sure that the flags are not clobbered in between the two
>>> +     instructions.  Unfortunately, we don't get DEF-DEF chains, so
>>> +     do this the old fashioned way.  */
>>
>> Again, this is probably handled better without the use-def chains.
>> (Chains are in the DF framework, but are rarely the best
>> solution---especially if you're not limiting them to a region, e.g. a
>> loop).
>
> Err, I'm not using use-def chains for this.  I'm using reg_set_between_p.
> I'm a bit confused about your statement here.

You're right, what I meant was "since you are anyway resorting to the 
old fashioned way, you can probably handle everything better without the 
use-def chains".

> I'm a bit confused about the stance in regards to chains.  Should I
> attempt to avoid them entirely, so that I never add the problem?

It's an expensive problem when applied to the entire function, so it is 
better to avoid it.  I'm not saying this cannot be delayed to 4.7 if the 
pass goes in now (it's almost a target-specific pass now, so it probably 
could).

> I guess I can implement try_eliminate_compare by:
>
>    * In find_comparisons_in_bb, remember the previous insn that
>      clobbers CC_REG.  Record that in struct comparison.prev_clobber.
>
>    * In try_eliminate_compare, if prev_clobber is not set, then we
>      don't know where CC_REG was previously set, so we cannot eliminate.
>
>    * The SET in prev_clobber must be the comparison.in_a register.
>      If it isn't, then we cannot eliminate.
>
>    * Use reg_set_between_p to verify that comparison.in_a is not
>      modified between prev_clobber and comparison.insn.

Exactly.

> It's that last step that seems a bit... odd, but not wrong exactly.
> But is it odd enough to warrant using chains?

I'd say no because you're doing the same right now.  USE-DEF chains get 
you to a clobber, and then reg_set_between_p confirms whether that 
clobber is the prev_clobber.

If you really want to remove the reg_set_between_p, you can of course 
record the SET of prev_clobber, and reset prev_clobber if an insn sets 
that register.  But I think it's more clumsy and not enough more 
efficient than what you outlined.

Paolo
Richard Henderson Jan. 17, 2011, 10:26 p.m. UTC | #6
Version 2, based on Paolo's comments from v1.  Changes:

(0) The USE-DEF and DEF-USE chains are gone.

(1) A PREV_CLOBBER insn is recorded during the single scan
    of the function.  This insn is the insn previous to the
    compare which (A) clobbers the flags and (B) is of a
    shape amenable to compare elimination.

    If the insn previous to the compare that clobbers the
    flags is not amenable to elimination -- e.g. a call --
    we don't bother recording it, and later know that we
    shouldn't try.

(2) Uses of the compare are also found during that single scan.
    Some, ahem, highly questionable assumptions that had been
    made about how DEF-USE chains are or are not created are gone.
    I check the DF_LIVE_BB_INFO now, combined with the shape of
    the CFG to determine if there's a use of the flags that 
    escapes the extended basic block.

(3) try_eliminate_compare has been rewritten along the lines we
    discussed previously, although instead of reg_set_between_p...

(4) Porting the RX target to the compare-elimination pass was
    very helpful.  It showed that there's a phase ordering problem
    that fortunately can be worked around with a little bit of effort.

    Consider libgcc _absvdi3.o, as of split2:

(insn 76 75 74 4 (parallel [
            (set (reg:SI 4 r4 [38])
                (minus:SI (minus:SI (reg:SI 4 r4 [38])
                        (reg:SI 2 r2 [orig:37 a+4 ] [37]))
                    (geu:SI (reg:CC 16 cc)
                        (const_int 0 [0]))))
            (clobber (reg:CC 16 cc))
        ]) libgcc2.c:265 76 {sbb_internal}
     (nil))

(insn 74 76 77 4 (set (reg:SI 2 r2 [orig:33 w+4 ] [33])
        (reg:SI 4 r4 [38])) libgcc2.c:265 22 {*movsi_internal}
     (nil))

(insn 77 74 78 4 (set (reg:CC 16 cc)
        (compare:CC (reg:SI 2 r2 [orig:33 w+4 ] [33])
            (const_int 0 [0]))) libgcc2.c:267 1 {*cmpsi}
     (nil))

    For RX, subtract-with-borrow (sbb) is a two-address insn, and
    so RA has little choice but to add an output reload (insn 74)
    in order to satisfy the matching constraint and put the result
    into the proper register.

    If we were to run this pass after pass_cprop_hardreg, this
    sequence would be tidied up such that the compare (insn 77)
    would have R4 as an input, and so we would immediately see the
    corresponding arithmetic (insn 76) and perform the elimination.

    However, pass_cprop_hardreg currently runs much later.  In
    particular, after pass_peephole2.  Now, peep2 is where we would
    like to make transformations like "mov 0,r" -> "xor r,r" iff
    the flags are not already live at that location.  In order to
    satisfy both, we'd have to shuffle lots of passes around with
    the attendant problem of determining if we've pessimized 
    something else.

    Instead, simply look for copy insns while we're stepping back
    looking to be sure that the input to the compare isn't 
    changed since PREV_CLOBBER.


Comments?


r~
/* Post-reload compare elimination.
   Copyright (C) 2010, 2011
   Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any later
version.

GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */

/* There is a set of targets whose general-purpose move or addition
   instructions clobber the flags.  These targets cannot split their
   CBRANCH/CSTORE etc patterns before reload is complete, lest reload
   itself insert instructions in between the flags setter and user.
   Because these targets cannot split the compare from the use, they
   cannot make use of the comparison elimination offered by the combine pass.

   This is a small pass intended to provide comparison elimination similar to
   what is available via NOTICE_UPDATE_CC for cc0 targets.  This should help
   encourage cc0 targets to convert to an explicit post-reload representation
   of the flags.

   This pass assumes:

   (0) CBRANCH/CSTORE etc have been split in pass_split_after_reload.

   (1) All comparison patterns are represented as

	[(set (reg:CC) (compare:CC (reg) (immediate)))]

   (2) All insn patterns that modify the flags are represented as

	[(set (reg) (operation)
	 (clobber (reg:CC))]

   (3) If an insn of form (2) can usefully set the flags, there is
       another pattern of the form

	[(set (reg) (operation)
	 (set (reg:CCM) (compare:CCM (operation) (immediate)))]

       The mode CCM will be chosen as if by SELECT_CC_MODE.
*/

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tm.h"
#include "rtl.h"
#include "tm_p.h"
#include "insn-config.h"
#include "recog.h"
#include "flags.h"
#include "basic-block.h"
#include "tree-pass.h"
#include "target.h"
#include "df.h"
#include "domwalk.h"


/* These structures describe a comparison and how it is used.  */

/* The choice of maximum 3 uses comes from wanting to eliminate the two
   duplicate compares from a three-way branch on the sign of a value.
   This is also sufficient to eliminate the duplicate compare against the
   high-part of a double-word comparison.  */
#define MAX_CMP_USE 3

struct comparison_use
{
  /* The instruction in which the result of the compare is used.  */
  rtx insn;
  /* The location of the flags register within the use.  */
  rtx *loc;
  /* The comparison code applied against the flags register.  */
  enum rtx_code code;
};

struct comparison
{
  /* The comparison instruction.  */
  rtx insn;

  /* The insn prior to the comparison insn that clobbers the flags.  */
  rtx prev_clobber;

  /* The two values being compared.  These will be either REGs or
     constants.  */
  rtx in_a, in_b;

  /* Information about how this comparison is used.  */
  struct comparison_use uses[MAX_CMP_USE];

  /* The original CC_MODE for this comparison.  */
  enum machine_mode orig_mode;

  /* The number of uses identified for this comparison.  */
  unsigned short n_uses;

  /* True if not all uses of this comparison have been identified.
     This can happen either for overflowing the array above, or if
     the flags register is used in some unusual context.  */
  bool missing_uses;

  /* True if its inputs are still valid at the end of the block.  */
  bool inputs_valid;
};
  
typedef struct comparison *comparison_struct_p;
DEF_VEC_P(comparison_struct_p);
DEF_VEC_ALLOC_P(comparison_struct_p, heap);

static VEC(comparison_struct_p, heap) *all_compares;

/* Look for a "conforming" comparison, as defined above.  If valid, return
   the rtx for the COMPARE itself.  */

static rtx
conforming_compare (rtx insn)
{
  rtx set, src, dest;

  set = single_set (insn);
  if (set == NULL)
    return NULL;

  src = SET_SRC (set);
  if (GET_CODE (src) != COMPARE)
    return NULL;

  dest = SET_DEST (set);
  if (!REG_P (dest) || REGNO (dest) != targetm.flags_regnum)
    return NULL;

  if (REG_P (XEXP (src, 0))
      && (REG_P (XEXP (src, 1)) || CONSTANT_P (XEXP (src, 1))))
    return src;

  return NULL;
}

/* Look for a pattern of the "correct" form for an insn with a flags clobber
   for which we may be able to eliminate a compare later.  We're not looking
   to validate any inputs at this time, merely see that the basic shape is
   correct.  The term "arithmetic" may be somewhat misleading...  */

static bool
arithmetic_flags_clobber_p (rtx insn)
{
  rtx pat, x;

  if (!NONJUMP_INSN_P (insn))
    return false;
  pat = PATTERN (insn);
  if (extract_asm_operands (pat))
    return false;

  if (GET_CODE (pat) == PARALLEL && XVECLEN (pat, 0) == 2)
    {
      x = XVECEXP (pat, 0, 0);
      if (GET_CODE (x) != SET)
	return false;
      x = SET_DEST (x);
      if (!REG_P (x))
	return false;

      x = XVECEXP (pat, 0, 1);
      if (GET_CODE (x) == CLOBBER)
	{
	  x = XEXP (x, 0);
	  if (REG_P (x) && REGNO (x) == targetm.flags_regnum)
	    return true;
	}
    }

  return false;
}

/* Look for uses of FLAGS in INSN.  If we find one we can analyze, record
   it in CMP; otherwise indicate that we've missed a use.  */

static void
find_flags_uses_in_insn (struct comparison *cmp, rtx insn)
{
  df_ref *use_rec, use;

  /* If we've already lost track of uses, don't bother collecting more.  */
  if (cmp->missing_uses)
    return;

  /* Find a USE of the flags register.  */
  for (use_rec = DF_INSN_USES (insn); (use = *use_rec) != NULL; use_rec++)
    if (DF_REF_REGNO (use) == targetm.flags_regnum)
      {
	rtx x, *loc;

	/* If this is an unusual use, quit.  */
	if (DF_REF_TYPE (use) != DF_REF_REG_USE)
	  goto fail;

	/* If we've run out of slots to record uses, quit.  */
	if (cmp->n_uses == MAX_CMP_USE)
	  goto fail;

	/* Unfortunately, the location of the flags register, while
	   present in the reference structure, doesn't help.  We need
	   to find the comparison code that is outer to the actual
	   flags use.  */
	loc = DF_REF_LOC (use);
	x = PATTERN (insn);
	if (GET_CODE (x) == PARALLEL)
	  x = XVECEXP (x, 0, 0);
	x = SET_SRC (x);
	if (GET_CODE (x) == IF_THEN_ELSE)
	  x = XEXP (x, 0);
	if (COMPARISON_P (x)
	    && loc == &XEXP (x, 0)
	    && XEXP (x, 1) == const0_rtx)
	  {
	    /* We've found a use of the flags that we understand.  */
	    struct comparison_use *cuse = &cmp->uses[cmp->n_uses++];
	    cuse->insn = insn;
	    cuse->loc = loc;
	    cuse->code = GET_CODE (x);
	  }
	else
	  goto fail;
      }
  return;

 fail:
  /* We failed to recognize this use of the flags register.  */
  cmp->missing_uses = true;
}

/* Identify comparison instructions within BB.  If the last compare in the BB
   is valid at the end of the block, install it in BB->AUX.  Called via 
   walk_dominator_tree.  */

static void
find_comparisons_in_bb (struct dom_walk_data *data ATTRIBUTE_UNUSED,
			basic_block bb)
{
  struct comparison *last_cmp;
  rtx insn, next, last_clobber;
  bool last_cmp_valid;
  bitmap killed;

  killed = BITMAP_ALLOC (NULL);

  /* The last comparison that was made.  Will be reset to NULL
     once the flags are clobbered.  */
  last_cmp = NULL;

  /* True iff the last comparison has not been clobbered, nor
     have its inputs.  Used to eliminate duplicate compares.  */
  last_cmp_valid = false;

  /* The last insn that clobbered the flags, if that insn is of
     a form that may be valid for eliminating a following compare.
     To be reset to NULL once the flags are set otherwise.  */
  last_clobber = NULL;

  /* Propagate the last live comparison throughout the extended basic block.  */
  if (single_pred_p (bb))
    {
      last_cmp = (struct comparison *) single_pred (bb)->aux;
      if (last_cmp)
	last_cmp_valid = last_cmp->inputs_valid;
    }

  for (insn = BB_HEAD (bb); insn; insn = next)
    {
      rtx src;

      next = (insn == BB_END (bb) ? NULL_RTX : NEXT_INSN (insn));
      if (!NONDEBUG_INSN_P (insn))
	continue;

      src = conforming_compare (insn);
      if (src)
	{
	  /* Eliminate a compare that's redundant with the previous.  */
	  if (last_cmp_valid
	      && rtx_equal_p (last_cmp->in_a, XEXP (src, 0))
	      && rtx_equal_p (last_cmp->in_b, XEXP (src, 1)))
	    {
	      delete_insn (insn);
	      continue;
	    }

          last_cmp = XCNEW (struct comparison);
	  last_cmp->insn = insn;
	  last_cmp->prev_clobber = last_clobber;
	  last_cmp->in_a = XEXP (src, 0);
	  last_cmp->in_b = XEXP (src, 1);
	  last_cmp->orig_mode = GET_MODE (SET_DEST (single_set (insn)));
	  VEC_safe_push (comparison_struct_p, heap, all_compares, last_cmp);

	  /* It's unusual, but be prepared for comparison patterns that
	     also clobber an input, or perhaps a scratch.  */
	  last_clobber = NULL;
	  last_cmp_valid = true;
	  bitmap_clear (killed);
	  df_simulate_find_defs (insn, killed);
	}
      else
	{
	  /* Notice if this instruction kills the flags register.  */
	  bitmap_clear (killed);
	  df_simulate_find_defs (insn, killed);
	  if (bitmap_bit_p (killed, targetm.flags_regnum))
	    {
	      /* See if this insn could be the "clobber" that eliminates
		 a future comparison.   */
	      if (arithmetic_flags_clobber_p (insn))
		last_clobber = insn;
	      else
		last_clobber = NULL;

	      /* In either case, the previous compare is no longer valid.  */
	      last_cmp = NULL;
	      last_cmp_valid = false;
	      continue;
	    }

	  /* Notice if this insn uses the flags register.  */
	  if (last_cmp)
	    find_flags_uses_in_insn (last_cmp, insn);
	}

      /* Notice if any of the inputs to the comparison have changed.  */
      if (last_cmp
	  && (bitmap_bit_p (killed, REGNO (last_cmp->in_a))
	      || (REG_P (last_cmp->in_b)
		  && bitmap_bit_p (killed, REGNO (last_cmp->in_b)))))
	last_cmp_valid = false;
    }

  BITMAP_FREE (killed);

  /* Remember the live comparison for subsequent members of
     the extended basic block.  */
  if (last_cmp)
    {
      bb->aux = last_cmp;
      last_cmp->inputs_valid = last_cmp_valid;

      /* Look to see if the flags register is live outgoing here, and
	 incoming to any successor not part of the extended basic block.  */
      if (bitmap_bit_p (&DF_LIVE_BB_INFO (bb)->out, targetm.flags_regnum))
	{
	  edge e;
	  edge_iterator ei;

	  FOR_EACH_EDGE (e, ei, bb->succs)
	    {
	      basic_block dest = e->dest;
	      if (bitmap_bit_p (&DF_LIVE_BB_INFO (dest)->in,
				targetm.flags_regnum)
		  && !single_pred_p (dest))
		{
		  last_cmp->missing_uses = true;
		  break;
		}
	    }
	}
    }
}

/* Find all comparisons in the function.  */

static void
find_comparisons (void)
{
  struct dom_walk_data data;

  memset (&data, 0, sizeof(data));
  data.dom_direction = CDI_DOMINATORS;
  data.before_dom_children = find_comparisons_in_bb;

  calculate_dominance_info (CDI_DOMINATORS);

  init_walk_dominator_tree (&data);
  walk_dominator_tree (&data, ENTRY_BLOCK_PTR);
  fini_walk_dominator_tree (&data);

  clear_aux_for_blocks ();
  free_dominance_info (CDI_DOMINATORS);
}

/* Select an alternate CC_MODE for a comparison insn comparing A and B.
   Note that inputs are almost certainly different than the IN_A and IN_B
   stored in CMP -- we're called while attempting to eliminate the compare
   after all.  Return the new FLAGS rtx if successful, else return NULL.  */

static rtx
maybe_select_cc_mode (struct comparison *cmp, rtx a, rtx b)
{
  enum machine_mode sel_mode;
  const int n = cmp->n_uses;
  rtx flags = NULL;

#ifndef SELECT_CC_MODE
  /* Minimize code differences when this target macro is undefined.  */
  return NULL;
#define SELECT_CC_MODE(A,B,C) (gcc_unreachable (), VOIDmode)
#endif

  /* If we don't have access to all of the uses, we can't validate.  */
  if (cmp->missing_uses || n == 0)
    return NULL;

  /* Find a new mode that works for all of the uses.  Special case the
     common case of exactly one use.  */
  if (n == 1)
    {
      sel_mode = SELECT_CC_MODE (cmp->uses[0].code, a, b);
      if (sel_mode != cmp->orig_mode)
	{
	  flags = gen_rtx_REG (sel_mode, targetm.flags_regnum);
	  validate_change (cmp->uses[0].insn, cmp->uses[0].loc, flags, true);
	}
    }
  else
    {
      int i;

      sel_mode = SELECT_CC_MODE (cmp->uses[0].code, a, b);
      for (i = 1; i < n; ++i)
	{
	  enum machine_mode new_mode;
	  new_mode = SELECT_CC_MODE (cmp->uses[i].code, a, b);
	  if (new_mode != sel_mode)
	    {
	      sel_mode = targetm.cc_modes_compatible (sel_mode, new_mode);
	      if (sel_mode == VOIDmode)
		return NULL;
	    }
	}
      
      if (sel_mode != cmp->orig_mode)
	{
	  flags = gen_rtx_REG (sel_mode, targetm.flags_regnum);
	  for (i = 0; i < n; ++i)
	    validate_change (cmp->uses[i].insn, cmp->uses[i].loc, flags, true);
	}
    }

  return flags;
}

/* Attempt to find an instruction prior to CMP that can be used to compute the
   same flags value as the comparison itself.  Return true if successful.  */

static bool
try_eliminate_compare (struct comparison *cmp)
{
  rtx x, insn, bb_head, flags, in_a, cmp_src;

  /* We must have found an interesting "clobber" preceding the compare.  */
  if (cmp->prev_clobber == NULL)
    return false;

  /* ??? For the moment we don't handle comparisons for which IN_B
     is a register.  We accepted these during initial comparison 
     recognition in order to eliminate duplicate compares.
     An improvement here would be to handle x = a - b; if (a < b).  */
  if (!CONSTANT_P (cmp->in_b))
    return false;

  /* Verify that PREV_CLOBBER defines IN_A, and that IN_A is not clobbered
     in between.  Given that this target requires this pass, we can assume
     that most insns do clobber the flags, and so the distance between the
     compare and the clobber is likely to be small.  */
  /* ??? This is one point at which one could argue that DF_REF_CHAIN would
     be useful, but it is thought to be too heavy-weight a solution here.  */

  in_a = cmp->in_a;
  insn = cmp->insn;
  bb_head = BB_HEAD (BLOCK_FOR_INSN (insn));
  for (insn = PREV_INSN (insn);
       insn != cmp->prev_clobber;
       insn = PREV_INSN (insn))
    {
      const int abnormal_flags
	= (DF_REF_CONDITIONAL | DF_REF_PARTIAL | DF_REF_MAY_CLOBBER
	   | DF_REF_MUST_CLOBBER | DF_REF_SIGN_EXTRACT
	   | DF_REF_ZERO_EXTRACT | DF_REF_STRICT_LOW_PART
	   | DF_REF_PRE_POST_MODIFY);

      df_ref *def_rec, def;

      /* Note that the BB_HEAD is always either a note or a label, but in
	 any case it means that IN_A is defined outside the block.  */
      if (insn == bb_head)
	return false;
      if (NOTE_P (insn) || DEBUG_INSN_P (insn))
	continue;

      /* Find a possible def of IN_A in INSN.  */
      for (def_rec = DF_INSN_DEFS (insn); (def = *def_rec) != NULL; def_rec++)
	if (DF_REF_REGNO (def) == REGNO (in_a))
	  break;

      /* No definitions of IN_A; continue searching.  */
      if (def == NULL)
	continue;

      /* Bail if this is not a totally normal set of IN_A.  */
      if (DF_REF_IS_ARTIFICIAL (def))
	return false;
      if (DF_REF_FLAGS (def) & abnormal_flags)
	return false;

      /* We've found an insn between the compare and the clobber that sets
	 IN_A.  Given that pass_cprop_hardreg has not yet run, we still find
	 situations in which we can usefully look through a copy insn.  */
      x = single_set (insn);
      if (x == NULL)
	return false;
      in_a = SET_SRC (x);
      if (!REG_P (in_a))
	return false;
    }

  /* We've reached PREV_CLOBBER without finding a modification of IN_A.
     Validate that PREV_CLOBBER itself does in fact refer to IN_A.  Do
     recall that we've already validated the shape of PREV_CLOBBER.  */
  x = XVECEXP (PATTERN (insn), 0, 0);
  if (!rtx_equal_p (SET_DEST (x), in_a))
    return false;
  cmp_src = SET_SRC (x);
  
  /* Determine if we ought to use a different CC_MODE here.  */
  flags = maybe_select_cc_mode (cmp, cmp_src, cmp->in_b);
  if (flags == NULL)
    flags = gen_rtx_REG (cmp->orig_mode, targetm.flags_regnum);

  /* Generate a new comparison for installation in the setter.  */
  x = copy_rtx (cmp_src);
  x = gen_rtx_COMPARE (GET_MODE (flags), x, cmp->in_b);
  x = gen_rtx_SET (VOIDmode, flags, x);

  /* Succeed if the new instruction is valid.  */
  validate_change (insn, &XVECEXP (PATTERN (insn), 0, 1), x, true);
  if (!apply_change_group ())
    return false;
 
  /* Success.  Delete the compare insn...  */
  delete_insn (cmp->insn);

  /* ... and any notes that are now irrelevant due to multi-stores. */
  x = find_regno_note (insn, REG_UNUSED, targetm.flags_regnum);
  if (x)
    remove_note (insn, x);
  x = find_reg_note (insn, REG_EQUAL, NULL);
  if (x)
    remove_note (insn, x);
  x = find_reg_note (insn, REG_EQUIV, NULL);
  if (x)
    remove_note (insn, x);

  return true;
}

/* Main entry point to the pass.  */

static unsigned int
execute_compare_elim_after_reload (void)
{
  df_set_flags (DF_DEFER_INSN_RESCAN);
  df_live_add_problem ();
  df_analyze ();

  gcc_checking_assert (all_compares == NULL);

  /* Locate all comparisons and their uses, and eliminate duplicates.  */
  find_comparisons ();
  if (all_compares)
    {
      struct comparison *cmp;
      size_t i;

      /* Eliminate comparisons that are redundant with flags computation.  */
      FOR_EACH_VEC_ELT (comparison_struct_p, all_compares, i, cmp)
	{
	  try_eliminate_compare (cmp);
	  XDELETE (cmp);
	}

      VEC_free (comparison_struct_p, heap, all_compares);
      all_compares = NULL;

      df_analyze ();
    }

  return 0;
}

static bool
gate_compare_elim_after_reload (void)
{
  return (flag_compare_elim_after_reload
	  && targetm.flags_regnum != INVALID_REGNUM);
}

struct rtl_opt_pass pass_compare_elim_after_reload =
{
 {
  RTL_PASS,
  "cmpelim",				/* name */
  gate_compare_elim_after_reload,	/* gate */
  execute_compare_elim_after_reload,	/* execute */
  NULL,					/* sub */
  NULL,					/* next */
  0,					/* static_pass_number */
  TV_NONE,				/* tv_id */
  0,					/* properties_required */
  0,					/* properties_provided */
  0,					/* properties_destroyed */
  0,					/* todo_flags_start */
  TODO_df_finish
  | TODO_df_verify
  | TODO_verify_rtl_sharing
  | TODO_dump_func
  | TODO_ggc_collect			/* todo_flags_finish */
 }
};
Paolo Bonzini Jan. 18, 2011, 8:45 a.m. UTC | #7
On 01/17/2011 11:26 PM, Richard Henderson wrote:
> Version 2, based on Paolo's comments from v1.
>
> Comments?

Just a few nits:

1) a comment above maybe_select_cc_mode and above its call, stating that 
the function starts a change group, would be nice.  I missed this in the 
previous review.  Given the very nice comments in the code, it would be 
a nice touch.

2) Here:

>       src = conforming_compare (insn);
>       if (src)
> 	{
> 	  ...
> 	  bitmap_clear (killed);
> 	  df_simulate_find_defs (insn, killed);
> 	}
>       else
> 	{
> 	  /* Notice if this instruction kills the flags register.  */
> 	  bitmap_clear (killed);
> 	  df_simulate_find_defs (insn, killed);

The bitmap_clear/df_simulate_find_defs can be hoisted above the if.  The 
only case where we don't use the result is when we delete a redundant 
compare, and it's definitely not the common case.  This would make it 
clearer that KILLED is not used for any kind of real "simulation", but 
only as a quick way to query the array of DEFs for the presence of a 
register.
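
I.e. something like (sketch):

      bitmap_clear (killed);
      df_simulate_find_defs (insn, killed);

      src = conforming_compare (insn);
      if (src)
	{
	  /* ... as now, minus the bitmap_clear/df_simulate_find_defs ...  */
	}
      else if (bitmap_bit_p (killed, targetm.flags_regnum))
	{
	  /* Flags clobbered: record or reset last_clobber and invalidate
	     the last comparison, as now.  */
	}
      else if (last_cmp)
	find_flags_uses_in_insn (last_cmp, insn);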

3) Just to take back that clock cycle lost in the above change, here:

>       if (last_cmp
> 	  && (bitmap_bit_p (killed, REGNO (last_cmp->in_a))
> 	      || (REG_P (last_cmp->in_b)
> 		  && bitmap_bit_p (killed, REGNO (last_cmp->in_b)))))
> 	last_cmp_valid = false;

You can change LAST_CMP to LAST_CMP_VALID.  LAST_CMP will never be NULL 
as long as LAST_CMP_VALID is true, but you may save some bitmap accesses.

4) Not a task for now, but perhaps some DF functions could be provided 
also in a form that operates on HARD_REG_SETs for improved performance. 
  Just a reminder to myself.


Besides this, "seems to be perfect".

Paolo

Patch

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index de3bde9..f421d5a 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1205,6 +1205,7 @@  OBJS-common = \
 	cfgrtl.o \
 	combine.o \
 	combine-stack-adj.o \
+	compare-elim.o \
 	convert.o \
 	coverage.o \
 	cse.o \
@@ -3352,6 +3353,9 @@  combine-stack-adj.o : combine-stack-adj.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) $(RTL_H) insn-config.h $(TIMEVAR_H) $(TREE_PASS_H) \
    $(RECOG_H) output.h $(REGS_H) hard-reg-set.h $(FLAGS_H) $(FUNCTION_H) \
    $(EXPR_H) $(BASIC_BLOCK_H) $(DIAGNOSTIC_CORE_H) $(TM_P_H) $(DF_H) $(EXCEPT_H) reload.h
+compare-elim.o : compare-elim.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+   $(TM_H) $(RTL_H) $(TM_P_H) $(RECOG_H) $(FLAGS_H) $(BASIC_BLOCK_H) \
+   $(TREE_PASS_H) $(DF_H)
 ddg.o : ddg.c $(DDG_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TARGET_H) \
    $(DIAGNOSTIC_CORE_H) $(RTL_H) $(TM_P_H) $(REGS_H) $(FUNCTION_H) \
    $(FLAGS_H) insn-config.h $(INSN_ATTR_H) $(EXCEPT_H) $(RECOG_H) \
diff --git a/gcc/common.opt b/gcc/common.opt
index 32df6fc..5b01363 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -853,6 +853,10 @@  fcompare-debug-second
 Common Driver RejectNegative Var(flag_compare_debug)
 Run only the second compilation of -fcompare-debug
 
+fcompare-elim
+Common Report Var(flag_compare_elim_after_reload) Optimization
+Perform comparison elimination after register allocation has finished
+
 fconserve-stack
 Common Var(flag_conserve_stack) Optimization
 Do not perform optimizations increasing noticeably stack usage
diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
new file mode 100644
index 0000000..a1d91cc
--- /dev/null
+++ b/gcc/compare-elim.c
@@ -0,0 +1,519 @@ 
+/* Post-reload compare elimination.
+   Copyright (C) 2010, 2011
+   Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* There is a set of targets whose general-purpose move or addition
+   instructions clobber the flags.  These targets cannot split their
+   CBRANCH/CSTORE etc patterns before reload is complete, lest reload
+   itself insert instructions in between the flags setter and user.
+   Because these targets cannot split the compare from the use, they
+   cannot make use of the comparison elimination offered by the combine pass.
+
+   This is a small pass intended to provide comparison elimination similar to
+   what is available via NOTICE_UPDATE_CC for cc0 targets.  This should help
+   encourage cc0 targets to convert to an explicit post-reload representation
+   of the flags.
+
+   This pass assumes:
+
+   (0) CBRANCH/CSTORE etc have been split in pass_split_after_reload.
+
+   (1) All comparison patterns are represented as
+
+	[(set (reg:CC) (compare:CC (reg) (immediate)))]
+
+   (2) All insn patterns that modify the flags are represented as
+
+	[(set (reg) (operation)
+	 (clobber (reg:CC))]
+
+   (3) If an insn of form (2) can usefully set the flags, there is
+       another pattern of the form
+
+	[(set (reg) (operation)
+	 (set (reg:CCM) (compare:CCM (operation) (immediate)))]
+
+       The mode CCM will be chosen as if by SELECT_CC_MODE.
+*/
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "insn-config.h"
+#include "recog.h"
+#include "flags.h"
+#include "basic-block.h"
+#include "tree-pass.h"
+#include "target.h"
+#include "df.h"
+#include "domwalk.h"
+
+
+/* These structures describe a comparison and how it is used.  */
+
+/* The choice of maximum 3 uses comes from wanting to eliminate the two
+   duplicate compares from a three-way branch on the sign of a value.
+   This is also sufficient to eliminate the duplicate compare against the
+   high-part of a double-word comparison.  */
+#define MAX_CMP_USE 3
+
+struct comparison_use
+{
+  /* The instruction in which the result of the compare is used.  */
+  rtx insn;
+  /* The location of the flags register within the use.  */
+  rtx *loc;
+  /* The comparison code applied against the flags register.  */
+  enum rtx_code code;
+};
+
+struct comparison
+{
+  /* The comparison instruction.  */
+  rtx insn;
+
+  /* The two values being compared.  These will be either REGs or
+     constants.  */
+  rtx in_a, in_b;
+
+  /* Information about how this comparison is used.  */
+  struct comparison_use uses[MAX_CMP_USE];
+
+  /* The original CC_MODE for this comparison.  */
+  enum machine_mode orig_mode;
+
+  /* The number of uses identified for this comparison.  */
+  unsigned short n_uses;
+
+  /* True if not all uses of this comparison have been identified.
+     This can happen either for overflowing the array above, or if
+     the flags register is used in some unusual context.  */
+  bool missing_uses;
+};
+  
+typedef struct comparison *comparison_struct_p;
+DEF_VEC_P(comparison_struct_p);
+DEF_VEC_ALLOC_P(comparison_struct_p, heap);
+
+static VEC(comparison_struct_p, heap) *all_compares;
+
+static void
+find_flags_uses_in_bb (struct comparison *cmp,
+		       basic_block bb ATTRIBUTE_UNUSED, rtx insn)
+{
+  df_ref *use_rec, use;
+  struct df_link *chain;
+
+  /* If we've already lost track of uses, don't bother collecting more.  */
+  if (cmp->missing_uses)
+    return;
+
+  /* Find the DEF of the flags register.  It must be there.  */
+  for (use_rec = DF_INSN_DEFS (insn); ; use_rec++)
+    {
+      use = *use_rec;
+      if (DF_REF_TYPE (use) == DF_REF_REG_DEF
+	  && DF_REF_REGNO (use) == targetm.flags_regnum)
+	break;
+    }
+
+  for (chain = DF_REF_CHAIN (use); chain ; chain = chain->next)
+    {
+      rtx x, uinsn, *uloc;
+
+      /* If we've run out of slots to record uses, quit.  */
+      if (cmp->n_uses == MAX_CMP_USE)
+	{
+	  cmp->missing_uses = true;
+	  return;
+	}
+
+      /* Unfortunately, the location of the flags register, while present in
+	 the reference structure, doesn't help.  We need to find the comparison
+	 code that is outer to the actual flags use.  */
+      uinsn = DF_REF_INSN (chain->ref);
+      uloc = DF_REF_LOC (chain->ref);
+      x = PATTERN (uinsn);
+      if (GET_CODE (x) == PARALLEL)
+	x = XVECEXP (x, 0, 0);
+      x = SET_SRC (x);
+      if (GET_CODE (x) == IF_THEN_ELSE)
+	x = XEXP (x, 0);
+      if (COMPARISON_P (x)
+	  && uloc == &XEXP (x, 0)
+	  && XEXP (x, 1) == const0_rtx)
+	{
+	  struct comparison_use *cuse = &cmp->uses[cmp->n_uses++];
+	  cuse->insn = uinsn;
+	  cuse->loc = uloc;
+	  cuse->code = GET_CODE (x);
+	}
+      else
+	{
+	  /* We failed to recognize this use of the flags register.  */
+	  cmp->missing_uses = true;
+	  return;
+	}
+    }
+}
+
+/* Identify comparison instructions within BB.  If the last compare in the BB
+   is valid at the end of the block, install it in BB->AUX.  Called via 
+   walk_dominator_tree.  */
+
+static void
+find_comparisons_in_bb (struct dom_walk_data *data ATTRIBUTE_UNUSED,
+			basic_block bb)
+{
+  struct comparison *last_cmp = NULL;
+  bitmap killed;
+  rtx insn, next;
+
+  killed = BITMAP_ALLOC (NULL);
+
+  /* Propagate the last live comparison throughout the extended basic block.  */
+  if (single_pred_p (bb))
+    last_cmp = (struct comparison *) single_pred (bb)->aux;
+
+  for (insn = BB_HEAD (bb); insn; insn = next)
+    {
+      rtx set, src, dst;
+
+      next = (insn == BB_END (bb) ? NULL_RTX : NEXT_INSN (insn));
+      if (!NONDEBUG_INSN_P (insn))
+	continue;
+
+      if ((set = single_set (insn)) != NULL
+	  && (src = SET_SRC (set), GET_CODE (src) == COMPARE)
+	  && (dst = SET_DEST (set), REG_P (dst))
+	  && REGNO (dst) == targetm.flags_regnum
+	  && REG_P (XEXP (src, 0))
+	  && (REG_P (XEXP (src, 1)) || CONSTANT_P (XEXP (src, 1))))
+	{
+	  /* Eliminate a compare that's redundant with the previous.  */
+	  if (last_cmp
+	      && rtx_equal_p (last_cmp->in_a, XEXP (src, 0))
+	      && rtx_equal_p (last_cmp->in_b, XEXP (src, 1)))
+	    {
+	      find_flags_uses_in_bb (last_cmp, bb, insn);
+	      delete_insn (insn);
+	      continue;
+	    }
+
+          last_cmp = XCNEW (struct comparison);
+	  last_cmp->insn = insn;
+	  last_cmp->in_a = XEXP (src, 0);
+	  last_cmp->in_b = XEXP (src, 1);
+	  last_cmp->orig_mode = GET_MODE (dst);
+	  VEC_safe_push (comparison_struct_p, heap, all_compares, last_cmp);
+
+	  find_flags_uses_in_bb (last_cmp, bb, insn);
+
+	  /* It's unusual, but be prepared for comparison patterns that
+	     also clobber an input, or perhaps a scratch.  */
+	  bitmap_clear (killed);
+	  df_simulate_find_defs (insn, killed);
+	}
+      else if (last_cmp)
+	{
+	  /* Notice if this instruction kills the flags register.  If so,
+	     any comparison we're holding is no longer valid.  */
+	  bitmap_clear (killed);
+	  df_simulate_find_defs (insn, killed);
+	  if (bitmap_bit_p (killed, targetm.flags_regnum))
+	    {
+	      last_cmp = NULL;
+	      continue;
+	    }
+	}
+      else
+	continue;
+
+      /* Notice if any of the inputs to the comparison have changed.  */
+      if (bitmap_bit_p (killed, REGNO (last_cmp->in_a)))
+	last_cmp = NULL;
+      else if (REG_P (last_cmp->in_b)
+	       && bitmap_bit_p (killed, REGNO (last_cmp->in_b)))
+	last_cmp = NULL;
+    }
+
+  BITMAP_FREE (killed);
+
+  /* Remember the live comparison for subsequent members of
+     the extended basic block.  */
+  if (last_cmp)
+    {
+      bb->aux = last_cmp;
+
+      /* Note that df does not create use chains that cross basic blocks.
+	 Therefore, given that we've not yet eliminated duplicate comparisons,
+	 if the flags register is live (in the sense of being used) then we're
+	 not going to be able to find all uses.  Given that this target needed
+	 this pass in the first place, this should be exceedingly rare, caused
+	 by a post-reload splitter creating new basic blocks.  */
+      if (bitmap_bit_p (DF_LR_OUT (bb), targetm.flags_regnum))
+	last_cmp->missing_uses = true;
+    }
+}
+
+/* Find all comparisons in the function.  */
+
+static void
+find_comparisons (void)
+{
+  struct dom_walk_data data;
+
+  memset (&data, 0, sizeof(data));
+  data.dom_direction = CDI_DOMINATORS;
+  data.before_dom_children = find_comparisons_in_bb;
+
+  calculate_dominance_info (CDI_DOMINATORS);
+
+  init_walk_dominator_tree (&data);
+  walk_dominator_tree (&data, ENTRY_BLOCK_PTR);
+  fini_walk_dominator_tree (&data);
+
+  clear_aux_for_blocks ();
+  free_dominance_info (CDI_DOMINATORS);
+}
+
+/* Select an alternate CC_MODE for a comparison insn comparing A and B.
+   Note that inputs are almost certainly different than the IN_A and IN_B
+   stored in CMP -- we're called while attempting to eliminate the compare
+   after all.  Return the new FLAGS rtx if successful, else return NULL.  */
+
+static rtx
+maybe_select_cc_mode (struct comparison *cmp, rtx a, rtx b)
+{
+  enum machine_mode sel_mode;
+  const int n = cmp->n_uses;
+  rtx flags = NULL;
+
+#ifndef SELECT_CC_MODE
+  /* Minimize code differences when this target macro is undefined.  */
+  return NULL;
+#define SELECT_CC_MODE(A,B,C) (gcc_unreachable (), VOIDmode)
+#endif
+
+  /* If we don't have access to all of the uses, we can't validate.  */
+  if (cmp->missing_uses || n == 0)
+    return NULL;
+
+  /* Find a new mode that works for all of the uses.  Special case the
+     common case of exactly one use.  */
+  if (n == 1)
+    {
+      sel_mode = SELECT_CC_MODE (cmp->uses[0].code, a, b);
+      if (sel_mode != cmp->orig_mode)
+	{
+	  flags = gen_rtx_REG (sel_mode, targetm.flags_regnum);
+	  validate_change (cmp->uses[0].insn, cmp->uses[0].loc, flags, true);
+	}
+    }
+  else
+    {
+      int i;
+
+      sel_mode = SELECT_CC_MODE (cmp->uses[0].code, a, b);
+      for (i = 1; i < n; ++i)
+	{
+	  enum machine_mode new_mode;
+	  new_mode = SELECT_CC_MODE (cmp->uses[i].code, a, b);
+	  if (new_mode != sel_mode)
+	    {
+	      sel_mode = targetm.cc_modes_compatible (sel_mode, new_mode);
+	      if (sel_mode == VOIDmode)
+		return NULL;
+	    }
+	}
+      
+      if (sel_mode != cmp->orig_mode)
+	{
+	  flags = gen_rtx_REG (sel_mode, targetm.flags_regnum);
+	  for (i = 0; i < n; ++i)
+	    validate_change (cmp->uses[i].insn, cmp->uses[i].loc, flags, true);
+	}
+    }
+
+  return flags;
+}
+
+/* Attempt to find an instruction prior to CMP that can be used to compute the
+   same flags value as the comparison itself.  Return true if successful.  */
+
+static bool
+try_eliminate_compare (struct comparison *cmp)
+{
+  df_ref *use_rec, use, def;
+  rtx dinsn, dpat, x, flags, cmp_src;
+
+  /* ??? For the moment we don't handle comparisons for which IN_B
+     is a register.  We accepted these during initial comparison 
+     recognition in order to eliminate duplicate compares.
+     An improvement here would be to handle
+	x = a - b; if (a < b)
+     Note that this doesn't follow the USE-DEF chain from X, but
+     since we already have to search for the previous clobber of
+     the flags register, this wouldn't really be a problem.  */
+  if (!CONSTANT_P (cmp->in_b))
+    return false;
+
+  /* The relevant instruction is the definition for the use in the
+     comparison instruction.  */
+  for (use_rec = DF_INSN_USES (cmp->insn); ; use_rec++)
+    {
+      use = *use_rec;
+      if (DF_REF_REGNO (use) == REGNO (cmp->in_a))
+	break;
+    }
+  if (DF_REF_CHAIN (use) == NULL)
+    return false;
+
+  def = DF_REF_CHAIN (use)->ref;
+  gcc_assert (DF_REF_REG_DEF_P (def));
+
+  /* If the definition comes from a parameter, or such, there's no insn.  */
+  if (DF_REF_IS_ARTIFICIAL (def))
+    return false;
+
+  /* Make sure that the defining instruction is of the proper form.
+     That is, a single set plus a clobber of the flags register.  */
+  dinsn = DF_REF_INSN (def);
+  if (!NONJUMP_INSN_P (dinsn))
+    return false;
+  dpat = PATTERN (dinsn);
+  if (GET_CODE (dpat) != PARALLEL || XVECLEN (dpat, 0) != 2)
+    return false;
+  x = XVECEXP (dpat, 0, 0);
+  if (GET_CODE (x) != SET || DF_REF_LOC (def) != &SET_DEST (x))
+    return false;
+  x = XVECEXP (dpat, 0, 1);
+  if (GET_CODE (x) != CLOBBER)
+    return false;
+  flags = XEXP (x, 0);
+  if (!REG_P (flags) || REGNO (flags) != targetm.flags_regnum)
+    return false;
+
+  /* Make sure that the flags are not clobbered in between the two
+     instructions.  Unfortunately, we don't get DEF-DEF chains, so
+     do this the old fashioned way.  */
+  if (reg_set_between_p (flags, dinsn, cmp->insn))
+    return false;
+
+  /* Determine if we ought to use a different CC_MODE here.  */
+  cmp_src = SET_SRC (XVECEXP (dpat, 0, 0));
+  x = maybe_select_cc_mode (cmp, cmp_src, cmp->in_b);
+  if (x != NULL)
+    flags = x;
+
+  /* Generate a new comparison for installation in the setter.  */
+  x = copy_rtx (cmp_src);
+  x = gen_rtx_COMPARE (GET_MODE (flags), x, cmp->in_b);
+  x = gen_rtx_SET (VOIDmode, flags, x);
+
+  /* Succeed if the new instruction is valid.  */
+  validate_change (dinsn, &XVECEXP (dpat, 0, 1), x, true);
+  if (!apply_change_group ())
+    return false;
+ 
+  /* Success.  Delete the compare insn...  */
+  delete_insn (cmp->insn);
+
+  /* ... and any notes that are now irrelevant due to multi-stores. */
+  x = find_regno_note (dinsn, REG_UNUSED, targetm.flags_regnum);
+  if (x)
+    remove_note (dinsn, x);
+  x = find_reg_note (dinsn, REG_EQUAL, NULL);
+  if (x)
+    remove_note (dinsn, x);
+  x = find_reg_note (dinsn, REG_EQUIV, NULL);
+  if (x)
+    remove_note (dinsn, x);
+
+  return true;
+}
+
+/* Main entry point to the pass.  */
+
+static unsigned int
+execute_compare_elim_after_reload (void)
+{
+  df_set_flags (DF_DEFER_INSN_RESCAN);
+  df_live_add_problem ();
+  df_chain_add_problem (DF_DU_CHAIN + DF_UD_CHAIN);
+  df_analyze ();
+
+  gcc_checking_assert (all_compares == NULL);
+
+  /* Locate all comparisons and their uses, and eliminate duplicates.  */
+  find_comparisons ();
+  if (all_compares)
+    {
+      struct comparison *cmp;
+      size_t i;
+
+      /* Eliminate comparisons that are redundant with flags computation.  */
+      for (i = 0; VEC_iterate (comparison_struct_p, all_compares, i, cmp); ++i)
+	{
+	  try_eliminate_compare (cmp);
+	  XDELETE (cmp);
+	}
+
+      VEC_free (comparison_struct_p, heap, all_compares);
+      all_compares = NULL;
+
+      df_remove_problem (df_chain);
+      df_analyze ();
+    }
+
+  return 0;
+}
+
+static bool
+gate_compare_elim_after_reload (void)
+{
+  return (flag_compare_elim_after_reload
+	  && targetm.flags_regnum != INVALID_REGNUM);
+}
+
+struct rtl_opt_pass pass_compare_elim_after_reload =
+{
+ {
+  RTL_PASS,
+  "cmpelim",				/* name */
+  gate_compare_elim_after_reload,	/* gate */
+  execute_compare_elim_after_reload,	/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_NONE,				/* tv_id */
+  0,					/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_df_finish
+  | TODO_df_verify
+  | TODO_verify_rtl_sharing
+  | TODO_dump_func
+  | TODO_ggc_collect			/* todo_flags_finish */
+ }
+};
diff --git a/gcc/config/mn10300/mn10300.c b/gcc/config/mn10300/mn10300.c
index 25bfa0d..e5bfdae 100644
--- a/gcc/config/mn10300/mn10300.c
+++ b/gcc/config/mn10300/mn10300.c
@@ -3023,4 +3023,7 @@  mn10300_md_asm_clobbers (tree outputs ATTRIBUTE_UNUSED,
 #undef TARGET_MD_ASM_CLOBBERS
 #define TARGET_MD_ASM_CLOBBERS  mn10300_md_asm_clobbers
 
+#undef  TARGET_FLAGS_REGNUM
+#define TARGET_FLAGS_REGNUM  CC_REG
+
 struct gcc_target targetm = TARGET_INITIALIZER;
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 800c592..e50de64 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -337,7 +337,7 @@  Objective-C and Objective-C++ Dialects}.
 -fauto-inc-dec -fbranch-probabilities -fbranch-target-load-optimize @gol
 -fbranch-target-load-optimize2 -fbtr-bb-exclusive -fcaller-saves @gol
 -fcheck-data-deps -fcombine-stack-adjustments -fconserve-stack @gol
--fcprop-registers -fcrossjumping @gol
+-fcompare-elim -fcprop-registers -fcrossjumping @gol
 -fcse-follow-jumps -fcse-skip-blocks -fcx-fortran-rules @gol
 -fcx-limited-range @gol
 -fdata-sections -fdce -fdce @gol
@@ -5870,6 +5870,7 @@  compilation time.
 @option{-O} turns on the following optimization flags:
 @gccoptlist{
 -fauto-inc-dec @gol
+-fcompare-elim @gol
 -fcprop-registers @gol
 -fdce @gol
 -fdefer-pop @gol
@@ -7680,6 +7681,18 @@  use hidden visibility) is similar to @code{-fwhole-program}.  See
 Enabled by default when LTO support in GCC is enabled and GCC was compiled
 with linker supporting plugins (GNU ld or @code{gold}).
 
+@item -fcompare-elim
+@opindex fcompare-elim
+After register allocation and post-register allocation instruction splitting,
+identify arithmetic instructions that compute processor flags similar to a
+comparison operation based on that arithmetic.  If possible, eliminate the
+explicit comparison operation.
+
+This pass only applies to certain targets that cannot explicitly represent
+the comparison operation before register allocation is complete.
+
+Enabled at levels @option{-O}, @option{-O2}, @option{-O3}, @option{-Os}.
+
 @item -fcprop-registers
 @opindex fcprop-registers
 After register allocation and post-register allocation instruction splitting,
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 139c5a7..5dfd100 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4345,6 +4345,10 @@  to return a nonzero value when it is required, the compiler will run out
 of spill registers and print a fatal error message.
 @end deftypefn
 
+@deftypevr {Target Hook} {unsigned int} TARGET_FLAGS_REGNUM
+If the target has a dedicated flags register, and it needs to use the post-reload comparison elimination pass, then this value should be set appropriately.
+@end deftypevr
+
 @node Scalar Return
 @subsection How Scalar Function Values Are Returned
 @cindex return values in registers
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 2085816..bd49012 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4333,6 +4333,8 @@  to return a nonzero value when it is required, the compiler will run out
 of spill registers and print a fatal error message.
 @end deftypefn
 
+@hook TARGET_FLAGS_REGNUM
+
 @node Scalar Return
 @subsection How Scalar Function Values Are Returned
 @cindex return values in registers
diff --git a/gcc/opts.c b/gcc/opts.c
index 42d53f4..7001122 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -456,6 +456,7 @@  static const struct default_options default_options_table[] =
     { OPT_LEVELS_1_PLUS, OPT_ftree_sink, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_ftree_ch, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 },
+    { OPT_LEVELS_1_PLUS, OPT_fcompare_elim, NULL, 1 },
 
     /* -O2 optimizations.  */
     { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
diff --git a/gcc/passes.c b/gcc/passes.c
index 804ac9f..5aba987 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -1021,6 +1021,7 @@  init_optimization_passes (void)
 	  NEXT_PASS (pass_gcse2);
 	  NEXT_PASS (pass_split_after_reload);
 	  NEXT_PASS (pass_implicit_zee);
+	  NEXT_PASS (pass_compare_elim_after_reload);
 	  NEXT_PASS (pass_branch_target_load_optimize1);
 	  NEXT_PASS (pass_thread_prologue_and_epilogue);
 	  NEXT_PASS (pass_rtl_dse2);
diff --git a/gcc/recog.h b/gcc/recog.h
index 217c6e5..534d2c9 100644
--- a/gcc/recog.h
+++ b/gcc/recog.h
@@ -18,6 +18,9 @@  You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+#ifndef GCC_RECOG_H
+#define GCC_RECOG_H
+
 /* Random number that should be large enough for all purposes.  */
 #define MAX_RECOG_ALTERNATIVES 30
 
@@ -305,3 +308,5 @@  struct insn_data_d
 
 extern const struct insn_data_d insn_data[];
 extern int peep2_current_count;
+
+#endif /* GCC_RECOG_H */
diff --git a/gcc/target.def b/gcc/target.def
index 9d96e65..29d2a51 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1627,6 +1627,15 @@  DEFHOOK
  bool, (enum machine_mode mode),
  hook_bool_mode_false)
 
+/* Register number for a flags register.  Only needs to be defined if the
+   target is constrained to use post-reload comparison elimination.  */
+DEFHOOKPOD
+(flags_regnum,
+ "If the target has a dedicated flags register, and it needs to use the\
+ post-reload comparison elimination pass, then this value should be set\
+ appropriately.",
+ unsigned int, INVALID_REGNUM)
+
 /* Compute a (partial) cost for rtx X.  Return true if the complete
    cost has been computed, and false if subexpressions should be
    scanned.  In either case, *TOTAL contains the cost result.  */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 32d8f40..dd82288 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -551,6 +551,7 @@  extern struct rtl_opt_pass pass_reorder_blocks;
 extern struct rtl_opt_pass pass_branch_target_load_optimize2;
 extern struct rtl_opt_pass pass_leaf_regs;
 extern struct rtl_opt_pass pass_split_before_sched2;
+extern struct rtl_opt_pass pass_compare_elim_after_reload;
 extern struct rtl_opt_pass pass_sched2;
 extern struct rtl_opt_pass pass_stack_regs;
 extern struct rtl_opt_pass pass_stack_regs_run;