
RFC: LRA for x86/x86-64 [7/9]

Message ID 5064DA38.9010906@redhat.com
State New

Commit Message

Vladimir Makarov Sept. 27, 2012, 10:59 p.m. UTC
This is the major patch containing all new files.  The patch also adds 
necessary calls to LRA from IRA.  As the patch is too big, it continues in 
the next email.

2012-09-27  Vladimir Makarov  <vmakarov@redhat.com>

     * Makefile.in (LRA_INT_H): New.
     (OBJS): Add lra.o, lra-assigns.o, lra-coalesce.o,
     lra-constraints.o, lra-eliminations.o, lra-lives.o, and lra-spills.o.
     (ira.o): Add dependence on lra.h.
     (lra.o, lra-assigns.o, lra-coalesce.o, lra-constraints.o): New entries.
     (lra-eliminations.o, lra-lives.o, lra-spills.o): Ditto.
     * ira.c: Include lra.h.
     (ira_init_once, ira_init, ira_finish_once): Call lra_start_once,
     lra_init, and lra_finish_once unconditionally.
     (lra_in_progress): Remove.
     (do_reload): Call LRA.
     * lra.h: New.
     * lra-int.h: Ditto.
     * lra.c: Ditto.
     * lra-assigns.c: Ditto.
     * lra-constraints.c: Ditto.
     * lra-coalesce.c: Ditto.
     * lra-eliminations.c: Ditto.
     * lra-lives.c: Ditto.
     * lra-spills.c: Ditto.
     * doc/passes.texi: Describe LRA pass.

Comments

Richard Sandiford Oct. 2, 2012, 11:21 a.m. UTC | #1
Hi Vlad,

Vladimir Makarov <vmakarov@redhat.com> writes:
> This is the major patch containing all new files.  The patch also adds 
> necessary calls to LRA from IRA.  As the patch is too big, it continues in 
> the next email.
>
> 2012-09-27  Vladimir Makarov  <vmakarov@redhat.com>
>
>      * Makefile.in (LRA_INT_H): New.
>      (OBJS): Add lra.o, lra-assigns.o, lra-coalesce.o,
>      lra-constraints.o, lra-eliminations.o, lra-lives.o, and lra-spills.o.
>      (ira.o): Add dependence on lra.h.
>      (lra.o, lra-assigns.o, lra-coalesce.o, lra-constraints.o): New entries.
>      (lra-eliminations.o, lra-lives.o, lra-spills.o): Ditto.
>      * ira.c: Include lra.h.
>      (ira_init_once, ira_init, ira_finish_once): Call lra_start_once,
>      lra_init, and lra_finish_once unconditionally.
>      (lra_in_progress): Remove.
>      (do_reload): Call LRA.
>      * lra.h: New.
>      * lra-int.h: Ditto.
>      * lra.c: Ditto.
>      * lra-assigns.c: Ditto.
>      * lra-constraints.c: Ditto.
>      * lra-coalesce.c: Ditto.
>      * lra-eliminations.c: Ditto.
>      * lra-lives.c: Ditto.
>      * lra-spills.c: Ditto.
>      * doc/passes.texi: Describe LRA pass.

A non-authoritative review of the documentation and lra-eliminations.c:

> +LRA is different from the reload pass in LRA division on small,
> +manageable, and separated sub-tasks.  All LRA transformations and
> +decisions are reflected in RTL as more as possible.  Instruction
> +constraints as a primary source of the info and that minimizes number
> +of target-depended macros/hooks.
>
> +LRA is run for the targets it were ported.

Suggest something like:

  Unlike the reload pass, intermediate LRA decisions are reflected in
  RTL as much as possible.  This reduces the number of target-dependent
  macros and hooks, leaving instruction constraints as the primary
  source of control.

  LRA is run on targets for which TARGET_LRA_P returns true.
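
For reference, opting in on the target side is just a matter of defining
the lra_p hook; a minimal sketch (the hook is real, the function name is
purely illustrative):

  /* In the target's <target>.c file.  */
  static bool
  my_target_lra_p (void)
  {
    return true;
  }

  #undef TARGET_LRA_P
  #define TARGET_LRA_P my_target_lra_p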

> +/* The virtual registers (like argument and frame pointer) are widely
> +   used in RTL.	 Virtual registers should be changed by real hard
> +   registers (like stack pointer or hard frame pointer) plus some
> +   offset.  The offsets are changed usually every time when stack is
> +   expanded.  We know the final offsets only at the very end of LRA.

I always think of "virtual" as [FIRST_VIRTUAL_REGISTER, LAST_VIRTUAL_REGISTER].
Maybe "eliminable" would be better?  E.g.

/* Eliminable registers (like a soft argument or frame pointer) are widely
   used in RTL.  These eliminable registers should be replaced by real hard
   registers (like the stack pointer or hard frame pointer) plus some offset.
   The offsets usually change whenever the stack is expanded.  We know the
   final offsets only at the very end of LRA.
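
For concreteness, the from/to pairs come from the target's
ELIMINABLE_REGS macro.  The i386 definition, quoted from memory (so the
exact formatting may differ), is along these lines:

  #define ELIMINABLE_REGS                                \
  {{ ARG_POINTER_REGNUM, STACK_POINTER_REGNUM},          \
   { ARG_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM},     \
   { FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM},        \
   { FRAME_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM}}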

> +   We keep RTL code at most time in such state that the virtual
> +   registers can be changed by just the corresponding hard registers
> +   (with zero offsets) and we have the right RTL code.	To achieve this
> +   we should add initial offset at the beginning of LRA work and update
> +   offsets after each stack expanding.	But actually we update virtual
> +   registers to the same virtual registers + corresponding offsets
> +   before every constraint pass because it affects constraint
> +   satisfaction (e.g. an address displacement became too big for some
> +   target).

Suggest:

   Within LRA, we usually keep the RTL in such a state that the eliminable
   registers can be replaced by just the corresponding hard register
   (without any offset).  To achieve this we should add the initial
   elimination offset at the beginning of LRA and update the offsets
   whenever the stack is expanded.  We need to do this before every
   constraint pass because the choice of offset often affects whether
   a particular address or memory constraint is satisfied.

> +   The final change of virtual registers to the corresponding hard
> +   registers are done at the very end of LRA when there were no change
> +   in offsets anymore:
> +
> +		     fp + 42	 =>	sp + 42

virtual=>eliminable if the above is OK.

> +   Such approach requires a few changes in the rest GCC code because
> +   virtual registers are not recognized as real ones in some
> +   constraints and predicates.	Fortunately, such changes are
> +   small.  */

Not sure whether the last paragraph really belongs in the code,
since it's more about the reload->LRA transition.

> +  /* Nonzero if this elimination can be done.  */
> +  bool can_eliminate;		
> +  /* CAN_ELIMINATE since the last check.  */
> +  bool prev_can_eliminate;

AFAICT, these two fields are (now) only ever assigned at the same time,
via init_elim_table and setup_can_eliminate.  Looks like we can do
without prev_can_eliminate.  (And the way that the pass doesn't
need to differentiate between the raw CAN_ELIMINATE value and
the processed value feels nice and reassuring.)

> +/* Map: 'from regno' -> to the current elimination, NULL otherwise.
> +   The elimination table may contains more one elimination of a hard
> +   register.  The map contains only one currently used elimination of
> +   the hard register.  */
> +static struct elim_table *elimination_map[FIRST_PSEUDO_REGISTER];

Nit: s/-> to/->/, s/may contains/may contain/.  Maybe:

/* Map: eliminable "from" register -> its current elimination,
   or NULL if none.  The elimination table may contain more than
   one elimination for the same hard register, but this map specifies
   the one that we are currently using.  */
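
With such a map, the lookup itself becomes trivial; a hypothetical
sketch (the patch's actual get_elimination may well do more):

  /* Return the elimination currently being used for hard register
     REG, or NULL if it is not being eliminated.  */
  static struct elim_table *
  get_elimination (rtx reg)
  {
    lra_assert (REG_P (reg) && REGNO (reg) < FIRST_PSEUDO_REGISTER);
    return elimination_map[REGNO (reg)];
  }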

> +/* When an eliminable hard register becomes not eliminable, we use the
> +   special following structure to restore original offsets for the
> +   register.  */

s/special following/following special/

> +/* Return the current substitution hard register of the elimination of
> +   HARD_REGNO.	If HARD_REGNO is not eliminable, return itself.	 */
> +int
> +lra_get_elimation_hard_regno (int hard_regno)

Typo: s/get_elimation/get_elimination/

> +/* Scan X and replace any eliminable registers (such as fp) with a
> +   replacement (such as sp) if SYBST_P, plus an offset.	 The offset is

Typo: SUBST_P.

> +   a change in the offset between the eliminable register and its
> +   substitution if UPDATE_P, or the full offset if FULL_P, or
> +   otherwise zero.

I wonder if an enum would be better than two booleans?
It avoids invalid combinations like UPDATE_P && FULL_P
and might make the arguments more obvious too.
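
Something like this, say (names invented for illustration):

  /* How offsets should be applied when substituting an eliminable
     register.  */
  enum lra_elim_offset_kind
  {
    ELIM_OFFSET_ZERO,    /* Apply no offset.  */
    ELIM_OFFSET_UPDATE,  /* Apply the change in offset since the last
			    update (the UPDATE_P case).  */
    ELIM_OFFSET_FULL     /* Apply the full elimination offset (the
			    FULL_P case).  */
  };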

> +    /* You might think handling MINUS in a manner similar to PLUS is a
> +       good idea.  It is not.  It has been tried multiple times and every
> +       time the change has had to have been reverted.
> +
> +       Other parts of LRA know a PLUS is special and require special
> +       code to handle code a reloaded PLUS operand.
> +
> +       Also consider backends where the flags register is clobbered by a
> +       MINUS, but we can emit a PLUS that does not clobber flags (IA-32,
> +       lea instruction comes to mind).	If we try to reload a MINUS, we
> +       may kill the flags register that was holding a useful value.
> +
> +       So, please before trying to handle MINUS, consider reload as a
> +       whole instead of this little section as well as the backend
> +       issues.	*/

A few references to the old enemy, especially in the last paragraph.

I think this old comment could be dropped altogether.  Handling MINUS
in a similar manner isn't a good idea because (minus ... (const_int ...))
isn't canonical.

> +	      /* The only time we want to replace a PLUS with a REG
> +		 (this occurs when the constant operand of the PLUS is
> +		 the negative of the offset) is when we are inside a
> +		 MEM.  We won't want to do so at other times because
> +		 that would change the structure of the insn in a way
> +		 that reload can't handle.  We special-case the
> +		 commonest situation in eliminate_regs_in_insn, so
> +		 just replace a PLUS with a PLUS here, unless inside a
> +		 MEM.  */

Reload reference.  Does this restriction still apply?  The later comment:

> +	 Note that there is no risk of modifying the structure of the insn,
> +	 since we only get called for its operands, thus we are either
> +	 modifying the address inside a MEM, or something like an address
> +	 operand of a load-address insn.  */

makes it sound at face value like the MEM restriction above is a reload-specific
thing.  Same question for:

> +	    /* As above, if we are not inside a MEM we do not want to
> +	       turn a PLUS into something else.	 We might try to do so here
> +	       for an addition of 0 if we aren't optimizing.  */

Given all the fuss in the 0/9 thread, maybe elimination could be
done in-place, at least when applied internally by LRA rather than
externally.  Most of the cases, including PLUS, are for things that
can't legitimately be shared.

I don't think that should be a requirement for merging though,
especially since this is how things have always been done.
Just an idea.

> +      /* We do not support elimination of a register that is modified.
> +	 elimination_effects has already make sure that this does not
> +	 happen.  */
> +      return x;
> 
> +    case PRE_MODIFY:
> +    case POST_MODIFY:
> +      /* We do not support elimination of a hard register that is
> +	 modified.  elimination_effects has already make sure that
> +	 this does not happen.	The only remaining case we need to
> +	 consider here is that the increment value may be an
> +	 eliminable register.  */

Reload references (elimination_effects).

> +#ifdef WORD_REGISTER_OPERATIONS
> +		   /* On these machines, combine can create RTL of the form
> +		      (set (subreg:m1 (reg:m2 R) 0) ...)
> +		      where m1 < m2, and expects something interesting to
> +		      happen to the entire word.  Moreover, it will use the
> +		      (reg:m2 R) later, expecting all bits to be preserved.
> +		      So if the number of words is the same, preserve the
> +		      subreg so that push_reload can see it.  */
> +		   && ! ((x_size - 1) / UNITS_PER_WORD
> +			 == (new_size -1 ) / UNITS_PER_WORD)
> +#endif

Reload reference (push_reload).  Do we still need this for LRA?

> +	    {
> +	      SUBREG_REG (x) = new_rtx;
> +	      alter_subreg (&x, false);
> +	      return x;
> +	    }

The reload version is:

    return adjust_address_nv (new_rtx, GET_MODE (x), SUBREG_BYTE (x));

Isn't that safe here too?  We're only doing this for non-paradoxical subregs, so:

      /* For paradoxical subregs on big-endian machines, SUBREG_BYTE
	 contains 0 instead of the proper offset.  See simplify_subreg.  */
      if (offset == 0
	  && GET_MODE_SIZE (GET_MODE (y)) < GET_MODE_SIZE (GET_MODE (x)))
        {
          int difference = GET_MODE_SIZE (GET_MODE (y))
			   - GET_MODE_SIZE (GET_MODE (x));
          if (WORDS_BIG_ENDIAN)
            offset += (difference / UNITS_PER_WORD) * UNITS_PER_WORD;
          if (BYTES_BIG_ENDIAN)
            offset += difference % UNITS_PER_WORD;
        }

doesn't apply.

> +/* Scan rtx X for modifications of elimination target registers.  Update
> +   the table of eliminables to reflect the changed state.  MEM_MODE is
> +   the mode of an enclosing MEM rtx, or VOIDmode if not within a MEM.  */
> +static void
> +mark_not_eliminable (rtx x)

Maybe:

/* Scan rtx X for references to elimination source or target registers
   in contexts that would prevent the elimination from happening.
   Update ... */

The function looks at uses as well as modifications, and at elimination
source registers as well as target registers.

> +      if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER)
> +	/* If we modify the source of an elimination rule, disable it.	*/
> +	for (ep = reg_eliminate;
> +	     ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
> +	       ep++)
> +	  if (ep->from_rtx == XEXP (x, 0)
> +	      || (ep->to_rtx == XEXP (x, 0)
> +		  && ep->to_rtx != hard_frame_pointer_rtx))
> +	    setup_can_eliminate (ep, false);

Comment doesn't mention the to_rtx case.

> +   If REPLACE_P is false, do an offset updates.	 */

s/do an offset updates/just update the offsets/
Maybe worth adding "while keeping the base register the same".

> +static void
> +eliminate_regs_in_insn (rtx insn, bool replace_p)
> +{
> +  int icode = recog_memoized (insn);
> +  rtx old_set = single_set (insn);
> +  bool val;

"validate_p" might be a better name.

> +	      /* If we are assigning to a hard register that can be
> +		 eliminated, it must be as part of a PARALLEL, since
> +		 the code above handles single SETs.  We must indicate
> +		 that we can no longer eliminate this reg.  */
> +	      for (ep = reg_eliminate;
> +		   ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
> +		   ep++)
> +		lra_assert (ep->from_rtx != orig_operand[i]
> +			    || ! ep->can_eliminate);

This is now an assert, so rather than "We must indicate ...",
I think the comment should say who enforces this.  (mark_not_eliminable?)

> +	  /* If an output operand changed from a REG to a MEM and INSN is an
> +	     insn, write a CLOBBER insn.  */
> +	  if (static_id->operand[i].type != OP_IN
> +	      && REG_P (orig_operand[i])
> +	      && MEM_P (substed_operand[i])
> +	      && replace_p)
> +	    emit_insn_after (gen_clobber (orig_operand[i]), insn);

I realise this is copied from reload, but why is it needed?

> +/* Initialize the table of hard registers to eliminate.
> +   Pre-condition: global flag frame_pointer_needed has been set before
> +   calling this function.  Set up hard registers in DONT_USE_REGS
> +   which can not be used for allocation because their identical
> +   elimination is not possible.	 */
> +static void
> +init_elim_table (void)

DONT_USE_REGS doesn't exist any more.

> +/* Eliminate hard reg given by its location LOC.  */
> +void
> +lra_eliminate_reg_if_possible (rtx *loc)
> +{
> +  int regno;
> +  struct elim_table *ep;
> +
> +  lra_assert (REG_P (*loc));
> +  if ((regno = REGNO (*loc)) >= FIRST_PSEUDO_REGISTER
> +      /* Virtual registers are not allocatable. ??? */
> +      || ! TEST_HARD_REG_BIT (lra_no_alloc_regs, regno))
> +    return;

I don't understand this comment.  Is the "if" statement needed at all?
Given the requirement for rtx equality, I'd have thought:

> +  if ((ep = get_elimination (*loc)) != NULL)
> +    *loc = ep->to_rtx;

would do everything.  If that isn't true, the function might need
a bit more commentary.

> +  for (i = FIRST_PSEUDO_REGISTER; i < regs_num; i++)
> +    if (lra_reg_info[i].nrefs != 0)
> +      {
> +	mem_loc = ira_reg_equiv[i].memory;
> +	invariant = ira_reg_equiv[i].invariant;
> +	if (mem_loc != NULL_RTX)
> +	  mem_loc = lra_eliminate_regs_1 (mem_loc, VOIDmode,
> +					  final_p, ! final_p, false);
> +	ira_reg_equiv[i].memory = mem_loc;
> +	if (invariant != NULL_RTX)
> +	  invariant = lra_eliminate_regs_1 (invariant, VOIDmode,
> +					    final_p, ! final_p, false);
> +	ira_reg_equiv[i].invariant = invariant;

Minor nit, but the first assignment to "invariant" seems to belong
a bit further down.

Looks really good to me FWIW.  Keeping the rtl in a form where the
final elimination is simply a register replacement is a nice trick.

Richard
Bernd Schmidt Oct. 2, 2012, 11:30 a.m. UTC | #2
On 09/28/2012 12:59 AM, Vladimir Makarov wrote:
> +   We keep RTL code at most time in such state that the virtual
> +   registers can be changed by just the corresponding hard registers
> +   (with zero offsets) and we have the right RTL code.	To achieve this
> +   we should add initial offset at the beginning of LRA work and update
> +   offsets after each stack expanding.	But actually we update virtual
> +   registers to the same virtual registers + corresponding offsets
> +   before every constraint pass because it affects constraint
> +   satisfaction (e.g. an address displacement became too big for some
> +   target).
> +
> +   The final change of virtual registers to the corresponding hard
> +   registers are done at the very end of LRA when there were no change
> +   in offsets anymore:
> +
> +		     fp + 42	 =>	sp + 42

Let me try to understand this.  We have (mem (fp)), which we rewrite to
(mem (fp + 42)), but this is intended to represent (mem (sp + 42))?

Wouldn't this fail on any target which has different addressing ranges
for SP and FP?


Bernd
Richard Sandiford Oct. 2, 2012, 1:42 p.m. UTC | #3
Vladimir Makarov <vmakarov@redhat.com> writes:
> This is the major patch containing all new files.  The patch also adds 
> necessary calls to LRA from IRA.  As the patch is too big, it continues in 
> the next email.
>
> 2012-09-27  Vladimir Makarov  <vmakarov@redhat.com>
>
>      * Makefile.in (LRA_INT_H): New.
>      (OBJS): Add lra.o, lra-assigns.o, lra-coalesce.o,
>      lra-constraints.o, lra-eliminations.o, lra-lives.o, and lra-spills.o.
>      (ira.o): Add dependence on lra.h.
>      (lra.o, lra-assigns.o, lra-coalesce.o, lra-constraints.o): New entries.
>      (lra-eliminations.o, lra-lives.o, lra-spills.o): Ditto.
>      * ira.c: Include lra.h.
>      (ira_init_once, ira_init, ira_finish_once): Call lra_start_once,
>      lra_init, and lra_finish_once unconditionally.
>      (lra_in_progress): Remove.
>      (do_reload): Call LRA.
>      * lra.h: New.
>      * lra-int.h: Ditto.
>      * lra.c: Ditto.
>      * lra-assigns.c: Ditto.
>      * lra-constraints.c: Ditto.
>      * lra-coalesce.c: Ditto.
>      * lra-eliminations.c: Ditto.
>      * lra-lives.c: Ditto.
>      * lra-spills.c: Ditto.
>      * doc/passes.texi: Describe LRA pass.

Comments on lra-lives.c.  (Sorry for the split, had more time to look
at this than expected)

> +/* Copy live range list given by its head R and return the result.  */
> +lra_live_range_t
> +lra_copy_live_range_list (lra_live_range_t r)
> +{
> +  lra_live_range_t p, first, last;
> +
> +  if (r == NULL)
> +    return NULL;
> +  for (first = last = NULL; r != NULL; r = r->next)
> +    {
> +      p = copy_live_range (r);
> +      if (first == NULL)
> +	first = p;
> +      else
> +	last->next = p;
> +      last = p;
> +    }
> +  return first;
> +}

Maybe simpler as:

  lra_live_range_t p, first, *chain;

  first = NULL;
  for (chain = &first; r != NULL; r = r->next)
    {
      p = copy_live_range (r);
      *chain = p;
      chain = &p->next;
    }
  return first;

> +/* Merge ranges R1 and R2 and returns the result.  The function
> +   maintains the order of ranges and tries to minimize size of the
> +   result range list.  */
> +lra_live_range_t 
> +lra_merge_live_ranges (lra_live_range_t r1, lra_live_range_t r2)
> +{
> +  lra_live_range_t first, last, temp;
> +
> +  if (r1 == NULL)
> +    return r2;
> +  if (r2 == NULL)
> +    return r1;
> +  for (first = last = NULL; r1 != NULL && r2 != NULL;)
> +    {
> +      if (r1->start < r2->start)
> +	{
> +	  temp = r1;
> +	  r1 = r2;
> +	  r2 = temp;
> +	}
> +      if (r1->start <= r2->finish + 1)
> +	{
> +	  /* Intersected ranges: merge r1 and r2 into r1.  */
> +	  r1->start = r2->start;
> +	  if (r1->finish < r2->finish)
> +	    r1->finish = r2->finish;
> +	  temp = r2;
> +	  r2 = r2->next;
> +	  pool_free (live_range_pool, temp);
> +	  if (r2 == NULL)
> +	    {
> +	      /* To try to merge with subsequent ranges in r1.	*/
> +	      r2 = r1->next;
> +	      r1->next = NULL;
> +	    }
> +	}
> +      else
> +	{
> +	  /* Add r1 to the result.  */
> +	  if (first == NULL)
> +	    first = last = r1;
> +	  else
> +	    {
> +	      last->next = r1;
> +	      last = r1;
> +	    }
> +	  r1 = r1->next;
> +	  if (r1 == NULL)
> +	    {
> +	      /* To try to merge with subsequent ranges in r2.	*/
> +	      r1 = r2->next;
> +	      r2->next = NULL;
> +	    }
> +	}

I might be misreading, but I'm not sure whether this handles merges like:

  r1 = [6,7], [3,4]
  r2 = [3,8], [0,1]

After the first iteration, it looks like we'll have:

  r1 = [3,8], [3,4]
  r2 = [0,1]

Then we'll add both [3,8] and [3,4] to the result.

Same chain pointer comment as for lra_copy_live_range_list.

> +/* Return TRUE if live range R1 is in R2.  */
> +bool
> +lra_live_range_in_p (lra_live_range_t r1, lra_live_range_t r2)
> +{
> +  /* Remember the live ranges are always kept ordered.	*/
> +  while (r1 != NULL && r2 != NULL)
> +    {
> +      /* R1's element is in R2's element.  */
> +      if (r2->start <= r1->start && r1->finish <= r2->finish)
> +	r1 = r1->next;
> +      /* Intersection: R1's start is in R2.  */
> +      else if (r2->start <= r1->start && r1->start <= r2->finish)
> +	return false;
> +      /* Intersection: R1's finish is in R2.  */
> +      else if (r2->start <= r1->finish && r1->finish <= r2->finish)
> +	return false;
> +      else if (r1->start > r2->finish)
> +	return false; /* No covering R2's element for R1's one.	 */
> +      else
> +	r2 = r2->next;
> +    }
> +  return r1 == NULL;

Does the inner bit reduce to:

      /* R1's element is in R2's element.  */
      if (r2->start <= r1->start && r1->finish <= r2->finish)
	r1 = r1->next;
      /* All of R2's element comes after R1's element.  */
      else if (r2->start > r1->finish)
	r2 = r2->next;
      else
	return false;

(Genuine question)

> +/* Process the death of hard register REGNO.  This updates
> +   hard_regs_live and START_DYING.  */
> +static void
> +make_hard_regno_dead (int regno)
> +{
> +  if (TEST_HARD_REG_BIT (lra_no_alloc_regs, regno)
> +      || ! TEST_HARD_REG_BIT (hard_regs_live, regno))
> +    return;
> +  lra_assert (regno < FIRST_PSEUDO_REGISTER);
> +  sparseset_set_bit (start_dying, regno);
> +  CLEAR_HARD_REG_BIT (hard_regs_live, regno);
> +}

Assert should be before the HARD_REG_SET stuff (like for
make_hard_regno_born).

> +	  /* Check that source regno does not conflict with
> +	     destination regno to exclude most impossible
> +	     preferences.  */
> +	  && ((((src_regno = REGNO (SET_SRC (set))) >= FIRST_PSEUDO_REGISTER
> +		&& ! sparseset_bit_p (pseudos_live, src_regno))
> +	       || (src_regno < FIRST_PSEUDO_REGISTER
> +		   && ! TEST_HARD_REG_BIT (hard_regs_live, src_regno)))

This is probably personal preference, but I think this would be more
readable with an inline utility function (regno_live_p, or whatever).
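
E.g. a hypothetical helper along these lines (name invented):

  /* Return true if hard or pseudo register REGNO is currently live.  */
  static inline bool
  regno_live_p (int regno)
  {
    if (regno >= FIRST_PSEUDO_REGISTER)
      return sparseset_bit_p (pseudos_live, regno);
    return TEST_HARD_REG_BIT (hard_regs_live, regno);
  }

which would let the condition above read as !regno_live_p (src_regno),
with the src_regno assignment pulled out separately.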

> +/* Compress pseudo live ranges by removing program points where
> +   nothing happens.  Complexity of many algorithms in LRA is linear
> +   function of program points number.  To speed up the code we try to
> +   minimize the number of the program points here.  */
> +static void
> +remove_some_program_points_and_update_live_ranges (void)

Genuine question, but could we do this on the fly instead,
by not incrementing curr_point if the current point had no value?

I suppose the main complication would be checking cases where
all births are recorded by extending the previous just-closed live
range rather than starting a new one, in which case it's the previous
point that needs to be reused.  Hmm...

Richard
Richard Sandiford Oct. 2, 2012, 2:14 p.m. UTC | #4
Richard Sandiford <rdsandiford@googlemail.com> writes:
>> +/* Merge ranges R1 and R2 and returns the result.  The function
>> +   maintains the order of ranges and tries to minimize size of the
>> +   result range list.  */
>> +lra_live_range_t 
>> +lra_merge_live_ranges (lra_live_range_t r1, lra_live_range_t r2)
>> +{
>> +  lra_live_range_t first, last, temp;
>> +
>> +  if (r1 == NULL)
>> +    return r2;
>> +  if (r2 == NULL)
>> +    return r1;
>> +  for (first = last = NULL; r1 != NULL && r2 != NULL;)
>> +    {
>> +      if (r1->start < r2->start)
>> +	{
>> +	  temp = r1;
>> +	  r1 = r2;
>> +	  r2 = temp;
>> +	}
>> +      if (r1->start <= r2->finish + 1)
>> +	{
>> +	  /* Intersected ranges: merge r1 and r2 into r1.  */
>> +	  r1->start = r2->start;
>> +	  if (r1->finish < r2->finish)
>> +	    r1->finish = r2->finish;
>> +	  temp = r2;
>> +	  r2 = r2->next;
>> +	  pool_free (live_range_pool, temp);
>> +	  if (r2 == NULL)
>> +	    {
>> +	      /* To try to merge with subsequent ranges in r1.	*/
>> +	      r2 = r1->next;
>> +	      r1->next = NULL;
>> +	    }
>> +	}
>> +      else
>> +	{
>> +	  /* Add r1 to the result.  */
>> +	  if (first == NULL)
>> +	    first = last = r1;
>> +	  else
>> +	    {
>> +	      last->next = r1;
>> +	      last = r1;
>> +	    }
>> +	  r1 = r1->next;
>> +	  if (r1 == NULL)
>> +	    {
>> +	      /* To try to merge with subsequent ranges in r2.	*/
>> +	      r1 = r2->next;
>> +	      r2->next = NULL;
>> +	    }
>> +	}
>
> I might be misreading, but I'm not sure whether this handles merges like:
>
>   r1 = [6,7], [3,4]
>   r2 = [3,8], [0,1]
>
> After the first iteration, it looks like we'll have:
>
>   r1 = [3,8], [3,4]
>   r2 = [0,1]
>
> Then we'll add both [3,8] and [3,4] to the result.

OK, so I start to read patch b and realise that this is only supposed to
handle non-overlapping live ranges.  It might be worth having a comment
and assert to that effect, for slow readers like me.

Although in that case the function feels a little more complicated than
it needs to be.  When we run out of R1 or R2, why not just use the other
one as the rest of the live range list?  Why is:

>> +	  if (r1 == NULL)
>> +	    {
>> +	      /* To try to merge with subsequent ranges in r2.	*/
>> +	      r1 = r2->next;
>> +	      r2->next = NULL;
>> +	    }

needed?

Richard
Vladimir Makarov Oct. 2, 2012, 10:29 p.m. UTC | #5
On 12-10-02 7:30 AM, Bernd Schmidt wrote:
> On 09/28/2012 12:59 AM, Vladimir Makarov wrote:
>> +   We keep RTL code at most time in such state that the virtual
>> +   registers can be changed by just the corresponding hard registers
>> +   (with zero offsets) and we have the right RTL code.	To achieve this
>> +   we should add initial offset at the beginning of LRA work and update
>> +   offsets after each stack expanding.	But actually we update virtual
>> +   registers to the same virtual registers + corresponding offsets
>> +   before every constraint pass because it affects constraint
>> +   satisfaction (e.g. an address displacement became too big for some
>> +   target).
>> +
>> +   The final change of virtual registers to the corresponding hard
>> +   registers are done at the very end of LRA when there were no change
>> +   in offsets anymore:
>> +
>> +		     fp + 42	 =>	sp + 42
> Let me try to understand this.  We have (mem (fp)), which we rewrite to
> (mem (fp + 42)), but this is intended to represent (mem (sp + 42))?
>
> Wouldn't this fail on any target which has different addressing ranges
> for SP and FP?
>
>
Yes, I think so.  It is not a problem for the 9 current targets.  But when I 
or somebody else starts porting LRA to such a target, we can introduce new 
virtual registers (and switch to them at the beginning of LRA) to distinguish 
the two cases.  It would not be a big deal.

I'd appreciate it if you could tell me which targets those are, so that I 
know in advance what to do first when I start working on them.
Bernd Schmidt Oct. 2, 2012, 10:42 p.m. UTC | #6
On 10/03/2012 12:29 AM, Vladimir Makarov wrote:
> On 12-10-02 7:30 AM, Bernd Schmidt wrote:
>> On 09/28/2012 12:59 AM, Vladimir Makarov wrote:
>>> +   We keep RTL code at most time in such state that the virtual
>>> +   registers can be changed by just the corresponding hard registers
>>> +   (with zero offsets) and we have the right RTL code.    To achieve
>>> this
>>> +   we should add initial offset at the beginning of LRA work and update
>>> +   offsets after each stack expanding.    But actually we update
>>> virtual
>>> +   registers to the same virtual registers + corresponding offsets
>>> +   before every constraint pass because it affects constraint
>>> +   satisfaction (e.g. an address displacement became too big for some
>>> +   target).
>>> +
>>> +   The final change of virtual registers to the corresponding hard
>>> +   registers are done at the very end of LRA when there were no change
>>> +   in offsets anymore:
>>> +
>>> +             fp + 42     =>    sp + 42
>> Let me try to understand this.  We have (mem (fp)), which we rewrite to
>> (mem (fp + 42)), but this is intended to represent (mem (sp + 42))?
>>
>> Wouldn't this fail on any target which has different addressing ranges
>> for SP and FP?
>>
>>
> Yes, I think so.  It is not a problem for the 9 current targets.  But when I
> or somebody else starts porting LRA to such a target, we can introduce new
> virtual registers (and switch to them at the beginning of LRA) to distinguish
> the two cases.  It would not be a big deal.
> 
> I'd appreciate it if you could tell me which targets those are, so that I
> know in advance what to do first when I start working on them.

C6X is such a target. The stack pointer allows a 15 bit unsigned offset,
while all normal registers only allow more limited offsets.

I think the above approach sounds quite hackish and I'd prefer to see
this reworked before merging. Can you retain the current mechanism of
calling eliminate_regs before examining an insn, or is this not workable
for some reason?


Bernd
Vladimir Makarov Oct. 2, 2012, 11:56 p.m. UTC | #7
On 10/02/2012 06:42 PM, Bernd Schmidt wrote:
> On 10/03/2012 12:29 AM, Vladimir Makarov wrote:
>> On 12-10-02 7:30 AM, Bernd Schmidt wrote:
>>> On 09/28/2012 12:59 AM, Vladimir Makarov wrote:
>>>> +   We keep RTL code at most time in such state that the virtual
>>>> +   registers can be changed by just the corresponding hard registers
>>>> +   (with zero offsets) and we have the right RTL code.    To achieve
>>>> this
>>>> +   we should add initial offset at the beginning of LRA work and update
>>>> +   offsets after each stack expanding.    But actually we update
>>>> virtual
>>>> +   registers to the same virtual registers + corresponding offsets
>>>> +   before every constraint pass because it affects constraint
>>>> +   satisfaction (e.g. an address displacement became too big for some
>>>> +   target).
>>>> +
>>>> +   The final change of virtual registers to the corresponding hard
>>>> +   registers are done at the very end of LRA when there were no change
>>>> +   in offsets anymore:
>>>> +
>>>> +             fp + 42     =>    sp + 42
>>> Let me try to understand this.  We have (mem (fp)), which we rewrite to
>>> (mem (fp + 42)), but this is intended to represent (mem (sp + 42))?
>>>
>>> Wouldn't this fail on any target which has different addressing ranges
>>> for SP and FP?
>>>
>>>
>> Yes, I think so.  It is not a problem for the 9 current targets.  But when I
>> or somebody else starts porting LRA to such a target, we can introduce new
>> virtual registers (and switch to them at the beginning of LRA) to distinguish
>> the two cases.  It would not be a big deal.
>>
>> I'd appreciate it if you could tell me which targets those are, so that I
>> know in advance what to do first when I start working on them.
> C6X is such a target. The stack pointer allows a 15 bit unsigned offset,
> while all normal registers only allow more limited offsets.
>
> I think the above approach sounds quite hackish and I'd prefer to see
> this reworked before merging. Can you retain the current mechanism of
> calling eliminate_regs before examining an insn, or is this not workable
> for some reason?
>
>
   I don't think it is worth reworking.  This is the best approach I found 
(and I tried many).  Besides, it is useful to see the current offsets 
during debugging just by looking at the RTL.  But most importantly, it 
really saves a lot of code and simplifies the implementation, making the 
code much clearer.

So that is exactly what I would like to keep in this implementation.  It 
also follows the major design goal of reflecting changes in RTL as much 
as possible.
Steven Bosscher Oct. 3, 2012, 7:13 a.m. UTC | #8
On Tue, Oct 2, 2012 at 3:42 PM, Richard Sandiford
<rdsandiford@googlemail.com> wrote:
>
>> +/* Compress pseudo live ranges by removing program points where
>> +   nothing happens.  Complexity of many algorithms in LRA is linear
>> +   function of program points number.  To speed up the code we try to
>> +   minimize the number of the program points here.  */
>> +static void
>> +remove_some_program_points_and_update_live_ranges (void)
>
> Genuine question, but could we do this on the fly instead,
> by not incrementing curr_point if the current point had no value?
>
> I suppose the main complication would be checking cases where
> all births are recorded by extending the previous just-closed live
> range rather than starting a new one, in which case it's the previous
> point that needs to be reused.  Hmm...

It does seem to be something worth investigating further. Things like:

Compressing live ranges: from 1742579 to 554532 - 31%
Compressing live ranges: from 1742569 to 73069 - 4%

look extreme, but they're actually the norm. For the same test case
(PR54146 still) but looking only at functions with initially 1000-9999
program points, you get this picture:

Compressing live ranges: from 1766 to 705 - 39%
Compressing live ranges: from 1449 to 370 - 25%
Compressing live ranges: from 3939 to 1093 - 27%
Compressing live ranges: from 3939 to 1093 - 27%
Compressing live ranges: from 3939 to 1093 - 27%
Compressing live ranges: from 3939 to 1093 - 27%
Compressing live ranges: from 2464 to 676 - 27%
Compressing live ranges: from 1433 to 379 - 26%
Compressing live ranges: from 1261 to 348 - 27%
Compressing live ranges: from 2835 to 755 - 26%
Compressing live ranges: from 5426 to 1678 - 30%
Compressing live ranges: from 5227 to 1477 - 28%
Compressing live ranges: from 1845 to 467 - 25%
Compressing live ranges: from 4868 to 1378 - 28%
Compressing live ranges: from 4875 to 1388 - 28%
Compressing live ranges: from 4882 to 1384 - 28%
Compressing live ranges: from 5688 to 1714 - 30%
Compressing live ranges: from 4943 to 1310 - 26%
Compressing live ranges: from 2976 to 792 - 26%
Compressing live ranges: from 5463 to 1526 - 27%
Compressing live ranges: from 2854 to 730 - 25%
Compressing live ranges: from 1810 to 745 - 41%
Compressing live ranges: from 2771 to 904 - 32%
Compressing live ranges: from 4916 to 1429 - 29%
Compressing live ranges: from 6505 to 2238 - 34%
Compressing live ranges: from 6493 to 166 - 2%
Compressing live ranges: from 5498 to 1734 - 31%
Compressing live ranges: from 1810 to 745 - 41%
Compressing live ranges: from 5043 to 1420 - 28%
Compressing live ranges: from 3094 to 788 - 25%
Compressing live ranges: from 4563 to 1311 - 28%
Compressing live ranges: from 4557 to 158 - 3%
Compressing live ranges: from 1050 to 274 - 26%
Compressing live ranges: from 1602 to 434 - 27%
Compressing live ranges: from 2474 to 600 - 24%
Compressing live ranges: from 2718 to 866 - 31%
Compressing live ranges: from 2097 to 716 - 34%
Compressing live ranges: from 4152 to 1099 - 26%
Compressing live ranges: from 5065 to 1514 - 29%
Compressing live ranges: from 1236 to 359 - 29%
Compressing live ranges: from 1722 to 541 - 31%
Compressing live ranges: from 5186 to 1401 - 27%

Unfortunately the dump doesn't mention how many live ranges could be
merged thanks to the compression.

It'd also be good to understand why the compression ratios are so
small, and consistently around ~30%. Maybe curr_point includes things
it should ignore (DEBUG_INSNs, NOTEs, ...)?

Ciao!
Steven
Vladimir Makarov Oct. 3, 2012, 2:56 p.m. UTC | #9
On 12-10-03 3:13 AM, Steven Bosscher wrote:
> On Tue, Oct 2, 2012 at 3:42 PM, Richard Sandiford
> <rdsandiford@googlemail.com> wrote:
>>> +/* Compress pseudo live ranges by removing program points where
>>> +   nothing happens.  Complexity of many algorithms in LRA is linear
>>> +   function of program points number.  To speed up the code we try to
>>> +   minimize the number of the program points here.  */
>>> +static void
>>> +remove_some_program_points_and_update_live_ranges (void)
>> Genuine question, but could we do this on the fly instead,
>> by not incrementing curr_point if the current point had no value?
>>
>> I suppose the main complication would be checking cases where
>> all births are recorded by extending the previous just-closed live
>> range rather than starting a new one, in which case it's the previous
>> point that needs to be reused.  Hmm...
> It does seem to be something worth investigating further. Things like:
>
> Compressing live ranges: from 1742579 to 554532 - 31%
> Compressing live ranges: from 1742569 to 73069 - 4%
>
> look extreme, but they're actually the norm. For the same test case
> (PR54146 still) but looking only at functions with initially 1000-9999
> program points, you get this picture:
>
> Compressing live ranges: from 1766 to 705 - 39%
> Compressing live ranges: from 1449 to 370 - 25%
> Compressing live ranges: from 3939 to 1093 - 27%
> Compressing live ranges: from 3939 to 1093 - 27%
> Compressing live ranges: from 3939 to 1093 - 27%
> Compressing live ranges: from 3939 to 1093 - 27%
> Compressing live ranges: from 2464 to 676 - 27%
> Compressing live ranges: from 1433 to 379 - 26%
> Compressing live ranges: from 1261 to 348 - 27%
> Compressing live ranges: from 2835 to 755 - 26%
> Compressing live ranges: from 5426 to 1678 - 30%
> Compressing live ranges: from 5227 to 1477 - 28%
> Compressing live ranges: from 1845 to 467 - 25%
> Compressing live ranges: from 4868 to 1378 - 28%
> Compressing live ranges: from 4875 to 1388 - 28%
> Compressing live ranges: from 4882 to 1384 - 28%
> Compressing live ranges: from 5688 to 1714 - 30%
> Compressing live ranges: from 4943 to 1310 - 26%
> Compressing live ranges: from 2976 to 792 - 26%
> Compressing live ranges: from 5463 to 1526 - 27%
> Compressing live ranges: from 2854 to 730 - 25%
> Compressing live ranges: from 1810 to 745 - 41%
> Compressing live ranges: from 2771 to 904 - 32%
> Compressing live ranges: from 4916 to 1429 - 29%
> Compressing live ranges: from 6505 to 2238 - 34%
> Compressing live ranges: from 6493 to 166 - 2%
> Compressing live ranges: from 5498 to 1734 - 31%
> Compressing live ranges: from 1810 to 745 - 41%
> Compressing live ranges: from 5043 to 1420 - 28%
> Compressing live ranges: from 3094 to 788 - 25%
> Compressing live ranges: from 4563 to 1311 - 28%
> Compressing live ranges: from 4557 to 158 - 3%
> Compressing live ranges: from 1050 to 274 - 26%
> Compressing live ranges: from 1602 to 434 - 27%
> Compressing live ranges: from 2474 to 600 - 24%
> Compressing live ranges: from 2718 to 866 - 31%
> Compressing live ranges: from 2097 to 716 - 34%
> Compressing live ranges: from 4152 to 1099 - 26%
> Compressing live ranges: from 5065 to 1514 - 29%
> Compressing live ranges: from 1236 to 359 - 29%
> Compressing live ranges: from 1722 to 541 - 31%
> Compressing live ranges: from 5186 to 1401 - 27%
>
> Unfortunately the dump doesn't mention how many live ranges could be
> merged thanks to the compression.
>
> It'd also be good to understand why the compression ratios are so
> small, and consistently around ~30%.
This sentence is not clear to me.  30% means three times fewer points.  That 
is pretty good compression.
>   Maybe curr_point includes things
> it should ignore (DEBUG_INSNs, NOTEs, ...)?
>
After the compression, only the points important for conflict info remain, 
i.e. only points where some reg dies or starts living.  Moreover, if 
consecutive points contain only life starts or only deaths, they are 
represented by a single point after the compression.
Vladimir Makarov Oct. 8, 2012, 11:53 p.m. UTC | #10
On 12-10-02 7:21 AM, Richard Sandiford wrote:
> Hi Vlad,
>
> Vladimir Makarov <vmakarov@redhat.com> writes:
>> +LRA is different from the reload pass in LRA division on small,
>> +manageable, and separated sub-tasks.  All LRA transformations and
>> +decisions are reflected in RTL as more as possible.  Instruction
>> +constraints as a primary source of the info and that minimizes number
>> +of target-depended macros/hooks.
>>
>> +LRA is run for the targets it were ported.
> Suggest something like:
>
>    Unlike the reload pass, intermediate LRA decisions are reflected in
>    RTL as much as possible.  This reduces the number of target-dependent
>    macros and hooks, leaving instruction constraints as the primary
>    source of control.
>
>    LRA is run on targets for which TARGET_LRA_P returns true.
It is better.  Done.
>> +/* The virtual registers (like argument and frame pointer) are widely
>> +   used in RTL.	 Virtual registers should be changed by real hard
>> +   registers (like stack pointer or hard frame pointer) plus some
>> +   offset.  The offsets are changed usually every time when stack is
>> +   expanded.  We know the final offsets only at the very end of LRA.
> I always think of "virtual" as [FIRST_VIRTUAL_REGISTER, LAST_VIRTUAL_REGISTER].
> Maybe "eliminable" would be better?  E.g.
>
> /* Eliminable registers (like a soft argument or frame pointer) are widely
>     used in RTL.  These eliminable registers should be replaced by real hard
>     registers (like the stack pointer or hard frame pointer) plus some offset.
>     The offsets usually change whenever the stack is expanded.  We know the
>     final offsets only at the very end of LRA.
Yes, right.  Eliminable is a better term.  I'll change it too.
>> +   We keep RTL code at most time in such state that the virtual
>> +   registers can be changed by just the corresponding hard registers
>> +   (with zero offsets) and we have the right RTL code.	To achieve this
>> +   we should add initial offset at the beginning of LRA work and update
>> +   offsets after each stack expanding.	But actually we update virtual
>> +   registers to the same virtual registers + corresponding offsets
>> +   before every constraint pass because it affects constraint
>> +   satisfaction (e.g. an address displacement became too big for some
>> +   target).
> Suggest:
>
>     Within LRA, we usually keep the RTL in such a state that the eliminable
>     registers can be replaced by just the corresponding hard register
>     (without any offset).  To achieve this we should add the initial
>     elimination offset at the beginning of LRA and update the offsets
>     whenever the stack is expanded.  We need to do this before every
>     constraint pass because the choice of offset often affects whether
>     a particular address or memory constraint is satisfied.
Done.
>> +   The final change of virtual registers to the corresponding hard
>> +   registers are done at the very end of LRA when there were no change
>> +   in offsets anymore:
>> +
>> +		     fp + 42	 =>	sp + 42
> virtual=>eliminable if the above is OK.
Ok.  Done.
>> +   Such approach requires a few changes in the rest GCC code because
>> +   virtual registers are not recognized as real ones in some
>> +   constraints and predicates.	Fortunately, such changes are
>> +   small.  */
> Not sure whether the last paragraph really belongs in the code,
> since it's more about the reload->LRA transition.
I removed the last paragraph.
>> +  /* Nonzero if this elimination can be done.  */
>> +  bool can_eliminate;		
>> +  /* CAN_ELIMINATE since the last check.  */
>> +  bool prev_can_eliminate;
> AFAICT, these two fields are (now) only ever assigned at the same time,
> via init_elim_table and setup_can_eliminate.  Looks like we can do
> without prev_can_eliminate.  (And the way that the pass doesn't
> need to differentiate between the raw CAN_ELIMINABLE value and
> the processed value feels nice and reassuring.)
OK.  I'll put it on my to-do list.
>> +/* Map: 'from regno' -> to the current elimination, NULL otherwise.
>> +   The elimination table may contains more one elimination of a hard
>> +   register.  The map contains only one currently used elimination of
>> +   the hard register.  */
>> +static struct elim_table *elimination_map[FIRST_PSEUDO_REGISTER];
> Nit: s/-> to/->/, s/may contains/may contain/.  Maybe:
>
> /* Map: eliminable "from" register -> its current elimination,
>     or NULL if none.  The elimination table may contain more than
>     one elimination for the same hard register, but this map specifies
>     the one that we are currently using.  */
Done.
>> +/* When an eliminable hard register becomes not eliminable, we use the
>> +   special following structure to restore original offsets for the
>> +   register.  */
> s/special following/following special/
Done.
>> +/* Return the current substitution hard register of the elimination of
>> +   HARD_REGNO.	If HARD_REGNO is not eliminable, return itself.	 */
>> +int
>> +lra_get_elimation_hard_regno (int hard_regno)
> Typo: s/get_elimation/get_elimination/
Done.
>> +/* Scan X and replace any eliminable registers (such as fp) with a
>> +   replacement (such as sp) if SYBST_P, plus an offset.	 The offset is
> Typo: SUBST_P.
Done.
>> +   a change in the offset between the eliminable register and its
>> +   substitution if UPDATE_P, or the full offset if FULL_P, or
>> +   otherwise zero.
> I wonder if an enum would be better than two booleans?
> It avoids invalid combinations like UPDATE_P && FULL_P
> and might make the arguments more obvious too.
IMHO, it is a matter of choice.  I don't like introducing a new enum just 
for one function.  It is a pretty standard situation.  I usually introduce 
an enum when a few of the combinations are prohibited.
>> +    /* You might think handling MINUS in a manner similar to PLUS is a
>> +       good idea.  It is not.  It has been tried multiple times and every
>> +       time the change has had to have been reverted.
>> +
>> +       Other parts of LRA know a PLUS is special and require special
>> +       code to handle code a reloaded PLUS operand.
>> +
>> +       Also consider backends where the flags register is clobbered by a
>> +       MINUS, but we can emit a PLUS that does not clobber flags (IA-32,
>> +       lea instruction comes to mind).	If we try to reload a MINUS, we
>> +       may kill the flags register that was holding a useful value.
>> +
>> +       So, please before trying to handle MINUS, consider reload as a
>> +       whole instead of this little section as well as the backend
>> +       issues.	*/
> A few references to the old enemy, especially in the last paragraph.
>
> I think this old comment could be dropped altogether.  Handling MINUS
> in a similar manner isn't a good idea because (minus ... (const_int ...))
> isn't canonical.
I removed it.  It was mostly taken from the original code.
>> +	      /* The only time we want to replace a PLUS with a REG
>> +		 (this occurs when the constant operand of the PLUS is
>> +		 the negative of the offset) is when we are inside a
>> +		 MEM.  We won't want to do so at other times because
>> +		 that would change the structure of the insn in a way
>> +		 that reload can't handle.  We special-case the
>> +		 commonest situation in eliminate_regs_in_insn, so
>> +		 just replace a PLUS with a PLUS here, unless inside a
>> +		 MEM.  */
> Reload reference.  Does this restriction still apply?  The later comment:
I don't think so.  I removed the comment.
>> +	 Note that there is no risk of modifying the structure of the insn,
>> +	 since we only get called for its operands, thus we are either
>> +	 modifying the address inside a MEM, or something like an address
>> +	 operand of a load-address insn.  */
I removed this too.
> makes it sound on face value like the MEM restriction above is a reload-specific
> thing.  Same question for:
>
>> +	    /* As above, if we are not inside a MEM we do not want to
>> +	       turn a PLUS into something else.	 We might try to do so here
>> +	       for an addition of 0 if we aren't optimizing.  */
> Given all the fuss in the 0/9 thread, maybe elimination could be
> done in-place, at least when applied internally by LRA rather than
> externally.  Most of the cases, including PLUS, are for things that
> can't legitimately be shared.
>
> I don't think that should be a requirement for merging though,
> especially since this is how things have always been done.
> Just an idea.
Thanks, I'll keep it in mind.  This file is mostly an adaptation of 
existing code.  It is hard to decide what should go until you have 
enough targets working, so I made minimal changes.
>> +      /* We do not support elimination of a register that is modified.
>> +	 elimination_effects has already make sure that this does not
>> +	 happen.  */
>> +      return x;
>>
>> +    case PRE_MODIFY:
>> +    case POST_MODIFY:
>> +      /* We do not support elimination of a hard register that is
>> +	 modified.  elimination_effects has already make sure that
>> +	 this does not happen.	The only remaining case we need to
>> +	 consider here is that the increment value may be an
>> +	 eliminable register.  */
> Reload references (elimination_effects).
I've changed the reference to LRA.  This comment is still true.
>> +#ifdef WORD_REGISTER_OPERATIONS
>> +		   /* On these machines, combine can create RTL of the form
>> +		      (set (subreg:m1 (reg:m2 R) 0) ...)
>> +		      where m1 < m2, and expects something interesting to
>> +		      happen to the entire word.  Moreover, it will use the
>> +		      (reg:m2 R) later, expecting all bits to be preserved.
>> +		      So if the number of words is the same, preserve the
>> +		      subreg so that push_reload can see it.  */
>> +		   && ! ((x_size - 1) / UNITS_PER_WORD
>> +			 == (new_size -1 ) / UNITS_PER_WORD)
>> +#endif
> Reload reference (push_reload).  Do we still need this for LRA?
It is hard for me to say, so I would rather not touch this code, at least 
for now.  I changed the push_reload reference to LRA.
>> +	    {
>> +	      SUBREG_REG (x) = new_rtx;
>> +	      alter_subreg (&x, false);
>> +	      return x;
>> +	    }
> The reload version is:
>
>      return adjust_address_nv (new_rtx, GET_MODE (x), SUBREG_BYTE (x));
>
> Isn't that safe here too?  We're only doing this for non-paradoxical subregs, so:
>
>        /* For paradoxical subregs on big-endian machines, SUBREG_BYTE
> 	 contains 0 instead of the proper offset.  See simplify_subreg.  */
>        if (offset == 0
> 	  && GET_MODE_SIZE (GET_MODE (y)) < GET_MODE_SIZE (GET_MODE (x)))
>          {
>            int difference = GET_MODE_SIZE (GET_MODE (y))
> 			   - GET_MODE_SIZE (GET_MODE (x));
>            if (WORDS_BIG_ENDIAN)
>              offset += (difference / UNITS_PER_WORD) * UNITS_PER_WORD;
>            if (BYTES_BIG_ENDIAN)
>              offset += difference % UNITS_PER_WORD;
>          }
>
> doesn't apply.
It might be true.  But I probably did this on purpose while trying to solve 
a bug (there were tons of them) on some specific target.
>> +/* Scan rtx X for modifications of elimination target registers.  Update
>> +   the table of eliminables to reflect the changed state.  MEM_MODE is
>> +   the mode of an enclosing MEM rtx, or VOIDmode if not within a MEM.  */
>> +static void
>> +mark_not_eliminable (rtx x)
> Maybe:
>
> /* Scan rtx X for references to elimination source or target registers
>     in contexts that would prevent the elimination from happening.
>     Update ... */
Done.
> The function looks at uses as well as modifications, and at elimination
> source registers as well as target registers.
>
>> +      if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER)
>> +	/* If we modify the source of an elimination rule, disable it.	*/
>> +	for (ep = reg_eliminate;
>> +	     ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
>> +	       ep++)
>> +	  if (ep->from_rtx == XEXP (x, 0)
>> +	      || (ep->to_rtx == XEXP (x, 0)
>> +		  && ep->to_rtx != hard_frame_pointer_rtx))
>> +	    setup_can_eliminate (ep, false);
> Comment doesn't mention the to_rtx case.
Fixed.
>> +   If REPLACE_P is false, do an offset updates.	 */
> s/do an offset updates/just update the offsets/
> Maybe worth adding "while keeping the base register the same".
Done.
>> +static void
>> +eliminate_regs_in_insn (rtx insn, bool replace_p)
>> +{
>> +  int icode = recog_memoized (insn);
>> +  rtx old_set = single_set (insn);
>> +  bool val;
> "validate_p" might be a better name.
>
Done.
>> +	      /* If we are assigning to a hard register that can be
>> +		 eliminated, it must be as part of a PARALLEL, since
>> +		 the code above handles single SETs.  We must indicate
>> +		 that we can no longer eliminate this reg.  */
>> +	      for (ep = reg_eliminate;
>> +		   ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
>> +		   ep++)
>> +		lra_assert (ep->from_rtx != orig_operand[i]
>> +			    || ! ep->can_eliminate);
> This is now an assert, so rather than "We must indicate ...",
> I think the comment should say who enforces this.  (mark_not_eliminable?)
Yes. Fixed.
>> +	  /* If an output operand changed from a REG to a MEM and INSN is an
>> +	     insn, write a CLOBBER insn.  */
>> +	  if (static_id->operand[i].type != OP_IN
>> +	      && REG_P (orig_operand[i])
>> +	      && MEM_P (substed_operand[i])
>> +	      && replace_p)
>> +	    emit_insn_after (gen_clobber (orig_operand[i]), insn);
> I realise this is copied from reload, but why is it needed?
I made minimal changes to the original code.  I'll check it with a 
bootstrap and, if that is OK, remove it.
Thanks for pointing this out.
>> +/* Initialize the table of hard registers to eliminate.
>> +   Pre-condition: global flag frame_pointer_needed has been set before
>> +   calling this function.  Set up hard registers in DONT_USE_REGS
>> +   which can not be used for allocation because their identical
>> +   elimination is not possible.	 */
>> +static void
>> +init_elim_table (void)
> DONT_USE_REGS doesn't exist any more.
Fixed.
>> +/* Eliminate hard reg given by its location LOC.  */
>> +void
>> +lra_eliminate_reg_if_possible (rtx *loc)
>> +{
>> +  int regno;
>> +  struct elim_table *ep;
>> +
>> +  lra_assert (REG_P (*loc));
>> +  if ((regno = REGNO (*loc)) >= FIRST_PSEUDO_REGISTER
>> +      /* Virtual registers are not allocatable. ??? */
>> +      || ! TEST_HARD_REG_BIT (lra_no_alloc_regs, regno))
>> +    return;
> I don't understand this comment.
>   Is the "if" statement needed at all?
> Given the requirement for rtx equality, I'd have thought:
Sorry, I need more time to look at this.
>> +  if ((ep = get_elimination (*loc)) != NULL)
>> +    *loc = ep->to_rtx;
> would do everything.  If that isn't true, the function might need
> a bit more commentary.
>
>> +  for (i = FIRST_PSEUDO_REGISTER; i < regs_num; i++)
>> +    if (lra_reg_info[i].nrefs != 0)
>> +      {
>> +	mem_loc = ira_reg_equiv[i].memory;
>> +	invariant = ira_reg_equiv[i].invariant;
>> +	if (mem_loc != NULL_RTX)
>> +	  mem_loc = lra_eliminate_regs_1 (mem_loc, VOIDmode,
>> +					  final_p, ! final_p, false);
>> +	ira_reg_equiv[i].memory = mem_loc;
>> +	if (invariant != NULL_RTX)
>> +	  invariant = lra_eliminate_regs_1 (invariant, VOIDmode,
>> +					    final_p, ! final_p, false);
>> +	ira_reg_equiv[i].invariant = invariant;
> Minor nit, but the first assignment to "invariant" seems to belong
> a bit further down.
Fixed.
> Looks really good to me FWIW.  Keeping the rtl in a form where the
> final elimination is simply a register replacement is a nice trick.
Some people don't like it.  But it simplifies the implementation a lot 
and also makes debugging more convenient.
Richard, many thanks for reviewing it, and especially for the improved 
comments, as English is not my native language.
Richard Sandiford Oct. 9, 2012, 10:17 a.m. UTC | #11
Thanks for the updates.

Vladimir Makarov <vmakarov@redhat.com> writes:
>>> +   a change in the offset between the eliminable register and its
>>> +   substitution if UPDATE_P, or the full offset if FULL_P, or
>>> +   otherwise zero.
>> I wonder if an enum would be better than two booleans?
>> It avoids invalid combinations like UPDATE_P && FULL_P
>> and might make the arguments more obvious too.
> IMHO, it is a matter of choice.  I don't like introducing a new enum just 
> for one function.  It is a pretty standard situation.  I usually introduce 
> an enum when a few of the combinations are prohibited.

OK.  I agree this is probably personal preference.

>>> +	      /* The only time we want to replace a PLUS with a REG
>>> +		 (this occurs when the constant operand of the PLUS is
>>> +		 the negative of the offset) is when we are inside a
>>> +		 MEM.  We won't want to do so at other times because
>>> +		 that would change the structure of the insn in a way
>>> +		 that reload can't handle.  We special-case the
>>> +		 commonest situation in eliminate_regs_in_insn, so
>>> +		 just replace a PLUS with a PLUS here, unless inside a
>>> +		 MEM.  */
>> Reload reference.  Does this restriction still apply?  The later comment:
> I don't think so.  I removed the comment.

Well, the question was about the code as much as the comment.
The comment did describe what the code did:

	      if (mem_mode != 0
		  && CONST_INT_P (XEXP (x, 1))
		  && INTVAL (XEXP (x, 1)) == -offset)
		return to;
	      else
		return gen_rtx_PLUS (Pmode, to,
				     plus_constant (Pmode,
						    XEXP (x, 1), offset));

If the restriction doesn't apply any more then the mem_mode condition
should be removed.  If does apply then we should have some sort of
comment to explain why.

I suppose the question is: what happens for PLUS match_operators?
If elimination changes a (plus (reg X) (const_int Y)) into (reg X'),
and the (plus ...) is matched via a match_operator, will LRA cope
correctly?  Or does LRA require a plus matched via a match_operator
to remain a plus?  Or shouldn't we eliminate match_operators at all,
and just process true operands?

I wasn't sure at this point (and still haven't read through everything,
so am still not sure now).
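
(A hypothetical illustration of the concern, using an fp -> sp
elimination with offset 42: inside a MEM, the code above can turn

   (mem:SI (plus:SI (reg fp) (const_int -42)))

into

   (mem:SI (reg sp))

so an insn that matched that PLUS through a match_operator no longer
contains the operator its pattern expects.)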

>>> +	 Note that there is no risk of modifying the structure of the insn,
>>> +	 since we only get called for its operands, thus we are either
>>> +	 modifying the address inside a MEM, or something like an address
>>> +	 operand of a load-address insn.  */
> I removed this too.

I think that's still accurate and should be kept.  I was just using
it to emphasise a point (probably badly, sorry).

>> makes it sound on face value like the MEM restriction above is a
>> reload-specific
>> thing.  Same question for:
>>
>>> +	    /* As above, if we are not inside a MEM we do not want to
>>> +	       turn a PLUS into something else.	 We might try to do so here
>>> +	       for an addition of 0 if we aren't optimizing.  */

It looks like your follow-up patch left this alone FWIW.

>>> +#ifdef WORD_REGISTER_OPERATIONS
>>> +		   /* On these machines, combine can create RTL of the form
>>> +		      (set (subreg:m1 (reg:m2 R) 0) ...)
>>> +		      where m1 < m2, and expects something interesting to
>>> +		      happen to the entire word.  Moreover, it will use the
>>> +		      (reg:m2 R) later, expecting all bits to be preserved.
>>> +		      So if the number of words is the same, preserve the
>>> +		      subreg so that push_reload can see it.  */
>>> +		   && ! ((x_size - 1) / UNITS_PER_WORD
>>> +			 == (new_size -1 ) / UNITS_PER_WORD)
>>> +#endif
>> Reload reference (push_reload).  Do we still need this for LRA?
It is hard for me to say, so I would not touch this code, at least for now.  
I changed push_reload to LRA.

Could I ask you to reconsider?  The situation that the comment describes
sounds like a bug to me.  Removing it shouldn't affect the 4.8 submission.

It just seems to me that LRA is our big chance of getting rid of some
of this cruft.  If we're too scared to touch code like this even on
a big change like reload to LRA, we'll never be able to touch it.

>>> +static void
>>> +eliminate_regs_in_insn (rtx insn, bool replace_p)
>>> +{
>>> +  int icode = recog_memoized (insn);
>>> +  rtx old_set = single_set (insn);
>>> +  bool val;
>> "validate_p" might be a better name.
>>
> Done.

Sorry for being too terse.  I see you've changed "replace_p" to
"validate_p", but I actually meant that _"val"_ should be changed
to "validate_p".  Elsewhere you use "val" to mean "value number",
and "val" could equally mean "validate_p" or "validated_p".

"replace_p" was a good name. :-)

>>> +	  /* If an output operand changed from a REG to a MEM and INSN is an
>>> +	     insn, write a CLOBBER insn.  */
>>> +	  if (static_id->operand[i].type != OP_IN
>>> +	      && REG_P (orig_operand[i])
>>> +	      && MEM_P (substed_operand[i])
>>> +	      && replace_p)
>>> +	    emit_insn_after (gen_clobber (orig_operand[i]), insn);
>> I realise this is copied from reload, but why is it needed?
> I made minimal changes to the original code.  I'll check this with a 
> bootstrap, and if that is OK, I'll remove it.

Thanks.

Richard
Vladimir Makarov Oct. 10, 2012, 2:56 a.m. UTC | #12
On 10/09/2012 06:17 AM, Richard Sandiford wrote:
> Thanks for the updates.
>
> Vladimir Makarov<vmakarov@redhat.com>  writes:
>>>> +   a change in the offset between the eliminable register and its
>>>> +   substitution if UPDATE_P, or the full offset if FULL_P, or
>>>> +   otherwise zero.
>>> I wonder if an enum would be better than two booleans?
>>> It avoids invalid combinations like UPDATE_P && FULL_P
>>> and might make the arguments more obvious too.
>> IMHO, it is a matter of choice.  I don't like to introduce a new enum just
>> for one function.  It is a pretty standard situation.  I usually introduce
>> an enum when there are a few prohibited combinations.
> OK.  I agree this is probably personal preference.
>
>>>> +	      /* The only time we want to replace a PLUS with a REG
>>>> +		 (this occurs when the constant operand of the PLUS is
>>>> +		 the negative of the offset) is when we are inside a
>>>> +		 MEM.  We won't want to do so at other times because
>>>> +		 that would change the structure of the insn in a way
>>>> +		 that reload can't handle.  We special-case the
>>>> +		 commonest situation in eliminate_regs_in_insn, so
>>>> +		 just replace a PLUS with a PLUS here, unless inside a
>>>> +		 MEM.  */
>>> Reload reference.  Does this restriction still apply?  The later comment:
>> I don't think so.  I removed the comment.
> Well, the question was about the code as much as the comment.
> The comment did describe what the code did:
>
> 	      if (mem_mode != 0
> 		  && CONST_INT_P (XEXP (x, 1))
> 		  && INTVAL (XEXP (x, 1)) == -offset)
> 		return to;
> 	      else
> 		return gen_rtx_PLUS (Pmode, to,
> 				     plus_constant (Pmode,
> 						    XEXP (x, 1), offset));
>
> If the restriction doesn't apply any more then the mem_mode condition
> should be removed.  If does apply then we should have some sort of
> comment to explain why.
>
> I suppose the question is: what happens for PLUS match_operators?
> If elimination changes a (plus (reg X) (const_int Y)) into (reg X'),
> and the (plus ...) is matched via a match_operator, will LRA cope
> correctly?  Or does LRA require a plus matched via a match_operator
> to remain a plus?  Or shouldn't we eliminate match_operators at all,
> and just process true operands?
>
> I wasn't sure at this point (and still haven't read through everything,
> so am still not sure now).
I guess LRA can handle such a change with at most a minor modification (a 
new insn recognition).  So I am removing the mem_mode condition.  At least 
I did not find a problem on two targets.  I might restore the code 
if I find a target where it is really necessary.
>
>>>> +	 Note that there is no risk of modifying the structure of the insn,
>>>> +	 since we only get called for its operands, thus we are either
>>>> +	 modifying the address inside a MEM, or something like an address
>>>> +	 operand of a load-address insn.  */
>> I removed this too.
> I think that's still accurate and should be kept.  I was just using
> it to emphasise a point (probably badly, sorry).
I have restored the comment.
>>> makes it sound on face value like the MEM restriction above is a
>>> reload-specific
>>> thing.  Same question for:
>>>
>>>> +	    /* As above, if we are not inside a MEM we do not want to
>>>> +	       turn a PLUS into something else.	 We might try to do so here
>>>> +	       for an addition of 0 if we aren't optimizing.  */
> It looks like your follow-up patch left this alone FWIW.
I modified the code as above in the hope that the removed code will not be 
necessary.
>>>> +#ifdef WORD_REGISTER_OPERATIONS
>>>> +		   /* On these machines, combine can create RTL of the form
>>>> +		      (set (subreg:m1 (reg:m2 R) 0) ...)
>>>> +		      where m1 < m2, and expects something interesting to
>>>> +		      happen to the entire word.  Moreover, it will use the
>>>> +		      (reg:m2 R) later, expecting all bits to be preserved.
>>>> +		      So if the number of words is the same, preserve the
>>>> +		      subreg so that push_reload can see it.  */
>>>> +		   && ! ((x_size - 1) / UNITS_PER_WORD
>>>> +			 == (new_size -1 ) / UNITS_PER_WORD)
>>>> +#endif
>>> Reload reference (push_reload).  Do we still need this for LRA?
>> It is hard for me to say, so I would not touch this code, at least for now.
>> I changed push_reload to LRA.
> Could I ask you to reconsider?  The situation that the comment describes
> sounds like a bug to me.  Removing it shouldn't affect the 4.8 submission.
>
> It just seems to me that LRA is our big chance of getting rid of some
> of this cruft.  If we're too scared to touch code like this even on
> a big change like reload to LRA, we'll never be able to touch it.
Yes, you are right.  I am removing this too.  It does not affect x86.  
It might affect other targets (although I don't think so; I guess LRA 
does not need this).  If it does, I'll find out why and try to fix it on 
the branch for the affected target later.
>>>> +static void
>>>> +eliminate_regs_in_insn (rtx insn, bool replace_p)
>>>> +{
>>>> +  int icode = recog_memoized (insn);
>>>> +  rtx old_set = single_set (insn);
>>>> +  bool val;
>>> "validate_p" might be a better name.
>>>
>> Done.
> Sorry for being too terse.  I see you've changed "replace_p" to
> "validate_p", but I actually meant that _"val"_ should be changed
> to "validate_p".  Elsewhere you use "val" to mean "value number",
> and "val" could equally mean "validate_p" or "validated_p".
>
> "replace_p" was a good name. :-)
Sorry, my fault.  I should have been more attentive.  I fixed this.
>>>> +	  /* If an output operand changed from a REG to a MEM and INSN is an
>>>> +	     insn, write a CLOBBER insn.  */
>>>> +	  if (static_id->operand[i].type != OP_IN
>>>> +	      && REG_P (orig_operand[i])
>>>> +	      && MEM_P (substed_operand[i])
>>>> +	      && replace_p)
>>>> +	    emit_insn_after (gen_clobber (orig_operand[i]), insn);
>>> I realise this is copied from reload, but why is it needed?
>> I made minimal changes to the original code.  I'll check this with a
>> bootstrap, and if that is OK, I'll remove it.
>
I am removing this.  I did not find that it affects LRA, at least on two 
targets.

Probably I should have made all these changes at the beginning, but I was 
overwhelmed by other problems while trying to validate the overall 
approach first.
Vladimir Makarov Oct. 10, 2012, 7:05 p.m. UTC | #13
On 12-10-02 9:42 AM, Richard Sandiford wrote:
> Vladimir Makarov <vmakarov@redhat.com> writes:
>> This is the major patch containing all new files.  The patch also adds
>> necessary calls to LRA from IRA.As the patch is too big, it continues in
>> the next email.
>>
>> 2012-09-27  Vladimir Makarov  <vmakarov@redhat.com>
>>
>>       * Makefile.in (LRA_INT_H): New.
>>       (OBJS): Add lra.o, lra-assigns.o, lra-coalesce.o,
>>       lra-constraints.o, lra-eliminations.o, lra-lives.o, and lra-spills.o.
>>       (ira.o): Add dependence on lra.h.
>>       (lra.o, lra-assigns.o, lra-coalesce.o, lra-constraints.o): New entries.
>>       (lra-eliminations.o, lra-lives.o, lra-spills.o): Ditto.
>>       * ira.c: Include lra.h.
>>       (ira_init_once, ira_init, ira_finish_once): Call lra_start_once,
>>       lra_init, lra_finish_once in anyway.
>>           (lra_in_progress): Remove.
>>       (do_reload): Call LRA.
>>       * lra.h: New.
>>       * lra-int.h: Ditto.
>>       * lra.c: Ditto.
>>       * lra-assigns.c: Ditto.
>>       * lra-constraints.c: Ditto.
>>       * lra-coalesce.c: Ditto.
>>       * lra-eliminations.c: Ditto.
>>       * lra-lives.c: Ditto.
>>       * lra-spills.c: Ditto.
>>       * doc/passes.texi: Describe LRA pass.
> Comments on lra-lives.c.  (Sorry for the split, had more time to look
> at this than expected)
>
>> +/* Copy live range list given by its head R and return the result.  */
>> +lra_live_range_t
>> +lra_copy_live_range_list (lra_live_range_t r)
>> +{
>> +  lra_live_range_t p, first, last;
>> +
>> +  if (r == NULL)
>> +    return NULL;
>> +  for (first = last = NULL; r != NULL; r = r->next)
>> +    {
>> +      p = copy_live_range (r);
>> +      if (first == NULL)
>> +	first = p;
>> +      else
>> +	last->next = p;
>> +      last = p;
>> +    }
>> +  return first;
>> +}
> Maybe simpler as:
>
>    lra_live_range_t p, first, *chain;
>
>    first = NULL;
>    for (chain = &first; r != NULL; r = r->next)
>      {
>        p = copy_live_range (r);
>        *chain = p;
>        chain = &p->next;
>      }
>    return first;
>
OK.
>> +/* Merge ranges R1 and R2 and returns the result.  The function
>> +   maintains the order of ranges and tries to minimize size of the
>> +   result range list.  */
>> +lra_live_range_t
>> +lra_merge_live_ranges (lra_live_range_t r1, lra_live_range_t r2)
>> +{
>> +  lra_live_range_t first, last, temp;
>> +
>> +  if (r1 == NULL)
>> +    return r2;
>> +  if (r2 == NULL)
>> +    return r1;
>> +  for (first = last = NULL; r1 != NULL && r2 != NULL;)
>> +    {
>> +      if (r1->start < r2->start)
>> +	{
>> +	  temp = r1;
>> +	  r1 = r2;
>> +	  r2 = temp;
>> +	}
>> +      if (r1->start <= r2->finish + 1)
>> +	{
>> +	  /* Intersected ranges: merge r1 and r2 into r1.  */
>> +	  r1->start = r2->start;
>> +	  if (r1->finish < r2->finish)
>> +	    r1->finish = r2->finish;
>> +	  temp = r2;
>> +	  r2 = r2->next;
>> +	  pool_free (live_range_pool, temp);
>> +	  if (r2 == NULL)
>> +	    {
>> +	      /* To try to merge with subsequent ranges in r1.	*/
>> +	      r2 = r1->next;
>> +	      r1->next = NULL;
>> +	    }
>> +	}
>> +      else
>> +	{
>> +	  /* Add r1 to the result.  */
>> +	  if (first == NULL)
>> +	    first = last = r1;
>> +	  else
>> +	    {
>> +	      last->next = r1;
>> +	      last = r1;
>> +	    }
>> +	  r1 = r1->next;
>> +	  if (r1 == NULL)
>> +	    {
>> +	      /* To try to merge with subsequent ranges in r2.	*/
>> +	      r1 = r2->next;
>> +	      r2->next = NULL;
>> +	    }
>> +	}
> I might be misreading, but I'm not sure whether this handles merges like:
>
>    r1 = [6,7], [3,4]
>    r2 = [3,8], [0,1]
>
> After the first iteration, it looks like we'll have:
>
>    r1 = [3,8], [3,4]
>    r2 = [0,1]
>
> Then we'll add both [3,8] and [3,4] to the result.
>
> Same chain pointer comment as for lra_merge_live_ranges.
>
>> +/* Return TRUE if live range R1 is in R2.  */
>> +bool
>> +lra_live_range_in_p (lra_live_range_t r1, lra_live_range_t r2)
>> +{
>> +  /* Remember the live ranges are always kept ordered.	*/
>> +  while (r1 != NULL && r2 != NULL)
>> +    {
>> +      /* R1's element is in R2's element.  */
>> +      if (r2->start <= r1->start && r1->finish <= r2->finish)
>> +	r1 = r1->next;
>> +      /* Intersection: R1's start is in R2.  */
>> +      else if (r2->start <= r1->start && r1->start <= r2->finish)
>> +	return false;
>> +      /* Intersection: R1's finish is in R2.  */
>> +      else if (r2->start <= r1->finish && r1->finish <= r2->finish)
>> +	return false;
>> +      else if (r1->start > r2->finish)
>> +	return false; /* No covering R2's element for R1's one.	 */
>> +      else
>> +	r2 = r2->next;
>> +    }
>> +  return r1 == NULL;
> Does the inner bit reduce to:
>
>        /* R1's element is in R2's element.  */
>        if (r2->start <= r1->start && r1->finish <= r2->finish)
> 	r1 = r1->next;
>        /* All of R2's element comes after R1's element.  */
>        else if (r2->start > r1->finish)
> 	r2 = r2->next;
>        else
> 	return false;
Yes, it seems to me the right change.  I found that this function is not 
used in LRA anymore, so I am just removing it.  Sorry for wasting your 
time reviewing this.
I wrote this function when I experimented with more sophisticated 
range-compression algorithms.  It might be needed in the future, so I 
will keep it in mind.
> (Genuine question)
>
>> +/* Process the death of hard register REGNO.  This updates
>> +   hard_regs_live and START_DYING.  */
>> +static void
>> +make_hard_regno_dead (int regno)
>> +{
>> +  if (TEST_HARD_REG_BIT (lra_no_alloc_regs, regno)
>> +      || ! TEST_HARD_REG_BIT (hard_regs_live, regno))
>> +    return;
>> +  lra_assert (regno < FIRST_PSEUDO_REGISTER);
>> +  sparseset_set_bit (start_dying, regno);
>> +  CLEAR_HARD_REG_BIT (hard_regs_live, regno);
>> +}
> Assert should be before the HARD_REG_SET stuff (like for
> make_hard_regno_born).
Fixed.  Thanks.
>> +	  /* Check that source regno does not conflict with
>> +	     destination regno to exclude most impossible
>> +	     preferences.  */
>> +	  && ((((src_regno = REGNO (SET_SRC (set))) >= FIRST_PSEUDO_REGISTER
>> +		&& ! sparseset_bit_p (pseudos_live, src_regno))
>> +	       || (src_regno < FIRST_PSEUDO_REGISTER
>> +		   && ! TEST_HARD_REG_BIT (hard_regs_live, src_regno)))
> This is probably personal preference, but I think this would be more
> readable with an inline utility function (regno_live_p, or whatever).
It might be.  But as the code is only in one place, I'd rather not 
change it.
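
(For reference, the helper would presumably look something like this
hypothetical sketch, built from the condition quoted above:

   /* Return true if pseudo or hard register REGNO is live.  */
   static inline bool
   regno_live_p (int regno)
   {
     return (regno >= FIRST_PSEUDO_REGISTER
	     ? sparseset_bit_p (pseudos_live, regno)
	     : TEST_HARD_REG_BIT (hard_regs_live, regno));
   }

and the test above would become two negated calls to it.)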
>> +/* Compress pseudo live ranges by removing program points where
>> +   nothing happens.  Complexity of many algorithms in LRA is linear
>> +   function of program points number.  To speed up the code we try to
>> +   minimize the number of the program points here.  */
>> +static void
>> +remove_some_program_points_and_update_live_ranges (void)
> Genuine question, but could we do this on the fly instead,
> by not incrementing curr_point if the current point had no value?
>
> I suppose the main complication would be checking cases where
> all births are recorded by extending the previous just-closed live
> range rather than starting a new one, in which case it's the previous
> point that needs to be reused.  Hmm...
>
I thought about this when I wrote the code.  I believed it was not 
critical code and wrote it simply.  Apparently I was wrong: minimizing 
ranges during creation is time-critical for the huge tests.  As you 
probably saw, it was changed by recent patches.  There is still 
potential to speed up this code even more for huge functions.
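
(A rough sketch of the batch compression being discussed, with
hypothetical names -- the real remove_some_program_points_and_update_live_ranges
differs in detail and also merges ranges that become adjacent:

   /* Keep only the program points where some live range starts or
      finishes, renumbering them consecutively.  */
   static void
   compress_points_sketch (void)
   {
     int i, p, n = 0;
     int *map = XNEWVEC (int, lra_live_max_point);
     lra_live_range_t r;

     for (p = 0; p < lra_live_max_point; p++)
       map[p] = -1;
     /* Mark the points that carry information.  */
     for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
       for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
	 map[r->start] = map[r->finish] = 0;
     /* Number the marked points consecutively ...  */
     for (p = 0; p < lra_live_max_point; p++)
       if (map[p] >= 0)
	 map[p] = n++;
     /* ... and rewrite every range through the map.  */
     for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
       for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
	 {
	   r->start = map[r->start];
	   r->finish = map[r->finish];
	 }
     lra_live_max_point = n;
     free (map);
   }
)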
Vladimir Makarov Oct. 10, 2012, 7:07 p.m. UTC | #14
On 12-10-02 10:14 AM, Richard Sandiford wrote:
> Richard Sandiford <rdsandiford@googlemail.com> writes:
>>> +/* Merge ranges R1 and R2 and returns the result.  The function
>>> +   maintains the order of ranges and tries to minimize size of the
>>> +   result range list.  */
>>> +lra_live_range_t
>>> +lra_merge_live_ranges (lra_live_range_t r1, lra_live_range_t r2)
>>> +{
>>> +  lra_live_range_t first, last, temp;
>>> +
>>> +  if (r1 == NULL)
>>> +    return r2;
>>> +  if (r2 == NULL)
>>> +    return r1;
>>> +  for (first = last = NULL; r1 != NULL && r2 != NULL;)
>>> +    {
>>> +      if (r1->start < r2->start)
>>> +	{
>>> +	  temp = r1;
>>> +	  r1 = r2;
>>> +	  r2 = temp;
>>> +	}
>>> +      if (r1->start <= r2->finish + 1)
>>> +	{
>>> +	  /* Intersected ranges: merge r1 and r2 into r1.  */
>>> +	  r1->start = r2->start;
>>> +	  if (r1->finish < r2->finish)
>>> +	    r1->finish = r2->finish;
>>> +	  temp = r2;
>>> +	  r2 = r2->next;
>>> +	  pool_free (live_range_pool, temp);
>>> +	  if (r2 == NULL)
>>> +	    {
>>> +	      /* To try to merge with subsequent ranges in r1.	*/
>>> +	      r2 = r1->next;
>>> +	      r1->next = NULL;
>>> +	    }
>>> +	}
>>> +      else
>>> +	{
>>> +	  /* Add r1 to the result.  */
>>> +	  if (first == NULL)
>>> +	    first = last = r1;
>>> +	  else
>>> +	    {
>>> +	      last->next = r1;
>>> +	      last = r1;
>>> +	    }
>>> +	  r1 = r1->next;
>>> +	  if (r1 == NULL)
>>> +	    {
>>> +	      /* To try to merge with subsequent ranges in r2.	*/
>>> +	      r1 = r2->next;
>>> +	      r2->next = NULL;
>>> +	    }
>>> +	}
>> I might be misreading, but I'm not sure whether this handles merges like:
>>
>>    r1 = [6,7], [3,4]
>>    r2 = [3,8], [0,1]
>>
>> After the first iteration, it looks like we'll have:
>>
>>    r1 = [3,8], [3,4]
>>    r2 = [0,1]
>>
>> Then we'll add both [3,8] and [3,4] to the result.
> OK, so I start to read patch b and realise that this is only supposed to
> handle non-overlapping live ranges.  It might be worth having a comment
> and assert to that effect, for slow readers like me.
>
> Although in that case the function feels a little more complicated than
> it needs to be.  When we run out of R1 or R2, why not just use the other
> one as the rest of the live range list?  Why is:
>
>>> +	  if (r1 == NULL)
>>> +	    {
>>> +	      /* To try to merge with subsequent ranges in r2.	*/
>>> +	      r1 = r2->next;
>>> +	      r2->next = NULL;
>>> +	    }
> needed?
>
>
No, it is not necessary (if an assert below is removed).  I simplified 
the code, added comments, and added an assert checking for 
intersections.  Actually, I wanted to simplify it later anyway to speed 
up LRA (this function is called many times for PR54146).
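
(A minimal sketch of the simplified shape, assuming the lists are
ordered by decreasing start point and never intersect, and ignoring
the coalescing of ranges that become adjacent:

   lra_live_range_t
   merge_sketch (lra_live_range_t r1, lra_live_range_t r2)
   {
     lra_live_range_t temp, first, *chain = &first;

     while (r1 != NULL && r2 != NULL)
       {
	 /* Make r1 the list whose head starts later.  */
	 if (r1->start < r2->start)
	   {
	     temp = r1;
	     r1 = r2;
	     r2 = temp;
	   }
	 /* The input ranges must not intersect.  */
	 lra_assert (r2->finish < r1->start);
	 *chain = r1;
	 chain = &r1->next;
	 r1 = r1->next;
       }
     /* Once one list is exhausted, the rest of the other one can be
	used as the tail of the result as-is.  */
     *chain = (r1 != NULL ? r1 : r2);
     return first;
   }
)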

Patch

Index: Makefile.in
===================================================================
--- Makefile.in	(revision 191771)
+++ Makefile.in	(working copy)
@@ -938,6 +938,7 @@  TREE_DATA_REF_H = tree-data-ref.h $(OMEG
 TREE_INLINE_H = tree-inline.h vecir.h
 REAL_H = real.h $(MACHMODE_H)
 IRA_INT_H = ira.h ira-int.h $(CFGLOOP_H) alloc-pool.h
+LRA_INT_H = lra.h lra-int.h
 DBGCNT_H = dbgcnt.h dbgcnt.def
 EBITMAP_H = ebitmap.h sbitmap.h
 LTO_STREAMER_H = lto-streamer.h $(LINKER_PLUGIN_API_H) $(TARGET_H) \
@@ -1269,6 +1270,13 @@  OBJS = \
 	loop-unroll.o \
 	loop-unswitch.o \
 	lower-subreg.o \
+	lra.o \
+	lra-assigns.o \
+	lra-coalesce.o \
+	lra-constraints.o \
+	lra-eliminations.o \
+	lra-lives.o \
+	lra-spills.o \
 	lto-cgraph.o \
 	lto-streamer.o \
 	lto-streamer-in.o \
@@ -3213,7 +3221,43 @@  ira.o: ira.c $(CONFIG_H) $(SYSTEM_H) cor
    $(TM_H) $(REGS_H) $(RTL_H) $(TM_P_H) $(TARGET_H) $(FLAGS_H) $(OBSTACK_H) \
    $(BITMAP_H) hard-reg-set.h $(BASIC_BLOCK_H) $(DBGCNT_H) $(FUNCTION_H) \
    $(EXPR_H) $(RECOG_H) $(PARAMS_H) $(TREE_PASS_H) output.h \
-   $(EXCEPT_H) reload.h toplev.h $(DIAGNOSTIC_CORE_H) $(DF_H) $(GGC_H) $(IRA_INT_H)
+   $(EXCEPT_H) reload.h toplev.h $(DIAGNOSTIC_CORE_H) \
+   $(DF_H) $(GGC_H) $(IRA_INT_H) lra.h
+lra.o : lra.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(RTL_H) $(REGS_H) insn-config.h insn-codes.h $(TIMEVAR_H) $(TREE_PASS_H) \
+   $(DF_H) $(RECOG_H) output.h addresses.h $(REGS_H) hard-reg-set.h \
+   $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) $(BASIC_BLOCK_H) $(TM_P_H) \
+   $(EXCEPT_H) ira.h $(LRA_INT_H)
+lra-assigns.o : lra-assigns.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+   $(TM_H) $(RTL_H) $(REGS_H) insn-config.h $(DF_H) \
+   $(RECOG_H) output.h $(REGS_H) hard-reg-set.h $(FLAGS_H) $(FUNCTION_H) \
+   $(EXPR_H) $(BASIC_BLOCK_H) $(TM_P_H) $(EXCEPT_H) ira.h \
+   rtl-error.h sparseset.h $(LRA_INT_H)
+lra-coalesce.o : lra-coalesce.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+   $(TM_H) $(RTL_H) $(REGS_H) insn-config.h $(DF_H) \
+   $(RECOG_H) output.h $(REGS_H) hard-reg-set.h $(FLAGS_H) $(FUNCTION_H) \
+   $(EXPR_H) $(BASIC_BLOCK_H) $(TM_P_H) $(EXCEPT_H) ira.h \
+   rtl-error.h ira.h $(LRA_INT_H)
+lra-constraints.o : lra-constraints.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+   $(TM_H) $(RTL_H) $(REGS_H) insn-config.h insn-codes.h $(DF_H) \
+   $(RECOG_H) output.h addresses.h $(REGS_H) hard-reg-set.h $(FLAGS_H) \
+   $(FUNCTION_H) $(EXPR_H) $(BASIC_BLOCK_H) $(TM_P_H) $(EXCEPT_H) \
+   ira.h rtl-error.h $(LRA_INT_H)
+lra-eliminations.o : lra-eliminations.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+   $(TM_H) $(RTL_H) $(REGS_H) insn-config.h $(DF_H) \
+   $(RECOG_H) output.h $(REGS_H) hard-reg-set.h $(FLAGS_H) $(FUNCTION_H) \
+   $(EXPR_H) $(BASIC_BLOCK_H) $(TM_P_H) $(EXCEPT_H) ira.h \
+   rtl-error.h $(LRA_INT_H)
+lra-lives.o : lra-lives.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(RTL_H) $(REGS_H) insn-config.h $(DF_H) \
+   $(RECOG_H) output.h $(REGS_H) hard-reg-set.h $(FLAGS_H) $(FUNCTION_H) \
+   $(EXPR_H) $(BASIC_BLOCK_H) $(TM_P_H) $(EXCEPT_H) \
+   $(LRA_INT_H)
+lra-spills.o : lra-spills.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(RTL_H) $(REGS_H) insn-config.h $(DF_H) \
+   $(RECOG_H) output.h $(REGS_H) hard-reg-set.h $(FLAGS_H) $(FUNCTION_H) \
+   $(EXPR_H) $(BASIC_BLOCK_H) $(TM_P_H) $(EXCEPT_H) \
+   ira.h $(LRA_INT_H)
 regmove.o : regmove.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    insn-config.h $(TREE_PASS_H) $(DF_H) \
    $(RECOG_H) $(REGS_H) hard-reg-set.h $(FLAGS_H) $(FUNCTION_H) \
Index: ira.c
===================================================================
--- ira.c	(revision 191771)
+++ ira.c	(working copy)
@@ -382,6 +382,7 @@  along with GCC; see the file COPYING3.
 #include "function.h"
 #include "ggc.h"
 #include "ira-int.h"
+#include "lra.h"
 #include "dce.h"
 #include "dbgcnt.h"
 
@@ -1633,6 +1634,7 @@  void
 ira_init_once (void)
 {
   ira_init_costs_once ();
+  lra_init_once ();
 }
 
 /* Free ira_max_register_move_cost, ira_may_move_in_cost and
@@ -1680,6 +1682,7 @@  ira_init (void)
   clarify_prohibited_class_mode_regs ();
   setup_hard_regno_aclass ();
   ira_init_costs ();
+  lra_init ();
 }
 
 /* Function called once at the end of compiler work.  */
@@ -1688,6 +1691,7 @@  ira_finish_once (void)
 {
   ira_finish_costs_once ();
   free_register_move_costs ();
+  lra_finish_once ();
 }
 
 
@@ -4308,9 +4312,6 @@  bool ira_conflicts_p;
 /* Saved between IRA and reload.  */
 static int saved_flag_ira_share_spill_slots;
 
-/* Set to 1 while in lra.  */
-int lra_in_progress = 0;
-
 /* This is the main entry of IRA.  */
 static void
 ira (FILE *f)
@@ -4549,6 +4550,8 @@  do_reload (void)
 
       ira_destroy ();
 
+      lra (ira_dump_file);
+
       VEC_free (reg_equivs_t, gc, reg_equivs);
       reg_equivs = NULL;
       need_dce = false;
Index: doc/passes.texi
===================================================================
--- doc/passes.texi	(revision 191771)
+++ doc/passes.texi	(working copy)
@@ -771,7 +771,7 @@  branch instructions.  The source file fo
 This pass attempts to replace conditional branches and surrounding
 assignments with arithmetic, boolean value producing comparison
 instructions, and conditional move instructions.  In the very last
-invocation after reload, it will generate predicated instructions
+invocation after reload/LRA, it will generate predicated instructions
 when supported by the target.  The code is located in @file{ifcvt.c}.
 
 @item Web construction
@@ -842,9 +842,9 @@  source file is @file{regmove.c}.
 The integrated register allocator (@acronym{IRA}).  It is called
 integrated because coalescing, register live range splitting, and hard
 register preferencing are done on-the-fly during coloring.  It also
-has better integration with the reload pass.  Pseudo-registers spilled
-by the allocator or the reload have still a chance to get
-hard-registers if the reload evicts some pseudo-registers from
+has better integration with the reload/LRA pass.  Pseudo-registers spilled
+by the allocator or by reload/LRA still have a chance to get
+hard-registers if reload/LRA evicts some pseudo-registers from
 hard-registers.  The allocator helps to choose better pseudos for
 spilling based on their live ranges and to coalesce stack slots
 allocated for the spilled pseudo-registers.  IRA is a regional
@@ -875,6 +875,24 @@  instructions to save and restore call-cl
 
 Source files are @file{reload.c} and @file{reload1.c}, plus the header
 @file{reload.h} used for communication between them.
+
+@cindex Local Register Allocator (LRA)
+@item
+This pass is a modern replacement of the reload pass.  Source files
+are @file{lra.c}, @file{lra-assigns.c}, @file{lra-coalesce.c},
+@file{lra-constraints.c}, @file{lra-eliminations.c},
+@file{lra-lives.c}, and @file{lra-spills.c}, plus the header
+@file{lra-int.h} used for communication between them, and the header
+@file{lra.h} used for communication between LRA and the rest of the
+compiler.
+
+LRA differs from the reload pass in its division into small,
+manageable, and separate sub-tasks.  All LRA transformations and
+decisions are reflected in RTL as much as possible.  Instruction
+constraints are the primary source of information, which minimizes
+the number of target-dependent macros/hooks.
+
+LRA is run only on the targets to which it has been ported.
 @end itemize
 
 @item Basic block reordering
Index: lra-eliminations.c
===================================================================
--- lra-eliminations.c	(revision 0)
+++ lra-eliminations.c	(working copy)
@@ -0,0 +1,1352 @@ 
+/* Code for RTL register eliminations.
+   Copyright (C) 2010, 2011, 2012
+   Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.	If not see
+<http://www.gnu.org/licenses/>.	 */
+
+/* The virtual registers (like the argument and frame pointers) are
+   widely used in RTL.  Virtual registers should be replaced by real
+   hard registers (like the stack pointer or hard frame pointer) plus
+   some offset.  The offsets usually change every time the stack is
+   expanded.  We know the final offsets only at the very end of LRA.
+
+   We keep the RTL code most of the time in a state in which the
+   virtual registers can be replaced by just the corresponding hard
+   registers (with zero offsets) and we still have correct RTL code.
+   To achieve this we add the initial offset at the beginning of
+   LRA's work and update the offsets after each stack expansion.  But
+   actually we update the virtual registers to the same virtual
+   registers plus the corresponding offsets before every constraint
+   pass, because this affects constraint satisfaction (e.g. an
+   address displacement becomes too big for some target).
+
+   The final replacement of virtual registers by the corresponding
+   hard registers is done at the very end of LRA, when the offsets
+   do not change anymore:
+
+		     fp + 42	 =>	sp + 42
+
+   Such an approach requires a few changes in the rest of the GCC
+   code, because virtual registers are not recognized as real ones
+   in some constraints and predicates.  Fortunately, such changes
+   are small.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "hard-reg-set.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "regs.h"
+#include "insn-config.h"
+#include "insn-codes.h"
+#include "recog.h"
+#include "output.h"
+#include "addresses.h"
+#include "target.h"
+#include "function.h"
+#include "expr.h"
+#include "basic-block.h"
+#include "except.h"
+#include "optabs.h"
+#include "df.h"
+#include "ira.h"
+#include "rtl-error.h"
+#include "lra-int.h"
+
+/* This structure is used to record information about hard register
+   eliminations.  */
+struct elim_table
+{
+  /* Hard register number to be eliminated.  */
+  int from;			
+  /* Hard register number used as replacement.	*/
+  int to;			
+  /* Difference between values of the two hard registers above on
+     previous iteration.  */
+  HOST_WIDE_INT previous_offset;
+  /* Difference between the values on the current iteration.  */
+  HOST_WIDE_INT offset;		
+  /* Nonzero if this elimination can be done.  */
+  bool can_eliminate;		
+  /* CAN_ELIMINATE since the last check.  */
+  bool prev_can_eliminate;
+  /* REG rtx for the register to be eliminated.	 We cannot simply
+     compare the number since we might then spuriously replace a hard
+     register corresponding to a pseudo assigned to the reg to be
+     eliminated.  */
+  rtx from_rtx;			
+  /* REG rtx for the replacement.  */
+  rtx to_rtx;			
+};
+
+/* The elimination table.  Each array entry describes one possible way
+   of eliminating a register in favor of another.  If there is more
+   than one way of eliminating a particular register, the most
+   preferred should be specified first.	 */
+static struct elim_table *reg_eliminate = 0;
+
+/* This is an intermediate structure to initialize the table.  It has
+   exactly the members provided by ELIMINABLE_REGS.  */
+static const struct elim_table_1
+{
+  const int from;
+  const int to;
+} reg_eliminate_1[] =
+
+/* If a set of eliminable hard registers was specified, define the
+   table from it.  Otherwise, default to the normal case of the frame
+   pointer being replaced by the stack pointer.	 */
+
+#ifdef ELIMINABLE_REGS
+  ELIMINABLE_REGS;
+#else
+  {{ FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM}};
+#endif
+
+#define NUM_ELIMINABLE_REGS ARRAY_SIZE (reg_eliminate_1)
+
+/* Print info about elimination table to file F.  */
+static void
+print_elim_table (FILE *f)
+{
+  struct elim_table *ep;
+
+  for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
+    fprintf (f, "%s eliminate %d to %d (offset=" HOST_WIDE_INT_PRINT_DEC
+	     ", prev_offset=" HOST_WIDE_INT_PRINT_DEC ")\n",
+	     ep->can_eliminate ? "Can" : "Can't",
+	     ep->from, ep->to, ep->offset, ep->previous_offset);
+}
+
+/* Print info about elimination table to stderr.  */
+void
+lra_debug_elim_table (void)
+{
+  print_elim_table (stderr);
+}
+
+/* Set the possibility of elimination in elimination table element EP
+   to VALUE.  Set FRAME_POINTER_NEEDED if elimination from the frame
+   pointer to the stack pointer is no longer possible.  */
+static void
+setup_can_eliminate (struct elim_table *ep, bool value)
+{
+  ep->can_eliminate = ep->prev_can_eliminate = value;
+  if (! value
+      && ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)
+    frame_pointer_needed = 1;
+}
+
+/* Map: 'from regno' -> the current elimination, or NULL if none.
+   The elimination table may contain more than one elimination of a
+   hard register.  The map contains only the one currently used
+   elimination of the hard register.  */
+static struct elim_table *elimination_map[FIRST_PSEUDO_REGISTER];
+
+/* When an eliminable hard register becomes non-eliminable, we use
+   the following special structure to restore the original offsets
+   for the register.  */
+static struct elim_table self_elim_table;
+
+/* Offsets used to restore the original offsets for an eliminable
+   hard register which has just become non-eliminable; zero
+   otherwise.  */
+static HOST_WIDE_INT self_elim_offsets[FIRST_PSEUDO_REGISTER];
+
+/* Map: hard regno -> RTL representation.  RTL representations of all
+   potentially eliminable hard registers are stored in the map.  */
+static rtx eliminable_reg_rtx[FIRST_PSEUDO_REGISTER];
+
+/* Set up ELIMINATION_MAP of the currently used eliminations.  */
+static void
+setup_elimination_map (void)
+{
+  int i;
+  struct elim_table *ep;
+
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    elimination_map[i] = NULL;
+  for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
+    if (ep->can_eliminate && elimination_map[ep->from] == NULL)
+      elimination_map[ep->from] = ep;
+}
+
+
+
+/* Compute the sum of X and Y, making canonicalizations assumed in an
+   address, namely: sum constant integers, surround the sum of two
+   constants with a CONST, put the constant as the second operand, and
+   group the constant on the outermost sum.
+
+   This routine assumes both inputs are already in canonical form.  */
+static rtx
+form_sum (rtx x, rtx y)
+{
+  rtx tem;
+  enum machine_mode mode = GET_MODE (x);
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (y);
+
+  if (mode == VOIDmode)
+    mode = Pmode;
+
+  if (CONST_INT_P (x))
+    return plus_constant (mode, y, INTVAL (x));
+  else if (CONST_INT_P (y))
+    return plus_constant (mode, x, INTVAL (y));
+  else if (CONSTANT_P (x))
+    tem = x, x = y, y = tem;
+
+  if (GET_CODE (x) == PLUS && CONSTANT_P (XEXP (x, 1)))
+    return form_sum (XEXP (x, 0), form_sum (XEXP (x, 1), y));
+
+  /* Note that if the operands of Y are specified in the opposite
+     order in the recursive calls below, infinite recursion will
+     occur.  */
+  if (GET_CODE (y) == PLUS && CONSTANT_P (XEXP (y, 1)))
+    return form_sum (form_sum (x, XEXP (y, 0)), XEXP (y, 1));
+
+  /* If both constant, encapsulate sum.	 Otherwise, just form sum.  A
+     constant will have been placed second.  */
+  if (CONSTANT_P (x) && CONSTANT_P (y))
+    {
+      if (GET_CODE (x) == CONST)
+	x = XEXP (x, 0);
+      if (GET_CODE (y) == CONST)
+	y = XEXP (y, 0);
+
+      return gen_rtx_CONST (VOIDmode, gen_rtx_PLUS (mode, x, y));
+    }
+
+  return gen_rtx_PLUS (mode, x, y);
+}
+
+/* Return the current substitution hard register of the elimination of
+   HARD_REGNO.  If HARD_REGNO is not eliminable, return it unchanged.  */
+int
+lra_get_elimination_hard_regno (int hard_regno)
+{
+  struct elim_table *ep;
+
+  if (hard_regno < 0 || hard_regno >= FIRST_PSEUDO_REGISTER)
+    return hard_regno;
+  if ((ep = elimination_map[hard_regno]) == NULL)
+    return hard_regno;
+  return ep->to;
+}
+
+/* Return the elimination that will be used for hard reg REG, or NULL
+   if none applies.  */
+static struct elim_table *
+get_elimination (rtx reg)
+{
+  int hard_regno;
+  struct elim_table *ep;
+  HOST_WIDE_INT offset;
+
+  lra_assert (REG_P (reg));
+  if ((hard_regno = REGNO (reg)) < 0 || hard_regno >= FIRST_PSEUDO_REGISTER)
+    return NULL;
+  if ((ep = elimination_map[hard_regno]) != NULL)
+    return ep->from_rtx != reg ? NULL : ep;
+  if ((offset = self_elim_offsets[hard_regno]) == 0)
+    return NULL;
+  /* This is an iteration to restore offsets just after HARD_REGNO
+     stopped being eliminable.  */
+  self_elim_table.from = self_elim_table.to = hard_regno;
+  self_elim_table.from_rtx
+    = self_elim_table.to_rtx
+    = eliminable_reg_rtx[hard_regno];
+  lra_assert (self_elim_table.from_rtx != NULL);
+  self_elim_table.offset = offset;
+  return &self_elim_table;
+}
+
+/* Scan X and replace any eliminable registers (such as fp) with a
+   replacement (such as sp) if SUBST_P, plus an offset.  The offset is
+   a change in the offset between the eliminable register and its
+   substitution if UPDATE_P, or the full offset if FULL_P, or
+   otherwise zero.
+
+   MEM_MODE is the mode of an enclosing MEM.  We need this to know how
+   much to adjust a register for, e.g., PRE_DEC.  Also, if we are
+   inside a MEM, we are allowed to replace a sum of a hard register
+   and the constant zero with the hard register, which we cannot do
+   outside a MEM.  */
+rtx
+lra_eliminate_regs_1 (rtx x, enum machine_mode mem_mode,
+		      bool subst_p, bool update_p, bool full_p)
+{
+  enum rtx_code code = GET_CODE (x);
+  struct elim_table *ep;
+  rtx new_rtx;
+  int i, j;
+  const char *fmt;
+  int copied = 0;
+
+  if (! current_function_decl)
+    return x;
+
+  switch (code)
+    {
+    case CONST_INT:
+    case CONST_DOUBLE:
+    case CONST_FIXED:
+    case CONST_VECTOR:
+    case CONST:
+    case SYMBOL_REF:
+    case CODE_LABEL:
+    case PC:
+    case CC0:
+    case ASM_INPUT:
+    case ADDR_VEC:
+    case ADDR_DIFF_VEC:
+    case RETURN:
+      return x;
+
+    case REG:
+      /* First handle the case where we encounter a bare hard register
+	 that is eliminable.  Replace it with a PLUS.  */
+      if ((ep = get_elimination (x)) != NULL)
+	{
+	  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
+	  
+	  if (update_p)
+	    return plus_constant (Pmode, to, ep->offset - ep->previous_offset);
+	  else if (full_p)
+	    return plus_constant (Pmode, to, ep->offset);
+	  else
+	    return to;
+	}
+      return x;
+
+    /* You might think handling MINUS in a manner similar to PLUS is a
+       good idea.  It is not.  It has been tried multiple times and every
+       time the change has had to have been reverted.
+
+       Other parts of LRA know a PLUS is special and require special
+       code to handle a reloaded PLUS operand.
+
+       Also consider backends where the flags register is clobbered by a
+       MINUS, but we can emit a PLUS that does not clobber flags (IA-32,
+       lea instruction comes to mind).	If we try to reload a MINUS, we
+       may kill the flags register that was holding a useful value.
+
+       So, please before trying to handle MINUS, consider reload as a
+       whole instead of this little section as well as the backend
+       issues.	*/
+    case PLUS:
+      /* If this is the sum of an eliminable register and a constant, rework
+	 the sum.  */
+      if (REG_P (XEXP (x, 0)) && CONSTANT_P (XEXP (x, 1)))
+	{
+	  if ((ep = get_elimination (XEXP (x, 0))) != NULL)
+	    {
+	      HOST_WIDE_INT offset;
+	      rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
+	      
+	      if (! update_p && ! full_p)
+		return gen_rtx_PLUS (Pmode, to, XEXP (x, 1));
+	      
+	      offset = (update_p
+			? ep->offset - ep->previous_offset : ep->offset);
+	      /* The only time we want to replace a PLUS with a REG
+		 (this occurs when the constant operand of the PLUS is
+		 the negative of the offset) is when we are inside a
+		 MEM.  We won't want to do so at other times because
+		 that would change the structure of the insn in a way
+		 that reload can't handle.  We special-case the
+		 commonest situation in eliminate_regs_in_insn, so
+		 just replace a PLUS with a PLUS here, unless inside a
+		 MEM.  */
+	      if (mem_mode != 0
+		  && CONST_INT_P (XEXP (x, 1))
+		  && INTVAL (XEXP (x, 1)) == -offset)
+		return to;
+	      else
+		return gen_rtx_PLUS (Pmode, to,
+				     plus_constant (Pmode,
+						    XEXP (x, 1), offset));
+	    }
+
+	  /* If the hard register is not eliminable, we are done since
+	     the other operand is a constant.  */
+	  return x;
+	}
+
+      /* If this is part of an address, we want to bring any constant
+	 to the outermost PLUS.	 We will do this by doing hard
+	 register replacement in our operands and seeing if a constant
+	 shows up in one of them.
+
+	 Note that there is no risk of modifying the structure of the insn,
+	 since we only get called for its operands, thus we are either
+	 modifying the address inside a MEM, or something like an address
+	 operand of a load-address insn.  */
+
+      {
+	rtx new0 = lra_eliminate_regs_1 (XEXP (x, 0), mem_mode,
+					 subst_p, update_p, full_p);
+	rtx new1 = lra_eliminate_regs_1 (XEXP (x, 1), mem_mode,
+					 subst_p, update_p, full_p);
+
+	if (reg_renumber && (new0 != XEXP (x, 0) || new1 != XEXP (x, 1)))
+	  {
+	    new_rtx = form_sum (new0, new1);
+
+	    /* As above, if we are not inside a MEM we do not want to
+	       turn a PLUS into something else.	 We might try to do so here
+	       for an addition of 0 if we aren't optimizing.  */
+	    if (! mem_mode && GET_CODE (new_rtx) != PLUS)
+	      return gen_rtx_PLUS (GET_MODE (x), new_rtx, const0_rtx);
+	    else
+	      return new_rtx;
+	  }
+      }
+      return x;
+
+    case MULT:
+      /* If this is the product of an eliminable hard register and a
+	 constant, apply the distribute law and move the constant out
+	 so that we have (plus (mult ..) ..).  This is needed in order
+	 to keep load-address insns valid.  This case is pathological.
+	 We ignore the possibility of overflow here.  */
+      if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1))
+	  && (ep = get_elimination (XEXP (x, 0))) != NULL)
+	{
+	  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
+	  
+	  if (update_p)
+	    return
+	      plus_constant (Pmode,
+			     gen_rtx_MULT (Pmode, to, XEXP (x, 1)),
+			     (ep->offset - ep->previous_offset)
+			     * INTVAL (XEXP (x, 1)));
+	  else if (full_p)
+	    return
+	      plus_constant (Pmode,
+			     gen_rtx_MULT (Pmode, to, XEXP (x, 1)),
+			     ep->offset * INTVAL (XEXP (x, 1)));
+	  else
+	    return gen_rtx_MULT (Pmode, to, XEXP (x, 1));
+	}
+      
+      /* ... fall through ...  */
+
+    case CALL:
+    case COMPARE:
+    /* See comments before PLUS about handling MINUS.  */
+    case MINUS:
+    case DIV:	   case UDIV:
+    case MOD:	   case UMOD:
+    case AND:	   case IOR:	  case XOR:
+    case ROTATERT: case ROTATE:
+    case ASHIFTRT: case LSHIFTRT: case ASHIFT:
+    case NE:	   case EQ:
+    case GE:	   case GT:	  case GEU:    case GTU:
+    case LE:	   case LT:	  case LEU:    case LTU:
+      {
+	rtx new0 = lra_eliminate_regs_1 (XEXP (x, 0), mem_mode,
+					 subst_p, update_p, full_p);
+	rtx new1 = XEXP (x, 1)
+		   ? lra_eliminate_regs_1 (XEXP (x, 1), mem_mode,
+					   subst_p, update_p, full_p) : 0;
+
+	if (new0 != XEXP (x, 0) || new1 != XEXP (x, 1))
+	  return gen_rtx_fmt_ee (code, GET_MODE (x), new0, new1);
+      }
+      return x;
+
+    case EXPR_LIST:
+      /* If we have something in XEXP (x, 0), the usual case,
+	 eliminate it.	*/
+      if (XEXP (x, 0))
+	{
+	  new_rtx = lra_eliminate_regs_1 (XEXP (x, 0), mem_mode,
+					  subst_p, update_p, full_p);
+	  if (new_rtx != XEXP (x, 0))
+	    {
+	      /* If this is a REG_DEAD note, it is not valid anymore.
+		 Using the eliminated version could result in creating a
+		 REG_DEAD note for the stack or frame pointer.	*/
+	      if (REG_NOTE_KIND (x) == REG_DEAD)
+		return (XEXP (x, 1)
+			? lra_eliminate_regs_1 (XEXP (x, 1), mem_mode,
+						subst_p, update_p, full_p)
+			: NULL_RTX);
+
+	      x = alloc_reg_note (REG_NOTE_KIND (x), new_rtx, XEXP (x, 1));
+	    }
+	}
+
+      /* ... fall through ...  */
+
+    case INSN_LIST:
+      /* Now do eliminations in the rest of the chain.	If this was
+	 an EXPR_LIST, this might result in allocating more memory than is
+	 strictly needed, but it simplifies the code.  */
+      if (XEXP (x, 1))
+	{
+	  new_rtx = lra_eliminate_regs_1 (XEXP (x, 1), mem_mode,
+					  subst_p, update_p, full_p);
+	  if (new_rtx != XEXP (x, 1))
+	    return
+	      gen_rtx_fmt_ee (GET_CODE (x), GET_MODE (x),
+			      XEXP (x, 0), new_rtx);
+	}
+      return x;
+
+    case PRE_INC:
+    case POST_INC:
+    case PRE_DEC:
+    case POST_DEC:
+      /* We do not support elimination of a register that is modified.
+	 elimination_effects has already made sure that this does not
+	 happen.  */
+      return x;
+
+    case PRE_MODIFY:
+    case POST_MODIFY:
+      /* We do not support elimination of a hard register that is
+	 modified.  elimination_effects has already made sure that
+	 this does not happen.  The only remaining case we need to
+	 consider here is that the increment value may be an
+	 eliminable register.  */
+      if (GET_CODE (XEXP (x, 1)) == PLUS
+	  && XEXP (XEXP (x, 1), 0) == XEXP (x, 0))
+	{
+	  rtx new_rtx = lra_eliminate_regs_1 (XEXP (XEXP (x, 1), 1), mem_mode,
+					      subst_p, update_p, full_p);
+
+	  if (new_rtx != XEXP (XEXP (x, 1), 1))
+	    return gen_rtx_fmt_ee (code, GET_MODE (x), XEXP (x, 0),
+				   gen_rtx_PLUS (GET_MODE (x),
+						 XEXP (x, 0), new_rtx));
+	}
+      return x;
+
+    case STRICT_LOW_PART:
+    case NEG:	       case NOT:
+    case SIGN_EXTEND:  case ZERO_EXTEND:
+    case TRUNCATE:     case FLOAT_EXTEND: case FLOAT_TRUNCATE:
+    case FLOAT:	       case FIX:
+    case UNSIGNED_FIX: case UNSIGNED_FLOAT:
+    case ABS:
+    case SQRT:
+    case FFS:
+    case CLZ:
+    case CTZ:
+    case POPCOUNT:
+    case PARITY:
+    case BSWAP:
+      new_rtx = lra_eliminate_regs_1 (XEXP (x, 0), mem_mode,
+				      subst_p, update_p, full_p);
+      if (new_rtx != XEXP (x, 0))
+	return gen_rtx_fmt_e (code, GET_MODE (x), new_rtx);
+      return x;
+
+    case SUBREG:
+	new_rtx = lra_eliminate_regs_1 (SUBREG_REG (x), mem_mode,
+					subst_p, update_p, full_p);
+
+      if (new_rtx != SUBREG_REG (x))
+	{
+	  int x_size = GET_MODE_SIZE (GET_MODE (x));
+	  int new_size = GET_MODE_SIZE (GET_MODE (new_rtx));
+
+	  if (MEM_P (new_rtx)
+	      && ((x_size < new_size
+#ifdef WORD_REGISTER_OPERATIONS
+		   /* On these machines, combine can create RTL of the form
+		      (set (subreg:m1 (reg:m2 R) 0) ...)
+		      where m1 < m2, and expects something interesting to
+		      happen to the entire word.  Moreover, it will use the
+		      (reg:m2 R) later, expecting all bits to be preserved.
+		      So if the number of words is the same, preserve the
+		      subreg so that push_reload can see it.  */
+		   && ! ((x_size - 1) / UNITS_PER_WORD
+			 == (new_size - 1) / UNITS_PER_WORD)
+#endif
+		   )
+		  || x_size == new_size)
+	      )
+
+	    {
+	      SUBREG_REG (x) = new_rtx;
+	      alter_subreg (&x, false);
+	      return x;
+	    }
+	  else
+	    return gen_rtx_SUBREG (GET_MODE (x), new_rtx, SUBREG_BYTE (x));
+	}
+
+      return x;
+
+    case MEM:
+      /* Our only special processing is to pass the mode of the MEM to our
+	 recursive call and copy the flags.  While we are here, handle this
+	 case more efficiently.	 */
+      return
+	replace_equiv_address_nv
+	(x,
+	 lra_eliminate_regs_1 (XEXP (x, 0), GET_MODE (x),
+			       subst_p, update_p, full_p));
+
+    case USE:
+      /* Handle insn_list USE that a call to a pure function may generate.  */
+      new_rtx = lra_eliminate_regs_1 (XEXP (x, 0), VOIDmode,
+				      subst_p, update_p, full_p);
+      if (new_rtx != XEXP (x, 0))
+	return gen_rtx_USE (GET_MODE (x), new_rtx);
+      return x;
+
+    case CLOBBER:
+    case SET:
+      gcc_unreachable ();
+
+    default:
+      break;
+    }
+
+  /* Process each of our operands recursively.	If any have changed, make a
+     copy of the rtx.  */
+  fmt = GET_RTX_FORMAT (code);
+  for (i = 0; i < GET_RTX_LENGTH (code); i++, fmt++)
+    {
+      if (*fmt == 'e')
+	{
+	  new_rtx = lra_eliminate_regs_1 (XEXP (x, i), mem_mode,
+					  subst_p, update_p, full_p);
+	  if (new_rtx != XEXP (x, i) && ! copied)
+	    {
+	      x = shallow_copy_rtx (x);
+	      copied = 1;
+	    }
+	  XEXP (x, i) = new_rtx;
+	}
+      else if (*fmt == 'E')
+	{
+	  int copied_vec = 0;
+	  for (j = 0; j < XVECLEN (x, i); j++)
+	    {
+	      new_rtx = lra_eliminate_regs_1 (XVECEXP (x, i, j), mem_mode,
+					      subst_p, update_p, full_p);
+	      if (new_rtx != XVECEXP (x, i, j) && ! copied_vec)
+		{
+		  rtvec new_v = gen_rtvec_v (XVECLEN (x, i),
+					     XVEC (x, i)->elem);
+		  if (! copied)
+		    {
+		      x = shallow_copy_rtx (x);
+		      copied = 1;
+		    }
+		  XVEC (x, i) = new_v;
+		  copied_vec = 1;
+		}
+	      XVECEXP (x, i, j) = new_rtx;
+	    }
+	}
+    }
+
+  return x;
+}
+
+/* This function is used externally in subsequent passes of GCC.  It
+   always does a full elimination of X.	 */
+rtx
+lra_eliminate_regs (rtx x, enum machine_mode mem_mode,
+		    rtx insn ATTRIBUTE_UNUSED)
+{
+  return lra_eliminate_regs_1 (x, mem_mode, true, false, true);
+}
+
+/* Scan rtx X for modifications of elimination target registers.
+   Update the table of eliminables to reflect the changed
+   state.  */
+static void
+mark_not_eliminable (rtx x)
+{
+  enum rtx_code code = GET_CODE (x);
+  struct elim_table *ep;
+  int i, j;
+  const char *fmt;
+
+  switch (code)
+    {
+    case PRE_INC:
+    case POST_INC:
+    case PRE_DEC:
+    case POST_DEC:
+    case POST_MODIFY:
+    case PRE_MODIFY:
+      if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER)
+	/* If we modify the source of an elimination rule, disable it.	*/
+	for (ep = reg_eliminate;
+	     ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
+	       ep++)
+	  if (ep->from_rtx == XEXP (x, 0)
+	      || (ep->to_rtx == XEXP (x, 0)
+		  && ep->to_rtx != hard_frame_pointer_rtx))
+	    setup_can_eliminate (ep, false);
+      return;
+
+    case USE:
+      if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER)
+	/* If using a hard register that is the source of an
+	   elimination we still think can be performed, note that it
+	   cannot be performed, since we don't know how this hard
+	   register is used.  */
+	for (ep = reg_eliminate;
+	     ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
+	     ep++)
+	  if (ep->from_rtx == XEXP (x, 0)
+	      && ep->to_rtx != hard_frame_pointer_rtx)
+	    setup_can_eliminate (ep, false);
+      return;
+
+    case CLOBBER:
+      if (REG_P (XEXP (x, 0)) && REGNO (XEXP (x, 0)) < FIRST_PSEUDO_REGISTER)
+	/* If clobbering a hard register that is the replacement
+	   register for an elimination we still think can be
+	   performed, note that it cannot be performed.	 Otherwise, we
+	   need not be concerned about it.  */
+	for (ep = reg_eliminate;
+	     ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
+	     ep++)
+	  if (ep->to_rtx == XEXP (x, 0)
+	      && ep->to_rtx != hard_frame_pointer_rtx)
+	    setup_can_eliminate (ep, false);
+      return;
+
+    case SET:
+      /* Check for setting a hard register that we know about.	*/
+      if (REG_P (SET_DEST (x)) && REGNO (SET_DEST (x)) < FIRST_PSEUDO_REGISTER)
+	{
+	  /* See if this is setting the replacement hard register for
+	     an elimination.
+
+	     If DEST is the hard frame pointer, we do nothing because
+	     we assume that all assignments to the frame pointer are
+	     for non-local gotos and are being done at a time when
+	     they are valid and do not disturb anything else.  Some
+	     machines want to eliminate a fake argument pointer (or
+	     even a fake frame pointer) with either the real frame
+	     pointer or the stack pointer.  Assignments to the hard
+	     frame pointer must not prevent this elimination.  */
+
+	  for (ep = reg_eliminate;
+	       ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
+	       ep++)
+	    if (ep->to_rtx == SET_DEST (x)
+		&& SET_DEST (x) != hard_frame_pointer_rtx)
+	      setup_can_eliminate (ep, false);
+	}
+
+      mark_not_eliminable (SET_DEST (x));
+      mark_not_eliminable (SET_SRC (x));
+      return;
+
+    default:
+      break;
+    }
+
+  fmt = GET_RTX_FORMAT (code);
+  for (i = 0; i < GET_RTX_LENGTH (code); i++, fmt++)
+    {
+      if (*fmt == 'e')
+	mark_not_eliminable (XEXP (x, i));
+      else if (*fmt == 'E')
+	for (j = 0; j < XVECLEN (x, i); j++)
+	  mark_not_eliminable (XVECEXP (x, i, j));
+    }
+}
+
+
+
+/* Scan INSN and eliminate all eliminable hard registers in it.
+
+   If REPLACE_P is true, do the replacement destructively.  Also delete
+   the insn as dead if it is setting an eliminable register.
+
+   If REPLACE_P is false, just update the offsets.  */
+
+static void
+eliminate_regs_in_insn (rtx insn, bool replace_p)
+{
+  int icode = recog_memoized (insn);
+  rtx old_set = single_set (insn);
+  bool val;
+  int i;
+  rtx substed_operand[MAX_RECOG_OPERANDS];
+  rtx orig_operand[MAX_RECOG_OPERANDS];
+  struct elim_table *ep;
+  rtx plus_src, plus_cst_src;
+  lra_insn_recog_data_t id;
+  struct lra_static_insn_data *static_id;
+
+  if (icode < 0 && asm_noperands (PATTERN (insn)) < 0 && ! DEBUG_INSN_P (insn))
+    {
+      lra_assert (GET_CODE (PATTERN (insn)) == USE
+		  || GET_CODE (PATTERN (insn)) == CLOBBER
+		  || GET_CODE (PATTERN (insn)) == ADDR_VEC
+		  || GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC
+		  || GET_CODE (PATTERN (insn)) == ASM_INPUT);
+      return;
+    }
+
+  /* Check for setting an eliminable register.	*/
+  if (old_set != 0 && REG_P (SET_DEST (old_set))
+      && (ep = get_elimination (SET_DEST (old_set))) != NULL)
+    {
+      bool delete_p = replace_p;
+      
+#ifdef HARD_FRAME_POINTER_REGNUM
+      /* If this is setting the frame pointer register to the hardware
+	 frame pointer register and this is an elimination that will
+	 be done (tested above), this insn is really adjusting the
+	 frame pointer downward to compensate for the adjustment done
+	 before a nonlocal goto.  */
+      if (ep->from == FRAME_POINTER_REGNUM
+	  && ep->to == HARD_FRAME_POINTER_REGNUM)
+	{
+	  if (replace_p)
+	    {
+	      SET_DEST (old_set) = ep->to_rtx;
+	      lra_update_insn_recog_data (insn);
+	      return;
+	    }
+	  else
+	    {
+	      rtx base = SET_SRC (old_set);
+	      HOST_WIDE_INT offset = 0;
+	      rtx base_insn = insn;
+	      
+	      while (base != ep->to_rtx)
+		{
+		  rtx prev_insn, prev_set;
+		  
+		  if (GET_CODE (base) == PLUS && CONST_INT_P (XEXP (base, 1)))
+		    {
+		      offset += INTVAL (XEXP (base, 1));
+		      base = XEXP (base, 0);
+		    }
+		  else if ((prev_insn = prev_nonnote_insn (base_insn)) != 0
+			   && (prev_set = single_set (prev_insn)) != 0
+			   && rtx_equal_p (SET_DEST (prev_set), base))
+		    {
+		      base = SET_SRC (prev_set);
+		      base_insn = prev_insn;
+		    }
+		  else
+		    break;
+		}
+	      
+	      if (base == ep->to_rtx)
+		{
+		  rtx src;
+		  
+		  offset -= (ep->offset - ep->previous_offset);
+		  src = plus_constant (Pmode, ep->to_rtx, offset);
+		  
+		  /* First see if this insn remains valid when we make
+		     the change.  If not, keep the INSN_CODE the same
+		     and let the constraint pass fix it up.  */
+		  validate_change (insn, &SET_SRC (old_set), src, 1);
+		  validate_change (insn, &SET_DEST (old_set),
+				   ep->from_rtx, 1);
+		  if (! apply_change_group ())
+		    {
+		      SET_SRC (old_set) = src;
+		      SET_DEST (old_set) = ep->from_rtx;
+		    }
+		  lra_update_insn_recog_data (insn);
+		  return;
+		}
+	    }
+
+	  /* We can't delete this insn, but needn't process it
+	     since it won't be used unless something changes.  */
+	  delete_p = false;
+	}
+#endif
+      
+      /* This insn isn't serving a useful purpose.  We delete it
+	 when REPLACE_P is set.  */
+      if (delete_p)
+	lra_delete_dead_insn (insn);
+      return;
+    }
+
+  /* We allow one special case which happens to work on all machines we
+     currently support: a single set with the source or a REG_EQUAL
+     note being a PLUS of an eliminable register and a constant.  */
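+  /* For example (hypothetical regno): a single set like
+     (set (reg 100) (plus (reg fp) (const_int -4))) matches this
+     case, an eliminable hard register plus a constant feeding a
+     single set.  */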
+  plus_src = plus_cst_src = 0;
+  if (old_set && REG_P (SET_DEST (old_set)))
+    {
+      if (GET_CODE (SET_SRC (old_set)) == PLUS)
+	plus_src = SET_SRC (old_set);
+      /* First see if the source is of the form (plus (...) CST).  */
+      if (plus_src
+	  && CONST_INT_P (XEXP (plus_src, 1)))
+	plus_cst_src = plus_src;
+      /* Check that the first operand of the PLUS is a hard reg or
+	 the lowpart subreg of one.  */
+      if (plus_cst_src)
+	{
+	  rtx reg = XEXP (plus_cst_src, 0);
+
+	  if (GET_CODE (reg) == SUBREG && subreg_lowpart_p (reg))
+	    reg = SUBREG_REG (reg);
+
+	  if (!REG_P (reg) || REGNO (reg) >= FIRST_PSEUDO_REGISTER)
+	    plus_cst_src = 0;
+	}
+    }
+  if (plus_cst_src)
+    {
+      rtx reg = XEXP (plus_cst_src, 0);
+      HOST_WIDE_INT offset = INTVAL (XEXP (plus_cst_src, 1));
+
+      if (GET_CODE (reg) == SUBREG)
+	reg = SUBREG_REG (reg);
+
+      if (REG_P (reg) && (ep = get_elimination (reg)) != NULL)
+	{
+	  rtx to_rtx = replace_p ? ep->to_rtx : ep->from_rtx;
+	  
+	  if (! replace_p)
+	    {
+	      offset += (ep->offset - ep->previous_offset);
+	      offset = trunc_int_for_mode (offset, GET_MODE (plus_cst_src));
+	    }
+	  
+	  if (GET_CODE (XEXP (plus_cst_src, 0)) == SUBREG)
+	    to_rtx = gen_lowpart (GET_MODE (XEXP (plus_cst_src, 0)), to_rtx);
+	  /* If we have a nonzero offset, and the source is already a
+	     simple REG, the following transformation would increase
+	     the cost of the insn by replacing a simple REG with (plus
+	     (reg sp) CST).  So try only when we already had a PLUS
+	     before.  */
+	  if (offset == 0 || plus_src)
+	    {
+	      rtx new_src = plus_constant (GET_MODE (to_rtx), to_rtx, offset);
+	      
+	      old_set = single_set (insn);
+
+	      /* First see if this insn remains valid when we make the
+		 change.  If not, try to replace the whole pattern
+		 with a simple set (this may help if the original insn
+		 was a PARALLEL that was only recognized as single_set
+		 due to REG_UNUSED notes).  If this isn't valid
+		 either, keep the INSN_CODE the same and let the
+		 constraint pass fix it up.  */
+	      if (! validate_change (insn, &SET_SRC (old_set), new_src, 0))
+		{
+		  rtx new_pat = gen_rtx_SET (VOIDmode,
+					     SET_DEST (old_set), new_src);
+		  
+		  if (! validate_change (insn, &PATTERN (insn), new_pat, 0))
+		    SET_SRC (old_set) = new_src;
+		}
+	      lra_update_insn_recog_data (insn);
+	      /* This can't have an effect on elimination offsets, so skip
+		 right to the end.  */
+	      return;
+	    }
+	}
+    }
+
+  /* Eliminate all eliminable registers occurring in operands that
+     can be handled by the constraint pass.  */
+  id = lra_get_insn_recog_data (insn);
+  static_id = id->insn_static_data;
+  val = false;
+  for (i = 0; i < static_id->n_operands; i++)
+    {
+      orig_operand[i] = *id->operand_loc[i];
+      substed_operand[i] = *id->operand_loc[i];
+
+      /* For an asm statement, every operand is eliminable.  */
+      if (icode < 0 || insn_data[icode].operand[i].eliminable)
+	{
+	  /* Check for setting a hard register that we know about.  */
+	  if (static_id->operand[i].type != OP_IN
+	      && REG_P (orig_operand[i]))
+	    {
+	      /* If we are assigning to a hard register that can be
+		 eliminated, it must be as part of a PARALLEL, since
+		 the code above handles single SETs.  We must indicate
+		 that we can no longer eliminate this reg.  */
+	      for (ep = reg_eliminate;
+		   ep < &reg_eliminate[NUM_ELIMINABLE_REGS];
+		   ep++)
+		lra_assert (ep->from_rtx != orig_operand[i]
+			    || ! ep->can_eliminate);
+	    }
+
+	  /* Companion to the above plus substitution, we can allow
+	     invariants as the source of a plain move.	*/
+	  substed_operand[i]
+	    = lra_eliminate_regs_1 (*id->operand_loc[i], VOIDmode,
+				    replace_p, ! replace_p, false);
+	  if (substed_operand[i] != orig_operand[i])
+	    val = true;
+
+	  /* If an output operand changed from a REG to a MEM and we
+	     are doing the final replacement, emit a CLOBBER of the
+	     original register after INSN.  */
+	  if (static_id->operand[i].type != OP_IN
+	      && REG_P (orig_operand[i])
+	      && MEM_P (substed_operand[i])
+	      && replace_p)
+	    emit_insn_after (gen_clobber (orig_operand[i]), insn);
+	}
+    }
+
+  /* Substitute the operands; the new values are in the substed_operand
+     array.  */
+  for (i = 0; i < static_id->n_operands; i++)
+    *id->operand_loc[i] = substed_operand[i];
+  for (i = 0; i < static_id->n_dups; i++)
+    *id->dup_loc[i] = substed_operand[(int) static_id->dup_num[i]];
+
+  if (val)
+    {
+      /* If we had a move insn but now we don't, re-recognize it.
+	 This will cause spurious re-recognition if the old move had a
+	 PARALLEL since the new one still will, but we can't call
+	 single_set without having put new body into the insn and the
+	 re-recognition won't hurt in this rare case.  */
+      id = lra_update_insn_recog_data (insn);
+      static_id = id->insn_static_data;
+    }
+}
+
+/* Spill pseudos which are assigned to hard registers in SET.  Add
+   affected insns for processing in the subsequent constraint
+   pass.  */
+static void
+spill_pseudos (HARD_REG_SET set)
+{
+  int i;
+  bitmap_head to_process;
+  rtx insn;
+
+  if (hard_reg_set_empty_p (set))
+    return;
+  if (lra_dump_file != NULL)
+    {
+      fprintf (lra_dump_file, "	   Spilling non-eliminable hard regs:");
+      for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+	if (TEST_HARD_REG_BIT (set, i))
+	  fprintf (lra_dump_file, " %d", i);
+      fprintf (lra_dump_file, "\n");
+    }
+  bitmap_initialize (&to_process, &reg_obstack);
+  for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+    if (lra_reg_info[i].nrefs != 0 && reg_renumber[i] >= 0
+	&& lra_hard_reg_set_intersection_p (reg_renumber[i],
+					    PSEUDO_REGNO_MODE (i), set))
+      {
+	if (lra_dump_file != NULL)
+	  fprintf (lra_dump_file, "	 Spilling r%d(%d)\n",
+		   i, reg_renumber[i]);
+	reg_renumber[i] = -1;
+	bitmap_ior_into (&to_process, &lra_reg_info[i].insn_bitmap);
+      }
+  IOR_HARD_REG_SET (lra_no_alloc_regs, set);
+  for (insn = get_insns (); insn != NULL_RTX; insn = NEXT_INSN (insn))
+    if (bitmap_bit_p (&to_process, INSN_UID (insn)))
+      {
+	lra_push_insn (insn);
+	lra_set_used_insn_alternative (insn, -1);
+      }
+  bitmap_clear (&to_process);
+}
+
+/* Update the offsets and the possibility of elimination for the
+   eliminable registers.  Spill pseudos assigned to registers which
+   became uneliminable, and update LRA_NO_ALLOC_REGS and
+   ELIMINABLE_REGSET.  Add insns containing eliminable hard registers
+   whose offsets should be changed to INSNS_WITH_CHANGED_OFFSETS.  */
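+/* As an illustration: on a target where the argument pointer can be
+   eliminated either to the stack pointer or to the frame pointer,
+   losing the stack-pointer elimination makes this function fall back
+   to the frame-pointer one; the frame pointer is then removed from
+   the allocatable registers and any pseudos already assigned to it
+   are spilled.  */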
+static void
+update_reg_eliminate (bitmap insns_with_changed_offsets)
+{
+  bool prev;
+  struct elim_table *ep, *ep1;
+  HARD_REG_SET temp_hard_reg_set;
+
+  /* Clear self elimination offsets.  */
+  for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
+    self_elim_offsets[ep->from] = 0;
+  CLEAR_HARD_REG_SET (temp_hard_reg_set);
+  for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
+    {
+      /* If it is a currently used elimination: update the previous
+	 offset.  */
+      if (elimination_map[ep->from] == ep)
+	ep->previous_offset = ep->offset;
+
+      prev = ep->prev_can_eliminate;
+      setup_can_eliminate (ep, targetm.can_eliminate (ep->from, ep->to));
+      if (ep->can_eliminate && ! prev)
+	{
+	  /* It is possible that a non-eliminable register becomes
+	     eliminable because we took other factors into account
+	     when setting up eliminable regs in the initial setup.
+	     Just ignore such newly eliminable registers.  */
+	  setup_can_eliminate (ep, false);
+	  continue;
+	}
+      if (ep->can_eliminate != prev && elimination_map[ep->from] == ep)
+	{
+	  /* We cannot use this elimination anymore -- find another
+	     one.  */
+	  if (lra_dump_file != NULL)
+	    fprintf (lra_dump_file,
+		     "	Elimination %d to %d is not possible anymore\n",
+		     ep->from, ep->to);
+	  /* Mark that it is not eliminable anymore.  */
+	  elimination_map[ep->from] = NULL;
+	  for (ep1 = ep + 1; ep1 < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep1++)
+	    if (ep1->can_eliminate && ep1->from == ep->from)
+	      break;
+	  if (ep1 < &reg_eliminate[NUM_ELIMINABLE_REGS])
+	    {
+	      if (lra_dump_file != NULL)
+		fprintf (lra_dump_file, "    Using elimination %d to %d now\n",
+			 ep1->from, ep1->to);
+	      /* Prevent the hard register into which we now eliminate
+		 from being used for pseudos.  */
+	      SET_HARD_REG_BIT (temp_hard_reg_set, ep1->to);
+	      lra_assert (ep1->previous_offset == 0);
+	      ep1->previous_offset = ep->offset;
+	    }
+	  else
+	    {
+	      /* There is no elimination anymore; just use the hard
+		 register `from' itself.  Set up the self elimination
+		 offset to restore the original offset values.  */
+	      if (lra_dump_file != NULL)
+		fprintf (lra_dump_file, "    %d is not eliminable at all\n",
+			 ep->from);
+	      self_elim_offsets[ep->from] = -ep->offset;
+	      SET_HARD_REG_BIT (temp_hard_reg_set, ep->from);
+	      if (ep->offset != 0)
+		bitmap_ior_into (insns_with_changed_offsets,
+				 &lra_reg_info[ep->from].insn_bitmap);
+	    }
+	}
+
+#ifdef ELIMINABLE_REGS
+      INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, ep->offset);
+#else
+      INITIAL_FRAME_POINTER_OFFSET (ep->offset);
+#endif
+    }
+  IOR_HARD_REG_SET (lra_no_alloc_regs, temp_hard_reg_set);
+  AND_COMPL_HARD_REG_SET (eliminable_regset, temp_hard_reg_set);
+  spill_pseudos (temp_hard_reg_set);
+  setup_elimination_map ();
+  for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
+    if (elimination_map[ep->from] == ep && ep->previous_offset != ep->offset)
+      bitmap_ior_into (insns_with_changed_offsets,
+		       &lra_reg_info[ep->from].insn_bitmap);
+}
+
+/* Initialize the table of hard registers to eliminate.
+   Pre-condition: the global flag frame_pointer_needed has been set
+   before calling this function.  */
+static void
+init_elim_table (void)
+{
+  bool value_p;
+  struct elim_table *ep;
+#ifdef ELIMINABLE_REGS
+  const struct elim_table_1 *ep1;
+#endif
+
+  if (!reg_eliminate)
+    reg_eliminate = XCNEWVEC (struct elim_table, NUM_ELIMINABLE_REGS);
+
+  memset (self_elim_offsets, 0, sizeof (self_elim_offsets));
+  /* Initialize member values which will never be changed.  */
+  self_elim_table.can_eliminate = self_elim_table.prev_can_eliminate = true;
+  self_elim_table.previous_offset = 0;
+#ifdef ELIMINABLE_REGS
+  for (ep = reg_eliminate, ep1 = reg_eliminate_1;
+       ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++, ep1++)
+    {
+      ep->offset = ep->previous_offset = 0;
+      ep->from = ep1->from;
+      ep->to = ep1->to;
+      value_p = (targetm.can_eliminate (ep->from, ep->to)
+		 && ! (ep->to == STACK_POINTER_REGNUM
+		       && frame_pointer_needed 
+		       && (! SUPPORTS_STACK_ALIGNMENT
+			   || ! stack_realign_fp)));
+      setup_can_eliminate (ep, value_p);
+    }
+#else
+  reg_eliminate[0].offset = reg_eliminate[0].previous_offset = 0;
+  reg_eliminate[0].from = reg_eliminate_1[0].from;
+  reg_eliminate[0].to = reg_eliminate_1[0].to;
+  setup_can_eliminate (&reg_eliminate[0], ! frame_pointer_needed);
+#endif
+
+  /* Count the number of eliminable registers and build the FROM and TO
+     REG rtx's.	 Note that code in gen_rtx_REG will cause, e.g.,
+     gen_rtx_REG (Pmode, STACK_POINTER_REGNUM) to equal stack_pointer_rtx.
+     We depend on this.	 */
+  for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
+    {
+      ep->from_rtx = gen_rtx_REG (Pmode, ep->from);
+      ep->to_rtx = gen_rtx_REG (Pmode, ep->to);
+      eliminable_reg_rtx[ep->from] = ep->from_rtx;
+    }
+}
+
+/* Entry function for initialization of elimination once per
+   function.  */
+void
+lra_init_elimination (void)
+{
+  basic_block bb;
+  rtx insn;
+
+  init_elim_table ();
+  FOR_EACH_BB (bb)
+    FOR_BB_INSNS (bb, insn)
+    if (NONDEBUG_INSN_P (insn))
+      mark_not_eliminable (PATTERN (insn));
+  setup_elimination_map ();
+}
+
+/* Eliminate hard reg given by its location LOC.  */
+void
+lra_eliminate_reg_if_possible (rtx *loc)
+{
+  int regno;
+  struct elim_table *ep;
+
+  lra_assert (REG_P (*loc));
+  if ((regno = REGNO (*loc)) >= FIRST_PSEUDO_REGISTER
+      /* Virtual registers are not allocatable. ??? */
+      || ! TEST_HARD_REG_BIT (lra_no_alloc_regs, regno))
+    return;
+  if ((ep = get_elimination (*loc)) != NULL)
+    *loc = ep->to_rtx;
+}
+
+/* Do elimination (final if FINAL_P) in INSN.  Add the insn for
+   subsequent processing in the constraint pass, and update the insn
+   info.  */
+static void
+process_insn_for_elimination (rtx insn, bool final_p)
+{
+  eliminate_regs_in_insn (insn, final_p);
+  if (! final_p)
+    {
+      /* Check whether the insn changed its code.  This happens, for
+	 example, when a move insn becomes an add insn and we do not
+	 want to process the insn as a move anymore.  */
+      int icode = recog (PATTERN (insn), insn, 0);
+
+      if (icode >= 0 && icode != INSN_CODE (insn))
+	{
+	  INSN_CODE (insn) = icode;
+	  lra_update_insn_recog_data (insn);
+	}
+      lra_update_insn_regno_info (insn);
+      lra_push_insn (insn);
+      lra_set_used_insn_alternative (insn, -1);
+    }
+}
+
+/* Entry function to do final elimination if FINAL_P or to update
+   elimination register offsets.  */
+void
+lra_eliminate (bool final_p)
+{
+  int i;
+  basic_block bb;
+  rtx insn, temp, mem_loc, invariant;
+  bitmap_head insns_with_changed_offsets;
+  struct elim_table *ep;
+  int regs_num = max_reg_num ();
+
+  bitmap_initialize (&insns_with_changed_offsets, &reg_obstack);
+  if (final_p)
+    {
+#ifdef ENABLE_CHECKING
+      update_reg_eliminate (&insns_with_changed_offsets);
+      if (! bitmap_empty_p (&insns_with_changed_offsets))
+	gcc_unreachable ();
+#endif
+      /* We change eliminable hard registers in insns so we should do
+	 this for all insns containing any eliminable hard
+	 register.  */
+      for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
+	if (elimination_map[ep->from] != NULL)
+	  bitmap_ior_into (&insns_with_changed_offsets,
+			   &lra_reg_info[ep->from].insn_bitmap);
+    }
+  else
+    {
+      update_reg_eliminate (&insns_with_changed_offsets);
+      if (bitmap_empty_p (&insns_with_changed_offsets))
+	return;
+    }
+  if (lra_dump_file != NULL)
+    {
+      fprintf (lra_dump_file, "New elimination table:\n");
+      print_elim_table (lra_dump_file);
+    }
+  for (i = FIRST_PSEUDO_REGISTER; i < regs_num; i++)
+    if (lra_reg_info[i].nrefs != 0)
+      {
+	mem_loc = ira_reg_equiv[i].memory;
+	invariant = ira_reg_equiv[i].invariant;
+	if (mem_loc != NULL_RTX)
+	  mem_loc = lra_eliminate_regs_1 (mem_loc, VOIDmode,
+					  final_p, ! final_p, false);
+	ira_reg_equiv[i].memory = mem_loc;
+	if (invariant != NULL_RTX)
+	  invariant = lra_eliminate_regs_1 (invariant, VOIDmode,
+					    final_p, ! final_p, false);
+	ira_reg_equiv[i].invariant = invariant;
+	if (lra_dump_file != NULL
+	    && (mem_loc != NULL_RTX || invariant != NULL))
+	  fprintf (lra_dump_file,
+		   "Updating elimination of equiv for reg %d\n", i);
+      }
+  FOR_EACH_BB (bb)
+    FOR_BB_INSNS_SAFE (bb, insn, temp)
+      {
+	if (bitmap_bit_p (&insns_with_changed_offsets, INSN_UID (insn)))
+	  process_insn_for_elimination (insn, final_p);
+      }
+  bitmap_clear (&insns_with_changed_offsets);
+}
Index: lra-lives.c
===================================================================
--- lra-lives.c	(revision 0)
+++ lra-lives.c	(working copy)
@@ -0,0 +1,1055 @@ 
+/* Build live ranges for pseudos.
+   Copyright (C) 2010, 2011, 2012
+   Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.	If not see
+<http://www.gnu.org/licenses/>.	 */
+
+
+/* This file contains code to build pseudo live ranges (analogous
+   structures are used in IRA, so read the comments about live ranges
+   there) and other info necessary for other passes to assign hard
+   registers to pseudos, coalesce the spilled pseudos, and assign
+   stack memory slots to spilled pseudos.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "hard-reg-set.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "insn-config.h"
+#include "recog.h"
+#include "output.h"
+#include "regs.h"
+#include "function.h"
+#include "expr.h"
+#include "basic-block.h"
+#include "except.h"
+#include "df.h"
+#include "ira.h"
+#include "sparseset.h"
+#include "lra-int.h"
+
+/* Program points are enumerated by numbers in the range
+   0..LRA_LIVE_MAX_POINT-1.  There are approximately two program
+   points for every insn.  Program points are places in the program
+   where liveness info can change.  In the most general case (there
+   are more complicated cases too), some program points correspond to
+   places where an input operand dies and others to places where
+   output operands are born.  */
+int lra_live_max_point;
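+
+/* For example, in the simplest case the backward scan in
+   process_bb_lives allocates two program points per insn: roughly,
+   one for the birth/death events on the output side of the insn and
+   one for those on the input side.  */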
+
+/* Arrays of size LRA_LIVE_MAX_POINT mapping a program point to the
+   pseudo live ranges with given start/finish point.  */
+lra_live_range_t *lra_start_point_ranges, *lra_finish_point_ranges;
+
+/* Accumulated execution frequency of all references for each hard
+   register.  */
+int lra_hard_reg_usage[FIRST_PSEUDO_REGISTER];
+
+/* A global flag whose true value says to build live ranges for all
+   pseudos; otherwise the live ranges are built only for pseudos that
+   did not get hard registers.  A true value also means building
+   copies and setting up hard register preferences.  The complete
+   info is necessary only for the assignment pass; it is not needed
+   for the coalescing and spill passes.  */
+static bool complete_info_p;
+
+/* Number of the current program point.	 */
+static int curr_point;
+
+/* Pseudos live at current point in the RTL scan.  */
+static sparseset pseudos_live;
+
+/* Pseudos probably living through calls and setjumps.  As setjmp is
+   a call too, if a bit in PSEUDOS_LIVE_THROUGH_SETJUMPS is set, then
+   the corresponding bit in PSEUDOS_LIVE_THROUGH_CALLS is set too.
+   This data is necessary for cases when only one subreg of a
+   multi-reg pseudo is set after a call.  So we decide the pseudo is
+   probably live when traversing the bb backward; we are sure it is
+   live when we see its use or definition.  */
+static sparseset pseudos_live_through_calls;
+static sparseset pseudos_live_through_setjumps;
+
+/* Set of hard regs (except eliminable ones) currently live.  */
+static HARD_REG_SET hard_regs_live;
+
+/* Sets of pseudos and hard registers that start living/dying in the
+   current insn.  These sets are used to update REG_DEAD and
+   REG_UNUSED notes in the insn.  */
+static sparseset start_living, start_dying;
+
+/* Set of pseudos and hard regs dead and unused in the current
+   insn.  */
+static sparseset unused_set, dead_set;
+
+/* Pool for pseudo live ranges.	 */
+static alloc_pool live_range_pool;
+
+/* Free live range LR.	*/
+static void
+free_live_range (lra_live_range_t lr)
+{
+  pool_free (live_range_pool, lr);
+}
+
+/* Free live range list LR.  */
+static void
+free_live_range_list (lra_live_range_t lr)
+{
+  lra_live_range_t next;
+
+  while (lr != NULL)
+    {
+      next = lr->next;
+      free_live_range (lr);
+      lr = next;
+    }
+}
+
+/* Create and return pseudo live range with given attributes.  */
+static lra_live_range_t
+create_live_range (int regno, int start, int finish, lra_live_range_t next)
+{
+  lra_live_range_t p;
+
+  p = (lra_live_range_t) pool_alloc (live_range_pool);
+  p->regno = regno;
+  p->start = start;
+  p->finish = finish;
+  p->next = next;
+  return p;
+}
+
+/* Copy live range R and return the result.  */
+static lra_live_range_t
+copy_live_range (lra_live_range_t r)
+{
+  lra_live_range_t p;
+
+  p = (lra_live_range_t) pool_alloc (live_range_pool);
+  *p = *r;
+  return p;
+}
+
+/* Copy live range list given by its head R and return the result.  */
+lra_live_range_t
+lra_copy_live_range_list (lra_live_range_t r)
+{
+  lra_live_range_t p, first, last;
+
+  if (r == NULL)
+    return NULL;
+  for (first = last = NULL; r != NULL; r = r->next)
+    {
+      p = copy_live_range (r);
+      if (first == NULL)
+	first = p;
+      else
+	last->next = p;
+      last = p;
+    }
+  return first;
+}
+
+/* Merge ranges R1 and R2 and return the result.  The function
+   maintains the order of ranges and tries to minimize the size of
+   the resulting range list.  */
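+/* For example, merging the range sets {[1..3], [7..9]} and {[2..5]}
+   yields {[1..5], [7..9]}, since [1..3] and [2..5] intersect;
+   adjacent ranges such as [1..3] and [4..9] are coalesced too (into
+   [1..9]).  */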
+lra_live_range_t 
+lra_merge_live_ranges (lra_live_range_t r1, lra_live_range_t r2)
+{
+  lra_live_range_t first, last, temp;
+
+  if (r1 == NULL)
+    return r2;
+  if (r2 == NULL)
+    return r1;
+  for (first = last = NULL; r1 != NULL && r2 != NULL;)
+    {
+      if (r1->start < r2->start)
+	{
+	  temp = r1;
+	  r1 = r2;
+	  r2 = temp;
+	}
+      if (r1->start <= r2->finish + 1)
+	{
+	  /* Intersected ranges: merge r1 and r2 into r1.  */
+	  r1->start = r2->start;
+	  if (r1->finish < r2->finish)
+	    r1->finish = r2->finish;
+	  temp = r2;
+	  r2 = r2->next;
+	  pool_free (live_range_pool, temp);
+	  if (r2 == NULL)
+	    {
+	      /* Try to merge with subsequent ranges in r1.  */
+	      r2 = r1->next;
+	      r1->next = NULL;
+	    }
+	}
+      else
+	{
+	  /* Add r1 to the result.  */
+	  if (first == NULL)
+	    first = last = r1;
+	  else
+	    {
+	      last->next = r1;
+	      last = r1;
+	    }
+	  r1 = r1->next;
+	  if (r1 == NULL)
+	    {
+	      /* Try to merge with subsequent ranges in r2.  */
+	      r1 = r2->next;
+	      r2->next = NULL;
+	    }
+	}
+    }
+  if (r1 != NULL)
+    {
+      if (first == NULL)
+	first = r1;
+      else
+	last->next = r1;
+      lra_assert (r1->next == NULL);
+    }
+  else if (r2 != NULL)
+    {
+      if (first == NULL)
+	first = r2;
+      else
+	last->next = r2;
+      lra_assert (r2->next == NULL);
+    }
+  else
+    {
+      lra_assert (last->next == NULL);
+    }
+  return first;
+}
+
+/* Return TRUE if live ranges R1 and R2 intersect.  */
+bool
+lra_intersected_live_ranges_p (lra_live_range_t r1, lra_live_range_t r2)
+{
+  /* Remember the live ranges are always kept ordered.	*/
+  while (r1 != NULL && r2 != NULL)
+    {
+      if (r1->start > r2->finish)
+	r1 = r1->next;
+      else if (r2->start > r1->finish)
+	r2 = r2->next;
+      else
+	return true;
+    }
+  return false;
+}
+
+/* Return TRUE if live range R1 is in R2.  */
+bool
+lra_live_range_in_p (lra_live_range_t r1, lra_live_range_t r2)
+{
+  /* Remember the live ranges are always kept ordered.	*/
+  while (r1 != NULL && r2 != NULL)
+    {
+      /* R1's element is in R2's element.  */
+      if (r2->start <= r1->start && r1->finish <= r2->finish)
+	r1 = r1->next;
+      /* Intersection: R1's start is in R2.  */
+      else if (r2->start <= r1->start && r1->start <= r2->finish)
+	return false;
+      /* Intersection: R1's finish is in R2.  */
+      else if (r2->start <= r1->finish && r1->finish <= r2->finish)
+	return false;
+      else if (r1->start > r2->finish)
+	return false; /* No covering R2's element for R1's one.	 */
+      else
+	r2 = r2->next;
+    }
+  return r1 == NULL;
+}
+
+/* Process the birth of hard register REGNO.  Update the living hard
+   regs, the conflict hard regs of the living pseudos, and
+   START_LIVING.  */
+static void
+make_hard_regno_born (int regno)
+{
+  unsigned int i;
+
+  lra_assert (regno < FIRST_PSEUDO_REGISTER);
+  if (TEST_HARD_REG_BIT (lra_no_alloc_regs, regno)
+      || TEST_HARD_REG_BIT (hard_regs_live, regno))
+    return;
+  SET_HARD_REG_BIT (hard_regs_live, regno);
+  sparseset_set_bit (start_living, regno);
+  EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i)
+    SET_HARD_REG_BIT (lra_reg_info[i].conflict_hard_regs, regno);
+}
+
+/* Process the death of hard register REGNO.  This updates
+   hard_regs_live and START_DYING.  */
+static void
+make_hard_regno_dead (int regno)
+{
+  if (TEST_HARD_REG_BIT (lra_no_alloc_regs, regno)
+      || ! TEST_HARD_REG_BIT (hard_regs_live, regno))
+    return;
+  lra_assert (regno < FIRST_PSEUDO_REGISTER);
+  sparseset_set_bit (start_dying, regno);
+  CLEAR_HARD_REG_BIT (hard_regs_live, regno);
+}
+
+/* Mark pseudo REGNO as currently living, update conflicting hard
+   registers of the pseudo and START_LIVING, and start a new live
+   range for the pseudo corresponding to REGNO if it is necessary.  */
+static void
+mark_pseudo_live (int regno)
+{
+  lra_live_range_t p;
+
+  lra_assert (regno >= FIRST_PSEUDO_REGISTER);
+  lra_assert (! sparseset_bit_p (pseudos_live, regno));
+  sparseset_set_bit (pseudos_live, regno);
+  IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs, hard_regs_live);
+  
+  if ((complete_info_p || lra_get_regno_hard_regno (regno) < 0)
+      && ((p = lra_reg_info[regno].live_ranges) == NULL
+	  || (p->finish != curr_point && p->finish + 1 != curr_point)))
+    lra_reg_info[regno].live_ranges
+      = create_live_range (regno, curr_point, -1, p);
+  sparseset_set_bit (start_living, regno);
+}
+
+/* Mark pseudo REGNO as currently not living and update START_DYING.
+   This finishes the current live range for the pseudo corresponding
+   to REGNO.  */
+static void
+mark_pseudo_dead (int regno)
+{
+  lra_live_range_t p;
+
+  lra_assert (regno >= FIRST_PSEUDO_REGISTER);
+  lra_assert (sparseset_bit_p (pseudos_live, regno));
+  sparseset_clear_bit (pseudos_live, regno);
+  sparseset_set_bit (start_dying, regno);
+  if (complete_info_p || lra_get_regno_hard_regno (regno) < 0)
+    {
+      p = lra_reg_info[regno].live_ranges;
+      lra_assert (p != NULL);
+      p->finish = curr_point;
+    }
+}
+
+/* Mark register REGNO (pseudo or hard register) in MODE as live.  */
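+/* For example, a multi-word value occupying several hard registers
+   (hard_regno_nregs[regno][mode] of them) has each of those
+   registers marked as born below.  */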
+static void
+mark_regno_live (int regno, enum machine_mode mode)
+{
+  int last;
+
+  if (regno < FIRST_PSEUDO_REGISTER)
+    {
+      for (last = regno + hard_regno_nregs[regno][mode];
+	   regno < last;
+	   regno++)
+	make_hard_regno_born (regno);
+    }
+  else if (! sparseset_bit_p (pseudos_live, regno))
+    mark_pseudo_live (regno);
+}
+
+
+/* Mark register REGNO in MODE as dead.	 */
+static void
+mark_regno_dead (int regno, enum machine_mode mode)
+{
+  int last;
+
+  if (regno < FIRST_PSEUDO_REGISTER)
+    {
+      for (last = regno + hard_regno_nregs[regno][mode];
+	   regno < last;
+	   regno++)
+	make_hard_regno_dead (regno);
+    }
+  else if (sparseset_bit_p (pseudos_live, regno))
+    mark_pseudo_dead (regno);
+}
+
+/* Insn currently scanned.  */
+static rtx curr_insn;
+/* The insn data.  */
+static lra_insn_recog_data_t curr_id;
+/* The insn static data.  */
+static struct lra_static_insn_data *curr_static_id;
+
+/* Return true when one of the predecessor edges of BB is marked with
+   EDGE_ABNORMAL_CALL or EDGE_EH.  */
+static bool
+bb_has_abnormal_call_pred (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+  
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
+	return true;
+    }
+  return false;
+}
+
+/* Vec containing execution frequencies of program points.  */
+static VEC(int,heap) *point_freq_vec;
+
+/* The start of the above vector elements.  */
+int *lra_point_freq;
+
+/* Increment the current program point to the next point which has
+   execution frequency FREQ.  */
+static void
+incr_curr_point (int freq)
+{
+  VEC_safe_push (int, heap, point_freq_vec, freq);
+  lra_point_freq = VEC_address (int, point_freq_vec);
+  curr_point++;
+}
+
+/* Update the preference of HARD_REGNO for pseudo REGNO by PROFIT.  */
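+/* For example (hypothetical numbers): if r200 already prefers hard
+   reg 1 with profit 300 and this function records hard reg 2 with
+   profit 500, hard reg 2 becomes the first preference and hard reg 1
+   the second.  */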
+void
+lra_setup_reload_pseudo_preferenced_hard_reg (int regno,
+					      int hard_regno, int profit)
+{
+  lra_assert (regno >= lra_constraint_new_regno_start);
+  if (lra_reg_info[regno].preferred_hard_regno1 == hard_regno)
+    lra_reg_info[regno].preferred_hard_regno_profit1 += profit;
+  else if (lra_reg_info[regno].preferred_hard_regno2 == hard_regno)
+    lra_reg_info[regno].preferred_hard_regno_profit2 += profit;
+  else if (lra_reg_info[regno].preferred_hard_regno1 < 0)
+    {
+      lra_reg_info[regno].preferred_hard_regno1 = hard_regno;
+      lra_reg_info[regno].preferred_hard_regno_profit1 = profit;
+    }
+  else if (lra_reg_info[regno].preferred_hard_regno2 < 0
+	   || profit > lra_reg_info[regno].preferred_hard_regno_profit2)
+    {
+      lra_reg_info[regno].preferred_hard_regno2 = hard_regno;
+      lra_reg_info[regno].preferred_hard_regno_profit2 = profit;
+    }
+  else
+    return;
+  /* Keep the 1st hard regno as more profitable.  */
+  if (lra_reg_info[regno].preferred_hard_regno1 >= 0
+      && lra_reg_info[regno].preferred_hard_regno2 >= 0
+      && (lra_reg_info[regno].preferred_hard_regno_profit2
+	  > lra_reg_info[regno].preferred_hard_regno_profit1))
+    {
+      int temp;
+
+      temp = lra_reg_info[regno].preferred_hard_regno1;
+      lra_reg_info[regno].preferred_hard_regno1
+	= lra_reg_info[regno].preferred_hard_regno2;
+      lra_reg_info[regno].preferred_hard_regno2 = temp;
+      temp = lra_reg_info[regno].preferred_hard_regno_profit1;
+      lra_reg_info[regno].preferred_hard_regno_profit1
+	= lra_reg_info[regno].preferred_hard_regno_profit2;
+      lra_reg_info[regno].preferred_hard_regno_profit2 = temp;
+    }
+  if (lra_dump_file != NULL)
+    {
+      if ((hard_regno = lra_reg_info[regno].preferred_hard_regno1) >= 0)
+	fprintf (lra_dump_file,
+		 "	Hard reg %d is preferable by r%d with profit %d\n",
+		 hard_regno, regno,
+		 lra_reg_info[regno].preferred_hard_regno_profit1);
+      if ((hard_regno = lra_reg_info[regno].preferred_hard_regno2) >= 0)
+	fprintf (lra_dump_file,
+		 "	Hard reg %d is preferable by r%d with profit %d\n",
+		 hard_regno, regno,
+		 lra_reg_info[regno].preferred_hard_regno_profit2);
+    }
+}
+
+/* Check whether REGNO lives through calls and setjumps; if so, set
+   up the conflict regs and clear the corresponding bits in
+   PSEUDOS_LIVE_THROUGH_CALLS and PSEUDOS_LIVE_THROUGH_SETJUMPS.  */
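+/* In effect, a pseudo live across a call conflicts with all
+   call-used hard registers, and a pseudo live across a setjmp (or
+   across any call when the function has nonlocal labels) conflicts
+   with every hard register, forcing it into memory.  */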
+static inline void
+check_pseudos_live_through_calls (int regno)
+{
+  if (! sparseset_bit_p (pseudos_live_through_calls, regno))
+    return;
+  sparseset_clear_bit (pseudos_live_through_calls, regno);
+  IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs,
+		    call_used_reg_set);
+#ifdef ENABLE_CHECKING
+  lra_reg_info[regno].call_p = true;
+#endif
+  if (! sparseset_bit_p (pseudos_live_through_setjumps, regno))
+    return;
+  sparseset_clear_bit (pseudos_live_through_setjumps, regno);
+  /* Don't allocate pseudos that cross setjmps or any call, if this
+     function receives a nonlocal goto.	 */
+  SET_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs);
+}
+
+/* Process the insns of basic block BB to update pseudo live ranges,
+   pseudo hard register conflicts, and insn notes.  We do this with a
+   backward scan of the BB's insns.  */
+static void
+process_bb_lives (basic_block bb)
+{
+  int i, regno, freq;
+  unsigned int j;
+  bitmap_iterator bi;
+  bitmap reg_live_out;
+  unsigned int px;
+  rtx link, *link_loc;
+
+  reg_live_out = DF_LR_OUT (bb);
+  sparseset_clear (pseudos_live);
+  sparseset_clear (pseudos_live_through_calls);
+  sparseset_clear (pseudos_live_through_setjumps);
+  REG_SET_TO_HARD_REG_SET (hard_regs_live, reg_live_out);
+  AND_COMPL_HARD_REG_SET (hard_regs_live, eliminable_regset);
+  AND_COMPL_HARD_REG_SET (hard_regs_live, lra_no_alloc_regs);
+  EXECUTE_IF_SET_IN_BITMAP (reg_live_out, 0, j, bi)
+    if (j >= FIRST_PSEUDO_REGISTER)
+      mark_pseudo_live (j);
+      
+  freq = REG_FREQ_FROM_BB (bb);
+
+  if (lra_dump_file != NULL)
+    fprintf (lra_dump_file, "  BB %d\n", bb->index);
+
+  /* Scan the code of this basic block, noting which pseudos and hard
+     regs are born or die.
+
+     Note that this loop treats uninitialized values as live until the
+     beginning of the block.  For example, if an instruction uses
+     (reg:DI foo), and only (subreg:SI (reg:DI foo) 0) is ever set,
+     FOO will remain live until the beginning of the block.  Likewise
+     if FOO is not set at all.	This is unnecessarily pessimistic, but
+     it probably doesn't matter much in practice.  */
+  FOR_BB_INSNS_REVERSE (bb, curr_insn)
+    {
+      bool call_p;
+      int dst_regno, src_regno;
+      rtx set;
+      struct lra_insn_reg *reg;
+
+      if (!NONDEBUG_INSN_P (curr_insn))
+	continue;
+      
+      curr_id = lra_get_insn_recog_data (curr_insn);
+      curr_static_id = curr_id->insn_static_data;
+      if (lra_dump_file != NULL)
+	fprintf (lra_dump_file, "   Insn %u: point = %d\n",
+		 INSN_UID (curr_insn), curr_point);
+
+      /* Update max ref width and hard reg usage.  */
+      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+	if (reg->regno >= FIRST_PSEUDO_REGISTER
+	    && (GET_MODE_SIZE (reg->biggest_mode)
+		> GET_MODE_SIZE (lra_reg_info[reg->regno].biggest_mode)))
+	  lra_reg_info[reg->regno].biggest_mode = reg->biggest_mode;
+	else if (reg->regno < FIRST_PSEUDO_REGISTER)
+	  lra_hard_reg_usage[reg->regno] += freq;
+
+      call_p = CALL_P (curr_insn);
+      if (complete_info_p
+	  && (set = single_set (curr_insn)) != NULL_RTX
+	  && REG_P (SET_DEST (set)) && REG_P (SET_SRC (set))
+	  /* Check that source regno does not conflict with
+	     destination regno to exclude most impossible
+	     preferences.  */
+	  && ((((src_regno = REGNO (SET_SRC (set))) >= FIRST_PSEUDO_REGISTER
+		&& ! sparseset_bit_p (pseudos_live, src_regno))
+	       || (src_regno < FIRST_PSEUDO_REGISTER
+		   && ! TEST_HARD_REG_BIT (hard_regs_live, src_regno)))
+	      /* It might be 'inheritance pseudo <- reload pseudo'.  */
+	      || (src_regno >= lra_constraint_new_regno_start
+		  && ((int) REGNO (SET_DEST (set))
+		      >= lra_constraint_new_regno_start))))
+	{
+	  int hard_regno = -1, regno = -1;
+
+	  dst_regno = REGNO (SET_DEST (set));
+	  if (dst_regno >= lra_constraint_new_regno_start
+	      && src_regno >= lra_constraint_new_regno_start)
+	    lra_create_copy (dst_regno, src_regno, freq);
+	  else if (dst_regno >= lra_constraint_new_regno_start)
+	    {
+	      if ((hard_regno = src_regno) >= FIRST_PSEUDO_REGISTER)
+		hard_regno = reg_renumber[src_regno];
+	      regno = dst_regno;
+	    }
+	  else if (src_regno >= lra_constraint_new_regno_start)
+	    {
+	      if ((hard_regno = dst_regno) >= FIRST_PSEUDO_REGISTER)
+		hard_regno = reg_renumber[dst_regno];
+	      regno = src_regno;
+	    }
+	  if (regno >= 0 && hard_regno >= 0)
+	    lra_setup_reload_pseudo_preferenced_hard_reg
+	      (regno, hard_regno, freq);
+	}
+
+      sparseset_clear (start_living);
+      /* Mark each defined value as live.  We need to do this for
+	 unused values because they still conflict with quantities
+	 that are live at the time of the definition.  */
+      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+	if (reg->type != OP_IN)
+	  {
+	    mark_regno_live (reg->regno, reg->biggest_mode);
+	    check_pseudos_live_through_calls (reg->regno);
+	  }
+
+      for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
+	if (reg->type != OP_IN)
+	  make_hard_regno_born (reg->regno);
+
+      sparseset_copy (unused_set, start_living);
+      
+      sparseset_clear (start_dying);
+
+      /* See which defined values die here.  */
+      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+	if (reg->type == OP_OUT && ! reg->early_clobber
+	    && (! reg->subreg_p
+		|| bitmap_bit_p (&lra_bound_pseudos, reg->regno)))
+	  mark_regno_dead (reg->regno, reg->biggest_mode);
+
+      for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
+	if (reg->type == OP_OUT && ! reg->early_clobber
+	    && (! reg->subreg_p
+		|| bitmap_bit_p (&lra_bound_pseudos, reg->regno)))
+	  make_hard_regno_dead (reg->regno);
+
+      if (call_p)
+	{
+	  sparseset_ior (pseudos_live_through_calls,
+			 pseudos_live_through_calls, pseudos_live);
+	  if (cfun->has_nonlocal_label
+	      || find_reg_note (curr_insn, REG_SETJMP,
+				NULL_RTX) != NULL_RTX)
+	    sparseset_ior (pseudos_live_through_setjumps,
+			   pseudos_live_through_setjumps, pseudos_live);
+	}
+      
+      incr_curr_point (freq);
+      
+      sparseset_clear (start_living);
+
+      /* Mark each used value as live.	*/
+      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+	if (reg->type == OP_IN)
+	  {
+	    mark_regno_live (reg->regno, reg->biggest_mode);
+	    check_pseudos_live_through_calls (reg->regno);
+	  }
+
+      for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
+	if (reg->type == OP_IN)
+	  make_hard_regno_born (reg->regno);
+
+      if (curr_id->arg_hard_regs != NULL)
+	/* Make argument hard registers live.  */
+	for (i = 0; (regno = curr_id->arg_hard_regs[i]) >= 0; i++)
+	  make_hard_regno_born (regno);
+
+      sparseset_and_compl (dead_set, start_living, start_dying);
+      
+      /* Mark early clobber outputs dead.  */
+      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+	if (reg->type == OP_OUT && reg->early_clobber && ! reg->subreg_p)
+	  mark_regno_dead (reg->regno, reg->biggest_mode);
+
+      for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
+	if (reg->type == OP_OUT && reg->early_clobber && ! reg->subreg_p)
+	  make_hard_regno_dead (reg->regno);
+
+      incr_curr_point (freq);
+
+      /* Update notes.	*/
+      for (link_loc = &REG_NOTES (curr_insn); (link = *link_loc) != NULL_RTX;)
+	{
+	  if (REG_NOTE_KIND (link) != REG_DEAD
+	      && REG_NOTE_KIND (link) != REG_UNUSED)
+	    ;
+	  else if (REG_P (XEXP (link, 0)))
+	    {
+	      regno = REGNO (XEXP (link, 0));
+	      if ((REG_NOTE_KIND (link) == REG_DEAD
+		   && ! sparseset_bit_p (dead_set, regno))
+		  || (REG_NOTE_KIND (link) == REG_UNUSED
+		      && ! sparseset_bit_p (unused_set, regno)))
+		{
+		  *link_loc = XEXP (link, 1);
+		  continue;
+		}
+	      if (REG_NOTE_KIND (link) == REG_DEAD)
+		sparseset_clear_bit (dead_set, regno);
+	      else if (REG_NOTE_KIND (link) == REG_UNUSED)
+		sparseset_clear_bit (unused_set, regno);
+	    }
+	  link_loc = &XEXP (link, 1);  
+	}
+      EXECUTE_IF_SET_IN_SPARSESET (dead_set, j)
+	add_reg_note (curr_insn, REG_DEAD, regno_reg_rtx[j]);
+      EXECUTE_IF_SET_IN_SPARSESET (unused_set, j)
+	add_reg_note (curr_insn, REG_UNUSED, regno_reg_rtx[j]);
+    }
+  
+#ifdef EH_RETURN_DATA_REGNO
+  if (bb_has_eh_pred (bb))
+    for (j = 0; ; ++j)
+      {
+	unsigned int regno = EH_RETURN_DATA_REGNO (j);
+
+	if (regno == INVALID_REGNUM)
+	  break;
+	make_hard_regno_born (regno);
+      }
+#endif
+  
+  /* Pseudos can't go in stack regs at the start of a basic block that
+     is reached by an abnormal edge. Likewise for call clobbered regs,
+     because caller-save, fixup_abnormal_edges and possibly the table
+     driven EH machinery are not quite ready to handle such pseudos
+     live across such edges.  */
+  if (bb_has_abnormal_pred (bb))
+    {
+#ifdef STACK_REGS
+      EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, px)
+	lra_reg_info[px].no_stack_p = true;
+      for (px = FIRST_STACK_REG; px <= LAST_STACK_REG; px++)
+	make_hard_regno_born (px);
+#endif
+      /* No need to record conflicts for call clobbered regs if we
+	 have nonlocal labels around, as we don't ever try to
+	 allocate such regs in this case.  */
+      if (!cfun->has_nonlocal_label && bb_has_abnormal_call_pred (bb))
+	for (px = 0; px < FIRST_PSEUDO_REGISTER; px++)
+	  if (call_used_regs[px])
+	    make_hard_regno_born (px);
+    }
+  
+  EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i)
+    mark_pseudo_dead (i);
+
+  EXECUTE_IF_SET_IN_SPARSESET (pseudos_live_through_calls, i)
+    if (bitmap_bit_p (DF_LR_IN (bb), i))
+      check_pseudos_live_through_calls (i);
+  
+  incr_curr_point (freq);
+}
+
+/* Create and set up LRA_START_POINT_RANGES and
+   LRA_FINISH_POINT_RANGES.  */
+static void
+create_start_finish_chains (void)
+{
+  int i, max_regno;
+  lra_live_range_t r;
+
+  lra_start_point_ranges
+    = (lra_live_range_t *) xmalloc (lra_live_max_point
+				    * sizeof (lra_live_range_t));
+  memset (lra_start_point_ranges, 0,
+	  lra_live_max_point * sizeof (lra_live_range_t));
+  lra_finish_point_ranges
+    = (lra_live_range_t *) xmalloc (lra_live_max_point
+				    * sizeof (lra_live_range_t));
+  memset (lra_finish_point_ranges, 0,
+	  lra_live_max_point * sizeof (lra_live_range_t));
+  max_regno = max_reg_num ();
+  for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
+    {
+      for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
+	{
+	  r->start_next = lra_start_point_ranges[r->start];
+	  lra_start_point_ranges[r->start] = r;
+	  r->finish_next = lra_finish_point_ranges[r->finish];
+	  lra_finish_point_ranges[r->finish] = r;
+	}
+    }
+}
+
+/* Rebuild LRA_START_POINT_RANGES and LRA_FINISH_POINT_RANGES after
+   new live ranges and program points were added as a result of new
+   insn generation.  */
+static void
+rebuild_start_finish_chains (void)
+{
+  free (lra_finish_point_ranges);
+  free (lra_start_point_ranges);
+  create_start_finish_chains ();
+}
+
+/* Compress pseudo live ranges by removing program points where
+   nothing happens.  The complexity of many LRA algorithms is a
+   linear function of the number of program points, so we minimize
+   that number here to speed up the code.  */
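+/* For example, a run of consecutive points containing only births
+   (with no deaths in between) can be represented by a single point;
+   the MAP array built below records the renumbering, which is then
+   applied to every live range.  */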
+static void
+remove_some_program_points_and_update_live_ranges (void)
+{
+  unsigned i;
+  int n, max_regno;
+  int *map;
+  lra_live_range_t r, prev_r, next_r;
+  sbitmap born_or_dead, born, dead;
+  sbitmap_iterator sbi;
+  bool born_p, dead_p, prev_born_p, prev_dead_p;
+  
+  born = sbitmap_alloc (lra_live_max_point);
+  dead = sbitmap_alloc (lra_live_max_point);
+  sbitmap_zero (born);
+  sbitmap_zero (dead);
+  max_regno = max_reg_num ();
+  for (i = FIRST_PSEUDO_REGISTER; i < (unsigned) max_regno; i++)
+    {
+      for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
+	{
+	  lra_assert (r->start <= r->finish);
+	  SET_BIT (born, r->start);
+	  SET_BIT (dead, r->finish);
+	}
+    }
+  born_or_dead = sbitmap_alloc (lra_live_max_point);
+  sbitmap_a_or_b (born_or_dead, born, dead);
+  map = (int *) xmalloc (sizeof (int) * lra_live_max_point);
+  n = -1;
+  prev_born_p = prev_dead_p = false;
+  EXECUTE_IF_SET_IN_SBITMAP (born_or_dead, 0, i, sbi)
+    {
+      born_p = TEST_BIT (born, i);
+      dead_p = TEST_BIT (dead, i);
+      if ((prev_born_p && ! prev_dead_p && born_p && ! dead_p)
+	  || (prev_dead_p && ! prev_born_p && dead_p && ! born_p))
+	{
+	  map[i] = n;
+	  lra_point_freq[n] = MAX (lra_point_freq[n], lra_point_freq[i]);
+	}
+      else
+	{
+	  map[i] = ++n;
+	  lra_point_freq[n] = lra_point_freq[i];
+	}
+      prev_born_p = born_p;
+      prev_dead_p = dead_p;
+    }
+  sbitmap_free (born_or_dead);
+  sbitmap_free (born);
+  sbitmap_free (dead);
+  n++;
+  if (lra_dump_file != NULL)
+    fprintf (lra_dump_file, "Compressing live ranges: from %d to %d - %d%%\n",
+	     lra_live_max_point, n, 100 * n / lra_live_max_point);
+  lra_live_max_point = n;
+  for (i = FIRST_PSEUDO_REGISTER; i < (unsigned) max_regno; i++)
+    {
+      for (prev_r = NULL, r = lra_reg_info[i].live_ranges;
+	   r != NULL;
+	   r = next_r)
+	{
+	  next_r = r->next;
+	  r->start = map[r->start];
+	  r->finish = map[r->finish];
+	  if (prev_r == NULL || prev_r->start > r->finish + 1)
+	    {
+	      prev_r = r;
+	      continue;
+	    }
+	  prev_r->start = r->start;
+	  prev_r->next = next_r;
+	  free_live_range (r);
+	}
+    }
+  free (map);
+}
+
+/* Print live ranges R to file F.  */
+void
+lra_print_live_range_list (FILE *f, lra_live_range_t r)
+{
+  for (; r != NULL; r = r->next)
+    fprintf (f, " [%d..%d]", r->start, r->finish);
+  fprintf (f, "\n");
+}
+
+/* Print live ranges R to stderr.  */
+void
+lra_debug_live_range_list (lra_live_range_t r)
+{
+  lra_print_live_range_list (stderr, r);
+}
+
+/* Print live ranges of pseudo REGNO to file F.	 */
+static void
+print_pseudo_live_ranges (FILE *f, int regno)
+{
+  if (lra_reg_info[regno].live_ranges == NULL)
+    return;
+  fprintf (f, " r%d:", regno);
+  lra_print_live_range_list (f, lra_reg_info[regno].live_ranges);
+}
+
+/* Print live ranges of pseudo REGNO to stderr.	 */
+void
+lra_debug_pseudo_live_ranges (int regno)
+{
+  print_pseudo_live_ranges (stderr, regno);
+}
+
+/* Print live ranges of all pseudos to file F.	*/
+static void
+print_live_ranges (FILE *f)
+{
+  int i, max_regno;
+
+  max_regno = max_reg_num ();
+  for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
+    print_pseudo_live_ranges (f, i);
+}
+
+/* Print live ranges of all pseudos to stderr.	*/
+void
+lra_debug_live_ranges (void)
+{
+  print_live_ranges (stderr);
+}
+
+/* Compress pseudo live ranges.	 */
+static void
+compress_live_ranges (void)
+{
+  remove_some_program_points_and_update_live_ranges ();
+  rebuild_start_finish_chains ();
+  if (lra_dump_file != NULL)
+    {
+      fprintf (lra_dump_file, "Ranges after the compression:\n");
+      print_live_ranges (lra_dump_file);
+    }
+}
+
+/* The number of the current live range pass.  */
+int lra_live_range_iter;
+
+/* The main entry function.  Create live ranges only for memory
+   pseudos (or for all pseudos if ALL_P) and set up
+   CONFLICT_HARD_REGS for the pseudos.  */
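+/* A sketch of the expected use (the real driver lives elsewhere in
+   LRA): call lra_create_live_ranges (true) to get the complete info
+   needed by the assignment pass, lra_create_live_ranges (false) when
+   only the ranges of pseudos without hard registers are needed (as
+   in the coalescing and spill passes), and lra_clear_live_ranges ()
+   when the info is no longer needed.  */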
+void
+lra_create_live_ranges (bool all_p)
+{
+  basic_block bb;
+  int i, hard_regno, max_regno = max_reg_num ();
+
+  complete_info_p = all_p;
+  if (lra_dump_file != NULL)
+    fprintf (lra_dump_file,
+	     "\n********** Pseudo live ranges #%d: **********\n\n",
+	     ++lra_live_range_iter);
+  memset (lra_hard_reg_usage, 0, sizeof (lra_hard_reg_usage));
+  for (i = 0; i < max_regno; i++)
+    {
+      lra_reg_info[i].live_ranges = NULL;
+      CLEAR_HARD_REG_SET (lra_reg_info[i].conflict_hard_regs);
+      lra_reg_info[i].preferred_hard_regno1 = -1;
+      lra_reg_info[i].preferred_hard_regno2 = -1;
+      lra_reg_info[i].preferred_hard_regno_profit1 = 0;
+      lra_reg_info[i].preferred_hard_regno_profit2 = 0;
+#ifdef STACK_REGS
+      lra_reg_info[i].no_stack_p = false;
+#endif
+      if (regno_reg_rtx[i] != NULL_RTX)
+	lra_reg_info[i].biggest_mode = GET_MODE (regno_reg_rtx[i]);
+      else
+	lra_reg_info[i].biggest_mode = VOIDmode;
+#ifdef ENABLE_CHECKING
+      lra_reg_info[i].call_p = false;
+#endif
+      if (i >= FIRST_PSEUDO_REGISTER
+	  && lra_reg_info[i].nrefs != 0 && (hard_regno = reg_renumber[i]) >= 0)
+	lra_hard_reg_usage[hard_regno] += lra_reg_info[i].freq;
+    }
+  lra_free_copies ();
+  pseudos_live = sparseset_alloc (max_regno);
+  pseudos_live_through_calls = sparseset_alloc (max_regno);
+  pseudos_live_through_setjumps = sparseset_alloc (max_regno);
+  start_living = sparseset_alloc (max_regno);
+  start_dying = sparseset_alloc (max_regno);
+  dead_set = sparseset_alloc (max_regno);
+  unused_set = sparseset_alloc (max_regno);
+  curr_point = 0;
+  point_freq_vec = VEC_alloc (int, heap, get_max_uid () * 2);
+  lra_point_freq = VEC_address (int, point_freq_vec);
+  FOR_EACH_BB (bb)
+    process_bb_lives (bb);
+  lra_live_max_point = curr_point;
+  create_start_finish_chains ();
+  if (lra_dump_file != NULL)
+    print_live_ranges (lra_dump_file);
+  /* Clean up.	*/
+  sparseset_free (unused_set);
+  sparseset_free (dead_set);
+  sparseset_free (start_dying);
+  sparseset_free (start_living);
+  sparseset_free (pseudos_live_through_calls);
+  sparseset_free (pseudos_live_through_setjumps);
+  sparseset_free (pseudos_live);
+  compress_live_ranges ();
+}
+
+/* Finish all live ranges.  */
+void
+lra_clear_live_ranges (void)
+{
+  int i;
+
+  if (lra_finish_point_ranges != NULL)
+    {
+      free (lra_finish_point_ranges);
+      lra_finish_point_ranges = NULL;
+    }
+  if (lra_start_point_ranges != NULL)
+    {
+      free (lra_start_point_ranges);
+      lra_start_point_ranges = NULL;
+    }
+  for (i = 0; i < max_reg_num (); i++)
+    free_live_range_list (lra_reg_info[i].live_ranges);
+  VEC_free (int, heap, point_freq_vec);
+}
+
+/* Initialize live ranges data once per function.  */
+void
+lra_live_ranges_init (void)
+{
+  live_range_pool = create_alloc_pool ("live ranges",
+				       sizeof (struct lra_live_range), 100);
+}
+
+/* Finish live ranges data once per function.  */
+void
+lra_live_ranges_finish (void)
+{
+  free_alloc_pool (live_range_pool);
+}