diff mbox

RFC: LRA for x86/x86-64 [7/9] -- continuation

Message ID 5064DA43.90803@redhat.com
State New
Headers show

Commit Message

Vladimir Makarov Sept. 27, 2012, 10:59 p.m. UTC
It is the 2nd part of the previous patch.

Comments

Richard Sandiford Oct. 3, 2012, 11:11 a.m. UTC | #1
Hi Vlad,

Some comments on lra-spills.c and lra-coalesce.c.

> +   The pass creates necessary stack slots and assign spilled pseudos
> +   to the stack slots in following way:

s/assign/assigns/

> +   (or insn memory constraints) might be not satisfied any more.

s/might be not/might not be/

> +   For some targets, the pass can spill some pseudos into hard
> +   registers of different class (usually into vector registers)
> +   instead of spilling them into memory if it is possible and
> +   profitable.	Spilling GENERAL_REGS pseudo into SSE registers for
> +   modern Intel x86/x86-64 processors is an example of such
> +   optimization.  And this is actually recommended by Intel
> +   optimization guide.

Maybe mention core i7 specifically?  "Modern" is a bit dangerous
in code that'll live a long time.

> +/* The structure describes a stack slot which can be used for several
> +   spilled pseudos.  */
> +struct slot
> +{

Looks like this describes "a register or stack slot" given the hard_regno case.

> +/* Array containing info about the stack slots.	 The array element is
> +   indexed by the stack slot number in the range [0..slost_num).  */

Typo: slots_num

> +  /* Each pseudo has an inherent size which comes from its own mode,
> +     and a total size which provides room for paradoxical subregs
> +     which refer to the pseudo reg in wider modes.
> +     
> +     We can use a slot already allocated if it provides both enough
> +     inherent space and enough total space.  Otherwise, we allocate a
> +     new slot, making sure that it has no less inherent space, and no
> +     less total space, then the previous slot.	*/

The second part of the comment seems a bit misplaced, since the following
code doesn't reuse stack slots.  This is done elsewhere instead.
Maybe the first part would be better above the inherent_size assignment.

> +  /* If we have any adjustment to make, or if the stack slot is the
> +     wrong mode, make a new stack slot.	 */
> +  x = adjust_address_nv (x, GET_MODE (regno_reg_rtx[i]), adjust);

We don't make a new slot here.

> +/* Sort pseudos according their slot numbers putting ones with smaller
> +   numbers first, or last when the frame pointer is not needed.	 So
> +   pseudos with the first slot will be finally addressed with smaller
> +   address displacement.  */
> +static int
> +pseudo_reg_slot_compare (const void *v1p, const void *v2p)
> +{
> +  const int regno1 = *(const int *) v1p;
> +  const int regno2 = *(const int *) v2p;
> +  int diff, slot_num1, slot_num2;
> +  int total_size1, total_size2;
> +
> +  slot_num1 = pseudo_slots[regno1].slot_num;
> +  slot_num2 = pseudo_slots[regno2].slot_num;
> +  if ((diff = slot_num1 - slot_num2) != 0)
> +    return (frame_pointer_needed
> +	    || !FRAME_GROWS_DOWNWARD == STACK_GROWS_DOWNWARD ? diff : -diff);

The comment doesn't quite describe the condition.  Maybe:

/* Sort pseudos according to their slots, putting the slots in the order
   that they should be allocated.  Slots with lower numbers have the highest
   priority and should get the smallest displacement from the stack or
   frame pointer (whichever is being used).

   The first allocated slot is always closest to the frame pointer,
   so prefer lower slot numbers when frame_pointer_needed.  If the stack
   and frame grow in the same direction, then the first allocated slot is
   always closest to the initial stack pointer and furthest away from the
   final stack pointer, so allocate higher numbers first when using the
   stack pointer in that case.  The reverse is true if the stack and
   frame grow in opposite directions.  */

> +  total_size1 = MAX (PSEUDO_REGNO_BYTES (regno1),
> +		     GET_MODE_SIZE (lra_reg_info[regno1].biggest_mode));
> +  total_size2 = MAX (PSEUDO_REGNO_BYTES (regno2),
> +		     GET_MODE_SIZE (lra_reg_info[regno2].biggest_mode));
> +  if ((diff = total_size2 - total_size1) != 0)
> +    return diff;

I think this could do with a bit more commentary.  When is biggest_mode
ever smaller than PSEUDO_REGNO_BYTES?  Is that for pseudos that are only
ever referenced as lowpart subregs?  If so, why does PSEUDO_REGNO_BYTES
matter for those registers here but not when calculating biggest_mode?

> +/* Assign spill hard registers to N pseudos in PSEUDO_REGNOS.  Put the
> +   pseudos which did not get a spill hard register at the beginning of
> +   array PSEUDO_REGNOS.	 Return the number of such pseudos.  */

It'd be worth saying that PSEUDO_REGNOS is sorted in order of highest
frequency first.

> +  bitmap set_jump_crosses = regstat_get_setjmp_crosses ();

I notice you use "set_jump" here and "setjump" in parts of 7a.patch.
Probably better to use setjmp across the board.

> +  /* Hard registers which can not be used for any purpose at given
> +     program point because they are unallocatable or already allocated
> +     for other pseudos.	 */ 
> +  HARD_REG_SET *reserved_hard_regs;
> +
> +  if (! lra_reg_spill_p)
> +    return n;
> +  /* Set up reserved hard regs for every program point.	 */
> +  reserved_hard_regs = (HARD_REG_SET *) xmalloc (sizeof (HARD_REG_SET)
> +						 * lra_live_max_point);
> +  for (p = 0; p < lra_live_max_point; p++)
> +    COPY_HARD_REG_SET (reserved_hard_regs[p], lra_no_alloc_regs);
> +  for (i = FIRST_PSEUDO_REGISTER; i < regs_num; i++)
> +    if (lra_reg_info[i].nrefs != 0
> +	&& (hard_regno = lra_get_regno_hard_regno (i)) >= 0)
> +      for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
> +	for (p = r->start; p <= r->finish; p++)
> +	  lra_add_hard_reg_set (hard_regno, lra_reg_info[i].biggest_mode,
> +				&reserved_hard_regs[p]);

Since compilation time seems to be all the rage, I wonder if it would be
quicker to have one live range list per hard register.  Then:

> +      for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
> +	for (p = r->start; p <= r->finish; p++)
> +	  IOR_HARD_REG_SET (conflict_hard_regs, reserved_hard_regs[p]);

would just be checking for live range intersection and:

> +      /* Update reserved_hard_regs.  */
> +      for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
> +	for (p = r->start; p <= r->finish; p++)
> +	  lra_add_hard_reg_set (hard_regno, lra_reg_info[regno].biggest_mode,
> +				&reserved_hard_regs[p]);

would again be a merge.

Just an idea, not a merge requirement.  If you've already tried this and
found it to be worse, that might be worth a comment.

> +      first = pseudo_slots[regno].first = &pseudo_slots[slots[slot_num].regno];
> +      pseudo_slots[regno].next = pseudo_slots[slots[slot_num].regno].next;
> +      first->next = &pseudo_slots[regno];

Very minor nit, but I think this would be easier to read if the middle
line also used "first->next".

> +/* Assign spill hard registers to N pseudos in PSEUDO_REGNOS.  Put the
> +   pseudos which did not get a spill hard register at the beginning of
> +   array PSEUDO_REGNOS.	 Return the number of such pseudos.  */

Here too I think it's worth mentioning that PSEUDO_REGNOS is sorted
with highest frequency first.

> +/* Recursively process LOC in INSN and change spilled pseudos to the
> +   corresponding memory or spilled hard reg.  Ignore spilled pseudos
> +   created from the scratches.	*/
> +static bool
> +remove_pseudos (rtx *loc, rtx insn)

The return value is now ignored -- we know in advance which insns need
changing -- so this could be simplified.

> +/* Change spilled pseudos into memory or spill hard regs.  The
> +   function put changed insns on the constraint stack (these insns
> +   will be considered on the next constraint pass).  The changed insns
> +   are all insns in which pseudos were changed.	 */

s/The function put/Put/

> +/* Set up REMOVED_PSEUDOS_BITMAP and USED_PSEUDOS_BITMAP, and update
> +   LR_BITMAP (a BB live info bitmap).  */
> +static void
> +update_live_info (bitmap lr_bitmap)
> +{
> +  unsigned int j;
> +  bitmap_iterator bi;
> +
> +  bitmap_clear (&removed_pseudos_bitmap);
> +  bitmap_clear (&used_pseudos_bitmap);
> +  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
> +			    FIRST_PSEUDO_REGISTER, j, bi)
> +    {
> +      bitmap_set_bit (&removed_pseudos_bitmap, j);
> +      bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
> +    }
> +  if (! bitmap_empty_p (&removed_pseudos_bitmap))
> +    {
> +      bitmap_and_compl_into (lr_bitmap, &removed_pseudos_bitmap);
> +      bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
> +    }
> +}

Might be wrong, but it looks like nothing really uses removed_pseudos_bitmap
outside this function.  I think this could simply be:

/* Set up REMOVED_PSEUDOS_BITMAP and USED_PSEUDOS_BITMAP, and update
   LR_BITMAP (a BB live info bitmap).  */
static void
update_live_info (bitmap lr_bitmap)
{
  unsigned int j;
  bitmap_iterator bi;

  bitmap_clear (&used_pseudos_bitmap);
  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
			    FIRST_PSEUDO_REGISTER, j, bi)
    bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
  if (! bitmap_empty_p (&used_pseudos_bitmap))
    {
      bitmap_and_compl_into (lr_bitmap, &coalesced_pseudos_bitmap);
      bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
    }
}

> +	    && mem_move_p (sregno, dregno)
> +	    /* Don't coalesce inheritance pseudos because spilled
> +	       inheritance pseudos will be removed in subsequent 'undo
> +	       inheritance' pass.  */
> +	    && lra_reg_info[sregno].restore_regno < 0
> +	    && lra_reg_info[dregno].restore_regno < 0
> +	    /* We undo splits for spilled pseudos whose live ranges
> +	       were split.  So don't coalesce them, it is not
> +	       necessary and the undo transformations would be
> +	       wrong.  */
> +	    && ! bitmap_bit_p (&split_origin_bitmap, sregno)
> +	    && ! bitmap_bit_p (&split_origin_bitmap, dregno)
> +	    && ! side_effects_p (set)
> +	    /* Don't coalesces bound pseudos.  Bound pseudos has own
> +	       rules for finding live ranges.  It is hard to maintain
> +	       this info with coalescing and it is not worth to do
> +	       it.  */
> +	    && ! bitmap_bit_p (&lra_bound_pseudos, sregno)
> +	    && ! bitmap_bit_p (&lra_bound_pseudos, dregno)
> +	    /* We don't want to coalesce regnos with equivalences,
> +	       at least without updating this info.  */
> +	    && ira_reg_equiv[sregno].constant == NULL_RTX
> +	    && ira_reg_equiv[sregno].memory == NULL_RTX
> +	    && ira_reg_equiv[sregno].invariant == NULL_RTX
> +	    && ira_reg_equiv[dregno].constant == NULL_RTX
> +	    && ira_reg_equiv[dregno].memory == NULL_RTX
> +	    && ira_reg_equiv[dregno].invariant == NULL_RTX

Probably personal preference, but I think this would be easier
to read as:

	    && coalescable_reg_p (sregno)
	    && coalescable_reg_p (dregno)
	    && !side_effects_p (set)

with coalescable_reg_p checking reg_renumber (from mem_move_p)
and the open-coded stuff in the quote above.

> +  for (; mv_num != 0;)
> +    {
> +      for (i = 0; i < mv_num; i++)
> +	{
> +	  mv = sorted_moves[i];
> +	  set = single_set (mv);
> +	  lra_assert (set != NULL && REG_P (SET_SRC (set))
> +		      && REG_P (SET_DEST (set)));
> +	  sregno = REGNO (SET_SRC (set));
> +	  dregno = REGNO (SET_DEST (set));
> +	  if (! lra_intersected_live_ranges_p
> +		(lra_reg_info[first_coalesced_pseudo[sregno]].live_ranges,
> +		 lra_reg_info[first_coalesced_pseudo[dregno]].live_ranges))
> +	    {
> +	      coalesced_moves++;
> +	      if (lra_dump_file != NULL)
> +		fprintf
> +		  (lra_dump_file,
> +		   "	  Coalescing move %i:r%d(%d)-r%d(%d) (freq=%d)\n",
> +		   INSN_UID (mv), sregno, ORIGINAL_REGNO (SET_SRC (set)),
> +		   dregno, ORIGINAL_REGNO (SET_DEST (set)),
> +		   BLOCK_FOR_INSN (mv)->frequency);
> +	      bitmap_ior_into (&involved_insns_bitmap,
> +			       &lra_reg_info[sregno].insn_bitmap);
> +	      bitmap_ior_into (&involved_insns_bitmap,
> +			       &lra_reg_info[dregno].insn_bitmap);
> +	      merge_pseudos (sregno, dregno);
> +	      i++;
> +	      break;
> +	    }
> +	}
> +      /* Collect the rest of copies.  */
> +      for (n = 0; i < mv_num; i++)
> +	{
> +	  mv = sorted_moves[i];
> +	  set = single_set (mv);
> +	  lra_assert (set != NULL && REG_P (SET_SRC (set))
> +		      && REG_P (SET_DEST (set)));
> +	  sregno = REGNO (SET_SRC (set));
> +	  dregno = REGNO (SET_DEST (set));
> +	  if (first_coalesced_pseudo[sregno] != first_coalesced_pseudo[dregno])
> +	    sorted_moves[n++] = mv;
> +	  else if (lra_dump_file != NULL)
> +	    {
> +	      coalesced_moves++;
> +	      fprintf
> +		(lra_dump_file, "      Coalescing move %i:r%d-r%d (freq=%d)\n",
> +		 INSN_UID (mv), sregno, dregno,
> +		 BLOCK_FOR_INSN (mv)->frequency);
> +	    }
> +	}
> +      mv_num = n;

I'm probably being dense here, sorry, but why the nested loops?
Why can't we have one loop along the lines of:

      for (i = 0; i < mv_num; i++)
	{
	  mv = sorted_moves[i];
	  set = single_set (mv);
	  lra_assert (set != NULL && REG_P (SET_SRC (set))
		      && REG_P (SET_DEST (set)));
	  sregno = REGNO (SET_SRC (set));
	  dregno = REGNO (SET_DEST (set));
	  if (first_coalesced_pseudo[sregno] == first_coalesced_pseudo[dregno])
	    {
	      coalesced_moves++;
	      fprintf
		(lra_dump_file, "      Coalescing move %i:r%d-r%d (freq=%d)\n",
		 INSN_UID (mv), sregno, dregno,
		 BLOCK_FOR_INSN (mv)->frequency);
	      /* We updated involved_insns_bitmap when doing the mrege */
	    }
	  else if (!(lra_intersected_live_ranges_p
		     (lra_reg_info[first_coalesced_pseudo[sregno]].live_ranges,
		      lra_reg_info[first_coalesced_pseudo[dregno]].live_ranges)))
	    {
	      coalesced_moves++;
	      if (lra_dump_file != NULL)
		fprintf
		  (lra_dump_file,
		   "	  Coalescing move %i:r%d(%d)-r%d(%d) (freq=%d)\n",
		   INSN_UID (mv), sregno, ORIGINAL_REGNO (SET_SRC (set)),
		   dregno, ORIGINAL_REGNO (SET_DEST (set)),
		   BLOCK_FOR_INSN (mv)->frequency);
	      bitmap_ior_into (&involved_insns_bitmap,
			       &lra_reg_info[sregno].insn_bitmap);
	      bitmap_ior_into (&involved_insns_bitmap,
			       &lra_reg_info[dregno].insn_bitmap);
	      merge_pseudos (sregno, dregno);
	    }
	}

(completely untested)

> +	    if ((set = single_set (insn)) != NULL_RTX
> +		&& REG_P (SET_DEST (set)) && REG_P (SET_SRC (set))
> +		&& REGNO (SET_SRC (set)) == REGNO (SET_DEST (set))
> +		&& ! side_effects_p (set))

Maybe use set_noop_p here?

Richard
Richard Sandiford Oct. 4, 2012, 3:50 p.m. UTC | #2
Hi Vlad,

This message is for lra-assigns.c.  Sorry for the piecemeal reviews,
never sure when I'll get time...

> +/* This file contains a pass mostly assigning hard registers to reload
> +   pseudos.  There is no any RTL code transformation on this pass.

Maybe:

/* This file's main objective is to assign hard registers to reload pseudos.
   It also tries to allocate hard registers to other pseudos, but at a lower
   priority than the reload pseudos.  The pass does not transform the RTL.

if that's accurate.

> +   Reload pseudos get what they need (usually) hard registers in
> +   anyway possibly by spilling non-reload pseudos and by assignment
> +   reload pseudos with smallest number of available hard registers
> +   first.
> +
> +   If reload pseudos can get hard registers only through spilling
> +   other pseudos, we choose what pseudos to spill taking into account
> +   how given reload pseudo benefits and also how other reload pseudos
> +   not assigned yet benefit too (see function spill_for).

Maybe:

   We must allocate a hard register to every reload pseudo.  We try to
   increase the chances of finding a viable allocation by assigning the
   pseudos in order of fewest available hard registers first.  If we
   still fail to find a hard register, we spill other (non-reload)
   pseudos in order to make room.

   assign_hard_regno_for allocates registers without spilling.
   spill_for does the same with spilling.  Both functions use
   a cost model to determine the most profitable choice of
   hard and spill registers.

> +   Non-reload pseudos can get hard registers too if it is possible and
> +   improves the code.  It might be possible because of spilling
> +   non-reload pseudos on given pass.

Maybe:

   Once we have finished allocating reload pseudos, we also try to
   assign registers to other (non-reload) pseudos.  This is useful
   if hard registers were freed up by the spilling just described.

> +   We try to assign hard registers processing pseudos by threads.  The
> +   thread contains reload and inheritance pseudos connected by copies
> +   (move insns).  It improves the chance to get the same hard register
> +   to pseudos in the thread and, as the result, to remove some move
> +   insns.

Maybe:

   We try to assign hard registers by collecting pseudos into threads.
   These threads contain reload and inheritance pseudos that are connected
   by copies (move insns).  Doing this improves the chances of pseudos
   in the thread getting the same hard register and, as a result,
   of allowing some move insns to be deleted.

> +   When we assign hard register to a pseudo, we decrease the cost of
> +   the hard registers for corresponding pseudos connected by copies.

Maybe:

   When we assign a hard register to a pseudo, we decrease the cost of
   using the same hard register for pseudos that are connected by copies.

> +   If two hard registers are equally good for assigning the pseudo
> +   with hard register cost point of view, we prefer a hard register in
> +   smaller register bank.  By default, there is only one register
> +   bank.  A target can define register banks by hook
> +   register_bank. For example, x86-64 has a few register banks: hard
> +   regs with and without REX prefixes are in different banks.  It
> +   permits to generate smaller code as insns without REX prefix are
> +   shorter.

Maybe:

   If two hard registers have the same frequency-derived cost,
   we prefer hard registers in lower register banks.  The mapping
   of registers to banks is controlled by the register_bank target hook.
   For example, x86-64 has a few register banks: hard registers with and
   without REX prefixes are in different banks.  This permits us
   to generate smaller code as insns without REX prefixes are shorter.

although this might change if the name of the hook changes.

> +/* Info about pseudo used during the assignment pass.  Thread is a set
> +   of connected reload and inheritance pseudos with the same set of
> +   available hard reg set.  Thread is a pseudo itself for other
> +   cases.  */
> +struct regno_assign_info

Maybe:

/* Information about the thread to which a pseudo belongs.  Threads are
   a set of connected reload and inheritance pseudos with the same set of
   available hard registers.  Lone registers belong to their own threads.  */

Although the condition seems to be:

> +	&& (ira_class_hard_regs_num[regno_allocno_class_array[regno1]]
> +	    == ira_class_hard_regs_num[regno_allocno_class_array[regno2]]))

i.e. the same _number_ of available hard regs, but not necessarily the
same set.

"thread" might be more mnemonic than "regno_assign" in this file,
but that's bikeshed stuff.

> +  for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
> +    {
> +      regno_assign_info[i].first = i;
> +      regno_assign_info[i].next = -1;
> +      regno_assign_info[i].freq = lra_reg_info[i].freq;
> +    }

Minor speedup, but it's probably worth caching max_reg_num () rather than
calling it in each loop iteration.  Several other loops with the same thing.

> +/* Process a pseudo copy with execution frequency COPY_FREQ connecting
> +   REGNO1 and REGNO2 to form threads.  */
> +static void
> +process_copy_to_form_thread (int regno1, int regno2, int copy_freq)
> +{
> +  int last, regno1_first, regno2_first;
> +
> +  lra_assert (regno1 >= lra_constraint_new_regno_start
> +	      && regno2 >= lra_constraint_new_regno_start);
> +  regno1_first = regno_assign_info[regno1].first;
> +  regno2_first = regno_assign_info[regno2].first;
> +  if (regno1_first != regno2_first)
> +    {
> +      for (last = regno2_first;
> +	   regno_assign_info[last].next >= 0;
> +	   last = regno_assign_info[last].next)
> +	regno_assign_info[last].first = regno1_first;
> +      regno_assign_info[last].next = regno_assign_info[regno1_first].next;
> +      regno_assign_info[regno1_first].first = regno2_first;
> +      regno_assign_info[regno1_first].freq
> +	+= regno_assign_info[regno2_first].freq;

Couple of things I don't understand here:

- Why don't we set regno_assign_info[last].first (for final "last")
  to regno1_first?  I.e. the loop stops while "last" is still valid,
  but only assigns to that element's "next" field, leaving "first"
  as before.

- I might be wrong, but should:

    regno_assign_info[regno1_first].first = regno2_first;

  be:

    regno_assign_info[regno1_first].next = regno2_first;

  so that the list becomes:

    regno1_first regno2_first ... last ...

  The current version seems to create a cycle:

   regno_assign_info[regno1_first].first == regno2_first
   regno_assign_info[regno2_first].first == regno1_first

> +/* Update LIVE_HARD_REG_PSEUDOS and LIVE_PSEUDOS_REG_RENUMBER by
> +   pseudo REGNO assignment or by the pseudo spilling if FREE_P.	 */

Maybe:

/* Update the LIVE_HARD_REG_PSEUDOS and LIVE_PSEUDOS_REG_RENUMBER
   entries for pseudo REGNO.  Assume that the register has been
   spilled if FREE_P, otherwise assume that it has been assigned
   reg_renumber[REGNO] (if >= 0).  */

> +/* Find and return best (or TRY_ONLY_HARD_REGNO) free hard register
> +   for pseudo REGNO.  In the failure case, return a negative number.
> +   Return through *COST the cost of usage of the hard register for the
> +   pseudo.  Best free hard register has smallest cost of usage for
> +   REGNO or smallest register bank if the cost is the same.  */

Maybe:

/* Try to find a free hard register for pseudo REGNO.  Return the
   hard register on success and set *COST to the cost of using
   that register.  (If several registers have equal cost, the one with
   the lowest register bank wins.)  Return -1 on failure.

   If TRY_ONLY_HARD_REGNO >= 0, consider only that hard register,
   otherwise consider all hard registers in REGNO's class.  */

> +      if (hard_regno_costs_check[hard_regno] != curr_hard_regno_costs_check)
> +	hard_regno_costs[hard_regno] = 0;
> +      hard_regno_costs_check[hard_regno] = curr_hard_regno_costs_check;
> +      hard_regno_costs[hard_regno]
> +	-= lra_reg_info[regno].preferred_hard_regno_profit1;

This pattern occurs several times.  I think it'd be clearer to have
an inline helper function (adjust_hard_regno_cost, or whatever).

> +  /* That is important for allocation of multi-word pseudos.  */
> +  IOR_COMPL_HARD_REG_SET (conflict_set, reg_class_contents[rclass]);

Maybe:

  /* Make sure that all registers in a multi-word pseudo belong to the
     required class.  */

> +	  /* We can not use prohibited_class_mode_regs because it is
> +	     defined not for all classes.  */

s/defined not/not defined/

> +	  && ! TEST_HARD_REG_BIT (impossible_start_hard_regs, hard_regno)
> +	  && (nregs_diff == 0
> +#ifdef WORDS_BIG_ENDIAN
> +	      || (hard_regno - nregs_diff >= 0
> +		  && TEST_HARD_REG_BIT (reg_class_contents[rclass],
> +					hard_regno - nregs_diff))
> +#else
> +	      || TEST_HARD_REG_BIT (reg_class_contents[rclass],
> +				    hard_regno + nregs_diff)
> +#endif
> +	      ))

impossible_start_hard_regs is set up as:

> +	conflict_hr = live_pseudos_reg_renumber[conflict_regno];
> +	nregs = (hard_regno_nregs[conflict_hr]
> +		 [lra_reg_info[conflict_regno].biggest_mode]);
> +	/* Remember about multi-register pseudos.  For example, 2 hard
> +	   register pseudos can start on the same hard register but can
> +	   not start on HR and HR+1/HR-1.  */ 
> +	for (hr = conflict_hr + 1;
> +	     hr < FIRST_PSEUDO_REGISTER && hr < conflict_hr + nregs;
> +	     hr++)
> +	  SET_HARD_REG_BIT (impossible_start_hard_regs, hr);
> +	for (hr = conflict_hr - 1;
> +	     hr >= 0 && hr + hard_regno_nregs[hr][biggest_mode] > conflict_hr;
> +	     hr--)
> +	  SET_HARD_REG_BIT (impossible_start_hard_regs, hr);

which I don't think copes with big-endian cases like:

  other hard reg in widest mode:    ........XXXX...
  impossible_start_regs:            .....XXX.XXX...
  this hard reg in pseudo's mode:   ............XX.
  this hard reg in widest mode:     ..........XXXX.

which AIUI is an invalid choice.

There are other corner cases too.  If the other hard reg is narrower than
its widest mode, and that widest mode is wider than the current regno's
widest mode, then on big-endian targets we could have:

  other hard reg in its own mode:   ........XX....
  other hard reg in widest mode:    ......XXXX.....
  impossible_start_regs:            .......X.XXX... (*)
  this hard reg in pseudo's mode:   .....XX........
  this hard reg in widest mode:     .....XX........

(*) note no big-endian adjustment for the other hard reg's widest mode here.

Maybe it would be easier to track impossible end regs for
big-endian targets?

> +/* Update HARD_REGNO preference for pseudos connected (directly or
> +   indirectly) to a pseudo with REGNO.	Use divisor DIV to the
> +   corresponding copy frequency for the hard regno cost preference
> +   calculation.	 The more indirectly a pseudo connected, the less the
> +   cost preference.  It is achieved by increasing the divisor for each
> +   next recursive level move.  */

"cost preference" seems a bit contradictory.  Maybe:

/* Update the preference for using HARD_REGNO for pseudos that are
   connected directly or indirectly with REGNO.  Apply divisor DIV
   to any preference adjustments.

   The more indirectly a pseudo is connected, the smaller its effect
   should be.  We therefore increase DIV on each "hop".  */

> +static void
> +update_hard_regno_preference (int regno, int hard_regno, int div)
> +{
> +  int another_regno, cost;
> +  lra_copy_t cp, next_cp;
> +
> +  /* Search depth 5 seems to be enough.	 */
> +  if (div > (1 << 5))
> +    return;
> +  for (cp = lra_reg_info[regno].copies; cp != NULL; cp = next_cp)
> +    {
> +      if (cp->regno1 == regno)
> +	{
> +	  next_cp = cp->regno1_next;
> +	  another_regno = cp->regno2;
> +	}
> +      else if (cp->regno2 == regno)
> +	{
> +	  next_cp = cp->regno2_next;
> +	  another_regno = cp->regno1;
> +	}
> +      else
> +	gcc_unreachable ();
> +      if (reg_renumber[another_regno] < 0
> +	  && (update_hard_regno_preference_check[another_regno]
> +	      != curr_update_hard_regno_preference_check))
> +	{
> +	  update_hard_regno_preference_check[another_regno]
> +	    = curr_update_hard_regno_preference_check;
> +	  cost = cp->freq < div ? 1 : cp->freq / div;
> +	  lra_setup_reload_pseudo_preferenced_hard_reg
> +	    (another_regno, hard_regno, cost);
> +	  update_hard_regno_preference (another_regno, hard_regno, div * 2);
> +	}
> +    }
> +}

Using a depth-first search for this seems a bit dangerous, because we
could end up processing a connected pseudo via a very indirect path
first, even though it is more tightly connected via a more direct path.
(Could be a well-known problem, sorry.)

> +/* Update REG_RENUMBER and other pseudo preferences by assignment of
> +   HARD_REGNO to pseudo REGNO and print about it if PRINT_P.  */
> +void
> +lra_setup_reg_renumber (int regno, int hard_regno, bool print_p)
> +{
> +  int i, hr;
> +
> +  if ((hr = hard_regno) < 0)
> +    hr = reg_renumber[regno];
> +  reg_renumber[regno] = hard_regno;
> +  lra_assert (hr >= 0);
> +  for (i = 0; i < hard_regno_nregs[hr][PSEUDO_REGNO_MODE (regno)]; i++)
> +    if (hard_regno < 0)
> +      lra_hard_reg_usage[hr + i] -= lra_reg_info[regno].freq;
> +    else
> +      lra_hard_reg_usage[hr + i] += lra_reg_info[regno].freq;

Is it possible for this function to reallocate a register,
i.e. for reg_regnumber to be >= 0 both before and after the call?
If so, I think we'd need two loops.  If not, an assert would be good.

> +      mode = PSEUDO_REGNO_MODE (spill_regno);
> +      if (lra_hard_reg_set_intersection_p
> +	  (live_pseudos_reg_renumber[spill_regno],
> +	   mode, reg_class_contents[rclass]))
> +	{
> +	  hard_regno = live_pseudos_reg_renumber[spill_regno];

Very minor, sorry, but I think this would be more readable with the
hard_regno assignment before the condition and hard_regno used in it.

> +/* Spill some pseudos for a reload pseudo REGNO and return hard
> +   register which should be used for pseudo after spilling.  The
> +   function adds spilled pseudos to SPILLED_PSEUDO_BITMAP.  When we
> +   choose hard register (and pseudos occupying the hard registers and
> +   to be spilled), we take into account not only how REGNO will
> +   benefit from the spills but also how other reload pseudos not
> +   assigned to hard registers yet benefit from the spills too.	*/

"...not yet assigned to hard registers benefit..."

> +  curr_pseudo_check++; /* Invalidate try_hard_reg_pseudos elements.  */

Comment on its own line.

> +  bitmap_clear (&ignore_pseudos_bitmap);
> +  bitmap_clear (&best_spill_pseudos_bitmap);
> +  EXECUTE_IF_SET_IN_BITMAP (&lra_reg_info[regno].insn_bitmap, 0, uid, bi)
> +    {
> +      struct lra_insn_reg *ir;
> +      
> +      for (ir = lra_get_insn_regs (uid); ir != NULL; ir = ir->next)
> +	if (ir->regno >= FIRST_PSEUDO_REGISTER)
> +	  bitmap_set_bit (&ignore_pseudos_bitmap, ir->regno);
> +    }

The name "ignore_pseudos_bitmap" doesn't seem to describe how the set is
actually used.  We still allow the pseudos to be spilled, but the number
of such spills is the first-order cost.  Maybe "insn_conflict_pseudos"
or something like that?

> +      /* Spill pseudos.	 */
> +      CLEAR_HARD_REG_SET (spilled_hard_regs);
> +      EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
> +	if ((int) spill_regno >= lra_constraint_new_regno_start
> +	    /* ??? */
> +	    && ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
> +	    && ! bitmap_bit_p (&lra_split_pseudos, spill_regno)
> +	    && ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno))
> +	  goto fail;

Leftover ??? (or lacks enough info if it's supposed to be kept)

> +	      EXECUTE_IF_SET_IN_BITMAP (&live_hard_reg_pseudos[r->start],
> +					0, k, bi2)
> +		sparseset_set_bit (live_range_hard_reg_pseudos, k);

live_range_hard_reg_pseudos and &live_hard_reg_pseudos[r->start]
seem like similar quantities.  Was there a reason for using
sparsesets for one and bitmaps for the other?

> +	      for (p = r->start + 1; p <= r->finish; p++)
> +		{
> +		  lra_live_range_t r2;
> +		  
> +		  for (r2 = lra_start_point_ranges[p];
> +		       r2 != NULL;
> +		       r2 = r2->start_next)
> +		    if (r2->regno >= lra_constraint_new_regno_start)
> +		      sparseset_set_bit (live_range_reload_pseudos, r2->regno);
> +		}

This is probably just showing my ignorance, but -- taking the above two
quotes together -- why do we calculate these two live sets in different ways?
Also, does live_range_reload_pseudos really just contain "reload" pseudos,
or inheritance pseudos as well?

> +      /* We are trying to spill a reload pseudo.  That is wrong we
> +	 should assign all reload pseudos, otherwise we cannot reuse
> +	 the selected alternatives.  */
> +      hard_regno = find_hard_regno_for (regno, &cost, -1);
> +      if (hard_regno >= 0)
> +	{

Don't really understand this comment, sorry.

Also, why are we passing -1 to find_hard_regno_for, rather than hard_regno?
The loop body up till this point has been specifically freeing up registers
to make hard_regno allocatable.  I realise that, by spilling everything
that overlaps this range, we might have freed up other registers too,
and so made others besides hard_regno allocatable.  But wouldn't we want
to test those other hard registers in "their" iteration of the loop
instead of this one?  The spills in those iterations ought to be more
directed (i.e. there should be less incidental spilling).

As things stand, doing an rclass_size * rclass_size scan seems
unnecessarily expensive, although probably off the radar.

> +	  assign_temporarily (regno, hard_regno);
> +	  n = 0;
> +	  EXECUTE_IF_SET_IN_SPARSESET (live_range_reload_pseudos, reload_regno)
> +	    if (live_pseudos_reg_renumber[reload_regno] < 0
> +		&& (hard_reg_set_intersect_p
> +		    (reg_class_contents
> +		     [regno_allocno_class_array[reload_regno]],
> +		     spilled_hard_regs)))
> +	      sorted_reload_pseudos[n++] = reload_regno;
> +	  qsort (sorted_reload_pseudos, n, sizeof (int),
> +		 reload_pseudo_compare_func);
> +	  for (j = 0; j < n; j++)
> +	    {
> +	      reload_regno = sorted_reload_pseudos[j];
> +	      if (live_pseudos_reg_renumber[reload_regno] < 0

Just trying to make sure I understand, but: isn't the final condition in
this quote redundant?  I thought that was a requirement for the register
being in sorted_reload_pseudos to begin with.

> +		  && (reload_hard_regno
> +		      = find_hard_regno_for (reload_regno,
> +					     &reload_cost, -1)) >= 0
> +		  && (lra_hard_reg_set_intersection_p
> +		      (reload_hard_regno, PSEUDO_REGNO_MODE (reload_regno),
> +		       spilled_hard_regs)))
> +		{
> +		  if (lra_dump_file != NULL)
> +		    fprintf (lra_dump_file, " assign %d(cost=%d)",
> +			     reload_regno, reload_cost);
> +		  assign_temporarily (reload_regno, reload_hard_regno);
> +		  cost += reload_cost;

It looks like registers that can be reallocated make hard_regno more
expensive (specifically by reload_cost), but registers that can't be
reallocated contribute no cost.  Is that right?  Seemed a little odd,
so maybe worth a comment.

Also, AIUI find_hard_regno_for is trying to allocate the register for
reload_regno on the basis that reload_regno has the same conflicts as
the current regno, and so it's only an approximation.  Is that right?
Might be worth a comment if so (not least to explain why we don't commit
to this allocation if we end up choosing hard_regno).

> +	  if (best_insn_pseudos_num > insn_pseudos_num
> +	      || (best_insn_pseudos_num == insn_pseudos_num
> +		  && best_cost > cost))

Should we check the register bank and levelling here too,
for consistency?

> +      /* Restore the live hard reg pseudo info for spilled pseudos.  */
> +      EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
> +	update_lives (spill_regno, false);

I couldn't tell why this was outside the "hard_regno >= 0" condition.
Do we really change these registers even if find_hard_regno_for fails?

> +  /* Spill: */
> +  EXECUTE_IF_SET_IN_BITMAP (&best_spill_pseudos_bitmap, 0, spill_regno, bi)

Very minor, but I think it'd be worth asserting that best_hard_regno >= 0
before this loop.

> +/* Constraint transformation can use equivalences and they can
> +   contains pseudos assigned to hard registers.	 Such equivalence
> +   usage might create new conflicts of pseudos with hard registers
> +   (like ones used for parameter passing or call clobbered ones) or
> +   other pseudos assigned to the same hard registers.  Another very
> +   rare risky transformation is restoring whole multi-register pseudo
> +   when only one subreg lives and unused hard register is used already
> +   for something else.

In a way, I found this comment almost too detailed. :-)  Maybe:

/* The constraints pass is allowed to create equivalences between
   pseudos that make the current allocation "incorrect" (in the sense
   that pseudos are assigned to hard registers from their own conflict sets).
   The global variable lra_risky_transformations_p says whether this might
   have happened.

if that's accurate.  The detail about when this occurs probably
belongs above lra_risky_transformations_p, although it's mostly
there already.  (Haven't got to the ira-conflicts.c stuff yet,
so no comments about that here.)

> +   Process pseudos assigned to hard registers (most frequently used
> +   first), spill if a conflict is found, and mark the spilled pseudos
> +   in SPILLED_PSEUDO_BITMAP.  Set up LIVE_HARD_REG_PSEUDOS from
> +   pseudos, assigned to hard registers.	 */

Why do we spill the most frequently used registers first?  Probably worth
a comment.

> +  for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
> +    if (reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
> +      {
> +	if (lra_risky_transformations_p)
> +	  sorted_pseudos[n++] = i;
> +	else
> +	  update_lives (i, false);
> +      }
> +  if (! lra_risky_transformations_p)
> +    return;

Seems like this would be more logically split into two (the
lra_risky_transformations_p case and the !lra_risky_transformations_p case).

> +	    /* If it is multi-register pseudos they should start on
> +	       the same hard register.	*/
> +	    || hard_regno != reg_renumber[conflict_regno])

This seems different from the find_hard_regno_for case, which took
biggest_mode into account.

> +	  /* Don't change reload pseudo allocation.  It might have
> +	     this allocation for a purpose (e.g. bound to another
> +	     pseudo) and changing it can result in LRA cycling.	 */
> +	  if (another_regno < lra_constraint_new_regno_start
> +	      && (another_hard_regno = reg_renumber[another_regno]) >= 0
> +	      && another_hard_regno != hard_regno)

Seems like this excludes split pseudos as well as reload pseudos,
or are they never included in these copies?  Might be worth mentioning
them either way.

The only general comment I have so far is that it's sometimes
difficult to follow which types of pseudos are being included
or excluded by a comparison with lra_constraint_new_regno_start.
Sometimes the comments talk about "reload pseudos", but other
similar checks imply that the registers could be inheritance
pseudos or split pseudos as well.  Some thin inline wrappers
might help here.

> +	      /* Remember that reload pseudos can be spilled on the
> +		 1st pass.  */
> +	      bitmap_clear_bit (&all_spilled_pseudos, regno);
> +	      assign_hard_regno (hard_regno, regno);

Maybe:

  /* This register might have been spilled by the previous pass.
     Indicate that it is no longer spilled.  */

> +		/* We can use inheritance pseudos in original insns
> +		   (not reload ones).  */
> +		if (regno < lra_constraint_new_regno_start
> +		    || bitmap_bit_p (&lra_inheritance_pseudos, regno)
> +		    || reg_renumber[regno] < 0)
> +		  continue;
> +		sorted_pseudos[nfails++] = regno;
> +		if (lra_dump_file != NULL)
> +		  fprintf (lra_dump_file,
> +			   "	  Spill reload r%d(hr=%d, freq=%d)\n",
> +			   regno, reg_renumber[regno],
> +			   lra_reg_info[regno].freq);

Same comment about types of pseudo as above.  (I.e. the code checks for
inheritance pseudos, but not split pseudos.)

> +  bitmap_initialize (&do_not_assign_nonreload_pseudos, &reg_obstack);
> +  EXECUTE_IF_SET_IN_BITMAP (&lra_inheritance_pseudos, 0, u, bi)
> +    if ((restore_regno = lra_reg_info[u].restore_regno) >= 0
> +	&& reg_renumber[u] < 0 && bitmap_bit_p (&lra_inheritance_pseudos, u))
> +      bitmap_set_bit (&do_not_assign_nonreload_pseudos, restore_regno);
> +  EXECUTE_IF_SET_IN_BITMAP (&lra_split_pseudos, 0, u, bi)
> +    if ((restore_regno = lra_reg_info[u].restore_regno) >= 0
> +	&& reg_renumber[u] >= 0 && bitmap_bit_p (&lra_split_pseudos, u))
> +      bitmap_set_bit (&do_not_assign_nonreload_pseudos, restore_regno);

The bitmap_bit_p tests look redundant.  Also, the following code is:

> +  for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
> +    if (((i < lra_constraint_new_regno_start
> +	  && ! bitmap_bit_p (&do_not_assign_nonreload_pseudos, i))
> +	 || (bitmap_bit_p (&lra_inheritance_pseudos, i)
> +	     && lra_reg_info[i].restore_regno >= 0)
> +	 || (bitmap_bit_p (&lra_split_pseudos, i)
> +	     && lra_reg_info[i].restore_regno >= 0)
> +	 || bitmap_bit_p (&lra_optional_reload_pseudos, i))
> +	&& reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
> +	&& regno_allocno_class_array[i] != NO_REGS)
> +      sorted_pseudos[n++] = i;
> +  bitmap_clear (&do_not_assign_nonreload_pseudos);

where we test very similar things inline, and then clear
do_not_assign_nonreload_pseudos.  Do we need d_n_a_n_p at all?

> +  if (n != 0 && lra_dump_file != NULL)
> +    fprintf (lra_dump_file, "  Reassing non-reload pseudos\n");

"Reassigning"

Richard
Richard Sandiford Oct. 10, 2012, 3:31 p.m. UTC | #3
Hi Vlad,

Here's a review of the first part of ira-constraints.c.  It's a big file,
and this is a long reply, so I'll send further comments another day in
another message.

Vladimir Makarov <vmakarov@redhat.com> writes:
> +/* This file contains code for 3 passes: constraint pass,
> +   inheritance/split pass, and pass for undoing failed inheritance and
> +   split.
> +
> +   The major goal of constraint pass is to transform RTL to satisfy
> +   insn and address constraints by:
> +     o choosing insn alternatives;
> +     o generating *reload insns* (or reloads in brief) and *reload
> +       pseudos* which will got necessary hard registers later;

s/got/get/

> +     o substituting pseudo equivalences (if it is done once, is done
> +       everywhere) and removes insns initializing used equivalent
> +       substitution.

Suggest:

     o substituting pseudos with equivalent values and removing the
       instructions that initialized those pseudos.

> +   To speed the pass up we process only necessary insns (first time
> +   all insns) and reuse of already chosen alternatives in some
> +   cases.

Suggest:

   On the first iteration of the pass we process every instruction and
   choose an alternative for each one.  On subsequent iterations we try
   to avoid reprocessing instructions if we can be sure that the old
   choice is still valid.

> +   The inheritance/spilt pass is to transform code to achieve
> +   ineheritance and live range splitting.  It is done on backward
> +   traverse of EBBs.

Typo: inheritance.  "backward traversal".

> +   The inheritance optimization goal is to reuse values in hard
> +   registers. There is analogous optimization in old reload pass.  The
> +   inheritance is achieved by following transformation:
> +
> +       reload_p1 <- p	     reload_p1 <- p
> +       ...		     new_p <- reload_p1
> +       ...		=>   ...
> +       reload_p2 <- p	     reload_p2 <- new_p
> +
> +   where p is spilled and not changed between the insns.  Reload_p1 is
> +   also called *original pseudo* and new_p is called *inheritance
> +   pseudo*.
> +
> +   The subsequent assignment pass will try to assign the same (or
> +   another if it is not possible) hard register to new_p as to
> +   reload_p1 or reload_p2.
> +
> +   If it fails to assign a hard register, the opposite transformation
> +   will restore the original code on (the pass called undoing
> +   inheritance) because with spilled new_p the code would be much
> +   worse. [...]

Maybe:

   If the assignment pass fails to assign a hard register to new_p,
   this file will undo the inheritance and restore the original code.
   This is because implementing the above sequence with a spilled
   new_p would make the code much worse.

> +   Splitting (transformation) is also done in EBB scope on the same
> +   pass as the inheritance:
> +
> +       r <- ... or ... <- r		 r <- ... or ... <- r
> +       ...				 s <- r (new insn -- save)
> +       ...			  => 
> +       ...				 r <- s (new insn -- restore)
> +       ... <- r				 ... <- r
> +
> +    The *split pseudo* s is assigned to the hard register of the
> +    original pseudo or hard register r.
> +
> +    Splitting is done:
> +      o In EBBs with high register pressure for global pseudos (living
> +	in at least 2 BBs) and assigned to hard registers when there
> +	are more one reloads needing the hard registers;
> +      o for pseudos needing save/restore code around calls.
> +
> +    If the split pseudo still has the same hard register as the
> +    original pseudo after the subsequent assignment pass, the opposite
> +    transformation is done on the same pass for undoing inheritance.  */

AIUI spill_for can spill split pseudos.  I think the comment should say
what happens then.  If I understand the code correctly, we keep the
split if "r" is a hard register or was assigned a hard register.
We undo it if "r" was not assigned a hard register.  Is that right?

> +/* Array whose element is (MEM:MODE BASE_REG) corresponding to the
> +   mode (index) and where BASE_REG is a base hard register for given
> +   memory mode.	 */
> +static rtx indirect_mem[MAX_MACHINE_MODE];

Maybe:

/* Index M is an rtx of the form (mem:M BASE_REG), where BASE_REG
   is a sample hard register that is a valid address for mode M.
   The memory refers to the generic address space.  */

> +/* Return class of hard regno of REGNO or if it is was not assigned to
> +   a hard register, return its allocno class but only for reload
> +   pseudos created on the current constraint pass.  Otherwise, return
> +   NO_REGS.  */
> +static enum reg_class
> +get_reg_class (int regno)

Maybe:

/* If REGNO is a hard register or has been allocated a hard register,
   return the class of that register.  If REGNO is a pseudo created
   by the current constraints pass, assume that it will be allocated
   a hard register and return the class that that register will have.
   (This assumption is optimistic when REGNO is an inheritance or
   split pseudo.)  Return NO_REGS otherwise.  */

if that's accurate.  I dropped the term "reload pseudo" because of
the general comment in my earlier reply about the use of "reload pseudo"
when the code seems to include inheritance and split pseudos too.

> +/* Return true if REGNO in REG_MODE satisfies reg class constraint CL.
> +   For new reload pseudos we should make more accurate class
> +   *NEW_CLASS (we set up it if it is not NULL) to satisfy the
> +   constraints.  Otherwise, set up NEW_CLASS to NO_REGS.  */
> +static bool
> +in_class_p (int regno, enum machine_mode reg_mode,
> +	    enum reg_class cl, enum reg_class *new_class)

Same comment here, since it uses get_reg_class.  I.e. for registers >=
new_regno_start, we're really testing whether the first allocatable
register in REGNO's allocno class satisfies CL.

Also, the only caller that doesn't directly pass REGNO and REG_MODE
from an rtx is process_addr_reg.  in_class_p uses:

> +  if (new_class != NULL)
> +    *new_class = NO_REGS;
> +  if (regno < FIRST_PSEUDO_REGISTER)
> +    return TEST_HARD_REG_BIT (reg_class_contents[cl], regno);
> +  rclass = get_reg_class (regno);

whereas process_addr_reg uses:

> +  final_regno = regno = REGNO (reg);
> +  if (regno < FIRST_PSEUDO_REGISTER)
> +    {
> +      rtx final_reg = reg;
> +      rtx *final_loc = &final_reg;
> +
> +      lra_eliminate_reg_if_possible (final_loc);
> +      final_regno = REGNO (*final_loc);
> +    }

I.e. process_addr_reg applies eliminations before testing whereas
in_class_p doesn't.  I couldn't really tell why the two were different.
Since the idea is that we use elimination source registers to represent
their targets, shouldn't in_class_p eliminate too?

With that difference removed, in_class_p could take the rtx instead
of a (REGNO, MODE) pair.  It could then pass that rtx directly to
lra_eliminate_reg_if_possible.  I think this would lead to a cleaner
interface and make things more regular.

Then the comment for in_class_p could be:

/* Return true if X satisfies (or will satisfy) reg class constraint CL.
   If X is a pseudo created by this constraints pass, assume that it will
   be allocated a hard register from its allocno class, but allow that
   class to be narrowed to CL if it is currently a superset of CL.

   If NEW_CLASS is nonnull, set *NEW_CLASS to the new allocno class
   of REGNO (X), or NO_REGS if no change in its class was needed.  */

That's a change in the meaning of NEW_CLASS, but seems easier for
callers to handle.  I think all it requires is changing:

> +      common_class = ira_reg_class_subset[rclass][cl];
> +      if (new_class != NULL)
> +	*new_class = common_class;

to:

      common_class = ira_reg_class_subset[rclass][cl];
      if (new_class != NULL && rclass != common_class)
	*new_class = common_class;

> +  if (regno < new_regno_start
> +      /* Do not make more accurate class from reloads generated.  They
> +	 are mostly moves with a lot of constraints.  Making more
> +	 accurate class may results in very narrow class and
> +	 impossibility of find registers for several reloads of one
> +	 insn.	*/

Maybe:

      /* Do not allow the constraints for reload instructions to
	 influence the classes of new pseudos.  These reloads are
	 typically moves that have many alternatives, and restricting
	 reload pseudos for one alternative may lead to situations
	 where other reload pseudos are no longer allocatable.  */

> +      || INSN_UID (curr_insn) >= new_insn_uid_start)
> +    return ((regno >= new_regno_start && rclass == ALL_REGS)
> +	    || (rclass != NO_REGS && ira_class_subset_p[rclass][cl]
> +		&& ! hard_reg_set_subset_p (reg_class_contents[cl],
> +					    lra_no_alloc_regs)));

Why the ALL_REGS special case?  I think it deserves a comment.

> +/* Return the defined and profitable equiv substitution of reg X, return
> +   X otherwise.	 */

Maybe:

/* If we have decided to substitute X with another value, return that value,
   otherwise return X.  */

> +/* Change class of pseudo REGNO to NEW_CLASS.  Print info about it
> +   using TITLE.	 Output a new line if NL_P.  */
> +static void
> +change_class (int regno, enum reg_class new_class,
> +	      const char *title, bool nl_p)
> +{
> +  if (lra_dump_file != NULL)
> +    fprintf (lra_dump_file, "%s to class %s for r%d",
> +	     title, reg_class_names[new_class], regno);
> +  setup_reg_classes (regno, new_class, NO_REGS, new_class);
> +  if (lra_dump_file != NULL && nl_p)
> +    fprintf (lra_dump_file, "\n");
> +}

I think either this or setup_reg_classes should have an assert
that REGNO is >= FIRST_PSEUDO_REGISTER.   This matters more now
because a lot of LRA deals with hard and pseudo registers
side-by-side.

> +/* Create a new pseudo using MODE, RCLASS, ORIGINAL, TITLE or reuse
> +   already created input reload pseudo (only if TYPE is not OP_OUT).
> +   The result pseudo is returned through RESULT_REG.  Return TRUE if
> +   we created a new pseudo, FALSE if we reused the already created
> +   input reload pseudo.	 */

Maybe:

/* Store in *RESULT_REG a register for reloading ORIGINAL, which has
   mode MODE.  TYPE specifies the direction of the reload -- either OP_IN
   or OP_OUT -- and RCLASS specifies the class of hard register required.

   Try to reuse existing input reloads where possible.  Return true if
   *RESULT_REG is a new register, false if it is an existing one.
   Use TITLE to describe new registers for debug purposes.  */

although I admit that's a bit convoluted...

> +  for (i = 0; i < curr_insn_input_reloads_num; i++)
> +    if (rtx_equal_p (curr_insn_input_reloads[i].input, original))
> +      break;
> +  if (i >= curr_insn_input_reloads_num
> +      || ! in_class_p (REGNO (curr_insn_input_reloads[i].reg),
> +		       GET_MODE (curr_insn_input_reloads[i].reg),
> +		       rclass, &new_class))
> +    {
> +      res_p = true;
> +      *result_reg = lra_create_new_reg (mode, original, rclass, title);
> +    }
> +  else
> +    {
> +      lra_assert (! side_effects_p (original));
> +      res_p = false;
> +      *result_reg = curr_insn_input_reloads[i].reg;
> +      regno = REGNO (*result_reg);
> +      if (lra_dump_file != NULL)
> +	 {
> +	   fprintf (lra_dump_file, "	 Reuse r%d for reload ", regno);
> +	   print_value_slim (lra_dump_file, original, 1);
> +	 }
> +      if (rclass != new_class)
> +	 change_class (regno, new_class, ", change", false);
> +      if (lra_dump_file != NULL)
> +	 fprintf (lra_dump_file, "\n");
> +    }
> +  lra_assert (curr_insn_input_reloads_num < LRA_MAX_INSN_RELOADS);
> +  curr_insn_input_reloads[curr_insn_input_reloads_num].input = original;
> +  curr_insn_input_reloads[curr_insn_input_reloads_num++].reg = *result_reg;
> +  return res_p;

It probably doesn't matter in practice, but I think this would
be better as:

  for (i = 0; i < curr_insn_input_reloads_num; i++)
    if (rtx_equal_p (curr_insn_input_reloads[i].input, original)
        && in_class_p (curr_insn_input_reloads[i].reg, rclass, &new_class))
      {
        ...reuse case..
        return false;
      }
  ...new case...
  return true;

which also copes with the unlikely case that the same input is used
three times, and that the third use requires the same class as the
second.

> +/* The page contains code to extract memory address parts.  */
> +
> +/* Info about base and index regs of an address.  In some rare cases,
> +   base/index register can be actually memory.	In this case we will
> +   reload it.  */
> +struct address
> +{
> +  rtx *base_reg_loc;  /* NULL if there is no a base register.  */
> +  rtx *base_reg_loc2; /* Second location of {post/pre}_modify, NULL
> +			 otherwise.  */
> +  rtx *index_reg_loc; /* NULL if there is no an index register.	 */
> +  rtx *index_loc; /* location of index reg * scale or index_reg_loc
> +		      otherwise.  */
> +  rtx *disp_loc; /* NULL if there is no a displacement.	 */
> +  /* Defined if base_reg_loc is not NULL.  */
> +  enum rtx_code base_outer_code, index_code;
> +  /* True if the base register is modified in the address, for
> +     example, in PRE_INC.  */
> +  bool base_modify_p;
> +};

Comments should be consistently above the fields rather than to the right.

> +/* Process address part in space AS (or all address if TOP_P) with
> +   location *LOC to extract address characteristics.
> +
> +   If CONTEXT_P is false, we are looking at the base part of an
> +   address, otherwise we are looking at the index part.
> +
> +   MODE is the mode of the memory reference; OUTER_CODE and INDEX_CODE
> +   give the context that the rtx appears in; MODIFY_P if *LOC is
> +   modified.  */
> +static void
> +extract_loc_address_regs (bool top_p, enum machine_mode mode, addr_space_t as,
> +			  rtx *loc, bool context_p, enum rtx_code outer_code,
> +			  enum rtx_code index_code,
> +			  bool modify_p, struct address *ad)
> +{
> +  rtx x = *loc;
> +  enum rtx_code code = GET_CODE (x);
> +  bool base_ok_p;
> +
> +  switch (code)
> +    {
> +    case CONST_INT:
> +    case CONST:
> +    case SYMBOL_REF:
> +    case LABEL_REF:
> +      if (! context_p)
> +	ad->disp_loc = loc;

This looks a bit odd.  I assume it's trying to avoid treating MULT
scale factors as displacements, but I thought whether something was
a displacement or not depended on whether it is involved (possibly
indirectly) in a sum with the base.  Seems like it'd be better
to check for that directly.

> +	/* If this machine only allows one register per address, it
> +	   must be in the first operand.  */
> +	if (MAX_REGS_PER_ADDRESS == 1 || code == LO_SUM)
> +	  {
> +	    extract_loc_address_regs (false, mode, as, arg0_loc, false, code,
> +				      code1, modify_p, ad);
> +	    ad->disp_loc = arg1_loc;
> +	  }
> +	/* If index and base registers are the same on this machine,
> +	   just record registers in any non-constant operands.	We
> +	   assume here, as well as in the tests below, that all
> +	   addresses are in canonical form.  */
> +	else if (INDEX_REG_CLASS
> +		 == base_reg_class (VOIDmode, as, PLUS, SCRATCH)
> +		 && code0 != PLUS && code0 != MULT)
> +	  {
> +	    extract_loc_address_regs (false, mode, as, arg0_loc, false, PLUS,
> +				      code1, modify_p, ad);
> +	    if (! CONSTANT_P (arg1))
> +	      extract_loc_address_regs (false, mode, as, arg1_loc, true, PLUS,
> +					code0, modify_p, ad);
> +	    else
> +	      ad->disp_loc = arg1_loc;
> +	  }
> +
> +	/* If the second operand is a constant integer, it doesn't
> +	   change what class the first operand must be.	 */
> +	else if (code1 == CONST_INT || code1 == CONST_DOUBLE)
> +	  {
> +	    ad->disp_loc = arg1_loc;
> +	    extract_loc_address_regs (false, mode, as, arg0_loc, context_p,
> +				      PLUS, code1, modify_p, ad);
> +	  }
> +	/* If the second operand is a symbolic constant, the first
> +	   operand must be an index register but only if this part is
> +	   all the address.  */
> +	else if (code1 == SYMBOL_REF || code1 == CONST || code1 == LABEL_REF)
> +	  {
> +	    ad->disp_loc = arg1_loc;
> +	    extract_loc_address_regs (false, mode, as, arg0_loc,
> +				      top_p ? true : context_p, PLUS, code1,
> +				      modify_p, ad);
> +	  }

What's the reason for the distinction between the last two, which AIUI
doesn't exist in reload?  I'm not sure the:

    top_p ? true : context_p

condition is safe: some targets use aligning addresses like
(and X (const_int -ALIGN)), but that shouldn't really affect whether
a register in X is treated as a base or an index.

> +	/* If both operands are registers but one is already a hard
> +	   register of index or reg-base class, give the other the
> +	   class that the hard register is not.	 */
> +	else if (code0 == REG && code1 == REG
> +		 && REGNO (arg0) < FIRST_PSEUDO_REGISTER
> +		 && ((base_ok_p
> +		      = ok_for_base_p_nonstrict (arg0, mode, as, PLUS, REG))
> +		     || ok_for_index_p_nonstrict (arg0)))
> +	  {
> +	    extract_loc_address_regs (false, mode, as, arg0_loc, ! base_ok_p,
> +				      PLUS, REG, modify_p, ad);
> +	    extract_loc_address_regs (false, mode, as, arg1_loc, base_ok_p,
> +				      PLUS, REG, modify_p, ad);
> +	  }
> +	else if (code0 == REG && code1 == REG
> +		 && REGNO (arg1) < FIRST_PSEUDO_REGISTER
> +		 && ((base_ok_p
> +		      = ok_for_base_p_nonstrict (arg1, mode, as, PLUS, REG))
> +		     || ok_for_index_p_nonstrict (arg1)))
> +	  {
> +	    extract_loc_address_regs (false, mode, as, arg0_loc, base_ok_p,
> +				      PLUS, REG, modify_p, ad);
> +	    extract_loc_address_regs (false, mode, as, arg1_loc, ! base_ok_p,
> +				      PLUS, REG, modify_p, ad);
> +	  }
> +	/* If one operand is known to be a pointer, it must be the
> +	   base with the other operand the index.  Likewise if the
> +	   other operand is a MULT.  */
> +	else if ((code0 == REG && REG_POINTER (arg0)) || code1 == MULT)
> +	  {
> +	    extract_loc_address_regs (false, mode, as, arg0_loc, false, PLUS,
> +				      code1, modify_p, ad);
> +	    if (code1 == MULT)
> +	      ad->index_loc = arg1_loc;
> +	    extract_loc_address_regs (false, mode, as, arg1_loc, true, PLUS,
> +				      code0, modify_p, ad);
> +	  }
> +	else if ((code1 == REG && REG_POINTER (arg1)) || code0 == MULT)
> +	  {
> +	    extract_loc_address_regs (false, mode, as, arg0_loc, true, PLUS,
> +				      code1, modify_p, ad);
> +	    if (code0 == MULT)
> +	      ad->index_loc = arg0_loc;
> +	    extract_loc_address_regs (false, mode, as, arg1_loc, false, PLUS,
> +				      code0, modify_p, ad);
> +	  }

Some targets care about the choice between index and base for
correctness reasons (PA IIRC) or for performance (some ppc targets IIRC),
so I'm not sure whether it's safe to give REG_POINTER such a low priority.

> +    default:
> +      {
> +	const char *fmt = GET_RTX_FORMAT (code);
> +	int i;
> +
> +	if (GET_RTX_LENGTH (code) != 1
> +	    || fmt[0] != 'e' || GET_CODE (XEXP (x, 0)) != UNSPEC)
> +	  {
> +	    for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
> +	      if (fmt[i] == 'e')
> +		extract_loc_address_regs (false, mode, as, &XEXP (x, i),
> +					  context_p, code, SCRATCH,
> +					  modify_p, ad);
> +	    break;
> +	  }
> +	/* fall through for case UNARY_OP (UNSPEC ...)	*/
> +      }
> +
> +    case UNSPEC:
> +      if (ad->disp_loc == NULL)
> +	ad->disp_loc = loc;
> +      else if (ad->base_reg_loc == NULL)
> +	{
> +	  ad->base_reg_loc = loc;
> +	  ad->base_outer_code = outer_code;
> +	  ad->index_code = index_code;
> +	  ad->base_modify_p = modify_p;
> +	}
> +      else
> +	{
> +	  lra_assert (ad->index_reg_loc == NULL);
> +	  ad->index_reg_loc = loc;
> +	}
> +      break;
> +
> +    }

Which targets use a bare UNSPEC as a displacement?  I thought a
displacement had to be a link-time constant, in which case it should
satisfy CONSTANT_P.  For UNSPECs, that means wrapping it in a CONST.

I'm just a bit worried that the UNSPEC handling is sensitive to the
order that subrtxes are processed (unlike PLUS, which goes to some
trouble to work out what's what).  It could be especially confusing
because the default case processes operands in reverse order while
PLUS processes them in forward order.

Also, which cases require the special UNARY_OP (UNSPEC ...) fallthrough?
Probably deserves a comment.

AIUI the base_reg_loc, index_reg_loc and disp_loc fields aren't just
recording where reloads of a particular class need to go (obviously
in the case of disp_loc, which isn't reloaded at all).  The feidls
have semantic value too.  I.e. we use them to work out the value
of at least part of the address.

In that case it seems dangerous to look through general rtxes
in the way that the default case above does.  Maybe just making
sure that DISP_LOC is involved in a sum with the base would be
enough, but another idea was:

----------------------------------------------------------------
I know of three ways of "mutating" (for want of a better word)
an address:

  1. (and X (const_int X)), to align
  2. a subreg
  3. a unary operator (such as truncation or extension)

So maybe we could:

  a. remove outer mutations (using a helper function)
  b. handle LO_SUM, PRE_*, POST_*: as now
  c. otherwise treat the address of the sum of one, two or three pieces.
     c1. Peel mutations of all pieces.
     c2. Classify the pieces into base, index and displacement.
         This would be similar to the jousting code above, but hopefully
         easier because all three rtxes are to hand.  E.g. we could
         do the base vs. index thing in a similar way to
         commutative_operand_precedence.
     c3. Record which pieces were mutated (e.g. using something like the
         index_loc vs. index_reg_loc distinction in the current code)

That should be general enough for current targets, but if it isn't,
we could generalise it further when we know what generalisation is needed.

That's still going to be a fair amount of code, but hopefully not more,
and we might have more confidence at each stage what each value is.
And it avoids the risk of treating "mutated" addresses as "unmutated" ones.
----------------------------------------------------------------

Just an idea though.  Probably not for 4.8, although I might try it
if I find time.

It would be nice to sort out the disp_loc thing for 4.8 though.

> +/* Extract address characteristics in address with location *LOC in
> +   space AS.  Return them in AD.  Parameter OUTER_CODE for MEM should
> +   be MEM.  Parameter OUTER_CODE for 'p' constraint should be ADDRESS
> +   and MEM_MODE should be VOIDmode.  */

Maybe:

/* Describe address *LOC in AD.  There are two cases:

   - *LOC is the address in a (mem ...).  In this case OUTER_CODE is MEM
     and AS is the mem's address space.

   - *LOC is matched to an address constraint such as 'p'.  In this case
     OUTER_CODE is ADDRESS and AS is ADDR_SPACE_GENERIC.  */

> +/* Return start register offset of hard register REGNO in MODE.	 */
> +int
> +lra_constraint_offset (int regno, enum machine_mode mode)
> +{
> +  lra_assert (regno < FIRST_PSEUDO_REGISTER);
> +  /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
> +     multiple hard register group of scalar integer registers, so that
> +     for example (reg:DI 0) and (reg:SI 1) will be considered the same
> +     register.	*/
> +  if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (mode) > UNITS_PER_WORD
> +      && SCALAR_INT_MODE_P (mode))
> +    return hard_regno_nregs[regno][mode] - 1;
> +  return 0;
> +}

Maybe the head comment could be:

/* Return the offset from REGNO of the least significant register
   in (reg:MODE REGNO).

   This function is used to tell whether two registers satisfy
   a matching constraint.  (reg:MODE1 REGNO1) matches (reg:MODE2 REGNO2) if:

         REGNO1 + lra_constraint_offset (REGNO1, MODE1)
      == REGNO2 + lra_constraint_offset (REGNO2, MODE2)  */

(and remove the inner comment).

> +/* Like rtx_equal_p except that it allows a REG and a SUBREG to match
> +   if they are the same hard reg, and has special hacks for
> +   auto-increment and auto-decrement.  This is specifically intended for
> +   process_alt_operands to use in determining whether two operands
> +   match.  X is the operand whose number is the lower of the two.
> +
> +   It is supposed that X is the output operand and Y is the input
> +   operand.  */
> +static bool
> +operands_match_p (rtx x, rtx y, int y_hard_regno)

Need to say what Y_HARD_REGNO is.

> +  switch (code)
> +    {
> +    case CONST_INT:
> +    case CONST_DOUBLE:
> +    case CONST_FIXED:

After a recent change this should be:

   CASE_CONST_UNIQUE:

> +	      val = operands_match_p (XVECEXP (x, i, j), XVECEXP (y, i, j),
> +				      y_hard_regno);
> +	      if (val == 0)
> +		return false;

Why do we pass the old y_hard_regno even though Y has changed?
Some of the earlier code assumes that GET_MODE (y) is the mode
of y_hard_regno.

> +/* Reload pseudos created for matched input and output reloads whose
> +   mode are different.	Such pseudos has a modified rules for finding
> +   their living ranges, e.g. assigning to subreg of such pseudo means
> +   changing all pseudo value.  */
> +bitmap_head lra_bound_pseudos;

Maybe:

/* Reload pseudos created for matched input and output reloads whose
   modes are different.  Such pseudos have different live ranges from
   other pseudos; e.g. any assignment to a subreg of these pseudos
   changes the whole pseudo's value.  */

Although that said, couldn't emit_move_insn_1 (called by gen_move_insn)
split a multiword pseudo move into two word moves?  Using the traditional
clobber technique sounds better than having special liveness rules.

> +/* True if C is a non-empty register class that has too few registers
> +   to be safely used as a reload target class.	*/
> +#define SMALL_REGISTER_CLASS_P(C)					\
> +  (reg_class_size [(C)] == 1						\
> +   || (reg_class_size [(C)] >= 1 && targetm.class_likely_spilled_p (C)))

Feels like ira_class_hard_regs_num might be better, but since the
current definition is traditional, that shouldn't be a merge requirement.

> +/* Return mode of WHAT inside of WHERE whose mode of the context is
> +   OUTER_MODE.	If WHERE does not contain WHAT, return VOIDmode.  */
> +static enum machine_mode
> +find_mode (rtx *where, enum machine_mode outer_mode, rtx *what)
> +{
> +  int i, j;
> +  enum machine_mode mode;
> +  rtx x;
> +  const char *fmt;
> +  enum rtx_code code;
> +
> +  if (where == what)
> +    return outer_mode;
> +  if (*where == NULL_RTX)
> +    return VOIDmode;
> +  x = *where;
> +  code = GET_CODE (x);
> +  outer_mode = GET_MODE (x);
> +  fmt = GET_RTX_FORMAT (code);
> +  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
> +    {
> +      if (fmt[i] == 'e')
> +	{
> +	  if ((mode = find_mode (&XEXP (x, i), outer_mode, what)) != VOIDmode)
> +	    return mode;
> +	}
> +      else if (fmt[i] == 'E')
> +	{
> +	  for (j = XVECLEN (x, i) - 1; j >= 0; j--)
> +	  if ((mode = find_mode (&XVECEXP (x, i, j), outer_mode, what))
> +	      != VOIDmode)
> +	    return mode;
> +	}
> +    }
> +  return VOIDmode;
> +}
> +
> +/* Return mode for operand NOP of the current insn.  */
> +static inline enum machine_mode
> +get_op_mode (int nop)
> +{
> +  rtx *loc;
> +  enum machine_mode mode;
> +  bool md_first_p = asm_noperands (PATTERN (curr_insn)) < 0;
> +
> +  /* Take mode from the machine description first.  */
> +  if (md_first_p && (mode = curr_static_id->operand[nop].mode) != VOIDmode)
> +    return mode;
> +  loc = curr_id->operand_loc[nop];
> +  /* Take mode from the operand second.	 */
> +  mode = GET_MODE (*loc);
> +  if (mode != VOIDmode)
> +    return mode;
> +  if (! md_first_p && (mode = curr_static_id->operand[nop].mode) != VOIDmode)
> +    return mode;
> +  /* Here is a very rare case.	Take mode from the context.  */
> +  return find_mode (&PATTERN (curr_insn), VOIDmode, loc);
> +}

This looks a lot more complicated than the reload version.  Why is
it needed?  In reload the conditions for address operands were:

	  /* Address operands are reloaded in their existing mode,
	     no matter what is specified in the machine description.  */
	  operand_mode[i] = GET_MODE (recog_data.operand[i]);

	  /* If the address is a single CONST_INT pick address mode
	     instead otherwise we will later not know in which mode
	     the reload should be performed.  */
	  if (operand_mode[i] == VOIDmode)
	    operand_mode[i] = Pmode;

which for LRA might look like:

	  /* The mode specified in the .md file for address operands
	     is the mode of the addressed value, not the address itself.
	     We therefore need to get the mode from the operand rtx.
	     If the operand has no mode, assume it was Pmode.  */

For other operands, recog_data.operand_mode ought to be correct.

find_mode assumes that the mode of an operand is the same as the mode of
the outer rtx, which isn't true when the outer rtx is a subreg, mem,
or one of several unary operators.

This is one that I think would be best decided for 4.8.

> +/* If REG is a reload pseudo, try to make its class satisfying CL.  */
> +static void
> +narrow_reload_pseudo_class (rtx reg, enum reg_class cl)
> +{
> +  int regno;
> +  enum reg_class rclass;
> +
> +  /* Do not make more accurate class from reloads generated.  They are
> +     mostly moves with a lot of constraints.  Making more accurate
> +     class may results in very narrow class and impossibility of find
> +     registers for several reloads of one insn.	 */
> +  if (INSN_UID (curr_insn) >= new_insn_uid_start)
> +    return;
> +  if (GET_CODE (reg) == SUBREG)
> +    reg = SUBREG_REG (reg);
> +  if (! REG_P (reg) || (regno = REGNO (reg)) < new_regno_start)
> +    return;
> +  rclass = get_reg_class (regno);
> +  rclass = ira_reg_class_subset[rclass][cl];
> +  if (rclass == NO_REGS)
> +    return;
> +  change_class (regno, rclass, "      Change", true);
> +}

There seems to be an overlap in functionality with in_class_p here.
Maybe:

{
  enum reg_class rclass;

  if (in_class_p (reg, cl, &rclass) && rclass != NO_REGS)
    change_class (REGNO (reg), rclass, "      Change", true);
}

(assuming the change in in_class_p interface suggested above).
This avoids duplicating subtleties like the handling of reloads.

> +      /* We create pseudo for out rtx because we always should keep
> +	 registers with the same original regno have synchronized
> +	 value (it is not true for out register but it will be
> +	 corrected by the next insn).

I don't understand this comment, sorry.

> +	 Do not reuse register because of the following situation: a <-
> +	 a op b, and b should be the same as a.	 */

This part is very convincing though :-)  Maybe:

	 We cannot reuse the current output register because we might
	 have a situation like "a <- a op b", where the constraints force
	 the second input operand ("b") to match the output operand ("a").
	 "b" must then be copied into a new register so that it doesn't
	 clobber the current value of "a".  */

We should probably keep the other reason too, of course.

> +      /* Don't generate inheritance for the new register because we
> +	 can not use the same hard register for the corresponding
> +	 inheritance pseudo for input reload.  */
> +      bitmap_set_bit (&lra_matched_pseudos, REGNO (new_in_reg));

Suggest dropping this comment, since we don't do any inheritance here.
The comment above lra_matched_pseudos already says the same thing.

> +  /* In and out operand can be got from transformations before
> +     processing constraints.  So the pseudos might have inaccurate
> +     class and we should make their classes more accurate.  */
> +  narrow_reload_pseudo_class (in_rtx, goal_class);
> +  narrow_reload_pseudo_class (out_rtx, goal_class);

I don't understand this, sorry.  Does "transformations" mean inheritance
and reload splitting?  So the registers we're changing here are inheritance
and split pseudos rather than reload pseudos created for this instruction?
If so, it sounds on face value like it conflicts with the comment quoted
above about not allowing reload instructions to the narrow the class
of pseudos.  Might be worth saying why that's OK here but not there.

Also, I'm not sure I understand why it helps.  Is it just trying
to encourage the pseudos to form a chain in lra-assigns.c?

E.g. MIPS16 has several instructions that require matched MIPS16 registers.
However, moves between MIPS16 registers and general registers are as cheap
as moves between two MIPS16 registers, so narrowing the reloaded values
from GENERAL_REGS to M16_REGS (if that ever happens) wouldn't necessarily
be a good thing.

Not saying this is wrong, just that it might need more commentary
to justify it.

> +  for (i = 0; (in = ins[i]) >= 0; i++)
> +    *curr_id->operand_loc[in] = new_in_reg;

The code assumes that all input operands have the same mode.
Probably worth asserting that here (or maybe further up; I don't mind),
just to make the assumption explicit.

> +/* Return final hard regno (plus offset) which will be after
> +   elimination.	 We do this for matching constraints because the final
> +   hard regno could have a different class.  */
> +static int
> +get_final_hard_regno (int hard_regno, int offset)
> +{
> +  if (hard_regno < 0)
> +    return hard_regno;
> +  hard_regno += offset;
> +  return lra_get_elimation_hard_regno (hard_regno);

Why apply the offset before rather than after elimination?
AIUI, AVR's eliminable registers span more than one hard register,
and the elimination is based off the first.

Also, all uses but one of lra_get_hard_regno_and_offset follow
the pattern:

      lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
      /* The real hard regno of the operand after the allocation.  */
      x_hard_regno = get_final_hard_regno (x_hard_regno, offset);

so couldn't lra_get_hard_regno_and_offset just return the final
hard register, including elimination?  Then it could apply the
elimination on the original rtx.

FWIW, the exception I mentioned was operands_match_p:

      lra_get_hard_regno_and_offset (x, &i, &offset);
      if (i < 0)
	goto slow;
      i += offset;

but I'm not sure why this is the only caller that would want
to ignore elimination.

> +/* Return register class of OP.	 That is a class of the hard register
> +   itself (if OP is a hard register), or class of assigned hard
> +   register to the pseudo (if OP is pseudo), or allocno class of
> +   unassigned pseudo (if OP is reload pseudo).	Return NO_REGS
> +   otherwise.  */
> +static enum reg_class
> +get_op_class (rtx op)
> +{
> +  int regno, hard_regno, offset;
> +
> +  if (! REG_P (op))
> +    return NO_REGS;
> +  lra_get_hard_regno_and_offset (op, &hard_regno, &offset);
> +  if (hard_regno >= 0)
> +    {
> +      hard_regno = get_final_hard_regno (hard_regno, offset);
> +      return REGNO_REG_CLASS (hard_regno);
> +    }
> +  /* Reload pseudo will get a hard register in any case.  */
> +  if ((regno = REGNO (op)) >= new_regno_start)
> +    return lra_get_allocno_class (regno);
> +  return NO_REGS;
> +}

This looks like it ought to be the same as:

  return REG_P (x) ? get_reg_class (REGNO (x)) : NO_REGS;

If not, I think there should be a comment explaining the difference.
If so, the comment might be:

/* If OP is a register, return the class of the register as per
   get_reg_class, otherwise return NO_REGS.  */

> +/* Return generated insn mem_pseudo:=val if TO_P or val:=mem_pseudo
> +   otherwise.  If modes of MEM_PSEUDO and VAL are different, use
> +   SUBREG for VAL to make them equal.  Assign CODE to the insn if it
> +   is not recognized.
> +
> +   We can not use emit_move_insn in some cases because of bad used
> +   practice in some machine descriptions.  For example, power can use
> +   only base+index addressing for altivec move insns and it is checked
> +   by insn predicates.	On the other hand, the same move insn
> +   constraints permit to use offsetable memory for moving vector mode
> +   values from/to general registers to/from memory.  emit_move_insn
> +   will transform offsetable address to one with base+index addressing
> +   which is rejected by the constraint.	 So sometimes we need to
> +   generate move insn without modifications and assign the code
> +   explicitly because the generated move can be unrecognizable because
> +   of the predicates.  */

Ick :-)  Can't we just say that fixing this is part of the process
of porting a target to LRA?  It'd be nice not to carry hacks like
this around in shiny new code.

As it stands:

> +static rtx
> +emit_spill_move (bool to_p, rtx mem_pseudo, rtx val, int code)
> +{
> +  rtx insn, after;
> +
> +  start_sequence ();
> +  if (GET_MODE (mem_pseudo) != GET_MODE (val))
> +    val = gen_rtx_SUBREG (GET_MODE (mem_pseudo),
> +			  GET_CODE (val) == SUBREG ? SUBREG_REG (val) : val,
> +			  0);
> +  if (to_p)
> +    insn = gen_move_insn (mem_pseudo, val);
> +  else
> +    insn = gen_move_insn (val, mem_pseudo);
> +  if (recog_memoized (insn) < 0)
> +    INSN_CODE (insn) = code;
> +  emit_insn (insn);
> +  after = get_insns ();
> +  end_sequence ();
> +  return after;
> +}

this recog_memoized code effectively assumes that INSN is just one
instruction, whereas emit_move_insn_1 or the backend move expanders
could split moves into several instructions.

Since the code-forcing stuff is for rs6000, I think we could drop it
from 4.8 whatever happens.

The sequence stuff above looks redundant; we should just return
INSN directly.

> +  /* Quick check on the right move insn which does not need
> +     reloads.  */
> +  if ((dclass = get_op_class (dest)) != NO_REGS
> +      && (sclass = get_op_class (src)) != NO_REGS
> +      && targetm.register_move_cost (GET_MODE (src), dclass, sclass) == 2)
> +    return true;

Suggest:

  /* The backend guarantees that register moves of cost 2 never need
     reloads.  */

> +  if (GET_CODE (dest) == SUBREG)
> +    dreg = SUBREG_REG (dest);
> +  if (GET_CODE (src) == SUBREG)
> +    sreg = SUBREG_REG (src);
> +  if (! REG_P (dreg) || ! REG_P (sreg))
> +    return false;
> +  sclass = dclass = NO_REGS;
> +  dr = get_equiv_substitution (dreg);
> +  if (dr != dreg)
> +    dreg = copy_rtx (dr);

I think this copy is too early, because there are quite a few
conditions under which we never emit anything with DREG in it.

> +  if (REG_P (dreg))
> +    dclass = get_reg_class (REGNO (dreg));
> +  if (dclass == ALL_REGS)
> +    /* We don't know what class we will use -- let it be figured out
> +       by curr_insn_transform function.	 Remember some targets does not
> +       work with such classes through their implementation of
> +       machine-dependent hooks like secondary_memory_needed.  */
> +    return false;

Don't really understand this comment, sorry.

> +  sreg_mode = GET_MODE (sreg);
> +  sr = get_equiv_substitution (sreg);
> +  if (sr != sreg)
> +    sreg = copy_rtx (sr);

This copy also seems too early.

> +  sri.prev_sri = NULL;
> +  sri.icode = CODE_FOR_nothing;
> +  sri.extra_cost = 0;
> +  secondary_class = NO_REGS;
> +  /* Set up hard register for a reload pseudo for hook
> +     secondary_reload because some targets just ignore unassigned
> +     pseudos in the hook.  */
> +  if (dclass != NO_REGS
> +      && REG_P (dreg) && (dregno = REGNO (dreg)) >= new_regno_start
> +      && lra_get_regno_hard_regno (dregno) < 0)
> +    reg_renumber[dregno] = ira_class_hard_regs[dclass][0];
> +  else
> +    dregno = -1;
> +  if (sclass != NO_REGS
> +      && REG_P (sreg) && (sregno = REGNO (sreg)) >= new_regno_start
> +      && lra_get_regno_hard_regno (sregno) < 0)
> +    reg_renumber[sregno] = ira_class_hard_regs[sclass][0];
> +  else
> +    sregno = -1;

I think this would be correct without the:

     && REG_P (dreg) && (dregno = REGNO (dreg)) >= new_regno_start

condition (and similarly for the src case).  IMO it would be clearer too:
the decision about when to return a register class for unallocated pseudos
is then localised to get_reg_class rather than copied both here and there.

> +  if (sclass != NO_REGS)
> +    secondary_class
> +      = (enum reg_class) targetm.secondary_reload (false, dest,
> +						   (reg_class_t) sclass,
> +						   GET_MODE (src), &sri);
> +  if (sclass == NO_REGS
> +      || ((secondary_class != NO_REGS || sri.icode != CODE_FOR_nothing)
> +	  && dclass != NO_REGS))
> +    secondary_class
> +      = (enum reg_class) targetm.secondary_reload (true, sreg,
> +						   (reg_class_t) dclass,
> +						   sreg_mode, &sri);

Hmm, so for register<-register moves, if the target says that the output
reload needs a secondary reload, we try again with an input reload and
hope for a different answer?

If the target is giving different answers in that case, I think that's
a bug in the target, and we should assert instead.  The problem is that
if we allow the answers to be different, and both answers involve
secondary reloads, we have no way of knowing whether the second answer
is easier to implement or "more correct" than the first.  An assert
avoids that, and puts the onus on the target to sort itself out.

Again, as long as x86 is free of this bug for 4.8, I don't the merge
needs to cater for broken targets.

> +  *change_p = true;

I think this is the point at which substituted values should be copied.

> +  new_reg = NULL_RTX;
> +  if (secondary_class != NO_REGS)
> +    new_reg = lra_create_new_reg_with_unique_value (sreg_mode, NULL_RTX,
> +						    secondary_class,
> +						    "secondary");
> +  start_sequence ();
> +  if (sri.icode == CODE_FOR_nothing)
> +    lra_emit_move (new_reg, sreg);
> +  else
> +    {
> +      enum reg_class scratch_class;
> +
> +      scratch_class = (reg_class_from_constraints
> +		       (insn_data[sri.icode].operand[2].constraint));
> +      scratch_reg = (lra_create_new_reg_with_unique_value
> +		     (insn_data[sri.icode].operand[2].mode, NULL_RTX,
> +		      scratch_class, "scratch"));
> +      emit_insn (GEN_FCN (sri.icode) (new_reg != NULL_RTX ? new_reg : dest,
> +				      sreg, scratch_reg));
> +    }
> +  before = get_insns ();
> +  end_sequence ();
> +  lra_process_new_insns (curr_insn, before, NULL_RTX, "Inserting the move");

AIUI, the constraints pass will look at these instructions and generate
what are now known as tertiary reloads where needed (by calling this
function again).  Is that right?  Very nice if so: that's far more
natural than the current reload handling.

> +/* The chosen reg classes which should be used for the corresponding
> +   operands.  */
> +static enum reg_class goal_alt[MAX_RECOG_OPERANDS];
> +/* True if the operand should be the same as another operand and the
> +   another operand does not need a reload.  */

s/and the another/and that other/

> +/* Make reloads for addr register in LOC which should be of class CL,
> +   add reloads to list BEFORE.	If AFTER is not null emit insns to set
> +   the register up after the insn (it is case of inc/dec, modify).  */

Maybe:

/* Arrange for address element *LOC to be a register of class CL.
   Add any input reloads to list BEFORE.  AFTER is nonnull if *LOC is an
   automodified value; handle that case by adding the required output
   reloads to list AFTER.  Return true if the RTL was changed.  */

> +static bool
> +process_addr_reg (rtx *loc, rtx *before, rtx *after, enum reg_class cl)
> +{
> +  int regno, final_regno;
> +  enum reg_class rclass, new_class;
> +  rtx reg = *loc;
> +  rtx new_reg;
> +  enum machine_mode mode;
> +  bool change_p = false;
> +
> +  mode = GET_MODE (reg);
> +  if (! REG_P (reg))
> +    {
> +      /* Always reload memory in an address even if the target
> +	 supports such addresses.  */
> +      new_reg
> +	= lra_create_new_reg_with_unique_value (mode, reg, cl, "address");
> +      push_to_sequence (*before);
> +      lra_emit_move (new_reg, reg);
> +      *before = get_insns ();
> +      end_sequence ();
> +      *loc = new_reg;
> +      if (after != NULL)
> +	{
> +	  start_sequence ();
> +	  lra_emit_move (reg, new_reg);
> +	  emit_insn (*after);
> +	  *after = get_insns ();
> +	  end_sequence ();
> +	}
> +      return true;

Why does this need to be a special case, rather than reusing the
code later in the function?  Specifically:

> +    }
> +  lra_assert (REG_P (reg));
> +  final_regno = regno = REGNO (reg);
> +  if (regno < FIRST_PSEUDO_REGISTER)
> +    {
> +      rtx final_reg = reg;
> +      rtx *final_loc = &final_reg;
> +
> +      lra_eliminate_reg_if_possible (final_loc);
> +      final_regno = REGNO (*final_loc);
> +    }
> +  /* Use class of hard register after elimination because some targets
> +     do not recognize virtual hard registers as valid address
> +     registers.	 */
> +  rclass = get_reg_class (final_regno);
> +  if ((*loc = get_equiv_substitution (reg)) != reg)
> +    {
> +      if (lra_dump_file != NULL)
> +	{
> +	  fprintf (lra_dump_file,
> +		   "Changing pseudo %d in address of insn %u on equiv ",
> +		   REGNO (reg), INSN_UID (curr_insn));
> +	  print_value_slim (lra_dump_file, *loc, 1);
> +	  fprintf (lra_dump_file, "\n");
> +	}
> +      *loc = copy_rtx (*loc);
> +      change_p = true;
> +    }
> +  if (*loc != reg || ! in_class_p (final_regno, GET_MODE (reg), cl, &new_class))
> +    {
> +      reg = *loc;
> +      if (get_reload_reg (OP_IN, mode, reg, cl, "address", &new_reg))
> +	{
> +	  push_to_sequence (*before);
> +	  lra_emit_move (new_reg, reg);
> +	  *before = get_insns ();
> +	  end_sequence ();
> +	}
> +      *loc = new_reg;
> +      if (after != NULL)
> +	{
> +	  start_sequence ();
> +	  lra_emit_move (reg, new_reg);
> +	  emit_insn (*after);
> +	  *after = get_insns ();
> +	  end_sequence ();
> +	}
> +      change_p = true;
> +    }
> +  else if (new_class != NO_REGS && rclass != new_class)
> +    change_class (regno, new_class, "	   Change", true);
> +  return change_p;
> +}

E.g.:

  if ((*loc = get_equiv_substitution (reg)) != reg)
    ...as above...
  if (*loc != reg || !in_class_p (reg, cl, &new_class))
    ...as above...
  else if (new_class != NO_REGS && rclass != new_class)
    change_class (regno, new_class, "	   Change", true);
  return change_p;

(assuming change to in_class_p suggested earlier) seems like it
covers the same cases.

Also, should OP_IN be OP_INOUT for after != NULL, so that we don't try
to reuse existing reload pseudos?  That would mean changing get_reload_reg
(both commentary and code) to handle OP_INOUT like OP_OUT.

Or maybe just pass OP_OUT instead of OP_INOUT, if that's more consistent.
I don't mind which.

> +  /* Force reload if this is a constant or PLUS or if there may be a
> +     problem accessing OPERAND in the outer mode.  */

Suggest:

  /* Force a reload of the SUBREG_REG if this ...

> +      /* Constant mode ???? */
> +      enum op_type type = curr_static_id->operand[nop].type;

Not sure what the comment means, but REG is still the original SUBREG_REG,
so there shouldn't be any risk of a VOIDmode constant.  (subreg (const_int))
is invalid rtl.

> +/* Return TRUE if *LOC refers for a hard register from SET.  */
> +static bool
> +uses_hard_regs_p (rtx *loc, HARD_REG_SET set)
> +{

Nothing seems to care about the address, so we would pass the rtx
rather than a pointer to it.

> +  int i, j, x_hard_regno, offset;
> +  enum machine_mode mode;
> +  rtx x;
> +  const char *fmt;
> +  enum rtx_code code;
> +
> +  if (*loc == NULL_RTX)
> +    return false;
> +  x = *loc;
> +  code = GET_CODE (x);
> +  mode = GET_MODE (x);
> +  if (code == SUBREG)
> +    {
> +      loc = &SUBREG_REG (x);
> +      x = SUBREG_REG (x);
> +      code = GET_CODE (x);
> +      if (GET_MODE_SIZE (GET_MODE (x)) > GET_MODE_SIZE (mode))
> +	mode = GET_MODE (x);
> +    }
> +  
> +  if (REG_P (x))
> +    {
> +      lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
> +      /* The real hard regno of the operand after the allocation.  */
> +      x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
> +      return (x_hard_regno >= 0
> +	      && lra_hard_reg_set_intersection_p (x_hard_regno, mode, set));

With the subreg mode handling above, this looks little-endian specific.
The MEM case:

> +  if (MEM_P (x))
> +    {
> +      struct address ad;
> +      enum machine_mode mode = GET_MODE (x);
> +      rtx *addr_loc = &XEXP (x, 0);
> +
> +      extract_address_regs (mode, MEM_ADDR_SPACE (x), addr_loc, MEM, &ad);
> +      if (ad.base_reg_loc != NULL)
> +	{
> +	  if (uses_hard_regs_p (ad.base_reg_loc, set))
> +	    return true;
> +	}
> +      if (ad.index_reg_loc != NULL)
> +	{
> +	  if (uses_hard_regs_p (ad.index_reg_loc, set))
> +	    return true;
> +	}
> +    }

is independent of the subreg handling, so perhaps the paradoxical subreg
case should be handled separately, using simplify_subreg_regno.

> +/* Major function to choose the current insn alternative and what
> +   operands should be reloaded and how.	 If ONLY_ALTERNATIVE is not
> +   negative we should consider only this alternative.  Return false if
> +   we can not choose the alternative or find how to reload the
> +   operands.  */
> +static bool
> +process_alt_operands (int only_alternative)
> +{
> +  bool ok_p = false;
> +  int nop, small_class_operands_num, overall, nalt, offset;
> +  int n_alternatives = curr_static_id->n_alternatives;
> +  int n_operands = curr_static_id->n_operands;
> +  /* LOSERS counts those that don't fit this alternative and would
> +     require loading.  */
> +  int losers;

s/those/the operands/

> +  /* Calculate some data common for all alternatives to speed up the
> +     function.	*/
> +  for (nop = 0; nop < n_operands; nop++)
> +    {
> +      op = no_subreg_operand[nop] = *curr_id->operand_loc[nop];
> +      lra_get_hard_regno_and_offset (op, &hard_regno[nop], &offset);
> +      /* The real hard regno of the operand after the allocation.  */
> +      hard_regno[nop] = get_final_hard_regno (hard_regno[nop], offset);
> +      
> +      operand_reg[nop] = op;
> +      biggest_mode[nop] = GET_MODE (operand_reg[nop]);
> +      if (GET_CODE (operand_reg[nop]) == SUBREG)
> +	{
> +	  operand_reg[nop] = SUBREG_REG (operand_reg[nop]);
> +	  if (GET_MODE_SIZE (biggest_mode[nop])
> +	      < GET_MODE_SIZE (GET_MODE (operand_reg[nop])))
> +	    biggest_mode[nop] = GET_MODE (operand_reg[nop]);
> +	}
> +      if (REG_P (operand_reg[nop]))
> +	no_subreg_operand[nop] = operand_reg[nop];
> +      else
> +	operand_reg[nop] = NULL_RTX;

This looks odd: no_subreg_operand ends up being a subreg if the
SUBREG_REG wasn't a REG.  Some more commentary might help.

> +  /* The constraints are made of several alternatives.	Each operand's
> +     constraint looks like foo,bar,... with commas separating the
> +     alternatives.  The first alternatives for all operands go
> +     together, the second alternatives go together, etc.
> +
> +     First loop over alternatives.  */
> +  for (nalt = 0; nalt < n_alternatives; nalt++)
> +    {
> +      /* Loop over operands for one constraint alternative.  */
> +      if (
> +#ifdef HAVE_ATTR_enabled
> +	  (curr_id->alternative_enabled_p != NULL
> +	   && ! curr_id->alternative_enabled_p[nalt])
> +	  ||
> +#endif
> +	  (only_alternative >= 0 && nalt != only_alternative))
> +	continue;

Probably more natural if split into two "if (...) continue;"s.  E.g.:

#ifdef HAVE_ATTR_enabled
      if (curr_id->alternative_enabled_p != NULL
	  && !curr_id->alternative_enabled_p[nalt])
	continue;
#endif
      if (only_alternative >= 0 && nalt != only_alternative))
	continue;

> +      for (nop = 0; nop < n_operands; nop++)
> +	{
> +	  const char *p;
> +	  char *end;
> +	  int len, c, m, i, opalt_num, this_alternative_matches;
> +	  bool win, did_match, offmemok, early_clobber_p;
> +	  /* false => this operand can be reloaded somehow for this
> +	     alternative.  */
> +	  bool badop;
> +	  /* false => this operand can be reloaded if the alternative
> +	     allows regs.  */
> +	  bool winreg;
> +	  /* False if a constant forced into memory would be OK for
> +	     this operand.  */
> +	  bool constmemok;
> +	  enum reg_class this_alternative, this_costly_alternative;
> +	  HARD_REG_SET this_alternative_set, this_costly_alternative_set;
> +	  bool this_alternative_match_win, this_alternative_win;
> +	  bool this_alternative_offmemok;
> +	  int invalidate_m;
> +	  enum machine_mode mode;
> +
> +	  opalt_num = nalt * n_operands + nop;
> +	  if (curr_static_id->operand_alternative[opalt_num].anything_ok)
> +	    {
> +	      /* Fast track for no constraints at all.	*/
> +	      curr_alt[nop] = NO_REGS;
> +	      CLEAR_HARD_REG_SET (curr_alt_set[nop]);
> +	      curr_alt_win[nop] = true;
> +	      curr_alt_match_win[nop] = false;
> +	      curr_alt_offmemok[nop] = false;
> +	      curr_alt_matches[nop] = -1;
> +	      continue;
> +	    }

Given that this code is pretty complex, it might be clearer to remove
the intermediate "this_*" variables and assign directly to curr_alt_*.  I.e.:

	  curr_alt[nop] = NO_REGS;
	  CLEAR_HARD_REG_SET (curr_alt_set[nop]);
	  curr_alt_win[nop] = false;
	  curr_alt_match_win[nop] = false;
	  curr_alt_offmemok[nop] = false;
	  curr_alt_matches[nop] = -1;

	  opalt_num = nalt * n_operands + nop;
	  if (curr_static_id->operand_alternative[opalt_num].anything_ok)
	    {
	      /* Fast track for no constraints at all.	*/
	      curr_alt_win[nop] = true;
	      continue;
	    }

Obviously keep this nice comment:

> +	  /* We update set of possible hard regs besides its class
> +	     because reg class might be inaccurate.  For example,
> +	     union of LO_REGS (l), HI_REGS(h), and STACK_REG(k) in ARM
> +	     is translated in HI_REGS because classes are merged by
> +	     pairs and there is no accurate intermediate class.	 */

somewhere though, either here or above the declaration of curr_alt_set.

> +		    /* We are supposed to match a previous operand.
> +		       If we do, we win if that one did.  If we do
> +		       not, count both of the operands as losers.
> +		       (This is too conservative, since most of the
> +		       time only a single reload insn will be needed
> +		       to make the two operands win.  As a result,
> +		       this alternative may be rejected when it is
> +		       actually desirable.)  */
> +		    /* If it conflicts with others.  */

Last line looks incomplete/misplaced.

> +		    match_p = false;
> +		    if (operands_match_p (*curr_id->operand_loc[nop],
> +					  *curr_id->operand_loc[m], m_hregno))
> +		      {
> +			int i;
> +			
> +			for (i = 0; i < early_clobbered_regs_num; i++)
> +			  if (early_clobbered_nops[i] == m)
> +			    break;
> +			/* We should reject matching of an early
> +			   clobber operand if the matching operand is
> +			   not dying in the insn.  */
> +			if (i >= early_clobbered_regs_num

Why not simply use operands m's early_clobber field?

> +			    || operand_reg[nop] == NULL_RTX
> +			    || (find_regno_note (curr_insn, REG_DEAD,
> +						 REGNO (operand_reg[nop]))
> +				!= NULL_RTX))
> +			  match_p = true;

...although I don't really understand this condition.  If the two
operands are the same value X, then X must die here whatever the
notes say.  So I assume this is coping with a case where the operands
are different but still match.  If so, could you give an example?

Matched earlyclobbers explicitly guarantee that the earlyclobber doesn't
apply to the matched input operand; the earlyclobber only applies to
other input operands.  So I'd have expected it was those operands
that might need reloading rather than this one.

E.g. if X occurs three times, twice in a matched earlyclobber pair
and once as an independent operand, it's the latter operand that would
need reloading.

> +			/* Operands don't match.  */
> +			/* Retroactively mark the operand we had to
> +			   match as a loser, if it wasn't already and
> +			   it wasn't matched to a register constraint
> +			   (e.g it might be matched by memory).	 */
> +			if (curr_alt_win[m]
> +			    && (operand_reg[m] == NULL_RTX
> +				|| hard_regno[m] < 0))
> +			  {
> +			    losers++;
> +			    if (curr_alt[m] != NO_REGS)
> +			      reload_nregs
> +				+= (ira_reg_class_max_nregs[curr_alt[m]]
> +				    [GET_MODE (*curr_id->operand_loc[m])]);
> +			}
> +			invalidate_m = m;
> +			if (curr_alt[m] == NO_REGS)
> +			  continue;

I found this a bit confusing.  If the operands don't match and operand m
allows no registers, don't we have to reject this constraint outright?
E.g. something like:

			/* Operands don't match.  Both operands must
			   allow a reload register, otherwise we cannot
			   make them match.  */
			if (curr_alt[m] == NO_REGS)
			  break;
			/* Retroactively mark the operand we had to
			   match as a loser, if it wasn't already and
			   it wasn't matched to a register constraint
			   (e.g it might be matched by memory).	 */
			if (curr_alt_win[m]
			    && (operand_reg[m] == NULL_RTX
				|| hard_regno[m] < 0))
			  {
			    losers++;
			    reload_nregs
			      += (ira_reg_class_max_nregs[curr_alt[m]]
				  [GET_MODE (*curr_id->operand_loc[m])]);
			  }

> +		    /* This can be fixed with reloads if the operand
> +		       we are supposed to match can be fixed with
> +		       reloads.	 */
> +		    badop = false;
> +		    this_alternative = curr_alt[m];
> +		    COPY_HARD_REG_SET (this_alternative_set, curr_alt_set[m]);
> +		    
> +		    /* If we have to reload this operand and some
> +		       previous operand also had to match the same
> +		       thing as this operand, we don't know how to do
> +		       that.  So reject this alternative.  */
> +		    if (! did_match)
> +		      for (i = 0; i < nop; i++)
> +			if (curr_alt_matches[i] == this_alternative_matches)
> +			  badop = true;

OK, so this is another case of cruft from reload that I'd like to remove,
but do you know of any reason why this shouldn't be:

		    /* If we have to reload this operand and some previous
		       operand also had to match the same thing as this
		       operand, we don't know how to do that.  */
		    if (!match_p || !curr_alt_win[m])
		      {
			for (i = 0; i < nop; i++)
			  if (curr_alt_matches[i] == m)
			    break;
			if (i < nop)
			  break;
		      }
		    else
--->		      did_match = true;

		    /* This can be fixed with reloads if the operand
		       we are supposed to match can be fixed with
		       reloads.	 */
--->		    this_alternative_matches = m;
--->		    invalidate_m = m;
		    badop = false;
		    this_alternative = curr_alt[m];
		    COPY_HARD_REG_SET (this_alternative_set, curr_alt_set[m]);

(although a helper function might be better than the awkward breaking)?
Note that the ---> lines have moved from further up.

This is the only time in the switch statement where one constraint
in a constraint string uses "badop = true" to reject the operand.
I.e. for "<something else>0" we should normally not reject the
alternative based solely on the "0", since the "<something else>"
might have been satisfied instead.  And we should only record matching
information if we've decided the match can be implemented by reloads
(the last block).

> +			/* We prefer no matching alternatives because
> +			   it gives more freedom in RA.	 */
> +			if (operand_reg[nop] == NULL_RTX
> +			    || (find_regno_note (curr_insn, REG_DEAD,
> +						 REGNO (operand_reg[nop]))
> +				 == NULL_RTX))
> +			  reject += 2;

Looks like a new reject rule.  I agree it makes conceptual sense though,
so I'm all for it.

> +		      || (REG_P (op)
> +			  && REGNO (op) >= FIRST_PSEUDO_REGISTER
> +			  && in_mem_p (REGNO (op))))

This pattern occurs several times.  I think a helper function like
spilled_reg_p (op) would help.  See 'g' below.

> +		case 's':
> +		  if (CONST_INT_P (op)
> +		      || (GET_CODE (op) == CONST_DOUBLE && mode == VOIDmode))
...
> +		case 'n':
> +		  if (CONST_INT_P (op)
> +		      || (GET_CODE (op) == CONST_DOUBLE && mode == VOIDmode))

After recent changes these should be CONST_SCALAR_INT_P (op)

> +		case 'g':
> +		  if (/* A PLUS is never a valid operand, but LRA can
> +			 make it from a register when eliminating
> +			 registers.  */
> +		      GET_CODE (op) != PLUS
> +		      && (! CONSTANT_P (op) || ! flag_pic
> +			  || LEGITIMATE_PIC_OPERAND_P (op))
> +		      && (! REG_P (op)
> +			  || (REGNO (op) >= FIRST_PSEUDO_REGISTER
> +			      && in_mem_p (REGNO (op)))))

Rather than the special case for PLUS, I think this would be better as:

	  if (MEM_P (op)
	      || spilled_reg_p (op)
	      || general_constant_p (op))
	    win = true;

where general_constant_p abstracts away:

	  (CONSTANT_P (op)
	   && (! flag_pic || LEGITIMATE_PIC_OPERAND_P (op)))

general_constant_p probably ought to go in a common file because several
places need this condition (including other parts of this switch statement).

> +#ifdef EXTRA_CONSTRAINT_STR
> +		      if (EXTRA_MEMORY_CONSTRAINT (c, p))
> +			{
> +			  if (EXTRA_CONSTRAINT_STR (op, c, p))
> +			    win = true;
> +			  /* For regno_equiv_mem_loc we have to
> +			     check.  */
> +			  else if (REG_P (op)
> +				   && REGNO (op) >= FIRST_PSEUDO_REGISTER
> +				   && in_mem_p (REGNO (op)))

Looks like an old comment from an earlier iteration.  There doesn't
seem to be a function called regno_equiv_mem_loc in the current patch.
But...

> +			    {
> +			      /* We could transform spilled memory
> +				 finally to indirect memory.  */
> +			      if (EXTRA_CONSTRAINT_STR
> +				  (get_indirect_mem (mode), c, p))
> +				win = true;
> +			    }

...is this check really needed?  It's a documented requirement that memory
constraints accept plain base registers.  Also, the following code is:

> +			  /* If we didn't already win, we can reload
> +			     constants via force_const_mem, and other
> +			     MEMs by reloading the address like for
> +			     'o'.  */
> +			  if (CONST_POOL_OK_P (mode, op) || MEM_P (op))
> +			    badop = false;

It seems a bit inconsistent to treat a spilled pseudo whose address
might well need reloading as a win, while not treating existing MEMs
whose addresses need reloading as a win.

> +		      if (EXTRA_CONSTRAINT_STR (op, c, p))
> +			win = true;
> +		      else if (REG_P (op)
> +			       && REGNO (op) >= FIRST_PSEUDO_REGISTER
> +			       && in_mem_p (REGNO (op)))
> +			{
> +			  /* We could transform spilled memory finally
> +			     to indirect memory.  */
> +			  if (EXTRA_CONSTRAINT_STR (get_indirect_mem (mode),
> +						    c, p))
> +			    win = true;
> +			}

I don't understand why there's two copies of this.  I think we have
to trust the target's classification of constraints, so if the target
says that something isn't a memory constraint, we shouldn't check the
(mem (base)) case.

> +	      if (c != ' ' && c != '\t')
> +		costly_p = c == '*';

I think there needs to be a comment somewhere saying how we handle this.
Being costly seems to contribute one reject point (i.e. a sixth of a '?')
compared to the normal case, which is very different from the current
reload behaviour.  We should probably update the "*" documentation
in md.texi too.

Some targets use "*" constraints to make sure that floating-point
registers don't get used as spill space in purely integer code
(so that task switches don't pay the FPU save/restore penalty).
Would that still "work" with this definition?  (FWIW, I think
using "*" is a bad way to achieve this feature, just asking.)

> +		  /* We simulate the behaviour of old reload here.
> +		     Although scratches need hard registers and it
> +		     might result in spilling other pseudos, no reload
> +		     insns are generated for the scratches.  So it
> +		     might cost something but probably less than old
> +		     reload pass believes.  */
> +		  if (lra_former_scratch_p (REGNO (operand_reg[nop])))
> +		    reject += LOSER_COST_FACTOR;

Yeah, this caused me no end of trouble when tweaking the MIPS
multiply-accumulate patterns.  However, unlike the other bits of
cruft I've been complaining about, this is one where I can't think
of any alternative that makes more inherent sense (to me).  So I agree
that leaving it as-is is the best approach for now.

> +	      /* If the operand is dying, has a matching constraint,
> +		 and satisfies constraints of the matched operand
> +		 which failed to satisfy the own constraints, we do
> +		 not need to generate a reload insn for this
> +		 operand.  */
> +	      if (this_alternative_matches < 0
> +		  || curr_alt_win[this_alternative_matches]
> +		  || ! REG_P (op)
> +		  || find_regno_note (curr_insn, REG_DEAD,
> +				      REGNO (op)) == NULL_RTX
> +		  || ((hard_regno[nop] < 0
> +		       || ! in_hard_reg_set_p (this_alternative_set,
> +					       mode, hard_regno[nop]))
> +		      && (hard_regno[nop] >= 0
> +			  || ! in_class_p (REGNO (op), GET_MODE (op),
> +					   this_alternative, NULL))))
> +		losers++;

I think this might be clearer as:

	      if (!(this_alternative_matches >= 0
		    && !curr_alt_win[this_alternative_matches]
		    && REG_P (op)
		    && find_regno_note (curr_insn, REG_DEAD, REGNO (op))
		    && (hard_regno[nop] >= 0
			? in_hard_reg_set_p (this_alternative_set,
					     mode, hard_regno[nop])
			: in_class_p (op, this_alternative, NULL))))
		losers++;

> +	      if (operand_reg[nop] != NULL_RTX)
> +		{
> +		  int last_reload = (lra_reg_info[ORIGINAL_REGNO
> +						  (operand_reg[nop])]
> +				     .last_reload);
> +
> +		  if (last_reload > bb_reload_num)
> +		    reload_sum += last_reload;
> +		  else
> +		    reload_sum += bb_reload_num;

The comment for reload_sum says:

> +/* Overall number reflecting distances of previous reloading the same
> +   value.  It is used to improve inheritance chances.  */
> +static int best_reload_sum;

which made me think of distance from the current instruction.  I see
it's actually something else, effectively a sum of instruction numbers.

I assumed the idea was to prefer registers that were reloaded more
recently (closer the current instruction).  In that case I thought that,
for a distance-based best_reload_sum, smaller would be better,
while for an instruction-number-based best_reload_sum, larger would
be better.  It looks like we use instruction-number based best_reload_sums
but prefer smaller sums:

> +			      && (reload_nregs < best_reload_nregs
> +				  || (reload_nregs == best_reload_nregs
> +				      && best_reload_sum < reload_sum))))))))

Is that intentional?

Also, is this value meaningful for output reloads, which aren't really
going to be able to inherit a value as such?  We seem to apply the cost
regardless of whether it's an input or an output, so probably deserves
a comment.

Same for matched input operands, which as you say elsewhere aren't
inherited.

> +	      if (badop
> +		  /* Alternative loses if it has no regs for a reg
> +		     operand.  */
> +		  || (REG_P (op) && no_regs_p
> +		      && this_alternative_matches < 0))
> +		goto fail;

More reload cruft, but I don't understand why we have both this and, later:

> +	      if (this_alternative_matches < 0
> +		  && no_regs_p && ! this_alternative_offmemok && ! constmemok)
> +		goto fail;

We also had the earlier:

> +	  /* If this operand could be handled with a reg, and some reg
> +	     is allowed, then this operand can be handled.  */
> +	  if (winreg && this_alternative != NO_REGS)
> +	    badop = false;

which I think belongs in the same else statement.  At least after the
matching changes suggested above, I think all three can be replaced by:

	  /* If this operand accepts a register, and if the register class
	     has at least one allocatable register, then this operand
	     can be reloaded.  */
	  if (winreg && !no_regs_p)
	    badop = false;

	  if (badop)
	    goto fail;

which IMO belongs after the "no_regs_p" assignment.  badop should never
be false if we have no way of reloading the value.

> +	      if (! no_regs_p)
> +		reload_nregs
> +		  += ira_reg_class_max_nregs[this_alternative][mode];

I wasn't sure why we counted this even in the "const_to_mem && constmmeok"
and "MEM_P (op) && offmemok" cases from:

	      /* We prefer to reload pseudos over reloading other
		 things, since such reloads may be able to be
		 eliminated later.  So bump REJECT in other cases.
		 Don't do this in the case where we are forcing a
		 constant into memory and it will then win since we
		 don't want to have a different alternative match
		 then.	*/
	      if (! (REG_P (op)
		     && REGNO (op) >= FIRST_PSEUDO_REGISTER)
		  && ! (const_to_mem && constmemok)
		  /* We can reload the address instead of memory (so
		     do not punish it).	 It is preferable to do to
		     avoid cycling in some cases.  */
		  && ! (MEM_P (op) && offmemok))
		reject += 2;

> +	  if (early_clobber_p)
> +	    reject++;
> +	  /* ??? Should we update the cost because early clobber
> +	     register reloads or it is a rare thing to be worth to do
> +	     it.  */
> +	  overall = losers * LOSER_COST_FACTOR + reject;

Could you expand on the comment a bit?

> +	  if ((best_losers == 0 || losers != 0) && best_overall < overall)
> +	    goto fail;
> +
> +	  curr_alt[nop] = this_alternative;
> +	  COPY_HARD_REG_SET (curr_alt_set[nop], this_alternative_set);
> +	  curr_alt_win[nop] = this_alternative_win;
> +	  curr_alt_match_win[nop] = this_alternative_match_win;
> +	  curr_alt_offmemok[nop] = this_alternative_offmemok;
> +	  curr_alt_matches[nop] = this_alternative_matches;
> +  
> +	  if (invalidate_m >= 0 && ! this_alternative_win)
> +	    curr_alt_win[invalidate_m] = false;

BTW, after the matching changes above, I don't think we need both
"invalidate_m" and "this_alternative_matches".

> +	  for (j = hard_regno_nregs[clobbered_hard_regno][biggest_mode[i]] - 1;
> +	       j >= 0;
> +	       j--)
> +	    SET_HARD_REG_BIT (temp_set, clobbered_hard_regno + j);

add_to_hard_reg_set.

> +	    else if (curr_alt_matches[j] == i && curr_alt_match_win[j])
> +	      {
> +		/* This is a trick.  Such operands don't conflict and
> +		   don't need a reload.	 But it is hard to transfer
> +		   this information to the assignment pass which
> +		   spills one operand without this info.  We avoid the
> +		   conflict by forcing to use the same pseudo for the
> +		   operands hoping that the pseudo gets the same hard
> +		   regno as the operands and the reloads are gone.  */
> +		if (*curr_id->operand_loc[i] != *curr_id->operand_loc[j])
...
> +		/* See the comment for the previous case.  */
> +		if (*curr_id->operand_loc[i] != *curr_id->operand_loc[j])

What are these last two if statements for?  I wasn't sure how two operands
could have the same address.

Not saying they're wrong, but I think a comment would be good.

> +      small_class_operands_num = 0;
> +      for (nop = 0; nop < n_operands; nop++)
> +	/* If this alternative can be made to work by reloading, and
> +	   it needs less reloading than the others checked so far,
> +	   record it as the chosen goal for reloading.	*/
> +	small_class_operands_num
> +	  += SMALL_REGISTER_CLASS_P (curr_alt[nop]) ? 1 : 0;

Misplaced comment; I think it belongs after this line.

Richard
Richard Sandiford Oct. 10, 2012, 7:50 p.m. UTC | #4
Sorry, reading back in different surroundings made me notice a couple
of silly errors:

Richard Sandiford <rdsandiford@googlemail.com> writes:
> E.g.:
>
>   if ((*loc = get_equiv_substitution (reg)) != reg)
>     ...as above...
>   if (*loc != reg || !in_class_p (reg, cl, &new_class))
>     ...as above...
>   else if (new_class != NO_REGS && rclass != new_class)
>     change_class (regno, new_class, "	   Change", true);
>   return change_p;
>
> (assuming change to in_class_p suggested earlier) seems like it
> covers the same cases.

...but that same in_class_p change means that the "rclass != new_class"
condition isn't needed.  I.e.

  if ((*loc = get_equiv_substitution (reg)) != reg)
    ...as above...
  if (*loc != reg || !in_class_p (reg, cl, &new_class))
    ...as above...
  else if (new_class != NO_REGS)
    change_class (regno, new_class, "	   Change", true);
  return change_p;

>> +	      if (operand_reg[nop] != NULL_RTX)
>> +		{
>> +		  int last_reload = (lra_reg_info[ORIGINAL_REGNO
>> +						  (operand_reg[nop])]
>> +				     .last_reload);
>> +
>> +		  if (last_reload > bb_reload_num)
>> +		    reload_sum += last_reload;
>> +		  else
>> +		    reload_sum += bb_reload_num;
>
> The comment for reload_sum says:
>
>> +/* Overall number reflecting distances of previous reloading the same
>> +   value.  It is used to improve inheritance chances.  */
>> +static int best_reload_sum;
>
> which made me think of distance from the current instruction.  I see
> it's actually something else, effectively a sum of instruction numbers.
>
> I assumed the idea was to prefer registers that were reloaded more
> recently (closer the current instruction).  In that case I thought that,
> for a distance-based best_reload_sum, smaller would be better,
> while for an instruction-number-based best_reload_sum, larger would
> be better.  It looks like we use instruction-number based best_reload_sums
> but prefer smaller sums:
>
>> +			      && (reload_nregs < best_reload_nregs
>> +				  || (reload_nregs == best_reload_nregs
>> +				      && best_reload_sum < reload_sum))))))))
>
> Is that intentional?

Clearly I can't read.  The code _does_ prefer higher numbers.  I still
think "distance" is a bit misleading though. :-)

Just for the record, the rest of my question:

> Also, is this value meaningful for output reloads, which aren't really
> going to be able to inherit a value as such?  We seem to apply the cost
> regardless of whether it's an input or an output, so probably deserves
> a comment.
>
> Same for matched input operands, which as you say elsewhere aren't
> inherited.

still applies.

Richard
Vladimir Makarov Oct. 11, 2012, 12:41 a.m. UTC | #5
On 12-10-03 7:11 AM, Richard Sandiford wrote:
> Hi Vlad,
>
> Some comments on lra-spills.c and lra-coalesce.c.
>
>> +   The pass creates necessary stack slots and assign spilled pseudos
>> +   to the stack slots in following way:
> s/assign/assigns/
Fixed.
>> +   (or insn memory constraints) might be not satisfied any more.
> s/might be not/might not be/
Fixed.
>> +   For some targets, the pass can spill some pseudos into hard
>> +   registers of different class (usually into vector registers)
>> +   instead of spilling them into memory if it is possible and
>> +   profitable.	Spilling GENERAL_REGS pseudo into SSE registers for
>> +   modern Intel x86/x86-64 processors is an example of such
>> +   optimization.  And this is actually recommended by Intel
>> +   optimization guide.
> Maybe mention core i7 specifically?  "Modern" is a bit dangerous
> in code that'll live a long time.
Yes, right.  Fixed.  Another bad thing would be an usage of word new.
>> +/* The structure describes a stack slot which can be used for several
>> +   spilled pseudos.  */
>> +struct slot
>> +{
> Looks like this describes "a register or stack slot" given the hard_regno case.
Fixed
>> +/* Array containing info about the stack slots.	 The array element is
>> +   indexed by the stack slot number in the range [0..slost_num).  */
> Typo: slots_num
Fixed.
>> +  /* Each pseudo has an inherent size which comes from its own mode,
>> +     and a total size which provides room for paradoxical subregs
>> +     which refer to the pseudo reg in wider modes.
>> +
>> +     We can use a slot already allocated if it provides both enough
>> +     inherent space and enough total space.  Otherwise, we allocate a
>> +     new slot, making sure that it has no less inherent space, and no
>> +     less total space, then the previous slot.	*/
> The second part of the comment seems a bit misplaced, since the following
> code doesn't reuse stack slots.  This is done elsewhere instead.
> Maybe the first part would be better above the inherent_size assignment.
Right.  I've changed comment to reflect the current state of the code.
>> +  /* If we have any adjustment to make, or if the stack slot is the
>> +     wrong mode, make a new stack slot.	 */
>> +  x = adjust_address_nv (x, GET_MODE (regno_reg_rtx[i]), adjust);
> We don't make a new slot here.
I removed the comment.  The same comment is present in reload1.c and 
probably should be also removed.
>> +/* Sort pseudos according their slot numbers putting ones with smaller
>> +   numbers first, or last when the frame pointer is not needed.	 So
>> +   pseudos with the first slot will be finally addressed with smaller
>> +   address displacement.  */
>> +static int
>> +pseudo_reg_slot_compare (const void *v1p, const void *v2p)
>> +{
>> +  const int regno1 = *(const int *) v1p;
>> +  const int regno2 = *(const int *) v2p;
>> +  int diff, slot_num1, slot_num2;
>> +  int total_size1, total_size2;
>> +
>> +  slot_num1 = pseudo_slots[regno1].slot_num;
>> +  slot_num2 = pseudo_slots[regno2].slot_num;
>> +  if ((diff = slot_num1 - slot_num2) != 0)
>> +    return (frame_pointer_needed
>> +	    || !FRAME_GROWS_DOWNWARD == STACK_GROWS_DOWNWARD ? diff : -diff);
> The comment doesn't quite describe the condition.  Maybe:
>
> /* Sort pseudos according to their slots, putting the slots in the order
>     that they should be allocated.  Slots with lower numbers have the highest
>     priority and should get the smallest displacement from the stack or
>     frame pointer (whichever is being used).
>
>     The first allocated slot is always closest to the frame pointer,
>     so prefer lower slot numbers when frame_pointer_needed.  If the stack
>     and frame grow in the same direction, then the first allocated slot is
>     always closest to the initial stack pointer and furthest away from the
>     final stack pointer, so allocate higher numbers first when using the
>     stack pointer in that case.  The reverse is true if the stack and
>     frame grow in opposite directions.  */
I used your comment.  Thanks.
>> +  total_size1 = MAX (PSEUDO_REGNO_BYTES (regno1),
>> +		     GET_MODE_SIZE (lra_reg_info[regno1].biggest_mode));
>> +  total_size2 = MAX (PSEUDO_REGNO_BYTES (regno2),
>> +		     GET_MODE_SIZE (lra_reg_info[regno2].biggest_mode));
>> +  if ((diff = total_size2 - total_size1) != 0)
>> +    return diff;
> I think this could do with a bit more commentary.  When is biggest_mode
> ever smaller than PSEUDO_REGNO_BYTES?  Is that for pseudos that are only
> ever referenced as lowpart subregs?  If so, why does PSEUDO_REGNO_BYTES
> matter for those registers here but not when calculating biggest_mode?
The MAX code has no sense to me too (probably it was wrongly adapted 
from somewhere).  So I removed MAX.
>> +/* Assign spill hard registers to N pseudos in PSEUDO_REGNOS.  Put the
>> +   pseudos which did not get a spill hard register at the beginning of
>> +   array PSEUDO_REGNOS.	 Return the number of such pseudos.  */
> It'd be worth saying that PSEUDO_REGNOS is sorted in order of highest
> frequency first.
Fixed.
>> +  bitmap set_jump_crosses = regstat_get_setjmp_crosses ();
> I notice you use "set_jump" here and "setjump" in parts of 7a.patch.
> Probably better to use setjmp across the board.
Fixed.  setjump is also used in other parts of GCC.
>> +  /* Hard registers which can not be used for any purpose at given
>> +     program point because they are unallocatable or already allocated
>> +     for other pseudos.	 */
>> +  HARD_REG_SET *reserved_hard_regs;
>> +
>> +  if (! lra_reg_spill_p)
>> +    return n;
>> +  /* Set up reserved hard regs for every program point.	 */
>> +  reserved_hard_regs = (HARD_REG_SET *) xmalloc (sizeof (HARD_REG_SET)
>> +						 * lra_live_max_point);
>> +  for (p = 0; p < lra_live_max_point; p++)
>> +    COPY_HARD_REG_SET (reserved_hard_regs[p], lra_no_alloc_regs);
>> +  for (i = FIRST_PSEUDO_REGISTER; i < regs_num; i++)
>> +    if (lra_reg_info[i].nrefs != 0
>> +	&& (hard_regno = lra_get_regno_hard_regno (i)) >= 0)
>> +      for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
>> +	for (p = r->start; p <= r->finish; p++)
>> +	  lra_add_hard_reg_set (hard_regno, lra_reg_info[i].biggest_mode,
>> +				&reserved_hard_regs[p]);
> Since compilation time seems to be all the rage, I wonder if it would be
> quicker to have one live range list per hard register.  Then:
>> +      for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
>> +	for (p = r->start; p <= r->finish; p++)
>> +	  IOR_HARD_REG_SET (conflict_hard_regs, reserved_hard_regs[p]);
> would just be checking for live range intersection and:
>
>> +      /* Update reserved_hard_regs.  */
>> +      for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
>> +	for (p = r->start; p <= r->finish; p++)
>> +	  lra_add_hard_reg_set (hard_regno, lra_reg_info[regno].biggest_mode,
>> +				&reserved_hard_regs[p]);
> would again be a merge.
>
> Just an idea, not a merge requirement.  If you've already tried this and
> found it to be worse, that might be worth a comment.
I checked profiles and coverage for different tests (including huge 
ones) and did not see this code is critical.  But probably it is worth 
to try.  It might a bit more complicated for multi-register pseudos.  
I've just used the same pattern as in IRA fast allocation.  On the other 
hand, stack slots are allocated as you propose.  It might be good to 
unify the code.  I'll put it on my (long) todo list.
>> +      first = pseudo_slots[regno].first = &pseudo_slots[slots[slot_num].regno];
>> +      pseudo_slots[regno].next = pseudo_slots[slots[slot_num].regno].next;
>> +      first->next = &pseudo_slots[regno];
> Very minor nit, but I think this would be easier to read if the middle
> line also used "first->next".
Fixed.
>> +/* Assign spill hard registers to N pseudos in PSEUDO_REGNOS.  Put the
>> +   pseudos which did not get a spill hard register at the beginning of
>> +   array PSEUDO_REGNOS.	 Return the number of such pseudos.  */
> Here too I think it's worth mentioning that PSEUDO_REGNOS is sorted
> with highest frequency first.
Fixed.
>> +/* Recursively process LOC in INSN and change spilled pseudos to the
>> +   corresponding memory or spilled hard reg.  Ignore spilled pseudos
>> +   created from the scratches.	*/
>> +static bool
>> +remove_pseudos (rtx *loc, rtx insn)
> The return value is now ignored -- we know in advance which insns need
> changing -- so this could be simplified.
Fixed.
>> +/* Change spilled pseudos into memory or spill hard regs.  The
>> +   function put changed insns on the constraint stack (these insns
>> +   will be considered on the next constraint pass).  The changed insns
>> +   are all insns in which pseudos were changed.	 */
> s/The function put/Put/
Fixed
>> +/* Set up REMOVED_PSEUDOS_BITMAP and USED_PSEUDOS_BITMAP, and update
>> +   LR_BITMAP (a BB live info bitmap).  */
>> +static void
>> +update_live_info (bitmap lr_bitmap)
>> +{
>> +  unsigned int j;
>> +  bitmap_iterator bi;
>> +
>> +  bitmap_clear (&removed_pseudos_bitmap);
>> +  bitmap_clear (&used_pseudos_bitmap);
>> +  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
>> +			    FIRST_PSEUDO_REGISTER, j, bi)
>> +    {
>> +      bitmap_set_bit (&removed_pseudos_bitmap, j);
>> +      bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
>> +    }
>> +  if (! bitmap_empty_p (&removed_pseudos_bitmap))
>> +    {
>> +      bitmap_and_compl_into (lr_bitmap, &removed_pseudos_bitmap);
>> +      bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
>> +    }
>> +}
> Might be wrong, but it looks like nothing really uses removed_pseudos_bitmap
> outside this function.  I think this could simply be:
> /* Set up REMOVED_PSEUDOS_BITMAP and USED_PSEUDOS_BITMAP, and update
>     LR_BITMAP (a BB live info bitmap).  */
> static void
> update_live_info (bitmap lr_bitmap)
> {
>    unsigned int j;
>    bitmap_iterator bi;
>
>    bitmap_clear (&used_pseudos_bitmap);
>    EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
> 			    FIRST_PSEUDO_REGISTER, j, bi)
>      bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
>    if (! bitmap_empty_p (&used_pseudos_bitmap))
>      {
>        bitmap_and_compl_into (lr_bitmap, &coalesced_pseudos_bitmap);
>        bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
>      }
> }
Yes.  Thanks for finding such nontrivial change.  I fixed it.
>> +	    && mem_move_p (sregno, dregno)
>> +	    /* Don't coalesce inheritance pseudos because spilled
>> +	       inheritance pseudos will be removed in subsequent 'undo
>> +	       inheritance' pass.  */
>> +	    && lra_reg_info[sregno].restore_regno < 0
>> +	    && lra_reg_info[dregno].restore_regno < 0
>> +	    /* We undo splits for spilled pseudos whose live ranges
>> +	       were split.  So don't coalesce them, it is not
>> +	       necessary and the undo transformations would be
>> +	       wrong.  */
>> +	    && ! bitmap_bit_p (&split_origin_bitmap, sregno)
>> +	    && ! bitmap_bit_p (&split_origin_bitmap, dregno)
>> +	    && ! side_effects_p (set)
>> +	    /* Don't coalesces bound pseudos.  Bound pseudos has own
>> +	       rules for finding live ranges.  It is hard to maintain
>> +	       this info with coalescing and it is not worth to do
>> +	       it.  */
>> +	    && ! bitmap_bit_p (&lra_bound_pseudos, sregno)
>> +	    && ! bitmap_bit_p (&lra_bound_pseudos, dregno)
>> +	    /* We don't want to coalesce regnos with equivalences,
>> +	       at least without updating this info.  */
>> +	    && ira_reg_equiv[sregno].constant == NULL_RTX
>> +	    && ira_reg_equiv[sregno].memory == NULL_RTX
>> +	    && ira_reg_equiv[sregno].invariant == NULL_RTX
>> +	    && ira_reg_equiv[dregno].constant == NULL_RTX
>> +	    && ira_reg_equiv[dregno].memory == NULL_RTX
>> +	    && ira_reg_equiv[dregno].invariant == NULL_RTX
> Probably personal preference, but I think this would be easier
> to read as:
>
> 	    && coalescable_reg_p (sregno)
> 	    && coalescable_reg_p (dregno)
> 	    && !side_effects_p (set)
>
> with coalescable_reg_p checking reg_renumber (from mem_move_p)
> and the open-coded stuff in the quote above.
Ok.  Fixed.
>> +  for (; mv_num != 0;)
>> +    {
>> +      for (i = 0; i < mv_num; i++)
>> +	{
>> +	  mv = sorted_moves[i];
>> +	  set = single_set (mv);
>> +	  lra_assert (set != NULL && REG_P (SET_SRC (set))
>> +		      && REG_P (SET_DEST (set)));
>> +	  sregno = REGNO (SET_SRC (set));
>> +	  dregno = REGNO (SET_DEST (set));
>> +	  if (! lra_intersected_live_ranges_p
>> +		(lra_reg_info[first_coalesced_pseudo[sregno]].live_ranges,
>> +		 lra_reg_info[first_coalesced_pseudo[dregno]].live_ranges))
>> +	    {
>> +	      coalesced_moves++;
>> +	      if (lra_dump_file != NULL)
>> +		fprintf
>> +		  (lra_dump_file,
>> +		   "	  Coalescing move %i:r%d(%d)-r%d(%d) (freq=%d)\n",
>> +		   INSN_UID (mv), sregno, ORIGINAL_REGNO (SET_SRC (set)),
>> +		   dregno, ORIGINAL_REGNO (SET_DEST (set)),
>> +		   BLOCK_FOR_INSN (mv)->frequency);
>> +	      bitmap_ior_into (&involved_insns_bitmap,
>> +			       &lra_reg_info[sregno].insn_bitmap);
>> +	      bitmap_ior_into (&involved_insns_bitmap,
>> +			       &lra_reg_info[dregno].insn_bitmap);
>> +	      merge_pseudos (sregno, dregno);
>> +	      i++;
>> +	      break;
>> +	    }
>> +	}
>> +      /* Collect the rest of copies.  */
>> +      for (n = 0; i < mv_num; i++)
>> +	{
>> +	  mv = sorted_moves[i];
>> +	  set = single_set (mv);
>> +	  lra_assert (set != NULL && REG_P (SET_SRC (set))
>> +		      && REG_P (SET_DEST (set)));
>> +	  sregno = REGNO (SET_SRC (set));
>> +	  dregno = REGNO (SET_DEST (set));
>> +	  if (first_coalesced_pseudo[sregno] != first_coalesced_pseudo[dregno])
>> +	    sorted_moves[n++] = mv;
>> +	  else if (lra_dump_file != NULL)
>> +	    {
>> +	      coalesced_moves++;
>> +	      fprintf
>> +		(lra_dump_file, "      Coalescing move %i:r%d-r%d (freq=%d)\n",
>> +		 INSN_UID (mv), sregno, dregno,
>> +		 BLOCK_FOR_INSN (mv)->frequency);
>> +	    }
>> +	}
>> +      mv_num = n;
> I'm probably being dense here, sorry, but why the nested loops?
> Why can't we have one loop along the lines of:
>
>        for (i = 0; i < mv_num; i++)
> 	{
> 	  mv = sorted_moves[i];
> 	  set = single_set (mv);
> 	  lra_assert (set != NULL && REG_P (SET_SRC (set))
> 		      && REG_P (SET_DEST (set)));
> 	  sregno = REGNO (SET_SRC (set));
> 	  dregno = REGNO (SET_DEST (set));
> 	  if (first_coalesced_pseudo[sregno] == first_coalesced_pseudo[dregno])
> 	    {
> 	      coalesced_moves++;
> 	      fprintf
> 		(lra_dump_file, "      Coalescing move %i:r%d-r%d (freq=%d)\n",
> 		 INSN_UID (mv), sregno, dregno,
> 		 BLOCK_FOR_INSN (mv)->frequency);
> 	      /* We updated involved_insns_bitmap when doing the mrege */
> 	    }
> 	  else if (!(lra_intersected_live_ranges_p
> 		     (lra_reg_info[first_coalesced_pseudo[sregno]].live_ranges,
> 		      lra_reg_info[first_coalesced_pseudo[dregno]].live_ranges)))
> 	    {
> 	      coalesced_moves++;
> 	      if (lra_dump_file != NULL)
> 		fprintf
> 		  (lra_dump_file,
> 		   "	  Coalescing move %i:r%d(%d)-r%d(%d) (freq=%d)\n",
> 		   INSN_UID (mv), sregno, ORIGINAL_REGNO (SET_SRC (set)),
> 		   dregno, ORIGINAL_REGNO (SET_DEST (set)),
> 		   BLOCK_FOR_INSN (mv)->frequency);
> 	      bitmap_ior_into (&involved_insns_bitmap,
> 			       &lra_reg_info[sregno].insn_bitmap);
> 	      bitmap_ior_into (&involved_insns_bitmap,
> 			       &lra_reg_info[dregno].insn_bitmap);
> 	      merge_pseudos (sregno, dregno);
> 	    }
> 	}
>
> (completely untested)
As I remember, it was more complicated coalesced algorithm where sorting 
was done on each iteration after one move coalesce.

I changed the code.
>
>> +	    if ((set = single_set (insn)) != NULL_RTX
>> +		&& REG_P (SET_DEST (set)) && REG_P (SET_SRC (set))
>> +		&& REGNO (SET_SRC (set)) == REGNO (SET_DEST (set))
>> +		&& ! side_effects_p (set))
> Maybe use set_noop_p here?
>
Ok.  Why not.  The code is rarely executed so more generalize code could 
be used.  I changed it to set_noop_p.
Vladimir Makarov Oct. 12, 2012, 3:37 a.m. UTC | #6
On 10/04/2012 11:50 AM, Richard Sandiford wrote:
> Hi Vlad,
>
> This message is for lra-assigns.c.  Sorry for the piecemeal reviews,
> never sure when I'll get time...
>
>> +/* This file contains a pass mostly assigning hard registers to reload
>> +   pseudos.  There is no any RTL code transformation on this pass.
> Maybe:
>
> /* This file's main objective is to assign hard registers to reload pseudos.
>     It also tries to allocate hard registers to other pseudos, but at a lower
>     priority than the reload pseudos.  The pass does not transform the RTL.
>
> if that's accurate.
Yes.  That is better.  I used your comment.
>> +   Reload pseudos get what they need (usually) hard registers in
>> +   anyway possibly by spilling non-reload pseudos and by assignment
>> +   reload pseudos with smallest number of available hard registers
>> +   first.
>> +
>> +   If reload pseudos can get hard registers only through spilling
>> +   other pseudos, we choose what pseudos to spill taking into account
>> +   how given reload pseudo benefits and also how other reload pseudos
>> +   not assigned yet benefit too (see function spill_for).
> Maybe:
>
>     We must allocate a hard register to every reload pseudo.  We try to
>     increase the chances of finding a viable allocation by assigning the
>     pseudos in order of fewest available hard registers first.  If we
>     still fail to find a hard register, we spill other (non-reload)
>     pseudos in order to make room.
>
>     assign_hard_regno_for allocates registers without spilling.
>     spill_for does the same with spilling.  Both functions use
>     a cost model to determine the most profitable choice of
>     hard and spill registers.
Ok.  I just changed two sentences a bit:

  find_hard_regno_for finds hard registers for allocation without  
spilling.  spill_for does the same with spilling.

>> +   Non-reload pseudos can get hard registers too if it is possible and
>> +   improves the code.  It might be possible because of spilling
>> +   non-reload pseudos on given pass.
> Maybe:
>
>     Once we have finished allocating reload pseudos, we also try to
>     assign registers to other (non-reload) pseudos.  This is useful
>     if hard registers were freed up by the spilling just described.
>
Fixed.
>> +   We try to assign hard registers processing pseudos by threads.  The
>> +   thread contains reload and inheritance pseudos connected by copies
>> +   (move insns).  It improves the chance to get the same hard register
>> +   to pseudos in the thread and, as the result, to remove some move
>> +   insns.
> Maybe:
>
>     We try to assign hard registers by collecting pseudos into threads.
>     These threads contain reload and inheritance pseudos that are connected
>     by copies (move insns).  Doing this improves the chances of pseudos
>     in the thread getting the same hard register and, as a result,
>     of allowing some move insns to be deleted.
Fixed.
>> +   When we assign hard register to a pseudo, we decrease the cost of
>> +   the hard registers for corresponding pseudos connected by copies.
> Maybe:
>
>     When we assign a hard register to a pseudo, we decrease the cost of
>     using the same hard register for pseudos that are connected by copies.
Fixed.
>> +   If two hard registers are equally good for assigning the pseudo
>> +   with hard register cost point of view, we prefer a hard register in
>> +   smaller register bank.  By default, there is only one register
>> +   bank.  A target can define register banks by hook
>> +   register_bank. For example, x86-64 has a few register banks: hard
>> +   regs with and without REX prefixes are in different banks.  It
>> +   permits to generate smaller code as insns without REX prefix are
>> +   shorter.
> Maybe:
>
>     If two hard registers have the same frequency-derived cost,
>     we prefer hard registers in lower register banks.  The mapping
>     of registers to banks is controlled by the register_bank target hook.
>     For example, x86-64 has a few register banks: hard registers with and
>     without REX prefixes are in different banks.  This permits us
>     to generate smaller code as insns without REX prefixes are shorter.
>
> although this might change if the name of the hook changes.

With recent change in the hook name, I modified it to:

    If two hard registers have the same frequency-derived cost, we
    prefer hard registers with bigger priorities.  The mapping of
    registers to priorities is controlled by the register_priority
    target hook.  For example, x86-64 has a few register priorities:
    hard registers with and without REX prefixes have different
    priorities.  This permits us to generate smaller code as insns
    without REX prefixes are shorter.

>> +/* Info about pseudo used during the assignment pass.  Thread is a set
>> +   of connected reload and inheritance pseudos with the same set of
>> +   available hard reg set.  Thread is a pseudo itself for other
>> +   cases.  */
>> +struct regno_assign_info
> Maybe:
>
> /* Information about the thread to which a pseudo belongs.  Threads are
>     a set of connected reload and inheritance pseudos with the same set of
>     available hard registers.  Lone registers belong to their own threads.  */
Fixed.
> Although the condition seems to be:
>> +	&& (ira_class_hard_regs_num[regno_allocno_class_array[regno1]]
>> +	    == ira_class_hard_regs_num[regno_allocno_class_array[regno2]]))
> i.e. the same _number_ of available hard regs, but not necessarily the
> same set.
It should be the same in most cases.  This condition is just faster 
approximation of the same available hard reg set.
> "thread" might be more mnemonic than "regno_assign" in this file,
> but that's bikeshed stuff.
>
>> +  for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
>> +    {
>> +      regno_assign_info[i].first = i;
>> +      regno_assign_info[i].next = -1;
>> +      regno_assign_info[i].freq = lra_reg_info[i].freq;
>> +    }
> Minor speedup, but it's probably worth caching max_reg_num () rather than
> calling it in each loop iteration.  Several other loops with the same thing.
That is not a critical code and LTO could solve the problem. But as we 
usually don't use it for building GCC, I rewrote it.
>> +/* Process a pseudo copy with execution frequency COPY_FREQ connecting
>> +   REGNO1 and REGNO2 to form threads.  */
>> +static void
>> +process_copy_to_form_thread (int regno1, int regno2, int copy_freq)
>> +{
>> +  int last, regno1_first, regno2_first;
>> +
>> +  lra_assert (regno1 >= lra_constraint_new_regno_start
>> +	      && regno2 >= lra_constraint_new_regno_start);
>> +  regno1_first = regno_assign_info[regno1].first;
>> +  regno2_first = regno_assign_info[regno2].first;
>> +  if (regno1_first != regno2_first)
>> +    {
>> +      for (last = regno2_first;
>> +	   regno_assign_info[last].next >= 0;
>> +	   last = regno_assign_info[last].next)
>> +	regno_assign_info[last].first = regno1_first;
>> +      regno_assign_info[last].next = regno_assign_info[regno1_first].next;
>> +      regno_assign_info[regno1_first].first = regno2_first;
>> +      regno_assign_info[regno1_first].freq
>> +	+= regno_assign_info[regno2_first].freq;
> Couple of things I don't understand here:
>
> - Why don't we set regno_assign_info[last].first (for final "last")
>    to regno1_first?  I.e. the loop stops while "last" is still valid,
>    but only assigns to that element's "next" field, leaving "first"
>    as before.
>
> - I might be wrong, but should:
>
>      regno_assign_info[regno1_first].first = regno2_first;
>
>    be:
>
>      regno_assign_info[regno1_first].next = regno2_first;
>
>    so that the list becomes:
>
>      regno1_first regno2_first ... last ...
>
>    The current version seems to create a cycle:
>
>     regno_assign_info[regno1_first].first == regno2_first
>     regno_assign_info[regno2_first].first == regno1_first
It is a typo.  Fixed.  Thanks.  There is no looping danger as we don't 
traverse this list by field first but it results in assignment order 
different from what I assumed.
>> +/* Update LIVE_HARD_REG_PSEUDOS and LIVE_PSEUDOS_REG_RENUMBER by
>> +   pseudo REGNO assignment or by the pseudo spilling if FREE_P.	 */
> Maybe:
>
> /* Update the LIVE_HARD_REG_PSEUDOS and LIVE_PSEUDOS_REG_RENUMBER
>     entries for pseudo REGNO.  Assume that the register has been
>     spilled if FREE_P, otherwise assume that it has been assigned
>     reg_renumber[REGNO] (if >= 0).  */
Fixed.  I also added a comment about recently added 
insert_in_live_range_start_chain call.
>> +/* Find and return best (or TRY_ONLY_HARD_REGNO) free hard register
>> +   for pseudo REGNO.  In the failure case, return a negative number.
>> +   Return through *COST the cost of usage of the hard register for the
>> +   pseudo.  Best free hard register has smallest cost of usage for
>> +   REGNO or smallest register bank if the cost is the same.  */
> Maybe:
>
> /* Try to find a free hard register for pseudo REGNO.  Return the
>     hard register on success and set *COST to the cost of using
>     that register.  (If several registers have equal cost, the one with
>     the lowest register bank wins.)  Return -1 on failure.
>
>     If TRY_ONLY_HARD_REGNO >= 0, consider only that hard register,
>     otherwise consider all hard registers in REGNO's class.  */
>
Fixed with changing bank to priority.
>> +      if (hard_regno_costs_check[hard_regno] != curr_hard_regno_costs_check)
>> +	hard_regno_costs[hard_regno] = 0;
>> +      hard_regno_costs_check[hard_regno] = curr_hard_regno_costs_check;
>> +      hard_regno_costs[hard_regno]
>> +	-= lra_reg_info[regno].preferred_hard_regno_profit1;
> This pattern occurs several times.  I think it'd be clearer to have
> an inline helper function (adjust_hard_regno_cost, or whatever).
Done.
>> +  /* That is important for allocation of multi-word pseudos.  */
>> +  IOR_COMPL_HARD_REG_SET (conflict_set, reg_class_contents[rclass]);
> Maybe:
>
>    /* Make sure that all registers in a multi-word pseudo belong to the
>       required class.  */
Fixed.
>> +	  /* We can not use prohibited_class_mode_regs because it is
>> +	     defined not for all classes.  */
> s/defined not/not defined/
Fixed.
>> +	  && ! TEST_HARD_REG_BIT (impossible_start_hard_regs, hard_regno)
>> +	  && (nregs_diff == 0
>> +#ifdef WORDS_BIG_ENDIAN
>> +	      || (hard_regno - nregs_diff >= 0
>> +		  && TEST_HARD_REG_BIT (reg_class_contents[rclass],
>> +					hard_regno - nregs_diff))
>> +#else
>> +	      || TEST_HARD_REG_BIT (reg_class_contents[rclass],
>> +				    hard_regno + nregs_diff)
>> +#endif
>> +	      ))
> impossible_start_hard_regs is set up as:
>
>> +	conflict_hr = live_pseudos_reg_renumber[conflict_regno];
>> +	nregs = (hard_regno_nregs[conflict_hr]
>> +		 [lra_reg_info[conflict_regno].biggest_mode]);
>> +	/* Remember about multi-register pseudos.  For example, 2 hard
>> +	   register pseudos can start on the same hard register but can
>> +	   not start on HR and HR+1/HR-1.  */
>> +	for (hr = conflict_hr + 1;
>> +	     hr < FIRST_PSEUDO_REGISTER && hr < conflict_hr + nregs;
>> +	     hr++)
>> +	  SET_HARD_REG_BIT (impossible_start_hard_regs, hr);
>> +	for (hr = conflict_hr - 1;
>> +	     hr >= 0 && hr + hard_regno_nregs[hr][biggest_mode] > conflict_hr;
>> +	     hr--)
>> +	  SET_HARD_REG_BIT (impossible_start_hard_regs, hr);
> which I don't think copes with big-endian cases like:
>
>    other hard reg in widest mode:    ........XXXX...
>    impossible_start_regs:            .....XXX.XXX...
>    this hard reg in pseudo's mode:   ............XX.
>    this hard reg in widest mode:     ..........XXXX.
>
> which AIUI is an invalid choice.
>
> There are other corner cases too.  If the other hard reg is narrower than
> its widest mode, and that widest mode is wider than the current regno's
> widest mode, then on big-endian targets we could have:
>
>    other hard reg in its own mode:   ........XX....
>    other hard reg in widest mode:    ......XXXX.....
>    impossible_start_regs:            .......X.XXX... (*)
>    this hard reg in pseudo's mode:   .....XX........
>    this hard reg in widest mode:     .....XX........
>
> (*) note no big-endian adjustment for the other hard reg's widest mode here.
>
> Maybe it would be easier to track impossible end regs for
> big-endian targets?
I'll look at this tomorrow.
>> +/* Update HARD_REGNO preference for pseudos connected (directly or
>> +   indirectly) to a pseudo with REGNO.	Use divisor DIV to the
>> +   corresponding copy frequency for the hard regno cost preference
>> +   calculation.	 The more indirectly a pseudo connected, the less the
>> +   cost preference.  It is achieved by increasing the divisor for each
>> +   next recursive level move.  */
> "cost preference" seems a bit contradictory.  Maybe:
>
> /* Update the preference for using HARD_REGNO for pseudos that are
>     connected directly or indirectly with REGNO.  Apply divisor DIV
>     to any preference adjustments.
>
>     The more indirectly a pseudo is connected, the smaller its effect
>     should be.  We therefore increase DIV on each "hop".  */
>
Fixed.  By the way, it is your invention from IRA.
>> +static void
>> +update_hard_regno_preference (int regno, int hard_regno, int div)
>> +{
>> +  int another_regno, cost;
>> +  lra_copy_t cp, next_cp;
>> +
>> +  /* Search depth 5 seems to be enough.	 */
>> +  if (div > (1 << 5))
>> +    return;
>> +  for (cp = lra_reg_info[regno].copies; cp != NULL; cp = next_cp)
>> +    {
>> +      if (cp->regno1 == regno)
>> +	{
>> +	  next_cp = cp->regno1_next;
>> +	  another_regno = cp->regno2;
>> +	}
>> +      else if (cp->regno2 == regno)
>> +	{
>> +	  next_cp = cp->regno2_next;
>> +	  another_regno = cp->regno1;
>> +	}
>> +      else
>> +	gcc_unreachable ();
>> +      if (reg_renumber[another_regno] < 0
>> +	  && (update_hard_regno_preference_check[another_regno]
>> +	      != curr_update_hard_regno_preference_check))
>> +	{
>> +	  update_hard_regno_preference_check[another_regno]
>> +	    = curr_update_hard_regno_preference_check;
>> +	  cost = cp->freq < div ? 1 : cp->freq / div;
>> +	  lra_setup_reload_pseudo_preferenced_hard_reg
>> +	    (another_regno, hard_regno, cost);
>> +	  update_hard_regno_preference (another_regno, hard_regno, div * 2);
>> +	}
>> +    }
>> +}
> Using a depth-first search for this seems a bit dangerous, because we
> could end up processing a connected pseudo via a very indirect path
> first, even though it is more tightly connected via a more direct path.
> (Could be a well-known problem, sorry.)
Actually I did on purpose. It is a bit different situation from IRA.  
The vast majority of copies form a straight line (therefore I use term 
threads).  Therefore I use also only two preferences of hard registers 
(from two copies with hard registers).  I searched a balance between 
code simplicity and speed and more sophisticated and slow heuristics.
>> +/* Update REG_RENUMBER and other pseudo preferences by assignment of
>> +   HARD_REGNO to pseudo REGNO and print about it if PRINT_P.  */
>> +void
>> +lra_setup_reg_renumber (int regno, int hard_regno, bool print_p)
>> +{
>> +  int i, hr;
>> +
>> +  if ((hr = hard_regno) < 0)
>> +    hr = reg_renumber[regno];
>> +  reg_renumber[regno] = hard_regno;
>> +  lra_assert (hr >= 0);
>> +  for (i = 0; i < hard_regno_nregs[hr][PSEUDO_REGNO_MODE (regno)]; i++)
>> +    if (hard_regno < 0)
>> +      lra_hard_reg_usage[hr + i] -= lra_reg_info[regno].freq;
>> +    else
>> +      lra_hard_reg_usage[hr + i] += lra_reg_info[regno].freq;
> Is it possible for this function to reallocate a register,
> i.e. for reg_regnumber to be >= 0 both before and after the call?
> If so, I think we'd need two loops.  If not, an assert would be good.
No. I added an assert.
>> +      mode = PSEUDO_REGNO_MODE (spill_regno);
>> +      if (lra_hard_reg_set_intersection_p
>> +	  (live_pseudos_reg_renumber[spill_regno],
>> +	   mode, reg_class_contents[rclass]))
>> +	{
>> +	  hard_regno = live_pseudos_reg_renumber[spill_regno];
> Very minor, sorry, but I think this would be more readable with the
> hard_regno assignment before the condition and hard_regno used in it.
Fixed.
>> +/* Spill some pseudos for a reload pseudo REGNO and return hard
>> +   register which should be used for pseudo after spilling.  The
>> +   function adds spilled pseudos to SPILLED_PSEUDO_BITMAP.  When we
>> +   choose hard register (and pseudos occupying the hard registers and
>> +   to be spilled), we take into account not only how REGNO will
>> +   benefit from the spills but also how other reload pseudos not
>> +   assigned to hard registers yet benefit from the spills too.	*/
> "...not yet assigned to hard registers benefit..."
>
Fixed.
>> +  curr_pseudo_check++; /* Invalidate try_hard_reg_pseudos elements.  */
> Comment on its own line.
Fixed.
>> +  bitmap_clear (&ignore_pseudos_bitmap);
>> +  bitmap_clear (&best_spill_pseudos_bitmap);
>> +  EXECUTE_IF_SET_IN_BITMAP (&lra_reg_info[regno].insn_bitmap, 0, uid, bi)
>> +    {
>> +      struct lra_insn_reg *ir;
>> +
>> +      for (ir = lra_get_insn_regs (uid); ir != NULL; ir = ir->next)
>> +	if (ir->regno >= FIRST_PSEUDO_REGISTER)
>> +	  bitmap_set_bit (&ignore_pseudos_bitmap, ir->regno);
>> +    }
> The name "ignore_pseudos_bitmap" doesn't seem to describe how the set is
> actually used.  We still allow the pseudos to be spilled, but the number
> of such spills is the first-order cost.  Maybe "insn_conflict_pseudos"
> or something like that?
Ok. Fixed.
>> +      /* Spill pseudos.	 */
>> +      CLEAR_HARD_REG_SET (spilled_hard_regs);
>> +      EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
>> +	if ((int) spill_regno >= lra_constraint_new_regno_start
>> +	    /* ??? */
>> +	    && ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
>> +	    && ! bitmap_bit_p (&lra_split_pseudos, spill_regno)
>> +	    && ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno))
>> +	  goto fail;
> Leftover ??? (or lacks enough info if it's supposed to be kept)
It is a leftover.  As I remember it was a mark to me to check this code 
when i worked on inheritance and splitting.

I removed it.
>> +	      EXECUTE_IF_SET_IN_BITMAP (&live_hard_reg_pseudos[r->start],
>> +					0, k, bi2)
>> +		sparseset_set_bit (live_range_hard_reg_pseudos, k);
> live_range_hard_reg_pseudos and &live_hard_reg_pseudos[r->start]
> seem like similar quantities.  Was there a reason for using
> sparsesets for one and bitmaps for the other?
>> +	      for (p = r->start + 1; p <= r->finish; p++)
>> +		{
>> +		  lra_live_range_t r2;
>> +		
>> +		  for (r2 = lra_start_point_ranges[p];
>> +		       r2 != NULL;
>> +		       r2 = r2->start_next)
>> +		    if (r2->regno >= lra_constraint_new_regno_start)
>> +		      sparseset_set_bit (live_range_reload_pseudos, r2->regno);
>> +		}
> This is probably just showing my ignorance, but -- taking the above two
> quotes together -- why do we calculate these two live sets in different ways?
> Also, does live_range_reload_pseudos really just contain "reload" pseudos,
> or inheritance pseudos as well?
Thanks for finding this.  live_range_hard_reg_pseudos is not used in 
this function.
As I remember, i used the same code as in find_hard_regno_for and 
live_(hard_)reg_pseudos contained all pseudos including spilled ones.  
But it was too expensive (especially when the register pressure was 
high).  So I started to use less accurate but faster heuristics.

So I am removing

+	      EXECUTE_IF_SET_IN_BITMAP (&live_hard_reg_pseudos[r->start],
+					0, k, bi2)
+		sparseset_set_bit (live_range_hard_reg_pseudos, k);

I changed

p = r->start + 1 to p = r->start


All these variants of code do not result in wrong code generation, only the quality of spilling (which is not worse reload's one in any case).

live_range_reload_pseudos contains inheritance pseudos too (inheritance psedous are also short live range pseudos).  I renamed it.

>> +      /* We are trying to spill a reload pseudo.  That is wrong we
>> +	 should assign all reload pseudos, otherwise we cannot reuse
>> +	 the selected alternatives.  */
>> +      hard_regno = find_hard_regno_for (regno, &cost, -1);
>> +      if (hard_regno >= 0)
>> +	{
> Don't really understand this comment, sorry.
It removed the comment.  It is from an old solution code trying to 
guarantee assignment to the reload pseudo.
> Also, why are we passing -1 to find_hard_regno_for, rather than hard_regno?
> The loop body up till this point has been specifically freeing up registers
> to make hard_regno allocatable.  I realise that, by spilling everything
> that overlaps this range, we might have freed up other registers too,
> and so made others besides hard_regno allocatable.  But wouldn't we want
> to test those other hard registers in "their" iteration of the loop
> instead of this one?  The spills in those iterations ought to be more
> directed (i.e. there should be less incidental spilling).
>
> As things stand, doing an rclass_size * rclass_size scan seems
> unnecessarily expensive, although probably off the radar.
We cannot just pass hard_regno for multi-word pseudo when hard_regno-1 
is already free.
You are right about possibility to speed up the code, although on 
profiles I looked (including the last huge tests) spill_for and 
find_hard_regno_for called from takes few time. That is probably because 
you don't need spill frequently.  Freeing one long live range pseudo 
permits to find hard regno without spilling for many short live pseudos 
(reload and inheritance ones).
Also loop rclass_size * rclass_size is not expensive, the preparation of 
data for the loop is expensive.

I believe it has a potential to speed up spill_for function if we inline 
find_hard_regno_for and remove duplicated code but it will be achieved 
by significant complication of already complicated function.
>> +	  assign_temporarily (regno, hard_regno);
>> +	  n = 0;
>> +	  EXECUTE_IF_SET_IN_SPARSESET (live_range_reload_pseudos, reload_regno)
>> +	    if (live_pseudos_reg_renumber[reload_regno] < 0
>> +		&& (hard_reg_set_intersect_p
>> +		    (reg_class_contents
>> +		     [regno_allocno_class_array[reload_regno]],
>> +		     spilled_hard_regs)))
>> +	      sorted_reload_pseudos[n++] = reload_regno;
>> +	  qsort (sorted_reload_pseudos, n, sizeof (int),
>> +		 reload_pseudo_compare_func);
>> +	  for (j = 0; j < n; j++)
>> +	    {
>> +	      reload_regno = sorted_reload_pseudos[j];
>> +	      if (live_pseudos_reg_renumber[reload_regno] < 0
> Just trying to make sure I understand, but: isn't the final condition in
> this quote redundant?  I thought that was a requirement for the register
> being in sorted_reload_pseudos to begin with.
Yes, it is redundant.  It is a leftover from some experiments with the 
code.  I converted it to assert.
>> +		  && (reload_hard_regno
>> +		      = find_hard_regno_for (reload_regno,
>> +					     &reload_cost, -1)) >= 0
>> +		  && (lra_hard_reg_set_intersection_p
>> +		      (reload_hard_regno, PSEUDO_REGNO_MODE (reload_regno),
>> +		       spilled_hard_regs)))
>> +		{
>> +		  if (lra_dump_file != NULL)
>> +		    fprintf (lra_dump_file, " assign %d(cost=%d)",
>> +			     reload_regno, reload_cost);
>> +		  assign_temporarily (reload_regno, reload_hard_regno);
>> +		  cost += reload_cost;
> It looks like registers that can be reallocated make hard_regno more
> expensive (specifically by reload_cost), but registers that can't be
> reallocated contribute no cost.  Is that right?  Seemed a little odd,
> so maybe worth a comment.
Reload cost is a negative cost.  It is negative base is the pseudo 
frequency.
I added the comment.  The better hard register is, the more negative 
cost is.
> Also, AIUI find_hard_regno_for is trying to allocate the register for
> reload_regno on the basis that reload_regno has the same conflicts as
> the current regno, and so it's only an approximation.  Is that right?
> Might be worth a comment if so (not least to explain why we don't commit
> to this allocation if we end up choosing hard_regno).
Sorry, I did not understand what you are asking.  We try to find best 
pseudos to spill which helps to assign more reload pseudos (on cost 
base) as possible.  Pseudos for which find_hard_regno finds not spilled 
hard regs are ignored as they can be assigned without spilling.
>> +	  if (best_insn_pseudos_num > insn_pseudos_num
>> +	      || (best_insn_pseudos_num == insn_pseudos_num
>> +		  && best_cost > cost))
> Should we check the register bank and levelling here too,
> for consistency?
As I remember I had a quick check of it on a few tests but did not find 
a difference in generated code. I think that is because the probability 
of the same cost is smaller than in assigning code as we usually spill 
different cost pseudos.
   I think we should try it on bigger tests and add such code if it is 
worth, or write a comment about this.  I'll put it on my todo list.  
I'll work on it later on the branch.
>> +      /* Restore the live hard reg pseudo info for spilled pseudos.  */
>> +      EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
>> +	update_lives (spill_regno, false);
> I couldn't tell why this was outside the "hard_regno >= 0" condition.
> Do we really change these registers even if find_hard_regno_for fails?
The first we spill some pseudo and then call find_hard_regno_for.  So we 
should restore the state before spilling independently on success of 
find_hard_regno_for.
>> +  /* Spill: */
>> +  EXECUTE_IF_SET_IN_BITMAP (&best_spill_pseudos_bitmap, 0, spill_regno, bi)
> Very minor, but I think it'd be worth asserting that best_hard_regno >= 0
> before this loop.
Unfortunately, in very rare cases, best_hard_regno can be < 0.  That is 
why we have two iteration for assignment of reload pseudos (see comment 
for 2nd iter for reload pseudo assignments.

I've added a comment for the function that it can return negative value.
>> +/* Constraint transformation can use equivalences and they can
>> +   contains pseudos assigned to hard registers.	 Such equivalence
>> +   usage might create new conflicts of pseudos with hard registers
>> +   (like ones used for parameter passing or call clobbered ones) or
>> +   other pseudos assigned to the same hard registers.  Another very
>> +   rare risky transformation is restoring whole multi-register pseudo
>> +   when only one subreg lives and unused hard register is used already
>> +   for something else.
> In a way, I found this comment almost too detailed. :-)  Maybe:
>
> /* The constraints pass is allowed to create equivalences between
>     pseudos that make the current allocation "incorrect" (in the sense
>     that pseudos are assigned to hard registers from their own conflict sets).
>     The global variable lra_risky_transformations_p says whether this might
>     have happened.
>
> if that's accurate.  The detail about when this occurs probably
> belongs above lra_risky_transformations_p, although it's mostly
> there already.  (Haven't got to the ira-conflicts.c stuff yet,
> so no comments about that here.)
Ok. Fixed. It looks better to me.  But more important if it looks better 
to you because you have a fresh look.
>> +   Process pseudos assigned to hard registers (most frequently used
>> +   first), spill if a conflict is found, and mark the spilled pseudos
>> +   in SPILLED_PSEUDO_BITMAP.  Set up LIVE_HARD_REG_PSEUDOS from
>> +   pseudos, assigned to hard registers.	 */
> Why do we spill the most frequently used registers first?  Probably worth
> a comment.
>
It should be less frequently pseudos as an intuitive heuristic. Although 
it does matter as even on all_cp2k_fortran.f90 (500K lines test) I did 
not find that the order affect the generated code.  I fixed the code and 
the comment.
>> +  for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
>> +    if (reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
>> +      {
>> +	if (lra_risky_transformations_p)
>> +	  sorted_pseudos[n++] = i;
>> +	else
>> +	  update_lives (i, false);
>> +      }
>> +  if (! lra_risky_transformations_p)
>> +    return;
> Seems like this would be more logically split into two (the
> lra_risky_transformations_p case and the !lra_risky_transformations_p case).
Fixed.
>> +	    /* If it is multi-register pseudos they should start on
>> +	       the same hard register.	*/
>> +	    || hard_regno != reg_renumber[conflict_regno])
> This seems different from the find_hard_regno_for case, which took
> biggest_mode into account.
I think we should use the common code.  So I'll fix it with the big 
endian problem.
>> +	  /* Don't change reload pseudo allocation.  It might have
>> +	     this allocation for a purpose (e.g. bound to another
>> +	     pseudo) and changing it can result in LRA cycling.	 */
>> +	  if (another_regno < lra_constraint_new_regno_start
>> +	      && (another_hard_regno = reg_renumber[another_regno]) >= 0
>> +	      && another_hard_regno != hard_regno)
> Seems like this excludes split pseudos as well as reload pseudos,
> or are they never included in these copies?  Might be worth mentioning
> them either way.
I fixed it.
> The only general comment I have so far is that it's sometimes
> difficult to follow which types of pseudos are being included
> or excluded by a comparison with lra_constraint_new_regno_start.
> Sometimes the comments talk about "reload pseudos", but other
> similar checks imply that the registers could be inheritance
> pseudos or split pseudos as well.  Some thin inline wrappers
> might help here.
Inheritance, split, reload pseudos created since last constraint pass >= 
lra_constraint_new_regno_start.
Inheritance and split pseudos created on any pass are in the 
corresponding bitmaps.
Inheritance and split pseudos since the last constraint pass has also 
restore_regno >= 0 until split or inheritance transformations are done.

I am putting the comment about this at the top of the file.
>> +	      /* Remember that reload pseudos can be spilled on the
>> +		 1st pass.  */
>> +	      bitmap_clear_bit (&all_spilled_pseudos, regno);
>> +	      assign_hard_regno (hard_regno, regno);
> Maybe:
>
>    /* This register might have been spilled by the previous pass.
>       Indicate that it is no longer spilled.  */
Fixed.
>> +		/* We can use inheritance pseudos in original insns
>> +		   (not reload ones).  */
>> +		if (regno < lra_constraint_new_regno_start
>> +		    || bitmap_bit_p (&lra_inheritance_pseudos, regno)
>> +		    || reg_renumber[regno] < 0)
>> +		  continue;
>> +		sorted_pseudos[nfails++] = regno;
>> +		if (lra_dump_file != NULL)
>> +		  fprintf (lra_dump_file,
>> +			   "	  Spill reload r%d(hr=%d, freq=%d)\n",
>> +			   regno, reg_renumber[regno],
>> +			   lra_reg_info[regno].freq);
> Same comment about types of pseudo as above.  (I.e. the code checks for
> inheritance pseudos, but not split pseudos.)
I modified the comment to

/* A reload pseudo did not get a hard register on the
    first iteration because of the conflict with
    another reload pseudos in the same insn.  So we
    consider only reload pseudos assigned to hard
    registers.  We shall exclude inheritance pseudos as
    they can occur in original insns (not reload ones).
    We can omit the check for split pseudos because
    they occur only in move insns containing non-reload
    pseudos. */

I hope it explains the code.
>> +  bitmap_initialize (&do_not_assign_nonreload_pseudos, &reg_obstack);
>> +  EXECUTE_IF_SET_IN_BITMAP (&lra_inheritance_pseudos, 0, u, bi)
>> +    if ((restore_regno = lra_reg_info[u].restore_regno) >= 0
>> +	&& reg_renumber[u] < 0 && bitmap_bit_p (&lra_inheritance_pseudos, u))
>> +      bitmap_set_bit (&do_not_assign_nonreload_pseudos, restore_regno);
>> +  EXECUTE_IF_SET_IN_BITMAP (&lra_split_pseudos, 0, u, bi)
>> +    if ((restore_regno = lra_reg_info[u].restore_regno) >= 0
>> +	&& reg_renumber[u] >= 0 && bitmap_bit_p (&lra_split_pseudos, u))
>> +      bitmap_set_bit (&do_not_assign_nonreload_pseudos, restore_regno);
> The bitmap_bit_p tests look redundant.  Also, the following code is:
>
Fixed.
>> +  for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
>> +    if (((i < lra_constraint_new_regno_start
>> +	  && ! bitmap_bit_p (&do_not_assign_nonreload_pseudos, i))
>> +	 || (bitmap_bit_p (&lra_inheritance_pseudos, i)
>> +	     && lra_reg_info[i].restore_regno >= 0)
>> +	 || (bitmap_bit_p (&lra_split_pseudos, i)
>> +	     && lra_reg_info[i].restore_regno >= 0)
>> +	 || bitmap_bit_p (&lra_optional_reload_pseudos, i))
>> +	&& reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
>> +	&& regno_allocno_class_array[i] != NO_REGS)
>> +      sorted_pseudos[n++] = i;
>> +  bitmap_clear (&do_not_assign_nonreload_pseudos);
> where we test very similar things inline, and then clear
> do_not_assign_nonreload_pseudos.  Do we need d_n_a_n_p at all?
No the code is right.  We still need d_n_a_n_p as we can not easily 
calculate from what pseudo the given pseudo was inherited or split.  It 
is different from the loop where new inheritance and split pseudos are 
checked (it is different from their origins marked in d_n_a_n_p.
>> +  if (n != 0 && lra_dump_file != NULL)
>> +    fprintf (lra_dump_file, "  Reassing non-reload pseudos\n");
> "Reassigning"
>
Fixed. :)

Richard, thank you for the review.  This is the most useful review I 
ever had.  There are a lot of code can be simplified.  I had numerous 
experiments with this code (and my be with code in lra-constraints.c) 
during my work on LRA.  Therefore it can contains some artifacts from 
these experiments.
Richard Sandiford Oct. 12, 2012, 2:29 p.m. UTC | #7
Hi Vlad,

Comments for the rest of ira-constraints.c.

Vladimir Makarov <vmakarov@redhat.com> writes:
> +  saved_base_reg = saved_base_reg2 = saved_index_reg = NULL_RTX;
> +  change_p = equiv_address_substitution (&ad, addr_loc, mode, as, code);
> +  if (ad.base_reg_loc != NULL)
> +    {
> +      if (process_addr_reg
> +	  (ad.base_reg_loc, before,
> +	   (ad.base_modify_p && REG_P (*ad.base_reg_loc)
> +	    && find_regno_note (curr_insn, REG_DEAD,
> +				REGNO (*ad.base_reg_loc)) == NULL
> +	    ? after : NULL),
> +	   base_reg_class (mode, as, ad.base_outer_code, ad.index_code)))
> +	change_p = true;
> +      if (ad.base_reg_loc2 != NULL)
> +	*ad.base_reg_loc2 = *ad.base_reg_loc;
> +      saved_base_reg = *ad.base_reg_loc;
> +      lra_eliminate_reg_if_possible (ad.base_reg_loc);
> +      if (ad.base_reg_loc2 != NULL)
> +	{
> +	  saved_base_reg2 = *ad.base_reg_loc2;
> +	  lra_eliminate_reg_if_possible (ad.base_reg_loc2);
> +	}

We unconditionally make *ad.base_reg_loc2 = *ad.base_reg_loc, so it
might be clearer without saved_base_reg2.  More below...

> +      /* The following addressing is checked by constraints and
> +	 usually target specific legitimate address hooks do not
> +	 consider them valid.  */
> +      || GET_CODE (*addr_loc) == POST_DEC || GET_CODE (*addr_loc) == POST_INC
> +      || GET_CODE (*addr_loc) == PRE_DEC || GET_CODE (*addr_loc) == PRE_DEC

typo: two PRE_DECs, although:

> +      || GET_CODE (*addr_loc) == PRE_MODIFY
> +      || GET_CODE (*addr_loc) == POST_MODIFY

the whole lot could just be replaced by ad.base_modify_p, or perhaps
even removed entirely given:

> +      /* In this case we can not do anything because if it is wrong
> +	 that is because of wrong displacement.	 Remember that any
> +	 address was legitimate in non-strict sense before LRA.	 */
> +      || ad.disp_loc == NULL)

It doesn't seem worth validating the address at all for ad.disp_loc == NULL.
E.g. something like:

  if (ad.base_reg_loc != NULL
      && (process_addr_reg
	  (ad.base_reg_loc, before,
	   (ad.base_modify_p && REG_P (*ad.base_reg_loc)
	    && find_regno_note (curr_insn, REG_DEAD,
				REGNO (*ad.base_reg_loc)) == NULL
	    ? after : NULL),
	   base_reg_class (mode, as, ad.base_outer_code, ad.index_code))))
    {
      change_p = true;
      if (ad.base_reg_loc2 != NULL)
        *ad.base_reg_loc2 = *ad.base_reg_loc;
    }

  if (ad.index_reg_loc != NULL
      && process_addr_reg (ad.index_reg_loc, before, NULL, INDEX_REG_CLASS))
    change_p = true;

  /* The address was valid before LRA.  We only change its form if the
     address has a displacement, so if it has no displacement it must
     still be valid.  */
  if (ad.disp_loc == NULL)
    return change_p;

  /* See whether the address is still valid.  Some ports do not check
     displacements for eliminable registers, so we replace them
     temporarily with the elimination target.  */
  saved_base_reg = saved_index_reg = NULL_RTX;
  ...
  if (ok_p)
    return change_p;

> +#ifdef HAVE_lo_sum
> +	  {
> +	    rtx insn;
> +	    rtx last = get_last_insn ();
> +
> +	    /* disp => lo_sum (new_base, disp)	*/
> +	    insn = emit_insn (gen_rtx_SET
> +			      (VOIDmode, new_reg,
> +			       gen_rtx_HIGH (Pmode, copy_rtx (*ad.disp_loc))));
> +	    code = recog_memoized (insn);
> +	    if (code >= 0)
> +	      {
> +		rtx save = *ad.disp_loc;
> +
> +		*ad.disp_loc = gen_rtx_LO_SUM (Pmode, new_reg, *ad.disp_loc);
> +		if (! valid_address_p (mode, *ad.disp_loc, as))
> +		  {
> +		    *ad.disp_loc = save;
> +		    code = -1;
> +		  }
> +	      }
> +	    if (code < 0)
> +	      delete_insns_since (last);
> +	  }
> +#endif

Nice :-)

Purely for the record, I wondered whether the high part should be
generated with emit_move_insn(_1) instead, with the rhs of the move
being the HIGH rtx.  That would allow targets to cope with cases where
the high part isn't represented directly as a HIGH.  E.g. on MIPS and
Alpha, small-data accesses use the global register as the high part instead.

However, both MIPS and Alpha accept small-data addresses as legitimate
constants and addresses before and during reload and only introduce the
split form after reload.  And I think that's how any other cases that
aren't simple HIGHs should be handled too.  E.g. MIPS also represents
GOT page loads as HIGHs until after reload, and only then lowers the
HIGH to a GOT load.  Allowing the backend to generate anything other
than a plain HIGH set here would be a double-edged sword.

So after all that I agree that the gen_rtx_SET above is better than
calling the move expanders.

> +	  /* index * scale + disp => new base + index * scale  */
> +	  enum reg_class cl = base_reg_class (mode, as, SCRATCH, SCRATCH);
> +
> +	  lra_assert (INDEX_REG_CLASS != NO_REGS);
> +	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "disp");
> +	  lra_assert (GET_CODE (*addr_loc) == PLUS);
> +	  lra_emit_move (new_reg, *ad.disp_loc);
> +	  if (CONSTANT_P (XEXP (*addr_loc, 1)))
> +	    XEXP (*addr_loc, 1) = XEXP (*addr_loc, 0);
> +	  XEXP (*addr_loc, 0) = new_reg;

The canonical form is (plus (mult ...) (reg)) rather than
(plus (reg) (mult ...)), but it looks like we create the latter.
I realise you try both forms here:

> +	  /* Some targets like ARM, accept address operands in
> +	     specific order -- try exchange them if necessary.	*/
> +	  if (! valid_address_p (mode, *addr_loc, as))
> +	    {
> +	      exchange_plus_ops (*addr_loc);
> +	      if (! valid_address_p (mode, *addr_loc, as))
> +		exchange_plus_ops (*addr_loc);
> +	    }

but I think we should try the canonical form first.  And I'd prefer it
if we didn't try the other form at all, especially in 4.8.  It isn't
really the backend's job to reject non-canonical rtl.  This might well
be another case where some targets need a (hopefully small) tweak in
order to play by the rules.

Also, I suppose this section of code feeds back to my question on
Wednesday about the distinction that LRA seems to make between the
compile-time constant in:

  (plus (reg X1) (const_int Y1))

and the link-time constant in:

  (plus (reg X2) (symbol_ref Y2))

It looked like extract_address_regs classified X1 as a base register and
X2 as an index register.  The difference between the two constants has
no run-time significance though, and I think we should handle both X1
and X2 as base registers (as I think reload does).

I think the path above would then be specific to scaled indices.
In the original address the "complex" index must come first and the
displacement second.  In the modified address, the index would stay
first and the new base register would be second.  More below.

> +      /* We don't use transformation 'base + disp => base + new index'
> +	 because of bad practice used in some machine descriptions
> +	 (see comments for emit_spill_move).  */
> +      /* base + disp => new base  */

As before when commenting on emit_spill_move, I think we should leave
the "bad machine description" stuff out of 4.8 and treat fixing the
machine descriptions as part of the LRA port.

In this case I think there's another reason not to reload the
displacement into an index though: IIRC postreload should be able
to optimise a sequence of address reloads that have the same base
and different displacements.  LRA itself might try using "anchor"
bases in future -- although obviously not in the initial merge --
since that was one thing that LEGITIMIZE_RELOAD_ADDRESS was used for.

E.g. maybe the justification could be:

      /* base + disp => new base  */
      /* Another option would be to reload the displacement into an
	 index register.  However, postreload has code to optimize
	 address reloads that have the same base and different
	 displacements, so reloading into an index register would
	 not necessarily be a win.  */

> +      /* base + scale * index + disp => new base + scale * index  */
> +      new_reg = base_plus_disp_to_reg (mode, as, &ad);
> +      *addr_loc = gen_rtx_PLUS (Pmode, new_reg, *ad.index_loc);
> +      if (! valid_address_p (mode, *addr_loc, as))
> +	{
> +	  /* Some targets like ARM, accept address operands in
> +	     specific order -- try exchange them if necessary.	*/
> +	  exchange_plus_ops (*addr_loc);
> +	  if (! valid_address_p (mode, *addr_loc, as))
> +	    exchange_plus_ops (*addr_loc);
> +	}

Same comment as above about canonical rtl.  Here we can have two
registers -- in which case the base should come first -- or a more
complex index -- in which case the index should come first.

We should be able to pass both rtxes to simplify_gen_binary (PLUS, ...),
with the operands in either order, and let it take care of the details.
Using simplify_gen_binary would help with the earlier index+disp case too.

> +  /* If this is post-increment, first copy the location to the reload reg.  */
> +  if (post && real_in != result)
> +    emit_insn (gen_move_insn (result, real_in));

Nit, but real_in != result can never be true AIUI, and I was confused how
the code could be correct in that case.  Maybe just remove it, or make
it an assert?

> +  /* We suppose that there are insns to add/sub with the constant
> +     increment permitted in {PRE/POST)_{DEC/INC/MODIFY}.  At least the
> +     old reload worked with this assumption.  If the assumption
> +     becomes wrong, we should use approach in function
> +     base_plus_disp_to_reg.  */
> +  if (in == value)
> +    {
> +      /* See if we can directly increment INCLOC.  */
> +      last = get_last_insn ();
> +      add_insn = emit_insn (plus_p
> +			    ? gen_add2_insn (incloc, inc)
> +			    : gen_sub2_insn (incloc, inc));
> +
> +      code = recog_memoized (add_insn);
> +      /* We should restore recog_data for the current insn.  */

Looks like this comment might be a left-over, maybe from before the
cached insn data?

> +      /* Restore non-modified value for the result.  We prefer this
> +	 way because it does not require an addition hard
> +	 register.  */
> +      if (plus_p)
> +	{
> +	  if (CONST_INT_P (inc))
> +	    emit_insn (gen_add2_insn (result, GEN_INT (-INTVAL (inc))));
> +	  else
> +	    emit_insn (gen_sub2_insn (result, inc));
> +	}
> +      else if (CONST_INT_P (inc))
> +	emit_insn (gen_add2_insn (result, inc));

The last two lines look redundant.  The behaviour is the same as for
the following else:

> +      else
> +	emit_insn (gen_add2_insn (result, inc));

and I don't think there are any cases where !plus && CONST_INT_P (inc)
would hold.

> +/* Main entry point of this file: search the body of the current insn

s/this file/the constraints code/, since it's a static function.

> +  if (change_p)
> +    /* Changes in the insn might result in that we can not satisfy
> +       constraints in lately used alternative of the insn.  */
> +    lra_set_used_insn_alternative (curr_insn, -1);

Maybe:

  /* If we've changed the instruction then any alternative that
     we chose previously may no longer be valid.  */

> +      rtx x;
> +
> +      curr_swapped = !curr_swapped;
> +      if (curr_swapped)
> +	{
> +	  x = *curr_id->operand_loc[commutative];
> +	  *curr_id->operand_loc[commutative]
> +	    = *curr_id->operand_loc[commutative + 1];
> +	  *curr_id->operand_loc[commutative + 1] = x;
> +	  /* Swap the duplicates too.  */
> +	  lra_update_dup (curr_id, commutative);
> +	  lra_update_dup (curr_id, commutative + 1);
> +	  goto try_swapped;
> +	}
> +      else
> +	{
> +	  x = *curr_id->operand_loc[commutative];
> +	  *curr_id->operand_loc[commutative]
> +	    = *curr_id->operand_loc[commutative + 1];
> +	  *curr_id->operand_loc[commutative + 1] = x;
> +	  lra_update_dup (curr_id, commutative);
> +	  lra_update_dup (curr_id, commutative + 1);
> +	}

The swap code is the same in both cases, so I think it'd be better to
make it common.  Or possibly a helper function, since the same code
appears again later on.

> +	if (GET_CODE (op) == PLUS)
> +	  {
> +	    plus = op;
> +	    op = XEXP (op, 1);
> +	  }

Sorry, I'm complaining about old reload code again, but: does this
actually happen in LRA?  In reload, a register operand could become a
PLUS because of elimination, but I thought LRA did things differently.
Besides, this is only needed for:

> +	if (CONST_POOL_OK_P (mode, op)
> +	    && ((targetm.preferred_reload_class
> +		 (op, (enum reg_class) goal_alt[i]) == NO_REGS)
> +		|| no_input_reloads_p)
> +	    && mode != VOIDmode)
> +	  {
> +	    rtx tem = force_const_mem (mode, op);
> +	    
> +	    change_p = true;
> +	    /* If we stripped a SUBREG or a PLUS above add it back.  */
> +	    if (plus != NULL_RTX)
> +	      tem = gen_rtx_PLUS (mode, XEXP (plus, 0), tem);

and we shouldn't have (plus (constant ...) ...) after elimination
(or at all outside of a CONST).  I don't understand why the code is
needed even in reload.

> +  for (i = 0; i < n_operands; i++)
> +    {
> +      rtx old, new_reg;
> +      rtx op = *curr_id->operand_loc[i];
> +
> +      if (goal_alt_win[i])
> +	{
> +	  if (goal_alt[i] == NO_REGS
> +	      && REG_P (op)
> +	      && lra_former_scratch_operand_p (curr_insn, i))
> +	    change_class (REGNO (op), NO_REGS, "      Change", true);

I think this could do with a comment.  Does setting the class to NO_REGS
indirectly cause the operand to be switched back to a SCRATCH?

> +	  push_to_sequence (before);
> +	  rclass = base_reg_class (GET_MODE (op), MEM_ADDR_SPACE (op),
> +				   MEM, SCRATCH);
> +	  if (code == PRE_DEC || code == POST_DEC
> +	      || code == PRE_INC || code == POST_INC
> +	      || code == PRE_MODIFY || code == POST_MODIFY)

Very minor, but: GET_RTX_CLASS (code) == RTX_AUTOINC

> +	  enum machine_mode mode;
> +	  rtx reg, *loc;
> +	  int hard_regno, byte;
> +	  enum op_type type = curr_static_id->operand[i].type;
> +
> +	  loc = curr_id->operand_loc[i];
> +	  mode = get_op_mode (i);
> +	  if (GET_CODE (*loc) == SUBREG)
> +	    {
> +	      reg = SUBREG_REG (*loc);
> +	      byte = SUBREG_BYTE (*loc);
> +	      if (REG_P (reg)
> +		  /* Strict_low_part requires reload the register not
> +		     the sub-register.	*/
> +		  && (curr_static_id->operand[i].strict_low
> +		      || (GET_MODE_SIZE (mode)
> +			  <= GET_MODE_SIZE (GET_MODE (reg))
> +			  && (hard_regno
> +			      = get_try_hard_regno (REGNO (reg))) >= 0
> +			  && (simplify_subreg_regno
> +			      (hard_regno,
> +			       GET_MODE (reg), byte, mode) < 0)
> +			  && (goal_alt[i] == NO_REGS
> +			      || (simplify_subreg_regno
> +				  (ira_class_hard_regs[goal_alt[i]][0],
> +				   GET_MODE (reg), byte, mode) >= 0)))))
> +		{
> +		  loc = &SUBREG_REG (*loc);
> +		  mode = GET_MODE (*loc);
> +		}
> +	  old = *loc;

I think this needs a bit more justifying commentary (although I'm glad
to see it's much simpler than the reload version :-)).  One thing in
particular I didn't understand was why we don't reload the inner
register of a paradoxical subreg.

> +	  if (get_reload_reg (type, mode, old, goal_alt[i], "", &new_reg)
> +	      && type != OP_OUT)
> +	    {
> +	      push_to_sequence (before);
> +	      lra_emit_move (new_reg, old);
> +	      before = get_insns ();
> +	      end_sequence ();
> +	    }
> +	  *loc = new_reg;
> +	  if (type != OP_IN)
> +	    {
> +	      if (find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
> +		{
> +		  start_sequence ();
> +		  /* We don't want sharing subregs as the pseudo can
> +		     get a memory and the memory can be processed
> +		     several times for eliminations.  */
> +		  lra_emit_move (GET_CODE (old) == SUBREG && type == OP_INOUT
> +				 ? copy_rtx (old) : old,
> +				 new_reg);

I think this should simply be:

  lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);

leaving copy_rtx to figure out which rtxes can be shared.  No comment
would be needed for that.

> +		  emit_insn (after);
> +		  after = get_insns ();
> +		  end_sequence ();
> +		}
> +	      *loc = new_reg;
> +	    }

Very minor again, but: redundant *loc assignment (so that the two nested
if statements collapse to one).

> +      else
> +	{
> +	  lra_assert (INSN_CODE (curr_insn) < 0);
> +	  error_for_asm (curr_insn,
> +			 "inconsistent operand constraints in an %<asm%>");
> +	  /* Avoid further trouble with this insn.  */
> +	  PATTERN (curr_insn) = gen_rtx_USE (VOIDmode, const0_rtx);
> +	  return false;

Is this code handling a different case from the corresponding error
code in curr_insn_transform?  If so, it probably deserves a comment
explaining the difference.

> +/* Process all regs in debug location *LOC and change them on
> +   equivalent substitution.  Return true if any change was done.  */
> +static bool
> +debug_loc_equivalence_change_p (rtx *loc)

This doesn't keep the rtl in canonical form.  Probably the easiest and
best fix is to use simplify_replace_fn_rtx, which handles all that for you.
(simplify_replace_fn_rtx returns the original rtx if no change was made.)

> +  for (i = FIRST_PSEUDO_REGISTER; i < new_regno_start; i++)
> +    ira_reg_equiv[i].profitable_p = true;
> +  for (i = FIRST_PSEUDO_REGISTER; i < new_regno_start; i++)
> +    if (lra_reg_info[i].nrefs != 0)
> +      {
> +	if ((hard_regno = lra_get_regno_hard_regno (i)) >= 0)
> +	  {
> +	    int j, nregs = hard_regno_nregs[hard_regno][PSEUDO_REGNO_MODE (i)];
> +	    
> +	    for (j = 0; j < nregs; j++)
> +	      df_set_regs_ever_live (hard_regno + j, true);
> +	  }
> +	else if ((x = get_equiv_substitution (regno_reg_rtx[i])) != NULL_RTX)
> +	  {
> +	    if (! first_p && contains_reg_p (x, false, false))
> +	      /* After RTL transformation, we can not guarantee that
> +		 pseudo in the substitution was not reloaded which
> +		 might make equivalence invalid.  For example, in
> +		 reverse equiv of p0
> +
> +		 p0 <- ...
> +		 ...
> +		 equiv_mem <- p0
> +
> +		 the memory address register was reloaded before the
> +		 2nd insn.  */
> +	      ira_reg_equiv[i].defined_p = false;
> +	    if (contains_reg_p (x, false, true))
> +	      ira_reg_equiv[i].profitable_p = false;
> +	  }
> +      }

Do we need two loops because the second may check for equivalences
of other pseudos besides "i"?  I couldn't see how offhand, but I might
well have missed something.  Might be worth a comment.

> +	      dest_reg = SET_DEST (set);
> +	      /* The equivalence pseudo could be set up as SUBREG in a
> +		 case when it is a call restore insn in a mode
> +		 different from the pseudo mode.  */
> +	      if (GET_CODE (dest_reg) == SUBREG)
> +		dest_reg = SUBREG_REG (dest_reg);
> +	      if ((REG_P (dest_reg)
> +		   && (x = get_equiv_substitution (dest_reg)) != dest_reg
> +		   /* Remove insns which set up a pseudo whose value
> +		      can not be changed.  Such insns might be not in
> +		      init_insns because we don't update equiv data
> +		      during insn transformations.
> +			  
> +		      As an example, let suppose that a pseudo got
> +		      hard register and on the 1st pass was not
> +		      changed to equivalent constant.  We generate an
> +		      additional insn setting up the pseudo because of
> +		      secondary memory movement.  Then the pseudo is
> +		      spilled and we use the equiv constant.  In this
> +		      case we should remove the additional insn and
> +		      this insn is not init_insns list.	 */
> +		   && (! MEM_P (x) || MEM_READONLY_P (x)
> +		       || in_list_p (curr_insn,
> +				     ira_reg_equiv
> +				     [REGNO (dest_reg)].init_insns)))

This is probably a stupid question, sorry, but when do we ever want
to keep an assignment to a substituted pseudo?  I.e. why isn't this just:

	      if ((REG_P (dest_reg)
		   && (x = get_equiv_substitution (dest_reg)) != dest_reg)

> +/* Info about last usage of registers in EBB to do inheritance/split
> +   transformation.  Inheritance transformation is done from a spilled
> +   pseudo and split transformations from a hard register or a pseudo
> +   assigned to a hard register.	 */
> +struct usage_insns
> +{
> +  /* If the value is equal to CURR_USAGE_INSNS_CHECK, then the member
> +     value INSNS is valid.  The insns is chain of optional debug insns
> +     and a finishing non-debug insn using the corresponding reg.  */
> +  int check;
> +  /* Value of global reloads_num at the ???corresponding next insns.  */
> +  int reloads_num;
> +  /* Value of global reloads_num at the ???corresponding next insns.  */
> +  int calls_num;

"???s".  Probably "at the last instruction in INSNS" if that's accurate
(because debug insns in INSNS don't affect these fields).

> +/* Process all regs OLD_REGNO in location *LOC and change them on the
> +   reload pseudo NEW_REG.  Return true if any change was done.	*/
> +static bool
> +substitute_pseudo (rtx *loc, int old_regno, rtx new_reg)

This is another case where I found the term "reload pseudo" a bit confusing,
since AIUI new_reg can be an inheritance or split pseudo rather than a pseudo
created solely for insn reloads.  I'll follow up about that on the original
thread.  Maybe just:

/* Replace all references to register OLD_REGNO in *LOC with pseudo register
   NEW_REG.  Return true if any change was made.  */

> +  code = GET_CODE (x);
> +  if (code == REG && (int) REGNO (x) == old_regno)
> +    {
> +      *loc = new_reg;
> +      return true;
> +    }

Maybe assert that the modes are the same?

> +/* Do inheritance transformation for insn INSN defining (if DEF_P) or
> +   using ORIGINAL_REGNO where the subsequent insn(s) in EBB (remember
> +   we traverse insns in the backward direction) for the original regno
> +   is NEXT_USAGE_INSNS.	 The transformations look like

Maybe:

/* Do interitance transformations for insn INSN, which defines (if DEF_P)
   or uses ORIGINAL_REGNO.  NEXT_USAGE_INSNS specifies which instruction
   in the EBB next uses ORIGINAL_REGNO; it has the same form as the
   "insns" field of usage_insns.

   The transformations look like:

> +
> +     p <- ...		  i <- ...
> +     ...		  p <- i    (new insn)
> +     ...	     =>
> +     <- ... p ...	  <- ... i ...
> +   or
> +     ...		  i <- p    (new insn)
> +     <- ... p ...	  <- ... i ...
> +     ...	     =>
> +     <- ... p ...	  <- ... i ...
> +   where p is a spilled original pseudo and i is a new inheritance pseudo.
> +   
> +   The inheritance pseudo has the smallest class of two classes CL and
> +   class of ORIGINAL REGNO.  It will have unique value if UNIQ_P.  The
> +   unique value is necessary for correct assignment to inheritance
> +   pseudo for input of an insn which should be the same as output
> +   (bound pseudos).  Return true if we succeed in such
> +   transformation.  */

This comment looks really good, but I still wasn't sure about the
UNIQ_P thing.  AIUI this is for cases like:

                       i <- p            [new insn]
   r <- ... p ...      r <- ... i ...    [input reload]
   r <- ... r ...   => r <- ... r ...    [original insn]
   <- r                <- r              [output reload]
   ....                ......
   <- ... p ...        <- ... i ...      [next ref]

where "r" is used on both sides of the original insn and where the
output reload assigns to something other than "p" (otherwise "next ref"
wouldn't be the next ref).  But why does this affect the way "i" is created?
I think it'd be worth expanding that part a bit.

> +  if (! ira_reg_classes_intersect_p[cl][rclass])
> +    {
> +      if (lra_dump_file != NULL)
> +	{
> +	  fprintf (lra_dump_file,
> +		   "	Rejecting inheritance for %d "
> +		   "because of too different classes %s and %s\n",

Suggest s/too different/disjoint/

> +  if ((ira_class_subset_p[cl][rclass] && cl != rclass)
> +      || ira_class_hard_regs_num[cl] < ira_class_hard_regs_num[rclass])
> +    {
> +      if (lra_dump_file != NULL)
> +	fprintf (lra_dump_file, "    Use smallest class of %s and %s\n",
> +		 reg_class_names[cl], reg_class_names[rclass]);
> +      
> +      rclass = cl;
> +    }

I don't understand the second line of the if statement.  Why do we prefer
classes with fewer allocatable registers?

My guess before reading the code was that we'd use the subunion of CL and
RCLASS, so maybe a comment explaining why we use this choice would help.

> +  if (NEXT_INSN (new_insns) != NULL_RTX)
> +    {
> +      if (lra_dump_file != NULL)
> +	{
> +	  fprintf (lra_dump_file,
> +		   "	Rejecting inheritance %d->%d "
> +		   "as it results in 2 or more insns:\n",
> +		   original_regno, REGNO (new_reg));
> +	  debug_rtl_slim (lra_dump_file, new_insns, NULL_RTX, -1, 0);
> +	  fprintf (lra_dump_file,
> +		   "	>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n");
> +	}
> +      return false;
> +    }

Hmm, I wasn't sure about this at first.  Some targets define patterns for
multiword moves and split them later.  Others expose the split straight away.
The two approaches don't really imply any difference in cost, so I didn't
want us to penalise the latter.

But I suppose on targets that split straight away, lower-subreg would
tend to replace the multiword pseudo with individual word-sized pseudos,
so LRA shouldn't see them.  I suppose this check shouldn't matter in practice.

> +  if (def_p)
> +    lra_process_new_insns (insn, NULL_RTX, new_insns,
> +			   "Add original<-inheritance");
> +  else
> +    lra_process_new_insns (insn, new_insns, NULL_RTX,
> +			   "Add inheritance<-pseudo");

Maybe "original" rather than "pseudo" here too for consistency.

> +/* Return true if we need a split for hard register REGNO or pseudo
> +   REGNO which was assigned to a hard register.
> +   POTENTIAL_RELOAD_HARD_REGS contains hard registers which might be
> +   used for reloads since the EBB end.	It is an approximation of the
> +   used hard registers in the split range.  The exact value would
> +   require expensive calculations.  If we were aggressive with
> +   splitting because of the approximation, the split pseudo will save
> +   the same hard register assignment and will be removed in the undo
> +   pass.  We still need the approximation because too aggressive
> +   splitting would result in too inaccurate cost calculation in the
> +   assignment pass because of too many generated moves which will be
> +   probably removed in the undo pass.  */
> +static inline bool
> +need_for_split_p (HARD_REG_SET potential_reload_hard_regs, int regno)
> +{
> +  int hard_regno = regno < FIRST_PSEUDO_REGISTER ? regno : reg_renumber[regno];
> +
> +  lra_assert (hard_regno >= 0);
> +  return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno)
> +	   && ! TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno)
> +	   && (usage_insns[regno].reloads_num
> +	       + (regno < FIRST_PSEUDO_REGISTER ? 0 : 2) < reloads_num)
> +	   && ((regno < FIRST_PSEUDO_REGISTER
> +		&& ! bitmap_bit_p (&ebb_global_regs, regno))
> +	       || (regno >= FIRST_PSEUDO_REGISTER
> +		   && lra_reg_info[regno].nrefs > 3
> +		   && bitmap_bit_p (&ebb_global_regs, regno))))
> +	  || (regno >= FIRST_PSEUDO_REGISTER && need_for_call_save_p (regno)));
> +}

Could you add more commentary about the thinking behind this particular
choice of heuristic?  E.g. I wasn't sure what the reloads_num check did,
or why we only split hard registers that are local to the EBB and only
split pseudos that aren't.

The 2 and 3 numbers seemed a bit magic too.  I suppose the 2 has
something to do with "one save and one restore", but I wasn't sure
why we applied it only for pseudos.  (AIUI that arm of the check
deals with "genuine" split pseudos rather than call saves & restores.)

Still, it says a lot for the high quality of LRA that, out of all the
1000s of lines of code I've read so far, this is the only part that
didn't seem to have an intuitive justification.

> +  for (i = 0;
> +       (cl = reg_class_subclasses[allocno_class][i]) != LIM_REG_CLASSES;
> +       i++)
> +    if (! SECONDARY_MEMORY_NEEDED (cl, hard_reg_class, mode)
> +	&& ! SECONDARY_MEMORY_NEEDED (hard_reg_class, cl, mode)
> +	&& TEST_HARD_REG_BIT (reg_class_contents[cl], hard_regno)
> +	&& (best_cl == NO_REGS
> +	    || (hard_reg_set_subset_p (reg_class_contents[best_cl],
> +				       reg_class_contents[cl])
> +		&& ! hard_reg_set_equal_p (reg_class_contents[best_cl],
> +					   reg_class_contents[cl]))))
> +      best_cl = cl;

OK, so this suggestion isn't backed up by any evidence, but what do
you think about this alternative:

	&& (best_cl == NO_REGS
	    || (ira_class_hard_regs_num[best_cl]
		< ira_class_hard_regs_num[cl]))

which should choose the largest class that requires no secondary memory.
It looks like the subset version could get "stuck" on a single-register
class that happens to be early in the list but has no superclass smaller
than allocno_class.

> +/* Do split transformation for insn INSN defining or
> +   using ORIGINAL_REGNO where the subsequent insn(s) in EBB (remember
> +   we traverse insns in the backward direction) for the original regno
> +   is NEXT_USAGE_INSNS.	 The transformations look like

Same suggestion as for the inheritance function above.

> +  if (call_save_p)
> +    save = emit_spill_move (true, new_reg, original_reg, -1);
> +  else
> +    {
> +      start_sequence ();
> +      emit_move_insn (new_reg, original_reg);
> +      save = get_insns ();
> +      end_sequence ();
> +    }
> +  if (NEXT_INSN (save) != NULL_RTX)
> +    {
> +      lra_assert (! call_save_p);

Is emit_spill_move really guaranteed to return only one instruction in
cases where emit_move_insn might not?  Both of them use emit_move_insn_1
internally, so I wouldn't have expected much difference.

In fact I wasn't really sure why:

  save = gen_move_insn (new, original_reg);

wouldn't be correct for both.

Same comments for the restore code.

> +      /* See which defined values die here.  */
> +      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
> +	if (reg->type == OP_OUT && ! reg->early_clobber
> +	    && (! reg->subreg_p
> +		|| bitmap_bit_p (&lra_bound_pseudos, reg->regno)))
> +	  bitmap_clear_bit (&live_regs, reg->regno);
> +      /* Mark each used value as live.	*/
> +      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
> +	if (reg->type == OP_IN
> +	    && bitmap_bit_p (&check_only_regs, reg->regno))
> +	  bitmap_set_bit (&live_regs, reg->regno);
> +      /* Mark early clobber outputs dead.  */
> +      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
> +	if (reg->type == OP_OUT && reg->early_clobber && ! reg->subreg_p)
> +	  bitmap_clear_bit (&live_regs, reg->regno);

I don't think this would be correct for unreloaded insns because an
unreloaded insn can have the same pseudo as an input and an earlyclobber
output.  (Probably not an issue here, since we're called after the
constraints pass.)  There's also the case of matched earlyclobber operands,
where the matched input is specifically not affected by the earlyclobber.

I'd have thought:

      /* See which defined values die here.  */
      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
	if (reg->type == OP_OUT
	    && (! reg->subreg_p
		|| bitmap_bit_p (&lra_bound_pseudos, reg->regno)))
	  bitmap_clear_bit (&live_regs, reg->regno);
      /* Mark each used value as live.	*/
      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
	if (reg->type == OP_IN
	    && bitmap_bit_p (&check_only_regs, reg->regno))
	  bitmap_set_bit (&live_regs, reg->regno);

ought to be correct, but perhaps I'm missing something.

(I'm still uneasy about the special treatment of bound pseudos here.
A clobber really does seem better.)

> +      /* It is quite important to remove dead move insns because it
> +	 means removing dead store, we don't need to process them for
> +	 constraints, and unfortunately some subsequent optimizations
> +	 (like shrink-wrapping) currently based on assumption that
> +	 there are no trivial dead insns.  */

Maybe best to drop the "subsequent optimizations" part.  This comment
is unlikely to be updated after any change to shrink-wrapping & co.,
and the first two justifications seem convincing enough on their own.

> +/* Add inheritance info REGNO and INSNS.  */
> +static void
> +add_to_inherit (int regno, rtx insns)
> +{
> +  int i;
> +
> +  for (i = 0; i < to_inherit_num; i++)
> +    if (to_inherit[i].regno == regno)
> +      return;

Is the existing "insns" field guaranteed to match the "insns" parameter
in this case, or might they be different?  Probably worth an assert or
comment respectively.

> +/* Return first (if FIRST_P) or last non-debug insn in basic block BB.
> +   Return null if there are no non-debug insns in the block.  */
> +static rtx
> +get_non_debug_insn (bool first_p, basic_block bb)
> +{
> +  rtx insn;
> +
> +  for (insn = first_p ? BB_HEAD (bb) : BB_END (bb);
> +       insn != NULL_RTX && ! NONDEBUG_INSN_P (insn);
> +       insn = first_p ? NEXT_INSN (insn) : PREV_INSN (insn))
> +    ;
> +  if (insn != NULL_RTX && BLOCK_FOR_INSN (insn) != bb)
> +    insn = NULL_RTX;
> +  return insn;
> +}

It probably doesn't matter in practice, but it looks like it'd be better
to limit the walk to the bb, rather than walking until null and then
testing the bb after the walk.

Maybe it would be eaiser to split into two functions, since first_p is
always constant.  E.g.:

  rtx insn;

  FOR_BB_INSNS (bb, insn)
    if (NONDEBUG_INSN_P (insn))
      return insn;
  return NULL_RTX;

for first_p.  s/FOR_BB_INSNS/FOR_BB_INSNS_REVERSE/ for !first_p.

> +/* Set up RES by registers living on edges FROM except the edge (FROM,
> +   TO) or by registers set up in a jump insn in BB FROM.  */
> +static void
> +get_live_on_other_edges (basic_block from, basic_block to, bitmap res)
> +{
> +  int regno;
> +  rtx last;
> +  struct lra_insn_reg *reg;
> +  edge e;
> +  edge_iterator ei;
> +
> +  lra_assert (to != NULL);
> +  bitmap_clear (res);
> +  FOR_EACH_EDGE (e, ei, from->succs)
> +    if (e->dest != to)
> +      bitmap_ior_into (res, DF_LR_IN (e->dest));
> +  if ((last = get_non_debug_insn (false, from)) == NULL_RTX || ! JUMP_P (last))
> +    return;
> +  curr_id = lra_get_insn_recog_data (last);
> +  for (reg = curr_id->regs; reg != NULL; reg = reg->next)
> +    if (reg->type != OP_IN
> +	&& (regno = reg->regno) >= FIRST_PSEUDO_REGISTER)
> +      bitmap_set_bit (res, regno);
> +}

Probably a silly question, sorry, but: why does the JUMP_P part only
include pseudo registers?  The other calculations (here and elsewhere)
seem to handle both hard and pseudo registers.

> +/* Do inheritance/split transformations in EBB starting with HEAD and
> +   finishing on TAIL.  We process EBB insns in the reverse order.
> +   Return true if we did any inheritance/split transformation in the
> +   EBB.
> +
> +   We should avoid excessive splitting which results in worse code
> +   because of inaccurate cost calculations for spilling new split
> +   pseudos in such case.  To achieve this we do splitting only if
> +   register pressure is high in given basic block and there reload

"...and there are reload"

> +   pseudos requiring hard registers.  We could do more register
> +   pressure calculations at any given program point to avoid necessary
> +   splitting even more but it is to expensive and the current approach
> +   is well enough.  */

"works well enough".

> +  change_p = false;
> +  curr_usage_insns_check++;
> +  reloads_num = calls_num = 0;
> +  /* Remember: we can remove the current insn.	*/
> +  bitmap_clear (&check_only_regs);
> +  last_processed_bb = NULL;

I couldn't tell which part of the code the comment is referring to.
Maybe left over?

> +	  after_p = (last_insn != NULL_RTX && ! JUMP_P (last_insn)
> +		     && (! CALL_P (last_insn)
> +			 || (find_reg_note (last_insn,
> +					   REG_NORETURN, NULL) == NULL_RTX
> +			     && ((next_insn
> +				  = next_nonnote_nondebug_insn (last_insn))
> +				 == NULL_RTX
> +				 || GET_CODE (next_insn) != BARRIER))));

Genuine question, but: when are the last four lines needed?  The condition
that they're testing for sounds like a noreturn call.

> +      if (src_regno < lra_constraint_new_regno_start
> +	  && src_regno >= FIRST_PSEUDO_REGISTER
> +	  && reg_renumber[src_regno] < 0
> +	  && dst_regno >= lra_constraint_new_regno_start
> +	  && (cl = lra_get_allocno_class (dst_regno)) != NO_REGS)
> +	{
> +	  /* 'reload_pseudo <- original_pseudo'.  */
> +	  reloads_num++;
> +	  succ_p = false;
> +	  if (usage_insns[src_regno].check == curr_usage_insns_check
> +	      && (next_usage_insns = usage_insns[src_regno].insns) != NULL_RTX)
> +	    succ_p = inherit_reload_reg (false,
> +					 bitmap_bit_p (&lra_matched_pseudos,
> +						       dst_regno),
> +					 src_regno, cl,
> +					 curr_insn, next_usage_insns);
> +	  if (succ_p)
> +	    change_p = true;
> +	  else
> +	    {
> +	      usage_insns[src_regno].check = curr_usage_insns_check;
> +	      usage_insns[src_regno].insns = curr_insn;
> +	      usage_insns[src_regno].reloads_num = reloads_num;
> +	      usage_insns[src_regno].calls_num = calls_num;
> +	      usage_insns[src_regno].after_p = false;
> +	    }

Looks like this and other places could use the add_next_usage_insn
helper function.

> +	  if (cl != NO_REGS
> +	      && hard_reg_set_subset_p (reg_class_contents[cl],
> +					live_hard_regs))
> +	    IOR_HARD_REG_SET (potential_reload_hard_regs,
> +			      reg_class_contents[cl]);

Redundant "cl != NO_REGS" check.  (Was a bit confused by that at first.)

I don't understand the way potential_reload_hard_regs is set up.
Why does it only include reload pseudos involved in moves of the form
"reload_pseudo <- original_pseudo" and "original_pseudo <- reload_pseudo",
but include those reloads regardless of whether inheritance is possible?

I wondered whether it might be deliberately selective in order to speed
up LRA, but we walk all the registers in an insn regardless.

Same for reloads_num.

> +	  if (cl != NO_REGS
> +	      && hard_reg_set_subset_p (reg_class_contents[cl],
> +					live_hard_regs))
> +	    IOR_HARD_REG_SET (potential_reload_hard_regs,
> +			      reg_class_contents[cl]);

Same comment as for the previous block.

> +		if (reg_renumber[dst_regno] < 0
> +		    || (reg->type == OP_OUT && ! reg->subreg_p))
> +		/* Invalidate.	*/
> +		usage_insns[dst_regno].check = 0;

Could you explain this condition a bit more?  Why does reg_renumber
affect things?

> +/* This value affects EBB forming.  If probability of edge from EBB to
> +   a BB is not greater than the following value, we don't add the BB
> +   to EBB.  */ 
> +#define EBB_PROBABILITY_CUTOFF (REG_BR_PROB_BASE / 2)

It looks like schedule_ebbs uses a higher default cutoff for FDO.
Would the same distinction be useful here?

Maybe schedule_ebbs-like params would be good here too.

> +  bitmap_and (&temp_bitmap_head, removed_pseudos, live);
> +  EXECUTE_IF_SET_IN_BITMAP (&temp_bitmap_head, 0, regno, bi)

This isn't going to have much effect on compile time, but
EXECUTE_IF_AND_IN_BITMAP avoids the need for a temporary bitmap.

> +/* Remove inheritance/split pseudos which are in REMOVE_PSEUDOS and
> +   return true if we did any change.  The undo transformations for
> +   inheritance looks like
> +      i <- i2
> +      p <- i	  =>   p <- i2
> +   or removing
> +      p <- i, i <- p, and i <- i3
> +   where p is original pseudo from which inheritance pseudo i was
> +   created, i and i3 are removed inheritance pseudos, i2 is another
> +   not removed inheritance pseudo.  All split pseudos or other
> +   occurrences of removed inheritance pseudos are changed on the
> +   corresponding original pseudos.  */
> +static bool
> +remove_inheritance_pseudos (bitmap remove_pseudos)
> +{
> +  basic_block bb;
> +  int regno, sregno, prev_sregno, dregno, restore_regno;
> +  rtx set, prev_set, prev_insn;
> +  bool change_p, done_p;
> +
> +  change_p = ! bitmap_empty_p (remove_pseudos);

I wondered from the comment why we couldn't just return straight away
for the empty set, but it looks like the function also schedules a
constraints pass for instructions that keep their inheritance or
split pseudos.  Is that right?  Might be worth mentioning that
in the function comment if so.

> +	      else if (bitmap_bit_p (remove_pseudos, sregno)
> +		       && bitmap_bit_p (&lra_inheritance_pseudos, sregno))
> +		{
> +		  /* Search the following pattern:
> +		       inherit_or_split_pseudo1 <- inherit_or_split_pseudo2
> +		       original_pseudo <- inherit_or_split_pseudo1
> +		    where the 2nd insn is the current insn and
> +		    inherit_or_split_pseudo2 is not removed.  If it is found,
> +		    change the current insn onto:
> +		       original_pseudo1 <- inherit_or_split_pseudo2.  */

s/original_pseudo1/original_pseudo/ I think (we don't change the destination).

> +		  for (prev_insn = PREV_INSN (curr_insn);
> +		       prev_insn != NULL_RTX && ! NONDEBUG_INSN_P (prev_insn);
> +		       prev_insn = PREV_INSN (prev_insn))
> +		    ;
> +		  if (prev_insn != NULL_RTX && BLOCK_FOR_INSN (prev_insn) == bb
> +		      && (prev_set = single_set (prev_insn)) != NULL_RTX
> +		      /* There should be no subregs in insn we are
> +			 searching because only the original reg might
> +			 be in subreg when we changed the mode of
> +			 load/store for splitting.  */
> +		      && REG_P (SET_DEST (prev_set))
> +		      && REG_P (SET_SRC (prev_set))
> +		      && (int) REGNO (SET_DEST (prev_set)) == sregno
> +		      && ((prev_sregno = REGNO (SET_SRC (prev_set)))
> +			  >= FIRST_PSEUDO_REGISTER)
> +		      && (lra_reg_info[sregno].restore_regno
> +			  == lra_reg_info[prev_sregno].restore_regno)
> +		      && ! bitmap_bit_p (remove_pseudos, prev_sregno))

I'm sure the restore_regno comparison near the end is correct,
but could you add a comment to explain it?  The substitution
itself seems OK either way.

> +	      struct lra_insn_reg *reg;
> +	      bool insn_change_p = false;
> +
> +	      curr_id = lra_get_insn_recog_data (curr_insn);
> +	      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
> +		if ((regno = reg->regno) >= lra_constraint_new_regno_start
> +		    && lra_reg_info[regno].restore_regno >= 0)

Is the first part of the comparison needed?  Most other places don't check,
so it looked at first glance like there was something special here.

> +		  {
> +		    if (change_p && bitmap_bit_p (remove_pseudos, regno))
> +		      {
> +			restore_regno = lra_reg_info[regno].restore_regno;
> +			substitute_pseudo (&curr_insn, regno,
> +					   regno_reg_rtx[restore_regno]);
> +			insn_change_p = true;
> +		      }
> +		    else if (NONDEBUG_INSN_P (curr_insn))
> +		      {
> +			lra_push_insn_and_update_insn_regno_info (curr_insn);
> +			lra_set_used_insn_alternative_by_uid
> +			  (INSN_UID (curr_insn), -1);
> +		      }
> +		  }
> +	      if (insn_change_p)
> +		{
> +		  lra_update_insn_regno_info (curr_insn);
> +		  if (lra_dump_file != NULL)
> +		    {
> +		      fprintf (lra_dump_file, "	   Restore original insn:\n");
> +		      debug_rtl_slim (lra_dump_file,
> +				      curr_insn, curr_insn, -1, 0);
> +		    }
> +		}

AIUI we could have a partial restore, keeping some registers but
restoring others.  Is that right?  The dump entry made it sounds
like a full restore.

Maybe something like:

	      struct lra_insn_reg *reg;
	      bool restored_regs_p = false;
	      bool kept_regs_p = false;

	      curr_id = lra_get_insn_recog_data (curr_insn);
	      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
		{
		  regno = reg->regno;
		  restore_regno = lra_reg_info[regno].restore_regno;
		  if (restore_regno >= 0)
		    {
		      if (change_p && bitmap_bit_p (remove_pseudos, regno))
			{
			  substitute_pseudo (&curr_insn, regno,
					     regno_reg_rtx[restore_regno]);
			  restored_regs_p = true;
			}
		      else
			kept_regs_p = true;
		    }
		}
	      if (NONDEBUG_INSN_P (curr_insn) && kept_regs_p)
		{
		  /* The instruction has changed since the previous
		     constraints pass.  */
		  lra_push_insn_and_update_insn_regno_info (curr_insn);
		  lra_set_used_insn_alternative_by_uid
		    (INSN_UID (curr_insn), -1);
		}
	      else if (restored_regs_p)
		/* The instruction has been restored to the form that
		   it had during the previous constraints pass.  */
		lra_update_insn_regno_info (curr_insn);

	      if (restored_regs_p && lra_dump_file != NULL)
		{
		  fprintf (lra_dump_file,
			   "	   Insn after restoring regs:\n");
		  debug_rtl_slim (lra_dump_file, curr_insn, curr_insn, -1, 0);
		}

(if correct) might make the partial case clearer, but that's personal
preference, so please feel free to ignore, chop or change.

Also, is regno_reg_rtx[restore_regno] always correct when restoring
registers?  I thought restore_regno could be a hard register and that
the hard register might not necessarily be used in the same mode as
the regno_reg_rtx[] entry.

That just leaves lra.h, lra-int.h and lra.c itself.  I'm hoping to have
read through those by the middle of next week, but any comments about them
will probably just be banal stuff (even more than my comments so far :-))
so I deliberately left them to last.

Richard
Richard Sandiford Oct. 12, 2012, 4:13 p.m. UTC | #8
Vladimir Makarov <vmakarov@redhat.com> writes:
>>> +/* Info about pseudo used during the assignment pass.  Thread is a set
>>> +   of connected reload and inheritance pseudos with the same set of
>>> +   available hard reg set.  Thread is a pseudo itself for other
>>> +   cases.  */
>>> +struct regno_assign_info
>> Maybe:
>>
>> /* Information about the thread to which a pseudo belongs.  Threads are
>>     a set of connected reload and inheritance pseudos with the same set of
>>     available hard registers.  Lone registers belong to their own threads.  */
> Fixed.
>> Although the condition seems to be:
>>> +	&& (ira_class_hard_regs_num[regno_allocno_class_array[regno1]]
>>> +	    == ira_class_hard_regs_num[regno_allocno_class_array[regno2]]))
>> i.e. the same _number_ of available hard regs, but not necessarily the
>> same set.
> It should be the same in most cases.  This condition is just faster 
> approximation of the same available hard reg set.

The distinction does seem important though.   It's possible that
a target has two distinct register files of the same allocatable size.
Would something like:

      (ira_class_subset_p[class1][class2]
       && ira_class_subset_p[class2][class1])

work instead?

>> /* Update the preference for using HARD_REGNO for pseudos that are
>>     connected directly or indirectly with REGNO.  Apply divisor DIV
>>     to any preference adjustments.
>>
>>     The more indirectly a pseudo is connected, the smaller its effect
>>     should be.  We therefore increase DIV on each "hop".  */
>>
> Fixed.  By the way, it is your invention from IRA.

Heh, I'd forgotten all about that.

>>> +      /* We are trying to spill a reload pseudo.  That is wrong we
>>> +	 should assign all reload pseudos, otherwise we cannot reuse
>>> +	 the selected alternatives.  */
>>> +      hard_regno = find_hard_regno_for (regno, &cost, -1);
>>> +      if (hard_regno >= 0)
>>> +	{
>> Don't really understand this comment, sorry.
> It removed the comment.  It is from an old solution code trying to 
> guarantee assignment to the reload pseudo.
>> Also, why are we passing -1 to find_hard_regno_for, rather than hard_regno?
>> The loop body up till this point has been specifically freeing up registers
>> to make hard_regno allocatable.  I realise that, by spilling everything
>> that overlaps this range, we might have freed up other registers too,
>> and so made others besides hard_regno allocatable.  But wouldn't we want
>> to test those other hard registers in "their" iteration of the loop
>> instead of this one?  The spills in those iterations ought to be more
>> directed (i.e. there should be less incidental spilling).
>>
>> As things stand, doing an rclass_size * rclass_size scan seems
>> unnecessarily expensive, although probably off the radar.
> We cannot just pass hard_regno for multi-word pseudo when hard_regno-1 
> is already free.

But this call is in a loop that iterates over all registers in the class:

  for (i = 0; i < rclass_size; i++)
    {
      hard_regno = ira_class_hard_regs[rclass][i];

and we reach the find_hard_regno_for call unless there is some
conflicting register that we cannot spill.  So if "hard_regno - 1"
belongs to the allocation class and is a viable choice, "its" iteration
of the loop would spill specifically for "hard_regno - 1" and get the
most accurate cost for that register.  I couldn't see why any other
iteration of the loop would want to consider "hard_regno - 1".

> You are right about possibility to speed up the code, although on 
> profiles I looked (including the last huge tests) spill_for and 
> find_hard_regno_for called from takes few time. That is probably because 
> you don't need spill frequently.  Freeing one long live range pseudo 
> permits to find hard regno without spilling for many short live pseudos 
> (reload and inheritance ones).
> Also loop rclass_size * rclass_size is not expensive, the preparation of 
> data for the loop is expensive.

OK, in that case maybe the efficiency concern wasn't justified.
FWIW, I still think passing hard_regno would be clearer though,
in terms of meeting expectations.  It just seems odd to spill for
one specific register and then test all of them.  Especially when the
spilling we actually do after choosing register X is based on "X's"
iteration of this loop.

(I realise I could well be missing the point here though, sorry.)

>>> +		  && (reload_hard_regno
>>> +		      = find_hard_regno_for (reload_regno,
>>> +					     &reload_cost, -1)) >= 0
>>> +		  && (lra_hard_reg_set_intersection_p
>>> +		      (reload_hard_regno, PSEUDO_REGNO_MODE (reload_regno),
>>> +		       spilled_hard_regs)))
>>> +		{
>>> +		  if (lra_dump_file != NULL)
>>> +		    fprintf (lra_dump_file, " assign %d(cost=%d)",
>>> +			     reload_regno, reload_cost);
>>> +		  assign_temporarily (reload_regno, reload_hard_regno);
>>> +		  cost += reload_cost;
>> It looks like registers that can be reallocated make hard_regno more
>> expensive (specifically by reload_cost), but registers that can't be
>> reallocated contribute no cost.  Is that right?  Seemed a little odd,
>> so maybe worth a comment.
> Reload cost is a negative cost.  It is negative base is the pseudo 
> frequency.

Ah!  Missed that, sorry.

> I added the comment.  The better hard register is, the more negative 
> cost is.

Thanks.

>> Also, AIUI find_hard_regno_for is trying to allocate the register for
>> reload_regno on the basis that reload_regno has the same conflicts as
>> the current regno, and so it's only an approximation.  Is that right?
>> Might be worth a comment if so (not least to explain why we don't commit
>> to this allocation if we end up choosing hard_regno).
> Sorry, I did not understand what you are asking.  We try to find best 
> pseudos to spill which helps to assign more reload pseudos (on cost 
> base) as possible.  Pseudos for which find_hard_regno finds not spilled 
> hard regs are ignored as they can be assigned without spilling.

Yeah, sorry, ignore this.  I think I'd taken a break here and then
forgotten that find_hard_regno_for works out the conflicts set for
the register itself.  I think I'd assumed when writing it that
find_hard_regno_for used some state that had been precalculated
before the main "for (i = 0; i < rclass_size; i++)" loop.

>>> +      /* Restore the live hard reg pseudo info for spilled pseudos.  */
>>> +      EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
>>> +	update_lives (spill_regno, false);
>> I couldn't tell why this was outside the "hard_regno >= 0" condition.
>> Do we really change these registers even if find_hard_regno_for fails?
> The first we spill some pseudo and then call find_hard_regno_for.  So we 
> should restore the state before spilling independently on success of 
> find_hard_regno_for.

Yeah, ignore this too :-)

>>> +  /* Spill: */
>>> +  EXECUTE_IF_SET_IN_BITMAP (&best_spill_pseudos_bitmap, 0, spill_regno, bi)
>> Very minor, but I think it'd be worth asserting that best_hard_regno >= 0
>> before this loop.
> Unfortunately, in very rare cases, best_hard_regno can be < 0.  That is 
> why we have two iteration for assignment of reload pseudos (see comment 
> for 2nd iter for reload pseudo assignments.
>
> I've added a comment for the function that it can return negative value.

OK, thanks.

>> The only general comment I have so far is that it's sometimes
>> difficult to follow which types of pseudos are being included
>> or excluded by a comparison with lra_constraint_new_regno_start.
>> Sometimes the comments talk about "reload pseudos", but other
>> similar checks imply that the registers could be inheritance
>> pseudos or split pseudos as well.  Some thin inline wrappers
>> might help here.
> Inheritance, split, reload pseudos created since last constraint pass >= 
> lra_constraint_new_regno_start.
> Inheritance and split pseudos created on any pass are in the 
> corresponding bitmaps.
> Inheritance and split pseudos since the last constraint pass has also 
> restore_regno >= 0 until split or inheritance transformations are done.

OK.  What prompted this was that some comments refer specifically to
"reload pseudo" whereas the accompanying code simply checks against
lra_constraint_new_regno_start.  It then wasn't obvious whether the code
really did just include "reload pseudos" in what I thought was the
strict sense -- e.g. because no other type of LRA-created pseudo could
occur in that context, so there was no point checking anything else --
or whether the code was actually handling inheritance and split pseudos too.

Maybe it would help to have a term to refer all four of:

  - reload pseudos
  - optional reload pseudos
  - inheritance pseudos
  - split pseudos

although I won't suggest one because I'm useless at naming things.

>>> +		/* We can use inheritance pseudos in original insns
>>> +		   (not reload ones).  */
>>> +		if (regno < lra_constraint_new_regno_start
>>> +		    || bitmap_bit_p (&lra_inheritance_pseudos, regno)
>>> +		    || reg_renumber[regno] < 0)
>>> +		  continue;
>>> +		sorted_pseudos[nfails++] = regno;
>>> +		if (lra_dump_file != NULL)
>>> +		  fprintf (lra_dump_file,
>>> +			   "	  Spill reload r%d(hr=%d, freq=%d)\n",
>>> +			   regno, reg_renumber[regno],
>>> +			   lra_reg_info[regno].freq);
>> Same comment about types of pseudo as above.  (I.e. the code checks for
>> inheritance pseudos, but not split pseudos.)
> I modified the comment to
>
> /* A reload pseudo did not get a hard register on the
>     first iteration because of the conflict with
>     another reload pseudos in the same insn.  So we
>     consider only reload pseudos assigned to hard
>     registers.  We shall exclude inheritance pseudos as
>     they can occur in original insns (not reload ones).
>     We can omit the check for split pseudos because
>     they occur only in move insns containing non-reload
>     pseudos. */
>
> I hope it explains the code.

Yes, thanks.

>>> +  for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
>>> +    if (((i < lra_constraint_new_regno_start
>>> +	  && ! bitmap_bit_p (&do_not_assign_nonreload_pseudos, i))
>>> +	 || (bitmap_bit_p (&lra_inheritance_pseudos, i)
>>> +	     && lra_reg_info[i].restore_regno >= 0)
>>> +	 || (bitmap_bit_p (&lra_split_pseudos, i)
>>> +	     && lra_reg_info[i].restore_regno >= 0)
>>> +	 || bitmap_bit_p (&lra_optional_reload_pseudos, i))
>>> +	&& reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
>>> +	&& regno_allocno_class_array[i] != NO_REGS)
>>> +      sorted_pseudos[n++] = i;
>>> +  bitmap_clear (&do_not_assign_nonreload_pseudos);
>> where we test very similar things inline, and then clear
>> do_not_assign_nonreload_pseudos.  Do we need d_n_a_n_p at all?
> No the code is right.  We still need d_n_a_n_p as we can not easily 
> calculate from what pseudo the given pseudo was inherited or split.  It 
> is different from the loop where new inheritance and split pseudos are 
> checked (it is different from their origins marked in d_n_a_n_p.

Yeah, sorry about that.  I misread the indices.

Richard
Richard Sandiford Oct. 13, 2012, 7:42 a.m. UTC | #9
I'm having to correct my own comments again, sorry.

Richard Sandiford <rdsandiford@googlemail.com> writes:
>> +  /* If this is post-increment, first copy the location to the reload reg.  */
>> +  if (post && real_in != result)
>> +    emit_insn (gen_move_insn (result, real_in));
>
> Nit, but real_in != result can never be true AIUI, and I was confused how
> the code could be correct in that case.  Maybe just remove it, or make
> it an assert?

Probably obvious, but I meant "real_in == result" can never be true.
"real_in != result" could be removed or turned into an assert.

>> +	if (GET_CODE (op) == PLUS)
>> +	  {
>> +	    plus = op;
>> +	    op = XEXP (op, 1);
>> +	  }
>
> Sorry, I'm complaining about old reload code again, but: does this
> actually happen in LRA?  In reload, a register operand could become a
> PLUS because of elimination, but I thought LRA did things differently.
> Besides, this is only needed for:
>
>> +	if (CONST_POOL_OK_P (mode, op)
>> +	    && ((targetm.preferred_reload_class
>> +		 (op, (enum reg_class) goal_alt[i]) == NO_REGS)
>> +		|| no_input_reloads_p)
>> +	    && mode != VOIDmode)
>> +	  {
>> +	    rtx tem = force_const_mem (mode, op);
>> +	    
>> +	    change_p = true;
>> +	    /* If we stripped a SUBREG or a PLUS above add it back.  */
>> +	    if (plus != NULL_RTX)
>> +	      tem = gen_rtx_PLUS (mode, XEXP (plus, 0), tem);
>
> and we shouldn't have (plus (constant ...) ...) after elimination
> (or at all outside of a CONST).  I don't understand why the code is
> needed even in reload.

Scratch the thing about needing it for reload.  It's obviously the
second second operand we're reloading, not the first, and it could
well be that an elimination displacement needs to be reloaded via
the constant pool.

The question for LRA still stands though.

Richard
Vladimir Makarov Oct. 14, 2012, 5:36 p.m. UTC | #10
On 12-10-10 11:31 AM, Richard Sandiford wrote:
> Hi Vlad,
>
> Here's a review of the first part of ira-constraints.c.  It's a big file,
> and this is a long reply, so I'll send further comments another day in
> another message.
> Vladimir Makarov <vmakarov@redhat.com> writes:
>> +/* This file contains code for 3 passes: constraint pass,
>> +   inheritance/split pass, and pass for undoing failed inheritance and
>> +   split.
>> +
>> +   The major goal of constraint pass is to transform RTL to satisfy
>> +   insn and address constraints by:
>> +     o choosing insn alternatives;
>> +     o generating *reload insns* (or reloads in brief) and *reload
>> +       pseudos* which will got necessary hard registers later;
> s/got/get/
>
Fixed.
>> +     o substituting pseudo equivalences (if it is done once, is done
>> +       everywhere) and removes insns initializing used equivalent
>> +       substitution.
> Suggest:
>
>       o substituting pseudos with equivalent values and removing the
>         instructions that initialized those pseudos.
Fixed.
>> +   To speed the pass up we process only necessary insns (first time
>> +   all insns) and reuse of already chosen alternatives in some
>> +   cases.
> Suggest:
>
>     On the first iteration of the pass we process every instruction and
>     choose an alternative for each one.  On subsequent iterations we try
>     to avoid reprocessing instructions if we can be sure that the old
>     choice is still valid.
Fixed.
>> +   The inheritance/spilt pass is to transform code to achieve
>> +   ineheritance and live range splitting.  It is done on backward
>> +   traverse of EBBs.
> Typo: inheritance.  "backward traversal".
Fixed.
>> +   The inheritance optimization goal is to reuse values in hard
>> +   registers. There is analogous optimization in old reload pass.  The
>> +   inheritance is achieved by following transformation:
>> +
>> +       reload_p1 <- p	     reload_p1 <- p
>> +       ...		     new_p <- reload_p1
>> +       ...		=>   ...
>> +       reload_p2 <- p	     reload_p2 <- new_p
>> +
>> +   where p is spilled and not changed between the insns.  Reload_p1 is
>> +   also called *original pseudo* and new_p is called *inheritance
>> +   pseudo*.
>> +
>> +   The subsequent assignment pass will try to assign the same (or
>> +   another if it is not possible) hard register to new_p as to
>> +   reload_p1 or reload_p2.
>> +
>> +   If it fails to assign a hard register, the opposite transformation
>> +   will restore the original code on (the pass called undoing
>> +   inheritance) because with spilled new_p the code would be much
>> +   worse. [...]
> Maybe:
>
>     If the assignment pass fails to assign a hard register to new_p,
>     this file will undo the inheritance and restore the original code.
>     This is because implementing the above sequence with a spilled
>     new_p would make the code much worse.
>
Fixed.
>> +   Splitting (transformation) is also done in EBB scope on the same
>> +   pass as the inheritance:
>> +
>> +       r <- ... or ... <- r		 r <- ... or ... <- r
>> +       ...				 s <- r (new insn -- save)
>> +       ...			  =>
>> +       ...				 r <- s (new insn -- restore)
>> +       ... <- r				 ... <- r
>> +
>> +    The *split pseudo* s is assigned to the hard register of the
>> +    original pseudo or hard register r.
>> +
>> +    Splitting is done:
>> +      o In EBBs with high register pressure for global pseudos (living
>> +	in at least 2 BBs) and assigned to hard registers when there
>> +	are more one reloads needing the hard registers;
>> +      o for pseudos needing save/restore code around calls.
>> +
>> +    If the split pseudo still has the same hard register as the
>> +    original pseudo after the subsequent assignment pass, the opposite
>> +    transformation is done on the same pass for undoing inheritance.  */
> AIUI spill_for can spill split pseudos.  I think the comment should say
> what happens then.  If I understand the code correctly, we keep the
> split if "r" is a hard register or was assigned a hard register.
> We undo it if "r" was not assigned a hard register.  Is that right?
>
Yes.  To be more correctly, r and s are assigned the same hard reg 
before the assignment pass.  So if r is *spilled* (by assignment pass) 
or r and s have the same hard reg after the assignment pass, we undo the 
transformation.

I added a comment.
>> +/* Array whose element is (MEM:MODE BASE_REG) corresponding to the
>> +   mode (index) and where BASE_REG is a base hard register for given
>> +   memory mode.	 */
>> +static rtx indirect_mem[MAX_MACHINE_MODE];
> Maybe:
>
> /* Index M is an rtx of the form (mem:M BASE_REG), where BASE_REG
>     is a sample hard register that is a valid address for mode M.
>     The memory refers to the generic address space.  */
Fixed.
>> +/* Return class of hard regno of REGNO or if it is was not assigned to
>> +   a hard register, return its allocno class but only for reload
>> +   pseudos created on the current constraint pass.  Otherwise, return
>> +   NO_REGS.  */
>> +static enum reg_class
>> +get_reg_class (int regno)
> Maybe:
>
> /* If REGNO is a hard register or has been allocated a hard register,
>     return the class of that register.  If REGNO is a pseudo created
>     by the current constraints pass, assume that it will be allocated
>     a hard register and return the class that that register will have.
>     (This assumption is optimistic when REGNO is an inheritance or
>     split pseudo.)  Return NO_REGS otherwise.  */
I don't like

assume that it will be allocated
    a hard register and return the class that that register will have

For example, I could treat this as assigning ax to pseudo having class 
general_regs and returning class a_reg instead of general_regs.

So I modified your comment a bit.
> if that's accurate.  I dropped the term "reload pseudo" because of
> the general comment in my earlier reply about the use of "reload pseudo"
> when the code seems to include inheritance and split pseudos too.
There is no inheritance and splitting yet.  It is done after the 
constraint pass.
So at this stage >= new_regno_start means reload pseudo.
>> +/* Return true if REGNO in REG_MODE satisfies reg class constraint CL.
>> +   For new reload pseudos we should make more accurate class
>> +   *NEW_CLASS (we set up it if it is not NULL) to satisfy the
>> +   constraints.  Otherwise, set up NEW_CLASS to NO_REGS.  */
>> +static bool
>> +in_class_p (int regno, enum machine_mode reg_mode,
>> +	    enum reg_class cl, enum reg_class *new_class)
> Same comment here, since it uses get_reg_class.  I.e. for registers >=
> new_regno_start, we're really testing whether the first allocatable
> register in REGNO's allocno class satisfies CL.
See my comment above.
> Also, the only caller that doesn't directly pass REGNO and REG_MODE
> from an rtx is process_addr_reg.  in_class_p uses:
>
>> +  if (new_class != NULL)
>> +    *new_class = NO_REGS;
>> +  if (regno < FIRST_PSEUDO_REGISTER)
>> +    return TEST_HARD_REG_BIT (reg_class_contents[cl], regno);
>> +  rclass = get_reg_class (regno);
> whereas process_addr_reg uses:
>
>> +  final_regno = regno = REGNO (reg);
>> +  if (regno < FIRST_PSEUDO_REGISTER)
>> +    {
>> +      rtx final_reg = reg;
>> +      rtx *final_loc = &final_reg;
>> +
>> +      lra_eliminate_reg_if_possible (final_loc);
>> +      final_regno = REGNO (*final_loc);
>> +    }
> I.e. process_addr_reg applies eliminations before testing whereas
> in_class_p doesn't.  I couldn't really tell why the two were different.
> Since the idea is that we use elimination source registers to represent
> their targets, shouldn't in_class_p eliminate too?
As I remember that was a fix for a bug for some target which needed an 
eliminated reg for legitimate address recognition.  I did not see such 
problem for the constraints.


> With that difference removed, in_class_p could take the rtx instead
> of a (REGNO, MODE) pair.  It could then pass that rtx directly to
> lra_eliminate_reg_if_possible.  I think this would lead to a cleaner
> interface and make things more regular.
Thanks.  I fixed it.
> Then the comment for in_class_p could be:
>
> /* Return true if X satisfies (or will satisfy) reg class constraint CL.
>     If X is a pseudo created by this constraints pass, assume that it will
>     be allocated a hard register from its allocno class, but allow that
>     class to be narrowed to CL if it is currently a superset of CL.
>
>     If NEW_CLASS is nonnull, set *NEW_CLASS to the new allocno class
>     of REGNO (X), or NO_REGS if no change in its class was needed.  */
Fixed.  I just added 'X is a reload pseudo...'
> That's a change in the meaning of NEW_CLASS, but seems easier for
> callers to handle.  I think all it requires is changing:
>
>> +      common_class = ira_reg_class_subset[rclass][cl];
>> +      if (new_class != NULL)
>> +	*new_class = common_class;
> to:
>
>        common_class = ira_reg_class_subset[rclass][cl];
>        if (new_class != NULL && rclass != common_class)
> 	*new_class = common_class;
This change results in infinite LRA looping on a first libgcc file 
compilation.  Unfortunately I have no time to investigate it.
I'd like to say that most code of in this code is very sensitive to 
changes.  I see it a lot.  You change something looking obvious and a 
target is broken.
I am going to investigate it when I have more time.
>> +  if (regno < new_regno_start
>> +      /* Do not make more accurate class from reloads generated.  They
>> +	 are mostly moves with a lot of constraints.  Making more
>> +	 accurate class may results in very narrow class and
>> +	 impossibility of find registers for several reloads of one
>> +	 insn.	*/
> Maybe:
>
>        /* Do not allow the constraints for reload instructions to
> 	 influence the classes of new pseudos.  These reloads are
> 	 typically moves that have many alternatives, and restricting
> 	 reload pseudos for one alternative may lead to situations
> 	 where other reload pseudos are no longer allocatable.  */
Fixed.
>> +      || INSN_UID (curr_insn) >= new_insn_uid_start)
>> +    return ((regno >= new_regno_start && rclass == ALL_REGS)
>> +	    || (rclass != NO_REGS && ira_class_subset_p[rclass][cl]
>> +		&& ! hard_reg_set_subset_p (reg_class_contents[cl],
>> +					    lra_no_alloc_regs)));
> Why the ALL_REGS special case?  I think it deserves a comment.
>
I added a comment.
>> +/* Return the defined and profitable equiv substitution of reg X, return
>> +   X otherwise.	 */
> Maybe:
>
> /* If we have decided to substitute X with another value, return that value,
>     otherwise return X.  */
>
Fixed.
>> +/* Change class of pseudo REGNO to NEW_CLASS.  Print info about it
>> +   using TITLE.	 Output a new line if NL_P.  */
>> +static void
>> +change_class (int regno, enum reg_class new_class,
>> +	      const char *title, bool nl_p)
>> +{
>> +  if (lra_dump_file != NULL)
>> +    fprintf (lra_dump_file, "%s to class %s for r%d",
>> +	     title, reg_class_names[new_class], regno);
>> +  setup_reg_classes (regno, new_class, NO_REGS, new_class);
>> +  if (lra_dump_file != NULL && nl_p)
>> +    fprintf (lra_dump_file, "\n");
>> +}
> I think either this or setup_reg_classes should have an assert
> that REGNO is >= FIRST_PSEUDO_REGISTER.   This matters more now
> because a lot of LRA deals with hard and pseudo registers
> side-by-side.
Ok. I added an assert.
>> +/* Create a new pseudo using MODE, RCLASS, ORIGINAL, TITLE or reuse
>> +   already created input reload pseudo (only if TYPE is not OP_OUT).
>> +   The result pseudo is returned through RESULT_REG.  Return TRUE if
>> +   we created a new pseudo, FALSE if we reused the already created
>> +   input reload pseudo.	 */
> Maybe:
>
> /* Store in *RESULT_REG a register for reloading ORIGINAL, which has
>     mode MODE.  TYPE specifies the direction of the reload -- either OP_IN
>     or OP_OUT -- and RCLASS specifies the class of hard register required.
It can be OP_INOUT too.
>     Try to reuse existing input reloads where possible.  Return true if
>     *RESULT_REG is a new register, false if it is an existing one.
>     Use TITLE to describe new registers for debug purposes.  */
>
> although I admit that's a bit convoluted...
I combined two comments in a better one.
>> +  for (i = 0; i < curr_insn_input_reloads_num; i++)
>> +    if (rtx_equal_p (curr_insn_input_reloads[i].input, original))
>> +      break;
>> +  if (i >= curr_insn_input_reloads_num
>> +      || ! in_class_p (REGNO (curr_insn_input_reloads[i].reg),
>> +		       GET_MODE (curr_insn_input_reloads[i].reg),
>> +		       rclass, &new_class))
>> +    {
>> +      res_p = true;
>> +      *result_reg = lra_create_new_reg (mode, original, rclass, title);
>> +    }
>> +  else
>> +    {
>> +      lra_assert (! side_effects_p (original));
>> +      res_p = false;
>> +      *result_reg = curr_insn_input_reloads[i].reg;
>> +      regno = REGNO (*result_reg);
>> +      if (lra_dump_file != NULL)
>> +	 {
>> +	   fprintf (lra_dump_file, "	 Reuse r%d for reload ", regno);
>> +	   print_value_slim (lra_dump_file, original, 1);
>> +	 }
>> +      if (rclass != new_class)
>> +	 change_class (regno, new_class, ", change", false);
>> +      if (lra_dump_file != NULL)
>> +	 fprintf (lra_dump_file, "\n");
>> +    }
>> +  lra_assert (curr_insn_input_reloads_num < LRA_MAX_INSN_RELOADS);
>> +  curr_insn_input_reloads[curr_insn_input_reloads_num].input = original;
>> +  curr_insn_input_reloads[curr_insn_input_reloads_num++].reg = *result_reg;
>> +  return res_p;
> It probably doesn't matter in practice, but I think this would
> be better as:
>
>    for (i = 0; i < curr_insn_input_reloads_num; i++)
>      if (rtx_equal_p (curr_insn_input_reloads[i].input, original)
>          && in_class_p (curr_insn_input_reloads[i].reg, rclass, &new_class))
>        {
>          ...reuse case..
>          return false;
>        }
>    ...new case...
>    return true;
>
> which also copes with the unlikely case that the same input is used
> three times, and that the third use requires the same class as the
> second.
Ok. I fixed it.
>> +/* The page contains code to extract memory address parts.  */
>> +
>> +/* Info about base and index regs of an address.  In some rare cases,
>> +   base/index register can be actually memory.	In this case we will
>> +   reload it.  */
>> +struct address
>> +{
>> +  rtx *base_reg_loc;  /* NULL if there is no a base register.  */
>> +  rtx *base_reg_loc2; /* Second location of {post/pre}_modify, NULL
>> +			 otherwise.  */
>> +  rtx *index_reg_loc; /* NULL if there is no an index register.	 */
>> +  rtx *index_loc; /* location of index reg * scale or index_reg_loc
>> +		      otherwise.  */
>> +  rtx *disp_loc; /* NULL if there is no a displacement.	 */
>> +  /* Defined if base_reg_loc is not NULL.  */
>> +  enum rtx_code base_outer_code, index_code;
>> +  /* True if the base register is modified in the address, for
>> +     example, in PRE_INC.  */
>> +  bool base_modify_p;
>> +};
> Comments should be consistently above the fields rather than to the right.
Fixed.
>> +/* Process address part in space AS (or all address if TOP_P) with
>> +   location *LOC to extract address characteristics.
>> +
>> +   If CONTEXT_P is false, we are looking at the base part of an
>> +   address, otherwise we are looking at the index part.
>> +
>> +   MODE is the mode of the memory reference; OUTER_CODE and INDEX_CODE
>> +   give the context that the rtx appears in; MODIFY_P if *LOC is
>> +   modified.  */
>> +static void
>> +extract_loc_address_regs (bool top_p, enum machine_mode mode, addr_space_t as,
>> +			  rtx *loc, bool context_p, enum rtx_code outer_code,
>> +			  enum rtx_code index_code,
>> +			  bool modify_p, struct address *ad)
>> +{
>> +  rtx x = *loc;
>> +  enum rtx_code code = GET_CODE (x);
>> +  bool base_ok_p;
>> +
>> +  switch (code)
>> +    {
>> +    case CONST_INT:
>> +    case CONST:
>> +    case SYMBOL_REF:
>> +    case LABEL_REF:
>> +      if (! context_p)
>> +	ad->disp_loc = loc;
> This looks a bit odd.  I assume it's trying to avoid treating MULT
> scale factors as displacements, but I thought whether something was
> a displacement or not depended on whether it is involved (possibly
> indirectly) in a sum with the base.  Seems like it'd be better
> to check for that directly.
>
>> +	/* If this machine only allows one register per address, it
>> +	   must be in the first operand.  */
>> +	if (MAX_REGS_PER_ADDRESS == 1 || code == LO_SUM)
>> +	  {
>> +	    extract_loc_address_regs (false, mode, as, arg0_loc, false, code,
>> +				      code1, modify_p, ad);
>> +	    ad->disp_loc = arg1_loc;
>> +	  }
>> +	/* If index and base registers are the same on this machine,
>> +	   just record registers in any non-constant operands.	We
>> +	   assume here, as well as in the tests below, that all
>> +	   addresses are in canonical form.  */
>> +	else if (INDEX_REG_CLASS
>> +		 == base_reg_class (VOIDmode, as, PLUS, SCRATCH)
>> +		 && code0 != PLUS && code0 != MULT)
>> +	  {
>> +	    extract_loc_address_regs (false, mode, as, arg0_loc, false, PLUS,
>> +				      code1, modify_p, ad);
>> +	    if (! CONSTANT_P (arg1))
>> +	      extract_loc_address_regs (false, mode, as, arg1_loc, true, PLUS,
>> +					code0, modify_p, ad);
>> +	    else
>> +	      ad->disp_loc = arg1_loc;
>> +	  }
>> +
>> +	/* If the second operand is a constant integer, it doesn't
>> +	   change what class the first operand must be.	 */
>> +	else if (code1 == CONST_INT || code1 == CONST_DOUBLE)
>> +	  {
>> +	    ad->disp_loc = arg1_loc;
>> +	    extract_loc_address_regs (false, mode, as, arg0_loc, context_p,
>> +				      PLUS, code1, modify_p, ad);
>> +	  }
>> +	/* If the second operand is a symbolic constant, the first
>> +	   operand must be an index register but only if this part is
>> +	   all the address.  */
>> +	else if (code1 == SYMBOL_REF || code1 == CONST || code1 == LABEL_REF)
>> +	  {
>> +	    ad->disp_loc = arg1_loc;
>> +	    extract_loc_address_regs (false, mode, as, arg0_loc,
>> +				      top_p ? true : context_p, PLUS, code1,
>> +				      modify_p, ad);
>> +	  }
> What's the reason for the distinction between the last two, which AIUI
> doesn't exist in reload?  I'm not sure the:
>
>      top_p ? true : context_p
>
> condition is safe: some targets use aligning addresses like
> (and X (const_int -ALIGN)), but that shouldn't really affect whether
> a register in X is treated as a base or an index.
The code works for PPC which uses aligning addresses.
>> +	/* If both operands are registers but one is already a hard
>> +	   register of index or reg-base class, give the other the
>> +	   class that the hard register is not.	 */
>> +	else if (code0 == REG && code1 == REG
>> +		 && REGNO (arg0) < FIRST_PSEUDO_REGISTER
>> +		 && ((base_ok_p
>> +		      = ok_for_base_p_nonstrict (arg0, mode, as, PLUS, REG))
>> +		     || ok_for_index_p_nonstrict (arg0)))
>> +	  {
>> +	    extract_loc_address_regs (false, mode, as, arg0_loc, ! base_ok_p,
>> +				      PLUS, REG, modify_p, ad);
>> +	    extract_loc_address_regs (false, mode, as, arg1_loc, base_ok_p,
>> +				      PLUS, REG, modify_p, ad);
>> +	  }
>> +	else if (code0 == REG && code1 == REG
>> +		 && REGNO (arg1) < FIRST_PSEUDO_REGISTER
>> +		 && ((base_ok_p
>> +		      = ok_for_base_p_nonstrict (arg1, mode, as, PLUS, REG))
>> +		     || ok_for_index_p_nonstrict (arg1)))
>> +	  {
>> +	    extract_loc_address_regs (false, mode, as, arg0_loc, base_ok_p,
>> +				      PLUS, REG, modify_p, ad);
>> +	    extract_loc_address_regs (false, mode, as, arg1_loc, ! base_ok_p,
>> +				      PLUS, REG, modify_p, ad);
>> +	  }
>> +	/* If one operand is known to be a pointer, it must be the
>> +	   base with the other operand the index.  Likewise if the
>> +	   other operand is a MULT.  */
>> +	else if ((code0 == REG && REG_POINTER (arg0)) || code1 == MULT)
>> +	  {
>> +	    extract_loc_address_regs (false, mode, as, arg0_loc, false, PLUS,
>> +				      code1, modify_p, ad);
>> +	    if (code1 == MULT)
>> +	      ad->index_loc = arg1_loc;
>> +	    extract_loc_address_regs (false, mode, as, arg1_loc, true, PLUS,
>> +				      code0, modify_p, ad);
>> +	  }
>> +	else if ((code1 == REG && REG_POINTER (arg1)) || code0 == MULT)
>> +	  {
>> +	    extract_loc_address_regs (false, mode, as, arg0_loc, true, PLUS,
>> +				      code1, modify_p, ad);
>> +	    if (code0 == MULT)
>> +	      ad->index_loc = arg0_loc;
>> +	    extract_loc_address_regs (false, mode, as, arg1_loc, false, PLUS,
>> +				      code0, modify_p, ad);
>> +	  }
> Some targets care about the choice between index and base for
> correctness reasons (PA IIRC) or for performance (some ppc targets IIRC),
> so I'm not sure whether it's safe to give REG_POINTER such a low priority.
This code works for PPC and PARISC.
>> +    default:
>> +      {
>> +	const char *fmt = GET_RTX_FORMAT (code);
>> +	int i;
>> +
>> +	if (GET_RTX_LENGTH (code) != 1
>> +	    || fmt[0] != 'e' || GET_CODE (XEXP (x, 0)) != UNSPEC)
>> +	  {
>> +	    for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
>> +	      if (fmt[i] == 'e')
>> +		extract_loc_address_regs (false, mode, as, &XEXP (x, i),
>> +					  context_p, code, SCRATCH,
>> +					  modify_p, ad);
>> +	    break;
>> +	  }
>> +	/* fall through for case UNARY_OP (UNSPEC ...)	*/
>> +      }
>> +
>> +    case UNSPEC:
>> +      if (ad->disp_loc == NULL)
>> +	ad->disp_loc = loc;
>> +      else if (ad->base_reg_loc == NULL)
>> +	{
>> +	  ad->base_reg_loc = loc;
>> +	  ad->base_outer_code = outer_code;
>> +	  ad->index_code = index_code;
>> +	  ad->base_modify_p = modify_p;
>> +	}
>> +      else
>> +	{
>> +	  lra_assert (ad->index_reg_loc == NULL);
>> +	  ad->index_reg_loc = loc;
>> +	}
>> +      break;
>> +
>> +    }
> Which targets use a bare UNSPEC as a displacement?  I thought a
> displacement had to be a link-time constant, in which case it should
> satisfy CONSTANT_P.  For UNSPECs, that means wrapping it in a CONST.
I saw it somewhere.  I guess IA64.
> I'm just a bit worried that the UNSPEC handling is sensitive to the
> order that subrtxes are processed (unlike PLUS, which goes to some
> trouble to work out what's what).  It could be especially confusing
> because the default case processes operands in reverse order while
> PLUS processes them in forward order.
>
> Also, which cases require the special UNARY_OP (UNSPEC ...) fallthrough?
> Probably deserves a comment.
I don't remember.  To figure out, I should switch it off and try all 
targets supported by LRA.
> AIUI the base_reg_loc, index_reg_loc and disp_loc fields aren't just
> recording where reloads of a particular class need to go (obviously
> in the case of disp_loc, which isn't reloaded at all).  The feidls
> have semantic value too.  I.e. we use them to work out the value
> of at least part of the address.
>
> In that case it seems dangerous to look through general rtxes
> in the way that the default case above does.  Maybe just making
> sure that DISP_LOC is involved in a sum with the base would be
> enough, but another idea was:
>
> ----------------------------------------------------------------
> I know of three ways of "mutating" (for want of a better word)
> an address:
>
>    1. (and X (const_int X)), to align
>    2. a subreg
>    3. a unary operator (such as truncation or extension)
>
> So maybe we could:
>
>    a. remove outer mutations (using a helper function)
>    b. handle LO_SUM, PRE_*, POST_*: as now
>    c. otherwise treat the address of the sum of one, two or three pieces.
>       c1. Peel mutations of all pieces.
>       c2. Classify the pieces into base, index and displacement.
>           This would be similar to the jousting code above, but hopefully
>           easier because all three rtxes are to hand.  E.g. we could
>           do the base vs. index thing in a similar way to
>           commutative_operand_precedence.
>       c3. Record which pieces were mutated (e.g. using something like the
>           index_loc vs. index_reg_loc distinction in the current code)
>
> That should be general enough for current targets, but if it isn't,
> we could generalise it further when we know what generalisation is needed.
>
> That's still going to be a fair amount of code, but hopefully not more,
> and we might have more confidence at each stage what each value is.
> And it avoids the risk of treating "mutated" addresses as "unmutated" ones.
> ----------------------------------------------------------------
>
> Just an idea though.  Probably not for 4.8, although I might try it
> if I find time.
I am not sure that you listed all the cases.  It would be great if you 
listed all the cases. In this case we could make this function more clear.
I tried to do this first but permanently found new cases.  After that I 
gave up and tried to use more general implementation.

This function was rewritten and modified many times.  I am afraid to do 
this again when clock is ticking.

It would be great if you re-implement the function according to your 
ideas and we could try it on 8 targets to which LRA was already ported.  
An LRA sub-branch would a perfect place to do it
> It would be nice to sort out the disp_loc thing for 4.8 though.
>
>> +/* Extract address characteristics in address with location *LOC in
>> +   space AS.  Return them in AD.  Parameter OUTER_CODE for MEM should
>> +   be MEM.  Parameter OUTER_CODE for 'p' constraint should be ADDRESS
>> +   and MEM_MODE should be VOIDmode.  */
> Maybe:
>
> /* Describe address *LOC in AD.  There are two cases:
>
>     - *LOC is the address in a (mem ...).  In this case OUTER_CODE is MEM
>       and AS is the mem's address space.
>
>     - *LOC is matched to an address constraint such as 'p'.  In this case
>       OUTER_CODE is ADDRESS and AS is ADDR_SPACE_GENERIC.  */
Fixed.
>> +/* Return start register offset of hard register REGNO in MODE.	 */
>> +int
>> +lra_constraint_offset (int regno, enum machine_mode mode)
>> +{
>> +  lra_assert (regno < FIRST_PSEUDO_REGISTER);
>> +  /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
>> +     multiple hard register group of scalar integer registers, so that
>> +     for example (reg:DI 0) and (reg:SI 1) will be considered the same
>> +     register.	*/
>> +  if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (mode) > UNITS_PER_WORD
>> +      && SCALAR_INT_MODE_P (mode))
>> +    return hard_regno_nregs[regno][mode] - 1;
>> +  return 0;
>> +}
> Maybe the head comment could be:
>
> /* Return the offset from REGNO of the least significant register
>     in (reg:MODE REGNO).
>
>     This function is used to tell whether two registers satisfy
>     a matching constraint.  (reg:MODE1 REGNO1) matches (reg:MODE2 REGNO2) if:
>
>           REGNO1 + lra_constraint_offset (REGNO1, MODE1)
>        == REGNO2 + lra_constraint_offset (REGNO2, MODE2)  */
>
> (and remove the inner comment).
>
Fixed.
>> +/* Like rtx_equal_p except that it allows a REG and a SUBREG to match
>> +   if they are the same hard reg, and has special hacks for
>> +   auto-increment and auto-decrement.  This is specifically intended for
>> +   process_alt_operands to use in determining whether two operands
>> +   match.  X is the operand whose number is the lower of the two.
>> +
>> +   It is supposed that X is the output operand and Y is the input
>> +   operand.  */
>> +static bool
>> +operands_match_p (rtx x, rtx y, int y_hard_regno)
> Need to say what Y_HARD_REGNO is.
>
Fixed.
>> +  switch (code)
>> +    {
>> +    case CONST_INT:
>> +    case CONST_DOUBLE:
>> +    case CONST_FIXED:
> After a recent change this should be:
>
>     CASE_CONST_UNIQUE:
Ok.  It looks I need another round of merging.  Oh, well.
>> +	      val = operands_match_p (XVECEXP (x, i, j), XVECEXP (y, i, j),
>> +				      y_hard_regno);
>> +	      if (val == 0)
>> +		return false;
> Why do we pass the old y_hard_regno even though Y has changed?
> Some of the earlier code assumes that GET_MODE (y) is the mode
> of y_hard_regno.
It does not matter.  As we processed reg and subreg case above and 
y_hard_regno can be non-negative only for this cases.  We can not 
process it again.  But I changed it to -1 for clearness.
>> +/* Reload pseudos created for matched input and output reloads whose
>> +   mode are different.	Such pseudos has a modified rules for finding
>> +   their living ranges, e.g. assigning to subreg of such pseudo means
>> +   changing all pseudo value.  */
>> +bitmap_head lra_bound_pseudos;
> Maybe:
>
> /* Reload pseudos created for matched input and output reloads whose
>     modes are different.  Such pseudos have different live ranges from
>     other pseudos; e.g. any assignment to a subreg of these pseudos
>     changes the whole pseudo's value.  */
Fixed.
> Although that said, couldn't emit_move_insn_1 (called by gen_move_insn)
> split a multiword pseudo move into two word moves?  Using the traditional
> clobber technique sounds better than having special liveness rules.
>
It is not only about multi-words pseudos.  It is about representation of 
this situation by constructions semantically incorrect in order parts of 
compiler.  Reload has no such problem as it does not use RTL.   So I 
don't think it splits as I use emit_move_insn and that calls 
emit_move_insn_1 too.  I really needed a special liveness treatment 
(although I don't remember details) and therefore I added it.  I had no 
detail design for LRA.  The code was modified by numerous test failures 
on different targets.  There is a lot of code analogous to reload one 
and probably its necessity should be rigorously questioned.  I thought 
about and modified part of this code but unfortunately not all.

Also bound pseudos are rare.  Their bitmap is very small and testing (2 
lines of code in overall) them in ira-lives.c is fast.

>> +/* True if C is a non-empty register class that has too few registers
>> +   to be safely used as a reload target class.	*/
>> +#define SMALL_REGISTER_CLASS_P(C)					\
>> +  (reg_class_size [(C)] == 1						\
>> +   || (reg_class_size [(C)] >= 1 && targetm.class_likely_spilled_p (C)))
> Feels like ira_class_hard_regs_num might be better, but since the
> current definition is traditional, that shouldn't be a merge requirement.
Ok.
>> +/* Return mode of WHAT inside of WHERE whose mode of the context is
>> +   OUTER_MODE.	If WHERE does not contain WHAT, return VOIDmode.  */
>> +static enum machine_mode
>> +find_mode (rtx *where, enum machine_mode outer_mode, rtx *what)
>> +{
>> +  int i, j;
>> +  enum machine_mode mode;
>> +  rtx x;
>> +  const char *fmt;
>> +  enum rtx_code code;
>> +
>> +  if (where == what)
>> +    return outer_mode;
>> +  if (*where == NULL_RTX)
>> +    return VOIDmode;
>> +  x = *where;
>> +  code = GET_CODE (x);
>> +  outer_mode = GET_MODE (x);
>> +  fmt = GET_RTX_FORMAT (code);
>> +  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
>> +    {
>> +      if (fmt[i] == 'e')
>> +	{
>> +	  if ((mode = find_mode (&XEXP (x, i), outer_mode, what)) != VOIDmode)
>> +	    return mode;
>> +	}
>> +      else if (fmt[i] == 'E')
>> +	{
>> +	  for (j = XVECLEN (x, i) - 1; j >= 0; j--)
>> +	  if ((mode = find_mode (&XVECEXP (x, i, j), outer_mode, what))
>> +	      != VOIDmode)
>> +	    return mode;
>> +	}
>> +    }
>> +  return VOIDmode;
>> +}
>> +
>> +/* Return mode for operand NOP of the current insn.  */
>> +static inline enum machine_mode
>> +get_op_mode (int nop)
>> +{
>> +  rtx *loc;
>> +  enum machine_mode mode;
>> +  bool md_first_p = asm_noperands (PATTERN (curr_insn)) < 0;
>> +
>> +  /* Take mode from the machine description first.  */
>> +  if (md_first_p && (mode = curr_static_id->operand[nop].mode) != VOIDmode)
>> +    return mode;
>> +  loc = curr_id->operand_loc[nop];
>> +  /* Take mode from the operand second.	 */
>> +  mode = GET_MODE (*loc);
>> +  if (mode != VOIDmode)
>> +    return mode;
>> +  if (! md_first_p && (mode = curr_static_id->operand[nop].mode) != VOIDmode)
>> +    return mode;
>> +  /* Here is a very rare case.	Take mode from the context.  */
>> +  return find_mode (&PATTERN (curr_insn), VOIDmode, loc);
>> +}
> This looks a lot more complicated than the reload version.  Why is
> it needed?  In reload the conditions for address operands were:
>
> 	  /* Address operands are reloaded in their existing mode,
> 	     no matter what is specified in the machine description.  */
> 	  operand_mode[i] = GET_MODE (recog_data.operand[i]);
>
> 	  /* If the address is a single CONST_INT pick address mode
> 	     instead otherwise we will later not know in which mode
> 	     the reload should be performed.  */
> 	  if (operand_mode[i] == VOIDmode)
> 	    operand_mode[i] = Pmode;
>
> which for LRA might look like:
>
> 	  /* The mode specified in the .md file for address operands
> 	     is the mode of the addressed value, not the address itself.
> 	     We therefore need to get the mode from the operand rtx.
> 	     If the operand has no mode, assume it was Pmode.  */
>
> For other operands, recog_data.operand_mode ought to be correct.
>
> find_mode assumes that the mode of an operand is the same as the mode of
> the outer rtx, which isn't true when the outer rtx is a subreg, mem,
> or one of several unary operators.
>
> This is one that I think would be best decided for 4.8.
I found the reason for this code:

  http://old.nabble.com/-lra--patch-to-fix-SPEC2000-sixtrack-compiler-crash-p32189310.html

But I tried it again and can not repeat the problem.  So I reverted the 
patch.  If on stage3 we find a problem, we try to find a solution.
>> +/* If REG is a reload pseudo, try to make its class satisfying CL.  */
>> +static void
>> +narrow_reload_pseudo_class (rtx reg, enum reg_class cl)
>> +{
>> +  int regno;
>> +  enum reg_class rclass;
>> +
>> +  /* Do not make more accurate class from reloads generated.  They are
>> +     mostly moves with a lot of constraints.  Making more accurate
>> +     class may results in very narrow class and impossibility of find
>> +     registers for several reloads of one insn.	 */
>> +  if (INSN_UID (curr_insn) >= new_insn_uid_start)
>> +    return;
>> +  if (GET_CODE (reg) == SUBREG)
>> +    reg = SUBREG_REG (reg);
>> +  if (! REG_P (reg) || (regno = REGNO (reg)) < new_regno_start)
>> +    return;
>> +  rclass = get_reg_class (regno);
>> +  rclass = ira_reg_class_subset[rclass][cl];
>> +  if (rclass == NO_REGS)
>> +    return;
>> +  change_class (regno, rclass, "      Change", true);
>> +}
> There seems to be an overlap in functionality with in_class_p here.
> Maybe:
>
> {
>    enum reg_class rclass;
>
>    if (in_class_p (reg, cl, &rclass) && rclass != NO_REGS)
>      change_class (REGNO (reg), rclass, "      Change", true);
> }
>
> (assuming the change in in_class_p interface suggested above).
> This avoids duplicating subtleties like the handling of reloads.
>
Fixed.
>> +      /* We create pseudo for out rtx because we always should keep
>> +	 registers with the same original regno have synchronized
>> +	 value (it is not true for out register but it will be
>> +	 corrected by the next insn).
> I don't understand this comment, sorry.
>
Pseudos have values -- see comments for lra_reg_info.  Different pseudos 
with the same value do not conflict even if they live in the same 
place.  When we create a pseudo we assign value of original pseudo (if 
any) from which we created the new pseudo.  If we create the pseudo from 
the input pseudo, the new pseudo will no conflict with the input pseudo 
which is wrong when the input pseudo lives after the insn and as the new 
pseudo value is changed by the insn output.  Therefore we create the new 
pseudo from the output.

I hope it is more understandable.  I changed the comment.
>> +	 Do not reuse register because of the following situation: a <-
>> +	 a op b, and b should be the same as a.	 */
> This part is very convincing though :-)  Maybe:
> 	 We cannot reuse the current output register because we might
> 	 have a situation like "a <- a op b", where the constraints force
> 	 the second input operand ("b") to match the output operand ("a").
> 	 "b" must then be copied into a new register so that it doesn't
> 	 clobber the current value of "a".  */
Ok.  Fixed.
> We should probably keep the other reason too, of course.
>
>> +      /* Don't generate inheritance for the new register because we
>> +	 can not use the same hard register for the corresponding
>> +	 inheritance pseudo for input reload.  */
>> +      bitmap_set_bit (&lra_matched_pseudos, REGNO (new_in_reg));
> Suggest dropping this comment, since we don't do any inheritance here.
> The comment above lra_matched_pseudos already says the same thing.
>
Fixed.
>> +  /* In and out operand can be got from transformations before
>> +     processing constraints.  So the pseudos might have inaccurate
>> +     class and we should make their classes more accurate.  */
>> +  narrow_reload_pseudo_class (in_rtx, goal_class);
>> +  narrow_reload_pseudo_class (out_rtx, goal_class);
> I don't understand this, sorry.  Does "transformations" mean inheritance
> and reload splitting?  So the registers we're changing here are inheritance
> and split pseudos rather than reload pseudos created for this instruction?
> If so, it sounds on face value like it conflicts with the comment quoted
> above about not allowing reload instructions to the narrow the class
> of pseudos.  Might be worth saying why that's OK here but not there.
Again, inheritance and splitting is done after the constraint pass.

The transformations here are mostly reloading of subregs which is done 
before reloads for given insn.  On this transformation we create new 
pseudos for which we don't know reg class yet.  In case we don't know 
pseudo reg class yet, we assign ALL_REGS to the pseudo.
> Also, I'm not sure I understand why it helps.  Is it just trying
> to encourage the pseudos to form a chain in lra-assigns.c?
>
> E.g. MIPS16 has several instructions that require matched MIPS16 registers.
> However, moves between MIPS16 registers and general registers are as cheap
> as moves between two MIPS16 registers, so narrowing the reloaded values
> from GENERAL_REGS to M16_REGS (if that ever happens) wouldn't necessarily
> be a good thing.
>
> Not saying this is wrong, just that it might need more commentary
> to justify it.
>
>> +  for (i = 0; (in = ins[i]) >= 0; i++)
>> +    *curr_id->operand_loc[in] = new_in_reg;
> The code assumes that all input operands have the same mode.
> Probably worth asserting that here (or maybe further up; I don't mind),
> just to make the assumption explicit.
I added an assert.
>> +/* Return final hard regno (plus offset) which will be after
>> +   elimination.	 We do this for matching constraints because the final
>> +   hard regno could have a different class.  */
>> +static int
>> +get_final_hard_regno (int hard_regno, int offset)
>> +{
>> +  if (hard_regno < 0)
>> +    return hard_regno;
>> +  hard_regno += offset;
>> +  return lra_get_elimation_hard_regno (hard_regno);
> Why apply the offset before rather than after elimination?
> AIUI, AVR's eliminable registers span more than one hard register,
> and the elimination is based off the first.
I fixed it.
> Also, all uses but one of lra_get_hard_regno_and_offset follow
> the pattern:
>
>        lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
>        /* The real hard regno of the operand after the allocation.  */
>        x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
>
> so couldn't lra_get_hard_regno_and_offset just return the final
> hard register, including elimination?  Then it could apply the
> elimination on the original rtx.
>
> FWIW, the exception I mentioned was operands_match_p:
>
>        lra_get_hard_regno_and_offset (x, &i, &offset);
>        if (i < 0)
> 	goto slow;
>        i += offset;
>
> but I'm not sure why this is the only caller that would want
> to ignore elimination.
???
>> +/* Return register class of OP.	 That is a class of the hard register
>> +   itself (if OP is a hard register), or class of assigned hard
>> +   register to the pseudo (if OP is pseudo), or allocno class of
>> +   unassigned pseudo (if OP is reload pseudo).	Return NO_REGS
>> +   otherwise.  */
>> +static enum reg_class
>> +get_op_class (rtx op)
>> +{
>> +  int regno, hard_regno, offset;
>> +
>> +  if (! REG_P (op))
>> +    return NO_REGS;
>> +  lra_get_hard_regno_and_offset (op, &hard_regno, &offset);
>> +  if (hard_regno >= 0)
>> +    {
>> +      hard_regno = get_final_hard_regno (hard_regno, offset);
>> +      return REGNO_REG_CLASS (hard_regno);
>> +    }
>> +  /* Reload pseudo will get a hard register in any case.  */
>> +  if ((regno = REGNO (op)) >= new_regno_start)
>> +    return lra_get_allocno_class (regno);
>> +  return NO_REGS;
>> +}
> This looks like it ought to be the same as:
>
>    return REG_P (x) ? get_reg_class (REGNO (x)) : NO_REGS;
>
> If not, I think there should be a comment explaining the difference.
> If so, the comment might be:
>
> /* If OP is a register, return the class of the register as per
>     get_reg_class, otherwise return NO_REGS.  */
The difference is in elimination.  But if i add elimination to 
get_reg_class, it will be the same.  I think it is right to do.  It 
permits to remove elimination code in process_addr_reg too.  So I 
modified the code.  It looks more brief and logical.
>> +/* Return generated insn mem_pseudo:=val if TO_P or val:=mem_pseudo
>> +   otherwise.  If modes of MEM_PSEUDO and VAL are different, use
>> +   SUBREG for VAL to make them equal.  Assign CODE to the insn if it
>> +   is not recognized.
>> +
>> +   We can not use emit_move_insn in some cases because of bad used
>> +   practice in some machine descriptions.  For example, power can use
>> +   only base+index addressing for altivec move insns and it is checked
>> +   by insn predicates.	On the other hand, the same move insn
>> +   constraints permit to use offsetable memory for moving vector mode
>> +   values from/to general registers to/from memory.  emit_move_insn
>> +   will transform offsetable address to one with base+index addressing
>> +   which is rejected by the constraint.	 So sometimes we need to
>> +   generate move insn without modifications and assign the code
>> +   explicitly because the generated move can be unrecognizable because
>> +   of the predicates.  */
> Ick :-)  Can't we just say that fixing this is part of the process
> of porting a target to LRA?  It'd be nice not to carry hacks like
> this around in shiny new code.
It would be great but I don't expect that target maintainers will be so 
cooperative.  So my goal was to write LRA requiring minimum changes in 
target code or no changes at all.
Even with keeping this goal, I am a bit pessimistic about how much time 
will be needed to remove reload.  With such requirements not to use 
hacks, it would take forever.

The biggest number of tricks I saw was PPC.  I spent a lot of time 
porting LRA to it.  And there are a lot of code in rs6000.c for LRA.
> As it stands:
>
>> +static rtx
>> +emit_spill_move (bool to_p, rtx mem_pseudo, rtx val, int code)
>> +{
>> +  rtx insn, after;
>> +
>> +  start_sequence ();
>> +  if (GET_MODE (mem_pseudo) != GET_MODE (val))
>> +    val = gen_rtx_SUBREG (GET_MODE (mem_pseudo),
>> +			  GET_CODE (val) == SUBREG ? SUBREG_REG (val) : val,
>> +			  0);
>> +  if (to_p)
>> +    insn = gen_move_insn (mem_pseudo, val);
>> +  else
>> +    insn = gen_move_insn (val, mem_pseudo);
>> +  if (recog_memoized (insn) < 0)
>> +    INSN_CODE (insn) = code;
>> +  emit_insn (insn);
>> +  after = get_insns ();
>> +  end_sequence ();
>> +  return after;
>> +}
> this recog_memoized code effectively assumes that INSN is just one
> instruction, whereas emit_move_insn_1 or the backend move expanders
> could split moves into several instructions.
>
> Since the code-forcing stuff is for rs6000, I think we could drop it
> from 4.8 whatever happens.
>
> The sequence stuff above looks redundant; we should just return
> INSN directly.
Ok. I fixed.  Although it makes my life harder as some targets will be 
broken on the branch after all these changes.
>> +  /* Quick check on the right move insn which does not need
>> +     reloads.  */
>> +  if ((dclass = get_op_class (dest)) != NO_REGS
>> +      && (sclass = get_op_class (src)) != NO_REGS
>> +      && targetm.register_move_cost (GET_MODE (src), dclass, sclass) == 2)
>> +    return true;
> Suggest:
>
>    /* The backend guarantees that register moves of cost 2 never need
>       reloads.  */
Fixed.
>> +  if (GET_CODE (dest) == SUBREG)
>> +    dreg = SUBREG_REG (dest);
>> +  if (GET_CODE (src) == SUBREG)
>> +    sreg = SUBREG_REG (src);
>> +  if (! REG_P (dreg) || ! REG_P (sreg))
>> +    return false;
>> +  sclass = dclass = NO_REGS;
>> +  dr = get_equiv_substitution (dreg);
>> +  if (dr != dreg)
>> +    dreg = copy_rtx (dr);
> I think this copy is too early, because there are quite a few
> conditions under which we never emit anything with DREG in it.
Ok, fixed.
>> +  if (REG_P (dreg))
>> +    dclass = get_reg_class (REGNO (dreg));
>> +  if (dclass == ALL_REGS)
>> +    /* We don't know what class we will use -- let it be figured out
>> +       by curr_insn_transform function.	 Remember some targets does not
>> +       work with such classes through their implementation of
>> +       machine-dependent hooks like secondary_memory_needed.  */
>> +    return false;
> Don't really understand this comment, sorry.
Again  ALL_REGS is used for new pseudos created by transformation like 
reload of SUBREG_REG.  We don't know its class yet.  We should figure 
out the class from processing the insn constraints not in this fast path 
function.  Even if ALL_REGS were a right class for the pseudo, 
secondary_... hooks usually are not define for ALL_REGS.

I fixed the comment.
>> +  sreg_mode = GET_MODE (sreg);
>> +  sr = get_equiv_substitution (sreg);
>> +  if (sr != sreg)
>> +    sreg = copy_rtx (sr);
> This copy also seems too early.
Fixed.
>> +  sri.prev_sri = NULL;
>> +  sri.icode = CODE_FOR_nothing;
>> +  sri.extra_cost = 0;
>> +  secondary_class = NO_REGS;
>> +  /* Set up hard register for a reload pseudo for hook
>> +     secondary_reload because some targets just ignore unassigned
>> +     pseudos in the hook.  */
>> +  if (dclass != NO_REGS
>> +      && REG_P (dreg) && (dregno = REGNO (dreg)) >= new_regno_start
>> +      && lra_get_regno_hard_regno (dregno) < 0)
>> +    reg_renumber[dregno] = ira_class_hard_regs[dclass][0];
>> +  else
>> +    dregno = -1;
>> +  if (sclass != NO_REGS
>> +      && REG_P (sreg) && (sregno = REGNO (sreg)) >= new_regno_start
>> +      && lra_get_regno_hard_regno (sregno) < 0)
>> +    reg_renumber[sregno] = ira_class_hard_regs[sclass][0];
>> +  else
>> +    sregno = -1;
> I think this would be correct without the:
>
>       && REG_P (dreg) && (dregno = REGNO (dreg)) >= new_regno_start
>
> condition (and similarly for the src case).  IMO it would be clearer too:
> the decision about when to return a register class for unallocated pseudos
> is then localised to get_reg_class rather than copied both here and there.
Right. fixed.
>> +  if (sclass != NO_REGS)
>> +    secondary_class
>> +      = (enum reg_class) targetm.secondary_reload (false, dest,
>> +						   (reg_class_t) sclass,
>> +						   GET_MODE (src), &sri);
>> +  if (sclass == NO_REGS
>> +      || ((secondary_class != NO_REGS || sri.icode != CODE_FOR_nothing)
>> +	  && dclass != NO_REGS))
>> +    secondary_class
>> +      = (enum reg_class) targetm.secondary_reload (true, sreg,
>> +						   (reg_class_t) dclass,
>> +						   sreg_mode, &sri);
> Hmm, so for register<-register moves, if the target says that the output
> reload needs a secondary reload, we try again with an input reload and
> hope for a different answer?
>
> If the target is giving different answers in that case, I think that's
> a bug in the target, and we should assert instead.  The problem is that
> if we allow the answers to be different, and both answers involve
> secondary reloads, we have no way of knowing whether the second answer
> is easier to implement or "more correct" than the first.  An assert
> avoids that, and puts the onus on the target to sort itself out.
>
> Again, as long as x86 is free of this bug for 4.8, I don't the merge
> needs to cater for broken targets.
I added an assert.
>> +  *change_p = true;
> I think this is the point at which substituted values should be copied.
Fixed.
>> +  new_reg = NULL_RTX;
>> +  if (secondary_class != NO_REGS)
>> +    new_reg = lra_create_new_reg_with_unique_value (sreg_mode, NULL_RTX,
>> +						    secondary_class,
>> +						    "secondary");
>> +  start_sequence ();
>> +  if (sri.icode == CODE_FOR_nothing)
>> +    lra_emit_move (new_reg, sreg);
>> +  else
>> +    {
>> +      enum reg_class scratch_class;
>> +
>> +      scratch_class = (reg_class_from_constraints
>> +		       (insn_data[sri.icode].operand[2].constraint));
>> +      scratch_reg = (lra_create_new_reg_with_unique_value
>> +		     (insn_data[sri.icode].operand[2].mode, NULL_RTX,
>> +		      scratch_class, "scratch"));
>> +      emit_insn (GEN_FCN (sri.icode) (new_reg != NULL_RTX ? new_reg : dest,
>> +				      sreg, scratch_reg));
>> +    }
>> +  before = get_insns ();
>> +  end_sequence ();
>> +  lra_process_new_insns (curr_insn, before, NULL_RTX, "Inserting the move");
> AIUI, the constraints pass will look at these instructions and generate
> what are now known as tertiary reloads where needed (by calling this
> function again).  Is that right?  Very nice if so: that's far more
> natural than the current reload handling.
Yes.
>> +/* The chosen reg classes which should be used for the corresponding
>> +   operands.  */
>> +static enum reg_class goal_alt[MAX_RECOG_OPERANDS];
>> +/* True if the operand should be the same as another operand and the
>> +   another operand does not need a reload.  */
> s/and the another/and that other/
Fixed.
>> +/* Make reloads for addr register in LOC which should be of class CL,
>> +   add reloads to list BEFORE.	If AFTER is not null emit insns to set
>> +   the register up after the insn (it is case of inc/dec, modify).  */
> Maybe:
>
> /* Arrange for address element *LOC to be a register of class CL.
>     Add any input reloads to list BEFORE.  AFTER is nonnull if *LOC is an
>     automodified value; handle that case by adding the required output
>     reloads to list AFTER.  Return true if the RTL was changed.  */
Fixed.
>> +static bool
>> +process_addr_reg (rtx *loc, rtx *before, rtx *after, enum reg_class cl)
>> +{
>> +  int regno, final_regno;
>> +  enum reg_class rclass, new_class;
>> +  rtx reg = *loc;
>> +  rtx new_reg;
>> +  enum machine_mode mode;
>> +  bool change_p = false;
>> +
>> +  mode = GET_MODE (reg);
>> +  if (! REG_P (reg))
>> +    {
>> +      /* Always reload memory in an address even if the target
>> +	 supports such addresses.  */
>> +      new_reg
>> +	= lra_create_new_reg_with_unique_value (mode, reg, cl, "address");
>> +      push_to_sequence (*before);
>> +      lra_emit_move (new_reg, reg);
>> +      *before = get_insns ();
>> +      end_sequence ();
>> +      *loc = new_reg;
>> +      if (after != NULL)
>> +	{
>> +	  start_sequence ();
>> +	  lra_emit_move (reg, new_reg);
>> +	  emit_insn (*after);
>> +	  *after = get_insns ();
>> +	  end_sequence ();
>> +	}
>> +      return true;
> Why does this need to be a special case, rather than reusing the
> code later in the function?  Specifically:
I simplified the function factoring common code.
>> +    }
>> +  lra_assert (REG_P (reg));
>> +  final_regno = regno = REGNO (reg);
>> +  if (regno < FIRST_PSEUDO_REGISTER)
>> +    {
>> +      rtx final_reg = reg;
>> +      rtx *final_loc = &final_reg;
>> +
>> +      lra_eliminate_reg_if_possible (final_loc);
>> +      final_regno = REGNO (*final_loc);
>> +    }
>> +  /* Use class of hard register after elimination because some targets
>> +     do not recognize virtual hard registers as valid address
>> +     registers.	 */
>> +  rclass = get_reg_class (final_regno);
>> +  if ((*loc = get_equiv_substitution (reg)) != reg)
>> +    {
>> +      if (lra_dump_file != NULL)
>> +	{
>> +	  fprintf (lra_dump_file,
>> +		   "Changing pseudo %d in address of insn %u on equiv ",
>> +		   REGNO (reg), INSN_UID (curr_insn));
>> +	  print_value_slim (lra_dump_file, *loc, 1);
>> +	  fprintf (lra_dump_file, "\n");
>> +	}
>> +      *loc = copy_rtx (*loc);
>> +      change_p = true;
>> +    }
>> +  if (*loc != reg || ! in_class_p (final_regno, GET_MODE (reg), cl, &new_class))
>> +    {
>> +      reg = *loc;
>> +      if (get_reload_reg (OP_IN, mode, reg, cl, "address", &new_reg))
>> +	{
>> +	  push_to_sequence (*before);
>> +	  lra_emit_move (new_reg, reg);
>> +	  *before = get_insns ();
>> +	  end_sequence ();
>> +	}
>> +      *loc = new_reg;
>> +      if (after != NULL)
>> +	{
>> +	  start_sequence ();
>> +	  lra_emit_move (reg, new_reg);
>> +	  emit_insn (*after);
>> +	  *after = get_insns ();
>> +	  end_sequence ();
>> +	}
>> +      change_p = true;
>> +    }
>> +  else if (new_class != NO_REGS && rclass != new_class)
>> +    change_class (regno, new_class, "	   Change", true);
>> +  return change_p;
>> +}
> E.g.:
>
>    if ((*loc = get_equiv_substitution (reg)) != reg)
>      ...as above...
>    if (*loc != reg || !in_class_p (reg, cl, &new_class))
>      ...as above...
>    else if (new_class != NO_REGS && rclass != new_class)
>      change_class (regno, new_class, "	   Change", true);
>    return change_p;
>
> (assuming change to in_class_p suggested earlier) seems like it
> covers the same cases.
>
> Also, should OP_IN be OP_INOUT for after != NULL, so that we don't try
> to reuse existing reload pseudos?  That would mean changing get_reload_reg
> (both commentary and code) to handle OP_INOUT like OP_OUT.
Fixed.
> Or maybe just pass OP_OUT instead of OP_INOUT, if that's more consistent.
> I don't mind which.
>> +  /* Force reload if this is a constant or PLUS or if there may be a
>> +     problem accessing OPERAND in the outer mode.  */
> Suggest:
>
>    /* Force a reload of the SUBREG_REG if this ...
Fixed.
>> +      /* Constant mode ???? */
>> +      enum op_type type = curr_static_id->operand[nop].type;
> Not sure what the comment means, but REG is still the original SUBREG_REG,
> so there shouldn't be any risk of a VOIDmode constant.  (subreg (const_int))
> is invalid rtl.
I removed the comment.  It was probably a question for myself.
>> +/* Return TRUE if *LOC refers for a hard register from SET.  */
>> +static bool
>> +uses_hard_regs_p (rtx *loc, HARD_REG_SET set)
>> +{
> Nothing seems to care about the address, so we would pass the rtx
> rather than a pointer to it.
Fixed.
>> +  int i, j, x_hard_regno, offset;
>> +  enum machine_mode mode;
>> +  rtx x;
>> +  const char *fmt;
>> +  enum rtx_code code;
>> +
>> +  if (*loc == NULL_RTX)
>> +    return false;
>> +  x = *loc;
>> +  code = GET_CODE (x);
>> +  mode = GET_MODE (x);
>> +  if (code == SUBREG)
>> +    {
>> +      loc = &SUBREG_REG (x);
>> +      x = SUBREG_REG (x);
>> +      code = GET_CODE (x);
>> +      if (GET_MODE_SIZE (GET_MODE (x)) > GET_MODE_SIZE (mode))
>> +	mode = GET_MODE (x);
>> +    }
>> +
>> +  if (REG_P (x))
>> +    {
>> +      lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
>> +      /* The real hard regno of the operand after the allocation.  */
>> +      x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
>> +      return (x_hard_regno >= 0
>> +	      && lra_hard_reg_set_intersection_p (x_hard_regno, mode, set));
> With the subreg mode handling above, this looks little-endian specific.
> The MEM case:
>> +  if (MEM_P (x))
>> +    {
>> +      struct address ad;
>> +      enum machine_mode mode = GET_MODE (x);
>> +      rtx *addr_loc = &XEXP (x, 0);
>> +
>> +      extract_address_regs (mode, MEM_ADDR_SPACE (x), addr_loc, MEM, &ad);
>> +      if (ad.base_reg_loc != NULL)
>> +	{
>> +	  if (uses_hard_regs_p (ad.base_reg_loc, set))
>> +	    return true;
>> +	}
>> +      if (ad.index_reg_loc != NULL)
>> +	{
>> +	  if (uses_hard_regs_p (ad.index_reg_loc, set))
>> +	    return true;
>> +	}
>> +    }
> is independent of the subreg handling, so perhaps the paradoxical subreg
> case should be handled separately, using simplify_subreg_regno.
>> +/* Major function to choose the current insn alternative and what
>> +   operands should be reloaded and how.	 If ONLY_ALTERNATIVE is not
>> +   negative we should consider only this alternative.  Return false if
>> +   we can not choose the alternative or find how to reload the
>> +   operands.  */
>> +static bool
>> +process_alt_operands (int only_alternative)
>> +{
>> +  bool ok_p = false;
>> +  int nop, small_class_operands_num, overall, nalt, offset;
>> +  int n_alternatives = curr_static_id->n_alternatives;
>> +  int n_operands = curr_static_id->n_operands;
>> +  /* LOSERS counts those that don't fit this alternative and would
>> +     require loading.  */
>> +  int losers;
> s/those/the operands/
Fixed.
>> +  /* Calculate some data common for all alternatives to speed up the
>> +     function.	*/
>> +  for (nop = 0; nop < n_operands; nop++)
>> +    {
>> +      op = no_subreg_operand[nop] = *curr_id->operand_loc[nop];
>> +      lra_get_hard_regno_and_offset (op, &hard_regno[nop], &offset);
>> +      /* The real hard regno of the operand after the allocation.  */
>> +      hard_regno[nop] = get_final_hard_regno (hard_regno[nop], offset);
>> +
>> +      operand_reg[nop] = op;
>> +      biggest_mode[nop] = GET_MODE (operand_reg[nop]);
>> +      if (GET_CODE (operand_reg[nop]) == SUBREG)
>> +	{
>> +	  operand_reg[nop] = SUBREG_REG (operand_reg[nop]);
>> +	  if (GET_MODE_SIZE (biggest_mode[nop])
>> +	      < GET_MODE_SIZE (GET_MODE (operand_reg[nop])))
>> +	    biggest_mode[nop] = GET_MODE (operand_reg[nop]);
>> +	}
>> +      if (REG_P (operand_reg[nop]))
>> +	no_subreg_operand[nop] = operand_reg[nop];
>> +      else
>> +	operand_reg[nop] = NULL_RTX;
> This looks odd: no_subreg_operand ends up being a subreg if the
> SUBREG_REG wasn't a REG.  Some more commentary might help.
Probably I should have a better name no_subreg_reg_operand.  I also 
added comments for its definition and definition of operand_reg.
>> +  /* The constraints are made of several alternatives.	Each operand's
>> +     constraint looks like foo,bar,... with commas separating the
>> +     alternatives.  The first alternatives for all operands go
>> +     together, the second alternatives go together, etc.
>> +
>> +     First loop over alternatives.  */
>> +  for (nalt = 0; nalt < n_alternatives; nalt++)
>> +    {
>> +      /* Loop over operands for one constraint alternative.  */
>> +      if (
>> +#ifdef HAVE_ATTR_enabled
>> +	  (curr_id->alternative_enabled_p != NULL
>> +	   && ! curr_id->alternative_enabled_p[nalt])
>> +	  ||
>> +#endif
>> +	  (only_alternative >= 0 && nalt != only_alternative))
>> +	continue;
> Probably more natural if split into two "if (...) continue;"s.  E.g.:
Fixed.
> #ifdef HAVE_ATTR_enabled
>        if (curr_id->alternative_enabled_p != NULL
> 	  && !curr_id->alternative_enabled_p[nalt])
> 	continue;
> #endif
>        if (only_alternative >= 0 && nalt != only_alternative))
> 	continue;
>
>> +      for (nop = 0; nop < n_operands; nop++)
>> +	{
>> +	  const char *p;
>> +	  char *end;
>> +	  int len, c, m, i, opalt_num, this_alternative_matches;
>> +	  bool win, did_match, offmemok, early_clobber_p;
>> +	  /* false => this operand can be reloaded somehow for this
>> +	     alternative.  */
>> +	  bool badop;
>> +	  /* false => this operand can be reloaded if the alternative
>> +	     allows regs.  */
>> +	  bool winreg;
>> +	  /* False if a constant forced into memory would be OK for
>> +	     this operand.  */
>> +	  bool constmemok;
>> +	  enum reg_class this_alternative, this_costly_alternative;
>> +	  HARD_REG_SET this_alternative_set, this_costly_alternative_set;
>> +	  bool this_alternative_match_win, this_alternative_win;
>> +	  bool this_alternative_offmemok;
>> +	  int invalidate_m;
>> +	  enum machine_mode mode;
>> +
>> +	  opalt_num = nalt * n_operands + nop;
>> +	  if (curr_static_id->operand_alternative[opalt_num].anything_ok)
>> +	    {
>> +	      /* Fast track for no constraints at all.	*/
>> +	      curr_alt[nop] = NO_REGS;
>> +	      CLEAR_HARD_REG_SET (curr_alt_set[nop]);
>> +	      curr_alt_win[nop] = true;
>> +	      curr_alt_match_win[nop] = false;
>> +	      curr_alt_offmemok[nop] = false;
>> +	      curr_alt_matches[nop] = -1;
>> +	      continue;
>> +	    }
> Given that this code is pretty complex, it might be clearer to remove
> the intermediate "this_*" variables and assign directly to curr_alt_*.  I.e.:
I'd rather not to do this.  Using array element instead of simple 
variable occurrences in about 50 places.  I don't think it makes code 
cleaner.
> 	  curr_alt[nop] = NO_REGS;
> 	  CLEAR_HARD_REG_SET (curr_alt_set[nop]);
> 	  curr_alt_win[nop] = false;
> 	  curr_alt_match_win[nop] = false;
> 	  curr_alt_offmemok[nop] = false;
> 	  curr_alt_matches[nop] = -1;
>
> 	  opalt_num = nalt * n_operands + nop;
> 	  if (curr_static_id->operand_alternative[opalt_num].anything_ok)
> 	    {
> 	      /* Fast track for no constraints at all.	*/
> 	      curr_alt_win[nop] = true;
> 	      continue;
> 	    }
>
> Obviously keep this nice comment:
>> +	  /* We update set of possible hard regs besides its class
>> +	     because reg class might be inaccurate.  For example,
>> +	     union of LO_REGS (l), HI_REGS(h), and STACK_REG(k) in ARM
>> +	     is translated in HI_REGS because classes are merged by
>> +	     pairs and there is no accurate intermediate class.	 */
> somewhere though, either here or above the declaration of curr_alt_set.
>
>> +		    /* We are supposed to match a previous operand.
>> +		       If we do, we win if that one did.  If we do
>> +		       not, count both of the operands as losers.
>> +		       (This is too conservative, since most of the
>> +		       time only a single reload insn will be needed
>> +		       to make the two operands win.  As a result,
>> +		       this alternative may be rejected when it is
>> +		       actually desirable.)  */
>> +		    /* If it conflicts with others.  */
> Last line looks incomplete/misplaced.
I have no idea where/what it should be.  I removed it.
>> +		    match_p = false;
>> +		    if (operands_match_p (*curr_id->operand_loc[nop],
>> +					  *curr_id->operand_loc[m], m_hregno))
>> +		      {
>> +			int i;
>> +			
>> +			for (i = 0; i < early_clobbered_regs_num; i++)
>> +			  if (early_clobbered_nops[i] == m)
>> +			    break;
>> +			/* We should reject matching of an early
>> +			   clobber operand if the matching operand is
>> +			   not dying in the insn.  */
>> +			if (i >= early_clobbered_regs_num
> Why not simply use operands m's early_clobber field?
Ok.  Fixed.
>> +			    || operand_reg[nop] == NULL_RTX
>> +			    || (find_regno_note (curr_insn, REG_DEAD,
>> +						 REGNO (operand_reg[nop]))
>> +				!= NULL_RTX))
>> +			  match_p = true;
> ...although I don't really understand this condition.  If the two
> operands are the same value X, then X must die here whatever the
> notes say.  So I assume this is coping with a case where the operands
> are different but still match.  If so, could you give an example?
I remember I saw such insn but I don't remember details.
> Matched earlyclobbers explicitly guarantee that the earlyclobber doesn't
> apply to the matched input operand; the earlyclobber only applies to
> other input operands.  So I'd have expected it was those operands
> that might need reloading rather than this one.
>
> E.g. if X occurs three times, twice in a matched earlyclobber pair
> and once as an independent operand, it's the latter operand that would
> need reloading.
Yes, I know.
>> +			/* Operands don't match.  */
>> +			/* Retroactively mark the operand we had to
>> +			   match as a loser, if it wasn't already and
>> +			   it wasn't matched to a register constraint
>> +			   (e.g it might be matched by memory).	 */
>> +			if (curr_alt_win[m]
>> +			    && (operand_reg[m] == NULL_RTX
>> +				|| hard_regno[m] < 0))
>> +			  {
>> +			    losers++;
>> +			    if (curr_alt[m] != NO_REGS)
>> +			      reload_nregs
>> +				+= (ira_reg_class_max_nregs[curr_alt[m]]
>> +				    [GET_MODE (*curr_id->operand_loc[m])]);
>> +			}
>> +			invalidate_m = m;
>> +			if (curr_alt[m] == NO_REGS)
>> +			  continue;
> I found this a bit confusing.  If the operands don't match and operand m
> allows no registers, don't we have to reject this constraint outright?
> E.g. something like:
Yes, this function as in reload can be investigated for long time. I 
cleared it a bit.  May be it is right time to clear it up more although 
time is a scarce resource.
I tried your variant.  I did not find serious problems on x86 which 
should be concerned of this code.  So I am using it.
> 			/* Operands don't match.  Both operands must
> 			   allow a reload register, otherwise we cannot
> 			   make them match.  */
> 			if (curr_alt[m] == NO_REGS)
> 			  break;
> 			/* Retroactively mark the operand we had to
> 			   match as a loser, if it wasn't already and
> 			   it wasn't matched to a register constraint
> 			   (e.g it might be matched by memory).	 */
> 			if (curr_alt_win[m]
> 			    && (operand_reg[m] == NULL_RTX
> 				|| hard_regno[m] < 0))
> 			  {
> 			    losers++;
> 			    reload_nregs
> 			      += (ira_reg_class_max_nregs[curr_alt[m]]
> 				  [GET_MODE (*curr_id->operand_loc[m])]);
> 			  }
>
>> +		    /* This can be fixed with reloads if the operand
>> +		       we are supposed to match can be fixed with
>> +		       reloads.	 */
>> +		    badop = false;
>> +		    this_alternative = curr_alt[m];
>> +		    COPY_HARD_REG_SET (this_alternative_set, curr_alt_set[m]);
>> +		
>> +		    /* If we have to reload this operand and some
>> +		       previous operand also had to match the same
>> +		       thing as this operand, we don't know how to do
>> +		       that.  So reject this alternative.  */
>> +		    if (! did_match)
>> +		      for (i = 0; i < nop; i++)
>> +			if (curr_alt_matches[i] == this_alternative_matches)
>> +			  badop = true;
> OK, so this is another case of cruft from reload that I'd like to remove,
> but do you know of any reason why this shouldn't be:
>
> 		    /* If we have to reload this operand and some previous
> 		       operand also had to match the same thing as this
> 		       operand, we don't know how to do that.  */
> 		    if (!match_p || !curr_alt_win[m])
> 		      {
> 			for (i = 0; i < nop; i++)
> 			  if (curr_alt_matches[i] == m)
> 			    break;
> 			if (i < nop)
> 			  break;
> 		      }
> 		    else
> --->		      did_match = true;
>
> 		    /* This can be fixed with reloads if the operand
> 		       we are supposed to match can be fixed with
> 		       reloads.	 */
> --->		    this_alternative_matches = m;
> --->		    invalidate_m = m;
> 		    badop = false;
> 		    this_alternative = curr_alt[m];
> 		    COPY_HARD_REG_SET (this_alternative_set, curr_alt_set[m]);
>
> (although a helper function might be better than the awkward breaking)?
> Note that the ---> lines have moved from further up.
>
> This is the only time in the switch statement where one constraint
> in a constraint string uses "badop = true" to reject the operand.
> I.e. for "<something else>0" we should normally not reject the
> alternative based solely on the "0", since the "<something else>"
> might have been satisfied instead.  And we should only record matching
> information if we've decided the match can be implemented by reloads
> (the last block).
Yes, that is bad.  Especially it is not documented but people use this.

I tried it and I did not a problem with this.  So I use your variant 
(with one modification setting invalidate_m only for !did_match).
>> +			/* We prefer no matching alternatives because
>> +			   it gives more freedom in RA.	 */
>> +			if (operand_reg[nop] == NULL_RTX
>> +			    || (find_regno_note (curr_insn, REG_DEAD,
>> +						 REGNO (operand_reg[nop]))
>> +				 == NULL_RTX))
>> +			  reject += 2;
> Looks like a new reject rule.  I agree it makes conceptual sense though,
> so I'm all for it.
>> +		      || (REG_P (op)
>> +			  && REGNO (op) >= FIRST_PSEUDO_REGISTER
>> +			  && in_mem_p (REGNO (op))))
> This pattern occurs several times.  I think a helper function like
> spilled_reg_p (op) would help.  See 'g' below.
Ok. Fixed.
>> +		case 's':
>> +		  if (CONST_INT_P (op)
>> +		      || (GET_CODE (op) == CONST_DOUBLE && mode == VOIDmode))
> ...
>> +		case 'n':
>> +		  if (CONST_INT_P (op)
>> +		      || (GET_CODE (op) == CONST_DOUBLE && mode == VOIDmode))
> After recent changes these should be CONST_SCALAR_INT_P (op)
OK.
>> +		case 'g':
>> +		  if (/* A PLUS is never a valid operand, but LRA can
>> +			 make it from a register when eliminating
>> +			 registers.  */
>> +		      GET_CODE (op) != PLUS
>> +		      && (! CONSTANT_P (op) || ! flag_pic
>> +			  || LEGITIMATE_PIC_OPERAND_P (op))
>> +		      && (! REG_P (op)
>> +			  || (REGNO (op) >= FIRST_PSEUDO_REGISTER
>> +			      && in_mem_p (REGNO (op)))))
> Rather than the special case for PLUS, I think this would be better as:
>
> 	  if (MEM_P (op)
> 	      || spilled_reg_p (op)
> 	      || general_constant_p (op))
> 	    win = true;
>
> where general_constant_p abstracts away:
>
> 	  (CONSTANT_P (op)
> 	   && (! flag_pic || LEGITIMATE_PIC_OPERAND_P (op)))
>
> general_constant_p probably ought to go in a common file because several
> places need this condition (including other parts of this switch statement).
OK. Fixed.
>> +#ifdef EXTRA_CONSTRAINT_STR
>> +		      if (EXTRA_MEMORY_CONSTRAINT (c, p))
>> +			{
>> +			  if (EXTRA_CONSTRAINT_STR (op, c, p))
>> +			    win = true;
>> +			  /* For regno_equiv_mem_loc we have to
>> +			     check.  */
>> +			  else if (REG_P (op)
>> +				   && REGNO (op) >= FIRST_PSEUDO_REGISTER
>> +				   && in_mem_p (REGNO (op)))
> Looks like an old comment from an earlier iteration.  There doesn't
> seem to be a function called regno_equiv_mem_loc in the current patch.
> But...
Yes, it is from early version.  I removed it.
>> +			    {
>> +			      /* We could transform spilled memory
>> +				 finally to indirect memory.  */
>> +			      if (EXTRA_CONSTRAINT_STR
>> +				  (get_indirect_mem (mode), c, p))
>> +				win = true;
>> +			    }
> ...is this check really needed?  It's a documented requirement that memory
> constraints accept plain base registers.  Also, the following code is:
I removed it.  At least it works for x86/x86-64.
>> +			  /* If we didn't already win, we can reload
>> +			     constants via force_const_mem, and other
>> +			     MEMs by reloading the address like for
>> +			     'o'.  */
>> +			  if (CONST_POOL_OK_P (mode, op) || MEM_P (op))
>> +			    badop = false;
> It seems a bit inconsistent to treat a spilled pseudo whose address
> might well need reloading as a win, while not treating existing MEMs
> whose addresses need reloading as a win.
Well, probability of reloading address of spilled pseudo is very small 
on most targets but reloading for MEM in this case is real. So I see it 
logical.
>> +		      if (EXTRA_CONSTRAINT_STR (op, c, p))
>> +			win = true;
>> +		      else if (REG_P (op)
>> +			       && REGNO (op) >= FIRST_PSEUDO_REGISTER
>> +			       && in_mem_p (REGNO (op)))
>> +			{
>> +			  /* We could transform spilled memory finally
>> +			     to indirect memory.  */
>> +			  if (EXTRA_CONSTRAINT_STR (get_indirect_mem (mode),
>> +						    c, p))
>> +			    win = true;
>> +			}
> I don't understand why there's two copies of this.  I think we have
> to trust the target's classification of constraints, so if the target
> says that something isn't a memory constraint, we shouldn't check the
> (mem (base)) case.
I removed it too.
>> +	      if (c != ' ' && c != '\t')
>> +		costly_p = c == '*';
> I think there needs to be a comment somewhere saying how we handle this.
> Being costly seems to contribute one reject point (i.e. a sixth of a '?')
> compared to the normal case, which is very different from the current
> reload behaviour.  We should probably update the "*" documentation
> in md.texi too.
Yes.  It is different.  New heuristics result in a better code generation.
> Some targets use "*" constraints to make sure that floating-point
> registers don't get used as spill space in purely integer code
> (so that task switches don't pay the FPU save/restore penalty).
> Would that still "work" with this definition?  (FWIW, I think
> using "*" is a bad way to achieve this feature, just asking.)
There are two places for processing '*'.  One is in ira-cost.c for 
choose classes.  Therefore I believe it will work.  At least I did not 
find any problems on 8 targets.  I changed md.texi too.
>> +		  /* We simulate the behaviour of old reload here.
>> +		     Although scratches need hard registers and it
>> +		     might result in spilling other pseudos, no reload
>> +		     insns are generated for the scratches.  So it
>> +		     might cost something but probably less than old
>> +		     reload pass believes.  */
>> +		  if (lra_former_scratch_p (REGNO (operand_reg[nop])))
>> +		    reject += LOSER_COST_FACTOR;
> Yeah, this caused me no end of trouble when tweaking the MIPS
> multiply-accumulate patterns.  However, unlike the other bits of
> cruft I've been complaining about, this is one where I can't think
> of any alternative that makes more inherent sense (to me).  So I agree
> that leaving it as-is is the best approach for now.
>> +	      /* If the operand is dying, has a matching constraint,
>> +		 and satisfies constraints of the matched operand
>> +		 which failed to satisfy the own constraints, we do
>> +		 not need to generate a reload insn for this
>> +		 operand.  */
>> +	      if (this_alternative_matches < 0
>> +		  || curr_alt_win[this_alternative_matches]
>> +		  || ! REG_P (op)
>> +		  || find_regno_note (curr_insn, REG_DEAD,
>> +				      REGNO (op)) == NULL_RTX
>> +		  || ((hard_regno[nop] < 0
>> +		       || ! in_hard_reg_set_p (this_alternative_set,
>> +					       mode, hard_regno[nop]))
>> +		      && (hard_regno[nop] >= 0
>> +			  || ! in_class_p (REGNO (op), GET_MODE (op),
>> +					   this_alternative, NULL))))
>> +		losers++;
> I think this might be clearer as:
Fixed.
> 	      if (!(this_alternative_matches >= 0
> 		    && !curr_alt_win[this_alternative_matches]
> 		    && REG_P (op)
> 		    && find_regno_note (curr_insn, REG_DEAD, REGNO (op))
> 		    && (hard_regno[nop] >= 0
> 			? in_hard_reg_set_p (this_alternative_set,
> 					     mode, hard_regno[nop])
> 			: in_class_p (op, this_alternative, NULL))))
> 		losers++;
>
>> +	      if (operand_reg[nop] != NULL_RTX)
>> +		{
>> +		  int last_reload = (lra_reg_info[ORIGINAL_REGNO
>> +						  (operand_reg[nop])]
>> +				     .last_reload);
>> +
>> +		  if (last_reload > bb_reload_num)
>> +		    reload_sum += last_reload;
>> +		  else
>> +		    reload_sum += bb_reload_num;
> The comment for reload_sum says:
>
>> +/* Overall number reflecting distances of previous reloading the same
>> +   value.  It is used to improve inheritance chances.  */
>> +static int best_reload_sum;
That is a wrong comment.  It should be

/* Overall number reflecting distances of previous reloading the same
    value.  The distances are counted from the current BB start.  It is
    used to improve inheritance chances.  */

I fixed it.  I am also decreasing the number by bb_reload_num every time 
when I increase reload_sum.
> which made me think of distance from the current instruction.  I see
> it's actually something else, effectively a sum of instruction numbers.
>
> I assumed the idea was to prefer registers that were reloaded more
> recently (closer the current instruction).  In that case I thought that,
> for a distance-based best_reload_sum, smaller would be better,
> while for an instruction-number-based best_reload_sum, larger would
> be better.  It looks like we use instruction-number based best_reload_sums
> but prefer smaller sums:
>
>> +			      && (reload_nregs < best_reload_nregs
>> +				  || (reload_nregs == best_reload_nregs
>> +				      && best_reload_sum < reload_sum))))))))
> Is that intentional?
Now it has sense the bigger number, the closer the last reloading to the 
current insn.
> Also, is this value meaningful for output reloads, which aren't really
> going to be able to inherit a value as such?  We seem to apply the cost
> regardless of whether it's an input or an output, so probably deserves
> a comment.
>
> Same for matched input operands, which as you say elsewhere aren't
> inherited.
Right.  It could improve the heuristic more.  I added the code.
>> +	      if (badop
>> +		  /* Alternative loses if it has no regs for a reg
>> +		     operand.  */
>> +		  || (REG_P (op) && no_regs_p
>> +		      && this_alternative_matches < 0))
>> +		goto fail;
> More reload cruft, but I don't understand why we have both this and, later:
>> +	      if (this_alternative_matches < 0
>> +		  && no_regs_p && ! this_alternative_offmemok && ! constmemok)
>> +		goto fail;
> We also had the earlier:
>
>> +	  /* If this operand could be handled with a reg, and some reg
>> +	     is allowed, then this operand can be handled.  */
>> +	  if (winreg && this_alternative != NO_REGS)
>> +	    badop = false;
> which I think belongs in the same else statement.  At least after the
> matching changes suggested above, I think all three can be replaced by:
>
> 	  /* If this operand accepts a register, and if the register class
> 	     has at least one allocatable register, then this operand
> 	     can be reloaded.  */
> 	  if (winreg && !no_regs_p)
> 	    badop = false;
>
> 	  if (badop)
> 	    goto fail;
>
> which IMO belongs after the "no_regs_p" assignment.  badop should never
> be false if we have no way of reloading the value.
This change is non-trivial without knowing semantics of all these 
variables.  It looks ok to me.

So I changed the code.
>> +	      if (! no_regs_p)
>> +		reload_nregs
>> +		  += ira_reg_class_max_nregs[this_alternative][mode];
> I wasn't sure why we counted this even in the "const_to_mem && constmmeok"
> and "MEM_P (op) && offmemok" cases from:
> 	      /* We prefer to reload pseudos over reloading other
> 		 things, since such reloads may be able to be
> 		 eliminated later.  So bump REJECT in other cases.
> 		 Don't do this in the case where we are forcing a
> 		 constant into memory and it will then win since we
> 		 don't want to have a different alternative match
> 		 then.	*/
> 	      if (! (REG_P (op)
> 		     && REGNO (op) >= FIRST_PSEUDO_REGISTER)
> 		  && ! (const_to_mem && constmemok)
> 		  /* We can reload the address instead of memory (so
> 		     do not punish it).	 It is preferable to do to
> 		     avoid cycling in some cases.  */
> 		  && ! (MEM_P (op) && offmemok))
> 		reject += 2;
I think constmemok is obvious.  It is not a reload, it just putting 
constant in the constant pool.  We should not punish it as no additional 
insns are generated.

There is a comment for offmemok case.  I think it describes it. 
Apparently it was a fix for LRA cycling.  I don't remember details. To 
restore them, I need to remove the code and to try it on many targets.  
I guess, it would take 3-4 days.  But I removed this as it does not 
affect x86/x86-64.
>> +	  if (early_clobber_p)
>> +	    reject++;
>> +	  /* ??? Should we update the cost because early clobber
>> +	     register reloads or it is a rare thing to be worth to do
>> +	     it.  */
>> +	  overall = losers * LOSER_COST_FACTOR + reject;
> Could you expand on the comment a bit?
Yes, I did it.
>> +	  if ((best_losers == 0 || losers != 0) && best_overall < overall)
>> +	    goto fail;
>> +
>> +	  curr_alt[nop] = this_alternative;
>> +	  COPY_HARD_REG_SET (curr_alt_set[nop], this_alternative_set);
>> +	  curr_alt_win[nop] = this_alternative_win;
>> +	  curr_alt_match_win[nop] = this_alternative_match_win;
>> +	  curr_alt_offmemok[nop] = this_alternative_offmemok;
>> +	  curr_alt_matches[nop] = this_alternative_matches;
>> +
>> +	  if (invalidate_m >= 0 && ! this_alternative_win)
>> +	    curr_alt_win[invalidate_m] = false;
> BTW, after the matching changes above, I don't think we need both
> "invalidate_m" and "this_alternative_matches".
>
Yes, if we use did_match, we could remove invalidate_m.  I removed it.
>> +	  for (j = hard_regno_nregs[clobbered_hard_regno][biggest_mode[i]] - 1;
>> +	       j >= 0;
>> +	       j--)
>> +	    SET_HARD_REG_BIT (temp_set, clobbered_hard_regno + j);
> add_to_hard_reg_set.
>
Fixed.
>> +	    else if (curr_alt_matches[j] == i && curr_alt_match_win[j])
>> +	      {
>> +		/* This is a trick.  Such operands don't conflict and
>> +		   don't need a reload.	 But it is hard to transfer
>> +		   this information to the assignment pass which
>> +		   spills one operand without this info.  We avoid the
>> +		   conflict by forcing to use the same pseudo for the
>> +		   operands hoping that the pseudo gets the same hard
>> +		   regno as the operands and the reloads are gone.  */
>> +		if (*curr_id->operand_loc[i] != *curr_id->operand_loc[j])
> ...
>> +		/* See the comment for the previous case.  */
>> +		if (*curr_id->operand_loc[i] != *curr_id->operand_loc[j])
> What are these last two if statements for?  I wasn't sure how two operands
> could have the same address.
>
> Not saying they're wrong, but I think a comment would be good.
I think it is a leftover from older state of code.  If lra-assign.c does 
something wrong in this situation, I think, next round lra-constraint.c 
will fix it.
So I am removing the code.
>> +      small_class_operands_num = 0;
>> +      for (nop = 0; nop < n_operands; nop++)
>> +	/* If this alternative can be made to work by reloading, and
>> +	   it needs less reloading than the others checked so far,
>> +	   record it as the chosen goal for reloading.	*/
>> +	small_class_operands_num
>> +	  += SMALL_REGISTER_CLASS_P (curr_alt[nop]) ? 1 : 0;
> Misplaced comment; I think it belongs after this line.
>
Yes. Fixed.
That was a rigorous review.  I took 2 full days and numerous bootstraps 
and testing to check your proposals and answer you questions.
But I am glad.  I believe the code became more clear.
Richard Sandiford Oct. 15, 2012, 12:06 p.m. UTC | #11
Vladimir Makarov <vmakarov@redhat.com> writes:
>> if that's accurate.  I dropped the term "reload pseudo" because of
>> the general comment in my earlier reply about the use of "reload pseudo"
>> when the code seems to include inheritance and split pseudos too.
> There is no inheritance and splitting yet.  It is done after the 
> constraint pass.
> So at this stage >= new_regno_start means reload pseudo.

Ah, OK.

>> That's a change in the meaning of NEW_CLASS, but seems easier for
>> callers to handle.  I think all it requires is changing:
>>
>>> +      common_class = ira_reg_class_subset[rclass][cl];
>>> +      if (new_class != NULL)
>>> +	*new_class = common_class;
>> to:
>>
>>        common_class = ira_reg_class_subset[rclass][cl];
>>        if (new_class != NULL && rclass != common_class)
>> 	*new_class = common_class;
> This change results in infinite LRA looping on a first libgcc file 
> compilation.  Unfortunately I have no time to investigate it.
> I'd like to say that most code of in this code is very sensitive to 
> changes.  I see it a lot.  You change something looking obvious and a 
> target is broken.
> I am going to investigate it when I have more time.

Thanks.

>>> +    default:
>>> +      {
>>> +	const char *fmt = GET_RTX_FORMAT (code);
>>> +	int i;
>>> +
>>> +	if (GET_RTX_LENGTH (code) != 1
>>> +	    || fmt[0] != 'e' || GET_CODE (XEXP (x, 0)) != UNSPEC)
>>> +	  {
>>> +	    for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
>>> +	      if (fmt[i] == 'e')
>>> +		extract_loc_address_regs (false, mode, as, &XEXP (x, i),
>>> +					  context_p, code, SCRATCH,
>>> +					  modify_p, ad);
>>> +	    break;
>>> +	  }
>>> +	/* fall through for case UNARY_OP (UNSPEC ...)	*/
>>> +      }
>>> +
>>> +    case UNSPEC:
>>> +      if (ad->disp_loc == NULL)
>>> +	ad->disp_loc = loc;
>>> +      else if (ad->base_reg_loc == NULL)
>>> +	{
>>> +	  ad->base_reg_loc = loc;
>>> +	  ad->base_outer_code = outer_code;
>>> +	  ad->index_code = index_code;
>>> +	  ad->base_modify_p = modify_p;
>>> +	}
>>> +      else
>>> +	{
>>> +	  lra_assert (ad->index_reg_loc == NULL);
>>> +	  ad->index_reg_loc = loc;
>>> +	}
>>> +      break;
>>> +
>>> +    }
>> Which targets use a bare UNSPEC as a displacement?  I thought a
>> displacement had to be a link-time constant, in which case it should
>> satisfy CONSTANT_P.  For UNSPECs, that means wrapping it in a CONST.
> I saw it somewhere.  I guess IA64.
>> I'm just a bit worried that the UNSPEC handling is sensitive to the
>> order that subrtxes are processed (unlike PLUS, which goes to some
>> trouble to work out what's what).  It could be especially confusing
>> because the default case processes operands in reverse order while
>> PLUS processes them in forward order.
>>
>> Also, which cases require the special UNARY_OP (UNSPEC ...) fallthrough?
>> Probably deserves a comment.
> I don't remember.  To figure out, I should switch it off and try all 
> targets supported by LRA.
>> AIUI the base_reg_loc, index_reg_loc and disp_loc fields aren't just
>> recording where reloads of a particular class need to go (obviously
>> in the case of disp_loc, which isn't reloaded at all).  The feidls
>> have semantic value too.  I.e. we use them to work out the value
>> of at least part of the address.
>>
>> In that case it seems dangerous to look through general rtxes
>> in the way that the default case above does.  Maybe just making
>> sure that DISP_LOC is involved in a sum with the base would be
>> enough, but another idea was:
>>
>> ----------------------------------------------------------------
>> I know of three ways of "mutating" (for want of a better word)
>> an address:
>>
>>    1. (and X (const_int X)), to align
>>    2. a subreg
>>    3. a unary operator (such as truncation or extension)
>>
>> So maybe we could:
>>
>>    a. remove outer mutations (using a helper function)
>>    b. handle LO_SUM, PRE_*, POST_*: as now
>>    c. otherwise treat the address of the sum of one, two or three pieces.
>>       c1. Peel mutations of all pieces.
>>       c2. Classify the pieces into base, index and displacement.
>>           This would be similar to the jousting code above, but hopefully
>>           easier because all three rtxes are to hand.  E.g. we could
>>           do the base vs. index thing in a similar way to
>>           commutative_operand_precedence.
>>       c3. Record which pieces were mutated (e.g. using something like the
>>           index_loc vs. index_reg_loc distinction in the current code)
>>
>> That should be general enough for current targets, but if it isn't,
>> we could generalise it further when we know what generalisation is needed.
>>
>> That's still going to be a fair amount of code, but hopefully not more,
>> and we might have more confidence at each stage what each value is.
>> And it avoids the risk of treating "mutated" addresses as "unmutated" ones.
>> ----------------------------------------------------------------
>>
>> Just an idea though.  Probably not for 4.8, although I might try it
>> if I find time.
> I am not sure that you listed all the cases.  It would be great if you 
> listed all the cases. In this case we could make this function more clear.
> I tried to do this first but permanently found new cases.  After that I 
> gave up and tried to use more general implementation.
>
> This function was rewritten and modified many times.  I am afraid to do 
> this again when clock is ticking.
>
> It would be great if you re-implement the function according to your 
> ideas and we could try it on 8 targets to which LRA was already ported.  
> An LRA sub-branch would a perfect place to do it

Fair enough.  I still stand by:

>> It would be nice to sort out the disp_loc thing for 4.8 though.

though.  My point is that base_plus_disp_to_reg assumes that
*base_loc and *disp_loc are involved in a sum.  It adds them together,
replaces the base_loc with the new pseudo, and removes the disp_loc.
But extract_address_regs seems to be deliberately written in a way that
doesn't require base_loc and disp_loc to be involved in a sum, and uses
a more indirect approach to working out disp_loc instead.  It just feels
like it leaves open the potential for a silent wrong-code bug.

>>> +/* Reload pseudos created for matched input and output reloads whose
>>> +   mode are different.	Such pseudos has a modified rules for finding
>>> +   their living ranges, e.g. assigning to subreg of such pseudo means
>>> +   changing all pseudo value.  */
>>> +bitmap_head lra_bound_pseudos;
>> Maybe:
>>
>> /* Reload pseudos created for matched input and output reloads whose
>>     modes are different.  Such pseudos have different live ranges from
>>     other pseudos; e.g. any assignment to a subreg of these pseudos
>>     changes the whole pseudo's value.  */
> Fixed.
>> Although that said, couldn't emit_move_insn_1 (called by gen_move_insn)
>> split a multiword pseudo move into two word moves?  Using the traditional
>> clobber technique sounds better than having special liveness rules.
>>
> It is not only about multi-words pseudos.  It is about representation of 
> this situation by constructions semantically incorrect in order parts of 
> compiler.  Reload has no such problem as it does not use RTL.   So I 
> don't think it splits as I use emit_move_insn and that calls 
> emit_move_insn_1 too.

But my point is that emit_move_insn_1 _does_ split moves that have no
.md pattern of their own.  E.g. some targets do not define double-word
move patterns because such moves are always equivalent to two individual
word moves.  And if emit_move_insn_1 splits:

   (set (reg:DI X) (reg:DI Y))

into:

   (set (subreg:SI (reg:DI X) 0) (subreg:SI (reg:DI Y) 0))
   (set (subreg:SI (reg:DI X) 4) (subreg:SI (reg:DI Y) 4))

then it would be to say that the subreg in the second instruction
is a complete definition of X.

> I really needed a special liveness treatment (although I don't
> remember details) and therefore I added it.  I had no detail design
> for LRA.  The code was modified by numerous test failures on different
> targets.  There is a lot of code analogous to reload one and probably
> its necessity should be rigorously questioned.  I thought about and
> modified part of this code but unfortunately not all.
>
> Also bound pseudos are rare.  Their bitmap is very small and testing (2 
> lines of code in overall) them in ira-lives.c is fast.

FWIW, It wasn't really speed as much as correctness I was worried about.
In a way, rarity makes having special rules seem even more dangerous.

>>> +      /* We create pseudo for out rtx because we always should keep
>>> +	 registers with the same original regno have synchronized
>>> +	 value (it is not true for out register but it will be
>>> +	 corrected by the next insn).
>> I don't understand this comment, sorry.
>>
> Pseudos have values -- see comments for lra_reg_info.  Different pseudos 
> with the same value do not conflict even if they live in the same 
> place.  When we create a pseudo we assign value of original pseudo (if 
> any) from which we created the new pseudo.  If we create the pseudo from 
> the input pseudo, the new pseudo will no conflict with the input pseudo 
> which is wrong when the input pseudo lives after the insn and as the new 
> pseudo value is changed by the insn output.  Therefore we create the new 
> pseudo from the output.
>
> I hope it is more understandable.  I changed the comment.

Yeah, I think that makes it a lot clearer, thanks.

>>> +  /* In and out operand can be got from transformations before
>>> +     processing constraints.  So the pseudos might have inaccurate
>>> +     class and we should make their classes more accurate.  */
>>> +  narrow_reload_pseudo_class (in_rtx, goal_class);
>>> +  narrow_reload_pseudo_class (out_rtx, goal_class);
>> I don't understand this, sorry.  Does "transformations" mean inheritance
>> and reload splitting?  So the registers we're changing here are inheritance
>> and split pseudos rather than reload pseudos created for this instruction?
>> If so, it sounds on face value like it conflicts with the comment quoted
>> above about not allowing reload instructions to the narrow the class
>> of pseudos.  Might be worth saying why that's OK here but not there.
> Again, inheritance and splitting is done after the constraint pass.
>
> The transformations here are mostly reloading of subregs which is done 
> before reloads for given insn.  On this transformation we create new 
> pseudos for which we don't know reg class yet.  In case we don't know 
> pseudo reg class yet, we assign ALL_REGS to the pseudo.

OK, thanks.

>> Also, all uses but one of lra_get_hard_regno_and_offset follow
>> the pattern:
>>
>>        lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
>>        /* The real hard regno of the operand after the allocation.  */
>>        x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
>>
>> so couldn't lra_get_hard_regno_and_offset just return the final
>> hard register, including elimination?  Then it could apply the
>> elimination on the original rtx.
>>
>> FWIW, the exception I mentioned was operands_match_p:
>>
>>        lra_get_hard_regno_and_offset (x, &i, &offset);
>>        if (i < 0)
>> 	goto slow;
>>        i += offset;
>>
>> but I'm not sure why this is the only caller that would want
>> to ignore elimination.
> ???

Not sure what you meant here :-)  Was that a placeholder,
or something else?  What I was getting at was that it would
be nice to replace all occurences of:

      lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
      /* The real hard regno of the operand after the allocation.  */
      x_hard_regno = get_final_hard_regno (x_hard_regno, offset);

with something like:

      x_hard_regno = lra_get_hard_regno (x);

and this operands_match_p code seemed to be the only place that didn't
apply get_final_hard_regno to the result of lra_get_hard_regno_and_offset.
I wasn't really sure why operands_match_p was different.

>>> +  int i, j, x_hard_regno, offset;
>>> +  enum machine_mode mode;
>>> +  rtx x;
>>> +  const char *fmt;
>>> +  enum rtx_code code;
>>> +
>>> +  if (*loc == NULL_RTX)
>>> +    return false;
>>> +  x = *loc;
>>> +  code = GET_CODE (x);
>>> +  mode = GET_MODE (x);
>>> +  if (code == SUBREG)
>>> +    {
>>> +      loc = &SUBREG_REG (x);
>>> +      x = SUBREG_REG (x);
>>> +      code = GET_CODE (x);
>>> +      if (GET_MODE_SIZE (GET_MODE (x)) > GET_MODE_SIZE (mode))
>>> +	mode = GET_MODE (x);
>>> +    }
>>> +
>>> +  if (REG_P (x))
>>> +    {
>>> +      lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
>>> +      /* The real hard regno of the operand after the allocation.  */
>>> +      x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
>>> +      return (x_hard_regno >= 0
>>> +	      && lra_hard_reg_set_intersection_p (x_hard_regno, mode, set));
>> With the subreg mode handling above, this looks little-endian specific.
>> The MEM case:
>>> +  if (MEM_P (x))
>>> +    {
>>> +      struct address ad;
>>> +      enum machine_mode mode = GET_MODE (x);
>>> +      rtx *addr_loc = &XEXP (x, 0);
>>> +
>>> +      extract_address_regs (mode, MEM_ADDR_SPACE (x), addr_loc, MEM, &ad);
>>> +      if (ad.base_reg_loc != NULL)
>>> +	{
>>> +	  if (uses_hard_regs_p (ad.base_reg_loc, set))
>>> +	    return true;
>>> +	}
>>> +      if (ad.index_reg_loc != NULL)
>>> +	{
>>> +	  if (uses_hard_regs_p (ad.index_reg_loc, set))
>>> +	    return true;
>>> +	}
>>> +    }
>> is independent of the subreg handling, so perhaps the paradoxical subreg
>> case should be handled separately, using simplify_subreg_regno.

Not sure: did you have any thoughts on this?

>>> +		    match_p = false;
>>> +		    if (operands_match_p (*curr_id->operand_loc[nop],
>>> +					  *curr_id->operand_loc[m], m_hregno))
>>> +		      {
>>> +			int i;
>>> +			
>>> +			for (i = 0; i < early_clobbered_regs_num; i++)
>>> +			  if (early_clobbered_nops[i] == m)
>>> +			    break;
>>> +			/* We should reject matching of an early
>>> +			   clobber operand if the matching operand is
>>> +			   not dying in the insn.  */
>>> +			if (i >= early_clobbered_regs_num
>> Why not simply use operands m's early_clobber field?
> Ok.  Fixed.
>>> +			    || operand_reg[nop] == NULL_RTX
>>> +			    || (find_regno_note (curr_insn, REG_DEAD,
>>> +						 REGNO (operand_reg[nop]))
>>> +				!= NULL_RTX))
>>> +			  match_p = true;
>> ...although I don't really understand this condition.  If the two
>> operands are the same value X, then X must die here whatever the
>> notes say.  So I assume this is coping with a case where the operands
>> are different but still match.  If so, could you give an example?
> I remember I saw such insn but I don't remember details.
>> Matched earlyclobbers explicitly guarantee that the earlyclobber doesn't
>> apply to the matched input operand; the earlyclobber only applies to
>> other input operands.  So I'd have expected it was those operands
>> that might need reloading rather than this one.
>>
>> E.g. if X occurs three times, twice in a matched earlyclobber pair
>> and once as an independent operand, it's the latter operand that would
>> need reloading.
> Yes, I know.

But in that case I don't understand the condition.  If we have:

  (set (reg X) (... (reg X) ...))

(which is the kind of thing operands_match_p is testing for)
then there is no requirement for a REG_DEAD note for X.
But it's still OK for two Xs form an earlyclobber pair.

>>> +			  /* If we didn't already win, we can reload
>>> +			     constants via force_const_mem, and other
>>> +			     MEMs by reloading the address like for
>>> +			     'o'.  */
>>> +			  if (CONST_POOL_OK_P (mode, op) || MEM_P (op))
>>> +			    badop = false;
>> It seems a bit inconsistent to treat a spilled pseudo whose address
>> might well need reloading as a win, while not treating existing MEMs
>> whose addresses need reloading as a win.
> Well, probability of reloading address of spilled pseudo is very small 
> on most targets but reloading for MEM in this case is real. So I see it 
> logical.

OK, that's probably true. :-)

>>> +	      if (! no_regs_p)
>>> +		reload_nregs
>>> +		  += ira_reg_class_max_nregs[this_alternative][mode];
>> I wasn't sure why we counted this even in the "const_to_mem && constmmeok"
>> and "MEM_P (op) && offmemok" cases from:
>> 	      /* We prefer to reload pseudos over reloading other
>> 		 things, since such reloads may be able to be
>> 		 eliminated later.  So bump REJECT in other cases.
>> 		 Don't do this in the case where we are forcing a
>> 		 constant into memory and it will then win since we
>> 		 don't want to have a different alternative match
>> 		 then.	*/
>> 	      if (! (REG_P (op)
>> 		     && REGNO (op) >= FIRST_PSEUDO_REGISTER)
>> 		  && ! (const_to_mem && constmemok)
>> 		  /* We can reload the address instead of memory (so
>> 		     do not punish it).	 It is preferable to do to
>> 		     avoid cycling in some cases.  */
>> 		  && ! (MEM_P (op) && offmemok))
>> 		reject += 2;
> I think constmemok is obvious.  It is not a reload, it just putting 
> constant in the constant pool.  We should not punish it as no additional 
> insns are generated.
>
> There is a comment for offmemok case.  I think it describes it. 
> Apparently it was a fix for LRA cycling.  I don't remember details. To 
> restore them, I need to remove the code and to try it on many targets.  
> I guess, it would take 3-4 days.  But I removed this as it does not 
> affect x86/x86-64.

Sorry, my comment wasn't as clear as it should have been.  I think:

	      /* We prefer to reload pseudos over reloading other
		 things, since such reloads may be able to be
		 eliminated later.  So bump REJECT in other cases.
		 Don't do this in the case where we are forcing a
		 constant into memory and it will then win since we
		 don't want to have a different alternative match
		 then.	*/
	      if (! (REG_P (op)
		     && REGNO (op) >= FIRST_PSEUDO_REGISTER)
		  && ! (const_to_mem && constmemok)
		  /* We can reload the address instead of memory (so
		     do not punish it).	 It is preferable to do to
		     avoid cycling in some cases.  */
		  && ! (MEM_P (op) && offmemok))
		reject += 2;

is fine.  The point I was trying to make is that this code (correctly IMO)
ignores register reload costs if the operand won't be reloaded into
a register.  The operand won't be reloaded into a register if:

	(const_to_mem && constmemok)
     || (MEM_P (op) && offmemok)

However:

	      if (! no_regs_p)
		reload_nregs
		  += ira_reg_class_max_nregs[this_alternative][mode];

seems to count the register reload cost regardless of whether
the operand would be reloaded into a register.  It felt like
it ought to be something like:

	      if (!((const_to_mem && const_mem_ok)
		    || (MEM_P (op) && offmemok)))
		{
		  /* We prefer to reload pseudos over reloading other
		     things, since such reloads may be able to be
		     eliminated later.  So bump REJECT in other cases.  */
		  if (! (REG_P (op)
			 && REGNO (op) >= FIRST_PSEUDO_REGISTER)
		    reject += 2;

		  if (! no_regs_p)
		    reload_nregs
		      += ira_reg_class_max_nregs[this_alternative][mode];
		}

Maybe other register-specific stuff could go in that if statement too.

Richard
Richard Sandiford Oct. 15, 2012, 4:49 p.m. UTC | #12
Hi Vlad,

Some comments about the rest of LRA.  Nothing major here...

Vladimir Makarov <vmakarov@redhat.com> writes:
> +/* Info about register in an insn.  */
> +struct lra_insn_reg
> +{
> +  /* The biggest mode through which the insn refers to the register
> +     (remember the register can be accessed through a subreg in the
> +     insn).  */
> +  ENUM_BITFIELD(machine_mode) biggest_mode : 16;

AFAICT, this is actually always the mode of a specific reference,
and if there are references to the same register in different modes,
those references get their own lra_insn_regs.  "mode" might be better
than "biggest_mode" if so.

> +/* Static part (common info for insns with the same ICODE) of LRA
> +   internal insn info.	It exists in at most one exemplar for each
> +   non-negative ICODE.	Warning: if the structure definition is
> +   changed, the initializer for debug_insn_static_data in lra.c should
> +   be changed too.  */

Probably worth saying (before the warning) that there is also
one structure for each asm.

> +/* LRA internal info about an insn (LRA internal insn
> +   representation).  */
> +struct lra_insn_recog_data
> +{
> +  int icode; /* The insn code.	*/
> +  rtx insn; /* The insn itself.	 */
> +  /* Common data for insns with the same ICODE.	 */
> +  struct lra_static_insn_data *insn_static_data;

Maybe worth mentioning asms here too.

> +  /* Two arrays of size correspondingly equal to the operand and the
> +     duplication numbers: */
> +  rtx **operand_loc; /* The operand locations, NULL if no operands.  */
> +  rtx **dup_loc; /* The dup locations, NULL if no dups.	 */
> +  /* Number of hard registers implicitly used in given call insn.  The
> +     value can be NULL or points to array of the hard register numbers
> +     ending with a negative value.  */
> +  int *arg_hard_regs;
> +#ifdef HAVE_ATTR_enabled
> +  /* Alternative enabled for the insn.	NULL for debug insns.  */
> +  bool *alternative_enabled_p;
> +#endif
> +  /* The alternative should be used for the insn, -1 if invalid, or we
> +     should try to use any alternative, or the insn is a debug
> +     insn.  */
> +  int used_insn_alternative;
> +  struct lra_insn_reg *regs;  /* Always NULL for a debug insn.	*/

Comments consistently above the field.

> +extern void lra_expand_reg_info (void);

This doesn't exist any more.

> +extern int lra_constraint_new_insn_uid_start;

Just saying in case: this seems to be write-only, with lra-constraints.c
instead using a static variable to track the uid start.

I realise you might want to keep it anyway for consistency with
lra_constraint_new_regno_start, or for debugging.

> +extern rtx lra_secondary_memory[NUM_MACHINE_MODES];

This doesn't exist any more.

> +/* lra-saves.c: */
> +
> +extern bool lra_save_restore (void);

Same for this file & function.

> +/* The function returns TRUE if at least one hard register from ones
> +   starting with HARD_REGNO and containing value of MODE are in set
> +   HARD_REGSET.	 */
> +static inline bool
> +lra_hard_reg_set_intersection_p (int hard_regno, enum machine_mode mode,
> +				 HARD_REG_SET hard_regset)
> +{
> +  int i;
> +
> +  lra_assert (hard_regno >= 0);
> +  for (i = hard_regno_nregs[hard_regno][mode] - 1; i >= 0; i--)
> +    if (TEST_HARD_REG_BIT (hard_regset, hard_regno + i))
> +      return true;
> +  return false;
> +}

This is the same as overlaps_hard_reg_set_p.

> +/* Return hard regno and offset of (sub-)register X through arguments
> +   HARD_REGNO and OFFSET.  If it is not (sub-)register or the hard
> +   register is unknown, then return -1 and 0 correspondingly.  */

The function seems to return -1 for both.

> +/* Add hard registers starting with HARD_REGNO and holding value of
> +   MODE to the set S.  */
> +static inline void
> +lra_add_hard_reg_set (int hard_regno, enum machine_mode mode, HARD_REG_SET *s)
> +{
> +  int i;
> +
> +  for (i = hard_regno_nregs[hard_regno][mode] - 1; i >= 0; i--)
> +    SET_HARD_REG_BIT (*s, hard_regno + i);
> +}

This is add_to_hard_reg_set.

> +   Here is block diagram of LRA passes:
> +
> +	  ---------------------				    
> +	 | Undo inheritance    |      ---------------	     --------------- 
> +	 | for spilled pseudos)|     | Memory-memory |	    | New (and old) |
> +	 | and splits (for     |<----| move coalesce |<-----|	 pseudos    |
> +	 | pseudos got the     |      ---------------	    |	assignment  |
> +  Start	 |  same  hard regs)   |			     --------------- 
> +    |	  ---------------------					    ^
> +    V		  |		 ----------------		    |
> + -----------	  V		| Update virtual |		    |
> +|  Remove   |----> ------------>|    register	 |		    |
> +| scratches |	  ^		|  displacements |		    |
> + -----------	  |		 ----------------		    |
> +		  |			 |			    |
> +		  |			 V	   New		    |
> +	 ----------------    No	   ------------	 pseudos   -------------------
> +	| Spilled pseudo | change |Constraints:| or insns | Inheritance/split |
> +	|    to memory	 |<-------|    RTL     |--------->|  transformations  |
> +	|  substitution	 |	  | transfor-  |	  |    in EBB scope   |
> +	 ----------------	  |  mations   |	   -------------------
> +		|		    ------------ 
> +		V
> +    -------------------------
> +   | Hard regs substitution, |
> +   |  devirtalization, and   |------> Finish
> +   | restoring scratches got |
> +   |	     memory	     |
> +    -------------------------

This is a great diagram, thanks.

> +/* Create and return a new reg from register FROM corresponding to
> +   machine description operand of mode MD_MODE.	 Initialize its
> +   register class to RCLASS.  Print message about assigning class
> +   RCLASS containing new register name TITLE unless it is NULL.	 The
> +   created register will have unique held value.  */
> +rtx
> +lra_create_new_reg_with_unique_value (enum machine_mode md_mode, rtx original,
> +				      enum reg_class rclass, const char *title)

Comment says FROM, but parameter is called ORIGINAL.  The code copes
with both null and non-register ORIGINALs, which aren't mentinoed in
the comment.

> +/* Target checks operands through operand predicates to recognize an
> +   insn.  We should have a special precaution to generate add insns
> +   which are frequent results of elimination.
> +
> +   Emit insns for x = y + z.  X can be used to store intermediate
> +   values and should be not in Y and Z when we use x to store an
> +   intermediate value.	*/ 

I think this should say what Y and Z are allowed to be, since it's
more than just registers and constants.

> +/* Map INSN_UID -> the operand alternative data (NULL if unknown).  We
> +   assume that this data is valid until register info is changed
> +   because classes in the data can be changed.	*/
> +struct operand_alternative *op_alt_data[LAST_INSN_CODE];

In that case I think it should be in a target_globals structure,
a bit like target_ira.

> +	    for (curr = list; curr != NULL; curr = curr->next)
> +	      if (curr->regno == regno)
> +		break;
> +	    if (curr == NULL || curr->subreg_p != subreg_p
> +		|| curr->biggest_mode != mode)
> +	      {
> +		/* This is a new hard regno or the info can not be
> +		   integrated into the found structure.	 */
> +#ifdef STACK_REGS
> +		early_clobber
> +		  = (early_clobber
> +		     /* This clobber is to inform popping floating
> +			point stack only.  */
> +		     && ! (FIRST_STACK_REG <= regno
> +			   && regno <= LAST_STACK_REG));
> +#endif
> +		list = new_insn_reg (regno, type, mode, subreg_p,
> +				     early_clobber, list);
> +	      }
> +	    else
> +	      {
> +		if (curr->type != type)
> +		  curr->type = OP_INOUT;
> +		if (curr->early_clobber != early_clobber)
> +		  curr->early_clobber = true;
> +	      }

OK, so this is probably only a technicality, but I think this should be:

	    for (curr = list; curr != NULL; curr = curr->next)
	      if (curr->regno == regno
		  && curr->subreg_p == subreg_p
		  && curr->biggest_mode == mode)
		{
		  ..reuse..;
		  return list;
		}
	     ..new entry..;
	     return list;

> +      icode = INSN_CODE (insn);
> +      if (icode < 0)
> +	/* It might be a new simple insn which is not recognized yet.  */
> +	INSN_CODE (insn) = icode = recog (PATTERN (insn), insn, 0);

Any reason not to use recog_memoized here?

> +      n = insn_static_data->n_operands;
> +      if (n == 0)
> +	locs = NULL;
> +      else
> +	{
> +	  
> +	  locs = (rtx **) xmalloc (n * sizeof (rtx *));
> +	  memcpy (locs, recog_data.operand_loc, n * sizeof (rtx *));
> +	}

Excess blank line after "else" (sorry!)

> +  /* Some output operand can be recognized only from the context not
> +     from the constraints which are empty in this case.	 Call insn may
> +     contain a hard register in set destination with empty constraint
> +     and extract_insn treats them as an input.	*/
> +  for (i = 0; i < insn_static_data->n_operands; i++)
> +    {
> +      int j;
> +      rtx pat, set;
> +      struct lra_operand_data *operand = &insn_static_data->operand[i];
> +
> +      /* ??? Should we treat 'X' the same way.	It looks to me that
> +	 'X' means anything and empty constraint means we do not
> +	 care.	*/

FWIW, I think any X output operand has to be "=X" or "+X"; just "X"
would be as wrong as "r".  genrecog is supposed to complain about that
for insns, and parse_output_constraint for asms.

So I agree the code is correct in just handling empty constraints.

> +/* Update all the insn info about INSN.	 It is usually called when
> +   something in the insn was changed.  Return the udpated info.	 */

Typo: updated.

> +  for (i = 0; i < reg_info_size; i++)
> +    {
> +      bitmap_initialize (&lra_reg_info[i].insn_bitmap, &reg_obstack);
> +#ifdef STACK_REGS
> +      lra_reg_info[i].no_stack_p = false;
> +#endif
> +      CLEAR_HARD_REG_SET (lra_reg_info[i].conflict_hard_regs);
> +      lra_reg_info[i].preferred_hard_regno1 = -1;
> +      lra_reg_info[i].preferred_hard_regno2 = -1;
> +      lra_reg_info[i].preferred_hard_regno_profit1 = 0;
> +      lra_reg_info[i].preferred_hard_regno_profit2 = 0;
> +      lra_reg_info[i].live_ranges = NULL;
> +      lra_reg_info[i].nrefs = lra_reg_info[i].freq = 0;
> +      lra_reg_info[i].last_reload = 0;
> +      lra_reg_info[i].restore_regno = -1;
> +      lra_reg_info[i].val = get_new_reg_value ();
> +      lra_reg_info[i].copies = NULL;
> +    }

The same loop (with a different start index) appears in expand_reg_info.
It'd be nice to factor it out, so that there's only one place to update
if the structure is changed.

> +	  for (curr = data->regs; curr != NULL; curr = curr->next)
> +	    if (curr->regno == regno)
> +	      break;
> +	  if (curr->subreg_p != subreg_p || curr->biggest_mode != mode)
> +	    /* The info can not be integrated into the found
> +	       structure.  */
> +	    data->regs = new_insn_reg (regno, type, mode, subreg_p,
> +				       early_clobber, data->regs);
> +	  else
> +	    {
> +	      if (curr->type != type)
> +		curr->type = OP_INOUT;
> +	      if (curr->early_clobber != early_clobber)
> +		curr->early_clobber = true;
> +	    }
> +	  lra_assert (curr != NULL);
> +	}

Same loop comment as for collect_non_operand_hard_regs.  Maybe another
factoring opportunity.

> +		/* Some ports don't recognize the following addresses
> +		   as legitimate.  Although they are legitimate if
> +		   they satisfies the constraints and will be checked
> +		   by insn constraints which we ignore here.  */
> +		&& GET_CODE (XEXP (op, 0)) != UNSPEC
> +		&& GET_CODE (XEXP (op, 0)) != PRE_DEC
> +		&& GET_CODE (XEXP (op, 0)) != PRE_INC
> +		&& GET_CODE (XEXP (op, 0)) != POST_DEC
> +		&& GET_CODE (XEXP (op, 0)) != POST_INC
> +		&& GET_CODE (XEXP (op, 0)) != PRE_MODIFY
> +		&& GET_CODE (XEXP (op, 0)) != POST_MODIFY)

GET_RTX_CLASS (GET_CODE (XEXP (op, 0))) == RTX_AUTOINC

> +/* Determine if the current function has an exception receiver block
> +   that reaches the exit block via non-exceptional edges  */
> +static bool
> +has_nonexceptional_receiver (void)
> +{
> +  edge e;
> +  edge_iterator ei;
> +  basic_block *tos, *worklist, bb;
> +
> +  /* If we're not optimizing, then just err on the safe side.  */
> +  if (!optimize)
> +    return true;
> +  
> +  /* First determine which blocks can reach exit via normal paths.  */
> +  tos = worklist = XNEWVEC (basic_block, n_basic_blocks + 1);
> +
> +  FOR_EACH_BB (bb)
> +    bb->flags &= ~BB_REACHABLE;
> +
> +  /* Place the exit block on our worklist.  */
> +  EXIT_BLOCK_PTR->flags |= BB_REACHABLE;
> +  *tos++ = EXIT_BLOCK_PTR;
> +  
> +  /* Iterate: find everything reachable from what we've already seen.  */
> +  while (tos != worklist)
> +    {
> +      bb = *--tos;
> +
> +      FOR_EACH_EDGE (e, ei, bb->preds)
> +	if (!(e->flags & EDGE_ABNORMAL))
> +	  {
> +	    basic_block src = e->src;
> +
> +	    if (!(src->flags & BB_REACHABLE))
> +	      {
> +		src->flags |= BB_REACHABLE;
> +		*tos++ = src;
> +	      }
> +	  }
> +    }
> +  free (worklist);
> +
> +  /* Now see if there's a reachable block with an exceptional incoming
> +     edge.  */
> +  FOR_EACH_BB (bb)
> +    if (bb->flags & BB_REACHABLE)
> +      FOR_EACH_EDGE (e, ei, bb->preds)
> +	if (e->flags & EDGE_ABNORMAL)
> +	  return true;
> +
> +  /* No exceptional block reached exit unexceptionally.	 */
> +  return false;
> +}

Looks like we could just early out on the first loop and get rid
of the second.

> +/* Remove all REG_DEAD and REG_UNUSED notes and regenerate REG_INC.
> +   We change pseudos by hard registers without notification of DF and
> +   that can make the notes obsolete.  DF-infrastructure does not deal
> +   with REG_INC notes -- so we should regenerate them here.  */

These days passes are supposed to regenerate REG_DEAD and REG_UNUSED
notes if they need them, so that part might not be necessary.
The REG_INC bit is still needed though...

> +/* Initialize LRA data once per function.  */
> +void
> +lra_init (void)
> +{
> +  init_op_alt_data ();
> +}

I think it's more like:

/* Initialize LRA whenever register-related information is changed.  */


In summary, LRA looks really good to me FWIW.  Thanks for all your hard work.

Getting rid of reload always seemed like a pipe dream, and if the only
known drawback of this replacement is that it takes a while on extreme
testcases, that's an amazing achievement.  (Not to say compile time
isn't important, just that there were so many other hurdles to overcome.)

It looks like opinion has crystalised in favour of merging LRA for 4.8.
I hope that's what happens.  I don't see that anything would be gained
by delaying it to 4.9.  The code's not going to get any more testing on the
branch that it already has; whenever we merge, the stress test is always
going to be trunk.

Richard
Vladimir Makarov Oct. 17, 2012, 12:54 a.m. UTC | #13
On 12-10-12 10:29 AM, Richard Sandiford wrote:
> Hi Vlad,
>
> Comments for the rest of ira-constraints.c.
>
> Vladimir Makarov<vmakarov@redhat.com>  writes:
>> +  saved_base_reg = saved_base_reg2 = saved_index_reg = NULL_RTX;
>> +  change_p = equiv_address_substitution (&ad, addr_loc, mode, as, code);
>> +  if (ad.base_reg_loc != NULL)
>> +    {
>> +      if (process_addr_reg
>> +	  (ad.base_reg_loc, before,
>> +	   (ad.base_modify_p && REG_P (*ad.base_reg_loc)
>> +	    && find_regno_note (curr_insn, REG_DEAD,
>> +				REGNO (*ad.base_reg_loc)) == NULL
>> +	    ? after : NULL),
>> +	   base_reg_class (mode, as, ad.base_outer_code, ad.index_code)))
>> +	change_p = true;
>> +      if (ad.base_reg_loc2 != NULL)
>> +	*ad.base_reg_loc2 = *ad.base_reg_loc;
>> +      saved_base_reg = *ad.base_reg_loc;
>> +      lra_eliminate_reg_if_possible (ad.base_reg_loc);
>> +      if (ad.base_reg_loc2 != NULL)
>> +	{
>> +	  saved_base_reg2 = *ad.base_reg_loc2;
>> +	  lra_eliminate_reg_if_possible (ad.base_reg_loc2);
>> +	}
> We unconditionally make *ad.base_reg_loc2 = *ad.base_reg_loc, so it
> might be clearer without saved_base_reg2.  More below...
>> +      /* The following addressing is checked by constraints and
>> +	 usually target specific legitimate address hooks do not
>> +	 consider them valid.  */
>> +      || GET_CODE (*addr_loc) == POST_DEC || GET_CODE (*addr_loc) == POST_INC
>> +      || GET_CODE (*addr_loc) == PRE_DEC || GET_CODE (*addr_loc) == PRE_DEC
> typo: two PRE_DECs, although:
>> +      || GET_CODE (*addr_loc) == PRE_MODIFY
>> +      || GET_CODE (*addr_loc) == POST_MODIFY
> the whole lot could just be replaced by ad.base_modify_p, or perhaps
> even removed entirely given:
>> +      /* In this case we can not do anything because if it is wrong
>> +	 that is because of wrong displacement.	 Remember that any
>> +	 address was legitimate in non-strict sense before LRA.	 */
>> +      || ad.disp_loc == NULL)
> It doesn't seem worth validating the address at all for ad.disp_loc == NULL.
> E.g. something like:
>
>    if (ad.base_reg_loc != NULL
>        && (process_addr_reg
> 	  (ad.base_reg_loc, before,
> 	   (ad.base_modify_p && REG_P (*ad.base_reg_loc)
> 	    && find_regno_note (curr_insn, REG_DEAD,
> 				REGNO (*ad.base_reg_loc)) == NULL
> 	    ? after : NULL),
> 	   base_reg_class (mode, as, ad.base_outer_code, ad.index_code))))
>      {
>        change_p = true;
>        if (ad.base_reg_loc2 != NULL)
>          *ad.base_reg_loc2 = *ad.base_reg_loc;
>      }
>
>    if (ad.index_reg_loc != NULL
>        && process_addr_reg (ad.index_reg_loc, before, NULL, INDEX_REG_CLASS))
>      change_p = true;
>
>    /* The address was valid before LRA.  We only change its form if the
>       address has a displacement, so if it has no displacement it must
>       still be valid.  */
>    if (ad.disp_loc == NULL)
>      return change_p;
>
>    /* See whether the address is still valid.  Some ports do not check
>       displacements for eliminable registers, so we replace them
>       temporarily with the elimination target.  */
>    saved_base_reg = saved_index_reg = NULL_RTX;
>    ...
>    if (ok_p)
>      return change_p;
Yes, it has sense.  I changed the code as you propose.
>> +#ifdef HAVE_lo_sum
>> +	  {
>> +	    rtx insn;
>> +	    rtx last = get_last_insn ();
>> +
>> +	    /* disp => lo_sum (new_base, disp)	*/
>> +	    insn = emit_insn (gen_rtx_SET
>> +			      (VOIDmode, new_reg,
>> +			       gen_rtx_HIGH (Pmode, copy_rtx (*ad.disp_loc))));
>> +	    code = recog_memoized (insn);
>> +	    if (code >= 0)
>> +	      {
>> +		rtx save = *ad.disp_loc;
>> +
>> +		*ad.disp_loc = gen_rtx_LO_SUM (Pmode, new_reg, *ad.disp_loc);
>> +		if (! valid_address_p (mode, *ad.disp_loc, as))
>> +		  {
>> +		    *ad.disp_loc = save;
>> +		    code = -1;
>> +		  }
>> +	      }
>> +	    if (code < 0)
>> +	      delete_insns_since (last);
>> +	  }
>> +#endif
> Nice :-)
>
> Purely for the record, I wondered whether the high part should be
> generated with emit_move_insn(_1) instead, with the rhs of the move
> being the HIGH rtx.  That would allow targets to cope with cases where
> the high part isn't represented directly as a HIGH.  E.g. on MIPS and
> Alpha, small-data accesses use the global register as the high part instead.
>
> However, both MIPS and Alpha accept small-data addresses as legitimate
> constants and addresses before and during reload and only introduce the
> split form after reload.  And I think that's how any other cases that
> aren't simple HIGHs should be handled too.  E.g. MIPS also represents
> GOT page loads as HIGHs until after reload, and only then lowers the
> HIGH to a GOT load.  Allowing the backend to generate anything other
> than a plain HIGH set here would be a double-edged sword.
>
> So after all that I agree that the gen_rtx_SET above is better than
> calling the move expanders.
Thanks for sharing your knowledge.
>> +	  /* index * scale + disp => new base + index * scale  */
>> +	  enum reg_class cl = base_reg_class (mode, as, SCRATCH, SCRATCH);
>> +
>> +	  lra_assert (INDEX_REG_CLASS != NO_REGS);
>> +	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "disp");
>> +	  lra_assert (GET_CODE (*addr_loc) == PLUS);
>> +	  lra_emit_move (new_reg, *ad.disp_loc);
>> +	  if (CONSTANT_P (XEXP (*addr_loc, 1)))
>> +	    XEXP (*addr_loc, 1) = XEXP (*addr_loc, 0);
>> +	  XEXP (*addr_loc, 0) = new_reg;
> The canonical form is (plus (mult ...) (reg)) rather than
> (plus (reg) (mult ...)), but it looks like we create the latter.
> I realise you try both forms here:
It might happen because equiv substitution in LRA.
>> +	  /* Some targets like ARM, accept address operands in
>> +	     specific order -- try exchange them if necessary.	*/
>> +	  if (! valid_address_p (mode, *addr_loc, as))
>> +	    {
>> +	      exchange_plus_ops (*addr_loc);
>> +	      if (! valid_address_p (mode, *addr_loc, as))
>> +		exchange_plus_ops (*addr_loc);
>> +	    }
> but I think we should try the canonical form first.  And I'd prefer it
> if we didn't try the other form at all, especially in 4.8.  It isn't
> really the backend's job to reject non-canonical rtl.  This might well
> be another case where some targets need a (hopefully small) tweak in
> order to play by the rules.
>
> Also, I suppose this section of code feeds back to my question on
> Wednesday about the distinction that LRA seems to make between the
> compile-time constant in:
>
>    (plus (reg X1) (const_int Y1))
>
> and the link-time constant in:
>
>    (plus (reg X2) (symbol_ref Y2))
>
> It looked like extract_address_regs classified X1 as a base register and
> X2 as an index register.  The difference between the two constants has
> no run-time significance though, and I think we should handle both X1
> and X2 as base registers (as I think reload does).
>
> I think the path above would then be specific to scaled indices.
> In the original address the "complex" index must come first and the
> displacement second.  In the modified address, the index would stay
> first and the new base register would be second.  More below.
As I wrote above the problem is also in that equiv substitution can 
create non-canonical forms.
>> +      /* We don't use transformation 'base + disp => base + new index'
>> +	 because of bad practice used in some machine descriptions
>> +	 (see comments for emit_spill_move).  */
>> +      /* base + disp => new base  */
> As before when commenting on emit_spill_move, I think we should leave
> the "bad machine description" stuff out of 4.8 and treat fixing the
> machine descriptions as part of the LRA port.
>
> In this case I think there's another reason not to reload the
> displacement into an index though: IIRC postreload should be able
> to optimise a sequence of address reloads that have the same base
> and different displacements.  LRA itself might try using "anchor"
> bases in future -- although obviously not in the initial merge --
> since that was one thing that LEGITIMIZE_RELOAD_ADDRESS was used for.
>
> E.g. maybe the justification could be:
>
>        /* base + disp => new base  */
>        /* Another option would be to reload the displacement into an
> 	 index register.  However, postreload has code to optimize
> 	 address reloads that have the same base and different
> 	 displacements, so reloading into an index register would
> 	 not necessarily be a win.  */
Fixed.
>> +      /* base + scale * index + disp => new base + scale * index  */
>> +      new_reg = base_plus_disp_to_reg (mode, as, &ad);
>> +      *addr_loc = gen_rtx_PLUS (Pmode, new_reg, *ad.index_loc);
>> +      if (! valid_address_p (mode, *addr_loc, as))
>> +	{
>> +	  /* Some targets like ARM, accept address operands in
>> +	     specific order -- try exchange them if necessary.	*/
>> +	  exchange_plus_ops (*addr_loc);
>> +	  if (! valid_address_p (mode, *addr_loc, as))
>> +	    exchange_plus_ops (*addr_loc);
>> +	}
> Same comment as above about canonical rtl.  Here we can have two
> registers -- in which case the base should come first -- or a more
> complex index -- in which case the index should come first.
>
> We should be able to pass both rtxes to simplify_gen_binary (PLUS, ...),
> with the operands in either order, and let it take care of the details.
> Using simplify_gen_binary would help with the earlier index+disp case too.
Equiv substitution can create non-canonical forms.  There are 2 approaches:
o have a code for dealing with non-canonical forms (equiv substitution, 
target stupidity)
o always support canonical forms and require them from targets.

I decided to use the 1st variant but I am reconsidering it.  I'll try to 
fix before inclusion.  But I am not sure I have time for this.  All 
these changes makes LRA unstable. In fact, I've just found that changes 
I already made so far resulted in 2 SPEC2000 tests broken although GCC 
testsuite and bootstrap looks good.

As I wrote we also have two different approaches to implement LRA. Fist 
one is to make easier porting LRA to other targets (which still requires 
some changes on target side).  The 2nd one is to require targets play by 
strict rules.  I believe that the 2nd approach will slow down transition 
to LRA and removing reload.  But probably it is more right.

After recent changes required by you, we are now in some middle 
approach.  A lot of targets on lra branch are broken by these changes 
and I will need some efforts on target dependent side to make them working.

If it were only target stupidity (as in ARM case), I would remove this 
code right away.  But unfortunately, equiv substitution is also a 
problem.  And right now, I have no time to fix it using your approach.  
The whole idea of including LRA for x86 for 4.8 was to get a real 
experience and harsh testing for LRA.  I am sorry, Richard but I think 
solving this problem as you want might result in postponing LRA 
inclusion until gcc4.9.


>> +  /* If this is post-increment, first copy the location to the reload reg.  */
>> +  if (post && real_in != result)
>> +    emit_insn (gen_move_insn (result, real_in));
> Nit, but real_in != result can never be true AIUI, and I was confused how
> the code could be correct in that case.  Maybe just remove it, or make
> it an assert?
No, it might be true:

real_in = in == value ? incloc : in;
...
if (cond)
   result = incloc;
else
   result = ...

if (post && real_in != result)

So it is true if in==value && cond
>> +  /* We suppose that there are insns to add/sub with the constant
>> +     increment permitted in {PRE/POST)_{DEC/INC/MODIFY}.  At least the
>> +     old reload worked with this assumption.  If the assumption
>> +     becomes wrong, we should use approach in function
>> +     base_plus_disp_to_reg.  */
>> +  if (in == value)
>> +    {
>> +      /* See if we can directly increment INCLOC.  */
>> +      last = get_last_insn ();
>> +      add_insn = emit_insn (plus_p
>> +			    ? gen_add2_insn (incloc, inc)
>> +			    : gen_sub2_insn (incloc, inc));
>> +
>> +      code = recog_memoized (add_insn);
>> +      /* We should restore recog_data for the current insn.  */
> Looks like this comment might be a left-over, maybe from before the
> cached insn data?
Yes.  The very first variant used recog_data and was much slower.  I 
removed the comment.
>> +      /* Restore non-modified value for the result.  We prefer this
>> +	 way because it does not require an addition hard
>> +	 register.  */
>> +      if (plus_p)
>> +	{
>> +	  if (CONST_INT_P (inc))
>> +	    emit_insn (gen_add2_insn (result, GEN_INT (-INTVAL (inc))));
>> +	  else
>> +	    emit_insn (gen_sub2_insn (result, inc));
>> +	}
>> +      else if (CONST_INT_P (inc))
>> +	emit_insn (gen_add2_insn (result, inc));
> The last two lines look redundant.  The behaviour is the same as for
> the following else:
>> +      else
>> +	emit_insn (gen_add2_insn (result, inc));
Fixed.
> and I don't think there are any cases where !plus && CONST_INT_P (inc)
> would hold.
It seems I thought about this and therefore I added a placeholder which 
I can remove now.
>> +/* Main entry point of this file: search the body of the current insn
> s/this file/the constraints code/, since it's a static function.
Fixed.
>> +  if (change_p)
>> +    /* Changes in the insn might result in that we can not satisfy
>> +       constraints in lately used alternative of the insn.  */
>> +    lra_set_used_insn_alternative (curr_insn, -1);
> Maybe:
>
>    /* If we've changed the instruction then any alternative that
>       we chose previously may no longer be valid.  */
Fixed.
>> +      rtx x;
>> +
>> +      curr_swapped = !curr_swapped;
>> +      if (curr_swapped)
>> +	{
>> +	  x = *curr_id->operand_loc[commutative];
>> +	  *curr_id->operand_loc[commutative]
>> +	    = *curr_id->operand_loc[commutative + 1];
>> +	  *curr_id->operand_loc[commutative + 1] = x;
>> +	  /* Swap the duplicates too.  */
>> +	  lra_update_dup (curr_id, commutative);
>> +	  lra_update_dup (curr_id, commutative + 1);
>> +	  goto try_swapped;
>> +	}
>> +      else
>> +	{
>> +	  x = *curr_id->operand_loc[commutative];
>> +	  *curr_id->operand_loc[commutative]
>> +	    = *curr_id->operand_loc[commutative + 1];
>> +	  *curr_id->operand_loc[commutative + 1] = x;
>> +	  lra_update_dup (curr_id, commutative);
>> +	  lra_update_dup (curr_id, commutative + 1);
>> +	}
> The swap code is the same in both cases, so I think it'd be better to
> make it common.  Or possibly a helper function, since the same code
> appears again later on.
Fixed.
>> +	if (GET_CODE (op) == PLUS)
>> +	  {
>> +	    plus = op;
>> +	    op = XEXP (op, 1);
>> +	  }
> Sorry, I'm complaining about old reload code again, but: does this
> actually happen in LRA?  In reload, a register operand could become a
> PLUS because of elimination, but I thought LRA did things differently.
> Besides, this is only needed for:
No, I don't think it happens in LRA.  It is a leftover from reload 
code.  I removed it.
>> +	if (CONST_POOL_OK_P (mode, op)
>> +	    && ((targetm.preferred_reload_class
>> +		 (op, (enum reg_class) goal_alt[i]) == NO_REGS)
>> +		|| no_input_reloads_p)
>> +	    && mode != VOIDmode)
>> +	  {
>> +	    rtx tem = force_const_mem (mode, op);
>> +	
>> +	    change_p = true;
>> +	    /* If we stripped a SUBREG or a PLUS above add it back.  */
>> +	    if (plus != NULL_RTX)
>> +	      tem = gen_rtx_PLUS (mode, XEXP (plus, 0), tem);
> and we shouldn't have (plus (constant ...) ...) after elimination
> (or at all outside of a CONST).  I don't understand why the code is
> needed even in reload.
I removed the plus code.
>> +  for (i = 0; i < n_operands; i++)
>> +    {
>> +      rtx old, new_reg;
>> +      rtx op = *curr_id->operand_loc[i];
>> +
>> +      if (goal_alt_win[i])
>> +	{
>> +	  if (goal_alt[i] == NO_REGS
>> +	      && REG_P (op)
>> +	      && lra_former_scratch_operand_p (curr_insn, i))
>> +	    change_class (REGNO (op), NO_REGS, "      Change", true);
> I think this could do with a comment.  Does setting the class to NO_REGS
> indirectly cause the operand to be switched back to a SCRATCH?
When we assign NO_REGS it means that we will not assign a hard register 
to the scratch pseudo and the scratch pseudo will be spilled.  Spilled 
scratch pseudos are transformed back to scratches at the LRA end.  I've 
added a comment.
>> +	  push_to_sequence (before);
>> +	  rclass = base_reg_class (GET_MODE (op), MEM_ADDR_SPACE (op),
>> +				   MEM, SCRATCH);
>> +	  if (code == PRE_DEC || code == POST_DEC
>> +	      || code == PRE_INC || code == POST_INC
>> +	      || code == PRE_MODIFY || code == POST_MODIFY)
> Very minor, but: GET_RTX_CLASS (code) == RTX_AUTOINC
Fixed.
>> +	  enum machine_mode mode;
>> +	  rtx reg, *loc;
>> +	  int hard_regno, byte;
>> +	  enum op_type type = curr_static_id->operand[i].type;
>> +
>> +	  loc = curr_id->operand_loc[i];
>> +	  mode = get_op_mode (i);
>> +	  if (GET_CODE (*loc) == SUBREG)
>> +	    {
>> +	      reg = SUBREG_REG (*loc);
>> +	      byte = SUBREG_BYTE (*loc);
>> +	      if (REG_P (reg)
>> +		  /* Strict_low_part requires reload the register not
>> +		     the sub-register.	*/
>> +		  && (curr_static_id->operand[i].strict_low
>> +		      || (GET_MODE_SIZE (mode)
>> +			  <= GET_MODE_SIZE (GET_MODE (reg))
>> +			  && (hard_regno
>> +			      = get_try_hard_regno (REGNO (reg))) >= 0
>> +			  && (simplify_subreg_regno
>> +			      (hard_regno,
>> +			       GET_MODE (reg), byte, mode) < 0)
>> +			  && (goal_alt[i] == NO_REGS
>> +			      || (simplify_subreg_regno
>> +				  (ira_class_hard_regs[goal_alt[i]][0],
>> +				   GET_MODE (reg), byte, mode) >= 0)))))
>> +		{
>> +		  loc = &SUBREG_REG (*loc);
>> +		  mode = GET_MODE (*loc);
>> +		}
>> +	  old = *loc;
> I think this needs a bit more justifying commentary (although I'm glad
> to see it's much simpler than the reload version :-)).  One thing in
> particular I didn't understand was why we don't reload the inner
> register of a paradoxical subreg.
It seems not necessary. As we use the biggest mode, pseudo gets all hard 
registers or wide stack slot for paradoxical subreg.
>> +	  if (get_reload_reg (type, mode, old, goal_alt[i], "", &new_reg)
>> +	      && type != OP_OUT)
>> +	    {
>> +	      push_to_sequence (before);
>> +	      lra_emit_move (new_reg, old);
>> +	      before = get_insns ();
>> +	      end_sequence ();
>> +	    }
>> +	  *loc = new_reg;
>> +	  if (type != OP_IN)
>> +	    {
>> +	      if (find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
>> +		{
>> +		  start_sequence ();
>> +		  /* We don't want sharing subregs as the pseudo can
>> +		     get a memory and the memory can be processed
>> +		     several times for eliminations.  */
>> +		  lra_emit_move (GET_CODE (old) == SUBREG && type == OP_INOUT
>> +				 ? copy_rtx (old) : old,
>> +				 new_reg);
> I think this should simply be:
>
>    lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);
>
> leaving copy_rtx to figure out which rtxes can be shared.  No comment
> would be needed for that.
Fixed.
>> +		  emit_insn (after);
>> +		  after = get_insns ();
>> +		  end_sequence ();
>> +		}
>> +	      *loc = new_reg;
>> +	    }
> Very minor again, but: redundant *loc assignment (so that the two nested
> if statements collapse to one).
Fixed.  It seems a leftover from previous modifications.  First variants 
of the function were more complicated.
>> +      else
>> +	{
>> +	  lra_assert (INSN_CODE (curr_insn) < 0);
>> +	  error_for_asm (curr_insn,
>> +			 "inconsistent operand constraints in an %<asm%>");
>> +	  /* Avoid further trouble with this insn.  */
>> +	  PATTERN (curr_insn) = gen_rtx_USE (VOIDmode, const0_rtx);
>> +	  return false;
> Is this code handling a different case from the corresponding error
> code in curr_insn_transform?  If so, it probably deserves a comment
> explaining the difference.
I have no idea.  It is taken from reload.  I think we could try to 
remove it.  If process_alt_operands finds an alternative we should 
generate a code anyway.  I'll make it unreachable.
>> +/* Process all regs in debug location *LOC and change them on
>> +   equivalent substitution.  Return true if any change was done.  */
>> +static bool
>> +debug_loc_equivalence_change_p (rtx *loc)
> This doesn't keep the rtl in canonical form.  Probably the easiest and
> best fix is to use simplify_replace_fn_rtx, which handles all that for you.
> (simplify_replace_fn_rtx returns the original rtx if no change was made.)
See comments above about canonical forms.
>> +  for (i = FIRST_PSEUDO_REGISTER; i < new_regno_start; i++)
>> +    ira_reg_equiv[i].profitable_p = true;
>> +  for (i = FIRST_PSEUDO_REGISTER; i < new_regno_start; i++)
>> +    if (lra_reg_info[i].nrefs != 0)
>> +      {
>> +	if ((hard_regno = lra_get_regno_hard_regno (i)) >= 0)
>> +	  {
>> +	    int j, nregs = hard_regno_nregs[hard_regno][PSEUDO_REGNO_MODE (i)];
>> +	
>> +	    for (j = 0; j < nregs; j++)
>> +	      df_set_regs_ever_live (hard_regno + j, true);
>> +	  }
>> +	else if ((x = get_equiv_substitution (regno_reg_rtx[i])) != NULL_RTX)
>> +	  {
>> +	    if (! first_p && contains_reg_p (x, false, false))
>> +	      /* After RTL transformation, we can not guarantee that
>> +		 pseudo in the substitution was not reloaded which
>> +		 might make equivalence invalid.  For example, in
>> +		 reverse equiv of p0
>> +
>> +		 p0 <- ...
>> +		 ...
>> +		 equiv_mem <- p0
>> +
>> +		 the memory address register was reloaded before the
>> +		 2nd insn.  */
>> +	      ira_reg_equiv[i].defined_p = false;
>> +	    if (contains_reg_p (x, false, true))
>> +	      ira_reg_equiv[i].profitable_p = false;
>> +	  }
>> +      }
> Do we need two loops because the second may check for equivalences
> of other pseudos besides "i"?  I couldn't see how offhand, but I might
> well have missed something.  Might be worth a comment.
No, we don't need two loops.  I merged them.
>> +	      dest_reg = SET_DEST (set);
>> +	      /* The equivalence pseudo could be set up as SUBREG in a
>> +		 case when it is a call restore insn in a mode
>> +		 different from the pseudo mode.  */
>> +	      if (GET_CODE (dest_reg) == SUBREG)
>> +		dest_reg = SUBREG_REG (dest_reg);
>> +	      if ((REG_P (dest_reg)
>> +		   && (x = get_equiv_substitution (dest_reg)) != dest_reg
>> +		   /* Remove insns which set up a pseudo whose value
>> +		      can not be changed.  Such insns might be not in
>> +		      init_insns because we don't update equiv data
>> +		      during insn transformations.
>> +			
>> +		      As an example, let suppose that a pseudo got
>> +		      hard register and on the 1st pass was not
>> +		      changed to equivalent constant.  We generate an
>> +		      additional insn setting up the pseudo because of
>> +		      secondary memory movement.  Then the pseudo is
>> +		      spilled and we use the equiv constant.  In this
>> +		      case we should remove the additional insn and
>> +		      this insn is not init_insns list.	 */
>> +		   && (! MEM_P (x) || MEM_READONLY_P (x)
>> +		       || in_list_p (curr_insn,
>> +				     ira_reg_equiv
>> +				     [REGNO (dest_reg)].init_insns)))
> This is probably a stupid question, sorry, but when do we ever want
> to keep an assignment to a substituted pseudo?  I.e. why isn't this just:
>
> 	      if ((REG_P (dest_reg)
> 		   && (x = get_equiv_substitution (dest_reg)) != dest_reg)
Equivalence can be memory location.  It means you can assign many 
different values to the location.  Removing them would generate a wrong 
code.  For example, if you use your simple variant of code, you will 
have > 100 test failures of GCC testsuite.
>> +/* Info about last usage of registers in EBB to do inheritance/split
>> +   transformation.  Inheritance transformation is done from a spilled
>> +   pseudo and split transformations from a hard register or a pseudo
>> +   assigned to a hard register.	 */
>> +struct usage_insns
>> +{
>> +  /* If the value is equal to CURR_USAGE_INSNS_CHECK, then the member
>> +     value INSNS is valid.  The insns is chain of optional debug insns
>> +     and a finishing non-debug insn using the corresponding reg.  */
>> +  int check;
>> +  /* Value of global reloads_num at the ???corresponding next insns.  */
>> +  int reloads_num;
>> +  /* Value of global reloads_num at the ???corresponding next insns.  */
>> +  int calls_num;
> "???s".  Probably "at the last instruction in INSNS" if that's accurate
> (because debug insns in INSNS don't affect these fields).
Fixed.
>> +/* Process all regs OLD_REGNO in location *LOC and change them on the
>> +   reload pseudo NEW_REG.  Return true if any change was done.	*/
>> +static bool
>> +substitute_pseudo (rtx *loc, int old_regno, rtx new_reg)
> This is another case where I found the term "reload pseudo" a bit confusing,
> since AIUI new_reg can be an inheritance or split pseudo rather than a pseudo
> created solely for insn reloads.  I'll follow up about that on the original
> thread.  Maybe just:
>
> /* Replace all references to register OLD_REGNO in *LOC with pseudo register
>     NEW_REG.  Return true if any change was made.  */
Fixed.
>> +  code = GET_CODE (x);
>> +  if (code == REG && (int) REGNO (x) == old_regno)
>> +    {
>> +      *loc = new_reg;
>> +      return true;
>> +    }
> Maybe assert that the modes are the same?
I've just realized that the modes might be different because of 
secondary_memory_needed_mode in split_reg (probably small chance).  So I 
added a code dealing with it.  Thanks, Richard.
>> +/* Do inheritance transformation for insn INSN defining (if DEF_P) or
>> +   using ORIGINAL_REGNO where the subsequent insn(s) in EBB (remember
>> +   we traverse insns in the backward direction) for the original regno
>> +   is NEXT_USAGE_INSNS.	 The transformations look like
> Maybe:
>
> /* Do interitance transformations for insn INSN, which defines (if DEF_P)
>     or uses ORIGINAL_REGNO.  NEXT_USAGE_INSNS specifies which instruction
>     in the EBB next uses ORIGINAL_REGNO; it has the same form as the
>     "insns" field of usage_insns.
>
>     The transformations look like:
Fixed.
>> +
>> +     p <- ...		  i <- ...
>> +     ...		  p <- i    (new insn)
>> +     ...	     =>
>> +     <- ... p ...	  <- ... i ...
>> +   or
>> +     ...		  i <- p    (new insn)
>> +     <- ... p ...	  <- ... i ...
>> +     ...	     =>
>> +     <- ... p ...	  <- ... i ...
>> +   where p is a spilled original pseudo and i is a new inheritance pseudo.
>> +
>> +   The inheritance pseudo has the smallest class of two classes CL and
>> +   class of ORIGINAL REGNO.  It will have unique value if UNIQ_P.  The
>> +   unique value is necessary for correct assignment to inheritance
>> +   pseudo for input of an insn which should be the same as output
>> +   (bound pseudos).  Return true if we succeed in such
>> +   transformation.  */
> This comment looks really good, but I still wasn't sure about the
> UNIQ_P thing.  AIUI this is for cases like:
>
>                         i <- p            [new insn]
>     r <- ... p ...      r <- ... i ...    [input reload]
>     r <- ... r ...   => r <- ... r ...    [original insn]
>     <- r                <- r              [output reload]
>     ....                ......
>     <- ... p ...        <- ... i ...      [next ref]
>
> where "r" is used on both sides of the original insn and where the
> output reload assigns to something other than "p" (otherwise "next ref"
> wouldn't be the next ref).  But why does this affect the way "i" is created?
> I think it'd be worth expanding that part a bit.
I realized that code is not necessary anymore.  It was necessary when I 
generated 2 pseudos for reloading matching operands with different 
modes.  Now I use one pseudo for this and use (may be illegal in other 
parts of GCC) subregs of this pseudo.  I modified the code.
>> +  if (! ira_reg_classes_intersect_p[cl][rclass])
>> +    {
>> +      if (lra_dump_file != NULL)
>> +	{
>> +	  fprintf (lra_dump_file,
>> +		   "	Rejecting inheritance for %d "
>> +		   "because of too different classes %s and %s\n",
> Suggest s/too different/disjoint/
Fixed.
>> +  if ((ira_class_subset_p[cl][rclass] && cl != rclass)
>> +      || ira_class_hard_regs_num[cl] < ira_class_hard_regs_num[rclass])
>> +    {
>> +      if (lra_dump_file != NULL)
>> +	fprintf (lra_dump_file, "    Use smallest class of %s and %s\n",
>> +		 reg_class_names[cl], reg_class_names[rclass]);
>> +
>> +      rclass = cl;
>> +    }
> I don't understand the second line of the if statement.  Why do we prefer
> classes with fewer allocatable registers?
>
> My guess before reading the code was that we'd use the subunion of CL and
> RCLASS, so maybe a comment explaining why we use this choice would help.
I added a comment

      /* We don't use a subset of two classes because it can be
         NO_REGS.  This transformation is still profitable in most
         cases even if the classes are not intersected as register
move is probably cheaper than a memory load.  */

>> +  if (NEXT_INSN (new_insns) != NULL_RTX)
>> +    {
>> +      if (lra_dump_file != NULL)
>> +	{
>> +	  fprintf (lra_dump_file,
>> +		   "	Rejecting inheritance %d->%d "
>> +		   "as it results in 2 or more insns:\n",
>> +		   original_regno, REGNO (new_reg));
>> +	  debug_rtl_slim (lra_dump_file, new_insns, NULL_RTX, -1, 0);
>> +	  fprintf (lra_dump_file,
>> +		   "	>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n");
>> +	}
>> +      return false;
>> +    }
> Hmm, I wasn't sure about this at first.  Some targets define patterns for
> multiword moves and split them later.  Others expose the split straight away.
> The two approaches don't really imply any difference in cost, so I didn't
> want us to penalise the latter.
>
> But I suppose on targets that split straight away, lower-subreg would
> tend to replace the multiword pseudo with individual word-sized pseudos,
> so LRA shouldn't see them.  I suppose this check shouldn't matter in practice.
I think I saw it on practice and therefore I added the code.  I don't 
remember details although.  I tried to document in commentaries some 
specific cases in LRA.  But probably I should have made more comments 
with examples in the code because it is easy to forget them (too many 
cases and details).
>> +  if (def_p)
>> +    lra_process_new_insns (insn, NULL_RTX, new_insns,
>> +			   "Add original<-inheritance");
>> +  else
>> +    lra_process_new_insns (insn, new_insns, NULL_RTX,
>> +			   "Add inheritance<-pseudo");
> Maybe "original" rather than "pseudo" here too for consistency.
Fixed.
>> +/* Return true if we need a split for hard register REGNO or pseudo
>> +   REGNO which was assigned to a hard register.
>> +   POTENTIAL_RELOAD_HARD_REGS contains hard registers which might be
>> +   used for reloads since the EBB end.	It is an approximation of the
>> +   used hard registers in the split range.  The exact value would
>> +   require expensive calculations.  If we were aggressive with
>> +   splitting because of the approximation, the split pseudo will save
>> +   the same hard register assignment and will be removed in the undo
>> +   pass.  We still need the approximation because too aggressive
>> +   splitting would result in too inaccurate cost calculation in the
>> +   assignment pass because of too many generated moves which will be
>> +   probably removed in the undo pass.  */
>> +static inline bool
>> +need_for_split_p (HARD_REG_SET potential_reload_hard_regs, int regno)
>> +{
>> +  int hard_regno = regno < FIRST_PSEUDO_REGISTER ? regno : reg_renumber[regno];
>> +
>> +  lra_assert (hard_regno >= 0);
>> +  return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno)
>> +	   && ! TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno)
>> +	   && (usage_insns[regno].reloads_num
>> +	       + (regno < FIRST_PSEUDO_REGISTER ? 0 : 2) < reloads_num)
>> +	   && ((regno < FIRST_PSEUDO_REGISTER
>> +		&& ! bitmap_bit_p (&ebb_global_regs, regno))
>> +	       || (regno >= FIRST_PSEUDO_REGISTER
>> +		   && lra_reg_info[regno].nrefs > 3
>> +		   && bitmap_bit_p (&ebb_global_regs, regno))))
>> +	  || (regno >= FIRST_PSEUDO_REGISTER && need_for_call_save_p (regno)));
>> +}
> Could you add more commentary about the thinking behind this particular
> choice of heuristic?  E.g. I wasn't sure what the reloads_num check did,
> or why we only split hard registers that are local to the EBB and only
> split pseudos that aren't.
>
> The 2 and 3 numbers seemed a bit magic too.  I suppose the 2 has
> something to do with "one save and one restore", but I wasn't sure
> why we applied it only for pseudos.  (AIUI that arm of the check
> deals with "genuine" split pseudos rather than call saves & restores.)
>
> Still, it says a lot for the high quality of LRA that, out of all the
> 1000s of lines of code I've read so far, this is the only part that
> didn't seem to have an intuitive justification.
Yes, right.  I checked many parameters and finally picked out the above 
ones.  What I found is that aggressive and even moderate splitting 
usually result in worse code.  Most splitting is undone and if we do 
splitting aggressively we generate a lot of insns which will be removed 
but their costs are taken into account during assignment pass.  
Inheritance has quite bigger chance to successful.  One reason why 
splitting is undone is that spilling + inheritance of short living 
pseudos is a substitution of splitting in some way.  Therefore I do 
splitting for long living pseudos. That means non local pseudos.

I added the following comments.

   return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno)
            && ! TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno)
            /* We need at least 2 reloads to make pseudo splitting
               profitable.  We should provide hard regno splitting in
               any case to solve 1st insn scheduling problem when
               moving hard register definition up might result in
               impossibility to find hard register for reload pseudo of
               small register class.  */
            && (usage_insns[regno].reloads_num
                + (regno < FIRST_PSEUDO_REGISTER ? 0 : 2) < reloads_num)
            && (regno < FIRST_PSEUDO_REGISTER
                /* For short living pseudos, spilling + inheritance can
                   be considered a substitution for splitting.
                   Therefore we do not splitting for local pseudos.  It
                   decreases also aggressiveness of splitting.  The
                   minimal number of references is chosen taking into
                   account that for 2 references splitting has no sense
                   as we can just spill the pseudo.  */
                || (regno >= FIRST_PSEUDO_REGISTER
                    && lra_reg_info[regno].nrefs > 3
                    && bitmap_bit_p (&ebb_global_regs, regno))))

I removed checking that hard_regno is local.  It was a parameter to 
decrease aggressiveness of hard register splitting but I realized that 
it contradicts the reason for splitting to solve 1st insn scheduling 
problem because regional scheduling can move hard register definition 
beyond BB.
>> +  for (i = 0;
>> +       (cl = reg_class_subclasses[allocno_class][i]) != LIM_REG_CLASSES;
>> +       i++)
>> +    if (! SECONDARY_MEMORY_NEEDED (cl, hard_reg_class, mode)
>> +	&& ! SECONDARY_MEMORY_NEEDED (hard_reg_class, cl, mode)
>> +	&& TEST_HARD_REG_BIT (reg_class_contents[cl], hard_regno)
>> +	&& (best_cl == NO_REGS
>> +	    || (hard_reg_set_subset_p (reg_class_contents[best_cl],
>> +				       reg_class_contents[cl])
>> +		&& ! hard_reg_set_equal_p (reg_class_contents[best_cl],
>> +					   reg_class_contents[cl]))))
>> +      best_cl = cl;
> OK, so this suggestion isn't backed up by any evidence, but what do
> you think about this alternative:
>
> 	&& (best_cl == NO_REGS
> 	    || (ira_class_hard_regs_num[best_cl]
> 		< ira_class_hard_regs_num[cl]))
>
> which should choose the largest class that requires no secondary memory.
> It looks like the subset version could get "stuck" on a single-register
> class that happens to be early in the list but has no superclass smaller
> than allocno_class.
It seems reasonable.  I choose your variant.
>> +/* Do split transformation for insn INSN defining or
>> +   using ORIGINAL_REGNO where the subsequent insn(s) in EBB (remember
>> +   we traverse insns in the backward direction) for the original regno
>> +   is NEXT_USAGE_INSNS.	 The transformations look like
> Same suggestion as for the inheritance function above.
Fixed.
>> +  if (call_save_p)
>> +    save = emit_spill_move (true, new_reg, original_reg, -1);
>> +  else
>> +    {
>> +      start_sequence ();
>> +      emit_move_insn (new_reg, original_reg);
>> +      save = get_insns ();
>> +      end_sequence ();
>> +    }
>> +  if (NEXT_INSN (save) != NULL_RTX)
>> +    {
>> +      lra_assert (! call_save_p);
> Is emit_spill_move really guaranteed to return only one instruction in
> cases where emit_move_insn might not?  Both of them use emit_move_insn_1
> internally, so I wouldn't have expected much difference.
>
> In fact I wasn't really sure why:
>
>    save = gen_move_insn (new, original_reg);
For caller save, modes new and original can be different.
> wouldn't be correct for both.
>
> Same comments for the restore code.
I fixed it using only emit_spill_move.
>> +      /* See which defined values die here.  */
>> +      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
>> +	if (reg->type == OP_OUT && ! reg->early_clobber
>> +	    && (! reg->subreg_p
>> +		|| bitmap_bit_p (&lra_bound_pseudos, reg->regno)))
>> +	  bitmap_clear_bit (&live_regs, reg->regno);
>> +      /* Mark each used value as live.	*/
>> +      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
>> +	if (reg->type == OP_IN
>> +	    && bitmap_bit_p (&check_only_regs, reg->regno))
>> +	  bitmap_set_bit (&live_regs, reg->regno);
>> +      /* Mark early clobber outputs dead.  */
>> +      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
>> +	if (reg->type == OP_OUT && reg->early_clobber && ! reg->subreg_p)
>> +	  bitmap_clear_bit (&live_regs, reg->regno);
> I don't think this would be correct for unreloaded insns because an
> unreloaded insn can have the same pseudo as an input and an earlyclobber
> output.  (Probably not an issue here, since we're called after the
> constraints pass.)  There's also the case of matched earlyclobber operands,
> where the matched input is specifically not affected by the earlyclobber.
>
> I'd have thought:
>
>        /* See which defined values die here.  */
>        for (reg = curr_id->regs; reg != NULL; reg = reg->next)
> 	if (reg->type == OP_OUT
> 	    && (! reg->subreg_p
> 		|| bitmap_bit_p (&lra_bound_pseudos, reg->regno)))
> 	  bitmap_clear_bit (&live_regs, reg->regno);
>        /* Mark each used value as live.	*/
>        for (reg = curr_id->regs; reg != NULL; reg = reg->next)
> 	if (reg->type == OP_IN
> 	    && bitmap_bit_p (&check_only_regs, reg->regno))
> 	  bitmap_set_bit (&live_regs, reg->regno);
>
> ought to be correct, but perhaps I'm missing something.
>
> (I'm still uneasy about the special treatment of bound pseudos here.
> A clobber really does seem better.)
It seems a right variant.  I am going to use it.
>> +      /* It is quite important to remove dead move insns because it
>> +	 means removing dead store, we don't need to process them for
>> +	 constraints, and unfortunately some subsequent optimizations
>> +	 (like shrink-wrapping) currently based on assumption that
>> +	 there are no trivial dead insns.  */
> Maybe best to drop the "subsequent optimizations" part.  This comment
> is unlikely to be updated after any change to shrink-wrapping & co.,
> and the first two justifications seem convincing enough on their own.
Fixed.
>> +/* Add inheritance info REGNO and INSNS.  */
>> +static void
>> +add_to_inherit (int regno, rtx insns)
>> +{
>> +  int i;
>> +
>> +  for (i = 0; i < to_inherit_num; i++)
>> +    if (to_inherit[i].regno == regno)
>> +      return;
> Is the existing "insns" field guaranteed to match the "insns" parameter
> in this case, or might they be different?  Probably worth an assert or
> comment respectively.
I added a comment.
>> +/* Return first (if FIRST_P) or last non-debug insn in basic block BB.
>> +   Return null if there are no non-debug insns in the block.  */
>> +static rtx
>> +get_non_debug_insn (bool first_p, basic_block bb)
>> +{
>> +  rtx insn;
>> +
>> +  for (insn = first_p ? BB_HEAD (bb) : BB_END (bb);
>> +       insn != NULL_RTX && ! NONDEBUG_INSN_P (insn);
>> +       insn = first_p ? NEXT_INSN (insn) : PREV_INSN (insn))
>> +    ;
>> +  if (insn != NULL_RTX && BLOCK_FOR_INSN (insn) != bb)
>> +    insn = NULL_RTX;
>> +  return insn;
>> +}
> It probably doesn't matter in practice, but it looks like it'd be better
> to limit the walk to the bb, rather than walking until null and then
> testing the bb after the walk.
>
> Maybe it would be eaiser to split into two functions, since first_p is
> always constant.  E.g.:
>
>    rtx insn;
>
>    FOR_BB_INSNS (bb, insn)
>      if (NONDEBUG_INSN_P (insn))
>        return insn;
>    return NULL_RTX;
>
> for first_p.  s/FOR_BB_INSNS/FOR_BB_INSNS_REVERSE/ for !first_p.
That is better.  Fixed.
>> +/* Set up RES by registers living on edges FROM except the edge (FROM,
>> +   TO) or by registers set up in a jump insn in BB FROM.  */
>> +static void
>> +get_live_on_other_edges (basic_block from, basic_block to, bitmap res)
>> +{
>> +  int regno;
>> +  rtx last;
>> +  struct lra_insn_reg *reg;
>> +  edge e;
>> +  edge_iterator ei;
>> +
>> +  lra_assert (to != NULL);
>> +  bitmap_clear (res);
>> +  FOR_EACH_EDGE (e, ei, from->succs)
>> +    if (e->dest != to)
>> +      bitmap_ior_into (res, DF_LR_IN (e->dest));
>> +  if ((last = get_non_debug_insn (false, from)) == NULL_RTX || ! JUMP_P (last))
>> +    return;
>> +  curr_id = lra_get_insn_recog_data (last);
>> +  for (reg = curr_id->regs; reg != NULL; reg = reg->next)
>> +    if (reg->type != OP_IN
>> +	&& (regno = reg->regno) >= FIRST_PSEUDO_REGISTER)
>> +      bitmap_set_bit (res, regno);
>> +}
> Probably a silly question, sorry, but: why does the JUMP_P part only
> include pseudo registers?  The other calculations (here and elsewhere)
> seem to handle both hard and pseudo registers.
No, that is not a silly question.  I think it is a potential bug 
although it is probability is very small.  Originally, splitting and 
inheritance were done only on pseudos.  After that, splitting hard 
registers were added and that code was not changed.  Thanks, Richard.  I 
removed this condition.
>> +/* Do inheritance/split transformations in EBB starting with HEAD and
>> +   finishing on TAIL.  We process EBB insns in the reverse order.
>> +   Return true if we did any inheritance/split transformation in the
>> +   EBB.
>> +
>> +   We should avoid excessive splitting which results in worse code
>> +   because of inaccurate cost calculations for spilling new split
>> +   pseudos in such case.  To achieve this we do splitting only if
>> +   register pressure is high in given basic block and there reload
> "...and there are reload"
Fixed.
>> +   pseudos requiring hard registers.  We could do more register
>> +   pressure calculations at any given program point to avoid necessary
>> +   splitting even more but it is to expensive and the current approach
>> +   is well enough.  */
> "works well enough".
Fixed.
>> +  change_p = false;
>> +  curr_usage_insns_check++;
>> +  reloads_num = calls_num = 0;
>> +  /* Remember: we can remove the current insn.	*/
>> +  bitmap_clear (&check_only_regs);
>> +  last_processed_bb = NULL;
> I couldn't tell which part of the code the comment is referring to.
> Maybe left over?
It is a leftover.  I removed it.
>> +	  after_p = (last_insn != NULL_RTX && ! JUMP_P (last_insn)
>> +		     && (! CALL_P (last_insn)
>> +			 || (find_reg_note (last_insn,
>> +					   REG_NORETURN, NULL) == NULL_RTX
>> +			     && ((next_insn
>> +				  = next_nonnote_nondebug_insn (last_insn))
>> +				 == NULL_RTX
>> +				 || GET_CODE (next_insn) != BARRIER))));
> Genuine question, but: when are the last four lines needed?  The condition
> that they're testing for sounds like a noreturn call.
Yes, probably you are right.  I also can not imagine such situation.  I 
am removing them.
>> +      if (src_regno < lra_constraint_new_regno_start
>> +	  && src_regno >= FIRST_PSEUDO_REGISTER
>> +	  && reg_renumber[src_regno] < 0
>> +	  && dst_regno >= lra_constraint_new_regno_start
>> +	  && (cl = lra_get_allocno_class (dst_regno)) != NO_REGS)
>> +	{
>> +	  /* 'reload_pseudo <- original_pseudo'.  */
>> +	  reloads_num++;
>> +	  succ_p = false;
>> +	  if (usage_insns[src_regno].check == curr_usage_insns_check
>> +	      && (next_usage_insns = usage_insns[src_regno].insns) != NULL_RTX)
>> +	    succ_p = inherit_reload_reg (false,
>> +					 bitmap_bit_p (&lra_matched_pseudos,
>> +						       dst_regno),
>> +					 src_regno, cl,
>> +					 curr_insn, next_usage_insns);
>> +	  if (succ_p)
>> +	    change_p = true;
>> +	  else
>> +	    {
>> +	      usage_insns[src_regno].check = curr_usage_insns_check;
>> +	      usage_insns[src_regno].insns = curr_insn;
>> +	      usage_insns[src_regno].reloads_num = reloads_num;
>> +	      usage_insns[src_regno].calls_num = calls_num;
>> +	      usage_insns[src_regno].after_p = false;
>> +	    }
> Looks like this and other places could use the add_next_usage_insn
> helper function.
Fixed.
>> +	  if (cl != NO_REGS
>> +	      && hard_reg_set_subset_p (reg_class_contents[cl],
>> +					live_hard_regs))
>> +	    IOR_HARD_REG_SET (potential_reload_hard_regs,
>> +			      reg_class_contents[cl]);
> Redundant "cl != NO_REGS" check.  (Was a bit confused by that at first.)
Fixed.
> I don't understand the way potential_reload_hard_regs is set up.
> Why does it only include reload pseudos involved in moves of the form
> "reload_pseudo <- original_pseudo" and "original_pseudo <- reload_pseudo",
> but include those reloads regardless of whether inheritance is possible?
>
> I wondered whether it might be deliberately selective in order to speed
> up LRA, but we walk all the registers in an insn regardless.
>
> Same for reloads_num.
It is just simple heuristics.  They can be changed.  Adding new 
heuristics (more complicated ones) requires a lot of experiments. This 
work can be done lately.  LRA as now is not frozen.  Its development 
will be continued.
>> +	  if (cl != NO_REGS
>> +	      && hard_reg_set_subset_p (reg_class_contents[cl],
>> +					live_hard_regs))
>> +	    IOR_HARD_REG_SET (potential_reload_hard_regs,
>> +			      reg_class_contents[cl]);
> Same comment as for the previous block.
Fixed.
>> +		if (reg_renumber[dst_regno] < 0
>> +		    || (reg->type == OP_OUT && ! reg->subreg_p))
>> +		/* Invalidate.	*/
>> +		usage_insns[dst_regno].check = 0;
> Could you explain this condition a bit more?  Why does reg_renumber
> affect things?
I added a comment.
>> +/* This value affects EBB forming.  If probability of edge from EBB to
>> +   a BB is not greater than the following value, we don't add the BB
>> +   to EBB.  */
>> +#define EBB_PROBABILITY_CUTOFF (REG_BR_PROB_BASE / 2)
> It looks like schedule_ebbs uses a higher default cutoff for FDO.
> Would the same distinction be useful here?
>
> Maybe schedule_ebbs-like params would be good here too.
I am thinking about adding several parametersfor IRA and LRA.  But I'd 
like to add for the next release.
>> +  bitmap_and (&temp_bitmap_head, removed_pseudos, live);
>> +  EXECUTE_IF_SET_IN_BITMAP (&temp_bitmap_head, 0, regno, bi)
> This isn't going to have much effect on compile time, but
> EXECUTE_IF_AND_IN_BITMAP avoids the need for a temporary bitmap.
Fixed.
>> +/* Remove inheritance/split pseudos which are in REMOVE_PSEUDOS and
>> +   return true if we did any change.  The undo transformations for
>> +   inheritance looks like
>> +      i <- i2
>> +      p <- i	  =>   p <- i2
>> +   or removing
>> +      p <- i, i <- p, and i <- i3
>> +   where p is original pseudo from which inheritance pseudo i was
>> +   created, i and i3 are removed inheritance pseudos, i2 is another
>> +   not removed inheritance pseudo.  All split pseudos or other
>> +   occurrences of removed inheritance pseudos are changed on the
>> +   corresponding original pseudos.  */
>> +static bool
>> +remove_inheritance_pseudos (bitmap remove_pseudos)
>> +{
>> +  basic_block bb;
>> +  int regno, sregno, prev_sregno, dregno, restore_regno;
>> +  rtx set, prev_set, prev_insn;
>> +  bool change_p, done_p;
>> +
>> +  change_p = ! bitmap_empty_p (remove_pseudos);
> I wondered from the comment why we couldn't just return straight away
> for the empty set, but it looks like the function also schedules a
> constraints pass for instructions that keep their inheritance or
> split pseudos.  Is that right?  Might be worth mentioning that
> in the function comment if so.
Yes, that is right.  I've added a comment.
>> +	      else if (bitmap_bit_p (remove_pseudos, sregno)
>> +		       && bitmap_bit_p (&lra_inheritance_pseudos, sregno))
>> +		{
>> +		  /* Search the following pattern:
>> +		       inherit_or_split_pseudo1 <- inherit_or_split_pseudo2
>> +		       original_pseudo <- inherit_or_split_pseudo1
>> +		    where the 2nd insn is the current insn and
>> +		    inherit_or_split_pseudo2 is not removed.  If it is found,
>> +		    change the current insn onto:
>> +		       original_pseudo1 <- inherit_or_split_pseudo2.  */
> s/original_pseudo1/original_pseudo/ I think (we don't change the destination).
Yes, it is a typo.  Fixed.
>> +		  for (prev_insn = PREV_INSN (curr_insn);
>> +		       prev_insn != NULL_RTX && ! NONDEBUG_INSN_P (prev_insn);
>> +		       prev_insn = PREV_INSN (prev_insn))
>> +		    ;
>> +		  if (prev_insn != NULL_RTX && BLOCK_FOR_INSN (prev_insn) == bb
>> +		      && (prev_set = single_set (prev_insn)) != NULL_RTX
>> +		      /* There should be no subregs in insn we are
>> +			 searching because only the original reg might
>> +			 be in subreg when we changed the mode of
>> +			 load/store for splitting.  */
>> +		      && REG_P (SET_DEST (prev_set))
>> +		      && REG_P (SET_SRC (prev_set))
>> +		      && (int) REGNO (SET_DEST (prev_set)) == sregno
>> +		      && ((prev_sregno = REGNO (SET_SRC (prev_set)))
>> +			  >= FIRST_PSEUDO_REGISTER)
>> +		      && (lra_reg_info[sregno].restore_regno
>> +			  == lra_reg_info[prev_sregno].restore_regno)
>> +		      && ! bitmap_bit_p (remove_pseudos, prev_sregno))
> I'm sure the restore_regno comparison near the end is correct,
> but could you add a comment to explain it?  The substitution
> itself seems OK either way.
I added a comment.
>> +	      struct lra_insn_reg *reg;
>> +	      bool insn_change_p = false;
>> +
>> +	      curr_id = lra_get_insn_recog_data (curr_insn);
>> +	      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
>> +		if ((regno = reg->regno) >= lra_constraint_new_regno_start
>> +		    && lra_reg_info[regno].restore_regno >= 0)
> Is the first part of the comparison needed?  Most other places don't check,
> so it looked at first glance like there was something special here.
I think it is a leftover from older implementation where I worked 
differently with restore_reg.  I removed.
>> +		  {
>> +		    if (change_p && bitmap_bit_p (remove_pseudos, regno))
>> +		      {
>> +			restore_regno = lra_reg_info[regno].restore_regno;
>> +			substitute_pseudo (&curr_insn, regno,
>> +					   regno_reg_rtx[restore_regno]);
>> +			insn_change_p = true;
>> +		      }
>> +		    else if (NONDEBUG_INSN_P (curr_insn))
>> +		      {
>> +			lra_push_insn_and_update_insn_regno_info (curr_insn);
>> +			lra_set_used_insn_alternative_by_uid
>> +			  (INSN_UID (curr_insn), -1);
>> +		      }
>> +		  }
>> +	      if (insn_change_p)
>> +		{
>> +		  lra_update_insn_regno_info (curr_insn);
>> +		  if (lra_dump_file != NULL)
>> +		    {
>> +		      fprintf (lra_dump_file, "	   Restore original insn:\n");
>> +		      debug_rtl_slim (lra_dump_file,
>> +				      curr_insn, curr_insn, -1, 0);
>> +		    }
>> +		}
> AIUI we could have a partial restore, keeping some registers but
> restoring others.  Is that right?  The dump entry made it sounds
> like a full restore.
Yes, the dump is a bit misleading.  It might be a partial restore.
> Maybe something like:
>
> 	      struct lra_insn_reg *reg;
> 	      bool restored_regs_p = false;
> 	      bool kept_regs_p = false;
>
> 	      curr_id = lra_get_insn_recog_data (curr_insn);
> 	      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
> 		{
> 		  regno = reg->regno;
> 		  restore_regno = lra_reg_info[regno].restore_regno;
> 		  if (restore_regno >= 0)
> 		    {
> 		      if (change_p && bitmap_bit_p (remove_pseudos, regno))
> 			{
> 			  substitute_pseudo (&curr_insn, regno,
> 					     regno_reg_rtx[restore_regno]);
> 			  restored_regs_p = true;
> 			}
> 		      else
> 			kept_regs_p = true;
> 		    }
> 		}
> 	      if (NONDEBUG_INSN_P (curr_insn) && kept_regs_p)
> 		{
> 		  /* The instruction has changed since the previous
> 		     constraints pass.  */
> 		  lra_push_insn_and_update_insn_regno_info (curr_insn);
> 		  lra_set_used_insn_alternative_by_uid
> 		    (INSN_UID (curr_insn), -1);
> 		}
> 	      else if (restored_regs_p)
> 		/* The instruction has been restored to the form that
> 		   it had during the previous constraints pass.  */
> 		lra_update_insn_regno_info (curr_insn);
>
> 	      if (restored_regs_p && lra_dump_file != NULL)
> 		{
> 		  fprintf (lra_dump_file,
> 			   "	   Insn after restoring regs:\n");
> 		  debug_rtl_slim (lra_dump_file, curr_insn, curr_insn, -1, 0);
> 		}
>
> (if correct) might make the partial case clearer, but that's personal
> preference, so please feel free to ignore, chop or change.
  I used your variant.
> Also, is regno_reg_rtx[restore_regno] always correct when restoring
> registers?  I thought restore_regno could be a hard register and that
> the hard register might not necessarily be used in the same mode as
> the regno_reg_rtx[] entry.
Right.  I fixed it.
> That just leaves lra.h, lra-int.h and lra.c itself.  I'm hoping to have
> read through those by the middle of next week, but any comments about them
> will probably just be banal stuff (even more than my comments so far :-))
> so I deliberately left them to last.
>
Richard, thanks again for your invaluable feedback which made and will 
make LRA better.
Richard Sandiford Oct. 17, 2012, 11:24 a.m. UTC | #14
Thanks for all the updates.

Vladimir Makarov <vmakarov@redhat.com> writes:
>>> +	  /* index * scale + disp => new base + index * scale  */
>>> +	  enum reg_class cl = base_reg_class (mode, as, SCRATCH, SCRATCH);
>>> +
>>> +	  lra_assert (INDEX_REG_CLASS != NO_REGS);
>>> +	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "disp");
>>> +	  lra_assert (GET_CODE (*addr_loc) == PLUS);
>>> +	  lra_emit_move (new_reg, *ad.disp_loc);
>>> +	  if (CONSTANT_P (XEXP (*addr_loc, 1)))
>>> +	    XEXP (*addr_loc, 1) = XEXP (*addr_loc, 0);
>>> +	  XEXP (*addr_loc, 0) = new_reg;
>> The canonical form is (plus (mult ...) (reg)) rather than
>> (plus (reg) (mult ...)), but it looks like we create the latter.
>> I realise you try both forms here:
> It might happen because equiv substitution in LRA.
>>> +	  /* Some targets like ARM, accept address operands in
>>> +	     specific order -- try exchange them if necessary.	*/
>>> +	  if (! valid_address_p (mode, *addr_loc, as))
>>> +	    {
>>> +	      exchange_plus_ops (*addr_loc);
>>> +	      if (! valid_address_p (mode, *addr_loc, as))
>>> +		exchange_plus_ops (*addr_loc);
>>> +	    }
>> but I think we should try the canonical form first.  And I'd prefer it
>> if we didn't try the other form at all, especially in 4.8.  It isn't
>> really the backend's job to reject non-canonical rtl.  This might well
>> be another case where some targets need a (hopefully small) tweak in
>> order to play by the rules.
>>
>> Also, I suppose this section of code feeds back to my question on
>> Wednesday about the distinction that LRA seems to make between the
>> compile-time constant in:
>>
>>    (plus (reg X1) (const_int Y1))
>>
>> and the link-time constant in:
>>
>>    (plus (reg X2) (symbol_ref Y2))
>>
>> It looked like extract_address_regs classified X1 as a base register and
>> X2 as an index register.  The difference between the two constants has
>> no run-time significance though, and I think we should handle both X1
>> and X2 as base registers (as I think reload does).
>>
>> I think the path above would then be specific to scaled indices.
>> In the original address the "complex" index must come first and the
>> displacement second.  In the modified address, the index would stay
>> first and the new base register would be second.  More below.
> As I wrote above the problem is also in that equiv substitution can 
> create non-canonical forms.

Right.  Just in case there's a misunderstanding: I'm not complaining
about these routines internally using forms that are noncanonical
(which could happen because of equiv substitution, like you say).
I just think that what we eventually try to validate should be canonical.
In a way it's similar to how the simplify-rtx.c routines work.

If there are targets that only accept noncanonical rtl (which is after
all just a specific type of invalid rtl), they need to be fixed.

>>> +      /* base + scale * index + disp => new base + scale * index  */
>>> +      new_reg = base_plus_disp_to_reg (mode, as, &ad);
>>> +      *addr_loc = gen_rtx_PLUS (Pmode, new_reg, *ad.index_loc);
>>> +      if (! valid_address_p (mode, *addr_loc, as))
>>> +	{
>>> +	  /* Some targets like ARM, accept address operands in
>>> +	     specific order -- try exchange them if necessary.	*/
>>> +	  exchange_plus_ops (*addr_loc);
>>> +	  if (! valid_address_p (mode, *addr_loc, as))
>>> +	    exchange_plus_ops (*addr_loc);
>>> +	}
>> Same comment as above about canonical rtl.  Here we can have two
>> registers -- in which case the base should come first -- or a more
>> complex index -- in which case the index should come first.
>>
>> We should be able to pass both rtxes to simplify_gen_binary (PLUS, ...),
>> with the operands in either order, and let it take care of the details.
>> Using simplify_gen_binary would help with the earlier index+disp case too.
> Equiv substitution can create non-canonical forms.  There are 2 approaches:
> o have a code for dealing with non-canonical forms (equiv substitution, 
> target stupidity)
> o always support canonical forms and require them from targets.
>
> I decided to use the 1st variant but I am reconsidering it.  I'll try to 
> fix before inclusion.  But I am not sure I have time for this.  All 
> these changes makes LRA unstable. In fact, I've just found that changes 
> I already made so far resulted in 2 SPEC2000 tests broken although GCC 
> testsuite and bootstrap looks good.

OK.  I'm happy to try fixing the noncanonical thing.

>>> +  /* If this is post-increment, first copy the location to the reload reg.  */
>>> +  if (post && real_in != result)
>>> +    emit_insn (gen_move_insn (result, real_in));
>> Nit, but real_in != result can never be true AIUI, and I was confused how
>> the code could be correct in that case.  Maybe just remove it, or make
>> it an assert?
> No, it might be true:
>
> real_in = in == value ? incloc : in;
> ...
> if (cond)
>    result = incloc;
> else
>    result = ...
>
> if (post && real_in != result)
>
> So it is true if in==value && cond

Sorry, what I meant was that cond is "! post && REG_P (incloc)":

  if (! post && REG_P (incloc))
    result = incloc;
  else
    result = lra_create_new_reg (GET_MODE (value), value, new_rclass,
				 "INC/DEC result");

so it can never be true in the "post" case quoted above.

>>> +	      dest_reg = SET_DEST (set);
>>> +	      /* The equivalence pseudo could be set up as SUBREG in a
>>> +		 case when it is a call restore insn in a mode
>>> +		 different from the pseudo mode.  */
>>> +	      if (GET_CODE (dest_reg) == SUBREG)
>>> +		dest_reg = SUBREG_REG (dest_reg);
>>> +	      if ((REG_P (dest_reg)
>>> +		   && (x = get_equiv_substitution (dest_reg)) != dest_reg
>>> +		   /* Remove insns which set up a pseudo whose value
>>> +		      can not be changed.  Such insns might be not in
>>> +		      init_insns because we don't update equiv data
>>> +		      during insn transformations.
>>> +			
>>> +		      As an example, let suppose that a pseudo got
>>> +		      hard register and on the 1st pass was not
>>> +		      changed to equivalent constant.  We generate an
>>> +		      additional insn setting up the pseudo because of
>>> +		      secondary memory movement.  Then the pseudo is
>>> +		      spilled and we use the equiv constant.  In this
>>> +		      case we should remove the additional insn and
>>> +		      this insn is not init_insns list.	 */
>>> +		   && (! MEM_P (x) || MEM_READONLY_P (x)
>>> +		       || in_list_p (curr_insn,
>>> +				     ira_reg_equiv
>>> +				     [REGNO (dest_reg)].init_insns)))
>> This is probably a stupid question, sorry, but when do we ever want
>> to keep an assignment to a substituted pseudo?  I.e. why isn't this just:
>>
>> 	      if ((REG_P (dest_reg)
>> 		   && (x = get_equiv_substitution (dest_reg)) != dest_reg)
> Equivalence can be memory location.  It means you can assign many 
> different values to the location.  Removing them would generate a wrong 
> code.  For example, if you use your simple variant of code, you will 
> have > 100 test failures of GCC testsuite.

OK :-)

>>> +/* Return true if we need a split for hard register REGNO or pseudo
>>> +   REGNO which was assigned to a hard register.
>>> +   POTENTIAL_RELOAD_HARD_REGS contains hard registers which might be
>>> +   used for reloads since the EBB end.	It is an approximation of the
>>> +   used hard registers in the split range.  The exact value would
>>> +   require expensive calculations.  If we were aggressive with
>>> +   splitting because of the approximation, the split pseudo will save
>>> +   the same hard register assignment and will be removed in the undo
>>> +   pass.  We still need the approximation because too aggressive
>>> +   splitting would result in too inaccurate cost calculation in the
>>> +   assignment pass because of too many generated moves which will be
>>> +   probably removed in the undo pass.  */
>>> +static inline bool
>>> +need_for_split_p (HARD_REG_SET potential_reload_hard_regs, int regno)
>>> +{
>>> +  int hard_regno = regno < FIRST_PSEUDO_REGISTER ? regno : reg_renumber[regno];
>>> +
>>> +  lra_assert (hard_regno >= 0);
>>> +  return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno)
>>> +	   && ! TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno)
>>> +	   && (usage_insns[regno].reloads_num
>>> +	       + (regno < FIRST_PSEUDO_REGISTER ? 0 : 2) < reloads_num)
>>> +	   && ((regno < FIRST_PSEUDO_REGISTER
>>> +		&& ! bitmap_bit_p (&ebb_global_regs, regno))
>>> +	       || (regno >= FIRST_PSEUDO_REGISTER
>>> +		   && lra_reg_info[regno].nrefs > 3
>>> +		   && bitmap_bit_p (&ebb_global_regs, regno))))
>>> +	  || (regno >= FIRST_PSEUDO_REGISTER && need_for_call_save_p (regno)));
>>> +}
>> Could you add more commentary about the thinking behind this particular
>> choice of heuristic?  E.g. I wasn't sure what the reloads_num check did,
>> or why we only split hard registers that are local to the EBB and only
>> split pseudos that aren't.
>>
>> The 2 and 3 numbers seemed a bit magic too.  I suppose the 2 has
>> something to do with "one save and one restore", but I wasn't sure
>> why we applied it only for pseudos.  (AIUI that arm of the check
>> deals with "genuine" split pseudos rather than call saves & restores.)
>>
>> Still, it says a lot for the high quality of LRA that, out of all the
>> 1000s of lines of code I've read so far, this is the only part that
>> didn't seem to have an intuitive justification.
> Yes, right.  I checked many parameters and finally picked out the above 
> ones.  What I found is that aggressive and even moderate splitting 
> usually result in worse code.  Most splitting is undone and if we do 
> splitting aggressively we generate a lot of insns which will be removed 
> but their costs are taken into account during assignment pass.  
> Inheritance has quite bigger chance to successful.  One reason why 
> splitting is undone is that spilling + inheritance of short living 
> pseudos is a substitution of splitting in some way.  Therefore I do 
> splitting for long living pseudos. That means non local pseudos.
>
> I added the following comments.
>
>    return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno)
>             && ! TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno)
>             /* We need at least 2 reloads to make pseudo splitting
>                profitable.  We should provide hard regno splitting in
>                any case to solve 1st insn scheduling problem when
>                moving hard register definition up might result in
>                impossibility to find hard register for reload pseudo of
>                small register class.  */
>             && (usage_insns[regno].reloads_num
>                 + (regno < FIRST_PSEUDO_REGISTER ? 0 : 2) < reloads_num)
>             && (regno < FIRST_PSEUDO_REGISTER
>                 /* For short living pseudos, spilling + inheritance can
>                    be considered a substitution for splitting.
>                    Therefore we do not splitting for local pseudos.  It
>                    decreases also aggressiveness of splitting.  The
>                    minimal number of references is chosen taking into
>                    account that for 2 references splitting has no sense
>                    as we can just spill the pseudo.  */
>                 || (regno >= FIRST_PSEUDO_REGISTER
>                     && lra_reg_info[regno].nrefs > 3
>                     && bitmap_bit_p (&ebb_global_regs, regno))))
>
> I removed checking that hard_regno is local.  It was a parameter to 
> decrease aggressiveness of hard register splitting but I realized that 
> it contradicts the reason for splitting to solve 1st insn scheduling 
> problem because regional scheduling can move hard register definition 
> beyond BB.

Thanks, this looks much better to me FWIW.

Richard
Vladimir Makarov Oct. 17, 2012, 7:53 p.m. UTC | #15
On 12-10-15 12:49 PM, Richard Sandiford wrote:
> Hi Vlad,
>
> Some comments about the rest of LRA.  Nothing major here...
>
> Vladimir Makarov <vmakarov@redhat.com> writes:
>> +/* Info about register in an insn.  */
>> +struct lra_insn_reg
>> +{
>> +  /* The biggest mode through which the insn refers to the register
>> +     (remember the register can be accessed through a subreg in the
>> +     insn).  */
>> +  ENUM_BITFIELD(machine_mode) biggest_mode : 16;
> AFAICT, this is actually always the mode of a specific reference,
> and if there are references to the same register in different modes,
> those references get their own lra_insn_regs.  "mode" might be better
> than "biggest_mode" if so.
>
I seems mode is also not good.  I've just modified the comment to 
reflect the fact that is just a reference.
>> +/* Static part (common info for insns with the same ICODE) of LRA
>> +   internal insn info.	It exists in at most one exemplar for each
>> +   non-negative ICODE.	Warning: if the structure definition is
>> +   changed, the initializer for debug_insn_static_data in lra.c should
>> +   be changed too.  */
> Probably worth saying (before the warning) that there is also
> one structure for each asm.
Good point.  I added a comment.
>> +/* LRA internal info about an insn (LRA internal insn
>> +   representation).  */
>> +struct lra_insn_recog_data
>> +{
>> +  int icode; /* The insn code.	*/
>> +  rtx insn; /* The insn itself.	 */
>> +  /* Common data for insns with the same ICODE.	 */
>> +  struct lra_static_insn_data *insn_static_data;
> Maybe worth mentioning asms here too.
Fixed.
>> +  /* Two arrays of size correspondingly equal to the operand and the
>> +     duplication numbers: */
>> +  rtx **operand_loc; /* The operand locations, NULL if no operands.  */
>> +  rtx **dup_loc; /* The dup locations, NULL if no dups.	 */
>> +  /* Number of hard registers implicitly used in given call insn.  The
>> +     value can be NULL or points to array of the hard register numbers
>> +     ending with a negative value.  */
>> +  int *arg_hard_regs;
>> +#ifdef HAVE_ATTR_enabled
>> +  /* Alternative enabled for the insn.	NULL for debug insns.  */
>> +  bool *alternative_enabled_p;
>> +#endif
>> +  /* The alternative should be used for the insn, -1 if invalid, or we
>> +     should try to use any alternative, or the insn is a debug
>> +     insn.  */
>> +  int used_insn_alternative;
>> +  struct lra_insn_reg *regs;  /* Always NULL for a debug insn.	*/
> Comments consistently above the field.
Fixed.
>> +extern void lra_expand_reg_info (void);
> This doesn't exist any more.
Fixed.
>> +extern int lra_constraint_new_insn_uid_start;
> Just saying in case: this seems to be write-only, with lra-constraints.c
> instead using a static variable to track the uid start.
>
> I realise you might want to keep it anyway for consistency with
> lra_constraint_new_regno_start, or for debugging.
>
>> +extern rtx lra_secondary_memory[NUM_MACHINE_MODES];
> This doesn't exist any more.
Removed.  Thanks.
>> +/* lra-saves.c: */
>> +
>> +extern bool lra_save_restore (void);
> Same for this file & function.
Removed.
>> +/* The function returns TRUE if at least one hard register from ones
>> +   starting with HARD_REGNO and containing value of MODE are in set
>> +   HARD_REGSET.	 */
>> +static inline bool
>> +lra_hard_reg_set_intersection_p (int hard_regno, enum machine_mode mode,
>> +				 HARD_REG_SET hard_regset)
>> +{
>> +  int i;
>> +
>> +  lra_assert (hard_regno >= 0);
>> +  for (i = hard_regno_nregs[hard_regno][mode] - 1; i >= 0; i--)
>> +    if (TEST_HARD_REG_BIT (hard_regset, hard_regno + i))
>> +      return true;
>> +  return false;
>> +}
> This is the same as overlaps_hard_reg_set_p.
>
I removed it and started to use the function overlaps_hard_reg_set_p.
>> +/* Return hard regno and offset of (sub-)register X through arguments
>> +   HARD_REGNO and OFFSET.  If it is not (sub-)register or the hard
>> +   register is unknown, then return -1 and 0 correspondingly.  */
> The function seems to return -1 for both.
Fixed.  It does not matter for the rest of code as offset is used only 
when hard_regno >= 0.
>> +/* Add hard registers starting with HARD_REGNO and holding value of
>> +   MODE to the set S.  */
>> +static inline void
>> +lra_add_hard_reg_set (int hard_regno, enum machine_mode mode, HARD_REG_SET *s)
>> +{
>> +  int i;
>> +
>> +  for (i = hard_regno_nregs[hard_regno][mode] - 1; i >= 0; i--)
>> +    SET_HARD_REG_BIT (*s, hard_regno + i);
>> +}
> This is add_to_hard_reg_set.
Removed.
>> +   Here is block diagram of LRA passes:
>> +
>> +	  ---------------------				
>> +	 | Undo inheritance    |      ---------------	     ---------------
>> +	 | for spilled pseudos)|     | Memory-memory |	    | New (and old) |
>> +	 | and splits (for     |<----| move coalesce |<-----|	 pseudos    |
>> +	 | pseudos got the     |      ---------------	    |	assignment  |
>> +  Start	 |  same  hard regs)   |			     ---------------
>> +    |	  ---------------------					    ^
>> +    V		  |		 ----------------		    |
>> + -----------	  V		| Update virtual |		    |
>> +|  Remove   |----> ------------>|    register	 |		    |
>> +| scratches |	  ^		|  displacements |		    |
>> + -----------	  |		 ----------------		    |
>> +		  |			 |			    |
>> +		  |			 V	   New		    |
>> +	 ----------------    No	   ------------	 pseudos   -------------------
>> +	| Spilled pseudo | change |Constraints:| or insns | Inheritance/split |
>> +	|    to memory	 |<-------|    RTL     |--------->|  transformations  |
>> +	|  substitution	 |	  | transfor-  |	  |    in EBB scope   |
>> +	 ----------------	  |  mations   |	   -------------------
>> +		|		    ------------
>> +		V
>> +    -------------------------
>> +   | Hard regs substitution, |
>> +   |  devirtalization, and   |------> Finish
>> +   | restoring scratches got |
>> +   |	     memory	     |
>> +    -------------------------
> This is a great diagram, thanks.
>
>> +/* Create and return a new reg from register FROM corresponding to
>> +   machine description operand of mode MD_MODE.	 Initialize its
>> +   register class to RCLASS.  Print message about assigning class
>> +   RCLASS containing new register name TITLE unless it is NULL.	 The
>> +   created register will have unique held value.  */
>> +rtx
>> +lra_create_new_reg_with_unique_value (enum machine_mode md_mode, rtx original,
>> +				      enum reg_class rclass, const char *title)
> Comment says FROM, but parameter is called ORIGINAL.  The code copes
> with both null and non-register ORIGINALs, which aren't mentinoed in
> the comment.
Fixed.
>> +/* Target checks operands through operand predicates to recognize an
>> +   insn.  We should have a special precaution to generate add insns
>> +   which are frequent results of elimination.
>> +
>> +   Emit insns for x = y + z.  X can be used to store intermediate
>> +   values and should be not in Y and Z when we use x to store an
>> +   intermediate value.	*/
> I think this should say what Y and Z are allowed to be, since it's
> more than just registers and constants.
Fixed.
>> +/* Map INSN_UID -> the operand alternative data (NULL if unknown).  We
>> +   assume that this data is valid until register info is changed
>> +   because classes in the data can be changed.	*/
>> +struct operand_alternative *op_alt_data[LAST_INSN_CODE];
> In that case I think it should be in a target_globals structure,
> a bit like target_ira.
Fixed.
>> +	    for (curr = list; curr != NULL; curr = curr->next)
>> +	      if (curr->regno == regno)
>> +		break;
>> +	    if (curr == NULL || curr->subreg_p != subreg_p
>> +		|| curr->biggest_mode != mode)
>> +	      {
>> +		/* This is a new hard regno or the info can not be
>> +		   integrated into the found structure.	 */
>> +#ifdef STACK_REGS
>> +		early_clobber
>> +		  = (early_clobber
>> +		     /* This clobber is to inform popping floating
>> +			point stack only.  */
>> +		     && ! (FIRST_STACK_REG <= regno
>> +			   && regno <= LAST_STACK_REG));
>> +#endif
>> +		list = new_insn_reg (regno, type, mode, subreg_p,
>> +				     early_clobber, list);
>> +	      }
>> +	    else
>> +	      {
>> +		if (curr->type != type)
>> +		  curr->type = OP_INOUT;
>> +		if (curr->early_clobber != early_clobber)
>> +		  curr->early_clobber = true;
>> +	      }
> OK, so this is probably only a technicality, but I think this should be:
>
> 	    for (curr = list; curr != NULL; curr = curr->next)
> 	      if (curr->regno == regno
> 		  && curr->subreg_p == subreg_p
> 		  && curr->biggest_mode == mode)
> 		{
> 		  ..reuse..;
> 		  return list;
> 		}
> 	     ..new entry..;
> 	     return list;
Fixed by using your approach.  It cannot be return because the return 
originally is after a loop covering this code.
>> +      icode = INSN_CODE (insn);
>> +      if (icode < 0)
>> +	/* It might be a new simple insn which is not recognized yet.  */
>> +	INSN_CODE (insn) = icode = recog (PATTERN (insn), insn, 0);
> Any reason not to use recog_memoized here?
It simply will call recog always as icode < 0.  But as it has simpler 
interface, I've changed it.
>> +      n = insn_static_data->n_operands;
>> +      if (n == 0)
>> +	locs = NULL;
>> +      else
>> +	{
>> +	
>> +	  locs = (rtx **) xmalloc (n * sizeof (rtx *));
>> +	  memcpy (locs, recog_data.operand_loc, n * sizeof (rtx *));
>> +	}
> Excess blank line after "else" (sorry!)
It looks like it is already fixed.
>> +  /* Some output operand can be recognized only from the context not
>> +     from the constraints which are empty in this case.	 Call insn may
>> +     contain a hard register in set destination with empty constraint
>> +     and extract_insn treats them as an input.	*/
>> +  for (i = 0; i < insn_static_data->n_operands; i++)
>> +    {
>> +      int j;
>> +      rtx pat, set;
>> +      struct lra_operand_data *operand = &insn_static_data->operand[i];
>> +
>> +      /* ??? Should we treat 'X' the same way.	It looks to me that
>> +	 'X' means anything and empty constraint means we do not
>> +	 care.	*/
> FWIW, I think any X output operand has to be "=X" or "+X"; just "X"
> would be as wrong as "r".  genrecog is supposed to complain about that
> for insns, and parse_output_constraint for asms.
>
> So I agree the code is correct in just handling empty constraints.
Ok.
>> +/* Update all the insn info about INSN.	 It is usually called when
>> +   something in the insn was changed.  Return the udpated info.	 */
> Typo: updated.
Fixed.
>> +  for (i = 0; i < reg_info_size; i++)
>> +    {
>> +      bitmap_initialize (&lra_reg_info[i].insn_bitmap, &reg_obstack);
>> +#ifdef STACK_REGS
>> +      lra_reg_info[i].no_stack_p = false;
>> +#endif
>> +      CLEAR_HARD_REG_SET (lra_reg_info[i].conflict_hard_regs);
>> +      lra_reg_info[i].preferred_hard_regno1 = -1;
>> +      lra_reg_info[i].preferred_hard_regno2 = -1;
>> +      lra_reg_info[i].preferred_hard_regno_profit1 = 0;
>> +      lra_reg_info[i].preferred_hard_regno_profit2 = 0;
>> +      lra_reg_info[i].live_ranges = NULL;
>> +      lra_reg_info[i].nrefs = lra_reg_info[i].freq = 0;
>> +      lra_reg_info[i].last_reload = 0;
>> +      lra_reg_info[i].restore_regno = -1;
>> +      lra_reg_info[i].val = get_new_reg_value ();
>> +      lra_reg_info[i].copies = NULL;
>> +    }
> The same loop (with a different start index) appears in expand_reg_info.
> It'd be nice to factor it out, so that there's only one place to update
> if the structure is changed.
Fixed.

>> +	  for (curr = data->regs; curr != NULL; curr = curr->next)
>> +	    if (curr->regno == regno)
>> +	      break;
>> +	  if (curr->subreg_p != subreg_p || curr->biggest_mode != mode)
>> +	    /* The info can not be integrated into the found
>> +	       structure.  */
>> +	    data->regs = new_insn_reg (regno, type, mode, subreg_p,
>> +				       early_clobber, data->regs);
>> +	  else
>> +	    {
>> +	      if (curr->type != type)
>> +		curr->type = OP_INOUT;
>> +	      if (curr->early_clobber != early_clobber)
>> +		curr->early_clobber = true;
>> +	    }
>> +	  lra_assert (curr != NULL);
>> +	}
> Same loop comment as for collect_non_operand_hard_regs.  Maybe another
> factoring opportunity.
Fixed.
>> +		/* Some ports don't recognize the following addresses
>> +		   as legitimate.  Although they are legitimate if
>> +		   they satisfies the constraints and will be checked
>> +		   by insn constraints which we ignore here.  */
>> +		&& GET_CODE (XEXP (op, 0)) != UNSPEC
>> +		&& GET_CODE (XEXP (op, 0)) != PRE_DEC
>> +		&& GET_CODE (XEXP (op, 0)) != PRE_INC
>> +		&& GET_CODE (XEXP (op, 0)) != POST_DEC
>> +		&& GET_CODE (XEXP (op, 0)) != POST_INC
>> +		&& GET_CODE (XEXP (op, 0)) != PRE_MODIFY
>> +		&& GET_CODE (XEXP (op, 0)) != POST_MODIFY)
> GET_RTX_CLASS (GET_CODE (XEXP (op, 0))) == RTX_AUTOINC
Fixed.
>> +/* Determine if the current function has an exception receiver block
>> +   that reaches the exit block via non-exceptional edges  */
>> +static bool
>> +has_nonexceptional_receiver (void)
>> +{
>> +  edge e;
>> +  edge_iterator ei;
>> +  basic_block *tos, *worklist, bb;
>> +
>> +  /* If we're not optimizing, then just err on the safe side.  */
>> +  if (!optimize)
>> +    return true;
>> +
>> +  /* First determine which blocks can reach exit via normal paths.  */
>> +  tos = worklist = XNEWVEC (basic_block, n_basic_blocks + 1);
>> +
>> +  FOR_EACH_BB (bb)
>> +    bb->flags &= ~BB_REACHABLE;
>> +
>> +  /* Place the exit block on our worklist.  */
>> +  EXIT_BLOCK_PTR->flags |= BB_REACHABLE;
>> +  *tos++ = EXIT_BLOCK_PTR;
>> +
>> +  /* Iterate: find everything reachable from what we've already seen.  */
>> +  while (tos != worklist)
>> +    {
>> +      bb = *--tos;
>> +
>> +      FOR_EACH_EDGE (e, ei, bb->preds)
>> +	if (!(e->flags & EDGE_ABNORMAL))
>> +	  {
>> +	    basic_block src = e->src;
>> +
>> +	    if (!(src->flags & BB_REACHABLE))
>> +	      {
>> +		src->flags |= BB_REACHABLE;
>> +		*tos++ = src;
>> +	      }
>> +	  }
>> +    }
>> +  free (worklist);
>> +
>> +  /* Now see if there's a reachable block with an exceptional incoming
>> +     edge.  */
>> +  FOR_EACH_BB (bb)
>> +    if (bb->flags & BB_REACHABLE)
>> +      FOR_EACH_EDGE (e, ei, bb->preds)
>> +	if (e->flags & EDGE_ABNORMAL)
>> +	  return true;
>> +
>> +  /* No exceptional block reached exit unexceptionally.	 */
>> +  return false;
>> +}
> Looks like we could just early out on the first loop and get rid
> of the second.
It seems so.  I fixed it.  This is from reload.  Code in reload could be 
fixed too.
>> +/* Remove all REG_DEAD and REG_UNUSED notes and regenerate REG_INC.
>> +   We change pseudos by hard registers without notification of DF and
>> +   that can make the notes obsolete.  DF-infrastructure does not deal
>> +   with REG_INC notes -- so we should regenerate them here.  */
> These days passes are supposed to regenerate REG_DEAD and REG_UNUSED
> notes if they need them, so that part might not be necessary.
> The REG_INC bit is still needed though...
Ok.  I fixed it.
>> +/* Initialize LRA data once per function.  */
>> +void
>> +lra_init (void)
>> +{
>> +  init_op_alt_data ();
>> +}
> I think it's more like:
>
> /* Initialize LRA whenever register-related information is changed.  */
Fixed.
>
> In summary, LRA looks really good to me FWIW.  Thanks for all your hard work.
>
> Getting rid of reload always seemed like a pipe dream, and if the only
> known drawback of this replacement is that it takes a while on extreme
> testcases, that's an amazing achievement.  (Not to say compile time
> isn't important, just that there were so many other hurdles to overcome.)
It is my second attempt.  The first one was YARA project.  I got a lot 
of experience from this project and knowledge how not to do this.
LRA will be still a long lasting project.  I don't think I found all 
weirdness of reload just trying 8 targets (fixing one bug on one target 
frequently resulted in new bugs on other targets so it required to do 
frequently cardinal changes to the original code). Only after trying the 
8 targets I got feeling that this approach could well.  There are still 
much more targets (and subtargets) to port to LRA.  And I hope that 
people will help me.  I am very glad and appreciate that you did such 
rigorous review because it means that you now understand LRA very well 
(sometimes I think even better than me).
> It looks like opinion has crystalised in favour of merging LRA for 4.8.
> I hope that's what happens.  I don't see that anything would be gained
> by delaying it to 4.9.  The code's not going to get any more testing on the
> branch that it already has; whenever we merge, the stress test is always
> going to be trunk.
>
Richard, thanks for you invaluable help.
Steven Bosscher Oct. 17, 2012, 8:12 p.m. UTC | #16
On Wed, Oct 17, 2012 at 9:53 PM, Vladimir Makarov wrote:
> On 12-10-15 12:49 PM, Richard Sandiford wrote:
>> Getting rid of reload always seemed like a pipe dream, and if the only
>> known drawback of this replacement is that it takes a while on extreme
>> testcases, that's an amazing achievement.  (Not to say compile time
>> isn't important, just that there were so many other hurdles to overcome.)

Just to be clear, LRA now does no worse from a compile time POV than,
say, tree-ssa-live. Most of the scalability problems have been
addressed.

> It is my second attempt.  The first one was YARA project.  I got a lot of
> experience from this project and knowledge how not to do this.
> LRA will be still a long lasting project.  I don't think I found all
> weirdness of reload just trying 8 targets (fixing one bug on one target
> frequently resulted in new bugs on other targets so it required to do
> frequently cardinal changes to the original code). Only after trying the 8
> targets I got feeling that this approach could well.

Hats off to you, Vlad, for your years of effort on improving GCC's RA!

Ciao!
Steven
Vladimir Makarov Oct. 19, 2012, 5:14 a.m. UTC | #17
On 12-10-15 8:06 AM, Richard Sandiford wrote:
> Vladimir Makarov <vmakarov@redhat.com> writes:
>>> if that's accurate.  I dropped the term "reload pseudo" because of
>>> the general comment in my earlier reply about the use of "reload pseudo"
>>> when the code seems to include inheritance and split pseudos too.
>> There is no inheritance and splitting yet.  It is done after the
>> constraint pass.
>> So at this stage >= new_regno_start means reload pseudo.
> Ah, OK.
>
>>> That's a change in the meaning of NEW_CLASS, but seems easier for
>>> callers to handle.  I think all it requires is changing:
>>>
>>>> +      common_class = ira_reg_class_subset[rclass][cl];
>>>> +      if (new_class != NULL)
>>>> +	*new_class = common_class;
>>> to:
>>>
>>>         common_class = ira_reg_class_subset[rclass][cl];
>>>         if (new_class != NULL && rclass != common_class)
>>> 	*new_class = common_class;
>> This change results in infinite LRA looping on a first libgcc file
>> compilation.  Unfortunately I have no time to investigate it.
>> I'd like to say that most code of in this code is very sensitive to
>> changes.  I see it a lot.  You change something looking obvious and a
>> target is broken.
>> I am going to investigate it when I have more time.
> Thanks.
>
>>>> +    default:
>>>> +      {
>>>> +	const char *fmt = GET_RTX_FORMAT (code);
>>>> +	int i;
>>>> +
>>>> +	if (GET_RTX_LENGTH (code) != 1
>>>> +	    || fmt[0] != 'e' || GET_CODE (XEXP (x, 0)) != UNSPEC)
>>>> +	  {
>>>> +	    for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
>>>> +	      if (fmt[i] == 'e')
>>>> +		extract_loc_address_regs (false, mode, as, &XEXP (x, i),
>>>> +					  context_p, code, SCRATCH,
>>>> +					  modify_p, ad);
>>>> +	    break;
>>>> +	  }
>>>> +	/* fall through for case UNARY_OP (UNSPEC ...)	*/
>>>> +      }
>>>> +
>>>> +    case UNSPEC:
>>>> +      if (ad->disp_loc == NULL)
>>>> +	ad->disp_loc = loc;
>>>> +      else if (ad->base_reg_loc == NULL)
>>>> +	{
>>>> +	  ad->base_reg_loc = loc;
>>>> +	  ad->base_outer_code = outer_code;
>>>> +	  ad->index_code = index_code;
>>>> +	  ad->base_modify_p = modify_p;
>>>> +	}
>>>> +      else
>>>> +	{
>>>> +	  lra_assert (ad->index_reg_loc == NULL);
>>>> +	  ad->index_reg_loc = loc;
>>>> +	}
>>>> +      break;
>>>> +
>>>> +    }
>>> Which targets use a bare UNSPEC as a displacement?  I thought a
>>> displacement had to be a link-time constant, in which case it should
>>> satisfy CONSTANT_P.  For UNSPECs, that means wrapping it in a CONST.
>> I saw it somewhere.  I guess IA64.
>>> I'm just a bit worried that the UNSPEC handling is sensitive to the
>>> order that subrtxes are processed (unlike PLUS, which goes to some
>>> trouble to work out what's what).  It could be especially confusing
>>> because the default case processes operands in reverse order while
>>> PLUS processes them in forward order.
>>>
>>> Also, which cases require the special UNARY_OP (UNSPEC ...) fallthrough?
>>> Probably deserves a comment.
>> I don't remember.  To figure out, I should switch it off and try all
>> targets supported by LRA.
>>> AIUI the base_reg_loc, index_reg_loc and disp_loc fields aren't just
>>> recording where reloads of a particular class need to go (obviously
>>> in the case of disp_loc, which isn't reloaded at all).  The feidls
>>> have semantic value too.  I.e. we use them to work out the value
>>> of at least part of the address.
>>>
>>> In that case it seems dangerous to look through general rtxes
>>> in the way that the default case above does.  Maybe just making
>>> sure that DISP_LOC is involved in a sum with the base would be
>>> enough, but another idea was:
>>>
>>> ----------------------------------------------------------------
>>> I know of three ways of "mutating" (for want of a better word)
>>> an address:
>>>
>>>     1. (and X (const_int X)), to align
>>>     2. a subreg
>>>     3. a unary operator (such as truncation or extension)
>>>
>>> So maybe we could:
>>>
>>>     a. remove outer mutations (using a helper function)
>>>     b. handle LO_SUM, PRE_*, POST_*: as now
>>>     c. otherwise treat the address of the sum of one, two or three pieces.
>>>        c1. Peel mutations of all pieces.
>>>        c2. Classify the pieces into base, index and displacement.
>>>            This would be similar to the jousting code above, but hopefully
>>>            easier because all three rtxes are to hand.  E.g. we could
>>>            do the base vs. index thing in a similar way to
>>>            commutative_operand_precedence.
>>>        c3. Record which pieces were mutated (e.g. using something like the
>>>            index_loc vs. index_reg_loc distinction in the current code)
>>>
>>> That should be general enough for current targets, but if it isn't,
>>> we could generalise it further when we know what generalisation is needed.
>>>
>>> That's still going to be a fair amount of code, but hopefully not more,
>>> and we might have more confidence at each stage what each value is.
>>> And it avoids the risk of treating "mutated" addresses as "unmutated" ones.
>>> ----------------------------------------------------------------
>>>
>>> Just an idea though.  Probably not for 4.8, although I might try it
>>> if I find time.
>> I am not sure that you listed all the cases.  It would be great if you
>> listed all the cases. In this case we could make this function more clear.
>> I tried to do this first but permanently found new cases.  After that I
>> gave up and tried to use more general implementation.
>>
>> This function was rewritten and modified many times.  I am afraid to do
>> this again when clock is ticking.
>>
>> It would be great if you re-implement the function according to your
>> ideas and we could try it on 8 targets to which LRA was already ported.
>> An LRA sub-branch would a perfect place to do it
> Fair enough.  I still stand by:
>
>>> It would be nice to sort out the disp_loc thing for 4.8 though.
> though.  My point is that base_plus_disp_to_reg assumes that
> *base_loc and *disp_loc are involved in a sum.  It adds them together,
> replaces the base_loc with the new pseudo, and removes the disp_loc.
> But extract_address_regs seems to be deliberately written in a way that
> doesn't require base_loc and disp_loc to be involved in a sum, and uses
> a more indirect approach to working out disp_loc instead.  It just feels
> like it leaves open the potential for a silent wrong-code bug.
I add the explicit check of base + disp case.  I also added a lot of 
asserts checking that we don't assign twice.
>>>> +/* Reload pseudos created for matched input and output reloads whose
>>>> +   mode are different.	Such pseudos has a modified rules for finding
>>>> +   their living ranges, e.g. assigning to subreg of such pseudo means
>>>> +   changing all pseudo value.  */
>>>> +bitmap_head lra_bound_pseudos;
>>> Maybe:
>>>
>>> /* Reload pseudos created for matched input and output reloads whose
>>>      modes are different.  Such pseudos have different live ranges from
>>>      other pseudos; e.g. any assignment to a subreg of these pseudos
>>>      changes the whole pseudo's value.  */
>> Fixed.
>>> Although that said, couldn't emit_move_insn_1 (called by gen_move_insn)
>>> split a multiword pseudo move into two word moves?  Using the traditional
>>> clobber technique sounds better than having special liveness rules.
>>>
>> It is not only about multi-words pseudos.  It is about representation of
>> this situation by constructions semantically incorrect in order parts of
>> compiler.  Reload has no such problem as it does not use RTL.   So I
>> don't think it splits as I use emit_move_insn and that calls
>> emit_move_insn_1 too.
> But my point is that emit_move_insn_1 _does_ split moves that have no
> .md pattern of their own.  E.g. some targets do not define double-word
> move patterns because such moves are always equivalent to two individual
> word moves.  And if emit_move_insn_1 splits:
>
>     (set (reg:DI X) (reg:DI Y))
>
> into:
>
>     (set (subreg:SI (reg:DI X) 0) (subreg:SI (reg:DI Y) 0))
>     (set (subreg:SI (reg:DI X) 4) (subreg:SI (reg:DI Y) 4))
>
> then it would be to say that the subreg in the second instruction
> is a complete definition of X.
>> I really needed a special liveness treatment (although I don't
>> remember details) and therefore I added it.  I had no detail design
>> for LRA.  The code was modified by numerous test failures on different
>> targets.  There is a lot of code analogous to reload one and probably
>> its necessity should be rigorously questioned.  I thought about and
>> modified part of this code but unfortunately not all.
>>
>> Also bound pseudos are rare.  Their bitmap is very small and testing (2
>> lines of code in overall) them in ira-lives.c is fast.
> FWIW, It wasn't really speed as much as correctness I was worried about.
> In a way, rarity makes having special rules seem even more dangerous.
>
Right.  I don't think it is more dangerous than clobbers but I like what 
you proposed.  After few tries, I managed with clobbers.  So now there 
are no bound pseudos (or special reload pseudos. that is how it is 
called now).  The code is clear and there is no additional notion for 
LRA.  Thank you.
>>>> +      /* We create pseudo for out rtx because we always should keep
>>>> +	 registers with the same original regno have synchronized
>>>> +	 value (it is not true for out register but it will be
>>>> +	 corrected by the next insn).
>>> I don't understand this comment, sorry.
>>>
>> Pseudos have values -- see comments for lra_reg_info.  Different pseudos
>> with the same value do not conflict even if they live in the same
>> place.  When we create a pseudo we assign value of original pseudo (if
>> any) from which we created the new pseudo.  If we create the pseudo from
>> the input pseudo, the new pseudo will no conflict with the input pseudo
>> which is wrong when the input pseudo lives after the insn and as the new
>> pseudo value is changed by the insn output.  Therefore we create the new
>> pseudo from the output.
>>
>> I hope it is more understandable.  I changed the comment.
> Yeah, I think that makes it a lot clearer, thanks.
>
>>>> +  /* In and out operand can be got from transformations before
>>>> +     processing constraints.  So the pseudos might have inaccurate
>>>> +     class and we should make their classes more accurate.  */
>>>> +  narrow_reload_pseudo_class (in_rtx, goal_class);
>>>> +  narrow_reload_pseudo_class (out_rtx, goal_class);
>>> I don't understand this, sorry.  Does "transformations" mean inheritance
>>> and reload splitting?  So the registers we're changing here are inheritance
>>> and split pseudos rather than reload pseudos created for this instruction?
>>> If so, it sounds on face value like it conflicts with the comment quoted
>>> above about not allowing reload instructions to the narrow the class
>>> of pseudos.  Might be worth saying why that's OK here but not there.
>> Again, inheritance and splitting is done after the constraint pass.
>>
>> The transformations here are mostly reloading of subregs which is done
>> before reloads for given insn.  On this transformation we create new
>> pseudos for which we don't know reg class yet.  In case we don't know
>> pseudo reg class yet, we assign ALL_REGS to the pseudo.
> OK, thanks.
>
>>> Also, all uses but one of lra_get_hard_regno_and_offset follow
>>> the pattern:
>>>
>>>         lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
>>>         /* The real hard regno of the operand after the allocation.  */
>>>         x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
>>>
>>> so couldn't lra_get_hard_regno_and_offset just return the final
>>> hard register, including elimination?  Then it could apply the
>>> elimination on the original rtx.
>>>
>>> FWIW, the exception I mentioned was operands_match_p:
>>>
>>>         lra_get_hard_regno_and_offset (x, &i, &offset);
>>>         if (i < 0)
>>> 	goto slow;
>>>         i += offset;
>>>
>>> but I'm not sure why this is the only caller that would want
>>> to ignore elimination.
>> ???
Sorry, I just composed this email for 2 days and that was a placeholder 
for your question I should answer or work on:)  It looks I did not 
answered the question.
> Not sure what you meant here :-)  Was that a placeholder,
> or something else?  What I was getting at was that it would
> be nice to replace all occurences of:
>
>        lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
>        /* The real hard regno of the operand after the allocation.  */
>        x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
>
> with something like:
>
>        x_hard_regno = lra_get_hard_regno (x);
>
> and this operands_match_p code seemed to be the only place that didn't
> apply get_final_hard_regno to the result of lra_get_hard_regno_and_offset.
> I wasn't really sure why operands_match_p was different.
Fixed.
>>>> +  int i, j, x_hard_regno, offset;
>>>> +  enum machine_mode mode;
>>>> +  rtx x;
>>>> +  const char *fmt;
>>>> +  enum rtx_code code;
>>>> +
>>>> +  if (*loc == NULL_RTX)
>>>> +    return false;
>>>> +  x = *loc;
>>>> +  code = GET_CODE (x);
>>>> +  mode = GET_MODE (x);
>>>> +  if (code == SUBREG)
>>>> +    {
>>>> +      loc = &SUBREG_REG (x);
>>>> +      x = SUBREG_REG (x);
>>>> +      code = GET_CODE (x);
>>>> +      if (GET_MODE_SIZE (GET_MODE (x)) > GET_MODE_SIZE (mode))
>>>> +	mode = GET_MODE (x);
>>>> +    }
>>>> +
>>>> +  if (REG_P (x))
>>>> +    {
>>>> +      lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
>>>> +      /* The real hard regno of the operand after the allocation.  */
>>>> +      x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
>>>> +      return (x_hard_regno >= 0
>>>> +	      && lra_hard_reg_set_intersection_p (x_hard_regno, mode, set));
>>> With the subreg mode handling above, this looks little-endian specific.
>>> The MEM case:
>>>> +  if (MEM_P (x))
>>>> +    {
>>>> +      struct address ad;
>>>> +      enum machine_mode mode = GET_MODE (x);
>>>> +      rtx *addr_loc = &XEXP (x, 0);
>>>> +
>>>> +      extract_address_regs (mode, MEM_ADDR_SPACE (x), addr_loc, MEM, &ad);
>>>> +      if (ad.base_reg_loc != NULL)
>>>> +	{
>>>> +	  if (uses_hard_regs_p (ad.base_reg_loc, set))
>>>> +	    return true;
>>>> +	}
>>>> +      if (ad.index_reg_loc != NULL)
>>>> +	{
>>>> +	  if (uses_hard_regs_p (ad.index_reg_loc, set))
>>>> +	    return true;
>>>> +	}
>>>> +    }
>>> is independent of the subreg handling, so perhaps the paradoxical subreg
>>> case should be handled separately, using simplify_subreg_regno.
> Not sure: did you have any thoughts on this?
>
No.  Sorry.  I'll think about this morning.  A bit tired for today.

>>>> +		    match_p = false;
>>>> +		    if (operands_match_p (*curr_id->operand_loc[nop],
>>>> +					  *curr_id->operand_loc[m], m_hregno))
>>>> +		      {
>>>> +			int i;
>>>> +			
>>>> +			for (i = 0; i < early_clobbered_regs_num; i++)
>>>> +			  if (early_clobbered_nops[i] == m)
>>>> +			    break;
>>>> +			/* We should reject matching of an early
>>>> +			   clobber operand if the matching operand is
>>>> +			   not dying in the insn.  */
>>>> +			if (i >= early_clobbered_regs_num
>>> Why not simply use operands m's early_clobber field?
>> Ok.  Fixed.
>>>> +			    || operand_reg[nop] == NULL_RTX
>>>> +			    || (find_regno_note (curr_insn, REG_DEAD,
>>>> +						 REGNO (operand_reg[nop]))
>>>> +				!= NULL_RTX))
>>>> +			  match_p = true;
>>> ...although I don't really understand this condition.  If the two
>>> operands are the same value X, then X must die here whatever the
>>> notes say.  So I assume this is coping with a case where the operands
>>> are different but still match.  If so, could you give an example?
>> I remember I saw such insn but I don't remember details.
>>> Matched earlyclobbers explicitly guarantee that the earlyclobber doesn't
>>> apply to the matched input operand; the earlyclobber only applies to
>>> other input operands.  So I'd have expected it was those operands
>>> that might need reloading rather than this one.
>>>
>>> E.g. if X occurs three times, twice in a matched earlyclobber pair
>>> and once as an independent operand, it's the latter operand that would
>>> need reloading.
>> Yes, I know.
> But in that case I don't understand the condition.  If we have:
>
>    (set (reg X) (... (reg X) ...))
>
> (which is the kind of thing operands_match_p is testing for)
> then there is no requirement for a REG_DEAD note for X.
> But it's still OK for two Xs form an earlyclobber pair.
>
I think I added this for some purpose.  It might be IRA assigning 
wrongly the same hard register in situation.

(set (reg X1) (... (reg X2) ...)) REG_UNUSED:X1)
and X2 lives below.

I think it is better not to change this code.  I promise I'll remove it 
on LRA branch to find why I added.

>>>> +			  /* If we didn't already win, we can reload
>>>> +			     constants via force_const_mem, and other
>>>> +			     MEMs by reloading the address like for
>>>> +			     'o'.  */
>>>> +			  if (CONST_POOL_OK_P (mode, op) || MEM_P (op))
>>>> +			    badop = false;
>>> It seems a bit inconsistent to treat a spilled pseudo whose address
>>> might well need reloading as a win, while not treating existing MEMs
>>> whose addresses need reloading as a win.
>> Well, probability of reloading address of spilled pseudo is very small
>> on most targets but reloading for MEM in this case is real. So I see it
>> logical.
> OK, that's probably true. :-)
>
>>>> +	      if (! no_regs_p)
>>>> +		reload_nregs
>>>> +		  += ira_reg_class_max_nregs[this_alternative][mode];
>>> I wasn't sure why we counted this even in the "const_to_mem && constmmeok"
>>> and "MEM_P (op) && offmemok" cases from:
>>> 	      /* We prefer to reload pseudos over reloading other
>>> 		 things, since such reloads may be able to be
>>> 		 eliminated later.  So bump REJECT in other cases.
>>> 		 Don't do this in the case where we are forcing a
>>> 		 constant into memory and it will then win since we
>>> 		 don't want to have a different alternative match
>>> 		 then.	*/
>>> 	      if (! (REG_P (op)
>>> 		     && REGNO (op) >= FIRST_PSEUDO_REGISTER)
>>> 		  && ! (const_to_mem && constmemok)
>>> 		  /* We can reload the address instead of memory (so
>>> 		     do not punish it).	 It is preferable to do to
>>> 		     avoid cycling in some cases.  */
>>> 		  && ! (MEM_P (op) && offmemok))
>>> 		reject += 2;
>> I think constmemok is obvious.  It is not a reload, it just putting
>> constant in the constant pool.  We should not punish it as no additional
>> insns are generated.
>>
>> There is a comment for offmemok case.  I think it describes it.
>> Apparently it was a fix for LRA cycling.  I don't remember details. To
>> restore them, I need to remove the code and to try it on many targets.
>> I guess, it would take 3-4 days.  But I removed this as it does not
>> affect x86/x86-64.
> Sorry, my comment wasn't as clear as it should have been.  I think:
>
> 	      /* We prefer to reload pseudos over reloading other
> 		 things, since such reloads may be able to be
> 		 eliminated later.  So bump REJECT in other cases.
> 		 Don't do this in the case where we are forcing a
> 		 constant into memory and it will then win since we
> 		 don't want to have a different alternative match
> 		 then.	*/
> 	      if (! (REG_P (op)
> 		     && REGNO (op) >= FIRST_PSEUDO_REGISTER)
> 		  && ! (const_to_mem && constmemok)
> 		  /* We can reload the address instead of memory (so
> 		     do not punish it).	 It is preferable to do to
> 		     avoid cycling in some cases.  */
> 		  && ! (MEM_P (op) && offmemok))
> 		reject += 2;
>
> is fine.  The point I was trying to make is that this code (correctly IMO)
> ignores register reload costs if the operand won't be reloaded into
> a register.  The operand won't be reloaded into a register if:
>
> 	(const_to_mem && constmemok)
>       || (MEM_P (op) && offmemok)
>
> However:
>
> 	      if (! no_regs_p)
> 		reload_nregs
> 		  += ira_reg_class_max_nregs[this_alternative][mode];
>
> seems to count the register reload cost regardless of whether
> the operand would be reloaded into a register.  It felt like
> it ought to be something like:
>
> 	      if (!((const_to_mem && const_mem_ok)
> 		    || (MEM_P (op) && offmemok)))
> 		{
> 		  /* We prefer to reload pseudos over reloading other
> 		     things, since such reloads may be able to be
> 		     eliminated later.  So bump REJECT in other cases.  */
> 		  if (! (REG_P (op)
> 			 && REGNO (op) >= FIRST_PSEUDO_REGISTER)
> 		    reject += 2;
>
> 		  if (! no_regs_p)
> 		    reload_nregs
> 		      += ira_reg_class_max_nregs[this_alternative][mode];
> 		}
>
>
It has sense.  I fixed it using your variant.
Vladimir Makarov Oct. 19, 2012, 5:14 a.m. UTC | #18
On 12-10-17 7:24 AM, Richard Sandiford wrote:
> Thanks for all the updates.
>
> Vladimir Makarov <vmakarov@redhat.com> writes:
>>>> +	  /* index * scale + disp => new base + index * scale  */
>>>> +	  enum reg_class cl = base_reg_class (mode, as, SCRATCH, SCRATCH);
>>>> +
>>>> +	  lra_assert (INDEX_REG_CLASS != NO_REGS);
>>>> +	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "disp");
>>>> +	  lra_assert (GET_CODE (*addr_loc) == PLUS);
>>>> +	  lra_emit_move (new_reg, *ad.disp_loc);
>>>> +	  if (CONSTANT_P (XEXP (*addr_loc, 1)))
>>>> +	    XEXP (*addr_loc, 1) = XEXP (*addr_loc, 0);
>>>> +	  XEXP (*addr_loc, 0) = new_reg;
>>> The canonical form is (plus (mult ...) (reg)) rather than
>>> (plus (reg) (mult ...)), but it looks like we create the latter.
>>> I realise you try both forms here:
>> It might happen because equiv substitution in LRA.
>>>> +	  /* Some targets like ARM, accept address operands in
>>>> +	     specific order -- try exchange them if necessary.	*/
>>>> +	  if (! valid_address_p (mode, *addr_loc, as))
>>>> +	    {
>>>> +	      exchange_plus_ops (*addr_loc);
>>>> +	      if (! valid_address_p (mode, *addr_loc, as))
>>>> +		exchange_plus_ops (*addr_loc);
>>>> +	    }
>>> but I think we should try the canonical form first.  And I'd prefer it
>>> if we didn't try the other form at all, especially in 4.8.  It isn't
>>> really the backend's job to reject non-canonical rtl.  This might well
>>> be another case where some targets need a (hopefully small) tweak in
>>> order to play by the rules.
>>>
>>> Also, I suppose this section of code feeds back to my question on
>>> Wednesday about the distinction that LRA seems to make between the
>>> compile-time constant in:
>>>
>>>     (plus (reg X1) (const_int Y1))
>>>
>>> and the link-time constant in:
>>>
>>>     (plus (reg X2) (symbol_ref Y2))
>>>
>>> It looked like extract_address_regs classified X1 as a base register and
>>> X2 as an index register.  The difference between the two constants has
>>> no run-time significance though, and I think we should handle both X1
>>> and X2 as base registers (as I think reload does).
>>>
>>> I think the path above would then be specific to scaled indices.
>>> In the original address the "complex" index must come first and the
>>> displacement second.  In the modified address, the index would stay
>>> first and the new base register would be second.  More below.
>> As I wrote above the problem is also in that equiv substitution can
>> create non-canonical forms.
> Right.  Just in case there's a misunderstanding: I'm not complaining
> about these routines internally using forms that are noncanonical
> (which could happen because of equiv substitution, like you say).
> I just think that what we eventually try to validate should be canonical.
> In a way it's similar to how the simplify-rtx.c routines work.
>
> If there are targets that only accept noncanonical rtl (which is after
> all just a specific type of invalid rtl), they need to be fixed.
Agree.  In order not to forget to fix targets I am removing operand 
exchange.
>>>> +      /* base + scale * index + disp => new base + scale * index  */
>>>> +      new_reg = base_plus_disp_to_reg (mode, as, &ad);
>>>> +      *addr_loc = gen_rtx_PLUS (Pmode, new_reg, *ad.index_loc);
>>>> +      if (! valid_address_p (mode, *addr_loc, as))
>>>> +	{
>>>> +	  /* Some targets like ARM, accept address operands in
>>>> +	     specific order -- try exchange them if necessary.	*/
>>>> +	  exchange_plus_ops (*addr_loc);
>>>> +	  if (! valid_address_p (mode, *addr_loc, as))
>>>> +	    exchange_plus_ops (*addr_loc);
>>>> +	}
>>> Same comment as above about canonical rtl.  Here we can have two
>>> registers -- in which case the base should come first -- or a more
>>> complex index -- in which case the index should come first.
>>>
>>> We should be able to pass both rtxes to simplify_gen_binary (PLUS, ...),
>>> with the operands in either order, and let it take care of the details.
>>> Using simplify_gen_binary would help with the earlier index+disp case too.
>> Equiv substitution can create non-canonical forms.  There are 2 approaches:
>> o have a code for dealing with non-canonical forms (equiv substitution,
>> target stupidity)
>> o always support canonical forms and require them from targets.
>>
>> I decided to use the 1st variant but I am reconsidering it.  I'll try to
>> fix before inclusion.  But I am not sure I have time for this.  All
>> these changes makes LRA unstable. In fact, I've just found that changes
>> I already made so far resulted in 2 SPEC2000 tests broken although GCC
>> testsuite and bootstrap looks good.
> OK.  I'm happy to try fixing the noncanonical thing.
>
>>>> +  /* If this is post-increment, first copy the location to the reload reg.  */
>>>> +  if (post && real_in != result)
>>>> +    emit_insn (gen_move_insn (result, real_in));
>>> Nit, but real_in != result can never be true AIUI, and I was confused how
>>> the code could be correct in that case.  Maybe just remove it, or make
>>> it an assert?
>> No, it might be true:
>>
>> real_in = in == value ? incloc : in;
>> ...
>> if (cond)
>>     result = incloc;
>> else
>>     result = ...
>>
>> if (post && real_in != result)
>>
>> So it is true if in==value && cond
> Sorry, what I meant was that cond is "! post && REG_P (incloc)":
>
>    if (! post && REG_P (incloc))
>      result = incloc;
>    else
>      result = lra_create_new_reg (GET_MODE (value), value, new_rclass,
> 				 "INC/DEC result");
>
> so it can never be true in the "post" case quoted above.
>
>
Fixed.
diff mbox

Patch

Index: lra-spills.c
===================================================================
--- lra-spills.c	(revision 0)
+++ lra-spills.c	(working copy)
@@ -0,0 +1,601 @@ 
+/* Change pseudos by memory.
+   Copyright (C) 2010, 2011, 2012
+   Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.	If not see
+<http://www.gnu.org/licenses/>.	 */
+
+
+/* This file contains code for a pass to change spilled pseudos into
+   memory.
+
+   The pass creates necessary stack slots and assign spilled pseudos
+   to the stack slots in following way:
+
+   for all spilled pseudos P most frequently used first do
+     for all stack slots S do
+       if P doesn't conflict with pseudos assigned to S then
+	 assign S to P and goto to the next pseudo process
+       end
+     end
+     create new stack slot S and assign P to S
+   end
+ 
+   The actual algorithm is bit more complicated because of different
+   pseudo sizes.
+
+   After that the code changes spilled pseudos (except ones created
+   from scratches) by corresponding stack slot memory in RTL.
+
+   If at least one stack slot was created, we need to run more passes
+   because we have new addresses which should be checked and because
+   the old address displacements might change and address constraints
+   (or insn memory constraints) might be not satisfied any more.
+
+   For some targets, the pass can spill some pseudos into hard
+   registers of different class (usually into vector registers)
+   instead of spilling them into memory if it is possible and
+   profitable.	Spilling GENERAL_REGS pseudo into SSE registers for
+   modern Intel x86/x86-64 processors is an example of such
+   optimization.  And this is actually recommended by Intel
+   optimization guide.
+
+   The file also contains code for final change of pseudos on hard
+   regs correspondingly assigned to them.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "insn-config.h"
+#include "recog.h"
+#include "output.h"
+#include "regs.h"
+#include "hard-reg-set.h"
+#include "flags.h"
+#include "function.h"
+#include "expr.h"
+#include "basic-block.h"
+#include "except.h"
+#include "timevar.h"
+#include "target.h"
+#include "lra-int.h"
+#include "ira.h"
+#include "df.h"
+
+
+/* Max regno at the start of the pass.	*/
+static int regs_num;
+
+/* Map spilled regno -> hard regno used instead of memory for
+   spilling.  */
+static rtx *spill_hard_reg;
+
+/* The structure describes stack slot of a spilled pseudo.  */
+struct pseudo_slot
+{
+  /* Number (0, 1, ...) of the stack slot to which given pseudo
+     belongs.  */
+  int slot_num;
+  /* First or next slot with the same slot number.  */
+  struct pseudo_slot *next, *first;
+  /* Memory representing the spilled pseudo.  */
+  rtx mem;
+};
+
+/* The stack slots for each spilled pseudo.  Indexed by regnos.	 */
+static struct pseudo_slot *pseudo_slots;
+
+/* The structure describes a stack slot which can be used for several
+   spilled pseudos.  */
+struct slot
+{
+  /* First pseudo with given stack slot.  */
+  int regno;
+  /* Hard reg into which the slot pseudos are spilled.	The value is
+     negative for pseudos spilled into memory.	*/
+  int hard_regno;
+  /* Memory representing the all stack slot.  It can be different from
+     memory representing a pseudo belonging to give stack slot because
+     pseudo can be placed in a part of the corresponding stack slot.
+     The value is NULL for pseudos spilled into a hard reg.  */
+  rtx mem;
+  /* Combined live ranges of all pseudos belonging to given slot.  It
+     is used to figure out that a new spilled pseudo can use given
+     stack slot.  */
+  lra_live_range_t live_ranges;
+};
+
+/* Array containing info about the stack slots.	 The array element is
+   indexed by the stack slot number in the range [0..slost_num).  */
+static struct slot *slots;
+/* The number of the stack slots currently existing.  */
+static int slots_num;
+
+/* Set up memory of the spilled pseudo I.  The function can allocate
+   the corresponding stack slot if it is not done yet.	*/
+static void
+assign_mem_slot (int i)
+{
+  rtx x = NULL_RTX;
+  enum machine_mode mode = GET_MODE (regno_reg_rtx[i]);
+  unsigned int inherent_size = PSEUDO_REGNO_BYTES (i);
+  unsigned int inherent_align = GET_MODE_ALIGNMENT (mode);
+  unsigned int max_ref_width = GET_MODE_SIZE (lra_reg_info[i].biggest_mode);
+  unsigned int total_size = MAX (inherent_size, max_ref_width);
+  unsigned int min_align = max_ref_width * BITS_PER_UNIT;
+  int adjust = 0;
+
+  lra_assert (regno_reg_rtx[i] != NULL_RTX && REG_P (regno_reg_rtx[i])
+	      && lra_reg_info[i].nrefs != 0 && reg_renumber[i] < 0);
+  
+  x = slots[pseudo_slots[i].slot_num].mem;
+  
+  if (x)
+    ;
+  /* Each pseudo has an inherent size which comes from its own mode,
+     and a total size which provides room for paradoxical subregs
+     which refer to the pseudo reg in wider modes.
+     
+     We can use a slot already allocated if it provides both enough
+     inherent space and enough total space.  Otherwise, we allocate a
+     new slot, making sure that it has no less inherent space, and no
+     less total space, then the previous slot.	*/
+  else
+    {
+      rtx stack_slot;
+
+      /* No known place to spill from => no slot to reuse.  */
+      x = assign_stack_local (mode, total_size,
+			      min_align > inherent_align
+			      || total_size > inherent_size ? -1 : 0);
+      x = lra_eliminate_regs_1 (x, GET_MODE (x), false, false, true);
+      stack_slot = x;
+      /* Cancel the big-endian correction done in assign_stack_local.
+	 Get the address of the beginning of the slot.	This is so we
+	 can do a big-endian correction unconditionally below.	*/
+      if (BYTES_BIG_ENDIAN)
+	{
+	  adjust = inherent_size - total_size;
+	  if (adjust)
+	    stack_slot
+	      = adjust_address_nv (x,
+				   mode_for_size (total_size * BITS_PER_UNIT,
+						  MODE_INT, 1),
+				   adjust);
+	}
+      slots[pseudo_slots[i].slot_num].mem = stack_slot;
+    }
+      
+  /* On a big endian machine, the "address" of the slot is the address
+     of the low part that fits its inherent mode.  */
+  if (BYTES_BIG_ENDIAN && inherent_size < total_size)
+    adjust += (total_size - inherent_size);
+  
+  /* If we have any adjustment to make, or if the stack slot is the
+     wrong mode, make a new stack slot.	 */
+  x = adjust_address_nv (x, GET_MODE (regno_reg_rtx[i]), adjust);
+  
+  /* Set all of the memory attributes as appropriate for a spill.  */
+  set_mem_attrs_for_spill (x);
+  pseudo_slots[i].mem = x;
+}
+
+/* Sort pseudos according their usage frequencies.  */
+static int
+regno_freq_compare (const void *v1p, const void *v2p)
+{
+  const int regno1 = *(const int *) v1p;
+  const int regno2 = *(const int *) v2p;
+  int diff;
+
+  if ((diff = lra_reg_info[regno2].freq - lra_reg_info[regno1].freq) != 0)
+    return diff;
+  return regno1 - regno2;
+}
+
+/* Redefine STACK_GROWS_DOWNWARD in terms of 0 or 1.  */
+#ifdef STACK_GROWS_DOWNWARD
+# undef STACK_GROWS_DOWNWARD
+# define STACK_GROWS_DOWNWARD 1
+#else
+# define STACK_GROWS_DOWNWARD 0
+#endif
+
+/* Sort pseudos according their slot numbers putting ones with smaller
+   numbers first, or last when the frame pointer is not needed.	 So
+   pseudos with the first slot will be finally addressed with smaller
+   address displacement.  */
+static int
+pseudo_reg_slot_compare (const void *v1p, const void *v2p)
+{
+  const int regno1 = *(const int *) v1p;
+  const int regno2 = *(const int *) v2p;
+  int diff, slot_num1, slot_num2;
+  int total_size1, total_size2;
+
+  slot_num1 = pseudo_slots[regno1].slot_num;
+  slot_num2 = pseudo_slots[regno2].slot_num;
+  if ((diff = slot_num1 - slot_num2) != 0)
+    return (frame_pointer_needed
+	    || !FRAME_GROWS_DOWNWARD == STACK_GROWS_DOWNWARD ? diff : -diff);
+  total_size1 = MAX (PSEUDO_REGNO_BYTES (regno1),
+		     GET_MODE_SIZE (lra_reg_info[regno1].biggest_mode));
+  total_size2 = MAX (PSEUDO_REGNO_BYTES (regno2),
+		     GET_MODE_SIZE (lra_reg_info[regno2].biggest_mode));
+  if ((diff = total_size2 - total_size1) != 0)
+    return diff;
+  return regno1 - regno2;
+}
+
+/* Assign spill hard registers to N pseudos in PSEUDO_REGNOS.  Put the
+   pseudos which did not get a spill hard register at the beginning of
+   array PSEUDO_REGNOS.	 Return the number of such pseudos.  */
+static int
+assign_spill_hard_regs (int *pseudo_regnos, int n)
+{
+  int i, k, p, regno, res, spill_class_size, hard_regno, nr;
+  enum reg_class rclass, spill_class;
+  enum machine_mode mode;
+  lra_live_range_t r;
+  rtx insn, set;
+  basic_block bb;
+  HARD_REG_SET conflict_hard_regs;
+  bitmap_head ok_insn_bitmap;
+  bitmap set_jump_crosses = regstat_get_setjmp_crosses ();
+  /* Hard registers which can not be used for any purpose at given
+     program point because they are unallocatable or already allocated
+     for other pseudos.	 */ 
+  HARD_REG_SET *reserved_hard_regs;
+
+  if (! lra_reg_spill_p)
+    return n;
+  /* Set up reserved hard regs for every program point.	 */
+  reserved_hard_regs = (HARD_REG_SET *) xmalloc (sizeof (HARD_REG_SET)
+						 * lra_live_max_point);
+  for (p = 0; p < lra_live_max_point; p++)
+    COPY_HARD_REG_SET (reserved_hard_regs[p], lra_no_alloc_regs);
+  for (i = FIRST_PSEUDO_REGISTER; i < regs_num; i++)
+    if (lra_reg_info[i].nrefs != 0
+	&& (hard_regno = lra_get_regno_hard_regno (i)) >= 0)
+      for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
+	for (p = r->start; p <= r->finish; p++)
+	  lra_add_hard_reg_set (hard_regno, lra_reg_info[i].biggest_mode,
+				&reserved_hard_regs[p]);
+  bitmap_initialize (&ok_insn_bitmap, &reg_obstack);
+  FOR_EACH_BB (bb)
+    FOR_BB_INSNS (bb, insn)
+      if (DEBUG_INSN_P (insn)
+	  || ((set = single_set (insn)) != NULL_RTX
+	      && REG_P (SET_SRC (set)) && REG_P (SET_DEST (set))))
+	bitmap_set_bit (&ok_insn_bitmap, INSN_UID (insn));
+  for (res = i = 0; i < n; i++)
+    {
+      regno = pseudo_regnos[i];
+      rclass = lra_get_allocno_class (regno);
+      if (bitmap_bit_p (set_jump_crosses, regno)
+	  || (spill_class = targetm.spill_class (rclass)) == NO_REGS
+	  || (targetm.spill_class_mode (rclass, spill_class,
+					PSEUDO_REGNO_MODE (regno))
+	      != PSEUDO_REGNO_MODE (regno))
+	  || bitmap_intersect_compl_p (&lra_reg_info[regno].insn_bitmap,
+				       &ok_insn_bitmap))
+	{
+	  pseudo_regnos[res++] = regno;
+	  continue;
+	}
+      COPY_HARD_REG_SET (conflict_hard_regs,
+			 lra_reg_info[regno].conflict_hard_regs);
+      for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
+	for (p = r->start; p <= r->finish; p++)
+	  IOR_HARD_REG_SET (conflict_hard_regs, reserved_hard_regs[p]);
+      spill_class = targetm.spill_class (lra_get_allocno_class (regno));
+      lra_assert (spill_class != NO_REGS);
+      spill_class_size = ira_class_hard_regs_num[spill_class];
+      mode = lra_reg_info[regno].biggest_mode;
+      for (k = 0; k < spill_class_size; k++)
+	{
+	  hard_regno = ira_class_hard_regs[spill_class][k];
+	  if (! lra_hard_reg_set_intersection_p (hard_regno, mode,
+						 conflict_hard_regs))
+	    break;
+	}
+      if (k >= spill_class_size)
+	{
+	   /* There is no available regs -- assign memory later.  */
+	  pseudo_regnos[res++] = regno;
+	  continue;
+	}
+      if (lra_dump_file != NULL)
+	fprintf (lra_dump_file, "  Spill r%d into hr%d\n", regno, hard_regno);
+      /* Update reserved_hard_regs.  */
+      for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
+	for (p = r->start; p <= r->finish; p++)
+	  lra_add_hard_reg_set (hard_regno, lra_reg_info[regno].biggest_mode,
+				&reserved_hard_regs[p]);
+      spill_hard_reg[regno]
+	= gen_raw_REG (PSEUDO_REGNO_MODE (regno), hard_regno);
+      for (nr = 0;
+	   nr < hard_regno_nregs[hard_regno][lra_reg_info[regno].biggest_mode];
+	   nr++);
+      df_set_regs_ever_live (hard_regno + nr, true);
+    }
+  bitmap_clear (&ok_insn_bitmap);
+  free (reserved_hard_regs);
+  return res;
+}
+
+/* Add pseudo REGNO to slot SLOT_NUM.  */
+static void
+add_pseudo_to_slot (int regno, int slot_num)
+{
+  struct pseudo_slot *first;
+
+  if (slots[slot_num].regno < 0)
+    {
+      /* It is the first pseudo in the slot.  */
+      slots[slot_num].regno = regno;
+      pseudo_slots[regno].first = &pseudo_slots[regno];
+      pseudo_slots[regno].next = NULL;
+    }
+  else
+    {
+      first = pseudo_slots[regno].first = &pseudo_slots[slots[slot_num].regno];
+      pseudo_slots[regno].next = pseudo_slots[slots[slot_num].regno].next;
+      first->next = &pseudo_slots[regno];
+    }
+  pseudo_slots[regno].mem = NULL_RTX;
+  pseudo_slots[regno].slot_num = slot_num;
+  slots[slot_num].live_ranges
+    = lra_merge_live_ranges (slots[slot_num].live_ranges,
+			     lra_copy_live_range_list
+			     (lra_reg_info[regno].live_ranges));
+}
+
+/* Assign stack slot numbers to pseudos in array PSEUDO_REGNOS of
+   length N.  Sort pseudos in PSEUDO_REGNOS for subsequent assigning
+   memory stack slots.	*/
+static void
+assign_stack_slot_num_and_sort_pseudos (int *pseudo_regnos, int n)
+{
+  int i, j, regno;
+
+  slots_num = 0;
+  /* Assign stack slot numbers to spilled pseudos, use smaller numbers
+     for most frequently used pseudos.	*/
+  for (i = 0; i < n; i++)
+    {
+      regno = pseudo_regnos[i];
+      if (! flag_ira_share_spill_slots)
+	j = slots_num;
+      else
+	{
+	  for (j = 0; j < slots_num; j++)
+	    if (slots[j].hard_regno < 0
+		&& ! (lra_intersected_live_ranges_p
+		      (slots[j].live_ranges,
+		       lra_reg_info[regno].live_ranges)))
+	      break;
+	}
+      if (j >= slots_num)
+	{
+	  /* New slot.	*/
+	  slots[j].live_ranges = NULL;
+	  slots[j].regno = slots[j].hard_regno = -1;
+	  slots[j].mem = NULL_RTX;
+	  slots_num++;
+	}
+      add_pseudo_to_slot (regno, j);
+    }
+  /* Sort regnos according to their slot numbers.  */
+  qsort (pseudo_regnos, n, sizeof (int), pseudo_reg_slot_compare);
+}
+
+/* Recursively process LOC in INSN and change spilled pseudos to the
+   corresponding memory or spilled hard reg.  Ignore spilled pseudos
+   created from the scratches.	*/
+static bool
+remove_pseudos (rtx *loc, rtx insn)
+{
+  int i;
+  bool res;
+  rtx hard_reg;
+  const char *fmt;
+  enum rtx_code code;
+
+  if (*loc == NULL_RTX)
+    return false;
+  code = GET_CODE (*loc);
+  if (code == REG && (i = REGNO (*loc)) >= FIRST_PSEUDO_REGISTER
+      && lra_get_regno_hard_regno (i) < 0
+      /* We do not want to assign memory for former scratches because
+	 it might result in an address reload for some targets.	 In
+	 any case we transform such pseudos not getting hard registers
+	 into scratches back.  */
+      && ! lra_former_scratch_p (i))
+    {
+      hard_reg = spill_hard_reg[i];
+      *loc = copy_rtx (hard_reg != NULL_RTX ? hard_reg : pseudo_slots[i].mem);
+      return true;
+    }
+
+  res = false;
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    {
+      if (fmt[i] == 'e')
+	res = remove_pseudos (&XEXP (*loc, i), insn) || res;
+      else if (fmt[i] == 'E')
+	{
+	  int j;
+
+	  for (j = XVECLEN (*loc, i) - 1; j >= 0; j--)
+	    res = remove_pseudos (&XVECEXP (*loc, i, j), insn) || res;
+	}
+    }
+  return res;
+}
+
+/* Convert spilled pseudos into their stack slots or spill hard regs,
+   put insns to process on the constraint stack (that is all insns in
+   which pseudos were changed to memory or spill hard regs).   */
+static void
+spill_pseudos (void)
+{
+  basic_block bb;
+  rtx insn;
+  int i;
+  bitmap_head spilled_pseudos, changed_insns;
+
+  bitmap_initialize (&spilled_pseudos, &reg_obstack);
+  bitmap_initialize (&changed_insns, &reg_obstack);
+  for (i = FIRST_PSEUDO_REGISTER; i < regs_num; i++)
+    {
+      if (lra_reg_info[i].nrefs != 0 && lra_get_regno_hard_regno (i) < 0
+	  && ! lra_former_scratch_p (i))
+	{
+	  bitmap_set_bit (&spilled_pseudos, i);
+	  bitmap_ior_into (&changed_insns, &lra_reg_info[i].insn_bitmap);
+	}
+    }
+  FOR_EACH_BB (bb)
+    {
+      FOR_BB_INSNS (bb, insn)
+	if (bitmap_bit_p (&changed_insns, INSN_UID (insn)))
+	  {
+	    remove_pseudos (&PATTERN (insn), insn);
+	    if (lra_dump_file != NULL)
+	      fprintf (lra_dump_file,
+		       "Changing spilled pseudos to memory in insn #%u\n",
+		       INSN_UID (insn));
+	    lra_push_insn (insn);
+	    if (lra_reg_spill_p || targetm.different_addr_displacement_p ())
+	      lra_set_used_insn_alternative (insn, -1);
+	  }
+      bitmap_and_compl_into (DF_LR_IN (bb), &spilled_pseudos);
+      bitmap_and_compl_into (DF_LR_OUT (bb), &spilled_pseudos);
+    }
+  bitmap_clear (&spilled_pseudos);
+  bitmap_clear (&changed_insns);
+}
+
+/* Return true if we need to change some pseudos into memory.  */
+bool
+lra_need_for_spills_p (void)
+{
+  int i; max_regno = max_reg_num ();
+
+  for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
+    if (lra_reg_info[i].nrefs != 0 && lra_get_regno_hard_regno (i) < 0
+	&& ! lra_former_scratch_p (i))
+      return true;
+  return false;
+}
+
+/* Change spilled pseudos into memory or spill hard regs.  The
+   function put changed insns on the constraint stack (these insns
+   will be considered on the next constraint pass).  The changed insns
+   are all insns in which pseudos were changed.	 */
+void
+lra_spill (void)
+{
+  int i, n, curr_regno;
+  int *pseudo_regnos;
+
+  regs_num = max_reg_num ();
+  spill_hard_reg = (rtx *) xmalloc (sizeof (rtx) * regs_num);
+  pseudo_regnos = (int *) xmalloc (sizeof (int) * regs_num);
+  for (n = 0, i = FIRST_PSEUDO_REGISTER; i < regs_num; i++)
+    if (lra_reg_info[i].nrefs != 0 && lra_get_regno_hard_regno (i) < 0
+	/* We do not want to assign memory for former scratches.  */
+	&& ! lra_former_scratch_p (i))
+      {
+	spill_hard_reg[i] = NULL_RTX;
+	pseudo_regnos[n++] = i;
+      }
+  lra_assert (n > 0);
+  pseudo_slots = (struct pseudo_slot *) xmalloc (sizeof (struct pseudo_slot)
+						 * regs_num);
+  slots = (struct slot *) xmalloc (sizeof (struct slot) * regs_num);
+  /* Sort regnos according their usage frequencies.  */
+  qsort (pseudo_regnos, n, sizeof (int), regno_freq_compare);
+  n = assign_spill_hard_regs (pseudo_regnos, n);
+  assign_stack_slot_num_and_sort_pseudos (pseudo_regnos, n);
+  for (i = 0; i < n; i++)
+    if (pseudo_slots[pseudo_regnos[i]].mem == NULL_RTX)
+      assign_mem_slot (pseudo_regnos[i]);
+  if (lra_dump_file != NULL)
+    {
+      for (i = 0; i < slots_num; i++)
+	{
+	  fprintf (lra_dump_file, "  Slot %d regnos (width = %d):", i,
+		   GET_MODE_SIZE (GET_MODE (slots[i].mem)));
+	  for (curr_regno = slots[i].regno;;
+	       curr_regno = pseudo_slots[curr_regno].next - pseudo_slots)
+	    {
+	      fprintf (lra_dump_file, "	 %d", curr_regno);
+	      if (pseudo_slots[curr_regno].next == NULL)
+		break;
+	    }
+	  fprintf (lra_dump_file, "\n");
+	}
+    }
+  spill_pseudos ();
+  free (slots);
+  free (pseudo_slots);
+  free (pseudo_regnos);
+}
+
+/* Final change of pseudos got hard registers into the corresponding
+   hard registers.  */
+void
+lra_hard_reg_substitution (void)
+{
+  int i, hard_regno;
+  basic_block bb;
+  rtx insn;
+  int max_regno = max_reg_num ();
+
+  for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
+    if (lra_reg_info[i].nrefs != 0
+	&& (hard_regno = lra_get_regno_hard_regno (i)) >= 0)
+      SET_REGNO (regno_reg_rtx[i], hard_regno);
+  FOR_EACH_BB (bb)
+    FOR_BB_INSNS (bb, insn)
+      if (INSN_P (insn))
+	{
+	  lra_insn_recog_data_t id;
+	  bool insn_change_p = false;
+
+	  id = lra_get_insn_recog_data (insn);
+	  for (i = id->insn_static_data->n_operands - 1; i >= 0; i--)
+	    {
+	      rtx op = *id->operand_loc[i];
+
+	      if (GET_CODE (op) == SUBREG && REG_P (SUBREG_REG (op)))
+		{
+		  lra_assert (REGNO (SUBREG_REG (op)) < FIRST_PSEUDO_REGISTER);
+		  alter_subreg (id->operand_loc[i], ! DEBUG_INSN_P (insn));
+		  lra_update_dup (id, i);
+		  insn_change_p = true;
+		}
+	    }
+	  if (insn_change_p)
+	    lra_update_operator_dups (id);
+	}
+}
Index: lra-coalesce.c
===================================================================
--- lra-coalesce.c	(revision 0)
+++ lra-coalesce.c	(working copy)
@@ -0,0 +1,375 @@ 
+/* Coalesce spilled pseudos.
+   Copyright (C) 2010, 2011, 2012
+   Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.	If not see
+<http://www.gnu.org/licenses/>.	 */
+
+
+/* This file contains a pass making some simple RTL code
+   transformations by coalescing pseudos to remove some move insns.
+
+   Spilling pseudos in LRA can create memory-memory moves.  We should
+   remove potential memory-memory moves before the next constraint
+   pass because the constraint pass will generate additional insns for
+   such moves and all these insns will be hard to remove afterwards.
+
+   Here we coalesce only spilled pseudos.  Coalescing non-spilled
+   pseudos (with different hard regs) might result in spilling
+   additional pseudos because of possible conflicts with other
+   non-spilled pseudos and, as a consequence, in more constraint
+   passes and even LRA infinite cycling.  Trivial the same hard
+   register moves will be removed by subsequent compiler passes.
+
+   We don't coalesce bound pseudos.  It complicates LRA code a lot
+   without visible generated code improvement.
+
+   The pseudo live-ranges are used to find conflicting pseudos during
+   coalescing.
+
+   Most frequently executed moves is tried to be coalesced first.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "insn-config.h"
+#include "recog.h"
+#include "output.h"
+#include "regs.h"
+#include "hard-reg-set.h"
+#include "flags.h"
+#include "function.h"
+#include "expr.h"
+#include "basic-block.h"
+#include "except.h"
+#include "timevar.h"
+#include "ira.h"
+#include "lra-int.h"
+#include "df.h"
+
+/* Arrays whose elements represent the first and the next pseudo
+   (regno) in the coalesced pseudos group to which given pseudo (its
+   regno is the index) belongs.	 The next of the last pseudo in the
+   group refers to the first pseudo in the group, in other words the
+   group is represented by a cyclic list.  */
+static int *first_coalesced_pseudo, *next_coalesced_pseudo;
+
+/* The function is used to sort moves according to their execution
+   frequencies.	 */
+static int
+move_freq_compare_func (const void *v1p, const void *v2p)
+{
+  rtx mv1 = *(const rtx *) v1p;
+  rtx mv2 = *(const rtx *) v2p;
+  int pri1, pri2;
+  
+  pri1 = BLOCK_FOR_INSN (mv1)->frequency;
+  pri2 = BLOCK_FOR_INSN (mv2)->frequency;
+  if (pri2 - pri1)
+    return pri2 - pri1;
+
+  /* If frequencies are equal, sort by moves, so that the results of
+     qsort leave nothing to chance.  */
+  return (int) INSN_UID (mv1) - (int) INSN_UID (mv2);
+}
+
+/* Pseudos which go away after coalescing.  */
+static bitmap_head coalesced_pseudos_bitmap;
+
+/* Merge two sets of coalesced pseudos given correspondingly by
+   pseudos REGNO1 and REGNO2 (more accurately merging REGNO2 group
+   into REGNO1 group).	Set up COALESCED_PSEUDOS_BITMAP.  */
+static void
+merge_pseudos (int regno1, int regno2)
+{
+  int regno, first, first2, last, next;
+
+  first = first_coalesced_pseudo[regno1];
+  if ((first2 = first_coalesced_pseudo[regno2]) == first)
+    return;
+  for (last = regno2, regno = next_coalesced_pseudo[regno2];;
+       regno = next_coalesced_pseudo[regno])
+    {
+      first_coalesced_pseudo[regno] = first;
+      bitmap_set_bit (&coalesced_pseudos_bitmap, regno);
+      if (regno == regno2)
+	break;
+      last = regno;
+    }
+  next = next_coalesced_pseudo[first];
+  next_coalesced_pseudo[first] = regno2;
+  next_coalesced_pseudo[last] = next;
+  lra_reg_info[first].live_ranges
+    = (lra_merge_live_ranges
+       (lra_reg_info[first].live_ranges,
+	lra_copy_live_range_list (lra_reg_info[first2].live_ranges)));
+  if (GET_MODE_SIZE (lra_reg_info[first].biggest_mode)
+      < GET_MODE_SIZE (lra_reg_info[first2].biggest_mode))
+    lra_reg_info[first].biggest_mode = lra_reg_info[first2].biggest_mode;
+}
+
+/* Change pseudos in *LOC on their coalescing group
+   representatives.  */
+static bool
+substitute (rtx *loc)
+{
+  int i, regno;
+  const char *fmt;
+  enum rtx_code code;
+  bool res;
+
+  if (*loc == NULL_RTX)
+    return false;
+  code = GET_CODE (*loc);
+  if (code == REG)
+    {
+      regno = REGNO (*loc);
+      if (regno < FIRST_PSEUDO_REGISTER
+	  || first_coalesced_pseudo[regno] == regno)
+	return false;
+      *loc = regno_reg_rtx[first_coalesced_pseudo[regno]];
+      return true;
+    }
+
+  res = false;
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    {
+      if (fmt[i] == 'e')
+	{
+	  if (substitute (&XEXP (*loc, i)))
+	    res = true;
+	}
+      else if (fmt[i] == 'E')
+	{
+	  int j;
+
+	  for (j = XVECLEN (*loc, i) - 1; j >= 0; j--)
+	    if (substitute (&XVECEXP (*loc, i, j)))
+	      res = true;
+	}
+    }
+  return res;
+}
+
+/* The current iteration (1, 2, ...) of the coalescing pass.  */
+int lra_coalesce_iter;
+
+/* Return true if the move involving REGNO1 and REGNO2 is a potential
+   memory-memory move.	*/
+static bool
+mem_move_p (int regno1, int regno2)
+{
+  return reg_renumber[regno1] < 0 && reg_renumber[regno2] < 0;
+}
+
+
+/* Pseudos which go away after coalescing and pseudos used instead of
+   the removed pseudos.	 */
+static bitmap_head removed_pseudos_bitmap, used_pseudos_bitmap;
+
+/* Set up REMOVED_PSEUDOS_BITMAP and USED_PSEUDOS_BITMAP, and update
+   LR_BITMAP (a BB live info bitmap).  */
+static void
+update_live_info (bitmap lr_bitmap)
+{
+  unsigned int j;
+  bitmap_iterator bi;
+
+  bitmap_clear (&removed_pseudos_bitmap);
+  bitmap_clear (&used_pseudos_bitmap);
+  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
+			    FIRST_PSEUDO_REGISTER, j, bi)
+    {
+      bitmap_set_bit (&removed_pseudos_bitmap, j);
+      bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
+    }
+  if (! bitmap_empty_p (&removed_pseudos_bitmap))
+    {
+      bitmap_and_compl_into (lr_bitmap, &removed_pseudos_bitmap);
+      bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
+    }
+}
+
+/* The major function for aggressive pseudo coalescing of moves only
+   if the both pseudos were spilled and not bound.  */
+bool
+lra_coalesce (void)
+{
+  basic_block bb;
+  rtx mv, set, insn, next, *sorted_moves;
+  int i, n, mv_num, sregno, dregno, restore_regno;
+  unsigned int regno;
+  int coalesced_moves;
+  int max_regno = max_reg_num ();
+  bitmap_head involved_insns_bitmap, split_origin_bitmap;
+  bitmap_iterator bi;
+
+  if (lra_dump_file != NULL)
+    fprintf (lra_dump_file,
+	     "\n********** Pseudos coalescing #%d: **********\n\n",
+	     ++lra_coalesce_iter);
+  first_coalesced_pseudo = (int *) xmalloc (sizeof (int) * max_regno);
+  next_coalesced_pseudo = (int *) xmalloc (sizeof (int) * max_regno);
+  for (i = 0; i < max_regno; i++)
+    first_coalesced_pseudo[i] = next_coalesced_pseudo[i] = i;
+  sorted_moves = (rtx *) xmalloc (get_max_uid () * sizeof (rtx));
+  mv_num = 0;
+  /* Collect pseudos whose live ranges were split.  */
+  bitmap_initialize (&split_origin_bitmap, &reg_obstack);
+  EXECUTE_IF_SET_IN_BITMAP (&lra_split_pseudos, 0, regno, bi)
+    if ((restore_regno = lra_reg_info[regno].restore_regno) >= 0)
+      bitmap_set_bit (&split_origin_bitmap, restore_regno);
+  /* Collect moves.  */
+  coalesced_moves = 0;
+  FOR_EACH_BB (bb)
+    {
+      FOR_BB_INSNS_SAFE (bb, insn, next)
+	if (INSN_P (insn)
+	    && (set = single_set (insn)) != NULL_RTX
+	    && REG_P (SET_DEST (set)) && REG_P (SET_SRC (set))
+	    && (sregno = REGNO (SET_SRC (set))) >= FIRST_PSEUDO_REGISTER
+	    && (dregno = REGNO (SET_DEST (set))) >= FIRST_PSEUDO_REGISTER
+	    && mem_move_p (sregno, dregno)
+	    /* Don't coalesce inheritance pseudos because spilled
+	       inheritance pseudos will be removed in subsequent 'undo
+	       inheritance' pass.  */
+	    && lra_reg_info[sregno].restore_regno < 0
+	    && lra_reg_info[dregno].restore_regno < 0
+	    /* We undo splits for spilled pseudos whose live ranges
+	       were split.  So don't coalesce them, it is not
+	       necessary and the undo transformations would be
+	       wrong.  */
+	    && ! bitmap_bit_p (&split_origin_bitmap, sregno)
+	    && ! bitmap_bit_p (&split_origin_bitmap, dregno)
+	    && ! side_effects_p (set)
+	    /* Don't coalesces bound pseudos.  Bound pseudos has own
+	       rules for finding live ranges.  It is hard to maintain
+	       this info with coalescing and it is not worth to do
+	       it.  */
+	    && ! bitmap_bit_p (&lra_bound_pseudos, sregno)
+	    && ! bitmap_bit_p (&lra_bound_pseudos, dregno)
+	    /* We don't want to coalesce regnos with equivalences,
+	       at least without updating this info.  */
+	    && ira_reg_equiv[sregno].constant == NULL_RTX
+	    && ira_reg_equiv[sregno].memory == NULL_RTX
+	    && ira_reg_equiv[sregno].invariant == NULL_RTX
+	    && ira_reg_equiv[dregno].constant == NULL_RTX
+	    && ira_reg_equiv[dregno].memory == NULL_RTX
+	    && ira_reg_equiv[dregno].invariant == NULL_RTX
+	    && !(lra_intersected_live_ranges_p
+		 (lra_reg_info[sregno].live_ranges,
+		  lra_reg_info[dregno].live_ranges)))
+	  sorted_moves[mv_num++] = insn;
+    }
+  bitmap_clear (&removed_pseudos_bitmap);
+  qsort (sorted_moves, mv_num, sizeof (rtx), move_freq_compare_func);
+  /* Coalesced copies, most frequently executed first.	*/
+  bitmap_initialize (&coalesced_pseudos_bitmap, &reg_obstack);
+  bitmap_initialize (&involved_insns_bitmap, &reg_obstack);
+  for (; mv_num != 0;)
+    {
+      for (i = 0; i < mv_num; i++)
+	{
+	  mv = sorted_moves[i];
+	  set = single_set (mv);
+	  lra_assert (set != NULL && REG_P (SET_SRC (set))
+		      && REG_P (SET_DEST (set)));
+	  sregno = REGNO (SET_SRC (set));
+	  dregno = REGNO (SET_DEST (set));
+	  if (! lra_intersected_live_ranges_p
+		(lra_reg_info[first_coalesced_pseudo[sregno]].live_ranges,
+		 lra_reg_info[first_coalesced_pseudo[dregno]].live_ranges))
+	    {
+	      coalesced_moves++;
+	      if (lra_dump_file != NULL)
+		fprintf
+		  (lra_dump_file,
+		   "	  Coalescing move %i:r%d(%d)-r%d(%d) (freq=%d)\n",
+		   INSN_UID (mv), sregno, ORIGINAL_REGNO (SET_SRC (set)),
+		   dregno, ORIGINAL_REGNO (SET_DEST (set)),
+		   BLOCK_FOR_INSN (mv)->frequency);
+	      bitmap_ior_into (&involved_insns_bitmap,
+			       &lra_reg_info[sregno].insn_bitmap);
+	      bitmap_ior_into (&involved_insns_bitmap,
+			       &lra_reg_info[dregno].insn_bitmap);
+	      merge_pseudos (sregno, dregno);
+	      i++;
+	      break;
+	    }
+	}
+      /* Collect the rest of copies.  */
+      for (n = 0; i < mv_num; i++)
+	{
+	  mv = sorted_moves[i];
+	  set = single_set (mv);
+	  lra_assert (set != NULL && REG_P (SET_SRC (set))
+		      && REG_P (SET_DEST (set)));
+	  sregno = REGNO (SET_SRC (set));
+	  dregno = REGNO (SET_DEST (set));
+	  if (first_coalesced_pseudo[sregno] != first_coalesced_pseudo[dregno])
+	    sorted_moves[n++] = mv;
+	  else if (lra_dump_file != NULL)
+	    {
+	      coalesced_moves++;
+	      fprintf
+		(lra_dump_file, "      Coalescing move %i:r%d-r%d (freq=%d)\n",
+		 INSN_UID (mv), sregno, dregno,
+		 BLOCK_FOR_INSN (mv)->frequency);
+	    }
+	}
+      mv_num = n;
+    }
+  bitmap_initialize (&removed_pseudos_bitmap, &reg_obstack);
+  bitmap_initialize (&used_pseudos_bitmap, &reg_obstack);
+  FOR_EACH_BB (bb)
+    {
+      update_live_info (DF_LR_IN (bb));
+      update_live_info (DF_LR_OUT (bb));
+      FOR_BB_INSNS_SAFE (bb, insn, next)
+	if (INSN_P (insn)
+	    && bitmap_bit_p (&involved_insns_bitmap, INSN_UID (insn)))
+	  {
+	    if (! substitute (&insn))
+	      continue;
+	    lra_update_insn_regno_info (insn);
+	    if ((set = single_set (insn)) != NULL_RTX
+		&& REG_P (SET_DEST (set)) && REG_P (SET_SRC (set))
+		&& REGNO (SET_SRC (set)) == REGNO (SET_DEST (set))
+		&& ! side_effects_p (set))
+	      {
+		/* Coalesced move.  */
+		if (lra_dump_file != NULL)
+		  fprintf (lra_dump_file, "	 Removing move %i (freq=%d)\n",
+			 INSN_UID (insn), BLOCK_FOR_INSN (insn)->frequency);
+		lra_set_insn_deleted (insn);
+	      }
+	  }
+    }
+  bitmap_clear (&removed_pseudos_bitmap);
+  bitmap_clear (&used_pseudos_bitmap);
+  bitmap_clear (&involved_insns_bitmap);
+  bitmap_clear (&coalesced_pseudos_bitmap);
+  if (lra_dump_file != NULL && coalesced_moves != 0)
+    fprintf (lra_dump_file, "Coalesced Moves = %d\n", coalesced_moves);
+  free (sorted_moves);
+  free (next_coalesced_pseudo);
+  free (first_coalesced_pseudo);
+  return coalesced_moves != 0;
+}
Index: lra-assigns.c
===================================================================
--- lra-assigns.c	(revision 0)
+++ lra-assigns.c	(working copy)
@@ -0,0 +1,1316 @@ 
+/* Assign reload pseudos.
+   Copyright (C) 2010, 2011, 2012
+   Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.	If not see
+<http://www.gnu.org/licenses/>.	 */
+
+
+/* This file contains a pass mostly assigning hard registers to reload
+   pseudos.  There is no any RTL code transformation on this pass.
+
+   Reload pseudos get what they need (usually) hard registers in
+   anyway possibly by spilling non-reload pseudos and by assignment
+   reload pseudos with smallest number of available hard registers
+   first.
+
+   If reload pseudos can get hard registers only through spilling
+   other pseudos, we choose what pseudos to spill taking into account
+   how given reload pseudo benefits and also how other reload pseudos
+   not assigned yet benefit too (see function spill_for).
+
+   Non-reload pseudos can get hard registers too if it is possible and
+   improves the code.  It might be possible because of spilling
+   non-reload pseudos on given pass.
+
+   Bound pseudos always get the same hard register if any.
+
+   We try to assign hard registers processing pseudos by threads.  The
+   thread contains reload and inheritance pseudos connected by copies
+   (move insns).  It improves the chance to get the same hard register
+   to pseudos in the thread and, as the result, to remove some move
+   insns.
+
+   When we assign hard register to a pseudo, we decrease the cost of
+   the hard registers for corresponding pseudos connected by copies.
+ 
+   If two hard registers are equally good for assigning the pseudo
+   with hard register cost point of view, we prefer a hard register in
+   smaller register bank.  By default, there is only one register
+   bank.  A target can define register banks by hook
+   register_bank. For example, x86-64 has a few register banks: hard
+   regs with and without REX prefixes are in different banks.  It
+   permits to generate smaller code as insns without REX prefix are
+   shorter.
+
+   If a few hard registers are still equally good for the assignment,
+   we choose the least used hard register.  It is called leveling and
+   may be profitable for some targets.
+
+   Only insns with changed allocation pseudos are processed on the
+   next constraint pass.
+
+   The pseudo live-ranges are used to find conflicting pseudos.	 */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "hard-reg-set.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "target.h"
+#include "insn-config.h"
+#include "recog.h"
+#include "output.h"
+#include "regs.h"
+#include "function.h"
+#include "expr.h"
+#include "basic-block.h"
+#include "except.h"
+#include "df.h"
+#include "ira.h"
+#include "sparseset.h"
+#include "lra-int.h"
+
+/* Array containing corresponding values of function
+   lra_get_allocno_class.  It is used to speed up the code.  */
+static enum reg_class *regno_allocno_class_array;
+
+/* Info about pseudo used during the assignment pass.  Thread is a set
+   of connected reload and inheritance pseudos with the same set of
+   available hard reg set.  Thread is a pseudo itself for other
+   cases.  */
+struct regno_assign_info
+{
+  /* First/next pseudo of the same thread.  */
+  int first, next;
+  /* Frequency of the thread (execution frequency of only reload
+     pseudos in the thread when the thread contains a reload pseudo).
+     Defined only for the first thread pseudo.	*/
+  int freq;
+};
+
+/* Map regno to the corresponding regno assignment info.  */
+static struct regno_assign_info *regno_assign_info;
+
+/* Process a pseudo copy with execution frequency COPY_FREQ connecting
+   REGNO1 and REGNO2 to form threads.  */
+static void
+process_copy_to_form_thread (int regno1, int regno2, int copy_freq)
+{
+  int last, regno1_first, regno2_first;
+
+  lra_assert (regno1 >= lra_constraint_new_regno_start
+	      && regno2 >= lra_constraint_new_regno_start);
+  regno1_first = regno_assign_info[regno1].first;
+  regno2_first = regno_assign_info[regno2].first;
+  if (regno1_first != regno2_first)
+    {
+      for (last = regno2_first;
+	   regno_assign_info[last].next >= 0;
+	   last = regno_assign_info[last].next)
+	regno_assign_info[last].first = regno1_first;
+      regno_assign_info[last].next = regno_assign_info[regno1_first].next;
+      regno_assign_info[regno1_first].first = regno2_first;
+      regno_assign_info[regno1_first].freq
+	+= regno_assign_info[regno2_first].freq;
+    }
+  regno_assign_info[regno1_first].freq -= 2 * copy_freq;
+  lra_assert (regno_assign_info[regno1_first].freq >= 0);
+}
+
+/* Initialize REGNO_ASSIGN_INFO and form threads.  */
+static void
+init_regno_assign_info (void)
+{
+  int i, regno1, regno2;
+  lra_copy_t cp;
+
+  regno_assign_info
+    = (struct regno_assign_info *) xmalloc (sizeof (struct regno_assign_info)
+				       * max_reg_num ());
+  for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+    {
+      regno_assign_info[i].first = i;
+      regno_assign_info[i].next = -1;
+      regno_assign_info[i].freq = lra_reg_info[i].freq;
+    }
+  /* Form the threads.	*/
+  for (i = 0; (cp = lra_get_copy (i)) != NULL; i++)
+    if ((regno1 = cp->regno1) >= lra_constraint_new_regno_start
+	&& (regno2 = cp->regno2) >= lra_constraint_new_regno_start
+	&& reg_renumber[regno1] < 0 && lra_reg_info[regno1].nrefs != 0
+	&& reg_renumber[regno2] < 0 && lra_reg_info[regno2].nrefs != 0
+	&& (ira_class_hard_regs_num[regno_allocno_class_array[regno1]]
+	    == ira_class_hard_regs_num[regno_allocno_class_array[regno2]]))
+      process_copy_to_form_thread (regno1, regno2, cp->freq);
+}
+
+/* Free REGNO_ASSIGN_INFO.  */
+static void
+finish_regno_assign_info (void)
+{
+  free (regno_assign_info);
+}
+
+/* The function is used to sort *reload* and *inheritance* pseudos to
+   try to assign them hard registers.  We put pseudos from the same
+   thread always nearby.  */
+static int
+reload_pseudo_compare_func (const void *v1p, const void *v2p)
+{
+  int r1 = *(const int *) v1p, r2 = *(const int *) v2p;
+  enum reg_class cl1 = regno_allocno_class_array[r1];
+  enum reg_class cl2 = regno_allocno_class_array[r2];
+  int diff;
+  
+  lra_assert (r1 >= lra_constraint_new_regno_start
+	      && r2 >= lra_constraint_new_regno_start);
+  
+  /* Prefer to assign reload registers with smaller classes first to
+     guarantee assignment to all reload registers.  */
+  if ((diff = (ira_class_hard_regs_num[cl1]
+	       - ira_class_hard_regs_num[cl2])) != 0)
+    return diff;
+  if ((diff = (regno_assign_info[regno_assign_info[r2].first].freq
+	       - regno_assign_info[regno_assign_info[r1].first].freq)) != 0)
+    return diff;
+  /* Put pseudos from the thread nearby.  */
+  if ((diff = regno_assign_info[r1].first - regno_assign_info[r2].first) != 0)
+    return diff;
+  /* If regs are equally good, sort by their numbers, so that the
+     results of qsort leave nothing to chance.	*/
+  return r1 - r2;
+}
+
+/* The function is used to sort *non-reload* pseudos to try to assign
+   them hard registers.	 The order calculation is simpler than in the
+   previous function and based on the pseudo frequency usage.  */
+static int
+pseudo_compare_func (const void *v1p, const void *v2p)
+{
+  int r1 = *(const int *) v1p, r2 = *(const int *) v2p;
+  int diff;
+
+  /* Prefer to assign more frequently used registers first.  */
+  if ((diff = lra_reg_info[r2].freq - lra_reg_info[r1].freq) != 0)
+    return diff;
+  
+  /* If regs are equally good, sort by their numbers, so that the
+     results of qsort leave nothing to chance.	*/
+  return r1 - r2;
+}
+
+/* Map: program point -> bitmap of all pseudos living at the point and
+   assigned to hard registers.	*/
+static bitmap_head *live_hard_reg_pseudos;
+
+/* reg_renumber corresponding to pseudos marked in
+   live_hard_reg_pseudos.  reg_renumber might be not matched to
+   live_hard_reg_pseudos but live_pseudos_reg_renumber always reflects
+   live_hard_reg_pseudos.  */
+static int *live_pseudos_reg_renumber;
+
+/* Sparseset used to calculate living hard reg pseudos for some program
+   point range.	 */
+static sparseset live_range_hard_reg_pseudos;
+
+/* Sparseset used to calculate living reload pseudos for some program
+   point range.	 */
+static sparseset live_range_reload_pseudos;
+
+/* Allocate and initialize the data about living pseudos at program
+   points.  */
+static void
+init_lives (void)
+{
+  int i;
+
+  live_range_hard_reg_pseudos = sparseset_alloc (max_reg_num ());
+  live_range_reload_pseudos = sparseset_alloc (max_reg_num ());
+  live_hard_reg_pseudos = (bitmap_head *) xmalloc (sizeof (bitmap_head)
+						   * lra_live_max_point);
+  for (i = 0; i < lra_live_max_point; i++)
+    bitmap_initialize (&live_hard_reg_pseudos[i], &reg_obstack);
+  live_pseudos_reg_renumber
+    = (int *) xmalloc (sizeof (int) * max_reg_num ());
+  for (i = 0; i < max_reg_num (); i++)
+    live_pseudos_reg_renumber[i] = -1;
+}
+
+/* Free the data about living pseudos at program points.  */
+static void
+finish_lives (void)
+{
+  int i;
+
+  sparseset_free (live_range_hard_reg_pseudos);
+  sparseset_free (live_range_reload_pseudos);
+  for (i = 0; i < lra_live_max_point; i++)
+    bitmap_clear (&live_hard_reg_pseudos[i]);
+  free (live_hard_reg_pseudos);
+  free (live_pseudos_reg_renumber);
+}
+
+/* Update LIVE_HARD_REG_PSEUDOS and LIVE_PSEUDOS_REG_RENUMBER by
+   pseudo REGNO assignment or by the pseudo spilling if FREE_P.	 */
+static void
+update_lives (int regno, bool free_p)
+{
+  int p;
+  lra_live_range_t r;
+
+  if (reg_renumber[regno] < 0)
+    return;
+  live_pseudos_reg_renumber[regno] = free_p ? -1 : reg_renumber[regno];
+  for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
+    {
+      for (p = r->start; p <= r->finish; p++)
+	if (free_p)
+	  bitmap_clear_bit (&live_hard_reg_pseudos[p], regno);
+	else
+	  bitmap_set_bit (&live_hard_reg_pseudos[p], regno);
+    }
+}
+
+/* Sparseset used to calculate reload pseudos conflicting with a given
+   pseudo when we are trying to find a hard register for the given
+   pseudo.  */
+static sparseset conflict_reload_and_inheritance_pseudos;
+
+/* Map: program point -> bitmap of all reload and inheritance pseudos
+   living at the point.	 */
+static bitmap_head *live_reload_and_inheritance_pseudos;
+
+/* Allocate and initialize data about living reload pseudos at any
+   given program point.	 */
+static void
+init_live_reload_and_inheritance_pseudos (void)
+{
+  int i, p;
+  lra_live_range_t r;
+  
+  conflict_reload_and_inheritance_pseudos = sparseset_alloc (max_reg_num ());
+  live_reload_and_inheritance_pseudos
+    = (bitmap_head *) xmalloc (sizeof (bitmap_head) * lra_live_max_point);
+  for (p = 0; p < lra_live_max_point; p++)
+    bitmap_initialize (&live_reload_and_inheritance_pseudos[p], &reg_obstack);
+  for (i = lra_constraint_new_regno_start; i < max_reg_num (); i++)
+    for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
+      for (p = r->start; p <= r->finish; p++)
+	bitmap_set_bit (&live_reload_and_inheritance_pseudos[p], i);
+}
+
+/* Finalize data about living reload pseudos at any given program
+   point.  */
+static void
+finish_live_reload_and_inheritance_pseudos (void)
+{
+  int p;
+
+  sparseset_free (conflict_reload_and_inheritance_pseudos);
+  for (p = 0; p < lra_live_max_point; p++)
+    bitmap_clear (&live_reload_and_inheritance_pseudos[p]);
+  free (live_reload_and_inheritance_pseudos);
+}
+
+/* The value used to check that cost of given hard reg is really
+   defined currently.  */
+static int curr_hard_regno_costs_check = 0;
+/* Array used to check that cost of the corresponding hard reg (the
+   array element index) is really defined currently.  */
+static int hard_regno_costs_check[FIRST_PSEUDO_REGISTER];
+/* The current costs of allocation of hard regs.  Defined only if the
+   value of the corresponding element of the previous array is equal to
+   CURR_HARD_REGNO_COSTS_CHECK.	 */
+static int hard_regno_costs[FIRST_PSEUDO_REGISTER];
+
+/* Find and return best (or TRY_ONLY_HARD_REGNO) free hard register
+   for pseudo REGNO.  In the failure case, return a negative number.
+   Return through *COST the cost of usage of the hard register for the
+   pseudo.  Best free hard register has smallest cost of usage for
+   REGNO or smallest register bank if the cost is the same.  */
+static int
+find_hard_regno_for (int regno, int *cost, int try_only_hard_regno)
+{
+  HARD_REG_SET conflict_set;
+  int best_cost = INT_MAX, best_bank = INT_MAX, best_usage = INT_MAX;
+  lra_live_range_t r;
+  int p, i, j, rclass_size, best_hard_regno, bank, hard_regno;
+  int hr, conflict_hr, nregs;
+  enum machine_mode biggest_mode;
+  unsigned int k, conflict_regno;
+  int val, biggest_nregs, nregs_diff;
+  enum reg_class rclass;
+  bitmap_iterator bi;
+  bool *rclass_intersect_p;
+  HARD_REG_SET impossible_start_hard_regs;
+
+  COPY_HARD_REG_SET (conflict_set, lra_no_alloc_regs);
+  rclass = regno_allocno_class_array[regno];
+  rclass_intersect_p = ira_reg_classes_intersect_p[rclass];
+  curr_hard_regno_costs_check++;
+  sparseset_clear (conflict_reload_and_inheritance_pseudos);
+  sparseset_clear (live_range_hard_reg_pseudos);
+  IOR_HARD_REG_SET (conflict_set, lra_reg_info[regno].conflict_hard_regs);
+  biggest_mode = lra_reg_info[regno].biggest_mode;
+  for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
+    {
+      EXECUTE_IF_SET_IN_BITMAP (&live_hard_reg_pseudos[r->start], 0, k, bi)
+	if (rclass_intersect_p[regno_allocno_class_array[k]])
+	  sparseset_set_bit (live_range_hard_reg_pseudos, k);
+      EXECUTE_IF_SET_IN_BITMAP (&live_reload_and_inheritance_pseudos[r->start],
+				0, k, bi)
+	if (lra_reg_info[k].preferred_hard_regno1 >= 0
+	    && live_pseudos_reg_renumber[k] < 0
+	    && rclass_intersect_p[regno_allocno_class_array[k]])
+	  sparseset_set_bit (conflict_reload_and_inheritance_pseudos, k);
+      for (p = r->start + 1; p <= r->finish; p++)
+	{
+	  lra_live_range_t r2;
+	  
+	  for (r2 = lra_start_point_ranges[p];
+	       r2 != NULL;
+	       r2 = r2->start_next)
+	    {
+	      if (r2->regno >= lra_constraint_new_regno_start
+		  && lra_reg_info[r2->regno].preferred_hard_regno1 >= 0
+		  && live_pseudos_reg_renumber[r2->regno] < 0
+		  && rclass_intersect_p[regno_allocno_class_array[r2->regno]])
+		sparseset_set_bit (conflict_reload_and_inheritance_pseudos,
+				   r2->regno);
+	      if (live_pseudos_reg_renumber[r2->regno] >= 0
+		  && rclass_intersect_p[regno_allocno_class_array[r2->regno]])
+		sparseset_set_bit (live_range_hard_reg_pseudos, r2->regno);
+	    }
+	}
+    }
+  if ((hard_regno = lra_reg_info[regno].preferred_hard_regno1) >= 0)
+    {
+      if (hard_regno_costs_check[hard_regno] != curr_hard_regno_costs_check)
+	hard_regno_costs[hard_regno] = 0;
+      hard_regno_costs_check[hard_regno] = curr_hard_regno_costs_check;
+      hard_regno_costs[hard_regno]
+	-= lra_reg_info[regno].preferred_hard_regno_profit1;
+      if ((hard_regno = lra_reg_info[regno].preferred_hard_regno2) >= 0)
+	{
+	  if (hard_regno_costs_check[hard_regno]
+	      != curr_hard_regno_costs_check)
+	    hard_regno_costs[hard_regno] = 0;
+	  hard_regno_costs_check[hard_regno] = curr_hard_regno_costs_check;
+	  hard_regno_costs[hard_regno]
+	    -= lra_reg_info[regno].preferred_hard_regno_profit2;
+	}
+    }
+#ifdef STACK_REGS
+  if (lra_reg_info[regno].no_stack_p)
+    for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
+      SET_HARD_REG_BIT (conflict_set, i);
+#endif
+  sparseset_clear_bit (conflict_reload_and_inheritance_pseudos, regno);
+  val = lra_reg_info[regno].val;
+  CLEAR_HARD_REG_SET (impossible_start_hard_regs);
+  EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
+    if (val == lra_reg_info[conflict_regno].val)
+      {
+	conflict_hr = live_pseudos_reg_renumber[conflict_regno];
+	nregs = (hard_regno_nregs[conflict_hr]
+		 [lra_reg_info[conflict_regno].biggest_mode]);
+	/* Remember about multi-register pseudos.  For example, 2 hard
+	   register pseudos can start on the same hard register but can
+	   not start on HR and HR+1/HR-1.  */ 
+	for (hr = conflict_hr + 1;
+	     hr < FIRST_PSEUDO_REGISTER && hr < conflict_hr + nregs;
+	     hr++)
+	  SET_HARD_REG_BIT (impossible_start_hard_regs, hr);
+	for (hr = conflict_hr - 1;
+	     hr >= 0 && hr + hard_regno_nregs[hr][biggest_mode] > conflict_hr;
+	     hr--)
+	  SET_HARD_REG_BIT (impossible_start_hard_regs, hr);
+      }
+    else
+      {
+	lra_add_hard_reg_set (live_pseudos_reg_renumber[conflict_regno],
+			      lra_reg_info[conflict_regno].biggest_mode,
+			      &conflict_set);
+	if (hard_reg_set_subset_p (reg_class_contents[rclass],
+				   conflict_set))
+	  return -1;
+      }
+  EXECUTE_IF_SET_IN_SPARSESET (conflict_reload_and_inheritance_pseudos,
+			       conflict_regno)
+    if (val != lra_reg_info[conflict_regno].val)
+      {
+	lra_assert (live_pseudos_reg_renumber[conflict_regno] < 0);
+	if ((hard_regno
+	     = lra_reg_info[conflict_regno].preferred_hard_regno1) >= 0)
+	  {
+	    if (hard_regno_costs_check[hard_regno]
+		!= curr_hard_regno_costs_check)
+	      hard_regno_costs[hard_regno] = 0;
+	    hard_regno_costs_check[hard_regno] = curr_hard_regno_costs_check;
+	    hard_regno_costs[hard_regno]
+	      += lra_reg_info[conflict_regno].preferred_hard_regno_profit1;
+	    if ((hard_regno
+		 = lra_reg_info[conflict_regno].preferred_hard_regno2) >= 0)
+	      {
+		if (hard_regno_costs_check[hard_regno]
+		    != curr_hard_regno_costs_check)
+		  hard_regno_costs[hard_regno] = 0;
+		hard_regno_costs_check[hard_regno]
+		  = curr_hard_regno_costs_check;
+		hard_regno_costs[hard_regno]
+		  += lra_reg_info[conflict_regno].preferred_hard_regno_profit2;
+	      }
+	  }
+      }
+  /* That is important for allocation of multi-word pseudos.  */
+  IOR_COMPL_HARD_REG_SET (conflict_set, reg_class_contents[rclass]);
+  lra_assert (rclass != NO_REGS);
+  rclass_size = ira_class_hard_regs_num[rclass];
+  best_hard_regno = -1;
+  hard_regno = ira_class_hard_regs[rclass][0];
+  biggest_nregs = hard_regno_nregs[hard_regno][biggest_mode];
+  nregs_diff = (biggest_nregs
+		- hard_regno_nregs[hard_regno][PSEUDO_REGNO_MODE (regno)]);
+  for (i = 0; i < rclass_size; i++)
+    {
+      if (try_only_hard_regno >= 0)
+	hard_regno = try_only_hard_regno;
+      else
+	hard_regno = ira_class_hard_regs[rclass][i];
+      if (! lra_hard_reg_set_intersection_p (hard_regno,
+					     PSEUDO_REGNO_MODE (regno),
+					     conflict_set)
+	  /* We can not use prohibited_class_mode_regs because it is
+	     defined not for all classes.  */
+	  && HARD_REGNO_MODE_OK (hard_regno, PSEUDO_REGNO_MODE (regno))
+	  && ! TEST_HARD_REG_BIT (impossible_start_hard_regs, hard_regno)
+	  && (nregs_diff == 0
+#ifdef WORDS_BIG_ENDIAN
+	      || (hard_regno - nregs_diff >= 0
+		  && TEST_HARD_REG_BIT (reg_class_contents[rclass],
+					hard_regno - nregs_diff))
+#else
+	      || TEST_HARD_REG_BIT (reg_class_contents[rclass],
+				    hard_regno + nregs_diff)
+#endif
+	      ))
+	{
+	  if (hard_regno_costs_check[hard_regno]
+	      != curr_hard_regno_costs_check)
+	    {
+	      hard_regno_costs_check[hard_regno] = curr_hard_regno_costs_check;
+	      hard_regno_costs[hard_regno] = 0;
+	    }
+	  for (j = 0;
+	       j < hard_regno_nregs[hard_regno][PSEUDO_REGNO_MODE (regno)];
+	       j++)
+	    if (! TEST_HARD_REG_BIT (call_used_reg_set, hard_regno + j)
+		&& ! df_regs_ever_live_p (hard_regno + j))
+	      /* It needs save restore.	 */
+	      hard_regno_costs[hard_regno]
+		+= 2 * ENTRY_BLOCK_PTR->next_bb->frequency;
+	  bank = targetm.register_bank (hard_regno);
+	  if (best_hard_regno < 0 || hard_regno_costs[hard_regno] < best_cost
+	      || (hard_regno_costs[hard_regno] == best_cost
+		  && (bank < best_bank
+		      /* Hard register usage leveling actually results
+			 in bigger code for targets with conditional
+			 execution like ARM because it reduces chance
+			 of if-conversion after LRA.  */
+		      || (! targetm.have_conditional_execution ()
+			  && bank == best_bank
+			  && best_usage > lra_hard_reg_usage[hard_regno]))))
+	    {
+	      best_hard_regno = hard_regno;
+	      best_cost = hard_regno_costs[hard_regno];
+	      best_bank = bank;
+	      best_usage = lra_hard_reg_usage[hard_regno];
+	    }
+	}
+      if (try_only_hard_regno >= 0)
+	break;
+    }
+  if (best_hard_regno >= 0)
+    *cost = best_cost - lra_reg_info[regno].freq;
+  return best_hard_regno;
+}
+
+/* Current value used for checking elements in
+   update_hard_regno_preference_check.	*/
+static int curr_update_hard_regno_preference_check;
+/* If an element value is equal to the above variable value, then the
+   corresponding regno has been processed for preference
+   propagation.	 */
+static int *update_hard_regno_preference_check;
+
+/* Update HARD_REGNO preference for pseudos connected (directly or
+   indirectly) to a pseudo with REGNO.	Use divisor DIV to the
+   corresponding copy frequency for the hard regno cost preference
+   calculation.	 The more indirectly a pseudo connected, the less the
+   cost preference.  It is achieved by increasing the divisor for each
+   next recursive level move.  */
+static void
+update_hard_regno_preference (int regno, int hard_regno, int div)
+{
+  int another_regno, cost;
+  lra_copy_t cp, next_cp;
+
+  /* Search depth 5 seems to be enough.	 */
+  if (div > (1 << 5))
+    return;
+  for (cp = lra_reg_info[regno].copies; cp != NULL; cp = next_cp)
+    {
+      if (cp->regno1 == regno)
+	{
+	  next_cp = cp->regno1_next;
+	  another_regno = cp->regno2;
+	}
+      else if (cp->regno2 == regno)
+	{
+	  next_cp = cp->regno2_next;
+	  another_regno = cp->regno1;
+	}
+      else
+	gcc_unreachable ();
+      if (reg_renumber[another_regno] < 0
+	  && (update_hard_regno_preference_check[another_regno]
+	      != curr_update_hard_regno_preference_check))
+	{
+	  update_hard_regno_preference_check[another_regno]
+	    = curr_update_hard_regno_preference_check;
+	  cost = cp->freq < div ? 1 : cp->freq / div;
+	  lra_setup_reload_pseudo_preferenced_hard_reg
+	    (another_regno, hard_regno, cost);
+	  update_hard_regno_preference (another_regno, hard_regno, div * 2);
+	}
+    }
+}
+
+/* Update REG_RENUMBER and other pseudo preferences by assignment of
+   HARD_REGNO to pseudo REGNO and print about it if PRINT_P.  */
+void
+lra_setup_reg_renumber (int regno, int hard_regno, bool print_p)
+{
+  int i, hr;
+
+  if ((hr = hard_regno) < 0)
+    hr = reg_renumber[regno];
+  reg_renumber[regno] = hard_regno;
+  lra_assert (hr >= 0);
+  for (i = 0; i < hard_regno_nregs[hr][PSEUDO_REGNO_MODE (regno)]; i++)
+    if (hard_regno < 0)
+      lra_hard_reg_usage[hr + i] -= lra_reg_info[regno].freq;
+    else
+      lra_hard_reg_usage[hr + i] += lra_reg_info[regno].freq;
+  if (print_p && lra_dump_file != NULL)
+    fprintf (lra_dump_file, "	   Assign %d to %sr%d (freq=%d)\n",
+	     reg_renumber[regno],
+	     regno < lra_constraint_new_regno_start
+	     ? ""
+	     : bitmap_bit_p (&lra_inheritance_pseudos, regno) ? "inheritance "
+	     : bitmap_bit_p (&lra_split_pseudos, regno) ? "split "
+	     : bitmap_bit_p (&lra_optional_reload_pseudos, regno)
+	     ? "optional reload ": "reload ",
+	     regno, lra_reg_info[regno].freq);
+  if (hard_regno >= 0)
+    {
+      curr_update_hard_regno_preference_check++;
+      update_hard_regno_preference (regno, hard_regno, 1);
+    }
+}
+
+/* Pseudos which should be not spilled for a particular pseudo.	 */
+static bitmap_head ignore_pseudos_bitmap;
+
+/* Bitmaps used to contain spill pseudos for given pseudo hard regno
+   and best spill pseudos for given pseudo (and best hard regno).  */
+static bitmap_head spill_pseudos_bitmap, best_spill_pseudos_bitmap;
+
+/* Current pseudo check for validity of elements in
+   TRY_HARD_REG_PSEUDOS.  */
+static int curr_pseudo_check;
+/* Array used for validity of elements in TRY_HARD_REG_PSEUDOS.	 */
+static int try_hard_reg_pseudos_check[FIRST_PSEUDO_REGISTER];
+/* Pseudos who hold given hard register at the considered points.  */
+static bitmap_head try_hard_reg_pseudos[FIRST_PSEUDO_REGISTER];
+
+/* Set up try_hard_reg_pseudos for given program point P and class
+   RCLASS.  Those are pseudos living at P and assigned to a hard
+   register of RCLASS.	In other words, those are pseudos which can be
+   spilled to assign a hard register of RCLASS to a pseudo living at
+   P.  */
+static void
+setup_try_hard_regno_pseudos (int p, enum reg_class rclass)
+{
+  int i, hard_regno;
+  enum machine_mode mode;
+  unsigned int spill_regno;
+  bitmap_iterator bi;
+
+  /* Find what pseudos could be spilled.  */
+  EXECUTE_IF_SET_IN_BITMAP (&live_hard_reg_pseudos[p], 0, spill_regno, bi)
+    {
+      mode = PSEUDO_REGNO_MODE (spill_regno);
+      if (lra_hard_reg_set_intersection_p
+	  (live_pseudos_reg_renumber[spill_regno],
+	   mode, reg_class_contents[rclass]))
+	{
+	  hard_regno = live_pseudos_reg_renumber[spill_regno];
+	  for (i = hard_regno_nregs[hard_regno][mode] - 1; i >= 0; i--)
+	    {
+	      if (try_hard_reg_pseudos_check[hard_regno + i]
+		  != curr_pseudo_check)
+		{
+		  try_hard_reg_pseudos_check[hard_regno + i]
+		    = curr_pseudo_check;
+		  bitmap_clear (&try_hard_reg_pseudos[hard_regno + i]);
+		}
+	      bitmap_set_bit (&try_hard_reg_pseudos[hard_regno + i],
+			      spill_regno);
+	    }
+	}
+    }
+}
+
+/* Assign temporarily HARD_REGNO to pseudo REGNO.  Temporary
+   assignment means that we might undo the data change.	 */
+static void
+assign_temporarily (int regno, int hard_regno)
+{
+  int p;
+  lra_live_range_t r;
+
+  for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
+    {
+      for (p = r->start; p <= r->finish; p++)
+	if (hard_regno < 0)
+	  bitmap_clear_bit (&live_hard_reg_pseudos[p], regno);
+	else
+	  bitmap_set_bit (&live_hard_reg_pseudos[p], regno);
+    }
+  live_pseudos_reg_renumber[regno] = hard_regno;
+}
+
+/* Array used for sorting reload pseudos for subsequent allocation
+   after spilling some pseudo.	*/
+static int *sorted_reload_pseudos;
+
+/* Spill some pseudos for a reload pseudo REGNO and return hard
+   register which should be used for pseudo after spilling.  The
+   function adds spilled pseudos to SPILLED_PSEUDO_BITMAP.  When we
+   choose hard register (and pseudos occupying the hard registers and
+   to be spilled), we take into account not only how REGNO will
+   benefit from the spills but also how other reload pseudos not
+   assigned to hard registers yet benefit from the spills too.	*/
+static int
+spill_for (int regno, bitmap spilled_pseudo_bitmap)
+{
+  int i, j, n, p, hard_regno, best_hard_regno, cost, best_cost, rclass_size;
+  int reload_hard_regno, reload_cost;
+  enum machine_mode mode, mode2;
+  enum reg_class rclass;
+  HARD_REG_SET spilled_hard_regs;
+  unsigned int k, spill_regno, reload_regno, uid;
+  int insn_pseudos_num, best_insn_pseudos_num;
+  lra_live_range_t r;
+  bitmap_iterator bi, bi2;
+
+  rclass = regno_allocno_class_array[regno];
+  lra_assert (reg_renumber[regno] < 0 && rclass != NO_REGS);
+  bitmap_clear (&ignore_pseudos_bitmap);
+  bitmap_clear (&best_spill_pseudos_bitmap);
+  EXECUTE_IF_SET_IN_BITMAP (&lra_reg_info[regno].insn_bitmap, 0, uid, bi)
+    {
+      struct lra_insn_reg *ir;
+      
+      for (ir = lra_get_insn_regs (uid); ir != NULL; ir = ir->next)
+	if (ir->regno >= FIRST_PSEUDO_REGISTER)
+	  bitmap_set_bit (&ignore_pseudos_bitmap, ir->regno);
+    }
+  best_hard_regno = -1;
+  best_cost = INT_MAX;
+  best_insn_pseudos_num = INT_MAX;
+  rclass_size = ira_class_hard_regs_num[rclass];
+  mode = PSEUDO_REGNO_MODE (regno);
+  curr_pseudo_check++; /* Invalidate try_hard_reg_pseudos elements.  */
+  for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
+    for (p = r->start; p <= r->finish; p++)
+      setup_try_hard_regno_pseudos (p, rclass);
+  for (i = 0; i < rclass_size; i++)
+    {
+      hard_regno = ira_class_hard_regs[rclass][i];
+      bitmap_clear (&spill_pseudos_bitmap);
+      for (j = hard_regno_nregs[hard_regno][mode] - 1; j >= 0; j--)
+	{
+	  if (try_hard_reg_pseudos_check[hard_regno + j] != curr_pseudo_check)
+	    continue;
+	  lra_assert (!bitmap_empty_p (&try_hard_reg_pseudos[hard_regno + j]));
+	  bitmap_ior_into (&spill_pseudos_bitmap,
+			   &try_hard_reg_pseudos[hard_regno + j]);
+	}
+      /* Spill pseudos.	 */
+      CLEAR_HARD_REG_SET (spilled_hard_regs);
+      EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
+	if ((int) spill_regno >= lra_constraint_new_regno_start
+	    /* ??? */
+	    && ! bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
+	    && ! bitmap_bit_p (&lra_split_pseudos, spill_regno)
+	    && ! bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno))
+	  goto fail;
+      insn_pseudos_num = 0;
+      if (lra_dump_file != NULL)
+	fprintf (lra_dump_file, "	 Trying %d:", hard_regno);
+      sparseset_clear (live_range_reload_pseudos);
+      EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
+	{
+	  if (bitmap_bit_p (&ignore_pseudos_bitmap, spill_regno))
+	    insn_pseudos_num++;
+	  mode2 = PSEUDO_REGNO_MODE (spill_regno);
+	  update_lives (spill_regno, true);
+	  if (lra_dump_file != NULL)
+	    fprintf (lra_dump_file, " spill %d(freq=%d)",
+		     spill_regno, lra_reg_info[spill_regno].freq);
+	  lra_add_hard_reg_set (reg_renumber[spill_regno], mode2,
+				&spilled_hard_regs);
+	  for (r = lra_reg_info[spill_regno].live_ranges;
+	       r != NULL;
+	       r = r->next)
+	    {
+	      EXECUTE_IF_SET_IN_BITMAP (&live_hard_reg_pseudos[r->start],
+					0, k, bi2)
+		sparseset_set_bit (live_range_hard_reg_pseudos, k);
+	      for (p = r->start + 1; p <= r->finish; p++)
+		{
+		  lra_live_range_t r2;
+		  
+		  for (r2 = lra_start_point_ranges[p];
+		       r2 != NULL;
+		       r2 = r2->start_next)
+		    if (r2->regno >= lra_constraint_new_regno_start)
+		      sparseset_set_bit (live_range_reload_pseudos, r2->regno);
+		}
+	    }
+	}
+      /* We are trying to spill a reload pseudo.  That is wrong we
+	 should assign all reload pseudos, otherwise we cannot reuse
+	 the selected alternatives.  */
+      hard_regno = find_hard_regno_for (regno, &cost, -1);
+      if (hard_regno >= 0)
+	{
+	  assign_temporarily (regno, hard_regno);
+	  n = 0;
+	  EXECUTE_IF_SET_IN_SPARSESET (live_range_reload_pseudos, reload_regno)
+	    if (live_pseudos_reg_renumber[reload_regno] < 0
+		&& (hard_reg_set_intersect_p
+		    (reg_class_contents
+		     [regno_allocno_class_array[reload_regno]],
+		     spilled_hard_regs)))
+	      sorted_reload_pseudos[n++] = reload_regno;
+	  qsort (sorted_reload_pseudos, n, sizeof (int),
+		 reload_pseudo_compare_func);
+	  for (j = 0; j < n; j++)
+	    {
+	      reload_regno = sorted_reload_pseudos[j];
+	      if (live_pseudos_reg_renumber[reload_regno] < 0
+		  && (reload_hard_regno
+		      = find_hard_regno_for (reload_regno,
+					     &reload_cost, -1)) >= 0
+		  && (lra_hard_reg_set_intersection_p
+		      (reload_hard_regno, PSEUDO_REGNO_MODE (reload_regno),
+		       spilled_hard_regs)))
+		{
+		  if (lra_dump_file != NULL)
+		    fprintf (lra_dump_file, " assign %d(cost=%d)",
+			     reload_regno, reload_cost);
+		  assign_temporarily (reload_regno, reload_hard_regno);
+		  cost += reload_cost;
+		}
+	    }
+	  EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
+	    {
+	      rtx x;
+	      
+	      cost += lra_reg_info[spill_regno].freq;
+	      if (ira_reg_equiv[spill_regno].memory != NULL
+		  || ira_reg_equiv[spill_regno].constant != NULL)
+		for (x = ira_reg_equiv[spill_regno].init_insns;
+		     x != NULL;
+		     x = XEXP (x, 1))
+		  cost -= REG_FREQ_FROM_BB (BLOCK_FOR_INSN (XEXP (x, 0)));
+	    }
+	  if (best_insn_pseudos_num > insn_pseudos_num
+	      || (best_insn_pseudos_num == insn_pseudos_num
+		  && best_cost > cost))
+	    {
+	      best_insn_pseudos_num = insn_pseudos_num;
+	      best_cost = cost;
+	      best_hard_regno = hard_regno;
+	      bitmap_copy (&best_spill_pseudos_bitmap, &spill_pseudos_bitmap);
+	      if (lra_dump_file != NULL)
+		fprintf (lra_dump_file, "	 Now best %d(cost=%d)\n",
+			 hard_regno, cost);
+	    }
+	  assign_temporarily (regno, -1);
+	  for (j = 0; j < n; j++)
+	    {
+	      reload_regno = sorted_reload_pseudos[j];
+	      if (live_pseudos_reg_renumber[reload_regno] >= 0)
+		assign_temporarily (reload_regno, -1);
+	    }
+	}
+      if (lra_dump_file != NULL)
+	fprintf (lra_dump_file, "\n");
+      /* Restore the live hard reg pseudo info for spilled pseudos.  */
+      EXECUTE_IF_SET_IN_BITMAP (&spill_pseudos_bitmap, 0, spill_regno, bi)
+	update_lives (spill_regno, false);
+    fail:
+      ;
+    }
+  /* Spill: */
+  EXECUTE_IF_SET_IN_BITMAP (&best_spill_pseudos_bitmap, 0, spill_regno, bi)
+    {
+      if (lra_dump_file != NULL)
+	fprintf (lra_dump_file, "      Spill %sr%d(hr=%d, freq=%d) for r%d\n",
+		 ((int) spill_regno < lra_constraint_new_regno_start
+		  ? ""
+		  : bitmap_bit_p (&lra_inheritance_pseudos, spill_regno)
+		  ? "inheritance "
+		  : bitmap_bit_p (&lra_split_pseudos, spill_regno)
+		  ? "split "
+		  : bitmap_bit_p (&lra_optional_reload_pseudos, spill_regno)
+		  ? "optional reload " : "reload "),
+		 spill_regno, reg_renumber[spill_regno],
+		 lra_reg_info[spill_regno].freq, regno);
+      update_lives (spill_regno, true);
+      lra_setup_reg_renumber (spill_regno, -1, false);
+    }
+  bitmap_ior_into (spilled_pseudo_bitmap, &best_spill_pseudos_bitmap);
+  return best_hard_regno;
+}
+
+/* Assign HARD_REGNO to REGNO.	*/
+static void
+assign_hard_regno (int hard_regno, int regno)
+{
+  int i;
+
+  lra_assert (hard_regno >= 0);
+  lra_setup_reg_renumber (regno, hard_regno, true);
+  update_lives (regno, false);
+  for (i = 0;
+       i < hard_regno_nregs[hard_regno][lra_reg_info[regno].biggest_mode];
+       i++)
+    df_set_regs_ever_live (hard_regno + i, true);
+}
+
+/* Array used for sorting different pseudos.  */
+static int *sorted_pseudos;
+
+/* Constraint transformation can use equivalences and they can
+   contains pseudos assigned to hard registers.	 Such equivalence
+   usage might create new conflicts of pseudos with hard registers
+   (like ones used for parameter passing or call clobbered ones) or
+   other pseudos assigned to the same hard registers.  Another very
+   rare risky transformation is restoring whole multi-register pseudo
+   when only one subreg lives and unused hard register is used already
+   for something else.
+
+   Process pseudos assigned to hard registers (most frequently used
+   first), spill if a conflict is found, and mark the spilled pseudos
+   in SPILLED_PSEUDO_BITMAP.  Set up LIVE_HARD_REG_PSEUDOS from
+   pseudos, assigned to hard registers.	 */
+static void
+setup_live_pseudos_and_spill_after_risky_transforms (bitmap
+						     spilled_pseudo_bitmap)
+{
+  int p, i, j, n, regno, hard_regno;
+  unsigned int k, conflict_regno;
+  int val;
+  HARD_REG_SET conflict_set;
+  enum machine_mode mode;
+  lra_live_range_t r;
+  bitmap_iterator bi;
+
+  for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+    if (reg_renumber[i] >= 0 && lra_reg_info[i].nrefs > 0)
+      {
+	if (lra_risky_transformations_p)
+	  sorted_pseudos[n++] = i;
+	else
+	  update_lives (i, false);
+      }
+  if (! lra_risky_transformations_p)
+    return;
+  qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func);
+  for (i = 0; i < n; i++)
+    {
+      regno = sorted_pseudos[i];
+      hard_regno = reg_renumber[regno];
+      lra_assert (hard_regno >= 0);
+      mode = lra_reg_info[regno].biggest_mode;
+      sparseset_clear (live_range_hard_reg_pseudos);
+      for (r = lra_reg_info[regno].live_ranges; r != NULL; r = r->next)
+	{
+	  EXECUTE_IF_SET_IN_BITMAP (&live_hard_reg_pseudos[r->start], 0, k, bi)
+	    sparseset_set_bit (live_range_hard_reg_pseudos, k);
+	  for (p = r->start + 1; p <= r->finish; p++)
+	    {
+	      lra_live_range_t r2;
+	      
+	      for (r2 = lra_start_point_ranges[p];
+		   r2 != NULL;
+		   r2 = r2->start_next)
+		if (live_pseudos_reg_renumber[r2->regno] >= 0)
+		  sparseset_set_bit (live_range_hard_reg_pseudos, r2->regno);
+	    }
+	}
+      COPY_HARD_REG_SET (conflict_set, lra_no_alloc_regs);
+      IOR_HARD_REG_SET (conflict_set, lra_reg_info[regno].conflict_hard_regs);
+      val = lra_reg_info[regno].val;
+      EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
+	if (val != lra_reg_info[conflict_regno].val
+	    /* If it is multi-register pseudos they should start on
+	       the same hard register.	*/
+	    || hard_regno != reg_renumber[conflict_regno])
+	  lra_add_hard_reg_set (reg_renumber[conflict_regno],
+				lra_reg_info[conflict_regno].biggest_mode,
+				&conflict_set);
+      if (! lra_hard_reg_set_intersection_p (hard_regno, mode, conflict_set))
+	{
+	  update_lives (regno, false);
+	  continue;
+	}
+      bitmap_set_bit (spilled_pseudo_bitmap, regno);
+      for (j = 0;
+	   j < hard_regno_nregs[hard_regno][PSEUDO_REGNO_MODE (regno)];
+	   j++)
+	lra_hard_reg_usage[hard_regno + j] -= lra_reg_info[regno].freq;
+      reg_renumber[regno] = -1;
+      if (lra_dump_file != NULL)
+	fprintf (lra_dump_file, "    Spill r%d after risky transformations\n",
+		 regno);
+    }
+}
+
+/* Improve allocation by assigning the same hard regno of inheritance
+   pseudos to the connected pseudos.  We need this because inheritance
+   pseudos are allocated after reload pseudos in the thread and when
+   we assign a hard register to a reload pseudo we don't know yet that
+   the connected inheritance pseudos can get the same hard register.
+   Add pseudos with changed allocation to bitmap CHANGED_PSEUDOS.  */
+static void
+improve_inheritance (bitmap changed_pseudos)
+{
+  unsigned int k;
+  int regno, another_regno, hard_regno, another_hard_regno, cost, i, n;
+  lra_copy_t cp, next_cp;
+  bitmap_iterator bi;
+
+  n = 0;
+  EXECUTE_IF_SET_IN_BITMAP (&lra_inheritance_pseudos, 0, k, bi)
+    if (reg_renumber[k] >= 0 && lra_reg_info[k].nrefs != 0)
+      sorted_pseudos[n++] = k;
+  qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func);
+  for (i = 0; i < n; i++)
+    {
+      regno = sorted_pseudos[i];
+      hard_regno = reg_renumber[regno];
+      lra_assert (hard_regno >= 0);
+      for (cp = lra_reg_info[regno].copies; cp != NULL; cp = next_cp)
+	{
+	  if (cp->regno1 == regno)
+	    {
+	      next_cp = cp->regno1_next;
+	      another_regno = cp->regno2;
+	    }
+	  else if (cp->regno2 == regno)
+	    {
+	      next_cp = cp->regno2_next;
+	      another_regno = cp->regno1;
+	    }
+	  else
+	    gcc_unreachable ();
+	  /* Don't change reload pseudo allocation.  It might have
+	     this allocation for a purpose (e.g. bound to another
+	     pseudo) and changing it can result in LRA cycling.	 */
+	  if (another_regno < lra_constraint_new_regno_start
+	      && (another_hard_regno = reg_renumber[another_regno]) >= 0
+	      && another_hard_regno != hard_regno)
+	    {
+	      if (lra_dump_file != NULL)
+		fprintf
+		  (lra_dump_file,
+		   "	Improving inheritance for %d(%d) and %d(%d)...\n",
+		   regno, hard_regno, another_regno, another_hard_regno);
+	      update_lives (another_regno, true);
+	      lra_setup_reg_renumber (another_regno, -1, false);
+	      if (hard_regno
+		  == find_hard_regno_for (another_regno, &cost, hard_regno))
+		assign_hard_regno (hard_regno, another_regno);
+	      else
+		assign_hard_regno (another_hard_regno, another_regno);
+	      bitmap_set_bit (changed_pseudos, another_regno);
+	    }
+	}
+    }
+}
+
+
+/* Bitmap finally containing all pseudos spilled on this assignment
+   pass.  */
+static bitmap_head all_spilled_pseudos;
+/* All pseudos whose allocation was changed.  */
+static bitmap_head changed_pseudo_bitmap;
+
+/* Assign hard registers to reload pseudos and other pseudos.  */
+static void
+assign_by_spills (void)
+{
+  int i, n, nfails, iter, regno, hard_regno, cost, restore_regno;
+  rtx insn;
+  basic_block bb;
+  bitmap_head changed_insns, do_not_assign_nonreload_pseudos;
+  bitmap_head non_reload_pseudos;
+  unsigned int u;
+  bitmap_iterator bi;
+
+  for (n = 0, i = lra_constraint_new_regno_start; i < max_reg_num (); i++)
+    if (reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
+	&& regno_allocno_class_array[i] != NO_REGS)
+      sorted_pseudos[n++] = i;
+  bitmap_initialize (&ignore_pseudos_bitmap, &reg_obstack);
+  bitmap_initialize (&spill_pseudos_bitmap, &reg_obstack);
+  bitmap_initialize (&best_spill_pseudos_bitmap, &reg_obstack);
+  update_hard_regno_preference_check = (int *) xmalloc (sizeof (int)
+							* max_reg_num ());
+  memset (update_hard_regno_preference_check, 0,
+	  sizeof (int) * max_reg_num ());
+  curr_update_hard_regno_preference_check = 0;
+  memset (try_hard_reg_pseudos_check, 0, sizeof (try_hard_reg_pseudos_check));
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    bitmap_initialize (&try_hard_reg_pseudos[i], &reg_obstack);
+  curr_pseudo_check = 0;
+  bitmap_initialize (&changed_insns, &reg_obstack);
+  bitmap_initialize (&non_reload_pseudos, &reg_obstack);
+  bitmap_ior (&non_reload_pseudos, &lra_inheritance_pseudos, &lra_split_pseudos);
+  bitmap_ior_into (&non_reload_pseudos, &lra_optional_reload_pseudos);
+  for (iter = 0; iter <= 1; iter++)
+    {
+      qsort (sorted_pseudos, n, sizeof (int), reload_pseudo_compare_func);
+      nfails = 0;
+      for (i = 0; i < n; i++)
+	{
+	  regno = sorted_pseudos[i];
+	  if (lra_dump_file != NULL)
+	    fprintf (lra_dump_file, "	 Assigning to %d "
+		     "(cl=%s, orig=%d, freq=%d, tfirst=%d, tfreq=%d)...\n",
+		     regno, reg_class_names[regno_allocno_class_array[regno]],
+		     ORIGINAL_REGNO (regno_reg_rtx[regno]),
+		     lra_reg_info[regno].freq, regno_assign_info[regno].first,
+		     regno_assign_info[regno_assign_info[regno].first].freq);
+	  hard_regno = find_hard_regno_for (regno, &cost, -1);
+	  if (hard_regno < 0
+	      && ! bitmap_bit_p (&non_reload_pseudos, regno))
+	    hard_regno = spill_for (regno, &all_spilled_pseudos);
+	  if (hard_regno < 0)
+	    {
+	      if (! bitmap_bit_p (&non_reload_pseudos, regno))
+		sorted_pseudos[nfails++] = regno;
+	    }
+	  else
+	    {
+	      /* Remember that reload pseudos can be spilled on the
+		 1st pass.  */
+	      bitmap_clear_bit (&all_spilled_pseudos, regno);
+	      assign_hard_regno (hard_regno, regno);
+	    }
+	}
+      if (nfails == 0)
+	break;
+      lra_assert (iter == 0);
+      /* This is a very rare event.  We can not assign a hard
+	 register to reload pseudo because the hard register was
+	 assigned to another reload pseudo on a previous
+	 assignment pass.  For x86 example, on the 1st pass we
+	 assigned CX (although another hard register could be used
+	 for this) to reload pseudo in an insn, on the 2nd pass we
+	 need CX (and only this) hard register for a new reload
+	 pseudo in the same insn.  */
+      if (lra_dump_file != NULL)
+	fprintf (lra_dump_file, "  2nd iter for reload pseudo assignments:\n");
+      for (i = 0; i < nfails; i++)
+	{
+	  if (lra_dump_file != NULL)
+	    fprintf (lra_dump_file, "	 Reload r%d assignment failure\n",
+		     sorted_pseudos[i]);
+	  bitmap_ior_into (&changed_insns,
+			   &lra_reg_info[sorted_pseudos[i]].insn_bitmap);
+	}
+      FOR_EACH_BB (bb)
+	FOR_BB_INSNS (bb, insn)
+	if (bitmap_bit_p (&changed_insns, INSN_UID (insn)))
+	  {
+	    lra_insn_recog_data_t data;
+	    struct lra_insn_reg *r;
+	      
+	    data = lra_get_insn_recog_data (insn);
+	    for (r = data->regs; r != NULL; r = r->next)
+	      {
+		regno = r->regno;
+		/* We can use inheritance pseudos in original insns
+		   (not reload ones).  */
+		if (regno < lra_constraint_new_regno_start
+		    || bitmap_bit_p (&lra_inheritance_pseudos, regno)
+		    || reg_renumber[regno] < 0)
+		  continue;
+		sorted_pseudos[nfails++] = regno;
+		if (lra_dump_file != NULL)
+		  fprintf (lra_dump_file,
+			   "	  Spill reload r%d(hr=%d, freq=%d)\n",
+			   regno, reg_renumber[regno],
+			   lra_reg_info[regno].freq);
+		update_lives (regno, true);
+		lra_setup_reg_renumber (regno, -1, false);
+	      }
+	  }
+      n = nfails;
+    }
+  improve_inheritance (&changed_pseudo_bitmap);
+  bitmap_clear (&non_reload_pseudos);
+  bitmap_clear (&changed_insns);
+  /* We should not assign to original pseudos of inheritance pseudos
+     or split pseudos if any its inheritance pseudo did not get hard
+     register or any its split pseudo was not split because undo
+     inheritance/split pass will extend live range of such inheritance
+     or split pseudos.	*/
+  bitmap_initialize (&do_not_assign_nonreload_pseudos, &reg_obstack);
+  EXECUTE_IF_SET_IN_BITMAP (&lra_inheritance_pseudos, 0, u, bi)
+    if ((restore_regno = lra_reg_info[u].restore_regno) >= 0
+	&& reg_renumber[u] < 0 && bitmap_bit_p (&lra_inheritance_pseudos, u))
+      bitmap_set_bit (&do_not_assign_nonreload_pseudos, restore_regno);
+  EXECUTE_IF_SET_IN_BITMAP (&lra_split_pseudos, 0, u, bi)
+    if ((restore_regno = lra_reg_info[u].restore_regno) >= 0
+	&& reg_renumber[u] >= 0 && bitmap_bit_p (&lra_split_pseudos, u))
+      bitmap_set_bit (&do_not_assign_nonreload_pseudos, restore_regno);
+  for (n = 0, i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+    if (((i < lra_constraint_new_regno_start
+	  && ! bitmap_bit_p (&do_not_assign_nonreload_pseudos, i))
+	 || (bitmap_bit_p (&lra_inheritance_pseudos, i)
+	     && lra_reg_info[i].restore_regno >= 0)
+	 || (bitmap_bit_p (&lra_split_pseudos, i)
+	     && lra_reg_info[i].restore_regno >= 0)
+	 || bitmap_bit_p (&lra_optional_reload_pseudos, i))
+	&& reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
+	&& regno_allocno_class_array[i] != NO_REGS)
+      sorted_pseudos[n++] = i;
+  bitmap_clear (&do_not_assign_nonreload_pseudos);
+  if (n != 0 && lra_dump_file != NULL)
+    fprintf (lra_dump_file, "  Reassing non-reload pseudos\n");
+  qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func);
+  for (i = 0; i < n; i++)
+    {
+      regno = sorted_pseudos[i];
+      hard_regno = find_hard_regno_for (regno, &cost, -1);
+      if (hard_regno >= 0)
+	{
+	  assign_hard_regno (hard_regno, regno);
+	  /* We change allocation for non-reload pseudo on this
+	     iteration -- mark the pseudo for invalidation of used
+	     alternatives of insns containing the pseudo.  */
+	  bitmap_set_bit (&changed_pseudo_bitmap, regno);
+	}
+    }
+  free (update_hard_regno_preference_check);
+  bitmap_clear (&best_spill_pseudos_bitmap);
+  bitmap_clear (&spill_pseudos_bitmap);
+  bitmap_clear (&ignore_pseudos_bitmap);
+}
+
+
+/* Entry function to assign hard registers to new reload pseudos
+   starting with LRA_CONSTRAINT_NEW_REGNO_START (by possible spilling
+   of old pseudos) and possibly to the old pseudos.  The function adds
+   what insns to process for the next constraint pass.	Those are all
+   insns who contains non-reload and non-inheritance pseudos with
+   changed allocation.
+
+   Return true if we did not spill any non-reload and non-inheritance
+   pseudos.  */
+bool
+lra_assign (void)
+{
+  int i;
+  unsigned int u;
+  bitmap_iterator bi;
+  bitmap_head insns_to_process;
+  bool no_spills_p;
+
+  init_lives ();
+  sorted_pseudos = (int *) xmalloc (sizeof (int) * max_reg_num ());
+  sorted_reload_pseudos = (int *) xmalloc (sizeof (int) * max_reg_num ());
+  regno_allocno_class_array
+    = (enum reg_class *) xmalloc (sizeof (enum reg_class) * max_reg_num ());
+  for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+    regno_allocno_class_array[i] = lra_get_allocno_class (i);
+  init_regno_assign_info ();
+  bitmap_initialize (&all_spilled_pseudos, &reg_obstack);
+  setup_live_pseudos_and_spill_after_risky_transforms (&all_spilled_pseudos);
+#ifdef ENABLE_CHECKING
+  for (i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++)
+    if (lra_reg_info[i].nrefs != 0 && reg_renumber[i] >= 0
+	&& lra_reg_info[i].call_p
+	&& lra_hard_reg_set_intersection_p (reg_renumber[i],
+					    PSEUDO_REGNO_MODE (i),
+					    call_used_reg_set))
+      gcc_unreachable ();
+#endif
+  /* Setup insns to process on the next constraint pass.  */
+  bitmap_initialize (&changed_pseudo_bitmap, &reg_obstack);
+  init_live_reload_and_inheritance_pseudos ();
+  assign_by_spills ();
+  finish_live_reload_and_inheritance_pseudos ();
+  bitmap_ior_into (&changed_pseudo_bitmap, &all_spilled_pseudos);
+  no_spills_p = true;
+  EXECUTE_IF_SET_IN_BITMAP (&all_spilled_pseudos, 0, u, bi)
+    /* We ignore spilled pseudos created on last inheritance pass
+       because they will be removed.  */
+    if (lra_reg_info[u].restore_regno < 0)
+      {
+	no_spills_p = false;
+	break;
+      }
+  bitmap_clear (&all_spilled_pseudos);
+  bitmap_initialize (&insns_to_process, &reg_obstack);
+  EXECUTE_IF_SET_IN_BITMAP (&changed_pseudo_bitmap, 0, u, bi)
+    bitmap_ior_into (&insns_to_process, &lra_reg_info[u].insn_bitmap);
+  bitmap_clear (&changed_pseudo_bitmap);
+  EXECUTE_IF_SET_IN_BITMAP (&insns_to_process, 0, u, bi)
+    {
+      lra_push_insn_by_uid (u);
+      /* Invalidate alternatives for insn should be processed.	*/
+      lra_set_used_insn_alternative_by_uid (u, -1);
+    }
+  bitmap_clear (&insns_to_process);
+  finish_regno_assign_info ();
+  free (regno_allocno_class_array);
+  free (sorted_pseudos);
+  free (sorted_reload_pseudos);
+  finish_lives ();
+  return no_spills_p;
+}
Index: lra-int.h
===================================================================
--- lra-int.h	(revision 0)
+++ lra-int.h	(working copy)
@@ -0,0 +1,474 @@ 
+/* Local Register Allocator (LRA) intercommunication header file.
+   Copyright (C) 2010, 2011, 2012
+   Free Software Foundation, Inc.
+   Contributed by Vladimir Makarov <vmakarov@redhat.com>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.	If not see
+<http://www.gnu.org/licenses/>.	 */
+
+#include "lra.h"
+#include "bitmap.h"
+#include "insn-attr.h"
+
+#ifdef ENABLE_CHECKING
+#define lra_assert(c) gcc_assert (c)
+#else
+/* Always define and include C, so that warnings for empty body in an
+  ‘if’ statement and unused variable do not occur.  */
+#define lra_assert(c) ((void)(0 && (c)))
+#endif
+
+/* The parameter used to prevent infinite reloading for an insn.  Each
+   insn operands might require a reload and, if it is a memory, its
+   base and index registers might require a reload too.	 */
+#define LRA_MAX_INSN_RELOADS (MAX_RECOG_OPERANDS * 3)
+
+/* Return the hard register which given pseudo REGNO assigned to.
+   Negative value means that the register got memory or we don't know
+   allocation yet.  */
+static inline int
+lra_get_regno_hard_regno (int regno)
+{
+  resize_reg_info ();
+  return reg_renumber[regno];
+}
+
+typedef struct lra_live_range *lra_live_range_t;
+
+/* The structure describes program points where a given pseudo lives.
+   The live ranges can be used to find conflicts with other pseudos.
+   If the live ranges of two pseudos are intersected, the pseudos are
+   in conflict.	 */
+struct lra_live_range
+{
+  /* Pseudo regno whose live range is described by given
+     structure.	 */
+  int regno;
+  /* Program point range.  */
+  int start, finish;
+  /* Next structure describing program points where the pseudo
+     lives.  */
+  lra_live_range_t next;
+  /* Pointers to structures with the same start/finish.	 */
+  lra_live_range_t start_next, finish_next;
+};
+
+typedef struct lra_copy *lra_copy_t;
+
+/* Copy between pseudos which affects assigning hard registers.	 */
+struct lra_copy
+{
+  /* True if regno1 is the destination of the copy.  */
+  bool regno1_dest_p;
+  /* Execution frequency of the copy.  */
+  int freq;
+  /* Pseudos connected by the copy.  REGNO1 < REGNO2.  */
+  int regno1, regno2;
+  /* Next copy with correspondingly REGNO1 and REGNO2.	*/
+  lra_copy_t regno1_next, regno2_next;
+};
+
+/* Common info about a register (pseudo or hard register).  */
+struct lra_reg
+{
+  /* Bitmap of UIDs of insns (including debug insns) referring the
+     reg.  */
+  bitmap_head insn_bitmap;
+  /* The following fields are defined only for pseudos.	 */
+  /* Hard registers with which the pseudo conflicts.  */
+  HARD_REG_SET conflict_hard_regs;
+  /* We assign hard registers to reload pseudos which can occur in few
+     places.  So two hard register preferences are enough for them.
+     The following fields define the preferred hard registers.	If
+     there are no such hard registers the first field value is
+     negative.	If there is only one preferred hard register, the 2nd
+     field is negative.	 */
+  int preferred_hard_regno1, preferred_hard_regno2;
+  /* Profits to use the corresponding preferred hard registers.	 If
+     the both hard registers defined, the first hard register has not
+     less profit than the second one.  */
+  int preferred_hard_regno_profit1, preferred_hard_regno_profit2;
+#ifdef STACK_REGS
+  /* True if the pseudo should not be assigned to a stack register.  */
+  bool no_stack_p;
+#endif
+#ifdef ENABLE_CHECKING
+  /* True if the pseudo crosses a call.	 It is setup in lra-lives.c
+     and used to check that the pseudo crossing a call did not get a
+     call used hard register.  */
+  bool call_p;
+#endif
+  /* Number of references and execution frequencies of the register in
+     *non-debug* insns.	 */
+  int nrefs, freq;
+  int last_reload;
+  /* Regno used to undo the inheritance.  It can be non-zero only
+     between couple of inheritance and undo inheritance passes.	 */
+  int restore_regno;
+  /* Value holding by register.	 If the pseudos have the same value
+     they do not conflict.  */
+  int val;
+  /* These members are set up in lra-lives.c and updated in
+     lra-coalesce.c.  */
+  /* The biggest size mode in which each pseudo reg is referred in
+     whole function (possibly via subreg).  */
+  enum machine_mode biggest_mode;
+  /* Live ranges of the pseudo.	 */
+  lra_live_range_t live_ranges;
+  /* This member is set up in lra-lives.c for subsequent
+     assignments.  */
+  lra_copy_t copies;
+};
+
+/* References to the common info about each register.  */
+extern struct lra_reg *lra_reg_info;
+
+/* Static info about each insn operand (common for all insns with the
+   same ICODE).	 Warning: if the structure definition is changed, the
+   initializer for debug_operand_data in lra.c should be changed
+   too.	 */
+struct lra_operand_data
+{
+  /* The machine description constraint string of the operand.	*/
+  const char *constraint;
+  /* It is taken only from machine description (which is different
+     from recog_data.operand_mode) and can be of VOIDmode.  */
+  ENUM_BITFIELD(machine_mode) mode : 16;
+  /* The type of the operand (in/out/inout).  */
+  ENUM_BITFIELD (op_type) type : 8;
+  /* Through if accessed through STRICT_LOW.  */
+  unsigned int strict_low : 1;
+  /* True if the operand is an operator.  */
+  unsigned int is_operator : 1;
+  /* True if there is an early clobber alternative for this operand.
+     This field is set up every time when corresponding
+     operand_alternative in lra_static_insn_data is set up.  */
+  unsigned int early_clobber : 1;
+};
+
+/* Info about register in an insn.  */
+struct lra_insn_reg
+{
+  /* The biggest mode through which the insn refers to the register
+     (remember the register can be accessed through a subreg in the
+     insn).  */
+  ENUM_BITFIELD(machine_mode) biggest_mode : 16;
+  /* The type of the corresponding operand which is the register.  */
+  ENUM_BITFIELD (op_type) type : 8;
+  /* True if the reg is accessed through a subreg and the subreg is
+     just a part of the register.  */
+  unsigned int subreg_p : 1;
+  /* True if there is an early clobber alternative for this
+     operand.  */
+  unsigned int early_clobber : 1;
+  /* The corresponding regno of the register.  */
+  int regno;
+  /* Next reg info of the same insn.  */
+  struct lra_insn_reg *next;
+};
+
+/* Static part (common info for insns with the same ICODE) of LRA
+   internal insn info.	It exists in at most one exemplar for each
+   non-negative ICODE.	Warning: if the structure definition is
+   changed, the initializer for debug_insn_static_data in lra.c should
+   be changed too.  */
+struct lra_static_insn_data
+{
+  /* Static info about each insn operand.  */
+  struct lra_operand_data *operand;
+  /* Each duplication refers to the number of the corresponding
+     operand which is duplicated.  */
+  int *dup_num;
+  /* The number of an operand marked as commutative, -1 otherwise.  */
+  int commutative;
+  /* Number of operands, duplications, and alternatives of the
+     insn.  */
+  char n_operands;
+  char n_dups;
+  char n_alternatives;
+  /* Insns in machine description (or clobbers in asm) may contain
+     explicit hard regs which are not operands.	 The following list
+     describes such hard registers.  */
+  struct lra_insn_reg *hard_regs;
+  /* Array [n_alternatives][n_operand] of static constraint info for
+     given operand in given alternative.  This info can be changed if
+     the target reg info is changed.  */
+  struct operand_alternative *operand_alternative;
+};
+
+/* LRA internal info about an insn (LRA internal insn
+   representation).  */
+struct lra_insn_recog_data
+{
+  int icode; /* The insn code.	*/
+  rtx insn; /* The insn itself.	 */
+  /* Common data for insns with the same ICODE.	 */
+  struct lra_static_insn_data *insn_static_data;
+  /* Two arrays of size correspondingly equal to the operand and the
+     duplication numbers: */
+  rtx **operand_loc; /* The operand locations, NULL if no operands.  */
+  rtx **dup_loc; /* The dup locations, NULL if no dups.	 */
+  /* Number of hard registers implicitly used in given call insn.  The
+     value can be NULL or points to array of the hard register numbers
+     ending with a negative value.  */
+  int *arg_hard_regs;
+#ifdef HAVE_ATTR_enabled
+  /* Alternative enabled for the insn.	NULL for debug insns.  */
+  bool *alternative_enabled_p;
+#endif
+  /* The alternative should be used for the insn, -1 if invalid, or we
+     should try to use any alternative, or the insn is a debug
+     insn.  */
+  int used_insn_alternative;
+  struct lra_insn_reg *regs;  /* Always NULL for a debug insn.	*/
+};
+
+typedef struct lra_insn_recog_data *lra_insn_recog_data_t;
+
+/* lra.c: */
+
+extern FILE *lra_dump_file;
+
+extern bool lra_reg_spill_p;
+
+extern HARD_REG_SET lra_no_alloc_regs;
+
+extern int lra_insn_recog_data_len;
+extern lra_insn_recog_data_t *lra_insn_recog_data;
+
+extern bitmap_head lra_constraint_insn_stack_bitmap;
+extern VEC (rtx, heap) *lra_constraint_insn_stack;
+
+extern int lra_curr_reload_num;
+
+extern void lra_push_insn (rtx);
+extern void lra_push_insn_by_uid (unsigned int);
+extern void lra_push_insn_and_update_insn_regno_info (rtx);
+
+extern rtx lra_create_new_reg_with_unique_value (enum machine_mode, rtx,
+						 enum reg_class, const char *);
+extern void lra_set_regno_unique_value (int);
+extern void lra_invalidate_insn_data (rtx);
+extern void lra_set_insn_deleted (rtx);
+extern void lra_delete_dead_insn (rtx);
+extern void lra_emit_add (rtx, rtx, rtx);
+extern void lra_emit_move (rtx, rtx);
+extern void lra_update_dups (lra_insn_recog_data_t, signed char *);
+
+extern void lra_process_new_insns (rtx, rtx, rtx, const char *);
+
+extern lra_insn_recog_data_t lra_set_insn_recog_data (rtx);
+extern lra_insn_recog_data_t lra_update_insn_recog_data (rtx);
+extern void lra_set_used_insn_alternative (rtx, int);
+extern void lra_set_used_insn_alternative_by_uid (int, int);
+
+extern void lra_invalidate_insn_regno_info (rtx);
+extern void lra_update_insn_regno_info (rtx);
+extern struct lra_insn_reg *lra_get_insn_regs (int);
+
+extern void lra_expand_reg_info (void);
+extern void lra_free_copies (void);
+extern void lra_create_copy (int, int, int);
+extern lra_copy_t lra_get_copy (int);
+extern bool lra_former_scratch_p (int);
+extern bool lra_former_scratch_operand_p (rtx, int);
+
+extern int lra_constraint_new_regno_start;
+extern bitmap_head lra_inheritance_pseudos;
+extern bitmap_head lra_split_pseudos;
+extern bitmap_head lra_optional_reload_pseudos;
+extern int lra_constraint_new_insn_uid_start;
+
+/* lra-constraints.c: */
+
+extern bitmap_head lra_matched_pseudos;
+extern bitmap_head lra_bound_pseudos;
+
+extern rtx lra_secondary_memory[NUM_MACHINE_MODES];
+
+extern int lra_constraint_offset (int, enum machine_mode);
+
+extern int lra_constraint_iter;
+extern int lra_constraint_iter_after_spill;
+extern bool lra_risky_transformations_p;
+extern int lra_inheritance_iter;
+extern int lra_undo_inheritance_iter;
+extern bool lra_constraints (bool);
+extern void lra_contraints_init (void);
+extern void lra_contraints_finish (void);
+extern void lra_inheritance (void);
+extern bool lra_undo_inheritance (void);
+
+/* lra-lives.c: */
+
+extern int lra_live_max_point;
+extern lra_live_range_t *lra_start_point_ranges, *lra_finish_point_ranges;
+extern int *lra_point_freq;
+
+extern int lra_hard_reg_usage[FIRST_PSEUDO_REGISTER];
+
+extern int lra_live_range_iter;
+extern void lra_create_live_ranges (bool);
+extern lra_live_range_t lra_copy_live_range_list (lra_live_range_t);
+extern lra_live_range_t lra_merge_live_ranges (lra_live_range_t,
+					       lra_live_range_t);
+extern bool lra_intersected_live_ranges_p (lra_live_range_t,
+					   lra_live_range_t);
+extern bool lra_live_range_in_p (lra_live_range_t, lra_live_range_t);
+extern void lra_print_live_range_list (FILE *, lra_live_range_t);
+extern void lra_debug_live_range_list (lra_live_range_t);
+extern void lra_debug_pseudo_live_ranges (int);
+extern void lra_debug_live_ranges (void);
+extern void lra_clear_live_ranges (void);
+extern void lra_live_ranges_init (void);
+extern void lra_live_ranges_finish (void);
+extern void lra_setup_reload_pseudo_preferenced_hard_reg (int, int, int);
+
+/* lra-assigns.c: */
+
+extern void lra_setup_reg_renumber (int, int, bool);
+extern bool lra_assign (void);
+
+
+/* lra-coalesce.c: */
+
+extern int lra_coalesce_iter;
+extern bool lra_coalesce (void);
+
+/* lra-saves.c: */
+
+extern bool lra_save_restore (void);
+
+/* lra-spills.c:  */
+
+extern bool lra_need_for_spills_p (void);
+extern void lra_spill (void);
+extern void lra_hard_reg_substitution (void);
+
+
+/* lra-elimination.c: */
+
+extern void lra_debug_elim_table (void);
+extern int lra_get_elimation_hard_regno (int);
+extern rtx lra_eliminate_regs_1 (rtx, enum machine_mode, bool, bool, bool);
+extern void lra_eliminate (bool);
+
+extern void lra_eliminate_reg_if_possible (rtx *);
+
+
+
+/* The function returns TRUE if at least one hard register from ones
+   starting with HARD_REGNO and containing value of MODE are in set
+   HARD_REGSET.	 */
+static inline bool
+lra_hard_reg_set_intersection_p (int hard_regno, enum machine_mode mode,
+				 HARD_REG_SET hard_regset)
+{
+  int i;
+
+  lra_assert (hard_regno >= 0);
+  for (i = hard_regno_nregs[hard_regno][mode] - 1; i >= 0; i--)
+    if (TEST_HARD_REG_BIT (hard_regset, hard_regno + i))
+      return true;
+  return false;
+}
+
+/* Return hard regno and offset of (sub-)register X through arguments
+   HARD_REGNO and OFFSET.  If it is not (sub-)register or the hard
+   register is unknown, then return -1 and 0 correspondingly.  */
+static inline void
+lra_get_hard_regno_and_offset (rtx x, int *hard_regno, int *offset)
+{
+  rtx reg;
+
+  *hard_regno = *offset = -1;
+  reg = x;
+  if (GET_CODE (x) == SUBREG)
+    reg = SUBREG_REG (x);
+  if (! REG_P (reg))
+    return;
+  if ((*hard_regno = REGNO (reg)) >= FIRST_PSEUDO_REGISTER)
+    *hard_regno = lra_get_regno_hard_regno (*hard_regno);
+  if (*hard_regno < 0)
+    return;
+  *offset = 0;
+  if (GET_CODE (x) == SUBREG)
+    *offset += subreg_regno_offset (*hard_regno, GET_MODE (reg),
+				   SUBREG_BYTE (x),  GET_MODE (x));
+}
+
+/* Add hard registers starting with HARD_REGNO and holding value of
+   MODE to the set S.  */
+static inline void
+lra_add_hard_reg_set (int hard_regno, enum machine_mode mode, HARD_REG_SET *s)
+{
+  int i;
+
+  for (i = hard_regno_nregs[hard_regno][mode] - 1; i >= 0; i--)
+    SET_HARD_REG_BIT (*s, hard_regno + i);
+}
+
+/* Update insn operands which are duplication of NOP operand.  The
+   insn is represented by its LRA internal representation ID.  */
+static inline void
+lra_update_dup (lra_insn_recog_data_t id, int nop)
+{
+  int i;
+  struct lra_static_insn_data *static_id = id->insn_static_data;
+
+  for (i = 0; i < static_id->n_dups; i++)
+    if (static_id->dup_num[i] == nop)
+      *id->dup_loc[i] = *id->operand_loc[nop];
+}
+
+/* Process operator duplications in insn with ID.  We do it after the
+   operands processing.	 Generally speaking, we could do this probably
+   simultaneously with operands processing because a common practice
+   is to enumerate the operators after their operands.	*/
+static inline void
+lra_update_operator_dups (lra_insn_recog_data_t id)
+{
+  int i;
+  struct lra_static_insn_data *static_id = id->insn_static_data;
+
+  for (i = 0; i < static_id->n_dups; i++)
+    {
+      int ndup = static_id->dup_num[i];
+      
+      if (static_id->operand[ndup].is_operator)
+	*id->dup_loc[i] = *id->operand_loc[ndup];
+    }
+}
+
+/* Return info about INSN.  Set up the info if it is not done yet.  */
+static inline lra_insn_recog_data_t
+lra_get_insn_recog_data (rtx insn)
+{
+  lra_insn_recog_data_t data;
+  unsigned int uid = INSN_UID (insn);
+
+  if (lra_insn_recog_data_len > (int) uid
+      && (data = lra_insn_recog_data[uid]) != NULL)
+    {
+      /* Check that we did not change insn without updating the insn
+	 info.	*/
+      lra_assert (data->insn == insn
+		  && (INSN_CODE (insn) < 0
+		      || data->icode == INSN_CODE (insn)));
+      return data;
+    }
+  return lra_set_insn_recog_data (insn);
+}
Index: lra-constraints.c
===================================================================
--- lra-constraints.c	(revision 0)
+++ lra-constraints.c	(working copy)
@@ -0,0 +1,5169 @@ 
+/* Code for RTL transformations to satisfy insn constraints.
+   Copyright (C) 2010, 2011, 2012
+   Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it under
+   the terms of the GNU General Public License as published by the Free
+   Software Foundation; either version 3, or (at your option) any later
+   version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+   for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+
+/* This file contains code for 3 passes: constraint pass,
+   inheritance/split pass, and pass for undoing failed inheritance and
+   split.
+
+   The major goal of constraint pass is to transform RTL to satisfy
+   insn and address constraints by:
+     o choosing insn alternatives;
+     o generating *reload insns* (or reloads in brief) and *reload
+       pseudos* which will got necessary hard registers later;
+     o substituting pseudo equivalences (if it is done once, is done
+       everywhere) and removes insns initializing used equivalent
+       substitution.
+
+   The constraint pass has biggest and most complicated code in LRA.
+   There are a lot of important details like:
+     o reuse of input reload pseudos to simplify reload pseudo
+       allocations;
+     o bound reload pseudos (when different modes are needed);
+     o some heuristics to choose insn alternative to improve the
+       inheritance;
+     o early clobbers etc.
+
+   The pass is mimicking former reload pass in alternative choosing
+   because the reload pass is oriented to current machine description
+   model.  It might be changed if the machine description model is
+   changed.
+
+   There is special code for preventing all LRA and this pass cycling
+   in case of bugs.
+
+   To speed the pass up we process only necessary insns (first time
+   all insns) and reuse of already chosen alternatives in some
+   cases.
+
+   The inheritance/spilt pass is to transform code to achieve
+   ineheritance and live range splitting.  It is done on backward
+   traverse of EBBs.
+
+   The inheritance optimization goal is to reuse values in hard
+   registers. There is analogous optimization in old reload pass.  The
+   inheritance is achieved by following transformation:
+
+       reload_p1 <- p	     reload_p1 <- p
+       ...		     new_p <- reload_p1
+       ...		=>   ...
+       reload_p2 <- p	     reload_p2 <- new_p
+
+   where p is spilled and not changed between the insns.  Reload_p1 is
+   also called *original pseudo* and new_p is called *inheritance
+   pseudo*.
+
+   The subsequent assignment pass will try to assign the same (or
+   another if it is not possible) hard register to new_p as to
+   reload_p1 or reload_p2.
+
+   If it fails to assign a hard register, the opposite transformation
+   will restore the original code on (the pass called undoing
+   inheritance) because with spilled new_p the code would be much
+   worse.  The inheritance is done in EBB scope.  The above is just a
+   simplified example to get an idea of the inheritance as the
+   inheritance is also done for non-reload insns.
+
+   Splitting (transformation) is also done in EBB scope on the same
+   pass as the inheritance:
+
+       r <- ... or ... <- r		 r <- ... or ... <- r
+       ...				 s <- r (new insn -- save)
+       ...			  => 
+       ...				 r <- s (new insn -- restore)
+       ... <- r				 ... <- r
+
+    The *split pseudo* s is assigned to the hard register of the
+    original pseudo or hard register r.
+
+    Splitting is done:
+      o In EBBs with high register pressure for global pseudos (living
+	in at least 2 BBs) and assigned to hard registers when there
+	are more one reloads needing the hard registers;
+      o for pseudos needing save/restore code around calls.
+
+    If the split pseudo still has the same hard register as the
+    original pseudo after the subsequent assignment pass, the opposite
+    transformation is done on the same pass for undoing inheritance.  */
+
+#undef REG_OK_STRICT
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "hard-reg-set.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "regs.h"
+#include "insn-config.h"
+#include "insn-codes.h"
+#include "recog.h"
+#include "output.h"
+#include "addresses.h"
+#include "target.h"
+#include "function.h"
+#include "expr.h"
+#include "basic-block.h"
+#include "except.h"
+#include "optabs.h"
+#include "df.h"
+#include "ira.h"
+#include "rtl-error.h"
+#include "lra-int.h"
+
+/* Value of LRA_CURR_RELOAD_NUM at the beginning of BB of the current
+   insn.  Remember that LRA_CURR_RELOAD_NUM is the number of emitted
+   reload insns.  */
+static int bb_reload_num;
+
+/* Array whose element is (MEM:MODE BASE_REG) corresponding to the
+   mode (index) and where BASE_REG is a base hard register for given
+   memory mode.	 */
+static rtx indirect_mem[MAX_MACHINE_MODE];
+
+/* Return indirect memory for MODE.  */
+static rtx
+get_indirect_mem (enum machine_mode mode)
+{
+  if (indirect_mem[mode] == NULL_RTX)
+    {
+      enum reg_class rclass = base_reg_class (mode, ADDR_SPACE_GENERIC,
+					      MEM, SCRATCH);
+
+      indirect_mem[mode]
+	 = gen_rtx_MEM (mode, regno_reg_rtx [ira_class_hard_regs[rclass][0]]);
+    }
+  return indirect_mem[mode];
+}
+
+/* Initiate INDIRECT_MEM.  */
+static void
+init_indirect_mem (void)
+{
+  int i;
+
+  for (i = 0; i < MAX_MACHINE_MODE; i++)
+    indirect_mem[i] = NULL_RTX;
+}
+
+
+
+/* The current insn being processed and corresponding its data (basic
+   block, the insn data, and the insn static data.  */
+static rtx curr_insn;
+static basic_block curr_bb;
+static lra_insn_recog_data_t curr_id;
+static struct lra_static_insn_data *curr_static_id;
+
+
+
+/* Start numbers for new registers and insns at the current constraints
+   pass start.	*/
+static int new_regno_start;
+static int new_insn_uid_start;
+
+/* Return hard regno of REGNO or if it is was not assigned to a hard
+   register, use a hard register from its allocno class.  */
+static int
+get_try_hard_regno (int regno)
+{
+  int hard_regno;
+  enum reg_class rclass;
+
+  if ((hard_regno = regno) >= FIRST_PSEUDO_REGISTER)
+    hard_regno = lra_get_regno_hard_regno (regno);
+  if (hard_regno >= 0)
+    return hard_regno;
+  rclass = lra_get_allocno_class (regno);
+  if (rclass == NO_REGS)
+    return -1;
+  return ira_class_hard_regs[rclass][0];
+}
+
+/* Return class of hard regno of REGNO or if it is was not assigned to
+   a hard register, return its allocno class but only for reload
+   pseudos created on the current constraint pass.  Otherwise, return
+   NO_REGS.  */
+static enum reg_class
+get_reg_class (int regno)
+{
+  int hard_regno;
+
+  if ((hard_regno = regno) >= FIRST_PSEUDO_REGISTER)
+    hard_regno = lra_get_regno_hard_regno (regno);
+  if (hard_regno >= 0)
+    return REGNO_REG_CLASS (hard_regno);
+  if (regno >= new_regno_start)
+    return lra_get_allocno_class (regno);
+  return NO_REGS;
+}
+
+/* Return true if REGNO in REG_MODE satisfies reg class constraint CL.
+   For new reload pseudos we should make more accurate class
+   *NEW_CLASS (we set up it if it is not NULL) to satisfy the
+   constraints.  Otherwise, set up NEW_CLASS to NO_REGS.  */
+static bool
+in_class_p (int regno, enum machine_mode reg_mode,
+	    enum reg_class cl, enum reg_class *new_class)
+{
+  enum reg_class rclass, common_class;
+  int class_size, hard_regno, nregs, i, j;
+
+  if (new_class != NULL)
+    *new_class = NO_REGS;
+  if (regno < FIRST_PSEUDO_REGISTER)
+    return TEST_HARD_REG_BIT (reg_class_contents[cl], regno);
+  rclass = get_reg_class (regno);
+  if (regno < new_regno_start
+      /* Do not make more accurate class from reloads generated.  They
+	 are mostly moves with a lot of constraints.  Making more
+	 accurate class may results in very narrow class and
+	 impossibility of find registers for several reloads of one
+	 insn.	*/
+      || INSN_UID (curr_insn) >= new_insn_uid_start)
+    return ((regno >= new_regno_start && rclass == ALL_REGS)
+	    || (rclass != NO_REGS && ira_class_subset_p[rclass][cl]
+		&& ! hard_reg_set_subset_p (reg_class_contents[cl],
+					    lra_no_alloc_regs)));
+  else
+    {
+      common_class = ira_reg_class_subset[rclass][cl];
+      if (new_class != NULL)
+	*new_class = common_class;
+      if (hard_reg_set_subset_p (reg_class_contents[common_class],
+				 lra_no_alloc_regs))
+	return false;
+      /* Check that there are enough allocatable regs.  */
+      class_size = ira_class_hard_regs_num[common_class];
+      for (i = 0; i < class_size; i++)
+	{
+	  hard_regno = ira_class_hard_regs[common_class][i];
+	  nregs = hard_regno_nregs[hard_regno][reg_mode];
+	  if (nregs == 1)
+	    return true;
+	  for (j = 0; j < nregs; j++)
+	    if (TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno + j))
+	      break;
+	  if (j >= nregs)
+	    return true;
+	}
+      return false;
+    }
+}
+
+/* Return true if REGNO satisfies a memory constraint.	*/
+static bool
+in_mem_p (int regno)
+{
+  return get_reg_class (regno) == NO_REGS;
+}
+
+/* Return the defined and profitable equiv substitution of reg X, return
+   X otherwise.	 */
+static rtx
+get_equiv_substitution (rtx x)
+{
+  int regno;
+  rtx res;
+
+  if (! REG_P (x) || (regno = REGNO (x)) < FIRST_PSEUDO_REGISTER
+      || ! ira_reg_equiv[regno].defined_p
+      || ! ira_reg_equiv[regno].profitable_p
+      || lra_get_regno_hard_regno (regno) >= 0)
+    return x;
+  if ((res = ira_reg_equiv[regno].memory) != NULL_RTX)
+    return res;
+  if ((res = ira_reg_equiv[regno].constant) != NULL_RTX)
+    return res;
+  if ((res = ira_reg_equiv[regno].invariant) != NULL_RTX)
+    return res;
+  gcc_unreachable ();
+}
+
+
+
+/* The page contains code to reuse input reloads.  */
+
+/* Structure describes input reload of the current insns.  */
+struct input_reload
+{
+  /* Reloaded value.  */
+  rtx input;
+  /* Reload pseudo used.  */
+  rtx reg;
+};
+
+/* The number of elements in the following array.  */
+static int curr_insn_input_reloads_num;
+/* Array containing info about input reloads.  It is used to find the
+   same input reload and reuse the reload pseudo in this case.	*/
+static struct input_reload curr_insn_input_reloads[LRA_MAX_INSN_RELOADS];
+
+/* Initiate data concerning reuse of input reloads for the current
+   insn.  */
+static void
+init_curr_insn_input_reloads (void)
+{
+  curr_insn_input_reloads_num = 0;
+}
+
+/* Change class of pseudo REGNO to NEW_CLASS.  Print info about it
+   using TITLE.	 Output a new line if NL_P.  */
+static void
+change_class (int regno, enum reg_class new_class,
+	      const char *title, bool nl_p)
+{
+  if (lra_dump_file != NULL)
+    fprintf (lra_dump_file, "%s to class %s for r%d",
+	     title, reg_class_names[new_class], regno);
+  setup_reg_classes (regno, new_class, NO_REGS, new_class);
+  if (lra_dump_file != NULL && nl_p)
+    fprintf (lra_dump_file, "\n");
+}
+
+/* Create a new pseudo using MODE, RCLASS, ORIGINAL, TITLE or reuse
+   already created input reload pseudo (only if TYPE is not OP_OUT).
+   The result pseudo is returned through RESULT_REG.  Return TRUE if
+   we created a new pseudo, FALSE if we reused the already created
+   input reload pseudo.	 */
+static bool
+get_reload_reg (enum op_type type, enum machine_mode mode, rtx original,
+		enum reg_class rclass, const char *title, rtx *result_reg)
+{
+  bool res_p;
+  int i, regno;
+  enum reg_class new_class;
+
+  if (type == OP_OUT)
+    {
+      *result_reg
+	= lra_create_new_reg_with_unique_value (mode, original, rclass, title);
+      return true;
+    }
+  for (i = 0; i < curr_insn_input_reloads_num; i++)
+    if (rtx_equal_p (curr_insn_input_reloads[i].input, original))
+      break;
+  if (i >= curr_insn_input_reloads_num
+      || ! in_class_p (REGNO (curr_insn_input_reloads[i].reg),
+		       GET_MODE (curr_insn_input_reloads[i].reg),
+		       rclass, &new_class))
+    {
+      res_p = true;
+      *result_reg = lra_create_new_reg (mode, original, rclass, title);
+    }
+  else
+    {
+      lra_assert (! side_effects_p (original));
+      res_p = false;
+      *result_reg = curr_insn_input_reloads[i].reg;
+      regno = REGNO (*result_reg);
+      if (lra_dump_file != NULL)
+	 {
+	   fprintf (lra_dump_file, "	 Reuse r%d for reload ", regno);
+	   print_value_slim (lra_dump_file, original, 1);
+	 }
+      if (rclass != new_class)
+	 change_class (regno, new_class, ", change", false);
+      if (lra_dump_file != NULL)
+	 fprintf (lra_dump_file, "\n");
+    }
+  lra_assert (curr_insn_input_reloads_num < LRA_MAX_INSN_RELOADS);
+  curr_insn_input_reloads[curr_insn_input_reloads_num].input = original;
+  curr_insn_input_reloads[curr_insn_input_reloads_num++].reg = *result_reg;
+  return res_p;
+}
+
+
+
+/* The page contains code to extract memory address parts.  */
+
+/* Info about base and index regs of an address.  In some rare cases,
+   base/index register can be actually memory.	In this case we will
+   reload it.  */
+struct address
+{
+  rtx *base_reg_loc;  /* NULL if there is no a base register.  */
+  rtx *base_reg_loc2; /* Second location of {post/pre}_modify, NULL
+			 otherwise.  */
+  rtx *index_reg_loc; /* NULL if there is no an index register.	 */
+  rtx *index_loc; /* location of index reg * scale or index_reg_loc
+		      otherwise.  */
+  rtx *disp_loc; /* NULL if there is no a displacement.	 */
+  /* Defined if base_reg_loc is not NULL.  */
+  enum rtx_code base_outer_code, index_code;
+  /* True if the base register is modified in the address, for
+     example, in PRE_INC.  */
+  bool base_modify_p;
+};
+
+/* Wrapper around REGNO_OK_FOR_INDEX_P, to allow pseudos.  */
+static inline bool
+ok_for_index_p_nonstrict (rtx reg)
+{
+  unsigned regno = REGNO (reg);
+  
+  return regno >= FIRST_PSEUDO_REGISTER || REGNO_OK_FOR_INDEX_P (regno);
+}
+
+/* A version of regno_ok_for_base_p for use here, when all pseudos
+   should count as OK.	Arguments as for regno_ok_for_base_p.  */
+static inline bool
+ok_for_base_p_nonstrict (rtx reg, enum machine_mode mode, addr_space_t as,
+			 enum rtx_code outer_code, enum rtx_code index_code)
+{
+  unsigned regno = REGNO (reg);
+
+  if (regno >= FIRST_PSEUDO_REGISTER)
+    return true;
+  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code);
+}
+
+/* Process address part in space AS (or all address if TOP_P) with
+   location *LOC to extract address characteristics.
+
+   If CONTEXT_P is false, we are looking at the base part of an
+   address, otherwise we are looking at the index part.
+
+   MODE is the mode of the memory reference; OUTER_CODE and INDEX_CODE
+   give the context that the rtx appears in; MODIFY_P if *LOC is
+   modified.  */
+static void
+extract_loc_address_regs (bool top_p, enum machine_mode mode, addr_space_t as,
+			  rtx *loc, bool context_p, enum rtx_code outer_code,
+			  enum rtx_code index_code,
+			  bool modify_p, struct address *ad)
+{
+  rtx x = *loc;
+  enum rtx_code code = GET_CODE (x);
+  bool base_ok_p;
+
+  switch (code)
+    {
+    case CONST_INT:
+    case CONST:
+    case SYMBOL_REF:
+    case LABEL_REF:
+      if (! context_p)
+	ad->disp_loc = loc;
+      return;
+
+    case CC0:
+    case PC:
+      return;
+
+    case PLUS:
+    case LO_SUM:
+      /* When we have an address that is a sum, we must determine
+	 whether registers are "base" or "index" regs.	If there is a
+	 sum of two registers, we must choose one to be the
+	 "base".  */
+      {
+	rtx *arg0_loc = &XEXP (x, 0);
+	rtx *arg1_loc = &XEXP (x, 1);
+	rtx arg0 = *arg0_loc;
+	rtx arg1 = *arg1_loc;
+	enum rtx_code code0 = GET_CODE (arg0);
+	enum rtx_code code1 = GET_CODE (arg1);
+
+	/* Look inside subregs.	 */
+	if (code0 == SUBREG)
+	  {
+	    arg0_loc = &SUBREG_REG (arg0);
+	    arg0 = *arg0_loc;
+	    code0 = GET_CODE (arg0);
+	  }
+	if (code1 == SUBREG)
+	  {
+	    arg1_loc = &SUBREG_REG (arg1);
+	    arg1 = *arg1_loc;
+	    code1 = GET_CODE (arg1);
+	  }
+
+	/* If this machine only allows one register per address, it
+	   must be in the first operand.  */
+	if (MAX_REGS_PER_ADDRESS == 1 || code == LO_SUM)
+	  {
+	    extract_loc_address_regs (false, mode, as, arg0_loc, false, code,
+				      code1, modify_p, ad);
+	    ad->disp_loc = arg1_loc;
+	  }
+	/* If index and base registers are the same on this machine,
+	   just record registers in any non-constant operands.	We
+	   assume here, as well as in the tests below, that all
+	   addresses are in canonical form.  */
+	else if (INDEX_REG_CLASS
+		 == base_reg_class (VOIDmode, as, PLUS, SCRATCH)
+		 && code0 != PLUS && code0 != MULT)
+	  {
+	    extract_loc_address_regs (false, mode, as, arg0_loc, false, PLUS,
+				      code1, modify_p, ad);
+	    if (! CONSTANT_P (arg1))
+	      extract_loc_address_regs (false, mode, as, arg1_loc, true, PLUS,
+					code0, modify_p, ad);
+	    else
+	      ad->disp_loc = arg1_loc;
+	  }
+
+	/* If the second operand is a constant integer, it doesn't
+	   change what class the first operand must be.	 */
+	else if (code1 == CONST_INT || code1 == CONST_DOUBLE)
+	  {
+	    ad->disp_loc = arg1_loc;
+	    extract_loc_address_regs (false, mode, as, arg0_loc, context_p,
+				      PLUS, code1, modify_p, ad);
+	  }
+	/* If the second operand is a symbolic constant, the first
+	   operand must be an index register but only if this part is
+	   all the address.  */
+	else if (code1 == SYMBOL_REF || code1 == CONST || code1 == LABEL_REF)
+	  {
+	    ad->disp_loc = arg1_loc;
+	    extract_loc_address_regs (false, mode, as, arg0_loc,
+				      top_p ? true : context_p, PLUS, code1,
+				      modify_p, ad);
+	  }
+	/* If both operands are registers but one is already a hard
+	   register of index or reg-base class, give the other the
+	   class that the hard register is not.	 */
+	else if (code0 == REG && code1 == REG
+		 && REGNO (arg0) < FIRST_PSEUDO_REGISTER
+		 && ((base_ok_p
+		      = ok_for_base_p_nonstrict (arg0, mode, as, PLUS, REG))
+		     || ok_for_index_p_nonstrict (arg0)))
+	  {
+	    extract_loc_address_regs (false, mode, as, arg0_loc, ! base_ok_p,
+				      PLUS, REG, modify_p, ad);
+	    extract_loc_address_regs (false, mode, as, arg1_loc, base_ok_p,
+				      PLUS, REG, modify_p, ad);
+	  }
+	else if (code0 == REG && code1 == REG
+		 && REGNO (arg1) < FIRST_PSEUDO_REGISTER
+		 && ((base_ok_p
+		      = ok_for_base_p_nonstrict (arg1, mode, as, PLUS, REG))
+		     || ok_for_index_p_nonstrict (arg1)))
+	  {
+	    extract_loc_address_regs (false, mode, as, arg0_loc, base_ok_p,
+				      PLUS, REG, modify_p, ad);
+	    extract_loc_address_regs (false, mode, as, arg1_loc, ! base_ok_p,
+				      PLUS, REG, modify_p, ad);
+	  }
+	/* If one operand is known to be a pointer, it must be the
+	   base with the other operand the index.  Likewise if the
+	   other operand is a MULT.  */
+	else if ((code0 == REG && REG_POINTER (arg0)) || code1 == MULT)
+	  {
+	    extract_loc_address_regs (false, mode, as, arg0_loc, false, PLUS,
+				      code1, modify_p, ad);
+	    if (code1 == MULT)
+	      ad->index_loc = arg1_loc;
+	    extract_loc_address_regs (false, mode, as, arg1_loc, true, PLUS,
+				      code0, modify_p, ad);
+	  }
+	else if ((code1 == REG && REG_POINTER (arg1)) || code0 == MULT)
+	  {
+	    extract_loc_address_regs (false, mode, as, arg0_loc, true, PLUS,
+				      code1, modify_p, ad);
+	    if (code0 == MULT)
+	      ad->index_loc = arg0_loc;
+	    extract_loc_address_regs (false, mode, as, arg1_loc, false, PLUS,
+				      code0, modify_p, ad);
+	  }
+	/* Otherwise, count equal chances that each might be a base or
+	   index register.  This case should be rare.  */
+	else
+	  {
+	    extract_loc_address_regs (false, mode, as, arg0_loc, false, PLUS,
+				      code1, modify_p, ad);
+	    extract_loc_address_regs (false, mode, as, arg1_loc,
+				      ad->base_reg_loc != NULL, PLUS,
+				      code0, modify_p, ad);
+	  }
+      }
+      break;
+
+    case POST_MODIFY:
+    case PRE_MODIFY:
+      extract_loc_address_regs (false, mode, as, &XEXP (x, 0), false,
+				code, GET_CODE (XEXP (XEXP (x, 1), 1)),
+				true, ad);
+      lra_assert (rtx_equal_p (XEXP (XEXP (x, 1), 0), XEXP (x, 0)));
+      ad->base_reg_loc2 = &XEXP (XEXP (x, 1), 0);
+      if (REG_P (XEXP (XEXP (x, 1), 1)))
+	extract_loc_address_regs (false, mode, as, &XEXP (XEXP (x, 1), 1),
+				  true, code, REG, modify_p, ad);
+      break;
+
+    case POST_INC:
+    case PRE_INC:
+    case POST_DEC:
+    case PRE_DEC:
+      extract_loc_address_regs (false, mode, as, &XEXP (x, 0), false, code,
+				SCRATCH, true, ad);
+      break;
+
+      /* We process memory as a register.  That means we flatten
+	 addresses.  In other words, the final code will never
+	 contains memory in an address even if the target supports
+	 such addresses (it is too rare these days).  Memory also can
+	 occur in address as a result some previous transformations
+	 like equivalence substitution.	 */
+    case MEM:
+    case REG:
+      if (context_p)
+	ad->index_reg_loc = loc;
+      else
+	{
+	  ad->base_reg_loc = loc;
+	  ad->base_outer_code = outer_code;
+	  ad->index_code = index_code;
+	  ad->base_modify_p = modify_p;
+	}
+      break;
+    default:
+      {
+	const char *fmt = GET_RTX_FORMAT (code);
+	int i;
+
+	if (GET_RTX_LENGTH (code) != 1
+	    || fmt[0] != 'e' || GET_CODE (XEXP (x, 0)) != UNSPEC)
+	  {
+	    for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+	      if (fmt[i] == 'e')
+		extract_loc_address_regs (false, mode, as, &XEXP (x, i),
+					  context_p, code, SCRATCH,
+					  modify_p, ad);
+	    break;
+	  }
+	/* fall through for case UNARY_OP (UNSPEC ...)	*/
+      }
+
+    case UNSPEC:
+      if (ad->disp_loc == NULL)
+	ad->disp_loc = loc;
+      else if (ad->base_reg_loc == NULL)
+	{
+	  ad->base_reg_loc = loc;
+	  ad->base_outer_code = outer_code;
+	  ad->index_code = index_code;
+	  ad->base_modify_p = modify_p;
+	}
+      else
+	{
+	  lra_assert (ad->index_reg_loc == NULL);
+	  ad->index_reg_loc = loc;
+	}
+      break;
+
+    }
+}
+
+
+/* Extract address characteristics in address with location *LOC in
+   space AS.  Return them in AD.  Parameter OUTER_CODE for MEM should
+   be MEM.  Parameter OUTER_CODE for 'p' constraint should be ADDRESS
+   and MEM_MODE should be VOIDmode.  */
+static void
+extract_address_regs (enum machine_mode mem_mode, addr_space_t as,
+		      rtx *loc, enum rtx_code outer_code, struct address *ad)
+{
+  ad->base_reg_loc = ad->base_reg_loc2
+    = ad->index_reg_loc = ad->index_loc = ad->disp_loc = NULL;
+  ad->base_outer_code = SCRATCH;
+  ad->index_code = SCRATCH;
+  ad->base_modify_p = false;
+  extract_loc_address_regs (true, mem_mode, as, loc, false, outer_code,
+			    SCRATCH, false, ad);  
+  if (ad->index_loc == NULL)
+    /* SUBREG ??? */
+    ad->index_loc = ad->index_reg_loc;
+}
+
+
+
+/* The page contains major code to choose the current insn alternative
+   and generate reloads for it.	 */
+
+/* Return start register offset of hard register REGNO in MODE.	 */
+int
+lra_constraint_offset (int regno, enum machine_mode mode)
+{
+  lra_assert (regno < FIRST_PSEUDO_REGISTER);
+  /* On a WORDS_BIG_ENDIAN machine, point to the last register of a
+     multiple hard register group of scalar integer registers, so that
+     for example (reg:DI 0) and (reg:SI 1) will be considered the same
+     register.	*/
+  if (WORDS_BIG_ENDIAN && GET_MODE_SIZE (mode) > UNITS_PER_WORD
+      && SCALAR_INT_MODE_P (mode))
+    return hard_regno_nregs[regno][mode] - 1;
+  return 0;
+}
+
+/* Like rtx_equal_p except that it allows a REG and a SUBREG to match
+   if they are the same hard reg, and has special hacks for
+   auto-increment and auto-decrement.  This is specifically intended for
+   process_alt_operands to use in determining whether two operands
+   match.  X is the operand whose number is the lower of the two.
+
+   It is supposed that X is the output operand and Y is the input
+   operand.  */
+static bool
+operands_match_p (rtx x, rtx y, int y_hard_regno)
+{
+  int i, offset;
+  RTX_CODE code = GET_CODE (x);
+  const char *fmt;
+
+  if (x == y)
+    return true;
+  if ((code == REG || (code == SUBREG && REG_P (SUBREG_REG (x))))
+      && (REG_P (y) || (GET_CODE (y) == SUBREG && REG_P (SUBREG_REG (y)))))
+    {
+      int j;
+      
+      lra_get_hard_regno_and_offset (x, &i, &offset);
+      if (i < 0)
+	goto slow;
+      i += offset;
+
+      if ((j = y_hard_regno) < 0)
+	goto slow;
+
+      i += lra_constraint_offset (i, GET_MODE (x));
+      j += lra_constraint_offset (j, GET_MODE (y));
+
+      return i == j;
+    }
+
+  /* If two operands must match, because they are really a single
+     operand of an assembler insn, then two post-increments are invalid
+     because the assembler insn would increment only once.  On the
+     other hand, a post-increment matches ordinary indexing if the
+     post-increment is the output operand.  */
+  if (code == POST_DEC || code == POST_INC || code == POST_MODIFY)
+    return operands_match_p (XEXP (x, 0), y, y_hard_regno);
+
+  /* Two pre-increments are invalid because the assembler insn would
+     increment only once.  On the other hand, a pre-increment matches
+     ordinary indexing if the pre-increment is the input operand.  */
+  if (GET_CODE (y) == PRE_DEC || GET_CODE (y) == PRE_INC
+      || GET_CODE (y) == PRE_MODIFY)
+    return operands_match_p (x, XEXP (y, 0), y_hard_regno);
+  
+ slow:
+
+  if (code == REG && GET_CODE (y) == SUBREG && REG_P (SUBREG_REG (y))
+      && x == SUBREG_REG (y))
+    return true;
+  if (GET_CODE (y) == REG && code == SUBREG && REG_P (SUBREG_REG (x))
+      && SUBREG_REG (x) == y)
+    return true;
+
+  /* Now we have disposed of all the cases in which different rtx
+     codes can match.  */
+  if (code != GET_CODE (y))
+    return false;
+
+  /* (MULT:SI x y) and (MULT:HI x y) are NOT equivalent.  */
+  if (GET_MODE (x) != GET_MODE (y))
+    return false;
+
+  switch (code)
+    {
+    case CONST_INT:
+    case CONST_DOUBLE:
+    case CONST_FIXED:
+      return false;
+
+    case LABEL_REF:
+      return XEXP (x, 0) == XEXP (y, 0);
+    case SYMBOL_REF:
+      return XSTR (x, 0) == XSTR (y, 0);
+
+    default:
+      break;
+    }
+
+  /* Compare the elements.  If any pair of corresponding elements fail
+     to match, return false for the whole things.  */
+
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    {
+      int val, j;
+      switch (fmt[i])
+	{
+	case 'w':
+	  if (XWINT (x, i) != XWINT (y, i))
+	    return false;
+	  break;
+
+	case 'i':
+	  if (XINT (x, i) != XINT (y, i))
+	    return false;
+	  break;
+
+	case 'e':
+	  val = operands_match_p (XEXP (x, i), XEXP (y, i), y_hard_regno);
+	  if (val == 0)
+	    return false;
+	  break;
+
+	case '0':
+	  break;
+
+	case 'E':
+	  if (XVECLEN (x, i) != XVECLEN (y, i))
+	    return false;
+	  for (j = XVECLEN (x, i) - 1; j >= 0; --j)
+	    {
+	      val = operands_match_p (XVECEXP (x, i, j), XVECEXP (y, i, j),
+				      y_hard_regno);
+	      if (val == 0)
+		return false;
+	    }
+	  break;
+
+	  /* It is believed that rtx's at this level will never
+	     contain anything but integers and other rtx's, except for
+	     within LABEL_REFs and SYMBOL_REFs.	 */
+	default:
+	  gcc_unreachable ();
+	}
+    }
+  return true;
+}
+
+/* Reload pseudos created for input and output(or early clobber)
+   reloads which should have the same hard register.  Such pseudos as
+   input operands should be not inherited because the output rewrite
+   the value in any away.  */
+bitmap_head lra_matched_pseudos;
+
+/* Reload pseudos created for matched input and output reloads whose
+   mode are different.	Such pseudos has a modified rules for finding
+   their living ranges, e.g. assigning to subreg of such pseudo means
+   changing all pseudo value.  */
+bitmap_head lra_bound_pseudos;
+
+/* True if X is a constant that can be forced into the constant pool.
+   MODE is the mode of the operand, or VOIDmode if not known.  */
+#define CONST_POOL_OK_P(MODE, X)		\
+  ((MODE) != VOIDmode				\
+   && CONSTANT_P (X)				\
+   && GET_CODE (X) != HIGH			\
+   && !targetm.cannot_force_const_mem (MODE, X))
+
+/* True if C is a non-empty register class that has too few registers
+   to be safely used as a reload target class.	*/
+#define SMALL_REGISTER_CLASS_P(C)					\
+  (reg_class_size [(C)] == 1						\
+   || (reg_class_size [(C)] >= 1 && targetm.class_likely_spilled_p (C)))
+
+/* Return mode of WHAT inside of WHERE whose mode of the context is
+   OUTER_MODE.	If WHERE does not contain WHAT, return VOIDmode.  */
+static enum machine_mode
+find_mode (rtx *where, enum machine_mode outer_mode, rtx *what)
+{
+  int i, j;
+  enum machine_mode mode;
+  rtx x;
+  const char *fmt;
+  enum rtx_code code;
+
+  if (where == what)
+    return outer_mode;
+  if (*where == NULL_RTX)
+    return VOIDmode;
+  x = *where;
+  code = GET_CODE (x);
+  outer_mode = GET_MODE (x);
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    {
+      if (fmt[i] == 'e')
+	{
+	  if ((mode = find_mode (&XEXP (x, i), outer_mode, what)) != VOIDmode)
+	    return mode;
+	}
+      else if (fmt[i] == 'E')
+	{
+	  for (j = XVECLEN (x, i) - 1; j >= 0; j--)
+	  if ((mode = find_mode (&XVECEXP (x, i, j), outer_mode, what))
+	      != VOIDmode)
+	    return mode;
+	}
+    }
+  return VOIDmode;
+}
+
+/* Return mode for operand NOP of the current insn.  */
+static inline enum machine_mode
+get_op_mode (int nop)
+{
+  rtx *loc;
+  enum machine_mode mode;
+  bool md_first_p = asm_noperands (PATTERN (curr_insn)) < 0;
+
+  /* Take mode from the machine description first.  */
+  if (md_first_p && (mode = curr_static_id->operand[nop].mode) != VOIDmode)
+    return mode;
+  loc = curr_id->operand_loc[nop];
+  /* Take mode from the operand second.	 */
+  mode = GET_MODE (*loc);
+  if (mode != VOIDmode)
+    return mode;
+  if (! md_first_p && (mode = curr_static_id->operand[nop].mode) != VOIDmode)
+    return mode;
+  /* Here is a very rare case.	Take mode from the context.  */
+  return find_mode (&PATTERN (curr_insn), VOIDmode, loc);
+}
+
+/* If REG is a reload pseudo, try to make its class satisfying CL.  */
+static void
+narrow_reload_pseudo_class (rtx reg, enum reg_class cl)
+{
+  int regno;
+  enum reg_class rclass;
+
+  /* Do not make more accurate class from reloads generated.  They are
+     mostly moves with a lot of constraints.  Making more accurate
+     class may results in very narrow class and impossibility of find
+     registers for several reloads of one insn.	 */
+  if (INSN_UID (curr_insn) >= new_insn_uid_start)
+    return;
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+  if (! REG_P (reg) || (regno = REGNO (reg)) < new_regno_start)
+    return;
+  rclass = get_reg_class (regno);
+  rclass = ira_reg_class_subset[rclass][cl];
+  if (rclass == NO_REGS)
+    return;
+  change_class (regno, rclass, "      Change", true);
+}
+
+/* Generate reloads for matching OUT and INS (array of input operand
+   numbers with end marker -1) with reg class GOAL_CLASS.  Add input
+   and output reloads correspondingly to the lists *BEFORE and
+   *AFTER.  */
+static void
+match_reload (signed char out, signed char *ins, enum reg_class goal_class,
+	      rtx *before, rtx *after)
+{
+  int i, in;
+  rtx new_in_reg, new_out_reg, reg;
+  enum machine_mode inmode, outmode;
+  rtx in_rtx = *curr_id->operand_loc[ins[0]];
+  rtx out_rtx = *curr_id->operand_loc[out];
+
+  outmode = get_op_mode (out);
+  inmode = get_op_mode (ins[0]);
+  if (inmode != outmode)
+    {
+      if (GET_MODE_SIZE (inmode) > GET_MODE_SIZE (outmode))
+	{
+	  reg = new_in_reg
+	    = lra_create_new_reg_with_unique_value (inmode, in_rtx,
+						    goal_class, "");
+	  if (SCALAR_INT_MODE_P (inmode))
+	    new_out_reg = gen_lowpart_SUBREG (outmode, reg);
+	  else
+	    new_out_reg = gen_rtx_SUBREG (outmode, reg, 0);
+	}
+      else
+	{
+	  reg = new_out_reg
+	    = lra_create_new_reg_with_unique_value (outmode, out_rtx,
+						    goal_class, "");
+	  if (SCALAR_INT_MODE_P (outmode))
+	    new_in_reg = gen_lowpart_SUBREG (inmode, reg);
+	  else
+	    new_in_reg = gen_rtx_SUBREG (inmode, reg, 0);
+	}
+      bitmap_set_bit (&lra_matched_pseudos, REGNO (reg));
+      bitmap_set_bit (&lra_bound_pseudos, REGNO (reg));
+    }
+  else
+    {
+      /* We create pseudo for out rtx because we always should keep
+	 registers with the same original regno have synchronized
+	 value (it is not true for out register but it will be
+	 corrected by the next insn).
+	 
+	 Do not reuse register because of the following situation: a <-
+	 a op b, and b should be the same as a.	 */
+	 
+      new_in_reg = new_out_reg
+	= lra_create_new_reg_with_unique_value (outmode, out_rtx,
+						goal_class, "");
+      /* Don't generate inheritance for the new register because we
+	 can not use the same hard register for the corresponding
+	 inheritance pseudo for input reload.  */
+      bitmap_set_bit (&lra_matched_pseudos, REGNO (new_in_reg));
+    }
+  /* In and out operand can be got from transformations before
+     processing constraints.  So the pseudos might have inaccurate
+     class and we should make their classes more accurate.  */
+  narrow_reload_pseudo_class (in_rtx, goal_class);
+  narrow_reload_pseudo_class (out_rtx, goal_class);
+  push_to_sequence (*before);
+  lra_emit_move (copy_rtx (new_in_reg), in_rtx);
+  *before = get_insns ();
+  end_sequence ();
+  for (i = 0; (in = ins[i]) >= 0; i++)
+    *curr_id->operand_loc[in] = new_in_reg;
+  lra_update_dups (curr_id, ins);
+  if (find_reg_note (curr_insn, REG_UNUSED, out_rtx) == NULL_RTX)
+    {
+      start_sequence ();
+      lra_emit_move (out_rtx, copy_rtx (new_out_reg));
+      emit_insn (*after);
+      *after = get_insns ();
+      end_sequence ();
+    }
+  *curr_id->operand_loc[out] = new_out_reg;
+  lra_update_dup (curr_id, out);
+}
+
+/* Return register class which is union of all reg classes in insn
+   constraint alternative string starting with P.  */
+static enum reg_class
+reg_class_from_constraints (const char *p)
+{
+  int c, len;
+  enum reg_class op_class = NO_REGS;
+
+  do
+    switch ((c = *p, len = CONSTRAINT_LEN (c, p)), c)
+      {
+      case '#':
+      case ',':
+	return op_class;
+
+      case 'p':
+	op_class = (reg_class_subunion
+		    [op_class][base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
+					       ADDRESS, SCRATCH)]);
+	break;
+	
+      case 'g':
+      case 'r':
+	op_class = reg_class_subunion[op_class][GENERAL_REGS];
+	break;
+	
+      default:
+	if (REG_CLASS_FROM_CONSTRAINT (c, p) == NO_REGS)
+	  {
+#ifdef EXTRA_CONSTRAINT_STR
+	    if (EXTRA_ADDRESS_CONSTRAINT (c, p))
+	      op_class
+		= (reg_class_subunion
+		   [op_class][base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
+					      ADDRESS, SCRATCH)]);
+#endif
+	    break;
+	  }
+	
+	op_class
+	  = reg_class_subunion[op_class][REG_CLASS_FROM_CONSTRAINT (c, p)];
+	break;
+      }
+  while ((p += len), c);
+  return op_class;
+}
+
+/* Return final hard regno (plus offset) which will be after
+   elimination.	 We do this for matching constraints because the final
+   hard regno could have a different class.  */
+static int
+get_final_hard_regno (int hard_regno, int offset)
+{
+  if (hard_regno < 0)
+    return hard_regno;
+  hard_regno += offset;
+  return lra_get_elimation_hard_regno (hard_regno);
+}
+
+/* Return register class of OP.	 That is a class of the hard register
+   itself (if OP is a hard register), or class of assigned hard
+   register to the pseudo (if OP is pseudo), or allocno class of
+   unassigned pseudo (if OP is reload pseudo).	Return NO_REGS
+   otherwise.  */
+static enum reg_class
+get_op_class (rtx op)
+{
+  int regno, hard_regno, offset;
+
+  if (! REG_P (op))
+    return NO_REGS;
+  lra_get_hard_regno_and_offset (op, &hard_regno, &offset);
+  if (hard_regno >= 0)
+    {
+      hard_regno = get_final_hard_regno (hard_regno, offset);
+      return REGNO_REG_CLASS (hard_regno);
+    }
+  /* Reload pseudo will get a hard register in any case.  */
+  if ((regno = REGNO (op)) >= new_regno_start)
+    return lra_get_allocno_class (regno);
+  return NO_REGS;
+}
+
+/* Return generated insn mem_pseudo:=val if TO_P or val:=mem_pseudo
+   otherwise.  If modes of MEM_PSEUDO and VAL are different, use
+   SUBREG for VAL to make them equal.  Assign CODE to the insn if it
+   is not recognized.
+
+   We can not use emit_move_insn in some cases because of bad used
+   practice in some machine descriptions.  For example, power can use
+   only base+index addressing for altivec move insns and it is checked
+   by insn predicates.	On the other hand, the same move insn
+   constraints permit to use offsetable memory for moving vector mode
+   values from/to general registers to/from memory.  emit_move_insn
+   will transform offsetable address to one with base+index addressing
+   which is rejected by the constraint.	 So sometimes we need to
+   generate move insn without modifications and assign the code
+   explicitly because the generated move can be unrecognizable because
+   of the predicates.  */
+static rtx
+emit_spill_move (bool to_p, rtx mem_pseudo, rtx val, int code)
+{
+  rtx insn, after;
+
+  start_sequence ();
+  if (GET_MODE (mem_pseudo) != GET_MODE (val))
+    val = gen_rtx_SUBREG (GET_MODE (mem_pseudo),
+			  GET_CODE (val) == SUBREG ? SUBREG_REG (val) : val,
+			  0);
+  if (to_p)
+    insn = gen_move_insn (mem_pseudo, val);
+  else
+    insn = gen_move_insn (val, mem_pseudo);
+  if (recog_memoized (insn) < 0)
+    INSN_CODE (insn) = code;
+  emit_insn (insn);
+  after = get_insns ();
+  end_sequence ();
+  return after;
+}
+
+/* Process a special case insn (register move), return true if we
+   don't need to process it anymore.  Return that RTL was changed
+   through CHANGE_P and macro SECONDARY_MEMORY_NEEDED says to use
+   secondary memory through SEC_MEM_P.	*/
+static bool
+check_and_process_move (bool *change_p, bool *sec_mem_p)
+{
+  int sregno, dregno;
+  rtx set, dest, src, dreg, sr, dr, sreg, new_reg, before, scratch_reg;
+  enum reg_class dclass, sclass, secondary_class;
+  enum machine_mode sreg_mode;
+  secondary_reload_info sri;
+
+  *sec_mem_p = *change_p = false;
+  if ((set = single_set (curr_insn)) == NULL)
+    return false;
+  dreg = dest = SET_DEST (set);
+  sreg = src = SET_SRC (set);
+  /* Quick check on the right move insn which does not need
+     reloads.  */
+  if ((dclass = get_op_class (dest)) != NO_REGS
+      && (sclass = get_op_class (src)) != NO_REGS
+      && targetm.register_move_cost (GET_MODE (src), dclass, sclass) == 2)
+    return true;
+  if (GET_CODE (dest) == SUBREG)
+    dreg = SUBREG_REG (dest);
+  if (GET_CODE (src) == SUBREG)
+    sreg = SUBREG_REG (src);
+  if (! REG_P (dreg) || ! REG_P (sreg))
+    return false;
+  sclass = dclass = NO_REGS;
+  dr = get_equiv_substitution (dreg);
+  if (dr != dreg)
+    dreg = copy_rtx (dr);
+  if (REG_P (dreg))
+    dclass = get_reg_class (REGNO (dreg));
+  if (dclass == ALL_REGS)
+    /* We don't know what class we will use -- let it be figured out
+       by curr_insn_transform function.	 Remember some targets does not
+       work with such classes through their implementation of
+       machine-dependent hooks like secondary_memory_needed.  */
+    return false;
+  sreg_mode = GET_MODE (sreg);
+  sr = get_equiv_substitution (sreg);
+  if (sr != sreg)
+    sreg = copy_rtx (sr);
+  if (REG_P (sreg))
+    sclass = get_reg_class (REGNO (sreg));
+  if (sclass == ALL_REGS)
+    /* See comments above.  */
+    return false;
+#ifdef SECONDARY_MEMORY_NEEDED
+  if (dclass != NO_REGS && sclass != NO_REGS
+      && SECONDARY_MEMORY_NEEDED (sclass, dclass, GET_MODE (src)))
+    {
+      *sec_mem_p = true;
+      return false;
+    }
+#endif
+  sri.prev_sri = NULL;
+  sri.icode = CODE_FOR_nothing;
+  sri.extra_cost = 0;
+  secondary_class = NO_REGS;
+  /* Set up hard register for a reload pseudo for hook
+     secondary_reload because some targets just ignore unassigned
+     pseudos in the hook.  */
+  if (dclass != NO_REGS
+      && REG_P (dreg) && (dregno = REGNO (dreg)) >= new_regno_start
+      && lra_get_regno_hard_regno (dregno) < 0)
+    reg_renumber[dregno] = ira_class_hard_regs[dclass][0];
+  else
+    dregno = -1;
+  if (sclass != NO_REGS
+      && REG_P (sreg) && (sregno = REGNO (sreg)) >= new_regno_start
+      && lra_get_regno_hard_regno (sregno) < 0)
+    reg_renumber[sregno] = ira_class_hard_regs[sclass][0];
+  else
+    sregno = -1;
+  if (sclass != NO_REGS)
+    secondary_class
+      = (enum reg_class) targetm.secondary_reload (false, dest,
+						   (reg_class_t) sclass,
+						   GET_MODE (src), &sri);
+  if (sclass == NO_REGS
+      || ((secondary_class != NO_REGS || sri.icode != CODE_FOR_nothing)
+	  && dclass != NO_REGS))
+    secondary_class
+      = (enum reg_class) targetm.secondary_reload (true, sreg,
+						   (reg_class_t) dclass,
+						   sreg_mode, &sri);
+  if (sregno >= 0)
+    reg_renumber [sregno] = -1;
+  if (dregno >= 0)
+    reg_renumber [dregno] = -1;
+  if (secondary_class == NO_REGS && sri.icode == CODE_FOR_nothing)
+    return false;
+  *change_p = true;
+  new_reg = NULL_RTX;
+  if (secondary_class != NO_REGS)
+    new_reg = lra_create_new_reg_with_unique_value (sreg_mode, NULL_RTX,
+						    secondary_class,
+						    "secondary");
+  start_sequence ();
+  if (sri.icode == CODE_FOR_nothing)
+    lra_emit_move (new_reg, sreg);
+  else
+    {
+      enum reg_class scratch_class;
+
+      scratch_class = (reg_class_from_constraints
+		       (insn_data[sri.icode].operand[2].constraint));
+      scratch_reg = (lra_create_new_reg_with_unique_value
+		     (insn_data[sri.icode].operand[2].mode, NULL_RTX,
+		      scratch_class, "scratch"));
+      emit_insn (GEN_FCN (sri.icode) (new_reg != NULL_RTX ? new_reg : dest,
+				      sreg, scratch_reg));
+    }
+  before = get_insns ();
+  end_sequence ();
+  lra_process_new_insns (curr_insn, before, NULL_RTX, "Inserting the move");
+  if (new_reg != NULL_RTX)
+    {
+      if (GET_CODE (src) == SUBREG)
+	SUBREG_REG (src) = new_reg;
+      else
+	SET_SRC (set) = new_reg;
+    }
+  else
+    {
+      if (lra_dump_file != NULL)
+	{
+	  fprintf (lra_dump_file, "Deleting move %u\n", INSN_UID (curr_insn));
+	  debug_rtl_slim (lra_dump_file, curr_insn, curr_insn, -1, 0);
+	}
+      lra_set_insn_deleted (curr_insn);
+      return true;
+    }
+  return false;
+}
+
+/* The following data describe the result of process_alt_operands.
+   The data are used in curr_insn_transform to generate reloads.  */
+
+/* The chosen reg classes which should be used for the corresponding
+   operands.  */
+static enum reg_class goal_alt[MAX_RECOG_OPERANDS];
+/* True if the operand should be the same as another operand and the
+   another operand does not need a reload.  */
+static bool goal_alt_match_win[MAX_RECOG_OPERANDS];
+/* True if the operand does not need a reload.	*/
+static bool goal_alt_win[MAX_RECOG_OPERANDS];
+/* True if the operand can be offsetable memory.  */
+static bool goal_alt_offmemok[MAX_RECOG_OPERANDS];
+/* The number of an operand to which given operand can be matched to.  */
+static int goal_alt_matches[MAX_RECOG_OPERANDS];
+/* The number of elements in the following array.  */
+static int goal_alt_dont_inherit_ops_num;
+/* Numbers of operands whose reload pseudos should not be inherited.  */
+static int goal_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
+/* True if the insn commutative operands should be swapped.  */
+static bool goal_alt_swapped;
+/* The chosen insn alternative.	 */
+static int goal_alt_number;
+
+/* The following five variables are used to choose the best insn
+   alternative.	 They reflect final characteristics of the best
+   alternative.	 */
+
+/* Number of necessary reloads and overall cost reflecting the
+   previous value and other unpleasantness of the best alternative.  */
+static int best_losers, best_overall;
+/* Number of small register classes used for operands of the best
+   alternative.	 */
+static int best_small_class_operands_num;
+/* Overall number hard registers used for reloads.  For example, on
+   some targets we need 2 general registers to reload DFmode and only
+   one floating point register.	 */
+static int best_reload_nregs;
+/* Overall number reflecting distances of previous reloading the same
+   value.  It is used to improve inheritance chances.  */
+static int best_reload_sum;
+
+/* True if the current insn should have no correspondingly input or
+   output reloads.  */
+static bool no_input_reloads_p, no_output_reloads_p;
+
+/* True if we swapped the commutative operands in the current
+   insn.  */
+static int curr_swapped;
+
+/* Make reloads for addr register in LOC which should be of class CL,
+   add reloads to list BEFORE.	If AFTER is not null emit insns to set
+   the register up after the insn (it is case of inc/dec, modify).  */
+static bool
+process_addr_reg (rtx *loc, rtx *before, rtx *after, enum reg_class cl)
+{
+  int regno, final_regno;
+  enum reg_class rclass, new_class;
+  rtx reg = *loc;
+  rtx new_reg;
+  enum machine_mode mode;
+  bool change_p = false;
+
+  mode = GET_MODE (reg);
+  if (! REG_P (reg))
+    {
+      /* Always reload memory in an address even if the target
+	 supports such addresses.  */
+      new_reg
+	= lra_create_new_reg_with_unique_value (mode, reg, cl, "address");
+      push_to_sequence (*before);
+      lra_emit_move (new_reg, reg);
+      *before = get_insns ();
+      end_sequence ();
+      *loc = new_reg;
+      if (after != NULL)
+	{
+	  start_sequence ();
+	  lra_emit_move (reg, new_reg);
+	  emit_insn (*after);
+	  *after = get_insns ();
+	  end_sequence ();
+	}
+      return true;
+    }
+  lra_assert (REG_P (reg));
+  final_regno = regno = REGNO (reg);
+  if (regno < FIRST_PSEUDO_REGISTER)
+    {
+      rtx final_reg = reg;
+      rtx *final_loc = &final_reg;
+
+      lra_eliminate_reg_if_possible (final_loc);
+      final_regno = REGNO (*final_loc);
+    }
+  /* Use class of hard register after elimination because some targets
+     do not recognize virtual hard registers as valid address
+     registers.	 */
+  rclass = get_reg_class (final_regno);
+  if ((*loc = get_equiv_substitution (reg)) != reg)
+    {
+      if (lra_dump_file != NULL)
+	{
+	  fprintf (lra_dump_file,
+		   "Changing pseudo %d in address of insn %u on equiv ",
+		   REGNO (reg), INSN_UID (curr_insn));
+	  print_value_slim (lra_dump_file, *loc, 1);
+	  fprintf (lra_dump_file, "\n");
+	}
+      *loc = copy_rtx (*loc);
+      change_p = true;
+    }
+  if (*loc != reg || ! in_class_p (final_regno, GET_MODE (reg), cl, &new_class))
+    {
+      reg = *loc;
+      if (get_reload_reg (OP_IN, mode, reg, cl, "address", &new_reg))
+	{
+	  push_to_sequence (*before);
+	  lra_emit_move (new_reg, reg);
+	  *before = get_insns ();
+	  end_sequence ();
+	}
+      *loc = new_reg;
+      if (after != NULL)
+	{
+	  start_sequence ();
+	  lra_emit_move (reg, new_reg);
+	  emit_insn (*after);
+	  *after = get_insns ();
+	  end_sequence ();
+	}
+      change_p = true;
+    }
+  else if (new_class != NO_REGS && rclass != new_class)
+    change_class (regno, new_class, "	   Change", true);
+  return change_p;
+}
+
+#ifndef SLOW_UNALIGNED_ACCESS
+#define SLOW_UNALIGNED_ACCESS(mode, align) 0
+#endif
+
+/* Make reloads for subreg in operand NOP with internal subreg mode
+   REG_MODE, add new reloads for further processing.  Return true if
+   any reload was generated.  */
+static bool
+simplify_operand_subreg (int nop, enum machine_mode reg_mode)
+{
+  int hard_regno;
+  rtx before, after;
+  enum machine_mode mode;
+  rtx reg, new_reg;
+  rtx operand = *curr_id->operand_loc[nop];
+
+  before = after = NULL_RTX;
+
+  if (GET_CODE (operand) != SUBREG)
+    return false;
+  
+  mode = GET_MODE (operand);
+  reg = SUBREG_REG (operand);
+  /* If we change address for paradoxical subreg of memory, the
+     address might violate the necessary alignment or the access might
+     be slow.  So take this into consideration.	 */
+  if ((MEM_P (reg)
+       && ((! STRICT_ALIGNMENT
+	    && ! SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg)))
+	   || MEM_ALIGN (reg) >= GET_MODE_ALIGNMENT (mode)))
+      || (REG_P (reg) && REGNO (reg) < FIRST_PSEUDO_REGISTER))
+    {
+      alter_subreg (curr_id->operand_loc[nop], false);
+      return true;
+    }
+  /* Force reload if this is a constant or PLUS or if there may be a
+     problem accessing OPERAND in the outer mode.  */
+  if ((REG_P (reg)
+       && REGNO (reg) >= FIRST_PSEUDO_REGISTER
+       && (hard_regno = lra_get_regno_hard_regno (REGNO (reg))) >= 0
+       /* Don't reload paradoxical subregs because we could be looping
+	  having repeatedly final regno out of hard regs range.	 */
+       && (hard_regno_nregs[hard_regno][GET_MODE (reg)]
+	   >= hard_regno_nregs[hard_regno][mode])
+       && simplify_subreg_regno (hard_regno, GET_MODE (reg),
+				 SUBREG_BYTE (operand), mode) < 0)
+      || CONSTANT_P (reg) || GET_CODE (reg) == PLUS || MEM_P (reg))
+    {
+      /* Constant mode ???? */
+      enum op_type type = curr_static_id->operand[nop].type;
+      /* The class will be defined later in curr_insn_transform.  */
+      enum reg_class rclass
+	= (enum reg_class) targetm.preferred_reload_class (reg, ALL_REGS);
+
+      new_reg = lra_create_new_reg_with_unique_value (reg_mode, reg, rclass,
+						      "subreg reg");
+      bitmap_set_bit (&lra_optional_reload_pseudos, REGNO (new_reg));
+      if (type != OP_OUT
+	  || GET_MODE_SIZE (GET_MODE (reg)) > GET_MODE_SIZE (mode))
+	{
+	  push_to_sequence (before);
+	  lra_emit_move (new_reg, reg);
+	  before = get_insns ();
+	  end_sequence ();
+	}
+      if (type != OP_IN)
+	{
+	  start_sequence ();
+	  lra_emit_move (reg, new_reg);
+	  emit_insn (after);
+	  after = get_insns ();
+	  end_sequence ();
+	}
+      SUBREG_REG (operand) = new_reg;
+      lra_process_new_insns (curr_insn, before, after,
+			     "Inserting subreg reload");
+      return true;
+    }
+  return false;
+}
+
+/* Return TRUE if *LOC refers for a hard register from SET.  */
+static bool
+uses_hard_regs_p (rtx *loc, HARD_REG_SET set)
+{
+  int i, j, x_hard_regno, offset;
+  enum machine_mode mode;
+  rtx x;
+  const char *fmt;
+  enum rtx_code code;
+
+  if (*loc == NULL_RTX)
+    return false;
+  x = *loc;
+  code = GET_CODE (x);
+  mode = GET_MODE (x);
+  if (code == SUBREG)
+    {
+      loc = &SUBREG_REG (x);
+      x = SUBREG_REG (x);
+      code = GET_CODE (x);
+      if (GET_MODE_SIZE (GET_MODE (x)) > GET_MODE_SIZE (mode))
+	mode = GET_MODE (x);
+    }
+  
+  if (REG_P (x))
+    {
+      lra_get_hard_regno_and_offset (x, &x_hard_regno, &offset);
+      /* The real hard regno of the operand after the allocation.  */
+      x_hard_regno = get_final_hard_regno (x_hard_regno, offset);
+      return (x_hard_regno >= 0
+	      && lra_hard_reg_set_intersection_p (x_hard_regno, mode, set));
+    }
+  if (MEM_P (x))
+    {
+      struct address ad;
+      enum machine_mode mode = GET_MODE (x);
+      rtx *addr_loc = &XEXP (x, 0);
+
+      extract_address_regs (mode, MEM_ADDR_SPACE (x), addr_loc, MEM, &ad);
+      if (ad.base_reg_loc != NULL)
+	{
+	  if (uses_hard_regs_p (ad.base_reg_loc, set))
+	    return true;
+	}
+      if (ad.index_reg_loc != NULL)
+	{
+	  if (uses_hard_regs_p (ad.index_reg_loc, set))
+	    return true;
+	}
+    }
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    {
+      if (fmt[i] == 'e')
+	{
+	  if (uses_hard_regs_p (&XEXP (x, i), set))
+	    return true;
+	}
+      else if (fmt[i] == 'E')
+	{
+	  for (j = XVECLEN (*loc, i) - 1; j >= 0; j--)
+	    if (uses_hard_regs_p (&XVECEXP (*loc, i, j), set))
+	      return true;
+	}
+    }
+  return false;
+}
+
+/* Cost factor for each additional reload and maximal cost bound for
+   insn reloads.  One might ask about such strange numbers.  Their
+   values occurred historically from former reload pass.  */
+#define LOSER_COST_FACTOR 6
+#define MAX_OVERALL_COST_BOUND 600
+
+/* Major function to choose the current insn alternative and what
+   operands should be reloaded and how.	 If ONLY_ALTERNATIVE is not
+   negative we should consider only this alternative.  Return false if
+   we can not choose the alternative or find how to reload the
+   operands.  */
+static bool
+process_alt_operands (int only_alternative)
+{
+  bool ok_p = false;
+  int nop, small_class_operands_num, overall, nalt, offset;
+  int n_alternatives = curr_static_id->n_alternatives;
+  int n_operands = curr_static_id->n_operands;
+  /* LOSERS counts those that don't fit this alternative and would
+     require loading.  */
+  int losers;
+  /* REJECT is a count of how undesirable this alternative says it is
+     if any reloading is required.  If the alternative matches exactly
+     then REJECT is ignored, but otherwise it gets this much counted
+     against it in addition to the reloading needed.  */
+  int reject;
+  /* The number of elements in the following array.  */
+  int early_clobbered_regs_num;
+  /* Numbers of operands which are early clobber registers.  */
+  int early_clobbered_nops[MAX_RECOG_OPERANDS];
+  enum reg_class curr_alt[MAX_RECOG_OPERANDS];
+  HARD_REG_SET curr_alt_set[MAX_RECOG_OPERANDS];
+  bool curr_alt_match_win[MAX_RECOG_OPERANDS];
+  bool curr_alt_win[MAX_RECOG_OPERANDS];
+  bool curr_alt_offmemok[MAX_RECOG_OPERANDS];
+  int curr_alt_matches[MAX_RECOG_OPERANDS];
+  /* The number of elements in the following array.  */
+  int curr_alt_dont_inherit_ops_num;
+  /* Numbers of operands whose reload pseudos should not be inherited.	*/
+  int curr_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
+  rtx op;
+  rtx no_subreg_operand[MAX_RECOG_OPERANDS], operand_reg[MAX_RECOG_OPERANDS];
+  int hard_regno[MAX_RECOG_OPERANDS];
+  enum machine_mode biggest_mode[MAX_RECOG_OPERANDS];
+  int reload_nregs, reload_sum;
+  bool costly_p;
+  enum reg_class cl;
+
+  /* Calculate some data common for all alternatives to speed up the
+     function.	*/
+  for (nop = 0; nop < n_operands; nop++)
+    {
+      op = no_subreg_operand[nop] = *curr_id->operand_loc[nop];
+      lra_get_hard_regno_and_offset (op, &hard_regno[nop], &offset);
+      /* The real hard regno of the operand after the allocation.  */
+      hard_regno[nop] = get_final_hard_regno (hard_regno[nop], offset);
+      
+      operand_reg[nop] = op;
+      biggest_mode[nop] = GET_MODE (operand_reg[nop]);
+      if (GET_CODE (operand_reg[nop]) == SUBREG)
+	{
+	  operand_reg[nop] = SUBREG_REG (operand_reg[nop]);
+	  if (GET_MODE_SIZE (biggest_mode[nop])
+	      < GET_MODE_SIZE (GET_MODE (operand_reg[nop])))
+	    biggest_mode[nop] = GET_MODE (operand_reg[nop]);
+	}
+      if (REG_P (operand_reg[nop]))
+	no_subreg_operand[nop] = operand_reg[nop];
+      else
+	operand_reg[nop] = NULL_RTX;
+    }
+
+  /* The constraints are made of several alternatives.	Each operand's
+     constraint looks like foo,bar,... with commas separating the
+     alternatives.  The first alternatives for all operands go
+     together, the second alternatives go together, etc.
+
+     First loop over alternatives.  */
+  for (nalt = 0; nalt < n_alternatives; nalt++)
+    {
+      /* Loop over operands for one constraint alternative.  */
+      if (
+#ifdef HAVE_ATTR_enabled
+	  (curr_id->alternative_enabled_p != NULL
+	   && ! curr_id->alternative_enabled_p[nalt])
+	  ||
+#endif
+	  (only_alternative >= 0 && nalt != only_alternative))
+	continue;
+
+      overall = losers = reject = reload_nregs = reload_sum = 0;
+      for (nop = 0; nop < n_operands; nop++)
+	reject += (curr_static_id
+		   ->operand_alternative[nalt * n_operands + nop].reject);
+      early_clobbered_regs_num = 0;
+
+      for (nop = 0; nop < n_operands; nop++)
+	{
+	  const char *p;
+	  char *end;
+	  int len, c, m, i, opalt_num, this_alternative_matches;
+	  bool win, did_match, offmemok, early_clobber_p;
+	  /* false => this operand can be reloaded somehow for this
+	     alternative.  */
+	  bool badop;
+	  /* false => this operand can be reloaded if the alternative
+	     allows regs.  */
+	  bool winreg;
+	  /* False if a constant forced into memory would be OK for
+	     this operand.  */
+	  bool constmemok;
+	  enum reg_class this_alternative, this_costly_alternative;
+	  HARD_REG_SET this_alternative_set, this_costly_alternative_set;
+	  bool this_alternative_match_win, this_alternative_win;
+	  bool this_alternative_offmemok;
+	  int invalidate_m;
+	  enum machine_mode mode;
+
+	  opalt_num = nalt * n_operands + nop;
+	  if (curr_static_id->operand_alternative[opalt_num].anything_ok)
+	    {
+	      /* Fast track for no constraints at all.	*/
+	      curr_alt[nop] = NO_REGS;
+	      CLEAR_HARD_REG_SET (curr_alt_set[nop]);
+	      curr_alt_win[nop] = true;
+	      curr_alt_match_win[nop] = false;
+	      curr_alt_offmemok[nop] = false;
+	      curr_alt_matches[nop] = -1;
+	      continue;
+	    }
+      
+	  op = no_subreg_operand[nop];
+	  mode = get_op_mode (nop);
+
+	  win = did_match = winreg = offmemok = constmemok = false;
+	  badop = true;
+      
+	  early_clobber_p = false;
+	  p = curr_static_id->operand_alternative[opalt_num].constraint;
+      
+	  this_costly_alternative = this_alternative = NO_REGS;
+	  /* We update set of possible hard regs besides its class
+	     because reg class might be inaccurate.  For example,
+	     union of LO_REGS (l), HI_REGS(h), and STACK_REG(k) in ARM
+	     is translated in HI_REGS because classes are merged by
+	     pairs and there is no accurate intermediate class.	 */
+	  CLEAR_HARD_REG_SET (this_alternative_set);
+	  CLEAR_HARD_REG_SET (this_costly_alternative_set);
+	  this_alternative_win = false;
+	  this_alternative_match_win = false;
+	  this_alternative_offmemok = false;
+	  this_alternative_matches = -1;
+  
+	  invalidate_m = -1;
+
+	  /* An empty constraint should be excluded by the fast
+	     track.  */
+	  lra_assert (*p != 0 && *p != ',');
+  
+	  /* Scan this alternative's specs for this operand; set WIN
+	     if the operand fits any letter in this alternative.
+	     Otherwise, clear BADOP if this operand could fit some
+	     letter after reloads, or set WINREG if this operand could
+	     fit after reloads provided the constraint allows some
+	     registers.	 */
+	  costly_p = false;
+	  do
+	    {
+	      switch ((c = *p, len = CONSTRAINT_LEN (c, p)), c)
+		{
+		case '\0':
+		  len = 0;
+		  break;
+		case ',':
+		  c = '\0';
+		  break;
+	
+		case '=':  case '+': case '?': case '*': case '!':
+		case ' ': case '\t':
+		  break;
+		  
+		case '%':
+		  /* We only support one commutative marker, the first
+		     one.  We already set commutative above.  */
+		  break;
+		  
+		case '&':
+		  early_clobber_p = true;
+		  break;
+		  
+		case '#':
+		  /* Ignore rest of this alternative.  */
+		  c = '\0';
+		  break;
+		  
+		case '0':  case '1':  case '2':	 case '3':  case '4':
+		case '5':  case '6':  case '7':	 case '8':  case '9':
+		  {
+		    int m_hregno, m_offset;
+		    bool match_p;
+		    
+		    m = strtoul (p, &end, 10);
+		    p = end;
+		    len = 0;
+		    lra_assert (nop > m);
+		    
+		    this_alternative_matches = m;
+		    lra_get_hard_regno_and_offset (*curr_id->operand_loc[m],
+						   &m_hregno, &m_offset);
+		    m_hregno = get_final_hard_regno (m_hregno, m_offset);
+		    /* We are supposed to match a previous operand.
+		       If we do, we win if that one did.  If we do
+		       not, count both of the operands as losers.
+		       (This is too conservative, since most of the
+		       time only a single reload insn will be needed
+		       to make the two operands win.  As a result,
+		       this alternative may be rejected when it is
+		       actually desirable.)  */
+		    /* If it conflicts with others.  */
+		    match_p = false;
+		    if (operands_match_p (*curr_id->operand_loc[nop],
+					  *curr_id->operand_loc[m], m_hregno))
+		      {
+			int i;
+			
+			for (i = 0; i < early_clobbered_regs_num; i++)
+			  if (early_clobbered_nops[i] == m)
+			    break;
+			/* We should reject matching of an early
+			   clobber operand if the matching operand is
+			   not dying in the insn.  */
+			if (i >= early_clobbered_regs_num
+			    || operand_reg[nop] == NULL_RTX
+			    || (find_regno_note (curr_insn, REG_DEAD,
+						 REGNO (operand_reg[nop]))
+				!= NULL_RTX))
+			  match_p = true;
+		      }
+		    if (match_p)
+		      {
+			/* If we are matching a non-offsettable
+			   address where an offsettable address was
+			   expected, then we must reject this
+			   combination, because we can't reload
+			   it.	*/
+			if (curr_alt_offmemok[m]
+			    && MEM_P (*curr_id->operand_loc[m])
+			    && curr_alt[m] == NO_REGS && ! curr_alt_win[m])
+			  continue;
+			
+			did_match = curr_alt_win[m];
+		      }
+		    else
+		      {
+			/* Operands don't match.  */
+			/* Retroactively mark the operand we had to
+			   match as a loser, if it wasn't already and
+			   it wasn't matched to a register constraint
+			   (e.g it might be matched by memory).	 */
+			if (curr_alt_win[m]
+			    && (operand_reg[m] == NULL_RTX
+				|| hard_regno[m] < 0))
+			  {
+			    losers++;
+			    if (curr_alt[m] != NO_REGS)
+			      reload_nregs
+				+= (ira_reg_class_max_nregs[curr_alt[m]]
+				    [GET_MODE (*curr_id->operand_loc[m])]);
+			}
+			invalidate_m = m;
+			if (curr_alt[m] == NO_REGS)
+			  continue;
+			
+			/* We prefer no matching alternatives because
+			   it gives more freedom in RA.	 */
+			if (operand_reg[nop] == NULL_RTX
+			    || (find_regno_note (curr_insn, REG_DEAD,
+						 REGNO (operand_reg[nop]))
+				 == NULL_RTX))
+			  reject += 2;
+		      }
+		    /* This can be fixed with reloads if the operand
+		       we are supposed to match can be fixed with
+		       reloads.	 */
+		    badop = false;
+		    this_alternative = curr_alt[m];
+		    COPY_HARD_REG_SET (this_alternative_set, curr_alt_set[m]);
+		    
+		    /* If we have to reload this operand and some
+		       previous operand also had to match the same
+		       thing as this operand, we don't know how to do
+		       that.  So reject this alternative.  */
+		    if (! did_match)
+		      for (i = 0; i < nop; i++)
+			if (curr_alt_matches[i] == this_alternative_matches)
+			  badop = true;
+		    
+		    break;
+		  }
+		  
+		case 'p':
+		  cl = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
+				       ADDRESS, SCRATCH);
+		  this_alternative = reg_class_subunion[this_alternative][cl];
+		  IOR_HARD_REG_SET (this_alternative_set,
+				    reg_class_contents[cl]);
+		  if (costly_p)
+		    {
+		      this_costly_alternative
+			= reg_class_subunion[this_costly_alternative][cl];
+		      IOR_HARD_REG_SET (this_costly_alternative_set,
+					reg_class_contents[cl]);
+		    }
+		  win = true;
+		  badop = false;
+		  break;
+		  
+		case TARGET_MEM_CONSTRAINT:
+		  if (MEM_P (op)
+		      || (REG_P (op)
+			  && REGNO (op) >= FIRST_PSEUDO_REGISTER
+			  && in_mem_p (REGNO (op))))
+		    win = true;
+		  if (CONST_POOL_OK_P (mode, op))
+		    badop = false;
+		  constmemok = true;
+		  break;
+		  
+		case '<':
+		  if (MEM_P (op)
+		      && (GET_CODE (XEXP (op, 0)) == PRE_DEC
+			  || GET_CODE (XEXP (op, 0)) == POST_DEC))
+		    win = true;
+		  break;
+		  
+		case '>':
+		  if (MEM_P (op)
+		      && (GET_CODE (XEXP (op, 0)) == PRE_INC
+			  || GET_CODE (XEXP (op, 0)) == POST_INC))
+		    win = true;
+		  break;
+		  
+		  /* Memory op whose address is not offsettable.  */
+		case 'V':
+		  if (MEM_P (op)
+		      && ! offsettable_nonstrict_memref_p (op))
+		    win = true;
+		  break;
+		  
+		  /* Memory operand whose address is offsettable.  */
+		case 'o':
+		  if ((MEM_P (op)
+		       && offsettable_nonstrict_memref_p (op))
+		      || (REG_P (op)
+			  && REGNO (op) >= FIRST_PSEUDO_REGISTER
+			  && in_mem_p (REGNO (op))))
+		    win = true;
+		  if (CONST_POOL_OK_P (mode, op) || MEM_P (op))
+		    badop = false;
+		  constmemok = true;
+		  offmemok = true;
+		  break;
+		  
+		case 'E':
+		case 'F':
+		  if (GET_CODE (op) == CONST_DOUBLE
+		      || (GET_CODE (op) == CONST_VECTOR
+			  && (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)))
+		    win = true;
+		  break;
+		  
+		case 'G':
+		case 'H':
+		  if (GET_CODE (op) == CONST_DOUBLE
+		      && CONST_DOUBLE_OK_FOR_CONSTRAINT_P (op, c, p))
+		    win = true;
+		  break;
+		  
+		case 's':
+		  if (CONST_INT_P (op)
+		      || (GET_CODE (op) == CONST_DOUBLE && mode == VOIDmode))
+		    break;
+		case 'i':
+		  if (CONSTANT_P (op)
+		      && (! flag_pic || LEGITIMATE_PIC_OPERAND_P (op)))
+		    win = true;
+		  break;
+		  
+		case 'n':
+		  if (CONST_INT_P (op)
+		      || (GET_CODE (op) == CONST_DOUBLE && mode == VOIDmode))
+		    win = true;
+		  break;
+	
+		case 'I':
+		case 'J':
+		case 'K':
+		case 'L':
+		case 'M':
+		case 'N':
+		case 'O':
+		case 'P':
+		  if (CONST_INT_P (op)
+		      && CONST_OK_FOR_CONSTRAINT_P (INTVAL (op), c, p))
+		    win = true;
+		  break;
+		  
+		case 'X':
+		  /* This constraint should be excluded by the fast
+		     track.  */
+		  gcc_unreachable ();
+		  break;
+		  
+		case 'g':
+		  if (/* A PLUS is never a valid operand, but LRA can
+			 make it from a register when eliminating
+			 registers.  */
+		      GET_CODE (op) != PLUS
+		      && (! CONSTANT_P (op) || ! flag_pic
+			  || LEGITIMATE_PIC_OPERAND_P (op))
+		      && (! REG_P (op)
+			  || (REGNO (op) >= FIRST_PSEUDO_REGISTER
+			      && in_mem_p (REGNO (op)))))
+		    win = true;
+		  /* Drop through into 'r' case.  */
+		  
+		case 'r':
+		  this_alternative
+		    = reg_class_subunion[this_alternative][GENERAL_REGS];
+		  IOR_HARD_REG_SET (this_alternative_set,
+				    reg_class_contents[GENERAL_REGS]);
+		  if (costly_p)
+		    {
+		      this_costly_alternative
+			= (reg_class_subunion
+			   [this_costly_alternative][GENERAL_REGS]);
+		      IOR_HARD_REG_SET (this_costly_alternative_set,
+					reg_class_contents[GENERAL_REGS]);
+		    }
+		  goto reg;
+		  
+		default:
+		  if (REG_CLASS_FROM_CONSTRAINT (c, p) == NO_REGS)
+		    {
+#ifdef EXTRA_CONSTRAINT_STR
+		      if (EXTRA_MEMORY_CONSTRAINT (c, p))
+			{
+			  if (EXTRA_CONSTRAINT_STR (op, c, p))
+			    win = true;
+			  /* For regno_equiv_mem_loc we have to
+			     check.  */
+			  else if (REG_P (op)
+				   && REGNO (op) >= FIRST_PSEUDO_REGISTER
+				   && in_mem_p (REGNO (op)))
+			    {
+			      /* We could transform spilled memory
+				 finally to indirect memory.  */
+			      if (EXTRA_CONSTRAINT_STR
+				  (get_indirect_mem (mode), c, p))
+				win = true;
+			    }
+			  
+			  /* If we didn't already win, we can reload
+			     constants via force_const_mem, and other
+			     MEMs by reloading the address like for
+			     'o'.  */
+			  if (CONST_POOL_OK_P (mode, op) || MEM_P (op))
+			    badop = false;
+			  constmemok = true;
+			  offmemok = true;
+			  break;
+			}
+		      if (EXTRA_ADDRESS_CONSTRAINT (c, p))
+			{
+			  if (EXTRA_CONSTRAINT_STR (op, c, p))
+			    win = true;
+			  
+			  /* If we didn't already win, we can reload
+			     the address into a base register.	*/
+			  cl = base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
+					       ADDRESS, SCRATCH);
+			  this_alternative
+			    = reg_class_subunion[this_alternative][cl];
+			  IOR_HARD_REG_SET (this_alternative_set,
+					    reg_class_contents[cl]);
+			  if (costly_p)
+			    {
+			      this_costly_alternative
+				= (reg_class_subunion
+				   [this_costly_alternative][cl]);
+			      IOR_HARD_REG_SET (this_costly_alternative_set,
+						reg_class_contents[cl]);
+			    }
+			  badop = false;
+			  break;
+			}
+		      
+		      if (EXTRA_CONSTRAINT_STR (op, c, p))
+			win = true;
+		      else if (REG_P (op)
+			       && REGNO (op) >= FIRST_PSEUDO_REGISTER
+			       && in_mem_p (REGNO (op)))
+			{
+			  /* We could transform spilled memory finally
+			     to indirect memory.  */
+			  if (EXTRA_CONSTRAINT_STR (get_indirect_mem (mode),
+						    c, p))
+			    win = true;
+			}
+#endif
+		      break;
+		    }
+		  
+		  cl = REG_CLASS_FROM_CONSTRAINT (c, p);
+		  this_alternative = reg_class_subunion[this_alternative][cl];
+		  IOR_HARD_REG_SET (this_alternative_set,
+				    reg_class_contents[cl]);
+		  if (costly_p)
+		    {
+		      this_costly_alternative
+			= reg_class_subunion[this_costly_alternative][cl];
+		      IOR_HARD_REG_SET (this_costly_alternative_set,
+					reg_class_contents[cl]);
+		    }
+		reg:
+		  if (mode == BLKmode)
+		    break;
+		  winreg = true;
+		  if (REG_P (op))
+		    {
+		      if (hard_regno[nop] >= 0
+			  && in_hard_reg_set_p (this_alternative_set,
+						mode, hard_regno[nop]))
+			win = true;
+		      else if (hard_regno[nop] < 0
+			       && in_class_p (REGNO (op), GET_MODE (op),
+					      this_alternative, NULL))
+			win = true;
+		    }
+		  break;
+		}
+	      if (c != ' ' && c != '\t')
+		costly_p = c == '*';
+	    }
+	  while ((p += len), c);
+  
+	  /* If this operand could be handled with a reg, and some reg
+	     is allowed, then this operand can be handled.  */
+	  if (winreg && this_alternative != NO_REGS)
+	    badop = false;
+  
+	  /* Record which operands fit this alternative.  */
+	  if (win)
+	    {
+	      this_alternative_win = true;
+	      if (operand_reg[nop] != NULL_RTX)
+		{
+		  if (hard_regno[nop] >= 0)
+		    {
+		      if (in_hard_reg_set_p (this_costly_alternative_set,
+					     mode, hard_regno[nop]))
+			reject++;
+		    }
+		  else
+		    {
+		      /* Prefer won reg to spilled pseudo under other equal
+			 conditions.  */
+		      reject++;
+		      if (in_class_p (REGNO (operand_reg[nop]),
+				      GET_MODE (operand_reg[nop]),
+				      this_costly_alternative, NULL))
+			reject++;
+		    }
+		  /* We simulate the behaviour of old reload here.
+		     Although scratches need hard registers and it
+		     might result in spilling other pseudos, no reload
+		     insns are generated for the scratches.  So it
+		     might cost something but probably less than old
+		     reload pass believes.  */
+		  if (lra_former_scratch_p (REGNO (operand_reg[nop])))
+		    reject += LOSER_COST_FACTOR;
+		}
+	    }
+	  else if (did_match)
+	    this_alternative_match_win = true;
+	  else
+	    {
+	      int const_to_mem = 0;
+	      bool no_regs_p;
+
+	      no_regs_p
+		= (this_alternative == NO_REGS
+		   || (hard_reg_set_subset_p
+		       (reg_class_contents[this_alternative],
+			lra_no_alloc_regs)));
+	      this_alternative_offmemok = offmemok;
+	      if (this_costly_alternative != NO_REGS)
+		reject++;
+	      /* If the operand is dying, has a matching constraint,
+		 and satisfies constraints of the matched operand
+		 which failed to satisfy the own constraints, we do
+		 not need to generate a reload insn for this
+		 operand.  */
+	      if (this_alternative_matches < 0
+		  || curr_alt_win[this_alternative_matches]
+		  || ! REG_P (op)
+		  || find_regno_note (curr_insn, REG_DEAD,
+				      REGNO (op)) == NULL_RTX
+		  || ((hard_regno[nop] < 0
+		       || ! in_hard_reg_set_p (this_alternative_set,
+					       mode, hard_regno[nop]))
+		      && (hard_regno[nop] >= 0
+			  || ! in_class_p (REGNO (op), GET_MODE (op),
+					   this_alternative, NULL))))
+		losers++;
+	      if (operand_reg[nop] != NULL_RTX)
+		{
+		  int last_reload = (lra_reg_info[ORIGINAL_REGNO
+						  (operand_reg[nop])]
+				     .last_reload);
+
+		  if (last_reload > bb_reload_num)
+		    reload_sum += last_reload;
+		  else
+		    reload_sum += bb_reload_num;
+		}
+	      if (badop
+		  /* Alternative loses if it has no regs for a reg
+		     operand.  */
+		  || (REG_P (op) && no_regs_p
+		      && this_alternative_matches < 0))
+		goto fail;
+      
+	      /* If this is a constant that is reloaded into the
+		 desired class by copying it to memory first, count
+		 that as another reload.  This is consistent with
+		 other code and is required to avoid choosing another
+		 alternative when the constant is moved into memory.
+		 Note that the test here is precisely the same as in
+		 the code below that calls force_const_mem.  */
+	      if (CONST_POOL_OK_P (mode, op)
+		  && ((targetm.preferred_reload_class
+		       (op, this_alternative) == NO_REGS)
+		      || no_input_reloads_p)
+		  && get_op_mode (nop) != VOIDmode)
+		{
+		  const_to_mem = 1;
+		  if (! no_regs_p)
+		    losers++;
+		}
+      
+	      /* Alternative loses if it requires a type of reload not
+		 permitted for this insn.  We can always reload
+		 objects with a REG_UNUSED note.  */
+	      if ((curr_static_id->operand[nop].type != OP_IN
+		   && no_output_reloads_p
+		   && ! find_reg_note (curr_insn, REG_UNUSED, op))
+		  || (curr_static_id->operand[nop].type != OP_OUT
+		      && no_input_reloads_p && ! const_to_mem))
+		goto fail;
+      
+	      /* If we can't reload this value at all, reject this
+		 alternative.  Note that we could also lose due to
+		 LIMIT_RELOAD_CLASS, but we don't check that here.  */
+	      if (! CONSTANT_P (op) && ! no_regs_p)
+		{
+		  if (targetm.preferred_reload_class
+		      (op, this_alternative) == NO_REGS)
+		    reject = MAX_OVERALL_COST_BOUND;
+	  
+		  if (curr_static_id->operand[nop].type == OP_OUT
+		      && (targetm.preferred_output_reload_class
+			  (op, this_alternative) == NO_REGS))
+		    reject = MAX_OVERALL_COST_BOUND;
+		}
+      
+	      /* We prefer to reload pseudos over reloading other
+		 things, since such reloads may be able to be
+		 eliminated later.  So bump REJECT in other cases.
+		 Don't do this in the case where we are forcing a
+		 constant into memory and it will then win since we
+		 don't want to have a different alternative match
+		 then.	*/
+	      if (! (REG_P (op)
+		     && REGNO (op) >= FIRST_PSEUDO_REGISTER)
+		  && ! (const_to_mem && constmemok)
+		  /* We can reload the address instead of memory (so
+		     do not punish it).	 It is preferable to do to
+		     avoid cycling in some cases.  */
+		  && ! (MEM_P (op) && offmemok))
+		reject += 2;
+      
+	      /* Input reloads can be inherited more often than output
+		 reloads can be removed, so penalize output
+		 reloads.  */
+	      if (!REG_P (op) || curr_static_id->operand[nop].type != OP_IN)
+		reject++;
+	      if (this_alternative_matches < 0
+		  && no_regs_p && ! this_alternative_offmemok && ! constmemok)
+		goto fail;
+
+	      if (! no_regs_p)
+		reload_nregs
+		  += ira_reg_class_max_nregs[this_alternative][mode];
+	    }
+  
+	  if (early_clobber_p)
+	    reject++;
+	  /* ??? Should we update the cost because early clobber
+	     register reloads or it is a rare thing to be worth to do
+	     it.  */
+	  overall = losers * LOSER_COST_FACTOR + reject;
+	  if ((best_losers == 0 || losers != 0) && best_overall < overall)
+	    goto fail;
+
+	  curr_alt[nop] = this_alternative;
+	  COPY_HARD_REG_SET (curr_alt_set[nop], this_alternative_set);
+	  curr_alt_win[nop] = this_alternative_win;
+	  curr_alt_match_win[nop] = this_alternative_match_win;
+	  curr_alt_offmemok[nop] = this_alternative_offmemok;
+	  curr_alt_matches[nop] = this_alternative_matches;
+  
+	  if (invalidate_m >= 0 && ! this_alternative_win)
+	    curr_alt_win[invalidate_m] = false;
+  
+	  if (early_clobber_p && operand_reg[nop] != NULL_RTX)
+	    early_clobbered_nops[early_clobbered_regs_num++] = nop;
+	}
+      ok_p = true;
+      curr_alt_dont_inherit_ops_num = 0;
+      for (nop = 0; nop < early_clobbered_regs_num; nop++)
+	{
+	  int i, j, clobbered_hard_regno;
+	  HARD_REG_SET temp_set;
+
+	  i = early_clobbered_nops[nop];
+	  if ((! curr_alt_win[i] && ! curr_alt_match_win[i])
+	      || hard_regno[i] < 0)
+	    continue;
+	  clobbered_hard_regno = hard_regno[i];
+	  CLEAR_HARD_REG_SET (temp_set);
+	  for (j = hard_regno_nregs[clobbered_hard_regno][biggest_mode[i]] - 1;
+	       j >= 0;
+	       j--)
+	    SET_HARD_REG_BIT (temp_set, clobbered_hard_regno + j);
+	  for (j = 0; j < n_operands; j++)
+	    if (j == i
+		/* We don't want process insides of match_operator and
+		   match_parallel because otherwise we would process
+		   their operands once again generating a wrong
+		   code.  */
+		|| curr_static_id->operand[j].is_operator)
+	      continue;
+	    else if (curr_alt_matches[j] == i && curr_alt_match_win[j])
+	      {
+		/* This is a trick.  Such operands don't conflict and
+		   don't need a reload.	 But it is hard to transfer
+		   this information to the assignment pass which
+		   spills one operand without this info.  We avoid the
+		   conflict by forcing to use the same pseudo for the
+		   operands hoping that the pseudo gets the same hard
+		   regno as the operands and the reloads are gone.  */
+		if (*curr_id->operand_loc[i] != *curr_id->operand_loc[j])
+		  {
+		    curr_alt_win[i] = false;
+		    curr_alt_match_win[j] = false;
+		  }
+		continue;
+	      }
+	    else if (curr_alt_matches[i] == j && curr_alt_match_win[i])
+	      {
+		/* See the comment for the previous case.  */
+		if (*curr_id->operand_loc[i] != *curr_id->operand_loc[j])
+		  {
+		    curr_alt_win[j] = false;
+		    curr_alt_match_win[i] = false;
+		  }
+		continue;
+	      }
+	    else if (uses_hard_regs_p (curr_id->operand_loc[j], temp_set))
+	      break;
+	  if (j >= n_operands)
+	    continue;
+	  /* We need to reload early clobbered register.  */
+	  for (j = 0; j < n_operands; j++)
+	    if (curr_alt_matches[j] == i)
+	      {
+		curr_alt_match_win[j] = false;
+		losers++;
+		overall += LOSER_COST_FACTOR;
+	      }
+	  if (! curr_alt_match_win[i])
+	    curr_alt_dont_inherit_ops[curr_alt_dont_inherit_ops_num++] = i;
+	  else
+	    {
+	      /* Remember pseudos used for match reloads are never
+		 inherited.  */
+	      lra_assert (curr_alt_matches[i] >= 0);
+	      curr_alt_win[curr_alt_matches[i]] = false;
+	    }
+	  curr_alt_win[i] = curr_alt_match_win[i] = false;
+	  losers++;
+	  overall += LOSER_COST_FACTOR;
+	}
+      small_class_operands_num = 0;
+      for (nop = 0; nop < n_operands; nop++)
+	/* If this alternative can be made to work by reloading, and
+	   it needs less reloading than the others checked so far,
+	   record it as the chosen goal for reloading.	*/
+	small_class_operands_num
+	  += SMALL_REGISTER_CLASS_P (curr_alt[nop]) ? 1 : 0;
+
+      if ((best_losers != 0 && losers == 0)
+	  || (((best_losers == 0 && losers == 0)
+	       || (best_losers != 0 && losers != 0))
+	      && (best_overall > overall
+		  || (best_overall == overall
+		      /* If the cost of the reloads is the same,
+			 prefer alternative which requires minimal
+			 number of small register classes for the
+			 operands.  This improves chances of reloads
+			 for insn requiring small register
+			 classes.  */
+		      && (small_class_operands_num
+			  < best_small_class_operands_num
+			  || (small_class_operands_num
+			      == best_small_class_operands_num
+			      && (reload_nregs < best_reload_nregs
+				  || (reload_nregs == best_reload_nregs
+				      && best_reload_sum < reload_sum))))))))
+	{
+	  for (nop = 0; nop < n_operands; nop++)
+	    {
+	      goal_alt_win[nop] = curr_alt_win[nop];
+	      goal_alt_match_win[nop] = curr_alt_match_win[nop];
+	      goal_alt_matches[nop] = curr_alt_matches[nop];
+	      goal_alt[nop] = curr_alt[nop];
+	      goal_alt_offmemok[nop] = curr_alt_offmemok[nop];
+	    }
+	  goal_alt_dont_inherit_ops_num = curr_alt_dont_inherit_ops_num;
+	  for (nop = 0; nop < curr_alt_dont_inherit_ops_num; nop++)
+	    goal_alt_dont_inherit_ops[nop] = curr_alt_dont_inherit_ops[nop];
+	  goal_alt_swapped = curr_swapped;
+	  best_overall = overall;
+	  best_losers = losers;
+	  best_small_class_operands_num = small_class_operands_num;
+	  best_reload_nregs = reload_nregs;
+	  best_reload_sum = reload_sum;
+	  goal_alt_number = nalt;
+	}
+      if (losers == 0)
+	/* Everything is satisfied.  Do not process alternatives
+	   anymore.  */ 
+	break;
+    fail:
+      ;
+    }
+  return ok_p;
+}
+
+/* Return 1 if ADDR is a valid memory address for mode MODE in address
+   space AS, and check that each pseudo has the proper kind of hard
+   reg.	 */
+static int
+valid_address_p (enum machine_mode mode ATTRIBUTE_UNUSED,
+		 rtx addr, addr_space_t as)
+{
+#ifdef GO_IF_LEGITIMATE_ADDRESS
+  lra_assert (ADDR_SPACE_GENERIC_P (as));
+  GO_IF_LEGITIMATE_ADDRESS (mode, addr, win);
+  return 0;
+  
+ win:
+  return 1;
+#else
+  return targetm.addr_space.legitimate_address_p (mode, addr, 0, as);
+#endif
+}
+
+/* Make reload base reg + disp from address AD in space AS of memory
+   with MODE into a new pseudo.	 Return the new pseudo.	 */
+static rtx
+base_plus_disp_to_reg (enum machine_mode mode, addr_space_t as,
+		       struct address *ad)
+{
+  enum reg_class cl;
+  rtx new_reg;
+
+  lra_assert (ad->base_reg_loc != NULL && ad->disp_loc != NULL);
+  cl = base_reg_class (mode, as, ad->base_outer_code, ad->index_code);
+  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "base + disp");
+  lra_emit_add (new_reg, *ad->base_reg_loc, *ad->disp_loc);
+  return new_reg;
+}
+
+/* Make substitution in address AD in space AS with location ADDR_LOC.
+   Update AD and ADDR_LOC if it is necessary.  Return true if a
+   substitution was made.  */
+static bool
+equiv_address_substitution (struct address *ad, rtx *addr_loc,
+			    enum machine_mode mode, addr_space_t as,
+			    enum rtx_code code)
+{
+  rtx base_reg, new_base_reg, index_reg, new_index_reg;
+  HOST_WIDE_INT disp, scale;
+  bool change_p;
+
+  if (ad->base_reg_loc == NULL)
+    base_reg = new_base_reg = NULL_RTX;
+  else
+    {
+      base_reg = *ad->base_reg_loc;
+      new_base_reg = get_equiv_substitution (base_reg);
+    }
+  if (ad->index_reg_loc == NULL)
+    index_reg = new_index_reg = NULL_RTX;
+  else
+    {
+      index_reg = *ad->index_reg_loc;
+      new_index_reg = get_equiv_substitution (index_reg);
+    }
+  if (base_reg == new_base_reg && index_reg == new_index_reg)
+    return false;
+  disp = 0;
+  change_p = false;
+  if (lra_dump_file != NULL)
+    {
+      fprintf (lra_dump_file, "Changing address in insn %d ",
+	       INSN_UID (curr_insn));
+      print_value_slim (lra_dump_file, *addr_loc, 1);
+    }
+  if (base_reg != new_base_reg)
+    {
+      if (REG_P (new_base_reg))
+	{
+	  *ad->base_reg_loc = new_base_reg;
+	  change_p = true;
+	}
+      else if (GET_CODE (new_base_reg) == PLUS
+	       && REG_P (XEXP (new_base_reg, 0))
+	       && CONST_INT_P (XEXP (new_base_reg, 1)))
+	{
+	  disp += INTVAL (XEXP (new_base_reg, 1));
+	  *ad->base_reg_loc = XEXP (new_base_reg, 0);
+	  change_p = true;
+	}
+      if (ad->base_reg_loc2 != NULL)
+	*ad->base_reg_loc2 = *ad->base_reg_loc;
+    }
+  scale = 1;
+  if (ad->index_loc != NULL && GET_CODE (*ad->index_loc) == MULT)
+    {
+      lra_assert (CONST_INT_P (XEXP (*ad->index_loc, 1)));
+      scale = INTVAL (XEXP (*ad->index_loc, 1));
+    }
+  if (index_reg != new_index_reg)
+    {
+      if (REG_P (new_index_reg))
+	{
+	  *ad->index_reg_loc = new_index_reg;
+	  change_p = true;
+	}
+      else if (GET_CODE (new_index_reg) == PLUS
+	       && REG_P (XEXP (new_index_reg, 0))
+	       && CONST_INT_P (XEXP (new_index_reg, 1)))
+	{
+	  disp += INTVAL (XEXP (new_index_reg, 1)) * scale;
+	  *ad->index_reg_loc = XEXP (new_index_reg, 0);
+	  change_p = true;
+	}
+    }
+  if (disp != 0)
+    {
+      if (ad->disp_loc != NULL)
+	*ad->disp_loc = plus_constant (Pmode, *ad->disp_loc, disp);
+      else
+	{
+	  *addr_loc = gen_rtx_PLUS (Pmode, *addr_loc, GEN_INT (disp));
+	  extract_address_regs (mode, as, addr_loc, code, ad);
+	}
+      change_p = true;
+    }
+  if (lra_dump_file != NULL)
+    {
+      if (! change_p)
+	fprintf (lra_dump_file, " -- no change\n");
+      else
+	{
+	  fprintf (lra_dump_file, " on equiv ");
+	  print_value_slim (lra_dump_file, *addr_loc, 1);
+	  fprintf (lra_dump_file, "\n");
+	}
+    }
+  return change_p;
+}
+
+/* Exchange operands of plus X.	 */
+static void
+exchange_plus_ops (rtx x)
+{
+  rtx op0;
+
+  lra_assert (GET_CODE (x) == PLUS);
+  op0 = XEXP (x, 0);
+  XEXP (x, 0) = XEXP (x, 1);
+  XEXP (x, 1) = op0;
+}
+
+/* Major function to make reloads for address in operand NOP.  Add to
+   reloads to the list *BEFORE and *AFTER.  We might need to add
+   reloads to *AFTER because of inc/dec, {pre, post} modify in the
+   address.  Return true for any RTL change.  */
+static bool
+process_address (int nop, rtx *before, rtx *after)
+{
+  struct address ad;
+  enum machine_mode mode;
+  rtx new_reg, *addr_loc, saved_index_reg, saved_base_reg, saved_base_reg2;
+  bool ok_p;
+  addr_space_t as;
+  rtx op = *curr_id->operand_loc[nop];
+  const char *constraint = curr_static_id->operand[nop].constraint;
+  bool change_p;
+  enum rtx_code code;
+
+  if (constraint[0] == 'p'
+      || EXTRA_ADDRESS_CONSTRAINT (constraint[0], constraint))
+    {
+      mode = VOIDmode;
+      addr_loc = curr_id->operand_loc[nop];
+      as = ADDR_SPACE_GENERIC;
+      code = ADDRESS;
+    }
+  else if (MEM_P (op))
+    {
+      mode = GET_MODE (op);
+      addr_loc = &XEXP (op, 0);
+      as = MEM_ADDR_SPACE (op);
+      code = MEM;
+    }
+  else if (GET_CODE (op) == SUBREG
+	   && MEM_P (SUBREG_REG (op)))
+    {
+      mode = GET_MODE (SUBREG_REG (op));
+      addr_loc = &XEXP (SUBREG_REG (op), 0);
+      as = MEM_ADDR_SPACE (SUBREG_REG (op));
+      code = MEM;
+    }
+  else
+    return false;
+  if (GET_CODE (*addr_loc) == AND)
+    addr_loc = &XEXP (*addr_loc, 0);
+  extract_address_regs (mode, as, addr_loc, code, &ad);
+  saved_base_reg = saved_base_reg2 = saved_index_reg = NULL_RTX;
+  change_p = equiv_address_substitution (&ad, addr_loc, mode, as, code);
+  if (ad.base_reg_loc != NULL)
+    {
+      if (process_addr_reg
+	  (ad.base_reg_loc, before,
+	   (ad.base_modify_p && REG_P (*ad.base_reg_loc)
+	    && find_regno_note (curr_insn, REG_DEAD,
+				REGNO (*ad.base_reg_loc)) == NULL
+	    ? after : NULL),
+	   base_reg_class (mode, as, ad.base_outer_code, ad.index_code)))
+	change_p = true;
+      if (ad.base_reg_loc2 != NULL)
+	*ad.base_reg_loc2 = *ad.base_reg_loc;
+      saved_base_reg = *ad.base_reg_loc;
+      lra_eliminate_reg_if_possible (ad.base_reg_loc);
+      if (ad.base_reg_loc2 != NULL)
+	{
+	  saved_base_reg2 = *ad.base_reg_loc2;
+	  lra_eliminate_reg_if_possible (ad.base_reg_loc2);
+	}
+    }
+  if (ad.index_reg_loc != NULL)
+    {
+      if (process_addr_reg (ad.index_reg_loc, before, NULL, INDEX_REG_CLASS))
+	change_p = true;
+      saved_index_reg = *ad.index_reg_loc;
+      lra_eliminate_reg_if_possible (ad.index_reg_loc);
+    }
+  /* Some ports do not check displacements for virtual registers -- so
+     we substitute them temporarily by real registers.	*/
+  ok_p = valid_address_p (mode, *addr_loc, as);
+  if (saved_base_reg != NULL_RTX)
+    {
+      *ad.base_reg_loc = saved_base_reg;
+      if (saved_base_reg2 != NULL_RTX)
+	*ad.base_reg_loc = saved_base_reg2;
+    }
+  if (saved_index_reg != NULL_RTX)
+    *ad.index_reg_loc = saved_index_reg;
+  if (ok_p
+      /* The following addressing is checked by constraints and
+	 usually target specific legitimate address hooks do not
+	 consider them valid.  */
+      || GET_CODE (*addr_loc) == POST_DEC || GET_CODE (*addr_loc) == POST_INC
+      || GET_CODE (*addr_loc) == PRE_DEC || GET_CODE (*addr_loc) == PRE_DEC
+      || GET_CODE (*addr_loc) == PRE_MODIFY
+      || GET_CODE (*addr_loc) == POST_MODIFY
+      /* In this case we can not do anything because if it is wrong
+	 that is because of wrong displacement.	 Remember that any
+	 address was legitimate in non-strict sense before LRA.	 */
+      || ad.disp_loc == NULL)
+    return change_p;
+
+  /* Addresses were legitimate before LRA.  So if the address has
+     two registers than it can have two of them.  We should also
+     not worry about scale for the same reason.	 */
+  push_to_sequence (*before);
+  if (ad.base_reg_loc == NULL)
+    {
+      if (ad.index_reg_loc == NULL)
+	{
+	  int code = -1;
+	  enum reg_class cl = base_reg_class (mode, as, SCRATCH, SCRATCH);
+	  
+	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "disp");
+#ifdef HAVE_lo_sum
+	  {
+	    rtx insn;
+	    rtx last = get_last_insn ();
+
+	    /* disp => lo_sum (new_base, disp)	*/
+	    insn = emit_insn (gen_rtx_SET
+			      (VOIDmode, new_reg,
+			       gen_rtx_HIGH (Pmode, copy_rtx (*ad.disp_loc))));
+	    code = recog_memoized (insn);
+	    if (code >= 0)
+	      {
+		rtx save = *ad.disp_loc;
+
+		*ad.disp_loc = gen_rtx_LO_SUM (Pmode, new_reg, *ad.disp_loc);
+		if (! valid_address_p (mode, *ad.disp_loc, as))
+		  {
+		    *ad.disp_loc = save;
+		    code = -1;
+		  }
+	      }
+	    if (code < 0)
+	      delete_insns_since (last);
+	  }
+#endif
+	  if (code < 0)
+	    {
+	      /* disp => new_base  */
+	      lra_emit_move (new_reg, *ad.disp_loc);
+	      *ad.disp_loc = new_reg;
+	    }
+	}
+      else
+	{
+	  /* index * scale + disp => new base + index * scale  */
+	  enum reg_class cl = base_reg_class (mode, as, SCRATCH, SCRATCH);
+
+	  lra_assert (INDEX_REG_CLASS != NO_REGS);
+	  new_reg = lra_create_new_reg (Pmode, NULL_RTX, cl, "disp");
+	  lra_assert (GET_CODE (*addr_loc) == PLUS);
+	  lra_emit_move (new_reg, *ad.disp_loc);
+	  if (CONSTANT_P (XEXP (*addr_loc, 1)))
+	    XEXP (*addr_loc, 1) = XEXP (*addr_loc, 0);
+	  XEXP (*addr_loc, 0) = new_reg;
+	  /* Some targets like ARM, accept address operands in
+	     specific order -- try exchange them if necessary.	*/
+	  if (! valid_address_p (mode, *addr_loc, as))
+	    {
+	      exchange_plus_ops (*addr_loc);
+	      if (! valid_address_p (mode, *addr_loc, as))
+		exchange_plus_ops (*addr_loc);
+	    }
+	}
+    }
+  else if (ad.index_reg_loc == NULL)
+    {
+      /* We don't use transformation 'base + disp => base + new index'
+	 because of bad practice used in some machine descriptions
+	 (see comments for emit_spill_move).  */
+      /* base + disp => new base  */
+      new_reg = base_plus_disp_to_reg (mode, as, &ad);
+      *addr_loc = new_reg;
+    }
+  else
+    {
+      /* base + scale * index + disp => new base + scale * index  */
+      new_reg = base_plus_disp_to_reg (mode, as, &ad);
+      *addr_loc = gen_rtx_PLUS (Pmode, new_reg, *ad.index_loc);
+      if (! valid_address_p (mode, *addr_loc, as))
+	{
+	  /* Some targets like ARM, accept address operands in
+	     specific order -- try exchange them if necessary.	*/
+	  exchange_plus_ops (*addr_loc);
+	  if (! valid_address_p (mode, *addr_loc, as))
+	    exchange_plus_ops (*addr_loc);
+	}
+    }
+  *before = get_insns ();
+  end_sequence ();
+  return true;
+}
+
+/* Emit insns to reload VALUE into a new register.  VALUE is an
+   auto-increment or auto-decrement RTX whose operand is a register or
+   memory location; so reloading involves incrementing that location.
+   IN is either identical to VALUE, or some cheaper place to reload
+   value being incremented/decremented from.
+
+   INC_AMOUNT is the number to increment or decrement by (always
+   positive and ignored for POST_MODIFY/PRE_MODIFY).
+
+   Return pseudo containing the result.	 */
+static rtx
+emit_inc (enum reg_class new_rclass, rtx in, rtx value, int inc_amount)
+{
+  /* REG or MEM to be copied and incremented.  */
+  rtx incloc = XEXP (value, 0);
+  /* Nonzero if increment after copying.  */
+  int post = (GET_CODE (value) == POST_DEC || GET_CODE (value) == POST_INC
+	      || GET_CODE (value) == POST_MODIFY);
+  rtx last;
+  rtx inc;
+  rtx add_insn;
+  int code;
+  rtx real_in = in == value ? incloc : in;
+  rtx result;
+  bool plus_p = true;
+
+  if (GET_CODE (value) == PRE_MODIFY || GET_CODE (value) == POST_MODIFY)
+    {
+      lra_assert (GET_CODE (XEXP (value, 1)) == PLUS
+		  || GET_CODE (XEXP (value, 1)) == MINUS);
+      lra_assert (rtx_equal_p (XEXP (XEXP (value, 1), 0), XEXP (value, 0)));
+      plus_p = GET_CODE (XEXP (value, 1)) == PLUS;
+      inc = XEXP (XEXP (value, 1), 1);
+    }
+  else
+    {
+      if (GET_CODE (value) == PRE_DEC || GET_CODE (value) == POST_DEC)
+	inc_amount = -inc_amount;
+
+      inc = GEN_INT (inc_amount);
+    }
+
+  if (! post && REG_P (incloc))
+    result = incloc;
+  else
+    result = lra_create_new_reg (GET_MODE (value), value, new_rclass,
+				 "INC/DEC result");
+
+  /* If this is post-increment, first copy the location to the reload reg.  */
+  if (post && real_in != result)
+    emit_insn (gen_move_insn (result, real_in));
+
+  /* We suppose that there are insns to add/sub with the constant
+     increment permitted in {PRE/POST)_{DEC/INC/MODIFY}.  At least the
+     old reload worked with this assumption.  If the assumption
+     becomes wrong, we should use approach in function
+     base_plus_disp_to_reg.  */
+  if (in == value)
+    {
+      /* See if we can directly increment INCLOC.  */
+      last = get_last_insn ();
+      add_insn = emit_insn (plus_p
+			    ? gen_add2_insn (incloc, inc)
+			    : gen_sub2_insn (incloc, inc));
+
+      code = recog_memoized (add_insn);
+      /* We should restore recog_data for the current insn.  */
+      if (code >= 0)
+	{
+	  if (! post && result != incloc)
+	    emit_insn (gen_move_insn (result, incloc));
+	  return result;
+	}
+      delete_insns_since (last);
+    }
+
+  /* If couldn't do the increment directly, must increment in RESULT.
+     The way we do this depends on whether this is pre- or
+     post-increment.  For pre-increment, copy INCLOC to the reload
+     register, increment it there, then save back.  */
+  if (! post)
+    {
+      if (real_in != result)
+	emit_insn (gen_move_insn (result, real_in));
+      if (plus_p)
+	emit_insn (gen_add2_insn (result, inc));
+      else
+	emit_insn (gen_sub2_insn (result, inc));
+      if (result != incloc)
+	emit_insn (gen_move_insn (incloc, result));
+    }
+  else
+    {
+      /* Post-increment.
+
+	 Because this might be a jump insn or a compare, and because
+	 RESULT may not be available after the insn in an input
+	 reload, we must do the incrementing before the insn being
+	 reloaded for.
+
+	 We have already copied IN to RESULT.  Increment the copy in
+	 RESULT, save that back, then decrement RESULT so it has
+	 the original value.  */
+      if (plus_p)
+	emit_insn (gen_add2_insn (result, inc));
+      else
+	emit_insn (gen_sub2_insn (result, inc));
+      emit_insn (gen_move_insn (incloc, result));
+      /* Restore non-modified value for the result.  We prefer this
+	 way because it does not require an addition hard
+	 register.  */
+      if (plus_p)
+	{
+	  if (CONST_INT_P (inc))
+	    emit_insn (gen_add2_insn (result, GEN_INT (-INTVAL (inc))));
+	  else
+	    emit_insn (gen_sub2_insn (result, inc));
+	}
+      else if (CONST_INT_P (inc))
+	emit_insn (gen_add2_insn (result, inc));
+      else
+	emit_insn (gen_add2_insn (result, inc));
+    }
+  return result;
+}
+
+/* Main entry point of this file: search the body of the current insn
+   to choose the best alternative.  It is mimicking insn alternative
+   cost calculation model of former reload pass.  That is because
+   machine descriptions were written to use this model.	 This model
+   can be changed in future.  Make commutative operand exchange if it
+   is chosen.
+
+   Return true if some RTL changes happened during function call.  */
+static bool
+curr_insn_transform (void)
+{
+  int i, j, k;
+  int n_operands;
+  int n_alternatives;
+  int commutative;
+  signed char goal_alt_matched[MAX_RECOG_OPERANDS][MAX_RECOG_OPERANDS];
+  rtx before, after;
+  bool alt_p = false;
+  /* Flag that the insn has been changed through a transformation.  */
+  bool change_p;
+  bool sec_mem_p;
+#ifdef SECONDARY_MEMORY_NEEDED
+  bool use_sec_mem_p;
+#endif
+  int max_regno_before;
+  int reused_alternative_num;
+
+  no_input_reloads_p = no_output_reloads_p = false;
+  goal_alt_number = -1;
+
+  if (check_and_process_move (&change_p, &sec_mem_p))
+    return change_p;
+
+  /* JUMP_INSNs and CALL_INSNs are not allowed to have any output
+     reloads; neither are insns that SET cc0.  Insns that use CC0 are
+     not allowed to have any input reloads.  */
+  if (JUMP_P (curr_insn) || CALL_P (curr_insn))
+    no_output_reloads_p = true;
+
+#ifdef HAVE_cc0
+  if (reg_referenced_p (cc0_rtx, PATTERN (curr_insn)))
+    no_input_reloads_p = true;
+  if (reg_set_p (cc0_rtx, PATTERN (curr_insn)))
+    no_output_reloads_p = true;
+#endif
+
+  n_operands = curr_static_id->n_operands;
+  n_alternatives = curr_static_id->n_alternatives;
+
+  /* Just return "no reloads" if insn has no operands with
+     constraints.  */
+  if (n_operands == 0 || n_alternatives == 0)
+    return false;
+
+  max_regno_before = max_reg_num ();
+
+  for (i = 0; i < n_operands; i++)
+    {
+      goal_alt_matched[i][0] = -1;
+      goal_alt_matches[i] = -1;
+    }
+
+  commutative = curr_static_id->commutative;
+
+  /* Now see what we need for pseudos that didn't get hard regs or got
+     the wrong kind of hard reg.  For this, we must consider all the
+     operands together against the register constraints.  */
+
+  best_losers = best_overall = MAX_RECOG_OPERANDS * 2 + MAX_OVERALL_COST_BOUND;
+  best_small_class_operands_num = best_reload_sum = 0;
+
+  curr_swapped = false;
+  goal_alt_swapped = false;
+
+  /* Make equivalence substitution and memory subreg elimination
+     before address processing because an address legitimacy can
+     depend on memory mode.  */
+  for (i = 0; i < n_operands; i++)
+    {
+      rtx op = *curr_id->operand_loc[i];
+      rtx subst, old = op;
+      bool op_change_p = false;
+
+      if (GET_CODE (old) == SUBREG)
+	old = SUBREG_REG (old);
+      subst = get_equiv_substitution (old);
+      if (subst != old)
+	{
+	  subst = copy_rtx (subst);
+	  lra_assert (REG_P (old));
+	  if (GET_CODE (op) == SUBREG)
+	    SUBREG_REG (op) = subst;
+	  else
+	    *curr_id->operand_loc[i] = subst;
+	  if (lra_dump_file != NULL)
+	    {
+	      fprintf (lra_dump_file,
+		       "Changing pseudo %d in operand %i of insn %u on equiv ",
+		       REGNO (old), i, INSN_UID (curr_insn));
+	      print_value_slim (lra_dump_file, subst, 1);
+	      fprintf (lra_dump_file, "\n");
+	    }
+	  op_change_p = change_p = true;
+	}
+      if (simplify_operand_subreg (i, GET_MODE (old)) || op_change_p)
+	{
+	  change_p = true;
+	  lra_update_dup (curr_id, i);
+	}
+    }
+
+  /* Reload address registers and displacements.  We do it before
+     finding an alternative because of memory constraints.  */
+  before = after = NULL_RTX;
+  for (i = 0; i < n_operands; i++)
+    if (! curr_static_id->operand[i].is_operator
+	&& process_address (i, &before, &after))
+      {
+	change_p = true;
+	lra_update_dup (curr_id, i);
+      }
+  
+  if (change_p)
+    /* Changes in the insn might result in that we can not satisfy
+       constraints in lately used alternative of the insn.  */
+    lra_set_used_insn_alternative (curr_insn, -1);
+
+ try_swapped:
+
+  reused_alternative_num = curr_id->used_insn_alternative;
+  if (lra_dump_file != NULL && reused_alternative_num >= 0)
+    fprintf (lra_dump_file, "Reusing alternative %d for insn #%u\n",
+	     reused_alternative_num, INSN_UID (curr_insn));
+
+  if (process_alt_operands (reused_alternative_num))
+    alt_p = true;
+
+  /* If insn is commutative (it's safe to exchange a certain pair of
+     operands) then we need to try each alternative twice, the second
+     time matching those two operands as if we had exchanged them.  To
+     do this, really exchange them in operands.
+
+     If we have just tried the alternatives the second time, return
+     operands to normal and drop through.  */
+
+  if (reused_alternative_num < 0 && commutative >= 0)
+    {
+      rtx x;
+
+      curr_swapped = !curr_swapped;
+      if (curr_swapped)
+	{
+	  x = *curr_id->operand_loc[commutative];
+	  *curr_id->operand_loc[commutative]
+	    = *curr_id->operand_loc[commutative + 1];
+	  *curr_id->operand_loc[commutative + 1] = x;
+	  /* Swap the duplicates too.  */
+	  lra_update_dup (curr_id, commutative);
+	  lra_update_dup (curr_id, commutative + 1);
+	  goto try_swapped;
+	}
+      else
+	{
+	  x = *curr_id->operand_loc[commutative];
+	  *curr_id->operand_loc[commutative]
+	    = *curr_id->operand_loc[commutative + 1];
+	  *curr_id->operand_loc[commutative + 1] = x;
+	  lra_update_dup (curr_id, commutative);
+	  lra_update_dup (curr_id, commutative + 1);
+	}
+    }
+
+  /* The operands don't meet the constraints.  goal_alt describes the
+     alternative that we could reach by reloading the fewest operands.
+     Reload so as to fit it.  */
+
+  if (! alt_p && ! sec_mem_p)
+    {
+      /* No alternative works with reloads??  */
+      if (INSN_CODE (curr_insn) >= 0)
+	fatal_insn ("unable to generate reloads for:", curr_insn);
+      error_for_asm (curr_insn,
+		     "inconsistent operand constraints in an %<asm%>");
+      /* Avoid further trouble with this insn.	*/
+      PATTERN (curr_insn) = gen_rtx_USE (VOIDmode, const0_rtx);
+      lra_invalidate_insn_data (curr_insn);
+      return true;
+    }
+
+  /* If the best alternative is with operands 1 and 2 swapped, swap
+     them.  Update the operand numbers of any reloads already
+     pushed.  */
+
+  if (goal_alt_swapped)
+    {
+      rtx tem;
+
+      if (lra_dump_file != NULL)
+	fprintf (lra_dump_file, "  Commutative operand exchange in insn %u\n",
+		 INSN_UID (curr_insn));
+
+      tem = *curr_id->operand_loc[commutative];
+      *curr_id->operand_loc[commutative]
+	= *curr_id->operand_loc[commutative + 1];
+      *curr_id->operand_loc[commutative + 1] = tem;
+
+      /* Swap the duplicates too.  */
+      lra_update_dup (curr_id, commutative);
+      lra_update_dup (curr_id, commutative + 1);
+      change_p = true;
+    }
+
+#ifdef SECONDARY_MEMORY_NEEDED
+  /* Some target macros SECONDARY_MEMORY_NEEDED (e.g. x86) are defined
+     too conservatively.  So we use the secondary memory only if there
+     is no any alternative without reloads.  */
+  use_sec_mem_p = false;
+  if (! alt_p)
+    use_sec_mem_p = true;
+  else if (sec_mem_p)
+    {
+      for (i = 0; i < n_operands; i++)
+	if (! goal_alt_win[i] && ! goal_alt_match_win[i])
+	  break;
+      use_sec_mem_p = i < n_operands;
+    }
+
+  if (use_sec_mem_p)
+    {
+      rtx new_reg, set, src, dest;
+      enum machine_mode sec_mode;
+
+      lra_assert (sec_mem_p);
+      set = single_set (curr_insn);
+      lra_assert (set != NULL_RTX && ! side_effects_p (set));
+      dest = SET_DEST (set);
+      src = SET_SRC (set);
+#ifdef SECONDARY_MEMORY_NEEDED_MODE
+      sec_mode = SECONDARY_MEMORY_NEEDED_MODE (GET_MODE (src));
+#else
+      sec_mode = GET_MODE (src);
+#endif
+      new_reg = lra_create_new_reg (sec_mode, NULL_RTX,
+				    NO_REGS, "secondary");
+      /* If the mode is changed, it should be wider.  */
+      lra_assert (GET_MODE_SIZE (GET_MODE (new_reg))
+		  >= GET_MODE_SIZE (GET_MODE (src)));
+      after = emit_spill_move (false, new_reg, dest, INSN_CODE (curr_insn));
+      lra_process_new_insns (curr_insn, NULL_RTX, after,
+			     "Inserting the sec. move");
+      before = emit_spill_move (true, new_reg, src, INSN_CODE (curr_insn));
+      lra_process_new_insns (curr_insn, before, NULL_RTX, "Changing on");
+      lra_set_insn_deleted (curr_insn);
+      return true;
+    }
+#endif
+
+  lra_assert (goal_alt_number >= 0);
+  lra_set_used_insn_alternative (curr_insn, goal_alt_number);
+
+  if (lra_dump_file != NULL)
+    {
+      const char *p;
+
+      fprintf (lra_dump_file, "	 Choosing alt %d in insn %u:",
+	       goal_alt_number, INSN_UID (curr_insn));
+      for (i = 0; i < n_operands; i++)
+	{
+	  p = (curr_static_id->operand_alternative
+	       [goal_alt_number * n_operands + i].constraint);
+	  if (*p == '\0')
+	    continue;
+	  fprintf (lra_dump_file, "  (%d) ", i);
+	  for (; *p != '\0' && *p != ',' && *p != '#'; p++)
+	    fputc (*p, lra_dump_file);
+	}
+      fprintf (lra_dump_file, "\n");
+    }
+
+  /* Right now, for any pair of operands I and J that are required to
+     match, with J < I, goal_alt_matches[I] is J.  Add I to
+     goal_alt_matched[J].  */
+  
+  for (i = 0; i < n_operands; i++)
+    if ((j = goal_alt_matches[i]) >= 0)
+      {
+	for (k = 0; goal_alt_matched[j][k] >= 0; k++)
+	  ;
+	/* We allow matching one output operand and several input
+	   operands.  */
+	lra_assert (k == 0
+		    || (curr_static_id->operand[j].type == OP_OUT
+			&& curr_static_id->operand[i].type == OP_IN
+			&& (curr_static_id->operand
+			    [goal_alt_matched[j][0]].type == OP_IN)));
+	goal_alt_matched[j][k] = i;
+	goal_alt_matched[j][k + 1] = -1;
+      }
+  
+  for (i = 0; i < n_operands; i++)
+    goal_alt_win[i] |= goal_alt_match_win[i];
+  
+  /* Any constants that aren't allowed and can't be reloaded into
+     registers are here changed into memory references.	 */
+  for (i = 0; i < n_operands; i++)
+    if (goal_alt_win[i])
+      {
+	int regno;
+	enum reg_class new_class;
+	rtx reg = *curr_id->operand_loc[i];
+
+	if (GET_CODE (reg) == SUBREG)
+	  reg = SUBREG_REG (reg);
+	    
+	if (REG_P (reg) && (regno = REGNO (reg)) >= FIRST_PSEUDO_REGISTER)
+	  {
+	    bool ok_p = in_class_p (regno, GET_MODE (reg),
+				    goal_alt[i], &new_class);
+
+	    if (new_class != NO_REGS && get_reg_class (regno) != new_class)
+	      {
+		lra_assert (ok_p);
+		change_class (regno, new_class, "      Change", true);
+	      }
+	  }
+      }
+    else
+      {
+	const char *constraint;
+	char c;
+	rtx op = *curr_id->operand_loc[i];
+	rtx subreg = NULL_RTX;
+	rtx plus = NULL_RTX;
+	enum machine_mode mode = get_op_mode (i);
+	
+	if (GET_CODE (op) == SUBREG)
+	  {
+	    subreg = op;
+	    op = SUBREG_REG (op);
+	    mode = GET_MODE (op);
+	  }
+	
+	if (GET_CODE (op) == PLUS)
+	  {
+	    plus = op;
+	    op = XEXP (op, 1);
+	  }
+	
+	if (CONST_POOL_OK_P (mode, op)
+	    && ((targetm.preferred_reload_class
+		 (op, (enum reg_class) goal_alt[i]) == NO_REGS)
+		|| no_input_reloads_p)
+	    && mode != VOIDmode)
+	  {
+	    rtx tem = force_const_mem (mode, op);
+	    
+	    change_p = true;
+	    /* If we stripped a SUBREG or a PLUS above add it back.  */
+	    if (plus != NULL_RTX)
+	      tem = gen_rtx_PLUS (mode, XEXP (plus, 0), tem);
+	    
+	    if (subreg != NULL_RTX)
+	      tem = gen_rtx_SUBREG (mode, tem, SUBREG_BYTE (subreg));
+	    
+	    *curr_id->operand_loc[i] = tem;
+	    lra_update_dup (curr_id, i);
+	    process_address (i, &before, &after);
+	    
+	    /* If the alternative accepts constant pool refs directly
+	       there will be no reload needed at all.  */
+	    if (plus != NULL_RTX || subreg != NULL_RTX)
+	      continue;
+	    /* Skip alternatives before the one requested.  */
+	    constraint = (curr_static_id->operand_alternative
+			  [goal_alt_number * n_operands + i].constraint);
+	    for (;
+		 (c = *constraint) && c != ',' && c != '#';
+		 constraint += CONSTRAINT_LEN (c, constraint))
+	      {
+		if (c == TARGET_MEM_CONSTRAINT || c == 'o')
+		  break;
+#ifdef EXTRA_CONSTRAINT_STR
+		if (EXTRA_MEMORY_CONSTRAINT (c, constraint)
+		    && EXTRA_CONSTRAINT_STR (tem, c, constraint))
+		  break;
+#endif
+	      }
+	    if (c == '\0' || c == ',' || c == '#')
+	      continue;
+	    
+	    goal_alt_win[i] = true;
+	  }
+      }
+  
+  for (i = 0; i < n_operands; i++)
+    {
+      rtx old, new_reg;
+      rtx op = *curr_id->operand_loc[i];
+
+      if (goal_alt_win[i])
+	{
+	  if (goal_alt[i] == NO_REGS
+	      && REG_P (op)
+	      && lra_former_scratch_operand_p (curr_insn, i))
+	    change_class (REGNO (op), NO_REGS, "      Change", true);
+	  continue;
+	}
+      
+      /* Operands that match previous ones have already been handled.  */
+      if (goal_alt_matches[i] >= 0)
+	continue;
+
+      /* We should not have an operand with a non-offsettable address
+	 appearing where an offsettable address will do.  It also may
+	 be a case when the address should be special in other words
+	 not a general one (e.g. it needs no index reg).  */
+      if (goal_alt_matched[i][0] == -1 && goal_alt_offmemok[i] && MEM_P (op))
+	{
+	  enum reg_class rclass;
+	  rtx *loc = &XEXP (op, 0);
+	  enum rtx_code code = GET_CODE (*loc);
+
+	  push_to_sequence (before);
+	  rclass = base_reg_class (GET_MODE (op), MEM_ADDR_SPACE (op),
+				   MEM, SCRATCH);
+	  if (code == PRE_DEC || code == POST_DEC
+	      || code == PRE_INC || code == POST_INC
+	      || code == PRE_MODIFY || code == POST_MODIFY)
+	    new_reg = emit_inc (rclass, *loc, *loc,
+				/* This value does not matter for MODIFY.  */
+				GET_MODE_SIZE (GET_MODE (op)));
+	  else if (get_reload_reg (OP_IN, Pmode, *loc, rclass,
+				   "offsetable address", &new_reg))
+	    lra_emit_move (new_reg, *loc);
+	  before = get_insns ();
+	  end_sequence ();
+	  *loc = new_reg;
+	  lra_update_dup (curr_id, i);
+	}
+      else if (goal_alt_matched[i][0] == -1)
+	{
+	  enum machine_mode mode;
+	  rtx reg, *loc;
+	  int hard_regno, byte;
+	  enum op_type type = curr_static_id->operand[i].type;
+
+	  loc = curr_id->operand_loc[i];
+	  mode = get_op_mode (i);
+	  if (GET_CODE (*loc) == SUBREG)
+	    {
+	      reg = SUBREG_REG (*loc);
+	      byte = SUBREG_BYTE (*loc);
+	      if (REG_P (reg)
+		  /* Strict_low_part requires reload the register not
+		     the sub-register.	*/
+		  && (curr_static_id->operand[i].strict_low
+		      || (GET_MODE_SIZE (mode)
+			  <= GET_MODE_SIZE (GET_MODE (reg))
+			  && (hard_regno
+			      = get_try_hard_regno (REGNO (reg))) >= 0
+			  && (simplify_subreg_regno
+			      (hard_regno,
+			       GET_MODE (reg), byte, mode) < 0)
+			  && (goal_alt[i] == NO_REGS
+			      || (simplify_subreg_regno
+				  (ira_class_hard_regs[goal_alt[i]][0],
+				   GET_MODE (reg), byte, mode) >= 0)))))
+		{
+		  loc = &SUBREG_REG (*loc);
+		  mode = GET_MODE (*loc);
+		}
+	    }
+	  old = *loc;
+	  if (get_reload_reg (type, mode, old, goal_alt[i], "", &new_reg)
+	      && type != OP_OUT)
+	    {
+	      push_to_sequence (before);
+	      lra_emit_move (new_reg, old);
+	      before = get_insns ();
+	      end_sequence ();
+	    }
+	  *loc = new_reg;
+	  if (type != OP_IN)
+	    {
+	      if (find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
+		{
+		  start_sequence ();
+		  /* We don't want sharing subregs as the pseudo can
+		     get a memory and the memory can be processed
+		     several times for eliminations.  */
+		  lra_emit_move (GET_CODE (old) == SUBREG && type == OP_INOUT
+				 ? copy_rtx (old) : old,
+				 new_reg);
+		  emit_insn (after);
+		  after = get_insns ();
+		  end_sequence ();
+		}
+	      *loc = new_reg;
+	    }
+	  for (j = 0; j < goal_alt_dont_inherit_ops_num; j++)
+	    if (goal_alt_dont_inherit_ops[j] == i)
+	      {
+		lra_set_regno_unique_value (REGNO (new_reg));
+		break;
+	      }
+	  lra_update_dup (curr_id, i);
+	}
+      else if (curr_static_id->operand[i].type == OP_IN
+	       && (curr_static_id->operand[goal_alt_matched[i][0]].type
+		   == OP_OUT))
+	{
+	  signed char arr[2];
+
+	  arr[0] = i;
+	  arr[1] = -1;
+	  match_reload (goal_alt_matched[i][0], arr,
+			goal_alt[i], &before, &after);
+	}
+      else if (curr_static_id->operand[i].type == OP_OUT
+	       && (curr_static_id->operand[goal_alt_matched[i][0]].type
+		   == OP_IN))
+	match_reload (i, goal_alt_matched[i], goal_alt[i], &before, &after);
+      else
+	{
+	  lra_assert (INSN_CODE (curr_insn) < 0);
+	  error_for_asm (curr_insn,
+			 "inconsistent operand constraints in an %<asm%>");
+	  /* Avoid further trouble with this insn.  */
+	  PATTERN (curr_insn) = gen_rtx_USE (VOIDmode, const0_rtx);
+	  return false;
+	}
+    }
+  if (before != NULL_RTX || after != NULL_RTX
+      || max_regno_before != max_reg_num ())
+    change_p = true;
+  if (change_p)
+    {
+      lra_update_operator_dups (curr_id);
+      /* Something changes -- process the insn.	 */
+      lra_update_insn_regno_info (curr_insn);
+    }
+  lra_process_new_insns (curr_insn, before, after, "Inserting insn reload");
+  return change_p;
+}
+
+/* Return true if X is in LIST.	 */
+static bool
+in_list_p (rtx x, rtx list)
+{
+  for (; list != NULL_RTX; list = XEXP (list, 1))
+    if (XEXP (list, 0) == x)
+      return true;
+  return false;
+}
+
+/* Return true if X contains an allocatable hard register (if
+   HARD_REG_P) or a (spilled if SPILLED_P) pseudo.  */
+static bool
+contains_reg_p (rtx x, bool hard_reg_p, bool spilled_p)
+{
+  int i, j;
+  const char *fmt;
+  enum rtx_code code;
+
+  code = GET_CODE (x);
+  if (REG_P (x))
+    {
+      int regno = REGNO (x);
+      HARD_REG_SET alloc_regs;
+
+      if (hard_reg_p)
+	{
+	  if (regno >= FIRST_PSEUDO_REGISTER)
+	    regno = lra_get_regno_hard_regno (regno);
+	  if (regno < 0)
+	    return false;
+	  COMPL_HARD_REG_SET (alloc_regs, lra_no_alloc_regs);
+	  return lra_hard_reg_set_intersection_p (regno, GET_MODE (x),
+						  alloc_regs);
+	}
+      else
+	{
+	  if (regno < FIRST_PSEUDO_REGISTER)
+	    return false;
+	  if (! spilled_p)
+	    return true;
+	  return lra_get_regno_hard_regno (regno) < 0;
+	}
+    }
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    {
+      if (fmt[i] == 'e')
+	{
+	  if (contains_reg_p (XEXP (x, i), hard_reg_p, spilled_p))
+	    return true;
+	}
+      else if (fmt[i] == 'E')
+	{
+	  for (j = XVECLEN (x, i) - 1; j >= 0; j--)
+	    if (contains_reg_p (XVECEXP (x, i, j), hard_reg_p, spilled_p))
+	      return true;
+	}
+    }
+  return false;
+}
+
+/* Process all regs in debug location *LOC and change them on
+   equivalent substitution.  Return true if any change was done.  */
+static bool
+debug_loc_equivalence_change_p (rtx *loc)
+{
+  rtx subst, reg, x = *loc;
+  bool result = false;
+  enum rtx_code code = GET_CODE (x);
+  const char *fmt;
+  int i, j;
+
+  if (code == SUBREG)
+    {
+      reg = SUBREG_REG (x);
+      if ((subst = get_equiv_substitution (reg)) != reg
+	  && GET_MODE (subst) == VOIDmode)
+	{
+	  /* We cannot reload debug location.  Simplify subreg here
+	     while we know the inner mode.  */
+	  *loc = simplify_gen_subreg (GET_MODE (x), subst,
+				      GET_MODE (reg), SUBREG_BYTE (x));
+	  return true;
+	}
+    }
+  if (code == REG && (subst = get_equiv_substitution (x)) != x)
+    {
+      *loc = subst;
+      return true;
+    }
+
+  /* Scan all the operand sub-expressions.  */
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    {
+      if (fmt[i] == 'e')
+	result = debug_loc_equivalence_change_p (&XEXP (x, i)) || result;
+      else if (fmt[i] == 'E')
+	for (j = XVECLEN (x, i) - 1; j >= 0; j--)
+	  result
+	    = debug_loc_equivalence_change_p (&XVECEXP (x, i, j)) || result;
+    }
+  return result;
+}
+
+/* Maximum allowed number of constraint pass iterations after the last
+   spill pass.	It is for preventing LRA cycling in a bug case.	 */
+#define MAX_CONSTRAINT_ITERATION_NUMBER 15
+
+/* Maximum number of generated reload insns per an insn.  It is for
+   preventing this pass cycling in a bug case.	*/
+#define MAX_RELOAD_INSNS_NUMBER LRA_MAX_INSN_RELOADS
+
+/* The current iteration number of this LRA pass.  */
+int lra_constraint_iter;
+
+/* The current iteration number of this LRA pass after the last spill
+   pass.  */
+int lra_constraint_iter_after_spill;
+
+/* True if we substituted equiv which needs checking register
+   allocation correctness because the equivalent value contains
+   allocatable hard registers or when we restore multi-register
+   pseudo.  */
+bool lra_risky_transformations_p;
+
+/* Entry function of LRA constraint pass.  Return true if the
+   constraint pass did change the code.	 */
+bool
+lra_constraints (bool first_p)
+{
+  bool changed_p;
+  int i, hard_regno, new_insns_num;
+  unsigned int min_len;
+  rtx set, x, dest_reg;
+  basic_block last_bb;
+
+  lra_constraint_iter++;
+  if (lra_dump_file != NULL)
+    fprintf (lra_dump_file, "\n********** Local #%d: **********\n\n",
+	     lra_constraint_iter);
+  lra_constraint_iter_after_spill++;
+  if (lra_constraint_iter_after_spill > MAX_CONSTRAINT_ITERATION_NUMBER)
+    internal_error
+      ("Maximum number of LRA constraint passes is achieved (%d)\n",
+       MAX_CONSTRAINT_ITERATION_NUMBER);
+  changed_p = false;
+  lra_risky_transformations_p = false;
+  new_insn_uid_start = get_max_uid ();
+  new_regno_start = first_p ? lra_constraint_new_regno_start : max_reg_num ();
+  for (i = FIRST_PSEUDO_REGISTER; i < new_regno_start; i++)
+    ira_reg_equiv[i].profitable_p = true;
+  for (i = FIRST_PSEUDO_REGISTER; i < new_regno_start; i++)
+    if (lra_reg_info[i].nrefs != 0)
+      {
+	if ((hard_regno = lra_get_regno_hard_regno (i)) >= 0)
+	  {
+	    int j, nregs = hard_regno_nregs[hard_regno][PSEUDO_REGNO_MODE (i)];
+	    
+	    for (j = 0; j < nregs; j++)
+	      df_set_regs_ever_live (hard_regno + j, true);
+	  }
+	else if ((x = get_equiv_substitution (regno_reg_rtx[i])) != NULL_RTX)
+	  {
+	    if (! first_p && contains_reg_p (x, false, false))
+	      /* After RTL transformation, we can not guarantee that
+		 pseudo in the substitution was not reloaded which
+		 might make equivalence invalid.  For example, in
+		 reverse equiv of p0
+
+		 p0 <- ...
+		 ...
+		 equiv_mem <- p0
+
+		 the memory address register was reloaded before the
+		 2nd insn.  */
+	      ira_reg_equiv[i].defined_p = false;
+	    if (contains_reg_p (x, false, true))
+	      ira_reg_equiv[i].profitable_p = false;
+	  }
+      }
+  lra_eliminate (false);
+  min_len = VEC_length (rtx, lra_constraint_insn_stack);
+  new_insns_num = 0;
+  last_bb = NULL;
+  changed_p = false;
+  for (;VEC_length (rtx, lra_constraint_insn_stack) != 0;)
+    {
+      curr_insn = VEC_pop (rtx, lra_constraint_insn_stack);
+      bitmap_clear_bit (&lra_constraint_insn_stack_bitmap,
+			INSN_UID (curr_insn));
+      curr_bb = BLOCK_FOR_INSN (curr_insn); 
+      if (curr_bb != last_bb)
+	{
+	  last_bb = curr_bb;
+	  bb_reload_num = lra_curr_reload_num;
+	}
+      if (min_len > VEC_length (rtx, lra_constraint_insn_stack))
+	{
+	  min_len = VEC_length (rtx, lra_constraint_insn_stack);
+	  new_insns_num = 0;
+	}
+      if (new_insns_num > MAX_RELOAD_INSNS_NUMBER)
+	internal_error
+	  ("Max. number of generated reload insns per insn is achieved (%d)\n",
+	   MAX_RELOAD_INSNS_NUMBER);
+      new_insns_num++;
+      if (DEBUG_INSN_P (curr_insn))
+	{
+	  /* We need to check equivalence in debug insn and change
+	     pseudo to the equivalent value if necessary.  */
+	  curr_id = lra_get_insn_recog_data (curr_insn);
+	  if (debug_loc_equivalence_change_p (curr_id->operand_loc[0]))
+	    changed_p = true;
+	}
+      else if (INSN_P (curr_insn))
+	{
+	  if ((set = single_set (curr_insn)) != NULL_RTX)
+	    {
+	      dest_reg = SET_DEST (set);
+	      /* The equivalence pseudo could be set up as SUBREG in a
+		 case when it is a call restore insn in a mode
+		 different from the pseudo mode.  */
+	      if (GET_CODE (dest_reg) == SUBREG)
+		dest_reg = SUBREG_REG (dest_reg);
+	      if ((REG_P (dest_reg)
+		   && (x = get_equiv_substitution (dest_reg)) != dest_reg
+		   /* Remove insns which set up a pseudo whose value
+		      can not be changed.  Such insns might be not in
+		      init_insns because we don't update equiv data
+		      during insn transformations.
+			  
+		      As an example, let suppose that a pseudo got
+		      hard register and on the 1st pass was not
+		      changed to equivalent constant.  We generate an
+		      additional insn setting up the pseudo because of
+		      secondary memory movement.  Then the pseudo is
+		      spilled and we use the equiv constant.  In this
+		      case we should remove the additional insn and
+		      this insn is not init_insns list.	 */
+		   && (! MEM_P (x) || MEM_READONLY_P (x)
+		       || in_list_p (curr_insn,
+				     ira_reg_equiv
+				     [REGNO (dest_reg)].init_insns)))
+		  || (((x = get_equiv_substitution (SET_SRC (set)))
+		       != SET_SRC (set))
+		      && in_list_p (curr_insn,
+				    ira_reg_equiv
+				    [REGNO (SET_SRC (set))].init_insns)))
+		{
+		  /* This is equiv init insn of pseudo which did not get a
+		     hard register -- remove the insn.	*/
+		  if (lra_dump_file != NULL)
+		    {
+		      fprintf (lra_dump_file,
+			       "      Removing equiv init insn %i (freq=%d)\n",
+			       INSN_UID (curr_insn),
+			       BLOCK_FOR_INSN (curr_insn)->frequency);
+		      debug_rtl_slim (lra_dump_file,
+				      curr_insn, curr_insn, -1, 0);
+		    }
+		  if (contains_reg_p (x, true, false))
+		    lra_risky_transformations_p = true;
+		  lra_set_insn_deleted (curr_insn);
+		  continue;
+		}
+	    }
+	  curr_id = lra_get_insn_recog_data (curr_insn);
+	  curr_static_id = curr_id->insn_static_data;
+	  init_curr_insn_input_reloads ();
+	  if (curr_insn_transform ())
+	    changed_p = true;
+	}
+    }
+  /* If we used a new hard regno, changed_p should be true because the
+     hard reg is assigned to a new pseudo.  */
+#ifdef ENABLE_CHECKING
+  if (! changed_p)
+    {
+      for (i = FIRST_PSEUDO_REGISTER; i < new_regno_start; i++)
+	if (lra_reg_info[i].nrefs != 0
+	    && (hard_regno = lra_get_regno_hard_regno (i)) >= 0)
+	  {
+	    int j, nregs = hard_regno_nregs[hard_regno][PSEUDO_REGNO_MODE (i)];
+	    
+	    for (j = 0; j < nregs; j++)
+	      lra_assert (df_regs_ever_live_p (hard_regno + j));
+	  }
+    }
+#endif
+  return changed_p;
+}
+
+/* Initiate the LRA constraint pass.  It is done once per
+   function.  */
+void
+lra_contraints_init (void)
+{
+  init_indirect_mem ();
+  bitmap_initialize (&lra_matched_pseudos, &reg_obstack);
+  bitmap_initialize (&lra_bound_pseudos, &reg_obstack);
+}
+
+/* Finalize the LRA constraint pass.  It is done once per
+   function.  */
+void
+lra_contraints_finish (void)
+{
+  bitmap_clear (&lra_bound_pseudos);
+  bitmap_clear (&lra_matched_pseudos);
+}
+
+
+
+/* This page contains code to do inheritance/split
+   transformations.  */
+
+/* Number of reloads passed so far in current EBB.  */
+static int reloads_num;
+
+/* Number of calls passed so far in current EBB.  */
+static int calls_num;
+
+/* Current reload pseudo check for validity of elements in
+   USAGE_INSNS.	 */
+static int curr_usage_insns_check;
+
+/* Info about last usage of registers in EBB to do inheritance/split
+   transformation.  Inheritance transformation is done from a spilled
+   pseudo and split transformations from a hard register or a pseudo
+   assigned to a hard register.	 */
+struct usage_insns
+{
+  /* If the value is equal to CURR_USAGE_INSNS_CHECK, then the member
+     value INSNS is valid.  The insns is chain of optional debug insns
+     and a finishing non-debug insn using the corresponding reg.  */
+  int check;
+  /* Value of global reloads_num at the ???corresponding next insns.  */
+  int reloads_num;
+  /* Value of global reloads_num at the ???corresponding next insns.  */
+  int calls_num;
+  /* It can be true only for splitting.	 And it means that the restore
+     insn should be put after insn given by the following member.  */
+  bool after_p;
+  /* Next insns in the current EBB which use the original reg and the
+     original reg value is not changed between the current insn and
+     the next insns.  In order words, e.g. for inheritance, if we need
+     to use the original reg value again in the next insns we can try
+     to use the value in a hard register from a reload insn of the
+     current insn.  */
+  rtx insns;
+};
+
+/* Map: regno -> corresponding pseudo usage insns.  */
+static struct usage_insns *usage_insns;
+
+/* Process all regs OLD_REGNO in location *LOC and change them on the
+   reload pseudo NEW_REG.  Return true if any change was done.	*/
+static bool
+substitute_pseudo (rtx *loc, int old_regno, rtx new_reg)
+{
+  rtx x = *loc;
+  bool result = false;
+  enum rtx_code code;
+  const char *fmt;
+  int i, j;
+
+  if (x == NULL_RTX)
+    return false;
+
+  code = GET_CODE (x);
+  if (code == REG && (int) REGNO (x) == old_regno)
+    {
+      *loc = new_reg;
+      return true;
+    }
+
+  /* Scan all the operand sub-expressions.  */
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    {
+      if (fmt[i] == 'e')
+	{
+	  if (substitute_pseudo (&XEXP (x, i), old_regno, new_reg))
+	    result = true;
+	}
+      else if (fmt[i] == 'E')
+	{
+	  for (j = XVECLEN (x, i) - 1; j >= 0; j--)
+	    if (substitute_pseudo (&XVECEXP (x, i, j), old_regno, new_reg))
+	      result = true;
+	}
+    }
+  return result;
+}
+
+/* Registers involved in inheritance/split in the current EBB
+   (inheritance/split pseudos and original registers).	*/
+static bitmap_head check_only_regs;
+
+/* Do inheritance transformation for insn INSN defining (if DEF_P) or
+   using ORIGINAL_REGNO where the subsequent insn(s) in EBB (remember
+   we traverse insns in the backward direction) for the original regno
+   is NEXT_USAGE_INSNS.	 The transformations look like
+
+     p <- ...		  i <- ...
+     ...		  p <- i    (new insn)
+     ...	     =>
+     <- ... p ...	  <- ... i ...
+   or
+     ...		  i <- p    (new insn)
+     <- ... p ...	  <- ... i ...
+     ...	     =>
+     <- ... p ...	  <- ... i ...
+   where p is a spilled original pseudo and i is a new inheritance pseudo.
+   
+   The inheritance pseudo has the smallest class of two classes CL and
+   class of ORIGINAL REGNO.  It will have unique value if UNIQ_P.  The
+   unique value is necessary for correct assignment to inheritance
+   pseudo for input of an insn which should be the same as output
+   (bound pseudos).  Return true if we succeed in such
+   transformation.  */
+static bool
+inherit_reload_reg (bool def_p, bool uniq_p, int original_regno,
+		    enum reg_class cl, rtx insn, rtx next_usage_insns)
+{
+  enum reg_class rclass = lra_get_allocno_class (original_regno);
+  rtx original_reg = regno_reg_rtx[original_regno];
+  rtx new_reg, new_insns, usage_insn;
+
+  lra_assert (! usage_insns[original_regno].after_p);
+  if (lra_dump_file != NULL)
+    fprintf (lra_dump_file,
+	     "	  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<\n");
+  if (! ira_reg_classes_intersect_p[cl][rclass])
+    {
+      if (lra_dump_file != NULL)
+	{
+	  fprintf (lra_dump_file,
+		   "	Rejecting inheritance for %d "
+		   "because of too different classes %s and %s\n",
+		   original_regno, reg_class_names[cl],
+		   reg_class_names[rclass]);
+	  fprintf (lra_dump_file,
+		   "	>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n");
+	}
+      return false;
+    }
+  if ((ira_class_subset_p[cl][rclass] && cl != rclass)
+      || ira_class_hard_regs_num[cl] < ira_class_hard_regs_num[rclass])
+    {
+      if (lra_dump_file != NULL)
+	fprintf (lra_dump_file, "    Use smallest class of %s and %s\n",
+		 reg_class_names[cl], reg_class_names[rclass]);
+      
+      rclass = cl;
+    }
+  if (uniq_p)
+    new_reg = lra_create_new_reg_with_unique_value (GET_MODE (original_reg),
+						    original_reg,
+						    rclass, "inheritance");
+  else
+    new_reg = lra_create_new_reg (GET_MODE (original_reg), original_reg,
+				  rclass, "inheritance");
+  start_sequence ();
+  if (def_p)
+    emit_move_insn (original_reg, new_reg);
+  else
+    emit_move_insn (new_reg, original_reg);
+  new_insns = get_insns ();
+  end_sequence ();
+  if (NEXT_INSN (new_insns) != NULL_RTX)
+    {
+      if (lra_dump_file != NULL)
+	{
+	  fprintf (lra_dump_file,
+		   "	Rejecting inheritance %d->%d "
+		   "as it results in 2 or more insns:\n",
+		   original_regno, REGNO (new_reg));
+	  debug_rtl_slim (lra_dump_file, new_insns, NULL_RTX, -1, 0);
+	  fprintf (lra_dump_file,
+		   "	>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n");
+	}
+      return false;
+    }
+  substitute_pseudo (&insn, original_regno, new_reg);
+  lra_update_insn_regno_info (insn);
+  if (! def_p)
+    {
+      /* We now have a new usage insn for original regno.  */
+      usage_insns[original_regno].check = curr_usage_insns_check;
+      usage_insns[original_regno].insns = new_insns;
+      usage_insns[original_regno].reloads_num = reloads_num;
+      usage_insns[original_regno].calls_num = calls_num;
+      usage_insns[original_regno].after_p = false;
+    }
+  if (lra_dump_file != NULL)
+    fprintf (lra_dump_file, "	 Original reg change %d->%d:\n",
+	     original_regno, REGNO (new_reg));
+  lra_reg_info[REGNO (new_reg)].restore_regno = original_regno;
+  bitmap_set_bit (&check_only_regs, REGNO (new_reg));
+  bitmap_set_bit (&check_only_regs, original_regno);
+  bitmap_set_bit (&lra_inheritance_pseudos, REGNO (new_reg));
+  if (def_p)
+    lra_process_new_insns (insn, NULL_RTX, new_insns,
+			   "Add original<-inheritance");
+  else
+    lra_process_new_insns (insn, new_insns, NULL_RTX,
+			   "Add inheritance<-pseudo");
+  while (next_usage_insns != NULL_RTX)
+    {
+      if (GET_CODE (next_usage_insns) != INSN_LIST)
+	{
+	  usage_insn = next_usage_insns;
+	  lra_assert (NONDEBUG_INSN_P (usage_insn));
+	  next_usage_insns = NULL;
+	}
+      else
+	{
+	  usage_insn = XEXP (next_usage_insns, 0);
+	  lra_assert (DEBUG_INSN_P (usage_insn));
+	  next_usage_insns = XEXP (next_usage_insns, 1);
+	}
+      substitute_pseudo (&usage_insn, original_regno, new_reg);
+      lra_update_insn_regno_info (usage_insn);
+      if (lra_dump_file != NULL)
+	{
+	  fprintf (lra_dump_file, "    Inheritance reuse change %d->%d:\n",
+		   original_regno, REGNO (new_reg));
+	  debug_rtl_slim (lra_dump_file, usage_insn, usage_insn,
+			  -1, 0);
+	}
+    }
+  if (lra_dump_file != NULL)
+    fprintf (lra_dump_file,
+	     "	  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n");
+  return true;
+}
+
+/* Return true if we need a caller save/restore for pseudo REGNO which
+   was assigned to a hard register.  */
+static inline bool
+need_for_call_save_p (int regno)
+{
+  lra_assert (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0);
+  return (usage_insns[regno].calls_num < calls_num
+	  && (lra_hard_reg_set_intersection_p
+	      (reg_renumber[regno], PSEUDO_REGNO_MODE (regno),
+	       call_used_reg_set)));
+}
+
+/* Global registers occuring in the current EBB.  */
+static bitmap_head ebb_global_regs;
+
+/* Return true if we need a split for hard register REGNO or pseudo
+   REGNO which was assigned to a hard register.
+   POTENTIAL_RELOAD_HARD_REGS contains hard registers which might be
+   used for reloads since the EBB end.	It is an approximation of the
+   used hard registers in the split range.  The exact value would
+   require expensive calculations.  If we were aggressive with
+   splitting because of the approximation, the split pseudo will save
+   the same hard register assignment and will be removed in the undo
+   pass.  We still need the approximation because too aggressive
+   splitting would result in too inaccurate cost calculation in the
+   assignment pass because of too many generated moves which will be
+   probably removed in the undo pass.  */
+static inline bool
+need_for_split_p (HARD_REG_SET potential_reload_hard_regs, int regno)
+{
+  int hard_regno = regno < FIRST_PSEUDO_REGISTER ? regno : reg_renumber[regno];
+
+  lra_assert (hard_regno >= 0);
+  return ((TEST_HARD_REG_BIT (potential_reload_hard_regs, hard_regno)
+	   && ! TEST_HARD_REG_BIT (lra_no_alloc_regs, hard_regno)
+	   && (usage_insns[regno].reloads_num
+	       + (regno < FIRST_PSEUDO_REGISTER ? 0 : 2) < reloads_num)
+	   && ((regno < FIRST_PSEUDO_REGISTER
+		&& ! bitmap_bit_p (&ebb_global_regs, regno))
+	       || (regno >= FIRST_PSEUDO_REGISTER
+		   && lra_reg_info[regno].nrefs > 3
+		   && bitmap_bit_p (&ebb_global_regs, regno))))
+	  || (regno >= FIRST_PSEUDO_REGISTER && need_for_call_save_p (regno)));
+}
+
+/* Return class for the split pseudo created from original pseudo with
+   ALLOCNO_CLASS and MODE which got a hard register HARD_REGNO.	 We
+   choose subclass of ALLOCNO_CLASS which contains HARD_REGNO and
+   results in no secondary memory movements.  */
+static enum reg_class
+choose_split_class (enum reg_class allocno_class,
+		    int hard_regno ATTRIBUTE_UNUSED,
+		    enum machine_mode mode ATTRIBUTE_UNUSED)
+{
+#ifndef SECONDARY_MEMORY_NEEDED
+  return allocno_class;
+#else
+  int i;
+  enum reg_class cl, best_cl = NO_REGS;
+  enum reg_class hard_reg_class = REGNO_REG_CLASS (hard_regno);
+  
+  if (! SECONDARY_MEMORY_NEEDED (allocno_class, allocno_class, mode)
+      && TEST_HARD_REG_BIT (reg_class_contents[allocno_class], hard_regno))
+    return allocno_class;
+  for (i = 0;
+       (cl = reg_class_subclasses[allocno_class][i]) != LIM_REG_CLASSES;
+       i++)
+    if (! SECONDARY_MEMORY_NEEDED (cl, hard_reg_class, mode)
+	&& ! SECONDARY_MEMORY_NEEDED (hard_reg_class, cl, mode)
+	&& TEST_HARD_REG_BIT (reg_class_contents[cl], hard_regno)
+	&& (best_cl == NO_REGS
+	    || (hard_reg_set_subset_p (reg_class_contents[best_cl],
+				       reg_class_contents[cl])
+		&& ! hard_reg_set_equal_p (reg_class_contents[best_cl],
+					   reg_class_contents[cl]))))
+      best_cl = cl;
+  return best_cl;
+#endif
+}
+
+/* Do split transformation for insn INSN defining or
+   using ORIGINAL_REGNO where the subsequent insn(s) in EBB (remember
+   we traverse insns in the backward direction) for the original regno
+   is NEXT_USAGE_INSNS.	 The transformations look like
+
+     p <- ...		  p <- ...
+     ...		  s <- p    (new insn -- save)
+     ...	     =>
+     ...		  p <- s    (new insn -- restore)
+     <- ... p ...	  <- ... p ...
+   or
+     <- ... p ...	  <- ... p ...
+     ...		  s <- p    (new insn -- save)
+     ...	     =>
+     ...		  p <- s    (new insn -- restore)
+     <- ... p ...	  <- ... p ...
+
+   where p is an original pseudo got a hard register or a hard
+   register and s is a new split pseudo.  The save is put before INSN
+   if BEFORE_P is true.	 Return true if we succeed in such
+   transformation.  */
+static bool
+split_reg (bool before_p, int original_regno, rtx insn, rtx next_usage_insns)
+{
+  enum reg_class rclass;
+  rtx original_reg;
+  int hard_regno;
+  rtx new_reg, save, restore, usage_insn;
+  bool after_p;
+  bool call_save_p;
+
+  if (original_regno < FIRST_PSEUDO_REGISTER)
+    {
+      rclass = ira_allocno_class_translate[REGNO_REG_CLASS (original_regno)];
+      hard_regno = original_regno;
+      call_save_p = false;
+    }
+  else
+    {
+      hard_regno = reg_renumber[original_regno];
+      rclass = lra_get_allocno_class (original_regno);
+      original_reg = regno_reg_rtx[original_regno];
+      call_save_p = need_for_call_save_p (original_regno);
+    }
+  original_reg = regno_reg_rtx[original_regno];
+  lra_assert (hard_regno >= 0);
+  if (lra_dump_file != NULL)
+    fprintf (lra_dump_file,
+	     "	  ((((((((((((((((((((((((((((((((((((((((((((((((\n");
+  if (call_save_p)
+    {
+      enum machine_mode sec_mode;
+      
+#ifdef SECONDARY_MEMORY_NEEDED_MODE
+      sec_mode = SECONDARY_MEMORY_NEEDED_MODE (GET_MODE (original_reg));
+#else
+      sec_mode = GET_MODE (original_reg);
+#endif
+      new_reg = lra_create_new_reg (sec_mode, NULL_RTX,
+				    NO_REGS, "save");
+    }
+  else
+    {
+      rclass = choose_split_class (rclass, hard_regno,
+				   GET_MODE (original_reg));
+      if (rclass == NO_REGS)
+	{
+	  if (lra_dump_file != NULL)
+	    {
+	      fprintf (lra_dump_file,
+		       "    Rejecting split of %d(%s): "
+		       "no good reg class for %d(%s)\n",
+		       original_regno,
+		       reg_class_names[lra_get_allocno_class (original_regno)],
+		       hard_regno,
+		       reg_class_names[REGNO_REG_CLASS (hard_regno)]);
+	      fprintf
+		(lra_dump_file,
+		 "    ))))))))))))))))))))))))))))))))))))))))))))))))\n");
+	    }
+	  return false;
+	}
+      new_reg = lra_create_new_reg (GET_MODE (original_reg), original_reg,
+				    rclass, "split");
+      reg_renumber[REGNO (new_reg)] = hard_regno;
+    }
+  if (call_save_p)
+    save = emit_spill_move (true, new_reg, original_reg, -1);
+  else
+    {
+      start_sequence ();
+      emit_move_insn (new_reg, original_reg);
+      save = get_insns ();
+      end_sequence ();
+    }
+  if (NEXT_INSN (save) != NULL_RTX)
+    {
+      lra_assert (! call_save_p);
+      if (lra_dump_file != NULL)
+	{
+	  fprintf
+	    (lra_dump_file,
+	     "	  Rejecting split %d->%d resulting in > 2 %s save insns:\n",
+	     original_regno, REGNO (new_reg), call_save_p ? "call" : "");
+	  debug_rtl_slim (lra_dump_file, save, NULL_RTX, -1, 0);
+	  fprintf (lra_dump_file,
+		   "	))))))))))))))))))))))))))))))))))))))))))))))))\n");
+	}
+      return false;
+    }
+  if (call_save_p)
+    restore = emit_spill_move (false, new_reg, original_reg, -1);
+  else
+    {
+      start_sequence ();
+      emit_move_insn (original_reg, new_reg);
+      restore = get_insns ();
+      end_sequence ();
+    }
+  if (NEXT_INSN (restore) != NULL_RTX)
+    {
+      lra_assert (! call_save_p);
+      if (lra_dump_file != NULL)
+	{
+	  fprintf (lra_dump_file,
+		   "	Rejecting split %d->%d "
+		   "resulting in > 2 %s restore insns:\n",
+		   original_regno, REGNO (new_reg), call_save_p ? "call" : "");
+	  debug_rtl_slim (lra_dump_file, restore, NULL_RTX, -1, 0);
+	  fprintf (lra_dump_file,
+		   "	))))))))))))))))))))))))))))))))))))))))))))))))\n");
+	}
+      return false;
+    }
+  after_p = usage_insns[original_regno].after_p;
+  lra_reg_info[REGNO (new_reg)].restore_regno = original_regno;
+  bitmap_set_bit (&check_only_regs, REGNO (new_reg));
+  bitmap_set_bit (&check_only_regs, original_regno);
+  bitmap_set_bit (&lra_split_pseudos, REGNO (new_reg));
+  for (;;)
+    {
+      if (GET_CODE (next_usage_insns) != INSN_LIST)
+	{
+	  usage_insn = next_usage_insns;
+	  break;
+	}
+      usage_insn = XEXP (next_usage_insns, 0);
+      lra_assert (DEBUG_INSN_P (usage_insn));
+      next_usage_insns = XEXP (next_usage_insns, 1);
+      substitute_pseudo (&usage_insn, original_regno, new_reg);
+      lra_update_insn_regno_info (usage_insn);
+      if (lra_dump_file != NULL)
+	{
+	  fprintf (lra_dump_file, "    Split reuse change %d->%d:\n",
+		   original_regno, REGNO (new_reg));
+	  debug_rtl_slim (lra_dump_file, usage_insn, usage_insn,
+			  -1, 0);
+	}
+    }
+  lra_assert (NONDEBUG_INSN_P (usage_insn));
+  lra_assert (usage_insn != insn || (after_p && before_p));
+  lra_process_new_insns (usage_insn, after_p ? NULL_RTX : restore,
+			 after_p ? restore : NULL_RTX,
+			 call_save_p
+			 ?  "Add reg<-save" : "Add reg<-split");
+  lra_process_new_insns (insn, before_p ? save : NULL_RTX,
+			 before_p ? NULL_RTX : save,
+			 call_save_p
+			 ?  "Add save<-reg" : "Add split<-reg");
+  if (lra_dump_file != NULL)
+    fprintf (lra_dump_file,
+	     "	  ))))))))))))))))))))))))))))))))))))))))))))))))\n");
+  return true;
+}
+
+/* Check only registers living at the current program point in the
+   current EBB.	 */
+static bitmap_head live_regs;
+
+/* Update live info in EBB given by its HEAD and TAIL insns after
+   inheritance/split transformation.  The function removes dead moves
+   too.	 */
+static void
+update_ebb_live_info (rtx head, rtx tail)
+{
+  unsigned int j;
+  int regno;
+  bool live_p;
+  rtx prev_insn, set;
+  bool remove_p;
+  basic_block last_bb, prev_bb, curr_bb;
+  bitmap_iterator bi;
+  struct lra_insn_reg *reg;
+  edge e;
+  edge_iterator ei;
+
+  last_bb = BLOCK_FOR_INSN (tail);
+  prev_bb = NULL;
+  for (curr_insn = tail;
+       curr_insn != PREV_INSN (head);
+       curr_insn = prev_insn)
+    {
+      prev_insn = PREV_INSN (curr_insn);
+      if (! INSN_P (curr_insn))
+	continue;
+      curr_bb = BLOCK_FOR_INSN (curr_insn);
+      if (curr_bb != prev_bb)
+	{
+	  if (prev_bb != NULL)
+	    {
+	      /* Udpate DF_LR_IN (prev_bb):  */
+	      EXECUTE_IF_SET_IN_BITMAP (&check_only_regs, 0, j, bi)
+		if (bitmap_bit_p (&live_regs, j))
+		  bitmap_set_bit (DF_LR_IN (prev_bb), j);
+		else
+		  bitmap_clear_bit (DF_LR_IN (prev_bb), j);
+	    }
+	  if (curr_bb != last_bb)
+	    {
+	      /* Update DF_LR_OUT (curr_bb):  */
+	      EXECUTE_IF_SET_IN_BITMAP (&check_only_regs, 0, j, bi)
+		{
+		  live_p = bitmap_bit_p (&live_regs, j);
+		  if (! live_p)
+		    FOR_EACH_EDGE (e, ei, curr_bb->succs)
+		      if (bitmap_bit_p (DF_LR_IN (e->dest), j))
+			{
+			  live_p = true;
+			  break;
+			}
+		  if (live_p)
+		    bitmap_set_bit (DF_LR_OUT (curr_bb), j);
+		  else
+		    bitmap_clear_bit (DF_LR_OUT (curr_bb), j);
+		}
+	    }
+	  prev_bb = curr_bb;
+	  bitmap_and (&live_regs, &check_only_regs, DF_LR_OUT (curr_bb));
+	}
+      if (DEBUG_INSN_P (curr_insn))
+	continue;
+      curr_id = lra_get_insn_recog_data (curr_insn);
+      remove_p = false;
+      if ((set = single_set (curr_insn)) != NULL_RTX && REG_P (SET_DEST (set))
+	  && (regno = REGNO (SET_DEST (set))) >= FIRST_PSEUDO_REGISTER
+	  && bitmap_bit_p (&check_only_regs, regno)
+	  && ! bitmap_bit_p (&live_regs, regno))
+	remove_p = true;
+      /* See which defined values die here.  */
+      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+	if (reg->type == OP_OUT && ! reg->early_clobber
+	    && (! reg->subreg_p
+		|| bitmap_bit_p (&lra_bound_pseudos, reg->regno)))
+	  bitmap_clear_bit (&live_regs, reg->regno);
+      /* Mark each used value as live.	*/
+      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+	if (reg->type == OP_IN
+	    && bitmap_bit_p (&check_only_regs, reg->regno))
+	  bitmap_set_bit (&live_regs, reg->regno);
+      /* Mark early clobber outputs dead.  */
+      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+	if (reg->type == OP_OUT && reg->early_clobber && ! reg->subreg_p)
+	  bitmap_clear_bit (&live_regs, reg->regno);
+      /* It is quite important to remove dead move insns because it
+	 means removing dead store, we don't need to process them for
+	 constraints, and unfortunately some subsequent optimizations
+	 (like shrink-wrapping) currently based on assumption that
+	 there are no trivial dead insns.  */
+      if (remove_p)
+	{
+	  if (lra_dump_file != NULL)
+	    {
+	      fprintf (lra_dump_file, "	    Removing dead insn:\n ");
+	      debug_rtl_slim (lra_dump_file, curr_insn, curr_insn, -1, 0);
+	    }
+	  lra_set_insn_deleted (curr_insn);
+	}
+    }
+}
+
+/* The structure describes info to do an inheritance for the current
+   insn.  We need to collect such info first before doing the
+   transformations because the transformations change the insn
+   internal representation.  */
+struct to_inherit
+{
+  /* Original regno.  */
+  int regno;
+  /* Subsequent insns which can inherit original reg value.  */
+  rtx insns;
+};
+
+/* Array containing all info for doing inheritance from the current
+   insn.  */
+static struct to_inherit to_inherit[LRA_MAX_INSN_RELOADS];
+
+/* Number elements in the previous array.  */
+static int to_inherit_num;
+
+/* Add inheritance info REGNO and INSNS.  */
+static void
+add_to_inherit (int regno, rtx insns)
+{
+  int i;
+
+  for (i = 0; i < to_inherit_num; i++)
+    if (to_inherit[i].regno == regno)
+      return;
+  lra_assert (to_inherit_num < LRA_MAX_INSN_RELOADS);
+  to_inherit[to_inherit_num].regno = regno;
+  to_inherit[to_inherit_num++].insns = insns;
+}
+
+/* Return first (if FIRST_P) or last non-debug insn in basic block BB.
+   Return null if there are no non-debug insns in the block.  */
+static rtx
+get_non_debug_insn (bool first_p, basic_block bb)
+{
+  rtx insn;
+
+  for (insn = first_p ? BB_HEAD (bb) : BB_END (bb);
+       insn != NULL_RTX && ! NONDEBUG_INSN_P (insn);
+       insn = first_p ? NEXT_INSN (insn) : PREV_INSN (insn))
+    ;
+  if (insn != NULL_RTX && BLOCK_FOR_INSN (insn) != bb)
+    insn = NULL_RTX;
+  return insn;
+}
+
+/* Set up RES by registers living on edges FROM except the edge (FROM,
+   TO) or by registers set up in a jump insn in BB FROM.  */
+static void
+get_live_on_other_edges (basic_block from, basic_block to, bitmap res)
+{
+  int regno;
+  rtx last;
+  struct lra_insn_reg *reg;
+  edge e;
+  edge_iterator ei;
+
+  lra_assert (to != NULL);
+  bitmap_clear (res);
+  FOR_EACH_EDGE (e, ei, from->succs)
+    if (e->dest != to)
+      bitmap_ior_into (res, DF_LR_IN (e->dest));
+  if ((last = get_non_debug_insn (false, from)) == NULL_RTX || ! JUMP_P (last))
+    return;
+  curr_id = lra_get_insn_recog_data (last);
+  for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+    if (reg->type != OP_IN
+	&& (regno = reg->regno) >= FIRST_PSEUDO_REGISTER)
+      bitmap_set_bit (res, regno);
+}
+	
+/* Used as a temporary results of some bitmap calculations.  */
+static bitmap_head temp_bitmap;
+
+/* The function is used to form list REGNO usages which consists of
+   optional debug insns finished by a non-debug insn using REGNO.
+   RELOADS_NUM is current number of reload insns processed so far.  */
+static void
+add_next_usage_insn (int regno, rtx insn, int reloads_num)
+{
+  rtx next_usage_insns;
+  
+  if (usage_insns[regno].check == curr_usage_insns_check
+      && (next_usage_insns = usage_insns[regno].insns) != NULL_RTX
+      && DEBUG_INSN_P (insn))
+    {
+      /* Check that we did not add the debug insn yet.	*/
+      if (next_usage_insns != insn
+	  && (GET_CODE (next_usage_insns) != INSN_LIST
+	      || XEXP (next_usage_insns, 0) != insn))
+	usage_insns[regno].insns = gen_rtx_INSN_LIST (VOIDmode, insn,
+						      next_usage_insns);
+    }
+  else if (NONDEBUG_INSN_P (insn))
+    {
+      usage_insns[regno].check = curr_usage_insns_check;
+      usage_insns[regno].insns = insn;
+      usage_insns[regno].reloads_num = reloads_num;
+      usage_insns[regno].calls_num = calls_num;
+      usage_insns[regno].after_p = false;
+    }
+  else
+    usage_insns[regno].check = 0;
+}
+  
+/* Do inheritance/split transformations in EBB starting with HEAD and
+   finishing on TAIL.  We process EBB insns in the reverse order.
+   Return true if we did any inheritance/split transformation in the
+   EBB.
+
+   We should avoid excessive splitting which results in worse code
+   because of inaccurate cost calculations for spilling new split
+   pseudos in such case.  To achieve this we do splitting only if
+   register pressure is high in given basic block and there reload
+   pseudos requiring hard registers.  We could do more register
+   pressure calculations at any given program point to avoid necessary
+   splitting even more but it is to expensive and the current approach
+   is well enough.  */
+static bool
+inherit_in_ebb (rtx head, rtx tail)
+{
+  int i, src_regno, dst_regno;
+  bool change_p, succ_p;
+  rtx prev_insn, next_usage_insns, set, first_insn, last_insn, next_insn;
+  enum reg_class cl;
+  struct lra_insn_reg *reg;
+  basic_block last_processed_bb, curr_bb = NULL;
+  HARD_REG_SET potential_reload_hard_regs, live_hard_regs;
+  bitmap to_process;
+  unsigned int j;
+  bitmap_iterator bi;
+  bool head_p, after_p;
+
+  change_p = false;
+  curr_usage_insns_check++;
+  reloads_num = calls_num = 0;
+  /* Remember: we can remove the current insn.	*/
+  bitmap_clear (&check_only_regs);
+  last_processed_bb = NULL;
+  CLEAR_HARD_REG_SET (potential_reload_hard_regs);
+  CLEAR_HARD_REG_SET (live_hard_regs);
+  /* We don't process new insns generated in the loop.	*/
+  for (curr_insn = tail; curr_insn != PREV_INSN (head); curr_insn = prev_insn)
+    {
+      prev_insn = PREV_INSN (curr_insn);
+      if (BLOCK_FOR_INSN (curr_insn) != NULL)
+	curr_bb = BLOCK_FOR_INSN (curr_insn);
+      if (last_processed_bb != curr_bb)
+	{
+	  /* We are at the end of BB.  Add qualified living
+	     pseudos for potential splitting.  */
+	  to_process = DF_LR_OUT (curr_bb);
+	  if (last_processed_bb != NULL)
+	    {	
+	      /* We are somewhere in the middle of EBB.	 */
+	      get_live_on_other_edges (curr_bb, last_processed_bb,
+				       &temp_bitmap);
+	      to_process = &temp_bitmap;
+	    }
+	  last_processed_bb = curr_bb;
+	  last_insn = get_non_debug_insn (false, curr_bb);
+	  after_p = (last_insn != NULL_RTX && ! JUMP_P (last_insn)
+		     && (! CALL_P (last_insn)
+			 || (find_reg_note (last_insn,
+					   REG_NORETURN, NULL) == NULL_RTX
+			     && ((next_insn
+				  = next_nonnote_nondebug_insn (last_insn))
+				 == NULL_RTX
+				 || GET_CODE (next_insn) != BARRIER))));
+	  REG_SET_TO_HARD_REG_SET (live_hard_regs, DF_LR_OUT (curr_bb));
+	  IOR_HARD_REG_SET (live_hard_regs, eliminable_regset);
+	  IOR_HARD_REG_SET (live_hard_regs, lra_no_alloc_regs);
+	  CLEAR_HARD_REG_SET (potential_reload_hard_regs);
+	  EXECUTE_IF_SET_IN_BITMAP (to_process, 0, j, bi)
+	    {
+	      if ((int) j >= lra_constraint_new_regno_start)
+		break;
+	      if (j < FIRST_PSEUDO_REGISTER || reg_renumber[j] >= 0)
+		{
+		  if (j < FIRST_PSEUDO_REGISTER)
+		    SET_HARD_REG_BIT (live_hard_regs, j);
+		  else
+		    lra_add_hard_reg_set (reg_renumber[j],
+					  PSEUDO_REGNO_MODE (j),
+					  &live_hard_regs);
+		  usage_insns[j].check = curr_usage_insns_check;
+		  usage_insns[j].insns = last_insn;
+		  usage_insns[j].reloads_num = reloads_num;
+		  usage_insns[j].calls_num = calls_num;
+		  usage_insns[j].after_p = after_p;
+		}
+	    }
+	}
+      src_regno = dst_regno = -1;
+      if (NONDEBUG_INSN_P (curr_insn)
+	  && (set = single_set (curr_insn)) != NULL_RTX
+	  && REG_P (SET_DEST (set)) && REG_P (SET_SRC (set)))
+	{
+	  src_regno = REGNO (SET_SRC (set));
+	  dst_regno = REGNO (SET_DEST (set));
+	}
+      if (src_regno < lra_constraint_new_regno_start
+	  && src_regno >= FIRST_PSEUDO_REGISTER
+	  && reg_renumber[src_regno] < 0
+	  && dst_regno >= lra_constraint_new_regno_start
+	  && (cl = lra_get_allocno_class (dst_regno)) != NO_REGS)
+	{
+	  /* 'reload_pseudo <- original_pseudo'.  */
+	  reloads_num++;
+	  succ_p = false;
+	  if (usage_insns[src_regno].check == curr_usage_insns_check
+	      && (next_usage_insns = usage_insns[src_regno].insns) != NULL_RTX)
+	    succ_p = inherit_reload_reg (false,
+					 bitmap_bit_p (&lra_matched_pseudos,
+						       dst_regno),
+					 src_regno, cl,
+					 curr_insn, next_usage_insns);
+	  if (succ_p)
+	    change_p = true;
+	  else
+	    {
+	      usage_insns[src_regno].check = curr_usage_insns_check;
+	      usage_insns[src_regno].insns = curr_insn;
+	      usage_insns[src_regno].reloads_num = reloads_num;
+	      usage_insns[src_regno].calls_num = calls_num;
+	      usage_insns[src_regno].after_p = false;
+	    }
+	  if (cl != NO_REGS
+	      && hard_reg_set_subset_p (reg_class_contents[cl],
+					live_hard_regs))
+	    IOR_HARD_REG_SET (potential_reload_hard_regs,
+			      reg_class_contents[cl]);
+	}
+      else if (src_regno >= lra_constraint_new_regno_start
+	       && dst_regno < lra_constraint_new_regno_start
+	       && dst_regno >= FIRST_PSEUDO_REGISTER
+	       && reg_renumber[dst_regno] < 0
+	       && (cl = lra_get_allocno_class (src_regno)) != NO_REGS
+	       && usage_insns[dst_regno].check == curr_usage_insns_check
+	       && (next_usage_insns
+		   = usage_insns[dst_regno].insns) != NULL_RTX)
+	{
+	  reloads_num++;
+	  /* 'original_pseudo <- reload_pseudo'.  */
+	  if (! JUMP_P (curr_insn)
+	      && inherit_reload_reg (true, false, dst_regno, cl,
+				     curr_insn, next_usage_insns))
+	    change_p = true;
+	  /* Invalidate.  */
+	  usage_insns[dst_regno].check = 0;
+	  if (cl != NO_REGS
+	      && hard_reg_set_subset_p (reg_class_contents[cl],
+					live_hard_regs))
+	    IOR_HARD_REG_SET (potential_reload_hard_regs,
+			      reg_class_contents[cl]);
+	}
+      else if (INSN_P (curr_insn))
+	{
+	  int max_uid = get_max_uid ();
+
+	  curr_id = lra_get_insn_recog_data (curr_insn);
+	  to_inherit_num = 0;
+	  /* Process insn definitions.	*/
+	  for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+	    if (reg->type != OP_IN
+		&& (dst_regno = reg->regno) < lra_constraint_new_regno_start
+		&& usage_insns[dst_regno].check == curr_usage_insns_check
+		&& (next_usage_insns
+		    = usage_insns[dst_regno].insns) != NULL_RTX)
+	      {
+		if (dst_regno >= FIRST_PSEUDO_REGISTER && reg->type == OP_OUT
+		    && reg_renumber[dst_regno] < 0 && ! reg->subreg_p)
+		  {
+		    struct lra_insn_reg *r;
+		    
+		    for (r = curr_id->regs; r != NULL; r = r->next)
+		      if (r->type != OP_OUT && r->regno == dst_regno)
+			break;
+		    /* Don't do inheritance if the pseudo is also
+		       used in the insn.  */
+		    if (r == NULL)
+		      /* We can not do inheritance right now
+			 because the current insn reg info (chain
+			 regs) can change after that.  */
+		      add_to_inherit (dst_regno, next_usage_insns);
+		  }
+		/* We can not process one reg twice here because of
+		   usage_insns invalidation.  */
+		if ((dst_regno < FIRST_PSEUDO_REGISTER
+		     || reg_renumber[dst_regno] >= 0)
+		    && ! reg->subreg_p && reg->type == OP_OUT)
+		  {
+		    HARD_REG_SET s;
+		    
+		    if (need_for_split_p (potential_reload_hard_regs,
+					  dst_regno)
+			&& split_reg (false, dst_regno, curr_insn,
+				      next_usage_insns))
+		      change_p = true;
+		    CLEAR_HARD_REG_SET (s);
+		    if (dst_regno < FIRST_PSEUDO_REGISTER)
+		      SET_HARD_REG_BIT (s, dst_regno);
+		    else
+		      lra_add_hard_reg_set (reg_renumber[dst_regno],
+					    PSEUDO_REGNO_MODE (dst_regno), &s);
+		    AND_COMPL_HARD_REG_SET (live_hard_regs, s);
+		  }
+		if (reg_renumber[dst_regno] < 0
+		    || (reg->type == OP_OUT && ! reg->subreg_p))
+		/* Invalidate.	*/
+		usage_insns[dst_regno].check = 0;
+	      }
+	  if (! JUMP_P (curr_insn))
+	    for (i = 0; i < to_inherit_num; i++)
+	      if (inherit_reload_reg (true, false, to_inherit[i].regno,
+				      ALL_REGS, curr_insn,
+				      to_inherit[i].insns))
+	      change_p = true;
+	  if (CALL_P (curr_insn))
+	    {
+	      rtx cheap, pat, dest, restore;
+	      int regno, hard_regno;
+
+	      calls_num++;
+	      if ((cheap = find_reg_note (curr_insn,
+					  REG_RETURNED, NULL_RTX)) != NULL_RTX
+		  && ((cheap = XEXP (cheap, 0)), true)
+		  && (regno = REGNO (cheap)) >= FIRST_PSEUDO_REGISTER
+		  && (hard_regno = reg_renumber[regno]) >= 0
+		  /* If there are pending saves/restores, the
+		     optimization is not worth.	 */
+		  && usage_insns[regno].calls_num == calls_num - 1
+		  && TEST_HARD_REG_BIT (call_used_reg_set, hard_regno))
+		{
+		  /* Restore the pseudo from the call result as
+		     REG_RETURNED note says that the pseudo value is
+		     in the call result and the pseudo is an argument
+		     of the call.  */
+		  pat = PATTERN (curr_insn);
+		  if (GET_CODE (pat) == PARALLEL)
+		    pat = XVECEXP (pat, 0, 0);
+		  dest = SET_DEST (pat);
+		  start_sequence ();
+		  emit_move_insn (cheap, copy_rtx (dest));
+		  restore = get_insns ();
+		  end_sequence ();
+		  lra_process_new_insns (curr_insn, NULL, restore,
+					 "Inserting call parameter restore");
+		  /* We don't need to save/restore of the pseudo from
+		     this call.	 */
+		  usage_insns[regno].calls_num = calls_num;
+		  bitmap_set_bit (&check_only_regs, regno);
+		}
+	    }
+	  to_inherit_num = 0;
+	  /* Process insn usages.  */
+	  for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+	    if ((reg->type != OP_OUT
+		 || (reg->type == OP_OUT && reg->subreg_p))
+		&& (src_regno = reg->regno) < lra_constraint_new_regno_start)
+	      {
+		if (src_regno >= FIRST_PSEUDO_REGISTER
+		    && reg_renumber[src_regno] < 0 && reg->type == OP_IN)
+		  {
+		    if (usage_insns[src_regno].check == curr_usage_insns_check
+			&& (next_usage_insns
+			    = usage_insns[src_regno].insns) != NULL_RTX
+			&& NONDEBUG_INSN_P (curr_insn))
+		      add_to_inherit (src_regno, next_usage_insns);
+		    else
+		      /* Add usages.  */
+		      add_next_usage_insn (src_regno, curr_insn, reloads_num);
+		  }
+		else if (src_regno < FIRST_PSEUDO_REGISTER
+			 || reg_renumber[src_regno] >= 0)
+		  {
+		    bool before_p;
+		    rtx use_insn = curr_insn;
+
+		    before_p = (JUMP_P (curr_insn)
+				|| (CALL_P (curr_insn) && reg->type == OP_IN));
+		    if (usage_insns[src_regno].check == curr_usage_insns_check
+			&& (next_usage_insns
+			    = usage_insns[src_regno].insns) != NULL_RTX
+			/* To avoid processing the pseudo twice or
+			   more.  */
+			&& ((GET_CODE (next_usage_insns) != INSN_LIST
+			     && INSN_UID (next_usage_insns) < max_uid)
+			    || (GET_CODE (next_usage_insns) == INSN_LIST
+				&& (INSN_UID (XEXP (next_usage_insns, 0))
+				    < max_uid)))
+			&& need_for_split_p (potential_reload_hard_regs,
+					     src_regno)
+			&& NONDEBUG_INSN_P (curr_insn)
+			&& split_reg (before_p, src_regno, curr_insn,
+				      next_usage_insns))
+		      {
+			if (reg->subreg_p)
+			  lra_risky_transformations_p = true;
+			change_p = true;
+			/* Invalidate.	*/
+			usage_insns[src_regno].check = 0;
+			if (before_p)
+			  use_insn = PREV_INSN (curr_insn);
+		      }
+		    if (NONDEBUG_INSN_P (curr_insn))
+		      {
+			if (src_regno < FIRST_PSEUDO_REGISTER)
+			  SET_HARD_REG_BIT (live_hard_regs, src_regno);
+			else
+			  lra_add_hard_reg_set (reg_renumber[src_regno],
+						PSEUDO_REGNO_MODE (src_regno),
+						&live_hard_regs);
+		      }
+		    add_next_usage_insn (src_regno, use_insn, reloads_num);
+		  }
+	      }
+	  for (i = 0; i < to_inherit_num; i++)
+	    {
+	      src_regno = to_inherit[i].regno;
+	      if (inherit_reload_reg (false, false, src_regno, ALL_REGS,
+				      curr_insn, to_inherit[i].insns))
+		change_p = true;
+	      else
+		{
+		  usage_insns[src_regno].check = curr_usage_insns_check;
+		  usage_insns[src_regno].insns = curr_insn;
+		  usage_insns[src_regno].reloads_num = reloads_num;
+		  usage_insns[src_regno].calls_num = calls_num;
+		  usage_insns[src_regno].after_p = false;
+		}
+	    }
+	}
+      /* We reached the start of the current basic block.  */
+      if (prev_insn == NULL_RTX || prev_insn == PREV_INSN (head)
+	  || BLOCK_FOR_INSN (prev_insn) != curr_bb)
+	{
+	  /* We reached the beginning of the current block -- do
+	     rest of spliting in the current BB.  */
+	  first_insn = get_non_debug_insn (true, curr_bb);
+	  to_process = DF_LR_IN (curr_bb);
+	  if (BLOCK_FOR_INSN (head) != curr_bb)
+	    {	
+	      /* We are somewhere in the middle of EBB.	 */
+	      get_live_on_other_edges (EDGE_PRED (curr_bb, 0)->src,
+				       curr_bb, &temp_bitmap);
+	      to_process = &temp_bitmap;
+	    }
+	  head_p = true;
+	  EXECUTE_IF_SET_IN_BITMAP (to_process, 0, j, bi)
+	    {
+	      if ((int) j >= lra_constraint_new_regno_start)
+		break;
+	      if (((int) j < FIRST_PSEUDO_REGISTER || reg_renumber[j] >= 0)
+		  && usage_insns[j].check == curr_usage_insns_check
+		  && (next_usage_insns = usage_insns[j].insns) != NULL_RTX)
+		{
+		  if (first_insn != NULL_RTX
+		      && need_for_split_p (potential_reload_hard_regs, j))
+		    {
+		      if (lra_dump_file != NULL && head_p)
+			{
+			  fprintf (lra_dump_file,
+				   "  ----------------------------------\n");
+			  head_p = false;
+			}
+		      if (split_reg (true, j, first_insn, next_usage_insns))
+			change_p = true;
+		    }
+		  usage_insns[j].check = 0;
+		}
+	    }
+	}
+    }
+  return change_p;
+}
+
+/* This value affects EBB forming.  If probability of edge from EBB to
+   a BB is not greater than the following value, we don't add the BB
+   to EBB.  */ 
+#define EBB_PROBABILITY_CUTOFF (REG_BR_PROB_BASE / 2)
+
+/* Current number of inheritance/split iteration.  */
+int lra_inheritance_iter;
+
+/* Entry function for inheritance/split pass.  */
+void
+lra_inheritance (void)
+{
+  int i;
+  basic_block bb, start_bb;
+  edge e;
+
+  lra_inheritance_iter++;
+  if (lra_dump_file != NULL)
+    fprintf (lra_dump_file, "\n********** Inheritance #%d: **********\n\n",
+	     lra_inheritance_iter);
+  curr_usage_insns_check = 0;
+  usage_insns
+    = (struct usage_insns *) xmalloc (sizeof (struct usage_insns)
+				      * lra_constraint_new_regno_start);
+  for (i = 0; i < lra_constraint_new_regno_start; i++)
+    usage_insns[i].check = 0;
+  bitmap_initialize (&check_only_regs, &reg_obstack);
+  bitmap_initialize (&live_regs, &reg_obstack);
+  bitmap_initialize (&temp_bitmap, &reg_obstack);
+  bitmap_initialize (&ebb_global_regs, &reg_obstack);
+  FOR_EACH_BB (bb)
+    {
+      start_bb = bb;
+      if (lra_dump_file != NULL)
+	fprintf (lra_dump_file, "EBB");
+      /* Form a EBB starting with BB.  */
+      bitmap_clear (&ebb_global_regs);
+      bitmap_ior_into (&ebb_global_regs, DF_LR_IN (bb));
+      for (;;)
+	{
+	  if (lra_dump_file != NULL)
+	    fprintf (lra_dump_file, " %d", bb->index);
+	  if (bb->next_bb == EXIT_BLOCK_PTR || LABEL_P (BB_HEAD (bb->next_bb)))
+	    break;
+	  e = find_fallthru_edge (bb->succs);
+	  if (! e)
+	    break;
+	  if (e->probability <= EBB_PROBABILITY_CUTOFF)
+	    break;
+	  bb = bb->next_bb;
+	}
+      bitmap_ior_into (&ebb_global_regs, DF_LR_OUT (bb));
+      if (lra_dump_file != NULL)
+	fprintf (lra_dump_file, "\n");
+      if (inherit_in_ebb (BB_HEAD (start_bb), BB_END (bb)))
+	/* Remember that the EBB head and tail can change in
+	   inherit_in_ebb.  */
+	update_ebb_live_info (BB_HEAD (start_bb), BB_END (bb));
+    }
+  bitmap_clear (&ebb_global_regs);
+  bitmap_clear (&temp_bitmap);
+  bitmap_clear (&live_regs);
+  bitmap_clear (&check_only_regs);
+  free (usage_insns);
+}
+
+
+
+/* This page contains code to undo failed inheritance/split
+   transformations.  */
+
+/* Current number of iteration undoing inheritance/split.  */
+int lra_undo_inheritance_iter;
+
+/* Temporary bitmaps used during calls of FIX_BB_LIVE_INFO.  */
+static bitmap_head temp_bitmap_head;
+
+/* Fix BB live info LIVE after removing pseudos created on pass doing
+   inheritance/split which are REMOVED_PSEUDOS.	 */
+static void
+fix_bb_live_info (bitmap live, bitmap removed_pseudos)
+{
+  unsigned int regno;
+  bitmap_iterator bi;
+
+  bitmap_and (&temp_bitmap_head, removed_pseudos, live);
+  EXECUTE_IF_SET_IN_BITMAP (&temp_bitmap_head, 0, regno, bi)
+    {
+      bitmap_clear_bit (live, regno);
+      bitmap_set_bit (live, lra_reg_info[regno].restore_regno);
+    }
+}
+
+/* Return regno of the (subreg of) REG. Otherwise, return a negative
+   number.  */
+static int
+get_regno (rtx reg)
+{
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+  if (REG_P (reg))
+    return REGNO (reg);
+  return -1;
+}
+
+/* Remove inheritance/split pseudos which are in REMOVE_PSEUDOS and
+   return true if we did any change.  The undo transformations for
+   inheritance looks like
+      i <- i2
+      p <- i	  =>   p <- i2
+   or removing
+      p <- i, i <- p, and i <- i3
+   where p is original pseudo from which inheritance pseudo i was
+   created, i and i3 are removed inheritance pseudos, i2 is another
+   not removed inheritance pseudo.  All split pseudos or other
+   occurrences of removed inheritance pseudos are changed on the
+   corresponding original pseudos.  */
+static bool
+remove_inheritance_pseudos (bitmap remove_pseudos)
+{
+  basic_block bb;
+  int regno, sregno, prev_sregno, dregno, restore_regno;
+  rtx set, prev_set, prev_insn;
+  bool change_p, done_p;
+
+  change_p = ! bitmap_empty_p (remove_pseudos);
+  bitmap_initialize (&temp_bitmap_head, &reg_obstack);
+  FOR_EACH_BB (bb)
+    {
+      fix_bb_live_info (DF_LR_IN (bb), remove_pseudos);
+      fix_bb_live_info (DF_LR_OUT (bb), remove_pseudos);
+      FOR_BB_INSNS_REVERSE (bb, curr_insn)
+	{
+	  if (! INSN_P (curr_insn))
+	    continue;
+	  done_p = false;
+	  sregno = dregno = -1;
+	  if (change_p && NONDEBUG_INSN_P (curr_insn)
+	      && (set = single_set (curr_insn)) != NULL_RTX)
+	    {
+	      dregno = get_regno (SET_DEST (set));
+	      sregno = get_regno (SET_SRC (set));
+	    }
+	  
+	  if (sregno >= 0 && dregno >= 0)
+	    {
+	      if ((bitmap_bit_p (remove_pseudos, sregno)
+		   && (lra_reg_info[sregno].restore_regno == dregno
+		       || (bitmap_bit_p (remove_pseudos, dregno)
+			   && (lra_reg_info[sregno].restore_regno
+			       == lra_reg_info[dregno].restore_regno))))
+		  || (bitmap_bit_p (remove_pseudos, dregno)
+		      && lra_reg_info[dregno].restore_regno == sregno))
+		/* One of the following cases:
+		     original <- removed inheritance pseudo
+		     removed inherit pseudo <- another removed inherit pseudo
+		     removed inherit pseudo <- original pseudo
+		   Or
+		     removed_split_pseudo <- original_reg
+		     original_reg <- removed_split_pseudo */
+		{
+		  if (lra_dump_file != NULL)
+		    {
+		      fprintf (lra_dump_file, "	   Removing %s:\n",
+			       bitmap_bit_p (&lra_split_pseudos, sregno)
+			       || bitmap_bit_p (&lra_split_pseudos, dregno)
+			       ? "split" : "inheritance");
+		      debug_rtl_slim (lra_dump_file,
+				      curr_insn, curr_insn, -1, 0);
+		    }
+		  lra_set_insn_deleted (curr_insn);
+		  done_p = true;
+		}
+	      else if (bitmap_bit_p (remove_pseudos, sregno)
+		       && bitmap_bit_p (&lra_inheritance_pseudos, sregno))
+		{
+		  /* Search the following pattern:
+		       inherit_or_split_pseudo1 <- inherit_or_split_pseudo2
+		       original_pseudo <- inherit_or_split_pseudo1
+		    where the 2nd insn is the current insn and
+		    inherit_or_split_pseudo2 is not removed.  If it is found,
+		    change the current insn onto:
+		       original_pseudo1 <- inherit_or_split_pseudo2.  */
+		  for (prev_insn = PREV_INSN (curr_insn);
+		       prev_insn != NULL_RTX && ! NONDEBUG_INSN_P (prev_insn);
+		       prev_insn = PREV_INSN (prev_insn))
+		    ;
+		  if (prev_insn != NULL_RTX && BLOCK_FOR_INSN (prev_insn) == bb
+		      && (prev_set = single_set (prev_insn)) != NULL_RTX
+		      /* There should be no subregs in insn we are
+			 searching because only the original reg might
+			 be in subreg when we changed the mode of
+			 load/store for splitting.  */
+		      && REG_P (SET_DEST (prev_set))
+		      && REG_P (SET_SRC (prev_set))
+		      && (int) REGNO (SET_DEST (prev_set)) == sregno
+		      && ((prev_sregno = REGNO (SET_SRC (prev_set)))
+			  >= FIRST_PSEUDO_REGISTER)
+		      && (lra_reg_info[sregno].restore_regno
+			  == lra_reg_info[prev_sregno].restore_regno)
+		      && ! bitmap_bit_p (remove_pseudos, prev_sregno))
+		    {
+		      lra_assert (GET_MODE (SET_SRC (prev_set))
+				  == GET_MODE (regno_reg_rtx[sregno]));
+		      if (GET_CODE (SET_SRC (set)) == SUBREG)
+			SUBREG_REG (SET_SRC (set)) = SET_SRC (prev_set);
+		      else
+			SET_SRC (set) = SET_SRC (prev_set);
+		      lra_push_insn_and_update_insn_regno_info (curr_insn);
+		      lra_set_used_insn_alternative_by_uid
+			(INSN_UID (curr_insn), -1);
+		      done_p = true;
+		      if (lra_dump_file != NULL)
+			{
+			  fprintf (lra_dump_file, "    Change reload insn:\n");
+			  debug_rtl_slim (lra_dump_file,
+					  curr_insn, curr_insn, -1, 0);
+			}
+		    }
+		}
+	    }
+	  if (! done_p)
+	    {
+	      struct lra_insn_reg *reg;
+	      bool insn_change_p = false;
+
+	      curr_id = lra_get_insn_recog_data (curr_insn);
+	      for (reg = curr_id->regs; reg != NULL; reg = reg->next)
+		if ((regno = reg->regno) >= lra_constraint_new_regno_start
+		    && lra_reg_info[regno].restore_regno >= 0)
+		  {
+		    if (change_p && bitmap_bit_p (remove_pseudos, regno))
+		      {
+			restore_regno = lra_reg_info[regno].restore_regno;
+			substitute_pseudo (&curr_insn, regno,
+					   regno_reg_rtx[restore_regno]);
+			insn_change_p = true;
+		      }
+		    else if (NONDEBUG_INSN_P (curr_insn))
+		      {
+			lra_push_insn_and_update_insn_regno_info (curr_insn);
+			lra_set_used_insn_alternative_by_uid
+			  (INSN_UID (curr_insn), -1);
+		      }
+		  }
+	      if (insn_change_p)
+		{
+		  lra_update_insn_regno_info (curr_insn);
+		  if (lra_dump_file != NULL)
+		    {
+		      fprintf (lra_dump_file, "	   Restore original insn:\n");
+		      debug_rtl_slim (lra_dump_file,
+				      curr_insn, curr_insn, -1, 0);
+		    }
+		}
+	    }
+	}
+    }
+  bitmap_clear (&temp_bitmap_head);
+  return change_p;
+}
+
+/* Entry function for undoing inheritance/split transformation.	 Return true
+   if we did any RTL change in this pass.  */
+bool
+lra_undo_inheritance (void)
+{
+  unsigned int regno;
+  int restore_regno, hard_regno;
+  int n_all_inherit, n_inherit, n_all_split, n_split;
+  bitmap_head remove_pseudos;
+  bitmap_iterator bi;
+  bool change_p;
+
+  lra_undo_inheritance_iter++;
+  if (lra_dump_file != NULL)
+    fprintf (lra_dump_file,
+	     "\n********** Undoing inheritance #%d: **********\n\n",
+	     lra_undo_inheritance_iter);
+  bitmap_initialize (&remove_pseudos, &reg_obstack);
+  n_inherit = n_all_inherit = 0;
+  EXECUTE_IF_SET_IN_BITMAP (&lra_inheritance_pseudos, 0, regno, bi)
+    if (lra_reg_info[regno].restore_regno >= 0)
+      {
+	n_all_inherit++;
+	if (reg_renumber[regno] < 0)
+	  bitmap_set_bit (&remove_pseudos, regno);
+	else
+	  n_inherit++;
+      }
+  if (lra_dump_file != NULL && n_all_inherit != 0)
+    fprintf (lra_dump_file, "Inherit %d out of %d (%.2f%%)\n",
+	     n_inherit, n_all_inherit,
+	     (double) n_inherit / n_all_inherit * 100);
+  n_split = n_all_split = 0;
+  EXECUTE_IF_SET_IN_BITMAP (&lra_split_pseudos, 0, regno, bi)
+    if ((restore_regno = lra_reg_info[regno].restore_regno) >= 0)
+      {
+	n_all_split++;
+	hard_regno = (restore_regno >= FIRST_PSEUDO_REGISTER
+		      ? reg_renumber[restore_regno] : restore_regno);
+	if (hard_regno < 0 || reg_renumber[regno] == hard_regno)
+	  bitmap_set_bit (&remove_pseudos, regno);
+	else
+	  {
+	    n_split++;
+	    if (lra_dump_file != NULL)
+	      fprintf (lra_dump_file, "	     Keep split r%d (orig=r%d)\n",
+		       regno, restore_regno);
+	  }
+      }
+  if (lra_dump_file != NULL && n_all_split != 0)
+    fprintf (lra_dump_file, "Split %d out of %d (%.2f%%)\n",
+	     n_split, n_all_split,
+	     (double) n_split / n_all_split * 100);
+  change_p = remove_inheritance_pseudos (&remove_pseudos);
+  bitmap_clear (&remove_pseudos);
+  /* Clear restore_regnos.  */
+  EXECUTE_IF_SET_IN_BITMAP (&lra_inheritance_pseudos, 0, regno, bi)
+    lra_reg_info[regno].restore_regno = -1;
+  EXECUTE_IF_SET_IN_BITMAP (&lra_split_pseudos, 0, regno, bi)
+    lra_reg_info[regno].restore_regno = -1;
+  return change_p;
+}
Index: lra.c
===================================================================
--- lra.c	(revision 0)
+++ lra.c	(working copy)
@@ -0,0 +1,2368 @@ 
+/* LRA (local register allocator) driver and LRA utilities.
+   Copyright (C) 2010, 2011, 2012
+   Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.	If not see
+<http://www.gnu.org/licenses/>.	 */
+
+
+/* The Local Register Allocator (LRA) is a replacement of former
+   reload pass.	 It is focused to simplify code solving the reload
+   pass tasks, to make the code maintenance easier, and to implement new
+   perspective optimizations.
+
+   The major LRA design solutions are:
+     o division small manageable, separated sub-tasks
+     o reflection of all transformations and decisions in RTL as more
+       as possible
+     o insn constraints as a primary source of the info (minimizing
+       number of target-depended macros/hooks)
+
+   In brief LRA works by iterative insn process with the final goal is
+   to satisfy all insn and address constraints:
+     o New reload insns (in brief reloads) and reload pseudos might be
+       generated;
+     o Some pseudos might be spilled to assign hard registers to
+       new reload pseudos;
+     o Some pseudos are bound.	It means they always got the same
+       memory or hard register;
+     o Changing spilled pseudos to stack memory or their equivalences;
+     o Allocation stack memory changes the address displacement and
+       new iteration is needed.
+
+   Here is block diagram of LRA passes:
+
+	  ---------------------				    
+	 | Undo inheritance    |      ---------------	     --------------- 
+	 | for spilled pseudos)|     | Memory-memory |	    | New (and old) |
+	 | and splits (for     |<----| move coalesce |<-----|	 pseudos    |
+	 | pseudos got the     |      ---------------	    |	assignment  |
+  Start	 |  same  hard regs)   |			     --------------- 
+    |	  ---------------------					    ^
+    V		  |		 ----------------		    |
+ -----------	  V		| Update virtual |		    |
+|  Remove   |----> ------------>|    register	 |		    |
+| scratches |	  ^		|  displacements |		    |
+ -----------	  |		 ----------------		    |
+		  |			 |			    |
+		  |			 V	   New		    |
+	 ----------------    No	   ------------	 pseudos   -------------------
+	| Spilled pseudo | change |Constraints:| or insns | Inheritance/split |
+	|    to memory	 |<-------|    RTL     |--------->|  transformations  |
+	|  substitution	 |	  | transfor-  |	  |    in EBB scope   |
+	 ----------------	  |  mations   |	   -------------------
+		|		    ------------ 
+		V
+    -------------------------
+   | Hard regs substitution, |
+   |  devirtalization, and   |------> Finish
+   | restoring scratches got |
+   |	     memory	     |
+    -------------------------
+
+   To speed up the process:
+     o We process only insns affected by changes on previous
+       iterations;
+     o We don't use DFA-infrastructure because it results in much slower
+       compiler speed than a special IR described below does;
+     o We use a special insn representation for quick access to insn
+       info which is always *synchronized* with the current RTL;
+       o Insn IR is minimized by memory.  It is divided on three parts:
+	 o one specific for each insn in RTL (only operand locations);
+	 o one common for all insns in RTL with the same insn code
+	   (different operand attributes from machine descriptions);
+	 o one oriented for maintenance of live info (list of pseudos).
+       o Pseudo data:
+	 o all insns where the pseudo is referenced;
+	 o live info (conflicting hard regs, live ranges, # of
+	   references etc);
+	 o data used for assigning (preferred hard regs, costs etc).
+
+   This file contains LRA driver, LRA utility functions and data, and
+   code for dealing with scratches.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "regs.h"
+#include "insn-config.h"
+#include "insn-codes.h"
+#include "recog.h"
+#include "output.h"
+#include "addresses.h"
+#include "hard-reg-set.h"
+#include "flags.h"
+#include "function.h"
+#include "expr.h"
+#include "basic-block.h"
+#include "except.h"
+#include "tree-pass.h"
+#include "timevar.h"
+#include "target.h"
+#include "vec.h"
+#include "ira.h"
+#include "lra-int.h"
+#include "df.h"
+
+/* Hard registers currently not available for allocation.  It can
+   changed after some hard  registers become not eliminable.  */
+HARD_REG_SET lra_no_alloc_regs;
+
+static int get_new_reg_value (void);
+static void expand_reg_info (void);
+static void invalidate_insn_recog_data (int);
+static int get_insn_freq (rtx);
+static void invalidate_insn_data_regno_info (lra_insn_recog_data_t, rtx, int);
+
+/* Expand all regno related info needed for LRA.  */
+static void
+expand_reg_data (void)
+{
+  resize_reg_info ();
+  expand_reg_info ();
+  ira_expand_reg_equiv ();
+}
+
+/* Create and return a new reg from register FROM corresponding to
+   machine description operand of mode MD_MODE.	 Initialize its
+   register class to RCLASS.  Print message about assigning class
+   RCLASS containing new register name TITLE unless it is NULL.	 The
+   created register will have unique held value.  */
+rtx
+lra_create_new_reg_with_unique_value (enum machine_mode md_mode, rtx original,
+				      enum reg_class rclass, const char *title)
+{
+  enum machine_mode mode;
+  rtx new_reg;
+
+  if (original == NULL_RTX || (mode = GET_MODE (original)) == VOIDmode)
+    mode = md_mode;
+  lra_assert (mode != VOIDmode);
+  new_reg = gen_reg_rtx (mode);
+  if (original == NULL_RTX || ! REG_P (original))
+    {
+      if (lra_dump_file != NULL)
+	fprintf (lra_dump_file, "      Creating newreg=%i", REGNO (new_reg));
+    }
+  else
+    {
+      if (ORIGINAL_REGNO (original) >= FIRST_PSEUDO_REGISTER)
+	ORIGINAL_REGNO (new_reg) = ORIGINAL_REGNO (original);
+      REG_USERVAR_P (new_reg) = REG_USERVAR_P (original);
+      REG_POINTER (new_reg) = REG_POINTER (original);
+      REG_ATTRS (new_reg) = REG_ATTRS (original);
+      if (lra_dump_file != NULL)
+	fprintf (lra_dump_file, "      Creating newreg=%i from oldreg=%i",
+		 REGNO (new_reg), REGNO (original));
+    }
+  if (lra_dump_file != NULL)
+    {
+      if (title != NULL)
+	fprintf (lra_dump_file, ", assigning class %s to%s%s r%d",
+		 reg_class_names[rclass], *title == '\0' ? "" : " ",
+		 title, REGNO (new_reg));
+      fprintf (lra_dump_file, "\n");
+    }
+  expand_reg_data ();
+  setup_reg_classes (REGNO (new_reg), rclass, NO_REGS, rclass);
+  return new_reg;
+}
+
+/* Analogous to the previous function but also inherits value of
+   ORIGINAL.  */
+rtx
+lra_create_new_reg (enum machine_mode md_mode, rtx original,
+		    enum reg_class rclass, const char *title)
+{
+  rtx new_reg;
+
+  new_reg
+    = lra_create_new_reg_with_unique_value (md_mode, original, rclass, title);
+  if (original != NULL_RTX && REG_P (original))
+    lra_reg_info[REGNO (new_reg)].val = lra_reg_info[REGNO (original)].val;
+  return new_reg;
+}
+
+/* Set up for REGNO unique hold value.	*/
+void
+lra_set_regno_unique_value (int regno)
+{
+  lra_reg_info[regno].val = get_new_reg_value ();
+}
+
+/* Invalidate INSN related info used by LRA.  */
+void
+lra_invalidate_insn_data (rtx insn)
+{
+  lra_invalidate_insn_regno_info (insn);
+  invalidate_insn_recog_data (INSN_UID (insn));
+}
+
+/* Mark INSN deleted and invalidate the insn related info used by
+   LRA.	 */
+void
+lra_set_insn_deleted (rtx insn)
+{
+  lra_invalidate_insn_data (insn);
+  SET_INSN_DELETED (insn);
+}
+
+/* Delete an unneeded INSN and any previous insns who sole purpose is
+   loading data that is dead in INSN.  */
+void
+lra_delete_dead_insn (rtx insn)
+{
+  rtx prev = prev_real_insn (insn);
+  rtx prev_dest;
+
+  /* If the previous insn sets a register that dies in our insn,
+     delete it too.  */
+  if (prev && GET_CODE (PATTERN (prev)) == SET
+      && (prev_dest = SET_DEST (PATTERN (prev)), REG_P (prev_dest))
+      && reg_mentioned_p (prev_dest, PATTERN (insn))
+      && find_regno_note (insn, REG_DEAD, REGNO (prev_dest))
+      && ! side_effects_p (SET_SRC (PATTERN (prev))))
+    lra_delete_dead_insn (prev);
+
+  lra_set_insn_deleted (insn);
+}
+
+/* Target checks operands through operand predicates to recognize an
+   insn.  We should have a special precaution to generate add insns
+   which are frequent results of elimination.
+
+   Emit insns for x = y + z.  X can be used to store intermediate
+   values and should be not in Y and Z when we use x to store an
+   intermediate value.	*/ 
+void
+lra_emit_add (rtx x, rtx y, rtx z)
+{
+  int old;
+  rtx insn, last;
+  rtx a1, a2, base, index, disp, scale, index_scale;
+  bool ok_p;
+
+  insn = gen_add3_insn (x, y, z);
+  old = max_reg_num ();
+  if (insn != NULL_RTX)
+    emit_insn (insn);
+  else
+    {
+      disp = a2 = NULL_RTX;
+      if (GET_CODE (y) == PLUS)
+	{
+	  a1 = XEXP (y, 0);
+	  a2 = XEXP (y, 1);
+	  disp = z;
+	}
+      else
+	{
+	  a1 = y;
+	  if (CONSTANT_P (z))
+	    disp = z;
+	  else
+	    a2 = z;
+	}
+      index_scale = scale = NULL_RTX;
+      if (GET_CODE (a1) == MULT)
+	{
+	  index_scale = a1;
+	  index = XEXP (a1, 0);
+	  scale = XEXP (a1, 1);
+	  base = a2;
+	}
+      else if (a2 != NULL_RTX && GET_CODE (a2) == MULT)
+	{
+	  index_scale = a2;
+	  index = XEXP (a2, 0);
+	  scale = XEXP (a2, 1);
+	  base = a1;
+	}
+      else
+	{
+	  base = a1;
+	  index = a2;
+	}
+      if (! REG_P (base)
+	  || (index != NULL_RTX && ! REG_P (index))
+	  || (disp != NULL_RTX && ! CONSTANT_P (disp))
+	  || (scale != NULL_RTX && ! CONSTANT_P (scale)))
+	{
+	  /* Its is not an address generation.	Probably we have no 3 op
+	     add.  Last chance is to use 2-op add insn.	 */
+	  lra_assert (x != y && x != z);
+	  emit_move_insn (x, z);
+	  insn = gen_add2_insn (x, y);
+	  emit_insn (insn);
+	}
+      else
+	{
+	  if (index_scale == NULL_RTX)
+	    index_scale = index;
+	  if (disp == NULL_RTX)
+	    {
+	      /* Generate x = index_scale; x = x + base.  */
+	      lra_assert (index_scale != NULL_RTX && base != NULL_RTX);
+	      emit_move_insn (x, index_scale);
+	      insn = gen_add2_insn (x, base);
+	      emit_insn (insn);
+	    }
+	  else if (scale == NULL_RTX)
+	    {
+	      /* Try x = base + disp.  */
+	      lra_assert (base != NULL_RTX);
+	      last = get_last_insn ();
+	      insn = emit_move_insn (x, gen_rtx_PLUS (GET_MODE (base),
+						      base, disp));
+	      if (recog_memoized (insn) < 0)
+		{
+		  delete_insns_since (last);
+		  /* Generate x = disp; x = x + base.  */
+		  emit_move_insn (x, disp);
+		  insn = gen_add2_insn (x, base);
+		  emit_insn (insn);
+		}
+	      /* Generate x = x + index.  */
+	      if (index != NULL_RTX)
+		{
+		  insn = gen_add2_insn (x, index);
+		  emit_insn (insn);
+		}
+	    }
+	  else
+	    {
+	      /* Try x = index_scale; x = x + disp; x = x + base.  */
+	      last = get_last_insn ();
+	      insn = emit_move_insn (x, index_scale);
+	      ok_p = false;
+	      if (recog_memoized (insn) >= 0)
+		{
+		  insn = gen_add2_insn (x, disp);
+		  if (insn != NULL_RTX)
+		    {
+		      emit_insn (insn);
+		      insn = gen_add2_insn (x, disp);
+		      if (insn != NULL_RTX)
+			{
+			  emit_insn (insn);
+			  ok_p = true;
+			}
+		    }
+		}
+	      if (! ok_p)
+		{
+		  delete_insns_since (last);
+		  /* Generate x = disp; x = x + base; x = x + index_scale.  */
+		  emit_move_insn (x, disp);
+		  insn = gen_add2_insn (x, base);
+		  emit_insn (insn);
+		  insn = gen_add2_insn (x, index_scale);
+		  emit_insn (insn);
+		}
+	    }
+	}
+    }
+  /* Functions emit_... can create pseudos -- so expand the pseudo
+     data.  */
+  if (old != max_reg_num ())
+    expand_reg_data ();
+}
+
+/* The number of emitted reload insns so far.  */
+int lra_curr_reload_num;
+
+/* Emit x := y, processing special case when y = u + v or y = u + v *
+   scale + w through emit_add (Y can be an address which is base +
+   index reg * scale + displacement in general case).  X may be used
+   as intermediate result therefore it should be not in Y.  */
+void
+lra_emit_move (rtx x, rtx y)
+{
+  int old;
+
+  if (GET_CODE (y) != PLUS)
+    {
+      if (rtx_equal_p (x, y))
+	return;
+      old = max_reg_num ();
+      emit_move_insn (x, y);
+      if (REG_P (x))
+	lra_reg_info[ORIGINAL_REGNO (x)].last_reload = ++lra_curr_reload_num;
+      /* Function emit_move can create pseudos -- so expand the pseudo
+	 data.	*/
+      if (old != max_reg_num ())
+	expand_reg_data ();
+      return;
+    }
+  lra_emit_add (x, XEXP (y, 0), XEXP (y, 1));
+}
+
+/* Update insn operands which are duplication of operands whose
+   numbers are in array of NOPS (with end marker -1).  The insn is
+   represented by its LRA internal representation ID.  */
+void
+lra_update_dups (lra_insn_recog_data_t id, signed char *nops)
+{
+  int i, j, nop;
+  struct lra_static_insn_data *static_id = id->insn_static_data;
+
+  for (i = 0; i < static_id->n_dups; i++)
+    for (j = 0; (nop = nops[j]) >= 0; j++)
+      if (static_id->dup_num[i] == nop)
+	*id->dup_loc[i] = *id->operand_loc[nop];
+}
+
+
+
+/* This page contains code dealing with info about registers in the
+   insns.  */
+
+/* Pools for insn reg info.  */
+static alloc_pool insn_reg_pool;
+
+/* Initiate pool for insn reg info.  */
+static void
+init_insn_regs (void)
+{
+  insn_reg_pool
+    = create_alloc_pool ("insn regs", sizeof (struct lra_insn_reg), 100);
+}
+
+/* Create LRA insn related info about referenced REGNO with TYPE
+   (in/out/inout), biggest reference mode MODE, flag that it is
+   reference through subreg (SUBREG_P), flag that is early clobbered
+   in the insn (EARLY_CLOBBER), and reference to the next insn reg
+   info (NEXT).	 */
+static struct lra_insn_reg *
+new_insn_reg (int regno, enum op_type type, enum machine_mode mode,
+	      bool subreg_p, bool early_clobber, struct lra_insn_reg *next)
+{
+  struct lra_insn_reg *ir;
+
+  ir = (struct lra_insn_reg *) pool_alloc (insn_reg_pool);
+  ir->type = type;
+  ir->biggest_mode = mode;
+  ir->subreg_p = subreg_p;
+  ir->early_clobber = early_clobber;
+  ir->regno = regno;
+  ir->next = next;
+  return ir;
+}
+
+/* Free insn reg info IR.  */
+static void
+free_insn_reg (struct lra_insn_reg *ir)
+{
+  pool_free (insn_reg_pool, ir);
+}
+
+/* Free insn reg info list IR.	*/
+static void
+free_insn_regs (struct lra_insn_reg *ir)
+{
+  struct lra_insn_reg *next_ir;
+
+  for (; ir != NULL; ir = next_ir)
+    {
+      next_ir = ir->next;
+      free_insn_reg (ir);
+    }
+}
+
+/* Finish pool for insn reg info.  */
+static void
+finish_insn_regs (void)
+{
+  free_alloc_pool (insn_reg_pool);
+}
+
+
+
+/* This page contains code dealing LRA insn info (or in other words
+   LRA internal insn representation).  */
+
+
+/* Map INSN_CODE -> the static insn data.  This info is valid during
+   all translation unit.  */
+struct lra_static_insn_data *insn_code_data[LAST_INSN_CODE];
+
+/* Map INSN_UID -> the operand alternative data (NULL if unknown).  We
+   assume that this data is valid until register info is changed
+   because classes in the data can be changed.	*/
+struct operand_alternative *op_alt_data[LAST_INSN_CODE];
+
+/* Debug insns are represented as a special insn with one input
+   operand which is RTL expression in var_location.  */
+
+/* The following data are used as static insn operand data for all
+   debug insns.	 If structure lra_operand_data is changed, the
+   initializer should be changed too.  */
+static struct lra_operand_data debug_operand_data =
+  {
+    NULL, /* alternative  */
+    VOIDmode, /* We are not interesting in the operand mode.  */
+    OP_IN,
+    0, 0, 0 
+  };
+
+/* The following data are used as static insn data for all debug
+   insns.  If structure lra_static_insn_data is changed, the
+   initializer should be changed too.  */
+static struct lra_static_insn_data debug_insn_static_data =
+  {
+    &debug_operand_data,
+    0,	/* Duplication operands #.  */
+    -1, /* Commutative operand #.  */
+    1,	/* Operands #.	There is only one operand which is debug RTL
+	   expression.	*/
+    0,	/* Duplications #.  */
+    0,	/* Alternatives #.  We are not interesting in alternatives
+	   because we does not proceed debug_insns for reloads.	 */
+    NULL, /* Hard registers referenced in machine description.	*/
+    NULL  /* Descriptions of operands in alternatives.	*/
+  };
+
+/* Called once per compiler work to initialize some LRA data related
+   to insns.  */
+static void
+init_insn_code_data_once (void)
+{
+  memset (insn_code_data, 0, sizeof (insn_code_data));
+  memset (op_alt_data, 0, sizeof (op_alt_data));
+}
+
+/* Called once per compiler work to finalize some LRA data related to
+   insns.  */
+static void
+finish_insn_code_data_once (void)
+{
+  int i;
+
+  for (i = 0; i < LAST_INSN_CODE; i++)
+    {
+      if (insn_code_data[i] != NULL)
+	free (insn_code_data[i]);
+      if (op_alt_data[i] != NULL)
+	free (op_alt_data[i]);
+    }
+}
+
+/* Initialize LRA info about operands in insn alternatives.  */
+static void
+init_op_alt_data (void)
+{
+ int i;
+
+  for (i = 0; i < LAST_INSN_CODE; i++)
+    if (op_alt_data[i] != NULL)
+      {
+	free (op_alt_data[i]);
+	op_alt_data[i] = NULL;
+      }
+}
+
+/* Return static insn data, allocate and setup if necessary.  Although
+   dup_num is static data (it depends only on icode), to set it up we
+   need to extract insn first.	So recog_data should be valid for
+   normal insn (ICODE >= 0) before the call.  */
+static struct lra_static_insn_data *
+get_static_insn_data (int icode, int nop, int ndup, int nalt)
+{
+  struct lra_static_insn_data *data;
+
+  lra_assert (icode < LAST_INSN_CODE);
+  if (icode >= 0 && (data = insn_code_data[icode]) != NULL)
+    return data;
+  lra_assert (nop >= 0 && ndup >= 0 && nalt >= 0);
+  data = ((struct lra_static_insn_data *)
+	  xmalloc (sizeof (struct lra_static_insn_data)
+		   + sizeof (struct lra_operand_data) * nop
+		   + sizeof (int) * ndup));
+  data->n_operands = nop;
+  data->n_dups = ndup;
+  data->n_alternatives = nalt;
+  data->operand = ((struct lra_operand_data *)
+		   ((char *) data + sizeof (struct lra_static_insn_data)));
+  data->dup_num = ((int *) ((char *) data->operand
+			    + sizeof (struct lra_operand_data) * nop));
+  if (icode >= 0)
+    {
+      int i;
+
+      insn_code_data[icode] = data;
+      for (i = 0; i < nop; i++)
+	{
+	  data->operand[i].constraint
+	    = insn_data[icode].operand[i].constraint;
+	  data->operand[i].mode = insn_data[icode].operand[i].mode;
+	  data->operand[i].strict_low = insn_data[icode].operand[i].strict_low;
+	  data->operand[i].is_operator
+	    = insn_data[icode].operand[i].is_operator;
+	  data->operand[i].type
+	    = (data->operand[i].constraint[0] == '=' ? OP_OUT
+	       : data->operand[i].constraint[0] == '+' ? OP_INOUT
+	       : OP_IN);
+	}
+      for (i = 0; i < ndup; i++)
+	data->dup_num[i] = recog_data.dup_num[i];
+    }
+  return data;
+}
+
+/* The current length of the following array.  */
+int lra_insn_recog_data_len;
+
+/* Map INSN_UID -> the insn recog data (NULL if unknown).  */
+lra_insn_recog_data_t *lra_insn_recog_data;
+
+/* Initialize LRA data about insns.  */
+static void
+init_insn_recog_data (void)
+{
+  lra_insn_recog_data_len = 0;
+  lra_insn_recog_data = NULL;
+  init_insn_regs ();
+}
+
+/* Expand, if necessary, LRA data about insns.	*/
+static void
+check_and_expand_insn_recog_data (int index)
+{
+  int i, old;
+
+  if (lra_insn_recog_data_len > index)
+    return;
+  old = lra_insn_recog_data_len;
+  lra_insn_recog_data_len = index * 3 / 2 + 1;
+  lra_insn_recog_data
+    = (lra_insn_recog_data_t *) xrealloc (lra_insn_recog_data,
+					  lra_insn_recog_data_len
+					  * sizeof (lra_insn_recog_data_t));
+  for (i = old; i < lra_insn_recog_data_len; i++)
+    lra_insn_recog_data[i] = NULL;
+}
+
+/* Finish LRA DATA about insn.	*/
+static void
+free_insn_recog_data (lra_insn_recog_data_t data)
+{
+  if (data->operand_loc != NULL)
+    free (data->operand_loc);
+  if (data->dup_loc != NULL)
+    free (data->dup_loc);
+  if (data->arg_hard_regs != NULL)
+    free (data->arg_hard_regs);
+#ifdef HAVE_ATTR_enabled
+  if (data->alternative_enabled_p != NULL)
+    free (data->alternative_enabled_p);
+#endif
+  if (data->icode < 0 && NONDEBUG_INSN_P (data->insn))
+    {
+      if (data->insn_static_data->operand_alternative != NULL)
+	free (data->insn_static_data->operand_alternative);
+      free_insn_regs (data->insn_static_data->hard_regs);
+      free (data->insn_static_data);
+    }
+  free_insn_regs (data->regs);
+  data->regs = NULL;
+  free (data);
+}
+
+/* Finish LRA data about all insns.  */
+static void
+finish_insn_recog_data (void)
+{
+  int i;
+  lra_insn_recog_data_t data;
+
+  for (i = 0; i < lra_insn_recog_data_len; i++)
+    if ((data = lra_insn_recog_data[i]) != NULL)
+      free_insn_recog_data (data);
+  finish_insn_regs ();
+  free (lra_insn_recog_data);
+}
+
+/* Setup info about operands in alternatives of LRA DATA of insn.  */
+static void
+setup_operand_alternative (lra_insn_recog_data_t data)
+{
+  int i, nop, nalt;
+  int icode = data->icode;
+  struct lra_static_insn_data *static_data = data->insn_static_data;
+
+  if (icode >= 0
+      && (static_data->operand_alternative = op_alt_data[icode]) != NULL)
+    return;
+  static_data->commutative = -1;
+  nop = static_data->n_operands;
+  if (nop == 0)
+    {
+      static_data->operand_alternative = NULL;
+      return;
+    }
+  nalt = static_data->n_alternatives;
+  static_data->operand_alternative
+    = ((struct operand_alternative *)
+       xmalloc (nalt * nop * sizeof (struct operand_alternative)));
+  memset (static_data->operand_alternative, 0,
+	  nalt * nop * sizeof (struct operand_alternative));
+  if (icode >= 0)
+    op_alt_data[icode] = static_data->operand_alternative;
+  for (i = 0; i < nop; i++)
+    {
+      int j;
+      struct operand_alternative *op_alt_start, *op_alt;
+      const char *p = static_data->operand[i].constraint;
+
+      static_data->operand[i].early_clobber = 0;
+      op_alt_start = &static_data->operand_alternative[i];
+
+      for (j = 0; j < nalt; j++)
+	{
+	  op_alt = op_alt_start + j * nop;
+	  op_alt->cl = NO_REGS;
+	  op_alt->constraint = p;
+	  op_alt->matches = -1;
+	  op_alt->matched = -1;
+
+	  if (*p == '\0' || *p == ',')
+	    {
+	      op_alt->anything_ok = 1;
+	      continue;
+	    }
+
+	  for (;;)
+	    {
+	      char c = *p;
+	      if (c == '#')
+		do
+		  c = *++p;
+		while (c != ',' && c != '\0');
+	      if (c == ',' || c == '\0')
+		{
+		  p++;
+		  break;
+		}
+
+	      switch (c)
+		{
+		case '=': case '+': case '*':
+		case 'E': case 'F': case 'G': case 'H':
+		case 's': case 'i': case 'n':
+		case 'I': case 'J': case 'K': case 'L':
+		case 'M': case 'N': case 'O': case 'P':
+		  /* These don't say anything we care about.  */
+		  break;
+
+		case '%':
+		  /* We currently only support one commutative pair of
+		     operands.	*/
+		  if (static_data->commutative < 0)
+		    static_data->commutative = i;
+		  else
+		    lra_assert (data->icode < 0); /* Asm  */
+
+		  /* The last operand should not be marked
+		     commutative.  */
+		  lra_assert (i != nop - 1);
+		  break;
+
+		case '?':
+		  op_alt->reject += 6;
+		  break;
+		case '!':
+		  op_alt->reject += 600;
+		  break;
+		case '&':
+		  op_alt->earlyclobber = 1;
+		  static_data->operand[i].early_clobber = 1;
+		  break;
+
+		case '0': case '1': case '2': case '3': case '4':
+		case '5': case '6': case '7': case '8': case '9':
+		  {
+		    char *end;
+		    op_alt->matches = strtoul (p, &end, 10);
+		    static_data->operand_alternative
+		      [j * nop + op_alt->matches].matched = i;
+		    p = end;
+		  }
+		  continue;
+
+		case TARGET_MEM_CONSTRAINT:
+		  op_alt->memory_ok = 1;
+		  break;
+		case '<':
+		  op_alt->decmem_ok = 1;
+		  break;
+		case '>':
+		  op_alt->incmem_ok = 1;
+		  break;
+		case 'V':
+		  op_alt->nonoffmem_ok = 1;
+		  break;
+		case 'o':
+		  op_alt->offmem_ok = 1;
+		  break;
+		case 'X':
+		  op_alt->anything_ok = 1;
+		  break;
+
+		case 'p':
+		  op_alt->is_address = 1;
+		  op_alt->cl = (reg_class_subunion[(int) op_alt->cl]
+				[(int) base_reg_class (VOIDmode,
+						       ADDR_SPACE_GENERIC,
+						       ADDRESS, SCRATCH)]);
+		  break;
+
+		case 'g':
+		case 'r':
+		  op_alt->cl =
+		   reg_class_subunion[(int) op_alt->cl][(int) GENERAL_REGS];
+		  break;
+
+		default:
+		  if (EXTRA_MEMORY_CONSTRAINT (c, p))
+		    {
+		      op_alt->memory_ok = 1;
+		      break;
+		    }
+		  if (EXTRA_ADDRESS_CONSTRAINT (c, p))
+		    {
+		      op_alt->is_address = 1;
+		      op_alt->cl
+			= (reg_class_subunion
+			   [(int) op_alt->cl]
+			   [(int) base_reg_class (VOIDmode, ADDR_SPACE_GENERIC,
+						  ADDRESS, SCRATCH)]);
+		      break;
+		    }
+
+		  op_alt->cl
+		    = (reg_class_subunion
+		       [(int) op_alt->cl]
+		       [(int)
+			REG_CLASS_FROM_CONSTRAINT ((unsigned char) c, p)]);
+		  break;
+		}
+	      p += CONSTRAINT_LEN (c, p);
+	    }
+	}
+    }
+}
+
+/* Recursively process X and collect info about registers, which are
+   not the insn operands, in X with TYPE (in/out/inout) and flag that
+   it is early clobbered in the insn (EARLY_CLOBBER) and add the info
+   to LIST.  X is a part of insn given by DATA.	 Return the result
+   list.  */
+static struct lra_insn_reg *
+collect_non_operand_hard_regs (rtx *x, lra_insn_recog_data_t data,
+			       struct lra_insn_reg *list,
+			       enum op_type type, bool early_clobber)
+{
+  int i, j, regno, last;
+  bool subreg_p;
+  enum machine_mode mode;
+  struct lra_insn_reg *curr;
+  rtx op = *x;
+  enum rtx_code code = GET_CODE (op);
+  const char *fmt = GET_RTX_FORMAT (code);
+
+  for (i = 0; i < data->insn_static_data->n_operands; i++)
+    if (x == data->operand_loc[i])
+      /* It is an operand loc. Stop here.  */
+      return list;
+  for (i = 0; i < data->insn_static_data->n_dups; i++)
+    if (x == data->dup_loc[i])
+      /* It is a dup loc. Stop here.  */
+      return list;
+  mode = GET_MODE (op);
+  subreg_p = false;
+  if (code == SUBREG)
+    {
+      op = SUBREG_REG (op);
+      code = GET_CODE (op);
+      if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (op)))
+	{
+	  mode = GET_MODE (op);
+	  if (GET_MODE_SIZE (mode) > REGMODE_NATURAL_SIZE (mode))
+	    subreg_p = true;
+	}
+    }
+  if (REG_P (op))
+    {
+      if ((regno = REGNO (op)) >= FIRST_PSEUDO_REGISTER)
+	return list;
+      for (last = regno + hard_regno_nregs[regno][mode];
+	   regno < last;
+	   regno++)
+	if (! TEST_HARD_REG_BIT (lra_no_alloc_regs, regno))
+	  {
+	    for (curr = list; curr != NULL; curr = curr->next)
+	      if (curr->regno == regno)
+		break;
+	    if (curr == NULL || curr->subreg_p != subreg_p
+		|| curr->biggest_mode != mode)
+	      {
+		/* This is a new hard regno or the info can not be
+		   integrated into the found structure.	 */
+#ifdef STACK_REGS
+		early_clobber
+		  = (early_clobber
+		     /* This clobber is to inform popping floating
+			point stack only.  */
+		     && ! (FIRST_STACK_REG <= regno
+			   && regno <= LAST_STACK_REG));
+#endif
+		list = new_insn_reg (regno, type, mode, subreg_p,
+				     early_clobber, list);
+	      }
+	    else
+	      {
+		if (curr->type != type)
+		  curr->type = OP_INOUT;
+		if (curr->early_clobber != early_clobber)
+		  curr->early_clobber = true;
+	      }
+	  }
+      return list;
+    }
+  switch (code)
+    {
+    case SET:
+      list = collect_non_operand_hard_regs (&SET_DEST (op), data,
+					    list, OP_OUT, false);
+      list = collect_non_operand_hard_regs (&SET_SRC (op), data,
+					    list, OP_IN, false);
+      break;
+    case CLOBBER:
+      /* We treat clobber of non-operand hard registers as early
+	 clobber (the behavior is expected from asm).  */ 
+      list = collect_non_operand_hard_regs (&XEXP (op, 0), data,
+					    list, OP_OUT, true);
+      break;
+    case PRE_INC: case PRE_DEC: case POST_INC: case POST_DEC:
+      list = collect_non_operand_hard_regs (&XEXP (op, 0), data,
+					    list, OP_INOUT, false);
+      break;
+    case PRE_MODIFY: case POST_MODIFY:
+      list = collect_non_operand_hard_regs (&XEXP (op, 0), data,
+					    list, OP_INOUT, false);
+      list = collect_non_operand_hard_regs (&XEXP (op, 1), data,
+					    list, OP_IN, false);
+      break;
+    default:
+      fmt = GET_RTX_FORMAT (code);
+      for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+	{
+	  if (fmt[i] == 'e')
+	    list = collect_non_operand_hard_regs (&XEXP (op, i), data,
+						  list, OP_IN, false);
+	  else if (fmt[i] == 'E')
+	    for (j = XVECLEN (op, i) - 1; j >= 0; j--)
+	      list = collect_non_operand_hard_regs (&XVECEXP (op, i, j), data,
+						    list, OP_IN, false);
+	}
+    }
+  return list;
+}
+
+/* Set up and return info about INSN.  Set up the info if it is not set up
+   yet.	 */
+lra_insn_recog_data_t
+lra_set_insn_recog_data (rtx insn)
+{
+  lra_insn_recog_data_t data;
+  int i, n, icode;
+  rtx **locs;
+  unsigned int uid = INSN_UID (insn);
+  struct lra_static_insn_data *insn_static_data;
+
+  check_and_expand_insn_recog_data (uid);
+  if (DEBUG_INSN_P (insn))
+    icode = -1;
+  else
+    {
+      icode = INSN_CODE (insn);
+      if (icode < 0)
+	/* It might be a new simple insn which is not recognized yet.  */
+	INSN_CODE (insn) = icode = recog (PATTERN (insn), insn, 0);
+    }
+  data
+    = (lra_insn_recog_data_t) xmalloc (sizeof (struct lra_insn_recog_data));
+  lra_insn_recog_data[uid] = data;
+  data->insn = insn;
+  data->used_insn_alternative = -1;
+  data->icode = icode;
+  data->regs = NULL;
+  if (DEBUG_INSN_P (insn))
+    {
+      data->insn_static_data = &debug_insn_static_data;
+      data->dup_loc = NULL;
+      data->arg_hard_regs = NULL;
+#ifdef HAVE_ATTR_enabled
+      data->alternative_enabled_p = NULL;
+#endif
+      data->operand_loc = (rtx **) xmalloc (sizeof (rtx *));
+      data->operand_loc[0] = &INSN_VAR_LOCATION_LOC (insn);
+      return data;
+    }
+  if (icode < 0)
+    {
+      int nop;
+      enum machine_mode operand_mode[MAX_RECOG_OPERANDS];
+      const char *constraints[MAX_RECOG_OPERANDS];
+
+      nop = asm_noperands (PATTERN (insn));
+      data->operand_loc = data->dup_loc = NULL;
+      if (nop < 0)
+	/* Its is a special insn like USE or CLOBBER.  */
+	data->insn_static_data = insn_static_data
+	  = get_static_insn_data (-1, 0, 0, 1);
+      else
+	{
+	  /* expand_asm_operands makes sure there aren't too many
+	     operands.	*/
+	  lra_assert (nop <= MAX_RECOG_OPERANDS);
+	  if (nop != 0)
+	    data->operand_loc = (rtx **) xmalloc (nop * sizeof (rtx *));
+	  /* Now get the operand values and constraints out of the
+	     insn.  */
+	  decode_asm_operands (PATTERN (insn), NULL,
+			       data->operand_loc,
+			       constraints, operand_mode, NULL);
+	  n = 1;
+	  if (nop > 0)
+	    {
+	      const char *p =  recog_data.constraints[0];
+	      
+	      for (p =	constraints[0]; *p; p++)
+		n += *p == ',';
+	    }
+	  data->insn_static_data = insn_static_data
+	    = get_static_insn_data (-1, nop, 0, n);
+	  for (i = 0; i < nop; i++)
+	    {
+	      insn_static_data->operand[i].mode = operand_mode[i];
+	      insn_static_data->operand[i].constraint = constraints[i];
+	      insn_static_data->operand[i].strict_low = false;
+	      insn_static_data->operand[i].is_operator = false;
+	    }
+	}
+      for (i = 0; i < insn_static_data->n_operands; i++)
+	insn_static_data->operand[i].type
+	  = (insn_static_data->operand[i].constraint[0] == '=' ? OP_OUT
+	     : insn_static_data->operand[i].constraint[0] == '+' ? OP_INOUT
+	     : OP_IN);
+#ifdef HAVE_ATTR_enabled
+      data->alternative_enabled_p = NULL;
+#endif
+    }
+  else
+    {
+      insn_extract (insn);
+      data->insn_static_data = insn_static_data
+	= get_static_insn_data (icode, insn_data[icode].n_operands,
+				insn_data[icode].n_dups,
+				insn_data[icode].n_alternatives);
+      n = insn_static_data->n_operands;
+      if (n == 0)
+	locs = NULL;
+      else
+	{
+	  
+	  locs = (rtx **) xmalloc (n * sizeof (rtx *));
+	  memcpy (locs, recog_data.operand_loc, n * sizeof (rtx *));
+	}
+      data->operand_loc = locs;
+      n = insn_static_data->n_dups;
+      if (n == 0)
+	locs = NULL;
+      else
+	{
+	  locs = (rtx **) xmalloc (n * sizeof (rtx *));
+	  memcpy (locs, recog_data.dup_loc, n * sizeof (rtx *));
+	}
+      data->dup_loc = locs;
+#ifdef HAVE_ATTR_enabled
+      {
+	bool *bp;
+
+	n = insn_static_data->n_alternatives;
+	lra_assert (n >= 0);
+	data->alternative_enabled_p
+	  = bp = (bool *) xmalloc (n * sizeof (bool));
+	/* Cache the insn because we don't want to call extract_insn
+	   from get_attr_enabled as extract_insn modifies
+	   which_alternative.  The attribute enabled should not depend
+	   on insn operands, operand modes, operand types, and operand
+	   constraints.	 It should depend on the architecture.	If it
+	   is not true, we should rewrite this file code to use
+	   extract_insn instead of less expensive insn_extract.	 */
+	recog_data.insn = insn;
+	for (i = 0; i < n; i++)
+	  {
+	    which_alternative = i;
+	    bp[i] = get_attr_enabled (insn);
+	  }
+      }
+#endif
+    }
+  if (GET_CODE (PATTERN (insn)) == CLOBBER || GET_CODE (PATTERN (insn)) == USE)
+    insn_static_data->hard_regs = NULL;
+  else
+    insn_static_data->hard_regs
+      = collect_non_operand_hard_regs (&PATTERN (insn), data,
+				       NULL, OP_IN, false);
+  setup_operand_alternative (data);
+  data->arg_hard_regs = NULL;
+  if (CALL_P (insn))
+    {
+      rtx link;
+      int n_hard_regs, regno, arg_hard_regs[FIRST_PSEUDO_REGISTER];
+
+      n_hard_regs = 0;
+      /* Finding implicit hard register usage.	We believe it will be
+	 not changed whatever transformations are used.	 Call insns
+	 are such example.  */
+      for (link = CALL_INSN_FUNCTION_USAGE (insn);
+	   link != NULL_RTX;
+	   link = XEXP (link, 1))
+	if (GET_CODE (XEXP (link, 0)) == USE
+	    && REG_P (XEXP (XEXP (link, 0), 0)))
+	  {
+	    regno = REGNO (XEXP (XEXP (link, 0), 0));
+	    lra_assert (regno < FIRST_PSEUDO_REGISTER);
+	    /* It is an argument register.  */
+	    for (i = (hard_regno_nregs
+		      [regno][GET_MODE (XEXP (XEXP (link, 0), 0))]) - 1;
+		 i >= 0;
+		 i--)
+	      arg_hard_regs[n_hard_regs++] = regno + i;
+	  }
+      if (n_hard_regs != 0)
+	{
+	  arg_hard_regs[n_hard_regs++] = -1;
+	  data->arg_hard_regs = ((int *) xmalloc (n_hard_regs * sizeof (int)));
+	  memcpy (data->arg_hard_regs, arg_hard_regs,
+		  sizeof (int) * n_hard_regs);
+	}
+    }
+  /* Some output operand can be recognized only from the context not
+     from the constraints which are empty in this case.	 Call insn may
+     contain a hard register in set destination with empty constraint
+     and extract_insn treats them as an input.	*/
+  for (i = 0; i < insn_static_data->n_operands; i++)
+    {
+      int j;
+      rtx pat, set;
+      struct lra_operand_data *operand = &insn_static_data->operand[i];
+
+      /* ??? Should we treat 'X' the same way.	It looks to me that
+	 'X' means anything and empty constraint means we do not
+	 care.	*/
+      if (operand->type != OP_IN || *operand->constraint != '\0'
+	  || operand->is_operator)
+	continue;
+      pat = PATTERN (insn);
+      if (GET_CODE (pat) == SET)
+	{
+	  if (data->operand_loc[i] != &SET_DEST (pat))
+	    continue;
+	}
+      else if (GET_CODE (pat) == PARALLEL)
+	{
+	  for (j = XVECLEN (pat, 0) - 1; j >= 0; j--)
+	    {
+	      set = XVECEXP (PATTERN (insn), 0, j);
+	      if (GET_CODE (set) == SET
+		  && &SET_DEST (set) == data->operand_loc[i])
+		break;
+	    }
+	  if (j < 0)
+	    continue;
+	}
+      else
+	continue;
+      operand->type = OP_OUT;
+    }
+  return data;
+}
+
+/* Return info about insn give by UID.	The info should be already set
+   up.	*/
+static lra_insn_recog_data_t
+get_insn_recog_data_by_uid (int uid)
+{
+  lra_insn_recog_data_t data;
+
+  data = lra_insn_recog_data[uid];
+  lra_assert (data != NULL);
+  return data;
+}
+
+/* Invalidate all info about insn given by its UID.  */
+static void
+invalidate_insn_recog_data (int uid)
+{
+  lra_insn_recog_data_t data;
+
+  data = lra_insn_recog_data[uid];
+  lra_assert (data != NULL);
+  free_insn_recog_data (data);
+  lra_insn_recog_data[uid] = NULL;
+}
+
+/* Update all the insn info about INSN.	 It is usually called when
+   something in the insn was changed.  Return the udpated info.	 */
+lra_insn_recog_data_t
+lra_update_insn_recog_data (rtx insn)
+{
+  lra_insn_recog_data_t data;
+  int n;
+  unsigned int uid = INSN_UID (insn);
+  struct lra_static_insn_data *insn_static_data;
+  
+  check_and_expand_insn_recog_data (uid);
+  if ((data = lra_insn_recog_data[uid]) != NULL
+      && data->icode != INSN_CODE (insn))
+    {
+      invalidate_insn_data_regno_info (data, insn, get_insn_freq (insn));
+      invalidate_insn_recog_data (uid);
+      data = NULL;
+    }
+  if (data == NULL)
+    return lra_get_insn_recog_data (insn);
+  insn_static_data = data->insn_static_data;
+  data->used_insn_alternative = -1;
+  if (DEBUG_INSN_P (insn))
+    return data;
+  if (data->icode < 0)
+    {
+      int nop;
+      enum machine_mode operand_mode[MAX_RECOG_OPERANDS];
+      const char *constraints[MAX_RECOG_OPERANDS];
+
+      nop = asm_noperands (PATTERN (insn));
+      if (nop >= 0)
+	{
+	  lra_assert (nop == data->insn_static_data->n_operands);
+	  /* Now get the operand values and constraints out of the
+	     insn.  */
+	  decode_asm_operands (PATTERN (insn), NULL,
+			       data->operand_loc,
+			       constraints, operand_mode, NULL);
+#ifdef ENABLE_CHECKING
+	  {
+	    int i;
+
+	    for (i = 0; i < nop; i++)
+	      lra_assert
+		(insn_static_data->operand[i].mode == operand_mode[i]
+		 && insn_static_data->operand[i].constraint == constraints[i]
+		 && ! insn_static_data->operand[i].is_operator);
+	  }
+#endif
+	}
+#ifdef ENABLE_CHECKING
+      {
+	int i;
+
+	for (i = 0; i < insn_static_data->n_operands; i++)
+	  lra_assert
+	    (insn_static_data->operand[i].type
+	     == (insn_static_data->operand[i].constraint[0] == '=' ? OP_OUT
+		 : insn_static_data->operand[i].constraint[0] == '+' ? OP_INOUT
+		 : OP_IN));
+      }
+#endif
+    }
+  else
+    {
+      insn_extract (insn);
+      n = insn_static_data->n_operands;
+      if (n != 0)
+	memcpy (data->operand_loc, recog_data.operand_loc, n * sizeof (rtx *));
+      n = insn_static_data->n_dups;
+      if (n != 0)
+	memcpy (data->dup_loc, recog_data.dup_loc, n * sizeof (rtx *));
+#ifdef HAVE_ATTR_enabled
+#ifdef ENABLE_CHECKING
+      {
+	int i;
+	bool *bp;
+	
+	n = insn_static_data->n_alternatives;
+	bp = data->alternative_enabled_p;
+	lra_assert (n >= 0 && bp != NULL);
+	/* Cache the insn to prevent extract_insn call from
+	   get_attr_enabled.  */
+	recog_data.insn = insn;
+	for (i = 0; i < n; i++)
+	  {
+	    which_alternative = i;
+	    lra_assert (bp[i] == get_attr_enabled (insn));
+	  }
+      }
+#endif
+#endif
+    }
+  return data;
+}
+
+/* Set up that INSN is using alternative ALT now.  */
+void
+lra_set_used_insn_alternative (rtx insn, int alt)
+{
+  lra_insn_recog_data_t data;
+
+  data = lra_get_insn_recog_data (insn);
+  data->used_insn_alternative = alt;
+}
+
+/* Set up that insn with UID is using alternative ALT now.  The insn
+   info should be already set up.  */
+void
+lra_set_used_insn_alternative_by_uid (int uid, int alt)
+{
+  lra_insn_recog_data_t data;
+
+  check_and_expand_insn_recog_data (uid);
+  data = lra_insn_recog_data[uid];
+  lra_assert (data != NULL);
+  data->used_insn_alternative = alt;
+}
+
+
+
+/* This page contains code dealing with common register info and
+   pseudo copies.  */
+
+/* The size of the following array.  */
+static int reg_info_size;
+/* Common info about each register.  */
+struct lra_reg *lra_reg_info;
+
+/* Last register value.	 */
+static int last_reg_value;
+
+/* Return new register value.  */
+static int
+get_new_reg_value (void)
+{
+  return ++last_reg_value;
+}
+
+/* Pools for copies.  */
+static alloc_pool copy_pool;
+
+DEF_VEC_P(lra_copy_t);
+DEF_VEC_ALLOC_P(lra_copy_t, heap);
+
+/* Vec referring to pseudo copies.  */
+static VEC(lra_copy_t,heap) *copy_vec;
+
+/* Initialize common reg info and copies.  */
+static void
+init_reg_info (void)
+{
+  int i;
+
+  last_reg_value = 0;
+  reg_info_size = max_reg_num () * 3 / 2 + 1;
+  lra_reg_info
+    = (struct lra_reg *) xmalloc (reg_info_size * sizeof (struct lra_reg));
+  for (i = 0; i < reg_info_size; i++)
+    {
+      bitmap_initialize (&lra_reg_info[i].insn_bitmap, &reg_obstack);
+#ifdef STACK_REGS
+      lra_reg_info[i].no_stack_p = false;
+#endif
+      CLEAR_HARD_REG_SET (lra_reg_info[i].conflict_hard_regs);
+      lra_reg_info[i].preferred_hard_regno1 = -1;
+      lra_reg_info[i].preferred_hard_regno2 = -1;
+      lra_reg_info[i].preferred_hard_regno_profit1 = 0;
+      lra_reg_info[i].preferred_hard_regno_profit2 = 0;
+      lra_reg_info[i].live_ranges = NULL;
+      lra_reg_info[i].nrefs = lra_reg_info[i].freq = 0;
+      lra_reg_info[i].last_reload = 0;
+      lra_reg_info[i].restore_regno = -1;
+      lra_reg_info[i].val = get_new_reg_value ();
+      lra_reg_info[i].copies = NULL;
+    }
+  copy_pool
+    = create_alloc_pool ("lra copies", sizeof (struct lra_copy), 100);
+  copy_vec = VEC_alloc (lra_copy_t, heap, 100);
+}
+
+
+/* Finish common reg info and copies.  */
+static void
+finish_reg_info (void)
+{
+  int i;
+
+  for (i = 0; i < reg_info_size; i++)
+    bitmap_clear (&lra_reg_info[i].insn_bitmap);
+  free (lra_reg_info);
+  reg_info_size = 0;
+  free_alloc_pool (copy_pool);
+  VEC_free (lra_copy_t, heap, copy_vec);
+}
+
+/* Expand common reg info if it is necessary.  */
+static void
+expand_reg_info (void)
+{
+  int i, old = reg_info_size;
+
+  if (reg_info_size > max_reg_num ())
+    return;
+  reg_info_size = max_reg_num () * 3 / 2 + 1;
+  lra_reg_info
+    = (struct lra_reg *) xrealloc (lra_reg_info,
+				   reg_info_size * sizeof (struct lra_reg));
+  for (i = old; i < reg_info_size; i++)
+    {
+      bitmap_initialize (&lra_reg_info[i].insn_bitmap, &reg_obstack);
+#ifdef STACK_REGS
+      lra_reg_info[i].no_stack_p = false;
+#endif
+      CLEAR_HARD_REG_SET (lra_reg_info[i].conflict_hard_regs);
+      lra_reg_info[i].preferred_hard_regno1 = -1;
+      lra_reg_info[i].preferred_hard_regno2 = -1;
+      lra_reg_info[i].preferred_hard_regno_profit1 = 0;
+      lra_reg_info[i].preferred_hard_regno_profit2 = 0;
+      lra_reg_info[i].live_ranges = NULL;
+      lra_reg_info[i].nrefs = lra_reg_info[i].freq = 0;
+      lra_reg_info[i].last_reload = 0;
+      lra_reg_info[i].restore_regno = -1;
+      lra_reg_info[i].val = get_new_reg_value ();
+      lra_reg_info[i].copies = NULL;
+    }
+}
+
+/* Free all copies.  */
+void
+lra_free_copies (void)
+{
+  lra_copy_t cp;
+
+  while (VEC_length (lra_copy_t, copy_vec) != 0)
+    {
+      cp = VEC_pop (lra_copy_t, copy_vec);
+      lra_reg_info[cp->regno1].copies = lra_reg_info[cp->regno2].copies = NULL;
+      pool_free (copy_pool, cp);
+    }
+}
+
+/* Create copy of two pseudos REGNO1 and REGNO2.  The copy execution
+   frequency is FREQ.  */
+void
+lra_create_copy (int regno1, int regno2, int freq)
+{
+  bool regno1_dest_p;
+  lra_copy_t cp;
+
+  lra_assert (regno1 != regno2);
+  regno1_dest_p = true;
+  if (regno1 > regno2)
+    {
+      int temp = regno2;
+
+      regno1_dest_p = false;
+      regno2 = regno1;
+      regno1 = temp;
+    }
+  cp = (lra_copy_t) pool_alloc (copy_pool);
+  VEC_safe_push (lra_copy_t, heap, copy_vec, cp);
+  cp->regno1_dest_p = regno1_dest_p;
+  cp->freq = freq;
+  cp->regno1 = regno1;
+  cp->regno2 = regno2;
+  cp->regno1_next = lra_reg_info[regno1].copies;
+  lra_reg_info[regno1].copies = cp;
+  cp->regno2_next = lra_reg_info[regno2].copies;
+  lra_reg_info[regno2].copies = cp;
+  if (lra_dump_file != NULL)
+    fprintf (lra_dump_file, "	   Creating copy r%d%sr%d@%d\n",
+	     regno1, regno1_dest_p ? "<-" : "->", regno2, freq);
+}
+
+/* Return N-th (0, 1, ...) copy.  If there is no copy, return
+   NULL.  */
+lra_copy_t
+lra_get_copy (int n)
+{
+  if (n >= (int) VEC_length (lra_copy_t, copy_vec))
+    return NULL;
+  return VEC_index (lra_copy_t, copy_vec, n);
+}
+
+
+
+/* This page contains code dealing with info about registers in
+   insns.  */
+
+/* Process X of insn UID recursively and add info (operand type is
+   given by TYPE, flag of that it is early clobber is EARLY_CLOBBER)
+   about registers in X to the insn DATA.  */
+static void
+add_regs_to_insn_regno_info (lra_insn_recog_data_t data, rtx x, int uid,
+			     enum op_type type, bool early_clobber)
+{
+  int i, j, regno;
+  bool subreg_p;
+  enum machine_mode mode;
+  const char *fmt;
+  enum rtx_code code;
+  struct lra_insn_reg *curr;
+
+  code = GET_CODE (x);
+  mode = GET_MODE (x);
+  subreg_p = false;
+  if (GET_CODE (x) == SUBREG)
+    {
+      x = SUBREG_REG (x);
+      code = GET_CODE (x);
+      if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (GET_MODE (x)))
+	{
+	  mode = GET_MODE (x);
+	  if (GET_MODE_SIZE (mode) > REGMODE_NATURAL_SIZE (mode))
+	    subreg_p = true;
+	}
+    }
+  if (REG_P (x))
+    {
+      regno = REGNO (x);
+      expand_reg_info ();
+      if (bitmap_set_bit (&lra_reg_info[regno].insn_bitmap, uid))
+	data->regs = new_insn_reg (regno, type, mode, subreg_p, early_clobber,
+				   data->regs);
+      else
+	{
+	  for (curr = data->regs; curr != NULL; curr = curr->next)
+	    if (curr->regno == regno)
+	      break;
+	  if (curr->subreg_p != subreg_p || curr->biggest_mode != mode)
+	    /* The info can not be integrated into the found
+	       structure.  */
+	    data->regs = new_insn_reg (regno, type, mode, subreg_p,
+				       early_clobber, data->regs);
+	  else
+	    {
+	      if (curr->type != type)
+		curr->type = OP_INOUT;
+	      if (curr->early_clobber != early_clobber)
+		curr->early_clobber = true;
+	    }
+	  lra_assert (curr != NULL);
+	}
+      return;
+    }
+
+  switch (code)
+    {
+    case SET:
+      add_regs_to_insn_regno_info (data, SET_DEST (x), uid, OP_OUT, false);
+      add_regs_to_insn_regno_info (data, SET_SRC (x), uid, OP_IN, false);
+      break;
+    case CLOBBER:
+      /* We treat clobber of non-operand hard registers as early
+	 clobber (the behavior is expected from asm).  */ 
+      add_regs_to_insn_regno_info (data, XEXP (x, 0), uid, OP_OUT, true);
+      break;
+    case PRE_INC: case PRE_DEC: case POST_INC: case POST_DEC:
+      add_regs_to_insn_regno_info (data, XEXP (x, 0), uid, OP_INOUT, false);
+      break;
+    case PRE_MODIFY: case POST_MODIFY:
+      add_regs_to_insn_regno_info (data, XEXP (x, 0), uid, OP_INOUT, false);
+      add_regs_to_insn_regno_info (data, XEXP (x, 1), uid, OP_IN, false);
+      break;
+    default:
+      if ((code != PARALLEL && code != EXPR_LIST) || type != OP_OUT)
+	/* Some targets place small structures in registers for return
+	   values of functions, and those registers are wrapped in
+	   PARALLEL that we may see as the destination of a SET.  Here
+	   is an example:
+
+	   (call_insn 13 12 14 2 (set (parallel:BLK [
+		(expr_list:REG_DEP_TRUE (reg:DI 0 ax)
+		    (const_int 0 [0]))
+		(expr_list:REG_DEP_TRUE (reg:DI 1 dx)
+		    (const_int 8 [0x8]))
+	       ])
+	     (call (mem:QI (symbol_ref:DI (...	*/
+	type = OP_IN;
+      fmt = GET_RTX_FORMAT (code);
+      for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+	{
+	  if (fmt[i] == 'e')
+	    add_regs_to_insn_regno_info (data, XEXP (x, i), uid, type, false);
+	  else if (fmt[i] == 'E')
+	    {
+	      for (j = XVECLEN (x, i) - 1; j >= 0; j--)
+		add_regs_to_insn_regno_info (data, XVECEXP (x, i, j), uid,
+					     type, false);
+	    }
+	}
+    }
+}
+
+/* Return execution frequency of INSN.	*/
+static int
+get_insn_freq (rtx insn)
+{
+  basic_block bb;
+
+  if ((bb = BLOCK_FOR_INSN (insn)) != NULL)
+    return REG_FREQ_FROM_BB (bb);
+  else
+    {
+      lra_assert (lra_insn_recog_data[INSN_UID (insn)]
+		  ->insn_static_data->n_operands == 0);
+      /* We don't care about such insn, e.g. it might be jump with
+	 addr_vec.  */
+      return 1;
+    }
+}
+
+/* Invalidate all reg info of INSN with DATA and execution frequency
+   FREQ.  Update common info about the invalidated registers.  */
+static void
+invalidate_insn_data_regno_info (lra_insn_recog_data_t data, rtx insn,
+				 int freq)
+{
+  int uid;
+  bool debug_p;
+  unsigned int i;
+  struct lra_insn_reg *ir, *next_ir;
+
+  uid = INSN_UID (insn);
+  debug_p = DEBUG_INSN_P (insn);
+  for (ir = data->regs; ir != NULL; ir = next_ir)
+    {
+      i = ir->regno;
+      next_ir = ir->next;
+      free_insn_reg (ir);
+      bitmap_clear_bit (&lra_reg_info[i].insn_bitmap, uid);
+      if (i >= FIRST_PSEUDO_REGISTER && ! debug_p)
+	{
+	  lra_reg_info[i].nrefs--;
+	  lra_reg_info[i].freq -= freq;
+	  lra_assert (lra_reg_info[i].nrefs >= 0 && lra_reg_info[i].freq >= 0);
+	}
+    }
+  data->regs = NULL;
+}
+
+/* Invalidate all reg info of INSN.  Update common info about the
+   invalidated registers.  */
+void
+lra_invalidate_insn_regno_info (rtx insn)
+{
+  invalidate_insn_data_regno_info (lra_get_insn_recog_data (insn), insn,
+				   get_insn_freq (insn));
+}
+
+/* Update common reg info from reg info of insn given by its DATA and
+   execution frequency FREQ.  */
+static void
+setup_insn_reg_info (lra_insn_recog_data_t data, int freq)
+{
+  unsigned int i;
+  struct lra_insn_reg *ir;
+
+  for (ir = data->regs; ir != NULL; ir = ir->next)
+    if ((i = ir->regno) >= FIRST_PSEUDO_REGISTER)
+      {
+	lra_reg_info[i].nrefs++;
+	lra_reg_info[i].freq += freq;
+      }
+}
+
+/* Set up insn reg info of INSN.  Update common reg info from reg info
+   of INSN.  */
+void
+lra_update_insn_regno_info (rtx insn)
+{
+  int i, uid, freq;
+  lra_insn_recog_data_t data;
+  struct lra_static_insn_data *static_data;
+  enum rtx_code code;
+
+  if (! INSN_P (insn))
+    return;
+  data = lra_get_insn_recog_data (insn);
+  static_data = data->insn_static_data;
+  freq = get_insn_freq (insn);
+  invalidate_insn_data_regno_info (data, insn, freq);
+  uid = INSN_UID (insn);
+  for (i = static_data->n_operands - 1; i >= 0; i--)
+    add_regs_to_insn_regno_info (data, *data->operand_loc[i], uid,
+				 static_data->operand[i].type,
+				 static_data->operand[i].early_clobber);
+  if ((code = GET_CODE (PATTERN (insn))) == CLOBBER || code == USE)
+    add_regs_to_insn_regno_info (data, XEXP (PATTERN (insn), 0), uid,
+				 code == USE ? OP_IN : OP_OUT, false);
+  if (NONDEBUG_INSN_P (insn))
+    setup_insn_reg_info (data, freq);
+}
+
+/* Return reg info of insn given by it UID.  */
+struct lra_insn_reg *
+lra_get_insn_regs (int uid)
+{
+  lra_insn_recog_data_t data;
+
+  data = get_insn_recog_data_by_uid (uid);
+  return data->regs;
+}
+
+
+
+/* This page contains code dealing with stack of the insns which
+   should be processed by the next constraint pass.  */
+
+/* Bitmap used to put an insn on the stack only in one exemplar.  */
+bitmap_head lra_constraint_insn_stack_bitmap;
+
+/* The stack itself.  */
+VEC (rtx, heap) *lra_constraint_insn_stack;
+
+/* Put INSN on the stack.  */
+void
+lra_push_insn (rtx insn)
+{
+  if (! bitmap_set_bit (&lra_constraint_insn_stack_bitmap, INSN_UID (insn)))
+    return;
+  lra_update_insn_regno_info (insn);
+  VEC_safe_push (rtx, heap, lra_constraint_insn_stack, insn);
+}
+
+/* Put insn with UID on the stack.  */
+void
+lra_push_insn_by_uid (unsigned int uid)
+{
+  lra_push_insn (lra_insn_recog_data[uid]->insn);
+}
+
+/* Put INSN on the stack and update its reg info.  */
+void
+lra_push_insn_and_update_insn_regno_info (rtx insn)
+{
+  lra_update_insn_regno_info (insn);
+  if (! bitmap_set_bit (&lra_constraint_insn_stack_bitmap, INSN_UID (insn)))
+    return;
+  VEC_safe_push (rtx, heap, lra_constraint_insn_stack, insn);
+}
+
+/* Push insns FROM to TO (excluding it) going in reverse order.	 */
+static void
+push_insns (rtx from, rtx to)
+{
+  rtx insn;
+
+  if (from == NULL_RTX)
+    return;
+  for (insn = from; insn != to; insn = PREV_INSN (insn))
+    lra_push_insn (insn);
+}
+
+/* Emit insns BEFORE before INSN and insns AFTER after INSN.  Put the
+   insns onto the stack.  Print about emitting the insns with
+   TITLE.  */
+void
+lra_process_new_insns (rtx insn, rtx before, rtx after, const char *title)
+{
+  rtx last;
+
+  if (lra_dump_file != NULL && (before != NULL_RTX || after != NULL_RTX))
+    {
+      debug_rtl_slim (lra_dump_file, insn, insn, -1, 0);
+      if (before != NULL_RTX)
+	{
+	  fprintf (lra_dump_file,"    %s before:\n", title);
+	  debug_rtl_slim (lra_dump_file, before, NULL_RTX, -1, 0);
+	}
+      if (after != NULL_RTX)
+	{
+	  fprintf (lra_dump_file, "    %s after:\n", title);
+	  debug_rtl_slim (lra_dump_file, after, NULL_RTX, -1, 0);
+	}
+      fprintf (lra_dump_file, "\n");
+    }
+  if (before != NULL_RTX)
+    {
+      emit_insn_before (before, insn);
+      push_insns (PREV_INSN (insn), PREV_INSN (before));
+    }
+  if (after != NULL_RTX)
+    {
+      for (last = after; NEXT_INSN (last) != NULL_RTX; last = NEXT_INSN (last))
+	;
+      emit_insn_after (after, insn);
+      push_insns (last, insn);
+    }
+}
+
+
+
+/* This page contains code dealing with scratches (changing them onto
+   pseudos and restoring them from the pseudos).
+
+   We change scratches into pseudos at the beginning of LRA to
+   simplify dealing with them (conflicts, hard register assignments).
+
+   If the pseudo denoting scratch was spilled it means that we do need
+   a hard register for it.  Such pseudos are transformed back to
+   scratches at the end of LRA.	 */
+
+/* Description of location of a former scratch operand.	 */
+struct loc
+{
+  rtx insn; /* Insn where the scratch was.  */
+  int nop;  /* Number of the operand which was a scratch.  */
+};
+
+typedef struct loc *loc_t;
+
+DEF_VEC_P(loc_t);
+DEF_VEC_ALLOC_P(loc_t, heap);
+
+/* Locations of the former scratches.  */
+static VEC (loc_t, heap) *scratches;
+
+/* Bitmap of scratch regnos.  */
+static bitmap_head scratch_bitmap;
+
+/* Bitmap of scratch operands.	*/
+static bitmap_head scratch_operand_bitmap;
+
+/* Return true if pseudo REGNO is made of SCRATCH.  */
+bool
+lra_former_scratch_p (int regno)
+{
+  return bitmap_bit_p (&scratch_bitmap, regno);
+}
+
+/* Return true if the operand NOP of INSN is a former scratch.	*/
+bool
+lra_former_scratch_operand_p (rtx insn, int nop)
+{
+  return bitmap_bit_p (&scratch_operand_bitmap,
+		       INSN_UID (insn) * MAX_RECOG_OPERANDS + nop) != 0;
+}
+
+/* Change scratches onto pseudos and save their location.  */
+static void
+remove_scratches (void)
+{
+  int i;
+  bool insn_changed_p;
+  basic_block bb;
+  rtx insn, reg;
+  loc_t loc;
+  lra_insn_recog_data_t id;
+  struct lra_static_insn_data *static_id;
+
+  scratches = VEC_alloc (loc_t, heap, get_max_uid ());
+  bitmap_initialize (&scratch_bitmap, &reg_obstack);
+  bitmap_initialize (&scratch_operand_bitmap, &reg_obstack);
+  FOR_EACH_BB (bb)
+    FOR_BB_INSNS (bb, insn)
+    if (INSN_P (insn))
+      {
+	id = lra_get_insn_recog_data (insn);
+	static_id = id->insn_static_data;
+	insn_changed_p = false;
+	for (i = 0; i < static_id->n_operands; i++)
+	  if (GET_CODE (*id->operand_loc[i]) == SCRATCH
+	      && GET_MODE (*id->operand_loc[i]) != VOIDmode)
+	    {
+	      insn_changed_p = true;
+	      *id->operand_loc[i] = reg
+		= lra_create_new_reg (static_id->operand[i].mode,
+				      *id->operand_loc[i], ALL_REGS, NULL);
+	      add_reg_note (insn, REG_UNUSED, reg);
+	      lra_update_dup (id, i);
+	      loc = (struct loc *) xmalloc (sizeof (struct loc));
+	      loc->insn = insn;
+	      loc->nop = i;
+	      VEC_safe_push (loc_t, heap, scratches, loc);
+	      bitmap_set_bit (&scratch_bitmap, REGNO (*id->operand_loc[i]));
+	      bitmap_set_bit (&scratch_operand_bitmap,
+			      INSN_UID (insn) * MAX_RECOG_OPERANDS + i);
+	      if (lra_dump_file != NULL)
+		fprintf (lra_dump_file,
+			 "Removing SCRATCH in insn #%u (nop %d)\n",
+			 INSN_UID (insn), i);
+	    }
+	if (insn_changed_p)
+	  /* Because we might use DF right after caller-saves sub-pass
+	     we need to keep DF info up to date.  */
+	  df_insn_rescan (insn);
+      }
+}
+
+/* Changes pseudos created by function remove_scratches onto scratches.	 */
+static void
+restore_scratches (void)
+{
+  int i, regno;
+  loc_t loc;
+  rtx last = NULL_RTX;
+  lra_insn_recog_data_t id = NULL;
+
+  for (i = 0; VEC_iterate (loc_t, scratches, i, loc); i++)
+    {
+      if (last != loc->insn)
+	{
+	  last = loc->insn;
+	  id = lra_get_insn_recog_data (last);
+	}
+      if (REG_P (*id->operand_loc[loc->nop])
+	  && ((regno = REGNO (*id->operand_loc[loc->nop]))
+	      >= FIRST_PSEUDO_REGISTER)
+	  && lra_get_regno_hard_regno (regno) < 0)
+	{
+	  /* It should be only case when scratch register with chosen
+	     constraint 'X' did not get memory or hard register.  */
+	  lra_assert (lra_former_scratch_p (regno));
+	  *id->operand_loc[loc->nop]
+	    = gen_rtx_SCRATCH (GET_MODE (*id->operand_loc[loc->nop]));
+	  lra_update_dup (id, loc->nop);
+	  if (lra_dump_file != NULL)
+	    fprintf (lra_dump_file, "Restoring SCRATCH in insn #%u(nop %d)\n",
+		     INSN_UID (loc->insn), loc->nop);
+	}
+    }
+  for (i = 0; VEC_iterate (loc_t, scratches, i, loc); i++)
+    free (loc);
+  VEC_free (loc_t, heap, scratches);
+  bitmap_clear (&scratch_bitmap);
+  bitmap_clear (&scratch_operand_bitmap);
+}
+
+
+
+#ifdef ENABLE_CHECKING
+
+/* Function checks RTL for correctness.	 If FINAL_P is true, it is
+   done at the end of LRA and the check is more rigorous.  */
+static void
+check_rtl (bool final_p)
+{
+  int i;
+  basic_block bb;
+  rtx insn;
+  lra_insn_recog_data_t id;
+
+  lra_assert (! final_p || reload_completed);
+  FOR_EACH_BB (bb)
+    FOR_BB_INSNS (bb, insn)
+    if (NONDEBUG_INSN_P (insn)
+	&& GET_CODE (PATTERN (insn)) != USE
+	&& GET_CODE (PATTERN (insn)) != CLOBBER
+	&& GET_CODE (PATTERN (insn)) != ADDR_VEC
+	&& GET_CODE (PATTERN (insn)) != ADDR_DIFF_VEC
+	&& GET_CODE (PATTERN (insn)) != ASM_INPUT)
+      {
+	if (final_p)
+	  {
+	    extract_insn (insn);
+	    lra_assert (constrain_operands (1));
+	    continue;
+	  }
+	if (insn_invalid_p (insn, false))
+	  fatal_insn_not_found (insn);
+	if (asm_noperands (PATTERN (insn)) >= 0)
+	  continue;
+	id = lra_get_insn_recog_data (insn);
+	/* The code is based on assumption that all addresses in
+	   regular instruction are legitimate before LRA.  The code in
+	   lra-constraints.c is based on assumption that there is no
+	   subreg of memory as an insn operand.	 */
+	for (i = 0; i < id->insn_static_data->n_operands; i++)
+	  {
+	    rtx op = *id->operand_loc[i];
+	      
+	    if (MEM_P (op)
+		&& (GET_MODE (op) != BLKmode
+		    || GET_CODE (XEXP (op, 0)) != SCRATCH)
+		&& ! memory_address_p (GET_MODE (op), XEXP (op, 0))
+		/* Some ports don't recognize the following addresses
+		   as legitimate.  Although they are legitimate if
+		   they satisfies the constraints and will be checked
+		   by insn constraints which we ignore here.  */
+		&& GET_CODE (XEXP (op, 0)) != UNSPEC
+		&& GET_CODE (XEXP (op, 0)) != PRE_DEC
+		&& GET_CODE (XEXP (op, 0)) != PRE_INC
+		&& GET_CODE (XEXP (op, 0)) != POST_DEC
+		&& GET_CODE (XEXP (op, 0)) != POST_INC
+		&& GET_CODE (XEXP (op, 0)) != PRE_MODIFY
+		&& GET_CODE (XEXP (op, 0)) != POST_MODIFY)
+	      fatal_insn_not_found (insn);
+	  }
+      }
+}
+#endif /* #ifdef ENABLE_CHECKING */
+
+/* Determine if the current function has an exception receiver block
+   that reaches the exit block via non-exceptional edges  */
+static bool
+has_nonexceptional_receiver (void)
+{
+  edge e;
+  edge_iterator ei;
+  basic_block *tos, *worklist, bb;
+
+  /* If we're not optimizing, then just err on the safe side.  */
+  if (!optimize)
+    return true;
+  
+  /* First determine which blocks can reach exit via normal paths.  */
+  tos = worklist = XNEWVEC (basic_block, n_basic_blocks + 1);
+
+  FOR_EACH_BB (bb)
+    bb->flags &= ~BB_REACHABLE;
+
+  /* Place the exit block on our worklist.  */
+  EXIT_BLOCK_PTR->flags |= BB_REACHABLE;
+  *tos++ = EXIT_BLOCK_PTR;
+  
+  /* Iterate: find everything reachable from what we've already seen.  */
+  while (tos != worklist)
+    {
+      bb = *--tos;
+
+      FOR_EACH_EDGE (e, ei, bb->preds)
+	if (!(e->flags & EDGE_ABNORMAL))
+	  {
+	    basic_block src = e->src;
+
+	    if (!(src->flags & BB_REACHABLE))
+	      {
+		src->flags |= BB_REACHABLE;
+		*tos++ = src;
+	      }
+	  }
+    }
+  free (worklist);
+
+  /* Now see if there's a reachable block with an exceptional incoming
+     edge.  */
+  FOR_EACH_BB (bb)
+    if (bb->flags & BB_REACHABLE)
+      FOR_EACH_EDGE (e, ei, bb->preds)
+	if (e->flags & EDGE_ABNORMAL)
+	  return true;
+
+  /* No exceptional block reached exit unexceptionally.	 */
+  return false;
+}
+
+#ifdef AUTO_INC_DEC
+
+/* Process recursively X of INSN and add REG_INC notes if necessary.  */
+static void
+add_auto_inc_notes (rtx insn, rtx x)
+{
+  enum rtx_code code = GET_CODE (x);
+  const char *fmt;
+  int i, j;
+
+  if (code == MEM && auto_inc_p (XEXP (x, 0)))
+    {
+      add_reg_note (insn, REG_INC, XEXP (XEXP (x, 0), 0));
+      return;
+    }
+
+  /* Scan all X sub-expressions.  */
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    {
+      if (fmt[i] == 'e')
+	add_auto_inc_notes (insn, XEXP (x, i));
+      else if (fmt[i] == 'E')
+	for (j = XVECLEN (x, i) - 1; j >= 0; j--)
+	  add_auto_inc_notes (insn, XVECEXP (x, i, j));
+    }
+}
+
+#endif
+
+/* Remove all REG_DEAD and REG_UNUSED notes and regenerate REG_INC.
+   We change pseudos by hard registers without notification of DF and
+   that can make the notes obsolete.  DF-infrastructure does not deal
+   with REG_INC notes -- so we should regenerate them here.  */
+static void
+update_reg_notes (void)
+{
+  rtx *pnote;
+  basic_block bb;
+  rtx insn;
+
+  FOR_EACH_BB (bb)
+    FOR_BB_INSNS (bb, insn)
+    if (NONDEBUG_INSN_P (insn))
+      {
+	pnote = &REG_NOTES (insn);
+	while (*pnote != 0)
+	  {
+	    if (REG_NOTE_KIND (*pnote) == REG_DEAD
+		|| REG_NOTE_KIND (*pnote) == REG_UNUSED
+		|| REG_NOTE_KIND (*pnote) == REG_INC)
+	      *pnote = XEXP (*pnote, 1);
+	    else
+	      pnote = &XEXP (*pnote, 1);
+	  }
+#ifdef AUTO_INC_DEC
+	add_auto_inc_notes (insn, PATTERN (insn));
+#endif
+      }
+}
+
+/* Set to 1 while in lra.  */
+int lra_in_progress;
+
+/* Start of reload pseudo regnos before the new spill pass.  */ 
+int lra_constraint_new_regno_start;
+
+/* Inheritance pseudo regnos before the new spill pass.	 */ 
+bitmap_head lra_inheritance_pseudos;
+
+/* Split pseudo regnos before the new spill pass.  */ 
+bitmap_head lra_split_pseudos;
+
+/* Reload pseudo regnos before the new assign pass which still can be
+   spilled after the assinment pass.  */ 
+bitmap_head lra_optional_reload_pseudos;
+
+/* First UID of insns generated before a new spill pass.  */
+int lra_constraint_new_insn_uid_start;
+
+/* File used for output of LRA debug information.  */
+FILE *lra_dump_file;
+
+/* True if we should try spill into registers of different classes
+   instead of memory.  */
+bool lra_reg_spill_p;
+
+/* Set up value LRA_REG_SPILL_P.  */
+static void
+setup_reg_spill_flag (void)
+{
+  int cl;
+
+  if (targetm.spill_class != NULL)
+    for (cl = 0; cl < (int) LIM_REG_CLASSES; cl++)
+      if (targetm.spill_class ((enum reg_class) cl) != NO_REGS)
+	{
+	  lra_reg_spill_p = true;
+	  return;
+	}
+  lra_reg_spill_p = false;
+}
+
+/* Major LRA entry function.  F is a file should be used to dump LRA
+   debug info.	*/
+void
+lra (FILE *f)
+{
+  int i;
+  bool live_p, scratch_p, inserted_p;
+
+  lra_dump_file = f;
+
+
+  init_insn_recog_data ();
+
+#ifdef ENABLE_CHECKING
+  check_rtl (false);
+#endif
+
+  COPY_HARD_REG_SET (lra_no_alloc_regs, ira_no_alloc_regs);
+
+  lra_live_range_iter = lra_coalesce_iter = 0;
+  lra_constraint_iter = lra_constraint_iter_after_spill = 0;
+  lra_inheritance_iter = lra_undo_inheritance_iter = 0;
+
+  setup_reg_spill_flag ();
+
+  /* We can not set up reload_in_progress because it prevents new
+     pseudo creation.  */
+  lra_in_progress = 1;
+
+  init_reg_info ();
+  expand_reg_info ();
+
+  /* Function remove_scratches can creates new pseudos for clobbers --
+     so set up lra_constraint_new_regno_start before its call to
+     permit changing reg classes for pseudos created by this
+     simplification.  */
+  lra_constraint_new_regno_start = max_reg_num ();
+  remove_scratches ();
+  scratch_p = lra_constraint_new_regno_start != max_reg_num ();
+
+  /* A function that has a non-local label that can reach the exit
+     block via non-exceptional paths must save all call-saved
+     registers.	 */
+  if (cfun->has_nonlocal_label && has_nonexceptional_receiver ())
+    crtl->saves_all_registers = 1;
+
+  if (crtl->saves_all_registers)
+    for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+      if (! call_used_regs[i] && ! fixed_regs[i] && ! LOCAL_REGNO (i))
+	df_set_regs_ever_live (i, true);
+
+  /* We don't DF from now and avoid its using because it is to
+     expensive when a lot of RTL changes are made.  */
+  df_set_flags (DF_NO_INSN_RESCAN);
+  lra_constraint_insn_stack = VEC_alloc (rtx, heap, get_max_uid ());
+  bitmap_initialize (&lra_constraint_insn_stack_bitmap, &reg_obstack);
+  lra_live_ranges_init ();
+  lra_contraints_init ();
+  lra_curr_reload_num = 0;
+  push_insns (get_last_insn (), NULL_RTX);
+  /* It is needed for the 1st coalescing.  */
+  lra_constraint_new_insn_uid_start = get_max_uid ();
+  bitmap_initialize (&lra_inheritance_pseudos, &reg_obstack);
+  bitmap_initialize (&lra_split_pseudos, &reg_obstack);
+  bitmap_initialize (&lra_optional_reload_pseudos, &reg_obstack);
+  live_p = false;
+  for (;;)
+    {
+      for (;;)
+	{
+	  bitmap_clear (&lra_optional_reload_pseudos);
+	  /* We should try to assign hard registers to scratches even
+	     if there were no RTL transformations in
+	     lra_constraints.  */
+	  if (! lra_constraints (lra_constraint_iter == 0)
+	      && (lra_constraint_iter > 1
+		  || (! scratch_p && ! caller_save_needed)))
+	    break;
+	  /* Constraint transformations may result in that eliminable
+	     hard regs become uneliminable and pseudos which use them
+	     should be spilled.	 It is better to do it before pseudo
+	     assignments.
+
+	     For example, rs6000 can make
+	     RS6000_PIC_OFFSET_TABLE_REGNUM uneliminable if we started
+	     to use a constant pool.  */
+	  lra_eliminate (false);
+	  lra_inheritance ();
+	  /* We need live ranges for lra_assign -- so build them.  */
+	  lra_create_live_ranges (true);
+	  live_p = true;
+	  /* If we don't spill non-reload and non-inheritance pseudos,
+	     there is no sense to run memory-memory move coalescing.
+	     If inheritance pseudos were spilled, the memory-memory
+	     moves involving them will be removed by pass undoing
+	     inheritance.  */
+	  if (! lra_assign () && lra_coalesce ())	
+	    live_p = false;
+	  if (lra_undo_inheritance ())
+	    live_p = false;
+	}
+      bitmap_clear (&lra_inheritance_pseudos);
+      bitmap_clear (&lra_split_pseudos);
+      if (! lra_need_for_spills_p ())
+	break;
+      if (! live_p)
+	{
+	  /* We need full live info for spilling pseudos into
+	     registers instead of memory.  */
+	  lra_create_live_ranges (lra_reg_spill_p);
+	  live_p = true;
+	}
+      lra_spill ();
+      /* Assignment of stack slots changes elimination offsets for
+	 some eliminations.  So update the offsets here.  */
+      lra_eliminate (false);
+      lra_constraint_new_regno_start = max_reg_num ();
+      lra_constraint_new_insn_uid_start = get_max_uid ();
+      bitmap_clear (&lra_matched_pseudos);
+      lra_constraint_iter_after_spill = 0;
+    }
+  restore_scratches ();
+  lra_eliminate (true);
+  lra_hard_reg_substitution ();
+  lra_in_progress = 0;
+  lra_clear_live_ranges ();
+  lra_live_ranges_finish ();
+  lra_contraints_finish ();
+  finish_reg_info ();
+  bitmap_clear (&lra_constraint_insn_stack_bitmap);
+  VEC_free (rtx, heap, lra_constraint_insn_stack);
+  finish_insn_recog_data ();
+  regstat_free_n_sets_and_refs ();
+  regstat_free_ri ();
+  reload_completed = 1;
+  update_reg_notes ();
+
+  inserted_p = fixup_abnormal_edges ();
+
+  /* We've possibly turned single trapping insn into multiple ones.  */
+  if (cfun->can_throw_non_call_exceptions)
+    {
+      sbitmap blocks;
+      blocks = sbitmap_alloc (last_basic_block);
+      sbitmap_ones (blocks);
+      find_many_sub_basic_blocks (blocks);
+      sbitmap_free (blocks);
+    }
+
+  if (inserted_p)
+    commit_edge_insertions ();
+
+  /* Replacing pseudos with their memory equivalents might have
+     created shared rtx.  Subsequent passes would get confused
+     by this, so unshare everything here.  */
+  unshare_all_rtl_again (get_insns ());
+
+#ifdef ENABLE_CHECKING
+  check_rtl (true);
+#endif
+}
+
+/* Called once per compiler to initialize LRA data once.  */
+void
+lra_init_once (void)
+{
+  init_insn_code_data_once ();
+}
+
+/* Initialize LRA data once per function.  */
+void
+lra_init (void)
+{
+  init_op_alt_data ();
+}
+
+/* Called once per compiler to finish LRA data which are initialize
+   once.  */
+void
+lra_finish_once (void)
+{
+  finish_insn_code_data_once ();
+}
Index: lra.h
===================================================================
--- lra.h	(revision 0)
+++ lra.h	(working copy)
@@ -0,0 +1,40 @@ 
+/* Communication between the Local Register Allocator (LRA) and
+   the rest of the compiler.
+   Copyright (C) 2010, 2011, 2012
+   Free Software Foundation, Inc.
+   Contributed by Vladimir Makarov <vmakarov@redhat.com>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.	If not see
+<http://www.gnu.org/licenses/>.	 */
+
+/* Return the allocno reg class of REGNO.  If it is a reload pseudo,
+   the pseudo should finally get hard register of the allocno
+   class.  */
+static inline enum reg_class
+lra_get_allocno_class (int regno)
+{
+  resize_reg_info ();
+  return reg_allocno_class (regno);
+}
+
+extern rtx lra_create_new_reg (enum machine_mode, rtx, enum reg_class,
+			       const char *);
+extern void lra_init_elimination (void);
+extern rtx lra_eliminate_regs (rtx, enum machine_mode, rtx);
+extern void lra (FILE *);
+extern void lra_init_once (void);
+extern void lra_init (void);
+extern void lra_finish_once (void);