Patchwork Patch: Revert find_reloads part of the IRA merge

Submitter Bernd Schmidt
Date July 14, 2010, 11:28 a.m.
Message ID <4C3D9F4C.8080304@codesourcery.com>
Permalink /patch/58869/
State New

Comments

Bernd Schmidt - July 14, 2010, 11:28 a.m.
I've now reverted the find_reloads change.

Attached is a patch which tries to solve the problem in a different way;
I fixed some other problems that showed up in testing on ARM.  Specifically:
  * Immediately marking used_spill_regs as ever_live can cause a
    situation where the next iteration will use a different set of
    hard regs, and we end up pushing registers that are never used.
    If we delay setting regs_ever_live until nothing else has
    changed, we can avoid that, at the cost of a little bit of
    complexity, and the risk of requiring more iterations.  Not
    sure this is worth it.
  * When evaluating costs of alternatives, reload isn't taking into
    account whether something is going to need secondary reloads.
  * Use live_throughout sets to determine if a reload will force
    a spill.  This seems to work better than the previous heuristic,
    but it can still fail (dead_or_set isn't taken into account
    since it's quite a useless set, the test isn't run on address
    reloads, etc.)

Not committed.  It would be nice to get reports from some target
maintainers about what this does to code quality.


Bernd
Jeff Law - July 14, 2010, 7:03 p.m.
On 07/14/10 05:28, Bernd Schmidt wrote:
> I've now reverted the find_reloads change.
>
> Attached is a patch which tries to solve the problem in a different way;
> I fixed some other problems that showed up in testing on ARM.  Specifically:
>    * Immediately marking used_spill_regs as ever_live can cause a
>      situation where the next iteration will use a different set of
>      hard regs, and we end up pushing registers that are never used.
>      If we delay setting regs_ever_live until nothing else has
>      changed, we can avoid that, at the cost of a little bit of
>      complexity, and the risk of requiring more iterations.  Not
>      sure this is worth it.
>    
How often have you seen this occur in practice?  The complexity is 
trivial, so if it's occurring with any regularity, I'd say go for it.


>    * When evaluating costs of alternatives, reload isn't taking into
>      account whether something is going to need secondary reloads.
>    
I'm all for more accurate cost modeling.

>    * Use live_throughout sets to determine if a reload will force
>      a spill.  This seems to work better than the previous heuristic,
>      but it can still fail (dead_or_set isn't taken into account
>      since it's quite a useless set, the test isn't run on address
>      reloads, etc.)
>    
Well, you're detecting one of (many) cases where we know a particular 
reload is going to cause a spill.  While it'd be nice to catch the other 
cases, I don't think detecting all the cases where we can prove that a 
reload is going to cause a spill is necessary.  What you're proposing is 
a clear incremental improvement IMHO.

Jeff
Bernd Schmidt - July 14, 2010, 9:35 p.m.
On 07/14/2010 09:03 PM, Jeff Law wrote:
> On 07/14/10 05:28, Bernd Schmidt wrote:
>> Attached is a patch which tries to solve the problem in a different way;
>> I fixed some other problems that showed up in testing on ARM. 
>> Specifically:
>>    * Immediately marking used_spill_regs as ever_live can cause a
>>      situation where the next iteration will use a different set of
>>      hard regs, and we end up pushing registers that are never used.
>>      If we delay setting regs_ever_live until nothing else has
>>      changed, we can avoid that, at the cost of a little bit of
>>      complexity, and the risk of requiring more iterations.  Not
>>      sure this is worth it.
>>    
> How often have you seen this occur in practice?  The complexity is
> trivial, so if it's occurring with any regularity, I'd say go for it.

Anecdotally, I seem to recall seeing this every now and then, and it's
been bugging me for a while.  Practically, in the last test run I think
it only affected one testcase out of the many I have collected for
comparing assembly output.  I've added this code because one of the two
other changes made it show up.

>>    * When evaluating costs of alternatives, reload isn't taking into
>>      account whether something is going to need secondary reloads.
>>    
> I'm all for more accurate cost modeling.

Yes.  I think I'll put this in (after a bit more thought and some
testing on its own), it seems like the right thing to do.

>>    * Use live_throughout sets to determine if a reload will force
>>      a spill.  This seems to work better than the previous heuristic,
>>      but it can still fail (dead_or_set isn't taken into account
>>      since it's quite a useless set, the test isn't run on address
>>      reloads, etc.)
>>    
> Well, you're detecting one of (many) cases where we know a particular
> reload is going to cause a spill.  While it'd be nice to catch the other
> cases, I don't think detecting all the cases where we can prove that a
> reload is going to cause a spill is necessary.  What you're proposing is
> a clear incremental improvement IMHO.

Not really, it suffers from the same conceptual problem as the previous
attempt I reverted.  If we can't get completely accurate answers (and at
this point, we can't), it can happen that we prefer an alternative for
the wrong reasons (e.g. both of them will force a spill, but we can
prove it only for one of them here).

In practice it seems to work reasonably well, at least on ARM.  More
testing would be good.


Bernd
Jeff Law - July 14, 2010, 10:26 p.m.
On 07/14/10 15:35, Bernd Schmidt wrote:
> On 07/14/2010 09:03 PM, Jeff Law wrote:
>    
>> On 07/14/10 05:28, Bernd Schmidt wrote:
>>      
>>> Attached is a patch which tries to solve the problem in a different way;
>>> I fixed some other problems that showed up in testing on ARM.
>>> Specifically:
>>>     * Immediately marking used_spill_regs as ever_live can cause a
>>>       situation where the next iteration will use a different set of
>>>       hard regs, and we end up pushing registers that are never used.
>>>       If we delay setting regs_ever_live until nothing else has
>>>       changed, we can avoid that, at the cost of a little bit of
>>>       complexity, and the risk of requiring more iterations.  Not
>>>       sure this is worth it.
>>>
>>>        
>> How often have you seen this occur in practice?  The complexity is
>> trivial, so if it's occurring with any regularity, I'd say go for it.
>>      
> Anecdotally, I seem to recall seeing this every now and then, and it's
> been bugging me for a while.  Practically, in the last test run I think
> it only affected one testcase out of the many I have collected for
> comparing assembly output.  I've added this code because one of the two
> other changes made it show up.
>    
I've had patches like this (accumulating regs in the main reload loop 
and having them change on further iterations pessimizing code).  They 
didn't trigger often, but fixing it was trivial, so why not? :-)

>>>     * Use live_throughout sets to determine if a reload will force
>>>       a spill.  This seems to work better than the previous heuristic,
>>>       but it can still fail (dead_or_set isn't taken into account
>>>       since it's quite a useless set, the test isn't run on address
>>>       reloads, etc.)
>>>
>>>        
>> Well, you're detecting one of (many) cases where we know a particular
>> reload is going to cause a spill.  While it'd be nice to catch the other
>> cases, I don't think detecting all the cases where we can prove that a
>> reload is going to cause a spill is necessary.  What you're proposing is
>> a clear incremental improvement IMHO.
>>      
> Not really, it suffers from the same conceptual problem as the previous
> attempt I reverted.  If we can't get completely accurate answers (and at
> this point, we can't), it can happen that we prefer an alternative for
> the wrong reasons (e.g. both of them will force a spill, but we can
> prove it only for one of them here).
>    
Good point.

> In practice it seems to work reasonably well, at least on ARM.  More
> testing would be good.
>    
Let's hope others chime in.

jeff

Patch

Index: gcc/reload1.c
===================================================================
--- gcc.orig/reload1.c
+++ gcc/reload1.c
@@ -197,7 +197,7 @@  static HARD_REG_SET bad_spill_regs;
    insn.  This includes registers used for user variables and registers that
    we can't eliminate.  A register that appears in this set also can't be used
    to retry register allocation.  */
-static HARD_REG_SET bad_spill_regs_global;
+HARD_REG_SET bad_spill_regs_global;
 
 /* Describes order of use of registers for reloading
    of spilled pseudo-registers.  `n_spills' is the number of
@@ -227,6 +227,10 @@  static HARD_REG_SET *pseudo_forbidden_re
    marked in this set.  */
 static HARD_REG_SET used_spill_regs;
 
+/* Keeps track of hard registers used for spills in the current iteration.
+   This set will be merged into used_spill_regs in finish_spills.  */
+static HARD_REG_SET current_used_spill_regs;
+
 /* Index of last register assigned as a spill register.  We allocate in
    a round-robin fashion.  */
 static int last_spill_reg;
@@ -823,6 +827,7 @@  reload (rtx first, int global)
 
   /* Spill any hard regs that we know we can't eliminate.  */
   CLEAR_HARD_REG_SET (used_spill_regs);
+  CLEAR_HARD_REG_SET (current_used_spill_regs);
   /* There can be multiple ways to eliminate a register;
      they should be listed adjacently.
      Elimination for any register fails only if all possible ways fail.  */
@@ -1511,7 +1516,7 @@  calculate_needs_all_insns (int global)
 	    did_elimination = eliminate_regs_in_insn (insn, 0);
 
 	  /* Analyze the instruction.  */
-	  operands_changed = find_reloads (insn, 0, spill_indirect_levels,
+	  operands_changed = find_reloads (chain, 0, spill_indirect_levels,
 					   global, spill_reg_order);
 
 	  /* If a no-op set needs more than one reload, this is likely
@@ -2042,7 +2047,7 @@  find_reload_regs (struct insn_chain *cha
     }
 
   COPY_HARD_REG_SET (chain->used_spill_regs, used_spill_regs_local);
-  IOR_HARD_REG_SET (used_spill_regs, used_spill_regs_local);
+  IOR_HARD_REG_SET (current_used_spill_regs, used_spill_regs_local);
 
   memcpy (chain->rld, rld, n_reloads * sizeof (struct reload));
 }
@@ -2052,6 +2057,7 @@  select_reload_regs (void)
 {
   struct insn_chain *chain;
 
+  CLEAR_HARD_REG_SET (current_used_spill_regs);
   /* Try to satisfy the needs for each insn.  */
   for (chain = insns_need_reload; chain != 0;
        chain = chain->next_need_reload)
@@ -4279,34 +4285,10 @@  finish_spills (int global)
 {
   struct insn_chain *chain;
   int something_changed = 0;
+  int new_regs_used;
   unsigned i;
   reg_set_iterator rsi;
 
-  /* Build the spill_regs array for the function.  */
-  /* If there are some registers still to eliminate and one of the spill regs
-     wasn't ever used before, additional stack space may have to be
-     allocated to store this register.  Thus, we may have changed the offset
-     between the stack and frame pointers, so mark that something has changed.
-
-     One might think that we need only set VAL to 1 if this is a call-used
-     register.  However, the set of registers that must be saved by the
-     prologue is not identical to the call-used set.  For example, the
-     register used by the call insn for the return PC is a call-used register,
-     but must be saved by the prologue.  */
-
-  n_spills = 0;
-  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-    if (TEST_HARD_REG_BIT (used_spill_regs, i))
-      {
-	spill_reg_order[i] = n_spills;
-	spill_regs[n_spills++] = i;
-	if (num_eliminable && ! df_regs_ever_live_p (i))
-	  something_changed = 1;
-	df_set_regs_ever_live (i, true);
-      }
-    else
-      spill_reg_order[i] = -1;
-
   EXECUTE_IF_SET_IN_REG_SET (&spilled_pseudos, FIRST_PSEUDO_REGISTER, i, rsi)
     if (! ira_conflicts_p || reg_renumber[i] >= 0)
       {
@@ -4388,6 +4370,8 @@  finish_spills (int global)
 	 makes inheritance work somewhat better.  */
       if (chain->need_reload)
 	{
+	  HARD_REG_SET all_used_spill_regs;
+
 	  REG_SET_TO_HARD_REG_SET (used_by_pseudos, &chain->live_throughout);
 	  REG_SET_TO_HARD_REG_SET (used_by_pseudos2, &chain->dead_or_set);
 	  IOR_HARD_REG_SET (used_by_pseudos, used_by_pseudos2);
@@ -4398,8 +4382,10 @@  finish_spills (int global)
 	     may be not included in the value calculated here because
 	     of possible removing caller-saves insns (see function
 	     delete_caller_save_insns.  */
+	  COPY_HARD_REG_SET (all_used_spill_regs, used_spill_regs);
+	  IOR_HARD_REG_SET (all_used_spill_regs, current_used_spill_regs);
 	  COMPL_HARD_REG_SET (chain->used_spill_regs, used_by_pseudos);
-	  AND_HARD_REG_SET (chain->used_spill_regs, used_spill_regs);
+	  AND_HARD_REG_SET (chain->used_spill_regs, all_used_spill_regs);
 	}
     }
 
@@ -4425,6 +4411,43 @@  finish_spills (int global)
 	}
     }
 
+  /* Build the spill_regs array for the function.  */
+  /* If there are some registers still to eliminate and one of the spill regs
+     wasn't ever used before, additional stack space may have to be
+     allocated to store this register.  Thus, we may have changed the offset
+     between the stack and frame pointers, so mark that something has changed.
+     However, only do this if nothing else has changed - if we spilled a pseudo,
+     we may not end up using the same spill reg in the next iteration.
+
+     One might think that we need only set VAL to 1 if this is a call-used
+     register.  However, the set of registers that must be saved by the
+     prologue is not identical to the call-used set.  For example, the
+     register used by the call insn for the return PC is a call-used register,
+     but must be saved by the prologue.  */
+
+  n_spills = 0;
+  new_regs_used = 0;
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    {
+      if (!something_changed && TEST_HARD_REG_BIT (current_used_spill_regs, i))
+	{
+	  if (num_eliminable && ! df_regs_ever_live_p (i))
+	    new_regs_used++;
+	  df_set_regs_ever_live (i, true);
+	  SET_HARD_REG_BIT (used_spill_regs, i);
+	}
+
+      if (TEST_HARD_REG_BIT (used_spill_regs, i))      
+	{
+	  spill_reg_order[i] = n_spills;
+	  spill_regs[n_spills++] = i;
+	}
+      else
+	spill_reg_order[i] = -1;
+    }
+  if (new_regs_used)
+    something_changed = 1;
+
   return something_changed;
 }
 
@@ -4587,7 +4610,7 @@  reload_as_needed (int live_known)
 	      CLEAR_REG_SET (&reg_has_output_reload);
 	      CLEAR_HARD_REG_SET (reg_is_output_reload);
 
-	      find_reloads (insn, 1, spill_indirect_levels, live_known,
+	      find_reloads (chain, 1, spill_indirect_levels, live_known,
 			    spill_reg_order);
 	    }
 
Index: gcc/reload.c
===================================================================
--- gcc.orig/reload.c
+++ gcc/reload.c
@@ -2533,8 +2533,8 @@  safe_from_earlyclobber (rtx op, rtx clob
   return immune_p (op, clobber, early_data);
 }
 
-/* Main entry point of this file: search the body of INSN
-   for values that need reloading and record them with push_reload.
+/* Main entry point of this file: search the body of the insn found in
+   CHAIN for values that need reloading and record them with push_reload.
    REPLACE nonzero means record also where the values occur
    so that subst_reloads can be used.
 
@@ -2556,9 +2556,10 @@  safe_from_earlyclobber (rtx op, rtx clob
    commutative operands, reg_equiv_address substitution, or whatever.  */
 
 int
-find_reloads (rtx insn, int replace, int ind_levels, int live_known,
-	      short *reload_reg_p)
+find_reloads (struct insn_chain *chain, int replace, int ind_levels,
+	      int live_known, short *reload_reg_p)
 {
+  rtx insn = chain->insn;
   int insn_code_number;
   int i, j;
   int noperands;
@@ -2601,7 +2602,7 @@  find_reloads (rtx insn, int replace, int
   char goal_alternative_offmemok[MAX_RECOG_OPERANDS];
   char goal_alternative_earlyclobber[MAX_RECOG_OPERANDS];
   int goal_alternative_swapped;
-  int best;
+  int best, best_forced_reloads;
   int commutative;
   char operands_match[MAX_RECOG_OPERANDS][MAX_RECOG_OPERANDS];
   rtx substed_operand[MAX_RECOG_OPERANDS];
@@ -2808,7 +2809,7 @@  find_reloads (rtx insn, int replace, int
 		  || GET_CODE (recog_data.operand[i]) == PLUS))
 	    {
 	      INSN_CODE (insn) = -1;
-	      retval = find_reloads (insn, replace, ind_levels, live_known,
+	      retval = find_reloads (chain, replace, ind_levels, live_known,
 				     reload_reg_p);
 	      return retval;
 	    }
@@ -2927,6 +2928,7 @@  find_reloads (rtx insn, int replace, int
      all the operands together against the register constraints.  */
 
   best = MAX_RECOG_OPERANDS * 2 + 600;
+  best_forced_reloads = 0;
 
   swapped = 0;
   goal_alternative_swapped = 0;
@@ -3484,6 +3486,11 @@  find_reloads (rtx insn, int replace, int
 
 	      this_alternative_offmemok[i] = offmemok;
 	      losers++;
+	      if (secondary_reload_class (modified[i] != RELOAD_WRITE,
+					  this_alternative[i],
+					  operand_mode[i], operand) != NO_REGS)
+		reject += 2;
+
 	      if (badop)
 		bad = 1;
 	      /* Alternative loses if it has no regs for a reg operand.  */
@@ -3713,7 +3720,33 @@  find_reloads (rtx insn, int replace, int
 	 record it as the chosen goal for reloading.  */
       if (! bad)
 	{
-	  if (best > losers)
+	  int forced_reloads = 0;
+	  bool change_p = best > losers;
+	  if (best >= losers)
+	    {
+	      for (i = 0; i < noperands; i++)
+		{
+		  HARD_REG_SET tmp, used_by_pseudos;
+		  enum reg_class alt_class = this_alternative[i];
+		  if (this_alternative_win[i]
+		      || this_alternative_match_win[i]
+		      || this_alternative_matches[i] >= 0
+		      || alt_class == NO_REGS)
+		    continue;
+		  COPY_HARD_REG_SET (tmp, reg_class_contents[alt_class]);
+		  AND_COMPL_HARD_REG_SET (tmp, bad_spill_regs_global);
+		  REG_SET_TO_HARD_REG_SET (used_by_pseudos,
+					   &chain->live_throughout);
+		  compute_use_by_pseudos (&used_by_pseudos,
+					  &chain->live_throughout);
+		  AND_COMPL_HARD_REG_SET (tmp, used_by_pseudos);
+		  if (hard_reg_set_empty_p (tmp))
+		    forced_reloads++;
+		}
+	      if (best == losers && forced_reloads < best_forced_reloads)
+		change_p = true;
+	    }
+	  if (change_p)
 	    {
 	      for (i = 0; i < noperands; i++)
 		{
@@ -3729,6 +3762,7 @@  find_reloads (rtx insn, int replace, int
 		}
 	      goal_alternative_swapped = swapped;
 	      best = losers;
+	      best_forced_reloads = forced_reloads;
 	      goal_alternative_number = this_alternative_number;
 	      goal_earlyclobber = this_earlyclobber;
 	    }
Index: gcc/reload.h
===================================================================
--- gcc.orig/reload.h
+++ gcc/reload.h
@@ -241,9 +241,20 @@  extern struct insn_chain *reload_insn_ch
 
 /* Allocate a new insn_chain structure.  */
 extern struct insn_chain *new_insn_chain (void);
+
+/* Search the body of the insn in CHAIN for values that need reloading
+   and record them with push_reload.  REPLACE nonzero means record
+   also where the values occur so that subst_reloads can be used.  */
+extern int find_reloads (struct insn_chain *, int, int, int, short *);
 #endif
 
 #if defined SET_HARD_REG_BIT
+/* These are the hard registers that can't be used as spill register for any
+   insn.  This includes registers used for user variables and registers that
+   we can't eliminate.  A register that appears in this set also can't be used
+   to retry register allocation.  */
+extern HARD_REG_SET bad_spill_regs_global;
+
 extern void compute_use_by_pseudos (HARD_REG_SET *, bitmap);
 #endif
 
@@ -282,11 +293,6 @@  extern int operands_match_p (rtx, rtx);
 /* Return 1 if altering OP will not modify the value of CLOBBER.  */
 extern int safe_from_earlyclobber (rtx, rtx);
 
-/* Search the body of INSN for values that need reloading and record them
-   with push_reload.  REPLACE nonzero means record also where the values occur
-   so that subst_reloads can be used.  */
-extern int find_reloads (rtx, int, int, int, short *);
-
 /* Compute the sum of X and Y, making canonicalizations assumed in an
    address, namely: sum constant integers, surround the sum of two
    constants with a CONST, put the constant as the second operand, and