diff mbox series

[1/6] ira: Add a ira_loop_border_costs class

Message ID mptlezs2a81.fsf@arm.com
State New
Headers show
Series ira: Fix performance regression in exchange2 [PR98782] | expand

Commit Message

Richard Sandiford Jan. 6, 2022, 2:46 p.m. UTC
The final index into (ira_)memory_move_cost is 1 for loads and
0 for stores.  Thus the combination:

  entry_freq * memory_cost[1] + exit_freq * memory_cost[0]

is the cost of loading a register on entry to a loop and
storing it back on exit from the loop.  This is the cost to
use if the register is successfully allocated within the
loop but is spilled in the parent loop.  Similarly:

  entry_freq * memory_cost[0] + exit_freq * memory_cost[1]

is the cost of storing a register on entry to the loop and
restoring it on exit from the loop.  This is the cost to
use if the register is spilled within the loop but is
successfully allocated in the parent loop.

The patch adds a helper class for calculating these values and
mechanically replaces the existing instances.  There is no attempt to
editorialise the choice between using “spill inside” and “spill outside”
costs.  (I think one of them is the wrong way round, but a later patch
deals with that.)

No functional change intended.

gcc/
	PR rtl-optimization/98782
	* ira-int.h (ira_loop_border_costs): New class.
	* ira-color.c (ira_loop_border_costs::ira_loop_border_costs):
	New constructor.
	(calculate_allocno_spill_cost): Use ira_loop_border_costs.
	(color_pass): Likewise.
	(move_spill_restore): Likewise.
---
 gcc/ira-color.c | 76 +++++++++++++++++++------------------------------
 gcc/ira-int.h   | 56 ++++++++++++++++++++++++++++++++++++
 2 files changed, 86 insertions(+), 46 deletions(-)

Comments

Jan Hubicka Jan. 6, 2022, 3:08 p.m. UTC | #1
> The final index into (ira_)memory_move_cost is 1 for loads and
> 0 for stores.  Thus the combination:
> 
>   entry_freq * memory_cost[1] + exit_freq * memory_cost[0]
> 
> is the cost of loading a register on entry to a loop and
> storing it back on exit from the loop.  This is the cost to
> use if the register is successfully allocated within the
> loop but is spilled in the parent loop.  Similarly:
> 
>   entry_freq * memory_cost[0] + exit_freq * memory_cost[1]
> 
> is the cost of storing a register on entry to the loop and
> restoring it on exit from the loop.  This is the cost to
> use if the register is spilled within the loop but is
> successfully allocated in the parent loop.
> 
> The patch adds a helper class for calculating these values and
> mechanically replaces the existing instances.  There is no attempt to
> editorialise the choice between using “spill inside” and “spill outside”
> costs.  (I think one of them is the wrong way round, but a later patch
> deals with that.)
> 
> No functional change intended.
> 
> gcc/
> 	PR rtl-optimization/98782
> 	* ira-int.h (ira_loop_border_costs): New class.
> 	* ira-color.c (ira_loop_border_costs::ira_loop_border_costs):
> 	New constructor.
> 	(calculate_allocno_spill_cost): Use ira_loop_border_costs.
> 	(color_pass): Likewise.
> 	(move_spill_restore): Likewise.

Thanks for working on this.  For profile bits, the patch looks good to
me.  In general I am trying to move away from the integer frequencies.
It would be more precise to calculate the m_entry_freq and m_exit_freq
as profile_count m_entry_count, m_exit_count
and do conversion only later.

count->to_frequency method basically scales it to the range 0...BB_FREQ_MAX.

Honza
Richard Sandiford Jan. 7, 2022, 11:12 a.m. UTC | #2
Jan Hubicka <hubicka@kam.mff.cuni.cz> writes:
>> The final index into (ira_)memory_move_cost is 1 for loads and
>> 0 for stores.  Thus the combination:
>> 
>>   entry_freq * memory_cost[1] + exit_freq * memory_cost[0]
>> 
>> is the cost of loading a register on entry to a loop and
>> storing it back on exit from the loop.  This is the cost to
>> use if the register is successfully allocated within the
>> loop but is spilled in the parent loop.  Similarly:
>> 
>>   entry_freq * memory_cost[0] + exit_freq * memory_cost[1]
>> 
>> is the cost of storing a register on entry to the loop and
>> restoring it on exit from the loop.  This is the cost to
>> use if the register is spilled within the loop but is
>> successfully allocated in the parent loop.
>> 
>> The patch adds a helper class for calculating these values and
>> mechanically replaces the existing instances.  There is no attempt to
>> editorialise the choice between using “spill inside” and “spill outside”
>> costs.  (I think one of them is the wrong way round, but a later patch
>> deals with that.)
>> 
>> No functional change intended.
>> 
>> gcc/
>> 	PR rtl-optimization/98782
>> 	* ira-int.h (ira_loop_border_costs): New class.
>> 	* ira-color.c (ira_loop_border_costs::ira_loop_border_costs):
>> 	New constructor.
>> 	(calculate_allocno_spill_cost): Use ira_loop_border_costs.
>> 	(color_pass): Likewise.
>> 	(move_spill_restore): Likewise.
>
> Thanks for working on this.  For profile bits, the patch looks good to
> me.  In general I am trying to move away from the integer frequencies.
> It would be more precise to calculate the m_entry_freq and m_exit_freq
> as profile_count m_entry_count, m_exit_count
> and do conversion only later.
>
> count->to_frequency method basically scales it to the range 0...BB_FREQ_MAX.

Yeah.  If I get time, I'd like to see how easy it would be to move the
RAs away from the BB_FREQ_MAX scaling.  I agree it's not really precise
enough.

The problem is that the costs are multiplied by several other things,
and there have recently been overflow problems (g:7d02c8bf75980fa2468f).
I guess the way to avoid that would be to use sreals, but the problem
then is that the costing code (particularly record_reg_classes) is very
compile-time sensitive.  So it might need a bit of surgery to move
over to sreal costs without negatively affecting compile time (and
perhaps memory consumption).

Of course, RA changes would become much easier if we got rid of
old reload. :-)  (OK, bit of a cheap shot in this case, but true
in general.)

Thanks,
Richard
Vladimir Makarov Jan. 7, 2022, 2:38 p.m. UTC | #3
On 2022-01-06 09:46, Richard Sandiford wrote:
> The final index into (ira_)memory_move_cost is 1 for loads and
> 0 for stores.  Thus the combination:
>
>    entry_freq * memory_cost[1] + exit_freq * memory_cost[0]
>
> is the cost of loading a register on entry to a loop and
> storing it back on exit from the loop.  This is the cost to
> use if the register is successfully allocated within the
> loop but is spilled in the parent loop.  Similarly:
>
>    entry_freq * memory_cost[0] + exit_freq * memory_cost[1]
>
> is the cost of storing a register on entry to the loop and
> restoring it on exit from the loop.  This is the cost to
> use if the register is spilled within the loop but is
> successfully allocated in the parent loop.
>
> The patch adds a helper class for calculating these values and
> mechanically replaces the existing instances.  There is no attempt to
> editorialise the choice between using “spill inside” and “spill outside”
> costs.  (I think one of them is the wrong way round, but a later patch
> deals with that.)
>
> No functional change intended.
>
> gcc/
> 	PR rtl-optimization/98782
> 	* ira-int.h (ira_loop_border_costs): New class.
> 	* ira-color.c (ira_loop_border_costs::ira_loop_border_costs):
> 	New constructor.
> 	(calculate_allocno_spill_cost): Use ira_loop_border_costs.
> 	(color_pass): Likewise.
> 	(move_spill_restore): Likewise.
It is OK for me.
diff mbox series

Patch

diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index e0b4e490043..66c11710b97 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -2567,13 +2567,23 @@  ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int regno, bool exit_p)
   return REG_FREQ_FROM_EDGE_FREQ (freq);
 }
 
+/* Construct an object that describes the boundary between A and its
+   parent allocno.  */
+ira_loop_border_costs::ira_loop_border_costs (ira_allocno_t a)
+  : m_mode (ALLOCNO_MODE (a)),
+    m_class (ALLOCNO_CLASS (a)),
+    m_entry_freq (ira_loop_edge_freq (ALLOCNO_LOOP_TREE_NODE (a),
+				      ALLOCNO_REGNO (a), false)),
+    m_exit_freq (ira_loop_edge_freq (ALLOCNO_LOOP_TREE_NODE (a),
+				     ALLOCNO_REGNO (a), true))
+{
+}
+
 /* Calculate and return the cost of putting allocno A into memory.  */
 static int
 calculate_allocno_spill_cost (ira_allocno_t a)
 {
   int regno, cost;
-  machine_mode mode;
-  enum reg_class rclass;
   ira_allocno_t parent_allocno;
   ira_loop_tree_node_t parent_node, loop_node;
 
@@ -2586,24 +2596,12 @@  calculate_allocno_spill_cost (ira_allocno_t a)
     return cost;
   if ((parent_allocno = parent_node->regno_allocno_map[regno]) == NULL)
     return cost;
-  mode = ALLOCNO_MODE (a);
-  rclass = ALLOCNO_CLASS (a);
+  ira_loop_border_costs border_costs (a);
   if (ALLOCNO_HARD_REGNO (parent_allocno) < 0)
-    cost -= (ira_memory_move_cost[mode][rclass][0]
-	     * ira_loop_edge_freq (loop_node, regno, true)
-	     + ira_memory_move_cost[mode][rclass][1]
-	     * ira_loop_edge_freq (loop_node, regno, false));
+    cost -= border_costs.spill_outside_loop_cost ();
   else
-    {
-      ira_init_register_move_cost_if_necessary (mode);
-      cost += ((ira_memory_move_cost[mode][rclass][1]
-		* ira_loop_edge_freq (loop_node, regno, true)
-		+ ira_memory_move_cost[mode][rclass][0]
-		* ira_loop_edge_freq (loop_node, regno, false))
-	       - (ira_register_move_cost[mode][rclass][rclass]
-		  * (ira_loop_edge_freq (loop_node, regno, false)
-		     + ira_loop_edge_freq (loop_node, regno, true))));
-    }
+    cost += (border_costs.spill_inside_loop_cost ()
+	     - border_costs.move_between_loops_cost ());
   return cost;
 }
 
@@ -3342,7 +3340,7 @@  static void
 color_pass (ira_loop_tree_node_t loop_tree_node)
 {
   int regno, hard_regno, index = -1, n;
-  int cost, exit_freq, enter_freq;
+  int cost;
   unsigned int j;
   bitmap_iterator bi;
   machine_mode mode;
@@ -3466,8 +3464,6 @@  color_pass (ira_loop_tree_node_t loop_tree_node)
 		}
 	      continue;
 	    }
-	  exit_freq = ira_loop_edge_freq (subloop_node, regno, true);
-	  enter_freq = ira_loop_edge_freq (subloop_node, regno, false);
 	  ira_assert (regno < ira_reg_equiv_len);
 	  if (ira_equiv_no_lvalue_p (regno))
 	    {
@@ -3483,16 +3479,16 @@  color_pass (ira_loop_tree_node_t loop_tree_node)
 	    }
 	  else if (hard_regno < 0)
 	    {
+	      ira_loop_border_costs border_costs (subloop_allocno);
 	      ALLOCNO_UPDATED_MEMORY_COST (subloop_allocno)
-		-= ((ira_memory_move_cost[mode][rclass][1] * enter_freq)
-		    + (ira_memory_move_cost[mode][rclass][0] * exit_freq));
+		-= border_costs.spill_outside_loop_cost ();
 	    }
 	  else
 	    {
+	      ira_loop_border_costs border_costs (subloop_allocno);
 	      aclass = ALLOCNO_CLASS (subloop_allocno);
 	      ira_init_register_move_cost_if_necessary (mode);
-	      cost = (ira_register_move_cost[mode][rclass][rclass]
-		      * (exit_freq + enter_freq));
+	      cost = border_costs.move_between_loops_cost ();
 	      ira_allocate_and_set_or_copy_costs
 		(&ALLOCNO_UPDATED_HARD_REG_COSTS (subloop_allocno), aclass,
 		 ALLOCNO_UPDATED_CLASS_COST (subloop_allocno),
@@ -3508,8 +3504,7 @@  color_pass (ira_loop_tree_node_t loop_tree_node)
 		ALLOCNO_UPDATED_CLASS_COST (subloop_allocno)
 		  = ALLOCNO_UPDATED_HARD_REG_COSTS (subloop_allocno)[index];
 	      ALLOCNO_UPDATED_MEMORY_COST (subloop_allocno)
-		+= (ira_memory_move_cost[mode][rclass][0] * enter_freq
-		    + ira_memory_move_cost[mode][rclass][1] * exit_freq);
+		+= border_costs.spill_inside_loop_cost ();
 	    }
 	}
     }
@@ -3550,7 +3545,6 @@  move_spill_restore (void)
 {
   int cost, regno, hard_regno, hard_regno2, index;
   bool changed_p;
-  int enter_freq, exit_freq;
   machine_mode mode;
   enum reg_class rclass;
   ira_allocno_t a, parent_allocno, subloop_allocno;
@@ -3605,38 +3599,28 @@  move_spill_restore (void)
 		       - (ALLOCNO_HARD_REG_COSTS (subloop_allocno) == NULL
 			  ? ALLOCNO_CLASS_COST (subloop_allocno)
 			  : ALLOCNO_HARD_REG_COSTS (subloop_allocno)[index]));
-	      exit_freq = ira_loop_edge_freq (subloop_node, regno, true);
-	      enter_freq = ira_loop_edge_freq (subloop_node, regno, false);
+	      ira_loop_border_costs border_costs (subloop_allocno);
 	      if ((hard_regno2 = ALLOCNO_HARD_REGNO (subloop_allocno)) < 0)
-		cost -= (ira_memory_move_cost[mode][rclass][0] * exit_freq
-			 + ira_memory_move_cost[mode][rclass][1] * enter_freq);
+		cost -= border_costs.spill_outside_loop_cost ();
 	      else
 		{
-		  cost
-		    += (ira_memory_move_cost[mode][rclass][0] * exit_freq
-			+ ira_memory_move_cost[mode][rclass][1] * enter_freq);
+		  cost += border_costs.spill_outside_loop_cost ();
 		  if (hard_regno2 != hard_regno)
-		    cost -= (ira_register_move_cost[mode][rclass][rclass]
-			     * (exit_freq + enter_freq));
+		    cost -= border_costs.move_between_loops_cost ();
 		}
 	    }
 	  if ((parent = loop_node->parent) != NULL
 	      && (parent_allocno = parent->regno_allocno_map[regno]) != NULL)
 	    {
 	      ira_assert (rclass == ALLOCNO_CLASS (parent_allocno));
-	      exit_freq	= ira_loop_edge_freq (loop_node, regno, true);
-	      enter_freq = ira_loop_edge_freq (loop_node, regno, false);
+	      ira_loop_border_costs border_costs (a);
 	      if ((hard_regno2 = ALLOCNO_HARD_REGNO (parent_allocno)) < 0)
-		cost -= (ira_memory_move_cost[mode][rclass][0] * exit_freq
-			 + ira_memory_move_cost[mode][rclass][1] * enter_freq);
+		cost -= border_costs.spill_outside_loop_cost ();
 	      else
 		{
-		  cost
-		    += (ira_memory_move_cost[mode][rclass][1] * exit_freq
-			+ ira_memory_move_cost[mode][rclass][0] * enter_freq);
+		  cost += border_costs.spill_inside_loop_cost ();
 		  if (hard_regno2 != hard_regno)
-		    cost -= (ira_register_move_cost[mode][rclass][rclass]
-			     * (exit_freq + enter_freq));
+		    cost -= border_costs.move_between_loops_cost ();
 		}
 	    }
 	  if (cost < 0)
diff --git a/gcc/ira-int.h b/gcc/ira-int.h
index 59d2872cac3..b32c80d4c9e 100644
--- a/gcc/ira-int.h
+++ b/gcc/ira-int.h
@@ -1539,4 +1539,60 @@  ira_need_caller_save_p (ira_allocno_t a, unsigned int regno)
 				     ALLOCNO_MODE (a), regno);
 }
 
+/* Represents the boundary between an allocno in one loop and its parent
+   allocno in the enclosing loop.  It is usually possible to change a
+   register's allocation on this boundary; the class provides routines
+   for calculating the cost of such changes.  */
+class ira_loop_border_costs
+{
+public:
+  ira_loop_border_costs (ira_allocno_t);
+
+  int move_between_loops_cost () const;
+  int spill_outside_loop_cost () const;
+  int spill_inside_loop_cost () const;
+
+private:
+  /* The mode and class of the child allocno.  */
+  machine_mode m_mode;
+  reg_class m_class;
+
+  /* Sums the frequencies of the entry edges and the exit edges.  */
+  int m_entry_freq, m_exit_freq;
+};
+
+/* Return the cost of storing the register on entry to the loop and
+   loading it back on exit from the loop.  This is the cost to use if
+   the register is spilled within the loop but is successfully allocated
+   in the parent loop.  */
+inline int
+ira_loop_border_costs::spill_inside_loop_cost () const
+{
+  return (m_entry_freq * ira_memory_move_cost[m_mode][m_class][0]
+	  + m_exit_freq * ira_memory_move_cost[m_mode][m_class][1]);
+}
+
+/* Return the cost of loading the register on entry to the loop and
+   storing it back on exit from the loop.  This is the cost to use if
+   the register is successfully allocated within the loop but is spilled
+   in the parent loop.  */
+inline int
+ira_loop_border_costs::spill_outside_loop_cost () const
+{
+  return (m_entry_freq * ira_memory_move_cost[m_mode][m_class][1]
+	  + m_exit_freq * ira_memory_move_cost[m_mode][m_class][0]);
+}
+
+/* Return the cost of moving the pseudo register between different hard
+   registers on entry and exit from the loop.  This is the cost to use
+   if the register is successfully allocated within both this loop and
+   the parent loop, but the allocations for the loops differ.  */
+inline int
+ira_loop_border_costs::move_between_loops_cost () const
+{
+  ira_init_register_move_cost_if_necessary (m_mode);
+  auto move_cost = ira_register_move_cost[m_mode][m_class][m_class];
+  return move_cost * (m_entry_freq + m_exit_freq);
+}
+
 #endif /* GCC_IRA_INT_H */