diff mbox

[RFC] Preferred rename register in regrename pass

Message ID B5E67142681B53468FAF6B7C31356562441B7622@hhmail02.hh.imgtec.org
State New
Headers show

Commit Message

Robert Suchanek Sept. 17, 2015, 2:38 p.m. UTC
Hi,

We came across a situation for MIPS64 where moves for sign-extension were
not converted into a nop because of IRA spilled some of the allocnos and
assigned different hard register for the output operand in the move.
LRA is not fixing this up as most likely the move was not introduced by
the LRA itself.  I found it hard to fix this in LRA and looked at
an alternative solution where regrename pass appeared to be the best candidate.

The patch below introduces a notion of a preferred rename register and attempts
to use the output register for an input register iff the input register dies
in an instruction.  The preferred register is validated and in the case it fails
to be validated, it falls back to the old technique of finding the best rename register.
Of course, it has slightly limited scope of use as it's not enabled be default,
however, when targeting performance one is likely to enable it via
-funroll-loops or -fpeel-loops.

I did some experiments with -funroll-loops on x86_64-unknown-linux-gnu and
the code size improved almost 0.4% on average case.  I haven't done an extensive
performance testing but it appeared SPEC2006 had some minor improvement on average,
which could be real improvement or just noise.
On MIPS64 with -funroll-loops, there were a number of cases where the unnecessary
moves turned into a nop in CSiBE.  MIPS32 also marginally improved but to
a lower degree.

The patch successfully passed x86_64-unknown-linux-gnu, mips-mti-linux-gnu and
mips-img-linux-gnu.

I'm not sure if this is something that should be enabled by default for everyone
or a target hook should be added.  Any other comments/suggestions?

Regards,
Robert

gcc/
	* regrename.c (find_preferred_rename_reg): New function.
	(record_operand_use): Remove assertion.  Allocate or resize heads and
	chains vectors, if necessary.
	(find_best_rename_reg): Use the new function and validate chosen
	register.
	(build_def_use): Don't allocate and initialize space of size 0.
	(regrename_finish): Free heads and chains vectors.
	(regrename_optimize): Pass true to initializing function.
	* regrename.h (struct operand_rr_info): Replace arrays of heads and
	chains with vectors.
---
 gcc/regrename.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++----
 gcc/regrename.h |  4 +--
 2 files changed, 82 insertions(+), 8 deletions(-)

Comments

Jeff Law Sept. 17, 2015, 5:01 p.m. UTC | #1
On 09/17/2015 08:38 AM, Robert Suchanek wrote:
> Hi,
>
> We came across a situation for MIPS64 where moves for sign-extension were
> not converted into a nop because of IRA spilled some of the allocnos and
> assigned different hard register for the output operand in the move.
> LRA is not fixing this up as most likely the move was not introduced by
> the LRA itself.  I found it hard to fix this in LRA and looked at
> an alternative solution where regrename pass appeared to be the best candidate.
Yea, we've never been great at tying the source & destination of 
sign/zero extensions.  The inherently different modes often caused the 
old allocator (and pre-allocator bits like regmove, and post-allocator 
bits like reload) to 'give up'.


>
> I'm not sure if this is something that should be enabled by default for everyone
> or a target hook should be added.  Any other comments/suggestions?
I'll let Bernd comment on the patch itself.  But I would say that if 
you're setting up cases where we can tie the source/dest of an extension 
together, then it's a good thing.  It'll cause more of them to turn into 
NOPs and it'll make the redundant extension elimination pass more 
effective as well.  This would be something that I'd expect to benefit 
most architectures (obviously to varying degrees).


Jeff
Eric Botcazou Sept. 17, 2015, 9:26 p.m. UTC | #2
> I'll let Bernd comment on the patch itself.  But I would say that if
> you're setting up cases where we can tie the source/dest of an extension
> together, then it's a good thing.  It'll cause more of them to turn into
> NOPs and it'll make the redundant extension elimination pass more
> effective as well.

Not if you do it in regrename though, as it's run very late in the pipeline.
If you want to make REE more effective, this would need to be done during RA.
Bernd Schmidt Sept. 18, 2015, 3:19 p.m. UTC | #3
On 09/17/2015 04:38 PM, Robert Suchanek wrote:
> We came across a situation for MIPS64 where moves for sign-extension were
> not converted into a nop because of IRA spilled some of the allocnos and
> assigned different hard register for the output operand in the move.
> LRA is not fixing this up as most likely the move was not introduced by
> the LRA itself.  I found it hard to fix this in LRA and looked at
> an alternative solution where regrename pass appeared to be the best candidate.

For reference, please post examples of the insn pattern(s) where you 
would hope to get an improvement. Do they use matching constraints 
between the input and output operands in at least one alternative?

So this does look like something that could be addressed in regrename, 
but I think the patch is not quite the way to do it.

> +/* Return a preferred rename register for HEAD.  */

Function comments ideally ought to be a little more detailed. Preferred 
how and why?

> +static int
> +find_preferred_rename_reg (du_head_p head)
> +{
> +  struct du_chain *this_du;
> +  int preferred_reg = -1;
> +
> +  for (this_du = head->first; this_du; this_du = this_du->next_use)

This loop seems to search for the insn where the chain terminates (i.e. 
the register dies). It seems strange to do this here rather than during 
the initial scan in record_out_operands where we visit every insn and 
already look for REG_DEAD notes.

> +      rtx note;
> +      insn_rr_info *p;
> +
> +      /* The preferred rename register is an output register iff an input
> +	 register dies in an instruction but the candidate must be validated by
> +	 check_new_reg_p.  */
> +      for (note = REG_NOTES (this_du->insn); note; note = XEXP (note, 1))
> +	if (insn_rr.exists()
> +	    && REG_NOTE_KIND (note) == REG_DEAD
> +	    && REGNO (XEXP (note, 0)) == head->regno
> +	    && (p = &insn_rr[INSN_UID (this_du->insn)])
> +	    && p->op_info)
> +	  {
> +	    int i;
> +	    for (i = 0; i < p->op_info->n_chains; i++)
> +	      {
> +		struct du_head *next_head = p->op_info->heads[i];
> +		if (head != next_head)

Here you're not actually verifying the chosen preferred reg is an 
output? Is the use of plain "p->op_info" (which is actually an array) 
intentional as a guess that operand 0 is the output? I'm not thrilled 
with this, and at the very least it should be "p->op_info[0]." to avoid 
reader confusion.
It's also not verifying that this is indeed a case where choosing a 
preferred reg has a beneficial effect at all.

The use of insn_rr would probably also be unnecessary if this was done 
during the scan phase.

> +		    preferred_reg = next_head->regno;

The problem here is that there's an ordering issue. What if next_head 
gets renamed afterwards? The choice of preferred register hasn't bought 
us anything in that case.

For all these reasons I'd suggest a different approach, looking for such 
situations during the scan. Try to detect a situation where
  * we have a REG_DEAD note for an existing chain
  * the insn fulfils certain conditions (i.e. it's a move, or maybe one
    of the alternatives has a matching constraint). After all, there's
    not much point in tying if the reg that dies was used in a memory
    address.
  * a new chain is started for a single output
Then, instead of picking a best register, mark the two chains as tied. 
Then, when choosing a rename register, see if a tied chain already was 
renamed, and try to pick the same register first.

> @@ -1826,7 +1900,7 @@ regrename_optimize (void)
>     df_analyze ();
>     df_set_flags (DF_DEFER_INSN_RESCAN);
>
> -  regrename_init (false);
> +  regrename_init (true);

It would be good to avoid this as it makes the renamer more expensive. I 
expect that if you follow the strategy described above, this won't be 
necessary.

> -  struct du_chain *chains[MAX_REGS_PER_ADDRESS];
> -  struct du_head *heads[MAX_REGS_PER_ADDRESS];
> +  vec<struct du_chain *> chains;
> +  vec<struct du_head *> heads;

Given that MAX_REGS_PER_ADDRESS tends to be 1 or 2 this appears to make 
things more heavyweight, especially with the extra loop needed to free 
the vecs. If possible, try to avoid this. (Again, AFAICS this 
information shouldn't really be necessary for what you're trying to do).


Bernd
diff mbox

Patch

diff --git a/gcc/regrename.c b/gcc/regrename.c
index c328c1b..90fee98 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -174,6 +174,51 @@  dump_def_use_chain (int from)
     }
 }
 
+/* Return a preferred rename register for HEAD.  */
+
+static int
+find_preferred_rename_reg (du_head_p head)
+{
+  struct du_chain *this_du;
+  int preferred_reg = -1;
+
+  for (this_du = head->first; this_du; this_du = this_du->next_use)
+    {
+      rtx note;
+      insn_rr_info *p;
+
+      /* The preferred rename register is an output register iff an input
+	 register dies in an instruction but the candidate must be validated by
+	 check_new_reg_p.  */
+      for (note = REG_NOTES (this_du->insn); note; note = XEXP (note, 1))
+	if (insn_rr.exists()
+	    && REG_NOTE_KIND (note) == REG_DEAD
+	    && REGNO (XEXP (note, 0)) == head->regno
+	    && (p = &insn_rr[INSN_UID (this_du->insn)])
+	    && p->op_info)
+	  {
+	    int i;
+	    for (i = 0; i < p->op_info->n_chains; i++)
+	      {
+		struct du_head *next_head = p->op_info->heads[i];
+		if (head != next_head)
+		  {
+		    preferred_reg = next_head->regno;
+		    if (dump_file)
+		      fprintf (dump_file,
+			       "Chain %s (%d) has preferred rename register"
+			       " %s for insn %d [%s]\n",
+			       reg_names[head->regno], head->id,
+			       reg_names[preferred_reg],
+			       INSN_UID (this_du->insn),
+			       reg_class_names[this_du->cl]);
+		  }
+	      }
+	  }
+    }
+  return preferred_reg;
+}
+
 static void
 free_chain_data (void)
 {
@@ -206,7 +251,16 @@  record_operand_use (struct du_head *head, struct du_chain *this_du)
 {
   if (cur_operand == NULL)
     return;
-  gcc_assert (cur_operand->n_chains < MAX_REGS_PER_ADDRESS);
+
+  if (!cur_operand->heads.exists ())
+    cur_operand->heads.create (0);
+  if (!cur_operand->chains.exists ())
+    cur_operand->chains.create (0);
+  if (cur_operand->heads.length () <= (unsigned) cur_operand->n_chains)
+    cur_operand->heads.safe_grow_cleared (cur_operand->n_chains + 1);
+  if (cur_operand->chains.length () <= (unsigned) cur_operand->n_chains)
+    cur_operand->chains.safe_grow_cleared (cur_operand->n_chains + 1);
+
   cur_operand->heads[cur_operand->n_chains] = head;
   cur_operand->chains[cur_operand->n_chains++] = this_du;
 }
@@ -355,6 +409,7 @@  find_rename_reg (du_head_p this_head, enum reg_class super_class,
   enum reg_class preferred_class;
   int pass;
   int best_new_reg = old_reg;
+  int preferred_reg = -1;
 
   /* Further narrow the set of registers we can use for renaming.
      If the chain needs a call-saved register, mark the call-used
@@ -370,6 +425,11 @@  find_rename_reg (du_head_p this_head, enum reg_class super_class,
   preferred_class
     = (enum reg_class) targetm.preferred_rename_class (super_class);
 
+  /* Try to find a preferred rename register for THIS_HEAD.  */
+  if ((preferred_reg = find_preferred_rename_reg (this_head)) != -1
+      && check_new_reg_p (old_reg, preferred_reg, this_head, *unavailable))
+    return preferred_reg;
+
   /* If PREFERRED_CLASS is not NO_REGS, we iterate in the first pass
      over registers that belong to PREFERRED_CLASS and try to find the
      best register within the class.  If that failed, we iterate in
@@ -1588,10 +1648,14 @@  build_def_use (basic_block bb)
 	  if (insn_rr.exists ())
 	    {
 	      insn_info = &insn_rr[INSN_UID (insn)];
-	      insn_info->op_info = XOBNEWVEC (&rename_obstack, operand_rr_info,
-					      recog_data.n_operands);
-	      memset (insn_info->op_info, 0,
-		      sizeof (operand_rr_info) * recog_data.n_operands);
+	      if (recog_data.n_operands > 0)
+		{
+		  insn_info->op_info = XOBNEWVEC (&rename_obstack,
+						  operand_rr_info,
+						  recog_data.n_operands);
+		  memset (insn_info->op_info, 0,
+			  sizeof (operand_rr_info) * recog_data.n_operands);
+		}
 	    }
 
 	  /* Simplify the code below by promoting OP_OUT to OP_INOUT in
@@ -1811,6 +1875,16 @@  regrename_init (bool insn_info)
 void
 regrename_finish (void)
 {
+  int i;
+  struct insn_rr_info *item;
+
+  FOR_EACH_VEC_ELT (insn_rr, i, item)
+    if (item->op_info)
+      {
+	item->op_info->heads.release ();
+	item->op_info->chains.release ();
+      }
+
   insn_rr.release ();
   free_chain_data ();
   obstack_free (&rename_obstack, NULL);
@@ -1826,7 +1900,7 @@  regrename_optimize (void)
   df_analyze ();
   df_set_flags (DF_DEFER_INSN_RESCAN);
 
-  regrename_init (false);
+  regrename_init (true);
 
   regrename_analyze (NULL);
 
diff --git a/gcc/regrename.h b/gcc/regrename.h
index bbe156d..2e3bc20 100644
--- a/gcc/regrename.h
+++ b/gcc/regrename.h
@@ -71,8 +71,8 @@  struct operand_rr_info
   int n_chains;
   /* Holds either the chain for the operand itself, or for the registers in
      a memory operand.  */
-  struct du_chain *chains[MAX_REGS_PER_ADDRESS];
-  struct du_head *heads[MAX_REGS_PER_ADDRESS];
+  vec<struct du_chain *> chains;
+  vec<struct du_head *> heads;
 };
 
 /* A struct to hold a vector of operand_rr_info structures describing the