diff mbox series

[3/5] Make ira_get_dup_out_num handle more cases

Message ID mptblyrm4wg.fsf@arm.com
State New
Headers show
Series Tweak IRA handling of tying and earlyclobbers | expand

Commit Message

Richard Sandiford June 21, 2019, 1:42 p.m. UTC
SVE has a prefix instruction (MOVPRFX) that acts as a move but is
designed to be easily fusible with the following instruction.  The SVE
port therefore has lots of patterns with constraints of the form:

  A: operand 0: =w,?w
     ...
     operand n:  0, w

where the first alternative is a single instruction and the second
alternative uses MOVPRFX.

Ideally we want operand n to be allocated to the same register as
operand 0 in this case.

add_insn_allocno_copies is the main IRA routine that deals with tied
operands.  It is (rightly) very conservative, and only handles cases in
which we're confident about saving a full move.  So for a pattern like:

  B: operand 0: =w,w
     ...
     operand n:  0,w

we don't (and shouldn't) assume that tying operands 0 and n would
save the cost of a move.

But in A, the second alternative has a ? marker, which makes it more
expensive than the first alternative by a full reload.  So I think for
copy elision we should ignore the untied operand n in the second
alternative of A.

One approach would be to add '*' markers to each pattern and make
ira_get_dup_out_num honour them.  But I think the rule applies on
first principles, so marking with '*' shouldn't be necessary.

This patch instead makes ira_get_dup_out_num ignore expensive
alternatives if there are other alternatives that match exactly.
The cheapest way of doing that seemed to be to take expensive
alternatives out of consideration in ira_setup_alts, which provides
a bitmask of alternatives and has all the information available.
add_insn_allocno_copies is the only current user of ira_setup_alts,
so no other code should be affected.

If all available alternatives are disparaged or need a reload,
there's not much we can do to cut them down at this stage,
since it's hard to predict which operands will be reloaded and
which registers will need to be spilled.

An interesting case is patterns like this msp430 one:

;; Alternatives 2 and 3 are to handle cases generated by reload.
(define_insn "subqi3"
  [(set (match_operand:QI           0 "nonimmediate_operand" "=rYs,  rm,  &?r, ?&r")
	(minus:QI (match_operand:QI 1 "general_operand"       "0,    0,    !r,  !i")
		  (match_operand:QI 2 "general_operand"      " riYs, rmi, rmi,   r")))]
  ""
  "@
  SUB.B\t%2, %0
  SUB%X0.B\t%2, %0
  MOV%X0.B\t%1, %0 { SUB%X0.B\t%2, %0
  MOV%X0.B\t%1, %0 { SUB%X0.B\t%2, %0"
)

Here alternative 3 is significantly more expensive then alternative 0
(reject costs 0 and 606 respectively).  But if operand 1 is an integer
constant, we'll still use alternative 3 if operand 2 is an allocated
register.  On the other hand, if operand 1 is an integer constant but
operand 2 is spilled to memory, we'll move the constant into a register
and use the first alternative.

So in this case, if operand 1 is a register, we should consider
only the first two alternatives and thus try to tie operand 1
to operand 0 (which we didn't do previously).  If operand 1 is a
constant integer, we should consider at least alternatives 0, 1 and 3.
We could exclude alternative 2, but I don't have any evidence that
that's useful.


2019-06-21  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* ira.c (ira_setup_alts): If any valid alternatives have zero cost,
	exclude any others that are disparaged or that are bound to need
	a reload or spill.
	(ira_get_dup_out_num): Expand comment.

Comments

Vladimir Makarov June 24, 2019, 2:31 p.m. UTC | #1
On 2019-06-21 9:42 a.m., Richard Sandiford wrote:
> SVE has a prefix instruction (MOVPRFX) that acts as a move but is
> designed to be easily fusible with the following instruction.  The SVE
> port therefore has lots of patterns with constraints of the form:
>
>    A: operand 0: =w,?w
>       ...
>       operand n:  0, w
>
> where the first alternative is a single instruction and the second
> alternative uses MOVPRFX.
>
> Ideally we want operand n to be allocated to the same register as
> operand 0 in this case.
>
> add_insn_allocno_copies is the main IRA routine that deals with tied
> operands.  It is (rightly) very conservative, and only handles cases in
> which we're confident about saving a full move.  So for a pattern like:
>
>    B: operand 0: =w,w
>       ...
>       operand n:  0,w
>
> we don't (and shouldn't) assume that tying operands 0 and n would
> save the cost of a move.
>
> But in A, the second alternative has a ? marker, which makes it more
> expensive than the first alternative by a full reload.  So I think for
> copy elision we should ignore the untied operand n in the second
> alternative of A.
>
> One approach would be to add '*' markers to each pattern and make
> ira_get_dup_out_num honour them.  But I think the rule applies on
> first principles, so marking with '*' shouldn't be necessary.
I think direct approach is better.  The modifiers were designed for the 
old algorithms.  Now I think their treatment prevent and will prevent 
significant RA algorithm modifications.  I found that when tried to 
significantly change algorithm for calculation of register class costs 
and register class preferences.
> This patch instead makes ira_get_dup_out_num ignore expensive
> alternatives if there are other alternatives that match exactly.
> The cheapest way of doing that seemed to be to take expensive
> alternatives out of consideration in ira_setup_alts, which provides
> a bitmask of alternatives and has all the information available.
> add_insn_allocno_copies is the only current user of ira_setup_alts,
> so no other code should be affected.
>
>
> 2019-06-21  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
> 	* ira.c (ira_setup_alts): If any valid alternatives have zero cost,
> 	exclude any others that are disparaged or that are bound to need
> 	a reload or spill.
> 	(ira_get_dup_out_num): Expand comment.
The patch is ok for me.
> Index: gcc/ira.c
> ===================================================================
> --- gcc/ira.c	2019-06-21 14:34:09.455685354 +0100
> +++ gcc/ira.c	2019-06-21 14:34:13.239653892 +0100
> @@ -1787,7 +1787,9 @@ setup_prohibited_mode_move_regs (void)
>   /* Extract INSN and return the set of alternatives that we should consider.
>      This excludes any alternatives whose constraints are obviously impossible
>      to meet (e.g. because the constraint requires a constant and the operand
> -   is nonconstant).  */
> +   is nonconstant).  It also excludes alternatives that are bound to need
> +   a spill or reload, as long as we have other alternatives that match
> +   exactly.  */
>   alternative_mask
>   ira_setup_alts (rtx_insn *insn)
>   {
> @@ -1800,6 +1802,7 @@ ira_setup_alts (rtx_insn *insn)
>     preprocess_constraints (insn);
>     alternative_mask preferred = get_preferred_alternatives (insn);
>     alternative_mask alts = 0;
> +  alternative_mask exact_alts = 0;
>     /* Check that the hard reg set is enough for holding all
>        alternatives.  It is hard to imagine the situation when the
>        assertion is wrong.  */
> @@ -1816,20 +1819,24 @@ ira_setup_alts (rtx_insn *insn)
>       {
>         for (nalt = 0; nalt < recog_data.n_alternatives; nalt++)
>   	{
> -	  if (!TEST_BIT (preferred, nalt) || TEST_BIT (alts, nalt))
> +	  if (!TEST_BIT (preferred, nalt) || TEST_BIT (exact_alts, nalt))
>   	    continue;
>   
>   	  const operand_alternative *op_alt
>   	    = &recog_op_alt[nalt * recog_data.n_operands];
> +	  int this_reject = 0;
>   	  for (nop = 0; nop < recog_data.n_operands; nop++)
>   	    {
>   	      int c, len;
>   
> +	      this_reject += op_alt[nop].reject;
> +
>   	      rtx op = recog_data.operand[nop];
>   	      p = op_alt[nop].constraint;
>   	      if (*p == 0 || *p == ',')
>   		continue;
> -	
> +
> +	      bool win_p = false;
>   	      do
>   		switch (c = *p, len = CONSTRAINT_LEN (c, p), c)
>   		  {
> @@ -1847,7 +1854,14 @@ ira_setup_alts (rtx_insn *insn)
>   
>   		  case '0':  case '1':  case '2':  case '3':  case '4':
>   		  case '5':  case '6':  case '7':  case '8':  case '9':
> -		    goto op_success;
> +		    {
> +		      rtx other = recog_data.operand[c - '0'];
> +		      if (MEM_P (other)
> +			  ? rtx_equal_p (other, op)
> +			  : REG_P (op) || SUBREG_P (op))
> +			goto op_success;
> +		      win_p = true;
> +		    }
>   		    break;
>   		
>   		  case 'g':
> @@ -1861,7 +1875,11 @@ ira_setup_alts (rtx_insn *insn)
>   			{
>   			case CT_REGISTER:
>   			  if (reg_class_for_constraint (cn) != NO_REGS)
> -			    goto op_success;
> +			    {
> +			      if (REG_P (op) || SUBREG_P (op))
> +				goto op_success;
> +			      win_p = true;
> +			    }
>   			  break;
>   
>   			case CT_CONST_INT:
> @@ -1872,9 +1890,14 @@ ira_setup_alts (rtx_insn *insn)
>   			  break;
>   
>   			case CT_ADDRESS:
> +			  goto op_success;
> +
>   			case CT_MEMORY:
>   			case CT_SPECIAL_MEMORY:
> -			  goto op_success;
> +			  if (MEM_P (op))
> +			    goto op_success;
> +			  win_p = true;
> +			  break;
>   
>   			case CT_FIXED_FORM:
>   			  if (constraint_satisfied_p (op, cn))
> @@ -1885,12 +1908,22 @@ ira_setup_alts (rtx_insn *insn)
>   		    }
>   		  }
>   	      while (p += len, c);
> -	      break;
> +	      if (!win_p)
> +		break;
> +	      /* We can make the alternative match by spilling a register
> +		 to memory or loading something into a register.  Count a
> +		 cost of one reload (the equivalent of the '?' constraint).  */
> +	      this_reject += 6;
>   	    op_success:
>   	      ;
>   	    }
> +
>   	  if (nop >= recog_data.n_operands)
> -	    alts |= ALTERNATIVE_BIT (nalt);
> +	    {
> +	      alts |= ALTERNATIVE_BIT (nalt);
> +	      if (this_reject == 0)
> +		exact_alts |= ALTERNATIVE_BIT (nalt);
> +	    }
>   	}
>         if (commutative < 0)
>   	break;
> @@ -1900,13 +1933,13 @@ ira_setup_alts (rtx_insn *insn)
>         if (curr_swapped)
>   	break;
>       }
> -  return alts;
> +  return exact_alts ? exact_alts : alts;
>   }
>   
>   /* Return the number of the output non-early clobber operand which
>      should be the same in any case as operand with number OP_NUM (or
> -   negative value if there is no such operand).  The function takes
> -   only really possible alternatives into consideration.  */
> +   negative value if there is no such operand).  ALTS is the mask
> +   of alternatives that we should consider.  */
>   int
>   ira_get_dup_out_num (int op_num, alternative_mask alts)
>   {
diff mbox series

Patch

Index: gcc/ira.c
===================================================================
--- gcc/ira.c	2019-06-21 14:34:09.455685354 +0100
+++ gcc/ira.c	2019-06-21 14:34:13.239653892 +0100
@@ -1787,7 +1787,9 @@  setup_prohibited_mode_move_regs (void)
 /* Extract INSN and return the set of alternatives that we should consider.
    This excludes any alternatives whose constraints are obviously impossible
    to meet (e.g. because the constraint requires a constant and the operand
-   is nonconstant).  */
+   is nonconstant).  It also excludes alternatives that are bound to need
+   a spill or reload, as long as we have other alternatives that match
+   exactly.  */
 alternative_mask
 ira_setup_alts (rtx_insn *insn)
 {
@@ -1800,6 +1802,7 @@  ira_setup_alts (rtx_insn *insn)
   preprocess_constraints (insn);
   alternative_mask preferred = get_preferred_alternatives (insn);
   alternative_mask alts = 0;
+  alternative_mask exact_alts = 0;
   /* Check that the hard reg set is enough for holding all
      alternatives.  It is hard to imagine the situation when the
      assertion is wrong.  */
@@ -1816,20 +1819,24 @@  ira_setup_alts (rtx_insn *insn)
     {
       for (nalt = 0; nalt < recog_data.n_alternatives; nalt++)
 	{
-	  if (!TEST_BIT (preferred, nalt) || TEST_BIT (alts, nalt))
+	  if (!TEST_BIT (preferred, nalt) || TEST_BIT (exact_alts, nalt))
 	    continue;
 
 	  const operand_alternative *op_alt
 	    = &recog_op_alt[nalt * recog_data.n_operands];
+	  int this_reject = 0;
 	  for (nop = 0; nop < recog_data.n_operands; nop++)
 	    {
 	      int c, len;
 
+	      this_reject += op_alt[nop].reject;
+
 	      rtx op = recog_data.operand[nop];
 	      p = op_alt[nop].constraint;
 	      if (*p == 0 || *p == ',')
 		continue;
-	      
+
+	      bool win_p = false;
 	      do
 		switch (c = *p, len = CONSTRAINT_LEN (c, p), c)
 		  {
@@ -1847,7 +1854,14 @@  ira_setup_alts (rtx_insn *insn)
 
 		  case '0':  case '1':  case '2':  case '3':  case '4':
 		  case '5':  case '6':  case '7':  case '8':  case '9':
-		    goto op_success;
+		    {
+		      rtx other = recog_data.operand[c - '0'];
+		      if (MEM_P (other)
+			  ? rtx_equal_p (other, op)
+			  : REG_P (op) || SUBREG_P (op))
+			goto op_success;
+		      win_p = true;
+		    }
 		    break;
 		    
 		  case 'g':
@@ -1861,7 +1875,11 @@  ira_setup_alts (rtx_insn *insn)
 			{
 			case CT_REGISTER:
 			  if (reg_class_for_constraint (cn) != NO_REGS)
-			    goto op_success;
+			    {
+			      if (REG_P (op) || SUBREG_P (op))
+				goto op_success;
+			      win_p = true;
+			    }
 			  break;
 
 			case CT_CONST_INT:
@@ -1872,9 +1890,14 @@  ira_setup_alts (rtx_insn *insn)
 			  break;
 
 			case CT_ADDRESS:
+			  goto op_success;
+
 			case CT_MEMORY:
 			case CT_SPECIAL_MEMORY:
-			  goto op_success;
+			  if (MEM_P (op))
+			    goto op_success;
+			  win_p = true;
+			  break;
 
 			case CT_FIXED_FORM:
 			  if (constraint_satisfied_p (op, cn))
@@ -1885,12 +1908,22 @@  ira_setup_alts (rtx_insn *insn)
 		    }
 		  }
 	      while (p += len, c);
-	      break;
+	      if (!win_p)
+		break;
+	      /* We can make the alternative match by spilling a register
+		 to memory or loading something into a register.  Count a
+		 cost of one reload (the equivalent of the '?' constraint).  */
+	      this_reject += 6;
 	    op_success:
 	      ;
 	    }
+
 	  if (nop >= recog_data.n_operands)
-	    alts |= ALTERNATIVE_BIT (nalt);
+	    {
+	      alts |= ALTERNATIVE_BIT (nalt);
+	      if (this_reject == 0)
+		exact_alts |= ALTERNATIVE_BIT (nalt);
+	    }
 	}
       if (commutative < 0)
 	break;
@@ -1900,13 +1933,13 @@  ira_setup_alts (rtx_insn *insn)
       if (curr_swapped)
 	break;
     }
-  return alts;
+  return exact_alts ? exact_alts : alts;
 }
 
 /* Return the number of the output non-early clobber operand which
    should be the same in any case as operand with number OP_NUM (or
-   negative value if there is no such operand).  The function takes
-   only really possible alternatives into consideration.  */
+   negative value if there is no such operand).  ALTS is the mask
+   of alternatives that we should consider.  */
 int
 ira_get_dup_out_num (int op_num, alternative_mask alts)
 {