diff mbox series

[RS6000] rs6000_rtx_costs reduce cost for SETs

Message ID 20200915011946.3395-8-amodra@gmail.com
State New
Headers show
Series [RS6000] rs6000_rtx_costs reduce cost for SETs | expand

Commit Message

Alan Modra Sept. 15, 2020, 1:19 a.m. UTC
Also use rs6000_cost only for speed.

	* config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost for SETs
	when insn operation cost handled on recursive call.  Only use
	rs6000_cost for speed.  Tidy break/return.  Tidy AND costing.

Comments

Segher Boessenkool Sept. 17, 2020, 5:51 p.m. UTC | #1
Hi!

On Tue, Sep 15, 2020 at 10:49:45AM +0930, Alan Modra wrote:
> Also use rs6000_cost only for speed.

More directly: use something completely different for !speed, namely,
code size.

> -      if (CONST_INT_P (XEXP (x, 1))
> -	  && satisfies_constraint_I (XEXP (x, 1)))
> +      if (!speed)
> +	/* A little more than one insn so that nothing is tempted to
> +	   turn a shift left into a multiply.  */
> +	*total = COSTS_N_INSNS (1) + 1;

Please don't.  We have a lot of code elsewhere to handle this directly,
already.  Also, this is just wrong for size.  Five shifts is *not*
better than four muls.  If that is the only way to get good results,
than unfortunately we probably have to; but do not do this without any
proof.

>      case FMA:
> -      if (mode == SFmode)
> +      if (!speed)
> +	*total = COSTS_N_INSNS (1) + 1;

Not here, either.

>      case DIV:
>      case MOD:
>        if (FLOAT_MODE_P (mode))
>  	{
> -	  *total = mode == DFmode ? rs6000_cost->ddiv
> -				  : rs6000_cost->sdiv;
> +	  if (!speed)
> +	    *total = COSTS_N_INSNS (1) + 2;

And why + 2 even?

> -	  if (GET_MODE (XEXP (x, 1)) == DImode)
> +	  if (!speed)
> +	    *total = COSTS_N_INSNS (1) + 2;
> +	  else if (GET_MODE (XEXP (x, 1)) == DImode)
>  	    *total = rs6000_cost->divdi;
>  	  else
>  	    *total = rs6000_cost->divsi;

(more)

> @@ -21368,6 +21378,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
>        return false;
>  
>      case AND:
> +      *total = COSTS_N_INSNS (1);
>        right = XEXP (x, 1);
>        if (CONST_INT_P (right))
>  	{
> @@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
>  	       || left_code == LSHIFTRT)
>  	      && rs6000_is_valid_shift_mask (right, left, mode))
>  	    {
> -	      *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> -	      if (!CONST_INT_P (XEXP (left, 1)))
> -		*total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
> -	      *total += COSTS_N_INSNS (1);
> +	      rtx reg_op = XEXP (left, 0);
> +	      if (!REG_P (reg_op))
> +		*total += rtx_cost (reg_op, mode, left_code, 0, speed);
> +	      reg_op = XEXP (left, 1);
> +	      if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
> +		*total += rtx_cost (reg_op, mode, left_code, 1, speed);
>  	      return true;
>  	    }
>  	}
> -
> -      *total = COSTS_N_INSNS (1);
>        return false;

This doesn't improve anything?  It just makes it different from all
surrounding code?

> @@ -21519,7 +21530,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
>        if (outer_code == TRUNCATE
>  	  && GET_CODE (XEXP (x, 0)) == MULT)
>  	{
> -	  if (mode == DImode)
> +	  if (!speed)
> +	    *total = COSTS_N_INSNS (1) + 1;

(more)

> +    case SET:
> +      /* The default cost of a SET is the number of general purpose
> +	 regs being set multiplied by COSTS_N_INSNS (1).  That only
> +	 works where the incremental cost of the operation and
> +	 operands is zero, when the operation performed can be done in
> +	 one instruction.  For other cases where we add COSTS_N_INSNS
> +	 for some operation (see point 5 above), COSTS_N_INSNS (1)
> +	 should be subtracted from the total cost.  */

What does "incremental cost" mean there?  If what increases?

> +      {
> +	rtx_code src_code = GET_CODE (SET_SRC (x));
> +	if (src_code == CONST_INT
> +	    || src_code == CONST_DOUBLE
> +	    || src_code == CONST_WIDE_INT)
> +	  return false;
> +	int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
> +			+ rtx_cost (SET_DEST (x), mode, SET, 0, speed));

This should use set_src_cost, if anything.  But that will recurse then,
currently?  Ugh.

Using rtx_cost for SET_DEST is problematic, too.

What (if anything!) calls this for a SET?  Oh, set_rtx_cost still does
that, hrm.

rtx_cost costs RTL *expressions*.  Not instructions.  Except where it is
(ab)used for that, sigh.

Many targets have something for it already, but all quite different from
this.

> +	if (set_cost >= COSTS_N_INSNS (1))
> +	  *total += set_cost - COSTS_N_INSNS (1);

I don't understand this part at all, for example.  Why not just
  *total += set_cost - COSTS_N_INSNS (1);
?  If set_cost is lower than one insn's cost, don't we have a problem
already?


Generic things.  Please split this patch up when sending it again, it
does too many different things, and many of those are not obvious.

All such changes that aren't completely obvious (like the previous ones
were) should have some measurement.  We are in stage1, and we will
notice (non-trivial) degradations, but if we can expect degradations
(like for this patch), it needs benchmarking.

Since you add !speed all over the place, maybe we should just have a
separate function that does !speed?  It looks like quite a few things
will simplify.


Segher
Alan Modra Sept. 18, 2020, 3:38 a.m. UTC | #2
On Thu, Sep 17, 2020 at 12:51:25PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Sep 15, 2020 at 10:49:45AM +0930, Alan Modra wrote:
> > Also use rs6000_cost only for speed.
> 
> More directly: use something completely different for !speed, namely,
> code size.

Yes, that might be better.

> > -      if (CONST_INT_P (XEXP (x, 1))
> > -	  && satisfies_constraint_I (XEXP (x, 1)))
> > +      if (!speed)
> > +	/* A little more than one insn so that nothing is tempted to
> > +	   turn a shift left into a multiply.  */
> > +	*total = COSTS_N_INSNS (1) + 1;
> 
> Please don't.  We have a lot of code elsewhere to handle this directly,
> already.  Also, this is just wrong for size.  Five shifts is *not*
> better than four muls.  If that is the only way to get good results,
> than unfortunately we probably have to; but do not do this without any
> proof.

Huh.  If a cost of 5 is "just wrong for size" then you prefer a cost
of 12 for example (power9 mulsi or muldi rs6000_cost)?  Noticing that
result for !speed rs6000_rtx_costs is the entire basis for the !speed
changes.  I don't have any proof that this is correct.

> >      case FMA:
> > -      if (mode == SFmode)
> > +      if (!speed)
> > +	*total = COSTS_N_INSNS (1) + 1;
> 
> Not here, either.
> 
> >      case DIV:
> >      case MOD:
> >        if (FLOAT_MODE_P (mode))
> >  	{
> > -	  *total = mode == DFmode ? rs6000_cost->ddiv
> > -				  : rs6000_cost->sdiv;
> > +	  if (!speed)
> > +	    *total = COSTS_N_INSNS (1) + 2;
> 
> And why + 2 even?
> 
> > -	  if (GET_MODE (XEXP (x, 1)) == DImode)
> > +	  if (!speed)
> > +	    *total = COSTS_N_INSNS (1) + 2;
> > +	  else if (GET_MODE (XEXP (x, 1)) == DImode)
> >  	    *total = rs6000_cost->divdi;
> >  	  else
> >  	    *total = rs6000_cost->divsi;
> 
> (more)

OK, I can remove all the !speed changes.  To be honest, I didn't look
anywhere near as much at code size changes as I worried about
performance.  And about not regressing any fiddly testcase we have.

> > @@ -21368,6 +21378,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
> >        return false;
> >  
> >      case AND:
> > +      *total = COSTS_N_INSNS (1);
> >        right = XEXP (x, 1);
> >        if (CONST_INT_P (right))
> >  	{
> > @@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
> >  	       || left_code == LSHIFTRT)
> >  	      && rs6000_is_valid_shift_mask (right, left, mode))
> >  	    {
> > -	      *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> > -	      if (!CONST_INT_P (XEXP (left, 1)))
> > -		*total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
> > -	      *total += COSTS_N_INSNS (1);
> > +	      rtx reg_op = XEXP (left, 0);
> > +	      if (!REG_P (reg_op))
> > +		*total += rtx_cost (reg_op, mode, left_code, 0, speed);
> > +	      reg_op = XEXP (left, 1);
> > +	      if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
> > +		*total += rtx_cost (reg_op, mode, left_code, 1, speed);
> >  	      return true;
> >  	    }
> >  	}
> > -
> > -      *total = COSTS_N_INSNS (1);
> >        return false;
> 
> This doesn't improve anything?  It just makes it different from all
> surrounding code?

So it moves the common COSTS_N_INSNS (1) count and doesn't recurse for
regs, like it doesn't for const_int.

> > @@ -21519,7 +21530,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
> >        if (outer_code == TRUNCATE
> >  	  && GET_CODE (XEXP (x, 0)) == MULT)
> >  	{
> > -	  if (mode == DImode)
> > +	  if (!speed)
> > +	    *total = COSTS_N_INSNS (1) + 1;
> 
> (more)
> 
> > +    case SET:
> > +      /* The default cost of a SET is the number of general purpose
> > +	 regs being set multiplied by COSTS_N_INSNS (1).  That only
> > +	 works where the incremental cost of the operation and
> > +	 operands is zero, when the operation performed can be done in
> > +	 one instruction.  For other cases where we add COSTS_N_INSNS
> > +	 for some operation (see point 5 above), COSTS_N_INSNS (1)
> > +	 should be subtracted from the total cost.  */
> 
> What does "incremental cost" mean there?  If what increases?
> 
> > +      {
> > +	rtx_code src_code = GET_CODE (SET_SRC (x));
> > +	if (src_code == CONST_INT
> > +	    || src_code == CONST_DOUBLE
> > +	    || src_code == CONST_WIDE_INT)
> > +	  return false;
> > +	int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
> > +			+ rtx_cost (SET_DEST (x), mode, SET, 0, speed));
> 
> This should use set_src_cost, if anything.  But that will recurse then,
> currently?  Ugh.
> 
> Using rtx_cost for SET_DEST is problematic, too.
> 
> What (if anything!) calls this for a SET?  Oh, set_rtx_cost still does
> that, hrm.
> 
> rtx_cost costs RTL *expressions*.  Not instructions.  Except where it is
> (ab)used for that, sigh.
> 
> Many targets have something for it already, but all quite different from
> this.

Right, you are starting to understand just how difficult it is to do
anything at all to rs6000_rtx_costs.

> > +	if (set_cost >= COSTS_N_INSNS (1))
> > +	  *total += set_cost - COSTS_N_INSNS (1);
> 
> I don't understand this part at all, for example.  Why not just
>   *total += set_cost - COSTS_N_INSNS (1);
> ?  If set_cost is lower than one insn's cost, don't we have a problem
> already?

The set_cost I calculate here from src and dest can easily be zero.
(set (reg) (reg)) and (set (reg) (const_int 0)) for example have a
dest cost of zero and a src cost of zero.  That can't change without
breaking places where rtx_costs is called to compare pieces of RTL.
Here though we happen to be looking at a SET, so have an entire
instruction.  The value returned should be comparable to our
instruction costs.  That's tricky to do, and this change is just a
hack.  Without the hack I saw some testcases regress.

I don't like this hack any more than you do reviewing it!

> 
> Generic things.  Please split this patch up when sending it again, it
> does too many different things, and many of those are not obvious.
> 
> All such changes that aren't completely obvious (like the previous ones
> were) should have some measurement.  We are in stage1, and we will
> notice (non-trivial) degradations, but if we can expect degradations
> (like for this patch), it needs benchmarking.

Pat did benchmark these changes..  I was somewhat surprised to see
a small improvement in spec results.

> Since you add !speed all over the place, maybe we should just have a
> separate function that does !speed?  It looks like quite a few things
> will simplify.

Revised patch as follows.

	* config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost for SETs
	when insn operation cost handled on recursive call.  Tidy
	break/return.  Tidy AND costing.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6af8a9a31cb..26c2f443502 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21397,7 +21397,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	*total = rs6000_cost->fp;
       else
 	*total = rs6000_cost->dmul;
-      break;
+      return false;
 
     case DIV:
     case MOD:
@@ -21457,6 +21457,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
       return false;
 
     case AND:
+      *total = COSTS_N_INSNS (1);
       right = XEXP (x, 1);
       if (CONST_INT_P (right))
 	{
@@ -21469,15 +21470,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	       || left_code == LSHIFTRT)
 	      && rs6000_is_valid_shift_mask (right, left, mode))
 	    {
-	      *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
-	      if (!CONST_INT_P (XEXP (left, 1)))
-		*total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
-	      *total += COSTS_N_INSNS (1);
+	      rtx reg_op = XEXP (left, 0);
+	      if (!REG_P (reg_op))
+		*total += rtx_cost (reg_op, mode, left_code, 0, speed);
+	      reg_op = XEXP (left, 1);
+	      if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
+		*total += rtx_cost (reg_op, mode, left_code, 1, speed);
 	      return true;
 	    }
 	}
-
-      *total = COSTS_N_INSNS (1);
       return false;
 
     case IOR:
@@ -21575,7 +21576,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	  *total = rs6000_cost->fp;
 	  return false;
 	}
-      break;
+      return false;
 
     case NE:
     case EQ:
@@ -21613,13 +21614,40 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	  *total = 0;
 	  return true;
 	}
-      break;
+      return false;
+
+    case SET:
+      /* The default cost of a SET is the number of general purpose
+	 regs being set multiplied by COSTS_N_INSNS (1).  Here we
+	 happen to be looking at a SET, so have an instruction rather
+	 than just a piece of RTL and want to return a cost comparable
+	 to rs6000 instruction costing.  That's a little complicated
+	 because in some cases the cost of SET operands is non-zero,
+	 see point 5 above and cost of PLUS for example, and in
+	 others it is zero, for example for (set (reg) (reg)).
+	 But (set (reg) (reg)) actually costs the same as 
+	 (set (reg) (plus (reg) (reg))).  Hack around this by
+	 subtracting COSTS_N_INSNS (1) from the operand cost in cases
+	 were we add COSTS_N_INSNS (1) for some operation.  Don't do
+	 so for constants that might cost more than zero because they
+	 don't fit in one instruction.  FIXME: rtx_costs should not be
+	 looking at entire instructions.  */
+      {
+	rtx_code src_code = GET_CODE (SET_SRC (x));
+	if (src_code == CONST_INT
+	    || src_code == CONST_DOUBLE
+	    || src_code == CONST_WIDE_INT)
+	  return false;
+	int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
+			+ rtx_cost (SET_DEST (x), mode, SET, 0, speed));
+	if (set_cost >= COSTS_N_INSNS (1))
+	  *total += set_cost - COSTS_N_INSNS (1);
+	return true;
+      }
 
     default:
-      break;
+      return false;
     }
-
-  return false;
 }
 
 /* Debug form of r6000_rtx_costs that is selected if -mdebug=cost.  */
Segher Boessenkool Sept. 18, 2020, 6:13 p.m. UTC | #3
On Fri, Sep 18, 2020 at 01:08:42PM +0930, Alan Modra wrote:
> On Thu, Sep 17, 2020 at 12:51:25PM -0500, Segher Boessenkool wrote:
> > > -      if (CONST_INT_P (XEXP (x, 1))
> > > -	  && satisfies_constraint_I (XEXP (x, 1)))
> > > +      if (!speed)
> > > +	/* A little more than one insn so that nothing is tempted to
> > > +	   turn a shift left into a multiply.  */
> > > +	*total = COSTS_N_INSNS (1) + 1;
> > 
> > Please don't.  We have a lot of code elsewhere to handle this directly,
> > already.  Also, this is just wrong for size.  Five shifts is *not*
> > better than four muls.  If that is the only way to get good results,
> > than unfortunately we probably have to; but do not do this without any
> > proof.
> 
> Huh.  If a cost of 5 is "just wrong for size" then you prefer a cost
> of 12 for example (power9 mulsi or muldi rs6000_cost)?  Noticing that
> result for !speed rs6000_rtx_costs is the entire basis for the !speed
> changes.  I don't have any proof that this is correct.

No, just 4, like all other insns -- it is the size of the insn, after
all!  Not accidentally using the speed cost is a fine change of course,
but the + 1 isn't based on anything afaics, so it can only hurt.

> > > -	  if (GET_MODE (XEXP (x, 1)) == DImode)
> > > +	  if (!speed)
> > > +	    *total = COSTS_N_INSNS (1) + 2;
> > > +	  else if (GET_MODE (XEXP (x, 1)) == DImode)
> > >  	    *total = rs6000_cost->divdi;
> > >  	  else
> > >  	    *total = rs6000_cost->divsi;
> > 
> > (more)
> 
> OK, I can remove all the !speed changes.

No, those are quite okay (as a separate patch though).  But you
shouldn't add random numbers to it, one insn is just one insn.  The cost
function for -O2 is just code size, nothing more, nothing less.

> > >      case AND:
> > > +      *total = COSTS_N_INSNS (1);
> > >        right = XEXP (x, 1);
> > >        if (CONST_INT_P (right))
> > >  	{
> > > @@ -21380,15 +21391,15 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
> > >  	       || left_code == LSHIFTRT)
> > >  	      && rs6000_is_valid_shift_mask (right, left, mode))
> > >  	    {
> > > -	      *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
> > > -	      if (!CONST_INT_P (XEXP (left, 1)))
> > > -		*total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
> > > -	      *total += COSTS_N_INSNS (1);
> > > +	      rtx reg_op = XEXP (left, 0);
> > > +	      if (!REG_P (reg_op))
> > > +		*total += rtx_cost (reg_op, mode, left_code, 0, speed);
> > > +	      reg_op = XEXP (left, 1);
> > > +	      if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
> > > +		*total += rtx_cost (reg_op, mode, left_code, 1, speed);
> > >  	      return true;
> > >  	    }
> > >  	}
> > > -
> > > -      *total = COSTS_N_INSNS (1);
> > >        return false;
> > 
> > This doesn't improve anything?  It just makes it different from all
> > surrounding code?
> 
> So it moves the common COSTS_N_INSNS (1) count and doesn't recurse for
> regs, like it doesn't for const_int.

I meant that *total set only.  It is fine to have one extra source line.

Does not recursing for regs help anything?  Yes, for CONST_INT that is
questionable as well, but adding more stuff like it does not help
(without actual justification).

This routine is not hot at all.  Maybe decades ago it was, and back then
the thought was that micro-optimisations like this were useful (and
maybe they were, _back then_ :-) )

> > > +    case SET:
> > > +      /* The default cost of a SET is the number of general purpose
> > > +	 regs being set multiplied by COSTS_N_INSNS (1).  That only
> > > +	 works where the incremental cost of the operation and
> > > +	 operands is zero, when the operation performed can be done in
> > > +	 one instruction.  For other cases where we add COSTS_N_INSNS
> > > +	 for some operation (see point 5 above), COSTS_N_INSNS (1)
> > > +	 should be subtracted from the total cost.  */
> > 
> > What does "incremental cost" mean there?  If what increases?

Can you improve that comment please?

> > > +      {
> > > +	rtx_code src_code = GET_CODE (SET_SRC (x));
> > > +	if (src_code == CONST_INT
> > > +	    || src_code == CONST_DOUBLE
> > > +	    || src_code == CONST_WIDE_INT)
> > > +	  return false;
> > > +	int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
> > > +			+ rtx_cost (SET_DEST (x), mode, SET, 0, speed));
> > 
> > This should use set_src_cost, if anything.  But that will recurse then,
> > currently?  Ugh.
> > 
> > Using rtx_cost for SET_DEST is problematic, too.
> > 
> > What (if anything!) calls this for a SET?  Oh, set_rtx_cost still does
> > that, hrm.
> > 
> > rtx_cost costs RTL *expressions*.  Not instructions.  Except where it is
> > (ab)used for that, sigh.
> > 
> > Many targets have something for it already, but all quite different from
> > this.
> 
> Right, you are starting to understand just how difficult it is to do
> anything at all to rs6000_rtx_costs.

"Starting to understand", lol :-)  I created insn_cost back in 2017, for
this and may related problems: trying to derive the cost of an RTL
expression does not make sense at all in almost all places, what we want
to know is if the machine code is faster (or smaller for -Os).  That was
after many years of fighting this (for rs6000 and other targets; hardly
any of this is specific to our port).

Eventually we should be able to delete this completely.  That will be a
good day.

Currently mostly set_rtx_cost and set_src_cost are in the way of getting
there, and mostly because many ways those are used simply make no sense
at all.

Other than those, there is just a loooong tail of random stuff.


Improving this is fine of course: it will be quite a long time before it
isn't useful anymore, and longer until it isn't used anymore.


> > > +	if (set_cost >= COSTS_N_INSNS (1))
> > > +	  *total += set_cost - COSTS_N_INSNS (1);
> > 
> > I don't understand this part at all, for example.  Why not just
> >   *total += set_cost - COSTS_N_INSNS (1);
> > ?  If set_cost is lower than one insn's cost, don't we have a problem
> > already?
> 
> The set_cost I calculate here from src and dest can easily be zero.
> (set (reg) (reg)) and (set (reg) (const_int 0)) for example have a
> dest cost of zero and a src cost of zero.  That can't change without
> breaking places where rtx_costs is called to compare pieces of RTL.
> Here though we happen to be looking at a SET, so have an entire
> instruction.  The value returned should be comparable to our
> instruction costs.  That's tricky to do, and this change is just a
> hack.  Without the hack I saw some testcases regress.
> 
> I don't like this hack any more than you do reviewing it!

So maybe handle all SETs with a zero set_cost before the general case?
Isn't that what we used to do, and what the general code does?

> > Generic things.  Please split this patch up when sending it again, it
> > does too many different things, and many of those are not obvious.
> > 
> > All such changes that aren't completely obvious (like the previous ones
> > were) should have some measurement.  We are in stage1, and we will
> > notice (non-trivial) degradations, but if we can expect degradations
> > (like for this patch), it needs benchmarking.
> 
> Pat did benchmark these changes..  I was somewhat surprised to see
> a small improvement in spec results.

Thanks (to both of you).  Interesting!  Which of these unrelated changes
does this come from?


Segher
Alan Modra Sept. 21, 2020, 7:07 a.m. UTC | #4
On Fri, Sep 18, 2020 at 01:13:18PM -0500, Segher Boessenkool wrote:
> Thanks (to both of you).  Interesting!  Which of these unrelated changes
> does this come from?

Most of the changes I saw in code generation (not in spec, I didn't
look there, but in gcc) came down to this change to the cost for SETs,
and "rs6000_rtx_costs multi-insn constants".  I expect they were the
changes that made most difference to spec results, with this patch
likely resulting in more if-conversion.

So here is the patch again, this time without any distracting other
changes.  With a further revised comment.

	* config/rs6000/rs6000.c (rs6000_rtx_costs): Reduce cost of SET
	operands.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8969baa4dcf..2d770afd8fe 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21599,6 +21599,35 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	}
       break;
 
+    case SET:
+      /* On entry the value in *TOTAL is the number of general purpose
+	 regs being set, multiplied by COSTS_N_INSNS (1).  Handle
+	 costing of set operands specially since in most cases we have
+	 an instruction rather than just a piece of RTL and should
+	 return a cost comparable to insn_cost.  That's a little
+	 complicated because in some cases the cost of SET operands is
+	 non-zero, see point 5 above and cost of PLUS for example, and
+	 in others it is zero, for example for (set (reg) (reg)).
+	 But (set (reg) (reg)) has the same insn_cost as
+	 (set (reg) (plus (reg) (reg))).  Hack around this by
+	 subtracting COSTS_N_INSNS (1) from the operand cost in cases
+	 were we add at least COSTS_N_INSNS (1) for some operation.
+	 However, don't do so for constants.  Constants might cost
+	 more than zero when they require more than one instruction,
+	 and we do want the cost of extra instructions.  */
+      {
+	rtx_code src_code = GET_CODE (SET_SRC (x));
+	if (src_code == CONST_INT
+	    || src_code == CONST_DOUBLE
+	    || src_code == CONST_WIDE_INT)
+	  return false;
+	int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
+			+ rtx_cost (SET_DEST (x), mode, SET, 0, speed));
+	if (set_cost >= COSTS_N_INSNS (1))
+	  *total += set_cost - COSTS_N_INSNS (1);
+	return true;
+      }
+
     default:
       break;
     }
diff mbox series

Patch

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index fb5fe7969a3..86c90c4d756 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21277,15 +21277,19 @@  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 
     case PLUS:
     case MINUS:
-      if (FLOAT_MODE_P (mode))
+      if (speed && FLOAT_MODE_P (mode))
 	*total = rs6000_cost->fp;
       else
 	*total = COSTS_N_INSNS (1);
       return false;
 
     case MULT:
-      if (CONST_INT_P (XEXP (x, 1))
-	  && satisfies_constraint_I (XEXP (x, 1)))
+      if (!speed)
+	/* A little more than one insn so that nothing is tempted to
+	   turn a shift left into a multiply.  */
+	*total = COSTS_N_INSNS (1) + 1;
+      else if (CONST_INT_P (XEXP (x, 1))
+	       && satisfies_constraint_I (XEXP (x, 1)))
 	{
 	  if (INTVAL (XEXP (x, 1)) >= -256
 	      && INTVAL (XEXP (x, 1)) <= 255)
@@ -21304,18 +21308,22 @@  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
       return false;
 
     case FMA:
-      if (mode == SFmode)
+      if (!speed)
+	*total = COSTS_N_INSNS (1) + 1;
+      else if (mode == SFmode)
 	*total = rs6000_cost->fp;
       else
 	*total = rs6000_cost->dmul;
-      break;
+      return false;
 
     case DIV:
     case MOD:
       if (FLOAT_MODE_P (mode))
 	{
-	  *total = mode == DFmode ? rs6000_cost->ddiv
-				  : rs6000_cost->sdiv;
+	  if (!speed)
+	    *total = COSTS_N_INSNS (1) + 2;
+	  else
+	    *total = mode == DFmode ? rs6000_cost->ddiv : rs6000_cost->sdiv;
 	  return false;
 	}
       /* FALLTHRU */
@@ -21334,7 +21342,9 @@  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	}
       else
 	{
-	  if (GET_MODE (XEXP (x, 1)) == DImode)
+	  if (!speed)
+	    *total = COSTS_N_INSNS (1) + 2;
+	  else if (GET_MODE (XEXP (x, 1)) == DImode)
 	    *total = rs6000_cost->divdi;
 	  else
 	    *total = rs6000_cost->divsi;
@@ -21368,6 +21378,7 @@  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
       return false;
 
     case AND:
+      *total = COSTS_N_INSNS (1);
       right = XEXP (x, 1);
       if (CONST_INT_P (right))
 	{
@@ -21380,15 +21391,15 @@  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	       || left_code == LSHIFTRT)
 	      && rs6000_is_valid_shift_mask (right, left, mode))
 	    {
-	      *total = rtx_cost (XEXP (left, 0), mode, left_code, 0, speed);
-	      if (!CONST_INT_P (XEXP (left, 1)))
-		*total += rtx_cost (XEXP (left, 1), SImode, left_code, 1, speed);
-	      *total += COSTS_N_INSNS (1);
+	      rtx reg_op = XEXP (left, 0);
+	      if (!REG_P (reg_op))
+		*total += rtx_cost (reg_op, mode, left_code, 0, speed);
+	      reg_op = XEXP (left, 1);
+	      if (!REG_P (reg_op) && !CONST_INT_P (reg_op))
+		*total += rtx_cost (reg_op, mode, left_code, 1, speed);
 	      return true;
 	    }
 	}
-
-      *total = COSTS_N_INSNS (1);
       return false;
 
     case IOR:
@@ -21519,7 +21530,9 @@  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
       if (outer_code == TRUNCATE
 	  && GET_CODE (XEXP (x, 0)) == MULT)
 	{
-	  if (mode == DImode)
+	  if (!speed)
+	    *total = COSTS_N_INSNS (1) + 1;
+	  else if (mode == DImode)
 	    *total = rs6000_cost->muldi;
 	  else
 	    *total = rs6000_cost->mulsi;
@@ -21554,11 +21567,16 @@  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
     case FIX:
     case UNSIGNED_FIX:
     case FLOAT_TRUNCATE:
-      *total = rs6000_cost->fp;
+      if (!speed)
+	*total = COSTS_N_INSNS (1);
+      else
+	*total = rs6000_cost->fp;
       return false;
 
     case FLOAT_EXTEND:
-      if (mode == DFmode)
+      if (!speed)
+	*total = COSTS_N_INSNS (1);
+      else if (mode == DFmode)
 	*total = rs6000_cost->sfdf_convert;
       else
 	*total = rs6000_cost->fp;
@@ -21576,7 +21594,7 @@  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	  *total = rs6000_cost->fp;
 	  return false;
 	}
-      break;
+      return false;
 
     case NE:
     case EQ:
@@ -21614,13 +21632,32 @@  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	  *total = 0;
 	  return true;
 	}
-      break;
+      return false;
+
+    case SET:
+      /* The default cost of a SET is the number of general purpose
+	 regs being set multiplied by COSTS_N_INSNS (1).  That only
+	 works where the incremental cost of the operation and
+	 operands is zero, when the operation performed can be done in
+	 one instruction.  For other cases where we add COSTS_N_INSNS
+	 for some operation (see point 5 above), COSTS_N_INSNS (1)
+	 should be subtracted from the total cost.  */
+      {
+	rtx_code src_code = GET_CODE (SET_SRC (x));
+	if (src_code == CONST_INT
+	    || src_code == CONST_DOUBLE
+	    || src_code == CONST_WIDE_INT)
+	  return false;
+	int set_cost = (rtx_cost (SET_SRC (x), mode, SET, 1, speed)
+			+ rtx_cost (SET_DEST (x), mode, SET, 0, speed));
+	if (set_cost >= COSTS_N_INSNS (1))
+	  *total += set_cost - COSTS_N_INSNS (1);
+	return true;
+      }
 
     default:
-      break;
+      return false;
     }
-
-  return false;
 }
 
 /* Debug form of r6000_rtx_costs that is selected if -mdebug=cost.  */