diff mbox

[GCC8,14/33] Handle more cheap operations in force_expr_to_var_cost

Message ID VI1PR0802MB2176E1CEFF7CE6F25CFF7A3DE7190@VI1PR0802MB2176.eurprd08.prod.outlook.com
State New
Headers show

Commit Message

Bin Cheng April 18, 2017, 10:44 a.m. UTC
Hi,
This patch handles more cheap cases in function force_expr_to_var_cost, specifically,
TRUNC_DIV_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, RSHIFT_EXPR and BIT_NOT_EXPR.

Is it OK?

Thanks,
bin
2017-04-11  Bin Cheng  <bin.cheng@arm.com>

	* tree-ssa-loop-ivopts.c (force_expr_to_var_cost): Handle more
	operators: TRUNC_DIV_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, RSHIFT_EXPR
	and BIT_NOT_EXPR.
From 83045d32b974cb657e1d471c15f67a5b190f2534 Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Fri, 17 Mar 2017 10:04:29 +0000
Subject: [PATCH 14/33] cheap-arith_op-in-force_expr-20170225.txt

---
 gcc/tree-ssa-loop-ivopts.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Comments

Richard Biener April 26, 2017, 12:58 p.m. UTC | #1
On Tue, Apr 18, 2017 at 12:44 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> This patch handles more cheap cases in function force_expr_to_var_cost, specifically,
> TRUNC_DIV_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, RSHIFT_EXPR and BIT_NOT_EXPR.
>
> Is it OK?

I wonder if using add_cost is good here.  TRUNC_DIV by power of two
better matches
shift_cost, no, or div_pow2_cheap?  Likewise for LSHIFT/RSHIFT.  We do
have [us]div_cost as well btw.
And we have neg_cost.

I mean if we're going all the way of using target based costs in
force_expr_to_var_cost rather
than estimate_num_insns then we should do as good as we can?

Richard.


> Thanks,
> bin
> 2017-04-11  Bin Cheng  <bin.cheng@arm.com>
>
>         * tree-ssa-loop-ivopts.c (force_expr_to_var_cost): Handle more
>         operators: TRUNC_DIV_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, RSHIFT_EXPR
>         and BIT_NOT_EXPR.
Bin.Cheng April 26, 2017, 1:30 p.m. UTC | #2
On Wed, Apr 26, 2017 at 1:58 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Apr 18, 2017 at 12:44 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>> Hi,
>> This patch handles more cheap cases in function force_expr_to_var_cost, specifically,
>> TRUNC_DIV_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, RSHIFT_EXPR and BIT_NOT_EXPR.
>>
>> Is it OK?
>
> I wonder if using add_cost is good here.  TRUNC_DIV by power of two
> better matches
> shift_cost, no, or div_pow2_cheap?  Likewise for LSHIFT/RSHIFT.  We do
> have [us]div_cost as well btw.
> And we have neg_cost.
>
> I mean if we're going all the way of using target based costs in
> force_expr_to_var_cost rather
> than estimate_num_insns then we should do as good as we can?
I tend to believe it doesn't matter much.  We can do experiments in
the future.  On the contrary, more generalization/abstraction might be
good here so that we only need to measure cost by number of simple alu
operations.  And existing code uses add_cost rather than neg_cost for
negate_expr.
One argument for general cost is, in general, cost computed by
force_expr_to_var_cost will be amortized by loop niter.  As for
induction related computation, add/sub/scale/mult are the most common
operations anyway.

Thanks,
bin
>
> Richard.
>
>
>> Thanks,
>> bin
>> 2017-04-11  Bin Cheng  <bin.cheng@arm.com>
>>
>>         * tree-ssa-loop-ivopts.c (force_expr_to_var_cost): Handle more
>>         operators: TRUNC_DIV_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, RSHIFT_EXPR
>>         and BIT_NOT_EXPR.
Jeff Law April 27, 2017, 3:30 p.m. UTC | #3
On 04/26/2017 06:58 AM, Richard Biener wrote:
> On Tue, Apr 18, 2017 at 12:44 PM, Bin Cheng <Bin.Cheng@arm.com>
> wrote:
>> Hi, This patch handles more cheap cases in function
>> force_expr_to_var_cost, specifically, TRUNC_DIV_EXPR, BIT_AND_EXPR,
>> BIT_IOR_EXPR, RSHIFT_EXPR and BIT_NOT_EXPR.
>> 
>> Is it OK?
> 
> I wonder if using add_cost is good here.  TRUNC_DIV by power of two 
> better matches shift_cost, no, or div_pow2_cheap?  Likewise for
> LSHIFT/RSHIFT.  We do have [us]div_cost as well btw. And we have
> neg_cost.
In an ideal world, we'd have a canoncial form and just handle hte 
canonical form.  But that hasn't ever really panned out for this kind of 
stuff in RTL -- the decision about what is the preferred form of an 
expression changes based on use context.

I don't think these problems are as bad at the gimple level, but clearly 
they still exist.

The more we query the target, the less predictable the compiler's 
behavior will be over time.   It was a huge problem in RTL leading us to 
a point where it became exceedingly difficult to predict how a change in 
a pass would ultimately affect the performance across targets.

That led to a guiding principle that we want to avoid querying the 
target in gimple as much as possible.  We've relaxed that somewhat (we 
have to be pragmatic), but we need to be real careful here.

So my recommendation would be to define a set of costs for gimple and 
get those as solid as we can given an "ideal" target.  Only query the 
target for cases where it's critical.


Jeff
Bin.Cheng April 27, 2017, 3:45 p.m. UTC | #4
On Thu, Apr 27, 2017 at 4:30 PM, Jeff Law <law@redhat.com> wrote:
> On 04/26/2017 06:58 AM, Richard Biener wrote:
>>
>> On Tue, Apr 18, 2017 at 12:44 PM, Bin Cheng <Bin.Cheng@arm.com>
>> wrote:
>>>
>>> Hi, This patch handles more cheap cases in function
>>> force_expr_to_var_cost, specifically, TRUNC_DIV_EXPR, BIT_AND_EXPR,
>>> BIT_IOR_EXPR, RSHIFT_EXPR and BIT_NOT_EXPR.
>>>
>>> Is it OK?
>>
>>
>> I wonder if using add_cost is good here.  TRUNC_DIV by power of two better
>> matches shift_cost, no, or div_pow2_cheap?  Likewise for
>> LSHIFT/RSHIFT.  We do have [us]div_cost as well btw. And we have
>> neg_cost.
>
> In an ideal world, we'd have a canoncial form and just handle hte canonical
> form.  But that hasn't ever really panned out for this kind of stuff in RTL
> -- the decision about what is the preferred form of an expression changes
> based on use context.
>
> I don't think these problems are as bad at the gimple level, but clearly
> they still exist.
>
> The more we query the target, the less predictable the compiler's behavior
> will be over time.   It was a huge problem in RTL leading us to a point
> where it became exceedingly difficult to predict how a change in a pass
> would ultimately affect the performance across targets.
Yeah, this is what I had in mind, but not expressed as clear as this.
The patch's intention is to differentiate between expensive (div) and
cheap operations.

Thanks,
bin
>
> That led to a guiding principle that we want to avoid querying the target in
> gimple as much as possible.  We've relaxed that somewhat (we have to be
> pragmatic), but we need to be real careful here.
>
> So my recommendation would be to define a set of costs for gimple and get
> those as solid as we can given an "ideal" target.  Only query the target for
> cases where it's critical.
>
>
> Jeff
Richard Biener May 2, 2017, 1:44 p.m. UTC | #5
On Thu, Apr 27, 2017 at 5:45 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, Apr 27, 2017 at 4:30 PM, Jeff Law <law@redhat.com> wrote:
>> On 04/26/2017 06:58 AM, Richard Biener wrote:
>>>
>>> On Tue, Apr 18, 2017 at 12:44 PM, Bin Cheng <Bin.Cheng@arm.com>
>>> wrote:
>>>>
>>>> Hi, This patch handles more cheap cases in function
>>>> force_expr_to_var_cost, specifically, TRUNC_DIV_EXPR, BIT_AND_EXPR,
>>>> BIT_IOR_EXPR, RSHIFT_EXPR and BIT_NOT_EXPR.
>>>>
>>>> Is it OK?
>>>
>>>
>>> I wonder if using add_cost is good here.  TRUNC_DIV by power of two better
>>> matches shift_cost, no, or div_pow2_cheap?  Likewise for
>>> LSHIFT/RSHIFT.  We do have [us]div_cost as well btw. And we have
>>> neg_cost.
>>
>> In an ideal world, we'd have a canoncial form and just handle hte canonical
>> form.  But that hasn't ever really panned out for this kind of stuff in RTL
>> -- the decision about what is the preferred form of an expression changes
>> based on use context.
>>
>> I don't think these problems are as bad at the gimple level, but clearly
>> they still exist.
>>
>> The more we query the target, the less predictable the compiler's behavior
>> will be over time.   It was a huge problem in RTL leading us to a point
>> where it became exceedingly difficult to predict how a change in a pass
>> would ultimately affect the performance across targets.
> Yeah, this is what I had in mind, but not expressed as clear as this.
> The patch's intention is to differentiate between expensive (div) and
> cheap operations.

Ok.  Let's go with the patch as-is then and if possible change the code
to use estimate_num_insns instead (but I guess we may end up comparing
the cost to the address costs we compute more exactly?)

Richard.

> Thanks,
> bin
>>
>> That led to a guiding principle that we want to avoid querying the target in
>> gimple as much as possible.  We've relaxed that somewhat (we have to be
>> pragmatic), but we need to be real careful here.
>>
>> So my recommendation would be to define a set of costs for gimple and get
>> those as solid as we can given an "ideal" target.  Only query the target for
>> cases where it's critical.
>>
>>
>> Jeff
diff mbox

Patch

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 6f64d71..c9cf9cf 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -4087,6 +4087,11 @@  force_expr_to_var_cost (tree expr, bool speed)
     case PLUS_EXPR:
     case MINUS_EXPR:
     case MULT_EXPR:
+    case TRUNC_DIV_EXPR:
+    case BIT_AND_EXPR:
+    case BIT_IOR_EXPR:
+    case LSHIFT_EXPR:
+    case RSHIFT_EXPR:
       op0 = TREE_OPERAND (expr, 0);
       op1 = TREE_OPERAND (expr, 1);
       STRIP_NOPS (op0);
@@ -4095,6 +4100,7 @@  force_expr_to_var_cost (tree expr, bool speed)
 
     CASE_CONVERT:
     case NEGATE_EXPR:
+    case BIT_NOT_EXPR:
       op0 = TREE_OPERAND (expr, 0);
       STRIP_NOPS (op0);
       op1 = NULL_TREE;
@@ -4163,6 +4169,23 @@  force_expr_to_var_cost (tree expr, bool speed)
 	return comp_cost (target_spill_cost [speed], 0);
       break;
 
+    case TRUNC_DIV_EXPR:
+      /* Division by power of two is usually cheap, so we allow it.  Forbid
+	 anything else.  */
+      if (integer_pow2p (TREE_OPERAND (expr, 1)))
+	cost = comp_cost (add_cost (speed, mode), 0);
+      else
+	cost = comp_cost (target_spill_cost[speed], 0);
+      break;
+
+    case BIT_AND_EXPR:
+    case BIT_IOR_EXPR:
+    case BIT_NOT_EXPR:
+    case LSHIFT_EXPR:
+    case RSHIFT_EXPR:
+      cost = comp_cost (add_cost (speed, mode), 0);
+      break;
+
     default:
       gcc_unreachable ();
     }