Message ID | VI1PR0802MB2176E1CEFF7CE6F25CFF7A3DE7190@VI1PR0802MB2176.eurprd08.prod.outlook.com |
---|---|
State | New |
Headers | show |
On Tue, Apr 18, 2017 at 12:44 PM, Bin Cheng <Bin.Cheng@arm.com> wrote: > Hi, > This patch handles more cheap cases in function force_expr_to_var_cost, specifically, > TRUNC_DIV_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, RSHIFT_EXPR and BIT_NOT_EXPR. > > Is it OK? I wonder if using add_cost is good here. TRUNC_DIV by power of two better matches shift_cost, no, or div_pow2_cheap? Likewise for LSHIFT/RSHIFT. We do have [us]div_cost as well btw. And we have neg_cost. I mean if we're going all the way of using target based costs in force_expr_to_var_cost rather than estimate_num_insns then we should do as good as we can? Richard. > Thanks, > bin > 2017-04-11 Bin Cheng <bin.cheng@arm.com> > > * tree-ssa-loop-ivopts.c (force_expr_to_var_cost): Handle more > operators: TRUNC_DIV_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, RSHIFT_EXPR > and BIT_NOT_EXPR.
On Wed, Apr 26, 2017 at 1:58 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On Tue, Apr 18, 2017 at 12:44 PM, Bin Cheng <Bin.Cheng@arm.com> wrote: >> Hi, >> This patch handles more cheap cases in function force_expr_to_var_cost, specifically, >> TRUNC_DIV_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, RSHIFT_EXPR and BIT_NOT_EXPR. >> >> Is it OK? > > I wonder if using add_cost is good here. TRUNC_DIV by power of two > better matches > shift_cost, no, or div_pow2_cheap? Likewise for LSHIFT/RSHIFT. We do > have [us]div_cost as well btw. > And we have neg_cost. > > I mean if we're going all the way of using target based costs in > force_expr_to_var_cost rather > than estimate_num_insns then we should do as good as we can? I tend to believe it doesn't matter much. We can do experiments in the future. On the contrary, more generalization/abstraction might be good here so that we only need to measure cost by number of simple alu operations. And existing code uses add_cost rather than neg_cost for negate_expr. One argument for general cost is, in general, cost computed by force_expr_to_var_cost will be amortized by loop niter. As for induction related computation, add/sub/scale/mult are the most common operations anyway. Thanks, bin > > Richard. > > >> Thanks, >> bin >> 2017-04-11 Bin Cheng <bin.cheng@arm.com> >> >> * tree-ssa-loop-ivopts.c (force_expr_to_var_cost): Handle more >> operators: TRUNC_DIV_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, RSHIFT_EXPR >> and BIT_NOT_EXPR.
On 04/26/2017 06:58 AM, Richard Biener wrote: > On Tue, Apr 18, 2017 at 12:44 PM, Bin Cheng <Bin.Cheng@arm.com> > wrote: >> Hi, This patch handles more cheap cases in function >> force_expr_to_var_cost, specifically, TRUNC_DIV_EXPR, BIT_AND_EXPR, >> BIT_IOR_EXPR, RSHIFT_EXPR and BIT_NOT_EXPR. >> >> Is it OK? > > I wonder if using add_cost is good here. TRUNC_DIV by power of two > better matches shift_cost, no, or div_pow2_cheap? Likewise for > LSHIFT/RSHIFT. We do have [us]div_cost as well btw. And we have > neg_cost. In an ideal world, we'd have a canoncial form and just handle hte canonical form. But that hasn't ever really panned out for this kind of stuff in RTL -- the decision about what is the preferred form of an expression changes based on use context. I don't think these problems are as bad at the gimple level, but clearly they still exist. The more we query the target, the less predictable the compiler's behavior will be over time. It was a huge problem in RTL leading us to a point where it became exceedingly difficult to predict how a change in a pass would ultimately affect the performance across targets. That led to a guiding principle that we want to avoid querying the target in gimple as much as possible. We've relaxed that somewhat (we have to be pragmatic), but we need to be real careful here. So my recommendation would be to define a set of costs for gimple and get those as solid as we can given an "ideal" target. Only query the target for cases where it's critical. Jeff
On Thu, Apr 27, 2017 at 4:30 PM, Jeff Law <law@redhat.com> wrote: > On 04/26/2017 06:58 AM, Richard Biener wrote: >> >> On Tue, Apr 18, 2017 at 12:44 PM, Bin Cheng <Bin.Cheng@arm.com> >> wrote: >>> >>> Hi, This patch handles more cheap cases in function >>> force_expr_to_var_cost, specifically, TRUNC_DIV_EXPR, BIT_AND_EXPR, >>> BIT_IOR_EXPR, RSHIFT_EXPR and BIT_NOT_EXPR. >>> >>> Is it OK? >> >> >> I wonder if using add_cost is good here. TRUNC_DIV by power of two better >> matches shift_cost, no, or div_pow2_cheap? Likewise for >> LSHIFT/RSHIFT. We do have [us]div_cost as well btw. And we have >> neg_cost. > > In an ideal world, we'd have a canoncial form and just handle hte canonical > form. But that hasn't ever really panned out for this kind of stuff in RTL > -- the decision about what is the preferred form of an expression changes > based on use context. > > I don't think these problems are as bad at the gimple level, but clearly > they still exist. > > The more we query the target, the less predictable the compiler's behavior > will be over time. It was a huge problem in RTL leading us to a point > where it became exceedingly difficult to predict how a change in a pass > would ultimately affect the performance across targets. Yeah, this is what I had in mind, but not expressed as clear as this. The patch's intention is to differentiate between expensive (div) and cheap operations. Thanks, bin > > That led to a guiding principle that we want to avoid querying the target in > gimple as much as possible. We've relaxed that somewhat (we have to be > pragmatic), but we need to be real careful here. > > So my recommendation would be to define a set of costs for gimple and get > those as solid as we can given an "ideal" target. Only query the target for > cases where it's critical. > > > Jeff
On Thu, Apr 27, 2017 at 5:45 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Thu, Apr 27, 2017 at 4:30 PM, Jeff Law <law@redhat.com> wrote: >> On 04/26/2017 06:58 AM, Richard Biener wrote: >>> >>> On Tue, Apr 18, 2017 at 12:44 PM, Bin Cheng <Bin.Cheng@arm.com> >>> wrote: >>>> >>>> Hi, This patch handles more cheap cases in function >>>> force_expr_to_var_cost, specifically, TRUNC_DIV_EXPR, BIT_AND_EXPR, >>>> BIT_IOR_EXPR, RSHIFT_EXPR and BIT_NOT_EXPR. >>>> >>>> Is it OK? >>> >>> >>> I wonder if using add_cost is good here. TRUNC_DIV by power of two better >>> matches shift_cost, no, or div_pow2_cheap? Likewise for >>> LSHIFT/RSHIFT. We do have [us]div_cost as well btw. And we have >>> neg_cost. >> >> In an ideal world, we'd have a canoncial form and just handle hte canonical >> form. But that hasn't ever really panned out for this kind of stuff in RTL >> -- the decision about what is the preferred form of an expression changes >> based on use context. >> >> I don't think these problems are as bad at the gimple level, but clearly >> they still exist. >> >> The more we query the target, the less predictable the compiler's behavior >> will be over time. It was a huge problem in RTL leading us to a point >> where it became exceedingly difficult to predict how a change in a pass >> would ultimately affect the performance across targets. > Yeah, this is what I had in mind, but not expressed as clear as this. > The patch's intention is to differentiate between expensive (div) and > cheap operations. Ok. Let's go with the patch as-is then and if possible change the code to use estimate_num_insns instead (but I guess we may end up comparing the cost to the address costs we compute more exactly?) Richard. > Thanks, > bin >> >> That led to a guiding principle that we want to avoid querying the target in >> gimple as much as possible. We've relaxed that somewhat (we have to be >> pragmatic), but we need to be real careful here. >> >> So my recommendation would be to define a set of costs for gimple and get >> those as solid as we can given an "ideal" target. Only query the target for >> cases where it's critical. >> >> >> Jeff
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 6f64d71..c9cf9cf 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -4087,6 +4087,11 @@ force_expr_to_var_cost (tree expr, bool speed) case PLUS_EXPR: case MINUS_EXPR: case MULT_EXPR: + case TRUNC_DIV_EXPR: + case BIT_AND_EXPR: + case BIT_IOR_EXPR: + case LSHIFT_EXPR: + case RSHIFT_EXPR: op0 = TREE_OPERAND (expr, 0); op1 = TREE_OPERAND (expr, 1); STRIP_NOPS (op0); @@ -4095,6 +4100,7 @@ force_expr_to_var_cost (tree expr, bool speed) CASE_CONVERT: case NEGATE_EXPR: + case BIT_NOT_EXPR: op0 = TREE_OPERAND (expr, 0); STRIP_NOPS (op0); op1 = NULL_TREE; @@ -4163,6 +4169,23 @@ force_expr_to_var_cost (tree expr, bool speed) return comp_cost (target_spill_cost [speed], 0); break; + case TRUNC_DIV_EXPR: + /* Division by power of two is usually cheap, so we allow it. Forbid + anything else. */ + if (integer_pow2p (TREE_OPERAND (expr, 1))) + cost = comp_cost (add_cost (speed, mode), 0); + else + cost = comp_cost (target_spill_cost[speed], 0); + break; + + case BIT_AND_EXPR: + case BIT_IOR_EXPR: + case BIT_NOT_EXPR: + case LSHIFT_EXPR: + case RSHIFT_EXPR: + cost = comp_cost (add_cost (speed, mode), 0); + break; + default: gcc_unreachable (); }
Hi, This patch handles more cheap cases in function force_expr_to_var_cost, specifically, TRUNC_DIV_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, RSHIFT_EXPR and BIT_NOT_EXPR. Is it OK? Thanks, bin 2017-04-11 Bin Cheng <bin.cheng@arm.com> * tree-ssa-loop-ivopts.c (force_expr_to_var_cost): Handle more operators: TRUNC_DIV_EXPR, BIT_AND_EXPR, BIT_IOR_EXPR, RSHIFT_EXPR and BIT_NOT_EXPR. From 83045d32b974cb657e1d471c15f67a5b190f2534 Mon Sep 17 00:00:00 2001 From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com> Date: Fri, 17 Mar 2017 10:04:29 +0000 Subject: [PATCH 14/33] cheap-arith_op-in-force_expr-20170225.txt --- gcc/tree-ssa-loop-ivopts.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+)