Message ID | 834106856.39408357.1385171003989.JavaMail.root@redhat.com
---|---
State | New
On 11/22/13 18:43, Kai Tietz wrote:
>> So at least for f1 and f2, these regexps match regardless of whether or
>> not the code was transformed.
>>
>> Without your patch (f1):
>>
>>   x1_2 = (unsigned char) x_1(D);
>>   _3 = x1_2 << 5;
>>   return _3;
>>
>> With your patch (f1):
>>
>>   _4 = x_1(D) << 5;
>>   _2 = (unsigned char) _4;
>>   return _2;
>>
>> So all that's happened is we changed which object got casted -- we have
>> the same number of casts of an object to (unsigned char).
>
> That is actually desired.  We shouldn't arrive at patterns which have
> more type-casts after this patch.  What we see here is the normalization
> of shift-left/right operations.  This patch takes care that we prefer in
> general (Type) (X shift-left/right y) over ((Type) X) shift-left/right y
> notation.

Right, my point is the test didn't actually check for normalization; it just checks for a particular number/type of casts.

I think it's also worth pointing out, for the record, that this patch in and of itself really isn't an optimization.  It's setting a preferred form for shifts.  I won't call it canonicalization because we don't insist on this form, but it's the form we want to see.

Since it's just setting a preferred form and not an optimization, it seems to me we should expect it to sometimes generate better code and sometimes worse.

I'll take a closer look after I fix my arm regression :-)

jeff
On 11/22/13 18:43, Kai Tietz wrote:
> ----- Original Message -----
>
> That is actually desired.  We shouldn't arrive at patterns which have
> more type-casts after this patch.  What we see here is the normalization
> of shift-left/right operations.  This patch takes care that we prefer in
> general (Type) (X shift-left/right y) over ((Type) X) shift-left/right y
> notation.

What is the rationale behind one form over the other?  Or is it arbitrary?  I can easily make a case for either form based on what we're trying to optimize.  In general, it seems to me that....

If we're more interested in combining the shift with statements that define the shift's operands, then the former is going to be a better representation, because we're typecasting the result, and thus the typecast doesn't interfere with the desired optimization.

In contrast, if we're more interested in cases where the shift defines operands for some other statement, then the latter form is more beneficial, because we're typecasting the inputs, and thus the typecast doesn't interfere with the desired optimization.

Unfortunately, within a pass (or even a phase of a pass) we can have both cases.

Let's look at the code in tree-ssa-forwprop.c you're proposing we change.  The code in question loops over every statement and tries to combine the current statement with the statements which define its operands, then it moves on to the next statement in execution order.

So your patch seems to make perfect sense when STMT is a shift.  It moves the typecast out of the way of combining STMT with the statements which define its operands.

However, once we move to the next statement in the IL, the shift is now defining operands for later statements, and the other form would seem to be preferred.

So it almost seems you want to apply the transformation on STMT, try to combine into STMT, then apply the reverse transformation on STMT if it's safe to do so.

That kind of implies there really isn't a preferred global form for shifts with typecasts; instead, there is a form that is preferred based on the optimizations we're trying to perform at any given time.  It may also argue that type promotion/demotion in general shouldn't be separate passes.  I'm still pondering that.

Thoughts?

Jeff
On Tue, Nov 26, 2013 at 8:00 AM, Jeff Law <law@redhat.com> wrote:
> On 11/22/13 18:43, Kai Tietz wrote:
>>
>> ----- Original Message -----
>>
>> That is actually desired.  We shouldn't arrive at patterns which have
>> more type-casts after this patch.
>> What we see here is the normalization of shift-left/right operations.
>> This patch takes care that we prefer in general (Type) (X shift-left/right
>> y) over ((Type) X) shift-left/right y notation.
>
> What is the rationale behind one form over the other?  Or is it
> arbitrary?  I can easily make a case for either form based on what we're
> trying to optimize.  In general, it seems to me that....

When (Type) is a widening cast, then you have to be careful not to introduce undefined behavior when y is larger than the number of bits in X.  When (Type) is a shortening cast, then we lose information, because the range we can determine for y from the shift operation will be larger.

So I'm not sure I follow the reasoning to move out the casting when possible.  The goal for canonicalization should be to shorten operations where possible.  Thus both cases, sinking and hoisting the cast, should be applied depending on validity and on whether the shift will be performed in a smaller type in the end.

> If we're more interested in combining the shift with statements that
> define the shift's operands, then the former is going to be a better
> representation because we're typecasting the result and thus the typecast
> doesn't interfere with the desired optimization.
>
> In contrast if we are more interested in cases where the shift defines
> operands for some other statement, then the latter form is more beneficial
> because we're typecasting the inputs and thus the typecast doesn't
> interfere with the desired optimization.

Always a trade-off, which is why combining passes always have to consider casts.

> Unfortunately, within a pass (or even a phase of a pass) we can have both
> cases.
>
> Let's look at the code in tree-ssa-forwprop.c you're proposing we change.
>
> The code in question loops over every statement and tries to combine the
> current statement with the statements which define its operands, then it
> moves on to the next statement in execution order.
>
> So your patch seems to make perfect sense when STMT is a shift.  It moves
> the typecast out of the way of combining STMT with the statements which
> define its operands.
>
> However, once we move to the next statement in the IL, the shift is now
> defining operands for later statements and the other form would seem to
> be preferred.
>
> So it almost seems to me you want to apply the transformation on STMT,
> try to combine into the STMT, then apply the reverse transformation on
> STMT if it's safe to do so.
>
> That kind of implies there really isn't a preferred global form for
> shifts with typecasts, but instead there is a form that is preferred
> based on the optimizations we're trying to perform at any given time.
> It may also argue that type promotion/demotion in general shouldn't be
> separate passes.  I'm still pondering that.

;)  Not an easy problem.

> Thoughts?

I think the goal of "canonicalizing" operations to work on the smallest mode possible during early GIMPLE passes makes the most sense (even if then a cast is not always outside or inside an operation).

Later (currently during RTL expansion) we can widen operations again according to target needs.  Shorter operations simply carry more implicit information on value ranges (and are usually easier to vectorize, for example).

Richard.

> Jeff
On 11/26/13 02:58, Richard Biener wrote:
>> What is the rationale behind one form over the other?  Or is it
>> arbitrary?  I can easily make a case for either form based on what
>> we're trying to optimize.  In general, it seems to me that....
>
> When (Type) is a widening cast then you have to worry to not introduce
> undefined behavior when y is larger than the number of bits in X.
> When (Type) is a shortening cast then we lose information - because
> then the range we can determine for y from the shift operation will
> be larger.
>
> So I'm not sure I follow the reasoning to move out the casting when
> possible.

Assume that we're not introducing undefined behaviour.  Obviously if we're introducing undefined behaviour, then the transformation is wrong, plain and simple.

If by moving the cast we can eliminate an ALU operation, because the shift (in this example) combines with something earlier or later in the IL, then we win.  Similarly, if by moving the cast we expose redundant casting, then we win.

A narrower range is obviously good, but often just having a narrower range won't in and of itself provide an optimization opportunity.

> The goal for canonicalization should be to shorten operations where
> possible.  Thus both cases, sinking and hoisting the cast should
> be applied dependent on validity and whether the shift will be performed
> in a smaller type in the end.

I disagree; there are going to be times when sinking or hoisting the cast allows the shift to combine with other instructions in the IL.  There are times when the ability to combine with other instructions in the stream should be driving these decisions.

I would buy that as a guiding principle that applies in cases where we don't have other optimization opportunities we can expose by hoisting/sinking the type casts.  I would even buy that we prefer the narrowest range through some point in the compiler (say vrp2), then allow wider ranges as needed to facilitate other optimizations.

>> If we're more interested in combining the shift with statements that
>> define the shift's operands, then the former is going to be a better
>> representation because we're typecasting the result and thus the
>> typecast doesn't interfere with the desired optimization.
>>
>> In contrast if we are more interested in cases where the shift defines
>> operands for some other statement, then the latter form is more
>> beneficial because we're typecasting the inputs and thus the typecast
>> doesn't interfere with the desired optimization.
>
> Always a trade-off which is why combining passes always have to
> consider casts.

Actually, I would say quite the opposite here.  There's a very clear line for when we want to go from one representation to the other within the forwprop passes.  By doing so, we relieve the pattern-matching aspects of that pass from needing to concern themselves with the typecasting issues.

> I think the goal of "canonicalizing" operations to work on the smallest
> mode possible during early GIMPLE passes makes the most sense
> (even if then a cast is not always outside or inside an operation).
>
> Later (currently during RTL expansion) we can widen operations again
> according to target needs.

But what you end up missing here is optimization opportunities that expose themselves when the casts are moved from inputs to outputs or vice-versa.

> Shorter operations simply carry more implicit information on value ranges
> (and are usually easier to vectorize for example).

Yes.  There's no argument about that.

jeff
On Tue, Nov 26, 2013 at 10:42 PM, Jeff Law <law@redhat.com> wrote:
> On 11/26/13 02:58, Richard Biener wrote:
>>> What is the rationale behind one form over the other?  Or is it
>>> arbitrary?  I can easily make a case for either form based on what
>>> we're trying to optimize.  In general, it seems to me that....
>>
>> When (Type) is a widening cast then you have to worry to not introduce
>> undefined behavior when y is larger than the number of bits in X.
>> When (Type) is a shortening cast then we lose information - because
>> then the range we can determine for y from the shift operation will
>> be larger.
>>
>> So I'm not sure I follow the reasoning to move out the casting when
>> possible.
>
> Assume that we're not introducing undefined behaviour.  Obviously if
> we're introducing undefined behaviour, then the transformation is wrong,
> plain and simple.
>
> If by moving the cast we can eliminate an ALU operation because the shift
> (in this example) combines with something earlier or later in the IL,
> then we win.  Similarly if by moving the cast we expose redundant
> casting, then we win.
>
> A narrower range is obviously good, but often just having a narrower
> range won't in and of itself provide an optimization opportunity.
>
>> The goal for canonicalization should be to shorten operations where
>> possible.  Thus both cases, sinking and hoisting the cast should
>> be applied dependent on validity and whether the shift will be performed
>> in a smaller type in the end.
>
> I disagree, there are going to be times when sinking or hoisting the cast
> allows the shift to combine with other instructions in the IL.  There are
> times when the ability to combine with other instructions in the stream
> should be driving these decisions.
>
> I would buy that as a guiding principle that applies in cases where we
> don't have other optimization opportunities we can expose by
> hoisting/sinking the type casts.  I would even buy that we prefer the
> narrowest range through some point in the compiler (say vrp2), then we
> allow wider ranges as needed to facilitate other optimizations.
>
>>> If we're more interested in combining the shift with statements that
>>> define the shift's operands, then the former is going to be a better
>>> representation because we're typecasting the result and thus the
>>> typecast doesn't interfere with the desired optimization.
>>>
>>> In contrast if we are more interested in cases where the shift defines
>>> operands for some other statement, then the latter form is more
>>> beneficial because we're typecasting the inputs and thus the typecast
>>> doesn't interfere with the desired optimization.
>>
>> Always a trade-off which is why combining passes always have to
>> consider casts.
>
> Actually, I would say quite the opposite here.  There's a very clear line
> when we want to go from one representation to the other within the
> forwprop passes.  By doing so, we relieve the pattern matching aspects of
> that pass from needing to concern themselves with the typecasting issues.
>
>> I think the goal of "canonicalizing" operations to work on the smallest
>> mode possible during early GIMPLE passes makes the most sense
>> (even if then a cast is not always outside or inside an operation).
>>
>> Later (currently during RTL expansion) we can widen operations again
>> according to target needs.
>
> But what you end up missing here is optimization opportunities that
> expose themselves when the casts are moved from inputs to outputs or
> vice-versa.

Only if the "optimization opportunities" do not handle the casts themselves.  We for example have the get_unwidened () helper in fold that helps transforms look through some casts.

I think we want similar helpers for the GIMPLE-level combining: the combiner knows whether, for an operand with type T, it's ok to have a wider or narrower operand, or if it wants an exact T2 operand instead.  Helpers should exist that get it at that, if possible.  Currently the issue would be how to represent the result of "get it at that", but I'll work on removing the tree-building-and-re-gimplifying way of middle-end expression creation for the next stage1 so that we can easily represent multiple-stmt intermediate "expressions".  It'll be a (maybe wrapped in some C++ sugar, maybe then with non-GC allocation) gimple_seq.  So you do

  op = get_me_the_op_in_type_X (X, gimple_assign_rhs2 (stmt));
  if (!op)
    return false;

and now op is just a regular op you can work with, eventually pointing to a temporary sequence that you'd need to insert.

Note that forwprop's combining should really work like fold: combine from the expression leaves to the expression roots (it currently works by visiting in stmt order).

Now consider (T2)(T1)(a << 5) and ((T1)(a << 5)) >> 5.  The second form would prefer that we had "combined" the leaf (T1)(a << 5) to ((T1)a << 5), while the first would be pessimized by that.  So, what kind of form do you prefer?  I'd say you can't tell that from the use(s); you have to be able to decide without them, and thus cannot decide based on "may I better be able to combine this".  Instead the combining code needs to deal with both forms.

Richard.

>> Shorter operations simply carry more implicit information on value
>> ranges (and are usually easier to vectorize for example).
>
> Yes.  There's no argument about that.
>
> jeff
Index: gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c
===================================================================
--- /dev/null
+++ gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+char f1 (char x)
+{
+  unsigned char x1 = (unsigned char) x;
+  return (x1 << 5);
+}
+
+short f2 (unsigned int x)
+{
+  unsigned short x1 = (unsigned short) x;
+  return (x1 << 15);
+}
+
+unsigned int f3 (unsigned short x)
+{
+  unsigned long long x1 = (unsigned long long) x;
+  return (unsigned int) (x1 >> 15);
+}
+
+unsigned long long f4 (unsigned short x)
+{
+  unsigned int x1 = (unsigned int) x;
+  return (unsigned long long) (x1 >> 7);
+}
+
+/* { dg-final { scan-tree-dump-times "= \\\(long long unsigned int\\\)" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "= \\\(short int\\\)" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "= \\\(unsigned int\\\)" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "= \\\(unsigned char\\\)" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc-org/gcc/tree-ssa-forwprop.c
===================================================================
--- gcc-org.orig/gcc/tree-ssa-forwprop.c
+++ gcc-org/gcc/tree-ssa-forwprop.c
@@ -551,6 +551,107 @@ forward_propagate_into_gimple_cond (gimp
   return 0;
 }
 
+/* Try to normalize a shift-left/right statement from the form
+   ((TYPE1) X) CODE Y to (TYPE) (X CODE Y).  TYPE and X have to be
+   integral-kind types.
+   Returns non-zero if the stmt was changed.  */
+
+static int
+simplify_shift (enum tree_code code, gimple_stmt_iterator *gsi_p)
+{
+  gimple stmt = gsi_stmt (*gsi_p);
+  gimple def, newop;
+  tree op1, opx, op2, t_opx, tem;
+  int ret;
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
+    return 0;
+
+  op1 = gimple_assign_rhs1 (stmt);
+  if (TREE_CODE (op1) != SSA_NAME
+      || !(def = SSA_NAME_DEF_STMT (op1))
+      || !is_gimple_assign (def)
+      || !gimple_assign_cast_p (def)
+      || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op1)
+      || !has_single_use (op1))
+    return 0;
+
+  opx = gimple_assign_rhs1 (def);
+  if (TREE_CODE (opx) != SSA_NAME)
+    return 0;
+
+  t_opx = TREE_TYPE (opx);
+
+  if (!INTEGRAL_TYPE_P (t_opx))
+    return 0;
+
+  op2 = gimple_assign_rhs2 (stmt);
+
+  if (code == LSHIFT_EXPR)
+    {
+      if (TYPE_PRECISION (TREE_TYPE (op1))
+	  > TYPE_PRECISION (t_opx))
+	return 0;
+    }
+  else if (code == RSHIFT_EXPR)
+    {
+      /* If the types of OPX and OP1 are compatible, we don't need to
+	 check OP2 for validity, as we don't change the range of the
+	 operation.  Otherwise we need to check that the right-hand
+	 operand of the shift-right doesn't exceed the type-precision
+	 of the inner operand.  */
+      if (!useless_type_conversion_p (TREE_TYPE (op1), t_opx))
+	{
+	  if (TREE_CODE (op2) != INTEGER_CST)
+	    return 0;
+	  if (integer_zerop (fold_binary (LT_EXPR, boolean_type_node, op2,
+					  build_int_cst (TREE_TYPE (op2),
+							 TYPE_PRECISION (t_opx)))))
+	    return 0;
+
+	  /* See if the cast can be moved out of the shift-right
+	     operation.  */
+	  if (TYPE_PRECISION (TREE_TYPE (op1)) <= TYPE_PRECISION (t_opx)
+	      || (!TYPE_UNSIGNED (t_opx) && TYPE_UNSIGNED (TREE_TYPE (op1))))
+	    return 0;
+	}
+    }
+  else
+    return 0;
+
+  /* Do transformation ((T1) X) shift-code N -> (T1) (X shift-code N).  */
+  if (dump_file)
+    {
+      fprintf (dump_file, "  Replaced ");
+      print_gimple_stmt (dump_file, stmt, 0, 0);
+      fprintf (dump_file, " ");
+      print_gimple_stmt (dump_file, def, 0, 0);
+    }
+
+  tem = make_ssa_name (t_opx, NULL);
+  newop = gimple_build_assign_with_ops (code, tem, opx, op2);
+  gimple_set_location (newop, gimple_location (stmt));
+  gsi_insert_before (gsi_p, newop, GSI_SAME_STMT);
+  gimple_assign_set_rhs_with_ops_1 (gsi_p, NOP_EXPR, tem, NULL_TREE,
+				    NULL_TREE);
+
+  if (gimple_location (def) != UNKNOWN_LOCATION)
+    gimple_set_location (stmt, gimple_location (def));
+
+  stmt = gsi_stmt (*gsi_p);
+  update_stmt (stmt);
+
+  if (TREE_CODE (op1) == SSA_NAME)
+    ret = remove_prop_source_from_use (op1) ? 2 : 1;
+  else
+    ret = 1;
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "  by ");
+      print_gimple_stmt (dump_file, stmt, 0, 0);
+      fprintf (dump_file, " ");
+      print_gimple_stmt (dump_file, newop, 0, 0);
+    }
+
+  return ret;
+}
 
 /* Propagate from the ssa name definition statements of COND_EXPR
    in the rhs of statement STMT into the conditional if that simplifies it.
@@ -3432,6 +3533,15 @@ ssa_forward_propagate_and_combine (void)
 		    || code == NEGATE_EXPR)
 		   && TREE_CODE (rhs1) == SSA_NAME)
 	    changed = simplify_not_neg_expr (&gsi);
+	  else if (code == LSHIFT_EXPR
+		   || code == RSHIFT_EXPR)
+	    {
+	      int did_something;
+	      did_something = simplify_shift (code, &gsi);
+	      if (did_something == 2)
+		cfg_changed = true;
+	      changed = did_something != 0;
+	    }
 	  else if (code == COND_EXPR
 		   || code == VEC_COND_EXPR)
 	    {