
[tree-ssa-forwprop] : Add type raising in shift-operations

Message ID 834106856.39408357.1385171003989.JavaMail.root@redhat.com
State New

Commit Message

Kai Tietz Nov. 23, 2013, 1:43 a.m. UTC
----- Original Message -----
> On 11/21/13 08:15, Kai Tietz wrote:
> > Hi,
> >
> > this is the required forward-propagation part for the type-demotion pass as
> > separate patch.
> > This patch is sharing some adjustments in testsuite due new optimization of
> > type-casts on shift-operations.
> >
> > This patch tries to generate the "normal" form (TYPE) (X shift-op Y) out of
> > the "denormal" form "((TYPE) X) shift-op Y".
> >
> > ChangeLog
> >
> > 2013-11-21  Kai Tietz  <ktietz@redhat.com>
> >
> > 	* tree-ssa-forwprop.c (simplify_shift):  New function.
> > 	(ssa_forward_propagate_and_combine): Use it.
> >
> > ChangeLog testsuite
> >
> > 	* gcc.dg/tree-ssa/ts-shift-2.c: New test.
> >
> >
> > Shared testsuite part between type-demotion and forward-propagation
> > patches.
> >
> > Changelog gcc/testsuite:
> >
> > 	* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
> > 	* gcc.dg/vect/vect-over-widen-1.c: Likewise.
> > 	* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
> > 	* gcc.dg/vect/vect-over-widen-3.c: Likewise.
> > 	* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
> > 	* gcc.dg/vect/vect-over-widen-4.c: Likewise.
> >
> > Bootstrapped for x86_64-unknown-linux-gnu.  OK to apply?
> >
> > Index: gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c
> > ===================================================================
> > --- /dev/null
> > +++ gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c
> > @@ -0,0 +1,33 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +unsigned char f1 (char x)
> > +{
> > +  unsigned char x1 = (unsigned char) x;
> > +  return (x1 << 5);
> > +}
> > +
> > +unsigned short f2 (unsigned int x)
> > +{
> > +  unsigned short x1 = (unsigned short) x;
> > +  return (x1 << 15);
> > +}
> > +
> > +unsigned int f3 (unsigned short x)
> > +{
> > +  unsigned long long x1 = (unsigned long long) x;
> > +  return (unsigned int) (x1 >> 15);
> > +}
> > +
> > +unsigned long long f4 (unsigned short x)
> > +{
> > +  unsigned int x1 = (unsigned int) x;
> > +  return (unsigned long long) (x1 >> 7);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "= \\\(long long unsigned int\\\)" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "= \\\(short unsigned int\\\)" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "= \\\(unsigned int\\\)" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "= \\\(unsigned char\\\)" 1 "optimized" } } */
> So at least for f1 and f2, these regexps match regardless of whether or
> not the code was transformed.
> 
> Without your patch (f1)
>    x1_2 = (unsigned char) x_1(D);
>    _3 = x1_2 << 5;
>    return _3;
> 
> With your patch (f1):
>    _4 = x_1(D) << 5;
>    _2 = (unsigned char) _4;
>    return _2;
> 
> So all that's happened is we changed which object got casted -- we have
> the same number casts of an object to (unsigned char).

That is actually intended.  This patch shouldn't produce patterns with more type-casts than before.
What we see here is the normalization of shift-left/right operations.  The patch makes sure that we generally prefer the (Type) (X shift-left/right y) form over the ((Type) X) shift-left/right y notation.


> ISTM you might be better off scanning the details of the forwprop dump.
>   For example, in f1 we get:
> 
>     Replaced _3 = x1_2 << 5;
>       x1_2 = (unsigned char) x_1(D);
>     by    by _3 = (unsigned char) _5;
>       _5 = x_1(D) << 5;
> 
> 
> We basically see the same thing happen in f2.
> 
> For f1 & f2 I don't see that we've done anything other than change the
> casts.  They don't show any benefit from this patch.
> 
> f3 & f4 on the other hand do show improvements.

I adjusted the test so that f1 and f2 show an effect, too.  I had kept the transformation simple here to show the normalized pattern form.
In the new variant the checks no longer match without the patch.
 
> 
> When I looked over a bunch of .i files, the results were mixed.
> Sometimes better, sometimes worse without any clear bias.

Hmm, this sample here is questionable.
 
> So here's an example that gets worse:
> 
> 
> *************** rshift_double (long unsigned int l1, lon
> *** 651,661 ****
>      _29 = (long int) _28;
>      *hv_13(D) = _29;
>      _31 = l1_9(D) >> _27;
> !   _32 = (unsigned int) count_4(D);
> !   _33 = 63 - _32;
> !   _34 = (int) _33;
> !   _35 = h1.18_26 << _34;
> !   _36 = _35 << 1;
>      _37 = _36 | _31;
>      *lv_12(D) = _37;
>    ;;    succ:       11
> --- 652,663 ----
>      _29 = (long int) _28;
>      *hv_13(D) = _29;
>      _31 = l1_9(D) >> _27;
> !   _33 = (unsigned int) count_4(D);
> !   _34 = 63 - _33;
> !   _35 = (int) _34;
> !   _71 = h1_10(D) << _35;
> !   _32 = _71 << 1;
> !   _36 = (long unsigned int) _32;
>      _37 = _36 | _31;
>      *lv_12(D) = _37;
> 
> 
> > +/* { dg-final { cleanup-tree-dump "optimized" } } */
> > +
> > Index: gcc-org/gcc/tree-ssa-forwprop.c
> > ===================================================================
> > --- gcc-org.orig/gcc/tree-ssa-forwprop.c
> > +++ gcc-org/gcc/tree-ssa-forwprop.c
> > @@ -551,6 +551,107 @@ forward_propagate_into_gimple_cond (gimp
> >     return 0;
> >   }
> >
> > +/* Try to simplify shift-statement ((TYPE1) X) CODE Y for
> > +   integral-kind types.
> > +   Returns non-zero if the stmt was changed.  */
> Simplify it to what?  I know from reading the patch explanation, but
> it'd be better to include that in a comment.  Even better would be to
> show a little sequence of pseudo-gimple and how it gets changed.

Agreed.  "Simplify" might be the wrong term here, since we are transforming the pattern - if possible - into its normalized form.
I modified the comment and adjusted the dump output so the "by" string is printed only once.
 
> 
> > +
> > +static int
> > +simplify_shift (enum tree_code code, gimple_stmt_iterator *gsi_p)
> > +{
> > +  gimple stmt = gsi_stmt (*gsi_p);
> > +  gimple def, newop;
> > +  tree op1, opx, op2, t_opx, tem;
> > +  int ret;
> > +
> > +  if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
> > +    return 0;
> > +
> > +  op1 = gimple_assign_rhs1 (stmt);
> > +  if (TREE_CODE (op1) != SSA_NAME
> > +      || !(def = SSA_NAME_DEF_STMT (op1))
> > +      || !is_gimple_assign (def)
> > +      || !gimple_assign_cast_p (def)
> > +      || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op1)
> > +      || !has_single_use (op1))
> > +    return 0;
> > +
> > +  opx = gimple_assign_rhs1 (def);
> > +  if (TREE_CODE (opx) != SSA_NAME)
> > +    return 0;
> > +
> > +  t_opx = TREE_TYPE (opx);
> > +
> > +  if (!INTEGRAL_TYPE_P (t_opx))
> > +    return 0;
> > +
> > +  op2 = gimple_assign_rhs2 (stmt);
> > +
> > +  if (code == LSHIFT_EXPR)
> > +    {
> > +      if (TYPE_PRECISION (TREE_TYPE (op1))
> > +	  > TYPE_PRECISION (t_opx))
> > +        return 0;
> > +    }
> > +  else if (code == RSHIFT_EXPR)
> > +    {
> > +      /* If type of OPX and OP1 are compatible, we don't need to check OP2
> > +         for validity as we don't change range of operation.
> > +         Otherwise we need to check that right-hand operand of shift-right
> > +	 doesn't exceed type-precision of inner operand.  */
> There's some kind of whitespace problem here.
> 
> > +
> > +  /* Do transformation ((T1) X) shift-code N -> (T1) (X shift-code N).  */
> > +  if (dump_file)
> > +    {
> > +      fprintf (dump_file, "  Replaced ");
> > +      print_gimple_stmt (dump_file, stmt, 0, 0);
> > +      fprintf (dump_file, "    ");
> > +      print_gimple_stmt (dump_file, def, 0, 0);
> > +      fprintf (dump_file, "  by ");
> > +    }
> > +
> > +  tem = make_ssa_name (t_opx, NULL);
> > +  newop = gimple_build_assign_with_ops (code, tem, opx, op2);
> > +  gimple_set_location (newop, gimple_location (stmt));
> > +  gsi_insert_before (gsi_p, newop, GSI_SAME_STMT);
> > +  gimple_assign_set_rhs_with_ops_1 (gsi_p, NOP_EXPR, tem, NULL_TREE, NULL_TREE);
> > +
> > +  if (gimple_location (def) != UNKNOWN_LOCATION)
> > +    gimple_set_location (stmt, gimple_location (def));
> > +
> > +  stmt = gsi_stmt (*gsi_p);
> > +  update_stmt (stmt);
> > +
> > +  if (TREE_CODE (op1) == SSA_NAME)
> > +    ret = remove_prop_source_from_use (op1) ? 2 : 1;
> > +  else
> > +    ret = 1;
> > +  if (dump_file)
> > +    {
> > +      fprintf (dump_file, "   by ");
> > +      print_gimple_stmt (dump_file, stmt, 0, 0);
> > +      fprintf (dump_file, "    ");
> > +      print_gimple_stmt (dump_file, newop, 0, 0);
> As someone has already noted, the dump files are sub-optimal with the
> "by by" output:

Yeah, adjusted it in attached patch.
 
>    Replaced _3 = x1_2 << 5;
>      x1_2 = (unsigned char) x_1(D);
>    by    by _3 = (unsigned char) _5;
>      _5 = x_1(D) << 5;
> >
> >   /* Propagate from the ssa name definition statements of COND_EXPR
> >      in the rhs of statement STMT into the conditional if that simplifies
> >      it.
> > @@ -3432,6 +3533,15 @@ ssa_forward_propagate_and_combine (void)
> >   		     || code == NEGATE_EXPR)
> >   		    && TREE_CODE (rhs1) == SSA_NAME)
> >   		  changed = simplify_not_neg_expr (&gsi);
> > +		else if (code == LSHIFT_EXPR
> > +		         || code == RSHIFT_EXPR)
> > +		  {
> > +		    int did_something;
> > +		    did_something = simplify_shift (code, &gsi);
> > +		    if (did_something == 2)
> > +		      cfg_changed = true;
> > +		    changed = did_something != 0;
> > +		  }
> >   		else if (code == COND_EXPR
> >   			 || code == VEC_COND_EXPR)
> >   		  {
> Does this apply to rotates as well?

For rotates this transformation doesn't apply in the same way.  Only when the type-cast and X have the same precision could we do the transformation, so that case might be a future enhancement for this function.
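
To illustrate with a hand-written example (not part of the patch; it assumes an 8-bit char and a 32-bit int): rotating the widened value is not the same as widening the rotated value, because the wrapped-around bits come from different bit positions.

unsigned int rot3_wide (unsigned char c)
{
  unsigned int x = c;                 /* widening cast first */
  return (x << 3) | (x >> (32 - 3));  /* 32-bit rotate by 3 */
}

unsigned int rot3_narrow (unsigned char c)
{
  /* 8-bit rotate by 3, widened afterwards.  */
  return (unsigned char) ((c << 3) | (c >> (8 - 3)));
}

For c = 0x80 the first variant yields 0x400 while the second yields 0x4, so the cast could only be moved when both precisions agree.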

> 
> 
> >
> > Index: gcc-trunk/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
> > ===================================================================
> > --- gcc-trunk.orig/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
> > +++ gcc-trunk/gcc/testsuite/gcc.dg/vect/vect-over-widen-1-big-array.c
> > @@ -58,9 +58,8 @@ int main (void)
> >     return 0;
> >   }
> >
> > -/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> > -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 2 "vect" { target vect_widen_shift } } } */
> > -/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 4 "vect" { target { ! vect_widen_shift } } } } */
> > +/* { dg-final { scan-tree-dump-times "vect_recog_widen_shift_pattern: detected" 4 "vect" { target vect_widen_shift } } } */
> > +/* { dg-final { scan-tree-dump-times "vect_recog_over_widening_pattern: detected" 0 "vect" } } */
> >   /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
> >   /* { dg-final { cleanup-tree-dump "vect" } } */
> I'm guessing this was one of the cases you mentioned to me privately via
> email.  Are we still vectorizing as well, just differently without the
> need for as many conversions?

Yes.  The vectorization still happens in the same way as before; the shift-widening patterns simply become superfluous with this patch.
 
> Similarly for the other vect testcases.  At the least you need to
> indicate we're not actually regressing on those tests -- when you change
> an expected output in a way that makes it look like the test might be
> compromised, you need to explain yourself more thoroughly.  My obvious
> concern is we're not vectorizing as much and the testsuite changes are
> merely a way of hiding that fact.

The output test isn't changed.  All tests still result in a vectorized loop, and if you take a closer look at the generated output, you will notice that the generated vectorization is identical to the one without this patch.
 
> I think there's some questions that need to be answered before this can
> go forward.
> 
> jeff
> 
> 
> 

Regards,
Kai

ChangeLog

2013-11-22  Kai Tietz  <ktietz@redhat.com>

	* tree-ssa-forwprop.c (simplify_shift):  New function.
	(ssa_forward_propagate_and_combine): Use it.

ChangeLog testsuite

	* gcc.dg/tree-ssa/ts-shift-2.c: New test.


Bootstrapped for x86_64-unknown-linux-gnu.  OK to apply?

Regards,
Kai

Comments

Jeff Law Nov. 26, 2013, 5:40 a.m. UTC | #1
On 11/22/13 18:43, Kai Tietz wrote:
>> So at least for f1 and f2, these regexps match regardless of whether or
>> not the code was transformed.
>>
>> Without your patch (f1)
>>     x1_2 = (unsigned char) x_1(D);
>>     _3 = x1_2 << 5;
>>     return _3;
>>
>> With your patch (f1):
>>     _4 = x_1(D) << 5;
>>     _2 = (unsigned char) _4;
>>     return _2;
>>
>> So all that's happened is we changed which object got casted -- we have
>> the same number casts of an object to (unsigned char).
>
> That is actually intended.  This patch shouldn't produce patterns with more type-casts than before.
> What we see here is the normalization of shift-left/right operations.  The patch makes sure that we generally prefer the (Type) (X shift-left/right y) form over the ((Type) X) shift-left/right y notation.
Right, my point is that the test didn't actually check for normalization;
it just checked whether there was a particular number/type of casts.

I think it's also worth pointing out for the record this patch in and of 
itself really isn't an optimization.  It's setting a preferred form for 
shifts.  I won't call it canonicalization because we don't insist on
this form, but it's the form we want to see.

Since it's just setting a preferred form and not an optimization, it 
seems to me we should expect it to sometimes generate better code and 
sometimes worse.

I'll take a closer look after I fix my arm regression :-)

jeff
Jeff Law Nov. 26, 2013, 7 a.m. UTC | #2
On 11/22/13 18:43, Kai Tietz wrote:
> ----- Original Message -----
>
> That is actually intended.  This patch shouldn't produce patterns with more type-casts than before.
> What we see here is the normalization of shift-left/right operations.  The patch makes sure that we generally prefer the (Type) (X shift-left/right y) form over the ((Type) X) shift-left/right y notation.

What is the rationale behind one form over the other?  Or is it 
arbitrary?  I can easily make a case for either form based on what we're 
trying to optimize.  In general, it seems to me that....

If we're more interested in combining the shift with statements that 
define the shift's operands, then the former is going to be a better 
representation because we're typecasting the result and thus the 
typecast doesn't interfere with the desired optimization.

In contrast if we are more interested in cases where the shift defines 
operands for some other statement, then the latter form is more 
beneficial because we're typecasting the inputs and thus the typecast 
doesn't interfere with the desired optimization.

Unfortunately, within a pass (or even a phase of a pass) we can have 
both cases.  Let's look at the code in tree-ssa-forwprop.c you're 
proposing we change.

The code in question loops over every statement and tries to combine the 
current statement with the statements which define its operands, then it 
moves on to the next statement in execution order.

So your patch seems to make perfect sense when STMT is a shift.  It 
moves the typecast out of the way of combining STMT with the statements 
which define its operands.

However, once we move to the next statement in the IL, the shift is now 
defining operands for later statements and the other form would seem to 
be preferred.

So it almost seems to me you want to apply the transformation on STMT, 
try to combine into the STMT, then apply the reverse transformation on 
STMT if it's safe to do so.

That kindof implies there really isn't a perferred global form for 
shifts with typecasts, but instead there is a form that is preferred 
based on the optimizations we're trying to perform at any given time. 
It may also argue that type promotion/demotion in general shouldn't be 
separate passes.  I'm still pondering that.


Thoughts?

Jeff
Richard Biener Nov. 26, 2013, 9:58 a.m. UTC | #3
On Tue, Nov 26, 2013 at 8:00 AM, Jeff Law <law@redhat.com> wrote:
> On 11/22/13 18:43, Kai Tietz wrote:
>>
>> ----- Original Message -----
>>
>> That is actually intended.  This patch shouldn't produce patterns with
>> more type-casts than before.
>> What we see here is the normalization of shift-left/right operations.
>> The patch makes sure that we generally prefer the (Type) (X shift-left/right y)
>> form over the ((Type) X) shift-left/right y notation.
>
>
> What is the rationale behind one form over the other?  Or is it arbitrary?  I
> can easily make a case for either form based on what we're trying to
> optimize.  In general, it seems to me that....

When (Type) is a widening cast then you have to be careful not to introduce
undefined behavior when y is larger than the number of bits in X.
When (Type) is a shortening cast then we lose information - because
then the range we can determine for y from the shift operation will
be larger.
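
A small hand-written example of the widening-cast hazard (not derived from the patch; it assumes a 32-bit int and a 64-bit long long):

unsigned long long widen_then_shift (unsigned int x)
{
  unsigned long long w = x;  /* widening cast on the operand */
  return w << 40;            /* well defined: a 64-bit shift by 40 */
}

Rewriting this as (unsigned long long) (x << 40) would shift a 32-bit value by 40 bits, which is undefined, so here the cast has to stay on the operand.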

So I'm not sure I follow the reasoning to move out the casting when
possible.

The goal for canonicalization should be to shorten operations where
possible.  Thus both cases, sinking and hoisting the cast should
be applied dependent on validity and whether the shift will be performed
in a smaller type in the end.

> If we're more interested in combining the shift with statements that define
> the shift's operands, then the former is going to be a better representation
> because we're typecasting the result and thus the typecast doesn't interfere
> with the desired optimization.
>
> In contrast if we are more interested in cases where the shift defines
> operands for some other statement, then the latter form is more beneficial
> because we're typecasting the inputs and thus the typecast doesn't interfere
> with the desired optimization.

Always a trade-off which is why combining passes always have to
consider casts.

> Unfortunately, within a pass (or even a phase of a pass) we can have both
> cases.  Let's look at the code in tree-ssa-forwprop.c you're proposing we
> change.
>
> The code in question loops over every statement and tries to combine the
> current statement with the statements which define its operands, then it
> moves on to the next statement in execution order.
>
> So your patch seems to make perfect sense when STMT is a shift.  It moves
> the typecast out of the way of combining STMT with the statements which
> define its operands.
>
> However, once we move to the next statement in the IL, the shift is now
> defining operands for later statements and the other form would seem to be
> preferred.
>
> So it almost seems to me you want to apply the transformation on STMT, try
> to combine into the STMT, then apply the reverse transformation on STMT if
> it's safe to do so.
>
> That kind of implies there really isn't a preferred global form for shifts
> with typecasts, but instead there is a form that is preferred based on the
> optimizations we're trying to perform at any given time. It may also argue
> that type promotion/demotion in general shouldn't be separate passes.  I'm
> still pondering that.

;)  Not an easy problem.

> Thoughts?

I think the goal of "canonicalizing" operations to work on the smallest
mode possible during early GIMPLE passes makes the most sense
(even if then a cast is not always outside or inside an operation).

Later (currently during RTL expansion) we can widen operations again
according to target needs.

Shorter operations simply carry more implicit information on value ranges
(and are usually easier to vectorize for example).
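
A hand-written illustration of the kind of narrowing meant here (identifiers and type widths are only for the example, assuming a 16-bit short and a 32-bit int):

unsigned short add16 (unsigned short a, unsigned short b)
{
  /* C promotes the operands, so the addition is nominally a 32-bit one:
       _1 = (int) a;  _2 = (int) b;  _3 = _1 + _2;  return (unsigned short) _3;
     Only the low 16 bits survive the final truncation, so the addition can
     just as well be carried out in unsigned short, which keeps the value
     range tight and lets twice as many elements fit into a vector.  */
  return a + b;
}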

Richard.

> Jeff
>
>
Jeff Law Nov. 26, 2013, 9:42 p.m. UTC | #4
On 11/26/13 02:58, Richard Biener wrote:
>> What is the rationale behind one form over the other?  Or is it arbitrary?  I
>> can easily make a case for either form based on what we're trying to
>> optimize.  In general, it seems to me that....
>
>> When (Type) is a widening cast then you have to be careful not to introduce
> undefined behavior when y is larger than the number of bits in X.
> When (Type) is a shortening cast then we lose information - because
> then the range we can determine for y from the shift operation will
> be larger.
>
> So I'm not sure I follow the reasoning to move out the casting when
> possible.
Assume that we're not introducing undefined behaviour.  Obviously if 
we're introducing undefined behaviour, then the transformation is wrong, 
plain and simple.

If by moving the cast we can eliminate an ALU operation because the 
shift (in this example) combines with something earlier or later in the 
IL, then we win.  Similarly if by moving the cast we expose redundant 
casting, then we win.

A narrower range is obviously good, but often just having a narrower 
range won't in and of itself provide an optimization opportunity.



> The goal for canonicalization should be to shorten operations where
> possible.  Thus both cases, sinking and hoisting the cast should
> be applied dependent on validity and whether the shift will be performed
> in a smaller type in the end.
I disagree, there are going to be times when sinking or hoisting the 
cast allows the shift to combine with other instructions in the IL. 
There are times when the ability to combine with other instructions in 
the stream should be driving these decisions.

I would buy that as a guiding principle that applies in cases where we 
don't have other optimization opportunities we can expose by 
hoisting/sinking the type casts.  I would even buy that we prefer the 
narrowest range through some point in the compiler (say vrp2), then we 
allow wider ranges as needed to facilitate other optimizations.



>
>> If we're more interested in combining the shift with statements that define
>> the shift's operands, then the former is going to be a better representation
>> because we're typecasting the result and thus the typecast doesn't interfere
>> with the desired optimization.
>>
>> In contrast if we are more interested in cases where the shift defines
>> operands for some other statement, then the latter form is more beneficial
>> because we're typecasting the inputs and thus the typecast doesn't interfere
>> with the desired optimization.
>
> Always a trade-off which is why combining passes always have to
> consider casts.
Actually, I would say quite the opposite here.  There's a very clear 
line when we want to go from one representation to the other within the 
forwprop passes.  By doing so, we relieve the pattern matching aspects 
of that pass from needing to concern themselves with the typecasting issues.

>
> I think the goal of "canonicalizing" operations to work on the smallest
> mode possible during early GIMPLE passes makes the most sense
> (even if then a cast is not always outside or inside an operation).
>
> Later (currently during RTL expansion) we can widen operations again
> according to target needs.
But what you end up missing here is optimization opportunities that 
expose themselves when the casts are moved from inputs to outputs or 
vice-versa.


>
> Shorter operations simply carry more implicit information on value ranges
> (and are usually easier to vectorize for example).
Yes.  There's no argument about that.

jeff
Richard Biener Nov. 27, 2013, 11:16 a.m. UTC | #5
On Tue, Nov 26, 2013 at 10:42 PM, Jeff Law <law@redhat.com> wrote:
> On 11/26/13 02:58, Richard Biener wrote:
>>>
>>> What is the rationale behind one form over the other?  Or is it arbitrary?
>>> I
>>> can easily make a case for either form based on what we're trying to
>>> optimize.  In general, it seems to me that....
>>
>>
>> When (Type) is a widening cast then you have to be careful not to introduce
>> undefined behavior when y is larger than the number of bits in X.
>> When (Type) is a shortening cast then we lose information - because
>> then the range we can determine for y from the shift operation will
>> be larger.
>>
>> So I'm not sure I follow the reasoning to move out the casting when
>> possible.
>
> Assume that we're not introducing undefined behaviour.  Obviously if we're
> introducing undefined behaviour, then the transformation is wrong, plain and
> simple.
>
> If by moving the cast we can eliminate an ALU operation because the shift
> (in this example) combines with something earlier or later in the IL, then
> we win.  Similarly if by moving the cast we expose redundant casting, then
> we win.
>
> A narrower range is obviously good, but often just having a narrower range
> won't in and of itself provide an optimization opportunity.
>
>
>
>
>> The goal for canonicalization should be to shorten operations where
>> possible.  Thus both cases, sinking and hoisting the cast should
>> be applied dependent on validity and whether the shift will be performed
>> in a smaller type in the end.
>
> I disagree, there are going to be times when sinking or hoisting the cast
> allows the shift to combine with other instructions in the IL. There are
> times when the ability to combine with other instructions in the stream
> should be driving these decisions.
>
> I would buy that as a guiding principle that applies in cases where we don't
> have other optimization opportunities we can expose by hoisting/sinking the
> type casts.  I would even buy that we prefer the narrowest range through
> some point in the compiler (say vrp2), then we allow wider ranges as needed
> to facilitate other optimizations.
>
>
>
>
>>
>>> If we're more interested in combining the shift with statements that
>>> define
>>> the shift's operands, then the former is going to be a better
>>> representation
>>> because we're typecasting the result and thus the typecast doesn't
>>> interfere
>>> with the desired optimization.
>>>
>>> In contrast if we are more interested in cases where the shift defines
>>> operands for some other statement, then the latter form is more
>>> beneficial
>>> because we're typecasting the inputs and thus the typecast doesn't
>>> interfere
>>> with the desired optimization.
>>
>>
>> Always a trade-off which is why combining passes always have to
>> consider casts.
>
> Actually, I would say quite the opposite here.  There's a very clear line
> when we want to go from one representation to the other within the forwprop
> passes.  By doing so, we relieve the pattern matching aspects of that pass
> from needing to concern themselves with the typecasting issues.
>
>
>>
>> I think the goal of "canonicalizing" operations to work on the smallest
>> mode possible during early GIMPLE passes makes the most sense
>> (even if then a cast is not always outside or inside an operation).
>>
>> Later (currently during RTL expansion) we can widen operations again
>> according to target needs.
>
> But what you end up missing here is optimization opportunities that expose
> themselves when the casts are moved from inputs to outputs or vice-versa.

Only if the "optimization opportunities" do not handle the casts themselves.
We for example have the get_unwidened () helper in fold that helps
transforms to look through some casts.  I think we want similar helpers
for the GIMPLE level combining - the combiner knows whether for
an operand with type T it's ok to have a wider or narrower operand
or if it wants an exact T2 operand instead.  Helpers should exist
that get the operand into that form, if possible.

Currently the issue would be how to represent the result of such a helper,
but I'll work on removing the tree-building-and-re-gimplifying way of
middle-end expression creation for the next stage1 so that we can easily
represent multi-statement intermediate "expressions".  It'll be a gimple_seq
(maybe wrapped in some C++ sugar, maybe then with non-GC allocation).

So you do

  op = get_me_the_op_in_type_X (X, gimple_assign_rhs2 (stmt));
  if (!op)
    return false;

and now op is just a regular op you can work with, eventually pointing
to a temporary sequence that you'd need to insert.

Note that forwprop's combining should really work like fold - combine
from the expression leaves to the expression roots (currently it works
by visiting in stmt order).  Now consider (T2)(T1)(a << 5) and
((T1)(a << 5)) >> 5.  The second form would prefer that we had "combined"
the leaf (T1)(a << 5) into ((T1)a << 5), while the first would be pessimized
by that.  So - which form do you prefer?  I'd say you can't tell that from
the use(s), but you have to be able to decide without them and thus cannot
decide based on "might I be better able to combine this".
Instead the combining code needs to deal with both forms.
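
Spelled out with concrete types picked only for illustration (T1 = unsigned char, T2 = unsigned short, a an unsigned int), the two uses of the same leaf look like this:

unsigned short use_a (unsigned int a)
{
  unsigned char leaf = a << 5;  /* (T1)(a << 5) */
  return leaf;                  /* (T2)(T1)(a << 5) */
}

unsigned char use_b (unsigned int a)
{
  unsigned char leaf = a << 5;  /* the same leaf */
  return leaf >> 5;             /* ((T1)(a << 5)) >> 5 */
}

Which rewrite of the leaf pays off only becomes visible at the statement that consumes it.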

Richard.

>
>
>>
>> Shorter operations simply carry more implicit information on value ranges
>> (and are usually easier to vectorize for example).
>
> Yes.  There's no argument about that.
>
> jeff
>

Patch

Index: gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c
===================================================================
--- /dev/null
+++ gcc-org/gcc/testsuite/gcc.dg/tree-ssa/ts-shift-2.c
@@ -0,0 +1,33 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+char f1 (char x)
+{
+  unsigned char x1 = (unsigned char) x;
+  return (x1 << 5);
+}
+
+short f2 (unsigned int x)
+{
+  unsigned short x1 = (unsigned short) x;
+  return (x1 << 15);
+}
+
+unsigned int f3 (unsigned short x)
+{
+  unsigned long long x1 = (unsigned long long) x;
+  return (unsigned int) (x1 >> 15);
+}
+
+unsigned long long f4 (unsigned short x)
+{
+  unsigned int x1 = (unsigned int) x;
+  return (unsigned long long) (x1 >> 7);
+}
+
+/* { dg-final { scan-tree-dump-times "= \\\(long long unsigned int\\\)" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "= \\\(short int\\\)" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "= \\\(unsigned int\\\)" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "= \\\(unsigned char\\\)" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
Index: gcc-org/gcc/tree-ssa-forwprop.c
===================================================================
--- gcc-org.orig/gcc/tree-ssa-forwprop.c
+++ gcc-org/gcc/tree-ssa-forwprop.c
@@ -551,6 +551,107 @@  forward_propagate_into_gimple_cond (gimp
   return 0;
 }
 
+/* Try to normalize a shift-left/right statement of the form
+   ((TYPE1) X) CODE Y into (TYPE1) (X CODE Y).  TYPE1 and the type of X
+   have to be integral.
+   Returns non-zero if the stmt was changed.  */
+
+static int
+simplify_shift (enum tree_code code, gimple_stmt_iterator *gsi_p)
+{
+  gimple stmt = gsi_stmt (*gsi_p);
+  gimple def, newop;
+  tree op1, opx, op2, t_opx, tem;
+  int ret;
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
+    return 0;
+
+  op1 = gimple_assign_rhs1 (stmt);
+  if (TREE_CODE (op1) != SSA_NAME
+      || !(def = SSA_NAME_DEF_STMT (op1))
+      || !is_gimple_assign (def)
+      || !gimple_assign_cast_p (def)
+      || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op1)
+      || !has_single_use (op1))
+    return 0;
+
+  opx = gimple_assign_rhs1 (def);
+  if (TREE_CODE (opx) != SSA_NAME)
+    return 0;
+
+  t_opx = TREE_TYPE (opx);
+
+  if (!INTEGRAL_TYPE_P (t_opx))
+    return 0;
+
+  op2 = gimple_assign_rhs2 (stmt);
+
+  if (code == LSHIFT_EXPR)
+    {
+      if (TYPE_PRECISION (TREE_TYPE (op1))
+	  > TYPE_PRECISION (t_opx))
+        return 0;
+    }
+  else if (code == RSHIFT_EXPR)
+    {
+      /* If the types of OPX and OP1 are compatible, we don't need to check
+         OP2 for validity as we don't change the range of the operation.
+         Otherwise we need to check that the right-hand operand of the
+         shift-right doesn't exceed the type precision of the inner operand.  */
+      if (!useless_type_conversion_p (TREE_TYPE (op1), t_opx))
+        {
+	  if (TREE_CODE (op2) != INTEGER_CST)
+	    return 0;
+	  if (integer_zerop (fold_binary (LT_EXPR, boolean_type_node, op2,
+	                                  build_int_cst (TREE_TYPE (op2),
+					                 TYPE_PRECISION (t_opx)))))
+	    return 0;
+
+          /* See if cast can be moved out of the shift-right operation.  */
+	  if (TYPE_PRECISION (TREE_TYPE (op1)) <= TYPE_PRECISION (t_opx)
+	      || (!TYPE_UNSIGNED (t_opx) && TYPE_UNSIGNED (TREE_TYPE (op1))))
+	    return 0;
+        }
+    }
+  else
+    return 0;
+
+  /* Do transformation ((T1) X) shift-code N -> (T1) (X shift-code N).  */
+  if (dump_file)
+    {
+      fprintf (dump_file, "  Replaced ");
+      print_gimple_stmt (dump_file, stmt, 0, 0);
+      fprintf (dump_file, "    ");
+      print_gimple_stmt (dump_file, def, 0, 0);
+    }
+
+  tem = make_ssa_name (t_opx, NULL);
+  newop = gimple_build_assign_with_ops (code, tem, opx, op2);
+  gimple_set_location (newop, gimple_location (stmt));
+  gsi_insert_before (gsi_p, newop, GSI_SAME_STMT);
+  gimple_assign_set_rhs_with_ops_1 (gsi_p, NOP_EXPR, tem, NULL_TREE, NULL_TREE);
+
+  if (gimple_location (def) != UNKNOWN_LOCATION)
+    gimple_set_location (stmt, gimple_location (def));
+
+  stmt = gsi_stmt (*gsi_p);
+  update_stmt (stmt);
+
+  if (TREE_CODE (op1) == SSA_NAME)
+    ret = remove_prop_source_from_use (op1) ? 2 : 1;
+  else
+    ret = 1;
+  if (dump_file)
+    {
+      fprintf (dump_file, "   by ");
+      print_gimple_stmt (dump_file, stmt, 0, 0);
+      fprintf (dump_file, "    ");
+      print_gimple_stmt (dump_file, newop, 0, 0);
+    }
+
+  return ret;
+}
 
 /* Propagate from the ssa name definition statements of COND_EXPR
    in the rhs of statement STMT into the conditional if that simplifies it.
@@ -3432,6 +3533,15 @@  ssa_forward_propagate_and_combine (void)
 		     || code == NEGATE_EXPR)
 		    && TREE_CODE (rhs1) == SSA_NAME)
 		  changed = simplify_not_neg_expr (&gsi);
+		else if (code == LSHIFT_EXPR
+		         || code == RSHIFT_EXPR)
+		  {
+		    int did_something;
+		    did_something = simplify_shift (code, &gsi);
+		    if (did_something == 2)
+		      cfg_changed = true;
+		    changed = did_something != 0;
+		  }
 		else if (code == COND_EXPR
 			 || code == VEC_COND_EXPR)
 		  {