Patchwork [ping] Strength reduction

Submitter Richard Guenther
Date June 20, 2012, 11:11 a.m.
Message ID <CAFiYyc3LgH8Jbst9AAk9hqmJspFnkc8GMk4oF-AbpNR=cqVkWA@mail.gmail.com>
Permalink /patch/166013/
State New

Comments

Richard Guenther - June 20, 2012, 11:11 a.m.
On Thu, Jun 14, 2012 at 3:21 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Pro forma ping. :)

;)

I notice (with all of these functions)

+unsigned
+negate_cost (enum machine_mode mode, bool speed)
+{
+  static unsigned costs[NUM_MACHINE_MODES];
+  rtx seq;
+  unsigned cost;
+
+  if (costs[mode])
+    return costs[mode];
+
+  start_sequence ();
+  force_operand (gen_rtx_fmt_e (NEG, mode,
+				gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
+		 NULL_RTX);
+  seq = get_insns ();
+  end_sequence ();
+
+  cost = seq_cost (seq, speed);
+  if (!cost)
+    cost = 1;

that the costs[] array is independent of the speed argument.  Thus whatever
comes first determines the cost.  Odd, and probably not good.  A fix
would be appreciated (even for the current code ...) - simply make the
array costs[NUM_MACHINE_MODES][2].

As for the renaming - can you name the functions consistently?  Thus
the above would be negate_reg_cost?  And maybe rename the other
FIXME function, too?
William J. Schmidt - June 20, 2012, 4:22 p.m.
On Wed, 2012-06-20 at 13:11 +0200, Richard Guenther wrote:
> On Thu, Jun 14, 2012 at 3:21 PM, William J. Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > Pro forma ping. :)
> 
> ;)
> 
> I notice (with all of these functions)
> 
> +unsigned
> +negate_cost (enum machine_mode mode, bool speed)
> +{
> +  static unsigned costs[NUM_MACHINE_MODES];
> +  rtx seq;
> +  unsigned cost;
> +
> +  if (costs[mode])
> +    return costs[mode];
> +
> +  start_sequence ();
> +  force_operand (gen_rtx_fmt_e (NEG, mode,
> +				gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
> +		 NULL_RTX);
> +  seq = get_insns ();
> +  end_sequence ();
> +
> +  cost = seq_cost (seq, speed);
> +  if (!cost)
> +    cost = 1;
> 
> that the costs[] array is independent of the speed argument.  Thus whatever
> comes first determines the cost.  Odd, and probably not good.  A fix
> would be appreciated (even for the current code ...) - simply make the
> array costs[NUM_MACHINE_MODES][2].
> 
> As for the renaming - can you name the functions consistently?  Thus
> the above would be negate_reg_cost?  And maybe rename the other
> FIXME function, too?

I agree with all this.  I'll prepare all the cost model changes as a
separate preliminaries patch.

> 
> Index: gcc/tree-ssa-strength-reduction.c
> ===================================================================
> --- gcc/tree-ssa-strength-reduction.c	(revision 0)
> +++ gcc/tree-ssa-strength-reduction.c	(revision 0)
> @@ -0,0 +1,1611 @@
> +/* Straight-line strength reduction.
> +   Copyright (C) 2012  Free Software Foundation, Inc.
> 
> I know we have these 'tree-ssa-' names, but really this is gimple-ssa now ;)
> So, please name it gimple-ssa-strength-reduction.c.

Will do.  Vive la revolution? ;)

> 
> +  /* Access to the statement for subsequent modification.  Cached to
> +     save compile time.  */
> +  gimple_stmt_iterator cand_gsi;
> 
> this is an iterator for cand_stmt?  Then caching it is no longer necessary
> as the iterator is the stmt itself after recent infrastructure changes.

Oh yeah, I remember seeing that go by.  Nice.  Will change.

> 
> +/* Hash table embodying a mapping from statements to candidates.  */
> +static htab_t stmt_cand_map;
> ...
> +static hashval_t
> +stmt_cand_hash (const void *p)
> +{
> +  return htab_hash_pointer (((const_slsr_cand_t) p)->cand_stmt);
> +}
> 
> use a pointer-map instead.
> 
> +/* Callback to produce a hash value for a candidate chain header.  */
> +
> +static hashval_t
> +base_cand_hash (const void *p)
> +{
> +  tree ssa_name = ((const_cand_chain_t) p)->base_name;
> +
> +  if (TREE_CODE (ssa_name) != SSA_NAME)
> +    return (hashval_t) 0;
> +
> +  return (hashval_t) SSA_NAME_VERSION (ssa_name);
> +}
> 
> does it ever happen that ssa_name is not an SSA_NAME?  

Not in this patch, but when I introduce CAND_REF in a later patch it
could happen since the base field of a CAND_REF is a MEM_REF.  It's a
safety valve in case of misuse.  I'll think about this some more.

> I'm not sure
> the memory savings over simply using a fixed-size (num_ssa_names)
> array indexed by SSA_NAME_VERSION pointing to the chain is worth
> using a hashtable for this?

That's reasonable.  I'll do that.

> 
> +  node = (cand_chain_t) pool_alloc (chain_pool);
> +  node->base_name = c->base_name;
> 
> If you never free pool entries it's more efficient to use an obstack.
> alloc-pool only pays off if you get freed-item re-use.

OK.  I'll change both cand_pool and chain_pool to obstacks.

> 
> +  switch (gimple_assign_rhs_code (gs))
> +    {
> +    case MULT_EXPR:
> +      rhs2 = gimple_assign_rhs2 (gs);
> +
> +      if (TREE_CODE (rhs2) == INTEGER_CST)
> +	return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed);
> +
> +      if (TREE_CODE (rhs1) == INTEGER_CST)
> +	return multiply_by_cost (TREE_INT_CST_LOW (rhs1), lhs_mode, speed);
> 
> In theory all commutative statements should have constant operands only
> at rhs2 ...

I'm glad I'm not the only one who thought that was the theory. ;)  I
wasn't sure, and I've seen violations of this come up in practice.
Should I assert when that happens instead, and track down the offending
optimizations?

> 
> Also you do not verify that the constant fits in a host-wide-int - but maybe
> you do not care?  Thus, I'd do
> 
>    if (host_integerp (rhs2, 0))
>      return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed);
> 
> or make multiply_by[_const?]_cost take a double-int instead.  Likewise below
> for add.

Ok.  Name change looks good also, I'll include that in the cost model
changes.

> 
> +    case MODIFY_EXPR:
> +      /* Be suspicious of assigning costs to copies that may well go away.  */
> +      return 0;
> 
> MODIFY_EXPR is never a gimple_assign_rhs_code.  Simple copies have
> a code of SSA_NAME for example.  But as you assert if you get to an
> unhandled code I wonder why you needed the above ...

I'll remove this, and document that we are deliberately not touching
copies (which was my original intent).

> 
> +static slsr_cand_t
> +base_cand_from_table (tree base_in)
> +{
> +  slsr_cand mapping_key;
> +
> +  gimple def = SSA_NAME_DEF_STMT (base_in);
> +  if (!def)
> +    return (slsr_cand_t) NULL;
> +
> +  mapping_key.cand_stmt = def;
> +  return (slsr_cand_t) htab_find (stmt_cand_map, &mapping_key);
> 
> isn't that reachable via the base-name -> chain mapping for base_in?

Maybe so.  I need to think about it some more (it got evicted from my
mental cache).  It could be I created this before adding the chains
stuff and never cleaned up.

> 
> +  if (TREE_CODE (rhs2) == SSA_NAME && operand_equal_p (rhs1, rhs2, 0))
> +    return;
> 
> SSA_NAMEs can be compared by pointer equality, thus the above is
> equivalent to
> 
>   if (TREE_CODE (rhs2) == SSA_NAME && rhs1 == rhs2)
> 
> or even just
> 
>   if (rhs1 == rhs2)
> 
> applies elsewhere as well.
> 

Ok.  I promise I'll remember this eventually...

> +/* Return TRUE iff PRODUCT is an integral multiple of FACTOR, and return
> +   the multiple in *MULTIPLE.  Otherwise return FALSE and leave *MULTIPLE
> +   unchanged.  */
> +/* ??? - Should this be moved to double-int.c?  */
> 
> I think so.

Ok, I'll include this in the preliminaries patch.

> 
> +static bool
> +double_int_multiple_of (double_int product, double_int factor,
> +			bool unsigned_p, double_int *multiple)
> +{
> +  double_int remainder;
> +  double_int quotient = double_int_divmod (product, factor, unsigned_p,
> +					   TRUNC_DIV_EXPR, &remainder);
> +  if (double_int_zero_p (remainder))
> +    {
> +      *multiple = quotient;
> +      return true;
> +    }
> +
> +  return false;
> +}
> 
> 
> In general I find a lot of code of similar structure, factoring bits may make
> it easier to read.  Obvious candidates include:
> 
> +      /* Add the interpretation to the statement-candidate mapping.  */
> +      slot = htab_find_slot (stmt_cand_map, c, INSERT);
> +      gcc_assert (!*slot);
> +      *slot = c;
> 
> into a function add_cand_for_stmt ()
> 
> +      if (TREE_CODE (rhs1) != SSA_NAME || TREE_CODE (rhs2) != INTEGER_CST)
> +	return;
> 
> doing some simple checks of this kind in the function walking all statements
> and pass in operands and operation code.

Sounds good.  You should have seen it before I already did one round of
factoring... :)

> 
> +/* Return TRUE if GS is a statement that defines an SSA name from
> +   a NOP_EXPR and is legal for us to combine an add and multiply
> +   across.  This is legitimate for casts from a signed type to
> +   a signed or unsigned type of the same or larger size.  It is not
> +   legitimate to convert any unsigned type to a signed type, or
> +   to an unsigned type of a different size.
> +
> +   The reasoning here is that signed integer overflow is undefined,
> +   so any program that was expecting overflow that no longer occurs
> +   is not correct.  Unsigned integers, however, have wrap semantics,
> +   and it is reasonable for programs to assume an overflowing add
> +   will wrap.
> +
> +   With -fwrapv, signed integers also have wrap semantics, so widening
> +   casts are not allowed then either.  */
> +
> +static bool
> +legal_cast_p (gimple gs)
> +{
> +  tree lhs, rhs, lhs_type, rhs_type;
> +  unsigned lhs_size, rhs_size, lhs_uns, rhs_uns;
> +
> +  if (!is_gimple_assign (gs)
> +      || TREE_CODE (gimple_assign_lhs (gs)) != SSA_NAME
> 
> That's always true if the code is NOP_EXPR
> 
> +      || gimple_assign_rhs_code (gs) != NOP_EXPR)
> 
> CONVERT_EXPR_CODE_P ()
> 
> +    return false;
> +
> +  lhs = gimple_assign_lhs (gs);
> +  rhs = gimple_assign_rhs1 (gs);
> +
> +  if (TREE_CODE (rhs) != SSA_NAME)
> +    return false;
> 
> Likewise.
> 
> +  lhs_type = TREE_TYPE (SSA_NAME_VAR (lhs));
> +  rhs_type = TREE_TYPE (SSA_NAME_VAR (rhs));
> 
> Get the type from the SSA_NAMEs themselves, no need to lookup
> SSA_NAME_VAR.  If it happens you ICE that way you are looking
> at released SSA names ...
> 
> +  lhs_size = TYPE_PRECISION (lhs_type);
> +  rhs_size = TYPE_PRECISION (rhs_type);
> +  lhs_uns = TYPE_UNSIGNED (lhs_type);
> +  rhs_uns = TYPE_UNSIGNED (rhs_type);
> +
> +  if (lhs_size < rhs_size
> +      || (rhs_uns && !lhs_uns)
> +      || (rhs_uns && lhs_uns && rhs_size != lhs_size)
> +      || (!rhs_uns && flag_wrapv && rhs_size != lhs_size))
> +    return false;
> 
> So we basically check whether the lhs can represent all values of the rhs?
> So ...
> 
>     if (lhs_size > rhs_size)
>       return true;
> 
> is missing?  Because you'd reject a widening of an unsigned int to an
> unsigned long.

That rejection is intentional.  Assuming 4-byte int and 8-byte long, the
unsigned int would wrap at 32 bits, so allowing the widening could
change semantics.  I've actually encountered problems like this all over
libiberty and other GCC support libraries.

> 
> As for your handling of flag_wrapv and the unsigned flag, please try
> to use the TYPE_OVERFLOW_UNDEFINED/TYPE_OVERFLOW_WRAPS
> type properties instead for the parts that care about overflow behavior
> and not value-representation.
> 
> +  return true;
> +}
> 

OK, I'll look into those and let you know if I have questions.

> I have a déjà vu of seeing similar code elsewhere (widening shift/multiply
> support in the vector pattern recognizer maybe).  While I may understand
> what the textual description says, including an example would explain
> things better.  For example
> 
> "Return TRUE if GS is a statement that defines an SSA name from
>  a conversion and is legal for us to associate with an add or multiply.
>  Thus transform name = (T) name'; name * X; to name' * X"
> 
> But I suppose I got the semantics wrong (and thus the example).  Can you
> clarify?

OK, I'll try.  This was definitely the hardest part to get right.

> 
> Otherwise the pass looks quite good.  It looks though that it will optimize
> cross-basic-block strength-reduction opportunities, eventually causing
> cross-BB register lifetime increases?  Or is this not an issue?

This is true, and it's made somewhat worse by later optimizations.  I
attempted to reduce the scope where possible by choosing the most
closely dominating candidate as a basis, thinking this would help limit
lifetimes.  However, later optimizations fold up the various adds so
that all candidates in a chain end up directly depending on the
"ultimate basis," resulting in one longer lifetime.

It's something to keep an eye on.  If it appears to be an issue, we may
need to look into some sort of distance heuristic to limit which
candidates can be used as a basis.  It's not a big deal to keep a
register live across a single conditional branch, but reusing a value
from a control-equivalent block when there's an intervening loop is not
so good.  (On the other hand, it's just another opportunity for
intelligent register spill design. :)

As I indicated, though, I don't think this is a problem limited to this
pass, but is a general issue many places in the middle end.

> 
> Thanks for working on this,
> Richard.
> 

And thanks very much for the review!

Bill
Richard Henderson - June 20, 2012, 6:52 p.m.
On 06/20/2012 04:11 AM, Richard Guenther wrote:
> I notice (with all of these functions)
> 
> +unsigned
> +negate_cost (enum machine_mode mode, bool speed)
> +{
> +  static unsigned costs[NUM_MACHINE_MODES];
> +  rtx seq;
> +  unsigned cost;
> +
> +  if (costs[mode])
> +    return costs[mode];
> +
> +  start_sequence ();
> +  force_operand (gen_rtx_fmt_e (NEG, mode,
> +				gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
> +		 NULL_RTX);

I don't suppose there's any way to share data with what init_expmed computes?

Not, strictly speaking, the cleanest thing to include expmed.h here, but surely
a tad better than re-computing identical data (and without the clever rtl
garbage avoidance tricks).


r~
William J. Schmidt - June 20, 2012, 7:45 p.m.
On Wed, 2012-06-20 at 11:52 -0700, Richard Henderson wrote:
> On 06/20/2012 04:11 AM, Richard Guenther wrote:
> > I notice (with all of these functions)
> > 
> > +unsigned
> > +negate_cost (enum machine_mode mode, bool speed)
> > +{
> > +  static unsigned costs[NUM_MACHINE_MODES];
> > +  rtx seq;
> > +  unsigned cost;
> > +
> > +  if (costs[mode])
> > +    return costs[mode];
> > +
> > +  start_sequence ();
> > +  force_operand (gen_rtx_fmt_e (NEG, mode,
> > +				gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
> > +		 NULL_RTX);
> 
> I don't suppose there's any way to share data with what init_expmed computes?
> 
> Not, strictly speaking, the cleanest thing to include expmed.h here, but surely
> a tad better than re-computing identical data (and without the clever rtl
> garbage avoidance tricks).

Interesting.  I was building on what ivopts already has; not sure of the
history there.  It looks like there is some overlap in function, but
expmed doesn't have everything ivopts uses today (particularly the hash
table of costs for multiplies by various constants).  The stuff I need
for type promotion/demotion is also not present (which I'm computing on
demand for whatever mode pairs are encountered).  Not sure how great it
would be to precompute that for all pairs, and obviously precomputing
costs of multiplying by all constants isn't going to work.  So if the
two functionalities were to be combined, it would seem to require some
modification to how expmed works.

Thanks,
Bill
> 
> 
> r~
>
Richard Guenther - June 21, 2012, 9:13 a.m.
On Wed, Jun 20, 2012 at 6:22 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> On Wed, 2012-06-20 at 13:11 +0200, Richard Guenther wrote:
>> On Thu, Jun 14, 2012 at 3:21 PM, William J. Schmidt
>> <wschmidt@linux.vnet.ibm.com> wrote:
>> > Pro forma ping. :)
>>
>> ;)
>>
>> I notice (with all of these functions)
>>
>> +unsigned
>> +negate_cost (enum machine_mode mode, bool speed)
>> +{
>> +  static unsigned costs[NUM_MACHINE_MODES];
>> +  rtx seq;
>> +  unsigned cost;
>> +
>> +  if (costs[mode])
>> +    return costs[mode];
>> +
>> +  start_sequence ();
>> +  force_operand (gen_rtx_fmt_e (NEG, mode,
>> +                             gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
>> +              NULL_RTX);
>> +  seq = get_insns ();
>> +  end_sequence ();
>> +
>> +  cost = seq_cost (seq, speed);
>> +  if (!cost)
>> +    cost = 1;
>>
>> that the costs[] array is independent of the speed argument.  Thus whatever
>> comes first determines the cost.  Odd, and probably not good.  A fix
>> would be appreciated (even for the current code ...) - simply make the
>> array costs[NUM_MACHINE_MODES][2].
>>
>> As for the renaming - can you name the functions consistently?  Thus
>> the above would be negate_reg_cost?  And maybe rename the other
>> FIXME function, too?
>
> I agree with all this.  I'll prepare all the cost model changes as a
> separate preliminaries patch.
>
>>
>> Index: gcc/tree-ssa-strength-reduction.c
>> ===================================================================
>> --- gcc/tree-ssa-strength-reduction.c (revision 0)
>> +++ gcc/tree-ssa-strength-reduction.c (revision 0)
>> @@ -0,0 +1,1611 @@
>> +/* Straight-line strength reduction.
>> +   Copyright (C) 2012  Free Software Foundation, Inc.
>>
>> I know we have these 'tree-ssa-' names, but really this is gimple-ssa now ;)
>> So, please name it gimple-ssa-strength-reduction.c.
>
> Will do.  Vive la revolution? ;)
>
>>
>> +  /* Access to the statement for subsequent modification.  Cached to
>> +     save compile time.  */
>> +  gimple_stmt_iterator cand_gsi;
>>
>> this is an iterator for cand_stmt?  Then caching it is no longer necessary
>> as the iterator is the stmt itself after recent infrastructure changes.
>
> Oh yeah, I remember seeing that go by.  Nice.  Will change.
>
>>
>> +/* Hash table embodying a mapping from statements to candidates.  */
>> +static htab_t stmt_cand_map;
>> ...
>> +static hashval_t
>> +stmt_cand_hash (const void *p)
>> +{
>> +  return htab_hash_pointer (((const_slsr_cand_t) p)->cand_stmt);
>> +}
>>
>> use a pointer-map instead.
>>
>> +/* Callback to produce a hash value for a candidate chain header.  */
>> +
>> +static hashval_t
>> +base_cand_hash (const void *p)
>> +{
>> +  tree ssa_name = ((const_cand_chain_t) p)->base_name;
>> +
>> +  if (TREE_CODE (ssa_name) != SSA_NAME)
>> +    return (hashval_t) 0;
>> +
>> +  return (hashval_t) SSA_NAME_VERSION (ssa_name);
>> +}
>>
>> does it ever happen that ssa_name is not an SSA_NAME?
>
> Not in this patch, but when I introduce CAND_REF in a later patch it
> could happen since the base field of a CAND_REF is a MEM_REF.  It's a
> safety valve in case of misuse.  I'll think about this some more.

Hm.  But then you produce a gigantic hash-collision for them ...

>> I'm not sure
>> the memory savings over simply using a fixed-size (num_ssa_names)
>> array indexed by SSA_NAME_VERSION pointing to the chain is worth
>> using a hashtable for this?
>
> That's reasonable.  I'll do that.
>
>>
>> +  node = (cand_chain_t) pool_alloc (chain_pool);
>> +  node->base_name = c->base_name;
>>
>> If you never free pool entries it's more efficient to use an obstack.
>> alloc-pool only pays off if you get freed-item re-use.
>
> OK.  I'll change both cand_pool and chain_pool to obstacks.
>
>>
>> +  switch (gimple_assign_rhs_code (gs))
>> +    {
>> +    case MULT_EXPR:
>> +      rhs2 = gimple_assign_rhs2 (gs);
>> +
>> +      if (TREE_CODE (rhs2) == INTEGER_CST)
>> +     return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed);
>> +
>> +      if (TREE_CODE (rhs1) == INTEGER_CST)
>> +     return multiply_by_cost (TREE_INT_CST_LOW (rhs1), lhs_mode, speed);
>>
>> In theory all commutative statements should have constant operands only
>> at rhs2 ...
>
> I'm glad I'm not the only one who thought that was the theory. ;)  I
> wasn't sure, and I've seen violations of this come up in practice.
> Should I assert when that happens instead, and track down the offending
> optimizations?

Yes please.  Good enough if you file bugreports for the cases you hit
and in the end simply drop the assert (but do not optimize) in the patch.

>>
>> Also you do not verify that the constant fits in a host-wide-int - but maybe
>> you do not care?  Thus, I'd do
>>
>>    if (host_integerp (rhs2, 0))
>>      return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed);
>>
>> or make multiply_by[_const?]_cost take a double-int instead.  Likewise below
>> for add.
>
> Ok.  Name change looks good also, I'll include that in the cost model
> changes.
>
>>
>> +    case MODIFY_EXPR:
>> +      /* Be suspicious of assigning costs to copies that may well go away.  */
>> +      return 0;
>>
>> MODIFY_EXPR is never a gimple_assign_rhs_code.  Simple copies have
>> a code of SSA_NAME for example.  But as you assert if you get to an
>> unhandled code I wonder why you needed the above ...
>
> I'll remove this, and document that we are deliberately not touching
> copies (which was my original intent).
>
>>
>> +static slsr_cand_t
>> +base_cand_from_table (tree base_in)
>> +{
>> +  slsr_cand mapping_key;
>> +
>> +  gimple def = SSA_NAME_DEF_STMT (base_in);
>> +  if (!def)
>> +    return (slsr_cand_t) NULL;
>> +
>> +  mapping_key.cand_stmt = def;
>> +  return (slsr_cand_t) htab_find (stmt_cand_map, &mapping_key);
>>
>> isn't that reachable via the base-name -> chain mapping for base_in?
>
> Maybe so.  I need to think about it some more (it got evicted from my
> mental cache).  It could be I created this before adding the chains
> stuff and never cleaned up.

Ok.

>>
>> +  if (TREE_CODE (rhs2) == SSA_NAME && operand_equal_p (rhs1, rhs2, 0))
>> +    return;
>>
>> SSA_NAMEs can be compared by pointer equality, thus the above is
>> equivalent to
>>
>>   if (TREE_CODE (rhs2) == SSA_NAME && rhs1 == rhs2)
>>
>> or even just
>>
>>   if (rhs1 == rhs2)
>>
>> applies elsewhere as well.
>>
>
> Ok.  I promise I'll remember this eventually...

;)

>> +/* Return TRUE iff PRODUCT is an integral multiple of FACTOR, and return
>> +   the multiple in *MULTIPLE.  Otherwise return FALSE and leave *MULTIPLE
>> +   unchanged.  */
>> +/* ??? - Should this be moved to double-int.c?  */
>>
>> I think so.
>
> Ok, I'll include this in the preliminaries patch.
>
>>
>> +static bool
>> +double_int_multiple_of (double_int product, double_int factor,
>> +                     bool unsigned_p, double_int *multiple)
>> +{
>> +  double_int remainder;
>> +  double_int quotient = double_int_divmod (product, factor, unsigned_p,
>> +                                        TRUNC_DIV_EXPR, &remainder);
>> +  if (double_int_zero_p (remainder))
>> +    {
>> +      *multiple = quotient;
>> +      return true;
>> +    }
>> +
>> +  return false;
>> +}
>>
>>
>> In general I find a lot of code of similar structure, factoring bits may make
>> it easier to read.  Obvious candidates include:
>>
>> +      /* Add the interpretation to the statement-candidate mapping.  */
>> +      slot = htab_find_slot (stmt_cand_map, c, INSERT);
>> +      gcc_assert (!*slot);
>> +      *slot = c;
>>
>> into a function add_cand_for_stmt ()
>>
>> +      if (TREE_CODE (rhs1) != SSA_NAME || TREE_CODE (rhs2) != INTEGER_CST)
>> +     return;
>>
>> doing some simple checks of this kind in the function walking all statements
>> and pass in operands and operation code.
>
> Sounds good.  You should have seen it before I already did one round of
> factoring... :)
>
>>
>> +/* Return TRUE if GS is a statement that defines an SSA name from
>> +   a NOP_EXPR and is legal for us to combine an add and multiply
>> +   across.  This is legitimate for casts from a signed type to
>> +   a signed or unsigned type of the same or larger size.  It is not
>> +   legitimate to convert any unsigned type to a signed type, or
>> +   to an unsigned type of a different size.
>> +
>> +   The reasoning here is that signed integer overflow is undefined,
>> +   so any program that was expecting overflow that no longer occurs
>> +   is not correct.  Unsigned integers, however, have wrap semantics,
>> +   and it is reasonable for programs to assume an overflowing add
>> +   will wrap.
>> +
>> +   With -fwrapv, signed integers also have wrap semantics, so widening
>> +   casts are not allowed then either.  */
>> +
>> +static bool
>> +legal_cast_p (gimple gs)
>> +{
>> +  tree lhs, rhs, lhs_type, rhs_type;
>> +  unsigned lhs_size, rhs_size, lhs_uns, rhs_uns;
>> +
>> +  if (!is_gimple_assign (gs)
>> +      || TREE_CODE (gimple_assign_lhs (gs)) != SSA_NAME
>>
>> That's always true if the code is NOP_EXPR
>>
>> +      || gimple_assign_rhs_code (gs) != NOP_EXPR)
>>
>> CONVERT_EXPR_CODE_P ()
>>
>> +    return false;
>> +
>> +  lhs = gimple_assign_lhs (gs);
>> +  rhs = gimple_assign_rhs1 (gs);
>> +
>> +  if (TREE_CODE (rhs) != SSA_NAME)
>> +    return false;
>>
>> Likewise.
>>
>> +  lhs_type = TREE_TYPE (SSA_NAME_VAR (lhs));
>> +  rhs_type = TREE_TYPE (SSA_NAME_VAR (rhs));
>>
>> Get the type from the SSA_NAMEs themselves, no need to lookup
>> SSA_NAME_VAR.  If it happens you ICE that way you are looking
>> at released SSA names ...
>>
>> +  lhs_size = TYPE_PRECISION (lhs_type);
>> +  rhs_size = TYPE_PRECISION (rhs_type);
>> +  lhs_uns = TYPE_UNSIGNED (lhs_type);
>> +  rhs_uns = TYPE_UNSIGNED (rhs_type);
>> +
>> +  if (lhs_size < rhs_size
>> +      || (rhs_uns && !lhs_uns)
>> +      || (rhs_uns && lhs_uns && rhs_size != lhs_size)
>> +      || (!rhs_uns && flag_wrapv && rhs_size != lhs_size))
>> +    return false;
>>
>> So we basically check whether the lhs can represent all values of the rhs?
>> So ...
>>
>>     if (lhs_size > rhs_size)
>>       return true;
>>
>> is missing?  Because you'd reject a widening of an unsigned int to an
>> unsigned long.
>
> That rejection is intentional.  Assuming 4-byte int and 8-byte long, the
> unsigned int would wrap at 32 bits, so allowing the widening could
> change semantics.  I've actually encountered problems like this all over
> libiberty and other GCC support libraries.
>
>>
>> As for your handling of flag_wrapv and the unsigned flag, please try
>> to use the TYPE_OVERFLOW_UNDEFINED/TYPE_OVERFLOW_WRAPS
>> type properties instead for the parts that care about overflow behavior
>> and not value-representation.
>>
>> +  return true;
>> +}
>>
>
> OK, I'll look into those and let you know if I have questions.
>
>> I have a déjà vu of seeing similar code elsewhere (widening shift/multiply
>> support in the vector pattern recognizer maybe).  While I may understand
>> what the textual description says, including an example would explain
>> things better.  For example
>>
>> "Return TRUE if GS is a statement that defines an SSA name from
>>  a conversion and is legal for us to associate with an add or multiply.
>>  Thus transform name = (T) name'; name * X; to name' * X"
>>
>> But I suppose I got the semantics wrong (and thus the example).  Can you
>> clarify?
>
> OK, I'll try.  This was definitely the hardest part to get right.

Of course ;)  It's probably also important due to the omnipresent casts to
sizetype for address offset computations.

>>
>> Otherwise the pass looks quite good.  It looks though that it will optimize
>> cross-basic-block strength-reduction opportunities, eventually causing
>> cross-BB register lifetime increases?  Or is this not an issue?
>
> This is true, and it's made somewhat worse by later optimizations.  I
> attempted to reduce the scope where possible by choosing the most
> closely dominating candidate as a basis, thinking this would help limit
> lifetimes.  However, later optimizations fold up the various adds so
> that all candidates in a chain end up directly depending on the
> "ultimate basis," resulting in one longer lifetime.
>
> It's something to keep an eye on.  If it appears to be an issue, we may
> need to look into some sort of distance heuristic to limit which
> candidates can be used as a basis.  It's not a big deal to keep a
> register live across a single conditional branch, but reusing a value
> from a control-equivalent block when there's an intervening loop is not
> so good.  (On the other hand, it's just another opportunity for
> intelligent register spill design. :)
>
> As I indicated, though, I don't think this is a problem limited to this
> pass, but is a general issue many places in the middle end.

Of course.  For this pass as you already have a 'cost' associated you
could simply pessimize candidates from a different basic-block slightly.

But I guess without a real testcase it would be just some magic ...

Richard.

>>
>> Thanks for working on this,
>> Richard.
>>
>
> And thanks very much for the review!
>
> Bill
>

Patch

Index: gcc/tree-ssa-strength-reduction.c
===================================================================
--- gcc/tree-ssa-strength-reduction.c	(revision 0)
+++ gcc/tree-ssa-strength-reduction.c	(revision 0)
@@ -0,0 +1,1611 @@ 
+/* Straight-line strength reduction.
+   Copyright (C) 2012  Free Software Foundation, Inc.

I know we have these 'tree-ssa-' names, but really this is gimple-ssa now ;)
So, please name it gimple-ssa-strength-reduction.c.

+  /* Access to the statement for subsequent modification.  Cached to
+     save compile time.  */
+  gimple_stmt_iterator cand_gsi;

this is a iterator for cand_stmt?  Then caching it is no longer necessary
as the iterator is the stmt itself after recent infrastructure changes.

+/* Hash table embodying a mapping from statements to candidates.  */
+static htab_t stmt_cand_map;
...
+static hashval_t
+stmt_cand_hash (const void *p)
+{
+  return htab_hash_pointer (((const_slsr_cand_t) p)->cand_stmt);
+}

use a pointer-map instead.

+/* Callback to produce a hash value for a candidate chain header.  */
+
+static hashval_t
+base_cand_hash (const void *p)
+{
+  tree ssa_name = ((const_cand_chain_t) p)->base_name;
+
+  if (TREE_CODE (ssa_name) != SSA_NAME)
+    return (hashval_t) 0;
+
+  return (hashval_t) SSA_NAME_VERSION (ssa_name);
+}

does it ever happen that ssa_name is not an SSA_NAME?  I'm not sure
the memory savings over simply using a fixed-size (num_ssa_names)
array indexed by SSA_NAME_VERSION pointing to the chain is worth
using a hashtable for this?

+  node = (cand_chain_t) pool_alloc (chain_pool);
+  node->base_name = c->base_name;

If you never free pool entries it's more efficient to use an obstack.
alloc-pool
only pays off if you get freed item re-use.

+  switch (gimple_assign_rhs_code (gs))
+    {
+    case MULT_EXPR:
+      rhs2 = gimple_assign_rhs2 (gs);
+
+      if (TREE_CODE (rhs2) == INTEGER_CST)
+	return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed);
+
+      if (TREE_CODE (rhs1) == INTEGER_CST)
+	return multiply_by_cost (TREE_INT_CST_LOW (rhs1), lhs_mode, speed);

In theory all commutative statements should have constant operands only
at rhs2 ...

Also you do not verify that the constant fits in a host-wide-int - but maybe
you do not care?  Thus, I'd do

   if (host_integerp (rhs2, 0))
     return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed);

or make multiply_by[_const?]_cost take a double-int instead.  Likewise below
for add.

+    case MODIFY_EXPR:
+      /* Be suspicious of assigning costs to copies that may well go away.  */
+      return 0;

MODIFY_EXPR is never a gimple_assign_rhs_code.  Simple copies have
a code of SSA_NAME for example.  But as you assert if you get to an
unhandled code I wonder why you needed the above ...

+static slsr_cand_t
+base_cand_from_table (tree base_in)
+{
+  slsr_cand mapping_key;
+
+  gimple def = SSA_NAME_DEF_STMT (base_in);
+  if (!def)
+    return (slsr_cand_t) NULL;
+
+  mapping_key.cand_stmt = def;
+  return (slsr_cand_t) htab_find (stmt_cand_map, &mapping_key);

isn't that reachable via the base-name -> chain mapping for base_in?

+  if (TREE_CODE (rhs2) == SSA_NAME && operand_equal_p (rhs1, rhs2, 0))
+    return;

SSA_NAMEs can be compared by pointer equality, thus the above is
equivalent to

  if (TREE_CODE (rhs2) == SSA_NAME && rhs1 == rhs2)

or even just

  if (rhs1 == rhs2)

applies elsewhere as well.

+/* Return TRUE iff PRODUCT is an integral multiple of FACTOR, and return
+   the multiple in *MULTIPLE.  Otherwise return FALSE and leave *MULTIPLE
+   unchanged.  */
+/* ??? - Should this be moved to double-int.c?  */

I think so.

+static bool
+double_int_multiple_of (double_int product, double_int factor,
+			bool unsigned_p, double_int *multiple)
+{
+  double_int remainder;
+  double_int quotient = double_int_divmod (product, factor, unsigned_p,
+					   TRUNC_DIV_EXPR, &remainder);
+  if (double_int_zero_p (remainder))
+    {
+      *multiple = quotient;
+      return true;
+    }
+
+  return false;
+}


In general I find a lot of code of similar structure; factoring out common
bits may make it easier to read.  Obvious candidates include:

+      /* Add the interpretation to the statement-candidate mapping.  */
+      slot = htab_find_slot (stmt_cand_map, c, INSERT);
+      gcc_assert (!*slot);
+      *slot = c;

into a function add_cand_for_stmt ()

+      if (TREE_CODE (rhs1) != SSA_NAME || TREE_CODE (rhs2) != INTEGER_CST)
+	return;

do some simple checks of this kind in the function walking all statements
and pass in the operands and the operation code.

+/* Return TRUE if GS is a statement that defines an SSA name from
+   a NOP_EXPR and is legal for us to combine an add and multiply
+   across.  This is legitimate for casts from a signed type to
+   a signed or unsigned type of the same or larger size.  It is not
+   legitimate to convert any unsigned type to a signed type, or
+   to an unsigned type of a different size.
+
+   The reasoning here is that signed integer overflow is undefined,
+   so any program that was expecting overflow that no longer occurs
+   is not correct.  Unsigned integers, however, have wrap semantics,
+   and it is reasonable for programs to assume an overflowing add
+   will wrap.
+
+   With -fwrapv, signed integers also have wrap semantics, so widening
+   casts are not allowed then either.  */
+
+static bool
+legal_cast_p (gimple gs)
+{
+  tree lhs, rhs, lhs_type, rhs_type;
+  unsigned lhs_size, rhs_size, lhs_uns, rhs_uns;
+
+  if (!is_gimple_assign (gs)
+      || TREE_CODE (gimple_assign_lhs (gs)) != SSA_NAME

That's always true if the code is NOP_EXPR

+      || gimple_assign_rhs_code (gs) != NOP_EXPR)

CONVERT_EXPR_CODE_P ()

+    return false;
+
+  lhs = gimple_assign_lhs (gs);
+  rhs = gimple_assign_rhs1 (gs);
+
+  if (TREE_CODE (rhs) != SSA_NAME)
+    return false;

Likewise.

+  lhs_type = TREE_TYPE (SSA_NAME_VAR (lhs));
+  rhs_type = TREE_TYPE (SSA_NAME_VAR (rhs));

Get the type from the SSA_NAMEs themselves; there's no need to look up
SSA_NAME_VAR.  If it happens that you ICE that way, you are looking
at released SSA names ...

+  lhs_size = TYPE_PRECISION (lhs_type);
+  rhs_size = TYPE_PRECISION (rhs_type);
+  lhs_uns = TYPE_UNSIGNED (lhs_type);
+  rhs_uns = TYPE_UNSIGNED (rhs_type);
+
+  if (lhs_size < rhs_size
+      || (rhs_uns && !lhs_uns)
+      || (rhs_uns && lhs_uns && rhs_size != lhs_size)
+      || (!rhs_uns && flag_wrapv && rhs_size != lhs_size))
+    return false;

So we basically check whether the lhs can represent all values of the rhs?
So ...

    if (lhs_size > rhs_size)
      return true;

is missing?  Because you'd reject a widening of an unsigned int to an
unsigned long.

As for your handling of flag_wrapv and the unsigned flag, please try
to use the TYPE_OVERFLOW_UNDEFINED/TYPE_OVERFLOW_WRAPS
type properties instead for the parts that care about overflow behavior
and not value-representation.

+  return true;
+}

I have a deja-vu of seeing similar code elsewhere (the widening shift/multiply
support in the vector pattern recognizer, maybe).  While I may understand
what the textual description says, including an example would explain
things better.  For example

"Return TRUE if GS is a statement that defines an SSA name from
 a conversion and is legal for us to associate with an add or multiply.
 Thus transform name = (T) name'; name * X; to name' * X"

But I suppose I got the semantics wrong (and thus the example).  Can you
clarify?

Otherwise the pass looks quite good.  It looks, though, as if it will optimize
cross-basic-block strength-reduction opportunities, possibly causing
cross-BB register lifetime increases?  Or is this not an issue?

Thanks for working on this,
Richard.