Patchwork Straight-line strength reduction, stage 1

Submitter William J. Schmidt
Date Oct. 30, 2011, 4:10 p.m.
Message ID <1319991007.2567.15.camel@gnopaine>
Permalink /patch/122647/
State New

Comments

William J. Schmidt - Oct. 30, 2011, 4:10 p.m.
Greetings,

IVOPTS handles strength reduction of induction variables, but GCC does
not currently perform strength reduction in straight-line code.  This
has been noted before in PR22586 and PR35308.  PR46556 is also a case
that could be handled with strength reduction.  This patch adds a pass
to perform strength reduction along dominator paths on the easiest
cases, where replacements are obviously profitable.
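For readers unfamiliar with the transformation, here is a hand-reduced sketch of the kind of rewrite the pass performs (a standalone illustration, not code from the patch; the pass itself operates on GIMPLE):

```c
#include <assert.h>

/* Before: three multiplies by the same stride d.  */
int before (int c, int d)
{
  int a1 = c + 1 * d;
  int a2 = c + 2 * d;
  int a3 = c + 3 * d;
  return a1 + a2 + a3;
}

/* After straight-line strength reduction: each later value is
   derived from the previous one with a single add.  */
int after (int c, int d)
{
  int a1 = c + d;
  int a2 = a1 + d;
  int a3 = a2 + d;
  return a1 + a2 + a3;
}
```

Both functions compute the same sums; the reduced form replaces two multiplies with two adds.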

My intent is to add subsequent installments to handle more involved
cases, as described in the code commentary.  The cases not yet covered
will require target-specific cost analysis to determine profitability.
It is likely that this will leverage some of the cost function
infrastructure in tree-ssa-loop-ivopts.c.

I've bootstrapped and tested the code on powerpc64-linux with no
regressions.  I've also run SPEC CPU2006 to compare results.  32-bit
PowerPC gains about 1% on integer code.  Other results are in the noise
range.  64-bit integer code would have also shown gains, except for one
bad result (400.perlbench degraded 4%).  Initial analysis shows that
very different control flow is created for regmatch with the patch than
without.  I will be investigating further this week, but presumably some
touchy threshold was no longer met for a downstream optimization --
probably an indirect effect.

My hope is to commit this first stage as part of 4.7, with the remainder
to follow in 4.8.

Thanks for your consideration,

Bill


2011-10-30  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

gcc:

	* tree-pass.h (pass_strength_reduction): New declaration.
	* timevar.def (TV_TREE_SLSR): New time-var.
	* tree-ssa-strength-reduction.c: New file.
	* Makefile.in: New dependences.
	* passes.c (init_optimization_passes): Add new pass.

gcc/testsuite:

	* gcc.dg/tree-ssa/slsr-1.c: New test case.
	* gcc.dg/tree-ssa/slsr-2.c: New test case.
	* gcc.dg/tree-ssa/slsr-3.c: New test case.
	* gcc.dg/tree-ssa/slsr-4.c: New test case.
Richard Guenther - Nov. 4, 2011, 1:55 p.m.
On Sun, Oct 30, 2011 at 5:10 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> [cover letter quoted in full; snipped]

I've had a quick look for now and noted two things:

1) you try to handle casting already - for the specific patterns, it seems
the constraints are the same as for detecting when using a widening
multiplication/add is possible?  I think we should have some common
code for the legality checks.
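A concrete illustration of why those legality constraints matter (a hypothetical standalone example, not from the patch): with a narrow unsigned type, distributing the multiply across the cast changes the result when the feeding add wraps:

```c
#include <assert.h>

/* With unsigned char, the add wraps at 256, so the multiply must
   not be distributed across the cast.  */
int through_cast (unsigned char b)
{
  unsigned char t = b + 1;      /* wraps: 255 + 1 == 0 */
  return (int) t * 4;
}

int distributed (unsigned char b)
{
  return (int) b * 4 + 4;       /* no wrap: 255 * 4 + 4 == 1024 */
}
```

The two functions agree for small inputs but diverge at 255, which is exactly the case legal_cast_p must rule out.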

2) you do not handle POINTER_PLUS_EXPR - which looks most
interesting for the pure pointer arithmetic case.  (and you do not
treat PLUS_EXPR as commutative, thus you handle a + b*c but
not b*c + a(?); for a*b + c*d we'd have two candidates)
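To make the commutativity point concrete (a hypothetical example): the two functions below compute identical values, but per the review only the first operand ordering would currently be matched as a candidate:

```c
/* Recognized shape: add of base a with a multiply as second operand.  */
int mult_second (int a, int b, int c)
{
  return a + b * c;
}

/* Same value with operands swapped; reportedly not matched.  */
int mult_first (int a, int b, int c)
{
  return b * c + a;
}
```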

I'm not sure I follow the dominance checks - usually instead of

+         || !dominated_by_p (CDI_DOMINATORS,
+                             gimple_bb (c->cand_stmt),
+                             gimple_bb (use_stmt)))

you'd special case gimple_bb (c->cand_stmt) == gimple_bb (use_stmt)
and have stmt uids to resolve dominance inside a basic block.
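The suggested scheme can be sketched outside GCC like this (toy types standing in for basic blocks and statements; in GCC the per-block uids would come from gimple_uid after renumbering, and the cross-block query from dominated_by_p):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for GCC's basic blocks and gimple statements.  */
struct bb { int dom_entry, dom_exit; };   /* dominator-tree DFS interval */
struct stmt { struct bb *bb; int uid; };  /* uid = position within its block */

/* B is dominated by A iff B's interval nests inside A's.  */
bool bb_dominated_by (const struct bb *b, const struct bb *a)
{
  return a->dom_entry <= b->dom_entry && b->dom_exit <= a->dom_exit;
}

/* Does statement A execute before B on every path reaching B?
   Same block: compare statement uids.  Different blocks: fall back
   to block-level dominance.  */
bool stmt_dominates (const struct stmt *a, const struct stmt *b)
{
  if (a->bb == b->bb)
    return a->uid < b->uid;
  return bb_dominated_by (b->bb, a->bb);
}
```

The special case for a shared block is the point of the suggestion: block-level dominance alone cannot order two statements within one basic block.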

For the overall structure I see you are matching the MULT_EXPRs
and looking at all their immediate uses for candidates with +-.
I'd have expected you to walk the BBs in dominator order, visiting
statements and looking for the +- instead.  That would avoid any
issue about dominance (because you walk BBs properly) and
wouldn't require immediate use walks either (I couldn't convince
myself 100% that you are not visiting stmts multiple times that way...)
But maybe I missed something in my quick look.

Most interesting (and one reason why strength reduction from PRE
didn't work out) is the ability, from an A + B*C stmt, to look up all
candidates for computing the value in a more efficient way.

You do not seem to transform stmts on-the-fly, so for

a1 = c + d;
a2 = c + 2*d;
a3 = c + 3*d;

are you able to generate

a1 = c + d;
a2 = a1 + d;
a3 = a2 + d;

?  On-the-fly operation would also handle this if the candidate info
for a2 is kept as c + 2*d.  Though it's probably worth looking at

a1 = c + d;
a2 = a1 + d;
a3 = c + 3*d;

and see if that still figures out that a3 = a2 + d (that is, when you
find the candidate a1 + 1 * d, do you fold in candidate information
for a1 and try to reduce it to an ultimate base?)
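The algebra behind the question: folding a2's candidate info back to the ultimate base gives a2 == c + 2*d, so a3 = c + 3*d differs from a2 by exactly d. A quick check of the identity (hypothetical standalone code):

```c
#include <assert.h>

/* a1 = c + d, a2 = a1 + d.  Reduced to the ultimate base,
   a2 == c + 2*d, so a3 = c + 3*d can be formed as a2 + d.  */
int a3_direct (int c, int d)
{
  return c + 3 * d;
}

int a3_reduced (int c, int d)
{
  int a1 = c + d;
  int a2 = a1 + d;
  return a2 + d;
}
```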

Thanks,
Richard.


> [signature and ChangeLog quoted in full; snipped]
>
>
> Index: gcc/tree-pass.h
> ===================================================================
> --- gcc/tree-pass.h     (revision 180617)
> +++ gcc/tree-pass.h     (working copy)
> @@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_tracer;
>  extern struct gimple_opt_pass pass_warn_unused_result;
>  extern struct gimple_opt_pass pass_split_functions;
>  extern struct gimple_opt_pass pass_feedback_split_functions;
> +extern struct gimple_opt_pass pass_strength_reduction;
>
>  /* IPA Passes */
>  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-1.c      (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-1.c      (revision 0)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +extern void foo (int);
> +
> +void
> +f (int *p, unsigned int n)
> +{
> +  foo (*(p + n * 4));
> +  foo (*(p + 32 + n * 4));
> +  if (n > 3)
> +    foo (*(p + 16 + n * 4));
> +  else
> +    foo (*(p + 48 + n * 4));
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\+ 128" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 64" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 192" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-2.c      (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-2.c      (revision 0)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +extern void foo (int);
> +
> +void
> +f (int *p, int n)
> +{
> +  foo (*(p + n++ * 4));
> +  foo (*(p + 32 + n++ * 4));
> +  foo (*(p + 16 + n * 4));
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\+ 144" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 96" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-3.c      (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-3.c      (revision 0)
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +foo (int a[], int b[], int i)
> +{
> +  a[i] = b[i] + 2;
> +  i++;
> +  a[i] = b[i] + 2;
> +  i++;
> +  a[i] = b[i] + 2;
> +  i++;
> +  a[i] = b[i] + 2;
> +  i++;
> +  return i;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 4" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 8" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 12" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c      (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c      (revision 0)
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-slsr -fdump-tree-optimized" } */
> +
> +void foo (int);
> +
> +int
> +f (int i)
> +{
> +  int x, y;
> +
> +  x = i * 4;
> +  y = x * 10;
> +  foo (y);
> +
> +  i = i + 5;
> +  x = i * 4;
> +  y = x * 10;
> +  foo (y);
> +
> +  i = i - 4;
> +  x = i * 4;
> +  y = x * 10;
> +  foo (y);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\* 4" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\* 10" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 20;" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 200" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\- 16;" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\- 160" 1 "slsr" } } */
> +/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\* 10" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 200" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\+ 40" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "slsr" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/timevar.def
> ===================================================================
> --- gcc/timevar.def     (revision 180617)
> +++ gcc/timevar.def     (working copy)
> @@ -252,6 +252,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
>  DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
>  DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
>  DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
> +DEFTIMEVAR (TV_TREE_SLSR             , "straight-line strength reduction")
>
>  /* Everything else in rest_of_compilation not included above.  */
>  DEFTIMEVAR (TV_EARLY_LOCAL          , "early local passes")
> Index: gcc/tree-ssa-strength-reduction.c
> ===================================================================
> --- gcc/tree-ssa-strength-reduction.c   (revision 0)
> +++ gcc/tree-ssa-strength-reduction.c   (revision 0)
> @@ -0,0 +1,813 @@
> +/* Straight-line strength reduction.
> +   Copyright (C) 2011  Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +/* There are many algorithms for performing strength reduction on
> +   loops.  This is not one of them.  IVOPTS handles strength reduction
> +   of induction variables just fine.  This pass is intended to pick
> +   up the crumbs it leaves behind, by considering opportunities for
> +   strength reduction along dominator paths.
> +
> +   Strength reduction will be implemented in four stages, gradually
> +   adding more complex candidates:
> +
> +   1) Explicit multiplies, known constant multipliers, no
> +      conditional increments.
> +   2) Explicit multiplies, unknown constant multipliers,
> +      no conditional increments.
> +   3) Explicit multiplies, conditional increments.
> +   4) Implicit multiplies in addressing expressions.
> +
> +   It would also be possible to apply strength reduction to divisions
> +   and modulos, but such opportunities are relatively uncommon.
> +
> +   Strength reduction is also currently restricted to integer operations.
> +   If desired, it could be extended to floating-point operations under
> +   control of something like -funsafe-math-optimizations.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "basic-block.h"
> +#include "tree-pass.h"
> +#include "timevar.h"
> +#include "cfgloop.h"
> +#include "tree-pretty-print.h"
> +#include "gimple-pretty-print.h"
> +#include "alloc-pool.h"
> +#include "tree-flow.h"
> +
> +/* Index into the candidate vector, offset by 1.  VECs are zero-based,
> +   while cand_idx's are one-based, with zero indicating null.  */
> +
> +typedef unsigned int cand_idx;
> +
> +/* Information about a strength reduction candidate.  A candidate
> +   is a statement S1 of the expanded form
> +
> +     S1:  LHS = (B + c1) * M,
> +
> +   where B is an SSA name, c1 is a constant that may be zero, and
> +   M is either an SSA name or a nonzero constant.  A candidate may
> +   only be strength reduced if a previous "basis" statement
> +
> +     S0:  S = (B' + c0) * M
> +
> +   of the same form exists previously in the same block or in a
> +   dominator.  Assuming replacement is profitable, there are two
> +   cases:
> +
> +   (1) If B = B', then S1 can be replaced with:
> +
> +         S1':  LHS = S + (c1 - c0) * M,
> +
> +       where (c1 - c0) * M is folded to the extent possible.
> +
> +   (2) If B differs from B', then we require that B = B' + c0,
> +       and S1 can be replaced with:
> +
> +         S1':  LHS = S + c1 * M,
> +
> +       where c1 * M is folded if possible.
> +
> +   Candidate records are allocated from an allocation pool.  They are
> +   addressed both from a hash table keyed on S1, and from a vector of
> +   candidate pointers arranged in predominator order.  */
> +
> +typedef struct slsr_cand_d
> +{
> +  /* The candidate statement S1.  */
> +  gimple cand_stmt;
> +
> +  /* The base SSA name B.  */
> +  tree base_name;
> +
> +  /* The stride M.  */
> +  tree stride;
> +
> +  /* The index constant c1.  */
> +  double_int index;
> +
> +  /* Index of this candidate in the candidate vector.  */
> +  cand_idx cand_num;
> +
> +  /* Index of the basis statement S0, if any, in the candidate vector.  */
> +  cand_idx basis;
> +
> +  /* First candidate for which this candidate is a basis, if one exists.  */
> +  cand_idx dependent;
> +
> +  /* Next candidate having the same basis as this one.  */
> +  cand_idx sibling;
> +
> +  /* If this is a conditional candidate, the defining PHI statement
> +     for the base SSA name B.  */
> +  gimple def_phi;
> +
> +  /* Access to the statement for subsequent modification.  Cached to
> +     save compile time.  */
> +  gimple_stmt_iterator cand_gsi;
> +
> +} slsr_cand, *slsr_cand_t;
> +
> +typedef const struct slsr_cand_d *const_slsr_cand_t;
> +
> +/* Candidates are maintained in a vector.  If candidate X dominates
> +   candidate Y, then X appears before Y in the vector; but the
> +   converse does not necessarily hold.  */
> +DEF_VEC_P (slsr_cand_t);
> +DEF_VEC_ALLOC_P (slsr_cand_t, heap);
> +static VEC (slsr_cand_t, heap) *cand_vec;
> +
> +/* Hash table embodying a mapping from statements to candidates.  */
> +static htab_t stmt_cand_map;
> +
> +/* Allocation pool for candidates.  */
> +static alloc_pool cand_pool;
> +
> +
> +/* Produce a pointer to the IDX'th candidate in the candidate vector.  */
> +
> +static slsr_cand_t
> +lookup_cand (cand_idx idx)
> +{
> +  return VEC_index (slsr_cand_t, cand_vec, idx - 1);
> +}
> +
> +/* Callback to produce a hash value for a candidate.  */
> +
> +static hashval_t
> +stmt_cand_hash (const void *p)
> +{
> +  return htab_hash_pointer (((const_slsr_cand_t)p)->cand_stmt);
> +}
> +
> +/* Callback when an element is removed from the hash table.
> +   We never remove entries until the entire table is released.  */
> +
> +static void
> +stmt_cand_free (void *p ATTRIBUTE_UNUSED)
> +{
> +}
> +
> +/* Callback to return true if two candidates are equal.  */
> +
> +static int
> +stmt_cand_eq (const void *p1, const void *p2)
> +{
> +  const_slsr_cand_t const cand1 = (const_slsr_cand_t)p1;
> +  const_slsr_cand_t const cand2 = (const_slsr_cand_t)p2;
> +  return (cand1->cand_stmt == cand2->cand_stmt);
> +}
> +
> +/* Return TRUE if GS is a statement that defines an SSA name from
> +   a NOP_EXPR and is legal for us to combine an add and multiply
> +   across.  This is legitimate for casts from a signed type to
> +   a signed or unsigned type of the same or larger size.  It is not
> +   legitimate to convert any unsigned type to a signed type, or
> +   to an unsigned type of a different size.
> +
> +   The reasoning here is that signed integer overflow is undefined,
> +   so any program that was expecting overflow that no longer occurs
> +   is not correct.  Unsigned integers, however, have wrap semantics,
> +   and it is reasonable for programs to assume an overflowing add
> +   will wrap.  */
> +
> +static bool
> +legal_cast_p (gimple gs)
> +{
> +  tree lhs, rhs;
> +  unsigned int src_size, dst_size, src_uns, dst_uns;
> +
> +  if (!is_gimple_assign (gs)
> +      || TREE_CODE (gimple_assign_lhs (gs)) != SSA_NAME
> +      || gimple_assign_rhs_code (gs) != NOP_EXPR)
> +    return false;
> +
> +  lhs = gimple_assign_lhs (gs);
> +  rhs = gimple_assign_rhs1 (gs);
> +
> +  if (TREE_CODE (rhs) != SSA_NAME)
> +    return false;
> +
> +  src_size = TYPE_PRECISION (TREE_TYPE (SSA_NAME_VAR (rhs)));
> +  src_uns = TYPE_UNSIGNED (TREE_TYPE (SSA_NAME_VAR (rhs)));
> +  dst_size = TYPE_PRECISION (TREE_TYPE (SSA_NAME_VAR (lhs)));
> +  dst_uns = TYPE_UNSIGNED (TREE_TYPE (SSA_NAME_VAR (lhs)));
> +
> +  if (dst_size < src_size)
> +    return false;
> +
> +  if (src_uns && !dst_uns)
> +    return false;
> +
> +  if (src_uns && dst_uns && src_size != dst_size)
> +    return false;
> +
> +  return true;
> +}
> +
> +/* If GS is a statement that defines an SSA name from a NOP_EXPR or
> +   CONVERT_EXPR and is legal for our purposes (see legal_cast_p),
> +   return the source SSA name.  */
> +
> +static tree
> +source_of_legal_cast (gimple gs)
> +{
> +  if (legal_cast_p (gs))
> +    return gimple_assign_rhs1 (gs);
> +
> +  return NULL_TREE;
> +}
> +
> +/* If GS is a statement that defines an SSA name from a NOP_EXPR or
> +   CONVERT_EXPR and is legal for our purposes (see legal_cast_p),
> +   return the target SSA name.  */
> +
> +static tree
> +target_of_legal_cast (gimple gs)
> +{
> +  if (legal_cast_p (gs))
> +    return gimple_assign_lhs (gs);
> +
> +  return NULL_TREE;
> +}
> +
> +/* Return TRUE if GS consists of an add or subtract of BASE_NAME
> +   with a constant, with the result placed in an SSA name.  GS is
> +   known to be a gimple assignment.  CODE is the RHS code for GS.  */
> +
> +static bool
> +stmt_adds_base_to_const (gimple gs, enum tree_code code, tree base_name)
> +{
> +  return (TREE_CODE (gimple_assign_lhs (gs)) == SSA_NAME
> +         && (code == PLUS_EXPR || code == MINUS_EXPR)
> +         && operand_equal_p (gimple_assign_rhs1 (gs), base_name, 0)
> +         && TREE_CODE (gimple_assign_rhs2 (gs)) == INTEGER_CST);
> +}
> +
> +/* Recursive helper for find_basis_for_candidate.  To find a
> +   possible basis, first try the immediate uses of the given
> +   BASE_NAME.  If a particular use doesn't appear in a multiply,
> +   see if it appears in an add that feeds one or more multiplies.
> +   Recurse using the SSA name defined on the add.  Each potential
> +   basis found in this manner must also appear in a block that
> +   dominates the candidate statement, be present in the candidate
> +   table, and have the same stride M.  If more than one possible
> +   basis exists, pick the one with highest index in the vector,
> +   which will be the most immediately dominating basis.  */
> +
> +static slsr_cand_t
> +find_basis_for_candidate_1 (slsr_cand_t c, tree base_name, slsr_cand_t basis)
> +{
> +  imm_use_iterator use_iter;
> +  use_operand_p use_p;
> +
> +  FOR_EACH_IMM_USE_FAST (use_p, use_iter, base_name)
> +    {
> +      gimple use_stmt = USE_STMT (use_p);
> +      slsr_cand_t use_basis = NULL;
> +      enum tree_code code;
> +      tree cast_target;
> +
> +      if (!is_gimple_assign (use_stmt))
> +       continue;
> +
> +      /* When revising a candidate's basis, make sure not to pick itself.  */
> +      if (c->cand_stmt == use_stmt)
> +       continue;
> +
> +      /* Look through casts where legal.  */
> +      cast_target = target_of_legal_cast (use_stmt);
> +      if (cast_target)
> +       return find_basis_for_candidate_1 (c, cast_target, basis);
> +
> +      /* If the use statement is a multiply, it's a potential basis.
> +        Otherwise, we have to look at the uses of the use statement
> +        to find any multiplies that may be potential bases.  */
> +      code = gimple_assign_rhs_code (use_stmt);
> +
> +      if (code == MULT_EXPR)
> +       {
> +         slsr_cand mapping_key;
> +         mapping_key.cand_stmt = use_stmt;
> +         use_basis = (slsr_cand_t)htab_find (stmt_cand_map, &mapping_key);
> +       }
> +
> +      /* Recurse if this is an add of a constant that might feed a
> +        multiply.  */
> +      else if (stmt_adds_base_to_const (use_stmt, code, base_name))
> +       {
> +         tree lhs = gimple_assign_lhs (use_stmt);
> +         use_basis = find_basis_for_candidate_1 (c, lhs, basis);
> +       }
> +
> +      if (!use_basis
> +         || !operand_equal_p (c->stride, use_basis->stride, 0)
> +         || !dominated_by_p (CDI_DOMINATORS,
> +                             gimple_bb (c->cand_stmt),
> +                             gimple_bb (use_stmt)))
> +       continue;
> +
> +      /* When revising a candidate's basis, be careful not to pick
> +        a basis that occurs after the candidate.  */
> +      if (use_basis->cand_num > c->cand_num)
> +       continue;
> +
> +      if (!basis || basis->cand_num < use_basis->cand_num)
> +       basis = use_basis;
> +    }
> +
> +  return basis;
> +}
> +
> +/* Look for the nearest dominating statement in the candidate
> +   vector that can serve as a basis for the new CANDIDATE.  If
> +   found, adjust the dependent links for the basis entry and
> +   return the index of the basis entry.  Otherwise return zero.  */
> +
> +static unsigned int
> +find_basis_for_candidate (slsr_cand_t c)
> +{
> +  slsr_cand_t basis = find_basis_for_candidate_1 (c, c->base_name, NULL);
> +
> +  if (basis)
> +    {
> +      c->sibling = basis->dependent;
> +      basis->dependent = c->cand_num;
> +      return basis->cand_num;
> +    }
> +
> +  return 0;
> +}
> +
> +/* Starting with statement GS, look backwards through any
> +   intervening legal casts to find one or more
> +   adds of an SSA name and a constant.  Return the earliest
> +   SSA name in the chain as *BASE, and the sum of all constants
> +   in the chain as INDEX.  */
> +
> +static void
> +combine_feeding_adds (gimple gs, tree *base, double_int *index)
> +{
> +  double_int new_index;
> +  tree cast_source;
> +
> +  while ((cast_source = source_of_legal_cast (gs)))
> +    {
> +      gs = SSA_NAME_DEF_STMT (cast_source);
> +      *base = cast_source;
> +    }
> +
> +  /* If there aren't any more adds, we're done.  */
> +  if (!is_gimple_assign (gs)
> +      || (gimple_assign_rhs_code (gs) != PLUS_EXPR
> +         && gimple_assign_rhs_code (gs) != MINUS_EXPR)
> +      || TREE_CODE (gimple_assign_rhs1 (gs)) != SSA_NAME
> +      || TREE_CODE (gimple_assign_rhs2 (gs)) != INTEGER_CST)
> +    return;
> +
> +  *base = gimple_assign_rhs1 (gs);
> +  new_index = tree_to_double_int (gimple_assign_rhs2 (gs));
> +
> +  if (gimple_assign_rhs_code (gs) == MINUS_EXPR)
> +    new_index = double_int_neg (new_index);
> +
> +  *index = double_int_add (*index, new_index);
> +  combine_feeding_adds (SSA_NAME_DEF_STMT (*base), base, index);
> +}
> +
> +/* Given statement GS containing an integer multiply, determine
> +   whether it's a possible strength-reduction candidate.  If so,
> +   add it to the candidate vector and the statement-candidate
> +   mapping.  */
> +
> +static void
> +process_possible_candidate (gimple_stmt_iterator gsi, gimple gs)
> +{
> +  tree rhs1 = gimple_assign_rhs1 (gs);
> +  tree rhs2 = gimple_assign_rhs2 (gs);
> +  tree base;
> +  double_int index;
> +  slsr_cand_t c;
> +  void **slot;
> +
> +  /* TODO:  Unknown constant multipliers will be dealt with in
> +     stage 2.  */
> +  if (TREE_CODE (rhs2) != INTEGER_CST)
> +    return;
> +
> +  /* If RHS1 isn't fed by an addition or subtraction of an SSA_NAME
> +     and a constant, then treat it as an add of zero.  Look through
> +     legal casts and combine multiple chained adds to find
> +     the true base name and index.  */
> +  gcc_assert (TREE_CODE (rhs1) == SSA_NAME);
> +  base = rhs1;
> +  index = double_int_zero;
> +  combine_feeding_adds (SSA_NAME_DEF_STMT (rhs1), &base, &index);
> +
> +  c = (slsr_cand_t)pool_alloc (cand_pool);
> +  c->cand_stmt = gs;
> +  c->base_name = base;
> +  c->stride = rhs2;
> +  c->index = index;
> +  c->cand_num = VEC_length (slsr_cand_t, cand_vec) + 1;
> +  c->dependent = 0;
> +  c->sibling = 0;
> +  c->def_phi = NULL;
> +  c->cand_gsi = gsi;
> +  c->basis = find_basis_for_candidate (c);
> +
> +  VEC_safe_push (slsr_cand_t, heap, cand_vec, c);
> +
> +  slot = htab_find_slot (stmt_cand_map, c, INSERT);
> +  gcc_assert (!*slot);
> +  *slot = c;
> +}
> +
> +/* Find strength-reduction candidates in block BB.  */
> +
> +static void
> +find_candidates_in_block (basic_block bb)
> +{
> +  gimple_stmt_iterator gsi;
> +  basic_block son;
> +
> +  /* Process this block.  */
> +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gimple gs = gsi_stmt (gsi);
> +      if (is_gimple_assign (gs)
> +         && gimple_assign_rhs_code (gs) == MULT_EXPR
> +         && INTEGRAL_MODE_P (TYPE_MODE (TREE_TYPE (gimple_assign_lhs (gs)))))
> +       process_possible_candidate (gsi, gs);
> +    }
> +
> +  /* Process dominated blocks.  */
> +  for (son = first_dom_son (CDI_DOMINATORS, bb);
> +       son;
> +       son = next_dom_son (CDI_DOMINATORS, son))
> +    find_candidates_in_block (son);
> +}
> +
> +/* Dump a candidate for debug.  */
> +
> +static void
> +dump_candidate (slsr_cand_t c)
> +{
> +  fprintf (dump_file, "%3d  [%d] ", c->cand_num,
> +          gimple_bb (c->cand_stmt)->index);
> +  print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
> +  fputs ("     base:        ", dump_file);
> +  print_generic_expr (dump_file, c->base_name, 0);
> +  fputs ("\n     index:       ", dump_file);
> +  dump_double_int (dump_file, c->index, false);
> +  fputs ("\n     stride:      ", dump_file);
> +  print_generic_expr (dump_file, c->stride, 0);
> +  fprintf (dump_file, "\n     basis:     %3d", c->basis);
> +  fprintf (dump_file, "\n     dependent: %3d", c->dependent);
> +  fprintf (dump_file, "\n     sibling:   %3d", c->sibling);
> +  fputs ("\n     phi:         ", dump_file);
> +  if (c->def_phi)
> +    print_gimple_stmt (dump_file, c->def_phi, 0, 0);
> +  else
> +    fputs ("n/a", dump_file);
> +  fputs ("\n\n", dump_file);
> +}
> +
> +/* Dump the candidate vector for debug.  */
> +
> +static void
> +dump_cand_vec (void)
> +{
> +  unsigned int i;
> +  slsr_cand_t c;
> +
> +  fprintf (dump_file, "\nStrength reduction candidate vector:\n\n");
> +
> +  FOR_EACH_VEC_ELT (slsr_cand_t, cand_vec, i, c)
> +    dump_candidate (c);
> +}
> +
> +/* Recursive helper for unconditional_cands_with_known_stride_p.
> +   Returns TRUE iff C, its siblings, and its dependents are all
> +   unconditional candidates.  */
> +
> +static bool
> +unconditional_cands (slsr_cand_t c)
> +{
> +  if (c->def_phi)
> +    return false;
> +
> +  if (c->sibling && !unconditional_cands (lookup_cand (c->sibling)))
> +    return false;
> +
> +  if (c->dependent && !unconditional_cands (lookup_cand (c->dependent)))
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Determine whether or not the tree of candidates rooted at
> +   ROOT consists entirely of unconditional increments with
> +   an INTEGER_CST stride.  */
> +
> +static bool
> +unconditional_cands_with_known_stride_p (slsr_cand_t root)
> +{
> +  /* The stride is identical for all related candidates, so
> +     check it once.  */
> +  if (TREE_CODE (root->stride) != INTEGER_CST)
> +    return false;
> +
> +  return unconditional_cands (lookup_cand (root->dependent));
> +}
> +
> +/* GS is an add or subtract that used to be a multiply.  Follow
> +   its immediate uses to update the candidate table accordingly.  */
> +
> +static void
> +update_chained_candidates (gimple gs)
> +{
> +  tree base_name = gimple_assign_lhs (gs);
> +  imm_use_iterator use_iter;
> +  use_operand_p use_p;
> +  tree cast_target;
> +
> +  FOR_EACH_IMM_USE_FAST (use_p, use_iter, base_name)
> +    {
> +      gimple use_stmt = USE_STMT (use_p);
> +      enum tree_code code;
> +      slsr_cand_t c;
> +
> +      if (!is_gimple_assign (use_stmt))
> +       continue;
> +
> +      /* Look forward through converts that don't change semantics.  */
> +      cast_target = target_of_legal_cast (use_stmt);
> +      if (cast_target)
> +       {
> +         update_chained_candidates (use_stmt);
> +         continue;
> +       }
> +
> +      code = gimple_assign_rhs_code (use_stmt);
> +
> +      /* Case 1:  The name defined by the add appears in a
> +        multiply that exists in the candidate table.  */
> +      if (code == MULT_EXPR)
> +       {
> +         slsr_cand mapping_key;
> +         tree base;
> +         double_int index;
> +
> +         mapping_key.cand_stmt = use_stmt;
> +         if (!(c = (slsr_cand_t)htab_find (stmt_cand_map, &mapping_key)))
> +           continue;
> +
> +         base = base_name;
> +         index = double_int_zero;
> +         combine_feeding_adds (gs, &base, &index);
> +         c->base_name = base;
> +         c->index = index;
> +
> +         if (!c->basis)
> +           c->basis = find_basis_for_candidate (c);
> +         else
> +           /* If all has worked correctly, the modified candidate has
> +              the same basis as before.  */
> +           gcc_assert (lookup_cand (c->basis)
> +                       == find_basis_for_candidate_1 (c, base, NULL));
> +
> +         if (dump_file && (dump_flags & TDF_DETAILS))
> +           {
> +             fputs ("\nRevised candidate:\n", dump_file);
> +             dump_candidate (c);
> +           }
> +       }
> +
> +      /* Case 2: The name defined by the add appears in another
> +        add with a constant, which in turn feeds one or more multiplies
> +        in the candidate table.  Recurse to find the multiplies.  */
> +      else if (stmt_adds_base_to_const (use_stmt, code, base_name))
> +       update_chained_candidates (use_stmt);
> +    }
> +}
> +
> +/* Replace candidate C, each sibling of candidate C, and each
> +   dependent of candidate C with an add or subtract.  */
> +
> +static void
> +replace_dependents (slsr_cand_t c)
> +{
> +  slsr_cand_t basis = lookup_cand (c->basis);
> +  tree basis_name = gimple_assign_lhs (basis->cand_stmt);
> +  tree incr_type = TREE_TYPE (gimple_assign_rhs1 (c->cand_stmt));
> +  double_int increment, stride;
> +
> +  if (operand_equal_p (c->base_name, basis->base_name, 0))
> +    increment = double_int_sub (c->index, basis->index);
> +  else
> +    increment = c->index;
> +
> +  stride = tree_to_double_int (c->stride);
> +  increment = double_int_mul (increment, stride);
> +
> +  /* It is highly unlikely, but possible, that the resulting
> +     increment doesn't fit in a HWI.  Abandon the replacement
> +     in this case.  This does not affect siblings or dependents
> +     of C.  */
> +  /* Restriction to signed HWI is conservative for unsigned types
> +     but allows for safe negation without twisted logic.  */
> +  if (double_int_fits_in_shwi_p (increment))
> +    {
> +      enum tree_code code = PLUS_EXPR;
> +      tree orig_type
> +       = TREE_TYPE (SSA_NAME_VAR (gimple_assign_rhs1 (c->cand_stmt)));
> +      tree repl_type = TREE_TYPE (SSA_NAME_VAR (basis_name));
> +      tree incr_tree;
> +
> +      if (double_int_negative_p (increment))
> +       {
> +         code = MINUS_EXPR;
> +         increment = double_int_neg (increment);
> +       }
> +
> +      incr_tree = double_int_to_tree (incr_type, increment);
> +
> +      /* A widening cast may be necessary.  */
> +      if (orig_type != repl_type)
> +       {
> +         tree new_basis_name = create_tmp_reg (orig_type, "slsr");
> +         gimple nop_stmt;
> +         add_referenced_var (new_basis_name);
> +         new_basis_name = make_ssa_name (new_basis_name, NULL);
> +         nop_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_basis_name,
> +                                                  basis_name, NULL_TREE);
> +         gimple_set_location (nop_stmt, gimple_location (c->cand_stmt));
> +         gsi_insert_before (&c->cand_gsi, nop_stmt, GSI_SAME_STMT);
> +         basis_name = new_basis_name;
> +
> +         if (dump_file && (dump_flags & TDF_DETAILS))
> +           {
> +             fputs ("Inserting: ", dump_file);
> +             print_gimple_stmt (dump_file, nop_stmt, 0, 0);
> +           }
> +       }
> +
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       {
> +         fputs ("Replacing: ", dump_file);
> +         print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
> +       }
> +
> +      gimple_assign_set_rhs_with_ops (&c->cand_gsi, code,
> +                                     basis_name, incr_tree);
> +      update_stmt (c->cand_stmt);
> +
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       {
> +         fputs ("With: ", dump_file);
> +         print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
> +         fputs ("\n", dump_file);
> +       }
> +
> +      /* The multiply we just converted to an add might well feed
> +        into one or more other multiplies in the candidate table.
> +        If so, we need to update those candidate entries.  */
> +      update_chained_candidates (c->cand_stmt);
> +    }
> +
> +  if (c->sibling)
> +    replace_dependents (lookup_cand (c->sibling));
> +
> +  if (c->dependent)
> +    replace_dependents (lookup_cand (c->dependent));
> +}
> +
> +/* Analyze costs of related candidates in the candidate vector,
> +   and make beneficial replacements.  */
> +
> +static void
> +analyze_candidates_and_replace (void)
> +{
> +  unsigned int i;
> +  slsr_cand_t c;
> +
> +  /* Each candidate that has a null basis and a non-null
> +     dependent is the root of a tree of related statements.
> +     Analyze each tree to determine a subset of those
> +     statements that can be replaced with maximum benefit.  */
> +  FOR_EACH_VEC_ELT (slsr_cand_t, cand_vec, i, c)
> +    {
> +      if (c->basis != 0 || c->dependent == 0)
> +       continue;
> +
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "\nProcessing dependency tree rooted at %d.\n",
> +                c->cand_num);
> +
> +      /* If the common stride of all related candidates is a
> +        known constant, and none of these has a phi-dependence,
> +        then all replacements are considered profitable.
> +        Each replaces a multiply by a single add, with the
> +        possibility that a feeding add also goes dead as a
> +        result.  */
> +      if (unconditional_cands_with_known_stride_p (c))
> +       replace_dependents (lookup_cand (c->dependent));
> +
> +      /* TODO:  When the stride is an SSA name, it may still
> +        be profitable to replace some or all of the dependent
> +        candidates, depending on whether the introduced increments
> +        can be reused.  */
> +
> +      /* TODO:  When conditional increments occur so that a
> +        candidate is dependent upon a phi-basis, the cost of
> +        introducing a temporary must be accounted for.  */
> +
> +      /* TODO:  Strength reduction candidates hidden in
> +        addressing expressions are not yet handled, and will
> +        have more complex cost functions.  */
> +    }
> +}
> +
> +/* Main entry point for straight-line strength reduction.  */
> +
> +static unsigned int
> +execute_strength_reduction (void)
> +{
> +  /* Create the allocation pool where candidates will reside.  */
> +  cand_pool = create_alloc_pool ("Strength reduction pool",
> +                                sizeof (slsr_cand), 128);
> +
> +  /* Allocate the candidate vector.  */
> +  cand_vec = VEC_alloc (slsr_cand_t, heap, 128);
> +
> +  /* Allocate the mapping from statements to candidate indices.  */
> +  stmt_cand_map = htab_create (500, stmt_cand_hash,
> +                              stmt_cand_eq, stmt_cand_free);
> +
> +  /* Initialize the loop optimizer.  We need to detect flow across
> +     back edges, and this gives us dominator information as well.  */
> +  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
> +
> +  /* Walk the CFG in predominator order looking for strength reduction
> +     candidates.  */
> +  find_candidates_in_block (ENTRY_BLOCK_PTR);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    dump_cand_vec ();
> +
> +  /* Analyze costs and make appropriate replacements.  */
> +  analyze_candidates_and_replace ();
> +
> +  /* Free resources.  */
> +  loop_optimizer_finalize ();
> +  htab_delete (stmt_cand_map);
> +  VEC_free (slsr_cand_t, heap, cand_vec);
> +  free_alloc_pool (cand_pool);
> +
> +  return 0;
> +}
> +
> +static bool
> +gate_strength_reduction (void)
> +{
> +  return optimize > 0;
> +}
> +
> +struct gimple_opt_pass pass_strength_reduction =
> +{
> + {
> +  GIMPLE_PASS,
> +  "slsr",                              /* name */
> +  gate_strength_reduction,             /* gate */
> +  execute_strength_reduction,          /* execute */
> +  NULL,                                        /* sub */
> +  NULL,                                        /* next */
> +  0,                                   /* static_pass_number */
> +  TV_TREE_SLSR,                                /* tv_id */
> +  PROP_cfg | PROP_ssa,                 /* properties_required */
> +  0,                                   /* properties_provided */
> +  0,                                   /* properties_destroyed */
> +  0,                                   /* todo_flags_start */
> +  TODO_dump_func | TODO_verify_ssa     /* todo_flags_finish */
> + }
> +};
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in     (revision 180617)
> +++ gcc/Makefile.in     (working copy)
> @@ -1475,6 +1475,7 @@ OBJS = \
>        tree-ssa-reassoc.o \
>        tree-ssa-sccvn.o \
>        tree-ssa-sink.o \
> +       tree-ssa-strength-reduction.o \
>        tree-ssa-strlen.o \
>        tree-ssa-structalias.o \
>        tree-ssa-tail-merge.o \
> @@ -2519,6 +2520,10 @@ tree-ssa-sccvn.o : tree-ssa-sccvn.c $(TREE_FLOW_H)
>    alloc-pool.h $(BASIC_BLOCK_H) $(BITMAP_H) langhooks.h $(HASHTAB_H) $(GIMPLE_H) \
>    $(TREE_INLINE_H) tree-iterator.h tree-ssa-propagate.h tree-ssa-sccvn.h \
>    $(PARAMS_H) tree-pretty-print.h gimple-pretty-print.h gimple-fold.h
> +tree-ssa-strength-reduction.o : tree-ssa-strength-reduction.c $(CONFIG_H) \
> +   $(SYSTEM_H) coretypes.h $(TREE_H) $(GIMPLE_H) $(BASIC_BLOCK_H) \
> +   $(TREE_PASS_H) $(TIMEVAR_H) $(CFGLOOP_H) tree-pretty-print.h alloc-pool.h \
> +   $(TREE_FLOW_H)
>  tree-vrp.o : tree-vrp.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
>    $(TREE_FLOW_H) $(TREE_PASS_H) $(TREE_DUMP_H) $(DIAGNOSTIC_H) $(GGC_H) \
>    $(BASIC_BLOCK_H) tree-ssa-propagate.h $(FLAGS_H) $(TREE_DUMP_H) \
> Index: gcc/passes.c
> ===================================================================
> --- gcc/passes.c        (revision 180617)
> +++ gcc/passes.c        (working copy)
> @@ -1368,6 +1368,7 @@ init_optimization_passes (void)
>        }
>       NEXT_PASS (pass_lower_vector_ssa);
>       NEXT_PASS (pass_cse_reciprocals);
> +      NEXT_PASS (pass_strength_reduction);
>       NEXT_PASS (pass_reassoc);
>       NEXT_PASS (pass_vrp);
>       NEXT_PASS (pass_dominator);
>
William J. Schmidt - Nov. 4, 2011, 4:21 p.m.
Richard, thanks for the quick reply!  I realize there's a lot of traffic
on gcc-patches right now, so I appreciate your time.

On Fri, 2011-11-04 at 14:55 +0100, Richard Guenther wrote:
> On Sun, Oct 30, 2011 at 5:10 PM, William J. Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > Greetings,
> >
> > IVOPTS handles strength reduction of induction variables, but GCC does
> > not currently perform strength reduction in straight-line code.  This
> > has been noted before in PR22586 and PR35308.  PR46556 is also a case
> > that could be handled with strength reduction.  This patch adds a pass
> > to perform strength reduction along dominator paths on the easiest
> > cases, where replacements are obviously profitable.
> >
> > My intent is to add subsequent installments to handle more involved
> > cases, as described in the code commentary.  The cases not yet covered
> > will require target-specific cost analysis to determine profitability.
> > It is likely that this will leverage some of the cost function
> > infrastructure in tree-ssa-loop-ivopts.c.
> >
> > I've bootstrapped and tested the code on powerpc64-linux with no
> > regressions.  I've also run SPEC CPU2006 to compare results.  32-bit
> > PowerPC gains about 1% on integer code.  Other results are in the noise
> > range.  64-bit integer code would have also shown gains, except for one
> > bad result (400.perlbench degraded 4%).  Initial analysis shows that
> > very different control flow is created for regmatch with the patch than
> > without.  I will be investigating further this week, but presumably some
> > touchy threshold was no longer met for a downstream optimization --
> > probably an indirect effect.
> >
> > My hope is to commit this first stage as part of 4.7, with the remainder
> > to follow in 4.8.
> >
> > Thanks for your consideration,
> 
> I've had a quick look for now and noted two things
> 
> 1) you try to handle casting already - for the specific patterns, it seems
> the constraints are the same as for detecting when using a widening
> multiplication/add is possible?  I think we should have some common
> code for the legality checks.

I took a quick look at the logic in the widening mult/add code this
morning.  I don't think I can use the same logic, since it doesn't
appear to constrain a widening operation on unsigned when wrap semantics
might be in play.  I found cases in libiberty (IIRC) where this breaks
code.

The issue is for code like:

	unsigned a;
	long unsigned b, c;
	b1 = (long unsigned) a1;
	c1 = b1 * 10;
	a2 = a1 + 1; /* may wrap */
	b2 = (long unsigned) a2;
	c2 = b2 * 10;

We'd like to replace the last statement with

	c2 = c1 + 10;

but this isn't correct when a1 is max(unsigned).  I am not sure why the
widening mult/add code wouldn't have similar concerns.  Perhaps I missed
something there.

> 
> 2) you do not handle POINTER_PLUS_EXPR - which looks most
> interesting for the pure pointer arithmetic case.

For the patterns I'm looking at here, the PLUS_EXPR is creating an
integer index that feeds an integer multiply, which doesn't make sense
for addresses.  I think the issue here is that I am handling one set of
cases and you are thinking about some others that I need to pick up as
well.  More on this below...

>   (and you do not
> treat PLUS_EXPR as commutative, thus a + b*c you handle but
> not b*c + a(?), for a*b + c*d we'd have two candidates)

The cases I'm looking at in this first patch are of the form (x + c) *
d, where x is an SSA name and c and d are constants.  I was under the
impression that I could expect the constant to always be the second RHS
operand for an add of this form; if this is not the case, then I do need
to handle the commutativity of the operands.  (I thought this was the
case for multiplies as well, but I just ran across a case where the
vectorizer produced x = 10 * y, so I know now that I can't count on it
there.)

> 
> I'm not sure I follow the dominance checks - usually instead of
> 
> +         || !dominated_by_p (CDI_DOMINATORS,
> +                             gimple_bb (c->cand_stmt),
> +                             gimple_bb (use_stmt)))
> 
> you'd special case gimple_bb (c->cand_stmt) == gimple_bb (use_stmt)
> and have stmt uids to resolve dominance inside a basic block.

The statement uids aren't needed because of the partial ordering
established on the candidate list.  If one candidate dominates another,
it will have a lower candidate number.  This allows me to avoid a pass
to assign stmt uids.

The first time this routine is called for a candidate, it isn't yet in
the table; all candidates in the table that may serve as its basis must
be in the same block or be a dominator.

The routine may be called again for a candidate as part of
update_chained_candidates, when a new feeding add may be created (example
below).  For that case, the subsequent lines do the job that stmt uids
otherwise would:

   /* When revising a candidate's basis, be careful not to pick
      a basis that occurs after the candidate.  */
   if (use_basis->cand_num > c->cand_num)
     continue;

> 
> For the overall structure I see you are matching the MULT_EXPRs
> and are looking at all immediate uses of it for candidates with +-.
> I'd have expected you walk the BBs in dominator order, visiting
> statements and looking for the +- instead.  That would avoid any
> issue about dominance (because you walk BBs properly) and
> wouldn't require immediate use walks either (I couldn't convince
> myself 100% you are not visiting stmts multiple times that way...)
> But maybe I missed sth in my quick look.

This would be possible, but it would require keeping around a lot of
ancillary information about adds that may or may not prove useful.
Most adds of an SSA name and a constant won't feed a strength reduction
candidate.  The immediate-use walk is only done to find the basis among
other candidates, which I felt was cheaper by comparison.

So when we have:

	b1 = x1 * 10;
	x2 = x1 + 1;
	b2 = x2 * 10;
	x3 = x2 + 1;
	b3 = x3 * 10;

the candidate entry for the last statement has a base_name of x2.  We
find its definition as the second statement, walk its uses and find that
b2 is a potential basis with the same stride.  This is by far the usual
case.  The complexities come in when there are potential casts and
additional unfolded adds with constants in between; when those are
present we have to chain forward looking for the multiply.

So I do walk the statements in dominator order, but I only record
information about the multiplies as I encounter them, and look at the
definition of the candidate's base_name as needed to pick up the rest.

When looking for a basis, the dominator check is still required to avoid
picking up a basis, say, from the "then" block of an if-then-else as the
basis for a candidate in the "else" block.  That could be avoided with
logic to grow and shrink the candidate table as blocks are entered and
exited, but I don't want to do that; having the whole web of related
candidates around is important for more complex cases where global costs
need to be considered before replacing any candidates.

> 
> Most interesting (and one reason why strength reduction from PRE
> didn't work out) is the from a A + B*C stmt lookup all candidates
> for computing the value in a more efficient way.
> 
> You do not seem to transform stmts on-the-fly, so for
> 
> a1 = c + d;
> a2 = c + 2*d;
> a3 = c + 3*d;
> 
> are you able to generate
> 
> a1 = c + d;
> a2 = a1 + d;
> a3 = a2 + d;
> 
> ?  

Not in this form, because for this first patch I am looking only at
integer constants for the common stride.  But if you put it in the form
I am looking for:

	b1 = x1 * 10;
	x2 = x1 + 1;
	b2 = x2 * 10;
	x3 = x2 + 1;
	b3 = x3 * 10;

then yes, this will be converted to

	b1 = x1 * 10;
	[x2 = x1 + 1;]
	b2 = b1 + 10;
	[x3 = x2 + 1;]
	b3 = b2 + 10;

with the bracketed statements likely going dead.  This doesn't require
any on-the-fly updating because enough information is present in the
candidate table to process each candidate.

> On-the-fly operation would also handle this if the candidate info
> for a2 is kept as c + 2*d.  Though it's probably worth lookign at
> 
> a1 = c + d;
> a2 = a1 + d;
> a3 = c + 3*d;
> 
> and see if that still figures out that a3 = a2 + d (thus, do you,
> when you find the candidate a1 + 1 * d, fold in candidate information
> for a1?  thus, try to reduce it to an ultimate base?)

I do have on-the-fly updating for cases like the following:

	b1 = x1 * 10;
	c1 = b1 * 3;
	x2 = x1 + 1;
	b2 = x2 * 10;
	c2 = b2 * 3;

This becomes, after processing the definition of b2:

	b1 = x1 * 10;
	c1 = b1 * 3;
	[x2 = x1 + 1;]
	b2 = b1 + 10;
	c2 = b2 * 3;

Since we introduced a new add of an SSA name and a constant, we look at
immediate uses and find that the last statement is in the candidate
table.  Initially it did not have a basis, but now the candidate's
base_name is changed from b2 to b1 and we find that the second statement
is a viable basis.  Subsequently we change the last statement giving

	b1 = x1 * 10;
	c1 = b1 * 3;
	[x2 = x1 + 1;]
	[b2 = b1 + 10;]
	c2 = c1 + 30;

As far as the question of going back to an ultimate basis -- I
deliberately put things in the form

	b1 = x1 * 10;
	b2 = b1 + 10;
	b3 = b2 + 10;

rather than

	b1 = x1 * 10;
	b2 = b1 + 10;
	b3 = b1 + 20;

because the first form seems more flexible for downstream optimization
choices.  In practice, fwprop currently changes the first form into the
second, but emitting the first form leaves that decision to later passes
as things change in the future.

But your points are well-taken for opportunities that appear in the
slightly different form you presented them, and I will ponder them to
make sure they are covered.  Most of them come under the case when the
stride is an SSA name, not a constant, so would be addressed in
follow-on work.  But I want to think about the infrastructure to make
sure I can handle those cases correctly without too much rip-up.

> Thanks,
> Richard.
> 

While I'm here -- I'm looking ahead to some of the follow-on work where
profitability of replacements isn't necessarily obvious.  I'm planning
to use and extend some of the cost logic in tree-ssa-loop-ivopts.c that
generates RTL sequences.  That module has the following FIXME:

/* FIXME: Expressions are expanded to RTL in this pass to determine the
   cost of different addressing modes.  This should be moved to a TBD
   interface between the GIMPLE and RTL worlds.  */

Any thoughts on how you would like to see this done?  I could make my
cost changes within tree-ssa-loop-ivopts.c, or pull all of it out into a
new common interface file, as you prefer.  If the latter, do you have a
naming preference?

Thanks again,
Bill


> > Bill
> >
> >
> > 2011-10-30  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> >
> > gcc:
> >
> >        * tree-pass.h (pass_strength_reduction): New declaration.
> >        * timevar.def (TV_TREE_SLSR): New time-var.
> >        * tree-ssa-strength-reduction.c: New file.
> >        * Makefile.in: New dependences.
> >        * passes.c (init_optimization_passes): Add new pass.
> >
> > gcc/testsuite:
> >
> >        * gcc.dg/tree-ssa/slsr-1.c: New test case.
> >        * gcc.dg/tree-ssa/slsr-2.c: New test case.
> >        * gcc.dg/tree-ssa/slsr-3.c: New test case.
> >        * gcc.dg/tree-ssa/slsr-4.c: New test case.
> >
> >
> > Index: gcc/tree-pass.h
> > ===================================================================
> > --- gcc/tree-pass.h     (revision 180617)
> > +++ gcc/tree-pass.h     (working copy)
> > @@ -449,6 +449,7 @@ extern struct gimple_opt_pass pass_tracer;
> >  extern struct gimple_opt_pass pass_warn_unused_result;
> >  extern struct gimple_opt_pass pass_split_functions;
> >  extern struct gimple_opt_pass pass_feedback_split_functions;
> > +extern struct gimple_opt_pass pass_strength_reduction;
> >
> >  /* IPA Passes */
> >  extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
> > Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-1.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/slsr-1.c      (revision 0)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-1.c      (revision 0)
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-optimized" } */
> > +
> > +extern void foo (int);
> > +
> > +void
> > +f (int *p, unsigned int n)
> > +{
> > +  foo (*(p + n * 4));
> > +  foo (*(p + 32 + n * 4));
> > +  if (n > 3)
> > +    foo (*(p + 16 + n * 4));
> > +  else
> > +    foo (*(p + 48 + n * 4));
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "\\+ 128" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "\\+ 64" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "\\+ 192" 1 "optimized" } } */
> > +/* { dg-final { cleanup-tree-dump "optimized" } } */
> > Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-2.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/slsr-2.c      (revision 0)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-2.c      (revision 0)
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-optimized" } */
> > +
> > +extern void foo (int);
> > +
> > +void
> > +f (int *p, int n)
> > +{
> > +  foo (*(p + n++ * 4));
> > +  foo (*(p + 32 + n++ * 4));
> > +  foo (*(p + 16 + n * 4));
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "\\+ 144" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "\\+ 96" 1 "optimized" } } */
> > +/* { dg-final { cleanup-tree-dump "optimized" } } */
> > Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-3.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/slsr-3.c      (revision 0)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-3.c      (revision 0)
> > @@ -0,0 +1,22 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-optimized" } */
> > +
> > +int
> > +foo (int a[], int b[], int i)
> > +{
> > +  a[i] = b[i] + 2;
> > +  i++;
> > +  a[i] = b[i] + 2;
> > +  i++;
> > +  a[i] = b[i] + 2;
> > +  i++;
> > +  a[i] = b[i] + 2;
> > +  i++;
> > +  return i;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "\\+ 4" 2 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "\\+ 8" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "\\+ 12" 1 "optimized" } } */
> > +/* { dg-final { cleanup-tree-dump "optimized" } } */
> > Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c      (revision 0)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c      (revision 0)
> > @@ -0,0 +1,37 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-slsr -fdump-tree-optimized" } */
> > +
> > +void foo (int);
> > +
> > +int
> > +f (int i)
> > +{
> > +  int x, y;
> > +
> > +  x = i * 4;
> > +  y = x * 10;
> > +  foo (y);
> > +
> > +  i = i + 5;
> > +  x = i * 4;
> > +  y = x * 10;
> > +  foo (y);
> > +
> > +  i = i - 4;
> > +  x = i * 4;
> > +  y = x * 10;
> > +  foo (y);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "\\* 4" 1 "slsr" } } */
> > +/* { dg-final { scan-tree-dump-times "\\* 10" 1 "slsr" } } */
> > +/* { dg-final { scan-tree-dump-times "\\+ 20;" 1 "slsr" } } */
> > +/* { dg-final { scan-tree-dump-times "\\+ 200" 1 "slsr" } } */
> > +/* { dg-final { scan-tree-dump-times "\\- 16;" 1 "slsr" } } */
> > +/* { dg-final { scan-tree-dump-times "\\- 160" 1 "slsr" } } */
> > +/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "\\* 10" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "\\+ 200" 1 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "\\+ 40" 1 "optimized" } } */
> > +/* { dg-final { cleanup-tree-dump "slsr" } } */
> > +/* { dg-final { cleanup-tree-dump "optimized" } } */
> > Index: gcc/timevar.def
> > ===================================================================
> > --- gcc/timevar.def     (revision 180617)
> > +++ gcc/timevar.def     (working copy)
> > @@ -252,6 +252,7 @@ DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
> >  DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
> >  DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
> >  DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
> > +DEFTIMEVAR (TV_TREE_SLSR             , "straight-line strength reduction")
> >
> >  /* Everything else in rest_of_compilation not included above.  */
> >  DEFTIMEVAR (TV_EARLY_LOCAL          , "early local passes")
> > Index: gcc/tree-ssa-strength-reduction.c
> > ===================================================================
> > --- gcc/tree-ssa-strength-reduction.c   (revision 0)
> > +++ gcc/tree-ssa-strength-reduction.c   (revision 0)
> > @@ -0,0 +1,813 @@
> > +/* Straight-line strength reduction.
> > +   Copyright (C) 2011  Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify it under
> > +the terms of the GNU General Public License as published by the Free
> > +Software Foundation; either version 3, or (at your option) any later
> > +version.
> > +
> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > +for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +<http://www.gnu.org/licenses/>.  */
> > +
> > +/* There are many algorithms for performing strength reduction on
> > +   loops.  This is not one of them.  IVOPTS handles strength reduction
> > +   of induction variables just fine.  This pass is intended to pick
> > +   up the crumbs it leaves behind, by considering opportunities for
> > +   strength reduction along dominator paths.
> > +
> > +   Strength reduction will be implemented in four stages, gradually
> > +   adding more complex candidates:
> > +
> > +   1) Explicit multiplies, known constant multipliers, no
> > +      conditional increments.
> > +   2) Explicit multiplies, unknown constant multipliers,
> > +      no conditional increments.
> > +   3) Explicit multiplies, conditional increments.
> > +   4) Implicit multiplies in addressing expressions.
> > +
> > +   It would also be possible to apply strength reduction to divisions
> > +   and modulos, but such opportunities are relatively uncommon.
> > +
> > +   Strength reduction is also currently restricted to integer operations.
> > +   If desired, it could be extended to floating-point operations under
> > +   control of something like -funsafe-math-optimizations.  */
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "tree.h"
> > +#include "gimple.h"
> > +#include "basic-block.h"
> > +#include "tree-pass.h"
> > +#include "timevar.h"
> > +#include "cfgloop.h"
> > +#include "tree-pretty-print.h"
> > +#include "gimple-pretty-print.h"
> > +#include "alloc-pool.h"
> > +#include "tree-flow.h"
> > +
> > +/* Index into the candidate vector, offset by 1.  VECs are zero-based,
> > +   while cand_idx's are one-based, with zero indicating null.  */
> > +
> > +typedef unsigned int cand_idx;
> > +
> > +/* Information about a strength reduction candidate.  A candidate
> > +   is a statement S1 of the expanded form
> > +
> > +     S1:  LHS = (B + c1) * M,
> > +
> > +   where B is an SSA name, c1 is a constant that may be zero, and
> > +   M is either an SSA name or a nonzero constant.  A candidate may
> > +   only be strength reduced if a previous "basis" statement
> > +
> > +     S0:  S = (B' + c0) * M
> > +
> > +   of the same form exists previously in the same block or in a
> > +   dominator.  Assuming replacement is profitable, there are two
> > +   cases:
> > +
> > +   (1) If B = B', then S1 can be replaced with:
> > +
> > +         S1':  LHS = S + (c1 - c0) * M,
> > +
> > +       where (c1 - c0) * M is folded to the extent possible.
> > +
> > +   (2) If B differs from B', then we require that B = B' + c0,
> > +       and S1 can be replaced with:
> > +
> > +         S1':  LHS = S + c1 * M,
> > +
> > +       where c1 * M is folded if possible.
> > +
> > +   Candidate records are allocated from an allocation pool.  They are
> > +   addressed both from a hash table keyed on S1, and from a vector of
> > +   candidate pointers arranged in predominator order.  */
> > +
> > +typedef struct slsr_cand_d
> > +{
> > +  /* The candidate statement S1.  */
> > +  gimple cand_stmt;
> > +
> > +  /* The base SSA name B.  */
> > +  tree base_name;
> > +
> > +  /* The stride M.  */
> > +  tree stride;
> > +
> > +  /* The index constant c1.  */
> > +  double_int index;
> > +
> > +  /* Index of this candidate in the candidate vector.  */
> > +  cand_idx cand_num;
> > +
> > +  /* Index of the basis statement S0, if any, in the candidate vector.  */
> > +  cand_idx basis;
> > +
> > +  /* First candidate for which this candidate is a basis, if one exists.  */
> > +  cand_idx dependent;
> > +
> > +  /* Next candidate having the same basis as this one.  */
> > +  cand_idx sibling;
> > +
> > +  /* If this is a conditional candidate, the defining PHI statement
> > +     for the base SSA name B.  */
> > +  gimple def_phi;
> > +
> > +  /* Access to the statement for subsequent modification.  Cached to
> > +     save compile time.  */
> > +  gimple_stmt_iterator cand_gsi;
> > +
> > +} slsr_cand, *slsr_cand_t;
> > +
> > +typedef const struct slsr_cand_d *const_slsr_cand_t;
> > +
> > +/* Candidates are maintained in a vector.  If candidate X dominates
> > +   candidate Y, then X appears before Y in the vector; but the
> > +   converse does not necessarily hold.  */
> > +DEF_VEC_P (slsr_cand_t);
> > +DEF_VEC_ALLOC_P (slsr_cand_t, heap);
> > +static VEC (slsr_cand_t, heap) *cand_vec;
> > +
> > +/* Hash table embodying a mapping from statements to candidates.  */
> > +static htab_t stmt_cand_map;
> > +
> > +/* Allocation pool for candidates.  */
> > +static alloc_pool cand_pool;
> > +
> > +
> > +/* Produce a pointer to the IDX'th candidate in the candidate vector.  */
> > +
> > +static slsr_cand_t
> > +lookup_cand (cand_idx idx)
> > +{
> > +  return VEC_index (slsr_cand_t, cand_vec, idx - 1);
> > +}
> > +
> > +/* Callback to produce a hash value for a candidate.  */
> > +
> > +static hashval_t
> > +stmt_cand_hash (const void *p)
> > +{
> > +  return htab_hash_pointer (((const_slsr_cand_t)p)->cand_stmt);
> > +}
> > +
> > +/* Callback when an element is removed from the hash table.
> > +   We never remove entries until the entire table is released.  */
> > +
> > +static void
> > +stmt_cand_free (void *p ATTRIBUTE_UNUSED)
> > +{
> > +}
> > +
> > +/* Callback to return true if two candidates are equal.  */
> > +
> > +static int
> > +stmt_cand_eq (const void *p1, const void *p2)
> > +{
> > +  const_slsr_cand_t const cand1 = (const_slsr_cand_t)p1;
> > +  const_slsr_cand_t const cand2 = (const_slsr_cand_t)p2;
> > +  return (cand1->cand_stmt == cand2->cand_stmt);
> > +}
> > +
> > +/* Return TRUE if GS is a statement that defines an SSA name from
> > +   a NOP_EXPR and is legal for us to combine an add and multiply
> > +   across.  This is legitimate for casts from a signed type to
> > +   a signed or unsigned type of the same or larger size.  It is not
> > +   legitimate to convert any unsigned type to a signed type, or
> > +   to an unsigned type of a different size.
> > +
> > +   The reasoning here is that signed integer overflow is undefined,
> > +   so any program that was expecting overflow that no longer occurs
> > +   is not correct.  Unsigned integers, however, have wrap semantics,
> > +   and it is reasonable for programs to assume an overflowing add
> > +   will wrap.  */
> > +
> > +static bool
> > +legal_cast_p (gimple gs)
> > +{
> > +  tree lhs, rhs;
> > +  unsigned int src_size, dst_size, src_uns, dst_uns;
> > +
> > +  if (!is_gimple_assign (gs)
> > +      || TREE_CODE (gimple_assign_lhs (gs)) != SSA_NAME
> > +      || gimple_assign_rhs_code (gs) != NOP_EXPR)
> > +    return false;
> > +
> > +  lhs = gimple_assign_lhs (gs);
> > +  rhs = gimple_assign_rhs1 (gs);
> > +
> > +  if (TREE_CODE (rhs) != SSA_NAME)
> > +    return false;
> > +
> > +  src_size = TYPE_PRECISION (TREE_TYPE (SSA_NAME_VAR (rhs)));
> > +  src_uns = TYPE_UNSIGNED (TREE_TYPE (SSA_NAME_VAR (rhs)));
> > +  dst_size = TYPE_PRECISION (TREE_TYPE (SSA_NAME_VAR (lhs)));
> > +  dst_uns = TYPE_UNSIGNED (TREE_TYPE (SSA_NAME_VAR (lhs)));
> > +
> > +  if (dst_size < src_size)
> > +    return false;
> > +
> > +  if (src_uns && !dst_uns)
> > +    return false;
> > +
> > +  if (src_uns && dst_uns && src_size != dst_size)
> > +    return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* If GS is a statement that defines an SSA name from a NOP_EXPR or
> > +   CONVERT_EXPR and is legal for our purposes (see legal_cast_p),
> > +   return the source SSA name.  */
> > +
> > +static tree
> > +source_of_legal_cast (gimple gs)
> > +{
> > +  if (legal_cast_p (gs))
> > +    return gimple_assign_rhs1 (gs);
> > +
> > +  return NULL_TREE;
> > +}
> > +
> > +/* If GS is a statement that defines an SSA name from a NOP_EXPR or
> > +   CONVERT_EXPR and is legal for our purposes (see legal_cast_p),
> > +   return the target SSA name.  */
> > +
> > +static tree
> > +target_of_legal_cast (gimple gs)
> > +{
> > +  if (legal_cast_p (gs))
> > +    return gimple_assign_lhs (gs);
> > +
> > +  return NULL_TREE;
> > +}
> > +
> > +/* Return TRUE if GS consists of an add or subtract of BASE_NAME
> > +   with a constant, with the result placed in an SSA name.  GS is
> > +   known to be a gimple assignment.  CODE is the RHS code for GS.  */
> > +
> > +static bool
> > +stmt_adds_base_to_const (gimple gs, enum tree_code code, tree base_name)
> > +{
> > +  return (TREE_CODE (gimple_assign_lhs (gs)) == SSA_NAME
> > +         && (code == PLUS_EXPR || code == MINUS_EXPR)
> > +         && operand_equal_p (gimple_assign_rhs1 (gs), base_name, 0)
> > +         && TREE_CODE (gimple_assign_rhs2 (gs)) == INTEGER_CST);
> > +}
> > +
> > +/* Recursive helper for find_basis_for_candidate.  To find a
> > +   possible basis, first try the immediate uses of the given
> > +   BASE_NAME.  If a particular use doesn't appear in a multiply,
> > +   see if it appears in an add that feeds one or more multiplies.
> > +   Recurse using the SSA name defined on the add.  Each potential
> > +   basis found in this manner must also appear in a block that
> > +   dominates the candidate statement, be present in the candidate
> > +   table, and have the same stride M.  If more than one possible
> > +   basis exists, pick the one with highest index in the vector,
> > +   which will be the most immediately dominating basis.  */
> > +
> > +static slsr_cand_t
> > +find_basis_for_candidate_1 (slsr_cand_t c, tree base_name, slsr_cand_t basis)
> > +{
> > +  imm_use_iterator use_iter;
> > +  use_operand_p use_p;
> > +
> > +  FOR_EACH_IMM_USE_FAST (use_p, use_iter, base_name)
> > +    {
> > +      gimple use_stmt = USE_STMT (use_p);
> > +      slsr_cand_t use_basis = NULL;
> > +      enum tree_code code;
> > +      tree cast_target;
> > +
> > +      if (!is_gimple_assign (use_stmt))
> > +       continue;
> > +
> > +      /* When revising a candidate's basis, make sure not to pick itself.  */
> > +      if (c->cand_stmt == use_stmt)
> > +       continue;
> > +
> > +      /* Look through casts where legal.  */
> > +      cast_target = target_of_legal_cast (use_stmt);
> > +      if (cast_target)
> > +       return find_basis_for_candidate_1 (c, cast_target, basis);
> > +
> > +      /* If the use statement is a multiply, it's a potential basis.
> > +        Otherwise, we have to look at the uses of the use statement
> > +        to find any multiplies that may be potential bases.  */
> > +      code = gimple_assign_rhs_code (use_stmt);
> > +
> > +      if (code == MULT_EXPR)
> > +       {
> > +         slsr_cand mapping_key;
> > +         mapping_key.cand_stmt = use_stmt;
> > +         use_basis = (slsr_cand_t)htab_find (stmt_cand_map, &mapping_key);
> > +       }
> > +
> > +      /* Recurse if this is an add of a constant that might feed a
> > +        multiply.  */
> > +      else if (stmt_adds_base_to_const (use_stmt, code, base_name))
> > +       {
> > +         tree lhs = gimple_assign_lhs (use_stmt);
> > +         use_basis = find_basis_for_candidate_1 (c, lhs, basis);
> > +       }
> > +
> > +      if (!use_basis
> > +         || !operand_equal_p (c->stride, use_basis->stride, 0)
> > +         || !dominated_by_p (CDI_DOMINATORS,
> > +                             gimple_bb (c->cand_stmt),
> > +                             gimple_bb (use_stmt)))
> > +       continue;
> > +
> > +      /* When revising a candidate's basis, be careful not to pick
> > +        a basis that occurs after the candidate.  */
> > +      if (use_basis->cand_num > c->cand_num)
> > +       continue;
> > +
> > +      if (!basis || basis->cand_num < use_basis->cand_num)
> > +       basis = use_basis;
> > +    }
> > +
> > +  return basis;
> > +}
> > +
> > +/* Look for the nearest dominating statement in the candidate
> > +   vector that can serve as a basis for the new CANDIDATE.  If
> > +   found, adjust the dependent links for the basis entry and
> > +   return the index of the basis entry.  Otherwise return zero.  */
> > +
> > +static unsigned int
> > +find_basis_for_candidate (slsr_cand_t c)
> > +{
> > +  slsr_cand_t basis = find_basis_for_candidate_1 (c, c->base_name, NULL);
> > +
> > +  if (basis)
> > +    {
> > +      c->sibling = basis->dependent;
> > +      basis->dependent = c->cand_num;
> > +      return basis->cand_num;
> > +    }
> > +
> > +  return 0;
> > +}
> > +
> > +/* Starting with statement GS, look backwards through any
> > +   intervening legal casts to find one or more
> > +   adds of an SSA name and a constant.  Return the earliest
> > +   SSA name in the chain as *BASE, and the sum of all constants
> > +   in the chain as INDEX.  */
> > +
> > +static void
> > +combine_feeding_adds (gimple gs, tree *base, double_int *index)
> > +{
> > +  double_int new_index;
> > +  tree cast_source;
> > +
> > +  while ((cast_source = source_of_legal_cast (gs)))
> > +    {
> > +      gs = SSA_NAME_DEF_STMT (cast_source);
> > +      *base = cast_source;
> > +    }
> > +
> > +  /* If there aren't any more adds, we're done.  */
> > +  if (!is_gimple_assign (gs)
> > +      || (gimple_assign_rhs_code (gs) != PLUS_EXPR
> > +         && gimple_assign_rhs_code (gs) != MINUS_EXPR)
> > +      || TREE_CODE (gimple_assign_rhs1 (gs)) != SSA_NAME
> > +      || TREE_CODE (gimple_assign_rhs2 (gs)) != INTEGER_CST)
> > +    return;
> > +
> > +  *base = gimple_assign_rhs1 (gs);
> > +  new_index = tree_to_double_int (gimple_assign_rhs2 (gs));
> > +
> > +  if (gimple_assign_rhs_code (gs) == MINUS_EXPR)
> > +    new_index = double_int_neg (new_index);
> > +
> > +  *index = double_int_add (*index, new_index);
> > +  combine_feeding_adds (SSA_NAME_DEF_STMT (*base), base, index);
> > +}
> > +
> > +/* Given statement GS containing an integer multiply, determine
> > +   whether it's a possible strength-reduction candidate.  If so,
> > +   add it to the candidate vector and the statement-candidate
> > +   mapping.  */
> > +
> > +static void
> > +process_possible_candidate (gimple_stmt_iterator gsi, gimple gs)
> > +{
> > +  tree rhs1 = gimple_assign_rhs1 (gs);
> > +  tree rhs2 = gimple_assign_rhs2 (gs);
> > +  tree base;
> > +  double_int index;
> > +  slsr_cand_t c;
> > +  void **slot;
> > +
> > +  /* TODO:  Unknown constant multipliers will be dealt with in
> > +     stage 2.  */
> > +  if (TREE_CODE (rhs2) != INTEGER_CST)
> > +    return;
> > +
> > +  /* If RHS1 isn't fed by an addition or subtraction of an SSA_NAME
> > +     and a constant, then treat it as an add of zero.  Look through
> > +     legal casts and combine multiple chained adds to find
> > +     the true base name and index.  */
> > +  gcc_assert (TREE_CODE (rhs1) == SSA_NAME);
> > +  base = rhs1;
> > +  index = double_int_zero;
> > +  combine_feeding_adds (SSA_NAME_DEF_STMT (rhs1), &base, &index);
> > +
> > +  c = (slsr_cand_t)pool_alloc (cand_pool);
> > +  c->cand_stmt = gs;
> > +  c->base_name = base;
> > +  c->stride = rhs2;
> > +  c->index = index;
> > +  c->cand_num = VEC_length (slsr_cand_t, cand_vec) + 1;
> > +  c->dependent = 0;
> > +  c->sibling = 0;
> > +  c->def_phi = NULL;
> > +  c->cand_gsi = gsi;
> > +  c->basis = find_basis_for_candidate (c);
> > +
> > +  VEC_safe_push (slsr_cand_t, heap, cand_vec, c);
> > +
> > +  slot = htab_find_slot (stmt_cand_map, c, INSERT);
> > +  gcc_assert (!*slot);
> > +  *slot = c;
> > +}
> > +
> > +/* Find strength-reduction candidates in block BB.  */
> > +
> > +static void
> > +find_candidates_in_block (basic_block bb)
> > +{
> > +  gimple_stmt_iterator gsi;
> > +  basic_block son;
> > +
> > +  /* Process this block.  */
> > +  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > +    {
> > +      gimple gs = gsi_stmt (gsi);
> > +      if (is_gimple_assign (gs)
> > +         && gimple_assign_rhs_code (gs) == MULT_EXPR
> > +         && INTEGRAL_MODE_P (TYPE_MODE (TREE_TYPE (gimple_assign_lhs (gs)))))
> > +       process_possible_candidate (gsi, gs);
> > +    }
> > +
> > +  /* Process dominated blocks.  */
> > +  for (son = first_dom_son (CDI_DOMINATORS, bb);
> > +       son;
> > +       son = next_dom_son (CDI_DOMINATORS, son))
> > +    find_candidates_in_block (son);
> > +}
> > +
> > +/* Dump a candidate for debug.  */
> > +
> > +static void
> > +dump_candidate (slsr_cand_t c)
> > +{
> > +  fprintf (dump_file, "%3d  [%d] ", c->cand_num,
> > +          gimple_bb (c->cand_stmt)->index);
> > +  print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
> > +  fputs ("     base:        ", dump_file);
> > +  print_generic_expr (dump_file, c->base_name, 0);
> > +  fputs ("\n     index:       ", dump_file);
> > +  dump_double_int (dump_file, c->index, false);
> > +  fputs ("\n     stride:      ", dump_file);
> > +  print_generic_expr (dump_file, c->stride, 0);
> > +  fprintf (dump_file, "\n     basis:     %3d", c->basis);
> > +  fprintf (dump_file, "\n     dependent: %3d", c->dependent);
> > +  fprintf (dump_file, "\n     sibling:   %3d", c->sibling);
> > +  fputs ("\n     phi:         ", dump_file);
> > +  if (c->def_phi)
> > +    print_gimple_stmt (dump_file, c->def_phi, 0, 0);
> > +  else
> > +    fputs ("n/a", dump_file);
> > +  fputs ("\n\n", dump_file);
> > +}
> > +
> > +/* Dump the candidate vector for debug.  */
> > +
> > +static void
> > +dump_cand_vec (void)
> > +{
> > +  unsigned int i;
> > +  slsr_cand_t c;
> > +
> > +  fprintf (dump_file, "\nStrength reduction candidate vector:\n\n");
> > +
> > +  FOR_EACH_VEC_ELT (slsr_cand_t, cand_vec, i, c)
> > +    dump_candidate (c);
> > +}
> > +
> > +/* Recursive helper for unconditional_cands_with_known_stride_p.
> > +   Returns TRUE iff C, its siblings, and its dependents are all
> > +   unconditional candidates.  */
> > +
> > +static bool
> > +unconditional_cands (slsr_cand_t c)
> > +{
> > +  if (c->def_phi)
> > +    return false;
> > +
> > +  if (c->sibling && !unconditional_cands (lookup_cand (c->sibling)))
> > +    return false;
> > +
> > +  if (c->dependent && !unconditional_cands (lookup_cand (c->dependent)))
> > +    return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* Determine whether or not the tree of candidates rooted at
> > +   ROOT consists entirely of unconditional increments with
> > +   an INTEGER_CST stride.  */
> > +
> > +static bool
> > +unconditional_cands_with_known_stride_p (slsr_cand_t root)
> > +{
> > +  /* The stride is identical for all related candidates, so
> > +     check it once.  */
> > +  if (TREE_CODE (root->stride) != INTEGER_CST)
> > +    return false;
> > +
> > +  return unconditional_cands (lookup_cand (root->dependent));
> > +}
> > +
> > +/* GS is an add or subtract that used to be a multiply.  Follow
> > +   its immediate uses to update the candidate table accordingly.  */
> > +
> > +static void
> > +update_chained_candidates (gimple gs)
> > +{
> > +  tree base_name = gimple_assign_lhs (gs);
> > +  imm_use_iterator use_iter;
> > +  use_operand_p use_p;
> > +  tree cast_target;
> > +
> > +  FOR_EACH_IMM_USE_FAST (use_p, use_iter, base_name)
> > +    {
> > +      gimple use_stmt = USE_STMT (use_p);
> > +      enum tree_code code;
> > +      slsr_cand_t c;
> > +
> > +      if (!is_gimple_assign (use_stmt))
> > +       continue;
> > +
> > +      /* Look forward through converts that don't change semantics.  */
> > +      cast_target = target_of_legal_cast (use_stmt);
> > +      if (cast_target)
> > +       {
> > +         update_chained_candidates (use_stmt);
> > +         continue;
> > +       }
> > +
> > +      code = gimple_assign_rhs_code (use_stmt);
> > +
> > +      /* Case 1:  The name defined by the add appears in a
> > +        multiply that exists in the candidate table.  */
> > +      if (code == MULT_EXPR)
> > +       {
> > +         slsr_cand mapping_key;
> > +         tree base;
> > +         double_int index;
> > +
> > +         mapping_key.cand_stmt = use_stmt;
> > +         if (!(c = (slsr_cand_t)htab_find (stmt_cand_map, &mapping_key)))
> > +           continue;
> > +
> > +         base = base_name;
> > +         index = double_int_zero;
> > +         combine_feeding_adds (gs, &base, &index);
> > +         c->base_name = base;
> > +         c->index = index;
> > +
> > +         if (!c->basis)
> > +           c->basis = find_basis_for_candidate (c);
> > +         else
> > +           /* If all has worked correctly, the modified candidate has
> > +              the same basis as before.  */
> > +           gcc_assert (lookup_cand (c->basis)
> > +                       == find_basis_for_candidate_1 (c, base, NULL));
> > +
> > +         if (dump_file && (dump_flags & TDF_DETAILS))
> > +           {
> > +             fputs ("\nRevised candidate:\n", dump_file);
> > +             dump_candidate (c);
> > +           }
> > +       }
> > +
> > +      /* Case 2: The name defined by the add appears in another
> > +        add with a constant, which in turn feeds one or more multiplies
> > +        in the candidate table.  Recurse to find the multiplies.  */
> > +      else if (stmt_adds_base_to_const (use_stmt, code, base_name))
> > +       update_chained_candidates (use_stmt);
> > +    }
> > +}
> > +
> > +/* Replace candidate C, each sibling of candidate C, and each
> > +   dependent of candidate C with an add or subtract.  */
> > +
> > +static void
> > +replace_dependents (slsr_cand_t c)
> > +{
> > +  slsr_cand_t basis = lookup_cand (c->basis);
> > +  tree basis_name = gimple_assign_lhs (basis->cand_stmt);
> > +  tree incr_type = TREE_TYPE (gimple_assign_rhs1 (c->cand_stmt));
> > +  double_int increment, stride;
> > +
> > +  if (operand_equal_p (c->base_name, basis->base_name, 0))
> > +    increment = double_int_sub (c->index, basis->index);
> > +  else
> > +    increment = c->index;
> > +
> > +  stride = tree_to_double_int (c->stride);
> > +  increment = double_int_mul (increment, stride);
> > +
> > +  /* It is highly unlikely, but possible, that the resulting
> > +     increment doesn't fit in a HWI.  Abandon the replacement
> > +     in this case.  This does not affect siblings or dependents
> > +     of C.  */
> > +  /* Restriction to signed HWI is conservative for unsigned types
> > +     but allows for safe negation without twisted logic.  */
> > +  if (double_int_fits_in_shwi_p (increment))
> > +    {
> > +      enum tree_code code = PLUS_EXPR;
> > +      tree orig_type
> > +       = TREE_TYPE (SSA_NAME_VAR (gimple_assign_rhs1 (c->cand_stmt)));
> > +      tree repl_type = TREE_TYPE (SSA_NAME_VAR (basis_name));
> > +      tree incr_tree;
> > +
> > +      if (double_int_negative_p (increment))
> > +       {
> > +         code = MINUS_EXPR;
> > +         increment = double_int_neg (increment);
> > +       }
> > +
> > +      incr_tree = double_int_to_tree (incr_type, increment);
> > +
> > +      /* A widening cast may be necessary.  */
> > +      if (orig_type != repl_type)
> > +       {
> > +         tree new_basis_name = create_tmp_reg (orig_type, "slsr");
> > +         gimple nop_stmt;
> > +         add_referenced_var (new_basis_name);
> > +         new_basis_name = make_ssa_name (new_basis_name, NULL);
> > +         nop_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_basis_name,
> > +                                                  basis_name, NULL_TREE);
> > +         gimple_set_location (nop_stmt, gimple_location (c->cand_stmt));
> > +         gsi_insert_before (&c->cand_gsi, nop_stmt, GSI_SAME_STMT);
> > +         basis_name = new_basis_name;
> > +
> > +         if (dump_file && (dump_flags & TDF_DETAILS))
> > +           {
> > +             fputs ("Inserting: ", dump_file);
> > +             print_gimple_stmt (dump_file, nop_stmt, 0, 0);
> > +           }
> > +       }
> > +
> > +      if (dump_file && (dump_flags & TDF_DETAILS))
> > +       {
> > +         fputs ("Replacing: ", dump_file);
> > +         print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
> > +       }
> > +
> > +      gimple_assign_set_rhs_with_ops (&c->cand_gsi, code,
> > +                                     basis_name, incr_tree);
> > +      update_stmt (c->cand_stmt);
> > +
> > +      if (dump_file && (dump_flags & TDF_DETAILS))
> > +       {
> > +         fputs ("With: ", dump_file);
> > +         print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
> > +         fputs ("\n", dump_file);
> > +       }
> > +
> > +      /* The multiply we just converted to an add might well feed
> > +        into one or more other multiplies in the candidate table.
> > +        If so, we need to update those candidate entries.  */
> > +      update_chained_candidates (c->cand_stmt);
> > +    }
> > +
> > +  if (c->sibling)
> > +    replace_dependents (lookup_cand (c->sibling));
> > +
> > +  if (c->dependent)
> > +    replace_dependents (lookup_cand (c->dependent));
> > +}
> > +
> > +/* Analyze costs of related candidates in the candidate vector,
> > +   and make beneficial replacements.  */
> > +
> > +static void
> > +analyze_candidates_and_replace (void)
> > +{
> > +  unsigned int i;
> > +  slsr_cand_t c;
> > +
> > +  /* Each candidate that has a null basis and a non-null
> > +     dependent is the root of a tree of related statements.
> > +     Analyze each tree to determine a subset of those
> > +     statements that can be replaced with maximum benefit.  */
> > +  FOR_EACH_VEC_ELT (slsr_cand_t, cand_vec, i, c)
> > +    {
> > +      if (c->basis != 0 || c->dependent == 0)
> > +       continue;
> > +
> > +      if (dump_file && (dump_flags & TDF_DETAILS))
> > +       fprintf (dump_file, "\nProcessing dependency tree rooted at %d.\n",
> > +                c->cand_num);
> > +
> > +      /* If the common stride of all related candidates is a
> > +        known constant, and none of these has a phi-dependence,
> > +        then all replacements are considered profitable.
> > +        Each replaces a multiply by a single add, with the
> > +        possibility that a feeding add also goes dead as a
> > +        result.  */
> > +      if (unconditional_cands_with_known_stride_p (c))
> > +       replace_dependents (lookup_cand (c->dependent));
> > +
> > +      /* TODO:  When the stride is an SSA name, it may still
> > +        be profitable to replace some or all of the dependent
> > +        candidates, depending on whether the introduced increments
> > +        can be reused.  */
> > +
> > +      /* TODO:  When conditional increments occur so that a
> > +        candidate is dependent upon a phi-basis, the cost of
> > +        introducing a temporary must be accounted for.  */
> > +
> > +      /* TODO:  Strength reduction candidates hidden in
> > +        addressing expressions are not yet handled, and will
> > +        have more complex cost functions.  */
> > +    }
> > +}
> > +
> > +/* Main entry point for straight-line strength reduction.  */
> > +
> > +static unsigned int
> > +execute_strength_reduction (void)
> > +{
> > +  /* Create the allocation pool where candidates will reside.  */
> > +  cand_pool = create_alloc_pool ("Strength reduction pool",
> > +                                sizeof (slsr_cand), 128);
> > +
> > +  /* Allocate the candidate vector.  */
> > +  cand_vec = VEC_alloc (slsr_cand_t, heap, 128);
> > +
> > +  /* Allocate the mapping from statements to candidate indices.  */
> > +  stmt_cand_map = htab_create (500, stmt_cand_hash,
> > +                              stmt_cand_eq, stmt_cand_free);
> > +
> > +  /* Initialize the loop optimizer.  We need to detect flow across
> > +     back edges, and this gives us dominator information as well.  */
> > +  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
> > +
> > +  /* Walk the CFG in predominator order looking for strength reduction
> > +     candidates.  */
> > +  find_candidates_in_block (ENTRY_BLOCK_PTR);
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +    dump_cand_vec ();
> > +
> > +  /* Analyze costs and make appropriate replacements.  */
> > +  analyze_candidates_and_replace ();
> > +
> > +  /* Free resources.  */
> > +  loop_optimizer_finalize ();
> > +  htab_delete (stmt_cand_map);
> > +  VEC_free (slsr_cand_t, heap, cand_vec);
> > +  free_alloc_pool (cand_pool);
> > +
> > +  return 0;
> > +}
> > +
> > +static bool
> > +gate_strength_reduction (void)
> > +{
> > +  return optimize > 0;
> > +}
> > +
> > +struct gimple_opt_pass pass_strength_reduction =
> > +{
> > + {
> > +  GIMPLE_PASS,
> > +  "slsr",                              /* name */
> > +  gate_strength_reduction,             /* gate */
> > +  execute_strength_reduction,          /* execute */
> > +  NULL,                                        /* sub */
> > +  NULL,                                        /* next */
> > +  0,                                   /* static_pass_number */
> > +  TV_TREE_SLSR,                                /* tv_id */
> > +  PROP_cfg | PROP_ssa,                 /* properties_required */
> > +  0,                                   /* properties_provided */
> > +  0,                                   /* properties_destroyed */
> > +  0,                                   /* todo_flags_start */
> > +  TODO_dump_func | TODO_verify_ssa     /* todo_flags_finish */
> > + }
> > +};
> > Index: gcc/Makefile.in
> > ===================================================================
> > --- gcc/Makefile.in     (revision 180617)
> > +++ gcc/Makefile.in     (working copy)
> > @@ -1475,6 +1475,7 @@ OBJS = \
> >        tree-ssa-reassoc.o \
> >        tree-ssa-sccvn.o \
> >        tree-ssa-sink.o \
> > +       tree-ssa-strength-reduction.o \
> >        tree-ssa-strlen.o \
> >        tree-ssa-structalias.o \
> >        tree-ssa-tail-merge.o \
> > @@ -2519,6 +2520,10 @@ tree-ssa-sccvn.o : tree-ssa-sccvn.c $(TREE_FLOW_H)
> >    alloc-pool.h $(BASIC_BLOCK_H) $(BITMAP_H) langhooks.h $(HASHTAB_H) $(GIMPLE_H) \
> >    $(TREE_INLINE_H) tree-iterator.h tree-ssa-propagate.h tree-ssa-sccvn.h \
> >    $(PARAMS_H) tree-pretty-print.h gimple-pretty-print.h gimple-fold.h
> > +tree-ssa-strength-reduction.o : tree-ssa-strength-reduction.c $(CONFIG_H) \
> > +   $(SYSTEM_H) coretypes.h $(TREE_H) $(GIMPLE_H) $(BASIC_BLOCK_H) \
> > +   $(TREE_PASS_H) $(TIMEVAR_H) $(CFGLOOP_H) tree-pretty-print.h alloc-pool.h \
> > +   $(TREE_FLOW_H)
> >  tree-vrp.o : tree-vrp.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
> >    $(TREE_FLOW_H) $(TREE_PASS_H) $(TREE_DUMP_H) $(DIAGNOSTIC_H) $(GGC_H) \
> >    $(BASIC_BLOCK_H) tree-ssa-propagate.h $(FLAGS_H) $(TREE_DUMP_H) \
> > Index: gcc/passes.c
> > ===================================================================
> > --- gcc/passes.c        (revision 180617)
> > +++ gcc/passes.c        (working copy)
> > @@ -1368,6 +1368,7 @@ init_optimization_passes (void)
> >        }
> >       NEXT_PASS (pass_lower_vector_ssa);
> >       NEXT_PASS (pass_cse_reciprocals);
> > +      NEXT_PASS (pass_strength_reduction);
> >       NEXT_PASS (pass_reassoc);
> >       NEXT_PASS (pass_vrp);
> >       NEXT_PASS (pass_dominator);
> >
> >
> >
> 
>
William J. Schmidt - Nov. 5, 2011, 11:02 p.m.
On Fri, 2011-11-04 at 14:55 +0100, Richard Guenther wrote:
> On Sun, Oct 30, 2011 at 5:10 PM, William J. Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > 
> 
> You do not seem to transform stmts on-the-fly, so for
> 
> a1 = c + d;
> a2 = c + 2*d;
> a3 = c + 3*d;
> 
> are you able to generate
> 
> a1 = c + d;
> a2 = a1 + d;
> a3 = a2 + d;
> 
> ?  On-the-fly operation would also handle this if the candidate info
> for a2 is kept as c + 2*d.  Though it's probably worth lookign at
> 
> a1 = c + d;
> a2 = a1 + d;
> a3 = c + 3*d;
> 
> and see if that still figures out that a3 = a2 + d (thus, do you,
> when you find the candidate a1 + 1 * d, fold in candidate information
> for a1?  thus, try to reduce it to an ultimate base?)
> 
> Thanks,
> Richard.

Just a couple of quick thoughts here.

As I mentioned, this patch is only for the cases where the stride is a
constant.  The only interesting pattern I could think of for that case
is the one I'm currently handling, where an add-immediate feeds a
multiply, e.g., y = (x + c) * d where c and d are constants.

Once the stride is a variable, we have not only those cases, but also
cases like you show here where the multiply feeds into an add.  Those
can be handled with the existing infrastructure in a slightly different
way.  The main differences are:

 - The cand_stmt will be the add in this case.  We always want the
candidate to be the statement that we hope to replace.

 - The base_name will be the "ultimate base," so that all the original
candidates in your example will have c for the base.  This may involve
looking back through casts.

 - The index will be the multiplier applied to the stride.
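Applied to Richard's example, the stage-2 candidate records would look
roughly like this (a sketch under my reading of the reply; the variable
names are illustrative):

```c
/* Stage-2 shape: adds fed by multiplies of an SSA-name stride d.
   Planned candidate records (base, index, stride):
     a1 = c + 1*d   ->  (c, 1, d)
     a2 = c + 2*d   ->  (c, 2, d)
     a3 = c + 3*d   ->  (c, 3, d)
   With the nearest dominating candidate as basis, each later one
   becomes basis + (index difference) * d.  */

long chain_direct (long c, long d)
{
  long a1 = c + d;
  long a2 = c + 2 * d;
  long a3 = c + 3 * d;
  return a1 + a2 + a3;
}

long chain_reduced (long c, long d)
{
  long a1 = c + d;
  long a2 = a1 + d;       /* (2 - 1) * d */
  long a3 = a2 + d;       /* (3 - 2) * d */
  return a1 + a2 + a3;
}
```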

The logic for finding the nearest dominating basis will be pretty much
identical.  

The candidate table again contains enough information that we don't need
to do on-the-fly replacement, but can examine all the related candidates
at once.  This will be important for the add-feeding-multiply case with
an SSA name stride, since we sometimes need to introduce multiplies by a
constant in order to remove general multiplies of two registers.
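One way to read that trade-off (again my sketch, not code from the
patch):

```c
/* With an SSA-name stride s, replacing a dependent candidate turns a
   general reg*reg multiply into an add of a constant multiple of the
   stride: a cheap multiply-by-constant is introduced so that the full
   two-register multiply can be removed.  */

long general_mult (long c, long s)
{
  long a1 = c * s;
  long a2 = (c + 3) * s;   /* general multiply of two registers */
  return a1 + a2;
}

long const_mult (long c, long s)
{
  long a1 = c * s;
  long a2 = a1 + 3 * s;    /* multiply-by-constant 3*s replaces it */
  return a1 + a2;
}
```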

But again, that's all for a follow-on patch.  My thought was to get this
one set of easy candidates handled in a first patch so you could get a
look at the general infrastructure without having to review a huge chunk
of code at once.  Once that patch is in place, the next stages would be:

2. SSA-name strides, both multiply-add and add-multiply forms.
3. Cases involving conditional increments (looking through PHIs).
4. Cases where the multiplies are hidden in addressing expressions.

I have a pretty good idea where I'm going with stages 2 and 3.  Stage 4
is where things are likely to get a bit bloodier, and I will be glad for
any advice about the best way to handle those cases as we get to that
point.

Thanks again,
Bill

Patch

Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h	(revision 180617)
+++ gcc/tree-pass.h	(working copy)
@@ -449,6 +449,7 @@  extern struct gimple_opt_pass pass_tracer;
 extern struct gimple_opt_pass pass_warn_unused_result;
 extern struct gimple_opt_pass pass_split_functions;
 extern struct gimple_opt_pass pass_feedback_split_functions;
+extern struct gimple_opt_pass pass_strength_reduction;
 
 /* IPA Passes */
 extern struct simple_ipa_opt_pass pass_ipa_lower_emutls;
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-1.c	(revision 0)
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+extern void foo (int);
+
+void
+f (int *p, unsigned int n)
+{
+  foo (*(p + n * 4));
+  foo (*(p + 32 + n * 4));
+  if (n > 3)
+    foo (*(p + 16 + n * 4));
+  else
+    foo (*(p + 48 + n * 4));
+}
+
+/* { dg-final { scan-tree-dump-times "\\+ 128" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\+ 64" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\+ 192" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-2.c	(revision 0)
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+extern void foo (int);
+
+void
+f (int *p, int n)
+{
+  foo (*(p + n++ * 4));
+  foo (*(p + 32 + n++ * 4));
+  foo (*(p + 16 + n * 4));
+}
+
+/* { dg-final { scan-tree-dump-times "\\+ 144" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\+ 96" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-3.c	(revision 0)
@@ -0,0 +1,22 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+foo (int a[], int b[], int i)
+{
+  a[i] = b[i] + 2;
+  i++;
+  a[i] = b[i] + 2;
+  i++;
+  a[i] = b[i] + 2;
+  i++;
+  a[i] = b[i] + 2;
+  i++;
+  return i;
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\+ 4" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\+ 8" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\+ 12" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c	(revision 0)
@@ -0,0 +1,37 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-slsr -fdump-tree-optimized" } */
+
+void foo (int);
+
+void
+f (int i)
+{
+  int x, y;
+
+  x = i * 4;
+  y = x * 10;
+  foo (y);
+
+  i = i + 5;
+  x = i * 4;
+  y = x * 10;
+  foo (y);
+
+  i = i - 4;
+  x = i * 4;
+  y = x * 10;
+  foo (y);
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4" 1 "slsr" } } */
+/* { dg-final { scan-tree-dump-times "\\* 10" 1 "slsr" } } */
+/* { dg-final { scan-tree-dump-times "\\+ 20;" 1 "slsr" } } */
+/* { dg-final { scan-tree-dump-times "\\+ 200" 1 "slsr" } } */
+/* { dg-final { scan-tree-dump-times "\\- 16;" 1 "slsr" } } */
+/* { dg-final { scan-tree-dump-times "\\- 160" 1 "slsr" } } */
+/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\* 10" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\+ 200" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\+ 40" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "slsr" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/timevar.def
===================================================================
--- gcc/timevar.def	(revision 180617)
+++ gcc/timevar.def	(working copy)
@@ -252,6 +252,7 @@  DEFTIMEVAR (TV_TREE_IFCOMBINE        , "tree if-co
 DEFTIMEVAR (TV_TREE_UNINIT           , "uninit var analysis")
 DEFTIMEVAR (TV_PLUGIN_INIT           , "plugin initialization")
 DEFTIMEVAR (TV_PLUGIN_RUN            , "plugin execution")
+DEFTIMEVAR (TV_TREE_SLSR             , "straight-line strength reduction")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL	     , "early local passes")
Index: gcc/tree-ssa-strength-reduction.c
===================================================================
--- gcc/tree-ssa-strength-reduction.c	(revision 0)
+++ gcc/tree-ssa-strength-reduction.c	(revision 0)
@@ -0,0 +1,813 @@ 
+/* Straight-line strength reduction.
+   Copyright (C) 2011  Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* There are many algorithms for performing strength reduction on
+   loops.  This is not one of them.  IVOPTS handles strength reduction
+   of induction variables just fine.  This pass is intended to pick
+   up the crumbs it leaves behind, by considering opportunities for
+   strength reduction along dominator paths.
+
+   Strength reduction will be implemented in four stages, gradually
+   adding more complex candidates:
+
+   1) Explicit multiplies, known constant multipliers, no
+      conditional increments.
+   2) Explicit multiplies, unknown constant multipliers,
+      no conditional increments.
+   3) Explicit multiplies, conditional increments.
+   4) Implicit multiplies in addressing expressions.
+
+   It would also be possible to apply strength reduction to divisions
+   and modulos, but such opportunities are relatively uncommon.
+
+   Strength reduction is also currently restricted to integer operations.
+   If desired, it could be extended to floating-point operations under
+   control of something like -funsafe-math-optimizations.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "gimple.h"
+#include "basic-block.h"
+#include "tree-pass.h"
+#include "timevar.h"
+#include "cfgloop.h"
+#include "tree-pretty-print.h"
+#include "gimple-pretty-print.h"
+#include "alloc-pool.h"
+#include "tree-flow.h"
+
+/* Index into the candidate vector, offset by 1.  VECs are zero-based,
+   while cand_idx's are one-based, with zero indicating null.  */
+
+typedef unsigned int cand_idx;
+
+/* Information about a strength reduction candidate.  A candidate
+   is a statement S1 of the expanded form
+
+     S1:  LHS = (B + c1) * M,
+
+   where B is an SSA name, c1 is a constant that may be zero, and
+   M is either an SSA name or a nonzero constant.  A candidate may
+   only be strength reduced if a "basis" statement
+
+     S0:  S = (B' + c0) * M
+
+   of the same form appears earlier in the same block or in a
+   dominating block.  Assuming replacement is profitable, there are two
+   cases:
+
+   (1) If B = B', then S1 can be replaced with:
+
+         S1':  LHS = S + (c1 - c0) * M,
+
+       where (c1 - c0) * M is folded to the extent possible.
+
+   (2) If B differs from B', then we require that B = B' + c0,
+       and S1 can be replaced with:
+
+         S1':  LHS = S + c1 * M,
+
+       where c1 * M is folded if possible.
+
+   Candidate records are allocated from an allocation pool.  They are
+   addressed both from a hash table keyed on S1, and from a vector of
+   candidate pointers arranged in predominator order.  */
+
+typedef struct slsr_cand_d
+{
+  /* The candidate statement S1.  */
+  gimple cand_stmt;
+
+  /* The base SSA name B.  */
+  tree base_name;
+
+  /* The stride M.  */
+  tree stride;
+
+  /* The index constant c1.  */
+  double_int index;
+
+  /* Index of this candidate in the candidate vector.  */
+  cand_idx cand_num;
+
+  /* Index of the basis statement S0, if any, in the candidate vector.  */
+  cand_idx basis;
+
+  /* First candidate for which this candidate is a basis, if one exists.  */
+  cand_idx dependent;
+
+  /* Next candidate having the same basis as this one.  */
+  cand_idx sibling;
+
+  /* If this is a conditional candidate, the defining PHI statement
+     for the base SSA name B.  */
+  gimple def_phi;
+
+  /* Access to the statement for subsequent modification.  Cached to
+     save compile time.  */
+  gimple_stmt_iterator cand_gsi;
+
+} slsr_cand, *slsr_cand_t;
+
+typedef const struct slsr_cand_d *const_slsr_cand_t;
+
+/* Candidates are maintained in a vector.  If candidate X dominates
+   candidate Y, then X appears before Y in the vector; but the
+   converse does not necessarily hold.  */
+DEF_VEC_P (slsr_cand_t);
+DEF_VEC_ALLOC_P (slsr_cand_t, heap);
+static VEC (slsr_cand_t, heap) *cand_vec;
+
+/* Hash table embodying a mapping from statements to candidates.  */
+static htab_t stmt_cand_map;
+
+/* Allocation pool for candidates.  */
+static alloc_pool cand_pool;
+
+
+/* Produce a pointer to the IDX'th candidate in the candidate vector.  */
+
+static slsr_cand_t
+lookup_cand (cand_idx idx)
+{
+  return VEC_index (slsr_cand_t, cand_vec, idx - 1);
+}
+
+/* Callback to produce a hash value for a candidate.  */
+
+static hashval_t
+stmt_cand_hash (const void *p)
+{
+  return htab_hash_pointer (((const_slsr_cand_t)p)->cand_stmt);
+}
+
+/* Callback when an element is removed from the hash table.
+   We never remove entries until the entire table is released.  */
+
+static void
+stmt_cand_free (void *p ATTRIBUTE_UNUSED)
+{
+}
+
+/* Callback to return true if two candidates are equal.  */
+
+static int
+stmt_cand_eq (const void *p1, const void *p2)
+{
+  const_slsr_cand_t const cand1 = (const_slsr_cand_t)p1;
+  const_slsr_cand_t const cand2 = (const_slsr_cand_t)p2;
+  return (cand1->cand_stmt == cand2->cand_stmt);
+}
+
+/* Return TRUE if GS is a statement that defines an SSA name from
+   a NOP_EXPR and is legal for us to combine an add and multiply
+   across.  This is legitimate for casts from a signed type to
+   a signed or unsigned type of the same or larger size.  It is not
+   legitimate to convert any unsigned type to a signed type, or
+   to an unsigned type of a different size.
+
+   The reasoning here is that signed integer overflow is undefined,
+   so any program that was expecting overflow that no longer occurs
+   is not correct.  Unsigned integers, however, have wrap semantics,
+   and it is reasonable for programs to assume an overflowing add
+   will wrap.  */
+
+static bool
+legal_cast_p (gimple gs)
+{
+  tree lhs, rhs;
+  unsigned int src_size, dst_size, src_uns, dst_uns;
+
+  if (!is_gimple_assign (gs)
+      || TREE_CODE (gimple_assign_lhs (gs)) != SSA_NAME
+      || gimple_assign_rhs_code (gs) != NOP_EXPR)
+    return false;
+
+  lhs = gimple_assign_lhs (gs);
+  rhs = gimple_assign_rhs1 (gs);
+
+  if (TREE_CODE (rhs) != SSA_NAME)
+    return false;
+
+  src_size = TYPE_PRECISION (TREE_TYPE (SSA_NAME_VAR (rhs)));
+  src_uns = TYPE_UNSIGNED (TREE_TYPE (SSA_NAME_VAR (rhs)));
+  dst_size = TYPE_PRECISION (TREE_TYPE (SSA_NAME_VAR (lhs)));
+  dst_uns = TYPE_UNSIGNED (TREE_TYPE (SSA_NAME_VAR (lhs)));
+
+  if (dst_size < src_size)
+    return false;
+
+  if (src_uns && !dst_uns)
+    return false;
+
+  if (src_uns && dst_uns && src_size != dst_size)
+    return false;
+
+  return true;
+}
+
+/* If GS is a statement that defines an SSA name from a cast that
+   is legal for our purposes (see legal_cast_p), return the source
+   SSA name.  Otherwise return NULL_TREE.  */
+
+static tree
+source_of_legal_cast (gimple gs)
+{
+  if (legal_cast_p (gs))
+    return gimple_assign_rhs1 (gs);
+
+  return NULL_TREE;
+}
+
+/* If GS is a statement that defines an SSA name from a cast that
+   is legal for our purposes (see legal_cast_p), return the target
+   SSA name.  Otherwise return NULL_TREE.  */
+
+static tree
+target_of_legal_cast (gimple gs)
+{
+  if (legal_cast_p (gs))
+    return gimple_assign_lhs (gs);
+
+  return NULL_TREE;
+}
+
+/* Return TRUE if GS consists of an add or subtract of BASE_NAME
+   with a constant, with the result placed in an SSA name.  GS is
+   known to be a gimple assignment.  CODE is the RHS code for GS.  */
+
+static bool
+stmt_adds_base_to_const (gimple gs, enum tree_code code, tree base_name)
+{
+  return (TREE_CODE (gimple_assign_lhs (gs)) == SSA_NAME
+	  && (code == PLUS_EXPR || code == MINUS_EXPR)
+	  && operand_equal_p (gimple_assign_rhs1 (gs), base_name, 0)
+	  && TREE_CODE (gimple_assign_rhs2 (gs)) == INTEGER_CST);
+}
+
+/* Recursive helper for find_basis_for_candidate.  To find a
+   possible basis, first try the immediate uses of the given
+   BASE_NAME.  If a particular use doesn't appear in a multiply,
+   see if it appears in an add that feeds one or more multiplies.
+   Recurse using the SSA name defined on the add.  Each potential
+   basis found in this manner must also appear in a block that
+   dominates the candidate statement, be present in the candidate
+   table, and have the same stride M.  If more than one possible
+   basis exists, pick the one with highest index in the vector,
+   which will be the most immediately dominating basis.  */
+
+static slsr_cand_t
+find_basis_for_candidate_1 (slsr_cand_t c, tree base_name, slsr_cand_t basis)
+{
+  imm_use_iterator use_iter;
+  use_operand_p use_p;
+  
+  FOR_EACH_IMM_USE_FAST (use_p, use_iter, base_name)
+    {
+      gimple use_stmt = USE_STMT (use_p);
+      slsr_cand_t use_basis = NULL;
+      enum tree_code code;
+      tree cast_target;
+      
+      if (!is_gimple_assign (use_stmt))
+	continue;
+
+      /* When revising a candidate's basis, make sure not to pick itself.  */
+      if (c->cand_stmt == use_stmt)
+	continue;
+
+      /* Look through casts where legal.  */
+      cast_target = target_of_legal_cast (use_stmt);
+      if (cast_target)
+	return find_basis_for_candidate_1 (c, cast_target, basis);
+
+      /* If the use statement is a multiply, it's a potential basis.
+	 Otherwise, we have to look at the uses of the use statement
+	 to find any multiplies that may be potential bases.  */
+      code = gimple_assign_rhs_code (use_stmt);
+
+      if (code == MULT_EXPR)
+	{
+	  slsr_cand mapping_key;
+	  mapping_key.cand_stmt = use_stmt;
+	  use_basis = (slsr_cand_t)htab_find (stmt_cand_map, &mapping_key);
+	}
+
+      /* Recurse if this is an add of a constant that might feed a
+	 multiply.  */
+      else if (stmt_adds_base_to_const (use_stmt, code, base_name))
+	{
+	  tree lhs = gimple_assign_lhs (use_stmt);
+	  use_basis = find_basis_for_candidate_1 (c, lhs, basis);
+	}
+
+      if (!use_basis
+	  || !operand_equal_p (c->stride, use_basis->stride, 0)
+	  || !dominated_by_p (CDI_DOMINATORS,
+			      gimple_bb (c->cand_stmt),
+			      gimple_bb (use_stmt)))
+	continue;
+
+      /* When revising a candidate's basis, be careful not to pick
+	 a basis that occurs after the candidate.  */
+      if (use_basis->cand_num > c->cand_num)
+	continue;
+
+      if (!basis || basis->cand_num < use_basis->cand_num)
+	basis = use_basis;
+    }
+
+  return basis;
+}
+
+/* Look for the nearest dominating statement in the candidate
+   vector that can serve as a basis for the new candidate C.  If
+   found, adjust the dependent links for the basis entry and
+   return the index of the basis entry.  Otherwise return zero.  */
+
+static unsigned int
+find_basis_for_candidate (slsr_cand_t c)
+{
+  slsr_cand_t basis = find_basis_for_candidate_1 (c, c->base_name, NULL);
+
+  if (basis)
+    {
+      c->sibling = basis->dependent;
+      basis->dependent = c->cand_num;
+      return basis->cand_num;
+    }
+
+  return 0;
+}
+
+/* Starting with statement GS, look backwards through any
+   intervening legal casts to find one or more adds of an SSA
+   name and a constant.  Return the earliest SSA name in the
+   chain in *BASE, and add the sum of all constants in the
+   chain to *INDEX.  */
+
+static void
+combine_feeding_adds (gimple gs, tree *base, double_int *index)
+{
+  double_int new_index;
+  tree cast_source;
+
+  while ((cast_source = source_of_legal_cast (gs)))
+    {
+      gs = SSA_NAME_DEF_STMT (cast_source);
+      *base = cast_source;
+    }
+
+  /* If there aren't any more adds, we're done.  */
+  if (!is_gimple_assign (gs)
+      || (gimple_assign_rhs_code (gs) != PLUS_EXPR
+	  && gimple_assign_rhs_code (gs) != MINUS_EXPR)
+      || TREE_CODE (gimple_assign_rhs1 (gs)) != SSA_NAME
+      || TREE_CODE (gimple_assign_rhs2 (gs)) != INTEGER_CST)
+    return;
+
+  *base = gimple_assign_rhs1 (gs);
+  new_index = tree_to_double_int (gimple_assign_rhs2 (gs));
+
+  if (gimple_assign_rhs_code (gs) == MINUS_EXPR)
+    new_index = double_int_neg (new_index);
+
+  *index = double_int_add (*index, new_index);
+  combine_feeding_adds (SSA_NAME_DEF_STMT (*base), base, index);
+}
+
+/* Given statement GS containing an integer multiply, determine
+   whether it's a possible strength-reduction candidate.  If so,
+   add it to the candidate vector and the statement-candidate
+   mapping.  */
+
+static void
+process_possible_candidate (gimple_stmt_iterator gsi, gimple gs)
+{
+  tree rhs1 = gimple_assign_rhs1 (gs);
+  tree rhs2 = gimple_assign_rhs2 (gs);
+  tree base;
+  double_int index;
+  slsr_cand_t c;
+  void **slot;
+
+  /* TODO:  Unknown constant multipliers will be dealt with in
+     stage 2.  */
+  if (TREE_CODE (rhs2) != INTEGER_CST)
+    return;
+
+  /* If RHS1 isn't fed by an addition or subtraction of an SSA_NAME
+     and a constant, then treat it as an add of zero.  Look through
+     legal casts and combine multiple chained adds to find
+     the true base name and index.  */
+  gcc_assert (TREE_CODE (rhs1) == SSA_NAME);
+  base = rhs1;
+  index = double_int_zero;
+  combine_feeding_adds (SSA_NAME_DEF_STMT (rhs1), &base, &index);
+
+  c = (slsr_cand_t)pool_alloc (cand_pool);
+  c->cand_stmt = gs;
+  c->base_name = base;
+  c->stride = rhs2;
+  c->index = index;
+  c->cand_num = VEC_length (slsr_cand_t, cand_vec) + 1;
+  c->dependent = 0;
+  c->sibling = 0;
+  c->def_phi = NULL;
+  c->cand_gsi = gsi;
+  c->basis = find_basis_for_candidate (c);
+
+  VEC_safe_push (slsr_cand_t, heap, cand_vec, c);
+
+  slot = htab_find_slot (stmt_cand_map, c, INSERT);
+  gcc_assert (!*slot);
+  *slot = c;
+}
+
+/* Find strength-reduction candidates in block BB.  */
+
+static void
+find_candidates_in_block (basic_block bb)
+{
+  gimple_stmt_iterator gsi;
+  basic_block son;
+
+  /* Process this block.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple gs = gsi_stmt (gsi);
+      if (is_gimple_assign (gs)
+	  && gimple_assign_rhs_code (gs) == MULT_EXPR
+	  && INTEGRAL_MODE_P (TYPE_MODE (TREE_TYPE (gimple_assign_lhs (gs)))))
+	process_possible_candidate (gsi, gs);
+    }
+
+  /* Process dominated blocks.  */
+  for (son = first_dom_son (CDI_DOMINATORS, bb);
+       son;
+       son = next_dom_son (CDI_DOMINATORS, son))
+    find_candidates_in_block (son);
+}
+
+/* Dump a candidate for debug.  */
+
+static void
+dump_candidate (slsr_cand_t c)
+{
+  fprintf (dump_file, "%3d  [%d] ", c->cand_num,
+	   gimple_bb (c->cand_stmt)->index);
+  print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
+  fputs ("     base:        ", dump_file);
+  print_generic_expr (dump_file, c->base_name, 0);
+  fputs ("\n     index:       ", dump_file);
+  dump_double_int (dump_file, c->index, false);
+  fputs ("\n     stride:      ", dump_file);
+  print_generic_expr (dump_file, c->stride, 0);
+  fprintf (dump_file, "\n     basis:     %3d", c->basis);
+  fprintf (dump_file, "\n     dependent: %3d", c->dependent);
+  fprintf (dump_file, "\n     sibling:   %3d", c->sibling);
+  fputs ("\n     phi:         ", dump_file);
+  if (c->def_phi)
+    print_gimple_stmt (dump_file, c->def_phi, 0, 0);
+  else
+    fputs ("n/a", dump_file);
+  fputs ("\n\n", dump_file);
+}
+
+/* Dump the candidate vector for debug.  */
+
+static void
+dump_cand_vec (void)
+{
+  unsigned int i;
+  slsr_cand_t c;
+
+  fprintf (dump_file, "\nStrength reduction candidate vector:\n\n");
+  
+  FOR_EACH_VEC_ELT (slsr_cand_t, cand_vec, i, c)
+    dump_candidate (c);
+}
+
+/* Recursive helper for unconditional_cands_with_known_stride_p.
+   Returns TRUE iff C, its siblings, and its dependents are all
+   unconditional candidates.  */
+
+static bool
+unconditional_cands (slsr_cand_t c)
+{
+  if (c->def_phi)
+    return false;
+
+  if (c->sibling && !unconditional_cands (lookup_cand (c->sibling)))
+    return false;
+
+  if (c->dependent && !unconditional_cands (lookup_cand (c->dependent)))
+    return false;
+
+  return true;
+}
+
+/* Determine whether or not the tree of candidates rooted at
+   ROOT consists entirely of unconditional increments with
+   an INTEGER_CST stride.  */
+
+static bool
+unconditional_cands_with_known_stride_p (slsr_cand_t root)
+{
+  /* The stride is identical for all related candidates, so
+     check it once.  */
+  if (TREE_CODE (root->stride) != INTEGER_CST)
+    return false;
+
+  return unconditional_cands (lookup_cand (root->dependent));
+}
+
+/* GS is an add or subtract that used to be a multiply.  Follow
+   its immediate uses to update the candidate table accordingly.  */
+
+static void
+update_chained_candidates (gimple gs)
+{
+  tree base_name = gimple_assign_lhs (gs);
+  imm_use_iterator use_iter;
+  use_operand_p use_p;
+  tree cast_target;
+
+  FOR_EACH_IMM_USE_FAST (use_p, use_iter, base_name)
+    {
+      gimple use_stmt = USE_STMT (use_p);
+      enum tree_code code;
+      slsr_cand_t c;
+
+      if (!is_gimple_assign (use_stmt))
+	continue;
+
+      /* Look forward through converts that don't change semantics.  */
+      cast_target = target_of_legal_cast (use_stmt);
+      if (cast_target)
+	{
+	  update_chained_candidates (use_stmt);
+	  continue;
+	}
+
+      code = gimple_assign_rhs_code (use_stmt);
+
+      /* Case 1:  The name defined by the add appears in a 
+	 multiply that exists in the candidate table.  */
+      if (code == MULT_EXPR)
+	{
+	  slsr_cand mapping_key;
+	  tree base;
+	  double_int index;
+
+	  mapping_key.cand_stmt = use_stmt;
+	  if (!(c = (slsr_cand_t)htab_find (stmt_cand_map, &mapping_key)))
+	    continue;
+
+	  base = base_name;
+	  index = double_int_zero;
+	  combine_feeding_adds (gs, &base, &index);
+	  c->base_name = base;
+	  c->index = index;
+	  
+	  if (!c->basis)
+	    c->basis = find_basis_for_candidate (c);
+	  else
+	    /* If all has worked correctly, the modified candidate has
+	       the same basis as before.  */
+	    gcc_assert (lookup_cand (c->basis)
+			== find_basis_for_candidate_1 (c, base, NULL));
+
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fputs ("\nRevised candidate:\n", dump_file);
+	      dump_candidate (c);
+	    }
+	}
+
+      /* Case 2: The name defined by the add appears in another
+	 add with a constant, which in turn feeds one or more multiplies
+	 in the candidate table.  Recurse to find the multiplies.  */
+      else if (stmt_adds_base_to_const (use_stmt, code, base_name))
+	update_chained_candidates (use_stmt);
+    }
+}
+
+/* Replace candidate C, each sibling of candidate C, and each
+   dependent of candidate C with an add or subtract.  */
+
+static void
+replace_dependents (slsr_cand_t c)
+{
+  slsr_cand_t basis = lookup_cand (c->basis);
+  tree basis_name = gimple_assign_lhs (basis->cand_stmt);
+  tree incr_type = TREE_TYPE (gimple_assign_rhs1 (c->cand_stmt));
+  double_int increment, stride;
+  
+  if (operand_equal_p (c->base_name, basis->base_name, 0))
+    increment = double_int_sub (c->index, basis->index);
+  else
+    increment = c->index;
+
+  stride = tree_to_double_int (c->stride);
+  increment = double_int_mul (increment, stride);
+
+  /* It is highly unlikely, but possible, that the resulting
+     increment doesn't fit in a HWI.  Abandon the replacement
+     in this case.  This does not affect siblings or dependents
+     of C.  */
+  /* Restriction to signed HWI is conservative for unsigned types
+     but allows for safe negation without twisted logic.  */
+  if (double_int_fits_in_shwi_p (increment))
+    {
+      enum tree_code code = PLUS_EXPR;
+      tree orig_type
+	= TREE_TYPE (SSA_NAME_VAR (gimple_assign_rhs1 (c->cand_stmt)));
+      tree repl_type = TREE_TYPE (SSA_NAME_VAR (basis_name));
+      tree incr_tree;
+
+      if (double_int_negative_p (increment))
+	{
+	  code = MINUS_EXPR;
+	  increment = double_int_neg (increment);
+	}
+
+      incr_tree = double_int_to_tree (incr_type, increment);
+
+      /* A widening cast may be necessary.  */
+      if (orig_type != repl_type)
+	{
+	  tree new_basis_name = create_tmp_reg (orig_type, "slsr");
+	  gimple nop_stmt;
+	  add_referenced_var (new_basis_name);
+	  new_basis_name = make_ssa_name (new_basis_name, NULL);
+	  nop_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_basis_name,
+						   basis_name, NULL_TREE);
+	  gimple_set_location (nop_stmt, gimple_location (c->cand_stmt));
+	  gsi_insert_before (&c->cand_gsi, nop_stmt, GSI_SAME_STMT);
+	  basis_name = new_basis_name;
+
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fputs ("Inserting: ", dump_file);
+	      print_gimple_stmt (dump_file, nop_stmt, 0, 0);
+	    }
+	}
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fputs ("Replacing: ", dump_file);
+	  print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
+	}
+
+      gimple_assign_set_rhs_with_ops (&c->cand_gsi, code,
+				      basis_name, incr_tree);
+      update_stmt (c->cand_stmt);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fputs ("With: ", dump_file);
+	  print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
+	  fputs ("\n", dump_file);
+	}
+
+      /* The multiply we just converted to an add might well feed
+	 into one or more other multiplies in the candidate table.
+	 If so, we need to update those candidate entries.  */
+      update_chained_candidates (c->cand_stmt);
+    }
+
+  if (c->sibling)
+    replace_dependents (lookup_cand (c->sibling));
+
+  if (c->dependent)
+    replace_dependents (lookup_cand (c->dependent));
+}
+
+/* Analyze costs of related candidates in the candidate vector,
+   and make beneficial replacements.  */
+
+static void
+analyze_candidates_and_replace (void)
+{
+  unsigned int i;
+  slsr_cand_t c;
+
+  /* Each candidate that has a null basis and a non-null
+     dependent is the root of a tree of related statements.
+     Analyze each tree to determine a subset of those
+     statements that can be replaced with maximum benefit.  */
+  FOR_EACH_VEC_ELT (slsr_cand_t, cand_vec, i, c)
+    {
+      if (c->basis != 0 || c->dependent == 0)
+	continue;
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "\nProcessing dependency tree rooted at %d.\n",
+		 c->cand_num);
+
+      /* If the common stride of all related candidates is a
+	 known constant, and none of these has a phi-dependence,
+	 then all replacements are considered profitable.
+	 Each replaces a multiply by a single add, with the
+	 possibility that a feeding add also goes dead as a
+	 result.  */
+      if (unconditional_cands_with_known_stride_p (c))
+	replace_dependents (lookup_cand (c->dependent));
+
+      /* TODO:  When the stride is an SSA name, it may still
+	 be profitable to replace some or all of the dependent
+	 candidates, depending on whether the introduced increments
+	 can be reused.  */
+
+      /* TODO:  When conditional increments occur so that a 
+	 candidate is dependent upon a phi-basis, the cost of
+	 introducing a temporary must be accounted for.  */
+
+      /* TODO:  Strength reduction candidates hidden in 
+	 addressing expressions are not yet handled, and will
+	 have more complex cost functions.  */
+    }
+}
+
+/* Main entry point for straight-line strength reduction.  */
+
+static unsigned int
+execute_strength_reduction (void)
+{
+  /* Create the allocation pool where candidates will reside.  */
+  cand_pool = create_alloc_pool ("Strength reduction pool",
+				 sizeof (slsr_cand), 128);
+
+  /* Allocate the candidate vector.  */
+  cand_vec = VEC_alloc (slsr_cand_t, heap, 128);
+
+  /* Allocate the mapping from statements to candidate indices.  */
+  stmt_cand_map = htab_create (500, stmt_cand_hash,
+			       stmt_cand_eq, stmt_cand_free);
+
+  /* Initialize the loop optimizer.  We need to detect flow across
+     back edges, and this gives us dominator information as well.  */
+  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
+
+  /* Walk the CFG in predominator order looking for strength reduction
+     candidates.  */
+  find_candidates_in_block (ENTRY_BLOCK_PTR);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    dump_cand_vec ();
+
+  /* Analyze costs and make appropriate replacements.  */
+  analyze_candidates_and_replace ();
+
+  /* Free resources.  */
+  loop_optimizer_finalize ();
+  htab_delete (stmt_cand_map);
+  VEC_free (slsr_cand_t, heap, cand_vec);
+  free_alloc_pool (cand_pool);
+
+  return 0;
+}
+
+static bool
+gate_strength_reduction (void)
+{
+  return optimize > 0;
+}
+
+struct gimple_opt_pass pass_strength_reduction =
+{
+ {
+  GIMPLE_PASS,
+  "slsr",				/* name */
+  gate_strength_reduction,		/* gate */
+  execute_strength_reduction,		/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_TREE_SLSR,				/* tv_id */
+  PROP_cfg | PROP_ssa,			/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_dump_func | TODO_verify_ssa	/* todo_flags_finish */
+ }
+};
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 180617)
+++ gcc/Makefile.in	(working copy)
@@ -1475,6 +1475,7 @@  OBJS = \
 	tree-ssa-reassoc.o \
 	tree-ssa-sccvn.o \
 	tree-ssa-sink.o \
+	tree-ssa-strength-reduction.o \
 	tree-ssa-strlen.o \
 	tree-ssa-structalias.o \
 	tree-ssa-tail-merge.o \
@@ -2519,6 +2520,10 @@  tree-ssa-sccvn.o : tree-ssa-sccvn.c $(TREE_FLOW_H)
    alloc-pool.h $(BASIC_BLOCK_H) $(BITMAP_H) langhooks.h $(HASHTAB_H) $(GIMPLE_H) \
    $(TREE_INLINE_H) tree-iterator.h tree-ssa-propagate.h tree-ssa-sccvn.h \
    $(PARAMS_H) tree-pretty-print.h gimple-pretty-print.h gimple-fold.h
+tree-ssa-strength-reduction.o : tree-ssa-strength-reduction.c $(CONFIG_H) \
+   $(SYSTEM_H) coretypes.h $(TREE_H) $(GIMPLE_H) $(BASIC_BLOCK_H) \
+   $(TREE_PASS_H) $(TIMEVAR_H) $(CFGLOOP_H) tree-pretty-print.h alloc-pool.h \
+   $(TREE_FLOW_H)
 tree-vrp.o : tree-vrp.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
    $(TREE_FLOW_H) $(TREE_PASS_H) $(TREE_DUMP_H) $(DIAGNOSTIC_H) $(GGC_H) \
    $(BASIC_BLOCK_H) tree-ssa-propagate.h $(FLAGS_H) $(TREE_DUMP_H) \
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 180617)
+++ gcc/passes.c	(working copy)
@@ -1368,6 +1368,7 @@  init_optimization_passes (void)
 	}
       NEXT_PASS (pass_lower_vector_ssa);
       NEXT_PASS (pass_cse_reciprocals);
+      NEXT_PASS (pass_strength_reduction);
       NEXT_PASS (pass_reassoc);
       NEXT_PASS (pass_vrp);
       NEXT_PASS (pass_dominator);