diff mbox

Strength reduction part 3 of 4: candidates with unknown strides

Message ID 1343842586.4033.15.camel@oc2474580526.ibm.com
State New
Headers show

Commit Message

Bill Schmidt Aug. 1, 2012, 5:36 p.m. UTC
Greetings,

Thanks for the review of part 2!  Here's another chunk of the SLSR code
(I feel I owe you a few beers at this point).  This performs analysis
and replacement on groups of related candidates having an SSA name
(rather than a constant) for a stride.

This leaves only the conditional increment (CAND_PHI) case, which will
be handled in the last patch of the series.

Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
regressions.  Ok for trunk?

Thanks,
Bill


gcc:

2012-08-01  Bill Schmidt  <wschmidt@linux.ibm.com>

	* gimple-ssa-strength-reduction.c (struct incr_info_d): New struct.
	(incr_vec): New static var.
	(incr_vec_len): Likewise.
	(address_arithmetic_p): Likewise.
	(stmt_cost): Remove dead assignment.
	(dump_incr_vec): New function.
	(cand_abs_increment): Likewise.
	(lazy_create_slsr_reg): Likewise.
	(incr_vec_index): Likewise.
	(count_candidates): Likewise.
	(record_increment): Likewise.
	(record_increments): Likewise.
	(unreplaced_cand_in_tree): Likewise.
	(optimize_cands_for_speed_p): Likewise.
	(lowest_cost_path): Likewise.
	(total_savings): Likewise.
	(analyze_increments): Likewise.
	(ncd_for_two_cands): Likewise.
	(nearest_common_dominator_for_cands): Likewise.
	(profitable_increment_p): Likewise.
	(insert_initializers): Likewise.
	(introduce_cast_before_cand): Likewise.
	(replace_rhs_if_not_dup): Likewise.
	(replace_one_candidate): Likewise.
	(replace_profitable_candidates): Likewise.
	(analyze_candidates_and_replace): Handle candidates with SSA-name
	strides.

gcc/testsuite:

2012-08-01  Bill Schmidt  <wschmidt@linux.ibm.com>

	* gcc.dg/tree-ssa/slsr-5.c: New.
	* gcc.dg/tree-ssa/slsr-6.c: New.
	* gcc.dg/tree-ssa/slsr-7.c: New.
	* gcc.dg/tree-ssa/slsr-8.c: New.
	* gcc.dg/tree-ssa/slsr-9.c: New.
	* gcc.dg/tree-ssa/slsr-10.c: New.
	* gcc.dg/tree-ssa/slsr-11.c: New.
	* gcc.dg/tree-ssa/slsr-12.c: New.
	* gcc.dg/tree-ssa/slsr-13.c: New.
	* gcc.dg/tree-ssa/slsr-14.c: New.
	* gcc.dg/tree-ssa/slsr-15.c: New.
	* gcc.dg/tree-ssa/slsr-16.c: New.
	* gcc.dg/tree-ssa/slsr-17.c: New.
	* gcc.dg/tree-ssa/slsr-18.c: New.
	* gcc.dg/tree-ssa/slsr-19.c: New.
	* gcc.dg/tree-ssa/slsr-20.c: New.
	* gcc.dg/tree-ssa/slsr-21.c: New.
	* gcc.dg/tree-ssa/slsr-22.c: New.
	* gcc.dg/tree-ssa/slsr-23.c: New.
	* gcc.dg/tree-ssa/slsr-24.c: New.
	* gcc.dg/tree-ssa/slsr-25.c: New.
	* gcc.dg/tree-ssa/slsr-26.c: New.
	* gcc.dg/tree-ssa/slsr-30.c: New.
	* gcc.dg/tree-ssa/slsr-31.c: New.

Comments

Richard Biener Aug. 7, 2012, 11:42 a.m. UTC | #1
On Wed, 1 Aug 2012, William J. Schmidt wrote:

> Greetings,
> 
> Thanks for the review of part 2!  Here's another chunk of the SLSR code
> (I feel I owe you a few beers at this point).  This performs analysis
> and replacement on groups of related candidates having an SSA name
> (rather than a constant) for a stride.
> 
> This leaves only the conditional increment (CAND_PHI) case, which will
> be handled in the last patch of the series.
> 
> Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
> regressions.  Ok for trunk?

Ok with

+      /* Create a new SSA name to hold the initializer's value.  */
+      stride_type = TREE_TYPE (SSA_NAME_VAR (c->stride));

just using TREE_TYPE () on the SSA_NAMEs, no need to look at
SSA_NAME_VAR.

Thanks,
Richard.

> Thanks,
> Bill
> 
> 
> gcc:
> 
> 2012-08-01  Bill Schmidt  <wschmidt@linux.ibm.com>
> 
> 	* gimple-ssa-strength-reduction.c (struct incr_info_d): New struct.
> 	(incr_vec): New static var.
> 	(incr_vec_len): Likewise.
> 	(address_arithmetic_p): Likewise.
> 	(stmt_cost): Remove dead assignment.
> 	(dump_incr_vec): New function.
> 	(cand_abs_increment): Likewise.
> 	(lazy_create_slsr_reg): Likewise.
> 	(incr_vec_index): Likewise.
> 	(count_candidates): Likewise.
> 	(record_increment): Likewise.
> 	(record_increments): Likewise.
> 	(unreplaced_cand_in_tree): Likewise.
> 	(optimize_cands_for_speed_p): Likewise.
> 	(lowest_cost_path): Likewise.
> 	(total_savings): Likewise.
> 	(analyze_increments): Likewise.
> 	(ncd_for_two_cands): Likewise.
> 	(nearest_common_dominator_for_cands): Likewise.
> 	(profitable_increment_p): Likewise.
> 	(insert_initializers): Likewise.
> 	(introduce_cast_before_cand): Likewise.
> 	(replace_rhs_if_not_dup): Likewise.
> 	(replace_one_candidate): Likewise.
> 	(replace_profitable_candidates): Likewise.
> 	(analyze_candidates_and_replace): Handle candidates with SSA-name
> 	strides.
> 
> gcc/testsuite:
> 
> 2012-08-01  Bill Schmidt  <wschmidt@linux.ibm.com>
> 
> 	* gcc.dg/tree-ssa/slsr-5.c: New.
> 	* gcc.dg/tree-ssa/slsr-6.c: New.
> 	* gcc.dg/tree-ssa/slsr-7.c: New.
> 	* gcc.dg/tree-ssa/slsr-8.c: New.
> 	* gcc.dg/tree-ssa/slsr-9.c: New.
> 	* gcc.dg/tree-ssa/slsr-10.c: New.
> 	* gcc.dg/tree-ssa/slsr-11.c: New.
> 	* gcc.dg/tree-ssa/slsr-12.c: New.
> 	* gcc.dg/tree-ssa/slsr-13.c: New.
> 	* gcc.dg/tree-ssa/slsr-14.c: New.
> 	* gcc.dg/tree-ssa/slsr-15.c: New.
> 	* gcc.dg/tree-ssa/slsr-16.c: New.
> 	* gcc.dg/tree-ssa/slsr-17.c: New.
> 	* gcc.dg/tree-ssa/slsr-18.c: New.
> 	* gcc.dg/tree-ssa/slsr-19.c: New.
> 	* gcc.dg/tree-ssa/slsr-20.c: New.
> 	* gcc.dg/tree-ssa/slsr-21.c: New.
> 	* gcc.dg/tree-ssa/slsr-22.c: New.
> 	* gcc.dg/tree-ssa/slsr-23.c: New.
> 	* gcc.dg/tree-ssa/slsr-24.c: New.
> 	* gcc.dg/tree-ssa/slsr-25.c: New.
> 	* gcc.dg/tree-ssa/slsr-26.c: New.
> 	* gcc.dg/tree-ssa/slsr-30.c: New.
> 	* gcc.dg/tree-ssa/slsr-31.c: New.
> 
> 
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-10.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-10.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-10.c	(revision 0)
> @@ -0,0 +1,23 @@
> +/* Verify straight-line strength reduction for simple integer addition
> +   with stride reversed on 1st and 3rd instances.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int s, int c)
> +{
> +  int a1, a2, a3, x1, x2, x3, x;
> +
> +  a1 = 2 * s;
> +  x1 = a1 + c;
> +  a2 = 4 * s;
> +  x2 = c + a2;
> +  a3 = 6 * s;
> +  x3 = a3 + c;
> +  x = x1 + x2 + x3;
> +  return x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-11.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-11.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-11.c	(revision 0)
> @@ -0,0 +1,24 @@
> +/* Verify straight-line strength reduction for simple integer addition
> +   with casts thrown in.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +long
> +f (int s, long c)
> +{
> +  int a1, a2, a3;
> +  long x1, x2, x3, x;
> +
> +  a1 = 2 * s;
> +  x1 = c + a1;
> +  a2 = 4 * s;
> +  x2 = c + a2;
> +  a3 = 6 * s;
> +  x3 = c + a3;
> +  x = x1 + x2 + x3;
> +  return x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-20.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-20.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-20.c	(revision 0)
> @@ -0,0 +1,21 @@
> +/* Verify straight-line strength reduction for multiply candidates
> +   with stride in inconsistent positions.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int c, int s)
> +{
> +  int x1, x2, y1, y2;
> +
> +  y1 = c + 2;
> +  x1 = y1 * s;
> +  y2 = y1 + 2;
> +  x2 = s * y2;
> +  return x1 + x2;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* s" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\* 2" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-12.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-12.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-12.c	(revision 0)
> @@ -0,0 +1,30 @@
> +/* Verify that no straight-line strength reduction occurs across sibling
> +   blocks.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int s, int c)
> +{
> +  int a1, a2, a3, x1, x2, x3, x;
> +
> +  if (c > 0)
> +    {
> +      a1 = 2 * s;
> +      x1 = c + a1;
> +    }
> +  else
> +    {
> +      a1 = 4 * s;
> +      x1 = c + a1;
> +    }
> +
> +  a2 = 6 * s;
> +  x2 = c + a2;
> +  x = x1 + x2;
> +  return x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-5.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-5.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-5.c	(revision 0)
> @@ -0,0 +1,22 @@
> +/* Verify straight-line strength reduction for simple add candidates.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int s, int c)
> +{
> +  int a1, a2, a3, x1, x2, x3, x;
> +
> +  a1 = 2 * s;
> +  x1 = c + a1;
> +  a2 = 4 * s;
> +  x2 = c + a2;
> +  a3 = 6 * s;
> +  x3 = c + a3;
> +  x = x1 + x2 + x3;
> +  return x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-21.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-21.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-21.c	(revision 0)
> @@ -0,0 +1,32 @@
> +/* Verify straight-line strength reduction for multiply candidates
> +   with variable stride and control flow.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int n, int x, int stride)
> +{
> +  int a, x1, x2, x3;
> +
> +  a = x * stride;
> +
> +  if (n > 64)
> +    {
> +      x1 = x + 3;
> +      a += x1 * stride;
> +      x2 = x1 + 3;
> +      a += x2 * stride;
> +    }
> +  else
> +    {
> +      x3 = x + 3;
> +      a += x3 * stride;
> +    }
> +
> +  return a;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* stride" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\* 3" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-30.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-30.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-30.c	(revision 0)
> @@ -0,0 +1,25 @@
> +/* Verify straight-line strength reduction fails for simple integer addition
> +   with casts thrown in when -fwrapv is used.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-dom2 -fwrapv" } */
> +/* { dg-skip-if "" { ilp32 } { "-m32" } { "" } } */
> +
> +long
> +f (int s, long c)
> +{
> +  int a1, a2, a3;
> +  long x1, x2, x3, x;
> +
> +  a1 = 2 * s;
> +  x1 = c + a1;
> +  a2 = 4 * s;
> +  x2 = c + a2;
> +  a3 = 6 * s;
> +  x3 = c + a3;
> +  x = x1 + x2 + x3;
> +  return x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 3 "dom2" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-13.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-13.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-13.c	(revision 0)
> @@ -0,0 +1,25 @@
> +/* x2 and x3 will be strength-reduced based on the same statement
> +   but with different variables as the stride.  Note that they will
> +   be strength-reduced by introducing an initializer 4*s which is
> +   cheaper than 5*s; similar for 4*c and 5*c.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int s, int c)
> +{
> +  int a2, a3, x1, x2, x3, x;
> +
> +  x1 = c + s;
> +  a2 = 5 * s;
> +  x2 = c + a2;
> +  a3 = 5 * c;
> +  x3 = s + a3;
> +  x = x1 + x2 + x3;
> +  return x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* 4" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\* 5" 0 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-6.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-6.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-6.c	(revision 0)
> @@ -0,0 +1,25 @@
> +/* Verify straight-line strength reduction for simple add candidates,
> +   pointer version.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +void
> +f (int s, char *c, char *x1, char *x2, char *x3)
> +{
> +  int a1, a2, a3;
> +
> +  a1 = 2 * s;
> +  x1 = c + a1;
> +  *x1 = 1;
> +  a2 = 4 * s;
> +  x2 = c + a2;
> +  *x2 = 2;
> +  a3 = 6 * s;
> +  x3 = c + a3;
> +  *x3 = 3;
> +}
> +
> +/* There will be four ' * ' instances for the parms, one in the code.  */
> +/* { dg-final { scan-tree-dump-times " \\* " 5 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-22.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-22.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-22.c	(revision 0)
> @@ -0,0 +1,29 @@
> +/* Verify straight-line strength reduction for multiply candidates
> +   with variable stride and control flow.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int n, int x, int stride)
> +{
> +  int a, x1, x2, x3;
> +
> +  a = x * stride;
> +
> +  if (n > 64)
> +    {
> +      x1 = x + 3;
> +      a += x1 * stride;
> +      x2 = x1 + 3;
> +      a += x2 * stride;
> +      x3 = x2 + 3;
> +      a += x3 * stride;
> +    }
> +
> +  return a;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* stride" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\* 3" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-31.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-31.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-31.c	(revision 0)
> @@ -0,0 +1,27 @@
> +/* Verify straight-line strength reduction for add candidates in
> +   which the stride is unknown and increments appear that differ
> +   only in sign.  Verify the increments are shared.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int s, int c)
> +{
> +  int a1, a2, a3, a4, x1, x2, x3, x4, x;
> +
> +  a1 = 2 * s;
> +  x1 = c + a1;
> +  a2 = 4 * s;  /* incr = +2  */
> +  x2 = c + a2;
> +  a3 = 7 * s;
> +  x3 = c + a3;
> +  a4 = 5 * s;  /* incr = -2  */
> +  x4 = c + a4;
> +  x = x1 + x2 + x3 + x4;
> +  return x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* 2" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\* -2" 0 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-14.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-14.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-14.c	(revision 0)
> @@ -0,0 +1,32 @@
> +/* Straight-line strength reduction control flow variation.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int n, int c, int s)
> +{
> +  int a1, a2, x, x1, x2, x3, x4;
> +
> +  a1 = 2 * s;
> +
> +  if (n > 64)
> +    {
> +      x1 = c + a1;
> +      a2 = 4 * s;
> +      x2 = c + a2;
> +      x = x1 + x2;
> +    }
> +  else
> +    {
> +      x3 = c + a1;
> +      a2 = 4 * s;
> +      x4 = c + a2;
> +      x = x4 / x3;
> +    }
> +
> +  return x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-7.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-7.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-7.c	(revision 0)
> @@ -0,0 +1,22 @@
> +/* Verify straight-line strength reduction for simple integer subtraction.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int s, int c)
> +{
> +  int a1, a2, a3, x1, x2, x3, x;
> +
> +  a1 = 2 * s;
> +  x1 = c - a1;
> +  a2 = 4 * s;
> +  x2 = c - a2;
> +  a3 = 6 * s;
> +  x3 = c - a3;
> +  x = x1 + x2 + x3;
> +  return x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-23.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-23.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-23.c	(revision 0)
> @@ -0,0 +1,29 @@
> +/* Verify straight-line strength reduction for multiply candidates
> +   with variable stride and control flow.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int 
> +f (int n, int x, int stride)
> +{
> +  int a, x1, x2, x3;
> +
> +  a = x * stride;
> +  x1 = x + 3;
> +  a += x1 * stride;
> +
> +  if (n > 64)
> +    {
> +      x2 = x1 + 3;
> +      a += x2 * stride;
> +      x3 = x2 + 3;
> +      a += x3 * stride;
> +    }
> +
> +  return a;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* stride" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\* 3" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-15.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-15.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-15.c	(revision 0)
> @@ -0,0 +1,27 @@
> +/* Straight-line strength reduction control flow variation.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int 
> +f (int n, int c, int s)
> +{
> +  int a, x1, x2, x3;
> +
> +  x1 = x2 = x3 = c;
> +
> +  if (n > 64)
> +    {
> +      a = 2 * s;
> +      x1 = c + a;
> +      a = 4 * s;
> +      x2 = c + a;
> +      a = 6 * s;
> +      x3 = c + a;
> +    }
> +
> +  return x1 + x2 + x3;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c	(revision 0)
> @@ -0,0 +1,23 @@
> +/* Verify straight-line strength reduction for simple pointer subtraction.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int*
> +f (int s, int *c)
> +{
> +  int a1, a2, a3, *x1, *x2, *x3;
> +
> +  a1 = 2 * s;
> +  x1 = c - a1;
> +  a2 = 4 * s;
> +  x2 = c - a2;
> +  a3 = 6 * s;
> +  x3 = c - a3;
> +  return x1 ? x2 : x3;
> +}
> +
> +/* There are 2 ' * ' instances in the decls (since "int * x3;" is
> +   optimized out), 1 parm, 2 in the code.  */
> +/* { dg-final { scan-tree-dump-times " \\* " 5 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-24.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-24.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-24.c	(revision 0)
> @@ -0,0 +1,31 @@
> +/* Verify straight-line strength reduction for multiply candidates
> +   with variable stride and control flow, increment = 1.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int n, int x, int stride)
> +{
> +  int a, x1, x2, x3;
> +
> +  a = x * stride;
> +
> +  if (n > 64)
> +    {
> +      x1 = x + 1;
> +      a += x1 * stride;
> +      x2 = x1 + 1;
> +      a += x2 * stride;
> +    }
> +  else
> +    {
> +      x3 = x + 1;
> +      a += x3 * stride;
> +    }
> +
> +  return a;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-16.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-16.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-16.c	(revision 0)
> @@ -0,0 +1,28 @@
> +/* Straight-line strength reduction control flow variation.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int n, int c, int s)
> +{
> +  int a2, a3, a4, x1, x2, x3, x4;
> +
> +  x1 = c + s;
> +  a2 = 3 * s;
> +  x2 = c + a2;
> +  x3 = x4 = c;
> +
> +  if (n > 64)
> +    {
> +      a3 = 5 * s;
> +      x3 = c + a3;
> +      a4 = 7 * s;
> +      x4 = c + a4;
> +    }
> +
> +  return x1 + x2 + x3 + x4;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-9.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-9.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-9.c	(revision 0)
> @@ -0,0 +1,23 @@
> +/* Verify straight-line strength reduction for simple integer addition
> +   with stride reversed.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int s, int c)
> +{
> +  int a1, a2, a3, x1, x2, x3, x;
> +
> +  a1 = 2 * s;
> +  x1 = a1 + c;
> +  a2 = 4 * s;
> +  x2 = a2 + c;
> +  a3 = 6 * s;
> +  x3 = a3 + c;
> +  x = x1 + x2 + x3;
> +  return x;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-25.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-25.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-25.c	(revision 0)
> @@ -0,0 +1,31 @@
> +/* Verify straight-line strength reduction for multiply candidates
> +   with variable stride and control flow, increment = -1.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int 
> +f (int n, int x, int stride)
> +{
> +  int a, x1, x2, x3;
> +
> +  a = x * stride;
> +
> +  if (n > 64)
> +    {
> +      x1 = x - 1;
> +      a += x1 * stride;
> +      x2 = x1 - 1;
> +      a += x2 * stride;
> +    }
> +  else
> +    {
> +      x3 = x - 1;
> +      a += x3 * stride;
> +    }
> +
> +  return a;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-17.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-17.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-17.c	(revision 0)
> @@ -0,0 +1,31 @@
> +/* Straight-line strength reduction control flow variation with incr = 1.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int n, int c, int s)
> +{
> +  int a2, a3, a4, x1, x2, x3, x4;
> +
> +  x1 = c + s;
> +  x2 = x3 = x4 = c;
> +
> +  if (n > 64)
> +    {
> +      a2 = 2 * s;
> +      x2 = c + a2;
> +      a3 = 3 * s;
> +      x3 = c + a3;
> +    }
> +  else
> +    {
> +      a4 = 2 * s;
> +      x4 = c + a4;
> +    }
> +
> +  return x1 + x2 + x3 + x4;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 0 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-26.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-26.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-26.c	(revision 0)
> @@ -0,0 +1,32 @@
> +/* Verify straight-line strength reduction for multiply candidates
> +   with variable stride and control flow, increment = -3.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int n, int x, int stride)
> +{
> +  int a, x1, x2, x3;
> +
> +  a = x * stride;
> +
> +  if (n > 64)
> +    {
> +      x1 = x - 3;
> +      a += x1 * stride;
> +      x2 = x1 - 3;
> +      a += x2 * stride;
> +    }
> +  else
> +    {
> +      x3 = x - 3;
> +      a += x3 * stride;
> +    }
> +
> +  return a;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* stride" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\* 3" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-18.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-18.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-18.c	(revision 0)
> @@ -0,0 +1,32 @@
> +/* Straight-line strength reduction control flow variation with incr = -1.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int n, int c, int s)
> +{
> +  int a1, a2, a3, a4, x1, x2, x3, x4;
> +
> +  a1 = 4 * s;
> +  x1 = c + a1;
> +  x2 = x3 = x4 = c;
> +
> +  if (n > 64)
> +    {
> +      a2 = 3 * s;
> +      x2 = c + a2;
> +      a3 = 2 * s;
> +      x3 = c + a3;
> +    }
> +  else
> +    {
> +      a4 = 3 * s;
> +      x4 = c + a4;
> +    }
> +
> +  return x1 + x2 + x3 + x4;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-19.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-19.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-19.c	(revision 0)
> @@ -0,0 +1,22 @@
> +/* Verify straight-line strength reduction for multiply candidates
> +   with stride in RHS1 position.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +int
> +f (int c, int s)
> +{
> +  int x1, x2, y1, y2;
> +
> +  y1 = c + 2;
> +  x1 = s * y1;
> +  y2 = y1 + 2;
> +  x2 = s * y2;
> +  return x1 + x2;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " \\* y" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\* 2" 1 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> +
> Index: gcc/gimple-ssa-strength-reduction.c
> ===================================================================
> --- gcc/gimple-ssa-strength-reduction.c	(revision 190038)
> +++ gcc/gimple-ssa-strength-reduction.c	(working copy)
> @@ -30,8 +30,7 @@ along with GCC; see the file COPYING3.  If not see
>     1) Explicit multiplies, known constant multipliers, no
>        conditional increments. (complete)
>     2) Explicit multiplies, unknown constant multipliers,
> -      no conditional increments. (data gathering complete,
> -      replacements pending)
> +      no conditional increments. (complete)
>     3) Implicit multiplies in addressing expressions. (complete)
>     4) Explicit multiplies, conditional increments. (pending)
>  
> @@ -235,6 +234,41 @@ struct cand_chain_d
>  typedef struct cand_chain_d cand_chain, *cand_chain_t;
>  typedef const struct cand_chain_d *const_cand_chain_t;
>  
> +/* Information about a unique "increment" associated with candidates
> +   having an SSA name for a stride.  An increment is the difference
> +   between the index of the candidate and the index of its basis,
> +   i.e., (i - i') as discussed in the module commentary.
> +
> +   When we are not going to generate address arithmetic we treat
> +   increments that differ only in sign as the same, allowing sharing
> +   of the cost of initializers.  The absolute value of the increment
> +   is stored in the incr_info.  */
> +
> +struct incr_info_d
> +{
> +  /* The increment that relates a candidate to its basis.  */
> +  double_int incr;
> +
> +  /* How many times the increment occurs in the candidate tree.  */
> +  unsigned count;
> +
> +  /* Cost of replacing candidates using this increment.  Negative and
> +     zero costs indicate replacement should be performed.  */
> +  int cost;
> +
> +  /* If this increment is profitable but is not -1, 0, or 1, it requires
> +     an initializer T_0 = stride * incr to be found or introduced in the
> +     nearest common dominator of all candidates.  This field holds T_0
> +     for subsequent use.  */
> +  tree initializer;
> +
> +  /* If the initializer was found to already exist, this is the block
> +     where it was found.  */
> +  basic_block init_bb;
> +};
> +
> +typedef struct incr_info_d incr_info, *incr_info_t;
> +
>  /* Candidates are maintained in a vector.  If candidate X dominates
>     candidate Y, then X appears before Y in the vector; but the
>     converse does not necessarily hold.  */
> @@ -259,6 +293,16 @@ static htab_t base_cand_map;
>  
>  /* Obstack for candidate chains.  */
>  static struct obstack chain_obstack;
> +
> +/* An array INCR_VEC of incr_infos is used during analysis of related
> +   candidates having an SSA name for a stride.  INCR_VEC_LEN describes
> +   its current length.  */
> +static incr_info_t incr_vec;
> +static unsigned incr_vec_len;
> +
> +/* For a chain of candidates with unknown stride, indicates whether or not
> +   we must generate pointer arithmetic when replacing statements.  */
> +static bool address_arithmetic_p;
>  
>  /* Produce a pointer to the IDX'th candidate in the candidate vector.  */
>  
> @@ -421,7 +465,6 @@ stmt_cost (gimple gs, bool speed)
>      case PLUS_EXPR:
>      case POINTER_PLUS_EXPR:
>      case MINUS_EXPR:
> -      rhs2 = gimple_assign_rhs2 (gs);
>        return add_cost (speed, lhs_mode);
>  
>      case NEGATE_EXPR:
> @@ -1407,6 +1450,30 @@ dump_cand_chains (void)
>    htab_traverse_noresize (base_cand_map, base_cand_dump_callback, NULL);
>    fputs ("\n", dump_file);
>  }
> +
> +/* Dump the increment vector for debug.  */
> +
> +static void
> +dump_incr_vec (void)
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      unsigned i;
> +
> +      fprintf (dump_file, "\nIncrement vector:\n\n");
> +  
> +      for (i = 0; i < incr_vec_len; i++)
> +	{
> +	  fprintf (dump_file, "%3d  increment:   ", i);
> +	  dump_double_int (dump_file, incr_vec[i].incr, false);
> +	  fprintf (dump_file, "\n     count:       %d", incr_vec[i].count);
> +	  fprintf (dump_file, "\n     cost:        %d", incr_vec[i].cost);
> +	  fputs ("\n     initializer: ", dump_file);
> +	  print_generic_expr (dump_file, incr_vec[i].initializer, 0);
> +	  fputs ("\n\n", dump_file);
> +	}
> +    }
> +}
>  
>  /* Recursive helper for unconditional_cands_with_known_stride_p.
>     Returns TRUE iff C, its siblings, and its dependents are all
> @@ -1508,6 +1575,35 @@ cand_increment (slsr_cand_t c)
>    return double_int_sub (c->index, basis->index);
>  }
>  
> +/* Calculate the increment required for candidate C relative to
> +   its basis.  If we aren't going to generate pointer arithmetic
> +   for this candidate, return the absolute value of that increment
> +   instead.  */
> +
> +static inline double_int
> +cand_abs_increment (slsr_cand_t c)
> +{
> +  double_int increment = cand_increment (c);
> +
> +  if (!address_arithmetic_p && double_int_negative_p (increment))
> +    increment = double_int_neg (increment);
> +
> +  return increment;
> +}
> +
> +/* If *VAR is NULL or is not of a compatible type with TYPE, create a
> +   new temporary reg of type TYPE and store it in *VAR.  */
> +
> +static inline void
> +lazy_create_slsr_reg (tree *var, tree type)
> +{
> +  if (!*var || !types_compatible_p (TREE_TYPE (*var), type))
> +    {
> +      *var = create_tmp_reg (type, "slsr");
> +      add_referenced_var (*var);
> +    }
> +}
> +
>  /* Return TRUE iff candidate C has already been replaced under
>     another interpretation.  */
>  
> @@ -1630,6 +1726,764 @@ replace_dependents (slsr_cand_t c)
>      replace_dependents (lookup_cand (c->dependent));
>  }
>  
> +/* Return the index in the increment vector of the given INCREMENT.  */
> +
> +static inline unsigned
> +incr_vec_index (double_int increment)
> +{
> +  unsigned i;
> +  
> +  for (i = 0;
> +       i < incr_vec_len && !double_int_equal_p (increment, incr_vec[i].incr);
> +       i++)
> +    ;
> +
> +  gcc_assert (i < incr_vec_len);
> +  return i;
> +}
> +
> +/* Count the number of candidates in the tree rooted at C that have
> +   not already been replaced under other interpretations.  */
> +
> +static unsigned
> +count_candidates (slsr_cand_t c)
> +{
> +  unsigned count = cand_already_replaced (c) ? 0 : 1;
> +
> +  if (c->sibling)
> +    count += count_candidates (lookup_cand (c->sibling));
> +
> +  if (c->dependent)
> +    count += count_candidates (lookup_cand (c->dependent));
> +
> +  return count;
> +}
> +
> +/* Increase the count of INCREMENT by one in the increment vector.
> +   INCREMENT is associated with candidate C.  If an initializer
> +   T_0 = stride * I is provided by a candidate that dominates all
> +   candidates with the same increment, also record T_0 for subsequent use.  */
> +
> +static void
> +record_increment (slsr_cand_t c, double_int increment)
> +{
> +  bool found = false;
> +  unsigned i;
> +
> +  /* Treat increments that differ only in sign as identical so as to
> +     share initializers, unless we are generating pointer arithmetic.  */
> +  if (!address_arithmetic_p && double_int_negative_p (increment))
> +    increment = double_int_neg (increment);
> +
> +  for (i = 0; i < incr_vec_len; i++)
> +    {
> +      if (double_int_equal_p (incr_vec[i].incr, increment))
> +	{
> +	  incr_vec[i].count++;
> +	  found = true;
> +
> +	  /* If we previously recorded an initializer that doesn't
> +	     dominate this candidate, it's not going to be useful to
> +	     us after all.  */
> +	  if (incr_vec[i].initializer
> +	      && !dominated_by_p (CDI_DOMINATORS,
> +				  gimple_bb (c->cand_stmt),
> +				  incr_vec[i].init_bb))
> +	    {
> +	      incr_vec[i].initializer = NULL_TREE;
> +	      incr_vec[i].init_bb = NULL;
> +	    }
> +	  
> +	  break;
> +	}
> +    }
> +
> +  if (!found)
> +    {
> +      /* The first time we see an increment, create the entry for it.
> +	 If this is the root candidate which doesn't have a basis, set
> +	 the count to zero.  We're only processing it so it can possibly
> +	 provide an initializer for other candidates.  */
> +      incr_vec[incr_vec_len].incr = increment;
> +      incr_vec[incr_vec_len].count = c->basis ? 1 : 0;
> +      incr_vec[incr_vec_len].cost = COST_INFINITE;
> +      
> +      /* Optimistically record the first occurrence of this increment
> +	 as providing an initializer (if it does); we will revise this
> +	 opinion later if it doesn't dominate all other occurrences.
> +         Exception:  increments of -1, 0, 1 never need initializers.  */
> +      if (c->kind == CAND_ADD
> +	  && double_int_equal_p (c->index, increment)
> +	  && (double_int_scmp (increment, double_int_one) > 0
> +	      || double_int_scmp (increment, double_int_minus_one) < 0))
> +	{
> +	  tree t0;
> +	  tree rhs1 = gimple_assign_rhs1 (c->cand_stmt);
> +	  tree rhs2 = gimple_assign_rhs2 (c->cand_stmt);
> +	  if (operand_equal_p (rhs1, c->base_expr, 0))
> +	    t0 = rhs2;
> +	  else
> +	    t0 = rhs1;
> +	  if (SSA_NAME_DEF_STMT (t0) && gimple_bb (SSA_NAME_DEF_STMT (t0)))
> +	    {
> +	      incr_vec[incr_vec_len].initializer = t0;
> +	      incr_vec[incr_vec_len++].init_bb
> +		= gimple_bb (SSA_NAME_DEF_STMT (t0));
> +	    }
> +	  else
> +	    {
> +	      incr_vec[incr_vec_len].initializer = NULL_TREE;
> +	      incr_vec[incr_vec_len++].init_bb = NULL;
> +	    }
> +	}
> +      else
> +	{
> +	  incr_vec[incr_vec_len].initializer = NULL_TREE;
> +	  incr_vec[incr_vec_len++].init_bb = NULL;
> +	}
> +    }
> +}
> +
> +/* Determine how many times each unique increment occurs in the set
> +   of candidates rooted at C's parent, recording the data in the
> +   increment vector.  For each unique increment I, if an initializer
> +   T_0 = stride * I is provided by a candidate that dominates all
> +   candidates with the same increment, also record T_0 for subsequent
> +   use.  */
> +
> +static void
> +record_increments (slsr_cand_t c)
> +{
> +  if (!cand_already_replaced (c))
> +    record_increment (c, cand_increment (c));
> +
> +  if (c->sibling)
> +    record_increments (lookup_cand (c->sibling));
> +
> +  if (c->dependent)
> +    record_increments (lookup_cand (c->dependent));
> +}
> +
> +/* Return the first candidate in the tree rooted at C that has not
> +   already been replaced, favoring siblings over dependents.  */
> +
> +static slsr_cand_t
> +unreplaced_cand_in_tree (slsr_cand_t c)
> +{
> +  if (!cand_already_replaced (c))
> +    return c;
> +
> +  if (c->sibling)
> +    {
> +      slsr_cand_t sib = unreplaced_cand_in_tree (lookup_cand (c->sibling));
> +      if (sib)
> +	return sib;
> +    }
> +
> +  if (c->dependent)
> +    {
> +      slsr_cand_t dep = unreplaced_cand_in_tree (lookup_cand (c->dependent));
> +      if (dep)
> +	return dep;
> +    }
> +
> +  return NULL;
> +}
> +
> +/* Return TRUE if the candidates in the tree rooted at C should be
> +   optimized for speed, else FALSE.  We estimate this based on the block
> +   containing the most dominant candidate in the tree that has not yet
> +   been replaced.  */
> +
> +static bool
> +optimize_cands_for_speed_p (slsr_cand_t c)
> +{
> +  slsr_cand_t c2 = unreplaced_cand_in_tree (c);
> +  gcc_assert (c2);
> +  return optimize_bb_for_speed_p (gimple_bb (c2->cand_stmt));
> +}
> +
> +/* Add COST_IN to the lowest cost of any dependent path starting at
> +   candidate C or any of its siblings, counting only candidates along
> +   such paths with increment INCR.  Assume that replacing a candidate
> +   reduces cost by REPL_SAVINGS.  Also account for savings from any
> +   statements that would go dead.  */
> +
> +static int
> +lowest_cost_path (int cost_in, int repl_savings, slsr_cand_t c, double_int incr)
> +{
> +  int local_cost, sib_cost;
> +  double_int cand_incr = cand_abs_increment (c);
> +
> +  if (cand_already_replaced (c))
> +    local_cost = cost_in;
> +  else if (double_int_equal_p (incr, cand_incr))
> +    local_cost = cost_in - repl_savings - c->dead_savings;
> +  else
> +    local_cost = cost_in - c->dead_savings;
> +
> +  if (c->dependent)
> +    local_cost = lowest_cost_path (local_cost, repl_savings, 
> +				   lookup_cand (c->dependent), incr);
> +
> +  if (c->sibling)
> +    {
> +      sib_cost = lowest_cost_path (cost_in, repl_savings,
> +				   lookup_cand (c->sibling), incr);
> +      local_cost = MIN (local_cost, sib_cost);
> +    }
> +
> +  return local_cost;
> +}
> +
> +/* Compute the total savings that would accrue from all replacements
> +   in the candidate tree rooted at C, counting only candidates with
> +   increment INCR.  Assume that replacing a candidate reduces cost
> +   by REPL_SAVINGS.  Also account for savings from statements that
> +   would go dead.  */
> +
> +static int
> +total_savings (int repl_savings, slsr_cand_t c, double_int incr)
> +{
> +  int savings = 0;
> +  double_int cand_incr = cand_abs_increment (c);
> +
> +  if (double_int_equal_p (incr, cand_incr)
> +      && !cand_already_replaced (c))
> +    savings += repl_savings + c->dead_savings;
> +
> +  if (c->dependent)
> +    savings += total_savings (repl_savings, lookup_cand (c->dependent), incr);
> +
> +  if (c->sibling)
> +    savings += total_savings (repl_savings, lookup_cand (c->sibling), incr);
> +
> +  return savings;
> +}
> +
> +/* Use target-specific costs to determine and record which increments
> +   in the current candidate tree are profitable to replace, assuming
> +   MODE and SPEED.  FIRST_DEP is the first dependent of the root of
> +   the candidate tree.
> +
> +   One slight limitation here is that we don't account for the possible
> +   introduction of casts in some cases.  See replace_one_candidate for
> +   the cases where these are introduced.  This should probably be cleaned
> +   up sometime.  */
> +
> +static void
> +analyze_increments (slsr_cand_t first_dep, enum machine_mode mode, bool speed)
> +{
> +  unsigned i;
> +
> +  for (i = 0; i < incr_vec_len; i++)
> +    {
> +      HOST_WIDE_INT incr = double_int_to_shwi (incr_vec[i].incr);
> +
> +      /* If somehow this increment is bigger than a HWI, we won't
> +	 be optimizing candidates that use it.  And if the increment
> +	 has a count of zero, nothing will be done with it.  */
> +      if (!double_int_fits_in_shwi_p (incr_vec[i].incr)
> +	  || !incr_vec[i].count)
> +	incr_vec[i].cost = COST_INFINITE;
> +
> +      /* Increments of 0, 1, and -1 are always profitable to replace,
> +	 because they always replace a multiply or add with an add or
> +	 copy, and may cause one or more existing instructions to go
> +	 dead.  Exception:  -1 can't be assumed to be profitable for
> +	 pointer addition.  */
> +      else if (incr == 0
> +	       || incr == 1
> +	       || (incr == -1
> +		   && (gimple_assign_rhs_code (first_dep->cand_stmt)
> +		       != POINTER_PLUS_EXPR)))
> +	incr_vec[i].cost = COST_NEUTRAL;
> +      
> +      /* For any other increment, if this is a multiply candidate, we
> +	 must introduce a temporary T and initialize it with
> +	 T_0 = stride * increment.  When optimizing for speed, walk the
> +	 candidate tree to calculate the best cost reduction along any
> +	 path; if it offsets the fixed cost of inserting the initializer,
> +	 replacing the increment is profitable.  When optimizing for
> +         size, instead calculate the total cost reduction from replacing
> +	 all candidates with this increment.  */
> +      else if (first_dep->kind == CAND_MULT)
> +	{
> +	  int cost = mult_by_coeff_cost (incr, mode, speed);
> +	  int repl_savings = mul_cost (speed, mode) - add_cost (speed, mode);
> +	  if (speed)
> +	    cost = lowest_cost_path (cost, repl_savings, first_dep,
> +				     incr_vec[i].incr);
> +	  else
> +	    cost -= total_savings (repl_savings, first_dep, incr_vec[i].incr);
> +
> +	  incr_vec[i].cost = cost;
> +	}
> +
> +      /* If this is an add candidate, the initializer may already
> +	 exist, so only calculate the cost of the initializer if it
> +	 doesn't.  We are replacing one add with another here, so the
> +	 known replacement savings is zero.  We will account for removal
> +	 of dead instructions in lowest_cost_path or total_savings.  */
> +      else
> +	{
> +	  int cost = 0;
> +	  if (!incr_vec[i].initializer)
> +	    cost = mult_by_coeff_cost (incr, mode, speed);
> +
> +	  if (speed)
> +	    cost = lowest_cost_path (cost, 0, first_dep, incr_vec[i].incr);
> +	  else
> +	    cost -= total_savings (0, first_dep, incr_vec[i].incr);
> +
> +	  incr_vec[i].cost = cost;
> +	}
> +    }
> +}
> +
> +/* Return the nearest common dominator of BB1 and BB2.  If the blocks
> +   are identical, return the earlier of C1 and C2 in *WHERE.  Otherwise,
> +   if the NCD matches BB1, return C1 in *WHERE; if the NCD matches BB2,
> +   return C2 in *WHERE; and if the NCD matches neither, return NULL in
> +   *WHERE.  Note: It is possible for one of C1 and C2 to be NULL.  */
> +
> +static basic_block
> +ncd_for_two_cands (basic_block bb1, basic_block bb2,
> +		   slsr_cand_t c1, slsr_cand_t c2, slsr_cand_t *where)
> +{
> +  basic_block ncd;
> +
> +  if (!bb1)
> +    {
> +      *where = c2;
> +      return bb2;
> +    }
> +
> +  if (!bb2)
> +    {
> +      *where = c1;
> +      return bb1;
> +    }
> +
> +  ncd = nearest_common_dominator (CDI_DOMINATORS, bb1, bb2);
> +      
> +  /* If both candidates are in the same block, the earlier
> +     candidate wins.  */
> +  if (bb1 == ncd && bb2 == ncd)
> +    {
> +      if (!c1 || (c2 && c2->cand_num < c1->cand_num))
> +	*where = c2;
> +      else
> +	*where = c1;
> +    }
> +
> +  /* Otherwise, if one of them produced a candidate in the
> +     dominator, that one wins.  */
> +  else if (bb1 == ncd)
> +    *where = c1;
> +
> +  else if (bb2 == ncd)
> +    *where = c2;
> +
> +  /* If neither matches the dominator, neither wins.  */
> +  else
> +    *where = NULL;
> +
> +  return ncd;
> +}
> +
> +/* Consider all candidates in the tree rooted at C for which INCR
> +   represents the required increment of C relative to its basis.
> +   Find and return the basic block that most nearly dominates all
> +   such candidates.  If the returned block contains one or more of
> +   the candidates, return the earliest candidate in the block in
> +   *WHERE.  */
> +
> +static basic_block
> +nearest_common_dominator_for_cands (slsr_cand_t c, double_int incr,
> +				    slsr_cand_t *where)
> +{
> +  basic_block sib_ncd = NULL, dep_ncd = NULL, this_ncd = NULL, ncd;
> +  slsr_cand_t sib_where = NULL, dep_where = NULL, this_where = NULL, new_where;
> +  double_int cand_incr;
> +
> +  /* First find the NCD of all siblings and dependents.  */
> +  if (c->sibling)
> +    sib_ncd = nearest_common_dominator_for_cands (lookup_cand (c->sibling),
> +						  incr, &sib_where);
> +  if (c->dependent)
> +    dep_ncd = nearest_common_dominator_for_cands (lookup_cand (c->dependent),
> +						  incr, &dep_where);
> +  if (!sib_ncd && !dep_ncd)
> +    {
> +      new_where = NULL;
> +      ncd = NULL;
> +    }
> +  else if (sib_ncd && !dep_ncd)
> +    {
> +      new_where = sib_where;
> +      ncd = sib_ncd;
> +    }
> +  else if (dep_ncd && !sib_ncd)
> +    {
> +      new_where = dep_where;
> +      ncd = dep_ncd;
> +    }
> +  else
> +    ncd = ncd_for_two_cands (sib_ncd, dep_ncd, sib_where,
> +			     dep_where, &new_where);
> +
> +  /* If the candidate's increment doesn't match the one we're interested
> +     in, then the result depends only on siblings and dependents.  */
> +  cand_incr = cand_abs_increment (c);
> +
> +  if (!double_int_equal_p (cand_incr, incr) || cand_already_replaced (c))
> +    {
> +      *where = new_where;
> +      return ncd;
> +    }
> +
> +  /* Otherwise, compare this candidate with the result from all siblings
> +     and dependents.  */
> +  this_where = c;
> +  this_ncd = gimple_bb (c->cand_stmt);
> +  ncd = ncd_for_two_cands (ncd, this_ncd, new_where, this_where, where);
> +
> +  return ncd;
> +}
> +
> +/* Return TRUE if the increment indexed by INDEX is profitable to replace.  */
> +
> +static inline bool
> +profitable_increment_p (unsigned index)
> +{
> +  return (incr_vec[index].cost <= COST_NEUTRAL);
> +}
> +
> +/* For each profitable increment in the increment vector not equal to
> +   0 or 1 (or -1, for non-pointer arithmetic), find the nearest common
> +   dominator of all statements in the candidate chain rooted at C
> +   that require that increment, and insert an initializer
> +   T_0 = stride * increment at that location.  Record T_0 with the
> +   increment record.  */
> +
> +static void
> +insert_initializers (slsr_cand_t c)
> +{
> +  unsigned i;
> +  tree new_var = NULL_TREE;
> +
> +  for (i = 0; i < incr_vec_len; i++)
> +    {
> +      basic_block bb;
> +      slsr_cand_t where = NULL;
> +      gimple init_stmt;
> +      tree stride_type, new_name, incr_tree;
> +      double_int incr = incr_vec[i].incr;
> +
> +      if (!profitable_increment_p (i)
> +	  || double_int_one_p (incr)
> +	  || (double_int_minus_one_p (incr)
> +	      && gimple_assign_rhs_code (c->cand_stmt) != POINTER_PLUS_EXPR)
> +	  || double_int_zero_p (incr))
> +	continue;
> +
> +      /* We may have already identified an existing initializer that
> +	 will suffice.  */
> +      if (incr_vec[i].initializer)
> +	{
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    {
> +	      fputs ("Using existing initializer: ", dump_file);
> +	      print_gimple_stmt (dump_file,
> +				 SSA_NAME_DEF_STMT (incr_vec[i].initializer),
> +				 0, 0);
> +	    }
> +	  continue;
> +	}
> +
> +      /* Find the block that most closely dominates all candidates
> +	 with this increment.  If there is at least one candidate in
> +	 that block, the earliest one will be returned in WHERE.  */
> +      bb = nearest_common_dominator_for_cands (c, incr, &where);
> +
> +      /* Create a new SSA name to hold the initializer's value.  */
> +      stride_type = TREE_TYPE (SSA_NAME_VAR (c->stride));
> +      lazy_create_slsr_reg (&new_var, stride_type);
> +      new_name = make_ssa_name (new_var, NULL);
> +      incr_vec[i].initializer = new_name;
> +
> +      /* Create the initializer and insert it in the latest possible
> +	 dominating position.  */
> +      incr_tree = double_int_to_tree (stride_type, incr);
> +      init_stmt = gimple_build_assign_with_ops (MULT_EXPR, new_name,
> +						c->stride, incr_tree);
> +      if (where)
> +	{
> +	  gimple_stmt_iterator gsi = gsi_for_stmt (where->cand_stmt);
> +	  gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
> +	  gimple_set_location (init_stmt, gimple_location (where->cand_stmt));
> +	}
> +      else
> +	{
> +	  gimple_stmt_iterator gsi = gsi_last_bb (bb);
> +	  gimple basis_stmt = lookup_cand (c->basis)->cand_stmt;
> +
> +	  if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi)))
> +	    gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
> +	  else
> +	    gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT);
> +
> +	  gimple_set_location (init_stmt, gimple_location (basis_stmt));
> +	}
> +
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +	{
> +	  fputs ("Inserting initializer: ", dump_file);
> +	  print_gimple_stmt (dump_file, init_stmt, 0, 0);
> +	}
> +    }
> +}
> +
> +/* Create a NOP_EXPR that copies FROM_EXPR into a new SSA name of
> +   type TO_TYPE, and insert it in front of the statement represented
> +   by candidate C.  Use *NEW_VAR to create the new SSA name.  Return
> +   the new SSA name.  */
> +
> +static tree
> +introduce_cast_before_cand (slsr_cand_t c, tree to_type,
> +			    tree from_expr, tree *new_var)
> +{
> +  tree cast_lhs;
> +  gimple cast_stmt;
> +  gimple_stmt_iterator gsi = gsi_for_stmt (c->cand_stmt);
> +
> +  lazy_create_slsr_reg (new_var, to_type);
> +  cast_lhs = make_ssa_name (*new_var, NULL);
> +  cast_stmt = gimple_build_assign_with_ops (NOP_EXPR, cast_lhs,
> +					    from_expr, NULL_TREE);
> +  gimple_set_location (cast_stmt, gimple_location (c->cand_stmt));
> +  gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fputs ("  Inserting: ", dump_file);
> +      print_gimple_stmt (dump_file, cast_stmt, 0, 0);
> +    }
> +
> +  return cast_lhs;
> +}
> +
> +/* Replace the RHS of the statement represented by candidate C with 
> +   NEW_CODE, NEW_RHS1, and NEW_RHS2, provided that to do so doesn't
> +   leave C unchanged or just interchange its operands.  The original
> +   operation and operands are in OLD_CODE, OLD_RHS1, and OLD_RHS2.
> +   If the replacement was made and we are doing a details dump,
> +   return the revised statement, else NULL.  */
> +
> +static gimple
> +replace_rhs_if_not_dup (enum tree_code new_code, tree new_rhs1, tree new_rhs2,
> +			enum tree_code old_code, tree old_rhs1, tree old_rhs2,
> +			slsr_cand_t c)
> +{
> +  if (new_code != old_code
> +      || ((!operand_equal_p (new_rhs1, old_rhs1, 0)
> +	   || !operand_equal_p (new_rhs2, old_rhs2, 0))
> +	  && (!operand_equal_p (new_rhs1, old_rhs2, 0)
> +	      || !operand_equal_p (new_rhs2, old_rhs1, 0))))
> +    {
> +      gimple_stmt_iterator gsi = gsi_for_stmt (c->cand_stmt);
> +      gimple_assign_set_rhs_with_ops (&gsi, new_code, new_rhs1, new_rhs2);
> +      update_stmt (gsi_stmt (gsi));
> +
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +	return gsi_stmt (gsi);
> +    }
> +
> +  else if (dump_file && (dump_flags & TDF_DETAILS))
> +    fputs ("  (duplicate, not actually replacing)\n", dump_file);
> +
> +  return NULL;
> +}
> +
> +/* Strength-reduce the statement represented by candidate C by replacing
> +   it with an equivalent addition or subtraction.  I is the index into
> +   the increment vector identifying C's increment.  NEW_VAR is used to
> +   create a new SSA name if a cast needs to be introduced.  BASIS_NAME
> +   is the rhs1 to use in creating the add/subtract.  */
> +
> +static void
> +replace_one_candidate (slsr_cand_t c, unsigned i, tree *new_var,
> +		       tree basis_name)
> +{
> +  gimple stmt_to_print = NULL;
> +  tree orig_rhs1, orig_rhs2;
> +  tree rhs2;
> +  enum tree_code orig_code, repl_code;
> +  double_int cand_incr;
> +
> +  orig_code = gimple_assign_rhs_code (c->cand_stmt);
> +  orig_rhs1 = gimple_assign_rhs1 (c->cand_stmt);
> +  orig_rhs2 = gimple_assign_rhs2 (c->cand_stmt);
> +  cand_incr = cand_increment (c);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fputs ("Replacing: ", dump_file);
> +      print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
> +      stmt_to_print = c->cand_stmt;
> +    }
> +
> +  if (address_arithmetic_p)
> +    repl_code = POINTER_PLUS_EXPR;
> +  else
> +    repl_code = PLUS_EXPR;
> +
> +  /* If the increment has an initializer T_0, replace the candidate
> +     statement with an add of the basis name and the initializer.  */
> +  if (incr_vec[i].initializer)
> +    {
> +      tree init_type = TREE_TYPE (SSA_NAME_VAR (incr_vec[i].initializer));
> +      tree orig_type = TREE_TYPE (SSA_NAME_VAR (orig_rhs2));
> +
> +      if (types_compatible_p (orig_type, init_type))
> +	rhs2 = incr_vec[i].initializer;
> +      else
> +	rhs2 = introduce_cast_before_cand (c, orig_type,
> +					   incr_vec[i].initializer,
> +					   new_var);
> +
> +      if (!double_int_equal_p (incr_vec[i].incr, cand_incr))
> +	{
> +	  gcc_assert (repl_code == PLUS_EXPR);
> +	  repl_code = MINUS_EXPR;
> +	}
> +
> +      stmt_to_print = replace_rhs_if_not_dup (repl_code, basis_name, rhs2,
> +					      orig_code, orig_rhs1, orig_rhs2,
> +					      c);
> +    }
> +
> +  /* Otherwise, the increment is one of -1, 0, and 1.  Replace
> +     with a subtract of the stride from the basis name, a copy
> +     from the basis name, or an add of the stride to the basis
> +     name, respectively.  It may be necessary to introduce a
> +     cast (or reuse an existing cast).  */
> +  else if (double_int_one_p (cand_incr))
> +    {
> +      tree stride_type = TREE_TYPE (SSA_NAME_VAR (c->stride));
> +      tree orig_type = TREE_TYPE (SSA_NAME_VAR (orig_rhs2));
> +      
> +      if (types_compatible_p (orig_type, stride_type))
> +	rhs2 = c->stride;
> +      else
> +	rhs2 = introduce_cast_before_cand (c, orig_type, c->stride, new_var);
> +      
> +      stmt_to_print = replace_rhs_if_not_dup (repl_code, basis_name, rhs2,
> +					      orig_code, orig_rhs1, orig_rhs2,
> +					      c);
> +    }
> +
> +  else if (double_int_minus_one_p (cand_incr))
> +    {
> +      tree stride_type = TREE_TYPE (SSA_NAME_VAR (c->stride));
> +      tree orig_type = TREE_TYPE (SSA_NAME_VAR (orig_rhs2));
> +      gcc_assert (repl_code != POINTER_PLUS_EXPR);
> +      
> +      if (types_compatible_p (orig_type, stride_type))
> +	rhs2 = c->stride;
> +      else
> +	rhs2 = introduce_cast_before_cand (c, orig_type, c->stride, new_var);
> +      
> +      if (orig_code != MINUS_EXPR
> +	  || !operand_equal_p (basis_name, orig_rhs1, 0)
> +	  || !operand_equal_p (rhs2, orig_rhs2, 0))
> +	{
> +	  gimple_stmt_iterator gsi = gsi_for_stmt (c->cand_stmt);
> +	  gimple_assign_set_rhs_with_ops (&gsi, MINUS_EXPR, basis_name, rhs2);
> +	  update_stmt (gsi_stmt (gsi));
> +
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    stmt_to_print = gsi_stmt (gsi);
> +	}
> +      else if (dump_file && (dump_flags & TDF_DETAILS))
> +	fputs ("  (duplicate, not actually replacing)\n", dump_file);
> +    }
> +
> +  else if (double_int_zero_p (cand_incr))
> +    {
> +      tree lhs = gimple_assign_lhs (c->cand_stmt);
> +      tree lhs_type = TREE_TYPE (SSA_NAME_VAR (lhs));
> +      tree basis_type = TREE_TYPE (SSA_NAME_VAR (basis_name));
> +      
> +      if (types_compatible_p (lhs_type, basis_type))
> +	{
> +	  gimple copy_stmt = gimple_build_assign (lhs, basis_name);
> +	  gimple_stmt_iterator gsi = gsi_for_stmt (c->cand_stmt);
> +	  gimple_set_location (copy_stmt, gimple_location (c->cand_stmt));
> +	  gsi_replace (&gsi, copy_stmt, false);
> +
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    stmt_to_print = copy_stmt;
> +	}
> +      else
> +	{
> +	  gimple_stmt_iterator gsi = gsi_for_stmt (c->cand_stmt);
> +	  gimple cast_stmt = gimple_build_assign_with_ops (NOP_EXPR, lhs,
> +							   basis_name,
> +							   NULL_TREE);
> +	  gimple_set_location (cast_stmt, gimple_location (c->cand_stmt));
> +	  gsi_replace (&gsi, cast_stmt, false);
> +
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    stmt_to_print = cast_stmt;
> +	}
> +    }
> +  else
> +    gcc_unreachable ();
> +  
> +  if (dump_file && (dump_flags & TDF_DETAILS) && stmt_to_print)
> +    {
> +      fputs ("With: ", dump_file);
> +      print_gimple_stmt (dump_file, stmt_to_print, 0, 0);
> +      fputs ("\n", dump_file);
> +    }
> +}
> +
> +/* For each candidate in the tree rooted at C, replace it with
> +   an increment if such has been shown to be profitable.  */
> +
> +static void
> +replace_profitable_candidates (slsr_cand_t c)
> +{
> +  if (!cand_already_replaced (c))
> +    {
> +      double_int increment = cand_abs_increment (c);
> +      tree new_var = NULL;
> +      enum tree_code orig_code = gimple_assign_rhs_code (c->cand_stmt);
> +      unsigned i;
> +
> +      i = incr_vec_index (increment);
> +
> +      /* Only process profitable increments.  Nothing useful can be done
> +	 to a cast or copy.  */
> +      if (profitable_increment_p (i) 
> +	  && orig_code != MODIFY_EXPR
> +	  && orig_code != NOP_EXPR)
> +	{
> +	  slsr_cand_t basis = lookup_cand (c->basis);
> +	  tree basis_name = gimple_assign_lhs (basis->cand_stmt);
> +	  replace_one_candidate (c, i, &new_var, basis_name);
> +	}
> +    }
> +
> +  if (c->sibling)
> +    replace_profitable_candidates (lookup_cand (c->sibling));
> +
> +  if (c->dependent)
> +    replace_profitable_candidates (lookup_cand (c->dependent));
> +}
> +
>  /* Analyze costs of related candidates in the candidate vector,
>     and make beneficial replacements.  */
>  
> @@ -1670,12 +2524,47 @@ analyze_candidates_and_replace (void)
>        else if (unconditional_cands_with_known_stride_p (c))
>  	replace_dependents (first_dep);
>  
> -      /* TODO:  When the stride is an SSA name, it may still be
> -	 profitable to replace some or all of the dependent
> -	 candidates, depending on whether the introduced increments
> -	 can be reused, or are less expensive to calculate than
> -	 the replaced statements.  */
> +      /* When the stride is an SSA name, it may still be profitable
> +	 to replace some or all of the dependent candidates, depending
> +	 on whether the introduced increments can be reused, or are
> +	 less expensive to calculate than the replaced statements.  */
> +      else
> +	{
> +	  unsigned length;
> +	  enum machine_mode mode;
> +	  bool speed;
>  
> +	  /* Determine whether we'll be generating pointer arithmetic
> +	     when replacing candidates.  */
> +	  address_arithmetic_p = (c->kind == CAND_ADD
> +				  && POINTER_TYPE_P (TREE_TYPE (c->base_expr)));
> +
> +	  /* If all candidates have already been replaced under other
> +	     interpretations, nothing remains to be done.  */
> +	  length = count_candidates (c);
> +	  if (!length)
> +	    continue;
> +
> +	  /* Construct an array of increments for this candidate chain.  */
> +	  incr_vec = XNEWVEC (incr_info, length);
> +	  incr_vec_len = 0;
> +	  record_increments (c);
> +
> +	  /* Determine which increments are profitable to replace.  */
> +	  mode = TYPE_MODE (TREE_TYPE (gimple_assign_lhs (c->cand_stmt)));
> +	  speed = optimize_cands_for_speed_p (c);
> +	  analyze_increments (first_dep, mode, speed);
> +
> +	  /* Insert initializers of the form T_0 = stride * increment
> +	     for use in profitable replacements.  */
> +	  insert_initializers (first_dep);
> +	  dump_incr_vec ();
> +
> +	  /* Perform the replacements.  */
> +	  replace_profitable_candidates (first_dep);
> +	  free (incr_vec);
> +	}
> +
>        /* TODO:  When conditional increments occur so that a 
>  	 candidate is dependent upon a phi-basis, the cost of
>  	 introducing a temporary must be accounted for.  */
> 
> 
>
diff mbox

Patch

Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-10.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-10.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-10.c	(revision 0)
@@ -0,0 +1,23 @@ 
+/* Verify straight-line strength reduction for simple integer addition
+   with stride reversed on 1st and 3rd instances.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int s, int c)
+{
+  int a1, a2, a3, x1, x2, x3, x;
+
+  a1 = 2 * s;
+  x1 = a1 + c;
+  a2 = 4 * s;
+  x2 = c + a2;
+  a3 = 6 * s;
+  x3 = a3 + c;
+  x = x1 + x2 + x3;
+  return x;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-11.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-11.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-11.c	(revision 0)
@@ -0,0 +1,24 @@ 
+/* Verify straight-line strength reduction for simple integer addition
+   with casts thrown in.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+long
+f (int s, long c)
+{
+  int a1, a2, a3;
+  long x1, x2, x3, x;
+
+  a1 = 2 * s;
+  x1 = c + a1;
+  a2 = 4 * s;
+  x2 = c + a2;
+  a3 = 6 * s;
+  x3 = c + a3;
+  x = x1 + x2 + x3;
+  return x;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-20.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-20.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-20.c	(revision 0)
@@ -0,0 +1,21 @@ 
+/* Verify straight-line strength reduction for multiply candidates
+   with stride in inconsistent positions.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int c, int s)
+{
+  int x1, x2, y1, y2;
+
+  y1 = c + 2;
+  x1 = y1 * s;
+  y2 = y1 + 2;
+  x2 = s * y2;
+  return x1 + x2;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* s" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\* 2" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-12.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-12.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-12.c	(revision 0)
@@ -0,0 +1,30 @@ 
+/* Verify that no straight-line strength reduction occurs across sibling
+   blocks.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int s, int c)
+{
+  int a1, a2, a3, x1, x2, x3, x;
+
+  if (c > 0)
+    {
+      a1 = 2 * s;
+      x1 = c + a1;
+    }
+  else
+    {
+      a1 = 4 * s;
+      x1 = c + a1;
+    }
+
+  a2 = 6 * s;
+  x2 = c + a2;
+  x = x1 + x2;
+  return x;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-5.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-5.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-5.c	(revision 0)
@@ -0,0 +1,22 @@ 
+/* Verify straight-line strength reduction for simple add candidates.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int s, int c)
+{
+  int a1, a2, a3, x1, x2, x3, x;
+
+  a1 = 2 * s;
+  x1 = c + a1;
+  a2 = 4 * s;
+  x2 = c + a2;
+  a3 = 6 * s;
+  x3 = c + a3;
+  x = x1 + x2 + x3;
+  return x;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-21.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-21.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-21.c	(revision 0)
@@ -0,0 +1,32 @@ 
+/* Verify straight-line strength reduction for multiply candidates
+   with variable stride and control flow.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int n, int x, int stride)
+{
+  int a, x1, x2, x3;
+
+  a = x * stride;
+
+  if (n > 64)
+    {
+      x1 = x + 3;
+      a += x1 * stride;
+      x2 = x1 + 3;
+      a += x2 * stride;
+    }
+  else
+    {
+      x3 = x + 3;
+      a += x3 * stride;
+    }
+
+  return a;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* stride" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\* 3" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-30.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-30.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-30.c	(revision 0)
@@ -0,0 +1,25 @@ 
+/* Verify straight-line strength reduction fails for simple integer addition
+   with casts thrown in when -fwrapv is used.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-dom2 -fwrapv" } */
+/* { dg-skip-if "" { ilp32 } { "-m32" } { "" } } */
+
+long
+f (int s, long c)
+{
+  int a1, a2, a3;
+  long x1, x2, x3, x;
+
+  a1 = 2 * s;
+  x1 = c + a1;
+  a2 = 4 * s;
+  x2 = c + a2;
+  a3 = 6 * s;
+  x3 = c + a3;
+  x = x1 + x2 + x3;
+  return x;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 3 "dom2" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-13.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-13.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-13.c	(revision 0)
@@ -0,0 +1,25 @@ 
+/* x2 and x3 will be strength-reduced based on the same statement
+   but with different variables as the stride.  Note that they will
+   be strength-reduced by introducing an initializer 4*s which is
+   cheaper than 5*s; similar for 4*c and 5*c.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int s, int c)
+{
+  int a2, a3, x1, x2, x3, x;
+
+  x1 = c + s;
+  a2 = 5 * s;
+  x2 = c + a2;
+  a3 = 5 * c;
+  x3 = s + a3;
+  x = x1 + x2 + x3;
+  return x;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* 4" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\* 5" 0 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-6.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-6.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-6.c	(revision 0)
@@ -0,0 +1,25 @@ 
+/* Verify straight-line strength reduction for simple add candidates,
+   pointer version.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+void
+f (int s, char *c, char *x1, char *x2, char *x3)
+{
+  int a1, a2, a3;
+
+  a1 = 2 * s;
+  x1 = c + a1;
+  *x1 = 1;
+  a2 = 4 * s;
+  x2 = c + a2;
+  *x2 = 2;
+  a3 = 6 * s;
+  x3 = c + a3;
+  *x3 = 3;
+}
+
+/* There will be four ' * ' instances for the parms, one in the code.  */
+/* { dg-final { scan-tree-dump-times " \\* " 5 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-22.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-22.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-22.c	(revision 0)
@@ -0,0 +1,29 @@ 
+/* Verify straight-line strength reduction for multiply candidates
+   with variable stride and control flow.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int n, int x, int stride)
+{
+  int a, x1, x2, x3;
+
+  a = x * stride;
+
+  if (n > 64)
+    {
+      x1 = x + 3;
+      a += x1 * stride;
+      x2 = x1 + 3;
+      a += x2 * stride;
+      x3 = x2 + 3;
+      a += x3 * stride;
+    }
+
+  return a;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* stride" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\* 3" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-31.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-31.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-31.c	(revision 0)
@@ -0,0 +1,27 @@ 
+/* Verify straight-line strength reduction for add candidates in
+   which the stride is unknown and increments appear that differ
+   only in sign.  Verify the increments are shared.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int s, int c)
+{
+  int a1, a2, a3, a4, x1, x2, x3, x4, x;
+
+  a1 = 2 * s;
+  x1 = c + a1;
+  a2 = 4 * s;  /* incr = +2  */
+  x2 = c + a2;
+  a3 = 7 * s;
+  x3 = c + a3;
+  a4 = 5 * s;  /* incr = -2  */
+  x4 = c + a4;
+  x = x1 + x2 + x3 + x4;
+  return x;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* 2" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\* -2" 0 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-14.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-14.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-14.c	(revision 0)
@@ -0,0 +1,32 @@ 
+/* Straight-line strength reduction control flow variation.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int n, int c, int s)
+{
+  int a1, a2, x, x1, x2, x3, x4;
+
+  a1 = 2 * s;
+
+  if (n > 64)
+    {
+      x1 = c + a1;
+      a2 = 4 * s;
+      x2 = c + a2;
+      x = x1 + x2;
+    }
+  else
+    {
+      x3 = c + a1;
+      a2 = 4 * s;
+      x4 = c + a2;
+      x = x4 / x3;
+    }
+
+  return x;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-7.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-7.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-7.c	(revision 0)
@@ -0,0 +1,22 @@ 
+/* Verify straight-line strength reduction for simple integer subtraction.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int s, int c)
+{
+  int a1, a2, a3, x1, x2, x3, x;
+
+  a1 = 2 * s;
+  x1 = c - a1;
+  a2 = 4 * s;
+  x2 = c - a2;
+  a3 = 6 * s;
+  x3 = c - a3;
+  x = x1 + x2 + x3;
+  return x;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-23.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-23.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-23.c	(revision 0)
@@ -0,0 +1,29 @@ 
+/* Verify straight-line strength reduction for multiply candidates
+   with variable stride and control flow.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int 
+f (int n, int x, int stride)
+{
+  int a, x1, x2, x3;
+
+  a = x * stride;
+  x1 = x + 3;
+  a += x1 * stride;
+
+  if (n > 64)
+    {
+      x2 = x1 + 3;
+      a += x2 * stride;
+      x3 = x2 + 3;
+      a += x3 * stride;
+    }
+
+  return a;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* stride" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\* 3" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-15.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-15.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-15.c	(revision 0)
@@ -0,0 +1,27 @@ 
+/* Straight-line strength reduction control flow variation.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int 
+f (int n, int c, int s)
+{
+  int a, x1, x2, x3;
+
+  x1 = x2 = x3 = c;
+
+  if (n > 64)
+    {
+      a = 2 * s;
+      x1 = c + a;
+      a = 4 * s;
+      x2 = c + a;
+      a = 6 * s;
+      x3 = c + a;
+    }
+
+  return x1 + x2 + x3;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c	(revision 0)
@@ -0,0 +1,23 @@ 
+/* Verify straight-line strength reduction for simple pointer subtraction.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int*
+f (int s, int *c)
+{
+  int a1, a2, a3, *x1, *x2, *x3;
+
+  a1 = 2 * s;
+  x1 = c - a1;
+  a2 = 4 * s;
+  x2 = c - a2;
+  a3 = 6 * s;
+  x3 = c - a3;
+  return x1 ? x2 : x3;
+}
+
+/* There are 2 ' * ' instances in the decls (since "int * x3;" is
+   optimized out), 1 parm, 2 in the code.  */
+/* { dg-final { scan-tree-dump-times " \\* " 5 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-24.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-24.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-24.c	(revision 0)
@@ -0,0 +1,31 @@ 
+/* Verify straight-line strength reduction for multiply candidates
+   with variable stride and control flow, increment = 1.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int n, int x, int stride)
+{
+  int a, x1, x2, x3;
+
+  a = x * stride;
+
+  if (n > 64)
+    {
+      x1 = x + 1;
+      a += x1 * stride;
+      x2 = x1 + 1;
+      a += x2 * stride;
+    }
+  else
+    {
+      x3 = x + 1;
+      a += x3 * stride;
+    }
+
+  return a;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-16.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-16.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-16.c	(revision 0)
@@ -0,0 +1,28 @@ 
+/* Straight-line strength reduction control flow variation.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int n, int c, int s)
+{
+  int a2, a3, a4, x1, x2, x3, x4;
+
+  x1 = c + s;
+  a2 = 3 * s;
+  x2 = c + a2;
+  x3 = x4 = c;
+
+  if (n > 64)
+    {
+      a3 = 5 * s;
+      x3 = c + a3;
+      a4 = 7 * s;
+      x4 = c + a4;
+    }
+
+  return x1 + x2 + x3 + x4;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-9.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-9.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-9.c	(revision 0)
@@ -0,0 +1,23 @@ 
+/* Verify straight-line strength reduction for simple integer addition
+   with stride reversed.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int s, int c)
+{
+  int a1, a2, a3, x1, x2, x3, x;
+
+  a1 = 2 * s;
+  x1 = a1 + c;
+  a2 = 4 * s;
+  x2 = a2 + c;
+  a3 = 6 * s;
+  x3 = a3 + c;
+  x = x1 + x2 + x3;
+  return x;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-25.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-25.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-25.c	(revision 0)
@@ -0,0 +1,31 @@ 
+/* Verify straight-line strength reduction for multiply candidates
+   with variable stride and control flow, increment = -1.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int 
+f (int n, int x, int stride)
+{
+  int a, x1, x2, x3;
+
+  a = x * stride;
+
+  if (n > 64)
+    {
+      x1 = x - 1;
+      a += x1 * stride;
+      x2 = x1 - 1;
+      a += x2 * stride;
+    }
+  else
+    {
+      x3 = x - 1;
+      a += x3 * stride;
+    }
+
+  return a;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-17.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-17.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-17.c	(revision 0)
@@ -0,0 +1,31 @@ 
+/* Straight-line strength reduction control flow variation with incr = 1.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int n, int c, int s)
+{
+  int a2, a3, a4, x1, x2, x3, x4;
+
+  x1 = c + s;
+  x2 = x3 = x4 = c;
+
+  if (n > 64)
+    {
+      a2 = 2 * s;
+      x2 = c + a2;
+      a3 = 3 * s;
+      x3 = c + a3;
+    }
+  else
+    {
+      a4 = 2 * s;
+      x4 = c + a4;
+    }
+
+  return x1 + x2 + x3 + x4;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 0 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-26.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-26.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-26.c	(revision 0)
@@ -0,0 +1,32 @@ 
+/* Verify straight-line strength reduction for multiply candidates
+   with variable stride and control flow, increment = -3.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int n, int x, int stride)
+{
+  int a, x1, x2, x3;
+
+  a = x * stride;
+
+  if (n > 64)
+    {
+      x1 = x - 3;
+      a += x1 * stride;
+      x2 = x1 - 3;
+      a += x2 * stride;
+    }
+  else
+    {
+      x3 = x - 3;
+      a += x3 * stride;
+    }
+
+  return a;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* stride" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\* 3" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-18.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-18.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-18.c	(revision 0)
@@ -0,0 +1,32 @@ 
+/* Straight-line strength reduction control flow variation with incr = -1.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int n, int c, int s)
+{
+  int a1, a2, a3, a4, x1, x2, x3, x4;
+
+  a1 = 4 * s;
+  x1 = c + a1;
+  x2 = x3 = x4 = c;
+
+  if (n > 64)
+    {
+      a2 = 3 * s;
+      x2 = c + a2;
+      a3 = 2 * s;
+      x3 = c + a3;
+    }
+  else
+    {
+      a4 = 3 * s;
+      x4 = c + a4;
+    }
+
+  return x1 + x2 + x3 + x4;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-19.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-19.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-19.c	(revision 0)
@@ -0,0 +1,22 @@ 
+/* Verify straight-line strength reduction for multiply candidates
+   with stride in RHS1 position.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int
+f (int c, int s)
+{
+  int x1, x2, y1, y2;
+
+  y1 = c + 2;
+  x1 = s * y1;
+  y2 = y1 + 2;
+  x2 = s * y2;
+  return x1 + x2;
+}
+
+/* { dg-final { scan-tree-dump-times " \\* y" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\* 2" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
Index: gcc/gimple-ssa-strength-reduction.c
===================================================================
--- gcc/gimple-ssa-strength-reduction.c	(revision 190038)
+++ gcc/gimple-ssa-strength-reduction.c	(working copy)
@@ -30,8 +30,7 @@  along with GCC; see the file COPYING3.  If not see
    1) Explicit multiplies, known constant multipliers, no
       conditional increments. (complete)
    2) Explicit multiplies, unknown constant multipliers,
-      no conditional increments. (data gathering complete,
-      replacements pending)
+      no conditional increments. (complete)
    3) Implicit multiplies in addressing expressions. (complete)
    4) Explicit multiplies, conditional increments. (pending)
 
@@ -235,6 +234,41 @@  struct cand_chain_d
 typedef struct cand_chain_d cand_chain, *cand_chain_t;
 typedef const struct cand_chain_d *const_cand_chain_t;
 
+/* Information about a unique "increment" associated with candidates
+   having an SSA name for a stride.  An increment is the difference
+   between the index of the candidate and the index of its basis,
+   i.e., (i - i') as discussed in the module commentary.
+
+   When we are not going to generate address arithmetic we treat
+   increments that differ only in sign as the same, allowing sharing
+   of the cost of initializers.  The absolute value of the increment
+   is stored in the incr_info.  */
+
+struct incr_info_d
+{
+  /* The increment that relates a candidate to its basis.  */
+  double_int incr;
+
+  /* How many times the increment occurs in the candidate tree.  */
+  unsigned count;
+
+  /* Cost of replacing candidates using this increment.  Negative and
+     zero costs indicate replacement should be performed.  */
+  int cost;
+
+  /* If this increment is profitable but is not -1, 0, or 1, it requires
+     an initializer T_0 = stride * incr to be found or introduced in the
+     nearest common dominator of all candidates.  This field holds T_0
+     for subsequent use.  */
+  tree initializer;
+
+  /* If the initializer was found to already exist, this is the block
+     where it was found.  */
+  basic_block init_bb;
+};
+
+typedef struct incr_info_d incr_info, *incr_info_t;
+
 /* Candidates are maintained in a vector.  If candidate X dominates
    candidate Y, then X appears before Y in the vector; but the
    converse does not necessarily hold.  */
@@ -259,6 +293,16 @@  static htab_t base_cand_map;
 
 /* Obstack for candidate chains.  */
 static struct obstack chain_obstack;
+
+/* An array INCR_VEC of incr_infos is used during analysis of related
+   candidates having an SSA name for a stride.  INCR_VEC_LEN describes
+   its current length.  */
+static incr_info_t incr_vec;
+static unsigned incr_vec_len;
+
+/* For a chain of candidates with unknown stride, indicates whether or not
+   we must generate pointer arithmetic when replacing statements.  */
+static bool address_arithmetic_p;
 
 /* Produce a pointer to the IDX'th candidate in the candidate vector.  */
 
@@ -421,7 +465,6 @@  stmt_cost (gimple gs, bool speed)
     case PLUS_EXPR:
     case POINTER_PLUS_EXPR:
     case MINUS_EXPR:
-      rhs2 = gimple_assign_rhs2 (gs);
       return add_cost (speed, lhs_mode);
 
     case NEGATE_EXPR:
@@ -1407,6 +1450,30 @@  dump_cand_chains (void)
   htab_traverse_noresize (base_cand_map, base_cand_dump_callback, NULL);
   fputs ("\n", dump_file);
 }
+
+/* Dump the increment vector for debug.  */
+
+static void
+dump_incr_vec (void)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      unsigned i;
+
+      fprintf (dump_file, "\nIncrement vector:\n\n");
+  
+      for (i = 0; i < incr_vec_len; i++)
+	{
+	  fprintf (dump_file, "%3d  increment:   ", i);
+	  dump_double_int (dump_file, incr_vec[i].incr, false);
+	  fprintf (dump_file, "\n     count:       %d", incr_vec[i].count);
+	  fprintf (dump_file, "\n     cost:        %d", incr_vec[i].cost);
+	  fputs ("\n     initializer: ", dump_file);
+	  print_generic_expr (dump_file, incr_vec[i].initializer, 0);
+	  fputs ("\n\n", dump_file);
+	}
+    }
+}
 
 /* Recursive helper for unconditional_cands_with_known_stride_p.
    Returns TRUE iff C, its siblings, and its dependents are all
@@ -1508,6 +1575,35 @@  cand_increment (slsr_cand_t c)
   return double_int_sub (c->index, basis->index);
 }
 
+/* Calculate the increment required for candidate C relative to
+   its basis.  If we aren't going to generate pointer arithmetic
+   for this candidate, return the absolute value of that increment
+   instead.  */
+
+static inline double_int
+cand_abs_increment (slsr_cand_t c)
+{
+  double_int increment = cand_increment (c);
+
+  if (!address_arithmetic_p && double_int_negative_p (increment))
+    increment = double_int_neg (increment);
+
+  return increment;
+}
+
+/* If *VAR is NULL or is not of a compatible type with TYPE, create a
+   new temporary reg of type TYPE and store it in *VAR.  */
+
+static inline void
+lazy_create_slsr_reg (tree *var, tree type)
+{
+  if (!*var || !types_compatible_p (TREE_TYPE (*var), type))
+    {
+      *var = create_tmp_reg (type, "slsr");
+      add_referenced_var (*var);
+    }
+}
+
 /* Return TRUE iff candidate C has already been replaced under
    another interpretation.  */
 
@@ -1630,6 +1726,764 @@  replace_dependents (slsr_cand_t c)
     replace_dependents (lookup_cand (c->dependent));
 }
 
+/* Return the index in the increment vector of the given INCREMENT.  */
+
+static inline unsigned
+incr_vec_index (double_int increment)
+{
+  unsigned i;
+  
+  for (i = 0;
+       i < incr_vec_len && !double_int_equal_p (increment, incr_vec[i].incr);
+       i++)
+    ;
+
+  gcc_assert (i < incr_vec_len);
+  return i;
+}
+
+/* Count the number of candidates in the tree rooted at C that have
+   not already been replaced under other interpretations.  */
+
+static unsigned
+count_candidates (slsr_cand_t c)
+{
+  unsigned count = cand_already_replaced (c) ? 0 : 1;
+
+  if (c->sibling)
+    count += count_candidates (lookup_cand (c->sibling));
+
+  if (c->dependent)
+    count += count_candidates (lookup_cand (c->dependent));
+
+  return count;
+}
+
+/* Increase the count of INCREMENT by one in the increment vector.
+   INCREMENT is associated with candidate C.  If an initializer
+   T_0 = stride * I is provided by a candidate that dominates all
+   candidates with the same increment, also record T_0 for subsequent use.  */
+
+static void
+record_increment (slsr_cand_t c, double_int increment)
+{
+  bool found = false;
+  unsigned i;
+
+  /* Treat increments that differ only in sign as identical so as to
+     share initializers, unless we are generating pointer arithmetic.  */
+  if (!address_arithmetic_p && double_int_negative_p (increment))
+    increment = double_int_neg (increment);
+
+  for (i = 0; i < incr_vec_len; i++)
+    {
+      if (double_int_equal_p (incr_vec[i].incr, increment))
+	{
+	  incr_vec[i].count++;
+	  found = true;
+
+	  /* If we previously recorded an initializer that doesn't
+	     dominate this candidate, it's not going to be useful to
+	     us after all.  */
+	  if (incr_vec[i].initializer
+	      && !dominated_by_p (CDI_DOMINATORS,
+				  gimple_bb (c->cand_stmt),
+				  incr_vec[i].init_bb))
+	    {
+	      incr_vec[i].initializer = NULL_TREE;
+	      incr_vec[i].init_bb = NULL;
+	    }
+	  
+	  break;
+	}
+    }
+
+  if (!found)
+    {
+      /* The first time we see an increment, create the entry for it.
+	 If this is the root candidate which doesn't have a basis, set
+	 the count to zero.  We're only processing it so it can possibly
+	 provide an initializer for other candidates.  */
+      incr_vec[incr_vec_len].incr = increment;
+      incr_vec[incr_vec_len].count = c->basis ? 1 : 0;
+      incr_vec[incr_vec_len].cost = COST_INFINITE;
+      
+      /* Optimistically record the first occurrence of this increment
+	 as providing an initializer (if it does); we will revise this
+	 opinion later if it doesn't dominate all other occurrences.
+         Exception:  increments of -1, 0, 1 never need initializers.  */
+      if (c->kind == CAND_ADD
+	  && double_int_equal_p (c->index, increment)
+	  && (double_int_scmp (increment, double_int_one) > 0
+	      || double_int_scmp (increment, double_int_minus_one) < 0))
+	{
+	  tree t0;
+	  tree rhs1 = gimple_assign_rhs1 (c->cand_stmt);
+	  tree rhs2 = gimple_assign_rhs2 (c->cand_stmt);
+	  if (operand_equal_p (rhs1, c->base_expr, 0))
+	    t0 = rhs2;
+	  else
+	    t0 = rhs1;
+	  if (SSA_NAME_DEF_STMT (t0) && gimple_bb (SSA_NAME_DEF_STMT (t0)))
+	    {
+	      incr_vec[incr_vec_len].initializer = t0;
+	      incr_vec[incr_vec_len++].init_bb
+		= gimple_bb (SSA_NAME_DEF_STMT (t0));
+	    }
+	  else
+	    {
+	      incr_vec[incr_vec_len].initializer = NULL_TREE;
+	      incr_vec[incr_vec_len++].init_bb = NULL;
+	    }
+	}
+      else
+	{
+	  incr_vec[incr_vec_len].initializer = NULL_TREE;
+	  incr_vec[incr_vec_len++].init_bb = NULL;
+	}
+    }
+}
+
+/* Determine how many times each unique increment occurs in the set
+   of candidates rooted at C's parent, recording the data in the
+   increment vector.  For each unique increment I, if an initializer
+   T_0 = stride * I is provided by a candidate that dominates all
+   candidates with the same increment, also record T_0 for subsequent
+   use.  */
+
+static void
+record_increments (slsr_cand_t c)
+{
+  if (!cand_already_replaced (c))
+    record_increment (c, cand_increment (c));
+
+  if (c->sibling)
+    record_increments (lookup_cand (c->sibling));
+
+  if (c->dependent)
+    record_increments (lookup_cand (c->dependent));
+}
+
+/* Return the first candidate in the tree rooted at C that has not
+   already been replaced, favoring siblings over dependents.  */
+
+static slsr_cand_t
+unreplaced_cand_in_tree (slsr_cand_t c)
+{
+  if (!cand_already_replaced (c))
+    return c;
+
+  if (c->sibling)
+    {
+      slsr_cand_t sib = unreplaced_cand_in_tree (lookup_cand (c->sibling));
+      if (sib)
+	return sib;
+    }
+
+  if (c->dependent)
+    {
+      slsr_cand_t dep = unreplaced_cand_in_tree (lookup_cand (c->dependent));
+      if (dep)
+	return dep;
+    }
+
+  return NULL;
+}
+
+/* Return TRUE if the candidates in the tree rooted at C should be
+   optimized for speed, else FALSE.  We estimate this based on the block
+   containing the most dominant candidate in the tree that has not yet
+   been replaced.  */
+
+static bool
+optimize_cands_for_speed_p (slsr_cand_t c)
+{
+  slsr_cand_t c2 = unreplaced_cand_in_tree (c);
+  gcc_assert (c2);
+  return optimize_bb_for_speed_p (gimple_bb (c2->cand_stmt));
+}
+
+/* Add COST_IN to the lowest cost of any dependent path starting at
+   candidate C or any of its siblings, counting only candidates along
+   such paths with increment INCR.  Assume that replacing a candidate
+   reduces cost by REPL_SAVINGS.  Also account for savings from any
+   statements that would go dead.  */
+
+static int
+lowest_cost_path (int cost_in, int repl_savings, slsr_cand_t c, double_int incr)
+{
+  int local_cost, sib_cost;
+  double_int cand_incr = cand_abs_increment (c);
+
+  if (cand_already_replaced (c))
+    local_cost = cost_in;
+  else if (double_int_equal_p (incr, cand_incr))
+    local_cost = cost_in - repl_savings - c->dead_savings;
+  else
+    local_cost = cost_in - c->dead_savings;
+
+  if (c->dependent)
+    local_cost = lowest_cost_path (local_cost, repl_savings, 
+				   lookup_cand (c->dependent), incr);
+
+  if (c->sibling)
+    {
+      sib_cost = lowest_cost_path (cost_in, repl_savings,
+				   lookup_cand (c->sibling), incr);
+      local_cost = MIN (local_cost, sib_cost);
+    }
+
+  return local_cost;
+}
+
+/* Compute the total savings that would accrue from all replacements
+   in the candidate tree rooted at C, counting only candidates with
+   increment INCR.  Assume that replacing a candidate reduces cost
+   by REPL_SAVINGS.  Also account for savings from statements that
+   would go dead.  */
+
+static int
+total_savings (int repl_savings, slsr_cand_t c, double_int incr)
+{
+  int savings = 0;
+  double_int cand_incr = cand_abs_increment (c);
+
+  if (double_int_equal_p (incr, cand_incr)
+      && !cand_already_replaced (c))
+    savings += repl_savings + c->dead_savings;
+
+  if (c->dependent)
+    savings += total_savings (repl_savings, lookup_cand (c->dependent), incr);
+
+  if (c->sibling)
+    savings += total_savings (repl_savings, lookup_cand (c->sibling), incr);
+
+  return savings;
+}
+
+/* Use target-specific costs to determine and record which increments
+   in the current candidate tree are profitable to replace, assuming
+   MODE and SPEED.  FIRST_DEP is the first dependent of the root of
+   the candidate tree.
+
+   One slight limitation here is that we don't account for the possible
+   introduction of casts in some cases.  See replace_one_candidate for
+   the cases where these are introduced.  This should probably be cleaned
+   up sometime.  */
+
+static void
+analyze_increments (slsr_cand_t first_dep, enum machine_mode mode, bool speed)
+{
+  unsigned i;
+
+  for (i = 0; i < incr_vec_len; i++)
+    {
+      HOST_WIDE_INT incr = double_int_to_shwi (incr_vec[i].incr);
+
+      /* If somehow this increment is bigger than a HWI, we won't
+	 be optimizing candidates that use it.  And if the increment
+	 has a count of zero, nothing will be done with it.  */
+      if (!double_int_fits_in_shwi_p (incr_vec[i].incr)
+	  || !incr_vec[i].count)
+	incr_vec[i].cost = COST_INFINITE;
+
+      /* Increments of 0, 1, and -1 are always profitable to replace,
+	 because they always replace a multiply or add with an add or
+	 copy, and may cause one or more existing instructions to go
+	 dead.  Exception:  -1 can't be assumed to be profitable for
+	 pointer addition.  */
+      else if (incr == 0
+	       || incr == 1
+	       || (incr == -1
+		   && (gimple_assign_rhs_code (first_dep->cand_stmt)
+		       != POINTER_PLUS_EXPR)))
+	incr_vec[i].cost = COST_NEUTRAL;
+      
+      /* For any other increment, if this is a multiply candidate, we
+	 must introduce a temporary T and initialize it with
+	 T_0 = stride * increment.  When optimizing for speed, walk the
+	 candidate tree to calculate the best cost reduction along any
+	 path; if it offsets the fixed cost of inserting the initializer,
+	 replacing the increment is profitable.  When optimizing for
+         size, instead calculate the total cost reduction from replacing
+	 all candidates with this increment.  */
+      else if (first_dep->kind == CAND_MULT)
+	{
+	  int cost = mult_by_coeff_cost (incr, mode, speed);
+	  int repl_savings = mul_cost (speed, mode) - add_cost (speed, mode);
+	  if (speed)
+	    cost = lowest_cost_path (cost, repl_savings, first_dep,
+				     incr_vec[i].incr);
+	  else
+	    cost -= total_savings (repl_savings, first_dep, incr_vec[i].incr);
+
+	  incr_vec[i].cost = cost;
+	}
+
+      /* If this is an add candidate, the initializer may already
+	 exist, so only calculate the cost of the initializer if it
+	 doesn't.  We are replacing one add with another here, so the
+	 known replacement savings is zero.  We will account for removal
+	 of dead instructions in lowest_cost_path or total_savings.  */
+      else
+	{
+	  int cost = 0;
+	  if (!incr_vec[i].initializer)
+	    cost = mult_by_coeff_cost (incr, mode, speed);
+
+	  if (speed)
+	    cost = lowest_cost_path (cost, 0, first_dep, incr_vec[i].incr);
+	  else
+	    cost -= total_savings (0, first_dep, incr_vec[i].incr);
+
+	  incr_vec[i].cost = cost;
+	}
+    }
+}
+
+/* Return the nearest common dominator of BB1 and BB2.  If the blocks
+   are identical, return the earlier of C1 and C2 in *WHERE.  Otherwise,
+   if the NCD matches BB1, return C1 in *WHERE; if the NCD matches BB2,
+   return C2 in *WHERE; and if the NCD matches neither, return NULL in
+   *WHERE.  Note: It is possible for one of C1 and C2 to be NULL.  */
+
+static basic_block
+ncd_for_two_cands (basic_block bb1, basic_block bb2,
+		   slsr_cand_t c1, slsr_cand_t c2, slsr_cand_t *where)
+{
+  basic_block ncd;
+
+  if (!bb1)
+    {
+      *where = c2;
+      return bb2;
+    }
+
+  if (!bb2)
+    {
+      *where = c1;
+      return bb1;
+    }
+
+  ncd = nearest_common_dominator (CDI_DOMINATORS, bb1, bb2);
+      
+  /* If both candidates are in the same block, the earlier
+     candidate wins.  */
+  if (bb1 == ncd && bb2 == ncd)
+    {
+      if (!c1 || (c2 && c2->cand_num < c1->cand_num))
+	*where = c2;
+      else
+	*where = c1;
+    }
+
+  /* Otherwise, if one of them produced a candidate in the
+     dominator, that one wins.  */
+  else if (bb1 == ncd)
+    *where = c1;
+
+  else if (bb2 == ncd)
+    *where = c2;
+
+  /* If neither matches the dominator, neither wins.  */
+  else
+    *where = NULL;
+
+  return ncd;
+}
+
+/* Consider all candidates in the tree rooted at C for which INCR
+   represents the required increment of C relative to its basis.
+   Find and return the basic block that most nearly dominates all
+   such candidates.  If the returned block contains one or more of
+   the candidates, return the earliest candidate in the block in
+   *WHERE.  */
+
+static basic_block
+nearest_common_dominator_for_cands (slsr_cand_t c, double_int incr,
+				    slsr_cand_t *where)
+{
+  basic_block sib_ncd = NULL, dep_ncd = NULL, this_ncd = NULL, ncd;
+  slsr_cand_t sib_where = NULL, dep_where = NULL, this_where = NULL, new_where;
+  double_int cand_incr;
+
+  /* First find the NCD of all siblings and dependents.  */
+  if (c->sibling)
+    sib_ncd = nearest_common_dominator_for_cands (lookup_cand (c->sibling),
+						  incr, &sib_where);
+  if (c->dependent)
+    dep_ncd = nearest_common_dominator_for_cands (lookup_cand (c->dependent),
+						  incr, &dep_where);
+  if (!sib_ncd && !dep_ncd)
+    {
+      new_where = NULL;
+      ncd = NULL;
+    }
+  else if (sib_ncd && !dep_ncd)
+    {
+      new_where = sib_where;
+      ncd = sib_ncd;
+    }
+  else if (dep_ncd && !sib_ncd)
+    {
+      new_where = dep_where;
+      ncd = dep_ncd;
+    }
+  else
+    ncd = ncd_for_two_cands (sib_ncd, dep_ncd, sib_where,
+			     dep_where, &new_where);
+
+  /* If the candidate's increment doesn't match the one we're interested
+     in, then the result depends only on siblings and dependents.  */
+  cand_incr = cand_abs_increment (c);
+
+  if (!double_int_equal_p (cand_incr, incr) || cand_already_replaced (c))
+    {
+      *where = new_where;
+      return ncd;
+    }
+
+  /* Otherwise, compare this candidate with the result from all siblings
+     and dependents.  */
+  this_where = c;
+  this_ncd = gimple_bb (c->cand_stmt);
+  ncd = ncd_for_two_cands (ncd, this_ncd, new_where, this_where, where);
+
+  return ncd;
+}
+
+/* Return TRUE if the increment indexed by INDEX is profitable to replace.  */
+
+static inline bool
+profitable_increment_p (unsigned index)
+{
+  return (incr_vec[index].cost <= COST_NEUTRAL);
+}
+
+/* For each profitable increment in the increment vector not equal to
+   0 or 1 (or -1, for non-pointer arithmetic), find the nearest common
+   dominator of all statements in the candidate chain rooted at C
+   that require that increment, and insert an initializer
+   T_0 = stride * increment at that location.  Record T_0 with the
+   increment record.  */
+
+static void
+insert_initializers (slsr_cand_t c)
+{
+  unsigned i;
+  tree new_var = NULL_TREE;
+
+  for (i = 0; i < incr_vec_len; i++)
+    {
+      basic_block bb;
+      slsr_cand_t where = NULL;
+      gimple init_stmt;
+      tree stride_type, new_name, incr_tree;
+      double_int incr = incr_vec[i].incr;
+
+      if (!profitable_increment_p (i)
+	  || double_int_one_p (incr)
+	  || (double_int_minus_one_p (incr)
+	      && gimple_assign_rhs_code (c->cand_stmt) != POINTER_PLUS_EXPR)
+	  || double_int_zero_p (incr))
+	continue;
+
+      /* We may have already identified an existing initializer that
+	 will suffice.  */
+      if (incr_vec[i].initializer)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fputs ("Using existing initializer: ", dump_file);
+	      print_gimple_stmt (dump_file,
+				 SSA_NAME_DEF_STMT (incr_vec[i].initializer),
+				 0, 0);
+	    }
+	  continue;
+	}
+
+      /* Find the block that most closely dominates all candidates
+	 with this increment.  If there is at least one candidate in
+	 that block, the earliest one will be returned in WHERE.  */
+      bb = nearest_common_dominator_for_cands (c, incr, &where);
+
+      /* Create a new SSA name to hold the initializer's value.  */
+      stride_type = TREE_TYPE (SSA_NAME_VAR (c->stride));
+      lazy_create_slsr_reg (&new_var, stride_type);
+      new_name = make_ssa_name (new_var, NULL);
+      incr_vec[i].initializer = new_name;
+
+      /* Create the initializer and insert it in the latest possible
+	 dominating position.  */
+      incr_tree = double_int_to_tree (stride_type, incr);
+      init_stmt = gimple_build_assign_with_ops (MULT_EXPR, new_name,
+						c->stride, incr_tree);
+      if (where)
+	{
+	  gimple_stmt_iterator gsi = gsi_for_stmt (where->cand_stmt);
+	  gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
+	  gimple_set_location (init_stmt, gimple_location (where->cand_stmt));
+	}
+      else
+	{
+	  gimple_stmt_iterator gsi = gsi_last_bb (bb);
+	  gimple basis_stmt = lookup_cand (c->basis)->cand_stmt;
+
+	  if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi)))
+	    gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
+	  else
+	    gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT);
+
+	  gimple_set_location (init_stmt, gimple_location (basis_stmt));
+	}
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fputs ("Inserting initializer: ", dump_file);
+	  print_gimple_stmt (dump_file, init_stmt, 0, 0);
+	}
+    }
+}
+
+/* Create a NOP_EXPR that copies FROM_EXPR into a new SSA name of
+   type TO_TYPE, and insert it in front of the statement represented
+   by candidate C.  Use *NEW_VAR to create the new SSA name.  Return
+   the new SSA name.  */
+
+static tree
+introduce_cast_before_cand (slsr_cand_t c, tree to_type,
+			    tree from_expr, tree *new_var)
+{
+  tree cast_lhs;
+  gimple cast_stmt;
+  gimple_stmt_iterator gsi = gsi_for_stmt (c->cand_stmt);
+
+  lazy_create_slsr_reg (new_var, to_type);
+  cast_lhs = make_ssa_name (*new_var, NULL);
+  cast_stmt = gimple_build_assign_with_ops (NOP_EXPR, cast_lhs,
+					    from_expr, NULL_TREE);
+  gimple_set_location (cast_stmt, gimple_location (c->cand_stmt));
+  gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fputs ("  Inserting: ", dump_file);
+      print_gimple_stmt (dump_file, cast_stmt, 0, 0);
+    }
+
+  return cast_lhs;
+}
+
+/* Replace the RHS of the statement represented by candidate C with 
+   NEW_CODE, NEW_RHS1, and NEW_RHS2, provided that to do so doesn't
+   leave C unchanged or just interchange its operands.  The original
+   operation and operands are in OLD_CODE, OLD_RHS1, and OLD_RHS2.
+   If the replacement was made and we are doing a details dump,
+   return the revised statement, else NULL.  */
+
+static gimple
+replace_rhs_if_not_dup (enum tree_code new_code, tree new_rhs1, tree new_rhs2,
+			enum tree_code old_code, tree old_rhs1, tree old_rhs2,
+			slsr_cand_t c)
+{
+  if (new_code != old_code
+      || ((!operand_equal_p (new_rhs1, old_rhs1, 0)
+	   || !operand_equal_p (new_rhs2, old_rhs2, 0))
+	  && (!operand_equal_p (new_rhs1, old_rhs2, 0)
+	      || !operand_equal_p (new_rhs2, old_rhs1, 0))))
+    {
+      gimple_stmt_iterator gsi = gsi_for_stmt (c->cand_stmt);
+      gimple_assign_set_rhs_with_ops (&gsi, new_code, new_rhs1, new_rhs2);
+      update_stmt (gsi_stmt (gsi));
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	return gsi_stmt (gsi);
+    }
+
+  else if (dump_file && (dump_flags & TDF_DETAILS))
+    fputs ("  (duplicate, not actually replacing)\n", dump_file);
+
+  return NULL;
+}
+
+/* Strength-reduce the statement represented by candidate C by replacing
+   it with an equivalent addition or subtraction.  I is the index into
+   the increment vector identifying C's increment.  NEW_VAR is used to
+   create a new SSA name if a cast needs to be introduced.  BASIS_NAME
+   is the rhs1 to use in creating the add/subtract.  */
+
+static void
+replace_one_candidate (slsr_cand_t c, unsigned i, tree *new_var,
+		       tree basis_name)
+{
+  gimple stmt_to_print = NULL;
+  tree orig_rhs1, orig_rhs2;
+  tree rhs2;
+  enum tree_code orig_code, repl_code;
+  double_int cand_incr;
+
+  orig_code = gimple_assign_rhs_code (c->cand_stmt);
+  orig_rhs1 = gimple_assign_rhs1 (c->cand_stmt);
+  orig_rhs2 = gimple_assign_rhs2 (c->cand_stmt);
+  cand_incr = cand_increment (c);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fputs ("Replacing: ", dump_file);
+      print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
+      stmt_to_print = c->cand_stmt;
+    }
+
+  if (address_arithmetic_p)
+    repl_code = POINTER_PLUS_EXPR;
+  else
+    repl_code = PLUS_EXPR;
+
+  /* If the increment has an initializer T_0, replace the candidate
+     statement with an add of the basis name and the initializer.  */
+  if (incr_vec[i].initializer)
+    {
+      tree init_type = TREE_TYPE (SSA_NAME_VAR (incr_vec[i].initializer));
+      tree orig_type = TREE_TYPE (SSA_NAME_VAR (orig_rhs2));
+
+      if (types_compatible_p (orig_type, init_type))
+	rhs2 = incr_vec[i].initializer;
+      else
+	rhs2 = introduce_cast_before_cand (c, orig_type,
+					   incr_vec[i].initializer,
+					   new_var);
+
+      if (!double_int_equal_p (incr_vec[i].incr, cand_incr))
+	{
+	  gcc_assert (repl_code == PLUS_EXPR);
+	  repl_code = MINUS_EXPR;
+	}
+
+      stmt_to_print = replace_rhs_if_not_dup (repl_code, basis_name, rhs2,
+					      orig_code, orig_rhs1, orig_rhs2,
+					      c);
+    }
+
+  /* Otherwise, the increment is one of -1, 0, and 1.  Replace
+     with a subtract of the stride from the basis name, a copy
+     from the basis name, or an add of the stride to the basis
+     name, respectively.  It may be necessary to introduce a
+     cast (or reuse an existing cast).  */
+  else if (double_int_one_p (cand_incr))
+    {
+      tree stride_type = TREE_TYPE (SSA_NAME_VAR (c->stride));
+      tree orig_type = TREE_TYPE (SSA_NAME_VAR (orig_rhs2));
+      
+      if (types_compatible_p (orig_type, stride_type))
+	rhs2 = c->stride;
+      else
+	rhs2 = introduce_cast_before_cand (c, orig_type, c->stride, new_var);
+      
+      stmt_to_print = replace_rhs_if_not_dup (repl_code, basis_name, rhs2,
+					      orig_code, orig_rhs1, orig_rhs2,
+					      c);
+    }
+
+  else if (double_int_minus_one_p (cand_incr))
+    {
+      tree stride_type = TREE_TYPE (SSA_NAME_VAR (c->stride));
+      tree orig_type = TREE_TYPE (SSA_NAME_VAR (orig_rhs2));
+      gcc_assert (repl_code != POINTER_PLUS_EXPR);
+      
+      if (types_compatible_p (orig_type, stride_type))
+	rhs2 = c->stride;
+      else
+	rhs2 = introduce_cast_before_cand (c, orig_type, c->stride, new_var);
+      
+      if (orig_code != MINUS_EXPR
+	  || !operand_equal_p (basis_name, orig_rhs1, 0)
+	  || !operand_equal_p (rhs2, orig_rhs2, 0))
+	{
+	  gimple_stmt_iterator gsi = gsi_for_stmt (c->cand_stmt);
+	  gimple_assign_set_rhs_with_ops (&gsi, MINUS_EXPR, basis_name, rhs2);
+	  update_stmt (gsi_stmt (gsi));
+
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    stmt_to_print = gsi_stmt (gsi);
+	}
+      else if (dump_file && (dump_flags & TDF_DETAILS))
+	fputs ("  (duplicate, not actually replacing)\n", dump_file);
+    }
+
+  else if (double_int_zero_p (cand_incr))
+    {
+      tree lhs = gimple_assign_lhs (c->cand_stmt);
+      tree lhs_type = TREE_TYPE (SSA_NAME_VAR (lhs));
+      tree basis_type = TREE_TYPE (SSA_NAME_VAR (basis_name));
+      
+      if (types_compatible_p (lhs_type, basis_type))
+	{
+	  gimple copy_stmt = gimple_build_assign (lhs, basis_name);
+	  gimple_stmt_iterator gsi = gsi_for_stmt (c->cand_stmt);
+	  gimple_set_location (copy_stmt, gimple_location (c->cand_stmt));
+	  gsi_replace (&gsi, copy_stmt, false);
+
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    stmt_to_print = copy_stmt;
+	}
+      else
+	{
+	  gimple_stmt_iterator gsi = gsi_for_stmt (c->cand_stmt);
+	  gimple cast_stmt = gimple_build_assign_with_ops (NOP_EXPR, lhs,
+							   basis_name,
+							   NULL_TREE);
+	  gimple_set_location (cast_stmt, gimple_location (c->cand_stmt));
+	  gsi_replace (&gsi, cast_stmt, false);
+
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    stmt_to_print = cast_stmt;
+	}
+    }
+  else
+    gcc_unreachable ();
+  
+  if (dump_file && (dump_flags & TDF_DETAILS) && stmt_to_print)
+    {
+      fputs ("With: ", dump_file);
+      print_gimple_stmt (dump_file, stmt_to_print, 0, 0);
+      fputs ("\n", dump_file);
+    }
+}
+
+/* For each candidate in the tree rooted at C, replace it with
+   an increment if such has been shown to be profitable.  */
+
+static void
+replace_profitable_candidates (slsr_cand_t c)
+{
+  if (!cand_already_replaced (c))
+    {
+      double_int increment = cand_abs_increment (c);
+      tree new_var = NULL;
+      enum tree_code orig_code = gimple_assign_rhs_code (c->cand_stmt);
+      unsigned i;
+
+      i = incr_vec_index (increment);
+
+      /* Only process profitable increments.  Nothing useful can be done
+	 to a cast or copy.  */
+      if (profitable_increment_p (i) 
+	  && orig_code != MODIFY_EXPR
+	  && orig_code != NOP_EXPR)
+	{
+	  slsr_cand_t basis = lookup_cand (c->basis);
+	  tree basis_name = gimple_assign_lhs (basis->cand_stmt);
+	  replace_one_candidate (c, i, &new_var, basis_name);
+	}
+    }
+
+  if (c->sibling)
+    replace_profitable_candidates (lookup_cand (c->sibling));
+
+  if (c->dependent)
+    replace_profitable_candidates (lookup_cand (c->dependent));
+}
+
 /* Analyze costs of related candidates in the candidate vector,
    and make beneficial replacements.  */
 
@@ -1670,12 +2524,47 @@  analyze_candidates_and_replace (void)
       else if (unconditional_cands_with_known_stride_p (c))
 	replace_dependents (first_dep);
 
-      /* TODO:  When the stride is an SSA name, it may still be
-	 profitable to replace some or all of the dependent
-	 candidates, depending on whether the introduced increments
-	 can be reused, or are less expensive to calculate than
-	 the replaced statements.  */
+      /* When the stride is an SSA name, it may still be profitable
+	 to replace some or all of the dependent candidates, depending
+	 on whether the introduced increments can be reused, or are
+	 less expensive to calculate than the replaced statements.  */
+      else
+	{
+	  unsigned length;
+	  enum machine_mode mode;
+	  bool speed;
 
+	  /* Determine whether we'll be generating pointer arithmetic
+	     when replacing candidates.  */
+	  address_arithmetic_p = (c->kind == CAND_ADD
+				  && POINTER_TYPE_P (TREE_TYPE (c->base_expr)));
+
+	  /* If all candidates have already been replaced under other
+	     interpretations, nothing remains to be done.  */
+	  length = count_candidates (c);
+	  if (!length)
+	    continue;
+
+	  /* Construct an array of increments for this candidate chain.  */
+	  incr_vec = XNEWVEC (incr_info, length);
+	  incr_vec_len = 0;
+	  record_increments (c);
+
+	  /* Determine which increments are profitable to replace.  */
+	  mode = TYPE_MODE (TREE_TYPE (gimple_assign_lhs (c->cand_stmt)));
+	  speed = optimize_cands_for_speed_p (c);
+	  analyze_increments (first_dep, mode, speed);
+
+	  /* Insert initializers of the form T_0 = stride * increment
+	     for use in profitable replacements.  */
+	  insert_initializers (first_dep);
+	  dump_incr_vec ();
+
+	  /* Perform the replacements.  */
+	  replace_profitable_candidates (first_dep);
+	  free (incr_vec);
+	}
+
       /* TODO:  When conditional increments occur so that a 
 	 candidate is dependent upon a phi-basis, the cost of
 	 introducing a temporary must be accounted for.  */