diff mbox

[RFC] Fix PR71915, PR71490: Handle casts on strides consistently

Message ID 1477086370.5636.18.camel@oc8801110288.ibm.com
State New
Headers show

Commit Message

Bill Schmidt Oct. 21, 2016, 9:46 p.m. UTC
Hi,

I've been meaning for some time now to do a better job handling strength
reduction candidates where the stride of the candidate and its basis
involves a cast (usually widening) from another type.  The two PRs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71490 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71915 show that we miss
strength reduction opportunities in such cases.  Essentially the way I
was handling this before required a cast to go back to the original
type, which was illegal when this was a narrowing operation or a cast
from a wrapping type to a non-wrapping type.

This patch improves matters by tracking the effective type of the stride
so that we perform arithmetic in that type directly.  This allows us to
strength-reduce some cases missed before.  We still have to be careful
not to ever perform a narrowing or wrap-to-nonwrap conversion, but these
cases don't come up nearly so often in practice with this patch.

The patch is pretty straightforward but rather large, so I'd like to
leave it up for review for a week or so before committing it -- extra
eyes welcome!

There is an existing test case that checks for the narrowing problem,
and now fails with this patch as it should.  That is, it's now legal to
insert the initializer that we formerly knew to be bad for this test.
Since the test no longer serves any purpose, I'm deleting it.

gcc.dg/tree-ssa/slsr-8.c generates quite different intermediate code now
than when I first added it to the test suite.  As a result of various
optimization changes, it's no longer maintained as a single block;
instead, the optimizable computations are sunk into two legs of a
conditional.  This exposed another similar case, leading to the bug
reports.  This test now passes, but I had to adjust it for the new code
generation we get.  I also added some commentary to indicate what we
expect to happen in that test, since it isn't too obvious.

I've bootstrapped this and tested it on powerpc64le-unknown-linux-gnu
with no regressions.  To avoid the Wrath of Octoploid ;), I've also
tested it against ffmpeg using an x86_64-pc-linux-gnu cross with -O3
-march=amdfam10, also with no failures.  I've also verified that we hit
this code about 35 times in the test suite, so it looks like we have
some decent additional test coverage.

Thanks in advance to anyone willing to take a look.

Bill


[gcc]

2016-10-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/71915
	* gimple-ssa-strength-reduction.c (struct slsr_cand_d): Add
	stride_type field.
	(find_basis_for_base_expr): Require stride types to match when
	seeking a basis.
	(alloc_cand_and_find_basis): Record the stride type.
	(slsr_process_phi): Pass stride type to alloc_cand_and_find_basis.
	(backtrace_base_for_ref): Pass types to legal_cast_p_1 rather than
	the expressions having those types.
	(slsr_process_ref): Pass stride type to alloc_cand_and_find_basis.
	(create_mul_ssa_cand): Likewise.
	(create_mul_imm_cand): Likewise.
	(create_add_ssa_cand): Likewise.
	(create_add_imm_cand): Likewise.
	(legal_cast_p_1): Change interface to accept types rather than the
	expressions having those types.
	(legal_cast_p): Pass types to legal_cast_p_1.
	(slsr_process_cast): Pass stride type to
	alloc_cand_and_find_basis.
	(slsr_process_copy): Likewise.
	(dump_candidate): Display stride type when a cast exists.
	(create_add_on_incoming_edge): Introduce a cast when necessary for
	the stride type.
	(analyze_increments): Change the code checking for invalid casts
	to rely on the stride type, and update the documentation and
	example.  Change the code checking for pointer multiplies to rely
	on the stride type.
	(insert_initializers): Introduce a cast when necessary for the
	stride type.  Use the stride type for the type of the initializer.

[gcc/testsuite]

2016-10-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/71915
	* gcc.dg/tree-ssa/pr54245.c: Delete.
	* gcc.dg/tree-ssa/slsr-8.c: Adjust for new optimization and
	document why.

Comments

Richard Biener Oct. 24, 2016, 10:55 a.m. UTC | #1
On Fri, Oct 21, 2016 at 11:46 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Hi,
>
> I've been meaning for some time now to do a better job handling strength
> reduction candidates where the stride of the candidate and its basis
> involves a cast (usually widening) from another type.  The two PRs
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71490 and
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71915 show that we miss
> strength reduction opportunities in such cases.  Essentially the way I
> was handling this before required a cast to go back to the original
> type, which was illegal when this was a narrowing operation or a cast
> from a wrapping type to a non-wrapping type.
>
> This patch improves matters by tracking the effective type of the stride
> so that we perform arithmetic in that type directly.  This allows us to
> strength-reduce some cases missed before.  We still have to be careful
> not to ever perform a narrowing or wrap-to-nonwrap conversion, but these
> cases don't come up nearly so often in practice with this patch.
>
> The patch is pretty straightforward but rather large, so I'd like to
> leave it up for review for a week or so before committing it -- extra
> eyes welcome!
>
> There is an existing test case that checks for the narrowing problem,
> and now fails with this patch as it should.  That is, it's now legal to
> insert the initializer that we formerly knew to be bad for this test.
> Since the test no longer serves any purpose, I'm deleting it.
>
> gcc.dg/tree-ssa/slsr-8.c generates quite different intermediate code now
> than when I first added it to the test suite.  As a result of various
> optimization changes, it's no longer maintained as a single block;
> instead, the optimizable computations are sunk into two legs of a
> conditional.  This exposed another similar case, leading to the bug
> reports.  This test now passes, but I had to adjust it for the new code
> generation we get.  I also added some commentary to indicate what we
> expect to happen in that test, since it isn't too obvious.
>
> I've bootstrapped this and tested it on powerpc64le-unknown-linux-gnu
> with no regressions.  To avoid the Wrath of Octoploid ;), I've also
> tested it against ffmpeg using an x86_64-pc-linux-gnu cross with -O3
> -march=amdfam10, also with no failures.  I've also verified that we hit
> this code about 35 times in the test suite, so it looks like we have
> some decent additional test coverage.
>
> Thanks in advance to anyone willing to take a look.

LGTM.

Richard.

> Bill
>
>
> [gcc]
>
> 2016-10-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         PR tree-optimization/71915
>         * gimple-ssa-strength-reduction.c (struct slsr_cand_d): Add
>         stride_type field.
>         (find_basis_for_base_expr): Require stride types to match when
>         seeking a basis.
>         (alloc_cand_and_find_basis): Record the stride type.
>         (slsr_process_phi): Pass stride type to alloc_cand_and_find_basis.
>         (backtrace_base_for_ref): Pass types to legal_cast_p_1 rather than
>         the expressions having those types.
>         (slsr_process_ref): Pass stride type to alloc_cand_and_find_basis.
>         (create_mul_ssa_cand): Likewise.
>         (create_mul_imm_cand): Likewise.
>         (create_add_ssa_cand): Likewise.
>         (create_add_imm_cand): Likewise.
>         (legal_cast_p_1): Change interface to accept types rather than the
>         expressions having those types.
>         (legal_cast_p): Pass types to legal_cast_p_1.
>         (slsr_process_cast): Pass stride type to
>         alloc_cand_and_find_basis.
>         (slsr_process_copy): Likewise.
>         (dump_candidate): Display stride type when a cast exists.
>         (create_add_on_incoming_edge): Introduce a cast when necessary for
>         the stride type.
>         (analyze_increments): Change the code checking for invalid casts
>         to rely on the stride type, and update the documentation and
>         example.  Change the code checking for pointer multiplies to rely
>         on the stride type.
>         (insert_initializers): Introduce a cast when necessary for the
>         stride type.  Use the stride type for the type of the initializer.
>
> [gcc/testsuite]
>
> 2016-10-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         PR tree-optimization/71915
>         * gcc.dg/tree-ssa/pr54245.c: Delete.
>         * gcc.dg/tree-ssa/slsr-8.c: Adjust for new optimization and
>         document why.
>
>
> Index: gcc/gimple-ssa-strength-reduction.c
> ===================================================================
> --- gcc/gimple-ssa-strength-reduction.c (revision 241379)
> +++ gcc/gimple-ssa-strength-reduction.c (working copy)
> @@ -246,6 +246,13 @@ struct slsr_cand_d
>       replacement MEM_REF.)  */
>    tree cand_type;
>
> +  /* The type to be used to interpret the stride field when the stride
> +     is not a constant.  Normally the same as the type of the recorded
> +     stride, but when the stride has been cast we need to maintain that
> +     knowledge in order to make legal substitutions without losing
> +     precision.  When the stride is a constant, this will be sizetype.  */
> +  tree stride_type;
> +
>    /* The kind of candidate (CAND_MULT, etc.).  */
>    enum cand_kind kind;
>
> @@ -502,6 +509,7 @@ find_basis_for_base_expr (slsr_cand_t c, tree base
>           || one_basis->cand_stmt == c->cand_stmt
>           || !operand_equal_p (one_basis->stride, c->stride, 0)
>           || !types_compatible_p (one_basis->cand_type, c->cand_type)
> +         || !types_compatible_p (one_basis->stride_type, c->stride_type)
>           || !dominated_by_p (CDI_DOMINATORS,
>                               gimple_bb (c->cand_stmt),
>                               gimple_bb (one_basis->cand_stmt)))
> @@ -615,7 +623,7 @@ record_potential_basis (slsr_cand_t c, tree base)
>  static slsr_cand_t
>  alloc_cand_and_find_basis (enum cand_kind kind, gimple *gs, tree base,
>                            const widest_int &index, tree stride, tree ctype,
> -                          unsigned savings)
> +                          tree stype, unsigned savings)
>  {
>    slsr_cand_t c = (slsr_cand_t) obstack_alloc (&cand_obstack,
>                                                sizeof (slsr_cand));
> @@ -624,6 +632,7 @@ alloc_cand_and_find_basis (enum cand_kind kind, gi
>    c->stride = stride;
>    c->index = index;
>    c->cand_type = ctype;
> +  c->stride_type = stype;
>    c->kind = kind;
>    c->cand_num = cand_vec.length () + 1;
>    c->next_interp = 0;
> @@ -809,7 +818,8 @@ slsr_process_phi (gphi *phi, bool speed)
>    base_type = TREE_TYPE (arg0_base);
>
>    c = alloc_cand_and_find_basis (CAND_PHI, phi, arg0_base,
> -                                0, integer_one_node, base_type, savings);
> +                                0, integer_one_node, base_type,
> +                                sizetype, savings);
>
>    /* Add the candidate to the statement-candidate mapping.  */
>    add_cand_for_stmt (phi, c);
> @@ -838,7 +848,8 @@ backtrace_base_for_ref (tree *pbase)
>       e.g. 'B' is widened from an 'int' in order to calculate
>       a 64-bit address.  */
>    if (CONVERT_EXPR_P (base_in)
> -      && legal_cast_p_1 (base_in, TREE_OPERAND (base_in, 0)))
> +      && legal_cast_p_1 (TREE_TYPE (base_in),
> +                        TREE_TYPE (TREE_OPERAND (base_in, 0))))
>      base_in = get_unwidened (base_in, NULL_TREE);
>
>    if (TREE_CODE (base_in) != SSA_NAME)
> @@ -995,7 +1006,7 @@ slsr_process_ref (gimple *gs)
>      return;
>
>    c = alloc_cand_and_find_basis (CAND_REF, gs, base, index, offset,
> -                                type, 0);
> +                                type, sizetype, 0);
>
>    /* Add the candidate to the statement-candidate mapping.  */
>    add_cand_for_stmt (gs, c);
> @@ -1010,6 +1021,7 @@ static slsr_cand_t
>  create_mul_ssa_cand (gimple *gs, tree base_in, tree stride_in, bool speed)
>  {
>    tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
> +  tree stype = NULL_TREE;
>    widest_int index;
>    unsigned savings = 0;
>    slsr_cand_t c;
> @@ -1030,6 +1042,7 @@ create_mul_ssa_cand (gimple *gs, tree base_in, tre
>           index = base_cand->index;
>           stride = stride_in;
>           ctype = base_cand->cand_type;
> +         stype = TREE_TYPE (stride_in);
>           if (has_single_use (base_in))
>             savings = (base_cand->dead_savings
>                        + stmt_cost (base_cand->cand_stmt, speed));
> @@ -1045,6 +1058,7 @@ create_mul_ssa_cand (gimple *gs, tree base_in, tre
>           index = base_cand->index * wi::to_widest (base_cand->stride);
>           stride = stride_in;
>           ctype = base_cand->cand_type;
> +         stype = TREE_TYPE (stride_in);
>           if (has_single_use (base_in))
>             savings = (base_cand->dead_savings
>                        + stmt_cost (base_cand->cand_stmt, speed));
> @@ -1064,10 +1078,11 @@ create_mul_ssa_cand (gimple *gs, tree base_in, tre
>        index = 0;
>        stride = stride_in;
>        ctype = TREE_TYPE (base_in);
> +      stype = TREE_TYPE (stride_in);
>      }
>
>    c = alloc_cand_and_find_basis (CAND_MULT, gs, base, index, stride,
> -                                ctype, savings);
> +                                ctype, stype, savings);
>    return c;
>  }
>
> @@ -1156,7 +1171,7 @@ create_mul_imm_cand (gimple *gs, tree base_in, tre
>      }
>
>    c = alloc_cand_and_find_basis (CAND_MULT, gs, base, index, stride,
> -                                ctype, savings);
> +                                ctype, sizetype, savings);
>    return c;
>  }
>
> @@ -1212,7 +1227,8 @@ static slsr_cand_t
>  create_add_ssa_cand (gimple *gs, tree base_in, tree addend_in,
>                      bool subtract_p, bool speed)
>  {
> -  tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL;
> +  tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
> +  tree stype = NULL_TREE;
>    widest_int index;
>    unsigned savings = 0;
>    slsr_cand_t c;
> @@ -1237,6 +1253,7 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>             index = -index;
>           stride = addend_cand->base_expr;
>           ctype = TREE_TYPE (base_in);
> +         stype = addend_cand->cand_type;
>           if (has_single_use (addend_in))
>             savings = (addend_cand->dead_savings
>                        + stmt_cost (addend_cand->cand_stmt, speed));
> @@ -1263,6 +1280,8 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>           index = subtract_p ? -1 : 1;
>           stride = addend_in;
>           ctype = base_cand->cand_type;
> +         stype = (TREE_CODE (addend_in) == INTEGER_CST ? sizetype
> +                  : TREE_TYPE (addend_in));
>           if (has_single_use (base_in))
>             savings = (base_cand->dead_savings
>                        + stmt_cost (base_cand->cand_stmt, speed));
> @@ -1286,6 +1305,7 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>                   index = -index;
>                   stride = subtrahend_cand->base_expr;
>                   ctype = TREE_TYPE (base_in);
> +                 stype = subtrahend_cand->cand_type;
>                   if (has_single_use (addend_in))
>                     savings = (subtrahend_cand->dead_savings
>                                + stmt_cost (subtrahend_cand->cand_stmt, speed));
> @@ -1312,10 +1332,12 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>        index = subtract_p ? -1 : 1;
>        stride = addend_in;
>        ctype = TREE_TYPE (base_in);
> +      stype = (TREE_CODE (addend_in) == INTEGER_CST ? sizetype
> +              : TREE_TYPE (addend_in));
>      }
>
>    c = alloc_cand_and_find_basis (CAND_ADD, gs, base, index, stride,
> -                                ctype, savings);
> +                                ctype, stype, savings);
>    return c;
>  }
>
> @@ -1329,6 +1351,7 @@ create_add_imm_cand (gimple *gs, tree base_in, con
>  {
>    enum cand_kind kind = CAND_ADD;
>    tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
> +  tree stype = NULL_TREE;
>    widest_int index, multiple;
>    unsigned savings = 0;
>    slsr_cand_t c;
> @@ -1356,6 +1379,7 @@ create_add_imm_cand (gimple *gs, tree base_in, con
>           index = base_cand->index + multiple;
>           stride = base_cand->stride;
>           ctype = base_cand->cand_type;
> +         stype = base_cand->stride_type;
>           if (has_single_use (base_in))
>             savings = (base_cand->dead_savings
>                        + stmt_cost (base_cand->cand_stmt, speed));
> @@ -1376,10 +1400,11 @@ create_add_imm_cand (gimple *gs, tree base_in, con
>        index = index_in;
>        stride = integer_one_node;
>        ctype = TREE_TYPE (base_in);
> +      stype = sizetype;
>      }
>
>    c = alloc_cand_and_find_basis (kind, gs, base, index, stride,
> -                                ctype, savings);
> +                                ctype, stype, savings);
>    return c;
>  }
>
> @@ -1456,14 +1481,11 @@ slsr_process_neg (gimple *gs, tree rhs1, bool spee
>     for more details.  */
>
>  static bool
> -legal_cast_p_1 (tree lhs, tree rhs)
> +legal_cast_p_1 (tree lhs_type, tree rhs_type)
>  {
> -  tree lhs_type, rhs_type;
>    unsigned lhs_size, rhs_size;
>    bool lhs_wraps, rhs_wraps;
>
> -  lhs_type = TREE_TYPE (lhs);
> -  rhs_type = TREE_TYPE (rhs);
>    lhs_size = TYPE_PRECISION (lhs_type);
>    rhs_size = TYPE_PRECISION (rhs_type);
>    lhs_wraps = ANY_INTEGRAL_TYPE_P (lhs_type) && TYPE_OVERFLOW_WRAPS (lhs_type);
> @@ -1521,7 +1543,7 @@ legal_cast_p (gimple *gs, tree rhs)
>        || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (gs)))
>      return false;
>
> -  return legal_cast_p_1 (gimple_assign_lhs (gs), rhs);
> +  return legal_cast_p_1 (TREE_TYPE (gimple_assign_lhs (gs)), TREE_TYPE (rhs));
>  }
>
>  /* Given GS which is a cast to a scalar integer type, determine whether
> @@ -1556,7 +1578,8 @@ slsr_process_cast (gimple *gs, tree rhs1, bool spe
>           c = alloc_cand_and_find_basis (base_cand->kind, gs,
>                                          base_cand->base_expr,
>                                          base_cand->index, base_cand->stride,
> -                                        ctype, savings);
> +                                        ctype, base_cand->stride_type,
> +                                        savings);
>           if (base_cand->next_interp)
>             base_cand = lookup_cand (base_cand->next_interp);
>           else
> @@ -1574,10 +1597,10 @@ slsr_process_cast (gimple *gs, tree rhs1, bool spe
>          The first of these is somewhat arbitrary, but the choice of
>          1 for the stride simplifies the logic for propagating casts
>          into their uses.  */
> -      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1,
> -                                    0, integer_one_node, ctype, 0);
> -      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1,
> -                                     0, integer_one_node, ctype, 0);
> +      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, 0,
> +                                    integer_one_node, ctype, sizetype, 0);
> +      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, 0,
> +                                     integer_one_node, ctype, sizetype, 0);
>        c->next_interp = c2->cand_num;
>      }
>
> @@ -1613,7 +1636,8 @@ slsr_process_copy (gimple *gs, tree rhs1, bool spe
>           c = alloc_cand_and_find_basis (base_cand->kind, gs,
>                                          base_cand->base_expr,
>                                          base_cand->index, base_cand->stride,
> -                                        base_cand->cand_type, savings);
> +                                        base_cand->cand_type,
> +                                        base_cand->stride_type, savings);
>           if (base_cand->next_interp)
>             base_cand = lookup_cand (base_cand->next_interp);
>           else
> @@ -1631,10 +1655,12 @@ slsr_process_copy (gimple *gs, tree rhs1, bool spe
>          The first of these is somewhat arbitrary, but the choice of
>          1 for the stride simplifies the logic for propagating casts
>          into their uses.  */
> -      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1,
> -                                    0, integer_one_node, TREE_TYPE (rhs1), 0);
> -      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1,
> -                                     0, integer_one_node, TREE_TYPE (rhs1), 0);
> +      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, 0,
> +                                    integer_one_node, TREE_TYPE (rhs1),
> +                                    sizetype, 0);
> +      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, 0,
> +                                     integer_one_node, TREE_TYPE (rhs1),
> +                                     sizetype, 0);
>        c->next_interp = c2->cand_num;
>      }
>
> @@ -1755,6 +1781,13 @@ dump_candidate (slsr_cand_t c)
>        fputs (" + ", dump_file);
>        print_decs (c->index, dump_file);
>        fputs (") * ", dump_file);
> +      if (TREE_CODE (c->stride) != INTEGER_CST
> +         && c->stride_type != TREE_TYPE (c->stride))
> +       {
> +         fputs ("(", dump_file);
> +         print_generic_expr (dump_file, c->stride_type, 0);
> +         fputs (")", dump_file);
> +       }
>        print_generic_expr (dump_file, c->stride, 0);
>        fputs (" : ", dump_file);
>        break;
> @@ -1764,6 +1797,13 @@ dump_candidate (slsr_cand_t c)
>        fputs (" + (", dump_file);
>        print_decs (c->index, dump_file);
>        fputs (" * ", dump_file);
> +      if (TREE_CODE (c->stride) != INTEGER_CST
> +         && c->stride_type != TREE_TYPE (c->stride))
> +       {
> +         fputs ("(", dump_file);
> +         print_generic_expr (dump_file, c->stride_type, 0);
> +         fputs (")", dump_file);
> +       }
>        print_generic_expr (dump_file, c->stride, 0);
>        fputs (") : ", dump_file);
>        break;
> @@ -2143,7 +2183,7 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
>    basic_block insert_bb;
>    gimple_stmt_iterator gsi;
>    tree lhs, basis_type;
> -  gassign *new_stmt;
> +  gassign *new_stmt, *cast_stmt = NULL;
>
>    /* If the add candidate along this incoming edge has the same
>       index as C's hidden basis, the hidden basis represents this
> @@ -2187,13 +2227,27 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
>           new_stmt = gimple_build_assign (lhs, code, basis_name,
>                                           incr_vec[i].initializer);
>         }
> -      else if (increment == 1)
> -       new_stmt = gimple_build_assign (lhs, plus_code, basis_name, c->stride);
> -      else if (increment == -1)
> -       new_stmt = gimple_build_assign (lhs, MINUS_EXPR, basis_name,
> -                                       c->stride);
> -      else
> -       gcc_unreachable ();
> +      else {
> +       tree stride;
> +
> +       if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type))
> +         {
> +           tree cast_stride = make_temp_ssa_name (c->stride_type, NULL,
> +                                                  "slsr");
> +           cast_stmt = gimple_build_assign (cast_stride, NOP_EXPR,
> +                                            c->stride);
> +           stride = cast_stride;
> +         }
> +       else
> +         stride = c->stride;
> +
> +       if (increment == 1)
> +         new_stmt = gimple_build_assign (lhs, plus_code, basis_name, stride);
> +       else if (increment == -1)
> +         new_stmt = gimple_build_assign (lhs, MINUS_EXPR, basis_name, stride);
> +       else
> +         gcc_unreachable ();
> +      }
>      }
>
>    insert_bb = single_succ_p (e->src) ? e->src : split_edge (e);
> @@ -2200,14 +2254,34 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
>    gsi = gsi_last_bb (insert_bb);
>
>    if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi)))
> -    gsi_insert_before (&gsi, new_stmt, GSI_NEW_STMT);
> +    {
> +      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> +      if (cast_stmt)
> +       {
> +         gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
> +         gimple_set_location (cast_stmt, loc);
> +       }
> +    }
>    else
> -    gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
> +    {
> +      if (cast_stmt)
> +       {
> +         gsi_insert_after (&gsi, cast_stmt, GSI_NEW_STMT);
> +         gimple_set_location (cast_stmt, loc);
> +       }
> +      gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
> +    }
>
>    gimple_set_location (new_stmt, loc);
>
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
> +      if (cast_stmt)
> +       {
> +         fprintf (dump_file, "Inserting cast in block %d: ",
> +                  insert_bb->index);
> +         print_gimple_stmt (dump_file, cast_stmt, 0, 0);
> +       }
>        fprintf (dump_file, "Inserting in block %d: ", insert_bb->index);
>        print_gimple_stmt (dump_file, new_stmt, 0, 0);
>      }
> @@ -2825,40 +2899,35 @@ analyze_increments (slsr_cand_t first_dep, machine
>                    && !POINTER_TYPE_P (first_dep->cand_type)))
>         incr_vec[i].cost = COST_NEUTRAL;
>
> -      /* FORNOW: If we need to add an initializer, give up if a cast from
> -        the candidate's type to its stride's type can lose precision.
> -        This could eventually be handled better by expressly retaining the
> -        result of a cast to a wider type in the stride.  Example:
> +      /* If we need to add an initializer, give up if a cast from the
> +        candidate's type to its stride's type can lose precision.
> +        Note that this already takes into account that the stride may
> +        have been cast to a wider type, in which case this test won't
> +        fire.  Example:
>
>             short int _1;
>            _2 = (int) _1;
>            _3 = _2 * 10;
> -          _4 = x + _3;    ADD: x + (10 * _1) : int
> +          _4 = x + _3;    ADD: x + (10 * (int)_1) : int
>            _5 = _2 * 15;
> -          _6 = x + _3;    ADD: x + (15 * _1) : int
> +          _6 = x + _5;    ADD: x + (15 * (int)_1) : int
>
> -         Right now replacing _6 would cause insertion of an initializer
> -        of the form "short int T = _1 * 5;" followed by a cast to
> -        int, which could overflow incorrectly.  Had we recorded _2 or
> -        (int)_1 as the stride, this wouldn't happen.  However, doing
> -         this breaks other opportunities, so this will require some
> -        care.  */
> +        Although the stride was a short int initially, the stride
> +        used in the analysis has been widened to an int, and such
> +        widening will be done in the initializer as well.  */
>        else if (!incr_vec[i].initializer
>                && TREE_CODE (first_dep->stride) != INTEGER_CST
> -              && !legal_cast_p_1 (first_dep->stride,
> -                                  gimple_assign_lhs (first_dep->cand_stmt)))
> -
> +              && !legal_cast_p_1 (first_dep->stride_type,
> +                                  TREE_TYPE (gimple_assign_lhs
> +                                             (first_dep->cand_stmt))))
>         incr_vec[i].cost = COST_INFINITE;
>
>        /* If we need to add an initializer, make sure we don't introduce
>          a multiply by a pointer type, which can happen in certain cast
> -        scenarios.  FIXME: When cleaning up these cast issues, we can
> -         afford to introduce the multiply provided we cast out to an
> -         unsigned int of appropriate size.  */
> +        scenarios.  */
>        else if (!incr_vec[i].initializer
>                && TREE_CODE (first_dep->stride) != INTEGER_CST
> -              && POINTER_TYPE_P (TREE_TYPE (first_dep->stride)))
> -
> +              && POINTER_TYPE_P (first_dep->stride_type))
>         incr_vec[i].cost = COST_INFINITE;
>
>        /* For any other increment, if this is a multiply candidate, we
> @@ -3105,7 +3174,8 @@ insert_initializers (slsr_cand_t c)
>        basic_block bb;
>        slsr_cand_t where = NULL;
>        gassign *init_stmt;
> -      tree stride_type, new_name, incr_tree;
> +      gassign *cast_stmt = NULL;
> +      tree new_name, incr_tree, init_stride;
>        widest_int incr = incr_vec[i].incr;
>
>        if (!profitable_increment_p (i)
> @@ -3134,31 +3204,63 @@ insert_initializers (slsr_cand_t c)
>          that block, the earliest one will be returned in WHERE.  */
>        bb = nearest_common_dominator_for_cands (c, incr, &where);
>
> +      /* If the nominal stride has a different type than the recorded
> +        stride type, build a cast from the nominal stride to that type.  */
> +      if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type))
> +       {
> +         init_stride = make_temp_ssa_name (c->stride_type, NULL, "slsr");
> +         cast_stmt = gimple_build_assign (init_stride, NOP_EXPR, c->stride);
> +       }
> +      else
> +       init_stride = c->stride;
> +
>        /* Create a new SSA name to hold the initializer's value.  */
> -      stride_type = TREE_TYPE (c->stride);
> -      new_name = make_temp_ssa_name (stride_type, NULL, "slsr");
> +      new_name = make_temp_ssa_name (c->stride_type, NULL, "slsr");
>        incr_vec[i].initializer = new_name;
>
>        /* Create the initializer and insert it in the latest possible
>          dominating position.  */
> -      incr_tree = wide_int_to_tree (stride_type, incr);
> +      incr_tree = wide_int_to_tree (c->stride_type, incr);
>        init_stmt = gimple_build_assign (new_name, MULT_EXPR,
> -                                      c->stride, incr_tree);
> +                                      init_stride, incr_tree);
>        if (where)
>         {
>           gimple_stmt_iterator gsi = gsi_for_stmt (where->cand_stmt);
> +         location_t loc = gimple_location (where->cand_stmt);
> +
> +         if (cast_stmt)
> +           {
> +             gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
> +             gimple_set_location (cast_stmt, loc);
> +           }
> +
>           gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
> -         gimple_set_location (init_stmt, gimple_location (where->cand_stmt));
> +         gimple_set_location (init_stmt, loc);
>         }
>        else
>         {
>           gimple_stmt_iterator gsi = gsi_last_bb (bb);
>           gimple *basis_stmt = lookup_cand (c->basis)->cand_stmt;
> +         location_t loc = gimple_location (basis_stmt);
>
>           if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi)))
> -           gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
> +           {
> +             if (cast_stmt)
> +               {
> +                 gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
> +                 gimple_set_location (cast_stmt, loc);
> +               }
> +             gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
> +           }
>           else
> -           gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT);
> +           {
> +             if (cast_stmt)
> +               {
> +                 gsi_insert_after (&gsi, cast_stmt, GSI_NEW_STMT);
> +                 gimple_set_location (cast_stmt, loc);
> +               }
> +             gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT);
> +           }
>
>           gimple_set_location (init_stmt, gimple_location (basis_stmt));
>         }
> @@ -3165,6 +3267,11 @@ insert_initializers (slsr_cand_t c)
>
>        if (dump_file && (dump_flags & TDF_DETAILS))
>         {
> +         if (cast_stmt)
> +           {
> +             fputs ("Inserting stride cast: ", dump_file);
> +             print_gimple_stmt (dump_file, cast_stmt, 0, 0);
> +           }
>           fputs ("Inserting initializer: ", dump_file);
>           print_gimple_stmt (dump_file, init_stmt, 0, 0);
>         }
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr54245.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr54245.c     (revision 241379)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr54245.c     (working copy)
> @@ -1,48 +0,0 @@
> -/* { dg-do compile } */
> -/* { dg-options "-O1 -fdump-tree-slsr-details" } */
> -
> -#include <stdio.h>
> -
> -#define W1  22725
> -#define W2  21407
> -#define W3  19266
> -#define W6  8867
> -
> -void idct_row(short *row, int *dst)
> -{
> -    int a0, a1, b0, b1;
> -
> -    a0 = W1 * row[0];
> -    a1 = a0;
> -
> -    a0 += W2 * row[2];
> -    a1 += W6 * row[2];
> -
> -    b0 = W1 * row[1];
> -    b1 = W3 * row[1];
> -
> -    dst[0] = a0 + b0;
> -    dst[1] = a0 - b0;
> -    dst[2] = a1 + b1;
> -    dst[3] = a1 - b1;
> -}
> -
> -static short block[8] = { 1, 2, 3, 4 };
> -
> -int main(void)
> -{
> -    int out[4];
> -    int i;
> -
> -    idct_row(block, out);
> -
> -    for (i = 0; i < 4; i++)
> -        printf("%d\n", out[i]);
> -
> -    return !(out[2] == 87858 && out[3] == 10794);
> -}
> -
> -/* For now, disable inserting an initializer when the multiplication will
> -   take place in a smaller type than originally.  This test may be deleted
> -   in future when this case is handled more precisely.  */
> -/* { dg-final { scan-tree-dump-times "Inserting initializer" 0 "slsr" { target { ! int16 } } } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c      (revision 241379)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c      (working copy)
> @@ -17,7 +17,12 @@ f (int s, int *c)
>    return x1 ? x2 : x3;
>  }
>
> +/* Note that since some branch prediction heuristics changed, the
> +   calculations of x2 and x3 are pushed downward into the legs
> +   of the conditional, changing the code presented to SLSR.
> +   However, this proves to be a useful test for introducing an
> +   initializer with a cast, so we'll keep it as is.  */
> +
>  /* There are 4 ' * ' instances in the decls (since "int * iftmp.0;" is
> -   added), 1 parm, 2 in the code.  The second one in the code can be
> -   a widening mult.  */
> -/* { dg-final { scan-tree-dump-times " w?\\* " 7 "optimized" } } */
> +   added), 2 parms, 3 in the code.  */
> +/* { dg-final { scan-tree-dump-times " \\* " 9 "optimized" } } */
>
>
Christophe Lyon Nov. 4, 2016, 2:10 p.m. UTC | #2
On 24 October 2016 at 12:55, Richard Biener <richard.guenther@gmail.com> wrote:
> On Fri, Oct 21, 2016 at 11:46 PM, Bill Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
>> Hi,
>>
>> I've been meaning for some time now to do a better job handling strength
>> reduction candidates where the stride of the candidate and its basis
>> involves a cast (usually widening) from another type.  The two PRs
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71490 and
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71915 show that we miss
>> strength reduction opportunities in such cases.  Essentially the way I
>> was handling this before required a cast to go back to the original
>> type, which was illegal when this was a narrowing operation or a cast
>> from a wrapping type to a non-wrapping type.
>>
>> This patch improves matters by tracking the effective type of the stride
>> so that we perform arithmetic in that type directly.  This allows us to
>> strength-reduce some cases missed before.  We still have to be careful
>> not to ever perform a narrowing or wrap-to-nonwrap conversion, but these
>> cases don't come up nearly so often in practice with this patch.
>>
>> The patch is pretty straightforward but rather large, so I'd like to
>> leave it up for review for a week or so before committing it -- extra
>> eyes welcome!
>>
>> There is an existing test case that checks for the narrowing problem,
>> and now fails with this patch as it should.  That is, it's now legal to
>> insert the initializer that we formerly knew to be bad for this test.
>> Since the test no longer serves any purpose, I'm deleting it.
>>
>> gcc.dg/tree-ssa/slsr-8.c generates quite different intermediate code now
>> than when I first added it to the test suite.  As a result of various
>> optimization changes, it's no longer maintained as a single block;
>> instead, the optimizable computations are sunk into two legs of a
>> conditional.  This exposed another similar case, leading to the bug
>> reports.  This test now passes, but I had to adjust it for the new code
>> generation we get.  I also added some commentary to indicate what we
>> expect to happen in that test, since it isn't too obvious.
>>
>> I've bootstrapped this and tested it on powerpc64le-unknown-linux-gnu
>> with no regressions.  To avoid the Wrath of Octoploid ;), I've also
>> tested it against ffmpeg using an x86_64-pc-linux-gnu cross with -O3
>> -march=amdfam10, also with no failures.  I've also verified that we hit
>> this code about 35 times in the test suite, so it looks like we have
>> some decent additional test coverage.
>>
>> Thanks in advance to anyone willing to take a look.
>
> LGTM.
>

Hi,

Since this was committed, I'm seeing a failure in
  gcc.dg/tree-ssa/slsr-8.c scan-tree-dump-times optimized " \\* " 9
on aarch64 targets.

Christophe


> Richard.
>
>> Bill
>>
>>
>> [gcc]
>>
>> 2016-10-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>>
>>         PR tree-optimization/71915
>>         * gimple-ssa-strength-reduction.c (struct slsr_cand_d): Add
>>         stride_type field.
>>         (find_basis_for_base_expr): Require stride types to match when
>>         seeking a basis.
>>         (alloc_cand_and_find_basis): Record the stride type.
>>         (slsr_process_phi): Pass stride type to alloc_cand_and_find_basis.
>>         (backtrace_base_for_ref): Pass types to legal_cast_p_1 rather than
>>         the expressions having those types.
>>         (slsr_process_ref): Pass stride type to alloc_cand_and_find_basis.
>>         (create_mul_ssa_cand): Likewise.
>>         (create_mul_imm_cand): Likewise.
>>         (create_add_ssa_cand): Likewise.
>>         (create_add_imm_cand): Likewise.
>>         (legal_cast_p_1): Change interface to accept types rather than the
>>         expressions having those types.
>>         (legal_cast_p): Pass types to legal_cast_p_1.
>>         (slsr_process_cast): Pass stride type to
>>         alloc_cand_and_find_basis.
>>         (slsr_process_copy): Likewise.
>>         (dump_candidate): Display stride type when a cast exists.
>>         (create_add_on_incoming_edge): Introduce a cast when necessary for
>>         the stride type.
>>         (analyze_increments): Change the code checking for invalid casts
>>         to rely on the stride type, and update the documentation and
>>         example.  Change the code checking for pointer multiplies to rely
>>         on the stride type.
>>         (insert_initializers): Introduce a cast when necessary for the
>>         stride type.  Use the stride type for the type of the initializer.
>>
>> [gcc/testsuite]
>>
>> 2016-10-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>>
>>         PR tree-optimization/71915
>>         * gcc.dg/tree-ssa/pr54245.c: Delete.
>>         * gcc.dg/tree-ssa/slsr-8.c: Adjust for new optimization and
>>         document why.
>>
>>
>> Index: gcc/gimple-ssa-strength-reduction.c
>> ===================================================================
>> --- gcc/gimple-ssa-strength-reduction.c (revision 241379)
>> +++ gcc/gimple-ssa-strength-reduction.c (working copy)
>> @@ -246,6 +246,13 @@ struct slsr_cand_d
>>       replacement MEM_REF.)  */
>>    tree cand_type;
>>
>> +  /* The type to be used to interpret the stride field when the stride
>> +     is not a constant.  Normally the same as the type of the recorded
>> +     stride, but when the stride has been cast we need to maintain that
>> +     knowledge in order to make legal substitutions without losing
>> +     precision.  When the stride is a constant, this will be sizetype.  */
>> +  tree stride_type;
>> +
>>    /* The kind of candidate (CAND_MULT, etc.).  */
>>    enum cand_kind kind;
>>
>> @@ -502,6 +509,7 @@ find_basis_for_base_expr (slsr_cand_t c, tree base
>>           || one_basis->cand_stmt == c->cand_stmt
>>           || !operand_equal_p (one_basis->stride, c->stride, 0)
>>           || !types_compatible_p (one_basis->cand_type, c->cand_type)
>> +         || !types_compatible_p (one_basis->stride_type, c->stride_type)
>>           || !dominated_by_p (CDI_DOMINATORS,
>>                               gimple_bb (c->cand_stmt),
>>                               gimple_bb (one_basis->cand_stmt)))
>> @@ -615,7 +623,7 @@ record_potential_basis (slsr_cand_t c, tree base)
>>  static slsr_cand_t
>>  alloc_cand_and_find_basis (enum cand_kind kind, gimple *gs, tree base,
>>                            const widest_int &index, tree stride, tree ctype,
>> -                          unsigned savings)
>> +                          tree stype, unsigned savings)
>>  {
>>    slsr_cand_t c = (slsr_cand_t) obstack_alloc (&cand_obstack,
>>                                                sizeof (slsr_cand));
>> @@ -624,6 +632,7 @@ alloc_cand_and_find_basis (enum cand_kind kind, gi
>>    c->stride = stride;
>>    c->index = index;
>>    c->cand_type = ctype;
>> +  c->stride_type = stype;
>>    c->kind = kind;
>>    c->cand_num = cand_vec.length () + 1;
>>    c->next_interp = 0;
>> @@ -809,7 +818,8 @@ slsr_process_phi (gphi *phi, bool speed)
>>    base_type = TREE_TYPE (arg0_base);
>>
>>    c = alloc_cand_and_find_basis (CAND_PHI, phi, arg0_base,
>> -                                0, integer_one_node, base_type, savings);
>> +                                0, integer_one_node, base_type,
>> +                                sizetype, savings);
>>
>>    /* Add the candidate to the statement-candidate mapping.  */
>>    add_cand_for_stmt (phi, c);
>> @@ -838,7 +848,8 @@ backtrace_base_for_ref (tree *pbase)
>>       e.g. 'B' is widened from an 'int' in order to calculate
>>       a 64-bit address.  */
>>    if (CONVERT_EXPR_P (base_in)
>> -      && legal_cast_p_1 (base_in, TREE_OPERAND (base_in, 0)))
>> +      && legal_cast_p_1 (TREE_TYPE (base_in),
>> +                        TREE_TYPE (TREE_OPERAND (base_in, 0))))
>>      base_in = get_unwidened (base_in, NULL_TREE);
>>
>>    if (TREE_CODE (base_in) != SSA_NAME)
>> @@ -995,7 +1006,7 @@ slsr_process_ref (gimple *gs)
>>      return;
>>
>>    c = alloc_cand_and_find_basis (CAND_REF, gs, base, index, offset,
>> -                                type, 0);
>> +                                type, sizetype, 0);
>>
>>    /* Add the candidate to the statement-candidate mapping.  */
>>    add_cand_for_stmt (gs, c);
>> @@ -1010,6 +1021,7 @@ static slsr_cand_t
>>  create_mul_ssa_cand (gimple *gs, tree base_in, tree stride_in, bool speed)
>>  {
>>    tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
>> +  tree stype = NULL_TREE;
>>    widest_int index;
>>    unsigned savings = 0;
>>    slsr_cand_t c;
>> @@ -1030,6 +1042,7 @@ create_mul_ssa_cand (gimple *gs, tree base_in, tre
>>           index = base_cand->index;
>>           stride = stride_in;
>>           ctype = base_cand->cand_type;
>> +         stype = TREE_TYPE (stride_in);
>>           if (has_single_use (base_in))
>>             savings = (base_cand->dead_savings
>>                        + stmt_cost (base_cand->cand_stmt, speed));
>> @@ -1045,6 +1058,7 @@ create_mul_ssa_cand (gimple *gs, tree base_in, tre
>>           index = base_cand->index * wi::to_widest (base_cand->stride);
>>           stride = stride_in;
>>           ctype = base_cand->cand_type;
>> +         stype = TREE_TYPE (stride_in);
>>           if (has_single_use (base_in))
>>             savings = (base_cand->dead_savings
>>                        + stmt_cost (base_cand->cand_stmt, speed));
>> @@ -1064,10 +1078,11 @@ create_mul_ssa_cand (gimple *gs, tree base_in, tre
>>        index = 0;
>>        stride = stride_in;
>>        ctype = TREE_TYPE (base_in);
>> +      stype = TREE_TYPE (stride_in);
>>      }
>>
>>    c = alloc_cand_and_find_basis (CAND_MULT, gs, base, index, stride,
>> -                                ctype, savings);
>> +                                ctype, stype, savings);
>>    return c;
>>  }
>>
>> @@ -1156,7 +1171,7 @@ create_mul_imm_cand (gimple *gs, tree base_in, tre
>>      }
>>
>>    c = alloc_cand_and_find_basis (CAND_MULT, gs, base, index, stride,
>> -                                ctype, savings);
>> +                                ctype, sizetype, savings);
>>    return c;
>>  }
>>
>> @@ -1212,7 +1227,8 @@ static slsr_cand_t
>>  create_add_ssa_cand (gimple *gs, tree base_in, tree addend_in,
>>                      bool subtract_p, bool speed)
>>  {
>> -  tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL;
>> +  tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
>> +  tree stype = NULL_TREE;
>>    widest_int index;
>>    unsigned savings = 0;
>>    slsr_cand_t c;
>> @@ -1237,6 +1253,7 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>>             index = -index;
>>           stride = addend_cand->base_expr;
>>           ctype = TREE_TYPE (base_in);
>> +         stype = addend_cand->cand_type;
>>           if (has_single_use (addend_in))
>>             savings = (addend_cand->dead_savings
>>                        + stmt_cost (addend_cand->cand_stmt, speed));
>> @@ -1263,6 +1280,8 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>>           index = subtract_p ? -1 : 1;
>>           stride = addend_in;
>>           ctype = base_cand->cand_type;
>> +         stype = (TREE_CODE (addend_in) == INTEGER_CST ? sizetype
>> +                  : TREE_TYPE (addend_in));
>>           if (has_single_use (base_in))
>>             savings = (base_cand->dead_savings
>>                        + stmt_cost (base_cand->cand_stmt, speed));
>> @@ -1286,6 +1305,7 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>>                   index = -index;
>>                   stride = subtrahend_cand->base_expr;
>>                   ctype = TREE_TYPE (base_in);
>> +                 stype = subtrahend_cand->cand_type;
>>                   if (has_single_use (addend_in))
>>                     savings = (subtrahend_cand->dead_savings
>>                                + stmt_cost (subtrahend_cand->cand_stmt, speed));
>> @@ -1312,10 +1332,12 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>>        index = subtract_p ? -1 : 1;
>>        stride = addend_in;
>>        ctype = TREE_TYPE (base_in);
>> +      stype = (TREE_CODE (addend_in) == INTEGER_CST ? sizetype
>> +              : TREE_TYPE (addend_in));
>>      }
>>
>>    c = alloc_cand_and_find_basis (CAND_ADD, gs, base, index, stride,
>> -                                ctype, savings);
>> +                                ctype, stype, savings);
>>    return c;
>>  }
>>
>> @@ -1329,6 +1351,7 @@ create_add_imm_cand (gimple *gs, tree base_in, con
>>  {
>>    enum cand_kind kind = CAND_ADD;
>>    tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
>> +  tree stype = NULL_TREE;
>>    widest_int index, multiple;
>>    unsigned savings = 0;
>>    slsr_cand_t c;
>> @@ -1356,6 +1379,7 @@ create_add_imm_cand (gimple *gs, tree base_in, con
>>           index = base_cand->index + multiple;
>>           stride = base_cand->stride;
>>           ctype = base_cand->cand_type;
>> +         stype = base_cand->stride_type;
>>           if (has_single_use (base_in))
>>             savings = (base_cand->dead_savings
>>                        + stmt_cost (base_cand->cand_stmt, speed));
>> @@ -1376,10 +1400,11 @@ create_add_imm_cand (gimple *gs, tree base_in, con
>>        index = index_in;
>>        stride = integer_one_node;
>>        ctype = TREE_TYPE (base_in);
>> +      stype = sizetype;
>>      }
>>
>>    c = alloc_cand_and_find_basis (kind, gs, base, index, stride,
>> -                                ctype, savings);
>> +                                ctype, stype, savings);
>>    return c;
>>  }
>>
>> @@ -1456,14 +1481,11 @@ slsr_process_neg (gimple *gs, tree rhs1, bool spee
>>     for more details.  */
>>
>>  static bool
>> -legal_cast_p_1 (tree lhs, tree rhs)
>> +legal_cast_p_1 (tree lhs_type, tree rhs_type)
>>  {
>> -  tree lhs_type, rhs_type;
>>    unsigned lhs_size, rhs_size;
>>    bool lhs_wraps, rhs_wraps;
>>
>> -  lhs_type = TREE_TYPE (lhs);
>> -  rhs_type = TREE_TYPE (rhs);
>>    lhs_size = TYPE_PRECISION (lhs_type);
>>    rhs_size = TYPE_PRECISION (rhs_type);
>>    lhs_wraps = ANY_INTEGRAL_TYPE_P (lhs_type) && TYPE_OVERFLOW_WRAPS (lhs_type);
>> @@ -1521,7 +1543,7 @@ legal_cast_p (gimple *gs, tree rhs)
>>        || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (gs)))
>>      return false;
>>
>> -  return legal_cast_p_1 (gimple_assign_lhs (gs), rhs);
>> +  return legal_cast_p_1 (TREE_TYPE (gimple_assign_lhs (gs)), TREE_TYPE (rhs));
>>  }
>>
>>  /* Given GS which is a cast to a scalar integer type, determine whether
>> @@ -1556,7 +1578,8 @@ slsr_process_cast (gimple *gs, tree rhs1, bool spe
>>           c = alloc_cand_and_find_basis (base_cand->kind, gs,
>>                                          base_cand->base_expr,
>>                                          base_cand->index, base_cand->stride,
>> -                                        ctype, savings);
>> +                                        ctype, base_cand->stride_type,
>> +                                        savings);
>>           if (base_cand->next_interp)
>>             base_cand = lookup_cand (base_cand->next_interp);
>>           else
>> @@ -1574,10 +1597,10 @@ slsr_process_cast (gimple *gs, tree rhs1, bool spe
>>          The first of these is somewhat arbitrary, but the choice of
>>          1 for the stride simplifies the logic for propagating casts
>>          into their uses.  */
>> -      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1,
>> -                                    0, integer_one_node, ctype, 0);
>> -      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1,
>> -                                     0, integer_one_node, ctype, 0);
>> +      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, 0,
>> +                                    integer_one_node, ctype, sizetype, 0);
>> +      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, 0,
>> +                                     integer_one_node, ctype, sizetype, 0);
>>        c->next_interp = c2->cand_num;
>>      }
>>
>> @@ -1613,7 +1636,8 @@ slsr_process_copy (gimple *gs, tree rhs1, bool spe
>>           c = alloc_cand_and_find_basis (base_cand->kind, gs,
>>                                          base_cand->base_expr,
>>                                          base_cand->index, base_cand->stride,
>> -                                        base_cand->cand_type, savings);
>> +                                        base_cand->cand_type,
>> +                                        base_cand->stride_type, savings);
>>           if (base_cand->next_interp)
>>             base_cand = lookup_cand (base_cand->next_interp);
>>           else
>> @@ -1631,10 +1655,12 @@ slsr_process_copy (gimple *gs, tree rhs1, bool spe
>>          The first of these is somewhat arbitrary, but the choice of
>>          1 for the stride simplifies the logic for propagating casts
>>          into their uses.  */
>> -      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1,
>> -                                    0, integer_one_node, TREE_TYPE (rhs1), 0);
>> -      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1,
>> -                                     0, integer_one_node, TREE_TYPE (rhs1), 0);
>> +      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, 0,
>> +                                    integer_one_node, TREE_TYPE (rhs1),
>> +                                    sizetype, 0);
>> +      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, 0,
>> +                                     integer_one_node, TREE_TYPE (rhs1),
>> +                                     sizetype, 0);
>>        c->next_interp = c2->cand_num;
>>      }
>>
>> @@ -1755,6 +1781,13 @@ dump_candidate (slsr_cand_t c)
>>        fputs (" + ", dump_file);
>>        print_decs (c->index, dump_file);
>>        fputs (") * ", dump_file);
>> +      if (TREE_CODE (c->stride) != INTEGER_CST
>> +         && c->stride_type != TREE_TYPE (c->stride))
>> +       {
>> +         fputs ("(", dump_file);
>> +         print_generic_expr (dump_file, c->stride_type, 0);
>> +         fputs (")", dump_file);
>> +       }
>>        print_generic_expr (dump_file, c->stride, 0);
>>        fputs (" : ", dump_file);
>>        break;
>> @@ -1764,6 +1797,13 @@ dump_candidate (slsr_cand_t c)
>>        fputs (" + (", dump_file);
>>        print_decs (c->index, dump_file);
>>        fputs (" * ", dump_file);
>> +      if (TREE_CODE (c->stride) != INTEGER_CST
>> +         && c->stride_type != TREE_TYPE (c->stride))
>> +       {
>> +         fputs ("(", dump_file);
>> +         print_generic_expr (dump_file, c->stride_type, 0);
>> +         fputs (")", dump_file);
>> +       }
>>        print_generic_expr (dump_file, c->stride, 0);
>>        fputs (") : ", dump_file);
>>        break;
>> @@ -2143,7 +2183,7 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
>>    basic_block insert_bb;
>>    gimple_stmt_iterator gsi;
>>    tree lhs, basis_type;
>> -  gassign *new_stmt;
>> +  gassign *new_stmt, *cast_stmt = NULL;
>>
>>    /* If the add candidate along this incoming edge has the same
>>       index as C's hidden basis, the hidden basis represents this
>> @@ -2187,13 +2227,27 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
>>           new_stmt = gimple_build_assign (lhs, code, basis_name,
>>                                           incr_vec[i].initializer);
>>         }
>> -      else if (increment == 1)
>> -       new_stmt = gimple_build_assign (lhs, plus_code, basis_name, c->stride);
>> -      else if (increment == -1)
>> -       new_stmt = gimple_build_assign (lhs, MINUS_EXPR, basis_name,
>> -                                       c->stride);
>> -      else
>> -       gcc_unreachable ();
>> +      else {
>> +       tree stride;
>> +
>> +       if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type))
>> +         {
>> +           tree cast_stride = make_temp_ssa_name (c->stride_type, NULL,
>> +                                                  "slsr");
>> +           cast_stmt = gimple_build_assign (cast_stride, NOP_EXPR,
>> +                                            c->stride);
>> +           stride = cast_stride;
>> +         }
>> +       else
>> +         stride = c->stride;
>> +
>> +       if (increment == 1)
>> +         new_stmt = gimple_build_assign (lhs, plus_code, basis_name, stride);
>> +       else if (increment == -1)
>> +         new_stmt = gimple_build_assign (lhs, MINUS_EXPR, basis_name, stride);
>> +       else
>> +         gcc_unreachable ();
>> +      }
>>      }
>>
>>    insert_bb = single_succ_p (e->src) ? e->src : split_edge (e);
>> @@ -2200,14 +2254,34 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
>>    gsi = gsi_last_bb (insert_bb);
>>
>>    if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi)))
>> -    gsi_insert_before (&gsi, new_stmt, GSI_NEW_STMT);
>> +    {
>> +      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
>> +      if (cast_stmt)
>> +       {
>> +         gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
>> +         gimple_set_location (cast_stmt, loc);
>> +       }
>> +    }
>>    else
>> -    gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
>> +    {
>> +      if (cast_stmt)
>> +       {
>> +         gsi_insert_after (&gsi, cast_stmt, GSI_NEW_STMT);
>> +         gimple_set_location (cast_stmt, loc);
>> +       }
>> +      gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
>> +    }
>>
>>    gimple_set_location (new_stmt, loc);
>>
>>    if (dump_file && (dump_flags & TDF_DETAILS))
>>      {
>> +      if (cast_stmt)
>> +       {
>> +         fprintf (dump_file, "Inserting cast in block %d: ",
>> +                  insert_bb->index);
>> +         print_gimple_stmt (dump_file, cast_stmt, 0, 0);
>> +       }
>>        fprintf (dump_file, "Inserting in block %d: ", insert_bb->index);
>>        print_gimple_stmt (dump_file, new_stmt, 0, 0);
>>      }
>> @@ -2825,40 +2899,35 @@ analyze_increments (slsr_cand_t first_dep, machine
>>                    && !POINTER_TYPE_P (first_dep->cand_type)))
>>         incr_vec[i].cost = COST_NEUTRAL;
>>
>> -      /* FORNOW: If we need to add an initializer, give up if a cast from
>> -        the candidate's type to its stride's type can lose precision.
>> -        This could eventually be handled better by expressly retaining the
>> -        result of a cast to a wider type in the stride.  Example:
>> +      /* If we need to add an initializer, give up if a cast from the
>> +        candidate's type to its stride's type can lose precision.
>> +        Note that this already takes into account that the stride may
>> +        have been cast to a wider type, in which case this test won't
>> +        fire.  Example:
>>
>>             short int _1;
>>            _2 = (int) _1;
>>            _3 = _2 * 10;
>> -          _4 = x + _3;    ADD: x + (10 * _1) : int
>> +          _4 = x + _3;    ADD: x + (10 * (int)_1) : int
>>            _5 = _2 * 15;
>> -          _6 = x + _3;    ADD: x + (15 * _1) : int
>> +          _6 = x + _5;    ADD: x + (15 * (int)_1) : int
>>
>> -         Right now replacing _6 would cause insertion of an initializer
>> -        of the form "short int T = _1 * 5;" followed by a cast to
>> -        int, which could overflow incorrectly.  Had we recorded _2 or
>> -        (int)_1 as the stride, this wouldn't happen.  However, doing
>> -         this breaks other opportunities, so this will require some
>> -        care.  */
>> +        Although the stride was a short int initially, the stride
>> +        used in the analysis has been widened to an int, and such
>> +        widening will be done in the initializer as well.  */
>>        else if (!incr_vec[i].initializer
>>                && TREE_CODE (first_dep->stride) != INTEGER_CST
>> -              && !legal_cast_p_1 (first_dep->stride,
>> -                                  gimple_assign_lhs (first_dep->cand_stmt)))
>> -
>> +              && !legal_cast_p_1 (first_dep->stride_type,
>> +                                  TREE_TYPE (gimple_assign_lhs
>> +                                             (first_dep->cand_stmt))))
>>         incr_vec[i].cost = COST_INFINITE;
>>
>>        /* If we need to add an initializer, make sure we don't introduce
>>          a multiply by a pointer type, which can happen in certain cast
>> -        scenarios.  FIXME: When cleaning up these cast issues, we can
>> -         afford to introduce the multiply provided we cast out to an
>> -         unsigned int of appropriate size.  */
>> +        scenarios.  */
>>        else if (!incr_vec[i].initializer
>>                && TREE_CODE (first_dep->stride) != INTEGER_CST
>> -              && POINTER_TYPE_P (TREE_TYPE (first_dep->stride)))
>> -
>> +              && POINTER_TYPE_P (first_dep->stride_type))
>>         incr_vec[i].cost = COST_INFINITE;
>>
>>        /* For any other increment, if this is a multiply candidate, we
>> @@ -3105,7 +3174,8 @@ insert_initializers (slsr_cand_t c)
>>        basic_block bb;
>>        slsr_cand_t where = NULL;
>>        gassign *init_stmt;
>> -      tree stride_type, new_name, incr_tree;
>> +      gassign *cast_stmt = NULL;
>> +      tree new_name, incr_tree, init_stride;
>>        widest_int incr = incr_vec[i].incr;
>>
>>        if (!profitable_increment_p (i)
>> @@ -3134,31 +3204,63 @@ insert_initializers (slsr_cand_t c)
>>          that block, the earliest one will be returned in WHERE.  */
>>        bb = nearest_common_dominator_for_cands (c, incr, &where);
>>
>> +      /* If the nominal stride has a different type than the recorded
>> +        stride type, build a cast from the nominal stride to that type.  */
>> +      if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type))
>> +       {
>> +         init_stride = make_temp_ssa_name (c->stride_type, NULL, "slsr");
>> +         cast_stmt = gimple_build_assign (init_stride, NOP_EXPR, c->stride);
>> +       }
>> +      else
>> +       init_stride = c->stride;
>> +
>>        /* Create a new SSA name to hold the initializer's value.  */
>> -      stride_type = TREE_TYPE (c->stride);
>> -      new_name = make_temp_ssa_name (stride_type, NULL, "slsr");
>> +      new_name = make_temp_ssa_name (c->stride_type, NULL, "slsr");
>>        incr_vec[i].initializer = new_name;
>>
>>        /* Create the initializer and insert it in the latest possible
>>          dominating position.  */
>> -      incr_tree = wide_int_to_tree (stride_type, incr);
>> +      incr_tree = wide_int_to_tree (c->stride_type, incr);
>>        init_stmt = gimple_build_assign (new_name, MULT_EXPR,
>> -                                      c->stride, incr_tree);
>> +                                      init_stride, incr_tree);
>>        if (where)
>>         {
>>           gimple_stmt_iterator gsi = gsi_for_stmt (where->cand_stmt);
>> +         location_t loc = gimple_location (where->cand_stmt);
>> +
>> +         if (cast_stmt)
>> +           {
>> +             gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
>> +             gimple_set_location (cast_stmt, loc);
>> +           }
>> +
>>           gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
>> -         gimple_set_location (init_stmt, gimple_location (where->cand_stmt));
>> +         gimple_set_location (init_stmt, loc);
>>         }
>>        else
>>         {
>>           gimple_stmt_iterator gsi = gsi_last_bb (bb);
>>           gimple *basis_stmt = lookup_cand (c->basis)->cand_stmt;
>> +         location_t loc = gimple_location (basis_stmt);
>>
>>           if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi)))
>> -           gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
>> +           {
>> +             if (cast_stmt)
>> +               {
>> +                 gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
>> +                 gimple_set_location (cast_stmt, loc);
>> +               }
>> +             gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
>> +           }
>>           else
>> -           gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT);
>> +           {
>> +             if (cast_stmt)
>> +               {
>> +                 gsi_insert_after (&gsi, cast_stmt, GSI_NEW_STMT);
>> +                 gimple_set_location (cast_stmt, loc);
>> +               }
>> +             gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT);
>> +           }
>>
>>           gimple_set_location (init_stmt, gimple_location (basis_stmt));
>>         }
>> @@ -3165,6 +3267,11 @@ insert_initializers (slsr_cand_t c)
>>
>>        if (dump_file && (dump_flags & TDF_DETAILS))
>>         {
>> +         if (cast_stmt)
>> +           {
>> +             fputs ("Inserting stride cast: ", dump_file);
>> +             print_gimple_stmt (dump_file, cast_stmt, 0, 0);
>> +           }
>>           fputs ("Inserting initializer: ", dump_file);
>>           print_gimple_stmt (dump_file, init_stmt, 0, 0);
>>         }
>> Index: gcc/testsuite/gcc.dg/tree-ssa/pr54245.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/tree-ssa/pr54245.c     (revision 241379)
>> +++ gcc/testsuite/gcc.dg/tree-ssa/pr54245.c     (working copy)
>> @@ -1,48 +0,0 @@
>> -/* { dg-do compile } */
>> -/* { dg-options "-O1 -fdump-tree-slsr-details" } */
>> -
>> -#include <stdio.h>
>> -
>> -#define W1  22725
>> -#define W2  21407
>> -#define W3  19266
>> -#define W6  8867
>> -
>> -void idct_row(short *row, int *dst)
>> -{
>> -    int a0, a1, b0, b1;
>> -
>> -    a0 = W1 * row[0];
>> -    a1 = a0;
>> -
>> -    a0 += W2 * row[2];
>> -    a1 += W6 * row[2];
>> -
>> -    b0 = W1 * row[1];
>> -    b1 = W3 * row[1];
>> -
>> -    dst[0] = a0 + b0;
>> -    dst[1] = a0 - b0;
>> -    dst[2] = a1 + b1;
>> -    dst[3] = a1 - b1;
>> -}
>> -
>> -static short block[8] = { 1, 2, 3, 4 };
>> -
>> -int main(void)
>> -{
>> -    int out[4];
>> -    int i;
>> -
>> -    idct_row(block, out);
>> -
>> -    for (i = 0; i < 4; i++)
>> -        printf("%d\n", out[i]);
>> -
>> -    return !(out[2] == 87858 && out[3] == 10794);
>> -}
>> -
>> -/* For now, disable inserting an initializer when the multiplication will
>> -   take place in a smaller type than originally.  This test may be deleted
>> -   in future when this case is handled more precisely.  */
>> -/* { dg-final { scan-tree-dump-times "Inserting initializer" 0 "slsr" { target { ! int16 } } } } */
>> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c      (revision 241379)
>> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c      (working copy)
>> @@ -17,7 +17,12 @@ f (int s, int *c)
>>    return x1 ? x2 : x3;
>>  }
>>
>> +/* Note that since some branch prediction heuristics changed, the
>> +   calculations of x2 and x3 are pushed downward into the legs
>> +   of the conditional, changing the code presented to SLSR.
>> +   However, this proves to be a useful test for introducing an
>> +   initializer with a cast, so we'll keep it as is.  */
>> +
>>  /* There are 4 ' * ' instances in the decls (since "int * iftmp.0;" is
>> -   added), 1 parm, 2 in the code.  The second one in the code can be
>> -   a widening mult.  */
>> -/* { dg-final { scan-tree-dump-times " w?\\* " 7 "optimized" } } */
>> +   added), 2 parms, 3 in the code.  */
>> +/* { dg-final { scan-tree-dump-times " \\* " 9 "optimized" } } */
>>
>>
Bill Schmidt Nov. 4, 2016, 2:36 p.m. UTC | #3
On Nov 4, 2016, at 9:10 AM, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> 
> 
> Hi,
> 
> Since this was committed, I'm seeing a failure in
>  gcc.dg/tree-ssa/slsr-8.c scan-tree-dump-times optimized " \\* " 9
> on aarch64 targets.

Hi Christophe,

Best open a bug report or I will likely lose this in the crush.  If you don't mind,
please include the generated assembly file so I can see why the number of *'s
is off, and provide the output from -fdump-tree-slsr-details.  I imagine this is
a minor issue.  Please CC me as wschmidt@gcc.gnu.org.

Thanks!
Bill
Christophe Lyon Nov. 4, 2016, 2:47 p.m. UTC | #4
On 4 November 2016 at 15:36, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:
> On Nov 4, 2016, at 9:10 AM, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>>
>>
>> Hi,
>>
>> Since this was committed, I'm seeing a failure in
>>  gcc.dg/tree-ssa/slsr-8.c scan-tree-dump-times optimized " \\* " 9
>> on aarch64 targets.
>
> Hi Christophe,
>
> Best open a bug report or I will likely lose this in the crush.  If you don't mind,
> please include the generated assembly file so I can see why the number of *'s
> is off, and provide the output from -fdump-tree-slsr-details.  I imagine this is
> a minor issue.  Please CC me as wschmidt@gcc.gnu.org.
>

OK, I've created pr78210 for this.

Christophe

> Thanks!
> Bill
diff mbox

Patch

Index: gcc/gimple-ssa-strength-reduction.c
===================================================================
--- gcc/gimple-ssa-strength-reduction.c	(revision 241379)
+++ gcc/gimple-ssa-strength-reduction.c	(working copy)
@@ -246,6 +246,13 @@  struct slsr_cand_d
      replacement MEM_REF.)  */
   tree cand_type;
 
+  /* The type to be used to interpret the stride field when the stride
+     is not a constant.  Normally the same as the type of the recorded
+     stride, but when the stride has been cast we need to maintain that
+     knowledge in order to make legal substitutions without losing 
+     precision.  When the stride is a constant, this will be sizetype.  */
+  tree stride_type;
+
   /* The kind of candidate (CAND_MULT, etc.).  */
   enum cand_kind kind;
 
@@ -502,6 +509,7 @@  find_basis_for_base_expr (slsr_cand_t c, tree base
 	  || one_basis->cand_stmt == c->cand_stmt
 	  || !operand_equal_p (one_basis->stride, c->stride, 0)
 	  || !types_compatible_p (one_basis->cand_type, c->cand_type)
+	  || !types_compatible_p (one_basis->stride_type, c->stride_type)
 	  || !dominated_by_p (CDI_DOMINATORS,
 			      gimple_bb (c->cand_stmt),
 			      gimple_bb (one_basis->cand_stmt)))
@@ -615,7 +623,7 @@  record_potential_basis (slsr_cand_t c, tree base)
 static slsr_cand_t
 alloc_cand_and_find_basis (enum cand_kind kind, gimple *gs, tree base,
 			   const widest_int &index, tree stride, tree ctype,
-			   unsigned savings)
+			   tree stype, unsigned savings)
 {
   slsr_cand_t c = (slsr_cand_t) obstack_alloc (&cand_obstack,
 					       sizeof (slsr_cand));
@@ -624,6 +632,7 @@  alloc_cand_and_find_basis (enum cand_kind kind, gi
   c->stride = stride;
   c->index = index;
   c->cand_type = ctype;
+  c->stride_type = stype;
   c->kind = kind;
   c->cand_num = cand_vec.length () + 1;
   c->next_interp = 0;
@@ -809,7 +818,8 @@  slsr_process_phi (gphi *phi, bool speed)
   base_type = TREE_TYPE (arg0_base);
 
   c = alloc_cand_and_find_basis (CAND_PHI, phi, arg0_base,
-				 0, integer_one_node, base_type, savings);
+				 0, integer_one_node, base_type,
+				 sizetype, savings);
 
   /* Add the candidate to the statement-candidate mapping.  */
   add_cand_for_stmt (phi, c);
@@ -838,7 +848,8 @@  backtrace_base_for_ref (tree *pbase)
      e.g. 'B' is widened from an 'int' in order to calculate
      a 64-bit address.  */
   if (CONVERT_EXPR_P (base_in)
-      && legal_cast_p_1 (base_in, TREE_OPERAND (base_in, 0)))
+      && legal_cast_p_1 (TREE_TYPE (base_in),
+			 TREE_TYPE (TREE_OPERAND (base_in, 0))))
     base_in = get_unwidened (base_in, NULL_TREE);
 
   if (TREE_CODE (base_in) != SSA_NAME)
@@ -995,7 +1006,7 @@  slsr_process_ref (gimple *gs)
     return;
 
   c = alloc_cand_and_find_basis (CAND_REF, gs, base, index, offset,
-				 type, 0);
+				 type, sizetype, 0);
 
   /* Add the candidate to the statement-candidate mapping.  */
   add_cand_for_stmt (gs, c);
@@ -1010,6 +1021,7 @@  static slsr_cand_t
 create_mul_ssa_cand (gimple *gs, tree base_in, tree stride_in, bool speed)
 {
   tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
+  tree stype = NULL_TREE;
   widest_int index;
   unsigned savings = 0;
   slsr_cand_t c;
@@ -1030,6 +1042,7 @@  create_mul_ssa_cand (gimple *gs, tree base_in, tre
 	  index = base_cand->index;
 	  stride = stride_in;
 	  ctype = base_cand->cand_type;
+	  stype = TREE_TYPE (stride_in);
 	  if (has_single_use (base_in))
 	    savings = (base_cand->dead_savings 
 		       + stmt_cost (base_cand->cand_stmt, speed));
@@ -1045,6 +1058,7 @@  create_mul_ssa_cand (gimple *gs, tree base_in, tre
 	  index = base_cand->index * wi::to_widest (base_cand->stride);
 	  stride = stride_in;
 	  ctype = base_cand->cand_type;
+	  stype = TREE_TYPE (stride_in);
 	  if (has_single_use (base_in))
 	    savings = (base_cand->dead_savings
 		       + stmt_cost (base_cand->cand_stmt, speed));
@@ -1064,10 +1078,11 @@  create_mul_ssa_cand (gimple *gs, tree base_in, tre
       index = 0;
       stride = stride_in;
       ctype = TREE_TYPE (base_in);
+      stype = TREE_TYPE (stride_in);
     }
 
   c = alloc_cand_and_find_basis (CAND_MULT, gs, base, index, stride,
-				 ctype, savings);
+				 ctype, stype, savings);
   return c;
 }
 
@@ -1156,7 +1171,7 @@  create_mul_imm_cand (gimple *gs, tree base_in, tre
     }
 
   c = alloc_cand_and_find_basis (CAND_MULT, gs, base, index, stride,
-				 ctype, savings);
+				 ctype, sizetype, savings);
   return c;
 }
 
@@ -1212,7 +1227,8 @@  static slsr_cand_t
 create_add_ssa_cand (gimple *gs, tree base_in, tree addend_in,
 		     bool subtract_p, bool speed)
 {
-  tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL;
+  tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
+  tree stype = NULL_TREE;
   widest_int index;
   unsigned savings = 0;
   slsr_cand_t c;
@@ -1237,6 +1253,7 @@  create_add_ssa_cand (gimple *gs, tree base_in, tre
 	    index = -index;
 	  stride = addend_cand->base_expr;
 	  ctype = TREE_TYPE (base_in);
+	  stype = addend_cand->cand_type;
 	  if (has_single_use (addend_in))
 	    savings = (addend_cand->dead_savings
 		       + stmt_cost (addend_cand->cand_stmt, speed));
@@ -1263,6 +1280,8 @@  create_add_ssa_cand (gimple *gs, tree base_in, tre
 	  index = subtract_p ? -1 : 1;
 	  stride = addend_in;
 	  ctype = base_cand->cand_type;
+	  stype = (TREE_CODE (addend_in) == INTEGER_CST ? sizetype
+		   : TREE_TYPE (addend_in));
 	  if (has_single_use (base_in))
 	    savings = (base_cand->dead_savings
 		       + stmt_cost (base_cand->cand_stmt, speed));
@@ -1286,6 +1305,7 @@  create_add_ssa_cand (gimple *gs, tree base_in, tre
 		  index = -index;
 		  stride = subtrahend_cand->base_expr;
 		  ctype = TREE_TYPE (base_in);
+		  stype = subtrahend_cand->cand_type;
 		  if (has_single_use (addend_in))
 		    savings = (subtrahend_cand->dead_savings 
 			       + stmt_cost (subtrahend_cand->cand_stmt, speed));
@@ -1312,10 +1332,12 @@  create_add_ssa_cand (gimple *gs, tree base_in, tre
       index = subtract_p ? -1 : 1;
       stride = addend_in;
       ctype = TREE_TYPE (base_in);
+      stype = (TREE_CODE (addend_in) == INTEGER_CST ? sizetype
+	       : TREE_TYPE (addend_in));
     }
 
   c = alloc_cand_and_find_basis (CAND_ADD, gs, base, index, stride,
-				 ctype, savings);
+				 ctype, stype, savings);
   return c;
 }
 
@@ -1329,6 +1351,7 @@  create_add_imm_cand (gimple *gs, tree base_in, con
 {
   enum cand_kind kind = CAND_ADD;
   tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
+  tree stype = NULL_TREE;
   widest_int index, multiple;
   unsigned savings = 0;
   slsr_cand_t c;
@@ -1356,6 +1379,7 @@  create_add_imm_cand (gimple *gs, tree base_in, con
 	  index = base_cand->index + multiple;
 	  stride = base_cand->stride;
 	  ctype = base_cand->cand_type;
+	  stype = base_cand->stride_type;
 	  if (has_single_use (base_in))
 	    savings = (base_cand->dead_savings 
 		       + stmt_cost (base_cand->cand_stmt, speed));
@@ -1376,10 +1400,11 @@  create_add_imm_cand (gimple *gs, tree base_in, con
       index = index_in;
       stride = integer_one_node;
       ctype = TREE_TYPE (base_in);
+      stype = sizetype;
     }
 
   c = alloc_cand_and_find_basis (kind, gs, base, index, stride,
-				 ctype, savings);
+				 ctype, stype, savings);
   return c;
 }
 
@@ -1456,14 +1481,11 @@  slsr_process_neg (gimple *gs, tree rhs1, bool spee
    for more details.  */
 
 static bool
-legal_cast_p_1 (tree lhs, tree rhs)
+legal_cast_p_1 (tree lhs_type, tree rhs_type)
 {
-  tree lhs_type, rhs_type;
   unsigned lhs_size, rhs_size;
   bool lhs_wraps, rhs_wraps;
 
-  lhs_type = TREE_TYPE (lhs);
-  rhs_type = TREE_TYPE (rhs);
   lhs_size = TYPE_PRECISION (lhs_type);
   rhs_size = TYPE_PRECISION (rhs_type);
   lhs_wraps = ANY_INTEGRAL_TYPE_P (lhs_type) && TYPE_OVERFLOW_WRAPS (lhs_type);
@@ -1521,7 +1543,7 @@  legal_cast_p (gimple *gs, tree rhs)
       || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (gs)))
     return false;
 
-  return legal_cast_p_1 (gimple_assign_lhs (gs), rhs);
+  return legal_cast_p_1 (TREE_TYPE (gimple_assign_lhs (gs)), TREE_TYPE (rhs));
 }
 
 /* Given GS which is a cast to a scalar integer type, determine whether
@@ -1556,7 +1578,8 @@  slsr_process_cast (gimple *gs, tree rhs1, bool spe
 	  c = alloc_cand_and_find_basis (base_cand->kind, gs,
 					 base_cand->base_expr,
 					 base_cand->index, base_cand->stride,
-					 ctype, savings);
+					 ctype, base_cand->stride_type,
+					 savings);
 	  if (base_cand->next_interp)
 	    base_cand = lookup_cand (base_cand->next_interp);
 	  else
@@ -1574,10 +1597,10 @@  slsr_process_cast (gimple *gs, tree rhs1, bool spe
 	 The first of these is somewhat arbitrary, but the choice of
 	 1 for the stride simplifies the logic for propagating casts
 	 into their uses.  */
-      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1,
-				     0, integer_one_node, ctype, 0);
-      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1,
-				      0, integer_one_node, ctype, 0);
+      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, 0,
+				     integer_one_node, ctype, sizetype, 0);
+      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, 0,
+				      integer_one_node, ctype, sizetype, 0);
       c->next_interp = c2->cand_num;
     }
 
@@ -1613,7 +1636,8 @@  slsr_process_copy (gimple *gs, tree rhs1, bool spe
 	  c = alloc_cand_and_find_basis (base_cand->kind, gs,
 					 base_cand->base_expr,
 					 base_cand->index, base_cand->stride,
-					 base_cand->cand_type, savings);
+					 base_cand->cand_type,
+					 base_cand->stride_type, savings);
 	  if (base_cand->next_interp)
 	    base_cand = lookup_cand (base_cand->next_interp);
 	  else
@@ -1631,10 +1655,12 @@  slsr_process_copy (gimple *gs, tree rhs1, bool spe
 	 The first of these is somewhat arbitrary, but the choice of
 	 1 for the stride simplifies the logic for propagating casts
 	 into their uses.  */
-      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1,
-				     0, integer_one_node, TREE_TYPE (rhs1), 0);
-      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1,
-				      0, integer_one_node, TREE_TYPE (rhs1), 0);
+      c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, 0,
+				     integer_one_node, TREE_TYPE (rhs1),
+				     sizetype, 0);
+      c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, 0,
+				      integer_one_node, TREE_TYPE (rhs1),
+				      sizetype, 0);
       c->next_interp = c2->cand_num;
     }
 
@@ -1755,6 +1781,13 @@  dump_candidate (slsr_cand_t c)
       fputs (" + ", dump_file);
       print_decs (c->index, dump_file);
       fputs (") * ", dump_file);
+      if (TREE_CODE (c->stride) != INTEGER_CST
+	  && c->stride_type != TREE_TYPE (c->stride))
+	{
+	  fputs ("(", dump_file);
+	  print_generic_expr (dump_file, c->stride_type, 0);
+	  fputs (")", dump_file);
+	}
       print_generic_expr (dump_file, c->stride, 0);
       fputs (" : ", dump_file);
       break;
@@ -1764,6 +1797,13 @@  dump_candidate (slsr_cand_t c)
       fputs (" + (", dump_file);
       print_decs (c->index, dump_file);
       fputs (" * ", dump_file);
+      if (TREE_CODE (c->stride) != INTEGER_CST
+	  && c->stride_type != TREE_TYPE (c->stride))
+	{
+	  fputs ("(", dump_file);
+	  print_generic_expr (dump_file, c->stride_type, 0);
+	  fputs (")", dump_file);
+	}
       print_generic_expr (dump_file, c->stride, 0);
       fputs (") : ", dump_file);
       break;
@@ -2143,7 +2183,7 @@  create_add_on_incoming_edge (slsr_cand_t c, tree b
   basic_block insert_bb;
   gimple_stmt_iterator gsi;
   tree lhs, basis_type;
-  gassign *new_stmt;
+  gassign *new_stmt, *cast_stmt = NULL;
 
   /* If the add candidate along this incoming edge has the same
      index as C's hidden basis, the hidden basis represents this
@@ -2187,13 +2227,27 @@  create_add_on_incoming_edge (slsr_cand_t c, tree b
 	  new_stmt = gimple_build_assign (lhs, code, basis_name,
 					  incr_vec[i].initializer);
 	}
-      else if (increment == 1)
-	new_stmt = gimple_build_assign (lhs, plus_code, basis_name, c->stride);
-      else if (increment == -1)
-	new_stmt = gimple_build_assign (lhs, MINUS_EXPR, basis_name,
-					c->stride);
-      else
-	gcc_unreachable ();
+      else {
+	tree stride;
+
+	if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type))
+	  {
+	    tree cast_stride = make_temp_ssa_name (c->stride_type, NULL,
+						   "slsr");
+	    cast_stmt = gimple_build_assign (cast_stride, NOP_EXPR,
+					     c->stride);
+	    stride = cast_stride;
+	  }
+	else
+	  stride = c->stride;
+
+	if (increment == 1)
+	  new_stmt = gimple_build_assign (lhs, plus_code, basis_name, stride);
+	else if (increment == -1)
+	  new_stmt = gimple_build_assign (lhs, MINUS_EXPR, basis_name, stride);
+	else
+	  gcc_unreachable ();
+      }
     }
 
   insert_bb = single_succ_p (e->src) ? e->src : split_edge (e);
@@ -2200,14 +2254,34 @@  create_add_on_incoming_edge (slsr_cand_t c, tree b
   gsi = gsi_last_bb (insert_bb);
 
   if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi)))
-    gsi_insert_before (&gsi, new_stmt, GSI_NEW_STMT);
+    {
+      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+      if (cast_stmt)
+	{
+	  gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
+	  gimple_set_location (cast_stmt, loc);
+	}
+    }
   else
-    gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
+    {
+      if (cast_stmt)
+	{
+	  gsi_insert_after (&gsi, cast_stmt, GSI_NEW_STMT);
+	  gimple_set_location (cast_stmt, loc);
+	}
+      gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
+    }
 
   gimple_set_location (new_stmt, loc);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
+      if (cast_stmt)
+	{
+	  fprintf (dump_file, "Inserting cast in block %d: ",
+		   insert_bb->index);
+	  print_gimple_stmt (dump_file, cast_stmt, 0, 0);
+	}
       fprintf (dump_file, "Inserting in block %d: ", insert_bb->index);
       print_gimple_stmt (dump_file, new_stmt, 0, 0);
     }
@@ -2825,40 +2899,35 @@  analyze_increments (slsr_cand_t first_dep, machine
 		   && !POINTER_TYPE_P (first_dep->cand_type)))
 	incr_vec[i].cost = COST_NEUTRAL;
 
-      /* FORNOW: If we need to add an initializer, give up if a cast from
-	 the candidate's type to its stride's type can lose precision.
-	 This could eventually be handled better by expressly retaining the
-	 result of a cast to a wider type in the stride.  Example:
+      /* If we need to add an initializer, give up if a cast from the
+	 candidate's type to its stride's type can lose precision.
+	 Note that this already takes into account that the stride may
+	 have been cast to a wider type, in which case this test won't
+	 fire.  Example:
 
            short int _1;
 	   _2 = (int) _1;
 	   _3 = _2 * 10;
-	   _4 = x + _3;    ADD: x + (10 * _1) : int
+	   _4 = x + _3;    ADD: x + (10 * (int)_1) : int
 	   _5 = _2 * 15;
-	   _6 = x + _3;    ADD: x + (15 * _1) : int
+	   _6 = x + _5;    ADD: x + (15 * (int)_1) : int
 
-         Right now replacing _6 would cause insertion of an initializer
-	 of the form "short int T = _1 * 5;" followed by a cast to 
-	 int, which could overflow incorrectly.  Had we recorded _2 or
-	 (int)_1 as the stride, this wouldn't happen.  However, doing
-         this breaks other opportunities, so this will require some
-	 care.  */
+	 Although the stride was a short int initially, the stride
+	 used in the analysis has been widened to an int, and such
+	 widening will be done in the initializer as well.  */
       else if (!incr_vec[i].initializer
 	       && TREE_CODE (first_dep->stride) != INTEGER_CST
-	       && !legal_cast_p_1 (first_dep->stride,
-				   gimple_assign_lhs (first_dep->cand_stmt)))
-
+	       && !legal_cast_p_1 (first_dep->stride_type,
+				   TREE_TYPE (gimple_assign_lhs
+					      (first_dep->cand_stmt))))
 	incr_vec[i].cost = COST_INFINITE;
 
       /* If we need to add an initializer, make sure we don't introduce
 	 a multiply by a pointer type, which can happen in certain cast
-	 scenarios.  FIXME: When cleaning up these cast issues, we can
-         afford to introduce the multiply provided we cast out to an
-         unsigned int of appropriate size.  */
+	 scenarios.  */
       else if (!incr_vec[i].initializer
 	       && TREE_CODE (first_dep->stride) != INTEGER_CST
-	       && POINTER_TYPE_P (TREE_TYPE (first_dep->stride)))
-
+	       && POINTER_TYPE_P (first_dep->stride_type))
 	incr_vec[i].cost = COST_INFINITE;
 
       /* For any other increment, if this is a multiply candidate, we
@@ -3105,7 +3174,8 @@  insert_initializers (slsr_cand_t c)
       basic_block bb;
       slsr_cand_t where = NULL;
       gassign *init_stmt;
-      tree stride_type, new_name, incr_tree;
+      gassign *cast_stmt = NULL;
+      tree new_name, incr_tree, init_stride;
       widest_int incr = incr_vec[i].incr;
 
       if (!profitable_increment_p (i)
@@ -3134,31 +3204,63 @@  insert_initializers (slsr_cand_t c)
 	 that block, the earliest one will be returned in WHERE.  */
       bb = nearest_common_dominator_for_cands (c, incr, &where);
 
+      /* If the nominal stride has a different type than the recorded
+	 stride type, build a cast from the nominal stride to that type.  */
+      if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type))
+	{
+	  init_stride = make_temp_ssa_name (c->stride_type, NULL, "slsr");
+	  cast_stmt = gimple_build_assign (init_stride, NOP_EXPR, c->stride);
+	}
+      else
+	init_stride = c->stride;
+
       /* Create a new SSA name to hold the initializer's value.  */
-      stride_type = TREE_TYPE (c->stride);
-      new_name = make_temp_ssa_name (stride_type, NULL, "slsr");
+      new_name = make_temp_ssa_name (c->stride_type, NULL, "slsr");
       incr_vec[i].initializer = new_name;
 
       /* Create the initializer and insert it in the latest possible
 	 dominating position.  */
-      incr_tree = wide_int_to_tree (stride_type, incr);
+      incr_tree = wide_int_to_tree (c->stride_type, incr);
       init_stmt = gimple_build_assign (new_name, MULT_EXPR,
-				       c->stride, incr_tree);
+				       init_stride, incr_tree);
       if (where)
 	{
 	  gimple_stmt_iterator gsi = gsi_for_stmt (where->cand_stmt);
+	  location_t loc = gimple_location (where->cand_stmt);
+
+	  if (cast_stmt)
+	    {
+	      gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
+	      gimple_set_location (cast_stmt, loc);
+	    }
+
 	  gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
-	  gimple_set_location (init_stmt, gimple_location (where->cand_stmt));
+	  gimple_set_location (init_stmt, loc);
 	}
       else
 	{
 	  gimple_stmt_iterator gsi = gsi_last_bb (bb);
 	  gimple *basis_stmt = lookup_cand (c->basis)->cand_stmt;
+	  location_t loc = gimple_location (basis_stmt);
 
 	  if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi)))
-	    gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
+	    {
+	      if (cast_stmt)
+		{
+		  gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
+		  gimple_set_location (cast_stmt, loc);
+		}
+	      gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
+	    }
 	  else
-	    gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT);
+	    {
+	      if (cast_stmt)
+		{
+		  gsi_insert_after (&gsi, cast_stmt, GSI_NEW_STMT);
+		  gimple_set_location (cast_stmt, loc);
+		}
+	      gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT);
+	    }
 
 	  gimple_set_location (init_stmt, gimple_location (basis_stmt));
 	}
@@ -3165,6 +3267,11 @@  insert_initializers (slsr_cand_t c)
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
+	  if (cast_stmt)
+	    {
+	      fputs ("Inserting stride cast: ", dump_file);
+	      print_gimple_stmt (dump_file, cast_stmt, 0, 0);
+	    }
 	  fputs ("Inserting initializer: ", dump_file);
 	  print_gimple_stmt (dump_file, init_stmt, 0, 0);
 	}
Index: gcc/testsuite/gcc.dg/tree-ssa/pr54245.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr54245.c	(revision 241379)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr54245.c	(working copy)
@@ -1,48 +0,0 @@ 
-/* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-slsr-details" } */
-
-#include <stdio.h>
-
-#define W1  22725
-#define W2  21407
-#define W3  19266
-#define W6  8867
-
-void idct_row(short *row, int *dst)
-{
-    int a0, a1, b0, b1;
-
-    a0 = W1 * row[0];
-    a1 = a0;
-
-    a0 += W2 * row[2];
-    a1 += W6 * row[2];
-
-    b0 = W1 * row[1];
-    b1 = W3 * row[1];
-
-    dst[0] = a0 + b0;
-    dst[1] = a0 - b0;
-    dst[2] = a1 + b1;
-    dst[3] = a1 - b1;
-}
-
-static short block[8] = { 1, 2, 3, 4 };
-
-int main(void)
-{
-    int out[4];
-    int i;
-
-    idct_row(block, out);
-
-    for (i = 0; i < 4; i++)
-        printf("%d\n", out[i]);
-
-    return !(out[2] == 87858 && out[3] == 10794);
-}
-
-/* For now, disable inserting an initializer when the multiplication will
-   take place in a smaller type than originally.  This test may be deleted
-   in future when this case is handled more precisely.  */
-/* { dg-final { scan-tree-dump-times "Inserting initializer" 0 "slsr" { target { ! int16 } } } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c	(revision 241379)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c	(working copy)
@@ -17,7 +17,12 @@  f (int s, int *c)
   return x1 ? x2 : x3;
 }
 
+/* Note that since some branch prediction heuristics changed, the
+   calculations of x2 and x3 are pushed downward into the legs
+   of the conditional, changing the code presented to SLSR.
+   However, this proves to be a useful test for introducing an
+   initializer with a cast, so we'll keep it as is.  */
+
 /* There are 4 ' * ' instances in the decls (since "int * iftmp.0;" is
-   added), 1 parm, 2 in the code.  The second one in the code can be
-   a widening mult.  */
-/* { dg-final { scan-tree-dump-times " w?\\* " 7 "optimized" } } */
+   added), 2 parms, 3 in the code.  */
+/* { dg-final { scan-tree-dump-times " \\* " 9 "optimized" } } */