Fix PR46556 (poor address generation)

Submitter William J. Schmidt
Date Oct. 8, 2011, 2:32 p.m.
Message ID <1318084337.18738.12.camel@gnopaine>
Permalink /patch/118543/
State New

Comments

William J. Schmidt - Oct. 8, 2011, 2:32 p.m.
Greetings,

Here are the revised changes for the tree portions of the patch.  I've
attempted to resolve all comments to date on those portions.  Per
Steven's comment, I moved copy_ref_info into tree-ssa-address.c; let me
know if there's a better place, or whether you'd prefer to leave it
where it was.

I looked into changing the second reassoc pass to use a different
pass_late_reassoc entry, but this impacted the test suite.  There are
about 20 tests that rely on -fdump-tree-reassoc being associated with
two dump files named reassoc1 and reassoc2.  Rather than change all
these test cases for a temporary solution, I chose to use the deprecated
first_pass_instance boolean to distinguish between the two passes.  I
marked this as a Bad Thing and it will be removed once I have time to
work on the straight-line strength reducer.

I looked into adding a test case with a negative offset, but was unable
to come up with a construct that would have a negative offset on the
base MEM_REF and still be recognized by this particular pattern matcher.
In any case, the use of double_ints throughout should remove that
concern.

Thanks,
Bill


2011-10-08  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

gcc:

	PR rtl-optimization/46556
	* tree.h (copy_ref_info): Expose existing function.
	* tree-ssa-loop-ivopts.c (copy_ref_info): Move this function to...
	* tree-ssa-address.c (copy_ref_info): ...here, and remove static token.
	* tree-ssa-reassoc.c (restructure_base_and_offset): New function.
	(restructure_mem_ref): Likewise.
	(reassociate_bb): Look for opportunities to call restructure_mem_ref.

gcc/testsuite:

	PR rtl-optimization/46556
	* gcc.dg/tree-ssa/pr46556-1.c: New testcase.
	* gcc.dg/tree-ssa/pr46556-2.c: Likewise.
	* gcc.dg/tree-ssa/pr46556-3.c: Likewise.
Richard Guenther - Oct. 11, 2011, 11:40 a.m.
On Sat, 8 Oct 2011, William J. Schmidt wrote:

> Greetings,
> 
> Here are the revised changes for the tree portions of the patch.  I've
> attempted to resolve all comments to date on those portions.  Per
> Steven's comment, I moved copy_ref_info into tree-ssa-address.c; let me
> know if there's a better place, or whether you'd prefer to leave it
> where it was.
> 
> I looked into changing the second reassoc pass to use a different
> pass_late_reassoc entry, but this impacted the test suite.  There are
> about 20 tests that rely on -fdump-tree-reassoc being associated with
> two dump files named reassoc1 and reassoc2.  Rather than change all
> these test cases for a temporary solution, I chose to use the deprecated
> first_pass_instance boolean to distinguish between the two passes.  I
> marked this as a Bad Thing and it will be removed once I have time to
> work on the straight-line strength reducer.
> 
> I looked into adding a test case with a negative offset, but was unable
> to come up with a construct that would have a negative offset on the
> base MEM_REF and still be recognized by this particular pattern matcher.
> In any case, the use of double_ints throughout should remove that
> concern.

Comments below.

> Thanks,
> Bill
> 
> 
> 2011-10-08  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> gcc:
> 
> 	PR rtl-optimization/46556
> 	* tree.h (copy_ref_info): Expose existing function.
> 	* tree-ssa-loop-ivopts.c (copy_ref_info): Move this function to...
> 	* tree-ssa-address.c (copy_ref_info): ...here, and remove static token.
> 	* tree-ssa-reassoc.c (restructure_base_and_offset): New function.
> 	(restructure_mem_ref): Likewise.
> 	(reassociate_bb): Look for opportunities to call restructure_mem_ref.
> 
> gcc/testsuite:
> 
> 	PR rtl-optimization/46556
> 	* gcc.dg/tree-ssa/pr46556-1.c: New testcase.
> 	* gcc.dg/tree-ssa/pr46556-2.c: Likewise.
> 	* gcc.dg/tree-ssa/pr46556-3.c: Likewise.
> 
> 
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h	(revision 179708)
> +++ gcc/tree.h	(working copy)
> @@ -5777,6 +5777,7 @@ tree target_for_debug_bind (tree);
>  /* In tree-ssa-address.c.  */
>  extern tree tree_mem_ref_addr (tree, tree);
>  extern void copy_mem_ref_info (tree, tree);
> +extern void copy_ref_info (tree, tree);
>  
>  /* In tree-vrp.c */
>  extern bool ssa_name_nonnegative_p (const_tree);
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dom2" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 2 "dom2" } } */
> +/* { dg-final { cleanup-tree-dump "dom2" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dom2" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +  if (n > 12)
> +    foo (p->a[n], p->c[n], p->b[n]);
> +  else if (n > 3)
> +    foo (p->b[n], p->a[n], p->c[n]);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "dom2" } } */
> +/* { dg-final { cleanup-tree-dump "dom2" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dom2" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +  if (n > 3)
> +    {
> +      foo (p->a[n], p->c[n], p->b[n]);
> +      if (n > 12)
> +	foo (p->b[n], p->a[n], p->c[n]);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "dom2" } } */
> +/* { dg-final { cleanup-tree-dump "dom2" } } */
> Index: gcc/tree-ssa-loop-ivopts.c
> ===================================================================
> --- gcc/tree-ssa-loop-ivopts.c	(revision 179708)
> +++ gcc/tree-ssa-loop-ivopts.c	(working copy)
> @@ -6278,64 +6278,6 @@ rewrite_use_nonlinear_expr (struct ivopts_data *da
>      }
>  }
>  
> -/* Copies the reference information from OLD_REF to NEW_REF.  */
> -
> -static void
> -copy_ref_info (tree new_ref, tree old_ref)
> -{
> -  tree new_ptr_base = NULL_TREE;
> -
> -  TREE_SIDE_EFFECTS (new_ref) = TREE_SIDE_EFFECTS (old_ref);
> -  TREE_THIS_VOLATILE (new_ref) = TREE_THIS_VOLATILE (old_ref);
> -
> -  new_ptr_base = TREE_OPERAND (new_ref, 0);
> -
> -  /* We can transfer points-to information from an old pointer
> -     or decl base to the new one.  */
> -  if (new_ptr_base
> -      && TREE_CODE (new_ptr_base) == SSA_NAME
> -      && !SSA_NAME_PTR_INFO (new_ptr_base))
> -    {
> -      tree base = get_base_address (old_ref);
> -      if (!base)
> -	;
> -      else if ((TREE_CODE (base) == MEM_REF
> -		|| TREE_CODE (base) == TARGET_MEM_REF)
> -	       && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
> -	       && SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)))
> -	{
> -	  struct ptr_info_def *new_pi;
> -	  duplicate_ssa_name_ptr_info
> -	    (new_ptr_base, SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)));
> -	  new_pi = SSA_NAME_PTR_INFO (new_ptr_base);
> -	  /* We have to be careful about transfering alignment information.  */
> -	  if (TREE_CODE (old_ref) == MEM_REF
> -	      && !(TREE_CODE (new_ref) == TARGET_MEM_REF
> -		   && (TMR_INDEX2 (new_ref)
> -		       || (TMR_STEP (new_ref)
> -			   && (TREE_INT_CST_LOW (TMR_STEP (new_ref))
> -			       < new_pi->align)))))
> -	    {
> -	      new_pi->misalign += double_int_sub (mem_ref_offset (old_ref),
> -						  mem_ref_offset (new_ref)).low;
> -	      new_pi->misalign &= (new_pi->align - 1);
> -	    }
> -	  else
> -	    {
> -	      new_pi->align = 1;
> -	      new_pi->misalign = 0;
> -	    }
> -	}
> -      else if (TREE_CODE (base) == VAR_DECL
> -	       || TREE_CODE (base) == PARM_DECL
> -	       || TREE_CODE (base) == RESULT_DECL)
> -	{
> -	  struct ptr_info_def *pi = get_ptr_info (new_ptr_base);
> -	  pt_solution_set_var (&pi->pt, base);
> -	}
> -    }
> -}
> -
>  /* Performs a peephole optimization to reorder the iv update statement with
>     a mem ref to enable instruction combining in later phases. The mem ref uses
>     the iv value before the update, so the reordering transformation requires
> Index: gcc/tree-ssa-address.c
> ===================================================================
> --- gcc/tree-ssa-address.c	(revision 179708)
> +++ gcc/tree-ssa-address.c	(working copy)
> @@ -832,6 +832,64 @@ copy_mem_ref_info (tree to, tree from)
>    TREE_THIS_VOLATILE (to) = TREE_THIS_VOLATILE (from);
>  }
>  
> +/* Copies the reference information from OLD_REF to NEW_REF.  */

Please add something like "NEW_REF is supposed to be either a MEM_REF
or a TARGET_MEM_REF."  You can check in the patch moving this
function separately.

> +
> +void
> +copy_ref_info (tree new_ref, tree old_ref)
> +{
> +  tree new_ptr_base = NULL_TREE;
> +
> +  TREE_SIDE_EFFECTS (new_ref) = TREE_SIDE_EFFECTS (old_ref);
> +  TREE_THIS_VOLATILE (new_ref) = TREE_THIS_VOLATILE (old_ref);

this function fails to transfer TREE_THIS_NOTRAP, which is supposed
to be set on the base of old_ref or any contained ARRAY[_RANGE]_REF.
If you make the function generic, please adjust it to at least do ...

> +  new_ptr_base = TREE_OPERAND (new_ref, 0);
> +
> +  /* We can transfer points-to information from an old pointer
> +     or decl base to the new one.  */
> +  if (new_ptr_base
> +      && TREE_CODE (new_ptr_base) == SSA_NAME
> +      && !SSA_NAME_PTR_INFO (new_ptr_base))
> +    {
> +      tree base = get_base_address (old_ref);
> +      if (!base)
> +	;
> +      else if ((TREE_CODE (base) == MEM_REF
> +		|| TREE_CODE (base) == TARGET_MEM_REF)
> +	       && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
> +	       && SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)))
> +	{
> +	  struct ptr_info_def *new_pi;
> +	  duplicate_ssa_name_ptr_info
> +	    (new_ptr_base, SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)));
> +	  new_pi = SSA_NAME_PTR_INFO (new_ptr_base);
> +	  /* We have to be careful about transfering alignment information.  */
> +	  if (TREE_CODE (old_ref) == MEM_REF
> +	      && !(TREE_CODE (new_ref) == TARGET_MEM_REF
> +		   && (TMR_INDEX2 (new_ref)
> +		       || (TMR_STEP (new_ref)
> +			   && (TREE_INT_CST_LOW (TMR_STEP (new_ref))
> +			       < new_pi->align)))))
> +	    {
> +	      new_pi->misalign += double_int_sub (mem_ref_offset (old_ref),
> +						  mem_ref_offset (new_ref)).low;
> +	      new_pi->misalign &= (new_pi->align - 1);
> +	    }
> +	  else
> +	    {
> +	      new_pi->align = 1;
> +	      new_pi->misalign = 0;
> +	    }

...

          TREE_THIS_NOTRAP (new_ref) = TREE_THIS_NOTRAP (base);

> +	}
> +      else if (TREE_CODE (base) == VAR_DECL
> +	       || TREE_CODE (base) == PARM_DECL
> +	       || TREE_CODE (base) == RESULT_DECL)
> +	{
> +	  struct ptr_info_def *pi = get_ptr_info (new_ptr_base);
> +	  pt_solution_set_var (&pi->pt, base);
> +	}
> +    }
> +}
> +
>  /* Move constants in target_mem_ref REF to offset.  Returns the new target
>     mem ref if anything changes, NULL_TREE otherwise.  */
>  
> Index: gcc/tree-ssa-reassoc.c
> ===================================================================
> --- gcc/tree-ssa-reassoc.c	(revision 179708)
> +++ gcc/tree-ssa-reassoc.c	(working copy)
> @@ -2815,6 +2815,121 @@ break_up_subtract_bb (basic_block bb)
>      break_up_subtract_bb (son);
>  }
>  
> +/* Given BASE and OFFSET derived from *EXPR occurring in stmt *GSI,
> +   determine whether there is a profitable opportunity to restructure
> +   address arithmetic within BASE and OFFSET.  If so, produce such
> +   a restructuring and return it.  */
> +
> +static tree
> +restructure_base_and_offset (tree *expr, gimple_stmt_iterator *gsi,

You are not changing *expr, so don't pass it as reference.

> +			     tree base, tree offset, HOST_WIDE_INT bitpos)
> +{
> +  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
> +  double_int c, c1, c2, c3, c4;
> +
> +  /* Look for the following pattern:
> +
> +       base   = MEM_REF (T1, C1);
> +       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> +       bitpos = C4 * BITS_PER_UNIT
> +
> +     If found, convert it to:
> +
> +       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> +                C1 + (C2 * C3) + C4)
> +   */
> +  if (!base
> +      || !offset
> +      || TREE_CODE (base) != MEM_REF
> +      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST

operand 1 of a MEM_REF is always an INTEGER_CST.

> +      || TREE_CODE (offset) != MULT_EXPR)
> +    return NULL_TREE;
> +
> +  mult_op0 = TREE_OPERAND (offset, 0);
> +  mult_op1 = TREE_OPERAND (offset, 1);
> +
> +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> +      || TREE_CODE (mult_op1) != INTEGER_CST
> +      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> +    return NULL_TREE;
> +
> +  t1 = TREE_OPERAND (base, 0);
> +  t2 = TREE_OPERAND (mult_op0, 0);
> +  c1 = TREE_INT_CST (TREE_OPERAND (base, 1));

mem_ref_offset (base)

> +  c2 = TREE_INT_CST (TREE_OPERAND (mult_op0, 1));
> +  c3 = TREE_INT_CST (mult_op1);

tree_to_double_int ()

> +  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);

You don't verify that bitpos % BITS_PER_UNIT is zero anywhere.

> +  c = double_int_add (double_int_add (c1, double_int_mul (c2, c3)), c4);
> +
> +  if (!double_int_fits_in_shwi_p (c)
> +      || !double_int_fits_in_shwi_p (c3))
> +    return NULL_TREE;

I think those tests are not necessary if you use ...

> +  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
> +
> +  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2, 
> +			   build_int_cst (sizetype, double_int_to_shwi (c3)));

double_int_to_tree (sizetype, c3)

> +  mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
> +					true, GSI_SAME_STMT);
> +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> +  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
> +				       true, GSI_SAME_STMT);
> +  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
> +			 build_int_cst (offset_type, double_int_to_shwi (c)));

double_int_to_tree (offset_type, c)

Please delay gimplification to the caller; that way this function
solely operates on the trees returned from get_inner_reference.
Or are you concerned that fold might undo your association?

> +  return mem_ref;
> +}
> +
> +/* If *EXPR contains a memory reference, determine whether there is an
> +   opportunity to restructure the base and offset expressions for
> +   improved performance.  */
> +
> +static bool
> +restructure_mem_ref (tree *expr, gimple_stmt_iterator *gsi)
> +{
> +  enum tree_code code = TREE_CODE (*expr);
> +  tree base, offset, mem_ref;
> +  HOST_WIDE_INT bitsize, bitpos;
> +  enum machine_mode mode;
> +  int unsignedp, volatilep;
> +
> +  /* Only expressions that reference memory are of interest.  Bitfield
> +     references should eventually be lowered prior to this point and
> +     are not dealt with.  Skip BLKmode aggregates as well.  */
> +  if (! handled_component_p (*expr)
> +      || code == BIT_FIELD_REF
> +      || (code == COMPONENT_REF && DECL_BIT_FIELD (TREE_OPERAND (*expr, 1)))
> +      || TYPE_MODE (TREE_TYPE (*expr)) == BLKmode)
> +    return false;
> +
> +  /* Determine whether the expression can be represented with base and
> +     offset components.  */
> +  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
> +			      &unsignedp, &volatilep, false);
> +  if (!base || !offset)
> +    return false;
> +
> +  /* Look for a restructuring opportunity.  */
> +  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
> +					      offset, bitpos)) == NULL_TREE)
> +    return false;
> +
> +  /* Found one.  Record the replacement in the dump file and complete
> +     the replacement.  */
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "\nOriginal_expr = ");
> +      print_generic_expr (dump_file, *expr, TDF_VOPS|TDF_MEMSYMS);
> +      fprintf (dump_file, "\nmem_ref = ");
> +      print_generic_expr (dump_file, mem_ref, TDF_VOPS|TDF_MEMSYMS);
> +      fprintf (dump_file, "\n\n");
> +    }

Thus gimplify and add the statements here.

> +  copy_ref_info (mem_ref, *expr);
> +  *expr = mem_ref;
> +
> +  return true;
> +}
> +
>  /* Reassociate expressions in basic block BB and its post-dominator as
>     children.  */
>  
> @@ -2823,14 +2938,43 @@ reassociate_bb (basic_block bb)
>  {
>    gimple_stmt_iterator gsi;
>    basic_block son;
> +  bool in_loop = loop_depth (bb->loop_father) != 0;
>  
>    for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
>      {
>        gimple stmt = gsi_stmt (gsi);
>  
> -      if (is_gimple_assign (stmt)
> -	  && !stmt_could_throw_p (stmt))
> +      /* During late reassociation only, look for restructuring
> +	 opportunities within an expression that references memory.
> +	 We only do this for blocks not contained in loops, since the
> +	 ivopts machinery does a good job on loop expressions, and we
> +	 don't want to interfere with other loop optimizations.  */

I'm not sure I buy this.  IVOPTs would have produced [TARGET_]MEM_REFs
which you don't handle.  Did you do any measurements of what happens if
you enable it generally?

> +      /* TODO: This belongs more properly in a separate pass that
> +	 performs general strength reduction on straight-line code.
> +	 Eventually move this there.  */
> +      if (!first_pass_instance /* TODO: deprecated  */
> +	  && !in_loop
> +	  && gimple_vuse (stmt)
> +	  && gimple_assign_single_p (stmt))
>  	{
> +	  tree *lhs, *rhs;
> +	  if (gimple_vdef (stmt))
> +	    {
> +	      lhs = gimple_assign_lhs_ptr (stmt);
> +	      if (restructure_mem_ref (lhs, &gsi))
> +		update_stmt (stmt);
> +	    }
> +	  else if (gimple_vuse (stmt))

That will always be true.


> +	    {
> +	      rhs = gimple_assign_rhs1_ptr (stmt);
> +	      if (restructure_mem_ref (rhs, &gsi))
> +		update_stmt (stmt);
> +	    }
> +	}
> +
> +      else if (is_gimple_assign (stmt)
> +	       && !stmt_could_throw_p (stmt))
> +	{
>  	  tree lhs, rhs1, rhs2;
>  	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);

You verified the patch has no performance degradations on ppc64
for SPEC CPU; did you see any improvements?

The pattern matching is still very ad-hoc and doesn't consider
statements that feed the base address.  There is conceptually
no difference between p->a[n] and *(p + n * 4).  You placed this
lowering in reassoc to catch CSE opportunities with DOM, right?
Does RTL CSE not do its job or is the transform undone by
fwprop before it gets a chance to do it?

Thanks,
Richard.
Paolo Bonzini - Oct. 11, 2011, 1:07 p.m.
On 10/11/2011 01:40 PM, Richard Guenther wrote:
> The pattern matching is still very ad-hoc and doesn't consider
> statements that feed the base address.  There is conceptually
> no difference between p->a[n] and *(p + n * 4).  You placed this
> lowering in reassoc to catch CSE opportunities with DOM, right?
> Does RTL CSE not do its job or is the transform undone by
> fwprop before it gets a chance to do it?

The idea (with the patch CSE I sent William) is that fwprop will try to 
be as greedy as possible, within the bounds of addresses allowed by the 
target.  Then CSE will fix the cases where fwprop could only do half of 
the job (and hopefully, scheduling will produce good code even when CSE 
chooses to go back to simple addressing modes).

If I understand correctly, these reassociations are all working on 
addressing modes that the target does not support, hence they fall 
outside the scope of both fwprop and CSE.

Paolo
William J. Schmidt - Oct. 11, 2011, 2:12 p.m.
Hi Richard,

Thanks for the comments -- a few responses below.

On Tue, 2011-10-11 at 13:40 +0200, Richard Guenther wrote:
> On Sat, 8 Oct 2011, William J. Schmidt wrote:
> 

<snip>

> 
> > +  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);
> 
> You don't verify that bitpos % BITS_PER_UNIT is zero anywhere.

I'll add a check in the caller.  I was thinking this was unnecessary
since I had excluded bitfield operations, but on reflection that may not
be sufficient.

<snip>

> 
> > +  mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
> > +					true, GSI_SAME_STMT);
> > +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> > +  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
> > +				       true, GSI_SAME_STMT);
> > +  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
> > +			 build_int_cst (offset_type, double_int_to_shwi (c)));
> 
> double_int_to_tree (offset_type, c)
> 
> Please delay gimplification to the caller; that way this function
> solely operates on the trees returned from get_inner_reference.
> Or are you concerned that fold might undo your association?

I'll try that.  I was just basing this on some suggestions you had made
earlier; I don't believe there is any problem with delaying it.

<snip>

> >  
> >    for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
> >      {
> >        gimple stmt = gsi_stmt (gsi);
> >  
> > -      if (is_gimple_assign (stmt)
> > -	  && !stmt_could_throw_p (stmt))
> > +      /* During late reassociation only, look for restructuring
> > +	 opportunities within an expression that references memory.
> > +	 We only do this for blocks not contained in loops, since the
> > +	 ivopts machinery does a good job on loop expressions, and we
> > +	 don't want to interfere with other loop optimizations.  */
> 
> I'm not sure I buy this.  IVOPTs would have produced [TARGET_]MEM_REFs
> which you don't handle.  Did you do any measurements of what happens if
> you enable it generally?

Actually I agree with you -- in an earlier iteration this was still
enabled for reassoc1 ahead of loop optimizations and was causing
degradations.  So long as it doesn't occur early it should be fine to do
everywhere now, and catch non-ivar cases in loops.

<snip>

> You verified the patch has no performance degradations on ppc64
> for SPEC CPU; did you see any improvements?
> 

Yes, a few in the 2-3% range.  Nothing stellar.

> The pattern matching is still very ad-hoc and doesn't consider
> statements that feed the base address.  There is conceptually
> no difference between p->a[n] and *(p + n * 4).

That's true.  Since we abandoned the general address-lowering approach,
this was aimed at the specific pattern that comes up frequently in
practice.  I would expect the *(p + n * 4) cases to be handled by the
general straight-line strength reduction, which is the correct long-term
approach.  (Cases like p->a[n], where the multiplication is not yet
explicit, will be a bit of a wart as part of strength reduction, too,
but that's still the right place for it eventually.)

>   You placed this
> lowering in reassoc to catch CSE opportunities with DOM, right?
> Does RTL CSE not do its job or is the transform undone by
> fwprop before it gets a chance to do it?

I think with Paolo's suggested patch for RTL CSE, this could be moved
back to expand.  I will have to experiment with it again to make sure.
If so, that would certainly be my preference as well.

(Or having the whole problem just disappear might be my preference on
some days... :)

Thanks,
Bill
William J. Schmidt - Oct. 11, 2011, 2:51 p.m.
On Tue, 2011-10-11 at 09:12 -0500, William J. Schmidt wrote:

> > The pattern matching is still very ad-hoc and doesn't consider
> > statements that feed the base address.  There is conceptually
> > no difference between p->a[n] and *(p + n * 4).
> 
> That's true.  Since we abandoned the general address-lowering approach,
> this was aimed at the specific pattern that comes up frequently in
> practice.  I would expect the *(p + n * 4) cases to be handled by the
> general straight-line strength reduction, which is the correct long-term
> approach.  (Cases like p->a[n], where the multiplication is not yet
> explicit, will be a bit of a wart as part of strength reduction, too,
> but that's still the right place for it eventually.)

Going through my notes, I do have some code for the *(p + n * 4) case
lying around from the last time I tried this in expand, so I'll try to
get this back in place (either in reassoc2 or expand, depending on how
the CSE works out).
Ian Taylor - Oct. 11, 2011, 9 p.m.
On Tue, Oct 11, 2011 at 4:40 AM, Richard Guenther <rguenther@suse.de> wrote:

> this function fails to transfer TREE_THIS_NOTRAP, which is supposed
> to be set on the base of old_ref or any contained ARRAY[_RANGE]_REF.
> If you make the function generic, please adjust it to at least do ...
> ...
>
>          TREE_THIS_NOTRAP (new_ref) = TREE_THIS_NOTRAP (base);

This line was indeed added to the patch as committed.  This appears to have
broken the build of libgo.  I now get this:

../../../gccgo3/libgo/go/image/png/writer.go: In function
‘png.writeIDATs.pN23_libgo_image.png.encoder’:
../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement
marked for throw, but doesn’t
# .MEM_775 = VDEF <.MEM_774>
MEM[base: D.8326_1070, offset: 0B] = VIEW_CONVERT_EXPR<struct
{
  uint8 * __values;
  int __count;
  int __capacity;
}>(GOTMP.495);

../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement
marked for throw, but doesn’t
# .MEM_776 = VDEF <.MEM_775>
D.7574 = MEM[base: D.8325_1069, offset: 0B];

../../../gccgo3/libgo/go/image/png/writer.go:403:1: internal compiler
error: verify_gimple failed
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.


I have not yet done a full investigation, but it appears that this
function is now marking a newly created reference as TREE_THIS_NOTRAP,
which it did not previously do.  The new instruction is within an
exception region, and the tree-cfg checker insists that instructions in
an exception region are permitted to trap.  It may be that the ivopts
pass now requires TODO_cleanup_cfg, or it may be something more
complicated.

You should be able to recreate the problem yourself by using
--enable-languages=go when you run configure.

Ian
William J. Schmidt - Oct. 11, 2011, 9:12 p.m.
Thanks -- I will back that line out for now and investigate.  Regression
test was fine for the languages we normally build, but go and ada aren't
among those.  Sorry for the trouble!

On Tue, 2011-10-11 at 14:00 -0700, Ian Lance Taylor wrote:
> On Tue, Oct 11, 2011 at 4:40 AM, Richard Guenther <rguenther@suse.de> wrote:
> 
> > this function fails to transfer TREE_THIS_NOTRAP, which is supposed
> > to be set on the base of old_ref or any contained ARRAY[_RANGE]_REF.
> > If you make the function generic, please adjust it to at least do ...
> > ...
> >
> >          TREE_THIS_NOTRAP (new_ref) = TREE_THIS_NOTRAP (base);
> 
> This line was indeed added to the patch as committed.  This appears to have
> broken the build of libgo.  I now get this:
> 
> ../../../gccgo3/libgo/go/image/png/writer.go: In function
> ‘png.writeIDATs.pN23_libgo_image.png.encoder’:
> ../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement
> marked for throw, but doesn’t
> # .MEM_775 = VDEF <.MEM_774>
> MEM[base: D.8326_1070, offset: 0B] = VIEW_CONVERT_EXPR<struct
> {
>   uint8 * __values;
>   int __count;
>   int __capacity;
> }>(GOTMP.495);
> 
> ../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement
> marked for throw, but doesn’t
> # .MEM_776 = VDEF <.MEM_775>
> D.7574 = MEM[base: D.8325_1069, offset: 0B];
> 
> ../../../gccgo3/libgo/go/image/png/writer.go:403:1: internal compiler
> error: verify_gimple failed
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <http://gcc.gnu.org/bugs.html> for instructions.
> 
> 
> I have not yet done a full investigation, but it appears that this
> function is now marking
> a newly created reference as TREE_THIS_NOTRAP, which it did not previously do.
> The new instruction is within an exception region, and the tree-cfg
> checker insists that
> instructions in an exception region are permitted to trap.  It may be
> that the ivopts pass
> now requires TODO_cleanup_cfg, or it may be something more complicated.
> 
> You should be able to recreate the problem yourself by using
> --enable-languages=go when you run configure.
> 
> Ian
>
Richard Guenther - Oct. 12, 2011, 10:01 a.m.
On Tue, 11 Oct 2011, Ian Lance Taylor wrote:

> On Tue, Oct 11, 2011 at 4:40 AM, Richard Guenther <rguenther@suse.de> wrote:
> 
> > this function fails to transfer TREE_THIS_NOTRAP, which is supposed
> > to be set on the base of old_ref or any contained ARRAY[_RANGE]_REF.
> > If you make the function generic, please adjust it to at least do ...
> > ...
> >
> >          TREE_THIS_NOTRAP (new_ref) = TREE_THIS_NOTRAP (base);
> 
> This line was indeed added to the patch as committed.  This appears to have
> broken the build of libgo.  I now get this:
> 
> ../../../gccgo3/libgo/go/image/png/writer.go: In function
> ‘png.writeIDATs.pN23_libgo_image.png.encoder’:
> ../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement
> marked for throw, but doesn’t
> # .MEM_775 = VDEF <.MEM_774>
> MEM[base: D.8326_1070, offset: 0B] = VIEW_CONVERT_EXPR<struct
> {
>   uint8 * __values;
>   int __count;
>   int __capacity;
> }>(GOTMP.495);
> 
> ../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement
> marked for throw, but doesn’t
> # .MEM_776 = VDEF <.MEM_775>
> D.7574 = MEM[base: D.8325_1069, offset: 0B];
> 
> ../../../gccgo3/libgo/go/image/png/writer.go:403:1: internal compiler
> error: verify_gimple failed
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <http://gcc.gnu.org/bugs.html> for instructions.
> 
> 
> I have not yet done a full investigation, but it appears that this
> function is now marking
> a newly created reference as TREE_THIS_NOTRAP, which it did not previously do.
> The new instruction is within an exception region, and the tree-cfg
> checker insists that
> instructions in an exception region are permitted to trap.  It may be

No, the new stmt seems to have an associated EH region but doesn't
trap.  Now, the old stmt also didn't trap (TREE_THIS_NOTRAP (base) was
true), so the question is why the checker did not complain about it,
or why it did not have an EH region associated while the new stmt does.

> that the ivopts pass
> now requires TODO_cleanup_cfg, or it may be something more complicated.
>
> You should be able to recreate the problem yourself by using
> --enable-languages=go when you run configure.

I suppose you can create a C testcase?

Richard.
Ian Taylor - Oct. 13, 2011, 3:56 a.m.
Richard Guenther <rguenther@suse.de> writes:

>> You should be able to recreate the problem yourself by using
>> --enable-languages=go when you run configure.
>
> I suppose you can create a C testcase?

I don't know whether I can, as the Go frontend is generating
VIEW_CONVERT_EXPR here.  I will try to take another look, but not before
next week.

Ian

Patch

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 179708)
+++ gcc/tree.h	(working copy)
@@ -5777,6 +5777,7 @@  tree target_for_debug_bind (tree);
 /* In tree-ssa-address.c.  */
 extern tree tree_mem_ref_addr (tree, tree);
 extern void copy_mem_ref_info (tree, tree);
+extern void copy_ref_info (tree, tree);
 
 /* In tree-vrp.c */
 extern bool ssa_name_nonnegative_p (const_tree);
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
@@ -0,0 +1,22 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dom2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 2 "dom2" } } */
+/* { dg-final { cleanup-tree-dump "dom2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
@@ -0,0 +1,26 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dom2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 12)
+    foo (p->a[n], p->c[n], p->b[n]);
+  else if (n > 3)
+    foo (p->b[n], p->a[n], p->c[n]);
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "dom2" } } */
+/* { dg-final { cleanup-tree-dump "dom2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
@@ -0,0 +1,28 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dom2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 3)
+    {
+      foo (p->a[n], p->c[n], p->b[n]);
+      if (n > 12)
+	foo (p->b[n], p->a[n], p->c[n]);
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "dom2" } } */
+/* { dg-final { cleanup-tree-dump "dom2" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 179708)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -6278,64 +6278,6 @@  rewrite_use_nonlinear_expr (struct ivopts_data *da
     }
 }
 
-/* Copies the reference information from OLD_REF to NEW_REF.  */
-
-static void
-copy_ref_info (tree new_ref, tree old_ref)
-{
-  tree new_ptr_base = NULL_TREE;
-
-  TREE_SIDE_EFFECTS (new_ref) = TREE_SIDE_EFFECTS (old_ref);
-  TREE_THIS_VOLATILE (new_ref) = TREE_THIS_VOLATILE (old_ref);
-
-  new_ptr_base = TREE_OPERAND (new_ref, 0);
-
-  /* We can transfer points-to information from an old pointer
-     or decl base to the new one.  */
-  if (new_ptr_base
-      && TREE_CODE (new_ptr_base) == SSA_NAME
-      && !SSA_NAME_PTR_INFO (new_ptr_base))
-    {
-      tree base = get_base_address (old_ref);
-      if (!base)
-	;
-      else if ((TREE_CODE (base) == MEM_REF
-		|| TREE_CODE (base) == TARGET_MEM_REF)
-	       && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
-	       && SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)))
-	{
-	  struct ptr_info_def *new_pi;
-	  duplicate_ssa_name_ptr_info
-	    (new_ptr_base, SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)));
-	  new_pi = SSA_NAME_PTR_INFO (new_ptr_base);
-	  /* We have to be careful about transfering alignment information.  */
-	  if (TREE_CODE (old_ref) == MEM_REF
-	      && !(TREE_CODE (new_ref) == TARGET_MEM_REF
-		   && (TMR_INDEX2 (new_ref)
-		       || (TMR_STEP (new_ref)
-			   && (TREE_INT_CST_LOW (TMR_STEP (new_ref))
-			       < new_pi->align)))))
-	    {
-	      new_pi->misalign += double_int_sub (mem_ref_offset (old_ref),
-						  mem_ref_offset (new_ref)).low;
-	      new_pi->misalign &= (new_pi->align - 1);
-	    }
-	  else
-	    {
-	      new_pi->align = 1;
-	      new_pi->misalign = 0;
-	    }
-	}
-      else if (TREE_CODE (base) == VAR_DECL
-	       || TREE_CODE (base) == PARM_DECL
-	       || TREE_CODE (base) == RESULT_DECL)
-	{
-	  struct ptr_info_def *pi = get_ptr_info (new_ptr_base);
-	  pt_solution_set_var (&pi->pt, base);
-	}
-    }
-}
-
 /* Performs a peephole optimization to reorder the iv update statement with
    a mem ref to enable instruction combining in later phases. The mem ref uses
    the iv value before the update, so the reordering transformation requires
Index: gcc/tree-ssa-address.c
===================================================================
--- gcc/tree-ssa-address.c	(revision 179708)
+++ gcc/tree-ssa-address.c	(working copy)
@@ -832,6 +832,64 @@  copy_mem_ref_info (tree to, tree from)
   TREE_THIS_VOLATILE (to) = TREE_THIS_VOLATILE (from);
 }
 
+/* Copies the reference information from OLD_REF to NEW_REF.  */
+
+void
+copy_ref_info (tree new_ref, tree old_ref)
+{
+  tree new_ptr_base = NULL_TREE;
+
+  TREE_SIDE_EFFECTS (new_ref) = TREE_SIDE_EFFECTS (old_ref);
+  TREE_THIS_VOLATILE (new_ref) = TREE_THIS_VOLATILE (old_ref);
+
+  new_ptr_base = TREE_OPERAND (new_ref, 0);
+
+  /* We can transfer points-to information from an old pointer
+     or decl base to the new one.  */
+  if (new_ptr_base
+      && TREE_CODE (new_ptr_base) == SSA_NAME
+      && !SSA_NAME_PTR_INFO (new_ptr_base))
+    {
+      tree base = get_base_address (old_ref);
+      if (!base)
+	;
+      else if ((TREE_CODE (base) == MEM_REF
+		|| TREE_CODE (base) == TARGET_MEM_REF)
+	       && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
+	       && SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)))
+	{
+	  struct ptr_info_def *new_pi;
+	  duplicate_ssa_name_ptr_info
+	    (new_ptr_base, SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)));
+	  new_pi = SSA_NAME_PTR_INFO (new_ptr_base);
+	  /* We have to be careful about transferring alignment information.  */
+	  if (TREE_CODE (old_ref) == MEM_REF
+	      && !(TREE_CODE (new_ref) == TARGET_MEM_REF
+		   && (TMR_INDEX2 (new_ref)
+		       || (TMR_STEP (new_ref)
+			   && (TREE_INT_CST_LOW (TMR_STEP (new_ref))
+			       < new_pi->align)))))
+	    {
+	      new_pi->misalign += double_int_sub (mem_ref_offset (old_ref),
+						  mem_ref_offset (new_ref)).low;
+	      new_pi->misalign &= (new_pi->align - 1);
+	    }
+	  else
+	    {
+	      new_pi->align = 1;
+	      new_pi->misalign = 0;
+	    }
+	}
+      else if (TREE_CODE (base) == VAR_DECL
+	       || TREE_CODE (base) == PARM_DECL
+	       || TREE_CODE (base) == RESULT_DECL)
+	{
+	  struct ptr_info_def *pi = get_ptr_info (new_ptr_base);
+	  pt_solution_set_var (&pi->pt, base);
+	}
+    }
+}
+
 /* Move constants in target_mem_ref REF to offset.  Returns the new target
    mem ref if anything changes, NULL_TREE otherwise.  */
 
Index: gcc/tree-ssa-reassoc.c
===================================================================
--- gcc/tree-ssa-reassoc.c	(revision 179708)
+++ gcc/tree-ssa-reassoc.c	(working copy)
@@ -2815,6 +2815,121 @@  break_up_subtract_bb (basic_block bb)
     break_up_subtract_bb (son);
 }
 
+/* Given BASE and OFFSET derived from *EXPR occurring in stmt *GSI,
+   determine whether there is a profitable opportunity to restructure
+   address arithmetic within BASE and OFFSET.  If so, produce such
+   a restructuring and return it.  */
+
+static tree
+restructure_base_and_offset (tree *expr, gimple_stmt_iterator *gsi,
+			     tree base, tree offset, HOST_WIDE_INT bitpos)
+{
+  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
+  double_int c, c1, c2, c3, c4;
+
+  /* Look for the following pattern:
+
+       base   = MEM_REF (T1, C1);
+       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
+       bitpos = C4 * BITS_PER_UNIT
+
+     If found, convert it to:
+
+       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
+                C1 + (C2 * C3) + C4)
+   */
+  if (!base
+      || !offset
+      || TREE_CODE (base) != MEM_REF
+      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST
+      || TREE_CODE (offset) != MULT_EXPR)
+    return NULL_TREE;
+
+  mult_op0 = TREE_OPERAND (offset, 0);
+  mult_op1 = TREE_OPERAND (offset, 1);
+
+  if (TREE_CODE (mult_op0) != PLUS_EXPR
+      || TREE_CODE (mult_op1) != INTEGER_CST
+      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
+    return NULL_TREE;
+
+  t1 = TREE_OPERAND (base, 0);
+  t2 = TREE_OPERAND (mult_op0, 0);
+  c1 = TREE_INT_CST (TREE_OPERAND (base, 1));
+  c2 = TREE_INT_CST (TREE_OPERAND (mult_op0, 1));
+  c3 = TREE_INT_CST (mult_op1);
+  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);
+  c = double_int_add (double_int_add (c1, double_int_mul (c2, c3)), c4);
+
+  if (!double_int_fits_in_shwi_p (c)
+      || !double_int_fits_in_shwi_p (c3))
+    return NULL_TREE;
+
+  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
+
+  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2, 
+			   build_int_cst (sizetype, double_int_to_shwi (c3)));
+  mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
+					true, GSI_SAME_STMT);
+  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
+  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
+				       true, GSI_SAME_STMT);
+  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
+			 build_int_cst (offset_type, double_int_to_shwi (c)));
+  return mem_ref;
+}
+
+/* If *EXPR contains a memory reference, determine whether there is an
+   opportunity to restructure the base and offset expressions for
+   improved performance.  */
+
+static bool
+restructure_mem_ref (tree *expr, gimple_stmt_iterator *gsi)
+{
+  enum tree_code code = TREE_CODE (*expr);
+  tree base, offset, mem_ref;
+  HOST_WIDE_INT bitsize, bitpos;
+  enum machine_mode mode;
+  int unsignedp, volatilep;
+
+  /* Only expressions that reference memory are of interest.  Bitfield
+     references should eventually be lowered prior to this point and
+     are not dealt with.  Skip BLKmode aggregates as well.  */
+  if (! handled_component_p (*expr)
+      || code == BIT_FIELD_REF
+      || (code == COMPONENT_REF && DECL_BIT_FIELD (TREE_OPERAND (*expr, 1)))
+      || TYPE_MODE (TREE_TYPE (*expr)) == BLKmode)
+    return false;
+
+  /* Determine whether the expression can be represented with base and
+     offset components.  */
+  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
+			      &unsignedp, &volatilep, false);
+  if (!base || !offset)
+    return false;
+
+  /* Look for a restructuring opportunity.  */
+  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
+					      offset, bitpos)) == NULL_TREE)
+    return false;
+
+  /* Found one.  Record the replacement in the dump file and complete
+     the replacement.  */
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\nOriginal_expr = ");
+      print_generic_expr (dump_file, *expr, TDF_VOPS|TDF_MEMSYMS);
+      fprintf (dump_file, "\nmem_ref = ");
+      print_generic_expr (dump_file, mem_ref, TDF_VOPS|TDF_MEMSYMS);
+      fprintf (dump_file, "\n\n");
+    }
+
+  copy_ref_info (mem_ref, *expr);
+  *expr = mem_ref;
+
+  return true;
+}
+
 /* Reassociate expressions in basic block BB and its post-dominator as
    children.  */
 
@@ -2823,14 +2938,43 @@  reassociate_bb (basic_block bb)
 {
   gimple_stmt_iterator gsi;
   basic_block son;
+  bool in_loop = loop_depth (bb->loop_father) != 0;
 
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       gimple stmt = gsi_stmt (gsi);
 
-      if (is_gimple_assign (stmt)
-	  && !stmt_could_throw_p (stmt))
+      /* During late reassociation only, look for restructuring
+	 opportunities within an expression that references memory.
+	 We only do this for blocks not contained in loops, since the
+	 ivopts machinery does a good job on loop expressions, and we
+	 don't want to interfere with other loop optimizations.  */
+      /* TODO: This belongs more properly in a separate pass that
+	 performs general strength reduction on straight-line code.
+	 Eventually move this there.  */
+      if (!first_pass_instance /* TODO: deprecated  */
+	  && !in_loop
+	  && gimple_vuse (stmt)
+	  && gimple_assign_single_p (stmt))
 	{
+	  tree *lhs, *rhs;
+	  if (gimple_vdef (stmt))
+	    {
+	      lhs = gimple_assign_lhs_ptr (stmt);
+	      if (restructure_mem_ref (lhs, &gsi))
+		update_stmt (stmt);
+	    }
+	  else if (gimple_vuse (stmt))
+	    {
+	      rhs = gimple_assign_rhs1_ptr (stmt);
+	      if (restructure_mem_ref (rhs, &gsi))
+		update_stmt (stmt);
+	    }
+	}
+
+      else if (is_gimple_assign (stmt)
+	       && !stmt_could_throw_p (stmt))
+	{
 	  tree lhs, rhs1, rhs2;
 	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);