Patchwork Address lowering [1/3] Main patch

login
register
mail settings
Submitter William J. Schmidt
Date July 19, 2011, 9:42 p.m.
Message ID <1311111755.4429.16.camel@oc2474580526.ibm.com>
Download mbox | patch
Permalink /patch/105547/
State New
Headers show

Comments

William J. Schmidt - July 19, 2011, 9:42 p.m.
I've been distracted by other things, but got back to this today...

On Wed, 2011-07-06 at 16:58 +0200, Richard Guenther wrote:
> Ah, so we still have the ARRAY_REFs here.  Yeah, well - then the
> issue boils down to get_inner_reference canonicalizing the offset
> according to what fold-const.c implements, and we simply emit
> the offset in that form (if you look at the normal_inner_ref: case
> in expand_expr_real*).  Thus with the above form at expansion
> time the matching would be applied there (or during our RTL
> address-generation in fwprop.c ...).

I put together a simple patch to match the specific pattern from the
original problem report in expand_expr_real_1.  This returns the code
generation for that specific pattern to what it was a few years ago, but
has little effect on SPEC cpu2000 results.  A small handful of
benchmarks are improved, but most results are in the noise range.  So
solving the original problem alone doesn't account for very much of the
improvement we're getting with the full address lowering.

Note that this only converted the basic pattern; I did not attempt to do
the extra "zero-offset mem-ref" optimization, which I would have had to
rewrite for an RTL phase.  I don't see much point in pursuing that,
given these results.

> 
> Another idea is of course to lower all handled_component_p
> memory accesses - something I did on the old mem-ref branch
> and I believe I also suggested at some point.  Without such lowering
> all the address-generation isn't exposed to the GIMPLE optimizers.
> 
> The simplest way of doing the lowering is to do sth like
> 
>   base = get_inner_reference (rhs, ..., &bitpos, &offset, ...);
>   base = fold_build2 (POINTER_PLUS_EXPR, ...,
>                                base, offset);
>   base = force_gimple_operand (base, ... is_gimple_mem_ref_addr);
>   tem = fold_build2 (MEM_REF, TREE_TYPE (rhs),
>                              base,
>                              build_int_cst (get_alias_ptr_type (rhs), bitpos));
> 
> at some point.  I didn't end up doing that because of the loss of
> alias information.

My next experiment will be to try something simple along these lines.
I'll look at the old mem-ref branch to see what you were doing there.
It will be interesting to see whether the gains generally outweigh the
small loss of TBAA information, as we saw using the affine-combination
approach.

As an FYI, this is the patch I used for my experiment.  Let me know if
you see anything horrifically wrong that might have impacted the
results.  I didn't make any attempt to figure out whether the pattern
rewrite is appropriate for the target machine, since this was just an
experiment.

Thanks,
Bill
Richard Guenther - July 20, 2011, 9:03 a.m.
On Tue, 19 Jul 2011, William J. Schmidt wrote:

> I've been distracted by other things, but got back to this today...
> 
> On Wed, 2011-07-06 at 16:58 +0200, Richard Guenther wrote:
> > Ah, so we still have the ARRAY_REFs here.  Yeah, well - then the
> > issue boils down to get_inner_reference canonicalizing the offset
> > according to what fold-const.c implements, and we simply emit
> > the offset in that form (if you look at the normal_inner_ref: case
> > in expand_expr_real*).  Thus with the above form at expansion
> > time the matching would be applied there (or during our RTL
> > address-generation in fwprop.c ...).
> 
> I put together a simple patch to match the specific pattern from the
> original problem report in expand_expr_real_1.  This returns the code
> generation for that specific pattern to what it was a few years ago, but
> has little effect on SPEC cpu2000 results.  A small handful of
> benchmarks are improved, but most results are in the noise range.  So
> solving the original problem alone doesn't account for very much of the
> improvement we're getting with the full address lowering.
> 
> Note that this only converted the basic pattern; I did not attempt to do
> the extra "zero-offset mem-ref" optimization, which I would have had to
> rewrite for an RTL phase.  I don't see much point in pursuing that,
> given these results.
> 
> > 
> > Another idea is of course to lower all handled_component_p
> > memory accesses - something I did on the old mem-ref branch
> > and I believe I also suggested at some point.  Without such lowering
> > all the address-generation isn't exposed to the GIMPLE optimizers.
> > 
> > The simplest way of doing the lowering is to do sth like
> > 
> >   base = get_inner_reference (rhs, ..., &bitpos, &offset, ...);
> >   base = fold_build2 (POINTER_PLUS_EXPR, ...,
> >                                base, offset);
> >   base = force_gimple_operand (base, ... is_gimple_mem_ref_addr);
> >   tem = fold_build2 (MEM_REF, TREE_TYPE (rhs),
> >                              base,
> >                              build_int_cst (get_alias_ptr_type (rhs), bitpos));
> > 
> > at some point.  I didn't end up doing that because of the loss of
> > alias information.
> 
> My next experiment will be to try something simple along these lines.
> I'll look at the old mem-ref branch to see what you were doing there.
> It will be interesting to see whether the gains generally outweigh the
> small loss of TBAA information, as we saw using the affine-combination
> approach.
> 
> As an FYI, this is the patch I used for my experiment.  Let me know if
> you see anything horrifically wrong that might have impacted the
> results.  I didn't make any attempt to figure out whether the pattern
> rewrite is appropriate for the target machine, since this was just an
> experiment.

I wonder if the code below triggered at all as since we expand from
SSA we no longer see the larger trees in-place but you have to
look them up via SSA defs using get_gimple_for_ssa_name (or the
helper get_def_for_expr).  So I expect that the check for a MULT_EXPR
offset only will trigger if the memory reference was still in the
form of a variable ARRAY_REF (then get_inner_reference will produce
an offset of the form i * sizeof (element)).  I think what we want
to catch here is also the case of *(ptr + i) which will not show
up as ARRAY_REF but as MEM[ptr + 0] where ptr is an SSA name
with a defining statement doing ptr' + i * sizeof (element).

Richard.

> Thanks,
> Bill
> 
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c	(revision 176422)
> +++ gcc/expr.c	(working copy)
> @@ -7152,7 +7152,64 @@ expand_constructor (tree exp, rtx target, enum exp
>    return target;
>  }
>  
> +/* Look for specific patterns in the inner reference BASE and its
> +   OFFSET expression.  If found, return a single tree of type EXPTYPE
> +   that reassociates the addressability to combine constants.  */
> +static tree
> +reassociate_base_and_offset (tree base, tree offset, tree exptype)
> +{
> +  tree mult_op0, mult_op1, t1, t2;
> +  tree mult_expr, add_expr, mem_ref;
> +  tree offset_type;
> +  HOST_WIDE_INT c1, c2, c3, c;
>  
> +  /* Currently we look for the following pattern, where Tj is an
> +     arbitrary tree, and Ck is an integer constant:
> +
> +     base:    MEM_REF (T1, C1)
> +     offset:  MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> +
> +     This is converted to:
> +
> +              MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> +                       C1 + C2*C3)
> +  */
> +  if (!base
> +      || !offset
> +      || TREE_CODE (base) != MEM_REF
> +      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST
> +      || TREE_CODE (offset) != MULT_EXPR)
> +    return NULL_TREE;
> +
> +  mult_op0 = TREE_OPERAND (offset, 0);
> +  mult_op1 = TREE_OPERAND (offset, 1);
> +
> +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> +      || TREE_CODE (mult_op1) != INTEGER_CST
> +      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> +    return NULL_TREE;
> +
> +  t1 = TREE_OPERAND (base, 0);
> +  t2 = TREE_OPERAND (mult_op0, 0);
> +  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
> +  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
> +  c3 = TREE_INT_CST_LOW (mult_op1);
> +  c = c1 + c2 * c3;
> +
> +  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
> +
> +  mult_expr = build2 (MULT_EXPR, sizetype, t2, 
> +		      build_int_cst (offset_type, c3));
> +
> +  add_expr = build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> +
> +  mem_ref = build2 (MEM_REF, exptype, add_expr, 
> +		    build_int_cst (offset_type, c));
> +
> +  return mem_ref;
> +}
> +
> +
>  /* expand_expr: generate code for computing expression EXP.
>     An rtx for the computed value is returned.  The value is never null.
>     In the case of a void EXP, const0_rtx is returned.
> @@ -9034,7 +9091,7 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
>        {
>  	enum machine_mode mode1, mode2;
>  	HOST_WIDE_INT bitsize, bitpos;
> -	tree offset;
> +	tree offset, tem2;
>  	int volatilep = 0, must_force_mem;
>  	bool packedp = false;
>  	tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
> @@ -9051,6 +9108,15 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
>  		&& DECL_PACKED (TREE_OPERAND (exp, 1))))
>  	  packedp = true;
>  
> +	/* Experimental:  Attempt to reassociate the base and offset
> +	   to combine constants.  */
> +	if ((tem2 = reassociate_base_and_offset (tem, offset, TREE_TYPE (exp)))
> +	    != NULL_TREE)
> +	  {
> +	    tem = tem2;
> +	    offset = NULL_TREE;
> +	  }
> +
>  	/* If TEM's type is a union of variable size, pass TARGET to the inner
>  	   computation, since it will need a temporary and TARGET is known
>  	   to have to do.  This occurs in unchecked conversion in Ada.  */
> 
> 
>
William J. Schmidt - July 20, 2011, 1:14 p.m.
On Wed, 2011-07-20 at 11:03 +0200, Richard Guenther wrote:

> I wonder if the code below triggered at all as since we expand from
> SSA we no longer see the larger trees in-place but you have to
> look them up via SSA defs using get_gimple_for_ssa_name (or the
> helper get_def_for_expr).  So I expect that the check for a MULT_EXPR
> offset only will trigger if the memory reference was still in the
> form of a variable ARRAY_REF (then get_inner_reference will produce
> an offset of the form i * sizeof (element)).  I think what we want
> to catch here is also the case of *(ptr + i) which will not show
> up as ARRAY_REF but as MEM[ptr + 0] where ptr is an SSA name
> with a defining statement doing ptr' + i * sizeof (element).
> 
> Richard.

Confirmed.  The compiler is pretty smart about reconstructing ARRAY_REFs
out of *(p + i) if p is known to point to an array, but if p is a scalar
pointer with unknown target the code doesn't kick in.  I'll cobble up a
version that tests for both patterns and get some new numbers.

Thanks,
Bill

Patch

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 176422)
+++ gcc/expr.c	(working copy)
@@ -7152,7 +7152,64 @@  expand_constructor (tree exp, rtx target, enum exp
   return target;
 }
 
+/* Look for specific patterns in the inner reference BASE and its
+   OFFSET expression.  If found, return a single tree of type EXPTYPE
+   that reassociates the addressability to combine constants.  */
+static tree
+reassociate_base_and_offset (tree base, tree offset, tree exptype)
+{
+  tree mult_op0, mult_op1, t1, t2;
+  tree mult_expr, add_expr, mem_ref;
+  tree offset_type;
+  HOST_WIDE_INT c1, c2, c3, c;
 
+  /* Currently we look for the following pattern, where Tj is an
+     arbitrary tree, and Ck is an integer constant:
+
+     base:    MEM_REF (T1, C1)
+     offset:  MULT_EXPR (PLUS_EXPR (T2, C2), C3)
+
+     This is converted to:
+
+              MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
+                       C1 + C2*C3)
+  */
+  if (!base
+      || !offset
+      || TREE_CODE (base) != MEM_REF
+      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST
+      || TREE_CODE (offset) != MULT_EXPR)
+    return NULL_TREE;
+
+  mult_op0 = TREE_OPERAND (offset, 0);
+  mult_op1 = TREE_OPERAND (offset, 1);
+
+  if (TREE_CODE (mult_op0) != PLUS_EXPR
+      || TREE_CODE (mult_op1) != INTEGER_CST
+      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
+    return NULL_TREE;
+
+  t1 = TREE_OPERAND (base, 0);
+  t2 = TREE_OPERAND (mult_op0, 0);
+  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
+  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
+  c3 = TREE_INT_CST_LOW (mult_op1);
+  c = c1 + c2 * c3;
+
+  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
+
+  mult_expr = build2 (MULT_EXPR, sizetype, t2, 
+		      build_int_cst (offset_type, c3));
+
+  add_expr = build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
+
+  mem_ref = build2 (MEM_REF, exptype, add_expr, 
+		    build_int_cst (offset_type, c));
+
+  return mem_ref;
+}
+
+
 /* expand_expr: generate code for computing expression EXP.
    An rtx for the computed value is returned.  The value is never null.
    In the case of a void EXP, const0_rtx is returned.
@@ -9034,7 +9091,7 @@  expand_expr_real_1 (tree exp, rtx target, enum mac
       {
 	enum machine_mode mode1, mode2;
 	HOST_WIDE_INT bitsize, bitpos;
-	tree offset;
+	tree offset, tem2;
 	int volatilep = 0, must_force_mem;
 	bool packedp = false;
 	tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
@@ -9051,6 +9108,15 @@  expand_expr_real_1 (tree exp, rtx target, enum mac
 		&& DECL_PACKED (TREE_OPERAND (exp, 1))))
 	  packedp = true;
 
+	/* Experimental:  Attempt to reassociate the base and offset
+	   to combine constants.  */
+	if ((tem2 = reassociate_base_and_offset (tem, offset, TREE_TYPE (exp)))
+	    != NULL_TREE)
+	  {
+	    tem = tem2;
+	    offset = NULL_TREE;
+	  }
+
 	/* If TEM's type is a union of variable size, pass TARGET to the inner
 	   computation, since it will need a temporary and TARGET is known
 	   to have to do.  This occurs in unchecked conversion in Ada.  */