diff mbox

Simplify address expression in IVOPT

Message ID 001d01cedad7$273e4820$75bad860$@arm.com
State New
Headers show

Commit Message

Bin Cheng Nov. 6, 2013, 10:01 a.m. UTC
> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: Tuesday, November 05, 2013 8:18 PM
> To: Bin Cheng
> Cc: GCC Patches
> Subject: Re: [PATCH GCC]Simplify address expression in IVOPT
> 
> On Tue, Nov 5, 2013 at 11:13 AM, bin.cheng <bin.cheng@arm.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> >> owner@gcc.gnu.org] On Behalf Of bin.cheng
> >> Sent: Monday, November 04, 2013 4:35 PM
> >> To: 'Richard Biener'
> >> Cc: GCC Patches
> >> Subject: RE: [PATCH GCC]Simplify address expression in IVOPT
> >>
> >>
> >>
> >> > -----Original Message-----
> >> > From: Richard Biener [mailto:richard.guenther@gmail.com]
> >> > Sent: Wednesday, October 30, 2013 10:46 PM
> >> > To: Bin Cheng
> >> > Cc: GCC Patches
> >> > Subject: Re: [PATCH GCC]Simplify address expression in IVOPT
> >> >
> >> > On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng <bin.cheng@arm.com>
> wrote:
> >> >
> >> > Hmm.  I think you want what get_inner_reference_aff does, using the
> >> > return value of get_inner_reference as starting point for
> >> determine_base_object.
> >> > And you don't want to restrict yourselves so much on what exprs to
> >> process,
> >> > but only exclude DECL_Ps.
> >> Did you mean I should pass all address expressions to
> >> get_inner_reference_aff except the one with declaration operand?
> >> I am a little confused here why DECL_Ps should be handled specially?
> > Seems
> >> it can be handled properly by the get_* function.  Anything important
> >> I missed?
> >>
> >> > Just amend get_inner_reference_aff to return the tree base object.
> >> I found it's inappropriate to do this because: functions like
> >> get_inner_reference* only accept reference expressions, while
> >> base_object has to be computed for other kinds of expressions too.
> >> Take gimple statement "a_1 = *ptr_1" as an example, the base passed
> >> to alloc_iv is
> > ptr_1;
> >> the base_object computed by determine_base_object is also ptr_1,
> >> which we can't rely on get_inner_reference* .
> >>
> >> Also It seems the comment before determine_base_object might be
> >> inaccurate:
> >> " Returns a memory object to that EXPR points." which I think is "
> >> Returns
> > a
> >> pointer pointing to the main memory object to that EXPR points."
> >> Right?
> >> >
> >> > Note that this isn't really "simplifying" but rather lowering all
> >> addresses.
> >> >
> >> > The other changes in this patch are unrelated, right?
> >> Right, I should have named this message like "refine cost computation
> >> of expressions in IVOPT" just as the patch file.
> >
> > Hi,
> > I updated the patch according to review comments.  Now it lowers all
> > address expressions except ones with DECL_P to
> > get_inner_reference_aff, it also computes base_object which is used
later
> to ease determine_base_object.
> >
> > Bootstrap and test on x86/x86_64/arm on going,  is it OK if tests pass?
> 
>  static struct iv *
>  alloc_iv (tree base, tree step)
>  {
> +  tree expr = base;
> +  tree base_object = base;
>    struct iv *iv = XCNEW (struct iv);
>    gcc_assert (step != NULL_TREE);
> 
> +  /* Lower all address expressions except ones with DECL_P as oeprand.
> +     By doing this:
> +       1) More accurate cost can be computed for address expressions;
> +       2) Duplicate candidates won't be created for bases in different
> +          forms, like &a[0] and &a.  */  STRIP_NOPS (expr);  if
> + (TREE_CODE (expr) == ADDR_EXPR && !DECL_P (TREE_OPERAND (expr, 0)))
> +    {
> +      aff_tree comb;
> +      double_int size;
> +      base_object = get_inner_reference_aff (TREE_OPERAND (expr, 0),
> +     &comb, &size);
> +      gcc_assert (base_object != NULL_TREE);
> +      base = fold_convert (TREE_TYPE (base), aff_combination_to_tree
> (&comb));
> +    }
> +
>    iv->base = base;
> -  iv->base_object = determine_base_object (base);
> +  iv->base_object = determine_base_object (base_object);
> 
> 
> for less confusion do not introduce 'expr' but just base_object
> (s/expr/base_object/).  Also can you split out this change (and the
related
> tree-affine.c one) from the rest?
> 
> Ok with that change assuming it passes bootstrap & testing.

It passes test.  I split it into two patches as attached.
Are these two OK?

Thanks,
bin

3-ivopt-expr_cost-a-20131106:
gcc/testsuite/ChangeLog
2013-11-06  Bin Cheng  <bin.cheng@arm.com>

	* gcc.dg/tree-ssa/loop-2.c: Refine check condition.
	* gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto.
	* gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto.

2013-11-06  Bin Cheng  <bin.cheng@arm.com>

	* tree-ssa-loop-ivopts.c (alloc_iv): Lower address expressions.
	* tree-affine.c (get_inner_reference_aff): Return base.
	* tree-affine.h (get_inner_reference_aff): Change prototype.

3-ivopt-expr_cost-b-20131106:
2013-11-06  Bin Cheng  <bin.cheng@arm.com>

	* tree-ssa-loop-ivopts.c (get_shiftadd_cost): Check equality
	using operand_equal_p.
	(force_expr_to_var_cost): Refactor the code.  Handle type
conversion.
	(split_address_cost): Call force_expr_to_var_cost.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 204117)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -3481,17 +3500,21 @@ get_shiftadd_cost (tree expr, enum machine_mode mo
   int m = exact_log2 (int_cst_value (cst));
   int maxm = MIN (BITS_PER_WORD, GET_MODE_BITSIZE (mode));
   int sa_cost;
+  bool equal_p = false;
 
   if (!(m >= 0 && m < maxm))
     return false;
 
+  if (operand_equal_p (op1, mult, 0))
+    equal_p = true;
+
   sa_cost = (TREE_CODE (expr) != MINUS_EXPR
              ? shiftadd_cost (speed, mode, m)
-             : (mult == op1
+             : (equal_p
                 ? shiftsub1_cost (speed, mode, m)
                 : shiftsub0_cost (speed, mode, m)));
   res = new_cost (sa_cost, 0);
-  res = add_costs (res, mult == op1 ? cost0 : cost1);
+  res = add_costs (res, equal_p ? cost0 : cost1);
 
   STRIP_NOPS (multop);
   if (!is_gimple_val (multop))
@@ -3585,30 +3608,14 @@ force_expr_to_var_cost (tree expr, bool speed)
       op1 = TREE_OPERAND (expr, 1);
       STRIP_NOPS (op0);
       STRIP_NOPS (op1);
-
-      if (is_gimple_val (op0))
-	cost0 = no_cost;
-      else
-	cost0 = force_expr_to_var_cost (op0, speed);
-
-      if (is_gimple_val (op1))
-	cost1 = no_cost;
-      else
-	cost1 = force_expr_to_var_cost (op1, speed);
-
       break;
 
+    case NOP_EXPR:
+    case CONVERT_EXPR:
     case NEGATE_EXPR:
       op0 = TREE_OPERAND (expr, 0);
       STRIP_NOPS (op0);
       op1 = NULL_TREE;
-
-      if (is_gimple_val (op0))
-	cost0 = no_cost;
-      else
-	cost0 = force_expr_to_var_cost (op0, speed);
-
-      cost1 = no_cost;
       break;
 
     default:
@@ -3616,6 +3623,18 @@ force_expr_to_var_cost (tree expr, bool speed)
       return new_cost (target_spill_cost[speed], 0);
     }
 
+  if (op0 == NULL_TREE
+      || (is_gimple_val (op0) && (TREE_CODE (op0) != ADDR_EXPR)))
+    cost0 = no_cost;
+  else
+    cost0 = force_expr_to_var_cost (op0, speed);
+
+  if (op1 == NULL_TREE
+      || (is_gimple_val (op1) && (TREE_CODE (op1) != ADDR_EXPR)))
+    cost1 = no_cost;
+  else
+    cost1 = force_expr_to_var_cost (op1, speed);
+
   mode = TYPE_MODE (TREE_TYPE (expr));
   switch (TREE_CODE (expr))
     {
@@ -3639,7 +3658,18 @@ force_expr_to_var_cost (tree expr, bool speed)
                                     speed, &sa_cost))
             return sa_cost;
         }
+
       break;
+    case NOP_EXPR:
+    case CONVERT_EXPR:
+      {
+	tree inner_mode, outer_mode;
+	outer_mode = TREE_TYPE (expr);
+	inner_mode = TREE_TYPE (op0);
+	cost = new_cost (convert_cost (TYPE_MODE (outer_mode),
+				       TYPE_MODE (inner_mode), speed), 0);
+      }
+      break;
 
     case MULT_EXPR:
       if (cst_and_fits_in_hwi (op0))
@@ -3713,7 +3743,7 @@ split_address_cost (struct ivopts_data *data,
       *var_present = true;
       fd_ivopts_data = data;
       walk_tree (&addr, find_depends, depends_on, NULL);
-      return new_cost (target_spill_cost[data->speed], 0);
+      return force_expr_to_var_cost (addr, data->speed);
     }
 
   *offset += bitpos / BITS_PER_UNIT;

Comments

Richard Biener Nov. 6, 2013, 11:20 a.m. UTC | #1
On Wed, Nov 6, 2013 at 11:01 AM, bin.cheng <bin.cheng@arm.com> wrote:
>
>
>> -----Original Message-----
>> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> Sent: Tuesday, November 05, 2013 8:18 PM
>> To: Bin Cheng
>> Cc: GCC Patches
>> Subject: Re: [PATCH GCC]Simplify address expression in IVOPT
>>
>> On Tue, Nov 5, 2013 at 11:13 AM, bin.cheng <bin.cheng@arm.com> wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
>> >> owner@gcc.gnu.org] On Behalf Of bin.cheng
>> >> Sent: Monday, November 04, 2013 4:35 PM
>> >> To: 'Richard Biener'
>> >> Cc: GCC Patches
>> >> Subject: RE: [PATCH GCC]Simplify address expression in IVOPT
>> >>
>> >>
>> >>
>> >> > -----Original Message-----
>> >> > From: Richard Biener [mailto:richard.guenther@gmail.com]
>> >> > Sent: Wednesday, October 30, 2013 10:46 PM
>> >> > To: Bin Cheng
>> >> > Cc: GCC Patches
>> >> > Subject: Re: [PATCH GCC]Simplify address expression in IVOPT
>> >> >
>> >> > On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng <bin.cheng@arm.com>
>> wrote:
>> >> >
>> >> > Hmm.  I think you want what get_inner_reference_aff does, using the
>> >> > return value of get_inner_reference as starting point for
>> >> determine_base_object.
>> >> > And you don't want to restrict yourselves so much on what exprs to
>> >> process,
>> >> > but only exclude DECL_Ps.
>> >> Did you mean I should pass all address expressions to
>> >> get_inner_reference_aff except the one with declaration operand?
>> >> I am a little confused here why DECL_Ps should be handled specially?
>> > Seems
>> >> it can be handled properly by the get_* function.  Anything important
>> >> I missed?
>> >>
>> >> > Just amend get_inner_reference_aff to return the tree base object.
>> >> I found it's inappropriate to do this because: functions like
>> >> get_inner_reference* only accept reference expressions, while
>> >> base_object has to be computed for other kinds of expressions too.
>> >> Take gimple statement "a_1 = *ptr_1" as an example, the base passed
>> >> to alloc_iv is
>> > ptr_1;
>> >> the base_object computed by determine_base_object is also ptr_1,
>> >> which we can't rely on get_inner_reference* .
>> >>
>> >> Also It seems the comment before determine_base_object might be
>> >> inaccurate:
>> >> " Returns a memory object to that EXPR points." which I think is "
>> >> Returns
>> > a
>> >> pointer pointing to the main memory object to that EXPR points."
>> >> Right?
>> >> >
>> >> > Note that this isn't really "simplifying" but rather lowering all
>> >> addresses.
>> >> >
>> >> > The other changes in this patch are unrelated, right?
>> >> Right, I should have named this message like "refine cost computation
>> >> of expressions in IVOPT" just as the patch file.
>> >
>> > Hi,
>> > I updated the patch according to review comments.  Now it lowers all
>> > address expressions except ones with DECL_P to
>> > get_inner_reference_aff, it also computes base_object which is used
> later
>> to ease determine_base_object.
>> >
>> > Bootstrap and test on x86/x86_64/arm on going,  is it OK if tests pass?
>>
>>  static struct iv *
>>  alloc_iv (tree base, tree step)
>>  {
>> +  tree expr = base;
>> +  tree base_object = base;
>>    struct iv *iv = XCNEW (struct iv);
>>    gcc_assert (step != NULL_TREE);
>>
>> +  /* Lower all address expressions except ones with DECL_P as oeprand.
>> +     By doing this:
>> +       1) More accurate cost can be computed for address expressions;
>> +       2) Duplicate candidates won't be created for bases in different
>> +          forms, like &a[0] and &a.  */  STRIP_NOPS (expr);  if
>> + (TREE_CODE (expr) == ADDR_EXPR && !DECL_P (TREE_OPERAND (expr, 0)))
>> +    {
>> +      aff_tree comb;
>> +      double_int size;
>> +      base_object = get_inner_reference_aff (TREE_OPERAND (expr, 0),
>> +     &comb, &size);
>> +      gcc_assert (base_object != NULL_TREE);
>> +      base = fold_convert (TREE_TYPE (base), aff_combination_to_tree
>> (&comb));
>> +    }
>> +
>>    iv->base = base;
>> -  iv->base_object = determine_base_object (base);
>> +  iv->base_object = determine_base_object (base_object);
>>
>>
>> for less confusion do not introduce 'expr' but just base_object
>> (s/expr/base_object/).  Also can you split out this change (and the
> related
>> tree-affine.c one) from the rest?
>>
>> Ok with that change assuming it passes bootstrap & testing.
>
> It passes test.  I split it into two patches as attached.
> Are these two OK?

The first one is ok (the Lower address expressions one).

The get_shiftadd_cost change in the 2nd is ok, for the force_expr_to_var_cost,
it's mostly refactoring apart from conversions?  For

+    case NOP_EXPR:
+    case CONVERT_EXPR:

use

      CASE_CONVERT:

in both places.

I see that the refactoring introduces the ADDR_EXPR check in

+  if (op0 == NULL_TREE
+      || (is_gimple_val (op0) && (TREE_CODE (op0) != ADDR_EXPR)))
+    cost0 = no_cost;
+  else
+    cost0 = force_expr_to_var_cost (op0, speed);

what's the reason for this?  IMHO the is_gimple_val && != ADDR_EXPR
check should be rewritten in a "positive" way - it seems you
want (TREE_CODE (op0) == SSA_NAME || CONSTANT_CLASS_P (op0))?
Which seems what the other-way-around checks boils down to
(apart from also allowing aggregate memory vars which cannot appear here).

@@ -3713,7 +3743,7 @@ split_address_cost (struct ivopts_data *data,
       *var_present = true;
       fd_ivopts_data = data;
       walk_tree (&addr, find_depends, depends_on, NULL);
-      return new_cost (target_spill_cost[data->speed], 0);
+      return force_expr_to_var_cost (addr, data->speed);
     }

this seems to be an unrelated change - there are other spill-cost
users in ivopts as well, not sure why one or the other variant is used.

For IVOPTs which is quite fragile with code generation quality it
helps to split up patches as much as possible to allow easy
bisection when problems arise on targets.

So, you can commit the get_shiftadd_cost change, re-post a fixed
conversion-handling patch and split the split_address_cost piece,
motivating it?

Thanks,
Richard.


> Thanks,
> bin
>
> 3-ivopt-expr_cost-a-20131106:
> gcc/testsuite/ChangeLog
> 2013-11-06  Bin Cheng  <bin.cheng@arm.com>
>
>         * gcc.dg/tree-ssa/loop-2.c: Refine check condition.
>         * gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto.
>         * gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto.
>
> 2013-11-06  Bin Cheng  <bin.cheng@arm.com>
>
>         * tree-ssa-loop-ivopts.c (alloc_iv): Lower address expressions.
>         * tree-affine.c (get_inner_reference_aff): Return base.
>         * tree-affine.h (get_inner_reference_aff): Change prototype.
>
> 3-ivopt-expr_cost-b-20131106:
> 2013-11-06  Bin Cheng  <bin.cheng@arm.com>
>
>         * tree-ssa-loop-ivopts.c (get_shiftadd_cost): Check equality
>         using operand_equal_p.
>         (force_expr_to_var_cost): Refactor the code.  Handle type
> conversion.
>         (split_address_cost): Call force_expr_to_var_cost.
diff mbox

Patch

Index: gcc/testsuite/gcc.dg/tree-ssa/loop-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/loop-2.c	(revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-2.c	(working copy)
@@ -27,7 +27,7 @@  void xxx(void)
 
 /* { dg-final { scan-tree-dump-times " \\* \[^\\n\\r\]*=" 0 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "\[^\\n\\r\]*= \\* " 0 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "MEM" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[base" 1 "optimized" } } */
 
 /* 17 * iter should be strength reduced.  */
 
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(working copy)
@@ -7,7 +7,8 @@ 
 
 extern char a[];
 
-/* Can not infer loop iteration from array -- exit test can not be replaced.  */
+/* Can not infer loop iteration from array -- exit test can not be
+   replaced by the array address.  */
 void foo (unsigned int i_width, TYPE dst)
 {
   unsigned long long i = 0;
@@ -21,5 +22,5 @@  void foo (unsigned int i_width, TYPE dst)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "\[^:\]*if \\(.*j_\[0-9\]+.*\\)" 1 "ivopts"} } */
 /* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(working copy)
@@ -18,5 +18,5 @@  long foo(long* p, long* p2, int N1, int N2)
   return s;
 }
 
-/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "Replacing exit test: if \\(.*p2.*\\)" 1 "ivopts"} } */
 /* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 204117)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -924,11 +924,30 @@  determine_base_object (tree expr)
 static struct iv *
 alloc_iv (tree base, tree step)
 {
+  tree base_object = base;
   struct iv *iv = XCNEW (struct iv);
   gcc_assert (step != NULL_TREE);
 
+  /* Lower all address expressions except ones with DECL_P as operand.
+     By doing this:
+       1) More accurate cost can be computed for address expressions;
+       2) Duplicate candidates won't be created for bases in different
+          forms, like &a[0] and &a.  */
+  STRIP_NOPS (base_object);
+  if (TREE_CODE (base_object) == ADDR_EXPR
+      && !DECL_P (TREE_OPERAND (base_object, 0)))
+    {
+      aff_tree comb;
+      double_int size;
+      base_object = get_inner_reference_aff (TREE_OPERAND (base_object, 0),
+					     &comb, &size);
+      gcc_assert (base_object != NULL_TREE);
+      base_object = build_fold_addr_expr (base_object);
+      base = fold_convert (TREE_TYPE (base), aff_combination_to_tree (&comb));
+    }
+
   iv->base = base;
-  iv->base_object = determine_base_object (base);
+  iv->base_object = determine_base_object (base_object);
   iv->step = step;
   iv->biv_p = false;
   iv->have_use_for = false;
Index: gcc/tree-affine.c
===================================================================
--- gcc/tree-affine.c	(revision 204117)
+++ gcc/tree-affine.c	(working copy)
@@ -874,10 +874,11 @@  debug_aff (aff_tree *val)
   fprintf (stderr, "\n");
 }
 
-/* Returns address of the reference REF in ADDR.  The size of the accessed
-   location is stored to SIZE.  */
+/* Computes address of the reference REF in ADDR.  The size of the accessed
+   location is stored to SIZE.  Returns the ultimate containing object to
+   which REF refers.  */
 
-void
+tree
 get_inner_reference_aff (tree ref, aff_tree *addr, double_int *size)
 {
   HOST_WIDE_INT bitsize, bitpos;
@@ -904,6 +905,8 @@  get_inner_reference_aff (tree ref, aff_tree *addr,
   aff_combination_add (addr, &tmp);
 
   *size = double_int::from_shwi ((bitsize + BITS_PER_UNIT - 1) / BITS_PER_UNIT);
+
+  return base;
 }
 
 /* Returns true if a region of size SIZE1 at position 0 and a region of
Index: gcc/tree-affine.h
===================================================================
--- gcc/tree-affine.h	(revision 204117)
+++ gcc/tree-affine.h	(working copy)
@@ -74,7 +74,7 @@  bool aff_combination_constant_multiple_p (aff_tree
 void aff_combination_expand (aff_tree *, struct pointer_map_t **);
 void tree_to_aff_combination_expand (tree, tree, aff_tree *,
 				     struct pointer_map_t **);
-void get_inner_reference_aff (tree, aff_tree *, double_int *);
+tree get_inner_reference_aff (tree, aff_tree *, double_int *);
 void free_affine_expand_cache (struct pointer_map_t **);
 bool aff_comb_cannot_overlap_p (aff_tree *, double_int, double_int);