Simplify address expression in IVOPT

Message ID	000001ced487$c9e8e540$5dbaafc0$@arm.com
State	New
Headers	show Return-Path: <gcc-patches-return-352530-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; q=dns; s= default; b=mQGf6EjTjsQNg05zugggGtCuVATYIXHcEY0SRoOShzN/oqv5AtZQP CzbYvMvh1IjCXFnXMC0FmdnTHB3pu88W1Bh4wHVTYUJ1Q4NXTpybBVBVjYF0ID4D vtbuiayW9zDT4rkAe6qWpAGTB+0Z1R3KA5m8fT6hD+khtl+dQuQkX8= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org From: "bin.cheng" <bin.cheng@arm.com> To: <gcc-patches@gcc.gnu.org> Subject: [PATCH GCC]Simplify address expression in IVOPT Date: Tue, 29 Oct 2013 17:18:09 +0800 Message-ID: <000001ced487$c9e8e540$5dbaafc0$@arm.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_0001_01CED4CA.D80D5DC0"

Message ID

000001ced487$c9e8e540$5dbaafc0$@arm.com

State

New

Headers

DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:subject:date:message-id:mime-version:content-type; q=dns; s=
	default; b=mQGf6EjTjsQNg05zugggGtCuVATYIXHcEY0SRoOShzN/oqv5AtZQP
	CzbYvMvh1IjCXFnXMC0FmdnTHB3pu88W1Bh4wHVTYUJ1Q4NXTpybBVBVjYF0ID4D
	vtbuiayW9zDT4rkAe6qWpAGTB+0Z1R3KA5m8fT6hD+khtl+dQuQkX8=
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
From: "bin.cheng" <bin.cheng@arm.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH GCC]Simplify address expression in IVOPT
Date: Tue, 29 Oct 2013 17:18:09 +0800
Message-ID: <000001ced487$c9e8e540$5dbaafc0$@arm.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="----=_NextPart_000_0001_01CED4CA.D80D5DC0"

Commit Message

Bin Cheng Oct. 29, 2013, 9:18 a.m. UTC

Hi,
I noticed that IVOPT generates complex address expressions like below for iv
base.
	&arr_base[0].y
	&arr[0]
	&MEM[p+o]
It's even worse for targets support auto-increment addressing mode because
IVOPT adjusts such base expression with +/- step, then creates below:
	&arr_base[0].y +/- step
	&arr[0] +/- step
	&MEM[p+o] +/- step
It has two disadvantages:
1) Cost computation in IVOPT can't handle complex address expression and
general returns spill_cost for it, which is bad since address iv is
important to IVOPT.
2) IVOPT creates duplicate candidates for IVs which have same value in
different forms, for example, two candidates are generated with each for
"&a[0]" and "&a".  Again, it's even worse for auto-increment addressing
mode.

This patch fixes the issue by simplifying address expression at the entry of
allocating IV struct.  Maybe the simplification can be put in various fold*
functions but I think it might be better in this way, because:
1) fold* functions are used from front-end to various tree optimizations,
the simplified address expressions may not be what each optimizer wanted.
Think about parallelism related passes, they might want the array index
information kept for further analysis.
2) In some way, the simplification is conflict with current implementation
of fold* function.  Take fold_binary_loc as an example, it tries to simplify
"&a[i1] +p c* i2" into "&a[i1+i2]".  Of course we can simplify in this way
for IVOPT too, but that will cause new problems like: a) we have to add code
in IVOPT to cope with complex ARRAY_REF which is the exactly thing we want
to avoid; b) the simplification can't always be done because of the
sign/unsigned offset problem (especially for auto-increment addressing
mode).
3) There are many entry point for fold* functions, the change will be
non-trivial.
4) The simplification is only done in alloc_iv for true (not duplicate ones)
iv struct, the number of such iv should be moderate.

With these points, I think it might be a win to do the simplification in
IVOPT and create a kind of sand box to let IVOPT play.  Any suggestions?

Bootstrap and tested on x86/x86_64/arm.
The patch causes three cases failed on some target, but all of them are
false alarm, which can be resolved by refining test cases to check more
accurate information.

Is it OK?

Thanks.
bin

gcc/testsuite/ChangeLog
2013-10-29  Bin Cheng  <bin.cheng@arm.com>

	* gcc.dg/tree-ssa/loop-2.c: Refine check condition.
	* gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto.
	* gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto.

2013-10-29  Bin Cheng  <bin.cheng@arm.com>

	* tree-ssa-loop-ivopts.c (alloc_iv): Simplify base expression.
	(get_shiftadd_cost): Check equality using operand_equal_p.
	(force_expr_to_var_cost): Refactor the code.  Handle type
	conversion.
	(split_address_cost): Call force_expr_to_var_cost.

Comments

Richard Biener Oct. 30, 2013, 2:46 p.m. UTC | #1

On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng <bin.cheng@arm.com> wrote:
> Hi,
> I noticed that IVOPT generates complex address expressions like below for iv
> base.
>         &arr_base[0].y
>         &arr[0]
>         &MEM[p+o]
> It's even worse for targets support auto-increment addressing mode because
> IVOPT adjusts such base expression with +/- step, then creates below:
>         &arr_base[0].y +/- step
>         &arr[0] +/- step
>         &MEM[p+o] +/- step
> It has two disadvantages:
> 1) Cost computation in IVOPT can't handle complex address expression and
> general returns spill_cost for it, which is bad since address iv is
> important to IVOPT.
> 2) IVOPT creates duplicate candidates for IVs which have same value in
> different forms, for example, two candidates are generated with each for
> "&a[0]" and "&a".  Again, it's even worse for auto-increment addressing
> mode.
>
> This patch fixes the issue by simplifying address expression at the entry of
> allocating IV struct.  Maybe the simplification can be put in various fold*
> functions but I think it might be better in this way, because:
> 1) fold* functions are used from front-end to various tree optimizations,
> the simplified address expressions may not be what each optimizer wanted.
> Think about parallelism related passes, they might want the array index
> information kept for further analysis.
> 2) In some way, the simplification is conflict with current implementation
> of fold* function.  Take fold_binary_loc as an example, it tries to simplify
> "&a[i1] +p c* i2" into "&a[i1+i2]".  Of course we can simplify in this way
> for IVOPT too, but that will cause new problems like: a) we have to add code
> in IVOPT to cope with complex ARRAY_REF which is the exactly thing we want
> to avoid; b) the simplification can't always be done because of the
> sign/unsigned offset problem (especially for auto-increment addressing
> mode).
> 3) There are many entry point for fold* functions, the change will be
> non-trivial.
> 4) The simplification is only done in alloc_iv for true (not duplicate ones)
> iv struct, the number of such iv should be moderate.
>
> With these points, I think it might be a win to do the simplification in
> IVOPT and create a kind of sand box to let IVOPT play.  Any suggestions?
>
> Bootstrap and tested on x86/x86_64/arm.
> The patch causes three cases failed on some target, but all of them are
> false alarm, which can be resolved by refining test cases to check more
> accurate information.
>
> Is it OK?

Hmm.  I think you want what get_inner_reference_aff does, using
the return value of get_inner_reference as starting point for
determine_base_object.  And you don't want to restrict yourselves
so much on what exprs to process, but only exclude DECL_Ps.
Just amend get_inner_reference_aff to return the tree base object.

Note that this isn't really "simplifying" but rather lowering all
addresses.

The other changes in this patch are unrelated, right?

Thanks,
Richard.

> Thanks.
> bin
>
> gcc/testsuite/ChangeLog
> 2013-10-29  Bin Cheng  <bin.cheng@arm.com>
>
>         * gcc.dg/tree-ssa/loop-2.c: Refine check condition.
>         * gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto.
>         * gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto.
>
> 2013-10-29  Bin Cheng  <bin.cheng@arm.com>
>
>         * tree-ssa-loop-ivopts.c (alloc_iv): Simplify base expression.
>         (get_shiftadd_cost): Check equality using operand_equal_p.
>         (force_expr_to_var_cost): Refactor the code.  Handle type
>         conversion.
>         (split_address_cost): Call force_expr_to_var_cost.

Yufeng Zhang Oct. 31, 2013, 12:37 p.m. UTC | #2

On 10/30/13 14:46, Richard Biener wrote:
> On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng<bin.cheng@arm.com>  wrote:
>> Hi,
>> I noticed that IVOPT generates complex address expressions like below for iv
>> base.
>>          &arr_base[0].y
>>          &arr[0]
>>          &MEM[p+o]
>> It's even worse for targets support auto-increment addressing mode because
>> IVOPT adjusts such base expression with +/- step, then creates below:
>>          &arr_base[0].y +/- step
>>          &arr[0] +/- step
>>          &MEM[p+o] +/- step
>> It has two disadvantages:
>> 1) Cost computation in IVOPT can't handle complex address expression and
>> general returns spill_cost for it, which is bad since address iv is
>> important to IVOPT.
>> 2) IVOPT creates duplicate candidates for IVs which have same value in
>> different forms, for example, two candidates are generated with each for
>> "&a[0]" and "&a".  Again, it's even worse for auto-increment addressing
>> mode.
>>
>> This patch fixes the issue by simplifying address expression at the entry of
>> allocating IV struct.  Maybe the simplification can be put in various fold*
>> functions but I think it might be better in this way, because:
>> 1) fold* functions are used from front-end to various tree optimizations,
>> the simplified address expressions may not be what each optimizer wanted.
>> Think about parallelism related passes, they might want the array index
>> information kept for further analysis.
>> 2) In some way, the simplification is conflict with current implementation
>> of fold* function.  Take fold_binary_loc as an example, it tries to simplify
>> "&a[i1] +p c* i2" into "&a[i1+i2]".  Of course we can simplify in this way
>> for IVOPT too, but that will cause new problems like: a) we have to add code
>> in IVOPT to cope with complex ARRAY_REF which is the exactly thing we want
>> to avoid; b) the simplification can't always be done because of the
>> sign/unsigned offset problem (especially for auto-increment addressing
>> mode).
>> 3) There are many entry point for fold* functions, the change will be
>> non-trivial.
>> 4) The simplification is only done in alloc_iv for true (not duplicate ones)
>> iv struct, the number of such iv should be moderate.
>>
>> With these points, I think it might be a win to do the simplification in
>> IVOPT and create a kind of sand box to let IVOPT play.  Any suggestions?
>>
>> Bootstrap and tested on x86/x86_64/arm.
>> The patch causes three cases failed on some target, but all of them are
>> false alarm, which can be resolved by refining test cases to check more
>> accurate information.
>>
>> Is it OK?
>
> Hmm.  I think you want what get_inner_reference_aff does, using
> the return value of get_inner_reference as starting point for
> determine_base_object.  And you don't want to restrict yourselves
> so much on what exprs to process, but only exclude DECL_Ps.
> Just amend get_inner_reference_aff to return the tree base object.

Or, update determine_base_object to handle MEM_REF, ARRAY_REF, 
COMPONENT_REF, etc. by calling get_inner_reference to get the base and 
continuing the recursive determine_base_object on the return value 
(TREE_OPERAND (base, 0)).

Calling an amended get_inner_reference_aff can be expensive, as the 
function will also spend time in transforming the reference from tree to 
aff_tree.

Yufeng

Bin Cheng Nov. 4, 2013, 8:35 a.m. UTC | #3

> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: Wednesday, October 30, 2013 10:46 PM
> To: Bin Cheng
> Cc: GCC Patches
> Subject: Re: [PATCH GCC]Simplify address expression in IVOPT
> 
> On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng <bin.cheng@arm.com> wrote:
> 
> Hmm.  I think you want what get_inner_reference_aff does, using the return
> value of get_inner_reference as starting point for determine_base_object.
> And you don't want to restrict yourselves so much on what exprs to
process,
> but only exclude DECL_Ps.
Did you mean I should pass all address expressions to
get_inner_reference_aff except the one with declaration operand?
I am a little confused here why DECL_Ps should be handled specially?  Seems
it can be handled properly by the get_* function.  Anything important I
missed?

> Just amend get_inner_reference_aff to return the tree base object.
I found it's inappropriate to do this because: functions like
get_inner_reference* only accept reference expressions, while base_object
has to be computed for other kinds of expressions too.  Take gimple
statement "a_1 = *ptr_1" as an example, the base passed to alloc_iv is
ptr_1; the base_object computed by determine_base_object is also ptr_1,
which we can't rely on get_inner_reference* .

Also It seems the comment before determine_base_object might be inaccurate:
" Returns a memory object to that EXPR points." which I think is " Returns a
pointer pointing to the main memory object to that EXPR points."
Right?
> 
> Note that this isn't really "simplifying" but rather lowering all
addresses.
> 
> The other changes in this patch are unrelated, right?
Right, I should have named this message like "refine cost computation of
expressions in IVOPT" just as the patch file.  

Thanks.
bin

Index: gcc/testsuite/gcc.dg/tree-ssa/loop-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/loop-2.c	(revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-2.c	(working copy)
@@ -27,7 +27,7 @@  void xxx(void)
 
 /* { dg-final { scan-tree-dump-times " \\* \[^\\n\\r\]*=" 0 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "\[^\\n\\r\]*= \\* " 0 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "MEM" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[base" 1 "optimized" } } */
 
 /* 17 * iter should be strength reduced.  */
 
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(working copy)
@@ -7,7 +7,8 @@ 
 
 extern char a[];
 
-/* Can not infer loop iteration from array -- exit test can not be replaced.  */
+/* Can not infer loop iteration from array -- exit test can not be
+   replaced by the array address.  */
 void foo (unsigned int i_width, TYPE dst)
 {
   unsigned long long i = 0;
@@ -21,5 +22,5 @@  void foo (unsigned int i_width, TYPE dst)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "\[^:\]*if \\(.*j_\[0-9\]+.*\\)" 1 "ivopts"} } */
 /* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(working copy)
@@ -18,5 +18,5 @@  long foo(long* p, long* p2, int N1, int N2)
   return s;
 }
 
-/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "Replacing exit test: if \\(.*p2.*\\)" 1 "ivopts"} } */
 /* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 204117)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -924,9 +924,25 @@  determine_base_object (tree expr)
 static struct iv *
 alloc_iv (tree base, tree step)
 {
+  tree expr = base;
   struct iv *iv = XCNEW (struct iv);
   gcc_assert (step != NULL_TREE);
 
+  /* Simplify complex address expressions like &MEM_REF, &ARRAY_REF
+     and &COMPONENT_REF into general expressions.  By doing this:
+       1) More accurate cost can be computed for address expressions;
+       2) Duplicate candidates won't be created.  */
+  STRIP_NOPS (expr);
+  if (TREE_CODE (expr) == ADDR_EXPR
+      && (TREE_CODE (TREE_OPERAND (expr, 0)) == MEM_REF
+	  || TREE_CODE (TREE_OPERAND (expr, 0)) == ARRAY_REF
+	  || TREE_CODE (TREE_OPERAND (expr, 0)) == COMPONENT_REF))
+    {
+      aff_tree comb;
+      tree_to_aff_combination (expr, TREE_TYPE (base), &comb);
+      base = fold_convert (TREE_TYPE (base), aff_combination_to_tree (&comb));
+    }
+
   iv->base = base;
   iv->base_object = determine_base_object (base);
   iv->step = step;
@@ -3481,17 +3497,21 @@  get_shiftadd_cost (tree expr, enum machine_mode mo
   int m = exact_log2 (int_cst_value (cst));
   int maxm = MIN (BITS_PER_WORD, GET_MODE_BITSIZE (mode));
   int sa_cost;
+  bool equal_p = false;
 
   if (!(m >= 0 && m < maxm))
     return false;
 
+  if (operand_equal_p (op1, mult, 0))
+    equal_p = true;
+
   sa_cost = (TREE_CODE (expr) != MINUS_EXPR
              ? shiftadd_cost (speed, mode, m)
-             : (mult == op1
+             : (equal_p
                 ? shiftsub1_cost (speed, mode, m)
                 : shiftsub0_cost (speed, mode, m)));
   res = new_cost (sa_cost, 0);
-  res = add_costs (res, mult == op1 ? cost0 : cost1);
+  res = add_costs (res, equal_p ? cost0 : cost1);
 
   STRIP_NOPS (multop);
   if (!is_gimple_val (multop))
@@ -3585,30 +3605,14 @@  force_expr_to_var_cost (tree expr, bool speed)
       op1 = TREE_OPERAND (expr, 1);
       STRIP_NOPS (op0);
       STRIP_NOPS (op1);
-
-      if (is_gimple_val (op0))
-	cost0 = no_cost;
-      else
-	cost0 = force_expr_to_var_cost (op0, speed);
-
-      if (is_gimple_val (op1))
-	cost1 = no_cost;
-      else
-	cost1 = force_expr_to_var_cost (op1, speed);
-
       break;
 
+    case NOP_EXPR:
+    case CONVERT_EXPR:
     case NEGATE_EXPR:
       op0 = TREE_OPERAND (expr, 0);
       STRIP_NOPS (op0);
       op1 = NULL_TREE;
-
-      if (is_gimple_val (op0))
-	cost0 = no_cost;
-      else
-	cost0 = force_expr_to_var_cost (op0, speed);
-
-      cost1 = no_cost;
       break;
 
     default:
@@ -3616,6 +3620,18 @@  force_expr_to_var_cost (tree expr, bool speed)
       return new_cost (target_spill_cost[speed], 0);
     }
 
+  if (op0 == NULL_TREE
+      || (is_gimple_val (op0) && (TREE_CODE (op0) != ADDR_EXPR)))
+    cost0 = no_cost;
+  else
+    cost0 = force_expr_to_var_cost (op0, speed);
+
+  if (op1 == NULL_TREE
+      || (is_gimple_val (op1) && (TREE_CODE (op1) != ADDR_EXPR)))
+    cost1 = no_cost;
+  else
+    cost1 = force_expr_to_var_cost (op1, speed);
+
   mode = TYPE_MODE (TREE_TYPE (expr));
   switch (TREE_CODE (expr))
     {
@@ -3639,7 +3655,18 @@  force_expr_to_var_cost (tree expr, bool speed)
                                     speed, &sa_cost))
             return sa_cost;
         }
+
       break;
+    case NOP_EXPR:
+    case CONVERT_EXPR:
+      {
+	tree inner_mode, outer_mode;
+	outer_mode = TREE_TYPE (expr);
+	inner_mode = TREE_TYPE (op0);
+	cost = new_cost (convert_cost (TYPE_MODE (outer_mode),
+				       TYPE_MODE (inner_mode), speed), 0);
+      }
+      break;
 
     case MULT_EXPR:
       if (cst_and_fits_in_hwi (op0))
@@ -3713,7 +3740,7 @@  split_address_cost (struct ivopts_data *data,
       *var_present = true;
       fd_ivopts_data = data;
       walk_tree (&addr, find_depends, depends_on, NULL);
-      return new_cost (target_spill_cost[data->speed], 0);
+      return force_expr_to_var_cost (addr, data->speed);
     }
 
   *offset += bitpos / BITS_PER_UNIT;