Patchwork: Catch more MEM_REFs sharing common addressing part in gimple strength reduction

Submitter Bin Cheng
Date Sept. 2, 2013, 6:56 a.m.
Message ID <003f01cea7a9$8e984ae0$abc8e0a0$@arm.com>
Permalink /patch/271685/
State New

Comments

Bin Cheng - Sept. 2, 2013, 6:56 a.m.
Hi,

The gimple-ssa-strength-reduction pass handles CAND_REFs in order to find
different MEM_REFs sharing a common part in their addressing expressions.  If
such MEM_REFs are found, the pass rewrites them, which produces more efficient
addressing expressions during the RTL passes.
The pass analyzes the addressing expression in each MEM_REF to see if it can
be formalized as follows:
     base:    MEM_REF (T1, C1)
     offset:  MULT_EXPR (PLUS_EXPR (T2, C2), C3)
     bitpos:  C4 * BITS_PER_UNIT
It then restructures it into the form:
     MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
              C1 + (C2 * C3) + C4)
Finally, it rewrites the MEM_REFs if two or more of them share a common
(non-constant) part.
The problem is that the pass doesn't back-trace T2.  If T2 is recorded as a
CAND_ADD in the form "T2' + C5", the MEM_REF should instead be restructured
into:
     MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2', C3)),
              C1 + (C2 * C3) + C4 + (C5 * C3))

The patch also includes a test case to illustrate the problem.

Bootstrapped and tested on x86/x86_64/arm-a15.  Is it OK?

Thanks.
bin

2013-09-02  Bin Cheng  <bin.cheng@arm.com>

	* gimple-ssa-strength-reduction.c (backtrace_base_for_ref): New.
	(restructure_reference): Call backtrace_base_for_ref.

gcc/testsuite/ChangeLog
2013-09-02  Bin Cheng  <bin.cheng@arm.com>

	* gcc.dg/tree-ssa/slsr-39.c: New test.
Richard Guenther - Sept. 2, 2013, 9:15 a.m.
On Mon, Sep 2, 2013 at 8:56 AM, bin.cheng <bin.cheng@arm.com> wrote:
> Hi,
>
> [patch description snipped]
>
> Bootstrapped and tested on x86/x86_64/arm-a15, is it ok?

This looks ok to me if Bill is ok with it.

Thanks,
Richard.

William J. Schmidt - Sept. 8, 2013, 3:14 a.m.
On Mon, 2013-09-02 at 11:15 +0200, Richard Biener wrote:
> On Mon, Sep 2, 2013 at 8:56 AM, bin.cheng <bin.cheng@arm.com> wrote:
> > Hi,
> >
> > [patch description snipped]
> >
> > Bootstrapped and tested on x86/x86_64/arm-a15, is it ok?
> 
> This looks ok to me if Bill is ok with it.

Sorry, I've been on vacation and haven't been checking in until now.
I'll have a look at this tomorrow -- sounds good on the surface!

Thanks,
Bill

William J. Schmidt - Sept. 9, 2013, 12:15 a.m.
On Mon, 2013-09-02 at 11:15 +0200, Richard Biener wrote:
> On Mon, Sep 2, 2013 at 8:56 AM, bin.cheng <bin.cheng@arm.com> wrote:
> > Hi,
> >
> > [patch description snipped]
> >
> > Bootstrapped and tested on x86/x86_64/arm-a15, is it ok?
> 
> This looks ok to me if Bill is ok with it.

This is a good generalization and I'm fine with it.  There are a few
minor nits that should be corrected, outlined below.


>>Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c
>>===================================================================
>>--- gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c	(revision 0)
>>+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c	(revision 0)
>>@@ -0,0 +1,26 @@
>>+/* Verify straight-line strength reduction for back-tracing
>>+   CADN_ADD for T2 in:

CAND_ADD

>>+
>>+    *PBASE:    T1
>>+    *POFFSET:  MULT_EXPR (T2, C3)
>>+    *PINDEX:   C1 + (C2 * C3) + C4  */
>>+
>>+/* { dg-do compile } */
>>+/* { dg-options "-O2 -fdump-tree-slsr" } */
>>+
>>+typedef int arr_2[50][50];
>>+
>>+void foo (arr_2 a2, int v1)
>>+{
>>+  int i, j;
>>+
>>+  i = v1 + 5;
>>+  j = i;
>>+  a2 [i] [j++] = i;
>>+  a2 [i] [j++] = i;
>>+  a2 [i] [i-1] += 1;
>>+  return;
>>+}
>>+
>>+/* { dg-final { scan-tree-dump-times "MEM" 4 "slsr" } } */
>>+/* { dg-final { cleanup-tree-dump "slsr" } } */
>>Index: gcc/gimple-ssa-strength-reduction.c
>>===================================================================
>>--- gcc/gimple-ssa-strength-reduction.c	(revision 202067)
>>+++ gcc/gimple-ssa-strength-reduction.c	(working copy)
>>@@ -750,6 +750,57 @@ slsr_process_phi (gimple phi, bool speed)
>>   add_cand_for_stmt (phi, c);
>> }
>> 
>>+/* Given PBASE which is a pointer to tree, loop up the defining

look up

>>+   statement for it and check whether the candidate is in the
>>+   form of:
>>+
>>+     X = B + (1 * S), S is integer constant
>>+     X = B + (i * S), S is integer one
>>+
>>+   If so, set PBASE to the candiate's base_expr and return double

candidate's

>>+   int (i * S).
>>+   Otherwise, just return double int zero.  */

This is sufficient, since you are properly checking the next_interp
chain.  Another possible form would be

    X = (B + i) * 1,

but if this is present, then one of the forms you're checking for should
also be present, so there's no need to check the MULT_CANDs.

>>+
>>+static double_int
>>+backtrace_base_for_ref (tree *pbase)
>>+{
>>+  tree base_in = *pbase;
>>+  slsr_cand_t base_cand;
>>+
>>+  STRIP_NOPS (base_in);
>>+  if (TREE_CODE (base_in) != SSA_NAME)
>>+    return tree_to_double_int (integer_zero_node);
>>+
>>+  base_cand = base_cand_from_table (base_in);
>>+
>>+  while (base_cand && base_cand->kind != CAND_PHI)
>>+    {
>>+      if (base_cand->kind == CAND_ADD
>>+	  && base_cand->index.is_one ()
>>+	  && TREE_CODE (base_cand->stride) == INTEGER_CST)
>>+	{
>>+	  /* X = B + (1 * S), S is integer constant.  */
>>+	  *pbase = base_cand->base_expr;
>>+	  return tree_to_double_int (base_cand->stride);
>>+	}
>>+      else if (base_cand->kind == CAND_ADD
>>+	       && TREE_CODE (base_cand->stride) == INTEGER_CST
>>+	       && integer_onep (base_cand->stride))
>>+        {
>>+	  /* X = B + (i * S), S is integer one.  */
>>+	  *pbase = base_cand->base_expr;
>>+	  return base_cand->index;
>>+	}
>>+
>>+      if (base_cand->next_interp)
>>+	base_cand = lookup_cand (base_cand->next_interp);
>>+      else
>>+	base_cand = NULL;
>>+    }
>>+
>>+  return tree_to_double_int (integer_zero_node);
>>+}
>>+
>> /* Look for the following pattern:
>> 
>>     *PBASE:    MEM_REF (T1, C1)
>>@@ -767,8 +818,15 @@ slsr_process_phi (gimple phi, bool speed)
>> 
>>     *PBASE:    T1
>>     *POFFSET:  MULT_EXPR (T2, C3)
>>-    *PINDEX:   C1 + (C2 * C3) + C4  */
>>+    *PINDEX:   C1 + (C2 * C3) + C4
>> 
>>+   When T2 is recorded by an CAND_ADD in the form of (T2' + C5), It
                             ^                                      ^
                             a                                      it

>>+   will be further restructured to:
>>+
>>+    *PBASE:    T1
>>+    *POFFSET:  MULT_EXPR (T2', C3)
>>+    *PINDEX:   C1 + (C2 * C3) + C4 + (C5 * C3)  */
>>+
>> static bool
>> restructure_reference (tree *pbase, tree *poffset, double_int *pindex,
>> 		       tree *ptype)
>>@@ -777,7 +835,7 @@ restructure_reference (tree *pbase, tree *poffset,
>>   double_int index = *pindex;
>>   double_int bpu = double_int::from_uhwi (BITS_PER_UNIT);
>>   tree mult_op0, mult_op1, t1, t2, type;
>>-  double_int c1, c2, c3, c4;
>>+  double_int c1, c2, c3, c4, c5;
>> 
>>   if (!base
>>       || !offset
>>@@ -823,11 +881,12 @@ restructure_reference (tree *pbase, tree *poffset,
>>     }
>> 
>>   c4 = index.udiv (bpu, FLOOR_DIV_EXPR);
>>+  c5 = backtrace_base_for_ref (&t2);
>> 
>>   *pbase = t1;
>>-  *poffset = fold_build2 (MULT_EXPR, sizetype, t2,
>>-			  double_int_to_tree (sizetype, c3));
>>-  *pindex = c1 + c2 * c3 + c4;
>>+  *poffset = size_binop (MULT_EXPR, fold_convert (sizetype, t2),
>>+			 double_int_to_tree (sizetype, c3));

I am not sure why you changed this call.  fold_build2 is a more
efficient call than size_binop.  size_binop makes several checks that
will fail in this case, and then calls fold_build2_loc, right?  Not a
big deal but seems like changing it back would be better.  Perhaps I'm
missing something (as usual ;).

Thanks,
Bill

>>+  *pindex = c1 + c2 * c3 + c4 + c5 * c3;
>>   *ptype = type;
>> 
>>   return true;
Bin Cheng - Sept. 9, 2013, 6:25 a.m.
Thanks for reviewing.  I will correct all the spelling problems in the next version of the patch.

On Mon, Sep 9, 2013 at 8:15 AM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:
>
>>>+   int (i * S).
>>>+   Otherwise, just return double int zero.  */
>
> This is sufficient, since you are properly checking the next_interp
> chain.  Another possible form would be
>
>     X = (B + i) * 1,
>
> but if this is present, then one of the forms you're checking for should
> also be present, so there's no need to check the MULT_CANDs.
I'm not very sure here since I didn't check MULT_CAND in the patch.  Could you please explain more about this?

>
>>> [patch hunks snipped]
>
>>>   c4 = index.udiv (bpu, FLOOR_DIV_EXPR);
>>>+  c5 = backtrace_base_for_ref (&t2);
>>>
>>>   *pbase = t1;
>>>-  *poffset = fold_build2 (MULT_EXPR, sizetype, t2,
>>>-                       double_int_to_tree (sizetype, c3));
>>>-  *pindex = c1 + c2 * c3 + c4;
>>>+  *poffset = size_binop (MULT_EXPR, fold_convert (sizetype, t2),
>>>+                      double_int_to_tree (sizetype, c3));
>
> I am not sure why you changed this call.  fold_build2 is a more
> efficient call than size_binop.  size_binop makes several checks that
> will fail in this case, and then calls fold_build2_loc, right?  Not a
> big deal but seems like changing it back would be better.  Perhaps I'm
> missing something (as usual ;).
I rely on size_binop to convert T2 into sizetype, because T2' may be of some other type.  Otherwise there will be an ssa_verify error later.

Thanks.
bin
William J. Schmidt - Sept. 9, 2013, 3:20 p.m.
On Mon, 2013-09-09 at 14:25 +0800, bin.cheng wrote:
> Thanks for reviewing, I will correct all stupid spelling problem in the next version of patch.
> 
> On Mon, Sep 9, 2013 at 8:15 AM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:
> >
> >>>+   int (i * S).
> >>>+   Otherwise, just return double int zero.  */
> >
> > This is sufficient, since you are properly checking the next_interp
> > chain.  Another possible form would be
> >
> >     X = (B + i) * 1,
> >
> > but if this is present, then one of the forms you're checking for should
> > also be present, so there's no need to check the MULT_CANDs.
> I'm not very sure here since I didn't check MULT_CAND in the patch.  Could you please explain more about this?

Sorry, perhaps I shouldn't have mentioned it.  I was simply stating
that, although a candidate representing B + i could be represented with
a CAND_MULT as shown, there is no need for you to check it (as you
don't) since there will also be a corresponding CAND_ADD in one of the
other forms.  Since you are walking the next_interp chain, this works.

In other words, the code is fine as is.  I was just thinking out loud
about other candidate types.

> 
> > [patch hunks snipped]
>
> > I am not sure why you changed this call.  fold_build2 is a more
> > efficient call than size_binop.  size_binop makes several checks that
> > will fail in this case, and then calls fold_build2_loc, right?  Not a
> > big deal but seems like changing it back would be better.  Perhaps I'm
> > missing something (as usual ;).
> I rely on size_binop to convert T2 into sizetype, because T2' may be in other kind of type.  Otherwise there will be ssa_verify error later.

OK, I see now.  I had thought this was handled by fold_build2, but
apparently not.  I guess all T2's formerly handled were already sizetype
as expected.  Thanks for the explanation!

Bill

William J. Schmidt - Sept. 9, 2013, 3:35 p.m.
On Mon, 2013-09-09 at 10:20 -0500, Bill Schmidt wrote:
> On Mon, 2013-09-09 at 14:25 +0800, bin.cheng wrote:
> > [earlier discussion snipped]
>
> > >>>-  *poffset = fold_build2 (MULT_EXPR, sizetype, t2,
> > >>>-                       double_int_to_tree (sizetype, c3));
> > >>>-  *pindex = c1 + c2 * c3 + c4;
> > >>>+  *poffset = size_binop (MULT_EXPR, fold_convert (sizetype, t2),
> > >>>+                      double_int_to_tree (sizetype, c3));
> > >
> > > I am not sure why you changed this call.  fold_build2 is a more
> > > efficient call than size_binop.  size_binop makes several checks that
> > > will fail in this case, and then calls fold_build2_loc, right?  Not a
> > > big deal but seems like changing it back would be better.  Perhaps I'm
> > > missing something (as usual ;).
> > I rely on size_binop to convert T2 into sizetype, because T2' may be in other kind of type.  Otherwise there will be ssa_verify error later.
> 
> OK, I see now.  I had thought this was handled by fold_build2, but
> apparently not.  I guess all T2's formerly handled were already sizetype
> as expected.  Thanks for the explanation!

So, wouldn't it suffice to change t2 to fold_convert (sizetype, t2) in
the argument list to fold_build2?  It's picking nits, but that would be
slightly more efficient.

Bill


Patch

Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c	(revision 0)
@@ -0,0 +1,26 @@ 
+/* Verify straight-line strength reduction for back-tracing
+   CADN_ADD for T2 in:
+
+    *PBASE:    T1
+    *POFFSET:  MULT_EXPR (T2, C3)
+    *PINDEX:   C1 + (C2 * C3) + C4  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slsr" } */
+
+typedef int arr_2[50][50];
+
+void foo (arr_2 a2, int v1)
+{
+  int i, j;
+
+  i = v1 + 5;
+  j = i;
+  a2 [i] [j++] = i;
+  a2 [i] [j++] = i;
+  a2 [i] [i-1] += 1;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM" 4 "slsr" } } */
+/* { dg-final { cleanup-tree-dump "slsr" } } */
Index: gcc/gimple-ssa-strength-reduction.c
===================================================================
--- gcc/gimple-ssa-strength-reduction.c	(revision 202067)
+++ gcc/gimple-ssa-strength-reduction.c	(working copy)
@@ -750,6 +750,57 @@  slsr_process_phi (gimple phi, bool speed)
   add_cand_for_stmt (phi, c);
 }
 
+/* Given PBASE which is a pointer to tree, loop up the defining
+   statement for it and check whether the candidate is in the
+   form of:
+
+     X = B + (1 * S), S is integer constant
+     X = B + (i * S), S is integer one
+
+   If so, set PBASE to the candiate's base_expr and return double
+   int (i * S).
+   Otherwise, just return double int zero.  */
+
+static double_int
+backtrace_base_for_ref (tree *pbase)
+{
+  tree base_in = *pbase;
+  slsr_cand_t base_cand;
+
+  STRIP_NOPS (base_in);
+  if (TREE_CODE (base_in) != SSA_NAME)
+    return tree_to_double_int (integer_zero_node);
+
+  base_cand = base_cand_from_table (base_in);
+
+  while (base_cand && base_cand->kind != CAND_PHI)
+    {
+      if (base_cand->kind == CAND_ADD
+	  && base_cand->index.is_one ()
+	  && TREE_CODE (base_cand->stride) == INTEGER_CST)
+	{
+	  /* X = B + (1 * S), S is integer constant.  */
+	  *pbase = base_cand->base_expr;
+	  return tree_to_double_int (base_cand->stride);
+	}
+      else if (base_cand->kind == CAND_ADD
+	       && TREE_CODE (base_cand->stride) == INTEGER_CST
+	       && integer_onep (base_cand->stride))
+        {
+	  /* X = B + (i * S), S is integer one.  */
+	  *pbase = base_cand->base_expr;
+	  return base_cand->index;
+	}
+
+      if (base_cand->next_interp)
+	base_cand = lookup_cand (base_cand->next_interp);
+      else
+	base_cand = NULL;
+    }
+
+  return tree_to_double_int (integer_zero_node);
+}
+
 /* Look for the following pattern:
 
     *PBASE:    MEM_REF (T1, C1)
@@ -767,8 +818,15 @@  slsr_process_phi (gimple phi, bool speed)
 
     *PBASE:    T1
     *POFFSET:  MULT_EXPR (T2, C3)
-    *PINDEX:   C1 + (C2 * C3) + C4  */
+    *PINDEX:   C1 + (C2 * C3) + C4
 
+   When T2 is recorded by an CAND_ADD in the form of (T2' + C5), It
+   will be further restructured to:
+
+    *PBASE:    T1
+    *POFFSET:  MULT_EXPR (T2', C3)
+    *PINDEX:   C1 + (C2 * C3) + C4 + (C5 * C3)  */
+
 static bool
 restructure_reference (tree *pbase, tree *poffset, double_int *pindex,
 		       tree *ptype)
@@ -777,7 +835,7 @@  restructure_reference (tree *pbase, tree *poffset,
   double_int index = *pindex;
   double_int bpu = double_int::from_uhwi (BITS_PER_UNIT);
   tree mult_op0, mult_op1, t1, t2, type;
-  double_int c1, c2, c3, c4;
+  double_int c1, c2, c3, c4, c5;
 
   if (!base
       || !offset
@@ -823,11 +881,12 @@  restructure_reference (tree *pbase, tree *poffset,
     }
 
   c4 = index.udiv (bpu, FLOOR_DIV_EXPR);
+  c5 = backtrace_base_for_ref (&t2);
 
   *pbase = t1;
-  *poffset = fold_build2 (MULT_EXPR, sizetype, t2,
-			  double_int_to_tree (sizetype, c3));
-  *pindex = c1 + c2 * c3 + c4;
+  *poffset = size_binop (MULT_EXPR, fold_convert (sizetype, t2),
+			 double_int_to_tree (sizetype, c3));
+  *pindex = c1 + c2 * c3 + c4 + c5 * c3;
   *ptype = type;
 
   return true;