Message ID | 003f01cea7a9$8e984ae0$abc8e0a0$@arm.com |
---|---|
State | New |
Headers | show |
On Mon, Sep 2, 2013 at 8:56 AM, bin.cheng <bin.cheng@arm.com> wrote: > Hi, > > The gimple-ssa-strength-reduction pass handles CAND_REFs in order to find > different MEM_REFs sharing common part in addressing expression. If such > MEM_REFs are found, the pass rewrites MEM_REFs, and produces more efficient > addressing expression during the RTL passes. > The pass analyzes addressing expression in each MEM_REF to see if it can be > formalized as follows: > base: MEM_REF (T1, C1) > offset: MULT_EXPR (PLUS_EXPR (T2, C2), C3) > bitpos: C4 * BITS_PER_UNIT > Then restructures it into below form: > MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)), > C1 + (C2 * C3) + C4) > At last, rewrite the MEM_REFs if there are two or more sharing common > (non-constant) part. > The problem is it doesn't back trace T2. If T2 is recorded as a CAND_ADD in > form of "T2' + C5", the MEM_REF should be restructure into: > MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2', C3)), > C1 + (C2 * C3) + C4 + (C5 * C3)) > > The patch also includes a test case to illustrate the problem. > > Bootstrapped and tested on x86/x86_64/arm-a15, is it ok? This looks ok to me if Bill is ok with it. Thanks, Richard. > Thanks. > bin > > 2013-09-02 Bin Cheng <bin.cheng@arm.com> > > * gimple-ssa-strength-reduction.c (backtrace_base_for_ref): New. > (restructure_reference): Call backtrace_base_for_ref. > > gcc/testsuite/ChangeLog > 2013-09-02 Bin Cheng <bin.cheng@arm.com> > > * gcc.dg/tree-ssa/slsr-39.c: New test.
On Mon, 2013-09-02 at 11:15 +0200, Richard Biener wrote: > On Mon, Sep 2, 2013 at 8:56 AM, bin.cheng <bin.cheng@arm.com> wrote: > > Hi, > > > > The gimple-ssa-strength-reduction pass handles CAND_REFs in order to find > > different MEM_REFs sharing common part in addressing expression. If such > > MEM_REFs are found, the pass rewrites MEM_REFs, and produces more efficient > > addressing expression during the RTL passes. > > The pass analyzes addressing expression in each MEM_REF to see if it can be > > formalized as follows: > > base: MEM_REF (T1, C1) > > offset: MULT_EXPR (PLUS_EXPR (T2, C2), C3) > > bitpos: C4 * BITS_PER_UNIT > > Then restructures it into below form: > > MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)), > > C1 + (C2 * C3) + C4) > > At last, rewrite the MEM_REFs if there are two or more sharing common > > (non-constant) part. > > The problem is it doesn't back trace T2. If T2 is recorded as a CAND_ADD in > > form of "T2' + C5", the MEM_REF should be restructure into: > > MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2', C3)), > > C1 + (C2 * C3) + C4 + (C5 * C3)) > > > > The patch also includes a test case to illustrate the problem. > > > > Bootstrapped and tested on x86/x86_64/arm-a15, is it ok? > > This looks ok to me if Bill is ok with it. Sorry, I've been on vacation and haven't been checking in until now. I'll have a look at this tomorrow -- sounds good on the surface! Thanks, Bill > > Thanks, > Richard. > > > Thanks. > > bin > > > > 2013-09-02 Bin Cheng <bin.cheng@arm.com> > > > > * gimple-ssa-strength-reduction.c (backtrace_base_for_ref): New. > > (restructure_reference): Call backtrace_base_for_ref. > > > > gcc/testsuite/ChangeLog > > 2013-09-02 Bin Cheng <bin.cheng@arm.com> > > > > * gcc.dg/tree-ssa/slsr-39.c: New test. >
On Mon, 2013-09-02 at 11:15 +0200, Richard Biener wrote: > On Mon, Sep 2, 2013 at 8:56 AM, bin.cheng <bin.cheng@arm.com> wrote: > > Hi, > > > > The gimple-ssa-strength-reduction pass handles CAND_REFs in order to find > > different MEM_REFs sharing common part in addressing expression. If such > > MEM_REFs are found, the pass rewrites MEM_REFs, and produces more efficient > > addressing expression during the RTL passes. > > The pass analyzes addressing expression in each MEM_REF to see if it can be > > formalized as follows: > > base: MEM_REF (T1, C1) > > offset: MULT_EXPR (PLUS_EXPR (T2, C2), C3) > > bitpos: C4 * BITS_PER_UNIT > > Then restructures it into below form: > > MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)), > > C1 + (C2 * C3) + C4) > > At last, rewrite the MEM_REFs if there are two or more sharing common > > (non-constant) part. > > The problem is it doesn't back trace T2. If T2 is recorded as a CAND_ADD in > > form of "T2' + C5", the MEM_REF should be restructure into: > > MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2', C3)), > > C1 + (C2 * C3) + C4 + (C5 * C3)) > > > > The patch also includes a test case to illustrate the problem. > > > > Bootstrapped and tested on x86/x86_64/arm-a15, is it ok? > > This looks ok to me if Bill is ok with it. This is a good generalization and I'm fine with it. There are a few minor nits that should be corrected, outlined below. > > Thanks, > Richard. > > > Thanks. > > bin > > > > 2013-09-02 Bin Cheng <bin.cheng@arm.com> > > > > * gimple-ssa-strength-reduction.c (backtrace_base_for_ref): New. > > (restructure_reference): Call backtrace_base_for_ref. > > > > gcc/testsuite/ChangeLog > > 2013-09-02 Bin Cheng <bin.cheng@arm.com> > > > > * gcc.dg/tree-ssa/slsr-39.c: New test. > >>Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c >>=================================================================== >>--- >>gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c >>(revision 0) >>+++ >>gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c >>(revision 0) >>@@ -0,0 +1,26 @@ >>+/* Verify straight-line strength reduction for back-tracing >>+ CADN_ADD for T2 in: CAND_ADD >>+ >>+ *PBASE: T1 >>+ *POFFSET: MULT_EXPR (T2, C3) >>+ *PINDEX: C1 + (C2 * C3) + C4 */ >>+ >>+/* { dg-do compile } */ >>+/* { dg-options "-O2 -fdump-tree-slsr" } */ >>+ >>+typedef int arr_2[50][50]; >>+ >>+void foo (arr_2 a2, int v1) >>+{ >>+ int i, j; >>+ >>+ i = v1 + 5; >>+ j = i; >>+ a2 [i] [j++] = i; >>+ a2 [i] [j++] = i; >>+ a2 [i] [i-1] += 1; >>+ return; >>+} >>+ >>+/* { dg-final { scan-tree-dump-times "MEM" 4 "slsr" } } */ >>+/* { dg-final { cleanup-tree-dump "slsr" } } */ >>Index: gcc/gimple-ssa-strength-reduction.c >>=================================================================== >>--- >>gcc/gimple-ssa-strength-reduction.c >>(revision 202067) >>+++ >>gcc/gimple-ssa-strength-reduction.c >>(working copy) >>@@ -750,6 +750,57 @@ slsr_process_phi (gimple phi, bool speed) >> add_cand_for_stmt (phi, c); >> } >> >>+/* Given PBASE which is a pointer to tree, loop up the defining look up >>+ statement for it and check whether the candidate is in the >>+ form of: >>+ >>+ X = B + (1 * S), S is integer constant >>+ X = B + (i * S), S is integer one >>+ >>+ If so, set PBASE to the candiate's base_expr and return double candidate's >>+ int (i * S). >>+ Otherwise, just return double int zero. */ This is sufficient, since you are properly checking the next_interp chain. Another possible form would be X = (B + i) * 1, but if this is present, then one of the forms you're checking for should also be present, so there's no need to check the MULT_CANDs. >>+ >>+static double_int >>+backtrace_base_for_ref (tree *pbase) >>+{ >>+ tree base_in = *pbase; >>+ slsr_cand_t base_cand; >>+ >>+ STRIP_NOPS (base_in); >>+ if (TREE_CODE (base_in) != SSA_NAME) >>+ return tree_to_double_int (integer_zero_node); >>+ >>+ base_cand = base_cand_from_table (base_in); >>+ >>+ while (base_cand && base_cand->kind != CAND_PHI) >>+ { >>+ if (base_cand->kind == CAND_ADD >>+ && base_cand->index.is_one () >>+ && TREE_CODE (base_cand->stride) == INTEGER_CST) >>+ { >>+ /* X = B + (1 * S), S is integer constant. */ >>+ *pbase = base_cand->base_expr; >>+ return tree_to_double_int (base_cand->stride); >>+ } >>+ else if (base_cand->kind == CAND_ADD >>+ && TREE_CODE (base_cand->stride) == INTEGER_CST >>+ && integer_onep (base_cand->stride)) >>+ { >>+ /* X = B + (i * S), S is integer one. */ >>+ *pbase = base_cand->base_expr; >>+ return base_cand->index; >>+ } >>+ >>+ if (base_cand->next_interp) >>+ base_cand = lookup_cand (base_cand->next_interp); >>+ else >>+ base_cand = NULL; >>+ } >>+ >>+ return tree_to_double_int (integer_zero_node); >>+} >>+ >> /* Look for the following pattern: >> >> *PBASE: MEM_REF (T1, C1) >>@@ -767,8 +818,15 @@ slsr_process_phi (gimple phi, bool speed) >> >> *PBASE: T1 >> *POFFSET: MULT_EXPR (T2, C3) >>- *PINDEX: C1 + (C2 * C3) + C4 */ >>+ *PINDEX: C1 + (C2 * C3) + C4 >> >>+ When T2 is recorded by an CAND_ADD in the form of (T2' + C5), It ^ ^ a it >>+ will be further restructured to: >>+ >>+ *PBASE: T1 >>+ *POFFSET: MULT_EXPR (T2', C3) >>+ *PINDEX: C1 + (C2 * C3) + C4 + (C5 * C3) */ >>+ >> static bool >> restructure_reference (tree *pbase, tree *poffset, double_int *pindex, >> tree *ptype) >>@@ -777,7 +835,7 @@ restructure_reference (tree *pbase, tree *poffset, >> double_int index = *pindex; >> double_int bpu = double_int::from_uhwi (BITS_PER_UNIT); >> tree mult_op0, mult_op1, t1, t2, type; >>- double_int c1, c2, c3, c4; >>+ double_int c1, c2, c3, c4, c5; >> >> if (!base >> || !offset >>@@ -823,11 +881,12 @@ restructure_reference (tree *pbase, tree *poffset, >> } >> >> c4 = index.udiv (bpu, FLOOR_DIV_EXPR); >>+ c5 = backtrace_base_for_ref (&t2); >> >> *pbase = t1; >>- *poffset = fold_build2 (MULT_EXPR, sizetype, t2, >>- double_int_to_tree (sizetype, c3)); >>- *pindex = c1 + c2 * c3 + c4; >>+ *poffset = size_binop (MULT_EXPR, fold_convert (sizetype, t2), >>+ double_int_to_tree (sizetype, c3)); I am not sure why you changed this call. fold_build2 is a more efficient call than size_binop. size_binop makes several checks that will fail in this case, and then calls fold_build2_loc, right? Not a big deal but seems like changing it back would be better. Perhaps I'm missing something (as usual ;). Thanks, Bill >>+ *pindex = c1 + c2 * c3 + c4 + c5 * c3; >> *ptype = type; >> >> return true;
Thanks for reviewing, I will correct all stupid spelling problem in the next version of patch. On Mon, Sep 9, 2013 at 8:15 AM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote: > >>>+ int (i * S). >>>+ Otherwise, just return double int zero. */ > > This is sufficient, since you are properly checking the next_interp > chain. Another possible form would be > > X = (B + i) * 1, > > but if this is present, then one of the forms you're checking for should > also be present, so there's no need to check the MULT_CANDs. I'm not very sure here since I didn't check MULT_CAND in the patch. Could you please explain more about this? > >>>+ >>>+static double_int >>>+backtrace_base_for_ref (tree *pbase) >>>+{ >>>+ tree base_in = *pbase; >>>+ slsr_cand_t base_cand; >>>+ >>>+ STRIP_NOPS (base_in); >>>+ if (TREE_CODE (base_in) != SSA_NAME) >>>+ return tree_to_double_int (integer_zero_node); >>>+ >>>+ base_cand = base_cand_from_table (base_in); >>>+ >>>+ while (base_cand && base_cand->kind != CAND_PHI) >>>+ { >>>+ if (base_cand->kind == CAND_ADD >>>+ && base_cand->index.is_one () >>>+ && TREE_CODE (base_cand->stride) == INTEGER_CST) >>>+ { >>>+ /* X = B + (1 * S), S is integer constant. */ >>>+ *pbase = base_cand->base_expr; >>>+ return tree_to_double_int (base_cand->stride); >>>+ } >>>+ else if (base_cand->kind == CAND_ADD >>>+ && TREE_CODE (base_cand->stride) == INTEGER_CST >>>+ && integer_onep (base_cand->stride)) >>>+ { >>>+ /* X = B + (i * S), S is integer one. */ >>>+ *pbase = base_cand->base_expr; >>>+ return base_cand->index; >>>+ } >>>+ >>>+ if (base_cand->next_interp) >>>+ base_cand = lookup_cand (base_cand->next_interp); >>>+ else >>>+ base_cand = NULL; >>>+ } >>>+ >>>+ return tree_to_double_int (integer_zero_node); >>>+} >>>+ >>> /* Look for the following pattern: >>> >>> *PBASE: MEM_REF (T1, C1) >>>@@ -767,8 +818,15 @@ slsr_process_phi (gimple phi, bool speed) >>> >>> *PBASE: T1 >>> *POFFSET: MULT_EXPR (T2, C3) >>>- *PINDEX: C1 + (C2 * C3) + C4 */ >>>+ *PINDEX: C1 + (C2 * C3) + C4 >>> >>>+ When T2 is recorded by an CAND_ADD in the form of (T2' + C5), It > ^ ^ > a it > >>>+ will be further restructured to: >>>+ >>>+ *PBASE: T1 >>>+ *POFFSET: MULT_EXPR (T2', C3) >>>+ *PINDEX: C1 + (C2 * C3) + C4 + (C5 * C3) */ >>>+ >>> static bool >>> restructure_reference (tree *pbase, tree *poffset, double_int > *pindex, >>> tree *ptype) >>>@@ -777,7 +835,7 @@ restructure_reference (tree *pbase, tree *poffset, >>> double_int index = *pindex; >>> double_int bpu = double_int::from_uhwi (BITS_PER_UNIT); >>> tree mult_op0, mult_op1, t1, t2, type; >>>- double_int c1, c2, c3, c4; >>>+ double_int c1, c2, c3, c4, c5; >>> >>> if (!base >>> || !offset >>>@@ -823,11 +881,12 @@ restructure_reference (tree *pbase, tree > *poffset, >>> } >>> >>> c4 = index.udiv (bpu, FLOOR_DIV_EXPR); >>>+ c5 = backtrace_base_for_ref (&t2); >>> >>> *pbase = t1; >>>- *poffset = fold_build2 (MULT_EXPR, sizetype, t2, >>>- double_int_to_tree (sizetype, c3)); >>>- *pindex = c1 + c2 * c3 + c4; >>>+ *poffset = size_binop (MULT_EXPR, fold_convert (sizetype, t2), >>>+ double_int_to_tree (sizetype, c3)); > > I am not sure why you changed this call. fold_build2 is a more > efficient call than size_binop. size_binop makes several checks that > will fail in this case, and then calls fold_build2_loc, right? Not a > big deal but seems like changing it back would be better. Perhaps I'm > missing something (as usual ;). I rely on size_binop to convert T2 into sizetype, because T2' may be in other kind of type. Otherwise there will be ssa_verify error later. Thanks. bin
On Mon, 2013-09-09 at 14:25 +0800, bin.cheng wrote: > Thanks for reviewing, I will correct all stupid spelling problem in the next version of patch. > > On Mon, Sep 9, 2013 at 8:15 AM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote: > > > >>>+ int (i * S). > >>>+ Otherwise, just return double int zero. */ > > > > This is sufficient, since you are properly checking the next_interp > > chain. Another possible form would be > > > > X = (B + i) * 1, > > > > but if this is present, then one of the forms you're checking for should > > also be present, so there's no need to check the MULT_CANDs. > I'm not very sure here since I didn't check MULT_CAND in the patch. Could you please explain more about this? Sorry, perhaps I shouldn't have mentioned it. I was simply stating that, although a candidate representing B + i could be represented with a CAND_MULT as shown, there is no need for you to check it (as you don't) since there will also be a corresponding CAND_ADD in one of the other forms. Since you are walking the next_interp chain, this works. In other words, the code is fine as is. I was just thinking out loud about other candidate types. > > > > >>>+ > >>>+static double_int > >>>+backtrace_base_for_ref (tree *pbase) > >>>+{ > >>>+ tree base_in = *pbase; > >>>+ slsr_cand_t base_cand; > >>>+ > >>>+ STRIP_NOPS (base_in); > >>>+ if (TREE_CODE (base_in) != SSA_NAME) > >>>+ return tree_to_double_int (integer_zero_node); > >>>+ > >>>+ base_cand = base_cand_from_table (base_in); > >>>+ > >>>+ while (base_cand && base_cand->kind != CAND_PHI) > >>>+ { > >>>+ if (base_cand->kind == CAND_ADD > >>>+ && base_cand->index.is_one () > >>>+ && TREE_CODE (base_cand->stride) == INTEGER_CST) > >>>+ { > >>>+ /* X = B + (1 * S), S is integer constant. */ > >>>+ *pbase = base_cand->base_expr; > >>>+ return tree_to_double_int (base_cand->stride); > >>>+ } > >>>+ else if (base_cand->kind == CAND_ADD > >>>+ && TREE_CODE (base_cand->stride) == INTEGER_CST > >>>+ && integer_onep (base_cand->stride)) > >>>+ { > >>>+ /* X = B + (i * S), S is integer one. */ > >>>+ *pbase = base_cand->base_expr; > >>>+ return base_cand->index; > >>>+ } > >>>+ > >>>+ if (base_cand->next_interp) > >>>+ base_cand = lookup_cand (base_cand->next_interp); > >>>+ else > >>>+ base_cand = NULL; > >>>+ } > >>>+ > >>>+ return tree_to_double_int (integer_zero_node); > >>>+} > >>>+ > >>> /* Look for the following pattern: > >>> > >>> *PBASE: MEM_REF (T1, C1) > >>>@@ -767,8 +818,15 @@ slsr_process_phi (gimple phi, bool speed) > >>> > >>> *PBASE: T1 > >>> *POFFSET: MULT_EXPR (T2, C3) > >>>- *PINDEX: C1 + (C2 * C3) + C4 */ > >>>+ *PINDEX: C1 + (C2 * C3) + C4 > >>> > >>>+ When T2 is recorded by an CAND_ADD in the form of (T2' + C5), It > > ^ ^ > > a it > > > >>>+ will be further restructured to: > >>>+ > >>>+ *PBASE: T1 > >>>+ *POFFSET: MULT_EXPR (T2', C3) > >>>+ *PINDEX: C1 + (C2 * C3) + C4 + (C5 * C3) */ > >>>+ > >>> static bool > >>> restructure_reference (tree *pbase, tree *poffset, double_int > > *pindex, > >>> tree *ptype) > >>>@@ -777,7 +835,7 @@ restructure_reference (tree *pbase, tree *poffset, > >>> double_int index = *pindex; > >>> double_int bpu = double_int::from_uhwi (BITS_PER_UNIT); > >>> tree mult_op0, mult_op1, t1, t2, type; > >>>- double_int c1, c2, c3, c4; > >>>+ double_int c1, c2, c3, c4, c5; > >>> > >>> if (!base > >>> || !offset > >>>@@ -823,11 +881,12 @@ restructure_reference (tree *pbase, tree > > *poffset, > >>> } > >>> > >>> c4 = index.udiv (bpu, FLOOR_DIV_EXPR); > >>>+ c5 = backtrace_base_for_ref (&t2); > >>> > >>> *pbase = t1; > >>>- *poffset = fold_build2 (MULT_EXPR, sizetype, t2, > >>>- double_int_to_tree (sizetype, c3)); > >>>- *pindex = c1 + c2 * c3 + c4; > >>>+ *poffset = size_binop (MULT_EXPR, fold_convert (sizetype, t2), > >>>+ double_int_to_tree (sizetype, c3)); > > > > I am not sure why you changed this call. fold_build2 is a more > > efficient call than size_binop. size_binop makes several checks that > > will fail in this case, and then calls fold_build2_loc, right? Not a > > big deal but seems like changing it back would be better. Perhaps I'm > > missing something (as usual ;). > I rely on size_binop to convert T2 into sizetype, because T2' may be in other kind of type. Otherwise there will be ssa_verify error later. OK, I see now. I had thought this was handled by fold_build2, but apparently not. I guess all T2's formerly handled were already sizetype as expected. Thanks for the explanation! Bill > > Thanks. > bin > > > >
On Mon, 2013-09-09 at 10:20 -0500, Bill Schmidt wrote: > On Mon, 2013-09-09 at 14:25 +0800, bin.cheng wrote: > > Thanks for reviewing, I will correct all stupid spelling problem in the next version of patch. > > > > On Mon, Sep 9, 2013 at 8:15 AM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote: > > > > > >>>+ int (i * S). > > >>>+ Otherwise, just return double int zero. */ > > > > > > This is sufficient, since you are properly checking the next_interp > > > chain. Another possible form would be > > > > > > X = (B + i) * 1, > > > > > > but if this is present, then one of the forms you're checking for should > > > also be present, so there's no need to check the MULT_CANDs. > > I'm not very sure here since I didn't check MULT_CAND in the patch. Could you please explain more about this? > > Sorry, perhaps I shouldn't have mentioned it. I was simply stating > that, although a candidate representing B + i could be represented with > a CAND_MULT as shown, there is no need for you to check it (as you > don't) since there will also be a corresponding CAND_ADD in one of the > other forms. Since you are walking the next_interp chain, this works. > > In other words, the code is fine as is. I was just thinking out loud > about other candidate types. > > > > > > > > >>>+ > > >>>+static double_int > > >>>+backtrace_base_for_ref (tree *pbase) > > >>>+{ > > >>>+ tree base_in = *pbase; > > >>>+ slsr_cand_t base_cand; > > >>>+ > > >>>+ STRIP_NOPS (base_in); > > >>>+ if (TREE_CODE (base_in) != SSA_NAME) > > >>>+ return tree_to_double_int (integer_zero_node); > > >>>+ > > >>>+ base_cand = base_cand_from_table (base_in); > > >>>+ > > >>>+ while (base_cand && base_cand->kind != CAND_PHI) > > >>>+ { > > >>>+ if (base_cand->kind == CAND_ADD > > >>>+ && base_cand->index.is_one () > > >>>+ && TREE_CODE (base_cand->stride) == INTEGER_CST) > > >>>+ { > > >>>+ /* X = B + (1 * S), S is integer constant. */ > > >>>+ *pbase = base_cand->base_expr; > > >>>+ return tree_to_double_int (base_cand->stride); > > >>>+ } > > >>>+ else if (base_cand->kind == CAND_ADD > > >>>+ && TREE_CODE (base_cand->stride) == INTEGER_CST > > >>>+ && integer_onep (base_cand->stride)) > > >>>+ { > > >>>+ /* X = B + (i * S), S is integer one. */ > > >>>+ *pbase = base_cand->base_expr; > > >>>+ return base_cand->index; > > >>>+ } > > >>>+ > > >>>+ if (base_cand->next_interp) > > >>>+ base_cand = lookup_cand (base_cand->next_interp); > > >>>+ else > > >>>+ base_cand = NULL; > > >>>+ } > > >>>+ > > >>>+ return tree_to_double_int (integer_zero_node); > > >>>+} > > >>>+ > > >>> /* Look for the following pattern: > > >>> > > >>> *PBASE: MEM_REF (T1, C1) > > >>>@@ -767,8 +818,15 @@ slsr_process_phi (gimple phi, bool speed) > > >>> > > >>> *PBASE: T1 > > >>> *POFFSET: MULT_EXPR (T2, C3) > > >>>- *PINDEX: C1 + (C2 * C3) + C4 */ > > >>>+ *PINDEX: C1 + (C2 * C3) + C4 > > >>> > > >>>+ When T2 is recorded by an CAND_ADD in the form of (T2' + C5), It > > > ^ ^ > > > a it > > > > > >>>+ will be further restructured to: > > >>>+ > > >>>+ *PBASE: T1 > > >>>+ *POFFSET: MULT_EXPR (T2', C3) > > >>>+ *PINDEX: C1 + (C2 * C3) + C4 + (C5 * C3) */ > > >>>+ > > >>> static bool > > >>> restructure_reference (tree *pbase, tree *poffset, double_int > > > *pindex, > > >>> tree *ptype) > > >>>@@ -777,7 +835,7 @@ restructure_reference (tree *pbase, tree *poffset, > > >>> double_int index = *pindex; > > >>> double_int bpu = double_int::from_uhwi (BITS_PER_UNIT); > > >>> tree mult_op0, mult_op1, t1, t2, type; > > >>>- double_int c1, c2, c3, c4; > > >>>+ double_int c1, c2, c3, c4, c5; > > >>> > > >>> if (!base > > >>> || !offset > > >>>@@ -823,11 +881,12 @@ restructure_reference (tree *pbase, tree > > > *poffset, > > >>> } > > >>> > > >>> c4 = index.udiv (bpu, FLOOR_DIV_EXPR); > > >>>+ c5 = backtrace_base_for_ref (&t2); > > >>> > > >>> *pbase = t1; > > >>>- *poffset = fold_build2 (MULT_EXPR, sizetype, t2, > > >>>- double_int_to_tree (sizetype, c3)); > > >>>- *pindex = c1 + c2 * c3 + c4; > > >>>+ *poffset = size_binop (MULT_EXPR, fold_convert (sizetype, t2), > > >>>+ double_int_to_tree (sizetype, c3)); > > > > > > I am not sure why you changed this call. fold_build2 is a more > > > efficient call than size_binop. size_binop makes several checks that > > > will fail in this case, and then calls fold_build2_loc, right? Not a > > > big deal but seems like changing it back would be better. Perhaps I'm > > > missing something (as usual ;). > > I rely on size_binop to convert T2 into sizetype, because T2' may be in other kind of type. Otherwise there will be ssa_verify error later. > > OK, I see now. I had thought this was handled by fold_build2, but > apparently not. I guess all T2's formerly handled were already sizetype > as expected. Thanks for the explanation! So, wouldn't it suffice to change t2 to fold_convert (sizetype, t2) in the argument list to fold_build2? It's picking nits, but that would be slightly more efficient. Bill > > Bill > > > > > Thanks. > > bin > > > > > > > >
Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c (revision 0) @@ -0,0 +1,26 @@ +/* Verify straight-line strength reduction for back-tracing + CADN_ADD for T2 in: + + *PBASE: T1 + *POFFSET: MULT_EXPR (T2, C3) + *PINDEX: C1 + (C2 * C3) + C4 */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-slsr" } */ + +typedef int arr_2[50][50]; + +void foo (arr_2 a2, int v1) +{ + int i, j; + + i = v1 + 5; + j = i; + a2 [i] [j++] = i; + a2 [i] [j++] = i; + a2 [i] [i-1] += 1; + return; +} + +/* { dg-final { scan-tree-dump-times "MEM" 4 "slsr" } } */ +/* { dg-final { cleanup-tree-dump "slsr" } } */ Index: gcc/gimple-ssa-strength-reduction.c =================================================================== --- gcc/gimple-ssa-strength-reduction.c (revision 202067) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -750,6 +750,57 @@ slsr_process_phi (gimple phi, bool speed) add_cand_for_stmt (phi, c); } +/* Given PBASE which is a pointer to tree, loop up the defining + statement for it and check whether the candidate is in the + form of: + + X = B + (1 * S), S is integer constant + X = B + (i * S), S is integer one + + If so, set PBASE to the candiate's base_expr and return double + int (i * S). + Otherwise, just return double int zero. */ + +static double_int +backtrace_base_for_ref (tree *pbase) +{ + tree base_in = *pbase; + slsr_cand_t base_cand; + + STRIP_NOPS (base_in); + if (TREE_CODE (base_in) != SSA_NAME) + return tree_to_double_int (integer_zero_node); + + base_cand = base_cand_from_table (base_in); + + while (base_cand && base_cand->kind != CAND_PHI) + { + if (base_cand->kind == CAND_ADD + && base_cand->index.is_one () + && TREE_CODE (base_cand->stride) == INTEGER_CST) + { + /* X = B + (1 * S), S is integer constant. */ + *pbase = base_cand->base_expr; + return tree_to_double_int (base_cand->stride); + } + else if (base_cand->kind == CAND_ADD + && TREE_CODE (base_cand->stride) == INTEGER_CST + && integer_onep (base_cand->stride)) + { + /* X = B + (i * S), S is integer one. */ + *pbase = base_cand->base_expr; + return base_cand->index; + } + + if (base_cand->next_interp) + base_cand = lookup_cand (base_cand->next_interp); + else + base_cand = NULL; + } + + return tree_to_double_int (integer_zero_node); +} + /* Look for the following pattern: *PBASE: MEM_REF (T1, C1) @@ -767,8 +818,15 @@ slsr_process_phi (gimple phi, bool speed) *PBASE: T1 *POFFSET: MULT_EXPR (T2, C3) - *PINDEX: C1 + (C2 * C3) + C4 */ + *PINDEX: C1 + (C2 * C3) + C4 + When T2 is recorded by an CAND_ADD in the form of (T2' + C5), It + will be further restructured to: + + *PBASE: T1 + *POFFSET: MULT_EXPR (T2', C3) + *PINDEX: C1 + (C2 * C3) + C4 + (C5 * C3) */ + static bool restructure_reference (tree *pbase, tree *poffset, double_int *pindex, tree *ptype) @@ -777,7 +835,7 @@ restructure_reference (tree *pbase, tree *poffset, double_int index = *pindex; double_int bpu = double_int::from_uhwi (BITS_PER_UNIT); tree mult_op0, mult_op1, t1, t2, type; - double_int c1, c2, c3, c4; + double_int c1, c2, c3, c4, c5; if (!base || !offset @@ -823,11 +881,12 @@ restructure_reference (tree *pbase, tree *poffset, } c4 = index.udiv (bpu, FLOOR_DIV_EXPR); + c5 = backtrace_base_for_ref (&t2); *pbase = t1; - *poffset = fold_build2 (MULT_EXPR, sizetype, t2, - double_int_to_tree (sizetype, c3)); - *pindex = c1 + c2 * c3 + c4; + *poffset = size_binop (MULT_EXPR, fold_convert (sizetype, t2), + double_int_to_tree (sizetype, c3)); + *pindex = c1 + c2 * c3 + c4 + c5 * c3; *ptype = type; return true;