Message ID | 5282ACE5.8020304@arm.com |
---|---|
State | New |
Headers | show |
Hi Yufeng, On Tue, 2013-11-12 at 22:34 +0000, Yufeng Zhang wrote: > Hi Bill, > > Many thanks for the review. > > I find your suggestion on using the next_interp field quite > enlightening. I prepared a patch which adds changes without modifying > the framework. With the patch, the slsr pass now tries to create a > second candidate for each memory accessing gimple statement, and chain > it to the first one via the next_interp field. > > There are two implications in this approach though: > > 1) For each memory accessing gimple statement, there can be two > candidates, and these two candidates can be part of different dependency > graphs respectively (based on different base expr). Only one of the > dependency graph should be traversed to do replace_refs. Most of the > changes in the patch is to handle this implication. > > I am aware that you suggest to follow the next-interp chain only when > the searching fails for the first interpretation. However, that doesn't > work very well, as it can result in worse code-gen. Taking a varied > form of the added test slsr-41.c for example: > > i1: a2 [i] [j] = 1; > i2: a2 [i] [j+1] = 2; > i3: a2 [i+20] [j] = i; > > With the 2nd interpretation created conditionally, the following two > dependency chains will be established: > > i1 --> i2 (base expr is an SSA_NAME defined as (a2 + i * 200)) > i1 --> i3 (base expr is a tree expression of (a2 + i * 200)) So it seems to me that really what needs to happen is to unify those two base_exprs. We don't currently have logic in this pass to look up an SSA name based on {base, index, stride, cand_type}, but that could be done with a hash table. For now to save processing time it would make sense to only do that for MEM candidates, though the cand_type should be included in the hash to allow this to be used for other candidate types if necessary. Of course, the SSA name definition must dominate the candidate to be eligible as a basis, and that should be checked, but this should generally be the case. The goal should be for all of these references to have the same base expr so that i3 can choose either i1 or i2 as a basis. (For now the logic in the pass chooses the most dominating basis, but eventually I would like to add heuristics to make better choices.) If all three of these use the same base expr, that should eliminate your concerns, right? > > the result is that three gimples will be lowered to MEM_REFs differently > (as the candidates have different base_exprs); the later passes can get > confused, generating worse code. > > What this patch does is to create two interpretations where possible (if > different base exprs exist); the following dependency chains will be > produced: > > i1 --> i2 (base expr is an SSA_NAME defined as (a2 + i * 200)) > i1 --> i2 --> i3 (base expr is a tree expression of (a2 + i * 200)) > > In analyze_candidates_and_replace, a new function preferred_ref_cand is > called to analyze a root CAND_REF and replace_refs is only called if > this root CAND_REF is found to be part of a larger dependency graph (or > longer dependency chain in simple cases). In the example above, the 2nd > dependency chain will be picked up to do replace_refs. > > 2) The 2nd implication is that the alternative candidate may expose the > underlying tree expression of a base expr, which can cause more > aggressive extraction and folding of immediate offsets. Taking the new > test slsr-41 for example, the code-gen difference on x86_64 with the > original patch and this patch is (-O2): > > - leal 5(%rsi), %edx > + leal 5(%rsi), %eax > movslq %esi, %rsi > - salq $2, %rsi > - movslq %edx, %rax > - leaq (%rax,%rax,4), %rax > - leaq (%rax,%rax,4), %rcx > - salq $3, %rcx > - leaq (%rdi,%rcx), %rax > - addq %rsi, %rax > - movl $2, -1980(%rax) > - movl %edx, 20(%rax) > - movl %edx, 4024(%rax) > - leaq -600(%rdi,%rcx), %rax > - addl $1, 16(%rsi,%rax) > + imulq $204, %rsi, %rsi > + addq %rsi, %rdi > + movl $2, -980(%rdi) > + movl %eax, 1020(%rdi) > + movl %eax, 5024(%rdi) > + addl $1, 416(%rdi) > ret > > As you can see, the larger offsets are produced as the affine expander > is able to look deep into the tree expression. This raises concern that > larger immediates can cause worse code-gen when the immediates are out > of the supported range on a target. On x86_64 it is not obvious (as it > allows larger ranges), on arm cortex-a15 the load with the immediate > 5024 will be done by > > movw r2, #5024 > str r3, [r0, r2] > > which is not optimal. Things can get worse when there are multiple > loads/stores with large immediates as each one may require an extra mov > immediate instruction. One thing can potentially be done is to reduce > the strength of multiple large immediates later on in some RTL pass by > doing an initial offsetting first? What do you think? Are you > particularly concerned about this issue? To me, this seems like something that the middle end should not concern itself about, but that may be oversimplifying. I would think this is not the only pass that can create such issues, and the overall code generation should usually be improved anyway. Richard, would you care to weigh in here? A couple of quick comments on the next_interp patch: * You don't need num_of_dependents (). You should be able to add a forward declaration for count_candidates () and use it. * Your new test case is missing a final newline, so your patch doesn't apply cleanly. Please look into unifying the base expressions, as I believe you should not need the preferred_ref_cand logic if you do that. I still prefer the approach of using next_interp for its generality and expandibility. Thanks, Bill
Hi Bill, On 11/13/13 18:04, Bill Schmidt wrote: > Hi Yufeng, > > On Tue, 2013-11-12 at 22:34 +0000, Yufeng Zhang wrote: >> Hi Bill, >> >> Many thanks for the review. >> >> I find your suggestion on using the next_interp field quite >> enlightening. I prepared a patch which adds changes without modifying >> the framework. With the patch, the slsr pass now tries to create a >> second candidate for each memory accessing gimple statement, and chain >> it to the first one via the next_interp field. >> >> There are two implications in this approach though: >> >> 1) For each memory accessing gimple statement, there can be two >> candidates, and these two candidates can be part of different dependency >> graphs respectively (based on different base expr). Only one of the >> dependency graph should be traversed to do replace_refs. Most of the >> changes in the patch is to handle this implication. >> >> I am aware that you suggest to follow the next-interp chain only when >> the searching fails for the first interpretation. However, that doesn't >> work very well, as it can result in worse code-gen. Taking a varied >> form of the added test slsr-41.c for example: >> >> i1: a2 [i] [j] = 1; >> i2: a2 [i] [j+1] = 2; >> i3: a2 [i+20] [j] = i; >> >> With the 2nd interpretation created conditionally, the following two >> dependency chains will be established: >> >> i1 --> i2 (base expr is an SSA_NAME defined as (a2 + i * 200)) >> i1 --> i3 (base expr is a tree expression of (a2 + i * 200)) > > So it seems to me that really what needs to happen is to unify those two > base_exprs. We don't currently have logic in this pass to look up an > SSA name based on {base, index, stride, cand_type}, but that could be > done with a hash table. For now to save processing time it would make > sense to only do that for MEM candidates, though the cand_type should be > included in the hash to allow this to be used for other candidate types > if necessary. Of course, the SSA name definition must dominate the > candidate to be eligible as a basis, and that should be checked, but > this should generally be the case. I'm not quite sure if the SSA_NAME look-up works; maybe I haven't fully understood what you suggest. For i1 --> i3, the base_expr is the tree expression (a2 + i * 200), which is the result of a sequence of operations (conversion to affine, immediate offset removal and conversion to tree), with another SSA_NAME as the input. In other words, there are two SSA_NAMEs involved in the example: _s1: (a2 + i * 200). _s2: (a2 + (i * 200 + 4000)) their strides and indexes are different. I guess what you suggest is that given the tree expression (a2 + i * 200), look up an SSA_NAME and return _s1. If that is the case, the challenge will be how to analyze the tree expression and get the information on its {base, index, stride, cand_type}. While it would be too specific and narrative to check for a POINTER_PLUS_EXPR expression, the existing framework (e.g. create_add_ssa_cand) seems to assume that the analyzed tree represent a genuine gimple statement. Moreover, there may not be an SSA_NAME exists, for example in the following case: i1: a2 [i+1] [j] = 1; i2: a2 [i+1] [j+1] = 2; i3: a2 [i+20] [j] = i; you wouldn't be able to find an SSA_NAME for (a2 + i * 200). [snip] > A couple of quick comments on the next_interp patch: > > * You don't need num_of_dependents (). You should be able to add a > forward declaration for count_candidates () and use it. Missed count_candidates (); thanks! > * Your new test case is missing a final newline, so your patch doesn't > apply cleanly. I'll fix it. > Please look into unifying the base expressions, as I believe you should > not need the preferred_ref_cand logic if you do that. I would also like to live without preferred_ref_cand if feasible . :) > I still prefer the approach of using next_interp for its generality and > expandibility. Sure; this approach indeed fit the framework better. Regards, Yufeng
Hi Yufeng, On Wed, 2013-11-13 at 19:32 +0000, Yufeng Zhang wrote: > Hi Bill, > > On 11/13/13 18:04, Bill Schmidt wrote: > > Hi Yufeng, > > > > On Tue, 2013-11-12 at 22:34 +0000, Yufeng Zhang wrote: > >> Hi Bill, > >> > >> Many thanks for the review. > >> > >> I find your suggestion on using the next_interp field quite > >> enlightening. I prepared a patch which adds changes without modifying > >> the framework. With the patch, the slsr pass now tries to create a > >> second candidate for each memory accessing gimple statement, and chain > >> it to the first one via the next_interp field. > >> > >> There are two implications in this approach though: > >> > >> 1) For each memory accessing gimple statement, there can be two > >> candidates, and these two candidates can be part of different dependency > >> graphs respectively (based on different base expr). Only one of the > >> dependency graph should be traversed to do replace_refs. Most of the > >> changes in the patch is to handle this implication. > >> > >> I am aware that you suggest to follow the next-interp chain only when > >> the searching fails for the first interpretation. However, that doesn't > >> work very well, as it can result in worse code-gen. Taking a varied > >> form of the added test slsr-41.c for example: > >> > >> i1: a2 [i] [j] = 1; > >> i2: a2 [i] [j+1] = 2; > >> i3: a2 [i+20] [j] = i; > >> > >> With the 2nd interpretation created conditionally, the following two > >> dependency chains will be established: > >> > >> i1 --> i2 (base expr is an SSA_NAME defined as (a2 + i * 200)) > >> i1 --> i3 (base expr is a tree expression of (a2 + i * 200)) > > > > So it seems to me that really what needs to happen is to unify those two > > base_exprs. We don't currently have logic in this pass to look up an > > SSA name based on {base, index, stride, cand_type}, but that could be > > done with a hash table. For now to save processing time it would make > > sense to only do that for MEM candidates, though the cand_type should be > > included in the hash to allow this to be used for other candidate types > > if necessary. Of course, the SSA name definition must dominate the > > candidate to be eligible as a basis, and that should be checked, but > > this should generally be the case. > > I'm not quite sure if the SSA_NAME look-up works; maybe I haven't fully > understood what you suggest. > > For i1 --> i3, the base_expr is the tree expression (a2 + i * 200), > which is the result of a sequence of operations (conversion to affine, > immediate offset removal and conversion to tree), with another SSA_NAME > as the input. In other words, there are two SSA_NAMEs involved in the > example: > > _s1: (a2 + i * 200). > _s2: (a2 + (i * 200 + 4000)) > > their strides and indexes are different. > > I guess what you suggest is that given the tree expression (a2 + i * > 200), look up an SSA_NAME and return _s1. If that is the case, the > challenge will be how to analyze the tree expression and get the > information on its {base, index, stride, cand_type}. While it would be > too specific and narrative to check for a POINTER_PLUS_EXPR expression, > the existing framework (e.g. create_add_ssa_cand) seems to assume that > the analyzed tree represent a genuine gimple statement. > > Moreover, there may not be an SSA_NAME exists, for example in the > following case: > > i1: a2 [i+1] [j] = 1; > i2: a2 [i+1] [j+1] = 2; > i3: a2 [i+20] [j] = i; > > you wouldn't be able to find an SSA_NAME for (a2 + i * 200). Ok. It is probably too much to hope for to get a sufficiently general approach to handle all of these cases cleanly. Bleah. The whole preferred_ref_cand business seems very ad hoc to me, and to some extent is closing the barn door after the cows have escaped. Perhaps we can't use the next-interpretation infrastructure to solve this problem ideally, in which case I apologize for leading you down this path. The alternate patch at least keeps the candidate tree in a straightforward state, and the new version is less intrusive than the original. Let me look that version over more carefully and I'll get back to you. Thanks for your patience. Bill > > [snip] > > A couple of quick comments on the next_interp patch: > > > > * You don't need num_of_dependents (). You should be able to add a > > forward declaration for count_candidates () and use it. > > Missed count_candidates (); thanks! > > > * Your new test case is missing a final newline, so your patch doesn't > > apply cleanly. > > I'll fix it. > > > Please look into unifying the base expressions, as I believe you should > > not need the preferred_ref_cand logic if you do that. > > I would also like to live without preferred_ref_cand if feasible . :) > > > I still prefer the approach of using next_interp for its generality and > > expandibility. > > Sure; this approach indeed fit the framework better. > > > Regards, > Yufeng >
Hi Yufeng, The second version of your original patch is ok with me with the following changes. Sorry for the little side adventure into the next-interp logic; in the end that's going to hurt more than it helps in this case. Thanks for having a look at it, anyway. Thanks also for cleaning up this version to be less intrusive to common interfaces; I appreciate it. >diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c >index 88afc91..d069246 100644 >--- a/gcc/gimple-ssa-strength-reduction.c >+++ b/gcc/gimple-ssa-strength-reduction.c >@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3. If not see > #include "params.h" > #include "hash-table.h" > #include "tree-ssa-address.h" >+#include "tree-affine.h" > > /* Information about a strength reduction candidate. Each statement > in the candidate table represents an expression of one of the >@@ -420,6 +421,42 @@ cand_chain_hasher::equal (const value_type *chain1, const compare_type *chain2) > /* Hash table embodying a mapping from base exprs to chains of candidates. */ > static hash_table <cand_chain_hasher> base_cand_map; > >+/* Pointer map used by tree_to_aff_combination_expand. */ >+static struct pointer_map_t *name_expansions; >+/* Pointer map embodying a mapping from bases to alternative bases. */ >+static struct pointer_map_t *alt_base_map; >+ >+/* Given BASE, use the tree affine combiniation facilities to >+ find the underlying tree expression for BASE, with any >+ immediate offset excluded. */ >+ >+static tree >+get_alternative_base (tree base) >+{ >+ tree *result = (tree *) pointer_map_contains (alt_base_map, base); >+ >+ if (result == NULL) >+ { >+ tree expr; >+ aff_tree aff; >+ >+ tree_to_aff_combination_expand (base, TREE_TYPE (base), >+ &aff, &name_expansions); >+ aff.offset = tree_to_double_int (integer_zero_node); >+ expr = aff_combination_to_tree (&aff); >+ >+ result = (tree *) pointer_map_insert (alt_base_map, base); >+ gcc_assert (!*result); >+ >+ if (expr == base) >+ *result = NULL; >+ else >+ *result = expr; >+ } >+ >+ return *result; >+} >+ > /* Look in the candidate table for a CAND_PHI that defines BASE and > return it if found; otherwise return NULL. */ > >@@ -439,9 +476,10 @@ find_phi_def (tree base) > return c->cand_num; > } > >-/* Helper routine for find_basis_for_candidate. May be called twice: >+/* Helper routine for find_basis_for_candidate. May be called three times: > once for the candidate's base expr, and optionally again for the >- candidate's phi definition. */ >+ candidate's phi definition, as well as for an alternative base expr >+ in the case of CAND_REF. */ Technically this will never be called three times. It will be called once for the candidate's base expression, and optionally either for the candidate's phi definition or for a CAND_REF's alternative base expression. (There is no phi processing for a CAND_REF.) > > static slsr_cand_t > find_basis_for_base_expr (slsr_cand_t c, tree base_expr) >@@ -518,6 +556,13 @@ find_basis_for_candidate (slsr_cand_t c) > } > } > >+ if (!basis && c->kind == CAND_REF) >+ { >+ tree alt_base_expr = get_alternative_base (c->base_expr); >+ if (alt_base_expr) >+ basis = find_basis_for_base_expr (c, alt_base_expr); >+ } >+ > if (basis) > { > c->sibling = basis->dependent; >@@ -528,17 +573,22 @@ find_basis_for_candidate (slsr_cand_t c) > return 0; > } > >-/* Record a mapping from the base expression of C to C itself, indicating that >- C may potentially serve as a basis using that base expression. */ >+/* Record a mapping from BASE to C, indicating that C may potentially serve >+ as a basis using that base expression. BASE may be the same as >+ C->BASE_EXPR; alternatively BASE can be a different tree that share the >+ underlining expression of C->BASE_EXPR. */ > > static void >-record_potential_basis (slsr_cand_t c) >+record_potential_basis (slsr_cand_t c, tree base) > { > cand_chain_t node; > cand_chain **slot; > >+ if (base == NULL) >+ return; Please do this check outside the common code; it's not necessary except for CAND_REFs. Replace with: gcc_assert (base); >+ > node = (cand_chain_t) obstack_alloc (&chain_obstack, sizeof (cand_chain)); >- node->base_expr = c->base_expr; >+ node->base_expr = base; > node->cand = c; > node->next = NULL; > slot = base_cand_map.find_slot (node, INSERT); >@@ -554,10 +604,18 @@ record_potential_basis (slsr_cand_t c) > } > > /* Allocate storage for a new candidate and initialize its fields. >- Attempt to find a basis for the candidate. */ >+ Attempt to find a basis for the candidate. >+ >+ For CAND_REF, an alternative base may also be recorded and used >+ to find a basis. This helps cases where the expression hidden >+ behind BASE (which is usually an SSA_NAME) has immediate offset, >+ e.g. >+ >+ a2[i][j] = 1; >+ a2[i + 20][j] = 2; */ > > static slsr_cand_t >-alloc_cand_and_find_basis (enum cand_kind kind, gimple gs, tree base, >+alloc_cand_and_find_basis (enum cand_kind kind, gimple gs, tree base, > double_int index, tree stride, tree ctype, > unsigned savings) > { >@@ -583,7 +641,9 @@ alloc_cand_and_find_basis (enum cand_kind kind, gimple gs, tree base, > else > c->basis = find_basis_for_candidate (c); > >- record_potential_basis (c); >+ record_potential_basis (c, base); >+ if (kind == CAND_REF) >+ record_potential_basis (c, get_alternative_base (base)); Tied to the above change: if (kind == CAND_REF) { tree alt_base = get_alternative_base (base); if (alt_base) record_potential_basis (c, alt_base); } > > return c; > } >@@ -3524,6 +3584,9 @@ execute_strength_reduction (void) > /* Allocate the mapping from base expressions to candidate chains. */ > base_cand_map.create (500); > >+ /* Allocate the mapping from bases to alternative bases. */ >+ alt_base_map = pointer_map_create (); >+ > /* Initialize the loop optimizer. We need to detect flow across > back edges, and this gives us dominator information as well. */ > loop_optimizer_init (AVOID_CFG_MODIFICATIONS); >@@ -3539,6 +3602,9 @@ execute_strength_reduction (void) > dump_cand_chains (); > } > >+ pointer_map_destroy (alt_base_map); >+ free_affine_expand_cache (&name_expansions); >+ > /* Analyze costs and make appropriate replacements. */ > analyze_candidates_and_replace (); > >diff --git a/gcc/testsuite/gcc.dg/tree-ssa/slsr-41.c b/gcc/testsuite/gcc.dg/tree-ssa/slsr-41.c >new file mode 100644 >index 0000000..870d714 >--- /dev/null >+++ b/gcc/testsuite/gcc.dg/tree-ssa/slsr-41.c >@@ -0,0 +1,24 @@ >+/* Verify straight-line strength reduction in using >+ alternative base expr to record and look for the >+ potential candidate. */ >+ >+/* { dg-do compile } */ >+/* { dg-options "-O2 -fdump-tree-slsr" } */ >+ >+typedef int arr_2[50][50]; >+ >+void foo (arr_2 a2, int v1) >+{ >+ int i, j; >+ >+ i = v1 + 5; >+ j = i; >+ a2 [i-10] [j] = 2; >+ a2 [i] [j++] = i; >+ a2 [i+20] [j++] = i; >+ a2 [i-3] [i-1] += 1; >+ return; >+} >+ >+/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */ >+/* { dg-final { cleanup-tree-dump "slsr" } } */ As mentioned previously, please add the missing newline at EOF. Everything else looks OK to me. Please ask Richard for final approval, as I'm not a maintainer. Thanks, Bill (P.S. I prefer inline patches rather than attachments; it makes it easier to reply with markup.)
diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c index 88afc91..d069246 100644 --- a/gcc/gimple-ssa-strength-reduction.c +++ b/gcc/gimple-ssa-strength-reduction.c @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3. If not see #include "params.h" #include "hash-table.h" #include "tree-ssa-address.h" +#include "tree-affine.h" /* Information about a strength reduction candidate. Each statement in the candidate table represents an expression of one of the @@ -420,6 +421,42 @@ cand_chain_hasher::equal (const value_type *chain1, const compare_type *chain2) /* Hash table embodying a mapping from base exprs to chains of candidates. */ static hash_table <cand_chain_hasher> base_cand_map; +/* Pointer map used by tree_to_aff_combination_expand. */ +static struct pointer_map_t *name_expansions; +/* Pointer map embodying a mapping from bases to alternative bases. */ +static struct pointer_map_t *alt_base_map; + +/* Given BASE, use the tree affine combiniation facilities to + find the underlying tree expression for BASE, with any + immediate offset excluded. */ + +static tree +get_alternative_base (tree base) +{ + tree *result = (tree *) pointer_map_contains (alt_base_map, base); + + if (result == NULL) + { + tree expr; + aff_tree aff; + + tree_to_aff_combination_expand (base, TREE_TYPE (base), + &aff, &name_expansions); + aff.offset = tree_to_double_int (integer_zero_node); + expr = aff_combination_to_tree (&aff); + + result = (tree *) pointer_map_insert (alt_base_map, base); + gcc_assert (!*result); + + if (expr == base) + *result = NULL; + else + *result = expr; + } + + return *result; +} + /* Look in the candidate table for a CAND_PHI that defines BASE and return it if found; otherwise return NULL. */ @@ -439,9 +476,10 @@ find_phi_def (tree base) return c->cand_num; } -/* Helper routine for find_basis_for_candidate. May be called twice: +/* Helper routine for find_basis_for_candidate. May be called three times: once for the candidate's base expr, and optionally again for the - candidate's phi definition. */ + candidate's phi definition, as well as for an alternative base expr + in the case of CAND_REF. */ static slsr_cand_t find_basis_for_base_expr (slsr_cand_t c, tree base_expr) @@ -518,6 +556,13 @@ find_basis_for_candidate (slsr_cand_t c) } } + if (!basis && c->kind == CAND_REF) + { + tree alt_base_expr = get_alternative_base (c->base_expr); + if (alt_base_expr) + basis = find_basis_for_base_expr (c, alt_base_expr); + } + if (basis) { c->sibling = basis->dependent; @@ -528,17 +573,22 @@ find_basis_for_candidate (slsr_cand_t c) return 0; } -/* Record a mapping from the base expression of C to C itself, indicating that - C may potentially serve as a basis using that base expression. */ +/* Record a mapping from BASE to C, indicating that C may potentially serve + as a basis using that base expression. BASE may be the same as + C->BASE_EXPR; alternatively BASE can be a different tree that share the + underlining expression of C->BASE_EXPR. */ static void -record_potential_basis (slsr_cand_t c) +record_potential_basis (slsr_cand_t c, tree base) { cand_chain_t node; cand_chain **slot; + if (base == NULL) + return; + node = (cand_chain_t) obstack_alloc (&chain_obstack, sizeof (cand_chain)); - node->base_expr = c->base_expr; + node->base_expr = base; node->cand = c; node->next = NULL; slot = base_cand_map.find_slot (node, INSERT); @@ -554,10 +604,18 @@ record_potential_basis (slsr_cand_t c) } /* Allocate storage for a new candidate and initialize its fields. - Attempt to find a basis for the candidate. */ + Attempt to find a basis for the candidate. + + For CAND_REF, an alternative base may also be recorded and used + to find a basis. This helps cases where the expression hidden + behind BASE (which is usually an SSA_NAME) has immediate offset, + e.g. + + a2[i][j] = 1; + a2[i + 20][j] = 2; */ static slsr_cand_t -alloc_cand_and_find_basis (enum cand_kind kind, gimple gs, tree base, +alloc_cand_and_find_basis (enum cand_kind kind, gimple gs, tree base, double_int index, tree stride, tree ctype, unsigned savings) { @@ -583,7 +641,9 @@ alloc_cand_and_find_basis (enum cand_kind kind, gimple gs, tree base, else c->basis = find_basis_for_candidate (c); - record_potential_basis (c); + record_potential_basis (c, base); + if (kind == CAND_REF) + record_potential_basis (c, get_alternative_base (base)); return c; } @@ -3524,6 +3584,9 @@ execute_strength_reduction (void) /* Allocate the mapping from base expressions to candidate chains. */ base_cand_map.create (500); + /* Allocate the mapping from bases to alternative bases. */ + alt_base_map = pointer_map_create (); + /* Initialize the loop optimizer. We need to detect flow across back edges, and this gives us dominator information as well. */ loop_optimizer_init (AVOID_CFG_MODIFICATIONS); @@ -3539,6 +3602,9 @@ execute_strength_reduction (void) dump_cand_chains (); } + pointer_map_destroy (alt_base_map); + free_affine_expand_cache (&name_expansions); + /* Analyze costs and make appropriate replacements. */ analyze_candidates_and_replace (); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/slsr-41.c b/gcc/testsuite/gcc.dg/tree-ssa/slsr-41.c new file mode 100644 index 0000000..870d714 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/slsr-41.c @@ -0,0 +1,24 @@ +/* Verify straight-line strength reduction in using + alternative base expr to record and look for the + potential candidate. */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-slsr" } */ + +typedef int arr_2[50][50]; + +void foo (arr_2 a2, int v1) +{ + int i, j; + + i = v1 + 5; + j = i; + a2 [i-10] [j] = 2; + a2 [i] [j++] = i; + a2 [i+20] [j++] = i; + a2 [i-3] [i-1] += 1; + return; +} + +/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */ +/* { dg-final { cleanup-tree-dump "slsr" } } */