diff mbox

[PR41488] Recognize more induction variables by simplifying PEELED chrec in scalar evolution

Message ID 000901cef552$0a49b890$1edd29b0$@arm.com
State New
Headers show

Commit Message

Bin Cheng Dec. 10, 2013, 2:46 a.m. UTC
> -----Original Message-----
> From: Bin.Cheng [mailto:amker.cheng@gmail.com]
> Sent: Monday, December 09, 2013 1:15 PM
> To: Jeff Law
> Cc: Bin Cheng; gcc-patches List
> Subject: Re: [PATCH PR41488]Recognize more induction variables by
> simplifying PEELED chrec in scalar evolution
> 
> On Mon, Dec 9, 2013 at 12:00 PM, Jeff Law <law@redhat.com> wrote:
> > On 12/06/13 19:44, Bin.Cheng wrote:
> >>>
> >>> Right.  Based on reading the archives, it looks like this stuff
> >>> is/was generated by PRE.  I also suspect jump threading can create
> >>> them.  There was talk of throttling PRE to leave things in a form
> >>> that the IV analysis could more easily digest, but I'm not sure if
> >>> that was ever completed.
> >>
> >>
> >> Also could just because it is coded in that way, just as the case I
> >> added in patch.  I found real examples from ggc-page.c in GCC.
> >> But it's always good to cleanup input of an optimization, I presume
> >> that's why Richard tried to move IVOPT later before.
> >
> > Certainly.  That possibility was mentioned as well in either 41488 or
> > elsewhere in my research.
> >
> >
> >>>
> >>> I assume tree_to_aff_combination_expand is relatively expensive,
> >>> thus the two approaches, one which avoids
> tree_to_aff_combination_expand.
> >>
> >> Considering the dump of case in the patch:
> >
> > [ snip ]
> > So it's also a case using the affine stuff gets you get a simpler
> > interface to express the value in terms you may be able match.
> Yeah.
> 
> >
> >
> >>
> >> Another question, is it acceptable to add an parameter for
> >> tree_to_aff_combination_expand to limit the number of recursive call
> >> for it?  Thus we don't need to expand to leaf node every time.
> >
> > If there's a compelling reason to limit the recursion, then I'd think
> > such a parameter would be reasonable.
> Not for now.  Just come up this idea given that some recent patches also
use
> the interface to get back trace information and it's expensive.  Maybe
> Richard have some comment here too.
> 
> >
> >
> >
> >>
> >>>
> >>>
> >>> In add_old_iv_candidates, is it the case that the only time
> >>> SSA_NAME_DEF_STMT (def) is another PHI node is when it's one of
> >>> these affine
> >>
> >>
> >> I suppose.  Actually IVOPT make an assert on IP_ORIGINAL candidates
> >> in function rewrite_use_nonlinear_expr, like:
> >
> > Well, the reason I ask is after your patch we'd be ignoring anything
> > where SSA_NAME_DEF_STMT (def) is a PHI node and not a PEELED_CHREC.
> I
> > want to make sure there aren't other expected cases where
> > SSA_NAME_DEF_STMT (def) is a PHI that we'd want to call
> add_candidate_1 for.
> >
> > It sounds like there aren't other cases you'd expect to see here where
> > SSA_NAME_DEF_STMT (defFFF) is a PHI.
> Yes.
> 
> >
> >
> >> IVOPT adds original cand and tries to keep it the original IV is
> >> because it does have an updating statement.  For IVs picked up by
> >> this patch, it doesn't have stepping statement, so no need/way to
> >> leaving it untouched.  It will be represented by candidates for IVs
> >> from which it is peeled.
> >
> > Understood.  Thanks for clarifying.  The patch is fine for the trunk.
> Thanks very much for reviewing.  Since Sebastian hasn't added any
> comments, I will change the patch as suggested by Richard before, then
> retest it, and check in the new version if it's ok.
> 
Hi,
This is the new version patch computing the difference in tree affine then
comparing against integer_zero_node.
Bootstrap and test on current upstream.  I will commit it if there is no
other objection.

Thanks,
bin
> >
> > jeff
> 
> 
> 
> --
> Best Regards.

Comments

Jeff Law Dec. 10, 2013, 5:54 a.m. UTC | #1
On 12/09/13 19:46, bin.cheng wrote:
>>>> IVOPT adds original cand and tries to keep it the original IV is
>>>> because it does have an updating statement.  For IVs picked up by
>>>> this patch, it doesn't have stepping statement, so no need/way to
>>>> leaving it untouched.  It will be represented by candidates for IVs
>>>> from which it is peeled.
>>>
>>> Understood.  Thanks for clarifying.  The patch is fine for the trunk.
>> Thanks very much for reviewing.  Since Sebastian hasn't added any
>> comments, I will change the patch as suggested by Richard before, then
>> retest it, and check in the new version if it's ok.
>>
> Hi,
> This is the new version patch computing the difference in tree affine then
> comparing against integer_zero_node.
> Bootstrap and test on current upstream.  I will commit it if there is no
> other objection.
Please install.

Thanks,
Jeff
H.J. Lu Dec. 10, 2013, 8:31 p.m. UTC | #2
On Mon, Dec 9, 2013 at 6:46 PM, bin.cheng <bin.cheng@arm.com> wrote:
>
>>
> Hi,
> This is the new version patch computing the difference in tree affine then
> comparing against integer_zero_node.
> Bootstrap and test on current upstream.  I will commit it if there is no
> other objection.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59445
Jakub Jelinek Dec. 10, 2013, 10:50 p.m. UTC | #3
On Tue, Dec 10, 2013 at 10:46:32AM +0800, bin.cheng wrote:
> This is the new version patch computing the difference in tree affine then
> comparing against integer_zero_node.
> Bootstrap and test on current upstream.  I will commit it if there is no
> other objection.

This breaks bootstrap on x86_64-linux for me too, though in a different
pass.
Anyway, the bugs I'm seeing here:
1) other passes that use a cache for tree-affine computations
   don't initialize it by pointer_map_create (), tree-affine.c does that
   for you
2) the cache shouldn't be destroyed by pointer_map_destroy, but
   instead by free_affine_expand_cache that will actually make sure
   it doesn't leak memory, etc.
3) the actual issue why this breaks bootstrap is that you've added
   the pointer_map_destroy (should be free_affine_expand_cache per 2) )
   to scev_finalize, but scev_initialize is performed at the start
   of the loop passes and scev_finalize several passes later, and
   in between passes ggc_collect is called.  As the cache isn't registered
   with GC, if you are unlucky and ggc_collect actually collects
   in between scev_initialize and scev_finalize and garbage collects
   some trees only mentioned in the affine cache peeled_chrec_map,
   things can break completely.

So, IMHO please revert this patch for now and try to find some
better place for the free_affine_expand_cache (&peeled_chrec_map);
call so that the cache is never non-NULL when ggc_collect may be called.
Alternatively you'd need to either tell GC about it (but the structures
are heap allocated, not GC), or say instruct through some GTY magic
that during collection free_affine_expand_cache (&peeled_chrec_map);
should be called.

	Jakub
Bin.Cheng Dec. 11, 2013, 12:59 a.m. UTC | #4
On Wed, Dec 11, 2013 at 6:50 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Dec 10, 2013 at 10:46:32AM +0800, bin.cheng wrote:
>> This is the new version patch computing the difference in tree affine then
>> comparing against integer_zero_node.
>> Bootstrap and test on current upstream.  I will commit it if there is no
>> other objection.
>
> This breaks bootstrap on x86_64-linux for me too, though in a different
> pass.
> Anyway, the bugs I'm seeing here:
> 1) other passes that use a cache for tree-affine computations
>    don't initialize it by pointer_map_create (), tree-affine.c does that
>    for you
> 2) the cache shouldn't be destroyed by pointer_map_destroy, but
>    instead by free_affine_expand_cache that will actually make sure
>    it doesn't leak memory, etc.
> 3) the actual issue why this breaks bootstrap is that you've added
>    the pointer_map_destroy (should be free_affine_expand_cache per 2) )
>    to scev_finalize, but scev_initialize is performed at the start
>    of the loop passes and scev_finalize several passes later, and
>    in between passes ggc_collect is called.  As the cache isn't registered
>    with GC, if you are unlucky and ggc_collect actually collects
>    in between scev_initialize and scev_finalize and garbage collects
>    some trees only mentioned in the affine cache peeled_chrec_map,
>    things can break completely.
>
> So, IMHO please revert this patch for now and try to find some
> better place for the free_affine_expand_cache (&peeled_chrec_map);
> call so that the cache is never non-NULL when ggc_collect may be called.
> Alternatively you'd need to either tell GC about it (but the structures
> are heap allocated, not GC), or say instruct through some GTY magic
> that during collection free_affine_expand_cache (&peeled_chrec_map);
> should be called.

Sorry for bothering, I have reverted the patch.  Will investigate it.

Thanks,
bin
>
>         Jakub
Bin.Cheng Dec. 11, 2013, 6:24 a.m. UTC | #5
On Wed, Dec 11, 2013 at 4:31 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Dec 9, 2013 at 6:46 PM, bin.cheng <bin.cheng@arm.com> wrote:
>>
>>>
>> Hi,
>> This is the new version patch computing the difference in tree affine then
>> comparing against integer_zero_node.
>> Bootstrap and test on current upstream.  I will commit it if there is no
>> other objection.
>>
>
> This caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59445
I found out the cause and addressed it in the bug entry.

Thanks,
bin

>
> --
> H.J.
Bin.Cheng Dec. 11, 2013, 6:35 a.m. UTC | #6
On Wed, Dec 11, 2013 at 6:50 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Dec 10, 2013 at 10:46:32AM +0800, bin.cheng wrote:
>> This is the new version patch computing the difference in tree affine then
>> comparing against integer_zero_node.
>> Bootstrap and test on current upstream.  I will commit it if there is no
>> other objection.
>
> This breaks bootstrap on x86_64-linux for me too, though in a different
> pass.
> Anyway, the bugs I'm seeing here:
> 1) other passes that use a cache for tree-affine computations
>    don't initialize it by pointer_map_create (), tree-affine.c does that
>    for you
> 2) the cache shouldn't be destroyed by pointer_map_destroy, but
>    instead by free_affine_expand_cache that will actually make sure
>    it doesn't leak memory, etc.
> 3) the actual issue why this breaks bootstrap is that you've added
>    the pointer_map_destroy (should be free_affine_expand_cache per 2) )
>    to scev_finalize, but scev_initialize is performed at the start
>    of the loop passes and scev_finalize several passes later, and
>    in between passes ggc_collect is called.  As the cache isn't registered
>    with GC, if you are unlucky and ggc_collect actually collects
>    in between scev_initialize and scev_finalize and garbage collects
>    some trees only mentioned in the affine cache peeled_chrec_map,
>    things can break completely.
>
> So, IMHO please revert this patch for now and try to find some
> better place for the free_affine_expand_cache (&peeled_chrec_map);
> call so that the cache is never non-NULL when ggc_collect may be called.
> Alternatively you'd need to either tell GC about it (but the structures
> are heap allocated, not GC), or say instruct through some GTY magic
> that during collection free_affine_expand_cache (&peeled_chrec_map);
> should be called.

Thanks for helping on the issue, I shouldn't use the affine facility
before fully understanding it.

I know little about GC, so when ggc_collect may be called (between two
passes)? If so, I have to call free_affine_expand_cache just after the
calling to tree_to_affine_combination_expand in SCEV because it's an
analyzer and as you pointed out, the analyzing result is used by
different optimizers.  The pointer map would be maintained between
different passes if it's not instantly freed.

Thanks,
bin
>
>         Jakub
Jeff Law Dec. 11, 2013, 6:59 a.m. UTC | #7
On 12/10/13 23:35, Bin.Cheng wrote:
>
> I know little about GC, so when ggc_collect may be called (between two
> passes)? If so, I have to call free_affine_expand_cache just after the
> calling to tree_to_affine_combination_expand in SCEV because it's an
> analyzer and as you pointed out, the analyzing result is used by
> different optimizers.  The pointer map would be maintained between
> different passes if it's not instantly freed.
The garbage collector only runs between passes.  If you have roots in 
static storage, then  you'll need to decorate them so the garbage 
collector scans them.

There's a fairly extensive discussion of the garbage collector in the 
GCC internals manual.


jeff
Jakub Jelinek Dec. 11, 2013, 8:17 a.m. UTC | #8
On Tue, Dec 10, 2013 at 11:59:16PM -0700, Jeff Law wrote:
> On 12/10/13 23:35, Bin.Cheng wrote:
> >I know little about GC, so when ggc_collect may be called (between two
> >passes)? If so, I have to call free_affine_expand_cache just after the
> >calling to tree_to_affine_combination_expand in SCEV because it's an
> >analyzer and as you pointed out, the analyzing result is used by
> >different optimizers.  The pointer map would be maintained between
> >different passes if it's not instantly freed.
> The garbage collector only runs between passes.  If you have roots
> in static storage, then  you'll need to decorate them so the garbage
> collector scans them.
> 
> There's a fairly extensive discussion of the garbage collector in
> the GCC internals manual.

It isn't that easy, pointer_map isn't a data structure recognized by the
garbage collector at all (plus, as I've said before, the code is inserting
XNEW allocated structures into the pointer_map, which isn't GC friendly
either, but making them GC allocated might increase garbage unnecessarily,
as all the structs are short lived).

I've looked through scev_{initialize,finalize} callers, and almost all
passes initialize it and finalize within the same pass, it is only the loop
optimizations:
          NEXT_PASS (pass_tree_loop_init);
          NEXT_PASS (pass_lim);
          NEXT_PASS (pass_copy_prop);
          NEXT_PASS (pass_dce_loop);
          NEXT_PASS (pass_tree_unswitch);
          NEXT_PASS (pass_scev_cprop);
          NEXT_PASS (pass_record_bounds);
          NEXT_PASS (pass_check_data_deps);
          NEXT_PASS (pass_loop_distribution);
          NEXT_PASS (pass_copy_prop);
          NEXT_PASS (pass_graphite);
          PUSH_INSERT_PASSES_WITHIN (pass_graphite)
              NEXT_PASS (pass_graphite_transforms);
              NEXT_PASS (pass_lim);
              NEXT_PASS (pass_copy_prop);
              NEXT_PASS (pass_dce_loop);
          POP_INSERT_PASSES ()
          NEXT_PASS (pass_iv_canon);
          NEXT_PASS (pass_parallelize_loops);
          NEXT_PASS (pass_if_conversion);
          /* pass_vectorize must immediately follow pass_if_conversion.
             Please do not add any other passes in between.  */
          NEXT_PASS (pass_vectorize);
          PUSH_INSERT_PASSES_WITHIN (pass_vectorize)
              NEXT_PASS (pass_dce_loop);
          POP_INSERT_PASSES ()
          NEXT_PASS (pass_predcom);
          NEXT_PASS (pass_complete_unroll);
          NEXT_PASS (pass_slp_vectorize);
          NEXT_PASS (pass_loop_prefetch);
          NEXT_PASS (pass_iv_optimize);
          NEXT_PASS (pass_lim);
          NEXT_PASS (pass_tree_loop_done);
where scev is live in between the passes, and unfortunately there
is no call that each pass calls that you could easily add the free_affine.*
call to (except for execute_function_todo but you'd need to invent another
TODO_* flag for it and the question is how you'd ensure it will be handled
for the non-loop specific passes that are just run in between these
passes (say pass_copy_prop, would we need pass_copy_prop_loop?).

Perhaps you could replace in tree-affine.c and all it's users
{,struct} pointer_map_t * with
struct GTY((user)) affine_expand_cache {
  pointer_map_t *cache;
};
and write custom gt_ggc_mx (affine_expand_cache *) function that would
would either pointer_map_traverse the cache and for each cache entry
call gt_ggc_mx on the embedded trees in the structures, or drop the cache
(dunno if the cache is really just a cache and it doesn't matter if it is
dropped any time, or if it can affect code generation).  There are also PCH
related functions that need to be defined, though I guess you could just
assert there that the cache is NULL and not walk anything.

	Jakub
Bin.Cheng Dec. 11, 2013, 8:31 a.m. UTC | #9
On Wed, Dec 11, 2013 at 4:17 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Dec 10, 2013 at 11:59:16PM -0700, Jeff Law wrote:
>> On 12/10/13 23:35, Bin.Cheng wrote:
>> >I know little about GC, so when ggc_collect may be called (between two
>> >passes)? If so, I have to call free_affine_expand_cache just after the
>> >calling to tree_to_affine_combination_expand in SCEV because it's an
>> >analyzer and as you pointed out, the analyzing result is used by
>> >different optimizers.  The pointer map would be maintained between
>> >different passes if it's not instantly freed.
>> The garbage collector only runs between passes.  If you have roots
>> in static storage, then  you'll need to decorate them so the garbage
>> collector scans them.
>>
>> There's a fairly extensive discussion of the garbage collector in
>> the GCC internals manual.
>
> It isn't that easy, pointer_map isn't a data structure recognized by the
> garbage collector at all (plus, as I've said before, the code is inserting
> XNEW allocated structures into the pointer_map, which isn't GC friendly
> either, but making them GC allocated might increase garbage unnecessarily,
> as all the structs are short lived).
>
> I've looked through scev_{initialize,finalize} callers, and almost all
> passes initialize it and finalize within the same pass, it is only the loop
> optimizations:
>           NEXT_PASS (pass_tree_loop_init);
>           NEXT_PASS (pass_lim);
>           NEXT_PASS (pass_copy_prop);
>           NEXT_PASS (pass_dce_loop);
>           NEXT_PASS (pass_tree_unswitch);
>           NEXT_PASS (pass_scev_cprop);
>           NEXT_PASS (pass_record_bounds);
>           NEXT_PASS (pass_check_data_deps);
>           NEXT_PASS (pass_loop_distribution);
>           NEXT_PASS (pass_copy_prop);
>           NEXT_PASS (pass_graphite);
>           PUSH_INSERT_PASSES_WITHIN (pass_graphite)
>               NEXT_PASS (pass_graphite_transforms);
>               NEXT_PASS (pass_lim);
>               NEXT_PASS (pass_copy_prop);
>               NEXT_PASS (pass_dce_loop);
>           POP_INSERT_PASSES ()
>           NEXT_PASS (pass_iv_canon);
>           NEXT_PASS (pass_parallelize_loops);
>           NEXT_PASS (pass_if_conversion);
>           /* pass_vectorize must immediately follow pass_if_conversion.
>              Please do not add any other passes in between.  */
>           NEXT_PASS (pass_vectorize);
>           PUSH_INSERT_PASSES_WITHIN (pass_vectorize)
>               NEXT_PASS (pass_dce_loop);
>           POP_INSERT_PASSES ()
>           NEXT_PASS (pass_predcom);
>           NEXT_PASS (pass_complete_unroll);
>           NEXT_PASS (pass_slp_vectorize);
>           NEXT_PASS (pass_loop_prefetch);
>           NEXT_PASS (pass_iv_optimize);
>           NEXT_PASS (pass_lim);
>           NEXT_PASS (pass_tree_loop_done);
> where scev is live in between the passes, and unfortunately there
> is no call that each pass calls that you could easily add the free_affine.*
> call to (except for execute_function_todo but you'd need to invent another
> TODO_* flag for it and the question is how you'd ensure it will be handled
> for the non-loop specific passes that are just run in between these
> passes (say pass_copy_prop, would we need pass_copy_prop_loop?).
>
> Perhaps you could replace in tree-affine.c and all it's users
> {,struct} pointer_map_t * with
> struct GTY((user)) affine_expand_cache {
>   pointer_map_t *cache;
> };
> and write custom gt_ggc_mx (affine_expand_cache *) function that would
> would either pointer_map_traverse the cache and for each cache entry
> call gt_ggc_mx on the embedded trees in the structures, or drop the cache
> (dunno if the cache is really just a cache and it doesn't matter if it is
> dropped any time, or if it can affect code generation).  There are also PCH
> related functions that need to be defined, though I guess you could just
> assert there that the cache is NULL and not walk anything.

Thank both of you for being patient on this patch.
I went through the documentation quickly and realized that I have to
modify pointer-map structure to make it recognized by GC (maybe more
work suggested by Jakub).  It seems I shouldn't include that task in
this patch at this stage 3, I am thinking just call free_affine*
function in the place it is created for SCEV.  Of course, that makes
it more expensive.

I just found that the patch also fixes PR58296 on IVOPT and I do want
to include it in 4.9 if we are ok with it.  So what's your opinion?

Thanks,
bin
>
>         Jakub
Jakub Jelinek Dec. 11, 2013, 8:50 a.m. UTC | #10
On Wed, Dec 11, 2013 at 04:31:55PM +0800, Bin.Cheng wrote:
> Thank both of you for being patient on this patch.
> I went through the documentation quickly and realized that I have to
> modify pointer-map structure to make it recognized by GC (maybe more

Changing pointer_map at this point is IMHO not appropriate.

> work suggested by Jakub).  It seems I shouldn't include that task in
> this patch at this stage 3, I am thinking just call free_affine*
> function in the place it is created for SCEV.  Of course, that makes
> it more expensive.

Perhaps you could call free_affine_expand_cache say from
analyze_scalar_evolution, that is the innermost enclosing exported
API that indirectly calls your new code, but it seems it is also called
recursively from analyze_scalar_evolution_1 or functions it calls.
So maybe you'd need to rename analyze_scalar_evolution to
analyze_scalar_evolution_2 and adjust some calls to analyze_scalar_evolution
to the latter in tree-scalar-evolution.c, and add analyze_scalar_evolution
wrapper that calls analyze_scalar_evolution_2 and free_affine_expand_cache.
Whether this would work depends on detailed analysis of the
tree-scalar-evolution.c callgraph, there are tons of functions, most of
them static, and the question is if there is a single non-static (or at most
a few of them) function that cover all calls to your new code in the
static functions it (indirectly) calls, i.e. if there isn't any other
external entry point that might call your new code without doing
free_affine_expand_cache afterwards.

If you can find that, I'd say it would be the safest thing to do.
But let's see what say Richard thinks about it too.

	Jakub
Richard Biener Dec. 11, 2013, 10:04 a.m. UTC | #11
On Wed, Dec 11, 2013 at 9:50 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Dec 11, 2013 at 04:31:55PM +0800, Bin.Cheng wrote:
>> Thank both of you for being patient on this patch.
>> I went through the documentation quickly and realized that I have to
>> modify pointer-map structure to make it recognized by GC (maybe more
>
> Changing pointer_map at this point is IMHO not appropriate.
>
>> work suggested by Jakub).  It seems I shouldn't include that task in
>> this patch at this stage 3, I am thinking just call free_affine*
>> function in the place it is created for SCEV.  Of course, that makes
>> it more expensive.
>
> Perhaps you could call free_affine_expand_cache say from
> analyze_scalar_evolution, that is the innermost enclosing exported
> API that indirectly calls your new code, but it seems it is also called
> recursively from analyze_scalar_evolution_1 or functions it calls.
> So maybe you'd need to rename analyze_scalar_evolution to
> analyze_scalar_evolution_2 and adjust some calls to analyze_scalar_evolution
> to the latter in tree-scalar-evolution.c, and add analyze_scalar_evolution
> wrapper that calls analyze_scalar_evolution_2 and free_affine_expand_cache.
> Whether this would work depends on detailed analysis of the
> tree-scalar-evolution.c callgraph, there are tons of functions, most of
> them static, and the question is if there is a single non-static (or at most
> a few of them) function that cover all calls to your new code in the
> static functions it (indirectly) calls, i.e. if there isn't any other
> external entry point that might call your new code without doing
> free_affine_expand_cache afterwards.
>
> If you can find that, I'd say it would be the safest thing to do.
> But let's see what say Richard thinks about it too.

I think keeping the SCEV cache live over multiple passes is fragile
(we do re-use SSA names and passes do not appropriately call
scev_reset[_htab] in all cases).

With fixing that the easiest approach is to associate
a affine-expand cache with the scev info.

Without that there isn't a very good solution other than discarding the
cache after each expansion at the point you use it.  The SCEV
cache should effectively make most of the affine expansion caching
moot (but you can try instrumenting that to verify it).

That is, I suggest to try freeing the cache right after the second
tree_to_aff_combination_expand.

Btw,

+  /* Try harder to check if they are equal.  */
+  tree_to_aff_combination_expand (left, type, &aff1, &peeled_chrec_map);
+  tree_to_aff_combination_expand (step_val, type, &aff2, &peeled_chrec_map);
+  aff_combination_scale (&aff2, double_int_minus_one);
+  aff_combination_add (&aff1, &aff2);
+  left = fold_convert (type, aff_combination_to_tree (&aff1));
+
+  /* Transform (init, {left, right}_LOOP)_LOOP to {init, right}_LOOP
+     if "left" equals to "init + right".  */
+  if (operand_equal_p (left, integer_zero_node, 0))

you don't need aff_combination_to_tree, simply check

 aff1.n == 0 && aff1.offset.is_zero ()

(you may want to add aff_combination_zero_p or even
aff_combination_equal_p)

Thanks,
Richard.


>         Jakub
diff mbox

Patch

Index: gcc/tree-scalar-evolution.c
===================================================================
--- gcc/tree-scalar-evolution.c	(revision 205688)
+++ gcc/tree-scalar-evolution.c	(working copy)
@@ -280,6 +280,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "cfgloop.h"
 #include "tree-chrec.h"
+#include "pointer-set.h"
+#include "tree-affine.h"
 #include "tree-scalar-evolution.h"
 #include "dumpfile.h"
 #include "params.h"
@@ -1380,7 +1382,67 @@  follow_ssa_edge (struct loop *loop, gimple def, gi
 }
 
 
+/* Pointer map used when simplifying PEELED_CHREC into POLYNOMIAL_CHREC.  */
+static pointer_map_t *peeled_chrec_map;
 
+/* Simplify PEELED_CHREC represented by (init_cond, arg) in LOOP.
+   Handle below case and return the corresponding POLYNOMIAL_CHREC:
+
+   # i_17 = PHI <i_13(5), 0(3)>
+   # _20 = PHI <_5(5), start_4(D)(3)>
+   ...
+   i_13 = i_17 + 1;
+   _5 = start_4(D) + i_13;
+
+   Though variable _20 appears as a PEELED_CHREC in the form of
+   (start_4, _5)_LOOP, it's a POLYNOMIAL_CHREC like {start_4, 1}_LOOP.
+
+   See PR41488.  */
+
+static tree
+simplify_peeled_chrec (struct loop *loop, tree arg, tree init_cond)
+{
+  aff_tree aff1, aff2;
+  tree ev, left, right, type, step_val;
+
+  ev = instantiate_parameters (loop, analyze_scalar_evolution (loop, arg));
+  if (ev == NULL_TREE || TREE_CODE (ev) != POLYNOMIAL_CHREC)
+    return chrec_dont_know;
+
+  left = CHREC_LEFT (ev);
+  right = CHREC_RIGHT (ev);
+  type = TREE_TYPE (left);
+  step_val = chrec_fold_plus (type, init_cond, right);
+
+  /* Transform (init, {left, right}_LOOP)_LOOP to {init, right}_LOOP
+     if "left" equals to "init + right".  */
+  if (operand_equal_p (left, step_val, 0))
+    {
+      if (dump_file && (dump_flags & TDF_SCEV))
+	fprintf (dump_file, "Simplify PEELED_CHREC into POLYNOMIAL_CHREC.\n");
+
+      return build_polynomial_chrec (loop->num, init_cond, right);
+    }
+
+  /* Try harder to check if they are equal.  */
+  tree_to_aff_combination_expand (left, type, &aff1, &peeled_chrec_map);
+  tree_to_aff_combination_expand (step_val, type, &aff2, &peeled_chrec_map);
+  aff_combination_scale (&aff2, double_int_minus_one);
+  aff_combination_add (&aff1, &aff2);
+  left = fold_convert (type, aff_combination_to_tree (&aff1));
+
+  /* Transform (init, {left, right}_LOOP)_LOOP to {init, right}_LOOP
+     if "left" equals to "init + right".  */
+  if (operand_equal_p (left, integer_zero_node, 0))
+    {
+      if (dump_file && (dump_flags & TDF_SCEV))
+	fprintf (dump_file, "Simplify PEELED_CHREC into POLYNOMIAL_CHREC.\n");
+
+      return build_polynomial_chrec (loop->num, init_cond, right);
+    }
+  return chrec_dont_know;
+}
+
 /* Given a LOOP_PHI_NODE, this function determines the evolution
    function from LOOP_PHI_NODE to LOOP_PHI_NODE in the loop.  */
 
@@ -1392,6 +1454,7 @@  analyze_evolution_in_loop (gimple loop_phi_node,
   tree evolution_function = chrec_not_analyzed_yet;
   struct loop *loop = loop_containing_stmt (loop_phi_node);
   basic_block bb;
+  static bool simplify_peeled_chrec_p = true;
 
   if (dump_file && (dump_flags & TDF_SCEV))
     {
@@ -1442,7 +1505,19 @@  analyze_evolution_in_loop (gimple loop_phi_node,
 	 all the other iterations it has the value of ARG.
 	 For the moment, PEELED_CHREC nodes are not built.  */
       if (res != t_true)
-	ev_fn = chrec_dont_know;
+	{
+	  ev_fn = chrec_dont_know;
+	  /* Try to recognize POLYNOMIAL_CHREC which appears in
+	     the form of PEELED_CHREC, but guard the process with
+	     a bool variable to keep the analyzer from infinite
+	     recurrence for real PEELED_RECs.  */
+	  if (simplify_peeled_chrec_p && TREE_CODE (arg) == SSA_NAME)
+	    {
+	      simplify_peeled_chrec_p = false;
+	      ev_fn = simplify_peeled_chrec (loop, arg, init_cond);
+	      simplify_peeled_chrec_p = true;
+	    }
+	}
 
       /* When there are multiple back edges of the loop (which in fact never
 	 happens currently, but nevertheless), merge their evolutions.  */
@@ -3086,6 +3161,8 @@  scev_initialize (void)
 
   initialize_scalar_evolutions_analyzer ();
 
+  peeled_chrec_map = pointer_map_create ();
+
   FOR_EACH_LOOP (loop, 0)
     {
       loop->nb_iterations = NULL_TREE;
@@ -3122,6 +3199,12 @@  scev_reset (void)
 
   scev_reset_htab ();
 
+  if (peeled_chrec_map)
+    {
+      pointer_map_destroy (peeled_chrec_map);
+      peeled_chrec_map = NULL;
+    }
+
   if (!current_loops)
     return;
 
@@ -3209,6 +3292,11 @@  scev_finalize (void)
     return;
   htab_delete (scalar_evolution_info);
   scalar_evolution_info = NULL;
+  if (peeled_chrec_map)
+    {
+      pointer_map_destroy (peeled_chrec_map);
+      peeled_chrec_map = NULL;
+    }
 }
 
 /* Returns true if the expression EXPR is considered to be too expensive
Index: gcc/testsuite/gcc.dg/tree-ssa/scev-7.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/scev-7.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/scev-7.c	(revision 0)
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sccp-scev" } */
+
+struct struct_t
+{
+  int* data;
+};
+
+void foo (struct struct_t* sp, int start, int end)
+{
+  int i;
+
+  for (i = 1000; i+start > end; i--)
+    sp->data[i+start] = 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Simplify PEELED_CHREC into POLYNOMIAL_CHREC" 1 "sccp" } } */
+/* { dg-final { cleanup-tree-dump "sccp" } } */
Index: gcc/testsuite/gcc.dg/pr41488.c
===================================================================
--- gcc/testsuite/gcc.dg/pr41488.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr41488.c	(revision 0)
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sccp-scev" } */
+
+struct struct_t
+{
+  int* data;
+};
+
+void foo (struct struct_t* sp, int start, int end)
+{
+  int i;
+
+  for (i = 0; i+start < end; i++)
+    sp->data[i+start] = 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Simplify PEELED_CHREC into POLYNOMIAL_CHREC" 1 "sccp" } } */
+/* { dg-final { cleanup-tree-dump "sccp" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 205688)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -2526,11 +2526,19 @@  add_old_iv_candidates (struct ivopts_data *data, s
       /* Additionally record the possibility of leaving the original iv
 	 untouched.  */
       def = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (data->current_loop));
-      cand = add_candidate_1 (data,
-			      iv->base, iv->step, true, IP_ORIGINAL, NULL,
-			      SSA_NAME_DEF_STMT (def));
-      cand->var_before = iv->ssa_name;
-      cand->var_after = def;
+      /* Don't add candidate if it's from another PHI node because
+	  it's an affine iv appearing in the form of PEELED_CHREC.  */
+      phi = SSA_NAME_DEF_STMT (def);
+      if (gimple_code (phi) != GIMPLE_PHI)
+	{
+	  cand = add_candidate_1 (data,
+				  iv->base, iv->step, true, IP_ORIGINAL, NULL,
+				  SSA_NAME_DEF_STMT (def));
+	  cand->var_before = iv->ssa_name;
+	  cand->var_after = def;
+	}
+      else
+	gcc_assert (gimple_bb (phi) == data->current_loop->header);
     }
 }