[PR62178] Improve candidate selecting in IVOPT, 2nd try.

On Tue, Dec 16, 2014 at 4:42 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, Dec 11, 2014 at 8:08 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Thu, Dec 11, 2014 at 10:56 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Wed, Dec 10, 2014 at 9:47 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Fri, Dec 5, 2014 at 1:15 PM, Bin Cheng <bin.cheng@arm.com> wrote:
>>>>> Hi,
>>>>> Though PR62178 is hidden by recent cost change in aarch64 backend, the ivopt
>>>>> issue still exists.
>>>>>
>>>>> Current candidate selecting algorithm tends to select fewer candidates given
>>>>> below reasons:
>>>>>   1) to better handle loops with many induction uses but the best choice is
>>>>> one generic basic induction variable;
>>>>>   2) to keep compilation time low.
>>>>>
>>>>> One fundamental weakness of the strategy is the opposite situation can't be
>>>>> handled properly sometimes.  For these cases the best choice is each
>>>>> induction variable has its own candidate.
>>>>> This patch fixes the problem by shuffling candidate set after fix-point is
>>>>> reached by current implementation.  The reason why this strategy works is it
>>>>> replaces candidate set by selecting local optimal candidate for some
>>>>> induction uses, and the new candidate set (has lower cost) is exact what we
>>>>> want in the mentioned case.  Instrumentation data shows this can find better
>>>>> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64.
>>>>>
>>>>> This patch actually is extension to the first version patch posted at
>>>>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds
>>>>> another selecting pass with special seed set (more or less like the shuffled
>>>>> set in this patch).  Data also confirms this patch can find optimal sets for
>>>>> most loops found by the first one, as well as optimal sets for many new
>>>>> loops.
>>>>>
>>>>> Bootstrap and test on x86_64, no regression on benchmarks.  Bootstrap and
>>>>> test on aarch64.
>>>>> Since this patch only selects candidate set with lower cost, any regressions
>>>>> revealed are latent bugs of other components in GCC.
>>>>> I also collected GCC bootstrap time on x86_64, no regression either.
>>>>> Is this OK?
>>>>
>>>> The algorithm seems to be quadratic in the number of IV candidates
>>>> (at least):
>>> Yes, I worried about that too, that's why I measured the bootstrap
>>> time.  One way is restrict this procedure one time for each loop.  I
>>> already tried that and it can capture +90% loops.  Is this sounds
>>> reasonable?
>>
>> Yes.  That's my suggestion to handle it in the caller of try_improve_iv_set?
>>
>>> BTW, do we have some compilation time benchmarks for GCC?
>>
>> There are various testcases linked from PR47344, I don't remember
>> any particular one putting load on IVOPTs (but I do remember seeing
>> IVOPTs in the ~25% area in -ftime-report for some testcases).
>

Hi,
I further refined the patch.  Specifically, I factored out common
code, improved comments, and restricted new code in several ways, for
example, now iv_ca_replace runs exactly one time for each
find_optimal_iv_set; iv_ca_replace only tries to replace one candidate
in IVS each time and makes quick return if lower cost set is found;
most importantly, iv_ca_replace now checks
ALWAYS_PRUNE_CAND_SET_BOUND.
The patch is simplified with these changes.  As for compilation time,
IVOPT isn't regressed obviously for the overloaded case I created,
also regression in llvm compilation time benchmarks is gone.

I think we could adapt data structure in IVOPT to make it faster, for
example, record information in candidate about which uses are
represented by each cand, sort candidates by cost for each iv use.  I
may do some refactor in next stage1.

Bootstrap on x86_64, test ongoing.  So OK if no regressions?

Thanks,
bin

2014-12-17  Bin Cheng  <bin.cheng@arm.com>

    PR tree-optimization/62178
    * tree-ssa-loop-ivopts.c (cheaper_cost_with_cand): New function.
    (iv_ca_replace): New function.
    (try_improve_iv_set): New parameter try_replace_p.
    Break local optimal fixed-point by calling iv_ca_replace.
    (find_optimal_iv_set_1): Pass new argument to try_improve_iv_set.

gcc/testsuite/ChangeLog
2014-12-17  Bin Cheng  <bin.cheng@arm.com>

    PR tree-optimization/62178
    * gcc.target/aarch64/pr62178.c: New test.

[PR62178] Improve candidate selecting in IVOPT, 2nd try.

Commit Message

Comments

Patch