
[5/5] Enable tree loop distribution at -O3 and above optimization levels.

Message ID CAHFci2_TpuXZ+bZ1zLYtU76G8MwVQZTs+GgJaVAONmqsXMy8+A@mail.gmail.com
State New

Commit Message

Bin.Cheng Aug. 7, 2017, 9:10 a.m. UTC
On Fri, Jun 23, 2017 at 12:04 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Jun 23, 2017 at 10:47 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Fri, Jun 23, 2017 at 6:04 AM, Jeff Law <law@redhat.com> wrote:
>>> On 06/07/2017 02:07 AM, Bin.Cheng wrote:
>>>> On Tue, Jun 6, 2017 at 6:47 PM, Jeff Law <law@redhat.com> wrote:
>>>>> On 06/02/2017 05:52 AM, Bin Cheng wrote:
>>>>>> Hi,
>>>>>> This patch enables -ftree-loop-distribution by default at -O3 and above optimization levels.
>>>>>> Bootstrapped and tested at O2/O3 on x86_64 and AArch64.  Is it OK?
>>>>>>
>>>>>> Note that I don't have a strong opinion here and am fine with it being either accepted or rejected.
>>>>>>
>>>>>> Thanks,
>>>>>> bin
>>>>>> 2017-05-31  Bin Cheng  <bin.cheng@arm.com>
>>>>>>
>>>>>>       * opts.c (default_options_table): Enable OPT_ftree_loop_distribution
>>>>>>       for -O3 and above levels.
>>>>> I think the question is how does this generally impact the performance
>>>>> of the generated code and to a lesser degree compile-time.
>>>>>
>>>>> Do you have any performance data?
>>>> Hi Jeff,
>>>> At this stage of the patch, only hmmer is impacted, and it is clearly
>>>> improved in my local run of spec2006 for x86_64 and AArch64.  In the
>>>> long term, loop distribution is also a prerequisite transformation for
>>>> handling bwaves (at least).  For these two impacted cases, it helps to
>>>> close the gap against ICC.  I didn't check the compilation time
>>>> slowdown; we can restrict the pass to problems with a small number of
>>>> partitions if that turns out to be an issue.
>>> Just a note. I know you've iterated further with Richi -- I'm not
>>> objecting to the patch, nor was I ready to approve.
>>>
>>> Are you and Richi happy with this as-is or are you looking to submit
>>> something newer based on the conversation the two of you have had?
>> Hi Jeff,
>> The patch series has been updated in various ways according to review
>> comments; for example, it restricts compilation time by checking the
>> number of data references against MAX_DATAREFS_FOR_DATADEPS, and it
>> restores the data dependence cache.  There are still two missing parts
>> I'd like to do as follow-up patches: one is loop nest distribution and
>> the other is a data-locality cost model (at least) for small cases.
>> Richi has now approved most patches except the last major one, but I
>> still need another iteration on some (approved) patches in order to fix
>> mistakes/typos introduced when I was separating the patch.
>
> The patch is ok after the approved parts of the ldist series have been committed.
> Note your patch lacks updates to invoke.texi (what options are enabled at -O3).
> Please adjust that before committing.
Hi All,
Given that the loop distribution patches have been merged for a while
and a couple of issues have been fixed, I am submitting an updated patch
to enable the pass by default at -O3 and above.
Bootstrap and test on x86_64 and AArch64 are ongoing.  Hmmer can still
be improved.  Is it OK if there are no failures?

Thanks,
bin
2017-08-07  Bin Cheng  <bin.cheng@arm.com>

    * doc/invoke.texi: Document -ftree-loop-distribution for O3.
    * opts.c (default_options_table): Add OPT_ftree_loop_distribution.
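
For readers unfamiliar with the transformation being enabled, here is a
minimal sketch of what loop distribution means at the source level.  It
is an illustration only: the function names are made up, the arrays are
assumed not to overlap, and whether GCC actually splits a loop like this
depends on its own analysis and cost model.

```c
#include <assert.h>
#include <stddef.h>

/* Fused form: one loop body holds two independent computations.
   The statement on b[] carries a dependence across iterations, which
   gets in the way of vectorizing the loop as a whole.
   Assumption: a, b and c do not overlap in memory.  */
void fused(int *a, int *b, const int *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 1; i < n; i++) {
        a[i] = c[i] * 2;          /* no loop-carried dependence */
        b[i] = b[i - 1] + c[i];   /* depends on the previous iteration */
    }
}

/* Distributed form: the same work split into two loops, one per
   independent statement.  The first loop is now a simple candidate
   for vectorization; the second keeps its recurrence.  */
void distributed(int *a, int *b, const int *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 1; i < n; i++)
        a[i] = c[i] * 2;
    for (size_t i = 1; i < n; i++)
        b[i] = b[i - 1] + c[i];
}
```

Both functions compute the same results under the no-overlap assumption;
the second form is roughly the shape a loop distribution pass aims for.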

Comments

Richard Biener Aug. 8, 2017, 1:10 p.m. UTC | #1
On Mon, Aug 7, 2017 at 11:10 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> Hi All,
> Given that the loop distribution patches have been merged for a while
> and a couple of issues have been fixed, I am submitting an updated patch
> to enable the pass by default at -O3 and above.
> Bootstrap and test on x86_64 and AArch64 are ongoing.  Hmmer can still
> be improved.  Is it OK if there are no failures?

Ok.

Thanks,
Richard.

> Thanks,
> bin
> 2017-08-07  Bin Cheng  <bin.cheng@arm.com>
>
>     * doc/invoke.texi: Document -ftree-loop-distribution for O3.
>     * opts.c (default_options_table): Add OPT_ftree_loop_distribution.

Patch

From 2bda01a939ac8c0bf54f04f7e29cc0d3155c7626 Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Wed, 28 Jun 2017 10:54:17 +0100
Subject: [PATCH] enable-loop-distribution-O3-20170802.txt

---
 gcc/doc/invoke.texi | 21 ++++++++++++++-------
 gcc/opts.c          |  1 +
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5ae9dc4..f48a71a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7248,13 +7248,20 @@  invoking @option{-O2} on programs that use computed gotos.
 @item -O3
 @opindex O3
 Optimize yet more.  @option{-O3} turns on all optimizations specified
-by @option{-O2} and also turns on the @option{-finline-functions},
-@option{-funswitch-loops}, @option{-fpredictive-commoning},
-@option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},
-@option{-ftree-loop-distribute-patterns}, @option{-fsplit-paths}
-@option{-ftree-slp-vectorize}, @option{-fvect-cost-model},
-@option{-ftree-partial-pre}, @option{-fpeel-loops}
-and @option{-fipa-cp-clone} options.
+by @option{-O2} and also turns on the following optimization flags:
+@gccoptlist{-finline-functions @gol
+-funswitch-loops @gol
+-fpredictive-commoning @gol
+-fgcse-after-reload @gol
+-ftree-loop-vectorize @gol
+-ftree-loop-distribution @gol
+-ftree-loop-distribute-patterns @gol
+-fsplit-paths @gol
+-ftree-slp-vectorize @gol
+-fvect-cost-model @gol
+-ftree-partial-pre @gol
+-fpeel-loops @gol
+-fipa-cp-clone}
 
 @item -O0
 @opindex O0
diff --git a/gcc/opts.c b/gcc/opts.c
index 989cc6b..19e8c7f 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -525,6 +525,7 @@  static const struct default_options default_options_table[] =
 
     /* -O3 optimizations.  */
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribution, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fpredictive_commoning, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_fsplit_paths, NULL, 1 },
     /* Inlining of functions reducing size is a good idea with -Os
-- 
1.9.1
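
The opts.c hunk above adds one row to a table that maps optimization
levels to option defaults.  The sketch below shows the general idea of
such a table-driven scheme.  All type, enum and function names here are
hypothetical, simplified stand-ins chosen for illustration; they are not
GCC's real definitions, and the real table has more fields and entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not GCC's actual types.  */
enum level_class { LEVELS_1_PLUS, LEVELS_2_PLUS, LEVELS_3_PLUS };

enum opt_index {
    OPT_DISTRIBUTE_PATTERNS,
    OPT_LOOP_DISTRIBUTION,
    OPT_PREDICTIVE_COMMONING,
    OPT_COUNT
};

struct default_option {
    enum level_class levels;  /* which -O levels the entry applies to */
    enum opt_index opt;       /* which flag it sets */
    int value;                /* value given to the flag */
};

static const struct default_option table[] = {
    { LEVELS_3_PLUS, OPT_DISTRIBUTE_PATTERNS, 1 },
    { LEVELS_3_PLUS, OPT_LOOP_DISTRIBUTION, 1 },  /* analogue of the added row */
    { LEVELS_3_PLUS, OPT_PREDICTIVE_COMMONING, 1 },
};

/* Return nonzero if an entry of class C applies at optimization LEVEL.  */
static int level_matches(enum level_class c, int level)
{
    switch (c) {
    case LEVELS_1_PLUS: return level >= 1;
    case LEVELS_2_PLUS: return level >= 2;
    case LEVELS_3_PLUS: return level >= 3;
    }
    return 0;
}

/* Walk the table and set every flag whose entry applies at LEVEL.  */
void apply_defaults(int level, int flags[OPT_COUNT])
{
    assert(flags);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (level_matches(table[i].levels, level))
            flags[table[i].opt] = table[i].value;
}
```

With a scheme like this, enabling a pass at a given level is a one-line
table change, which matches the size of the opts.c hunk in this patch.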