diff mbox series

[v2] vect: Replace hardcoded weight factor with param

Message ID 1ea0e5ca-3ce5-b35d-9b07-044d07ad5abf@linux.ibm.com
State New
Headers show
Series [v2] vect: Replace hardcoded weight factor with param | expand

Commit Message

Kewen.Lin May 19, 2021, 9:46 a.m. UTC
Hi Richi,

on 2021/5/19 下午4:15, Richard Biener wrote:
> On Wed, May 19, 2021 at 8:20 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> This patch is to replace the current hardcoded weight factor 50
>> for those statements in an inner loop relative to the loop being
>> vectorized with a specific parameter vect-inner-loop-weight-factor.
>>
>> The motivation behind this change is: if targets want to have one
>> unique function to gather some information in each add_stmt_cost
>> call, no matter that it's put before or after the cost tweaking
>> part for inner loop, it may have the need to adjust (expand or
>> shrink) the gathered data as the factor.  Now the factor is
>> hardcoded, it's not easily maintained.  Since it's possible that
>> targets have their own decisions on this costing like the others,
>> I used parameter instead of one unique macro here.
>>
>> Testing is ongoing, is it ok for trunk if everything goes well?
> 
> Certainly an improvement.  I suppose we might want to put
> the factor into vinfo->inner_loop_cost_factor.  That way
> we could adjust it easily in common code in the vectorizer
> when we for example have (non-guessed) profile data.
> 
> "weight_factor" is kind-of double-speak and I'm missing 'cost' ...
> so, bike-shedding to vect_inner_loop_cost_factor?
> 
> Just suggestions - as said, the patch is an improvement already.
> 

Thanks for your nice suggestions!  I've updated the patch accordingly
and attached it.  Does it look better to you?

btw, the testing on the previous patch passed, new round testing was
just kicked off.

BR,
Kewen
------
gcc/ChangeLog:

	* doc/invoke.texi (vect-inner-loop-cost-factor): Document new
	parameter.
	* params.opt (vect-inner-loop-cost-factor): New.
	* targhooks.c (default_add_stmt_cost): Replace hardcoded factor
	50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR, include head file
	tree-vectorizer.h and its required ones.
	* config/aarch64/aarch64.c (aarch64_add_stmt_cost): Replace
	hardcoded factor 50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR.
	* config/arm/arm.c (arm_add_stmt_cost): Likewise.
	* config/i386/i386.c (ix86_add_stmt_cost): Likewise.
	* config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
	* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
	Likewise.
	(_loop_vec_info::_loop_vec_info): Init inner_loop_cost_factor.
	* tree-vectorizer.h (_loop_vec_info): Add inner_loop_cost_factor.
	(LOOP_VINFO_INNER_LOOP_COST_FACTOR): New macro.

Comments

Richard Biener May 19, 2021, 10:01 a.m. UTC | #1
On Wed, May 19, 2021 at 11:47 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> Hi Richi,
>
> on 2021/5/19 下午4:15, Richard Biener wrote:
> > On Wed, May 19, 2021 at 8:20 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> This patch is to replace the current hardcoded weight factor 50
> >> for those statements in an inner loop relative to the loop being
> >> vectorized with a specific parameter vect-inner-loop-weight-factor.
> >>
> >> The motivation behind this change is: if targets want to have one
> >> unique function to gather some information in each add_stmt_cost
> >> call, no matter that it's put before or after the cost tweaking
> >> part for inner loop, it may have the need to adjust (expand or
> >> shrink) the gathered data as the factor.  Now the factor is
> >> hardcoded, it's not easily maintained.  Since it's possible that
> >> targets have their own decisions on this costing like the others,
> >> I used parameter instead of one unique macro here.
> >>
> >> Testing is ongoing, is it ok for trunk if everything goes well?
> >
> > Certainly an improvement.  I suppose we might want to put
> > the factor into vinfo->inner_loop_cost_factor.  That way
> > we could adjust it easily in common code in the vectorizer
> > when we for example have (non-guessed) profile data.
> >
> > "weight_factor" is kind-of double-speak and I'm missing 'cost' ...
> > so, bike-shedding to vect_inner_loop_cost_factor?
> >
> > Just suggestions - as said, the patch is an improvement already.
> >
>
> Thanks for your nice suggestions!  I've updated the patch accordingly
> and attached it.  Does it look better to you?

Minor nit:

+@item vect-inner-loop-cost-factor
+The factor which loop vectorizer uses to over weight those statements in
+an inner loop relative to the loop being vectorized.
+

the default value should be documented here, not..

+-param=vect-inner-loop-cost-factor=
+Common Joined UInteger Var(param_vect_inner_loop_cost_factor)
Init(50) IntegerRange(1, 999999) Param Optimization
+Indicates the factor which loop vectorizer uses to over weight those
statements in an inner loop relative to the loop being vectorized.
The default value is 50.
+

here (based on statistical analysis of existing cases).  Also the
params.opt docs
should be the "brief" one - but for simplicity simply make both docs identical
(apart from the default value doc).  I suggest

"The factor which the loop vectorizer applies to the cost of statements
in an inner loop relative to the loop being vectorized."

OK with that change.

Richard.

> btw, the testing on the previous patch passed, new round testing was
> just kicked off.
>
> BR,
> Kewen
> ------
> gcc/ChangeLog:
>
>         * doc/invoke.texi (vect-inner-loop-cost-factor): Document new
>         parameter.
>         * params.opt (vect-inner-loop-cost-factor): New.
>         * targhooks.c (default_add_stmt_cost): Replace hardcoded factor
>         50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR, include head file
>         tree-vectorizer.h and its required ones.
>         * config/aarch64/aarch64.c (aarch64_add_stmt_cost): Replace
>         hardcoded factor 50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR.
>         * config/arm/arm.c (arm_add_stmt_cost): Likewise.
>         * config/i386/i386.c (ix86_add_stmt_cost): Likewise.
>         * config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
>         * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
>         Likewise.
>         (_loop_vec_info::_loop_vec_info): Init inner_loop_cost_factor.
>         * tree-vectorizer.h (_loop_vec_info): Add inner_loop_cost_factor.
>         (LOOP_VINFO_INNER_LOOP_COST_FACTOR): New macro.
>
Kewen.Lin May 20, 2021, 8:52 a.m. UTC | #2
on 2021/5/19 下午6:01, Richard Biener wrote:
> On Wed, May 19, 2021 at 11:47 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> Hi Richi,
>>
>> on 2021/5/19 下午4:15, Richard Biener wrote:
>>> On Wed, May 19, 2021 at 8:20 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> This patch is to replace the current hardcoded weight factor 50
>>>> for those statements in an inner loop relative to the loop being
>>>> vectorized with a specific parameter vect-inner-loop-weight-factor.
>>>>
>>>> The motivation behind this change is: if targets want to have one
>>>> unique function to gather some information in each add_stmt_cost
>>>> call, no matter that it's put before or after the cost tweaking
>>>> part for inner loop, it may have the need to adjust (expand or
>>>> shrink) the gathered data as the factor.  Now the factor is
>>>> hardcoded, it's not easily maintained.  Since it's possible that
>>>> targets have their own decisions on this costing like the others,
>>>> I used parameter instead of one unique macro here.
>>>>
>>>> Testing is ongoing, is it ok for trunk if everything goes well?
>>>
>>> Certainly an improvement.  I suppose we might want to put
>>> the factor into vinfo->inner_loop_cost_factor.  That way
>>> we could adjust it easily in common code in the vectorizer
>>> when we for example have (non-guessed) profile data.
>>>
>>> "weight_factor" is kind-of double-speak and I'm missing 'cost' ...
>>> so, bike-shedding to vect_inner_loop_cost_factor?
>>>
>>> Just suggestions - as said, the patch is an improvement already.
>>>
>>
>> Thanks for your nice suggestions!  I've updated the patch accordingly
>> and attached it.  Does it look better to you?
> 
> Minor nit:
> 
> +@item vect-inner-loop-cost-factor
> +The factor which loop vectorizer uses to over weight those statements in
> +an inner loop relative to the loop being vectorized.
> +
> 
> the default value should be documented here, not..
> 
> +-param=vect-inner-loop-cost-factor=
> +Common Joined UInteger Var(param_vect_inner_loop_cost_factor)
> Init(50) IntegerRange(1, 999999) Param Optimization
> +Indicates the factor which loop vectorizer uses to over weight those
> statements in an inner loop relative to the loop being vectorized.
> The default value is 50.
> +
> 
> here (based on statistical analysis of existing cases).  Also the
> params.opt docs
> should be the "brief" one - but for simplicity simply make both docs identical
> (apart from the default value doc).  I suggest
> 
> "The factor which the loop vectorizer applies to the cost of statements
> in an inner loop relative to the loop being vectorized."
> 

Thanks for catching this and the suggestion!   

Bootstrapped/regtested on powerpc64le-linux-gnu P9, x86_64-redhat-linux
and aarch64-linux-gnu.

Committed in r12-939 as the suggested wordings.   

BR,
Kewen
Christophe Lyon May 20, 2021, 9:30 a.m. UTC | #3
On Thu, 20 May 2021 at 10:52, Kewen.Lin via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> on 2021/5/19 下午6:01, Richard Biener wrote:
> > On Wed, May 19, 2021 at 11:47 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>
> >> Hi Richi,
> >>
> >> on 2021/5/19 下午4:15, Richard Biener wrote:
> >>> On Wed, May 19, 2021 at 8:20 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> This patch is to replace the current hardcoded weight factor 50
> >>>> for those statements in an inner loop relative to the loop being
> >>>> vectorized with a specific parameter vect-inner-loop-weight-factor.
> >>>>
> >>>> The motivation behind this change is: if targets want to have one
> >>>> unique function to gather some information in each add_stmt_cost
> >>>> call, no matter that it's put before or after the cost tweaking
> >>>> part for inner loop, it may have the need to adjust (expand or
> >>>> shrink) the gathered data as the factor.  Now the factor is
> >>>> hardcoded, it's not easily maintained.  Since it's possible that
> >>>> targets have their own decisions on this costing like the others,
> >>>> I used parameter instead of one unique macro here.
> >>>>
> >>>> Testing is ongoing, is it ok for trunk if everything goes well?
> >>>
> >>> Certainly an improvement.  I suppose we might want to put
> >>> the factor into vinfo->inner_loop_cost_factor.  That way
> >>> we could adjust it easily in common code in the vectorizer
> >>> when we for example have (non-guessed) profile data.
> >>>
> >>> "weight_factor" is kind-of double-speak and I'm missing 'cost' ...
> >>> so, bike-shedding to vect_inner_loop_cost_factor?
> >>>
> >>> Just suggestions - as said, the patch is an improvement already.
> >>>
> >>
> >> Thanks for your nice suggestions!  I've updated the patch accordingly
> >> and attached it.  Does it look better to you?
> >
> > Minor nit:
> >
> > +@item vect-inner-loop-cost-factor
> > +The factor which loop vectorizer uses to over weight those statements in
> > +an inner loop relative to the loop being vectorized.
> > +
> >
> > the default value should be documented here, not..
> >
> > +-param=vect-inner-loop-cost-factor=
> > +Common Joined UInteger Var(param_vect_inner_loop_cost_factor)
> > Init(50) IntegerRange(1, 999999) Param Optimization
> > +Indicates the factor which loop vectorizer uses to over weight those
> > statements in an inner loop relative to the loop being vectorized.
> > The default value is 50.
> > +
> >
> > here (based on statistical analysis of existing cases).  Also the
> > params.opt docs
> > should be the "brief" one - but for simplicity simply make both docs identical
> > (apart from the default value doc).  I suggest
> >
> > "The factor which the loop vectorizer applies to the cost of statements
> > in an inner loop relative to the loop being vectorized."
> >
>
> Thanks for catching this and the suggestion!
>
> Bootstrapped/regtested on powerpc64le-linux-gnu P9, x86_64-redhat-linux
> and aarch64-linux-gnu.
>

This breaks the build for arm targets:
/tmp/158661_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:
In function 'unsigned int arm_add_stmt_cost(vec_info*, void*, int,
vect_cost_for_stmt, _stmt_v
ec_info*, tree, int, vect_cost_model_location)':
/tmp/158661_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:12230:4:
error: 'loop_vec_info' was not declared in this scope
    loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
    ^
/tmp/158661_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:12230:18:
error: expected ';' before 'loop_vinfo'
    loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);

Can you fix it?

Thanks,

Christophe


> Committed in r12-939 as the suggested wordings.


>
> BR,
> Kewen
Kewen.Lin May 20, 2021, 10:09 a.m. UTC | #4
on 2021/5/20 下午5:30, Christophe Lyon wrote:
> On Thu, 20 May 2021 at 10:52, Kewen.Lin via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> on 2021/5/19 下午6:01, Richard Biener wrote:
>>> On Wed, May 19, 2021 at 11:47 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>
>>>> Hi Richi,
>>>>
>>>> on 2021/5/19 下午4:15, Richard Biener wrote:
>>>>> On Wed, May 19, 2021 at 8:20 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This patch is to replace the current hardcoded weight factor 50
>>>>>> for those statements in an inner loop relative to the loop being
>>>>>> vectorized with a specific parameter vect-inner-loop-weight-factor.
>>>>>>
>>>>>> The motivation behind this change is: if targets want to have one
>>>>>> unique function to gather some information in each add_stmt_cost
>>>>>> call, no matter that it's put before or after the cost tweaking
>>>>>> part for inner loop, it may have the need to adjust (expand or
>>>>>> shrink) the gathered data as the factor.  Now the factor is
>>>>>> hardcoded, it's not easily maintained.  Since it's possible that
>>>>>> targets have their own decisions on this costing like the others,
>>>>>> I used parameter instead of one unique macro here.
>>>>>>
>>>>>> Testing is ongoing, is it ok for trunk if everything goes well?
>>>>>
>>>>> Certainly an improvement.  I suppose we might want to put
>>>>> the factor into vinfo->inner_loop_cost_factor.  That way
>>>>> we could adjust it easily in common code in the vectorizer
>>>>> when we for example have (non-guessed) profile data.
>>>>>
>>>>> "weight_factor" is kind-of double-speak and I'm missing 'cost' ...
>>>>> so, bike-shedding to vect_inner_loop_cost_factor?
>>>>>
>>>>> Just suggestions - as said, the patch is an improvement already.
>>>>>
>>>>
>>>> Thanks for your nice suggestions!  I've updated the patch accordingly
>>>> and attached it.  Does it look better to you?
>>>
>>> Minor nit:
>>>
>>> +@item vect-inner-loop-cost-factor
>>> +The factor which loop vectorizer uses to over weight those statements in
>>> +an inner loop relative to the loop being vectorized.
>>> +
>>>
>>> the default value should be documented here, not..
>>>
>>> +-param=vect-inner-loop-cost-factor=
>>> +Common Joined UInteger Var(param_vect_inner_loop_cost_factor)
>>> Init(50) IntegerRange(1, 999999) Param Optimization
>>> +Indicates the factor which loop vectorizer uses to over weight those
>>> statements in an inner loop relative to the loop being vectorized.
>>> The default value is 50.
>>> +
>>>
>>> here (based on statistical analysis of existing cases).  Also the
>>> params.opt docs
>>> should be the "brief" one - but for simplicity simply make both docs identical
>>> (apart from the default value doc).  I suggest
>>>
>>> "The factor which the loop vectorizer applies to the cost of statements
>>> in an inner loop relative to the loop being vectorized."
>>>
>>
>> Thanks for catching this and the suggestion!
>>
>> Bootstrapped/regtested on powerpc64le-linux-gnu P9, x86_64-redhat-linux
>> and aarch64-linux-gnu.
>>
> 
> This breaks the build for arm targets:
> /tmp/158661_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:
> In function 'unsigned int arm_add_stmt_cost(vec_info*, void*, int,
> vect_cost_for_stmt, _stmt_v
> ec_info*, tree, int, vect_cost_model_location)':
> /tmp/158661_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:12230:4:
> error: 'loop_vec_info' was not declared in this scope
>     loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
>     ^
> /tmp/158661_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:12230:18:
> error: expected ';' before 'loop_vinfo'
>     loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
> 
> Can you fix it?
> 

Oops!  Deeply sorry for that and thanks for the testing!

I just found that unlike the other targets arm.c doesn't include
"tree-vectorizer.h".  The issue should be fixed with the below patch:

gcc/ChangeLog:

	* config/arm/arm.c: Include head files tree-vectorizer.h and
	cfgloop.h.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index caf4e56b9fe..6ed34fbf627 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -69,6 +69,8 @@
 #include "gimplify.h"
 #include "gimple.h"
 #include "selftest.h"
+#include "cfgloop.h"
+#include "tree-vectorizer.h"

 /* This file should be included last.  */
 #include "target-def.h"


Is it counted as a obvious patch?

BR,
Kewen
Richard Biener May 20, 2021, 10:25 a.m. UTC | #5
On Thu, May 20, 2021 at 12:09 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>
> on 2021/5/20 下午5:30, Christophe Lyon wrote:
> > On Thu, 20 May 2021 at 10:52, Kewen.Lin via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> on 2021/5/19 下午6:01, Richard Biener wrote:
> >>> On Wed, May 19, 2021 at 11:47 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>
> >>>> Hi Richi,
> >>>>
> >>>> on 2021/5/19 下午4:15, Richard Biener wrote:
> >>>>> On Wed, May 19, 2021 at 8:20 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> This patch is to replace the current hardcoded weight factor 50
> >>>>>> for those statements in an inner loop relative to the loop being
> >>>>>> vectorized with a specific parameter vect-inner-loop-weight-factor.
> >>>>>>
> >>>>>> The motivation behind this change is: if targets want to have one
> >>>>>> unique function to gather some information in each add_stmt_cost
> >>>>>> call, no matter that it's put before or after the cost tweaking
> >>>>>> part for inner loop, it may have the need to adjust (expand or
> >>>>>> shrink) the gathered data as the factor.  Now the factor is
> >>>>>> hardcoded, it's not easily maintained.  Since it's possible that
> >>>>>> targets have their own decisions on this costing like the others,
> >>>>>> I used parameter instead of one unique macro here.
> >>>>>>
> >>>>>> Testing is ongoing, is it ok for trunk if everything goes well?
> >>>>>
> >>>>> Certainly an improvement.  I suppose we might want to put
> >>>>> the factor into vinfo->inner_loop_cost_factor.  That way
> >>>>> we could adjust it easily in common code in the vectorizer
> >>>>> when we for example have (non-guessed) profile data.
> >>>>>
> >>>>> "weight_factor" is kind-of double-speak and I'm missing 'cost' ...
> >>>>> so, bike-shedding to vect_inner_loop_cost_factor?
> >>>>>
> >>>>> Just suggestions - as said, the patch is an improvement already.
> >>>>>
> >>>>
> >>>> Thanks for your nice suggestions!  I've updated the patch accordingly
> >>>> and attached it.  Does it look better to you?
> >>>
> >>> Minor nit:
> >>>
> >>> +@item vect-inner-loop-cost-factor
> >>> +The factor which loop vectorizer uses to over weight those statements in
> >>> +an inner loop relative to the loop being vectorized.
> >>> +
> >>>
> >>> the default value should be documented here, not..
> >>>
> >>> +-param=vect-inner-loop-cost-factor=
> >>> +Common Joined UInteger Var(param_vect_inner_loop_cost_factor)
> >>> Init(50) IntegerRange(1, 999999) Param Optimization
> >>> +Indicates the factor which loop vectorizer uses to over weight those
> >>> statements in an inner loop relative to the loop being vectorized.
> >>> The default value is 50.
> >>> +
> >>>
> >>> here (based on statistical analysis of existing cases).  Also the
> >>> params.opt docs
> >>> should be the "brief" one - but for simplicity simply make both docs identical
> >>> (apart from the default value doc).  I suggest
> >>>
> >>> "The factor which the loop vectorizer applies to the cost of statements
> >>> in an inner loop relative to the loop being vectorized."
> >>>
> >>
> >> Thanks for catching this and the suggestion!
> >>
> >> Bootstrapped/regtested on powerpc64le-linux-gnu P9, x86_64-redhat-linux
> >> and aarch64-linux-gnu.
> >>
> >
> > This breaks the build for arm targets:
> > /tmp/158661_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:
> > In function 'unsigned int arm_add_stmt_cost(vec_info*, void*, int,
> > vect_cost_for_stmt, _stmt_v
> > ec_info*, tree, int, vect_cost_model_location)':
> > /tmp/158661_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:12230:4:
> > error: 'loop_vec_info' was not declared in this scope
> >     loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
> >     ^
> > /tmp/158661_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:12230:18:
> > error: expected ';' before 'loop_vinfo'
> >     loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
> >
> > Can you fix it?
> >
>
> Oops!  Deeply sorry for that and thanks for the testing!
>
> I just found that unlike the other targets arm.c doesn't include
> "tree-vectorizer.h".  The issue should be fixed with the below patch:
>
> gcc/ChangeLog:
>
>         * config/arm/arm.c: Include head files tree-vectorizer.h and
>         cfgloop.h.
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index caf4e56b9fe..6ed34fbf627 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -69,6 +69,8 @@
>  #include "gimplify.h"
>  #include "gimple.h"
>  #include "selftest.h"
> +#include "cfgloop.h"
> +#include "tree-vectorizer.h"
>
>  /* This file should be included last.  */
>  #include "target-def.h"
>
>
> Is it counted as a obvious patch?

Please check if you can build a cc1 cross to arm, then yes.

> BR,
> Kewen
Kewen.Lin May 20, 2021, 10:44 a.m. UTC | #6
on 2021/5/20 下午6:25, Richard Biener wrote:
> On Thu, May 20, 2021 at 12:09 PM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>
>> on 2021/5/20 下午5:30, Christophe Lyon wrote:
>>> On Thu, 20 May 2021 at 10:52, Kewen.Lin via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>
>>>> on 2021/5/19 下午6:01, Richard Biener wrote:
>>>>> On Wed, May 19, 2021 at 11:47 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>>
>>>>>> Hi Richi,
>>>>>>
>>>>>> on 2021/5/19 下午4:15, Richard Biener wrote:
>>>>>>> On Wed, May 19, 2021 at 8:20 AM Kewen.Lin <linkw@linux.ibm.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This patch is to replace the current hardcoded weight factor 50
>>>>>>>> for those statements in an inner loop relative to the loop being
>>>>>>>> vectorized with a specific parameter vect-inner-loop-weight-factor.
>>>>>>>>
>>>>>>>> The motivation behind this change is: if targets want to have one
>>>>>>>> unique function to gather some information in each add_stmt_cost
>>>>>>>> call, no matter that it's put before or after the cost tweaking
>>>>>>>> part for inner loop, it may have the need to adjust (expand or
>>>>>>>> shrink) the gathered data as the factor.  Now the factor is
>>>>>>>> hardcoded, it's not easily maintained.  Since it's possible that
>>>>>>>> targets have their own decisions on this costing like the others,
>>>>>>>> I used parameter instead of one unique macro here.
>>>>>>>>
>>>>>>>> Testing is ongoing, is it ok for trunk if everything goes well?
>>>>>>>
>>>>>>> Certainly an improvement.  I suppose we might want to put
>>>>>>> the factor into vinfo->inner_loop_cost_factor.  That way
>>>>>>> we could adjust it easily in common code in the vectorizer
>>>>>>> when we for example have (non-guessed) profile data.
>>>>>>>
>>>>>>> "weight_factor" is kind-of double-speak and I'm missing 'cost' ...
>>>>>>> so, bike-shedding to vect_inner_loop_cost_factor?
>>>>>>>
>>>>>>> Just suggestions - as said, the patch is an improvement already.
>>>>>>>
>>>>>>
>>>>>> Thanks for your nice suggestions!  I've updated the patch accordingly
>>>>>> and attached it.  Does it look better to you?
>>>>>
>>>>> Minor nit:
>>>>>
>>>>> +@item vect-inner-loop-cost-factor
>>>>> +The factor which loop vectorizer uses to over weight those statements in
>>>>> +an inner loop relative to the loop being vectorized.
>>>>> +
>>>>>
>>>>> the default value should be documented here, not..
>>>>>
>>>>> +-param=vect-inner-loop-cost-factor=
>>>>> +Common Joined UInteger Var(param_vect_inner_loop_cost_factor)
>>>>> Init(50) IntegerRange(1, 999999) Param Optimization
>>>>> +Indicates the factor which loop vectorizer uses to over weight those
>>>>> statements in an inner loop relative to the loop being vectorized.
>>>>> The default value is 50.
>>>>> +
>>>>>
>>>>> here (based on statistical analysis of existing cases).  Also the
>>>>> params.opt docs
>>>>> should be the "brief" one - but for simplicity simply make both docs identical
>>>>> (apart from the default value doc).  I suggest
>>>>>
>>>>> "The factor which the loop vectorizer applies to the cost of statements
>>>>> in an inner loop relative to the loop being vectorized."
>>>>>
>>>>
>>>> Thanks for catching this and the suggestion!
>>>>
>>>> Bootstrapped/regtested on powerpc64le-linux-gnu P9, x86_64-redhat-linux
>>>> and aarch64-linux-gnu.
>>>>
>>>
>>> This breaks the build for arm targets:
>>> /tmp/158661_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:
>>> In function 'unsigned int arm_add_stmt_cost(vec_info*, void*, int,
>>> vect_cost_for_stmt, _stmt_v
>>> ec_info*, tree, int, vect_cost_model_location)':
>>> /tmp/158661_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:12230:4:
>>> error: 'loop_vec_info' was not declared in this scope
>>>     loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
>>>     ^
>>> /tmp/158661_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:12230:18:
>>> error: expected ';' before 'loop_vinfo'
>>>     loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
>>>
>>> Can you fix it?
>>>
>>
>> Oops!  Deeply sorry for that and thanks for the testing!
>>
>> I just found that unlike the other targets arm.c doesn't include
>> "tree-vectorizer.h".  The issue should be fixed with the below patch:
>>
>> gcc/ChangeLog:
>>
>>         * config/arm/arm.c: Include head files tree-vectorizer.h and
>>         cfgloop.h.
>>
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index caf4e56b9fe..6ed34fbf627 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -69,6 +69,8 @@
>>  #include "gimplify.h"
>>  #include "gimple.h"
>>  #include "selftest.h"
>> +#include "cfgloop.h"
>> +#include "tree-vectorizer.h"
>>
>>  /* This file should be included last.  */
>>  #include "target-def.h"
>>
>>
>> Is it counted as a obvious patch?
> 
> Please check if you can build a cc1 cross to arm, then yes.
> 

Thanks for the prompt review!

Yes, it worked to build a cross cc1.  I did a trivial adjustment
to align with the exisiting include order like other ports by putting
cfgloop.h just after cfghooks.h as below.  Will commit this new.

BR,
Kewen
---
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index caf4e56b9fe..9377aaef342 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -32,6 +32,7 @@
 #include "tree.h"
 #include "memmodel.h"
 #include "cfghooks.h"
+#include "cfgloop.h"
 #include "df.h"
 #include "tm_p.h"
 #include "stringpool.h"
@@ -69,6 +70,7 @@
 #include "gimplify.h"
 #include "gimple.h"
 #include "selftest.h"
+#include "tree-vectorizer.h"

 /* This file should be included last.  */
 #include "target-def.h"
diff mbox series

Patch

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 12625a4bee3..be883b61059 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -15437,7 +15437,10 @@  aarch64_add_stmt_cost (class vec_info *vinfo, void *data, int count,
 	 arbitrary and could potentially be improved with analysis.  */
       if (where == vect_body && stmt_info
 	  && stmt_in_inner_loop_p (vinfo, stmt_info))
-	count *= 50; /*  FIXME  */
+	{
+	  gcc_assert (loop_vinfo);
+	  count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); /*  FIXME  */
+	}
 
       retval = (unsigned) (count * stmt_cost);
       costs->region[where] += retval;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 340f7c95d76..223faa49b11 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12201,7 +12201,11 @@  arm_add_stmt_cost (vec_info *vinfo, void *data, int count,
 	 arbitrary and could potentially be improved with analysis.  */
       if (where == vect_body && stmt_info
 	  && stmt_in_inner_loop_p (vinfo, stmt_info))
-	count *= 50;  /* FIXME.  */
+	{
+	  loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
+	  gcc_assert (loop_vinfo);
+	  count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); /* FIXME.  */
+	}
 
       retval = (unsigned) (count * stmt_cost);
       cost[where] += retval;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7c41302c75b..43b1fb0de0b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22396,7 +22396,11 @@  ix86_add_stmt_cost (class vec_info *vinfo, void *data, int count,
      arbitrary and could potentially be improved with analysis.  */
   if (where == vect_body && stmt_info
       && stmt_in_inner_loop_p (vinfo, stmt_info))
-    count *= 50;  /* FIXME.  */
+    {
+      loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
+      gcc_assert (loop_vinfo);
+      count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); /* FIXME.  */
+    }
 
   retval = (unsigned) (count * stmt_cost);
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 48b8efd732b..859da8bd0ed 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5348,7 +5348,11 @@  rs6000_add_stmt_cost (class vec_info *vinfo, void *data, int count,
 	 arbitrary and could potentially be improved with analysis.  */
       if (where == vect_body && stmt_info
 	  && stmt_in_inner_loop_p (vinfo, stmt_info))
-	count *= 50;  /* FIXME.  */
+	{
+	  loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
+	  gcc_assert (loop_vinfo);
+	  count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); /* FIXME.  */
+	}
 
       retval = (unsigned) (count * stmt_cost);
       cost_data->cost[where] += retval;
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8b70fdf580d..2234801cab4 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14221,6 +14221,10 @@  code to iterate.  2 allows partial vector loads and stores in all loops.
 The parameter only has an effect on targets that support partial
 vector loads and stores.
 
+@item vect-inner-loop-cost-factor
+The factor which loop vectorizer uses to over weight those statements in
+an inner loop relative to the loop being vectorized.
+
 @item avoid-fma-max-bits
 Maximum number of bits for which we avoid creating FMAs.
 
diff --git a/gcc/params.opt b/gcc/params.opt
index 7c7aa78992a..a35c2abe359 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1089,4 +1089,8 @@  Bound on number of runtime checks inserted by the vectorizer's loop versioning f
 Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) IntegerRange(0, 2) Param Optimization
 Controls how loop vectorizer uses partial vectors.  0 means never, 1 means only for loops whose need to iterate can be removed, 2 means for all loops.  The default value is 2.
 
+-param=vect-inner-loop-cost-factor=
+Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 999999) Param Optimization
+Indicates the factor which loop vectorizer uses to over weight those statements in an inner loop relative to the loop being vectorized.  The default value is 50.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 952fad422eb..b595b7838af 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -90,6 +90,9 @@  along with GCC; see the file COPYING3.  If not see
 #include "attribs.h"
 #include "asan.h"
 #include "emit-rtl.h"
+#include "gimple.h"
+#include "cfgloop.h"
+#include "tree-vectorizer.h"
 
 bool
 default_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
@@ -1400,7 +1403,11 @@  default_add_stmt_cost (class vec_info *vinfo, void *data, int count,
       arbitrary and could potentially be improved with analysis.  */
   if (where == vect_body && stmt_info
       && stmt_in_inner_loop_p (vinfo, stmt_info))
-    count *= 50;  /* FIXME.  */
+    {
+      loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
+      gcc_assert (loop_vinfo);
+      count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo);
+    }
 
   retval = (unsigned) (count * stmt_cost);
   cost[where] += retval;
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 2aba503fef7..106c91964b5 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -836,6 +836,7 @@  _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     single_scalar_iteration_cost (0),
     vec_outside_cost (0),
     vec_inside_cost (0),
+    inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
     vectorizable (false),
     can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
     using_partial_vectors_p (false),
@@ -1237,7 +1238,7 @@  vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo)
   /* FORNOW.  */
   innerloop_iters = 1;
   if (loop->inner)
-    innerloop_iters = 50; /* FIXME */
+    innerloop_iters = LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo);
 
   for (i = 0; i < nbbs; i++)
     {
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 9861d9e8810..b8ba63cc8e2 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -689,6 +689,10 @@  public:
   /* The cost of the vector loop body.  */
   int vec_inside_cost;
 
+  /* The factor used to over weight those statements in an inner loop
+     relative to the loop being vectorized.  */
+  unsigned int inner_loop_cost_factor;
+
   /* Is the loop vectorizable? */
   bool vectorizable;
 
@@ -807,6 +811,7 @@  public:
 #define LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST(L) (L)->single_scalar_iteration_cost
 #define LOOP_VINFO_ORIG_LOOP_INFO(L)       (L)->orig_loop_info
 #define LOOP_VINFO_SIMD_IF_COND(L)         (L)->simd_if_cond
+#define LOOP_VINFO_INNER_LOOP_COST_FACTOR(L) (L)->inner_loop_cost_factor
 
 #define LOOP_VINFO_FULLY_MASKED_P(L)		\
   (LOOP_VINFO_USING_PARTIAL_VECTORS_P (L)	\