diff mbox series

[RFC] Should widening_mul should consider block frequency?

Message ID DA41BE1DDCA941489001C7FBD7A8820EE7DA2027@dggeml527-mbx.china.huawei.com
State New
Headers show
Series [RFC] Should widening_mul should consider block frequency? | expand

Commit Message

Yangfei (Felix) March 23, 2020, 9:52 a.m. UTC
Hi,

  I created a bug for this issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94269 
  Looks like widening_mul phase may move multiply instruction from outside the loop to inside the loop, merging with one add instruction inside the loop.  
  This will increase the cost of the loop at least on aarch64 (4 cycles vs 1 cycle).  I think widening_mul should consider block frequency when doing such a combination.  
  I mean something like:

  Comments?

Thanks,
Felix

Comments

Li, Pan2 via Gcc-patches March 23, 2020, 3:25 p.m. UTC | #1
On Mon, Mar 23, 2020 at 10:53 AM Yangfei (Felix) <felix.yang@huawei.com> wrote:
>
> Hi,
>
>   I created a bug for this issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94269
>   Looks like widening_mul phase may move multiply instruction from outside the loop to inside the loop, merging with one add instruction inside the loop.
>   This will increase the cost of the loop at least on aarch64 (4 cycles vs 1 cycle).  I think widening_mul should consider block frequency when doing such a combination.
>   I mean something like:

As written in the PR I'd follow fma generation and restrict defs to the same BB.

Richard.

> diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> index 54ba035..4439452 100644
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -2721,7 +2721,10 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple *stmt,
>      {
>        if (!has_single_use (rhs1)
>           || !is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
> -                                 &type2, &mult_rhs2))
> +                                 &type2, &mult_rhs2)
> +         || (gimple_bb (rhs1_stmt) != gimple_bb (stmt)
> +             && gimple_bb (rhs1_stmt)->count.to_frequency(cfun)
> +                < gimple_bb (stmt)->count.to_frequency(cfun)))
>         return false;
>        add_rhs = rhs2;
>        conv_stmt = conv1_stmt;
> @@ -2730,7 +2733,10 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple *stmt,
>      {
>        if (!has_single_use (rhs2)
>           || !is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
> -                                 &type2, &mult_rhs2))
> +                                 &type2, &mult_rhs2)
> +         || (gimple_bb (rhs2_stmt) != gimple_bb (stmt)
> +             && gimple_bb (rhs2_stmt)->count.to_frequency(cfun)
> +                < gimple_bb (stmt)->count.to_frequency(cfun)))
>         return false;
>        add_rhs = rhs1;
>        conv_stmt = conv2_stmt;
>
>   Comments?
>
> Thanks,
> Felix
Yangfei (Felix) March 24, 2020, 11:36 a.m. UTC | #2
Hi!

> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: Monday, March 23, 2020 11:25 PM
> To: Yangfei (Felix) <felix.yang@huawei.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [RFC] Should widening_mul should consider block frequency?
> 
> On Mon, Mar 23, 2020 at 10:53 AM Yangfei (Felix) <felix.yang@huawei.com>
> wrote:
> >
> > Hi,
> >
> >   I created a bug for this issue:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94269
> >   Looks like widening_mul phase may move multiply instruction from outside
> the loop to inside the loop, merging with one add instruction inside the loop.
> >   This will increase the cost of the loop at least on aarch64 (4 cycles vs 1
> cycle).  I think widening_mul should consider block frequency when doing such
> a combination.
> >   I mean something like:
> 
> As written in the PR I'd follow fma generation and restrict defs to the same BB.

Thanks for the suggestion.  That should be more consistent.  
Attached please find the adapted patch.  
Bootstrap and tested on both x86_64 and aarch64 Linux platform.  

gcc:
+2020-03-24  Felix Yang  <felix.yang@huawei.com>
+
+       PR tree-optimization/94269
+       * tree-ssa-math-opts.c (convert_plusminus_to_widen): Restrict this
+       operation to single basic block.

gcc/testsuite:
+2020-03-24  Felix Yang  <felix.yang@huawei.com>
+
+       PR tree-optimization/94269
+       * gcc.dg/pr94269.c: New test.
+

Thanks,
Felix
Li, Pan2 via Gcc-patches March 24, 2020, 2:14 p.m. UTC | #3
On Tue, Mar 24, 2020 at 12:37 PM Yangfei (Felix) <felix.yang@huawei.com> wrote:
>
> Hi!
>
> > -----Original Message-----
> > From: Richard Biener [mailto:richard.guenther@gmail.com]
> > Sent: Monday, March 23, 2020 11:25 PM
> > To: Yangfei (Felix) <felix.yang@huawei.com>
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [RFC] Should widening_mul should consider block frequency?
> >
> > On Mon, Mar 23, 2020 at 10:53 AM Yangfei (Felix) <felix.yang@huawei.com>
> > wrote:
> > >
> > > Hi,
> > >
> > >   I created a bug for this issue:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94269
> > >   Looks like widening_mul phase may move multiply instruction from outside
> > the loop to inside the loop, merging with one add instruction inside the loop.
> > >   This will increase the cost of the loop at least on aarch64 (4 cycles vs 1
> > cycle).  I think widening_mul should consider block frequency when doing such
> > a combination.
> > >   I mean something like:
> >
> > As written in the PR I'd follow fma generation and restrict defs to the same BB.
>
> Thanks for the suggestion.  That should be more consistent.
> Attached please find the adapted patch.
> Bootstrap and tested on both x86_64 and aarch64 Linux platform.

OK with moving the BB check before the is_widening_mult_p call
since it's way cheaper.

Thanks,
Richard.

> gcc:
> +2020-03-24  Felix Yang  <felix.yang@huawei.com>
> +
> +       PR tree-optimization/94269
> +       * tree-ssa-math-opts.c (convert_plusminus_to_widen): Restrict this
> +       operation to single basic block.
>
> gcc/testsuite:
> +2020-03-24  Felix Yang  <felix.yang@huawei.com>
> +
> +       PR tree-optimization/94269
> +       * gcc.dg/pr94269.c: New test.
> +
>
> Thanks,
> Felix
Yangfei (Felix) March 26, 2020, 1:06 a.m. UTC | #4
Hi!

> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: Tuesday, March 24, 2020 10:14 PM
> To: Yangfei (Felix) <felix.yang@huawei.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [RFC] Should widening_mul should consider block frequency?
> 
> > > As written in the PR I'd follow fma generation and restrict defs to the same
> BB.
> >
> > Thanks for the suggestion.  That should be more consistent.
> > Attached please find the adapted patch.
> > Bootstrap and tested on both x86_64 and aarch64 Linux platform.
> 
> OK with moving the BB check before the is_widening_mult_p call since it's way
> cheaper.

That's a good point.  I have attached the v2 patch.  
Also did a spec2017 test on aarch64, no obvious impact witnessed with this.  
Can you sponsor this patch please?  My networking does not work well and I am having some trouble pushing it : - (  

git commit msg: 

    widening_mul: restrict ops to be defined in the same basic-block when convert plusminus to widen

    In the testcase for PR94269, widening_mul moves two multiply instructions from outside the loop to inside
    the loop, merging with two add instructions separately.  This increases the cost of the loop.  Like FMA detection
    in the same pass, simply restrict ops to be defined in the same basic-block to avoid possibly moving multiply
    to a different block with a higher execution frequency.  

    2020-03-26  Felix Yang  <felix.yang@huawei.com>

        gcc/
        PR tree-optimization/94269
        * tree-ssa-math-opts.c (convert_plusminus_to_widen): Restrict this
        operation to single basic block.

        gcc/testsuite/
        PR tree-optimization/94269
        * gcc.dg/pr94269.c: New test.

change log:

gcc:
+2020-03-26  Felix Yang  <felix.yang@huawei.com>
+
+       PR tree-optimization/94269
+       * tree-ssa-math-opts.c (convert_plusminus_to_widen): Restrict this
+       operation to single basic block.

gcc/testsuite:
+2020-03-26  Felix Yang  <felix.yang@huawei.com>
+
+       PR tree-optimization/94269
+       * gcc.dg/pr94269.c: New test.
+
Li, Pan2 via Gcc-patches March 26, 2020, 7:37 a.m. UTC | #5
On Thu, Mar 26, 2020 at 2:06 AM Yangfei (Felix) <felix.yang@huawei.com> wrote:
>
> Hi!
>
> > -----Original Message-----
> > From: Richard Biener [mailto:richard.guenther@gmail.com]
> > Sent: Tuesday, March 24, 2020 10:14 PM
> > To: Yangfei (Felix) <felix.yang@huawei.com>
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [RFC] Should widening_mul should consider block frequency?
> >
> > > > As written in the PR I'd follow fma generation and restrict defs to the same
> > BB.
> > >
> > > Thanks for the suggestion.  That should be more consistent.
> > > Attached please find the adapted patch.
> > > Bootstrap and tested on both x86_64 and aarch64 Linux platform.
> >
> > OK with moving the BB check before the is_widening_mult_p call since it's way
> > cheaper.
>
> That's a good point.  I have attached the v2 patch.
> Also did a spec2017 test on aarch64, no obvious impact witnessed with this.
> Can you sponsor this patch please?  My networking does not work well and I am having some trouble pushing it : - (

Pushed.  For the future can you please attach patches suitable for git am?

Thanks,
Richard.

> git commit msg:
>
>     widening_mul: restrict ops to be defined in the same basic-block when convert plusminus to widen
>
>     In the testcase for PR94269, widening_mul moves two multiply instructions from outside the loop to inside
>     the loop, merging with two add instructions separately.  This increases the cost of the loop.  Like FMA detection
>     in the same pass, simply restrict ops to be defined in the same basic-block to avoid possibly moving multiply
>     to a different block with a higher execution frequency.
>
>     2020-03-26  Felix Yang  <felix.yang@huawei.com>
>
>         gcc/
>         PR tree-optimization/94269
>         * tree-ssa-math-opts.c (convert_plusminus_to_widen): Restrict this
>         operation to single basic block.
>
>         gcc/testsuite/
>         PR tree-optimization/94269
>         * gcc.dg/pr94269.c: New test.
>
> change log:
>
> gcc:
> +2020-03-26  Felix Yang  <felix.yang@huawei.com>
> +
> +       PR tree-optimization/94269
> +       * tree-ssa-math-opts.c (convert_plusminus_to_widen): Restrict this
> +       operation to single basic block.
>
> gcc/testsuite:
> +2020-03-26  Felix Yang  <felix.yang@huawei.com>
> +
> +       PR tree-optimization/94269
> +       * gcc.dg/pr94269.c: New test.
> +
Yangfei (Felix) March 26, 2020, 7:45 a.m. UTC | #6
> -----Original Message-----
> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: Thursday, March 26, 2020 3:37 PM
> To: Yangfei (Felix) <felix.yang@huawei.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [RFC] Should widening_mul should consider block frequency?
> 
> >
> > That's a good point.  I have attached the v2 patch.
> > Also did a spec2017 test on aarch64, no obvious impact witnessed with this.
> > Can you sponsor this patch please?  My networking does not work well
> > and I am having some trouble pushing it : - (
> 
> Pushed.  For the future can you please attach patches suitable for git am?

Sure, will do.  Thanks for the help : - )  

Felix
diff mbox series

Patch

diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 54ba035..4439452 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2721,7 +2721,10 @@  convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple *stmt,
     {
       if (!has_single_use (rhs1)
          || !is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
-                                 &type2, &mult_rhs2))
+                                 &type2, &mult_rhs2)
+         || (gimple_bb (rhs1_stmt) != gimple_bb (stmt)
+             && gimple_bb (rhs1_stmt)->count.to_frequency(cfun)
+                < gimple_bb (stmt)->count.to_frequency(cfun)))
        return false;
       add_rhs = rhs2;
       conv_stmt = conv1_stmt;
@@ -2730,7 +2733,10 @@  convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple *stmt,
     {
       if (!has_single_use (rhs2)
          || !is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
-                                 &type2, &mult_rhs2))
+                                 &type2, &mult_rhs2)
+         || (gimple_bb (rhs2_stmt) != gimple_bb (stmt)
+             && gimple_bb (rhs2_stmt)->count.to_frequency(cfun)
+                < gimple_bb (stmt)->count.to_frequency(cfun)))
        return false;
       add_rhs = rhs1;
       conv_stmt = conv2_stmt;