[1/2] correct BB frequencies after loop changed

Message ID	20201009101242.2478660-1-guojiufu@linux.ibm.com
State	New
Headers	show Return-Path: <gcc-patches-bounces@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org EA4E23861812 To: gcc-patches@gcc.gnu.org Subject: [PATCH 1/2] correct BB frequencies after loop changed Date: Fri, 9 Oct 2020 18:12:41 +0800 Message-Id: <20201009101242.2478660-1-guojiufu@linux.ibm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list From: guojiufu via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: guojiufu <guojiufu@linux.ibm.com> Cc: wschmidt@linux.ibm.com, dje.gcc@gmail.com, segher@kernel.crashing.org Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces@gcc.gnu.org>
Series	[1/2] correct BB frequencies after loop changed \| expand [1/2] correct BB frequencies after loop changed [2/2] reset edge probibility and BB-count for peeled/unrolled loop

Jiufu Guo Oct. 9, 2020, 10:12 a.m. UTC

When investigating the issue from https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549786.html
I find the BB COUNTs of loop seems are not accurate in some case.
For example:

In below figure:


               COUNT:268435456<bb 2>  pre-header
                        |
                        |  .--------------------.
                        |  |                    |
                        V  v                    |
               COUNT:805306369<bb 3>            |
                       / \                      |
                   33%/   \                     |
                     /     \                    |
                    v       v                   |
COUNT:268435456<bb 10>  COUNT:536870911<bb 15>  | 
    exit-edge                 |   latch         |
                              ._________________.

Those COUNTs have below equations:
COUNT of exit-edge:268435456 = COUNT of pre-header:268435456
COUNT of exit-edge:268435456 = COUNT of header:805306369 * 33
COUNT of header:805306369 = COUNT of pre-header:268435456 + COUNT of latch:536870911


While after pcom:

               COUNT:268435456<bb 2>  pre-header
                        |
                        |  .--------------------.
                        |  |                    |
                        V  v                    |
               COUNT:268435456<bb 3>            |
                       / \                      |
                   50%/   \                     |
                     /     \                    |
                    v       v                   |
COUNT:134217728<bb 10>  COUNT:134217728<bb 15>  | 
    exit-edge                 |   latch         |
                              ._________________.

COUNT<bb 3> != COUNT<bb 2> + COUNT<bb 15>
COUNT<bb 10> != COUNT<bb2>

In some cases, the probility of exit-edge is easy to estimate, then
those COUNTs of other BBs in loop can be re-caculated.

Bootstrap and regtest pass on ppc64le. Is this ok for trunk?

Jiufu

gcc/ChangeLog:
2020-10-09  Jiufu Guo   <guojiufu@linux.ibm.com>

	* cfgloopmanip.h (recompute_loop_frequencies): New function.
	* cfgloopmanip.c (recompute_loop_frequencies): New implementation.
	* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Call
	recompute_loop_frequencies.

---
 gcc/cfgloopmanip.c        | 53 +++++++++++++++++++++++++++++++++++++++
 gcc/cfgloopmanip.h        |  2 +-
 gcc/tree-ssa-loop-manip.c | 28 +++------------------
 3 files changed, 57 insertions(+), 26 deletions(-)

Jeff Law Nov. 17, 2020, 10:07 p.m. UTC | #1

Minor questions for Jan and Richi embedded below...

On 10/9/20 4:12 AM, guojiufu via Gcc-patches wrote:
> When investigating the issue from https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549786.html
> I find the BB COUNTs of loop seems are not accurate in some case.
> For example:
>
> In below figure:
>
>
>                COUNT:268435456<bb 2>  pre-header
>                         |
>                         |  .--------------------.
>                         |  |                    |
>                         V  v                    |
>                COUNT:805306369<bb 3>            |
>                        / \                      |
>                    33%/   \                     |
>                      /     \                    |
>                     v       v                   |
> COUNT:268435456<bb 10>  COUNT:536870911<bb 15>  | 
>     exit-edge                 |   latch         |
>                               ._________________.
>
> Those COUNTs have below equations:
> COUNT of exit-edge:268435456 = COUNT of pre-header:268435456
> COUNT of exit-edge:268435456 = COUNT of header:805306369 * 33
> COUNT of header:805306369 = COUNT of pre-header:268435456 + COUNT of latch:536870911
>
>
> While after pcom:
>
>                COUNT:268435456<bb 2>  pre-header
>                         |
>                         |  .--------------------.
>                         |  |                    |
>                         V  v                    |
>                COUNT:268435456<bb 3>            |
>                        / \                      |
>                    50%/   \                     |
>                      /     \                    |
>                     v       v                   |
> COUNT:134217728<bb 10>  COUNT:134217728<bb 15>  | 
>     exit-edge                 |   latch         |
>                               ._________________.
>
> COUNT<bb 3> != COUNT<bb 2> + COUNT<bb 15>
> COUNT<bb 10> != COUNT<bb2>
>
> In some cases, the probility of exit-edge is easy to estimate, then
> those COUNTs of other BBs in loop can be re-caculated.
>
> Bootstrap and regtest pass on ppc64le. Is this ok for trunk?
>
> Jiufu
>
> gcc/ChangeLog:
> 2020-10-09  Jiufu Guo   <guojiufu@linux.ibm.com>
>
> 	* cfgloopmanip.h (recompute_loop_frequencies): New function.
> 	* cfgloopmanip.c (recompute_loop_frequencies): New implementation.
> 	* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Call
> 	recompute_loop_frequencies.
>
> ---
>  gcc/cfgloopmanip.c        | 53 +++++++++++++++++++++++++++++++++++++++
>  gcc/cfgloopmanip.h        |  2 +-
>  gcc/tree-ssa-loop-manip.c | 28 +++------------------
>  3 files changed, 57 insertions(+), 26 deletions(-)
>
> diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
> index 73134a20e33..b0ca82a67fd 100644
> --- a/gcc/cfgloopmanip.c
> +++ b/gcc/cfgloopmanip.c
> @@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimplify-me.h"
>  #include "tree-ssa-loop-manip.h"
>  #include "dumpfile.h"
> +#include "cfgrtl.h"
>  
>  static void copy_loops_to (class loop **, int,
>  			   class loop *);
> @@ -1773,3 +1774,55 @@ loop_version (class loop *loop,
>  
>    return nloop;
>  }
> +
> +/* Recalculate the COUNTs of BBs in LOOP, if the probability of exit edge
> +   is NEW_PROB.  */
> +
> +bool
> +recompute_loop_frequencies (class loop *loop, profile_probability new_prob)
> +{
> +  edge exit = single_exit (loop);
> +  if (!exit)
> +    return false;
> +
> +  edge e;
> +  edge_iterator ei;
> +  edge non_exit;
> +  basic_block * bbs;
> +  profile_count exit_count = loop_preheader_edge (loop)->count ();
> +  profile_probability exit_p = exit_count.probability_in (loop->header->count);
> +  profile_count base_count = loop->header->count;
> +  profile_count after_num = base_count.apply_probability (exit_p);
> +  profile_count after_den = base_count.apply_probability (new_prob);
> +
> +  /* Update BB counts in loop body.
> +     COUNT<exit> = COUNT<preheader>
> +     COUNT<exit> = COUNT<header> * exit_edge_probility
> +     The COUNT<new_header> = COUNT<old_header> * old_exit_p / new_prob.  */
> +  bbs = get_loop_body (loop);
> +  scale_bbs_frequencies_profile_count (bbs, loop->num_nodes, after_num,
> +						     after_den);
> +  free (bbs);
> +
> +  /* Update probability and count of the BB besides exit edge (maybe latch).  */
> +  FOR_EACH_EDGE (e, ei, exit->src->succs)
> +    if (e != exit)
> +      break;
> +  non_exit = e;
Are we sure that exit->src has just two successors (will that case be
canonicalized before we get here?).  If it has > 2 successors, then I'm
pretty sure the frequencies get mucked up.  Richi could probably answer
whether or not the block with the loop exit edge can have > 2 successors.


> +
> +  non_exit->probability = new_prob.invert ();
> +  non_exit->dest->count = profile_count::zero ();
> +  FOR_EACH_EDGE (e, ei, non_exit->dest->preds)
> +    non_exit->dest->count += e->src->count.apply_probability (e->probability);
This generally looks sensible with the caveat that if exit->src has just
two successors.  If it can have more than two successors then I think we
need to distribute the count across them.

Jan, if we recompute non_exit->dest->count, then are there further
downstream effects that we need to account for?  ie, I'm worried we're
potentially introducing inconsistencies in the frequency data.



Jeff

Richard Biener Nov. 18, 2020, 7:28 a.m. UTC | #2

On Tue, 17 Nov 2020, Jeff Law wrote:

> 
> Minor questions for Jan and Richi embedded below...
> 
> On 10/9/20 4:12 AM, guojiufu via Gcc-patches wrote:
> > When investigating the issue from https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549786.html
> > I find the BB COUNTs of loop seems are not accurate in some case.
> > For example:
> >
> > In below figure:
> >
> >
> >                COUNT:268435456<bb 2>  pre-header
> >                         |
> >                         |  .--------------------.
> >                         |  |                    |
> >                         V  v                    |
> >                COUNT:805306369<bb 3>            |
> >                        / \                      |
> >                    33%/   \                     |
> >                      /     \                    |
> >                     v       v                   |
> > COUNT:268435456<bb 10>  COUNT:536870911<bb 15>  | 
> >     exit-edge                 |   latch         |
> >                               ._________________.
> >
> > Those COUNTs have below equations:
> > COUNT of exit-edge:268435456 = COUNT of pre-header:268435456
> > COUNT of exit-edge:268435456 = COUNT of header:805306369 * 33
> > COUNT of header:805306369 = COUNT of pre-header:268435456 + COUNT of latch:536870911
> >
> >
> > While after pcom:
> >
> >                COUNT:268435456<bb 2>  pre-header
> >                         |
> >                         |  .--------------------.
> >                         |  |                    |
> >                         V  v                    |
> >                COUNT:268435456<bb 3>            |
> >                        / \                      |
> >                    50%/   \                     |
> >                      /     \                    |
> >                     v       v                   |
> > COUNT:134217728<bb 10>  COUNT:134217728<bb 15>  | 
> >     exit-edge                 |   latch         |
> >                               ._________________.
> >
> > COUNT<bb 3> != COUNT<bb 2> + COUNT<bb 15>
> > COUNT<bb 10> != COUNT<bb2>
> >
> > In some cases, the probility of exit-edge is easy to estimate, then
> > those COUNTs of other BBs in loop can be re-caculated.
> >
> > Bootstrap and regtest pass on ppc64le. Is this ok for trunk?
> >
> > Jiufu
> >
> > gcc/ChangeLog:
> > 2020-10-09  Jiufu Guo   <guojiufu@linux.ibm.com>
> >
> > 	* cfgloopmanip.h (recompute_loop_frequencies): New function.
> > 	* cfgloopmanip.c (recompute_loop_frequencies): New implementation.
> > 	* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Call
> > 	recompute_loop_frequencies.
> >
> > ---
> >  gcc/cfgloopmanip.c        | 53 +++++++++++++++++++++++++++++++++++++++
> >  gcc/cfgloopmanip.h        |  2 +-
> >  gcc/tree-ssa-loop-manip.c | 28 +++------------------
> >  3 files changed, 57 insertions(+), 26 deletions(-)
> >
> > diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
> > index 73134a20e33..b0ca82a67fd 100644
> > --- a/gcc/cfgloopmanip.c
> > +++ b/gcc/cfgloopmanip.c
> > @@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "gimplify-me.h"
> >  #include "tree-ssa-loop-manip.h"
> >  #include "dumpfile.h"
> > +#include "cfgrtl.h"
> >  
> >  static void copy_loops_to (class loop **, int,
> >  			   class loop *);
> > @@ -1773,3 +1774,55 @@ loop_version (class loop *loop,
> >  
> >    return nloop;
> >  }
> > +
> > +/* Recalculate the COUNTs of BBs in LOOP, if the probability of exit edge
> > +   is NEW_PROB.  */
> > +
> > +bool
> > +recompute_loop_frequencies (class loop *loop, profile_probability new_prob)
> > +{
> > +  edge exit = single_exit (loop);
> > +  if (!exit)
> > +    return false;
> > +
> > +  edge e;
> > +  edge_iterator ei;
> > +  edge non_exit;
> > +  basic_block * bbs;
> > +  profile_count exit_count = loop_preheader_edge (loop)->count ();
> > +  profile_probability exit_p = exit_count.probability_in (loop->header->count);
> > +  profile_count base_count = loop->header->count;
> > +  profile_count after_num = base_count.apply_probability (exit_p);
> > +  profile_count after_den = base_count.apply_probability (new_prob);
> > +
> > +  /* Update BB counts in loop body.
> > +     COUNT<exit> = COUNT<preheader>
> > +     COUNT<exit> = COUNT<header> * exit_edge_probility
> > +     The COUNT<new_header> = COUNT<old_header> * old_exit_p / new_prob.  */
> > +  bbs = get_loop_body (loop);
> > +  scale_bbs_frequencies_profile_count (bbs, loop->num_nodes, after_num,
> > +						     after_den);
> > +  free (bbs);
> > +
> > +  /* Update probability and count of the BB besides exit edge (maybe latch).  */
> > +  FOR_EACH_EDGE (e, ei, exit->src->succs)
> > +    if (e != exit)
> > +      break;
> > +  non_exit = e;
> Are we sure that exit->src has just two successors (will that case be
> canonicalized before we get here?).? If it has > 2 successors, then I'm
> pretty sure the frequencies get mucked up.? Richi could probably answer
> whether or not the block with the loop exit edge can have > 2 successors.

There's nothing preventing more than two edges in the SRC generally
(the exit could be an edge off a switch).  But of course this function
is very likely called on loops that are countable which means niter
analysis has to handle it which in turn means all exits are controlled
by simple conditions on IVs.

I guess adding a gcc_assert (EDGE_COUNT (exit->src->succs) == 2) can't 
hurt (with a comment reflecting the above).

Richard.

> 
> > +
> > +  non_exit->probability = new_prob.invert ();
> > +  non_exit->dest->count = profile_count::zero ();
> > +  FOR_EACH_EDGE (e, ei, non_exit->dest->preds)
> > +    non_exit->dest->count += e->src->count.apply_probability (e->probability);
> This generally looks sensible with the caveat that if exit->src has just
> two successors.? If it can have more than two successors then I think we
> need to distribute the count across them.
> 
> Jan, if we recompute non_exit->dest->count, then are there further
> downstream effects that we need to account for?? ie, I'm worried we're
> potentially introducing inconsistencies in the frequency data.

Jeff Law Nov. 18, 2020, 11:45 p.m. UTC | #3

On 11/18/20 12:28 AM, Richard Biener wrote:
> On Tue, 17 Nov 2020, Jeff Law wrote:
>
>> Minor questions for Jan and Richi embedded below...
>>
>> On 10/9/20 4:12 AM, guojiufu via Gcc-patches wrote:
>>> When investigating the issue from https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549786.html
>>> I find the BB COUNTs of loop seems are not accurate in some case.
>>> For example:
>>>
>>> In below figure:
>>>
>>>
>>>                COUNT:268435456<bb 2>  pre-header
>>>                         |
>>>                         |  .--------------------.
>>>                         |  |                    |
>>>                         V  v                    |
>>>                COUNT:805306369<bb 3>            |
>>>                        / \                      |
>>>                    33%/   \                     |
>>>                      /     \                    |
>>>                     v       v                   |
>>> COUNT:268435456<bb 10>  COUNT:536870911<bb 15>  | 
>>>     exit-edge                 |   latch         |
>>>                               ._________________.
>>>
>>> Those COUNTs have below equations:
>>> COUNT of exit-edge:268435456 = COUNT of pre-header:268435456
>>> COUNT of exit-edge:268435456 = COUNT of header:805306369 * 33
>>> COUNT of header:805306369 = COUNT of pre-header:268435456 + COUNT of latch:536870911
>>>
>>>
>>> While after pcom:
>>>
>>>                COUNT:268435456<bb 2>  pre-header
>>>                         |
>>>                         |  .--------------------.
>>>                         |  |                    |
>>>                         V  v                    |
>>>                COUNT:268435456<bb 3>            |
>>>                        / \                      |
>>>                    50%/   \                     |
>>>                      /     \                    |
>>>                     v       v                   |
>>> COUNT:134217728<bb 10>  COUNT:134217728<bb 15>  | 
>>>     exit-edge                 |   latch         |
>>>                               ._________________.
>>>
>>> COUNT<bb 3> != COUNT<bb 2> + COUNT<bb 15>
>>> COUNT<bb 10> != COUNT<bb2>
>>>
>>> In some cases, the probility of exit-edge is easy to estimate, then
>>> those COUNTs of other BBs in loop can be re-caculated.
>>>
>>> Bootstrap and regtest pass on ppc64le. Is this ok for trunk?
>>>
>>> Jiufu
>>>
>>> gcc/ChangeLog:
>>> 2020-10-09  Jiufu Guo   <guojiufu@linux.ibm.com>
>>>
>>> 	* cfgloopmanip.h (recompute_loop_frequencies): New function.
>>> 	* cfgloopmanip.c (recompute_loop_frequencies): New implementation.
>>> 	* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Call
>>> 	recompute_loop_frequencies.
>>>
>>> ---
>>>  gcc/cfgloopmanip.c        | 53 +++++++++++++++++++++++++++++++++++++++
>>>  gcc/cfgloopmanip.h        |  2 +-
>>>  gcc/tree-ssa-loop-manip.c | 28 +++------------------
>>>  3 files changed, 57 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
>>> index 73134a20e33..b0ca82a67fd 100644
>>> --- a/gcc/cfgloopmanip.c
>>> +++ b/gcc/cfgloopmanip.c
>>> @@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "gimplify-me.h"
>>>  #include "tree-ssa-loop-manip.h"
>>>  #include "dumpfile.h"
>>> +#include "cfgrtl.h"
>>>  
>>>  static void copy_loops_to (class loop **, int,
>>>  			   class loop *);
>>> @@ -1773,3 +1774,55 @@ loop_version (class loop *loop,
>>>  
>>>    return nloop;
>>>  }
>>> +
>>> +/* Recalculate the COUNTs of BBs in LOOP, if the probability of exit edge
>>> +   is NEW_PROB.  */
>>> +
>>> +bool
>>> +recompute_loop_frequencies (class loop *loop, profile_probability new_prob)
>>> +{
>>> +  edge exit = single_exit (loop);
>>> +  if (!exit)
>>> +    return false;
>>> +
>>> +  edge e;
>>> +  edge_iterator ei;
>>> +  edge non_exit;
>>> +  basic_block * bbs;
>>> +  profile_count exit_count = loop_preheader_edge (loop)->count ();
>>> +  profile_probability exit_p = exit_count.probability_in (loop->header->count);
>>> +  profile_count base_count = loop->header->count;
>>> +  profile_count after_num = base_count.apply_probability (exit_p);
>>> +  profile_count after_den = base_count.apply_probability (new_prob);
>>> +
>>> +  /* Update BB counts in loop body.
>>> +     COUNT<exit> = COUNT<preheader>
>>> +     COUNT<exit> = COUNT<header> * exit_edge_probility
>>> +     The COUNT<new_header> = COUNT<old_header> * old_exit_p / new_prob.  */
>>> +  bbs = get_loop_body (loop);
>>> +  scale_bbs_frequencies_profile_count (bbs, loop->num_nodes, after_num,
>>> +						     after_den);
>>> +  free (bbs);
>>> +
>>> +  /* Update probability and count of the BB besides exit edge (maybe latch).  */
>>> +  FOR_EACH_EDGE (e, ei, exit->src->succs)
>>> +    if (e != exit)
>>> +      break;
>>> +  non_exit = e;
>> Are we sure that exit->src has just two successors (will that case be
>> canonicalized before we get here?).? If it has > 2 successors, then I'm
>> pretty sure the frequencies get mucked up.? Richi could probably answer
>> whether or not the block with the loop exit edge can have > 2 successors.
> There's nothing preventing more than two edges in the SRC generally
> (the exit could be an edge off a switch).
That's precisely the case I was concerned about.

>   But of course this function
> is very likely called on loops that are countable which means niter
> analysis has to handle it which in turn means all exits are controlled
> by simple conditions on IVs.
Thanks.  It sounds like it's unlikely we'll have > 2 out edges.
>
> I guess adding a gcc_assert (EDGE_COUNT (exit->src->succs) == 2) can't 
> hurt (with a comment reflecting the above).
Sounds good to me.   Just catching this case if it happens is good
enough for me -- if it trips we can come back and adjust the code to
distribute across the edges.

So if Jan could chime in on the downstream edge updates question then I
think we'd be ready to move forward.

jeff

Jiufu Guo Nov. 24, 2020, 5:44 a.m. UTC | #4

Jeff Law <law@redhat.com> writes:

> On 11/18/20 12:28 AM, Richard Biener wrote:
>> On Tue, 17 Nov 2020, Jeff Law wrote:
>>
>>> Minor questions for Jan and Richi embedded below...
>>>
>>> On 10/9/20 4:12 AM, guojiufu via Gcc-patches wrote:
>>>> When investigating the issue from https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549786.html
>>>> I find the BB COUNTs of loop seems are not accurate in some case.
>>>> For example:
>>>>
>>>> In below figure:
>>>>
>>>>
>>>>                COUNT:268435456<bb 2>  pre-header
>>>>                         |
>>>>                         |  .--------------------.
>>>>                         |  |                    |
>>>>                         V  v                    |
>>>>                COUNT:805306369<bb 3>            |
>>>>                        / \                      |
>>>>                    33%/   \                     |
>>>>                      /     \                    |
>>>>                     v       v                   |
>>>> COUNT:268435456<bb 10>  COUNT:536870911<bb 15>  | 
>>>>     exit-edge                 |   latch         |
>>>>                               ._________________.
>>>>
>>>> Those COUNTs have below equations:
>>>> COUNT of exit-edge:268435456 = COUNT of pre-header:268435456
>>>> COUNT of exit-edge:268435456 = COUNT of header:805306369 * 33
>>>> COUNT of header:805306369 = COUNT of pre-header:268435456 + COUNT of latch:536870911
>>>>
>>>>
>>>> While after pcom:
>>>>
>>>>                COUNT:268435456<bb 2>  pre-header
>>>>                         |
>>>>                         |  .--------------------.
>>>>                         |  |                    |
>>>>                         V  v                    |
>>>>                COUNT:268435456<bb 3>            |
>>>>                        / \                      |
>>>>                    50%/   \                     |
>>>>                      /     \                    |
>>>>                     v       v                   |
>>>> COUNT:134217728<bb 10>  COUNT:134217728<bb 15>  | 
>>>>     exit-edge                 |   latch         |
>>>>                               ._________________.
>>>>
>>>> COUNT<bb 3> != COUNT<bb 2> + COUNT<bb 15>
>>>> COUNT<bb 10> != COUNT<bb2>
>>>>
>>>> In some cases, the probility of exit-edge is easy to estimate, then
>>>> those COUNTs of other BBs in loop can be re-caculated.
>>>>
>>>> Bootstrap and regtest pass on ppc64le. Is this ok for trunk?
>>>>
>>>> Jiufu
>>>>
>>>> gcc/ChangeLog:
>>>> 2020-10-09  Jiufu Guo   <guojiufu@linux.ibm.com>
>>>>
>>>> 	* cfgloopmanip.h (recompute_loop_frequencies): New function.
>>>> 	* cfgloopmanip.c (recompute_loop_frequencies): New implementation.
>>>> 	* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Call
>>>> 	recompute_loop_frequencies.
>>>>
>>>> ---
>>>>  gcc/cfgloopmanip.c        | 53 +++++++++++++++++++++++++++++++++++++++
>>>>  gcc/cfgloopmanip.h        |  2 +-
>>>>  gcc/tree-ssa-loop-manip.c | 28 +++------------------
>>>>  3 files changed, 57 insertions(+), 26 deletions(-)
>>>>
>>>> diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
>>>> index 73134a20e33..b0ca82a67fd 100644
>>>> --- a/gcc/cfgloopmanip.c
>>>> +++ b/gcc/cfgloopmanip.c
>>>> @@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>  #include "gimplify-me.h"
>>>>  #include "tree-ssa-loop-manip.h"
>>>>  #include "dumpfile.h"
>>>> +#include "cfgrtl.h"
>>>>  
>>>>  static void copy_loops_to (class loop **, int,
>>>>  			   class loop *);
>>>> @@ -1773,3 +1774,55 @@ loop_version (class loop *loop,
>>>>  
>>>>    return nloop;
>>>>  }
>>>> +
>>>> +/* Recalculate the COUNTs of BBs in LOOP, if the probability of exit edge
>>>> +   is NEW_PROB.  */
>>>> +
>>>> +bool
>>>> +recompute_loop_frequencies (class loop *loop, profile_probability new_prob)
>>>> +{
>>>> +  edge exit = single_exit (loop);
>>>> +  if (!exit)
>>>> +    return false;
>>>> +
>>>> +  edge e;
>>>> +  edge_iterator ei;
>>>> +  edge non_exit;
>>>> +  basic_block * bbs;
>>>> +  profile_count exit_count = loop_preheader_edge (loop)->count ();
>>>> +  profile_probability exit_p = exit_count.probability_in (loop->header->count);
>>>> +  profile_count base_count = loop->header->count;
>>>> +  profile_count after_num = base_count.apply_probability (exit_p);
>>>> +  profile_count after_den = base_count.apply_probability (new_prob);
>>>> +
>>>> +  /* Update BB counts in loop body.
>>>> +     COUNT<exit> = COUNT<preheader>
>>>> +     COUNT<exit> = COUNT<header> * exit_edge_probility
>>>> +     The COUNT<new_header> = COUNT<old_header> * old_exit_p / new_prob.  */
>>>> +  bbs = get_loop_body (loop);
>>>> +  scale_bbs_frequencies_profile_count (bbs, loop->num_nodes, after_num,
>>>> +						     after_den);
>>>> +  free (bbs);
>>>> +
>>>> +  /* Update probability and count of the BB besides exit edge (maybe latch).  */
>>>> +  FOR_EACH_EDGE (e, ei, exit->src->succs)
>>>> +    if (e != exit)
>>>> +      break;
>>>> +  non_exit = e;
>>> Are we sure that exit->src has just two successors (will that case be
>>> canonicalized before we get here?).? If it has > 2 successors, then I'm
>>> pretty sure the frequencies get mucked up.? Richi could probably answer
>>> whether or not the block with the loop exit edge can have > 2 successors.
>> There's nothing preventing more than two edges in the SRC generally
>> (the exit could be an edge off a switch).
> That's precisely the case I was concerned about.
>
>>   But of course this function
>> is very likely called on loops that are countable which means niter
>> analysis has to handle it which in turn means all exits are controlled
>> by simple conditions on IVs.
> Thanks.  It sounds like it's unlikely we'll have > 2 out edges.
>>
>> I guess adding a gcc_assert (EDGE_COUNT (exit->src->succs) == 2) can't 
>> hurt (with a comment reflecting the above).
> Sounds good to me.   Just catching this case if it happens is good
> enough for me -- if it trips we can come back and adjust the code to
> distribute across the edges.
With this gcc_assert, run bootstrap and regression test, no failure
occur.
For this patch, in the original code, there is code:
-  new_nonexit = single_pred_edge (loop->latch);
-  prob = new_nonexit->probability;
-  new_nonexit->probability = new_exit->probability.invert ();
Which is also assume 2 successors.  So, it may relative safe.

Thanks,
Jiufu Guo.

>
> So if Jan could chime in on the downstream edge updates question then I
> think we'd be ready to move forward.
>
> jeff

Jiufu Guo Dec. 4, 2020, 3:05 a.m. UTC | #5

Jiufu Guo <guojiufu@linux.ibm.com> writes:

> Jeff Law <law@redhat.com> writes:
>
>> On 11/18/20 12:28 AM, Richard Biener wrote:
>>> On Tue, 17 Nov 2020, Jeff Law wrote:
>>>
>>>> Minor questions for Jan and Richi embedded below...
>>>>
>>>> On 10/9/20 4:12 AM, guojiufu via Gcc-patches wrote:
>>>>> When investigating the issue from https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549786.html
>>>>> I find the BB COUNTs of loop seems are not accurate in some case.
>>>>> For example:
>>>>>
>>>>> In below figure:
>>>>>
>>>>>
>>>>>                COUNT:268435456<bb 2>  pre-header
>>>>>                         |
>>>>>                         |  .--------------------.
>>>>>                         |  |                    |
>>>>>                         V  v                    |
>>>>>                COUNT:805306369<bb 3>            |
>>>>>                        / \                      |
>>>>>                    33%/   \                     |
>>>>>                      /     \                    |
>>>>>                     v       v                   |
>>>>> COUNT:268435456<bb 10>  COUNT:536870911<bb 15>  | 
>>>>>     exit-edge                 |   latch         |
>>>>>                               ._________________.
>>>>>
>>>>> Those COUNTs have below equations:
>>>>> COUNT of exit-edge:268435456 = COUNT of pre-header:268435456
>>>>> COUNT of exit-edge:268435456 = COUNT of header:805306369 * 33
>>>>> COUNT of header:805306369 = COUNT of pre-header:268435456 + COUNT of latch:536870911
>>>>>
>>>>>
>>>>> While after pcom:
>>>>>
>>>>>                COUNT:268435456<bb 2>  pre-header
>>>>>                         |
>>>>>                         |  .--------------------.
>>>>>                         |  |                    |
>>>>>                         V  v                    |
>>>>>                COUNT:268435456<bb 3>            |
>>>>>                        / \                      |
>>>>>                    50%/   \                     |
>>>>>                      /     \                    |
>>>>>                     v       v                   |
>>>>> COUNT:134217728<bb 10>  COUNT:134217728<bb 15>  | 
>>>>>     exit-edge                 |   latch         |
>>>>>                               ._________________.
>>>>>
>>>>> COUNT<bb 3> != COUNT<bb 2> + COUNT<bb 15>
>>>>> COUNT<bb 10> != COUNT<bb2>
>>>>>
>>>>> In some cases, the probility of exit-edge is easy to estimate, then
>>>>> those COUNTs of other BBs in loop can be re-caculated.
>>>>>
>>>>> Bootstrap and regtest pass on ppc64le. Is this ok for trunk?
>>>>>
>>>>> Jiufu
>>>>>
>>>>> gcc/ChangeLog:
>>>>> 2020-10-09  Jiufu Guo   <guojiufu@linux.ibm.com>
>>>>>
>>>>> 	* cfgloopmanip.h (recompute_loop_frequencies): New function.
>>>>> 	* cfgloopmanip.c (recompute_loop_frequencies): New implementation.
>>>>> 	* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Call
>>>>> 	recompute_loop_frequencies.
>>>>>
>>>>> ---
>>>>>  gcc/cfgloopmanip.c        | 53 +++++++++++++++++++++++++++++++++++++++
>>>>>  gcc/cfgloopmanip.h        |  2 +-
>>>>>  gcc/tree-ssa-loop-manip.c | 28 +++------------------
>>>>>  3 files changed, 57 insertions(+), 26 deletions(-)
>>>>>
>>>>> diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
>>>>> index 73134a20e33..b0ca82a67fd 100644
>>>>> --- a/gcc/cfgloopmanip.c
>>>>> +++ b/gcc/cfgloopmanip.c
>>>>> @@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>>  #include "gimplify-me.h"
>>>>>  #include "tree-ssa-loop-manip.h"
>>>>>  #include "dumpfile.h"
>>>>> +#include "cfgrtl.h"
>>>>>  
>>>>>  static void copy_loops_to (class loop **, int,
>>>>>  			   class loop *);
>>>>> @@ -1773,3 +1774,55 @@ loop_version (class loop *loop,
>>>>>  
>>>>>    return nloop;
>>>>>  }
>>>>> +
>>>>> +/* Recalculate the COUNTs of BBs in LOOP, if the probability of exit edge
>>>>> +   is NEW_PROB.  */
>>>>> +
>>>>> +bool
>>>>> +recompute_loop_frequencies (class loop *loop, profile_probability new_prob)
>>>>> +{
>>>>> +  edge exit = single_exit (loop);
>>>>> +  if (!exit)
>>>>> +    return false;
>>>>> +
>>>>> +  edge e;
>>>>> +  edge_iterator ei;
>>>>> +  edge non_exit;
>>>>> +  basic_block * bbs;
>>>>> +  profile_count exit_count = loop_preheader_edge (loop)->count ();
>>>>> +  profile_probability exit_p = exit_count.probability_in (loop->header->count);
>>>>> +  profile_count base_count = loop->header->count;
>>>>> +  profile_count after_num = base_count.apply_probability (exit_p);
>>>>> +  profile_count after_den = base_count.apply_probability (new_prob);
>>>>> +
>>>>> +  /* Update BB counts in loop body.
>>>>> +     COUNT<exit> = COUNT<preheader>
>>>>> +     COUNT<exit> = COUNT<header> * exit_edge_probility
>>>>> +     The COUNT<new_header> = COUNT<old_header> * old_exit_p / new_prob.  */
>>>>> +  bbs = get_loop_body (loop);
>>>>> +  scale_bbs_frequencies_profile_count (bbs, loop->num_nodes, after_num,
>>>>> +						     after_den);
>>>>> +  free (bbs);
>>>>> +
>>>>> +  /* Update probability and count of the BB besides exit edge (maybe latch).  */
>>>>> +  FOR_EACH_EDGE (e, ei, exit->src->succs)
>>>>> +    if (e != exit)
>>>>> +      break;
>>>>> +  non_exit = e;
>>>> Are we sure that exit->src has just two successors (will that case be
>>>> canonicalized before we get here?).? If it has > 2 successors, then I'm
>>>> pretty sure the frequencies get mucked up.? Richi could probably answer
>>>> whether or not the block with the loop exit edge can have > 2 successors.
>>> There's nothing preventing more than two edges in the SRC generally
>>> (the exit could be an edge off a switch).
>> That's precisely the case I was concerned about.
>>
>>>   But of course this function
>>> is very likely called on loops that are countable which means niter
>>> analysis has to handle it which in turn means all exits are controlled
>>> by simple conditions on IVs.
>> Thanks.  It sounds like it's unlikely we'll have > 2 out edges.
>>>
>>> I guess adding a gcc_assert (EDGE_COUNT (exit->src->succs) == 2) can't 
>>> hurt (with a comment reflecting the above).
>> Sounds good to me.   Just catching this case if it happens is good
>> enough for me -- if it trips we can come back and adjust the code to
>> distribute across the edges.
> With this gcc_assert, run bootstrap and regression test, no failure
> occur.
> For this patch, in the original code, there is code:
> -  new_nonexit = single_pred_edge (loop->latch);
> -  prob = new_nonexit->probability;
> -  new_nonexit->probability = new_exit->probability.invert ();
> Which is also assume 2 successors.  So, it may relative safe.
>
> Thanks,
> Jiufu Guo.
>
>>
>> So if Jan could chime in on the downstream edge updates question then I
>> think we'd be ready to move forward.
>>
>> jeff

Hi,

I saw Jeff say ok for patch [PATCH 2/2]
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/560833.html.

I'm wondering if we can approval this patch [PATCH 1/2]
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html.
and then I may commit these patches to trunk. :)

Thanks,
Jiufu Guo.

Jiufu Guo Dec. 4, 2020, 6:17 a.m. UTC | #6

Jiufu Guo <guojiufu@linux.ibm.com> writes:

> Jiufu Guo <guojiufu@linux.ibm.com> writes:
>
>> Jeff Law <law@redhat.com> writes:
>>
>>> On 11/18/20 12:28 AM, Richard Biener wrote:
>>>> On Tue, 17 Nov 2020, Jeff Law wrote:
>>>>
>>>>> Minor questions for Jan and Richi embedded below...
>>>>>
>>>>> On 10/9/20 4:12 AM, guojiufu via Gcc-patches wrote:
>>>>>> When investigating the issue from https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549786.html
>>>>>> I find the BB COUNTs of loop seems are not accurate in some case.
>>>>>> For example:
>>>>>>
>>>>>> In below figure:
>>>>>>
>>>>>>
>>>>>>                COUNT:268435456<bb 2>  pre-header
>>>>>>                         |
>>>>>>                         |  .--------------------.
>>>>>>                         |  |                    |
>>>>>>                         V  v                    |
>>>>>>                COUNT:805306369<bb 3>            |
>>>>>>                        / \                      |
>>>>>>                    33%/   \                     |
>>>>>>                      /     \                    |
>>>>>>                     v       v                   |
>>>>>> COUNT:268435456<bb 10>  COUNT:536870911<bb 15>  | 
>>>>>>     exit-edge                 |   latch         |
>>>>>>                               ._________________.
>>>>>>
>>>>>> Those COUNTs have below equations:
>>>>>> COUNT of exit-edge:268435456 = COUNT of pre-header:268435456
>>>>>> COUNT of exit-edge:268435456 = COUNT of header:805306369 * 33
>>>>>> COUNT of header:805306369 = COUNT of pre-header:268435456 + COUNT of latch:536870911
>>>>>>
>>>>>>
>>>>>> While after pcom:
>>>>>>
>>>>>>                COUNT:268435456<bb 2>  pre-header
>>>>>>                         |
>>>>>>                         |  .--------------------.
>>>>>>                         |  |                    |
>>>>>>                         V  v                    |
>>>>>>                COUNT:268435456<bb 3>            |
>>>>>>                        / \                      |
>>>>>>                    50%/   \                     |
>>>>>>                      /     \                    |
>>>>>>                     v       v                   |
>>>>>> COUNT:134217728<bb 10>  COUNT:134217728<bb 15>  | 
>>>>>>     exit-edge                 |   latch         |
>>>>>>                               ._________________.
>>>>>>
>>>>>> COUNT<bb 3> != COUNT<bb 2> + COUNT<bb 15>
>>>>>> COUNT<bb 10> != COUNT<bb2>
>>>>>>
>>>>>> In some cases, the probility of exit-edge is easy to estimate, then
>>>>>> those COUNTs of other BBs in loop can be re-caculated.
>>>>>>
>>>>>> Bootstrap and regtest pass on ppc64le. Is this ok for trunk?
>>>>>>
>>>>>> Jiufu
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>> 2020-10-09  Jiufu Guo   <guojiufu@linux.ibm.com>
>>>>>>
>>>>>> 	* cfgloopmanip.h (recompute_loop_frequencies): New function.
>>>>>> 	* cfgloopmanip.c (recompute_loop_frequencies): New implementation.
>>>>>> 	* tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Call
>>>>>> 	recompute_loop_frequencies.
>>>>>>
>>>>>> ---
>>>>>>  gcc/cfgloopmanip.c        | 53 +++++++++++++++++++++++++++++++++++++++
>>>>>>  gcc/cfgloopmanip.h        |  2 +-
>>>>>>  gcc/tree-ssa-loop-manip.c | 28 +++------------------
>>>>>>  3 files changed, 57 insertions(+), 26 deletions(-)
>>>>>>
>>>>>> diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
>>>>>> index 73134a20e33..b0ca82a67fd 100644
>>>>>> --- a/gcc/cfgloopmanip.c
>>>>>> +++ b/gcc/cfgloopmanip.c
>>>>>> @@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
>>>>>>  #include "gimplify-me.h"
>>>>>>  #include "tree-ssa-loop-manip.h"
>>>>>>  #include "dumpfile.h"
>>>>>> +#include "cfgrtl.h"
>>>>>>  
>>>>>>  static void copy_loops_to (class loop **, int,
>>>>>>  			   class loop *);
>>>>>> @@ -1773,3 +1774,55 @@ loop_version (class loop *loop,
>>>>>>  
>>>>>>    return nloop;
>>>>>>  }
>>>>>> +
>>>>>> +/* Recalculate the COUNTs of BBs in LOOP, if the probability of exit edge
>>>>>> +   is NEW_PROB.  */
>>>>>> +
>>>>>> +bool
>>>>>> +recompute_loop_frequencies (class loop *loop, profile_probability new_prob)
>>>>>> +{
>>>>>> +  edge exit = single_exit (loop);
>>>>>> +  if (!exit)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  edge e;
>>>>>> +  edge_iterator ei;
>>>>>> +  edge non_exit;
>>>>>> +  basic_block * bbs;
>>>>>> +  profile_count exit_count = loop_preheader_edge (loop)->count ();
>>>>>> +  profile_probability exit_p = exit_count.probability_in (loop->header->count);
>>>>>> +  profile_count base_count = loop->header->count;
>>>>>> +  profile_count after_num = base_count.apply_probability (exit_p);
>>>>>> +  profile_count after_den = base_count.apply_probability (new_prob);
>>>>>> +
>>>>>> +  /* Update BB counts in loop body.
>>>>>> +     COUNT<exit> = COUNT<preheader>
>>>>>> +     COUNT<exit> = COUNT<header> * exit_edge_probility
>>>>>> +     The COUNT<new_header> = COUNT<old_header> * old_exit_p / new_prob.  */
>>>>>> +  bbs = get_loop_body (loop);
>>>>>> +  scale_bbs_frequencies_profile_count (bbs, loop->num_nodes, after_num,
>>>>>> +						     after_den);
>>>>>> +  free (bbs);
>>>>>> +
>>>>>> +  /* Update probability and count of the BB besides exit edge (maybe latch).  */
>>>>>> +  FOR_EACH_EDGE (e, ei, exit->src->succs)
>>>>>> +    if (e != exit)
>>>>>> +      break;
>>>>>> +  non_exit = e;
>>>>> Are we sure that exit->src has just two successors (will that case be
>>>>> canonicalized before we get here?).? If it has > 2 successors, then I'm
>>>>> pretty sure the frequencies get mucked up.? Richi could probably answer
>>>>> whether or not the block with the loop exit edge can have > 2 successors.
>>>> There's nothing preventing more than two edges in the SRC generally
>>>> (the exit could be an edge off a switch).
>>> That's precisely the case I was concerned about.
>>>
>>>>   But of course this function
>>>> is very likely called on loops that are countable which means niter
>>>> analysis has to handle it which in turn means all exits are controlled
>>>> by simple conditions on IVs.
>>> Thanks.  It sounds like it's unlikely we'll have > 2 out edges.
>>>>
>>>> I guess adding a gcc_assert (EDGE_COUNT (exit->src->succs) == 2) can't 
>>>> hurt (with a comment reflecting the above).
>>> Sounds good to me.   Just catching this case if it happens is good
>>> enough for me -- if it trips we can come back and adjust the code to
>>> distribute across the edges.
>> With this gcc_assert, run bootstrap and regression test, no failure
>> occur.
>> For this patch, in the original code, there is code:
>> -  new_nonexit = single_pred_edge (loop->latch);
>> -  prob = new_nonexit->probability;
>> -  new_nonexit->probability = new_exit->probability.invert ();
>> Which is also assume 2 successors.  So, it may relative safe.
>>
>> Thanks,
>> Jiufu Guo.
>>
>>>
>>> So if Jan could chime in on the downstream edge updates question then I
>>> think we'd be ready to move forward.
Oh, this may be indicate 'approval with comments', right? :)

Thanks,
Jiufu Guo.

>>>
>>> jeff
>
> Hi,
>
> I saw Jeff say ok for patch [PATCH 2/2]
> https://gcc.gnu.org/pipermail/gcc-patches/2020-December/560833.html.
>
> I'm wondering if we can approval this patch [PATCH 1/2]
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html.
> and then I may commit these patches to trunk. :)
>
> Thanks,
> Jiufu Guo.

Martin Liška Dec. 4, 2020, 6:59 a.m. UTC | #7

On 12/4/20 7:17 AM, Jiufu Guo via Gcc-patches wrote:
> Oh, this may be indicate 'approval with comments', right?:)

Yes, Honza can you please review the patch?

Thanks,
Martin

Jiufu Guo May 7, 2021, 2:36 a.m. UTC | #8

Gentle ping.

Original message: 
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html


Thanks,
Jiufu Guo.

Jiufu Guo May 20, 2021, 7:19 a.m. UTC | #9

Gentle ping^.

On 2021-05-07 10:36, guojiufu via Gcc-patches wrote:
> Gentle ping.
> 
> Original message:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html
> 
> 
> Thanks,
> Jiufu Guo.

Jiufu Guo June 7, 2021, 2:37 a.m. UTC | #10

Gentle ping ;)

BR.
Jiufu Guo
On 2021-05-20 15:19, guojiufu via Gcc-patches wrote:
> Gentle ping^.
> 
> On 2021-05-07 10:36, guojiufu via Gcc-patches wrote:
>> Gentle ping.
>> 
>> Original message:
>> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html
>> 
>> 
>> Thanks,
>> Jiufu Guo.

Jeff Law June 14, 2021, 2:38 a.m. UTC | #11

On 5/6/2021 8:36 PM, guojiufu via Gcc-patches wrote:
> Gentle ping.
>
> Original message: 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html
I think you need a more aggressive ping  :-)

OK for the trunk.  Sorry for the long delay.  I kept hoping someone else 
would step in and look at it.

jeff

Jan Hubicka June 14, 2021, 9:16 a.m. UTC | #12

> 
> 
> On 5/6/2021 8:36 PM, guojiufu via Gcc-patches wrote:
> > Gentle ping.
> > 
> > Original message:
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html
> I think you need a more aggressive ping  :-)
> 
> OK for the trunk.  Sorry for the long delay.  I kept hoping someone else
> would step in and look at it.
Sorry, the patch was on my todo list to think through for a while :(
It seems to me that both old and new code needs bit more work.  First
the exit loop frequency is set to

 prob = profile_probability::always ().apply_scale (1, new_est_niter + 1);

which is only correct if the estimated number of iterations is accurate.
If we do not have profile feedback and trip count is not known precisely
in most cases it won't be.  We estimate loops to iterate about 3 times
and then niter_for_unrolled_loop will apply the capping to 5 iterations
that is completely arbitrary.

Forcing exit probability to precise may then disable futher loop
optimizations since after the change we will think we know the loop
iterates 5 times and thus it is not worthy for loop opt (which is quite
oposite with the fact that we are just unrolling it thinking it is hot).

Old code does
 1) scale body down so only one iteration is done
 2) set exit edge probability to be 1/(new_est_iter+1)
    precisely
 3) scale up accoring to the 1/new_nonexit_prob
    which would be correct if the nonexit probability was updated to
    1-exit_probability but that does not seem to happen.

New code does
 1) give up when there are multiple exits.
    I wonder how common this is - we do outer loop vectorizaiton
 2) adjust loop body count according to the exit
 3) updat profile of BB after the exit edge.

Why do you need:
+  if (current_ir_type () != IR_GIMPLE)
+    update_br_prob_note (exit->src);

It is tree_transform_and_unroll_loop, so I think we should always have IR_GIMPLE?

Honza
> 
> jeff

Jiufu Guo June 15, 2021, 4:57 a.m. UTC | #13

On 2021-06-14 17:16, Jan Hubicka wrote:
>> 
>> 
>> On 5/6/2021 8:36 PM, guojiufu via Gcc-patches wrote:
>> > Gentle ping.
>> >
>> > Original message:
>> > https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html
>> I think you need a more aggressive ping  :-)
>> 
>> OK for the trunk.  Sorry for the long delay.  I kept hoping someone 
>> else
>> would step in and look at it.
> Sorry, the patch was on my todo list to think through for a while :(
> It seems to me that both old and new code needs bit more work.  First
> the exit loop frequency is set to
> 
>  prob = profile_probability::always ().apply_scale (1, new_est_niter + 
> 1);
> 
> which is only correct if the estimated number of iterations is 
> accurate.
> If we do not have profile feedback and trip count is not known 
> precisely
> in most cases it won't be.  We estimate loops to iterate about 3 times
> and then niter_for_unrolled_loop will apply the capping to 5 iterations
> that is completely arbitrary.
> 
> Forcing exit probability to precise may then disable futher loop
> optimizations since after the change we will think we know the loop
> iterates 5 times and thus it is not worthy for loop opt (which is quite
> oposite with the fact that we are just unrolling it thinking it is 
> hot).

Thanks, understand your concern, both new and old code are assuming the
the number of iterations is accurate.
Maybe we could add code to reset exit probability for the case
where "!count_in.reliable_p ()".

> 
> Old code does
>  1) scale body down so only one iteration is done
>  2) set exit edge probability to be 1/(new_est_iter+1)
>     precisely
>  3) scale up accoring to the 1/new_nonexit_prob
>     which would be correct if the nonexit probability was updated to
>     1-exit_probability but that does not seem to happen.
> 
> New code does
Yes, this is intended: we know that the enter-count should be
equal to the exit-count of one loop, and then the
"loop-body-count * exit-probability = exit-count".
Also, the entry count of the loop would not be changed before and after
one optimization (or slightly change,e.g. peeling count).

Based on this, we could adjust the loop body count according to
exit-count (or say enter-count) and exit-probability, when the
exit-probability is easy to estimate.

>  1) give up when there are multiple exits.
>     I wonder how common this is - we do outer loop vectorizaiton

The computation in the new code is based on a single exit. This is
also a requirement of old code, and it would be true when run to here.

>  2) adjust loop body count according to the exit
>  3) updat profile of BB after the exit edge.
> 


> Why do you need:
> +  if (current_ir_type () != IR_GIMPLE)
> +    update_br_prob_note (exit->src);
> 
> It is tree_transform_and_unroll_loop, so I think we should always have
> IR_GIMPLE?

These two lines are added to "recompute_loop_frequencies" which can be 
used
in rtl, like the second patch of this:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555872.html
Oh, maybe these two lines code would be put to 
tree_transform_and_unroll_loop
instead of common code recompute_loop_frequencies.

Thanks a lot for the review in your busy time!

BR.
Jiufu Guo
> 
> Honza
>> 
>> jeff

Jiufu Guo June 18, 2021, 8:24 a.m. UTC | #14

On 2021-06-15 12:57, guojiufu via Gcc-patches wrote:
> On 2021-06-14 17:16, Jan Hubicka wrote:
>>> 
>>> 
>>> On 5/6/2021 8:36 PM, guojiufu via Gcc-patches wrote:
>>> > Gentle ping.
>>> >
>>> > Original message:
>>> > https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html
>>> I think you need a more aggressive ping  :-)
>>> 
>>> OK for the trunk.  Sorry for the long delay.  I kept hoping someone 
>>> else
>>> would step in and look at it.
>> Sorry, the patch was on my todo list to think through for a while :(
>> It seems to me that both old and new code needs bit more work.  First
>> the exit loop frequency is set to
>> 
>>  prob = profile_probability::always ().apply_scale (1, new_est_niter + 
>> 1);
>> 
>> which is only correct if the estimated number of iterations is 
>> accurate.
>> If we do not have profile feedback and trip count is not known 
>> precisely
>> in most cases it won't be.  We estimate loops to iterate about 3 times
>> and then niter_for_unrolled_loop will apply the capping to 5 
>> iterations
>> that is completely arbitrary.
>> 
>> Forcing exit probability to precise may then disable futher loop
>> optimizations since after the change we will think we know the loop
>> iterates 5 times and thus it is not worthy for loop opt (which is 
>> quite
>> oposite with the fact that we are just unrolling it thinking it is 
>> hot).
> 
> Thanks, understand your concern, both new and old code are assuming the
> the number of iterations is accurate.
> Maybe we could add code to reset exit probability for the case
> where "!count_in.reliable_p ()".
> 
>> 
>> Old code does
>>  1) scale body down so only one iteration is done
>>  2) set exit edge probability to be 1/(new_est_iter+1)
>>     precisely
>>  3) scale up accoring to the 1/new_nonexit_prob
>>     which would be correct if the nonexit probability was updated to
>>     1-exit_probability but that does not seem to happen.
>> 
>> New code does
> Yes, this is intended: we know that the enter-count should be
> equal to the exit-count of one loop, and then the
> "loop-body-count * exit-probability = exit-count".
> Also, the entry count of the loop would not be changed before and after
> one optimization (or slightly change,e.g. peeling count).
> 
> Based on this, we could adjust the loop body count according to
> exit-count (or say enter-count) and exit-probability, when the
> exit-probability is easy to estimate.
> 
>>  1) give up when there are multiple exits.
>>     I wonder how common this is - we do outer loop vectorizaiton

Hi Honza, and guys:

I just had a statistic for bootstrap/test and spec2017 build and find
there are ~1700 times of single loops are hit this code; in spec2017 
build,
it hits 226 single-exit loops, and multi-exit loops are not hit.

Had a test with profile-report to see "mismatch count', with these 
patches
we may say the "mismatch count' is mitigated slightly, but not very 
aggressive:
150 mismatch counts are reduced.
But 119 mismatch counts are increased.

Any comments about this patch? Is it acceptable for the trunk? Thanks.


BR,
Jiufu Guo.


> 
> The computation in the new code is based on a single exit. This is
> also a requirement of old code, and it would be true when run to here.
> 
>>  2) adjust loop body count according to the exit
>>  3) updat profile of BB after the exit edge.
>> 
> 
> 
>> Why do you need:
>> +  if (current_ir_type () != IR_GIMPLE)
>> +    update_br_prob_note (exit->src);
>> 
>> It is tree_transform_and_unroll_loop, so I think we should always have
>> IR_GIMPLE?
> 
> These two lines are added to "recompute_loop_frequencies" which can be 
> used
> in rtl, like the second patch of this:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555872.html
> Oh, maybe these two lines code would be put to 
> tree_transform_and_unroll_loop
> instead of common code recompute_loop_frequencies.
> 
> Thanks a lot for the review in your busy time!
> 
> BR.
> Jiufu Guo
>> 
>> Honza
>>> 
>>> jeff

Jiufu Guo July 5, 2021, 3:13 a.m. UTC | #15

Hi Honza and All,

After more checks, I'm thinking these patches may still be useful.
For patch 1:

https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html

This patch recalculates the loop's BB-count and could correct
some BB-count mismatch for loops which has a single exit.
 From the test result, we could say it reduce mismatched BB-counts
slightly.

For patch 2:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555872.html
I updated as below:
It reset the loop's probability when the loop count becomes 
unrealistically
small.  In theory, it seems this would be the right direction to do 
this.

Bootstrap/regtest on powerpc64le with no new regressions. I'm thinking 
if
this is acceptable for trunk?

BR,
Jiufu Guo

Subject: Reset edge probability and BB-count for peeled/unrolled loop

This patch fix handles the case where unrolling in an unreliable count
number can cause a loop to no longer look hot and therefore not get 
aligned.
This patch scale by profile_probability::likely () if unrolled count 
gets
unrealistically small.  And this patch fixes the COUNT/PROB of peeled 
loop.

gcc/ChangeLog:
2021-07-01  Jiufu Guo   <guojiufu@linux.ibm.com>
	    Pat Haugen  <pthaugen@us.ibm.com>

	PR rtl-optimization/68212
	* cfgloopmanip.c (duplicate_loop_to_header_edge): Reset probablity
	of unrolled/peeled loop.

testsuite/ChangeLog:
2021-07-01  Jiufu Guo   <guojiufu@linux.ibm.com>
	    Pat Haugen  <pthaugen@us.ibm.com>
	PR rtl-optimization/68212
	* gcc.dg/pr68212.c: New test.


---
  gcc/cfgloopmanip.c             | 20 ++++++++++++++++++--
  gcc/testsuite/gcc.dg/pr68212.c | 13 +++++++++++++
  2 files changed, 31 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/pr68212.c

diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
index 4a9ab74642c..29d858c878a 100644
--- a/gcc/cfgloopmanip.c
+++ b/gcc/cfgloopmanip.c
@@ -1258,14 +1258,30 @@ duplicate_loop_to_header_edge (class loop *loop, 
edge e,
  	  /* If original loop is executed COUNT_IN times, the unrolled
  	     loop will account SCALE_MAIN_DEN times.  */
  	  scale_main = count_in.probability_in (scale_main_den);
+
+	  /* If we are guessing at the number of iterations and count_in
+	     becomes unrealistically small, reset probability.  */
+	  if (!(count_in.reliable_p () || loop->any_estimate))
+	    {
+	      profile_count new_count_in = count_in.apply_probability 
(scale_main);
+	      profile_count preheader_count = loop_preheader_edge 
(loop)->count ();
+	      if (new_count_in.apply_scale (1, 10) < preheader_count)
+		scale_main = profile_probability::likely ();
+	    }
+
  	  scale_act = scale_main * prob_pass_main;
  	}
        else
  	{
+	  profile_count new_loop_count;
  	  profile_count preheader_count = e->count ();
-	  for (i = 0; i < ndupl; i++)
-	    scale_main = scale_main * scale_step[i];
  	  scale_act = preheader_count.probability_in (count_in);
+	  /* Compute final preheader count after peeling NDUPL copies.  */
+	  for (i = 0; i < ndupl; i++)
+	    preheader_count = preheader_count.apply_probability 
(scale_step[i]);
+	  /* Subtract out exit(s) from peeled copies.  */
+	  new_loop_count = count_in - (e->count () - preheader_count);
+	  scale_main = new_loop_count.probability_in (count_in);
  	}
      }

diff --git a/gcc/testsuite/gcc.dg/pr68212.c 
b/gcc/testsuite/gcc.dg/pr68212.c
new file mode 100644
index 00000000000..e0cf71d5202
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68212.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-tree-vectorize -funroll-loops --param 
max-unroll-times=4 -fdump-rtl-alignments" } */
+
+void foo(long int *a, long int *b, long int n)
+{
+  long int i;
+
+  for (i = 0; i < n; i++)
+    a[i] = *b;
+}
+
+/* { dg-final { scan-rtl-dump-times "internal loop alignment added" 1 
"alignments"} } */
+

[1/2] correct BB frequencies after loop changed

Commit Message

Comments

Patch