diff mbox

rebuild frequency after vrp

Message ID CAO2gOZWFOuG2O0RLkkT5+tZOpDg+-=pBQk--88X+A-qObgDayw@mail.gmail.com
State New
Headers show

Commit Message

Dehao Chen June 2, 2014, 3:43 p.m. UTC
This patch rebuilds frequency after vrp.

Bootstrapped and testing on-going. OK for trunk if test pass?

Thanks,
Dehao

gcc/ChangeLog:
2014-06-02  Dehao Chen  <dehao@google.com>

        PR tree-optimization/61384
        * tree-vrp.c (execute_vrp): rebuild frequency after vrp.

gcc/testsuite/ChangeLog:
2014-06-02  Dehao Chen  <dehao@google.com>

        PR tree-optimization/61384
        * gcc.dg/pr61384.c: New testcase.

Comments

Jan Hubicka June 2, 2014, 4:13 p.m. UTC | #1
> This patch rebuilds frequency after vrp.

Why do you need to rebuild frequency after VRP?  I always tought it may be
useful to do VRP as early optimization (modulo to compile time costs),
but I do not think we should unconditionally rebuild frequencies like this...

Honza
> 
> Bootstrapped and testing on-going. OK for trunk if test pass?
> 
> Thanks,
> Dehao
> 
> gcc/ChangeLog:
> 2014-06-02  Dehao Chen  <dehao@google.com>
> 
>         PR tree-optimization/61384
>         * tree-vrp.c (execute_vrp): rebuild frequency after vrp.
> 
> gcc/testsuite/ChangeLog:
> 2014-06-02  Dehao Chen  <dehao@google.com>
> 
>         PR tree-optimization/61384
>         * gcc.dg/pr61384.c: New testcase.
> 
> Index: gcc/testsuite/gcc.dg/pr61384.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/pr61384.c (revision 0)
> +++ gcc/testsuite/gcc.dg/pr61384.c (revision 0)
> @@ -0,0 +1,32 @@
> +/* PR tree-optimization/61384 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +int a, b, d, e, g, h;
> +static int c = 1, f[5];
> +
> +int
> +fn1 (int p1, int p2)
> +{
> +  return p1 && p2 && p2;
> +}
> +
> +void
> +fn2 ()
> +{
> +  for (d = 0; d < 1; d++)
> +    {
> +      g = f[0];
> +      e = 0;
> +      h = fn1 (1, (a && c) ^ b);
> +    }
> +  for (; e < 5; e++)
> +    f[e] = 1;
> +}
> +
> +int
> +main ()
> +{
> +  fn2 ();
> +  return 0;
> +}
> 
> Index: gcc/tree-vrp.c
> ===================================================================
> --- gcc/tree-vrp.c (revision 211137)
> +++ gcc/tree-vrp.c (working copy)
> @@ -9794,7 +9794,7 @@ execute_vrp (void)
> 
>    scev_finalize ();
>    loop_optimizer_finalize ();
> -  return 0;
> +  return TODO_rebuild_frequencies;
>  }
> 
>  namespace {
Dehao Chen June 2, 2014, 4:17 p.m. UTC | #2
We need to rebuild frequency after vrp, otherwise the following code
in tree-ssa-threadupdate.c will make the frequency larger than
upper-bound.

          /* Excessive jump threading may make frequencies large enough so
             the computation overflows.  */
          if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2)
            rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e);

This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384

Thanks,
Dehao

On Mon, Jun 2, 2014 at 9:13 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> This patch rebuilds frequency after vrp.
>
> Why do you need to rebuild frequency after VRP?  I always tought it may be
> useful to do VRP as early optimization (modulo to compile time costs),
> but I do not think we should unconditionally rebuild frequencies like this...
>
> Honza
>>
>> Bootstrapped and testing on-going. OK for trunk if test pass?
>>
>> Thanks,
>> Dehao
>>
>> gcc/ChangeLog:
>> 2014-06-02  Dehao Chen  <dehao@google.com>
>>
>>         PR tree-optimization/61384
>>         * tree-vrp.c (execute_vrp): rebuild frequency after vrp.
>>
>> gcc/testsuite/ChangeLog:
>> 2014-06-02  Dehao Chen  <dehao@google.com>
>>
>>         PR tree-optimization/61384
>>         * gcc.dg/pr61384.c: New testcase.
>>
>> Index: gcc/testsuite/gcc.dg/pr61384.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/pr61384.c (revision 0)
>> +++ gcc/testsuite/gcc.dg/pr61384.c (revision 0)
>> @@ -0,0 +1,32 @@
>> +/* PR tree-optimization/61384 */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O3" } */
>> +
>> +int a, b, d, e, g, h;
>> +static int c = 1, f[5];
>> +
>> +int
>> +fn1 (int p1, int p2)
>> +{
>> +  return p1 && p2 && p2;
>> +}
>> +
>> +void
>> +fn2 ()
>> +{
>> +  for (d = 0; d < 1; d++)
>> +    {
>> +      g = f[0];
>> +      e = 0;
>> +      h = fn1 (1, (a && c) ^ b);
>> +    }
>> +  for (; e < 5; e++)
>> +    f[e] = 1;
>> +}
>> +
>> +int
>> +main ()
>> +{
>> +  fn2 ();
>> +  return 0;
>> +}
>>
>> Index: gcc/tree-vrp.c
>> ===================================================================
>> --- gcc/tree-vrp.c (revision 211137)
>> +++ gcc/tree-vrp.c (working copy)
>> @@ -9794,7 +9794,7 @@ execute_vrp (void)
>>
>>    scev_finalize ();
>>    loop_optimizer_finalize ();
>> -  return 0;
>> +  return TODO_rebuild_frequencies;
>>  }
>>
>>  namespace {
Jeff Law June 2, 2014, 4:45 p.m. UTC | #3
On 06/02/14 10:17, Dehao Chen wrote:
> We need to rebuild frequency after vrp, otherwise the following code
> in tree-ssa-threadupdate.c will make the frequency larger than
> upper-bound.
>
>            /* Excessive jump threading may make frequencies large enough so
>               the computation overflows.  */
>            if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2)
>              rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e);
>
> This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384
Can you try this with Teresa's revamping of the jump threading frequency 
updates?  It's still in my queue of things to review.

Fixing this stuff in the updater would be better than rebuilding the 
frequencies, IMHO.

Jeff
Dehao Chen June 2, 2014, 5:26 p.m. UTC | #4
Just tried with Teresa's patch, the ICE in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 is not resolved.

Dehao

On Mon, Jun 2, 2014 at 9:45 AM, Jeff Law <law@redhat.com> wrote:
> On 06/02/14 10:17, Dehao Chen wrote:
>>
>> We need to rebuild frequency after vrp, otherwise the following code
>> in tree-ssa-threadupdate.c will make the frequency larger than
>> upper-bound.
>>
>>            /* Excessive jump threading may make frequencies large enough
>> so
>>               the computation overflows.  */
>>            if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2)
>>              rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e);
>>
>> This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384
>
> Can you try this with Teresa's revamping of the jump threading frequency
> updates?  It's still in my queue of things to review.
>
> Fixing this stuff in the updater would be better than rebuilding the
> frequencies, IMHO.
>
> Jeff
>
Teresa Johnson June 2, 2014, 6:04 p.m. UTC | #5
On Mon, Jun 2, 2014 at 10:26 AM, Dehao Chen <dehao@google.com> wrote:
> Just tried with Teresa's patch, the ICE in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 is not resolved.

I will take a look to see why this wasn't fixed by my profile redesign.

Teresa

>
> Dehao
>
> On Mon, Jun 2, 2014 at 9:45 AM, Jeff Law <law@redhat.com> wrote:
>> On 06/02/14 10:17, Dehao Chen wrote:
>>>
>>> We need to rebuild frequency after vrp, otherwise the following code
>>> in tree-ssa-threadupdate.c will make the frequency larger than
>>> upper-bound.
>>>
>>>            /* Excessive jump threading may make frequencies large enough
>>> so
>>>               the computation overflows.  */
>>>            if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2)
>>>              rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e);
>>>
>>> This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384
>>
>> Can you try this with Teresa's revamping of the jump threading frequency
>> updates?  It's still in my queue of things to review.
>>
>> Fixing this stuff in the updater would be better than rebuilding the
>> frequencies, IMHO.
>>
>> Jeff
>>
Jan Hubicka June 2, 2014, 6:07 p.m. UTC | #6
> On 06/02/14 10:17, Dehao Chen wrote:
> >We need to rebuild frequency after vrp, otherwise the following code
> >in tree-ssa-threadupdate.c will make the frequency larger than
> >upper-bound.
> >
> >           /* Excessive jump threading may make frequencies large enough so
> >              the computation overflows.  */
> >           if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2)
> >             rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e);
> >
> >This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384
> Can you try this with Teresa's revamping of the jump threading
> frequency updates?  It's still in my queue of things to review.
> 
> Fixing this stuff in the updater would be better than rebuilding the
> frequencies, IMHO.

Agreed. However one of problems is that the frequencies after VRP may be unfixable.
If i.e. you have

if (a)
  prob 20
else
  prob 80
if (a)
  prob 50
else
  prob 50
that can easily happen if the second conditional was more convoluted earlier in
the optimization queue.

We may track when this change and ask for frequency rebuild, but issue is that
I am not completely sure if that is going to give us consistently more reliable
profiles than without: the problem is that branch probabilities themselves may
be misupdated by earlier passes.  It may be interesting to do some analysis how
well estimated profile corelate with measured profile thorough the
optimization.

The problem above would be less troublesome if we jump threaded at least some
cases pre-branch prediction.

It is one of reasions why I think it would be cool to do jump threading in
early opts, too, at least in a lightweidt form. Now when VRP is able to annotate
SSA names, VRP results could feed loop estimation analysis for i.e. get upper
bound on iterations of:

if (a<50)
  for (i=0;i<a;i++)

that can in turn improve branch prediction. Also we can feed the ranges into
ipa-cp and get ipa-cp to propagate ranges rather than constants.  This will get
us free non-NULL propagation. IPA-inline's predicates can take use of value ranges
because they record conditionals and also the results can be fed back to post-IPA
VRP (by setting value ranges of default def SSA names of the parmeters)

The down side is, of course, that VRP is considered expensive.

Honza
> 
> Jeff
Jan Hubicka June 2, 2014, 6:13 p.m. UTC | #7
> We need to rebuild frequency after vrp, otherwise the following code
> in tree-ssa-threadupdate.c will make the frequency larger than
> upper-bound.
> 
>           /* Excessive jump threading may make frequencies large enough so
>              the computation overflows.  */
>           if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2)
>             rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e);
> 
> This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384

I see, I still think C++ conversion gives us chance to turn both counts and
frequencies to types that are safe WRT such calculations.  Counts can be either
software floats as discussed or fixed point 64bit type (I think we only need
something like 8-10 bits to get problems Teresa is running into resolved - the
cases where expanding statements gets sub 1 counts that needs to be diferent
from 0) that does caps instead of overflow. Frequencies can be either the same
or 32bit fixed point with capping as we do, sort of, now.

I would really welcome if someone worked on this rather than papering around the
bugs in current hand written fixed point arithmetics. (I am currently occupied
by other projects, so I am not sure I can do it myself really soon)

Honza
Jeff Law June 2, 2014, 6:13 p.m. UTC | #8
On 06/02/14 12:07, Jan Hubicka wrote:
>
> It is one of reasions why I think it would be cool to do jump threading in
> early opts, too, at least in a lightweidt form.
Conceptually it's pretty easy to do during the into-ssa step.  Not sure 
it it'd catch the cases you care about though.

jeff
Teresa Johnson June 2, 2014, 6:36 p.m. UTC | #9
On Mon, Jun 2, 2014 at 11:04 AM, Teresa Johnson <tejohnson@google.com> wrote:
> On Mon, Jun 2, 2014 at 10:26 AM, Dehao Chen <dehao@google.com> wrote:
>> Just tried with Teresa's patch, the ICE in
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384 is not resolved.
>
> I will take a look to see why this wasn't fixed by my profile redesign.

Turns out this is a case of garbage-in, garbage-out. Jump threading is
producing a block with a frequency >10K, but it is directly due to a
block with an invalid frequency after the upstream ccp2 pass. After
ccp2:

;;   basic block 3, loop depth 1, count 0, freq 9100, maybe hot
;;    prev block 2, next block 4, flags: (NEW, REACHABLE)
;;    pred:       10 [91.0%]  (TRUE_VALUE,EXECUTABLE)
...
;;    succ:       5 [50.0%]  (TRUE_VALUE,EXECUTABLE)
;;                4 [50.0%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 4, loop depth 1, count 0, freq 6825, maybe hot
;;   Invalid sum of incoming frequencies 4550, should be 6825
;;    prev block 3, next block 5, flags: (NEW, REACHABLE)
;;    pred:       3 [50.0%]  (FALSE_VALUE,EXECUTABLE)
;; , discriminator 4
;;    succ:       5 [100.0%]  (FALLTHRU,EXECUTABLE)


Jump threading introduced BB 13, with predecessors 3 and 4 shown above:

;;   basic block 13, loop depth 1, count 0, freq 11375, maybe hot
;;   Invalid sum of outgoing probabilities 40.0%
;;    prev block 12, next block 1, flags: (NEW, REACHABLE)
;;    pred:       4 [100.0%]  (FALLTHRU,EXECUTABLE)
;;                3 [50.0%]  (TRUE_VALUE,EXECUTABLE)

11375 = 9100/2 + 6825

If bb 4 had the correct frequency of 4550 (9100/2), this block would
have gotten the correct count of 9100. BB 13's successor has the
correct frequency of 9100.

Teresa

>
> Teresa
>
>>
>> Dehao
>>
>> On Mon, Jun 2, 2014 at 9:45 AM, Jeff Law <law@redhat.com> wrote:
>>> On 06/02/14 10:17, Dehao Chen wrote:
>>>>
>>>> We need to rebuild frequency after vrp, otherwise the following code
>>>> in tree-ssa-threadupdate.c will make the frequency larger than
>>>> upper-bound.
>>>>
>>>>            /* Excessive jump threading may make frequencies large enough
>>>> so
>>>>               the computation overflows.  */
>>>>            if (rd->dup_blocks[0]->frequency < BB_FREQ_MAX * 2)
>>>>              rd->dup_blocks[0]->frequency += EDGE_FREQUENCY (e);
>>>>
>>>> This is referring to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61384
>>>
>>> Can you try this with Teresa's revamping of the jump threading frequency
>>> updates?  It's still in my queue of things to review.
>>>
>>> Fixing this stuff in the updater would be better than rebuilding the
>>> frequencies, IMHO.
>>>
>>> Jeff
>>>
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413
Jan Hubicka June 2, 2014, 8:27 p.m. UTC | #10
> On 06/02/14 12:07, Jan Hubicka wrote:
> >
> >It is one of reasions why I think it would be cool to do jump threading in
> >early opts, too, at least in a lightweidt form.
> Conceptually it's pretty easy to do during the into-ssa step.  Not
> sure it it'd catch the cases you care about though.

Yep, it may be an option, too. I always had in mind Muchnick style cheap
DOM pass run during into-SSA or as very first cleanup to get rid of unnecesary
code quickly and cheaply. Our DOM is bit different beast though :).
Very basic jump threading may fit here - never really tought about that.

You probably know better than me how many of threading oppurtunities are
"obvious" and how many are the difficult ones solved by VRP&DOM. My gut feeling
is that doing the obvious ones may be good enough. But I do not know.

Honza
> 
> jeff
>
Jeff Law June 2, 2014, 8:35 p.m. UTC | #11
On 06/02/14 14:27, Jan Hubicka wrote:
>> On 06/02/14 12:07, Jan Hubicka wrote:
>>>
>>> It is one of reasions why I think it would be cool to do jump threading in
>>> early opts, too, at least in a lightweidt form.
>> Conceptually it's pretty easy to do during the into-ssa step.  Not
>> sure it it'd catch the cases you care about though.
>
> Yep, it may be an option, too. I always had in mind Muchnick style cheap
> DOM pass run during into-SSA or as very first cleanup to get rid of unnecesary
> code quickly and cheaply. Our DOM is bit different beast though :).
> Very basic jump threading may fit here - never really tought about that.
We had all this about 10 years ago ;-)  DOM actually started its life a 
Morgan-esque bolt onto the renamer.



> You probably know better than me how many of threading oppurtunities are
> "obvious" and how many are the difficult ones solved by VRP&DOM. My gut feeling
> is that doing the obvious ones may be good enough. But I do not know.
A huge number of them are trivially derivable by just looking at the PHI 
nodes in the current block and I've pondered many times having a simpler 
jump threading pass that runs early in the gimple pipeline.

Jeff
Jan Hubicka June 2, 2014, 8:44 p.m. UTC | #12
> On 06/02/14 14:27, Jan Hubicka wrote:
> >>On 06/02/14 12:07, Jan Hubicka wrote:
> >>>
> >>>It is one of reasions why I think it would be cool to do jump threading in
> >>>early opts, too, at least in a lightweidt form.
> >>Conceptually it's pretty easy to do during the into-ssa step.  Not
> >>sure it it'd catch the cases you care about though.
> >
> >Yep, it may be an option, too. I always had in mind Muchnick style cheap
> >DOM pass run during into-SSA or as very first cleanup to get rid of unnecesary
> >code quickly and cheaply. Our DOM is bit different beast though :).
> >Very basic jump threading may fit here - never really tought about that.
> We had all this about 10 years ago ;-)  DOM actually started its
> life a Morgan-esque bolt onto the renamer.

Yep, I know :)) But being one of first SSA passes, it envoled to be bit too
smart for what I want.

I used to have some numbers on running dom or VRP early. I suppose I can
re-test them and see how much benefits that makes.  This things are bit hard
to get - strenghtening early opts always makes inliner to increase its activity
making a mess of code size comparsions.
> 
> 
> 
> >You probably know better than me how many of threading oppurtunities are
> >"obvious" and how many are the difficult ones solved by VRP&DOM. My gut feeling
> >is that doing the obvious ones may be good enough. But I do not know.
> A huge number of them are trivially derivable by just looking at the
> PHI nodes in the current block and I've pondered many times having a
> simpler jump threading pass that runs early in the gimple pipeline.

Yep, I think it would make sense indeed even though we currently have 3 threading
passes and I am responsible for one of them already :)

Honza
> 
> Jeff
diff mbox

Patch

Index: gcc/testsuite/gcc.dg/pr61384.c
===================================================================
--- gcc/testsuite/gcc.dg/pr61384.c (revision 0)
+++ gcc/testsuite/gcc.dg/pr61384.c (revision 0)
@@ -0,0 +1,32 @@ 
+/* PR tree-optimization/61384 */
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int a, b, d, e, g, h;
+static int c = 1, f[5];
+
+int
+fn1 (int p1, int p2)
+{
+  return p1 && p2 && p2;
+}
+
+void
+fn2 ()
+{
+  for (d = 0; d < 1; d++)
+    {
+      g = f[0];
+      e = 0;
+      h = fn1 (1, (a && c) ^ b);
+    }
+  for (; e < 5; e++)
+    f[e] = 1;
+}
+
+int
+main ()
+{
+  fn2 ();
+  return 0;
+}

Index: gcc/tree-vrp.c
===================================================================
--- gcc/tree-vrp.c (revision 211137)
+++ gcc/tree-vrp.c (working copy)
@@ -9794,7 +9794,7 @@  execute_vrp (void)

   scev_finalize ();
   loop_optimizer_finalize ();
-  return 0;
+  return TODO_rebuild_frequencies;
 }

 namespace {