
Change the badness computation to ensure no integer underflow

Message ID CAO2gOZXexPV1rb9ag_Xcv2sfcgOB9Uuyah1uremcMdMu1LG2Lw@mail.gmail.com
State New

Commit Message

Dehao Chen June 24, 2013, 10:45 p.m. UTC
The original patch has a flaw. The new patch is attached.
Bootstrapped and passed regression tests.

Thanks,
Dehao

On Mon, Jun 24, 2013 at 9:44 AM, Dehao Chen <dehao@google.com> wrote:
> Hi, Richard,
>
> I've updated the patch (as attached) to use sreal to compute badness.
>
> Thanks,
> Dehao
>
> On Mon, Jun 24, 2013 at 5:41 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Sat, Jun 22, 2013 at 2:59 AM, Dehao Chen <dehao@google.com> wrote:
>>> This patch prevents integer underflow in the badness computation in ipa-inline.
>>>
>>> Bootstrapped and passed regression tests.
>>>
>>> OK for trunk?
>>>
>>> Thanks,
>>> Dehao
>>>
>>> gcc/ChangeLog:
>>> 2013-06-21  Dehao Chen  <dehao@google.com>
>>>
>>>         * ipa-inline.c (edge_badness): Fix integer underflow.
>>>
>>> Index: gcc/ipa-inline.c
>>> ===================================================================
>>> --- gcc/ipa-inline.c (revision 200326)
>>> +++ gcc/ipa-inline.c (working copy)
>>> @@ -888,11 +888,9 @@ edge_badness (struct cgraph_edge *edge, bool dump)
>>>    else if (max_count)
>>>      {
>>>        int relbenefit = relative_time_benefit (callee_info, edge, edge_time);
>>> -      badness =
>>> - ((int)
>>> - ((double) edge->count * INT_MIN / 2 / max_count /
>>> RELATIVE_TIME_BENEFIT_RANGE) *
>>> - relbenefit) / growth;
>>> -
>>> +      badness = ((int)((double) edge->count / max_count
>>> +  * relbenefit / RELATIVE_TIME_BENEFIT_RANGE * INT_MIN / 2)) / growth;
>>> +
>>
>> FP operations on the host are frowned upon if code generation depends
>> on their outcome.  They all should use sreal_* as predict already does.
>> Other than that I wonder why the final division is int (so we get truncation
>> and not rounding)?  That increases the effect of doing the multiply by
>> relbenefit in double.
>>
>> Richard.
>>
>>>        /* Be sure that insanity of the profile won't lead to increasing counts
>>>   in the scalling and thus to overflow in the computation above.  */
>>>        gcc_assert (max_count >= edge->count);
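
(For context, a minimal standalone sketch of the reordered expression in the hunk above. The variable values, the RELATIVE_TIME_BENEFIT_RANGE stand-in and the use of plain double are illustrative only - the version eventually committed, shown at the bottom of this page, performs the same computation with sreal. The point of the reordering is that edge->count / max_count and relbenefit / RELATIVE_TIME_BENEFIT_RANGE are each at most 1, so the value handed to the single truncating cast stays within [INT_MIN / 2, 0], rather than an already-truncated int being multiplied by relbenefit.)

#include <limits.h>
#include <stdio.h>

/* Stand-in value; the real macro lives in gcc/ipa-inline.c.  */
#define RELATIVE_TIME_BENEFIT_RANGE (INT_MAX / 64)

int
main (void)
{
  long long edge_count = 1000;     /* profile count of the call edge (made up) */
  long long max_count = 1000000;   /* largest edge count in the unit (made up) */
  int relbenefit = RELATIVE_TIME_BENEFIT_RANGE / 3;
  int growth = 40;                 /* estimated size growth from inlining (made up) */

  /* Both ratios are <= 1, so the double value is in [INT_MIN / 2, 0]
     and only the final cast truncates.  */
  int badness = ((int) ((double) edge_count / max_count
                        * relbenefit / RELATIVE_TIME_BENEFIT_RANGE
                        * INT_MIN / 2)) / growth;

  printf ("badness = %d\n", badness);
  return 0;
}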

Comments

Dehao Chen Aug. 19, 2013, 10:14 p.m. UTC | #1
Hi, Richard,

Could you take a second look at this patch?

Thanks,
Dehao

On Mon, Jun 24, 2013 at 3:45 PM, Dehao Chen <dehao@google.com> wrote:
> The original patch has a flaw. The new patch is attached.
> Bootstrapped and passed regression tests.
>
> Thanks,
> Dehao
>
> On Mon, Jun 24, 2013 at 9:44 AM, Dehao Chen <dehao@google.com> wrote:
>> Hi, Richard,
>>
>> I've updated the patch (as attached) to use sreal to compute badness.
>>
>> Thanks,
>> Dehao
>>
>> On Mon, Jun 24, 2013 at 5:41 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Sat, Jun 22, 2013 at 2:59 AM, Dehao Chen <dehao@google.com> wrote:
>>>> This patch prevents integer underflow in the badness computation in ipa-inline.
>>>>
>>>> Bootstrapped and passed regression tests.
>>>>
>>>> OK for trunk?
>>>>
>>>> Thanks,
>>>> Dehao
>>>>
>>>> gcc/ChangeLog:
>>>> 2013-06-21  Dehao Chen  <dehao@google.com>
>>>>
>>>>         * ipa-inline.c (edge_badness): Fix integer underflow.
>>>>
>>>> Index: gcc/ipa-inline.c
>>>> ===================================================================
>>>> --- gcc/ipa-inline.c (revision 200326)
>>>> +++ gcc/ipa-inline.c (working copy)
>>>> @@ -888,11 +888,9 @@ edge_badness (struct cgraph_edge *edge, bool dump)
>>>>    else if (max_count)
>>>>      {
>>>>        int relbenefit = relative_time_benefit (callee_info, edge, edge_time);
>>>> -      badness =
>>>> - ((int)
>>>> - ((double) edge->count * INT_MIN / 2 / max_count /
>>>> RELATIVE_TIME_BENEFIT_RANGE) *
>>>> - relbenefit) / growth;
>>>> -
>>>> +      badness = ((int)((double) edge->count / max_count
>>>> +  * relbenefit / RELATIVE_TIME_BENEFIT_RANGE * INT_MIN / 2)) / growth;
>>>> +
>>>
>>> FP operations on the host are frowned upon if code generation depends
>>> on their outcome.  They all should use sreal_* as predict already does.
>>> Other than that I wonder why the final division is int (so we get truncation
>>> and not rounding)?  That increases the effect of doing the multiply by
>>> relbenefit in double.
>>>
>>> Richard.
>>>
>>>>        /* Be sure that insanity of the profile won't lead to increasing counts
>>>>   in the scalling and thus to overflow in the computation above.  */
>>>>        gcc_assert (max_count >= edge->count);
Richard Biener Aug. 27, 2013, 12:02 p.m. UTC | #2
On Tue, Jun 25, 2013 at 12:45 AM, Dehao Chen <dehao@google.com> wrote:
> The original patch has a flaw. The new patch is attached.
> Bootstrapped and passed regression tests.

Ok.

Thanks,
Richard.

> Thanks,
> Dehao
>
> On Mon, Jun 24, 2013 at 9:44 AM, Dehao Chen <dehao@google.com> wrote:
>> Hi, Richard,
>>
>> I've updated the patch (as attached) to use sreal to compute badness.
>>
>> Thanks,
>> Dehao
>>
>> On Mon, Jun 24, 2013 at 5:41 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Sat, Jun 22, 2013 at 2:59 AM, Dehao Chen <dehao@google.com> wrote:
>>>> This patch prevents integer underflow in the badness computation in ipa-inline.
>>>>
>>>> Bootstrapped and passed regression tests.
>>>>
>>>> OK for trunk?
>>>>
>>>> Thanks,
>>>> Dehao
>>>>
>>>> gcc/ChangeLog:
>>>> 2013-06-21  Dehao Chen  <dehao@google.com>
>>>>
>>>>         * ipa-inline.c (edge_badness): Fix integer underflow.
>>>>
>>>> Index: gcc/ipa-inline.c
>>>> ===================================================================
>>>> --- gcc/ipa-inline.c (revision 200326)
>>>> +++ gcc/ipa-inline.c (working copy)
>>>> @@ -888,11 +888,9 @@ edge_badness (struct cgraph_edge *edge, bool dump)
>>>>    else if (max_count)
>>>>      {
>>>>        int relbenefit = relative_time_benefit (callee_info, edge, edge_time);
>>>> -      badness =
>>>> - ((int)
>>>> - ((double) edge->count * INT_MIN / 2 / max_count /
>>>> RELATIVE_TIME_BENEFIT_RANGE) *
>>>> - relbenefit) / growth;
>>>> -
>>>> +      badness = ((int)((double) edge->count / max_count
>>>> +  * relbenefit / RELATIVE_TIME_BENEFIT_RANGE * INT_MIN / 2)) / growth;
>>>> +
>>>
>>> FP operations on the host are frowned upon if code generation depends
>>> on their outcome.  They all should use sreal_* as predict already does.
>>> Other than that I wonder why the final division is int (so we get truncation
>>> and not rounding)?  That increases the effect of doing the multiply by
>>> relbenefit in double.
>>>
>>> Richard.
>>>
>>>>        /* Be sure that insanity of the profile won't lead to increasing counts
>>>>   in the scalling and thus to overflow in the computation above.  */
>>>>        gcc_assert (max_count >= edge->count);
Jan Hubicka Aug. 27, 2013, 12:50 p.m. UTC | #3
> >>
> >> I've updated the patch (as attached) to use sreal to compute badness.
> >>>> +      badness = ((int)((double) edge->count / max_count
> >>>> +  * relbenefit / RELATIVE_TIME_BENEFIT_RANGE * INT_MIN / 2)) / growth;
> >>>> +
> >>>
> >>> FP operations on the host are frowned upon if code generation depends
> >>> on their outcome.  They all should use sreal_* as predict already does.
> >>> Other than that I wonder why the final division is int (so we get truncation
> >>> and not rounding)?  That increases the effect of doing the multiply by
> >>> relbenefit in double.
Sorry, I missed this patch - I need to update my scripts that mark files as interesting
to me.

I originally duplicated the trick from global.c, which used to do this for ages.
The rationale is that if you do only one FP operation, then you will not get
differences between hosts doing 80-bit FP (x86) and 64-bit FP (others), and you
do the operation a lot faster than sreal does. While this is on slippery ground,
controlled use of double should not imply host dependency.
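
(A minimal sketch of that "one FP operation" shape - not code from global.c or GCC; the counts and the SCALE constant below are made up:)

#include <stdio.h>

#define SCALE 10000   /* made-up fixed-point scale */

static int
scaled_ratio (long long count, long long max_count)
{
  /* One double expression, truncated once.  A single rounding step keeps
     the resulting int the same on 64-bit-double hosts and on hosts whose
     x87 unit evaluates intermediates in 80 bits, except in borderline
     cases, and it is much cheaper than going through sreal.  */
  return (int) ((double) count * SCALE / max_count);
}

int
main (void)
{
  printf ("%d\n", scaled_ratio (250, 1000));   /* prints 2500 */
  return 0;
}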

Badness computation is on the hot path of the inliner, which is the slowest WPA
optimization we have (well, because it is the only one that does something really
useful).  I suppose I can try to get Martin to benchmark the patch on Firefox
in the afternoon.

Honza
> >>>
> >>> Richard.
> >>>
> >>>>        /* Be sure that insanity of the profile won't lead to increasing counts
> >>>>   in the scalling and thus to overflow in the computation above.  */
> >>>>        gcc_assert (max_count >= edge->count);
Richard Biener Aug. 28, 2013, 10:45 a.m. UTC | #4
On Tue, Aug 27, 2013 at 2:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >>
>> >> I've updated the patch (as attached) to use sreal to compute badness.
>> >>>> +      badness = ((int)((double) edge->count / max_count
>> >>>> +  * relbenefit / RELATIVE_TIME_BENEFIT_RANGE * INT_MIN / 2)) / growth;
>> >>>> +
>> >>>
>> >>> FP operations on the host are frowned upon if code generation depends
>> >>> on their outcome.  They all should use sreal_* as predict already does.
>> >>> Other than that I wonder why the final division is int (so we get truncation
>> >>> and not rounding)?  That increases the effect of doing the multiply by
>> >>> relbenefit in double.
> Sorry, I missed this patch - I need to update my scripts that mark files as interesting
> to me.
>
> I originally duplicated the trick from global.c, which used to do this for ages.
> The rationale is that if you do only one FP operation, then you will not get
> differences between hosts doing 80-bit FP (x86) and 64-bit FP (others), and you
> do the operation a lot faster than sreal does. While this is on slippery ground,
> controlled use of double should not imply host dependency.

Well .. the above is clearly not "one" FP operation.  Also I don't follow.
If we can assume all hosts implement proper IEEE double math then
of course the issue is moot.  But the fact is we don't (not sure if we can
declare such hosts broken and unsupported as hosts).  Of course on
a perfectly IEEE-compliant host sreal.[ch] could simply be mapped to
double.  Oh, and yes, the 80-bit x87 unit doesn't hold up to that premise.
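
(As an aside, the relevant host property can be read off standard C99 macros; the probe below is only an illustration of the distinction, not something proposed for GCC:)

#include <float.h>
#include <stdio.h>

int
main (void)
{
  /* FLT_EVAL_METHOD == 0: double arithmetic is evaluated in 64-bit double,
     so a controlled double computation gives the same answer on every such
     host.  FLT_EVAL_METHOD == 2 is the classic x87 setup: intermediates are
     kept in 80-bit registers, which is exactly the premise that fails.  */
  printf ("FLT_EVAL_METHOD = %d, DBL_MANT_DIG = %d\n",
          (int) FLT_EVAL_METHOD, DBL_MANT_DIG);
  return 0;
}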

> Badness computation is on the hot path of the inliner, which is the slowest WPA
> optimization we have (well, because it is the only one that does something really
> useful).  I suppose I can try to get Martin to benchmark the patch on Firefox
> in the afternoon.
>
> Honza
>> >>>
>> >>> Richard.
>> >>>
>> >>>>        /* Be sure that insanity of the profile won't lead to increasing counts
>> >>>>   in the scalling and thus to overflow in the computation above.  */
>> >>>>        gcc_assert (max_count >= edge->count);
Jan Hubicka Aug. 28, 2013, 12:13 p.m. UTC | #5
> On Tue, Aug 27, 2013 at 2:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> >>
> >> >> I've updated the patch (as attached) to use sreal to compute badness.
> >> >>>> +      badness = ((int)((double) edge->count / max_count
> >> >>>> +  * relbenefit / RELATIVE_TIME_BENEFIT_RANGE * INT_MIN / 2)) / growth;
> >> >>>> +
> >> >>>
> >> >>> FP operations on the host are frowned upon if code generation depends
> >> >>> on their outcome.  They all should use sreal_* as predict already does.
> >> >>> Other than that I wonder why the final division is int (so we get truncation
> >> >>> and not rounding)?  That increases the effect of doing the multiply by
> >> >>> relbenefit in double.
> > Sorry, I missed this patch - I need to update my scripts that mark files as interesting
> > to me.
> >
> > I originally duplicated the trick from global.c, which used to do this for ages.
> > The rationale is that if you do only one FP operation, then you will not get
> > differences between hosts doing 80-bit FP (x86) and 64-bit FP (others), and you
> > do the operation a lot faster than sreal does. While this is on slippery ground,
> > controlled use of double should not imply host dependency.
> 
> Well .. the above is clearly not "one" FP operation.  Also I don't follow.
> If we can assume all hosts implement proper IEEE double math then
> of course the issue is moot.  But the fact is we don't (not sure if we can

We are not really consistent across all hosts either, because of HOST_WIDE_INT
differences.  What we really cared about is to have a compiler that does not give
different results based on build flags.

> declare such hosts broken and unsupported as hosts).  Of course on
> a perfectly IEEE-compliant host sreal.[ch] could simply be mapped to
> double.  Oh, and yes, the 80-bit x87 unit doesn't hold up to that premise.

sreal is not IEEE compliant.  We first hit the problem of doing
non-trivial chains of FP operations in predict.c during my school project, and
Josef implemented it as a dirty but reasonably quick FP emulation module.
Hopefully it will work well enough in the badness computation, too.

I am giving the patch brief benchmarking on profiledbootstrap, and if it won't
cause a major regression, I think we should go ahead with the patch.

I was never really happy about the double use there, and in fact the whole
fixed-point arithmetic in the badness computation is a mess.  If we had a
template-based Fibonacci heap and sreal fast enough, turning it all to reals
would save quite some maintenance burden.

Honza
Richard Biener Aug. 28, 2013, 1:50 p.m. UTC | #6
On Wed, Aug 28, 2013 at 2:13 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Tue, Aug 27, 2013 at 2:50 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >> >>
>> >> >> I've updated the patch (as attached) to use sreal to compute badness.
>> >> >>>> +      badness = ((int)((double) edge->count / max_count
>> >> >>>> +  * relbenefit / RELATIVE_TIME_BENEFIT_RANGE * INT_MIN / 2)) / growth;
>> >> >>>> +
>> >> >>>
>> >> >>> FP operations on the host are frowned upon if code generation depends
>> >> >>> on their outcome.  They all should use sreal_* as predict already does.
>> >> >>> Other than that I wonder why the final division is int (so we get truncation
>> >> >>> and not rounding)?  That increases the effect of doing the multiply by
>> >> >>> relbenefit in double.
>> > Sorry, I missed this patch - I need to update my scripts that mark files as interesting
>> > to me.
>> >
>> > I originally duplicated the trick from global.c, which used to do this for ages.
>> > The rationale is that if you do only one FP operation, then you will not get
>> > differences between hosts doing 80-bit FP (x86) and 64-bit FP (others), and you
>> > do the operation a lot faster than sreal does. While this is on slippery ground,
>> > controlled use of double should not imply host dependency.
>>
>> Well .. the above is clearly not "one" FP operation.  Also I don't follow.
>> If we can assume all hosts implement proper IEEE double math then
>> of course the issue is moot.  But the fact is we don't (not sure if we can
>
> We are not really consistent across all hosts either, because of HOST_WIDE_INT
> differences.  What we really cared about is to have a compiler that does not give
> different results based on build flags.

Well, we care that a cross from X to T produces the same code as a cross
from Y to T or a native compiler on T.  Otherwise it's hard to even think of
using icecream like we do internally.

>> declare such hosts broken and unsupported as hosts).  Of course on
>> a perfectly IEEE-compliant host sreal.[ch] could simply be mapped to
>> double.  Oh, and yes, the 80-bit x87 unit doesn't hold up to that premise.
>
> sreal is not IEEE compliant.  We first hit the problem of doing
> non-trivial chains of FP operations in predict.c during my school project, and
> Josef implemented it as a dirty but reasonably quick FP emulation module.
> Hopefully it will work well enough in the badness computation, too.
>
> I am giving the patch brief benchmarking on profiledbootstrap, and if it won't
> cause a major regression, I think we should go ahead with the patch.
>
> I was never really happy about the double use there, and in fact the whole
> fixed-point arithmetic in the badness computation is a mess.  If we had a
> template-based Fibonacci heap and sreal fast enough, turning it all to reals
> would save quite some maintenance burden.

Yeah, well.

Richard.

> Honza
Jan Hubicka Aug. 28, 2013, 5:11 p.m. UTC | #7
> > I am giving the patch brief benchmarking on profiledbootstrap, and if it won't
> > cause a major regression, I think we should go ahead with the patch.

Uhm, I profiledbootstrapped, and we are a bit too fast to get a reasonable oprofile.  What I get is:
7443      9.4372  lto1                     lto1                     lto_end_uncompression(lto_compression_stream*)
4438      5.6271  lto1                     lto1                     _ZL14DFS_write_treeP12output_blockP4sccsP9tree_nodebb.lto_priv.4993
2351      2.9809  lto1                     lto1                     lto_output_tree(output_block*, tree_node*, bool, bool)
2179      2.7628  lto1                     lto1                     _ZL30linemap_macro_loc_to_exp_pointP9line_mapsjPPK8line_map.lto_priv.7860
1910      2.4217  lto1                     lto1                     _ZL19unpack_value_fieldsP7data_inP9bitpack_dP9tree_node.lto_priv.7292
1855      2.3520  libc-2.11.1.so           libc-2.11.1.so           msort_with_tmp
1531      1.9412  lto1                     lto1                     streamer_string_index(output_block*, char const*, unsigned int, bool)
1530      1.9399  libc-2.11.1.so           libc-2.11.1.so           _int_malloc
1471      1.8651  lto1                     lto1                     do_estimate_growth(cgraph_node*)
1306      1.6559  lto1                     lto1                     pointer_map_insert(pointer_map_t*, void const*)
1238      1.5697  lto1                     lto1                     _Z28streamer_pack_tree_bitfieldsP12output_blockP9bitpack_dP9tree_node.constprop.1086
1138      1.4429  lto1                     lto1                     compare_tree_sccs_1(tree_node*, tree_node*, tree_node***)
1082      1.3719  lto1                     lto1                     streamer_write_tree_body(output_block*, tree_node*, bool)
1044      1.3237  lto1                     lto1                     _ZL28estimate_calls_size_and_timeP11cgraph_nodePiS1_S1_j3vecIP9tree_node7va_heap6vl_ptrES7_S2_IP21ipa_agg_jump_function

We take 12 seconds of WPA on GCC (with my fork patch)
Execution times (seconds)
 phase setup             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1412 kB ( 0%) ggc
 phase opt and generate  :   4.48 (37%) usr   0.05 ( 6%) sys   4.57 (34%) wall   42983 kB ( 7%) ggc
 phase stream in         :   7.21 (60%) usr   0.26 (32%) sys   7.47 (56%) wall  565102 kB (93%) ggc
 phase stream out        :   0.38 ( 3%) usr   0.50 (62%) sys   1.37 (10%) wall     623 kB ( 0%) ggc
 callgraph optimization  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       6 kB ( 0%) ggc
 ipa dead code removal   :   0.46 ( 4%) usr   0.00 ( 0%) sys   0.46 ( 3%) wall       0 kB ( 0%) ggc
 ipa cp                  :   0.36 ( 3%) usr   0.01 ( 1%) sys   0.41 ( 3%) wall   38261 kB ( 6%) ggc
 ipa inlining heuristics :   2.84 (24%) usr   0.05 ( 6%) sys   2.87 (21%) wall   60263 kB (10%) ggc
 ipa lto gimple in       :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto gimple out      :   0.04 ( 0%) usr   0.02 ( 2%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto decl in         :   6.23 (52%) usr   0.18 (22%) sys   6.40 (48%) wall  425731 kB (70%) ggc
 ipa lto decl out        :   0.09 ( 1%) usr   0.01 ( 1%) sys   0.10 ( 1%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   0.22 ( 2%) usr   0.02 ( 2%) sys   0.25 ( 2%) wall   60840 kB (10%) ggc
 ipa lto decl merge      :   0.20 ( 2%) usr   0.00 ( 0%) sys   0.20 ( 1%) wall    1051 kB ( 0%) ggc
 ipa lto cgraph merge    :   0.22 ( 2%) usr   0.01 ( 1%) sys   0.25 ( 2%) wall   17676 kB ( 3%) ggc
 whopr wpa               :   0.38 ( 3%) usr   0.00 ( 0%) sys   0.35 ( 3%) wall     626 kB ( 0%) ggc
 whopr wpa I/O           :   0.01 ( 0%) usr   0.47 (58%) sys   0.98 ( 7%) wall       0 kB ( 0%) ggc
 whopr partitioning      :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall       0 kB ( 0%) ggc
 ipa reference           :   0.31 ( 3%) usr   0.01 ( 1%) sys   0.33 ( 2%) wall       0 kB ( 0%) ggc
 ipa profile             :   0.09 ( 1%) usr   0.01 ( 1%) sys   0.10 ( 1%) wall     150 kB ( 0%) ggc
 ipa pure const          :   0.29 ( 2%) usr   0.00 ( 0%) sys   0.30 ( 2%) wall       0 kB ( 0%) ggc
 tree SSA incremental    :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     203 kB ( 0%) ggc
 tree operand scan       :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.00 ( 0%) wall    3512 kB ( 1%) ggc
 dominance computation   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 varconst                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo        :   0.06 ( 0%) usr   0.01 ( 1%) sys   0.09 ( 1%) wall       0 kB ( 0%) ggc
 TOTAL                 :  12.08             0.81            13.43             610123 kB

Inlining heuristics was also around 25% w/o your change.  Timing matches my
experience with Firefox - the growth estimation functions tend to be the hot ones;
with caching, badness is off the radar.  As such I think the patch is safe to go.
Thank you!


> >
> > I was never really happy about the double use there, and in fact the whole
> > fixed-point arithmetic in the badness computation is a mess.  If we had a
> > template-based Fibonacci heap and sreal fast enough, turning it all to reals
> > would save quite some maintenance burden.
> 
> Yeah, well.
> 
> Richard.
> 
> > Honza

Patch

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 200375)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,7 @@ 
+2013-06-21  Dehao Chen  <dehao@google.com>
+
+	* ipa-inline.c (edge_badness): Fix integer underflow.
+
 2013-06-24  Martin Jambor  <mjambor@suse.cz>
 
 	PR tree-optimization/57358
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 200375)
+++ gcc/Makefile.in	(working copy)
@@ -2947,7 +2947,7 @@  ipa-inline.o : ipa-inline.c $(CONFIG_H) $(SYSTEM_H
    $(DIAGNOSTIC_H) $(FIBHEAP_H) $(PARAMS_H) $(TREE_PASS_H) \
    $(COVERAGE_H) $(GGC_H) $(TREE_FLOW_H) $(RTL_H) $(IPA_PROP_H) \
    $(EXCEPT_H) $(GIMPLE_PRETTY_PRINT_H) $(IPA_INLINE_H) $(TARGET_H) \
-   $(IPA_UTILS_H)
+   $(IPA_UTILS_H) sreal.h
 ipa-inline-analysis.o : ipa-inline-analysis.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
    $(TREE_H) langhooks.h $(TREE_INLINE_H) $(FLAGS_H) $(CGRAPH_H) intl.h \
    $(DIAGNOSTIC_H) $(PARAMS_H) $(TREE_PASS_H) $(CFGLOOP_H) \
Index: gcc/ipa-inline.c
===================================================================
--- gcc/ipa-inline.c	(revision 200375)
+++ gcc/ipa-inline.c	(working copy)
@@ -113,10 +113,12 @@  along with GCC; see the file COPYING3.  If not see
 #include "target.h"
 #include "ipa-inline.h"
 #include "ipa-utils.h"
+#include "sreal.h"
 
 /* Statistics we collect about inlining algorithm.  */
 static int overall_size;
 static gcov_type max_count;
+static sreal max_count_real, max_relbenefit_real, half_int_min_real;
 
 /* Return false when inlining edge E would lead to violating
    limits on function unit growth or stack usage growth.  
@@ -887,12 +889,26 @@  edge_badness (struct cgraph_edge *edge, bool dump)
 
   else if (max_count)
     {
+      sreal tmp, relbenefit_real, growth_real;
       int relbenefit = relative_time_benefit (callee_info, edge, edge_time);
-      badness =
-	((int)
-	 ((double) edge->count * INT_MIN / 2 / max_count / RELATIVE_TIME_BENEFIT_RANGE) *
-	 relbenefit) / growth;
-      
+
+      sreal_init(&relbenefit_real, relbenefit, 0);
+      sreal_init(&growth_real, growth, 0);
+
+      /* relative_edge_count.  */
+      sreal_init (&tmp, edge->count, 0);
+      sreal_div (&tmp, &tmp, &max_count_real);
+
+      /* relative_time_benefit.  */
+      sreal_mul (&tmp, &tmp, &relbenefit_real);
+      sreal_div (&tmp, &tmp, &max_relbenefit_real);
+
+      /* growth_f_caller.  */
+      sreal_mul (&tmp, &tmp, &half_int_min_real);
+      sreal_div (&tmp, &tmp, &growth_real);
+
+      badness = -1 * sreal_to_int (&tmp);
+ 
       /* Be sure that insanity of the profile won't lead to increasing counts
 	 in the scalling and thus to overflow in the computation above.  */
       gcc_assert (max_count >= edge->count);
@@ -1448,6 +1464,9 @@  inline_small_functions (void)
 	  if (max_count < edge->count)
 	    max_count = edge->count;
       }
+  sreal_init (&max_count_real, max_count, 0);
+  sreal_init (&max_relbenefit_real, RELATIVE_TIME_BENEFIT_RANGE, 0);
+  sreal_init (&half_int_min_real, INT_MAX / 2, 0);
   ipa_free_postorder_info ();
   initialize_growth_caches ();