diff mbox series

[RFC] Add inlining growth bias flag

Message ID 20190906105846.1123-1-graham.markall@embecosm.com
State New
Headers show
Series [RFC] Add inlining growth bias flag | expand

Commit Message

Graham Markall Sept. 6, 2019, 10:58 a.m. UTC
This patch is an RFC to invite comments as to whether the approach
to solving the problem is a suitable one for upstream GCC, or whether
there are alternative approaches that would be recommended.

Motivation
----------

We have observed that in some cases the estimation of callee growth for
inlining particular functions can be tuned for better overall code size
with particular programs on particular targets. Although modification of
the heuristics to make a general improvement is a difficult problem to
tackle, simply biasing the growth by a fixed amount can lead to
improvements in code size within the context of a particular program.

This has first been tested on a proprietary program, where setting the
growth bias to -2 resulted in a saving of 1.35% in the text section size
(62396 bytes as opposed to 63252 bytes). Using the Embench suite (
https://www.embench.org/ ) for a riscv32 bare-metal target with -Os also
shows that adjusting the inline growth estimate carefully can also
reduce code size. Results presented here are percentages relative to the
Embench reference platform (which is the standard way of presenting
Embench results) for values of the bias from -2 to 2:

Benchmark	-2	-1	0	1	2
aha-mont64	 99.05	 99.05	 99.05	 99.05	 99.05
crc32		100.00	100.00	100.00	100.00	100.00
cubic		 94.66	 94.66	 94.66	 94.66	 94.66
edn		100.00	100.00	100.00	100.00	100.00
huffbench	 99.88	 99.88	 99.88	 99.88	 99.88
matmult-int	100.00	100.00	100.00	102.86	102.86
minver		 97.96	 97.96	 97.96	106.88	106.88
nbody		 98.87	 98.87	 98.87	 98.87	 98.87
nettle-aes	 99.31	 99.10	 99.10	 99.10	 99.10
nettle-sha256	 99.89	 99.89	 99.89	 99.89	 99.89
nsichneu	100.00	100.00	100.00	100.00	100.00
picojpeg	102.56	102.54	 99.73	 99.38	 99.28
qrduino		 99.70	 99.70	 99.90	 99.90	 99.90
sglib-combined	 95.52	100.00	100.00	100.43	101.12
slre		100.66	100.66	 99.09	 98.85	 98.85
st		 96.36	 96.36	 96.36	 96.82	 96.82
statemate	100.00	100.00	100.00	100.00	100.00
ud		100.00	100.00	100.00	100.00	100.00
wikisort	 99.48	 99.48	 99.48	 99.48	 99.48
Mean		 99.14	 99.36	 99.15	 99.77	 99.80

In most cases, leaving the bias at 0 (unmodified) produces the smallest
code. However, there are some cases where an alternative value prevails,
The "Best Diff" column shows the reduction in size compared to leaving
the bias at 0, for cases where changing it yielded an improvement:

Benchmark	Best	Worst   Best diff
aha-mont64	0	0
crc32		0	0
cubic		0	0
edn		0	0
huffbench	0	0
matmult-int	0	1 / 2
minver		0	1 / 2
nbody		0	0
nettle-aes	0	-2
nettle-sha256	0	0
nsichneu	0	0
picojpeg	2	-2      -0.45%
qrduino		-1 /-2	0       -0.20%
sglib-combined	-2	1       -4.48%
slre		1 / 2	-1 / -2 -0.24%
st		0	1 / 2
statemate	0	0
ud		0	0
wikisort	0	0


In summary, for this test setup:

- In most cases, leaving the growth bias at 0 is optimal.
- Biasing the growth estimate positively may either increase or decrease
  code size.
- Biasing the estimate negatively may also either increase or decrease
  code size.
- In a small number of cases, biasing the growth estimate improves code
  size (up to 4.48% smaller for sglib-combined).


Patch
-----

The growth bias can be set with a flag:

  -finline-growth-bias=<number>

which controls the bias (an increase or decrease) of the inline growth
in ipa-inline.c. In cases where the bias would result in a negative
function size, we clip the growth estimate so that adding the growth to
the size of the function results in a size of 0, by setting the growth
to the negative of the function size.

There is not a great deal of validation of the argument - if a
non-integer is passed as the parameter (e.g. -finline-growth-bias=xyz),
it will be as if the parameter were 0. More validation could be added if
necessary. It seemed to me that GCC's infrastructure doesn't seem to
anticipate option values / parameters that could contain negative
values. For parameters, -1 seemed to represent an error and could result
in an ICE (-2 etc. pass through OK though). For options, I looked at the
UInteger and Host_Wide_Int numeric types, but they both expect a
positive integer. I did consider extending this with an Integer type
that could accept positive and negative integers, (e.g. starting with
augmenting switch_bit_fields in opt-functions.awk) but rooting out
assumptions of non-negative integer values in places further down seemed
like an onerous task.


Further discussion
------------------

Given that a default setting of 0 (equivalent to current behaviour)
should provide for some potential improvement in code size in individual
situations where it is critical (albeit using a rather coarse instrument
to do so), without affecting the general case - is the addition of such
a flag to GCC likely to be considered an acceptable inclusion for this
purpose?


gcc/ChangeLog:

        * common.opt: Add -finline-growth-bias= flag.
        * ipa-inline.h (estimate_edge_growth): Adjust inline
        growth by bias factor, if supplied.
---
 gcc/common.opt   |  4 ++++
 gcc/ipa-inline.h | 39 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 1 deletion(-)

Comments

Jeff Law Sept. 9, 2019, 6:55 p.m. UTC | #1
On 9/6/19 4:58 AM, Graham Markall wrote:
> This patch is an RFC to invite comments as to whether the approach
> to solving the problem is a suitable one for upstream GCC, or whether
> there are alternative approaches that would be recommended.
> 
> Motivation
> ----------
> 
> We have observed that in some cases the estimation of callee growth for
> inlining particular functions can be tuned for better overall code size
> with particular programs on particular targets. Although modification of
> the heuristics to make a general improvement is a difficult problem to
> tackle, simply biasing the growth by a fixed amount can lead to
> improvements in code size within the context of a particular program.
> 
> This has first been tested on a proprietary program, where setting the
> growth bias to -2 resulted in a saving of 1.35% in the text section size
> (62396 bytes as opposed to 63252 bytes). Using the Embench suite (
> https://www.embench.org/ ) for a riscv32 bare-metal target with -Os also
> shows that adjusting the inline growth estimate carefully can also
> reduce code size. Results presented here are percentages relative to the
> Embench reference platform (which is the standard way of presenting
> Embench results) for values of the bias from -2 to 2:
> 
> Benchmark	-2	-1	0	1	2
> aha-mont64	 99.05	 99.05	 99.05	 99.05	 99.05
> crc32		100.00	100.00	100.00	100.00	100.00
> cubic		 94.66	 94.66	 94.66	 94.66	 94.66
> edn		100.00	100.00	100.00	100.00	100.00
> huffbench	 99.88	 99.88	 99.88	 99.88	 99.88
> matmult-int	100.00	100.00	100.00	102.86	102.86
> minver		 97.96	 97.96	 97.96	106.88	106.88
> nbody		 98.87	 98.87	 98.87	 98.87	 98.87
> nettle-aes	 99.31	 99.10	 99.10	 99.10	 99.10
> nettle-sha256	 99.89	 99.89	 99.89	 99.89	 99.89
> nsichneu	100.00	100.00	100.00	100.00	100.00
> picojpeg	102.56	102.54	 99.73	 99.38	 99.28
> qrduino		 99.70	 99.70	 99.90	 99.90	 99.90
> sglib-combined	 95.52	100.00	100.00	100.43	101.12
> slre		100.66	100.66	 99.09	 98.85	 98.85
> st		 96.36	 96.36	 96.36	 96.82	 96.82
> statemate	100.00	100.00	100.00	100.00	100.00
> ud		100.00	100.00	100.00	100.00	100.00
> wikisort	 99.48	 99.48	 99.48	 99.48	 99.48
> Mean		 99.14	 99.36	 99.15	 99.77	 99.80
> 
> In most cases, leaving the bias at 0 (unmodified) produces the smallest
> code. However, there are some cases where an alternative value prevails,
> The "Best Diff" column shows the reduction in size compared to leaving
> the bias at 0, for cases where changing it yielded an improvement:
> 
> Benchmark	Best	Worst   Best diff
> aha-mont64	0	0
> crc32		0	0
> cubic		0	0
> edn		0	0
> huffbench	0	0
> matmult-int	0	1 / 2
> minver		0	1 / 2
> nbody		0	0
> nettle-aes	0	-2
> nettle-sha256	0	0
> nsichneu	0	0
> picojpeg	2	-2      -0.45%
> qrduino		-1 /-2	0       -0.20%
> sglib-combined	-2	1       -4.48%
> slre		1 / 2	-1 / -2 -0.24%
> st		0	1 / 2
> statemate	0	0
> ud		0	0
> wikisort	0	0
> 
> 
> In summary, for this test setup:
> 
> - In most cases, leaving the growth bias at 0 is optimal.
> - Biasing the growth estimate positively may either increase or decrease
>   code size.
> - Biasing the estimate negatively may also either increase or decrease
>   code size.
> - In a small number of cases, biasing the growth estimate improves code
>   size (up to 4.48% smaller for sglib-combined).
> 
> 
> Patch
> -----
> 
> The growth bias can be set with a flag:
> 
>   -finline-growth-bias=<number>
> 
> which controls the bias (an increase or decrease) of the inline growth
> in ipa-inline.c. In cases where the bias would result in a negative
> function size, we clip the growth estimate so that adding the growth to
> the size of the function results in a size of 0, by setting the growth
> to the negative of the function size.
> 
> There is not a great deal of validation of the argument - if a
> non-integer is passed as the parameter (e.g. -finline-growth-bias=xyz),
> it will be as if the parameter were 0. More validation could be added if
> necessary. It seemed to me that GCC's infrastructure doesn't seem to
> anticipate option values / parameters that could contain negative
> values. For parameters, -1 seemed to represent an error and could result
> in an ICE (-2 etc. pass through OK though). For options, I looked at the
> UInteger and Host_Wide_Int numeric types, but they both expect a
> positive integer. I did consider extending this with an Integer type
> that could accept positive and negative integers, (e.g. starting with
> augmenting switch_bit_fields in opt-functions.awk) but rooting out
> assumptions of non-negative integer values in places further down seemed
> like an onerous task.
> 
> 
> Further discussion
> ------------------
> 
> Given that a default setting of 0 (equivalent to current behaviour)
> should provide for some potential improvement in code size in individual
> situations where it is critical (albeit using a rather coarse instrument
> to do so), without affecting the general case - is the addition of such
> a flag to GCC likely to be considered an acceptable inclusion for this
> purpose?
> 
> 
> gcc/ChangeLog:
> 
>         * common.opt: Add -finline-growth-bias= flag.
>         * ipa-inline.h (estimate_edge_growth): Adjust inline
>         growth by bias factor, if supplied.
I'm not sure if we really want to expose this as a first class option.
Perhaps a PARAM would be better.  Jan would have the final call though
since he's done the most work on the IPA bits.

Jeff
Graham Markall Sept. 11, 2019, 8:42 p.m. UTC | #2
Hi Jeff,

On 09/09/2019 19:55, Jeff Law wrote:
> I'm not sure if we really want to expose this as a first class option.
> Perhaps a PARAM would be better.  Jan would have the final call though
> since he's done the most work on the IPA bits.

Many thanks for the suggestion - I could turn it into a pair of integer
params (one to increase the growth and one to decrease it) - would that
still be suitable? (I think there's no easy way to have a param with as
signed value, but am I missing something obvious?)

Best regards,
Graham.
Richard Biener Sept. 12, 2019, 10:08 a.m. UTC | #3
On Fri, Sep 6, 2019 at 12:59 PM Graham Markall
<graham.markall@embecosm.com> wrote:
>
> This patch is an RFC to invite comments as to whether the approach
> to solving the problem is a suitable one for upstream GCC, or whether
> there are alternative approaches that would be recommended.
>
> Motivation
> ----------
>
> We have observed that in some cases the estimation of callee growth for
> inlining particular functions can be tuned for better overall code size
> with particular programs on particular targets. Although modification of
> the heuristics to make a general improvement is a difficult problem to
> tackle, simply biasing the growth by a fixed amount can lead to
> improvements in code size within the context of a particular program.
>
> This has first been tested on a proprietary program, where setting the
> growth bias to -2 resulted in a saving of 1.35% in the text section size
> (62396 bytes as opposed to 63252 bytes). Using the Embench suite (
> https://www.embench.org/ ) for a riscv32 bare-metal target with -Os also
> shows that adjusting the inline growth estimate carefully can also
> reduce code size. Results presented here are percentages relative to the
> Embench reference platform (which is the standard way of presenting
> Embench results) for values of the bias from -2 to 2:
>
> Benchmark       -2      -1      0       1       2
> aha-mont64       99.05   99.05   99.05   99.05   99.05
> crc32           100.00  100.00  100.00  100.00  100.00
> cubic            94.66   94.66   94.66   94.66   94.66
> edn             100.00  100.00  100.00  100.00  100.00
> huffbench        99.88   99.88   99.88   99.88   99.88
> matmult-int     100.00  100.00  100.00  102.86  102.86
> minver           97.96   97.96   97.96  106.88  106.88
> nbody            98.87   98.87   98.87   98.87   98.87
> nettle-aes       99.31   99.10   99.10   99.10   99.10
> nettle-sha256    99.89   99.89   99.89   99.89   99.89
> nsichneu        100.00  100.00  100.00  100.00  100.00
> picojpeg        102.56  102.54   99.73   99.38   99.28
> qrduino          99.70   99.70   99.90   99.90   99.90
> sglib-combined   95.52  100.00  100.00  100.43  101.12
> slre            100.66  100.66   99.09   98.85   98.85
> st               96.36   96.36   96.36   96.82   96.82
> statemate       100.00  100.00  100.00  100.00  100.00
> ud              100.00  100.00  100.00  100.00  100.00
> wikisort         99.48   99.48   99.48   99.48   99.48
> Mean             99.14   99.36   99.15   99.77   99.80
>
> In most cases, leaving the bias at 0 (unmodified) produces the smallest
> code. However, there are some cases where an alternative value prevails,
> The "Best Diff" column shows the reduction in size compared to leaving
> the bias at 0, for cases where changing it yielded an improvement:
>
> Benchmark       Best    Worst   Best diff
> aha-mont64      0       0
> crc32           0       0
> cubic           0       0
> edn             0       0
> huffbench       0       0
> matmult-int     0       1 / 2
> minver          0       1 / 2
> nbody           0       0
> nettle-aes      0       -2
> nettle-sha256   0       0
> nsichneu        0       0
> picojpeg        2       -2      -0.45%
> qrduino         -1 /-2  0       -0.20%
> sglib-combined  -2      1       -4.48%
> slre            1 / 2   -1 / -2 -0.24%
> st              0       1 / 2
> statemate       0       0
> ud              0       0
> wikisort        0       0
>
>
> In summary, for this test setup:
>
> - In most cases, leaving the growth bias at 0 is optimal.
> - Biasing the growth estimate positively may either increase or decrease
>   code size.
> - Biasing the estimate negatively may also either increase or decrease
>   code size.
> - In a small number of cases, biasing the growth estimate improves code
>   size (up to 4.48% smaller for sglib-combined).
>
>
> Patch
> -----
>
> The growth bias can be set with a flag:
>
>   -finline-growth-bias=<number>
>
> which controls the bias (an increase or decrease) of the inline growth
> in ipa-inline.c. In cases where the bias would result in a negative
> function size, we clip the growth estimate so that adding the growth to
> the size of the function results in a size of 0, by setting the growth
> to the negative of the function size.
>
> There is not a great deal of validation of the argument - if a
> non-integer is passed as the parameter (e.g. -finline-growth-bias=xyz),
> it will be as if the parameter were 0. More validation could be added if
> necessary. It seemed to me that GCC's infrastructure doesn't seem to
> anticipate option values / parameters that could contain negative
> values. For parameters, -1 seemed to represent an error and could result
> in an ICE (-2 etc. pass through OK though). For options, I looked at the
> UInteger and Host_Wide_Int numeric types, but they both expect a
> positive integer. I did consider extending this with an Integer type
> that could accept positive and negative integers, (e.g. starting with
> augmenting switch_bit_fields in opt-functions.awk) but rooting out
> assumptions of non-negative integer values in places further down seemed
> like an onerous task.
>
>
> Further discussion
> ------------------
>
> Given that a default setting of 0 (equivalent to current behaviour)
> should provide for some potential improvement in code size in individual
> situations where it is critical (albeit using a rather coarse instrument
> to do so), without affecting the general case - is the addition of such
> a flag to GCC likely to be considered an acceptable inclusion for this
> purpose?

I agree with Jeff that this should be a --param like all other inliner
tunings.

What the bias achieves is adjusting functions size estimate by
an absolute value so I'd implement it there like

Index: gcc/ipa-fnsummary.c
===================================================================
--- gcc/ipa-fnsummary.c (revision 275680)
+++ gcc/ipa-fnsummary.c (working copy)
@@ -2467,6 +2467,8 @@ compute_fn_summary (struct cgraph_node *
       break;
   node->calls_comdat_local = (e != NULL);

+  info->self_size = MAX (0, info->self_size + PARAM_VALUE
(PARAM_INLINE_FNSIZE_BIAS));
+
   /* Inlining characteristics are maintained by the cgraph_mark_inline.  */
   info->size = info->self_size;
   info->stack_frame_offset = 0;

which has the advantage to reduce the number of times the adjustment is
done.  I also think that you can use --param unlinlined-function-insns to
get the same effect of your parameter, but it doesn't allow negative values
though an effective negative value of -2 is possible since the default is 2.
The parameter is used during summary computation and so if we want
an additional parameter it could be accounted for at the same place this one is.

Richard.

>
> gcc/ChangeLog:
>
>         * common.opt: Add -finline-growth-bias= flag.
>         * ipa-inline.h (estimate_edge_growth): Adjust inline
>         growth by bias factor, if supplied.
> ---
>  gcc/common.opt   |  4 ++++
>  gcc/ipa-inline.h | 39 ++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 42 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index c1605385712..4725df2c477 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1670,6 +1670,10 @@ finline-limit=
>  Common RejectNegative Joined UInteger
>  -finline-limit=<number>        Limit the size of inlined functions to <number>.
>
> +finline-growth-bias=
> +Common RejectNegative Joined Var(flag_inline_growth_bias) Init(0)
> +-finline-growth-bias=<number>  Adjust the growth estimate for inlined functions by <number>.
> +
>  finline-atomics
>  Common Report Var(flag_inline_atomics) Init(1) Optimization
>  Inline __atomic operations when a lock free instruction sequence is available.
> diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h
> index 18c8e1eebd0..65e105b00a2 100644
> --- a/gcc/ipa-inline.h
> +++ b/gcc/ipa-inline.h
> @@ -84,7 +84,44 @@ estimate_edge_growth (struct cgraph_edge *edge)
>  {
>    ipa_call_summary *s = ipa_call_summaries->get (edge);
>    gcc_checking_assert (s->call_stmt_size || !edge->callee->analyzed);
> -  return (estimate_edge_size (edge) - s->call_stmt_size);
> +
> +  if (dump_file)
> +    {
> +      const char *caller_name
> +       = edge->caller->ultimate_alias_target ()->name ();
> +      const char *callee_name
> +       = edge->callee->ultimate_alias_target ()->name ();
> +
> +      fprintf (dump_file, "Estimate edge growth for %s -> %s\n",
> +              caller_name, callee_name);
> +      fprintf (dump_file, "\t estimate_edge_size = %d\n",
> +              estimate_edge_size (edge));
> +      fprintf (dump_file, "\t s (%p) ->call_stmt_size = %d\n",
> +              ((void *) s), s->call_stmt_size);
> +    }
> +
> +  int growth = (estimate_edge_size (edge) - s->call_stmt_size);
> +
> +  int sz = 0;
> +  if (flag_inline_growth_bias)
> +    sz = strtol(flag_inline_growth_bias, NULL, 0);
> +  if (sz != 0)
> +    {
> +      struct cgraph_node *caller = edge->caller;
> +      ipa_fn_summary *fs = ipa_fn_summaries->get (caller);
> +      if (dump_file)
> +       fprintf (dump_file, "\t Adjusting growth by %d\n", sz);
> +      growth += sz;
> +      if (growth < 0 && fs->size + growth < 0) {
> +       if (dump_file)
> +          fprintf (dump_file, "\t Growth %d would shrink size beneath zero.\n", growth);
> +       growth = -fs->size;
> +        if (dump_file)
> +         fprintf (dump_file, "\t Setting growth to %d to make size zero.\n", growth);
> +      }
> +    }
> +
> +  return growth;
>  }
>
>  /* Return estimated callee runtime increase after inlining
> --
> 2.21.0
>
diff mbox series

Patch

diff --git a/gcc/common.opt b/gcc/common.opt
index c1605385712..4725df2c477 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1670,6 +1670,10 @@  finline-limit=
 Common RejectNegative Joined UInteger
 -finline-limit=<number>	Limit the size of inlined functions to <number>.
 
+finline-growth-bias=
+Common RejectNegative Joined Var(flag_inline_growth_bias) Init(0)
+-finline-growth-bias=<number>	Adjust the growth estimate for inlined functions by <number>.
+
 finline-atomics
 Common Report Var(flag_inline_atomics) Init(1) Optimization
 Inline __atomic operations when a lock free instruction sequence is available.
diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h
index 18c8e1eebd0..65e105b00a2 100644
--- a/gcc/ipa-inline.h
+++ b/gcc/ipa-inline.h
@@ -84,7 +84,44 @@  estimate_edge_growth (struct cgraph_edge *edge)
 {
   ipa_call_summary *s = ipa_call_summaries->get (edge);
   gcc_checking_assert (s->call_stmt_size || !edge->callee->analyzed);
-  return (estimate_edge_size (edge) - s->call_stmt_size);
+
+  if (dump_file)
+    {
+      const char *caller_name
+	= edge->caller->ultimate_alias_target ()->name ();
+      const char *callee_name
+	= edge->callee->ultimate_alias_target ()->name ();
+
+      fprintf (dump_file, "Estimate edge growth for %s -> %s\n",
+	       caller_name, callee_name);
+      fprintf (dump_file, "\t estimate_edge_size = %d\n",
+	       estimate_edge_size (edge));
+      fprintf (dump_file, "\t s (%p) ->call_stmt_size = %d\n",
+	       ((void *) s), s->call_stmt_size);
+    }
+
+  int growth = (estimate_edge_size (edge) - s->call_stmt_size);
+
+  int sz = 0;
+  if (flag_inline_growth_bias)
+    sz = strtol(flag_inline_growth_bias, NULL, 0);
+  if (sz != 0)
+    {
+      struct cgraph_node *caller = edge->caller;
+      ipa_fn_summary *fs = ipa_fn_summaries->get (caller);
+      if (dump_file)
+	fprintf (dump_file, "\t Adjusting growth by %d\n", sz);
+      growth += sz;
+      if (growth < 0 && fs->size + growth < 0) {
+	if (dump_file)
+          fprintf (dump_file, "\t Growth %d would shrink size beneath zero.\n", growth);
+	growth = -fs->size;
+        if (dump_file)
+	  fprintf (dump_file, "\t Setting growth to %d to make size zero.\n", growth);
+      }
+    }
+
+  return growth;
 }
 
 /* Return estimated callee runtime increase after inlining