
[wwwdocs] Add some info on LTO/IPA changes for GCC 9

Message ID 20190430172112.u2ii6aykd64r7ufa@kam.mff.cuni.cz
State New

Commit Message

Jan Hubicka April 30, 2019, 5:21 p.m. UTC
Hi,
this patch adds some notes on LTO/IPA changes and some statistics 
on build-time/memory use improvements I collected this weekend.
I also added some of the FDO changes I remember. Martin, perhaps you can
do more.

Honza

Comments

Martin Liška May 2, 2019, 8:31 a.m. UTC | #1
On 4/30/19 7:21 PM, Jan Hubicka wrote:
> Hi,
> this patch adds some notes on LTO/IPA changes and some statistics 
> on build-time/memory use improvements I collected this weekend.
> I also added some of the FDO changes I remember. Martin, perhaps you can
> do more.

Hi.

I'm fine with the changes (and numbers) you provided. Please apply the following
patch, which corrects some typos:

--- xxx.eml	2019-05-02 10:26:03.187395558 +0200
+++ xxx.eml.new	2019-05-02 10:30:30.340814818 +0200
@@ -21,7 +21,7 @@
        can be transformed into <code>100 * how + 5</code> (for values defined
        in the switch statement).
    </li>
-+<li>Inter-prodcedural optimization improvements:
++<li>Inter-procedural optimization improvements:
 +  <ul>
 +   <li>Inliner defaults was tuned to better suits modern C++ codebases
 +       especially when built with link time optimizations.
@@ -29,8 +29,8 @@
 +       <code>max-inline-insns-size</code>,
 +       <code>uninlined-function-insns</code>,
 +       <code>uninlined-function-time</code>, <code>uninlined-thunk-insns</code>,
-+       and <code>uninlined-thunk-time</code> was added.</li>
-+   <li>Hot/cold partitioning is now more precise and agressive.</li>
++       and <code>uninlined-thunk-time</code> were added.</li>
++   <li>Hot/cold partitioning is now more precise and aggressive.</li>
 +   <li>Improved scalability for very large translation units (especially
 +       when link-time optimizing large programs).</li>
 +  </ul>
Martin Liška May 2, 2019, 8:52 a.m. UTC | #2
Hi.

I'm sending updated version.

Martin
Richard Biener May 2, 2019, 11:23 a.m. UTC | #3
On Thu, 2 May 2019, Martin Liška wrote:

> Hi.
> 
> I'm sending updated version.

OK (though the various absolute percentages might be misleading since
they obviously refer to a very specific build environment).

Richard.

> Martin
>
Jan Hubicka May 2, 2019, 12:20 p.m. UTC | #4
> On Thu, 2 May 2019, Martin Liška wrote:
> 
> > Hi.
> > 
> > I'm sending updated version.
> 
> OK (though the various absolute percentages might be misleading since
> they obviously refer to a very specific build environment).

I have committed this and added a bit more specific info
(Firefox/LibreOffice versions and the fact that my testing box has 8 cores).

Honza
> 
> Richard.
> 
> > Martin
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

Patch

Index: changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
retrieving revision 1.59
diff -u -r1.59 changes.html
--- changes.html	9 Apr 2019 21:08:28 -0000	1.59
+++ changes.html	30 Apr 2019 17:19:05 -0000
@@ -195,6 +195,11 @@ 
     metadata such as the inlining chain, and profile information (if
     available).
   </li>
+  <li>Inter-procedural propagation of stack alignment can now be controlled by
+      <a href="https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#index-fipa-stack-alignment"><code>-fipa-stack-alignment</code></a>.
+  <li>Propagation of addressability, readonly and writeonly flags on
+      static variables can now be controlled by
+      <a href="https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#index-fipa-reference-addressable"><code>-fipa-reference-addressable</code></a>.
 </ul>
 <p>The following built-in functions have been introduced.</p>
 <ul>
@@ -246,6 +251,52 @@ 
       can be transformed into <code>100 * how + 5</code> (for values defined
       in the switch statement).
   </li>
+<li>Inter-prodcedural optimization improvements:
+  <ul>
+   <li>Inliner defaults was tuned to better suits modern C++ codebases
+       especially when built with link time optimizations.
+       New parameters <code>max-inline-insns-small</code>,
+       <code>max-inline-insns-size</code>,
+       <code>uninlined-function-insns</code>,
+       <code>uninlined-function-time</code>, <code>uninlined-thunk-insns</code>,
+       and <code>uninlined-thunk-time</code> was added.</li>
+   <li>Hot/cold partitioning is now more precise and agressive.</li>
+   <li>Improved scalability for very large translation units (especially
+       when link-time optimizing large programs).</li>
+  </ul>
+<li>Profile driven optimization improvements:
+  <ul>
+    <li><code>-fprofile-use</code> now enables
+	<code>-fversion-loops-for-strides</code>,
+	<code>-floop-interchange</code>,
+	<code>-funroll-and-jam</code>,
+	<code>-ftree-loop-distribution</code>.</li>
+    <li>Streaming of counter histograms was removed. This reduces
+	the size of profile files. Histogram is computed on the fly
+	with link-time optimization.
+        Parameter <code>hot-bb-count-ws-permille</code> was reduced
+        from 999 to 990 to account for more precise histograms.</li>
+  </ul>
+<li>Link-time optimization improvements:
+  <ul>
+    <li>Types are now simplified prior streaming resulting in significant
+	reductions of the LTO object files, link-time memory use, and 
+	improvements of link-time parallelism.</li>
+    <li>Default number of partitions (<code>--param lto-partitions</code>) was
+	increased from 32 to 128 enabling effective use of CPUs with more than
+	32 hyperthreads. <code>--param lto-max-streaming-parallelism</code>
+	can now be used to control number of streaming processes.</li>
+    <li>Warnings on C++ One Decl Rule violations (<code>-Wodr</code>) are
+	now more informative and produce fewer redundant results.</li>
+  </ul>
+  Overall compile time of Firefox and LibreOffice was reduced by about 5%
+  compared to GCC 8.3.  Size of LTO object files is reduced by 7%.
+  LTO link-time improves by 11% on 8-core machine and scales significantly better
+  for more parallel build environments. Serial stage of the link-time
+  optimization is 28% faster consuming 20% less memory.
+  Parallel stage now partitions to 128 partitions rather than 32 and
+  reduces memory use for every worker by 30%.
+  </li>
 </ul>
 <p>The following improvements to the <code>gcov</code> command-line utility
   have been made.</p>