[WWWDOCS] Document IPA/LTO/FDO/i386 changes in GCC-4.9

Message ID 20131118161757.GF11338@kam.mff.cuni.cz
State New

Commit Message

Jan Hubicka Nov. 18, 2013, 4:17 p.m. UTC
Hi,
there were many changes in this area.  The following are the ones I can think of.
Please feel free to suggest more changes.
We probably should mention Teresa's splitting work once it is complete
and the new micro-architectures targeted by the x86 backend.

Honza

Comments

Gerald Pfeifer Nov. 23, 2013, 1:30 a.m. UTC | #1
On Mon, 18 Nov 2013, Jan Hubicka wrote:
> there were many changes in this area.  The following are the ones I can think
> of. Please feel free to suggest more changes. We probably should mention
> Teresa's splitting work once it is complete and the new micro-architectures
> targeted by the x86 backend.

Yes, those definitely are good additional changes.

Thanks for taking the time to write these up (apart from the time to
actually hack much of it). :-)

> Index: changes.html
> ===================================================================
> +      <li>Type merging was rewritten. New implementation is significantly faster
> +	  and use less memory. 

"The new implementation"

"uses less memory"

> +      <li>Better partitioning algorithm resulting in less streaming during
> +	  link-time.</li>

"link time"

> +      <li>Early removal of virtual methods reduce size of object files and
> +	  improve link-time memory usage and compile time.</li>

"reduces the size of object files and improves"

> +      <li>Functions are no longer pointlessly renamed.</li>

Readers may struggle a bit with this.  What does it refer to?

> +      <li>Function bodies are now loaded on-demand and released early improving
> +	  overall memory usage at link-time.</li>

"link time"

> +    </ul>
> +    Memory usage of Firefox build with debug enabled was reduced from 15GB to
> +    3.5GB. Link time from 1700 seconds to 350 seconds.

"Memory usage building Firefox" 

Perhaps specify the version number?

If you want to write the second sentence without a verb, I suggest

"; link time..."

> +      <li>New type inheritance analysis module improving devirtualization.
> +	  Devirtualization now take into account anonymous name-spaces and the
> +	  C++11 <code>final</code> keyword.</li>

"takes into account"

> +      <li>Calls that was speculatively made direct are turned back to indirect
> +	  when doing so does not bring any noticeable benefits.</li>

"was" -> "were"

Also, this sentence is a bit confusing.  It can be read as saying that
turning the calls back does not bring noticeable benefits.

> +      <li>Local aliases are introduced for symbols that are known to be
> +	  semantically equivalent across shared libraries improving dynamic
> +	  linking times.</li>

", improving"

> +      <li>New time profiling determine typical order in which functions are executed.</li>

"determines", and can you break the long line?

> +      <li>New function reordering pass (controlled by
> +	  <code>-freorder-functions</code>) significantly reduces
> +	  startup time of large applications.  Until binutils support is
> + 	  completed, it is effective only with link time optimization.</li>

"A new function reordering..." 

"link-time optimization"

> +    <li>Better inlining of <code>memcpy</code> and <code>memset</code> 
> +	that is avare of value ranges and produce shorter alignment prologues.

"aware"

"produces"

> +      for portions of program optimized for size.</li>

"of programs"

Cheers,
Gerald

Jan Hubicka Nov. 28, 2013, 4:54 p.m. UTC | #2
> > +      <li>Functions are no longer pointlessly renamed.</li>
> 
> Readers may struggle a bit with this.  What does it refer to?

We previously renamed every static function foo into foo.1234
(just as a precaution, because another compilation unit may also have a
function foo).  This confuses many things, so now we rename only when we
see a conflict.
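
To illustrate the situation, here is a minimal sketch; the file names and
the helper call_a are hypothetical and serve only as an example.  A static
function has internal linkage, so each compilation unit may have its own foo.
The second unit is shown only in comments so that the block compiles alone.

```cpp
// unit_a.cpp (hypothetical file name)
// `foo` is static, so it has internal linkage: it is visible only inside
// this compilation unit.
static int foo() { return 1; }

// Externally visible entry point that uses this unit's private foo.
int call_a() { return foo(); }

// A second, separate file (say unit_b.cpp, also hypothetical) may define
// its own unrelated
//     static int foo() { return 2; }
// without any conflict in a normal build.  Only when LTO brings both
// units together can the two names clash, and that is the case in which
// a rename such as foo.1234 is still done.
```

With only one such unit in the link there is no conflict, so foo would keep
its original name.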

I am attaching the changes I committed.

I dropped this from the news changes.  In the meantime we merged in the change
enabling slim LTO files by default; what about:

Index: changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.41
diff -c -p -r1.41 changes.html
*** changes.html	28 Nov 2013 15:05:51 -0000	1.41
--- changes.html	28 Nov 2013 16:53:37 -0000
***************
*** 15,20 ****
--- 15,25 ----
  <h2>Caveats</h2>
  
    <ul>
+     <li>Because <code>-fno-fat-lto-objects</code> is now by default,
+ 	<code>gcc-ar</code> and <code>gcc-nm</code> wrappers needs
+ 	to be used to handle objects compiled with <code>-flto</code>.
+ 	Additionally the resulting binary needs to be linked with
+ 	<code>-flto</code> (and appropriate optimization flags).</li>
      <li><p>Support for a number of older systems and recently
      unmaintained or untested target ports of GCC has been declared
      obsolete in GCC 4.9.  Unless there is activity to revive them, the
***************
*** 45,50 ****
--- 50,61 ----
      </li>
      <li>Link-time optimization (LTO) improvements:
      <ul>
+       <li>Slim LTO objects are now used by default.  This means that with
+ 	  <code>-flto</code> GCC will no longer produce non-LTO optimized binary
+ 	  in addition to storing object representation in the intermediate
+ 	  language. Consequently <code>-flto</code> no longer causes everything
+ 	  to be optimized twice (once at compile time and again during link time).
+ 	  This feature can be controlled by <code>-ffat-lto-objects</code>.</li>
        <li>Type merging was rewritten. The new implementation is significantly faster
  	  and uses less memory. 
        <li>Better partitioning algorithm resulting in less streaming during
Index: changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.40
diff -r1.40 changes.html
40,41d39
<   </ul>
<   <ul>
47a46,85
>     <li>Link-time optimization (LTO) improvements:
>     <ul>
>       <li>Type merging was rewritten. The new implementation is significantly faster
> 	  and uses less memory. 
>       <li>Better partitioning algorithm resulting in less streaming during
> 	  link time.</li>
>       <li>Early removal of virtual methods reduces the size of object files and
> 	  improves link-time memory usage and compile time.</li>
>       <li>Function bodies are now loaded on-demand and released early improving
> 	  overall memory usage at link time.</li>
>       <li>C++ hidden keyed methods can now be optimized out.</li>
>     </ul>
>     Memory usage building Firefox with debug enabled was reduced from 15GB to
>     3.5GB; link time from 1700 seconds to 350 seconds.
>     </li>
>     <li>Inter-procedural optimization improvements:
>     <ul>
>       <li>New type inheritance analysis module improving devirtualization.
> 	  Devirtualization now takes into account anonymous name-spaces and the
> 	  C++11 <code>final</code> keyword.</li>
>       <li>New speculative devirtualization pass (controlled by
> 	  <code>-fdevirtualize-speculatively</code>.</li>
>       <li>Calls that were speculatively made direct are turned back to indirect
> 	  where direct call is not cheaper.</li>
>       <li>Local aliases are introduced for symbols that are known to be
> 	  semantically equivalent across shared libraries improving dynamic
> 	  linking times.</li>
>     </ul>
>     <li>Feedback directed optimization improvements:
>     <ul>
>       <li>Profiling of programs using C++ inline functions is now more reliable.</li>
>       <li>New time profiling determines typical order in which functions are
> 	  executed.</li>
>       <li>A new function reordering pass (controlled by
> 	  <code>-freorder-functions</code>) significantly reduces
> 	  startup time of large applications.  Until binutils support is
>  	  completed, it is effective only with link-time optimization.</li>
>       <li>Feedback driven indirect call removal and devirtualization now handle
> 	  cross-module calls when link-time optimization is enabled.</li>
>     </ul></li>
337c375
<     <li> GCC now supports the new Intel microarchitecture named Silvermont
---
>     <li>GCC now supports the new Intel microarchitecture named Silvermont
339a378,388
>     <li><code>-march=generic</code> has been retuned for better support of
>       Intel core and AMD Bulldozer architectures.  Performance of AMD K7, K8,
>       Intel Pentium-M, and Pentium4 based CPUs is no longer considered important
>       for generic.
>     </li>
>     <li>Better inlining of <code>memcpy</code> and <code>memset</code> 
> 	that is aware of value ranges and produces shorter alignment prologues.
>     </li>
>     <li><code>-mno-accumulate-outgoing-args</code> is now honored when unwind
>       information is output.  Argument accumulation is also now turned off
>       for portions of programs optimized for size.</li>
Patch

Index: changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.36
diff -u -r1.36 changes.html
--- changes.html	15 Nov 2013 15:40:00 -0000	1.36
+++ changes.html	18 Nov 2013 16:15:32 -0000
@@ -37,14 +37,52 @@ 
   <ul>
     <li>AddressSanitizer, a fast memory error detector, is now available on ARM.
     </li>
-  </ul>
-  <ul>
     <li>UndefinedBehaviorSanitizer (ubsan), a fast undefined behavior detector,
         has been added and can be enabled via <code>-fsanitize=undefined</code>.
 	Various computations will be instrumented to detect undefined behavior
 	at runtime.  UndefinedBehaviorSanitizer is currently available for the C
 	and C++ languages.
     </li>
+    <li>Link-time optimization (LTO) improvements:
+    <ul>
+      <li>Type merging was rewritten. New implementation is significantly faster
+	  and use less memory. 
+      <li>Better partitioning algorithm resulting in less streaming during
+	  link-time.</li>
+      <li>Early removal of virtual methods reduce size of object files and
+	  improve link-time memory usage and compile time.</li>
+      <li>Functions are no longer pointlessly renamed.</li>
+      <li>Function bodies are now loaded on-demand and released early improving
+	  overall memory usage at link-time.</li>
+      <li>C++ hidden keyed methods can now be optimized out.</li>
+    </ul>
+    Memory usage of Firefox build with debug enabled was reduced from 15GB to
+    3.5GB. Link time from 1700 seconds to 350 seconds.
+    </li>
+    <li>Inter-procedural optimization improvements:
+    <ul>
+      <li>New type inheritance analysis module improving devirtualization.
+	  Devirtualization now take into account anonymous name-spaces and the
+	  C++11 <code>final</code> keyword.</li>
+      <li>New speculative devirtualization pass (controlled by
+	  <code>-fdevirtualize-speculatively</code>.</li>
+      <li>Calls that was speculatively made direct are turned back to indirect
+	  when doing so does not bring any noticeable benefits.</li>
+      <li>Local aliases are introduced for symbols that are known to be
+	  semantically equivalent across shared libraries improving dynamic
+	  linking times.</li>
+    </ul>
+    <li>Feedback directed optimization improvements:
+    <ul>
+      <li>Profiling of programs using C++ inline functions is now more reliable.</li>
+      <li>New time profiling determine typical order in which functions are executed.</li>
+      <li>New function reordering pass (controlled by
+	  <code>-freorder-functions</code>) significantly reduces
+	  startup time of large applications.  Until binutils support is
+ 	  completed, it is effective only with link time optimization.</li>
+      <li>Feedback driven indirect call removal and devirtualization now handle
+	  cross-module calls when link-time optimization is enabled.</li>
+    </ul></li>
   </ul>
 
 <h2 id="languages">New Languages and Language specific improvements</h2>
@@ -325,9 +363,20 @@ 
       href="http://gcc.gnu.org/onlinedocs/gcc/Function-Multiversioning.html"
       >Function Multiversioning</a>.
     </li>
-    <li> GCC now supports the new Intel microarchitecture named Silvermont
+    <li>GCC now supports the new Intel microarchitecture named Silvermont
       through <code>-march=slm</code>.
     </li>
+    <li><code>-march=generic</code> has been retuned for better support of
+      Intel core and AMD Bulldozer architectures.  Performance of AMD K7, K8,
+      Intel Pentium-M, and Pentium4 based CPUs is no longer considered important
+      for generic.
+    </li>
+    <li>Better inlining of <code>memcpy</code> and <code>memset</code> 
+	that is avare of value ranges and produce shorter alignment prologues.
+    </li>
+    <li><code>-mno-accumulate-outgoing-args</code> is now honored when unwind
+      information is output.  Argument accumulation is also now turned off
+      for portions of program optimized for size.</li>
   </ul>
 <h3 id="nds32">NDS32</h3>
   <ul>