
Do not decompress functions sections when copying them to ltrans

Message ID 20151211011916.GA5527@kam.mff.cuni.cz
State New

Commit Message

Jan Hubicka Dec. 11, 2015, 1:19 a.m. UTC
Hi,
this patch makes WPA copy sections without decompressing them.  This leads
to a nice reduction of /tmp usage for GCC bootstrap (about 70%) and a small
one for Firefox.  In GCC about 5% of the ltrans object file is the global decl
section, while for Firefox it is 85%.  I will try to figure out if there is
something terribly stupid pickled there.

The patch simply adds raw section I/O to lto-section-in.c and lto-section-out.c,
which is used by copy_function_or_variable.  The catch is that WPA->ltrans
streaming is not compressed and this fact is not represented in the object file
at all: we simply test flag_wpa and flag_ltrans.  With raw copying, function
sections born at WPA time are uncompressed, while function sections that are
just copied stay compressed, so the reader does not know how to read them.

I tried to simply turn off the non-compressed path and set the compression level
to minimal and then to none (which works despite the apparently outdated FIXME
comments I removed).  Sadly, zlib manages to burn about 16% of WPA time
at the minimal level and about 7% at none, because it still computes the
checksum.  Clearly, next stage1 it is time to switch to a better compression
backend.

For now I added the information whether a section is compressed into
decl_state.  I am not thrilled by this, but it is the only way I found without
wasting 4 bytes per LTO section (because the LTO header is not really
extensible and the stream is assumed to be aligned).

The whole low-level LTO streaming code is a grand mess; I hope we will clean it
up and get saner headers in the foreseeable future.  Until then, this
solution does not waste extra space, as it is easy to pickle the flag as part
of the reference.

The patch saves about 7% of WPA time for Firefox:

 phase opt and generate  :  75.66 (39%) usr   1.78 (14%) sys  77.44 (37%) wall  855644 kB (21%) ggc
 phase stream in         :  34.62 (18%) usr   1.95 (16%) sys  36.57 (18%) wall 3245604 kB (79%) ggc
 phase stream out        :  81.89 (42%) usr   8.49 (69%) sys  90.37 (44%) wall      50 kB ( 0%) ggc
 ipa dead code removal   :   4.33 ( 2%) usr   0.06 ( 0%) sys   4.24 ( 2%) wall       0 kB ( 0%) ggc
 ipa virtual call target :  25.15 (13%) usr   0.14 ( 1%) sys  25.42 (12%) wall       0 kB ( 0%) ggc
 ipa cp                  :   3.92 ( 2%) usr   0.21 ( 2%) sys   4.18 ( 2%) wall  340698 kB ( 8%) ggc
 ipa inlining heuristics :  24.12 (12%) usr   0.38 ( 3%) sys  24.37 (12%) wall  500427 kB (12%) ggc
 lto stream inflate      :   7.07 ( 4%) usr   0.38 ( 3%) sys   7.33 ( 4%) wall       0 kB ( 0%) ggc
 ipa lto gimple in       :   1.95 ( 1%) usr   0.61 ( 5%) sys   2.42 ( 1%) wall  324875 kB ( 8%) ggc
 ipa lto gimple out      :   9.16 ( 5%) usr   1.64 (13%) sys  10.49 ( 5%) wall      50 kB ( 0%) ggc
 ipa lto decl in         :  21.25 (11%) usr   1.01 ( 8%) sys  22.37 (11%) wall 2348869 kB (57%) ggc
 ipa lto decl out        :  67.33 (34%) usr   1.66 (13%) sys  68.96 (33%) wall       0 kB ( 0%) ggc
 ipa lto constructors out:   1.39 ( 1%) usr   0.38 ( 3%) sys   2.18 ( 1%) wall       0 kB ( 0%) ggc
 ipa lto decl merge      :   2.12 ( 2%) usr   0.00 ( 0%) sys   2.12 ( 2%) wall   13737 kB ( 0%) ggc
 ipa reference           :   2.14 ( 2%) usr   0.00 ( 0%) sys   2.13 ( 2%) wall       0 kB ( 0%) ggc
 ipa pure const          :   2.29 ( 2%) usr   0.01 ( 0%) sys   2.35 ( 2%) wall       0 kB ( 0%) ggc
 ipa icf                 :   9.02 ( 7%) usr   0.18 ( 2%) sys   9.72 ( 7%) wall   19203 kB ( 0%) ggc
 TOTAL                 : 195.27            12.37           207.64            4103297 kB
                                                                                
                                                                                
 phase opt and generate  :  79.00 (38%) usr   1.61 (13%) sys  80.61 (36%) wall 1000597 kB (24%) ggc
 phase stream in         :  33.93 (16%) usr   1.91 (15%) sys  35.83 (16%) wall 3242293 kB (76%) ggc
 phase stream out        :  96.90 (46%) usr   9.19 (72%) sys 106.09 (48%) wall      52 kB ( 0%) ggc
 garbage collection      :   2.94 ( 1%) usr   0.00 ( 0%) sys   2.93 ( 1%) wall       0 kB ( 0%) ggc
 ipa dead code removal   :   4.60 ( 2%) usr   0.04 ( 0%) sys   4.53 ( 2%) wall       0 kB ( 0%) ggc
 ipa virtual call target :  24.48 (12%) usr   0.14 ( 1%) sys  24.76 (11%) wall       0 kB ( 0%) ggc
 ipa cp                  :   4.92 ( 2%) usr   0.41 ( 3%) sys   5.31 ( 2%) wall  502843 kB (12%) ggc
 ipa inlining heuristics :  23.72 (11%) usr   0.23 ( 2%) sys  23.92 (11%) wall  490927 kB (12%) ggc
 lto stream inflate      :  14.35 ( 7%) usr   0.35 ( 3%) sys  15.22 ( 7%) wall       0 kB ( 0%) ggc
 ipa lto gimple in       :   1.79 ( 1%) usr   0.57 ( 4%) sys   2.46 ( 1%) wall  324857 kB ( 8%) ggc
 ipa lto gimple out      :   9.98 ( 5%) usr   1.45 (11%) sys  11.05 ( 5%) wall      52 kB ( 0%) ggc
 ipa lto decl in         :  21.01 (10%) usr   0.91 ( 7%) sys  21.90 (10%) wall 2345561 kB (55%) ggc
 ipa lto decl out        :  73.55 (35%) usr   2.09 (16%) sys  75.67 (34%) wall       0 kB ( 0%) ggc
 ipa lto constructors out:   1.87 ( 1%) usr   0.32 ( 3%) sys   2.18 ( 1%) wall       0 kB ( 0%) ggc
 ipa lto decl merge      :   2.06 ( 1%) usr   0.00 ( 0%) sys   2.05 ( 1%) wall   13737 kB ( 0%) ggc
 whopr wpa I/O           :   2.84 ( 1%) usr   5.14 (40%) sys   7.96 ( 4%) wall       0 kB ( 0%) ggc
 whopr partitioning      :   3.83 ( 2%) usr   0.01 ( 0%) sys   3.84 ( 2%) wall    5958 kB ( 0%) ggc
 ipa reference           :   2.63 ( 1%) usr   0.00 ( 0%) sys   2.64 ( 1%) wall       0 kB ( 0%) ggc
 ipa icf                 :   8.23 ( 4%) usr   0.12 ( 1%) sys   8.32 ( 4%) wall   19203 kB ( 0%) ggc
 TOTAL                 : 209.83            12.71           222.54            4244939 kB
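As a rough check of the "about 7%" figure, reading the first table as the patched run and the second as the baseline (an assumption, suggested by lto stream inflate dropping from 14.35 s to 7.07 s), the TOTAL lines give:

$$\frac{209.83 - 195.27}{209.83} = \frac{14.56}{209.83} \approx 6.9\%\ \text{(usr)}, \qquad \frac{222.54 - 207.64}{222.54} = \frac{14.90}{222.54} \approx 6.7\%\ \text{(wall)}$$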

This now compares well to GCC 5.3:

Execution times (seconds)                                                       
 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1989 kB ( 0%) ggc
 phase opt and generate  :  68.61 (31%) usr   2.41 (14%) sys  77.67 (29%) wall 1189579 kB (27%) ggc
 phase stream in         :  36.38 (16%) usr   2.32 (14%) sys  56.20 (21%) wall 3168787 kB (73%) ggc
 phase stream out        : 113.37 (51%) usr  11.90 (71%) sys 130.49 (49%) wall     112 kB ( 0%) ggc
 phase finalize          :   3.40 ( 2%) usr   0.13 ( 1%) sys   3.55 ( 1%) wall       0 kB ( 0%) ggc
 garbage collection      :   6.13 ( 3%) usr   0.01 ( 0%) sys   6.18 ( 2%) wall       0 kB ( 0%) ggc
 ipa dead code removal   :   4.74 ( 2%) usr   0.05 ( 0%) sys   5.09 ( 2%) wall       0 kB ( 0%) ggc
 ipa virtual call target :  11.29 ( 5%) usr   0.15 ( 1%) sys  11.20 ( 4%) wall       1 kB ( 0%) ggc
 ipa cp                  :   5.22 ( 2%) usr   0.21 ( 1%) sys   5.51 ( 2%) wall  507623 kB (12%) ggc
 ipa inlining heuristics :  24.11 (11%) usr   0.33 ( 2%) sys  24.67 ( 9%) wall  497487 kB (11%) ggc
 ipa lto gimple in       :   4.20 ( 2%) usr   1.08 ( 6%) sys  10.73 ( 4%) wall  467276 kB (11%) ggc
 ipa lto gimple out      :  17.57 ( 8%) usr   1.92 (11%) sys  23.61 ( 9%) wall     112 kB ( 0%) ggc
 ipa lto decl in         :  26.19 (12%) usr   1.20 ( 7%) sys  31.62 (12%) wall 2242394 kB (51%) ggc
 ipa lto decl out        :  89.09 (40%) usr   3.64 (22%) sys  92.79 (35%) wall       0 kB ( 0%) ggc
 ipa lto constructors in :   0.79 ( 0%) usr   0.28 ( 2%) sys  14.33 ( 5%) wall   17992 kB ( 0%) ggc
 ipa lto constructors out:   2.57 ( 1%) usr   0.41 ( 2%) sys   4.02 ( 2%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   1.11 ( 1%) usr   0.33 ( 2%) sys   1.81 ( 1%) wall  432544 kB (10%) ggc
 ipa lto decl merge      :   2.47 ( 1%) usr   0.00 ( 0%) sys   2.47 ( 1%) wall    8191 kB ( 0%) ggc
 ipa lto cgraph merge    :   1.91 ( 1%) usr   0.01 ( 0%) sys   1.97 ( 1%) wall   14717 kB ( 0%) ggc
 whopr wpa I/O           :   2.92 ( 1%) usr   5.93 (35%) sys   8.84 ( 3%) wall       0 kB ( 0%) ggc
 whopr partitioning      :   3.91 ( 2%) usr   0.02 ( 0%) sys   3.93 ( 1%) wall    6001 kB ( 0%) ggc
 ipa icf                 :   7.77 ( 4%) usr   0.19 ( 1%) sys   8.05 ( 3%) wall   22534 kB ( 1%) ggc
 TOTAL                 : 221.76            16.76           267.92            4360470 kB

The exception is ipa virtual call target, which I really need to do something
about: as the quality of the information improved thanks to the improved TBAA,
we now do more walks.

The savings for the cc1 build are bigger, and the improvements for incremental
linking even bigger (about 50%), but I accidentally removed the logs...

lto-bootstrapped/regtested x86_64-linux, OK?

	* cgraph.c (cgraph_node::get_untransformed_body): Pass compressed
	flag to lto_get_section_data.
	* varpool.c (varpool_node::get_constructor): Likewise.
	* lto-section-in.c (lto_get_section_data): Add new flag decompress.
	(lto_free_section_data): Likewise.
	(lto_get_raw_section_data): New function.
	(lto_free_raw_section_data): New function.
	* lto-streamer-out.c (copy_function_or_variable): Copy sections
	w/o decompressing.
	(lto_output_decl_state_refs): Pickle compressed bit.
	* lto-streamer.h (lto_in_decl_state): New flag compressed.
	(lto_out_decl_state): Likewise.
	(lto_get_section_data, lto_free_section_data): Update prototypes.
	(lto_get_raw_section_data, lto_free_raw_section_data): Declare.
	(lto_write_raw_data): Declare.
	* lto-section-out.c (lto_begin_section): Remove FIXME.
	(lto_write_raw_data): New function.
	(lto_write_stream): Remove FIXME.
	(lto_new_out_decl_state): Set compressed flag.

	* lto.c (lto_read_in_decl_state): Unpickle compressed bit.

Comments

Richard Biener Dec. 11, 2015, 9:27 a.m. UTC | #1
On Fri, 11 Dec 2015, Jan Hubicka wrote:

> Hi,
> this patch makes WPA copy sections w/o decompressing them.  This leads
> to a nice /tmp usage for GCC bootstrap (about 70%) and little for Firefox.
> In GCC about 5% of the ltrans object file is the global decl section, while
> for Firefox it is 85%.  I will try to figure out if there is something
> terribly stupid pickled there.
> 
> The patch simply adds raw section i/o to lto-section-in.c and lto-section-out.c
> which is used by copy_function_or_variable.  The catch is that WPA->ltrans
> streaming is not compressed and this fact is not represented in the object file
> at all.  We simply test flag_wpa and flag_ltrans.  Now function sections born
> at WPA time are uncompressed, while function sections just copied are
> compressed and we do not know how to read them.
> 
> I tried to simply turn off the non-compressed path and set compression level
> to minimal and then to none (which works despite the apparently outdated FIXME
> comments I removed).  Sadly zlib manages to burn about 16% of WPA time
> at minimal level and about 7% at none because it computes the checksum. Clearly
> next stage1 it is time to switch to better compression backend.
> 
> For now I added the information if section is compressed into 
> decl_state.  I am not thrilled by this but it is only way I found w/o 
> wasting 4 bytes per every lto section (because the lto header is not 
> really extensible and the stream is assumed to be aligned).

So this trick now only applies to decl sections?  I think you
could have stolen a bit from lto_simple_header::main_size
(oddly, lto_simple_header_with_strings adds its own main_size,
hiding the simple-header one - huh).

Changing lto_header itself into

  int16_t major_version
  int8_t minor_version
  int8_t flags

would be another possibility (and bump the major version).  I think
we have no sections produced with just lto_header but always
lto_simple_header (from grepping).  Some sections have no header
(lto.opts).

So would the patch be a lot more difficult if you go down either of
the routes above?  (I think I prefer changing lto_header rather
than making main_size a bitfield)

Richard.

> Index: cgraph.c
> ===================================================================
> --- cgraph.c	(revision 231546)
> +++ cgraph.c	(working copy)
> @@ -3251,9 +3251,11 @@ cgraph_node::get_untransformed_body (voi
>  
>    /* We may have renamed the declaration, e.g., a static function.  */
>    name = lto_get_decl_name_mapping (file_data, name);
> +  struct lto_in_decl_state *decl_state
> +	 = lto_get_function_in_decl_state (file_data, decl);
>  
>    data = lto_get_section_data (file_data, LTO_section_function_body,
> -			       name, &len);
> +			       name, &len, decl_state->compressed);
>    if (!data)
>      fatal_error (input_location, "%s: section %s is missing",
>  		 file_data->file_name,
> @@ -3264,7 +3266,7 @@ cgraph_node::get_untransformed_body (voi
>    lto_input_function_body (file_data, this, data);
>    lto_stats.num_function_bodies++;
>    lto_free_section_data (file_data, LTO_section_function_body, name,
> -			 data, len);
> +			 data, len, decl_state->compressed);
>    lto_free_function_in_decl_state_for_node (this);
>    /* Keep lto file data so ipa-inline-analysis knows about cross module
>       inlining.  */
> Index: lto-section-in.c
> ===================================================================
> --- lto-section-in.c	(revision 231546)
> +++ lto-section-in.c	(working copy)
> @@ -130,7 +130,7 @@ const char *
>  lto_get_section_data (struct lto_file_decl_data *file_data,
>  		      enum lto_section_type section_type,
>  		      const char *name,
> -		      size_t *len)
> +		      size_t *len, bool decompress)
>  {
>    const char *data = (get_section_f) (file_data, section_type, name, len);
>    const size_t header_length = sizeof (struct lto_data_header);
> @@ -142,9 +142,10 @@ lto_get_section_data (struct lto_file_de
>    if (data == NULL)
>      return NULL;
>  
> -  /* FIXME lto: WPA mode does not write compressed sections, so for now
> -     suppress uncompression if flag_ltrans.  */
> -  if (!flag_ltrans)
> +  /* WPA->ltrans streams are not compressed, with the exception of function
> +     bodies and variable initializers that have been copied verbatim from
> +     earlier compilations.  */
> +  if (!flag_ltrans || decompress)
>      {
>        /* Create a mapping header containing the underlying data and length,
>  	 and prepend this to the uncompression buffer.  The uncompressed data
> @@ -170,6 +171,16 @@ lto_get_section_data (struct lto_file_de
>    return data;
>  }
>  
> +/* Get the section data without any header parsing or uncompression.  */
> +
> +const char *
> +lto_get_raw_section_data (struct lto_file_decl_data *file_data,
> +			  enum lto_section_type section_type,
> +			  const char *name,
> +			  size_t *len)
> +{
> +  return (get_section_f) (file_data, section_type, name, len);
> +}
>  
>  /* Free the data found from the above call.  The first three
>     parameters are the same as above.  DATA is the data to be freed and
> @@ -180,7 +191,7 @@ lto_free_section_data (struct lto_file_d
>  		       enum lto_section_type section_type,
>  		       const char *name,
>  		       const char *data,
> -		       size_t len)
> +		       size_t len, bool decompress)
>  {
>    const size_t header_length = sizeof (struct lto_data_header);
>    const char *real_data = data - header_length;
> @@ -189,9 +200,7 @@ lto_free_section_data (struct lto_file_d
>  
>    gcc_assert (free_section_f);
>  
> -  /* FIXME lto: WPA mode does not write compressed sections, so for now
> -     suppress uncompression mapping if flag_ltrans.  */
> -  if (flag_ltrans)
> +  if (flag_ltrans && !decompress)
>      {
>        (free_section_f) (file_data, section_type, name, data, len);
>        return;
> @@ -203,6 +212,17 @@ lto_free_section_data (struct lto_file_d
>    free (CONST_CAST (char *, real_data));
>  }
>  
> +/* Free data allocated by lto_get_raw_section_data.  */
> +
> +void
> +lto_free_raw_section_data (struct lto_file_decl_data *file_data,
> +		           enum lto_section_type section_type,
> +		           const char *name,
> +		           const char *data,
> +		           size_t len)
> +{
> +  (free_section_f) (file_data, section_type, name, data, len);
> +}
>  
>  /* Load a section of type SECTION_TYPE from FILE_DATA, parse the
>     header and then return an input block pointing to the section.  The
> Index: varpool.c
> ===================================================================
> --- varpool.c	(revision 231546)
> +++ varpool.c	(working copy)
> @@ -296,9 +303,11 @@ varpool_node::get_constructor (void)
>  
>    /* We may have renamed the declaration, e.g., a static function.  */
>    name = lto_get_decl_name_mapping (file_data, name);
> +  struct lto_in_decl_state *decl_state
> +	 = lto_get_function_in_decl_state (file_data, decl);
>  
>    data = lto_get_section_data (file_data, LTO_section_function_body,
> -			       name, &len);
> +			       name, &len, decl_state->compressed);
>    if (!data)
>      fatal_error (input_location, "%s: section %s is missing",
>  		 file_data->file_name,
> @@ -308,7 +317,7 @@ varpool_node::get_constructor (void)
>    gcc_assert (DECL_INITIAL (decl) != error_mark_node);
>    lto_stats.num_function_bodies++;
>    lto_free_section_data (file_data, LTO_section_function_body, name,
> -			 data, len);
> +			 data, len, decl_state->compressed);
>    lto_free_function_in_decl_state_for_node (this);
>    timevar_pop (TV_IPA_LTO_CTORS_IN);
>    return DECL_INITIAL (decl);
> Index: lto-streamer-out.c
> ===================================================================
> --- lto-streamer-out.c	(revision 231546)
> +++ lto-streamer-out.c	(working copy)
> @@ -2191,22 +2224,23 @@ copy_function_or_variable (struct symtab
>    struct lto_in_decl_state *in_state;
>    struct lto_out_decl_state *out_state = lto_get_out_decl_state ();
>  
> -  lto_begin_section (section_name, !flag_wpa);
> +  lto_begin_section (section_name, false);
>    free (section_name);
>  
>    /* We may have renamed the declaration, e.g., a static function.  */
>    name = lto_get_decl_name_mapping (file_data, name);
>  
> -  data = lto_get_section_data (file_data, LTO_section_function_body,
> -                               name, &len);
> +  data = lto_get_raw_section_data (file_data, LTO_section_function_body,
> +                                   name, &len);
>    gcc_assert (data);
>  
>    /* Do a bit copy of the function body.  */
> -  lto_write_data (data, len);
> +  lto_write_raw_data (data, len);
>  
>    /* Copy decls. */
>    in_state =
>      lto_get_function_in_decl_state (node->lto_file_data, function);
>    gcc_assert (in_state);
> +  out_state->compressed = in_state->compressed;
>  
>    for (i = 0; i < LTO_N_DECL_STREAMS; i++)
> @@ -2224,8 +2258,8 @@ copy_function_or_variable (struct symtab
>  	encoder->trees.safe_push ((*trees)[j]);
>      }
>  
> -  lto_free_section_data (file_data, LTO_section_function_body, name,
> -			 data, len);
> +  lto_free_raw_section_data (file_data, LTO_section_function_body, name,
> +			     data, len);
>    lto_end_section ();
>  }
>  
> @@ -2431,6 +2465,7 @@ lto_output_decl_state_refs (struct outpu
>    decl = (state->fn_decl) ? state->fn_decl : void_type_node;
>    streamer_tree_cache_lookup (ob->writer_cache, decl, &ref);
>    gcc_assert (ref != (unsigned)-1);
> +  ref = ref * 2 + (state->compressed ? 1 : 0);
>    lto_write_data (&ref, sizeof (uint32_t));
>  
>    for (i = 0;  i < LTO_N_DECL_STREAMS; i++)
> Index: lto/lto-symtab.c
> ===================================================================
> --- lto/lto-symtab.c	(revision 231548)
> +++ lto/lto-symtab.c	(working copy)
> @@ -883,6 +883,11 @@ lto_symtab_merge_symbols_1 (symtab_node
>  	  else
>  	    {
>  	      DECL_INITIAL (e->decl) = error_mark_node;
> +	      if (e->lto_file_data)
> +		{
> +		  lto_free_function_in_decl_state_for_node (e);
> +		  e->lto_file_data = NULL;
> +		}
>  	      symtab->call_varpool_removal_hooks (dyn_cast<varpool_node *> (e));
>  	    }
>  	  e->remove_all_references ();
> Index: lto/lto.c
> ===================================================================
> --- lto/lto.c	(revision 231546)
> +++ lto/lto.c	(working copy)
> @@ -234,6 +234,8 @@ lto_read_in_decl_state (struct data_in *
>    uint32_t i, j;
>  
>    ix = *data++;
> +  state->compressed = ix & 1;
> +  ix /= 2;
>    decl = streamer_tree_cache_get_tree (data_in->reader_cache, ix);
>    if (!VAR_OR_FUNCTION_DECL_P (decl))
>      {
> Index: lto-streamer.h
> ===================================================================
> --- lto-streamer.h	(revision 231546)
> +++ lto-streamer.h	(working copy)
> @@ -504,6 +505,9 @@ struct GTY((for_user)) lto_in_decl_state
>    /* If this in-decl state is associated with a function. FN_DECL
>       point to the FUNCTION_DECL. */
>    tree fn_decl;
> +
> +  /* True if decl state is compressed.  */
> +  bool compressed;
>  };
>  
>  typedef struct lto_in_decl_state *lto_in_decl_state_ptr;
> @@ -537,6 +541,9 @@ struct lto_out_decl_state
>    /* If this out-decl state belongs to a function, fn_decl points to that
>       function.  Otherwise, it is NULL. */
>    tree fn_decl;
> +
> +  /* True if decl state is compressed.  */
> +  bool compressed;
>  };
>  
>  typedef struct lto_out_decl_state *lto_out_decl_state_ptr;
> @@ -761,10 +768,18 @@ extern void lto_set_in_hooks (struct lto
>  extern struct lto_file_decl_data **lto_get_file_decl_data (void);
>  extern const char *lto_get_section_data (struct lto_file_decl_data *,
>  					 enum lto_section_type,
> -					 const char *, size_t *);
> +					 const char *, size_t *,
> +					 bool decompress = false);
> +extern const char *lto_get_raw_section_data (struct lto_file_decl_data *,
> +					     enum lto_section_type,
> +					     const char *, size_t *);
>  extern void lto_free_section_data (struct lto_file_decl_data *,
> -				   enum lto_section_type,
> -				   const char *, const char *, size_t);
> +			           enum lto_section_type,
> +				   const char *, const char *, size_t,
> +				   bool decompress = false);
> +extern void lto_free_raw_section_data (struct lto_file_decl_data *,
> +				       enum lto_section_type,
> +				       const char *, const char *, size_t);
>  extern htab_t lto_create_renaming_table (void);
>  extern void lto_record_renamed_decl (struct lto_file_decl_data *,
>  				     const char *, const char *);
> @@ -785,6 +800,7 @@ extern void lto_value_range_error (const
>  extern void lto_begin_section (const char *, bool);
>  extern void lto_end_section (void);
>  extern void lto_write_data (const void *, unsigned int);
> +extern void lto_write_raw_data (const void *, unsigned int);
>  extern void lto_write_stream (struct lto_output_stream *);
>  extern bool lto_output_decl_index (struct lto_output_stream *,
>  			    struct lto_tree_ref_encoder *,
> Index: lto-section-out.c
> ===================================================================
> --- lto-section-out.c	(revision 231546)
> +++ lto-section-out.c	(working copy)
> @@ -66,9 +66,6 @@ lto_begin_section (const char *name, boo
>  {
>    lang_hooks.lto.begin_section (name);
>  
> -  /* FIXME lto: for now, suppress compression if the lang_hook that appends
> -     data is anything other than assembler output.  The effect here is that
> -     we get compression of IL only in non-ltrans object files.  */
>    gcc_assert (compression_stream == NULL);
>    if (compress)
>      compression_stream = lto_start_compression (lto_append_data, NULL);
> @@ -99,6 +96,14 @@ lto_write_data (const void *data, unsign
>      lang_hooks.lto.append_data ((const char *)data, size, NULL);
>  }
>  
> +/* Write SIZE bytes starting at DATA to the assembler.  */
> +
> +void
> +lto_write_raw_data (const void *data, unsigned int size)
> +{
> +  lang_hooks.lto.append_data ((const char *)data, size, NULL);
> +}
> +
>  /* Write all of the chars in OBS to the assembler.  Recycle the blocks
>     in obs as this is being done.  */
>  
> @@ -123,10 +128,6 @@ lto_write_stream (struct lto_output_stre
>        if (!next_block)
>  	num_chars -= obs->left_in_block;
>  
> -      /* FIXME lto: WPA mode uses an ELF function as a lang_hook to append
> -         output data.  This hook is not happy with the way that compression
> -         blocks up output differently to the way it's blocked here.  So for
> -         now, we don't compress WPA output.  */
>        if (compression_stream)
>  	lto_compress_block (compression_stream, base, num_chars);
>        else
> @@ -295,6 +296,9 @@ lto_new_out_decl_state (void)
>    for (i = 0; i < LTO_N_DECL_STREAMS; i++)
>      lto_init_tree_ref_encoder (&state->streams[i]);
>  
> +  /* At WPA time we do not compress sections by default.  */
> +  state->compressed = !flag_wpa;
> +
>    return state;
>  }
>  
> 
>
Jan Hubicka Dec. 11, 2015, 9:38 a.m. UTC | #2
> > For now I added the information if section is compressed into 
> > decl_state.  I am not thrilled by this but it is only way I found w/o 
> > wasting 4 bytes per every lto section (because the lto header is not 
> > really extensible and the stream is assumed to be aligned).
> 
> So this trick now only applies to decl sections?  I think you

Only function/variable sections are copied verbatim by WPA, so yes.
Everything else is re-streamed from scratch (and I do not see what else
could simply be copied through anyway).

> could have stolen a bit from lto_simple_header::main_size
> (oddly lto_simple_header_with_strings adds its own main_size,
> hiding the simple-header ones - huh).
> 
> Changing lto_header itself into
> 
>   int16_t major_version
>   int8_t minor_version
>   int8_t flags
> 
> would be another possibility (and bump the major version).  I think

This seems better to me: we can steal only a little from main_size, while
I think we can be quite fine with only 256 minor versions. I will update the patch.
> we have no sections produced with just lto_header but always
> lto_simple_header (from grepping).  Some sections have no header
> (lto.opts).

lto.opts is never compressed. Also, the symbol table used by lto-plugin
is written without headers.
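Richard's `lto_header` alternative can be sketched as follows. This is an illustration only: the struct and flag names are hypothetical, it assumes the existing header is two `int16_t` version fields, and it assumes the header bytes can be read before any decompression, which the thread later shows does not hold for zlib-compressed sections. The point is that narrowing the minor version to 8 bits frees a flags byte without growing the 4-byte header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical header layout following the proposal: the minor version
   shrinks to 8 bits (256 possible values) and the freed byte carries
   per-section flags.  Not GCC's actual struct lto_header.  */
struct lto_header_with_flags
{
  int16_t major_version;
  int8_t minor_version;
  int8_t flags;
};

/* Hypothetical flag bit: "payload of this section is compressed".  */
#define LTO_SECTION_COMPRESSED 0x1

/* Same size as two int16_t fields, so no section grows.  */
static_assert (sizeof (struct lto_header_with_flags) == 4,
               "header must stay 4 bytes");

/* Query the compression flag of header H.  */
static bool
lto_header_compressed_p (const struct lto_header_with_flags *h)
{
  return (h->flags & LTO_SECTION_COMPRESSED) != 0;
}
```

With this layout a reader would test the flag instead of consulting `flag_ltrans`; the cost is the major version bump Richard mentions, since old readers would misinterpret the minor version field.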
> 
> So would the patch be a lot more difficult if you go down either of
> the routes above?  (I think I prefer changing lto_header rather
> than making main_size a bitfield)

Agreed ;)

Honza
> 
> Richard.
> 
> > The whole low-level LTO streaming code is a grand mess; I hope we will clean
> > it up and get saner headers in the foreseeable future. Until then, this
> > solution does not waste extra space, as it is easy to pickle the flag as part
> > of the reference.
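A minimal sketch of the pickling trick: the compressed bit rides in the low bit of the 32-bit decl reference, mirroring `ref * 2 + (state->compressed ? 1 : 0)` in `lto_output_decl_state_refs` and `ix & 1` / `ix /= 2` in `lto_read_in_decl_state`. The helper names are hypothetical, and the overflow assertion is an addition the patch does not have: the trick halves the usable range of cache indices to 31 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Writer side: shift the streamer cache reference left by one and put
   the "compressed" flag in the freed low bit.  REF must fit in 31 bits
   so that doubling it cannot overflow.  */
static uint32_t
pack_decl_state_ref (uint32_t ref, bool compressed)
{
  assert (ref <= UINT32_MAX / 2);
  return ref * 2 + (compressed ? 1 : 0);
}

/* Reader side: recover the flag from the low bit and the cache index
   from the remaining bits.  */
static uint32_t
unpack_decl_state_ref (uint32_t ix, bool *compressed)
{
  *compressed = (ix & 1) != 0;
  return ix / 2;
}
```

For example, reference 5 with the compressed bit set is written as 11 and read back as index 5 with the flag set, so the flag costs no extra bytes in the stream.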
> > 
> > The patch saves about 7% of WPA time for Firefox:
> > 
> >  phase opt and generate  :  75.66 (39%) usr   1.78 (14%) sys  77.44 (37%) wall  855644 kB (21%) ggc
> >  phase stream in         :  34.62 (18%) usr   1.95 (16%) sys  36.57 (18%) wall 3245604 kB (79%) ggc
> >  phase stream out        :  81.89 (42%) usr   8.49 (69%) sys  90.37 (44%) wall      50 kB ( 0%) ggc
> >  ipa dead code removal   :   4.33 ( 2%) usr   0.06 ( 0%) sys   4.24 ( 2%) wall       0 kB ( 0%) ggc
> >  ipa virtual call target :  25.15 (13%) usr   0.14 ( 1%) sys  25.42 (12%) wall       0 kB ( 0%) ggc
> >  ipa cp                  :   3.92 ( 2%) usr   0.21 ( 2%) sys   4.18 ( 2%) wall  340698 kB ( 8%) ggc
> >  ipa inlining heuristics :  24.12 (12%) usr   0.38 ( 3%) sys  24.37 (12%) wall  500427 kB (12%) ggc
> >  lto stream inflate      :   7.07 ( 4%) usr   0.38 ( 3%) sys   7.33 ( 4%) wall       0 kB ( 0%) ggc
> >  ipa lto gimple in       :   1.95 ( 1%) usr   0.61 ( 5%) sys   2.42 ( 1%) wall  324875 kB ( 8%) ggc
> >  ipa lto gimple out      :   9.16 ( 5%) usr   1.64 (13%) sys  10.49 ( 5%) wall      50 kB ( 0%) ggc
> >  ipa lto decl in         :  21.25 (11%) usr   1.01 ( 8%) sys  22.37 (11%) wall 2348869 kB (57%) ggc
> >  ipa lto decl out        :  67.33 (34%) usr   1.66 (13%) sys  68.96 (33%) wall       0 kB ( 0%) ggc
> >  ipa lto constructors out:   1.39 ( 1%) usr   0.38 ( 3%) sys   2.18 ( 1%) wall       0 kB ( 0%) ggc
> >  ipa lto decl merge      :   2.12 ( 2%) usr   0.00 ( 0%) sys   2.12 ( 2%) wall   13737 kB ( 0%) ggc
> >  ipa reference           :   2.14 ( 2%) usr   0.00 ( 0%) sys   2.13 ( 2%) wall       0 kB ( 0%) ggc
> >  ipa pure const          :   2.29 ( 2%) usr   0.01 ( 0%) sys   2.35 ( 2%) wall       0 kB ( 0%) ggc
> >  ipa icf                 :   9.02 ( 7%) usr   0.18 ( 2%) sys   9.72 ( 7%) wall   19203 kB ( 0%) ggc
> >  TOTAL                 : 195.27            12.37           207.64            4103297 kB
> >                                                                                 
> >                                                                                 
> >  phase opt and generate  :  79.00 (38%) usr   1.61 (13%) sys  80.61 (36%) wall 1000597 kB (24%) ggc
> >  phase stream in         :  33.93 (16%) usr   1.91 (15%) sys  35.83 (16%) wall 3242293 kB (76%) ggc
> >  phase stream out        :  96.90 (46%) usr   9.19 (72%) sys 106.09 (48%) wall      52 kB ( 0%) ggc
> >  garbage collection      :   2.94 ( 1%) usr   0.00 ( 0%) sys   2.93 ( 1%) wall       0 kB ( 0%) ggc
> >  ipa dead code removal   :   4.60 ( 2%) usr   0.04 ( 0%) sys   4.53 ( 2%) wall       0 kB ( 0%) ggc
> >  ipa virtual call target :  24.48 (12%) usr   0.14 ( 1%) sys  24.76 (11%) wall       0 kB ( 0%) ggc
> >  ipa cp                  :   4.92 ( 2%) usr   0.41 ( 3%) sys   5.31 ( 2%) wall  502843 kB (12%) ggc
> >  ipa inlining heuristics :  23.72 (11%) usr   0.23 ( 2%) sys  23.92 (11%) wall  490927 kB (12%) ggc
> >  lto stream inflate      :  14.35 ( 7%) usr   0.35 ( 3%) sys  15.22 ( 7%) wall       0 kB ( 0%) ggc
> >  ipa lto gimple in       :   1.79 ( 1%) usr   0.57 ( 4%) sys   2.46 ( 1%) wall  324857 kB ( 8%) ggc
> >  ipa lto gimple out      :   9.98 ( 5%) usr   1.45 (11%) sys  11.05 ( 5%) wall      52 kB ( 0%) ggc
> >  ipa lto decl in         :  21.01 (10%) usr   0.91 ( 7%) sys  21.90 (10%) wall 2345561 kB (55%) ggc
> >  ipa lto decl out        :  73.55 (35%) usr   2.09 (16%) sys  75.67 (34%) wall       0 kB ( 0%) ggc
> >  ipa lto constructors out:   1.87 ( 1%) usr   0.32 ( 3%) sys   2.18 ( 1%) wall       0 kB ( 0%) ggc
> >  ipa lto decl merge      :   2.06 ( 1%) usr   0.00 ( 0%) sys   2.05 ( 1%) wall   13737 kB ( 0%) ggc
> >  whopr wpa I/O           :   2.84 ( 1%) usr   5.14 (40%) sys   7.96 ( 4%) wall       0 kB ( 0%) ggc
> >  whopr partitioning      :   3.83 ( 2%) usr   0.01 ( 0%) sys   3.84 ( 2%) wall    5958 kB ( 0%) ggc
> >  ipa reference           :   2.63 ( 1%) usr   0.00 ( 0%) sys   2.64 ( 1%) wall       0 kB ( 0%) ggc
> >  ipa icf                 :   8.23 ( 4%) usr   0.12 ( 1%) sys   8.32 ( 4%) wall   19203 kB ( 0%) ggc
> >  TOTAL                 : 209.83            12.71           222.54            4244939 kB
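Reading the first table as the patched run and the second as the baseline (an assumption: the two listings are not labeled), the "about 7%" figure follows from the wall-clock totals:

```latex
\frac{222.54\,\mathrm{s} - 207.64\,\mathrm{s}}{222.54\,\mathrm{s}}
  = \frac{14.90}{222.54} \approx 0.067 \quad (\approx 6.7\%)
```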
> > 
> > This now compares well to GCC 5.3:
> > 
> > Execution times (seconds)                                                       
> >  phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1989 kB ( 0%) ggc
> >  phase opt and generate  :  68.61 (31%) usr   2.41 (14%) sys  77.67 (29%) wall 1189579 kB (27%) ggc
> >  phase stream in         :  36.38 (16%) usr   2.32 (14%) sys  56.20 (21%) wall 3168787 kB (73%) ggc
> >  phase stream out        : 113.37 (51%) usr  11.90 (71%) sys 130.49 (49%) wall     112 kB ( 0%) ggc
> >  phase finalize          :   3.40 ( 2%) usr   0.13 ( 1%) sys   3.55 ( 1%) wall       0 kB ( 0%) ggc
> >  garbage collection      :   6.13 ( 3%) usr   0.01 ( 0%) sys   6.18 ( 2%) wall       0 kB ( 0%) ggc
> >  ipa dead code removal   :   4.74 ( 2%) usr   0.05 ( 0%) sys   5.09 ( 2%) wall       0 kB ( 0%) ggc
> >  ipa virtual call target :  11.29 ( 5%) usr   0.15 ( 1%) sys  11.20 ( 4%) wall       1 kB ( 0%) ggc
> >  ipa cp                  :   5.22 ( 2%) usr   0.21 ( 1%) sys   5.51 ( 2%) wall  507623 kB (12%) ggc
> >  ipa inlining heuristics :  24.11 (11%) usr   0.33 ( 2%) sys  24.67 ( 9%) wall  497487 kB (11%) ggc
> >  ipa lto gimple in       :   4.20 ( 2%) usr   1.08 ( 6%) sys  10.73 ( 4%) wall  467276 kB (11%) ggc
> >  ipa lto gimple out      :  17.57 ( 8%) usr   1.92 (11%) sys  23.61 ( 9%) wall     112 kB ( 0%) ggc
> >  ipa lto decl in         :  26.19 (12%) usr   1.20 ( 7%) sys  31.62 (12%) wall 2242394 kB (51%) ggc
> >  ipa lto decl out        :  89.09 (40%) usr   3.64 (22%) sys  92.79 (35%) wall       0 kB ( 0%) ggc
> >  ipa lto constructors in :   0.79 ( 0%) usr   0.28 ( 2%) sys  14.33 ( 5%) wall   17992 kB ( 0%) ggc
> >  ipa lto constructors out:   2.57 ( 1%) usr   0.41 ( 2%) sys   4.02 ( 2%) wall       0 kB ( 0%) ggc
> >  ipa lto cgraph I/O      :   1.11 ( 1%) usr   0.33 ( 2%) sys   1.81 ( 1%) wall  432544 kB (10%) ggc
> >  ipa lto decl merge      :   2.47 ( 1%) usr   0.00 ( 0%) sys   2.47 ( 1%) wall    8191 kB ( 0%) ggc
> >  ipa lto cgraph merge    :   1.91 ( 1%) usr   0.01 ( 0%) sys   1.97 ( 1%) wall   14717 kB ( 0%) ggc
> >  whopr wpa I/O           :   2.92 ( 1%) usr   5.93 (35%) sys   8.84 ( 3%) wall       0 kB ( 0%) ggc
> >  whopr partitioning      :   3.91 ( 2%) usr   0.02 ( 0%) sys   3.93 ( 1%) wall    6001 kB ( 0%) ggc
> >  ipa icf                 :   7.77 ( 4%) usr   0.19 ( 1%) sys   8.05 ( 3%) wall   22534 kB ( 1%) ggc
> >  TOTAL                 : 221.76            16.76           267.92            4360470 kB
> > 
> > Except that I really need to do something about ipa virtual call targets: as
> > the quality of information has improved thanks to better TBAA, we now do more
> > walks.
> > 
> > The savings for the cc1 build are bigger, and the incremental linking improvements
> > even bigger (about 50%), but I accidentally removed the logs...
> > 
> > lto-bootstrapped/regtested x86_64-linux, OK?
> > 
> > 	* cgraph.c (cgraph_node::get_untransformed_body): Pass compressed
> > 	flag to lto_get_section_data.
> > 	* varpool.c (varpool_node::get_constructor): Likewise.
> > 	* lto-section-in.c (lto_get_section_data): Add new flag decompress.
> > 	(lto_free_section_data): Likewise.
> > 	(lto_get_raw_section_data): New function.
> > 	(lto_free_raw_section_data): New function.
> > 	* lto-streamer-out.c (copy_function_or_variable): Copy sections w/o
> > 	decompressing.
> > 	(lto_output_decl_state_refs): Pickle compressed bit.
> > 	* lto-streamer.h (lto_in_decl_state): New flag compressed.
> > 	(lto_out_decl_state): Likewise.
> > 	(lto_get_section_data, lto_free_section_data): Update prototypes.
> > 	(lto_get_raw_section_data, lto_free_raw_section_data): Declare.
> > 	(lto_write_raw_data): Declare.
> > 	* lto-section-out.c (lto_begin_section): Remove FIXME.
> > 	(lto_write_raw_data): New function.
> > 	(lto_write_stream): Remove FIXME.
> > 	(lto_new_out_decl_state): Set compressed flag.
> > 
> > 	* lto.c (lto_read_in_decl_state): Unpickle compressed bit.
> > Index: cgraph.c
> > ===================================================================
> > --- cgraph.c	(revision 231546)
> > +++ cgraph.c	(working copy)
> > @@ -3251,9 +3251,11 @@ cgraph_node::get_untransformed_body (voi
> >  
> >    /* We may have renamed the declaration, e.g., a static function.  */
> >    name = lto_get_decl_name_mapping (file_data, name);
> > +  struct lto_in_decl_state *decl_state
> > +	 = lto_get_function_in_decl_state (file_data, decl);
> >  
> >    data = lto_get_section_data (file_data, LTO_section_function_body,
> > -			       name, &len);
> > +			       name, &len, decl_state->compressed);
> >    if (!data)
> >      fatal_error (input_location, "%s: section %s is missing",
> >  		 file_data->file_name,
> > @@ -3264,7 +3266,7 @@ cgraph_node::get_untransformed_body (voi
> >    lto_input_function_body (file_data, this, data);
> >    lto_stats.num_function_bodies++;
> >    lto_free_section_data (file_data, LTO_section_function_body, name,
> > -			 data, len);
> > +			 data, len, decl_state->compressed);
> >    lto_free_function_in_decl_state_for_node (this);
> >    /* Keep lto file data so ipa-inline-analysis knows about cross module
> >       inlining.  */
> > Index: lto-section-in.c
> > ===================================================================
> > --- lto-section-in.c	(revision 231546)
> > +++ lto-section-in.c	(working copy)
> > @@ -130,7 +130,7 @@ const char *
> >  lto_get_section_data (struct lto_file_decl_data *file_data,
> >  		      enum lto_section_type section_type,
> >  		      const char *name,
> > -		      size_t *len)
> > +		      size_t *len, bool decompress)
> >  {
> >    const char *data = (get_section_f) (file_data, section_type, name, len);
> >    const size_t header_length = sizeof (struct lto_data_header);
> > @@ -142,9 +142,10 @@ lto_get_section_data (struct lto_file_de
> >    if (data == NULL)
> >      return NULL;
> >  
> > -  /* FIXME lto: WPA mode does not write compressed sections, so for now
> > -     suppress uncompression if flag_ltrans.  */
> > -  if (!flag_ltrans)
> > +  /* WPA->ltrans streams are not compressed, with the exception of function
> > +     bodies and variable initializers that have been copied verbatim from
> > +     earlier compilations.  */
> > +  if (!flag_ltrans || decompress)
> >      {
> >        /* Create a mapping header containing the underlying data and length,
> >  	 and prepend this to the uncompression buffer.  The uncompressed data
> > @@ -170,6 +171,16 @@ lto_get_section_data (struct lto_file_de
> >    return data;
> >  }
> >  
> > +/* Get the section data without any header parsing or uncompression.  */
> > +
> > +const char *
> > +lto_get_raw_section_data (struct lto_file_decl_data *file_data,
> > +			  enum lto_section_type section_type,
> > +			  const char *name,
> > +			  size_t *len)
> > +{
> > +  return (get_section_f) (file_data, section_type, name, len);
> > +}
> >  
> >  /* Free the data found from the above call.  The first three
> >     parameters are the same as above.  DATA is the data to be freed and
> > @@ -180,7 +191,7 @@ lto_free_section_data (struct lto_file_d
> >  		       enum lto_section_type section_type,
> >  		       const char *name,
> >  		       const char *data,
> > -		       size_t len)
> > +		       size_t len, bool decompress)
> >  {
> >    const size_t header_length = sizeof (struct lto_data_header);
> >    const char *real_data = data - header_length;
> > @@ -189,9 +200,7 @@ lto_free_section_data (struct lto_file_d
> >  
> >    gcc_assert (free_section_f);
> >  
> > -  /* FIXME lto: WPA mode does not write compressed sections, so for now
> > -     suppress uncompression mapping if flag_ltrans.  */
> > -  if (flag_ltrans)
> > +  if (flag_ltrans && !decompress)
> >      {
> >        (free_section_f) (file_data, section_type, name, data, len);
> >        return;
> > @@ -203,6 +212,17 @@ lto_free_section_data (struct lto_file_d
> >    free (CONST_CAST (char *, real_data));
> >  }
> >  
> > +/* Free data allocated by lto_get_raw_section_data.  */
> > +
> > +void
> > +lto_free_raw_section_data (struct lto_file_decl_data *file_data,
> > +		           enum lto_section_type section_type,
> > +		           const char *name,
> > +		           const char *data,
> > +		           size_t len)
> > +{
> > +  (free_section_f) (file_data, section_type, name, data, len);
> > +}
> >  
> >  /* Load a section of type SECTION_TYPE from FILE_DATA, parse the
> >     header and then return an input block pointing to the section.  The
> > Index: varpool.c
> > ===================================================================
> > --- varpool.c	(revision 231546)
> > +++ varpool.c	(working copy)
> > @@ -296,9 +303,11 @@ varpool_node::get_constructor (void)
> >  
> >    /* We may have renamed the declaration, e.g., a static function.  */
> >    name = lto_get_decl_name_mapping (file_data, name);
> > +  struct lto_in_decl_state *decl_state
> > +	 = lto_get_function_in_decl_state (file_data, decl);
> >  
> >    data = lto_get_section_data (file_data, LTO_section_function_body,
> > -			       name, &len);
> > +			       name, &len, decl_state->compressed);
> >    if (!data)
> >      fatal_error (input_location, "%s: section %s is missing",
> >  		 file_data->file_name,
> > @@ -308,7 +317,7 @@ varpool_node::get_constructor (void)
> >    gcc_assert (DECL_INITIAL (decl) != error_mark_node);
> >    lto_stats.num_function_bodies++;
> >    lto_free_section_data (file_data, LTO_section_function_body, name,
> > -			 data, len);
> > +			 data, len, decl_state->compressed);
> >    lto_free_function_in_decl_state_for_node (this);
> >    timevar_pop (TV_IPA_LTO_CTORS_IN);
> >    return DECL_INITIAL (decl);
> > Index: lto-streamer-out.c
> > ===================================================================
> > --- lto-streamer-out.c	(revision 231546)
> > +++ lto-streamer-out.c	(working copy)
> > @@ -2191,22 +2224,23 @@ copy_function_or_variable (struct symtab
> >    struct lto_in_decl_state *in_state;
> >    struct lto_out_decl_state *out_state = lto_get_out_decl_state ();
> >  
> > -  lto_begin_section (section_name, !flag_wpa);
> > +  lto_begin_section (section_name, false);
> >    free (section_name);
> >  
> >    /* We may have renamed the declaration, e.g., a static function.  */
> >    name = lto_get_decl_name_mapping (file_data, name);
> >  
> > -  data = lto_get_section_data (file_data, LTO_section_function_body,
> > -                               name, &len);
> > +  data = lto_get_raw_section_data (file_data, LTO_section_function_body,
> > +                                   name, &len);
> >    gcc_assert (data);
> >  
> >    /* Do a bit copy of the function body.  */
> > -  lto_write_data (data, len);
> > +  lto_write_raw_data (data, len);
> >  
> >    /* Copy decls. */
> >    in_state =
> >      lto_get_function_in_decl_state (node->lto_file_data, function);
> >    gcc_assert (in_state);
> > +  out_state->compressed = in_state->compressed;
> >  
> >    for (i = 0; i < LTO_N_DECL_STREAMS; i++)
> > @@ -2224,8 +2258,8 @@ copy_function_or_variable (struct symtab
> >  	encoder->trees.safe_push ((*trees)[j]);
> >      }
> >  
> > -  lto_free_section_data (file_data, LTO_section_function_body, name,
> > -			 data, len);
> > +  lto_free_raw_section_data (file_data, LTO_section_function_body, name,
> > +			     data, len);
> >    lto_end_section ();
> >  }
> >  
> > @@ -2431,6 +2465,7 @@ lto_output_decl_state_refs (struct outpu
> >    decl = (state->fn_decl) ? state->fn_decl : void_type_node;
> >    streamer_tree_cache_lookup (ob->writer_cache, decl, &ref);
> >    gcc_assert (ref != (unsigned)-1);
> > +  ref = ref * 2 + (state->compressed ? 1 : 0);
> >    lto_write_data (&ref, sizeof (uint32_t));
> >  
> >    for (i = 0;  i < LTO_N_DECL_STREAMS; i++)
> > Index: lto/lto-symtab.c
> > ===================================================================
> > --- lto/lto-symtab.c	(revision 231548)
> > +++ lto/lto-symtab.c	(working copy)
> > @@ -883,6 +883,11 @@ lto_symtab_merge_symbols_1 (symtab_node
> >  	  else
> >  	    {
> >  	      DECL_INITIAL (e->decl) = error_mark_node;
> > +	      if (e->lto_file_data)
> > +		{
> > +		  lto_free_function_in_decl_state_for_node (e);
> > +		  e->lto_file_data = NULL;
> > +		}
> >  	      symtab->call_varpool_removal_hooks (dyn_cast<varpool_node *> (e));
> >  	    }
> >  	  e->remove_all_references ();
> > Index: lto/lto.c
> > ===================================================================
> > --- lto/lto.c	(revision 231546)
> > +++ lto/lto.c	(working copy)
> > @@ -234,6 +234,8 @@ lto_read_in_decl_state (struct data_in *
> >    uint32_t i, j;
> >  
> >    ix = *data++;
> > +  state->compressed = ix & 1;
> > +  ix /= 2;
> >    decl = streamer_tree_cache_get_tree (data_in->reader_cache, ix);
> >    if (!VAR_OR_FUNCTION_DECL_P (decl))
> >      {
> > Index: lto-streamer.h
> > ===================================================================
> > --- lto-streamer.h	(revision 231546)
> > +++ lto-streamer.h	(working copy)
> > @@ -504,6 +505,9 @@ struct GTY((for_user)) lto_in_decl_state
> >    /* If this in-decl state is associated with a function. FN_DECL
> >       point to the FUNCTION_DECL. */
> >    tree fn_decl;
> > +
> > +  /* True if decl state is compressed.  */
> > +  bool compressed;
> >  };
> >  
> >  typedef struct lto_in_decl_state *lto_in_decl_state_ptr;
> > @@ -537,6 +541,9 @@ struct lto_out_decl_state
> >    /* If this out-decl state belongs to a function, fn_decl points to that
> >       function.  Otherwise, it is NULL. */
> >    tree fn_decl;
> > +
> > +  /* True if decl state is compressed.  */
> > +  bool compressed;
> >  };
> >  
> >  typedef struct lto_out_decl_state *lto_out_decl_state_ptr;
> > @@ -761,10 +768,18 @@ extern void lto_set_in_hooks (struct lto
> >  extern struct lto_file_decl_data **lto_get_file_decl_data (void);
> >  extern const char *lto_get_section_data (struct lto_file_decl_data *,
> >  					 enum lto_section_type,
> > -					 const char *, size_t *);
> > +					 const char *, size_t *,
> > +					 bool decompress = false);
> > +extern const char *lto_get_raw_section_data (struct lto_file_decl_data *,
> > +					     enum lto_section_type,
> > +					     const char *, size_t *);
> >  extern void lto_free_section_data (struct lto_file_decl_data *,
> > -				   enum lto_section_type,
> > -				   const char *, const char *, size_t);
> > +			           enum lto_section_type,
> > +				   const char *, const char *, size_t,
> > +				   bool decompress = false);
> > +extern void lto_free_raw_section_data (struct lto_file_decl_data *,
> > +				       enum lto_section_type,
> > +				       const char *, const char *, size_t);
> >  extern htab_t lto_create_renaming_table (void);
> >  extern void lto_record_renamed_decl (struct lto_file_decl_data *,
> >  				     const char *, const char *);
> > @@ -785,6 +800,7 @@ extern void lto_value_range_error (const
> >  extern void lto_begin_section (const char *, bool);
> >  extern void lto_end_section (void);
> >  extern void lto_write_data (const void *, unsigned int);
> > +extern void lto_write_raw_data (const void *, unsigned int);
> >  extern void lto_write_stream (struct lto_output_stream *);
> >  extern bool lto_output_decl_index (struct lto_output_stream *,
> >  			    struct lto_tree_ref_encoder *,
> > Index: lto-section-out.c
> > ===================================================================
> > --- lto-section-out.c	(revision 231546)
> > +++ lto-section-out.c	(working copy)
> > @@ -66,9 +66,6 @@ lto_begin_section (const char *name, boo
> >  {
> >    lang_hooks.lto.begin_section (name);
> >  
> > -  /* FIXME lto: for now, suppress compression if the lang_hook that appends
> > -     data is anything other than assembler output.  The effect here is that
> > -     we get compression of IL only in non-ltrans object files.  */
> >    gcc_assert (compression_stream == NULL);
> >    if (compress)
> >      compression_stream = lto_start_compression (lto_append_data, NULL);
> > @@ -99,6 +96,14 @@ lto_write_data (const void *data, unsign
> >      lang_hooks.lto.append_data ((const char *)data, size, NULL);
> >  }
> >  
> > +/* Write SIZE bytes starting at DATA to the assembler.  */
> > +
> > +void
> > +lto_write_raw_data (const void *data, unsigned int size)
> > +{
> > +  lang_hooks.lto.append_data ((const char *)data, size, NULL);
> > +}
> > +
> >  /* Write all of the chars in OBS to the assembler.  Recycle the blocks
> >     in obs as this is being done.  */
> >  
> > @@ -123,10 +128,6 @@ lto_write_stream (struct lto_output_stre
> >        if (!next_block)
> >  	num_chars -= obs->left_in_block;
> >  
> > -      /* FIXME lto: WPA mode uses an ELF function as a lang_hook to append
> > -         output data.  This hook is not happy with the way that compression
> > -         blocks up output differently to the way it's blocked here.  So for
> > -         now, we don't compress WPA output.  */
> >        if (compression_stream)
> >  	lto_compress_block (compression_stream, base, num_chars);
> >        else
> > @@ -295,6 +296,9 @@ lto_new_out_decl_state (void)
> >    for (i = 0; i < LTO_N_DECL_STREAMS; i++)
> >      lto_init_tree_ref_encoder (&state->streams[i]);
> >  
> > +  /* At WPA time we do not compress sections by default.  */
> > +  state->compressed = !flag_wpa;
> > +
> >    return state;
> >  }
> >  
> > 
> > 
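The reader-side change hinges on two conditions in the `lto-section-in.c` hunks: `lto_get_section_data` decompresses when `!flag_ltrans || decompress`, and `lto_free_section_data` takes the raw free path when `flag_ltrans && !decompress`. The sketch below uses hypothetical helper names and passes `flag_ltrans` and the per-decl-state `compressed` bit in as plain booleans rather than reading GCC's globals; it checks that the two conditions are exact complements, so each buffer is released the same way it was obtained.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the patched test in lto_get_section_data: decompress unless
   we are in ltrans and the section is not marked compressed.
   COMPRESSED stands in for decl_state->compressed.  */
static bool
section_needs_decompression (bool flag_ltrans, bool compressed)
{
  return !flag_ltrans || compressed;
}

/* Mirrors the patched test in lto_free_section_data: hand the buffer
   straight back to free_section_f only for uncompressed ltrans data.  */
static bool
section_freed_raw (bool flag_ltrans, bool compressed)
{
  return flag_ltrans && !compressed;
}

/* By De Morgan's law the two predicates are complements; verify this
   for all four input combinations.  */
static void
check_get_free_consistency (void)
{
  for (int ltrans = 0; ltrans <= 1; ltrans++)
    for (int comp = 0; comp <= 1; comp++)
      assert (section_freed_raw (ltrans, comp)
              == !section_needs_decompression (ltrans, comp));
}
```

If the two conditions ever drifted apart, a decompressed buffer (allocated with a prepended mapping header) could be passed to the raw free path or vice versa, which is why the `decompress` argument has to be threaded through both calls in `cgraph.c` and `varpool.c`.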
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Jan Hubicka Dec. 12, 2015, 6:47 p.m. UTC | #3
> > So would the patch be a lot more difficult if you go down either of
> > the routes above?  (I think I prefer changing lto_header rather
> > than making main_size a bitfield)
> 
> Agreed ;)
Hmm, actually this turns out to be difficult: all the headers are already
compressed by zlib:
Contents of section .gnu.lto_.inline.faa6142d1fc0c505:
 0000 789c6362 6062c006 10a20000 ba0007    x.cb`b......... 
Contents of section .gnu.lto_.symbol_nodes.faa6142d1fc0c505:
 0000 789c6362 6062e060 80025606 06f67a06  x.cb`b.`..V...z.
 0010 0600029c 0098                        ......          
Contents of section .gnu.lto_.refs.faa6142d1fc0c505:
 0000 789c6362 60626064 80020000 460006    x.cb`b`d....F.. 
Contents of section .gnu.lto_.decls.faa6142d1fc0c505:
 0000 789c6362 6062d060 6060f80f 0461405a  x.cb`b.```...a@Z
 0010 10881981 d88e0115 80c496a2 8929312c  .............)1,
 0020 6c5870e0 c1d2291c 131a1b1b feffaf4f  lXp...)........O
 0030 6464e104 9ac7cec8 a0c5b0b2 a1a16142  dd............aB
 0040 43c32336 142d118c 898c7c9c 0c0a6006  C.#6.-....|...`.
 0050 0f27030b 58349171 1923135c 0d4b895e  .'..X4.q.#.\.K.^
 0060 32037359 620e9bbb 5fa88233 0300a167  2.sYb..._..3...g
 0070 1a7f                                 ..              
Contents of section .gnu.lto_.symtab.faa6142d1fc0c505:
 0000 76616c00 00040004 00000000 000000a5  val.............
 0010 000000                               ...
Contents of section .gnu.lto_.opts:
 0000 272d6d74 756e653d 67656e65 72696327  '-mtune=generic'
 0010 20272d6d 61726368 3d783836 2d363427   '-march=x86-64'
 0020 20272d66 6c746f27 00                  '-flto'.
Contents of section .comment:
 0000 00474343 3a202853 55534520 4c696e75  .GCC: (SUSE Linu
 0010 78292034 2e382e33 20323031 34303632  x) 4.8.3 2014062
 0020 37205b67 63632d34 5f382d62 72616e63  7 [gcc-4_8-branc
 0030 68207265 76697369 6f6e2032 31323036  h revision 21206
 0040 345d00                               4].             

As you can see, only the opts, comment and symtab sections come out uncompressed.
The common prefix x.cb`b starts with zlib's two-byte header, 78 9c ("x.").  The
format is described here:
https://tools.ietf.org/html/rfc1950#page-4

There doesn't seem to be a unique identifier in the zlib header that would
allow us to tell an lto_header apart from zlib data, so I don't think I can
play a trick and auto-detect the compression.  As such, I do not think
we can put a compression flag into the section header without breaking
backward compatibility, short of inventing our own malformed zlib
header that we could be sure to tell apart from zlib's.
That would defeat the plan of not increasing the section sizes.
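To illustrate why auto-detection is unreliable, here is a sketch of the RFC 1950 header test (section 2.2). A zlib stream starts with two bytes, CMF and FLG; the only consistency checks are that the compression method is 8 (deflate), the window field is at most 7, and CMF * 256 + FLG is a multiple of 31. The function names are hypothetical; this is not code from GCC or zlib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does P start with a plausible zlib (RFC 1950)
   header?  CMF = compression method/info, FLG = flags.  */
static bool
looks_like_zlib_header (const unsigned char *p, size_t len)
{
  assert (p != NULL || len == 0);
  if (len < 2)
    return false;
  unsigned cmf = p[0], flg = p[1];
  /* CM (low nibble) must be 8 = deflate; CINFO (high nibble) <= 7.  */
  if ((cmf & 0x0f) != 8 || (cmf >> 4) > 7)
    return false;
  /* FCHECK: CMF * 256 + FLG must be a multiple of 31.  */
  return (cmf * 256 + flg) % 31 == 0;
}

/* Count how many of the 65536 possible two-byte prefixes pass.  */
static unsigned
count_passing_prefixes (void)
{
  unsigned n = 0;
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      {
        const unsigned char buf[2] = { (unsigned char) a, (unsigned char) b };
        if (looks_like_zlib_header (buf, sizeof buf))
          n++;
      }
  return n;
}
```

The prefix 78 9c seen in every compressed section dumped above passes, and 66 of the 65536 possible two-byte prefixes pass in total. The check rejects most arbitrary bytes, but a reader could not be sure an uncompressed header never starts with a passing pair unless the format reserved header values guaranteed to fail it.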

I would prefer to go with my current solution until we make a new
"major major" revision of the format, where we will have a chance
to drop all this and clean up other design mistakes of the original
LTO format.

Honza
Richard Biener Dec. 12, 2015, 7:47 p.m. UTC | #4
On December 12, 2015 7:47:39 PM GMT+01:00, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > So would the patch be a lot more difficult if you go down either of
>> > the routes above?  (I think I prefer changing lto_header rather
>> > than making main_size a bitfield)
>> 
>> Agreed ;)
>Hmm, actually it seems things are difficult. All the headers are
>already compressed by zlib:
>
>as you can see only opts/comment and symtab sections come out decompressed.
>The sequence x.cb`b is zlib's header.  The description is here:
>https://tools.ietf.org/html/rfc1950#page-4
>
>There don't seem to be unique identifier of the zlib header that would
>allow us to tell lto_header apart from zlib, so I don't think I can play
>a trick and auto-detect the compression.  As such, I do not think
>we can get a header about compression into section w/o breaking
>backward compatibility short of inventing our own mallformed zlib
>header which we would be sure to be able to tell apart from zlib's.
>That would defeat the plan to not increase the section sizes.

Aww, yeah.  Now I remember.

>I would preffer to go with my current solution until we make a new
>"major major" revision of the format where we will have a chance
>to drop all this and cleanup other design mistakes of the original
>LTO format.

OK...

Richard.

>
>Honza

Patch

Index: cgraph.c
===================================================================
--- cgraph.c	(revision 231546)
+++ cgraph.c	(working copy)
@@ -3251,9 +3251,11 @@  cgraph_node::get_untransformed_body (voi
 
   /* We may have renamed the declaration, e.g., a static function.  */
   name = lto_get_decl_name_mapping (file_data, name);
+  struct lto_in_decl_state *decl_state
+	 = lto_get_function_in_decl_state (file_data, decl);
 
   data = lto_get_section_data (file_data, LTO_section_function_body,
-			       name, &len);
+			       name, &len, decl_state->compressed);
   if (!data)
     fatal_error (input_location, "%s: section %s is missing",
 		 file_data->file_name,
@@ -3264,7 +3266,7 @@  cgraph_node::get_untransformed_body (voi
   lto_input_function_body (file_data, this, data);
   lto_stats.num_function_bodies++;
   lto_free_section_data (file_data, LTO_section_function_body, name,
-			 data, len);
+			 data, len, decl_state->compressed);
   lto_free_function_in_decl_state_for_node (this);
   /* Keep lto file data so ipa-inline-analysis knows about cross module
      inlining.  */
Index: lto-section-in.c
===================================================================
--- lto-section-in.c	(revision 231546)
+++ lto-section-in.c	(working copy)
@@ -130,7 +130,7 @@  const char *
 lto_get_section_data (struct lto_file_decl_data *file_data,
 		      enum lto_section_type section_type,
 		      const char *name,
-		      size_t *len)
+		      size_t *len, bool decompress)
 {
   const char *data = (get_section_f) (file_data, section_type, name, len);
   const size_t header_length = sizeof (struct lto_data_header);
@@ -142,9 +142,10 @@  lto_get_section_data (struct lto_file_de
   if (data == NULL)
     return NULL;
 
-  /* FIXME lto: WPA mode does not write compressed sections, so for now
-     suppress uncompression if flag_ltrans.  */
-  if (!flag_ltrans)
+  /* WPA->ltrans streams are not compressed, with the exception of function
+     bodies and variable initializers that have been copied verbatim from
+     earlier compilations.  */
+  if (!flag_ltrans || decompress)
     {
       /* Create a mapping header containing the underlying data and length,
 	 and prepend this to the uncompression buffer.  The uncompressed data
@@ -170,6 +171,16 @@  lto_get_section_data (struct lto_file_de
   return data;
 }
 
+/* Get the section data without any header parsing or uncompression.  */
+
+const char *
+lto_get_raw_section_data (struct lto_file_decl_data *file_data,
+			  enum lto_section_type section_type,
+			  const char *name,
+			  size_t *len)
+{
+  return (get_section_f) (file_data, section_type, name, len);
+}
 
 /* Free the data found from the above call.  The first three
    parameters are the same as above.  DATA is the data to be freed and
@@ -180,7 +191,7 @@  lto_free_section_data (struct lto_file_d
 		       enum lto_section_type section_type,
 		       const char *name,
 		       const char *data,
-		       size_t len)
+		       size_t len, bool decompress)
 {
   const size_t header_length = sizeof (struct lto_data_header);
   const char *real_data = data - header_length;
@@ -189,9 +200,7 @@  lto_free_section_data (struct lto_file_d
 
   gcc_assert (free_section_f);
 
-  /* FIXME lto: WPA mode does not write compressed sections, so for now
-     suppress uncompression mapping if flag_ltrans.  */
-  if (flag_ltrans)
+  if (flag_ltrans && !decompress)
     {
       (free_section_f) (file_data, section_type, name, data, len);
       return;
@@ -203,6 +212,17 @@  lto_free_section_data (struct lto_file_d
   free (CONST_CAST (char *, real_data));
 }
 
+/* Free data allocated by lto_get_raw_section_data.  */
+
+void
+lto_free_raw_section_data (struct lto_file_decl_data *file_data,
+		           enum lto_section_type section_type,
+		           const char *name,
+		           const char *data,
+		           size_t len)
+{
+  (free_section_f) (file_data, section_type, name, data, len);
+}
 
 /* Load a section of type SECTION_TYPE from FILE_DATA, parse the
    header and then return an input block pointing to the section.  The
Index: varpool.c
===================================================================
--- varpool.c	(revision 231546)
+++ varpool.c	(working copy)
@@ -296,9 +303,11 @@  varpool_node::get_constructor (void)
 
   /* We may have renamed the declaration, e.g., a static function.  */
   name = lto_get_decl_name_mapping (file_data, name);
+  struct lto_in_decl_state *decl_state
+	 = lto_get_function_in_decl_state (file_data, decl);
 
   data = lto_get_section_data (file_data, LTO_section_function_body,
-			       name, &len);
+			       name, &len, decl_state->compressed);
   if (!data)
     fatal_error (input_location, "%s: section %s is missing",
 		 file_data->file_name,
@@ -308,7 +317,7 @@  varpool_node::get_constructor (void)
   gcc_assert (DECL_INITIAL (decl) != error_mark_node);
   lto_stats.num_function_bodies++;
   lto_free_section_data (file_data, LTO_section_function_body, name,
-			 data, len);
+			 data, len, decl_state->compressed);
   lto_free_function_in_decl_state_for_node (this);
   timevar_pop (TV_IPA_LTO_CTORS_IN);
   return DECL_INITIAL (decl);
Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c	(revision 231546)
+++ lto-streamer-out.c	(working copy)
@@ -2191,22 +2224,23 @@  copy_function_or_variable (struct symtab
   struct lto_in_decl_state *in_state;
   struct lto_out_decl_state *out_state = lto_get_out_decl_state ();
 
-  lto_begin_section (section_name, !flag_wpa);
+  lto_begin_section (section_name, false);
   free (section_name);
 
   /* We may have renamed the declaration, e.g., a static function.  */
   name = lto_get_decl_name_mapping (file_data, name);
 
-  data = lto_get_section_data (file_data, LTO_section_function_body,
-                               name, &len);
+  data = lto_get_raw_section_data (file_data, LTO_section_function_body,
+                                   name, &len);
   gcc_assert (data);
 
   /* Do a bit copy of the function body.  */
-  lto_write_data (data, len);
+  lto_write_raw_data (data, len);
 
   /* Copy decls. */
   in_state =
     lto_get_function_in_decl_state (node->lto_file_data, function);
   gcc_assert (in_state);
+  out_state->compressed = in_state->compressed;
 
   for (i = 0; i < LTO_N_DECL_STREAMS; i++)
@@ -2224,8 +2258,8 @@  copy_function_or_variable (struct symtab
 	encoder->trees.safe_push ((*trees)[j]);
     }
 
-  lto_free_section_data (file_data, LTO_section_function_body, name,
-			 data, len);
+  lto_free_raw_section_data (file_data, LTO_section_function_body, name,
+			     data, len);
   lto_end_section ();
 }
 
@@ -2431,6 +2465,7 @@  lto_output_decl_state_refs (struct outpu
   decl = (state->fn_decl) ? state->fn_decl : void_type_node;
   streamer_tree_cache_lookup (ob->writer_cache, decl, &ref);
   gcc_assert (ref != (unsigned)-1);
+  ref = ref * 2 + (state->compressed ? 1 : 0);
   lto_write_data (&ref, sizeof (uint32_t));
 
   for (i = 0;  i < LTO_N_DECL_STREAMS; i++)
Index: lto/lto-symtab.c
===================================================================
--- lto/lto-symtab.c	(revision 231548)
+++ lto/lto-symtab.c	(working copy)
@@ -883,6 +883,11 @@  lto_symtab_merge_symbols_1 (symtab_node
 	  else
 	    {
 	      DECL_INITIAL (e->decl) = error_mark_node;
+	      if (e->lto_file_data)
+		{
+		  lto_free_function_in_decl_state_for_node (e);
+		  e->lto_file_data = NULL;
+		}
 	      symtab->call_varpool_removal_hooks (dyn_cast<varpool_node *> (e));
 	    }
 	  e->remove_all_references ();
Index: lto/lto.c
===================================================================
--- lto/lto.c	(revision 231546)
+++ lto/lto.c	(working copy)
@@ -234,6 +234,8 @@  lto_read_in_decl_state (struct data_in *
   uint32_t i, j;
 
   ix = *data++;
+  state->compressed = ix & 1;
+  ix /= 2;
   decl = streamer_tree_cache_get_tree (data_in->reader_cache, ix);
   if (!VAR_OR_FUNCTION_DECL_P (decl))
     {
Index: lto-streamer.h
===================================================================
--- lto-streamer.h	(revision 231546)
+++ lto-streamer.h	(working copy)
@@ -504,6 +505,9 @@  struct GTY((for_user)) lto_in_decl_state
   /* If this in-decl state is associated with a function. FN_DECL
      point to the FUNCTION_DECL. */
   tree fn_decl;
+
+  /* True if the body or initializer section is compressed.  */
+  bool compressed;
 };
 
 typedef struct lto_in_decl_state *lto_in_decl_state_ptr;
@@ -537,6 +541,9 @@  struct lto_out_decl_state
   /* If this out-decl state belongs to a function, fn_decl points to that
      function.  Otherwise, it is NULL. */
   tree fn_decl;
+
+  /* True if the body or initializer section is compressed.  */
+  bool compressed;
 };
 
 typedef struct lto_out_decl_state *lto_out_decl_state_ptr;
@@ -761,10 +768,18 @@  extern void lto_set_in_hooks (struct lto
 extern struct lto_file_decl_data **lto_get_file_decl_data (void);
 extern const char *lto_get_section_data (struct lto_file_decl_data *,
 					 enum lto_section_type,
-					 const char *, size_t *);
+					 const char *, size_t *,
+					 bool decompress = false);
+extern const char *lto_get_raw_section_data (struct lto_file_decl_data *,
+					     enum lto_section_type,
+					     const char *, size_t *);
 extern void lto_free_section_data (struct lto_file_decl_data *,
-				   enum lto_section_type,
-				   const char *, const char *, size_t);
+			           enum lto_section_type,
+				   const char *, const char *, size_t,
+				   bool decompress = false);
+extern void lto_free_raw_section_data (struct lto_file_decl_data *,
+				       enum lto_section_type,
+				       const char *, const char *, size_t);
 extern htab_t lto_create_renaming_table (void);
 extern void lto_record_renamed_decl (struct lto_file_decl_data *,
 				     const char *, const char *);
@@ -785,6 +800,7 @@  extern void lto_value_range_error (const
 extern void lto_begin_section (const char *, bool);
 extern void lto_end_section (void);
 extern void lto_write_data (const void *, unsigned int);
+extern void lto_write_raw_data (const void *, unsigned int);
 extern void lto_write_stream (struct lto_output_stream *);
 extern bool lto_output_decl_index (struct lto_output_stream *,
 			    struct lto_tree_ref_encoder *,
Index: lto-section-out.c
===================================================================
--- lto-section-out.c	(revision 231546)
+++ lto-section-out.c	(working copy)
@@ -66,9 +66,6 @@  lto_begin_section (const char *name, boo
 {
   lang_hooks.lto.begin_section (name);
 
-  /* FIXME lto: for now, suppress compression if the lang_hook that appends
-     data is anything other than assembler output.  The effect here is that
-     we get compression of IL only in non-ltrans object files.  */
   gcc_assert (compression_stream == NULL);
   if (compress)
     compression_stream = lto_start_compression (lto_append_data, NULL);
@@ -99,6 +96,14 @@  lto_write_data (const void *data, unsign
     lang_hooks.lto.append_data ((const char *)data, size, NULL);
 }
 
+/* Write SIZE bytes at DATA to the assembler, bypassing compression.  */
+
+void
+lto_write_raw_data (const void *data, unsigned int size)
+{
+  lang_hooks.lto.append_data ((const char *)data, size, NULL);
+}
+
 /* Write all of the chars in OBS to the assembler.  Recycle the blocks
    in obs as this is being done.  */
 
@@ -123,10 +128,6 @@  lto_write_stream (struct lto_output_stre
       if (!next_block)
 	num_chars -= obs->left_in_block;
 
-      /* FIXME lto: WPA mode uses an ELF function as a lang_hook to append
-         output data.  This hook is not happy with the way that compression
-         blocks up output differently to the way it's blocked here.  So for
-         now, we don't compress WPA output.  */
       if (compression_stream)
 	lto_compress_block (compression_stream, base, num_chars);
       else
@@ -295,6 +296,9 @@  lto_new_out_decl_state (void)
   for (i = 0; i < LTO_N_DECL_STREAMS; i++)
     lto_init_tree_ref_encoder (&state->streams[i]);
 
+  /* At WPA time we do not compress sections by default.  */
+  state->compressed = !flag_wpa;
+
   return state;
 }