Message ID | 20151211011916.GA5527@kam.mff.cuni.cz |
---|---|
State | New |
On Fri, 11 Dec 2015, Jan Hubicka wrote:

> Hi,
> this patch makes WPA copy sections w/o decompressing them. This leads
> to a nice reduction in /tmp usage for GCC bootstrap (about 70%) and a small
> one for Firefox. In GCC about 5% of the ltrans object file is the global decl
> section, while for Firefox it is 85%. I will try to figure out if there is
> something terribly stupid pickled there.
>
> The patch simply adds raw section i/o to lto-section-in.c and lto-section-out.c
> which is used by copy_function_or_variable. The catch is that WPA->ltrans
> streaming is not compressed and this fact is not represented in the object file
> at all. We simply test flag_wpa and flag_ltrans. Now function sections born
> at WPA time are uncompressed, while function sections just copied are
> compressed and we do not know how to read them.
>
> I tried to simply turn off the non-compressed path and set compression level
> to minimal and then to none (which works despite the apparently outdated FIXME
> comments I removed). Sadly zlib manages to burn about 16% of WPA time
> at minimal level and about 7% at none because it computes the checksum. Clearly
> next stage1 it is time to switch to a better compression backend.
>
> For now I added the information if section is compressed into
> decl_state. I am not thrilled by this but it is the only way I found w/o
> wasting 4 bytes per every lto section (because the lto header is not
> really extensible and the stream is assumed to be aligned).

So this trick now only applies to decl sections? I think you
could have stolen a bit from lto_simple_header::main_size
(oddly lto_simple_header_with_strings adds its own main_size,
hiding the simple-header ones - huh).

Changing lto_header itself into

  int16_t major_version
  int8_t minor_version
  int8_t flags

would be another possibility (and bump the major version). I think
we have no sections produced with just lto_header but always
lto_simple_header (from grepping). Some sections have no header
(lto.opts).

So would the patch be a lot more difficult if you go down either of
the routes above? (I think I prefer changing lto_header rather
than making main_size a bitfield)

Richard.

> The whole lowlevel lto streaming code is a grand mess, I hope we will clean this
> up and get more sane headers in the foreseeable future. Until that time this
> solution does not waste extra space as it is easy to pickle the flag as part of
> the reference.
>
> The patch saves about 7% of WPA time for firefox:
>
>  phase opt and generate  : 75.66 (39%) usr 1.78 (14%) sys 77.44 (37%) wall 855644 kB (21%) ggc
>  phase stream in         : 34.62 (18%) usr 1.95 (16%) sys 36.57 (18%) wall 3245604 kB (79%) ggc
>  phase stream out        : 81.89 (42%) usr 8.49 (69%) sys 90.37 (44%) wall 50 kB ( 0%) ggc
>  ipa dead code removal   : 4.33 ( 2%) usr 0.06 ( 0%) sys 4.24 ( 2%) wall 0 kB ( 0%) ggc
>  ipa virtual call target : 25.15 (13%) usr 0.14 ( 1%) sys 25.42 (12%) wall 0 kB ( 0%) ggc
>  ipa cp                  : 3.92 ( 2%) usr 0.21 ( 2%) sys 4.18 ( 2%) wall 340698 kB ( 8%) ggc
>  ipa inlining heuristics : 24.12 (12%) usr 0.38 ( 3%) sys 24.37 (12%) wall 500427 kB (12%) ggc
>  lto stream inflate      : 7.07 ( 4%) usr 0.38 ( 3%) sys 7.33 ( 4%) wall 0 kB ( 0%) ggc
>  ipa lto gimple in       : 1.95 ( 1%) usr 0.61 ( 5%) sys 2.42 ( 1%) wall 324875 kB ( 8%) ggc
>  ipa lto gimple out      : 9.16 ( 5%) usr 1.64 (13%) sys 10.49 ( 5%) wall 50 kB ( 0%) ggc
>  ipa lto decl in         : 21.25 (11%) usr 1.01 ( 8%) sys 22.37 (11%) wall 2348869 kB (57%) ggc
>  ipa lto decl out        : 67.33 (34%) usr 1.66 (13%) sys 68.96 (33%) wall 0 kB ( 0%) ggc
>  ipa lto constructors out: 1.39 ( 1%) usr 0.38 ( 3%) sys 2.18 ( 1%) wall 0 kB ( 0%) ggc
>  ipa lto decl merge      : 2.12 ( 2%) usr 0.00 ( 0%) sys 2.12 ( 2%) wall 13737 kB ( 0%) ggc
>  ipa reference           : 2.14 ( 2%) usr 0.00 ( 0%) sys 2.13 ( 2%) wall 0 kB ( 0%) ggc
>  ipa pure const          : 2.29 ( 2%) usr 0.01 ( 0%) sys 2.35 ( 2%) wall 0 kB ( 0%) ggc
>  ipa icf                 : 9.02 ( 7%) usr 0.18 ( 2%) sys 9.72 ( 7%) wall 19203 kB ( 0%) ggc
>  TOTAL                   : 195.27 12.37 207.64 4103297 kB
>
>
>  phase opt and generate  : 79.00 (38%) usr 1.61 (13%) sys 80.61 (36%) wall 1000597 kB (24%) ggc
>  phase stream in         : 33.93 (16%) usr 1.91 (15%) sys 35.83 (16%) wall 3242293 kB (76%) ggc
>  phase stream out        : 96.90 (46%) usr 9.19 (72%) sys 106.09 (48%) wall 52 kB ( 0%) ggc
>  garbage collection      : 2.94 ( 1%) usr 0.00 ( 0%) sys 2.93 ( 1%) wall 0 kB ( 0%) ggc
>  ipa dead code removal   : 4.60 ( 2%) usr 0.04 ( 0%) sys 4.53 ( 2%) wall 0 kB ( 0%) ggc
>  ipa virtual call target : 24.48 (12%) usr 0.14 ( 1%) sys 24.76 (11%) wall 0 kB ( 0%) ggc
>  ipa cp                  : 4.92 ( 2%) usr 0.41 ( 3%) sys 5.31 ( 2%) wall 502843 kB (12%) ggc
>  ipa inlining heuristics : 23.72 (11%) usr 0.23 ( 2%) sys 23.92 (11%) wall 490927 kB (12%) ggc
>  lto stream inflate      : 14.35 ( 7%) usr 0.35 ( 3%) sys 15.22 ( 7%) wall 0 kB ( 0%) ggc
>  ipa lto gimple in       : 1.79 ( 1%) usr 0.57 ( 4%) sys 2.46 ( 1%) wall 324857 kB ( 8%) ggc
>  ipa lto gimple out      : 9.98 ( 5%) usr 1.45 (11%) sys 11.05 ( 5%) wall 52 kB ( 0%) ggc
>  ipa lto decl in         : 21.01 (10%) usr 0.91 ( 7%) sys 21.90 (10%) wall 2345561 kB (55%) ggc
>  ipa lto decl out        : 73.55 (35%) usr 2.09 (16%) sys 75.67 (34%) wall 0 kB ( 0%) ggc
>  ipa lto constructors out: 1.87 ( 1%) usr 0.32 ( 3%) sys 2.18 ( 1%) wall 0 kB ( 0%) ggc
>  ipa lto decl merge      : 2.06 ( 1%) usr 0.00 ( 0%) sys 2.05 ( 1%) wall 13737 kB ( 0%) ggc
>  whopr wpa I/O           : 2.84 ( 1%) usr 5.14 (40%) sys 7.96 ( 4%) wall 0 kB ( 0%) ggc
>  whopr partitioning      : 3.83 ( 2%) usr 0.01 ( 0%) sys 3.84 ( 2%) wall 5958 kB ( 0%) ggc
>  ipa reference           : 2.63 ( 1%) usr 0.00 ( 0%) sys 2.64 ( 1%) wall 0 kB ( 0%) ggc
>  ipa icf                 : 8.23 ( 4%) usr 0.12 ( 1%) sys 8.32 ( 4%) wall 19203 kB ( 0%) ggc
>  TOTAL                   : 209.83 12.71 222.54 4244939 kB
>
> This now compares well to 5.3:
>
> Execution times (seconds)
>  phase setup             : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1989 kB ( 0%) ggc
>  phase opt and generate  : 68.61 (31%) usr 2.41 (14%) sys 77.67 (29%) wall 1189579 kB (27%) ggc
>  phase stream in         : 36.38 (16%) usr 2.32 (14%) sys 56.20 (21%) wall 3168787 kB (73%) ggc
>  phase stream out        : 113.37 (51%) usr 11.90 (71%) sys 130.49 (49%) wall 112 kB ( 0%) ggc
>  phase finalize          : 3.40 ( 2%) usr 0.13 ( 1%) sys 3.55 ( 1%) wall 0 kB ( 0%) ggc
>  garbage collection      : 6.13 ( 3%) usr 0.01 ( 0%) sys 6.18 ( 2%) wall 0 kB ( 0%) ggc
>  ipa dead code removal   : 4.74 ( 2%) usr 0.05 ( 0%) sys 5.09 ( 2%) wall 0 kB ( 0%) ggc
>  ipa virtual call target : 11.29 ( 5%) usr 0.15 ( 1%) sys 11.20 ( 4%) wall 1 kB ( 0%) ggc
>  ipa cp                  : 5.22 ( 2%) usr 0.21 ( 1%) sys 5.51 ( 2%) wall 507623 kB (12%) ggc
>  ipa inlining heuristics : 24.11 (11%) usr 0.33 ( 2%) sys 24.67 ( 9%) wall 497487 kB (11%) ggc
>  ipa lto gimple in       : 4.20 ( 2%) usr 1.08 ( 6%) sys 10.73 ( 4%) wall 467276 kB (11%) ggc
>  ipa lto gimple out      : 17.57 ( 8%) usr 1.92 (11%) sys 23.61 ( 9%) wall 112 kB ( 0%) ggc
>  ipa lto decl in         : 26.19 (12%) usr 1.20 ( 7%) sys 31.62 (12%) wall 2242394 kB (51%) ggc
>  ipa lto decl out        : 89.09 (40%) usr 3.64 (22%) sys 92.79 (35%) wall 0 kB ( 0%) ggc
>  ipa lto constructors in : 0.79 ( 0%) usr 0.28 ( 2%) sys 14.33 ( 5%) wall 17992 kB ( 0%) ggc
>  ipa lto constructors out: 2.57 ( 1%) usr 0.41 ( 2%) sys 4.02 ( 2%) wall 0 kB ( 0%) ggc
>  ipa lto cgraph I/O      : 1.11 ( 1%) usr 0.33 ( 2%) sys 1.81 ( 1%) wall 432544 kB (10%) ggc
>  ipa lto decl merge      : 2.47 ( 1%) usr 0.00 ( 0%) sys 2.47 ( 1%) wall 8191 kB ( 0%) ggc
>  ipa lto cgraph merge    : 1.91 ( 1%) usr 0.01 ( 0%) sys 1.97 ( 1%) wall 14717 kB ( 0%) ggc
>  whopr wpa I/O           : 2.92 ( 1%) usr 5.93 (35%) sys 8.84 ( 3%) wall 0 kB ( 0%) ggc
>  whopr partitioning      : 3.91 ( 2%) usr 0.02 ( 0%) sys 3.93 ( 1%) wall 6001 kB ( 0%) ggc
>  ipa icf                 : 7.77 ( 4%) usr 0.19 ( 1%) sys 8.05 ( 3%) wall 22534 kB ( 1%) ggc
>  TOTAL                   : 221.76 16.76 267.92 4360470 kB
>
> Except that I really need to do something with virtual call targets. As the
> quality of information improved by improved TBAA we now do more walks.
>
> The savings for cc1 build are bigger and incremental linking improvements even bigger
> (about 50%), but I accidentally removed the logs...
>
> lto-bootstrapped/regtested x86_64-linux, OK?
>
>     * cgraph.c (cgraph_node::get_untransformed_body): Pass compressed
>     flag to lto_get_section_data.
>     * varpool.c (varpool_node::get_constructor): Likewise.
>     * lto-section-in.c (lto_get_section_data): Add new flag decompress.
>     (lto_free_section_data): Likewise.
>     (lto_get_raw_section_data): New function.
>     (lto_free_raw_section_data): New function.
>     (copy_function_or_variable): Copy sections w/o decompressing.
>     (lto_output_decl_state_refs): Pickle compressed bit.
>     * lto-streamer.h (lto_in_decl_state): New flag compressed.
>     (lto_out_decl_state): Likewise.
>     (lto_get_section_data, lto_free_section_data): Update prototypes
>     (lto_get_raw_section_data, lto_free_raw_section_data): Declare.
>     (lto_write_raw_data): Declare.
>     (lto_begin_section): Remove FIXME.
>     (lto_write_raw_data): New function.
>     (lto_write_stream): Remove FIXME.
>     (lto_new_out_decl_state): Set compressed flag.
>
>     * lto.c (lto_read_in_decl_state): Unpickle compressed bit.
> Index: cgraph.c
> ===================================================================
> --- cgraph.c (revision 231546)
> +++ cgraph.c (working copy)
> @@ -3251,9 +3251,11 @@ cgraph_node::get_untransformed_body (voi
>
>    /* We may have renamed the declaration, e.g., a static function. */
>    name = lto_get_decl_name_mapping (file_data, name);
> +  struct lto_in_decl_state *decl_state
> +    = lto_get_function_in_decl_state (file_data, decl);
>
>    data = lto_get_section_data (file_data, LTO_section_function_body,
> -                               name, &len);
> +                               name, &len, decl_state->compressed);
>    if (!data)
>      fatal_error (input_location, "%s: section %s is missing",
>                   file_data->file_name,
> @@ -3264,7 +3266,7 @@ cgraph_node::get_untransformed_body (voi
>    lto_input_function_body (file_data, this, data);
>    lto_stats.num_function_bodies++;
>    lto_free_section_data (file_data, LTO_section_function_body, name,
> -                         data, len);
> +                         data, len, decl_state->compressed);
>    lto_free_function_in_decl_state_for_node (this);
>    /* Keep lto file data so ipa-inline-analysis knows about cross module
>       inlining. */
> Index: lto-section-in.c
> ===================================================================
> --- lto-section-in.c (revision 231546)
> +++ lto-section-in.c (working copy)
> @@ -130,7 +130,7 @@ const char *
>  lto_get_section_data (struct lto_file_decl_data *file_data,
>                        enum lto_section_type section_type,
>                        const char *name,
> -                      size_t *len)
> +                      size_t *len, bool decompress)
>  {
>    const char *data = (get_section_f) (file_data, section_type, name, len);
>    const size_t header_length = sizeof (struct lto_data_header);
> @@ -142,9 +142,10 @@ lto_get_section_data (struct lto_file_de
>    if (data == NULL)
>      return NULL;
>
> -  /* FIXME lto: WPA mode does not write compressed sections, so for now
> -     suppress uncompression if flag_ltrans. */
> -  if (!flag_ltrans)
> +  /* WPA->ltrans streams are not compressed with exception of function bodies
> +     and variable initializers that has been verbatim copied from earlier
> +     compilations. */
> +  if (!flag_ltrans || decompress)
>      {
>        /* Create a mapping header containing the underlying data and length,
>           and prepend this to the uncompression buffer. The uncompressed data
> @@ -170,6 +171,16 @@ lto_get_section_data (struct lto_file_de
>    return data;
>  }
>
> +/* Get the section data without any header parsing or uncompression. */
> +
> +const char *
> +lto_get_raw_section_data (struct lto_file_decl_data *file_data,
> +                          enum lto_section_type section_type,
> +                          const char *name,
> +                          size_t *len)
> +{
> +  return (get_section_f) (file_data, section_type, name, len);
> +}
>
>  /* Free the data found from the above call. The first three
>     parameters are the same as above. DATA is the data to be freed and
> @@ -180,7 +191,7 @@ lto_free_section_data (struct lto_file_d
>                         enum lto_section_type section_type,
>                         const char *name,
>                         const char *data,
> -                       size_t len)
> +                       size_t len, bool decompress)
>  {
>    const size_t header_length = sizeof (struct lto_data_header);
>    const char *real_data = data - header_length;
> @@ -189,9 +200,7 @@ lto_free_section_data (struct lto_file_d
>
>    gcc_assert (free_section_f);
>
> -  /* FIXME lto: WPA mode does not write compressed sections, so for now
> -     suppress uncompression mapping if flag_ltrans. */
> -  if (flag_ltrans)
> +  if (flag_ltrans && !decompress)
>      {
>        (free_section_f) (file_data, section_type, name, data, len);
>        return;
> @@ -203,6 +212,17 @@ lto_free_section_data (struct lto_file_d
>    free (CONST_CAST (char *, real_data));
>  }
>
> +/* Free data allocated by lto_get_raw_section_data. */
> +
> +void
> +lto_free_raw_section_data (struct lto_file_decl_data *file_data,
> +                           enum lto_section_type section_type,
> +                           const char *name,
> +                           const char *data,
> +                           size_t len)
> +{
> +  (free_section_f) (file_data, section_type, name, data, len);
> +}
>
>  /* Load a section of type SECTION_TYPE from FILE_DATA, parse the
>     header and then return an input block pointing to the section. The
> Index: varpool.c
> ===================================================================
> --- varpool.c (revision 231546)
> +++ varpool.c (working copy)
> @@ -296,9 +303,11 @@ varpool_node::get_constructor (void)
>
>    /* We may have renamed the declaration, e.g., a static function. */
>    name = lto_get_decl_name_mapping (file_data, name);
> +  struct lto_in_decl_state *decl_state
> +    = lto_get_function_in_decl_state (file_data, decl);
>
>    data = lto_get_section_data (file_data, LTO_section_function_body,
> -                               name, &len);
> +                               name, &len, decl_state->compressed);
>    if (!data)
>      fatal_error (input_location, "%s: section %s is missing",
>                   file_data->file_name,
> @@ -308,7 +317,7 @@ varpool_node::get_constructor (void)
>    gcc_assert (DECL_INITIAL (decl) != error_mark_node);
>    lto_stats.num_function_bodies++;
>    lto_free_section_data (file_data, LTO_section_function_body, name,
> -                         data, len);
> +                         data, len, decl_state->compressed);
>    lto_free_function_in_decl_state_for_node (this);
>    timevar_pop (TV_IPA_LTO_CTORS_IN);
>    return DECL_INITIAL (decl);
> Index: lto-streamer-out.c
> ===================================================================
> --- lto-streamer-out.c (revision 231546)
> +++ lto-streamer-out.c (working copy)
> @@ -2191,22 +2224,23 @@ copy_function_or_variable (struct symtab
>    struct lto_in_decl_state *in_state;
>    struct lto_out_decl_state *out_state = lto_get_out_decl_state ();
>
> -  lto_begin_section (section_name, !flag_wpa);
> +  lto_begin_section (section_name, false);
>    free (section_name);
>
>    /* We may have renamed the declaration, e.g., a static function. */
>    name = lto_get_decl_name_mapping (file_data, name);
>
> -  data = lto_get_section_data (file_data, LTO_section_function_body,
> -                               name, &len);
> +  data = lto_get_raw_section_data (file_data, LTO_section_function_body,
> +                                   name, &len);
>    gcc_assert (data);
>
>    /* Do a bit copy of the function body. */
> -  lto_write_data (data, len);
> +  lto_write_raw_data (data, len);
>
>    /* Copy decls. */
>    in_state =
>      lto_get_function_in_decl_state (node->lto_file_data, function);
> +  out_state->compressed = in_state->compressed;
>    gcc_assert (in_state);
>
>    for (i = 0; i < LTO_N_DECL_STREAMS; i++)
> @@ -2224,8 +2258,8 @@ copy_function_or_variable (struct symtab
>          encoder->trees.safe_push ((*trees)[j]);
>      }
>
> -  lto_free_section_data (file_data, LTO_section_function_body, name,
> -                         data, len);
> +  lto_free_raw_section_data (file_data, LTO_section_function_body, name,
> +                             data, len);
>    lto_end_section ();
>  }
>
> @@ -2431,6 +2465,7 @@ lto_output_decl_state_refs (struct outpu
>    decl = (state->fn_decl) ? state->fn_decl : void_type_node;
>    streamer_tree_cache_lookup (ob->writer_cache, decl, &ref);
>    gcc_assert (ref != (unsigned)-1);
> +  ref = ref * 2 + (state->compressed ? 1 : 0);
>    lto_write_data (&ref, sizeof (uint32_t));
>
>    for (i = 0; i < LTO_N_DECL_STREAMS; i++)
> Index: lto/lto-symtab.c
> ===================================================================
> --- lto/lto-symtab.c (revision 231548)
> +++ lto/lto-symtab.c (working copy)
> @@ -883,6 +883,11 @@ lto_symtab_merge_symbols_1 (symtab_node
>            else
>              {
>                DECL_INITIAL (e->decl) = error_mark_node;
> +              if (e->lto_file_data)
> +                {
> +                  lto_free_function_in_decl_state_for_node (e);
> +                  e->lto_file_data = NULL;
> +                }
>                symtab->call_varpool_removal_hooks (dyn_cast<varpool_node *> (e));
>              }
>            e->remove_all_references ();
> Index: lto/lto.c
> ===================================================================
> --- lto/lto.c (revision 231546)
> +++ lto/lto.c (working copy)
> @@ -234,6 +234,8 @@ lto_read_in_decl_state (struct data_in *
>    uint32_t i, j;
>
>    ix = *data++;
> +  state->compressed = ix & 1;
> +  ix /= 2;
>    decl = streamer_tree_cache_get_tree (data_in->reader_cache, ix);
>    if (!VAR_OR_FUNCTION_DECL_P (decl))
>      {
> Index: lto-streamer.h
> ===================================================================
> --- lto-streamer.h (revision 231546)
> +++ lto-streamer.h (working copy)
> @@ -504,6 +505,9 @@ struct GTY((for_user)) lto_in_decl_state
>    /* If this in-decl state is associated with a function. FN_DECL
>       point to the FUNCTION_DECL. */
>    tree fn_decl;
> +
> +  /* True if decl state is compressed. */
> +  bool compressed;
>  };
>
>  typedef struct lto_in_decl_state *lto_in_decl_state_ptr;
> @@ -537,6 +541,9 @@ struct lto_out_decl_state
>    /* If this out-decl state belongs to a function, fn_decl points to that
>       function. Otherwise, it is NULL. */
>    tree fn_decl;
> +
> +  /* True if decl state is compressed. */
> +  bool compressed;
>  };
>
>  typedef struct lto_out_decl_state *lto_out_decl_state_ptr;
> @@ -761,10 +768,18 @@ extern void lto_set_in_hooks (struct lto
>  extern struct lto_file_decl_data **lto_get_file_decl_data (void);
>  extern const char *lto_get_section_data (struct lto_file_decl_data *,
>                                           enum lto_section_type,
> -                                         const char *, size_t *);
> +                                         const char *, size_t *,
> +                                         bool decompress = false);
> +extern const char *lto_get_raw_section_data (struct lto_file_decl_data *,
> +                                             enum lto_section_type,
> +                                             const char *, size_t *);
>  extern void lto_free_section_data (struct lto_file_decl_data *,
> -                                   enum lto_section_type,
> -                                   const char *, const char *, size_t);
> +                                   enum lto_section_type,
> +                                   const char *, const char *, size_t,
> +                                   bool decompress = false);
> +extern void lto_free_raw_section_data (struct lto_file_decl_data *,
> +                                       enum lto_section_type,
> +                                       const char *, const char *, size_t);
>  extern htab_t lto_create_renaming_table (void);
>  extern void lto_record_renamed_decl (struct lto_file_decl_data *,
>                                       const char *, const char *);
> @@ -785,6 +800,7 @@ extern void lto_value_range_error (const
>  extern void lto_begin_section (const char *, bool);
>  extern void lto_end_section (void);
>  extern void lto_write_data (const void *, unsigned int);
> +extern void lto_write_raw_data (const void *, unsigned int);
>  extern void lto_write_stream (struct lto_output_stream *);
>  extern bool lto_output_decl_index (struct lto_output_stream *,
>                                     struct lto_tree_ref_encoder *,
> Index: lto-section-out.c
> ===================================================================
> --- lto-section-out.c (revision 231546)
> +++ lto-section-out.c (working copy)
> @@ -66,9 +66,6 @@ lto_begin_section (const char *name, boo
>  {
>    lang_hooks.lto.begin_section (name);
>
> -  /* FIXME lto: for now, suppress compression if the lang_hook that appends
> -     data is anything other than assembler output. The effect here is that
> -     we get compression of IL only in non-ltrans object files. */
>    gcc_assert (compression_stream == NULL);
>    if (compress)
>      compression_stream = lto_start_compression (lto_append_data, NULL);
> @@ -99,6 +96,14 @@ lto_write_data (const void *data, unsign
>      lang_hooks.lto.append_data ((const char *)data, size, NULL);
>  }
>
> +/* Write SIZE bytes starting at DATA to the assembler. */
> +
> +void
> +lto_write_raw_data (const void *data, unsigned int size)
> +{
> +  lang_hooks.lto.append_data ((const char *)data, size, NULL);
> +}
> +
>  /* Write all of the chars in OBS to the assembler. Recycle the blocks
>     in obs as this is being done. */
>
> @@ -123,10 +128,6 @@ lto_write_stream (struct lto_output_stre
>        if (!next_block)
>          num_chars -= obs->left_in_block;
>
> -      /* FIXME lto: WPA mode uses an ELF function as a lang_hook to append
> -         output data. This hook is not happy with the way that compression
> -         blocks up output differently to the way it's blocked here. So for
> -         now, we don't compress WPA output. */
>        if (compression_stream)
>          lto_compress_block (compression_stream, base, num_chars);
>        else
> @@ -295,6 +296,9 @@ lto_new_out_decl_state (void)
>    for (i = 0; i < LTO_N_DECL_STREAMS; i++)
>      lto_init_tree_ref_encoder (&state->streams[i]);
>
> +  /* At WPA time we do not compress sections by default. */
> +  state->compressed = !flag_wpa;
> +
>    return state;
>  }
>
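
The patch carries the compressed flag in the low bit of the 32-bit decl-state reference (`ref * 2 + compressed` when writing, `ix & 1` followed by `ix /= 2` when reading), and the reader then inflates a section when `!flag_ltrans || decompress` holds. The following is a minimal standalone C sketch of that round trip, not GCC code: `pack_ref`, `unpack_ref` and `needs_inflate` are hypothetical helper names, and the sketch assumes the reference fits in 31 bits so that doubling it cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack a tree-cache reference and the "compressed"
   flag into one 32-bit word, in the style of the change to
   lto_output_decl_state_refs.  Assumes REF fits in 31 bits.  */
static uint32_t
pack_ref (uint32_t ref, bool compressed)
{
  assert (ref < (UINT32_C (1) << 31));
  return ref * 2 + (compressed ? 1 : 0);
}

/* Hypothetical helper: reverse of pack_ref, in the style of the change to
   lto_read_in_decl_state.  Stores the flag in *COMPRESSED and returns the
   original reference.  */
static uint32_t
unpack_ref (uint32_t word, bool *compressed)
{
  *compressed = (word & 1) != 0;
  return word / 2;
}

/* Hypothetical helper: same condition as the patched lto_get_section_data.
   Outside ltrans every section is inflated; inside ltrans only sections
   whose decl state says they were copied through compressed.  */
static bool
needs_inflate (bool in_ltrans, bool section_compressed)
{
  return !in_ltrans || section_compressed;
}
```

Because the flag rides in a word that is streamed anyway, the format does not grow; the cost is one bit of reference range.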
> > For now I added the information if section is compressed into
> > decl_state. I am not thrilled by this but it is the only way I found w/o
> > wasting 4 bytes per every lto section (because the lto header is not
> > really extensible and the stream is assumed to be aligned).
>
> So this trick now only applies to decl sections? I think you

Only function/variable sections are copied verbatim by WPA, so yes.
Everything else is re-streamed from scratch (and I do not see what else
can be just copied through anyway).

> could have stolen a bit from lto_simple_header::main_size
> (oddly lto_simple_header_with_strings adds its own main_size,
> hiding the simple-header ones - huh).
>
> Changing lto_header itself into
>
>   int16_t major_version
>   int8_t minor_version
>   int8_t flags
>
> would be another possibility (and bump the major version). I think

This seems better to me - we can steal just a little from main_size, but I think
we can be quite fine with only 256 minor versions. I will update the patch.

> we have no sections produced with just lto_header but always
> lto_simple_header (from grepping). Some sections have no header
> (lto.opts).

lto.opts is never compressed. Also the symbol table used by lto-plugin goes
w/o headers.

>
> So would the patch be a lot more difficult if you go down either of
> the routes above? (I think I prefer changing lto_header rather
> than making main_size a bitfield)

Agreed ;)

Honza

>
> Richard.
>
> > The whole lowlevel lto streaming code is a grand mess, I hope we will clean this
> > up and get more sane headers in the foreseeable future. Until that time this
> > solution does not waste extra space as it is easy to pickle the flag as part of
> > the reference.
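
The header layout discussed in this reply (a 16-bit major version, an 8-bit minor version and an 8-bit flags field) could look roughly like the following. This is a minimal sketch, assuming the existing header spends 16 bits on each version field so that the reshaped struct stays at four bytes on common ABIs; the struct name `lto_header_sketch`, the constant `LTO_HDR_FLAG_COMPRESSED` and the helper `header_is_compressed` are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit marking a section whose payload is compressed.  */
#define LTO_HDR_FLAG_COMPRESSED 0x01

/* Sketch of the proposed layout: the minor version shrinks to 8 bits and
   the freed byte becomes a flags field.  */
struct lto_header_sketch
{
  int16_t major_version;
  int8_t minor_version;
  int8_t flags;
};

/* The point of the proposal is that the section format does not grow:
   2 + 1 + 1 bytes, same as two 16-bit version fields.  */
_Static_assert (sizeof (struct lto_header_sketch) == 4,
                "reshaped header must stay at four bytes");

/* Hypothetical helper: true if the header marks the section as compressed.  */
static bool
header_is_compressed (const struct lto_header_sketch *h)
{
  return (h->flags & LTO_HDR_FLAG_COMPRESSED) != 0;
}
```

With the flag in the section header, a reader would not need to consult the decl state before deciding whether to inflate; sections without any header, such as lto.opts, would still need to be treated as uncompressed by convention.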
*/ > > in_state = > > lto_get_function_in_decl_state (node->lto_file_data, function); > > + out_state->compressed = in_state->compressed; > > gcc_assert (in_state); > > > > for (i = 0; i < LTO_N_DECL_STREAMS; i++) > > @@ -2224,8 +2258,8 @@ copy_function_or_variable (struct symtab > > encoder->trees.safe_push ((*trees)[j]); > > } > > > > - lto_free_section_data (file_data, LTO_section_function_body, name, > > - data, len); > > + lto_free_raw_section_data (file_data, LTO_section_function_body, name, > > + data, len); > > lto_end_section (); > > } > > > > @@ -2431,6 +2465,7 @@ lto_output_decl_state_refs (struct outpu > > decl = (state->fn_decl) ? state->fn_decl : void_type_node; > > streamer_tree_cache_lookup (ob->writer_cache, decl, &ref); > > gcc_assert (ref != (unsigned)-1); > > + ref = ref * 2 + (state->compressed ? 1 : 0); > > lto_write_data (&ref, sizeof (uint32_t)); > > > > for (i = 0; i < LTO_N_DECL_STREAMS; i++) > > Index: lto/lto-symtab.c > > =================================================================== > > --- lto/lto-symtab.c (revision 231548) > > +++ lto/lto-symtab.c (working copy) > > @@ -883,6 +883,11 @@ lto_symtab_merge_symbols_1 (symtab_node > > else > > { > > DECL_INITIAL (e->decl) = error_mark_node; > > + if (e->lto_file_data) > > + { > > + lto_free_function_in_decl_state_for_node (e); > > + e->lto_file_data = NULL; > > + } > > symtab->call_varpool_removal_hooks (dyn_cast<varpool_node *> (e)); > > } > > e->remove_all_references (); > > Index: lto/lto.c > > =================================================================== > > --- lto/lto.c (revision 231546) > > +++ lto/lto.c (working copy) > > @@ -234,6 +234,8 @@ lto_read_in_decl_state (struct data_in * > > uint32_t i, j; > > > > ix = *data++; > > + state->compressed = ix & 1; > > + ix /= 2; > > decl = streamer_tree_cache_get_tree (data_in->reader_cache, ix); > > if (!VAR_OR_FUNCTION_DECL_P (decl)) > > { > > Index: lto-streamer.h > > 
=================================================================== > > --- lto-streamer.h (revision 231546) > > +++ lto-streamer.h (working copy) > > @@ -504,6 +505,9 @@ struct GTY((for_user)) lto_in_decl_state > > /* If this in-decl state is associated with a function. FN_DECL > > point to the FUNCTION_DECL. */ > > tree fn_decl; > > + > > + /* True if decl state is compressed. */ > > + bool compressed; > > }; > > > > typedef struct lto_in_decl_state *lto_in_decl_state_ptr; > > @@ -537,6 +541,9 @@ struct lto_out_decl_state > > /* If this out-decl state belongs to a function, fn_decl points to that > > function. Otherwise, it is NULL. */ > > tree fn_decl; > > + > > + /* True if decl state is compressed. */ > > + bool compressed; > > }; > > > > typedef struct lto_out_decl_state *lto_out_decl_state_ptr; > > @@ -761,10 +768,18 @@ extern void lto_set_in_hooks (struct lto > > extern struct lto_file_decl_data **lto_get_file_decl_data (void); > > extern const char *lto_get_section_data (struct lto_file_decl_data *, > > enum lto_section_type, > > - const char *, size_t *); > > + const char *, size_t *, > > + bool decompress = false); > > +extern const char *lto_get_raw_section_data (struct lto_file_decl_data *, > > + enum lto_section_type, > > + const char *, size_t *); > > extern void lto_free_section_data (struct lto_file_decl_data *, > > - enum lto_section_type, > > - const char *, const char *, size_t); > > + enum lto_section_type, > > + const char *, const char *, size_t, > > + bool decompress = false); > > +extern void lto_free_raw_section_data (struct lto_file_decl_data *, > > + enum lto_section_type, > > + const char *, const char *, size_t); > > extern htab_t lto_create_renaming_table (void); > > extern void lto_record_renamed_decl (struct lto_file_decl_data *, > > const char *, const char *); > > @@ -785,6 +800,7 @@ extern void lto_value_range_error (const > > extern void lto_begin_section (const char *, bool); > > extern void lto_end_section (void); > > extern 
void lto_write_data (const void *, unsigned int); > > +extern void lto_write_raw_data (const void *, unsigned int); > > extern void lto_write_stream (struct lto_output_stream *); > > extern bool lto_output_decl_index (struct lto_output_stream *, > > struct lto_tree_ref_encoder *, > > Index: lto-section-out.c > > =================================================================== > > --- lto-section-out.c (revision 231546) > > +++ lto-section-out.c (working copy) > > @@ -66,9 +66,6 @@ lto_begin_section (const char *name, boo > > { > > lang_hooks.lto.begin_section (name); > > > > - /* FIXME lto: for now, suppress compression if the lang_hook that appends > > - data is anything other than assembler output. The effect here is that > > - we get compression of IL only in non-ltrans object files. */ > > gcc_assert (compression_stream == NULL); > > if (compress) > > compression_stream = lto_start_compression (lto_append_data, NULL); > > @@ -99,6 +96,14 @@ lto_write_data (const void *data, unsign > > lang_hooks.lto.append_data ((const char *)data, size, NULL); > > } > > > > +/* Write SIZE bytes starting at DATA to the assembler. */ > > + > > +void > > +lto_write_raw_data (const void *data, unsigned int size) > > +{ > > + lang_hooks.lto.append_data ((const char *)data, size, NULL); > > +} > > + > > /* Write all of the chars in OBS to the assembler. Recycle the blocks > > in obs as this is being done. */ > > > > @@ -123,10 +128,6 @@ lto_write_stream (struct lto_output_stre > > if (!next_block) > > num_chars -= obs->left_in_block; > > > > - /* FIXME lto: WPA mode uses an ELF function as a lang_hook to append > > - output data. This hook is not happy with the way that compression > > - blocks up output differently to the way it's blocked here. So for > > - now, we don't compress WPA output. 
*/ > > if (compression_stream) > > lto_compress_block (compression_stream, base, num_chars); > > else > > @@ -295,6 +296,9 @@ lto_new_out_decl_state (void) > > for (i = 0; i < LTO_N_DECL_STREAMS; i++) > > lto_init_tree_ref_encoder (&state->streams[i]); > > > > + /* At WPA time we do not compress sections by default. */ > > + state->compressed = !flag_wpa; > > + > > return state; > > } > > > > > > > > -- > Richard Biener <rguenther@suse.de> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> > So would the patch be a lot more difficult if you go down either of
> > the routes above?  (I think I prefer changing lto_header rather
> > than making main_size a bitfield)
>
> Agreed ;)

Hmm, actually it seems things are difficult.  All the headers are already
compressed by zlib:

Contents of section .gnu.lto_.inline.faa6142d1fc0c505:
 0000 789c6362 6062c006 10a20000 ba0007    x.cb`b.........
Contents of section .gnu.lto_.symbol_nodes.faa6142d1fc0c505:
 0000 789c6362 6062e060 80025606 06f67a06  x.cb`b.`..V...z.
 0010 0600029c 0098                        ......
Contents of section .gnu.lto_.refs.faa6142d1fc0c505:
 0000 789c6362 60626064 80020000 460006    x.cb`b`d....F..
Contents of section .gnu.lto_.decls.faa6142d1fc0c505:
 0000 789c6362 6062d060 6060f80f 0461405a  x.cb`b.```...a@Z
 0010 10881981 d88e0115 80c496a2 8929312c  .............)1,
 0020 6c5870e0 c1d2291c 131a1b1b feffaf4f  lXp...)........O
 0030 6464e104 9ac7cec8 a0c5b0b2 a1a16142  dd............aB
 0040 43c32336 142d118c 898c7c9c 0c0a6006  C.#6.-....|...`.
 0050 0f27030b 58349171 1923135c 0d4b895e  .'..X4.q.#.\.K.^
 0060 32037359 620e9bbb 5fa88233 0300a167  2.sYb..._..3...g
 0070 1a7f                                 ..
Contents of section .gnu.lto_.symtab.faa6142d1fc0c505:
 0000 76616c00 00040004 00000000 000000a5  val.............
 0010 000000                               ...
Contents of section .gnu.lto_.opts:
 0000 272d6d74 756e653d 67656e65 72696327  '-mtune=generic'
 0010 20272d6d 61726368 3d783836 2d363427   '-march=x86-64'
 0020 20272d66 6c746f27 00                  '-flto'.
Contents of section .comment:
 0000 00474343 3a202853 55534520 4c696e75  .GCC: (SUSE Linu
 0010 78292034 2e382e33 20323031 34303632  x) 4.8.3 2014062
 0020 37205b67 63632d34 5f382d62 72616e63  7 [gcc-4_8-branc
 0030 68207265 76697369 6f6e2032 31323036  h revision 21206
 0040 345d00                               4].

As you can see, only the opts, comment and symtab sections are stored
uncompressed.  The sequence x.cb`b starts with zlib's header (the bytes
78 9c).  The description is here:
https://tools.ietf.org/html/rfc1950#page-4

There doesn't seem to be a unique identifier in the zlib header that would
allow us to tell an lto_header apart from zlib data, so I don't think I can
play a trick and auto-detect the compression.  As such, I do not think we
can get a header about compression into the section w/o breaking backward
compatibility, short of inventing our own malformed zlib header which we
would be sure to be able to tell apart from zlib's.  That would defeat the
plan to not increase the section sizes.

I would prefer to go with my current solution until we make a new
"major major" revision of the format, where we will have a chance to drop
all this and clean up other design mistakes of the original LTO format.

Honza
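To illustrate why auto-detection is unreliable, here is a minimal sketch of the only validity test RFC 1950 offers for the two header bytes (CMF and FLG). The function name looks_like_zlib_header is hypothetical and not part of GCC; the rules themselves (CM must be 8, CINFO at most 7, and CMF*256 + FLG divisible by 31) come from the RFC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not a GCC function: returns true if the two bytes
// CMF and FLG satisfy the RFC 1950 rules for a zlib stream header.
// This is a plausibility check only; there is no magic number to match.
bool looks_like_zlib_header (std::uint8_t cmf, std::uint8_t flg)
{
  // Low nibble of CMF is the compression method; 8 means deflate.
  if ((cmf & 0x0f) != 8)
    return false;

  // High nibble of CMF is CINFO (log2 of window size minus 8); values
  // above 7 are not allowed by the specification.
  if ((cmf >> 4) > 7)
    return false;

  // FCHECK: CMF and FLG, read as a 16-bit big-endian number, must be a
  // multiple of 31.
  const unsigned value = (static_cast<unsigned> (cmf) << 8) | flg;
  return value % 31 == 0;
}
```

With the dumps above, looks_like_zlib_header (0x78, 0x9c) accepts the compressed sections and looks_like_zlib_header (0x76, 0x61) rejects the symtab section, but an unrelated byte pair such as 0x08 0x1d also passes. Nothing prevents an uncompressed section from beginning with bytes that pass, so the check cannot replace an explicit flag.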
On December 12, 2015 7:47:39 PM GMT+01:00, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > So would the patch be a lot more difficult if you go down either of
>> > the routes above?  (I think I prefer changing lto_header rather
>> > than making main_size a bitfield)
>>
>> Agreed ;)
>Hmm, actually it seems things are difficult.  All the headers are
>already compressed by zlib:
>Contents of section .gnu.lto_.inline.faa6142d1fc0c505:
> 0000 789c6362 6062c006 10a20000 ba0007    x.cb`b.........
>Contents of section .gnu.lto_.symbol_nodes.faa6142d1fc0c505:
> 0000 789c6362 6062e060 80025606 06f67a06  x.cb`b.`..V...z.
> 0010 0600029c 0098                        ......
>Contents of section .gnu.lto_.refs.faa6142d1fc0c505:
> 0000 789c6362 60626064 80020000 460006    x.cb`b`d....F..
>Contents of section .gnu.lto_.decls.faa6142d1fc0c505:
> 0000 789c6362 6062d060 6060f80f 0461405a  x.cb`b.```...a@Z
> 0010 10881981 d88e0115 80c496a2 8929312c  .............)1,
> 0020 6c5870e0 c1d2291c 131a1b1b feffaf4f  lXp...)........O
> 0030 6464e104 9ac7cec8 a0c5b0b2 a1a16142  dd............aB
> 0040 43c32336 142d118c 898c7c9c 0c0a6006  C.#6.-....|...`.
> 0050 0f27030b 58349171 1923135c 0d4b895e  .'..X4.q.#.\.K.^
> 0060 32037359 620e9bbb 5fa88233 0300a167  2.sYb..._..3...g
> 0070 1a7f                                 ..
>Contents of section .gnu.lto_.symtab.faa6142d1fc0c505:
> 0000 76616c00 00040004 00000000 000000a5  val.............
> 0010 000000                               ...
>Contents of section .gnu.lto_.opts:
> 0000 272d6d74 756e653d 67656e65 72696327  '-mtune=generic'
> 0010 20272d6d 61726368 3d783836 2d363427   '-march=x86-64'
> 0020 20272d66 6c746f27 00                  '-flto'.
>Contents of section .comment:
> 0000 00474343 3a202853 55534520 4c696e75  .GCC: (SUSE Linu
> 0010 78292034 2e382e33 20323031 34303632  x) 4.8.3 2014062
> 0020 37205b67 63632d34 5f382d62 72616e63  7 [gcc-4_8-branc
> 0030 68207265 76697369 6f6e2032 31323036  h revision 21206
> 0040 345d00                               4].
>
>As you can see, only the opts, comment and symtab sections are stored
>uncompressed.  The sequence x.cb`b starts with zlib's header (the bytes
>78 9c).  The description is here:
>https://tools.ietf.org/html/rfc1950#page-4
>
>There doesn't seem to be a unique identifier in the zlib header that
>would allow us to tell an lto_header apart from zlib data, so I don't
>think I can play a trick and auto-detect the compression.  As such, I
>do not think we can get a header about compression into the section
>w/o breaking backward compatibility, short of inventing our own
>malformed zlib header which we would be sure to be able to tell apart
>from zlib's.  That would defeat the plan to not increase the section
>sizes.

Aww, yeah.  Now I remember.

>I would prefer to go with my current solution until we make a new
>"major major" revision of the format, where we will have a chance
>to drop all this and clean up other design mistakes of the original
>LTO format.

OK...

Richard.

>
>Honza
Index: cgraph.c
===================================================================
--- cgraph.c (revision 231546)
+++ cgraph.c (working copy)
@@ -3251,9 +3251,11 @@ cgraph_node::get_untransformed_body (voi
 
   /* We may have renamed the declaration, e.g., a static function.  */
   name = lto_get_decl_name_mapping (file_data, name);
+  struct lto_in_decl_state *decl_state
+    = lto_get_function_in_decl_state (file_data, decl);
 
   data = lto_get_section_data (file_data, LTO_section_function_body,
-                               name, &len);
+                               name, &len, decl_state->compressed);
   if (!data)
     fatal_error (input_location, "%s: section %s is missing",
                  file_data->file_name,
@@ -3264,7 +3266,7 @@ cgraph_node::get_untransformed_body (voi
   lto_input_function_body (file_data, this, data);
   lto_stats.num_function_bodies++;
   lto_free_section_data (file_data, LTO_section_function_body, name,
-                         data, len);
+                         data, len, decl_state->compressed);
   lto_free_function_in_decl_state_for_node (this);
   /* Keep lto file data so ipa-inline-analysis knows about cross module
      inlining.  */
Index: lto-section-in.c
===================================================================
--- lto-section-in.c (revision 231546)
+++ lto-section-in.c (working copy)
@@ -130,7 +130,7 @@ const char *
 lto_get_section_data (struct lto_file_decl_data *file_data,
                       enum lto_section_type section_type,
                       const char *name,
-                      size_t *len)
+                      size_t *len, bool decompress)
 {
   const char *data = (get_section_f) (file_data, section_type, name, len);
   const size_t header_length = sizeof (struct lto_data_header);
@@ -142,9 +142,10 @@ lto_get_section_data (struct lto_file_de
   if (data == NULL)
     return NULL;
 
-  /* FIXME lto: WPA mode does not write compressed sections, so for now
-     suppress uncompression if flag_ltrans.  */
-  if (!flag_ltrans)
+  /* WPA->ltrans streams are not compressed with exception of function bodies
+     and variable initializers that has been verbatim copied from earlier
+     compilations.  */
+  if (!flag_ltrans || decompress)
     {
       /* Create a mapping header containing the underlying data and length,
          and prepend this to the uncompression buffer.  The uncompressed data
@@ -170,6 +171,16 @@ lto_get_section_data (struct lto_file_de
   return data;
 }
 
+/* Get the section data without any header parsing or uncompression.  */
+
+const char *
+lto_get_raw_section_data (struct lto_file_decl_data *file_data,
+                          enum lto_section_type section_type,
+                          const char *name,
+                          size_t *len)
+{
+  return (get_section_f) (file_data, section_type, name, len);
+}
 
 /* Free the data found from the above call.  The first three
    parameters are the same as above.  DATA is the data to be freed and
@@ -180,7 +191,7 @@ lto_free_section_data (struct lto_file_d
                        enum lto_section_type section_type,
                        const char *name,
                        const char *data,
-                       size_t len)
+                       size_t len, bool decompress)
 {
   const size_t header_length = sizeof (struct lto_data_header);
   const char *real_data = data - header_length;
@@ -189,9 +200,7 @@ lto_free_section_data (struct lto_file_d
 
   gcc_assert (free_section_f);
 
-  /* FIXME lto: WPA mode does not write compressed sections, so for now
-     suppress uncompression mapping if flag_ltrans.  */
-  if (flag_ltrans)
+  if (flag_ltrans && !decompress)
     {
       (free_section_f) (file_data, section_type, name, data, len);
       return;
@@ -203,6 +212,17 @@ lto_free_section_data (struct lto_file_d
   free (CONST_CAST (char *, real_data));
 }
 
+/* Free data allocated by lto_get_raw_section_data.  */
+
+void
+lto_free_raw_section_data (struct lto_file_decl_data *file_data,
+                           enum lto_section_type section_type,
+                           const char *name,
+                           const char *data,
+                           size_t len)
+{
+  (free_section_f) (file_data, section_type, name, data, len);
+}
 
 /* Load a section of type SECTION_TYPE from FILE_DATA, parse the
    header and then return an input block pointing to the section.  The
Index: varpool.c
===================================================================
--- varpool.c (revision 231546)
+++ varpool.c (working copy)
@@ -296,9 +303,11 @@ varpool_node::get_constructor (void)
 
   /* We may have renamed the declaration, e.g., a static function.  */
   name = lto_get_decl_name_mapping (file_data, name);
+  struct lto_in_decl_state *decl_state
+    = lto_get_function_in_decl_state (file_data, decl);
 
   data = lto_get_section_data (file_data, LTO_section_function_body,
-                               name, &len);
+                               name, &len, decl_state->compressed);
   if (!data)
     fatal_error (input_location, "%s: section %s is missing",
                  file_data->file_name,
@@ -308,7 +317,7 @@ varpool_node::get_constructor (void)
   gcc_assert (DECL_INITIAL (decl) != error_mark_node);
   lto_stats.num_function_bodies++;
   lto_free_section_data (file_data, LTO_section_function_body, name,
-                         data, len);
+                         data, len, decl_state->compressed);
   lto_free_function_in_decl_state_for_node (this);
   timevar_pop (TV_IPA_LTO_CTORS_IN);
   return DECL_INITIAL (decl);
Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c (revision 231546)
+++ lto-streamer-out.c (working copy)
@@ -2191,22 +2224,23 @@ copy_function_or_variable (struct symtab
   struct lto_in_decl_state *in_state;
   struct lto_out_decl_state *out_state = lto_get_out_decl_state ();
 
-  lto_begin_section (section_name, !flag_wpa);
+  lto_begin_section (section_name, false);
   free (section_name);
 
   /* We may have renamed the declaration, e.g., a static function.  */
   name = lto_get_decl_name_mapping (file_data, name);
 
-  data = lto_get_section_data (file_data, LTO_section_function_body,
-                               name, &len);
+  data = lto_get_raw_section_data (file_data, LTO_section_function_body,
+                                   name, &len);
   gcc_assert (data);
 
   /* Do a bit copy of the function body.  */
-  lto_write_data (data, len);
+  lto_write_raw_data (data, len);
 
   /* Copy decls. */
   in_state =
     lto_get_function_in_decl_state (node->lto_file_data, function);
+  out_state->compressed = in_state->compressed;
   gcc_assert (in_state);
 
   for (i = 0; i < LTO_N_DECL_STREAMS; i++)
@@ -2224,8 +2258,8 @@ copy_function_or_variable (struct symtab
         encoder->trees.safe_push ((*trees)[j]);
     }
 
-  lto_free_section_data (file_data, LTO_section_function_body, name,
-                         data, len);
+  lto_free_raw_section_data (file_data, LTO_section_function_body, name,
+                             data, len);
   lto_end_section ();
 }
 
@@ -2431,6 +2465,7 @@ lto_output_decl_state_refs (struct outpu
   decl = (state->fn_decl) ? state->fn_decl : void_type_node;
   streamer_tree_cache_lookup (ob->writer_cache, decl, &ref);
   gcc_assert (ref != (unsigned)-1);
+  ref = ref * 2 + (state->compressed ? 1 : 0);
   lto_write_data (&ref, sizeof (uint32_t));
 
   for (i = 0; i < LTO_N_DECL_STREAMS; i++)
Index: lto/lto-symtab.c
===================================================================
--- lto/lto-symtab.c (revision 231548)
+++ lto/lto-symtab.c (working copy)
@@ -883,6 +883,11 @@ lto_symtab_merge_symbols_1 (symtab_node
               else
                 {
                   DECL_INITIAL (e->decl) = error_mark_node;
+                  if (e->lto_file_data)
+                    {
+                      lto_free_function_in_decl_state_for_node (e);
+                      e->lto_file_data = NULL;
+                    }
                   symtab->call_varpool_removal_hooks (dyn_cast<varpool_node *> (e));
                 }
               e->remove_all_references ();
Index: lto/lto.c
===================================================================
--- lto/lto.c (revision 231546)
+++ lto/lto.c (working copy)
@@ -234,6 +234,8 @@ lto_read_in_decl_state (struct data_in *
   uint32_t i, j;
 
   ix = *data++;
+  state->compressed = ix & 1;
+  ix /= 2;
   decl = streamer_tree_cache_get_tree (data_in->reader_cache, ix);
   if (!VAR_OR_FUNCTION_DECL_P (decl))
     {
Index: lto-streamer.h
===================================================================
--- lto-streamer.h (revision 231546)
+++ lto-streamer.h (working copy)
@@ -504,6 +505,9 @@ struct GTY((for_user)) lto_in_decl_state
   /* If this in-decl state is associated with a function. FN_DECL
      point to the FUNCTION_DECL. */
   tree fn_decl;
+
+  /* True if decl state is compressed.  */
+  bool compressed;
 };
 
 typedef struct lto_in_decl_state *lto_in_decl_state_ptr;
@@ -537,6 +541,9 @@ struct lto_out_decl_state
   /* If this out-decl state belongs to a function, fn_decl points to that
      function.  Otherwise, it is NULL. */
   tree fn_decl;
+
+  /* True if decl state is compressed.  */
+  bool compressed;
 };
 
 typedef struct lto_out_decl_state *lto_out_decl_state_ptr;
@@ -761,10 +768,18 @@ extern void lto_set_in_hooks (struct lto
 extern struct lto_file_decl_data **lto_get_file_decl_data (void);
 extern const char *lto_get_section_data (struct lto_file_decl_data *,
                                          enum lto_section_type,
-                                         const char *, size_t *);
+                                         const char *, size_t *,
+                                         bool decompress = false);
+extern const char *lto_get_raw_section_data (struct lto_file_decl_data *,
+                                             enum lto_section_type,
+                                             const char *, size_t *);
 extern void lto_free_section_data (struct lto_file_decl_data *,
-                                   enum lto_section_type,
-                                   const char *, const char *, size_t);
+                                   enum lto_section_type,
+                                   const char *, const char *, size_t,
+                                   bool decompress = false);
+extern void lto_free_raw_section_data (struct lto_file_decl_data *,
+                                       enum lto_section_type,
+                                       const char *, const char *, size_t);
 extern htab_t lto_create_renaming_table (void);
 extern void lto_record_renamed_decl (struct lto_file_decl_data *,
                                      const char *, const char *);
@@ -785,6 +800,7 @@ extern void lto_value_range_error (const
 extern void lto_begin_section (const char *, bool);
 extern void lto_end_section (void);
 extern void lto_write_data (const void *, unsigned int);
+extern void lto_write_raw_data (const void *, unsigned int);
 extern void lto_write_stream (struct lto_output_stream *);
 extern bool lto_output_decl_index (struct lto_output_stream *,
                                    struct lto_tree_ref_encoder *,
Index: lto-section-out.c
===================================================================
--- lto-section-out.c (revision 231546)
+++ lto-section-out.c (working copy)
@@ -66,9 +66,6 @@ lto_begin_section (const char *name, boo
 {
   lang_hooks.lto.begin_section (name);
 
-  /* FIXME lto: for now, suppress compression if the lang_hook that appends
-     data is anything other than assembler output.  The effect here is that
-     we get compression of IL only in non-ltrans object files.  */
   gcc_assert (compression_stream == NULL);
   if (compress)
     compression_stream = lto_start_compression (lto_append_data, NULL);
@@ -99,6 +96,14 @@ lto_write_data (const void *data, unsign
     lang_hooks.lto.append_data ((const char *)data, size, NULL);
 }
 
+/* Write SIZE bytes starting at DATA to the assembler.  */
+
+void
+lto_write_raw_data (const void *data, unsigned int size)
+{
+  lang_hooks.lto.append_data ((const char *)data, size, NULL);
+}
+
 /* Write all of the chars in OBS to the assembler.  Recycle the blocks
    in obs as this is being done.  */
 
@@ -123,10 +128,6 @@ lto_write_stream (struct lto_output_stre
       if (!next_block)
         num_chars -= obs->left_in_block;
 
-      /* FIXME lto: WPA mode uses an ELF function as a lang_hook to append
-         output data.  This hook is not happy with the way that compression
-         blocks up output differently to the way it's blocked here.  So for
-         now, we don't compress WPA output.  */
       if (compression_stream)
         lto_compress_block (compression_stream, base, num_chars);
       else
@@ -295,6 +296,9 @@ lto_new_out_decl_state (void)
   for (i = 0; i < LTO_N_DECL_STREAMS; i++)
     lto_init_tree_ref_encoder (&state->streams[i]);
 
+  /* At WPA time we do not compress sections by default.  */
+  state->compressed = !flag_wpa;
+
   return state;
 }
 
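The two mechanisms the patch relies on can be sketched in isolation as follows. This is a standalone illustration, not GCC code: the names pack_decl_ref, unpack_decl_ref, unpacked_decl_ref and takes_decompression_path are hypothetical, and the patch performs these steps inline in lto_output_decl_state_refs, lto_read_in_decl_state, lto_get_section_data and lto_free_section_data.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: store the "compressed" flag in the low bit of the
// 32-bit tree-cache reference, as the writer side of the patch does with
// ref = ref * 2 + (state->compressed ? 1 : 0).
inline std::uint32_t pack_decl_ref (std::uint32_t ref, bool compressed)
{
  // Doubling loses the top bit, so the reference must fit in 31 bits.
  assert (ref < (UINT32_C (1) << 31));
  return ref * 2 + (compressed ? 1 : 0);
}

// Hypothetical result type for the reader side.
struct unpacked_decl_ref
{
  std::uint32_t ref;
  bool compressed;
};

// Reader side, matching state->compressed = ix & 1; ix /= 2; in the patch.
inline unpacked_decl_ref unpack_decl_ref (std::uint32_t ix)
{
  return { ix / 2, (ix & 1) != 0 };
}

// Hypothetical predicate for the condition used in lto_get_section_data:
// the uncompression path is taken unless we are in ltrans and the section
// is not marked compressed.  lto_free_section_data tests the negation
// (flag_ltrans && !decompress), so get and free stay symmetric.
inline bool takes_decompression_path (bool flag_ltrans, bool decompress)
{
  return !flag_ltrans || decompress;
}
```

For example, pack_decl_ref (21, true) yields 43, and unpack_decl_ref (43) gives back reference 21 with the flag set. One consequence of this encoding is that the flag costs no extra bytes in the stream, but streams written with the flag cannot be read by a reader that expects the old plain reference.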