diff mbox series

Avoid SCC hashing on unmergeable trees

Message ID 20200519204600.GA87583@kam.mff.cuni.cz
State New
Headers show
Series Avoid SCC hashing on unmergeable trees | expand

Commit Message

Jan Hubicka May 19, 2020, 8:46 p.m. UTC
Hi,
this is new incarantion of patch to identify unmergeable tree at streaming out
time rather than streaming in and to avoid pickling them to sccs with with hash
codes.

Building cc1 plus this patch reduces:

[WPA] read 4452927 SCCs of average size 1.986030
[WPA] 8843646 tree bodies read in total
[WPA] tree SCC table: size 524287, 205158 elements, collision ratio: 0.505204
[WPA] tree SCC max chain length 43 (size 1)
[WPA] Compared 947551 SCCs, 780270 collisions (0.823460)
[WPA] Merged 944038 SCCs
[WPA] Merged 5253521 tree bodies
[WPA] Merged 590027 types
...
[WPA] Size of mmap'd section decls: 99229066 bytes
[WPA] Size of mmap'd section function_body: 18398837 bytes
[WPA] Size of mmap'd section refs: 733678 bytes
[WPA] Size of mmap'd section jmpfuncs: 2965981 bytes
[WPA] Size of mmap'd section pureconst: 170248 bytes
[WPA] Size of mmap'd section profile: 17985 bytes
[WPA] Size of mmap'd section symbol_nodes: 3392736 bytes
[WPA] Size of mmap'd section inline: 2693920 bytes
[WPA] Size of mmap'd section icf: 435557 bytes
[WPA] Size of mmap'd section offload_table: 0 bytes
[WPA] Size of mmap'd section lto: 4320 bytes
[WPA] Size of mmap'd section ipa_sra: 651660 bytes

... to ...

[WPA] read 3312246 unshared trees
[WPA] read 1144381 mergeable SCCs of average size 4.833785
[WPA] 8843938 tree bodies read in total
[WPA] tree SCC table: size 524287, 197767 elements, collision ratio: 0.506446
[WPA] tree SCC max chain length 43 (size 1)
[WPA] Compared 946614 SCCs, 775077 collisions (0.818789)
[WPA] Merged 943798 SCCs
[WPA] Merged 5253336 tree bodies
[WPA] Merged 590105 types
....
[WPA] Size of mmap'd section decls: 81262144 bytes
[WPA] Size of mmap'd section function_body: 14702611 bytes
[WPA] Size of mmap'd section ext_symtab: 0 bytes
[WPA] Size of mmap'd section refs: 733695 bytes
[WPA] Size of mmap'd section jmpfuncs: 2332150 bytes
[WPA] Size of mmap'd section pureconst: 170292 bytes
[WPA] Size of mmap'd section profile: 17986 bytes
[WPA] Size of mmap'd section symbol_nodes: 3393358 bytes
[WPA] Size of mmap'd section inline: 2567939 bytes
[WPA] Size of mmap'd section icf: 435633 bytes
[WPA] Size of mmap'd section lto: 4320 bytes
[WPA] Size of mmap'd section ipa_sra: 651824 bytes

so results in about 22% reduction in global decl stream and 24% reduction on
function bodies stream (which is read mostly by ICF)

Martin, the zstd compression breaks the compression statistics (it works when
GCC is configured for zlib)

At first ltrans I get:

[LTRANS] Size of mmap'd section decls: 3734248 bytes
[LTRANS] Size of mmap'd section function_body: 4895962 bytes

... to ...

[LTRANS] Size of mmap'd section decls: 3479850 bytes
[LTRANS] Size of mmap'd section function_body: 3722935 bytes

So 7% reduction of global stream and 31% reduction of function bodies.

Stream in seems to get about 3% faster and stream out about 5% but it is
close to noise factor of my experiment.  I expect bigger speedups on
Firefox but I did not test it today since my Firefox setup broke again.
GCC is not very good example on the problem with anonymous namespace
types since we do not have so many of them.

Sice of object files in gcc directory is reduced by 11% (because hash
numbers do not compress well I guess).

The patch makes DFS walk to recognize trees that are not merged (anonymous
namespace, local function/variable decls, anonymous types etc).  As discussed
on IRC this is now done during the SCC walk rather than during the hash
computation.  When local tree is discovered we know that SCC components of everything that is on
the stack reffers to it and thus is also local. Moreover we mark trees into hash set in output block
so if we get a cross edge referring to local tree it gets marked too.

Patch also takes care of avoiding SCC wrappers around some trees. In particular
 1) singleton unmergeable SCCs are now streamed inline in global decl stream
    This includes INTEGER_CSTs and IDENTIFIER_NODEs that are shared by different
    code than rest of tree merging.
 2) We use LTO_trees instead of LTO_tree_scc to wrap unmergeable SCC components.
    It is still necessary to mark them because of forward references.  LTO_trees
    has simple header with number of trees and then things are streamed same way
    as for LTO_tree_scc. That is tree headers first followed by pickled references
    so things may point to future.

    Of course it is not necessary for LTO_tree_scc to be single component and
    streamer out may group more components together, but I decided to not snowball
    the patch even more
 3) In local streams when lto_output_tree is called and the topmost SCC components
    turns out to be singleton we stream the tree directly 
    instead of LTO_tree_scc, hash code, pickled tree, reference to just stremaed tree.

    LTO_trees is used to wrap all trees needed to represent tree being streamed.
    It would make sense again to use only one LTO_trees rather than one per SCC
    but I think this can be done incrementally.

In general local trees are now recognized by new predicate local_tree_p

Bit subtle is handing of TRANLSATION_UNIT_DECL, INTEGER_CST and
IDENTIFIER_NODE.  

TRANSLATION_UNIT_DECL a local tree but references to it does not make
other trees local (because we also understand local decls now).
So I check for it later after localness propagation is done.

INTEGER_CST and IDENTIFIER_NODEs are merged but not via the tree merging
machinery. So it makes sense to stream them as unmergeable trees but we
still need to compute their hashes so SCCs referring them do not get too
large collision chains.  For this reason they are checked just prior
stream out.

lto-bootstrapped/regteted x86_64-linux, OK?

gcc/ChangeLog:

2020-05-19  Jan Hubicka  <hubicka@ucw.cz>

	* lto-streamer-in.c (lto_input_scc): Add SHARED_SCC parameter.
	(lto_input_tree_1): Strenghten sanity check.
	(lto_input_tree): Update call of lto_input_scc.
	* lto-streamer-out.c: Include ipa-utils.h
	(create_output_block): Initialize local_trees if merigng is going
	to happen.
	(destroy_output_block): Destroy local_trees.
	(DFS): Add max_local_entry.
	(local_tree_p): New function.
	(DFS::DFS): Initialize and maintain it.
	(DFS::DFS_write_tree): Decide on streaming format.
	(lto_output_tree): Stream inline singleton SCCs
	* lto-streamer.h (enum LTO_tags): Add LTO_trees.
	(struct output_block): Add local_trees.
	(lto_input_scc): Update prototype.

gcc/lto/ChangeLog:

2020-05-19  Jan Hubicka  <hubicka@ucw.cz>

	* lto-common.c (compare_tree_sccs_1): Sanity check that we never
	read TRANSLATION_UNIT_DECL.
	(process_dref): Break out from ...
	(unify_scc): ... here.
	(process_new_tree): Break out from ...
	(lto_read_decls): ... here; handle streaming of singleton trees.
	(print_lto_report_1): Update statistics.

Comments

Martin Liška May 20, 2020, 9:27 a.m. UTC | #1
On 5/19/20 10:46 PM, Jan Hubicka wrote:
> Martin, the zstd compression breaks the compression statistics (it works when
> GCC is configured for zlib)

Hello.

Can you please be please more concrete? I can help with, but I don't know
what's broken.

Thanks,
Martin
Richard Biener May 20, 2020, 10:50 a.m. UTC | #2
On Tue, 19 May 2020, Jan Hubicka wrote:

> Hi,
> this is new incarantion of patch to identify unmergeable tree at streaming out
> time rather than streaming in and to avoid pickling them to sccs with with hash
> codes.
> 
> Building cc1 plus this patch reduces:
> 
> [WPA] read 4452927 SCCs of average size 1.986030
> [WPA] 8843646 tree bodies read in total
> [WPA] tree SCC table: size 524287, 205158 elements, collision ratio: 0.505204
> [WPA] tree SCC max chain length 43 (size 1)
> [WPA] Compared 947551 SCCs, 780270 collisions (0.823460)
> [WPA] Merged 944038 SCCs
> [WPA] Merged 5253521 tree bodies
> [WPA] Merged 590027 types
> ...
> [WPA] Size of mmap'd section decls: 99229066 bytes
> [WPA] Size of mmap'd section function_body: 18398837 bytes
> [WPA] Size of mmap'd section refs: 733678 bytes
> [WPA] Size of mmap'd section jmpfuncs: 2965981 bytes
> [WPA] Size of mmap'd section pureconst: 170248 bytes
> [WPA] Size of mmap'd section profile: 17985 bytes
> [WPA] Size of mmap'd section symbol_nodes: 3392736 bytes
> [WPA] Size of mmap'd section inline: 2693920 bytes
> [WPA] Size of mmap'd section icf: 435557 bytes
> [WPA] Size of mmap'd section offload_table: 0 bytes
> [WPA] Size of mmap'd section lto: 4320 bytes
> [WPA] Size of mmap'd section ipa_sra: 651660 bytes
> 
> ... to ...
> 
> [WPA] read 3312246 unshared trees
> [WPA] read 1144381 mergeable SCCs of average size 4.833785
> [WPA] 8843938 tree bodies read in total
> [WPA] tree SCC table: size 524287, 197767 elements, collision ratio: 0.506446
> [WPA] tree SCC max chain length 43 (size 1)
> [WPA] Compared 946614 SCCs, 775077 collisions (0.818789)
> [WPA] Merged 943798 SCCs
> [WPA] Merged 5253336 tree bodies
> [WPA] Merged 590105 types
> ....
> [WPA] Size of mmap'd section decls: 81262144 bytes
> [WPA] Size of mmap'd section function_body: 14702611 bytes
> [WPA] Size of mmap'd section ext_symtab: 0 bytes
> [WPA] Size of mmap'd section refs: 733695 bytes
> [WPA] Size of mmap'd section jmpfuncs: 2332150 bytes
> [WPA] Size of mmap'd section pureconst: 170292 bytes
> [WPA] Size of mmap'd section profile: 17986 bytes
> [WPA] Size of mmap'd section symbol_nodes: 3393358 bytes
> [WPA] Size of mmap'd section inline: 2567939 bytes
> [WPA] Size of mmap'd section icf: 435633 bytes
> [WPA] Size of mmap'd section lto: 4320 bytes
> [WPA] Size of mmap'd section ipa_sra: 651824 bytes
> 
> so results in about 22% reduction in global decl stream and 24% reduction on
> function bodies stream (which is read mostly by ICF)
> 
> Martin, the zstd compression breaks the compression statistics (it works when
> GCC is configured for zlib)
> 
> At first ltrans I get:
> 
> [LTRANS] Size of mmap'd section decls: 3734248 bytes
> [LTRANS] Size of mmap'd section function_body: 4895962 bytes
> 
> ... to ...
> 
> [LTRANS] Size of mmap'd section decls: 3479850 bytes
> [LTRANS] Size of mmap'd section function_body: 3722935 bytes
> 
> So 7% reduction of global stream and 31% reduction of function bodies.
> 
> Stream in seems to get about 3% faster and stream out about 5% but it is
> close to noise factor of my experiment.  I expect bigger speedups on
> Firefox but I did not test it today since my Firefox setup broke again.
> GCC is not very good example on the problem with anonymous namespace
> types since we do not have so many of them.
> 
> Sice of object files in gcc directory is reduced by 11% (because hash
> numbers do not compress well I guess).
> 
> The patch makes DFS walk to recognize trees that are not merged (anonymous
> namespace, local function/variable decls, anonymous types etc).  As discussed
> on IRC this is now done during the SCC walk rather than during the hash
> computation.  When local tree is discovered we know that SCC components of everything that is on
> the stack reffers to it and thus is also local. Moreover we mark trees into hash set in output block
> so if we get a cross edge referring to local tree it gets marked too.
> 
> Patch also takes care of avoiding SCC wrappers around some trees. In particular
>  1) singleton unmergeable SCCs are now streamed inline in global decl stream
>     This includes INTEGER_CSTs and IDENTIFIER_NODEs that are shared by different
>     code than rest of tree merging.
>  2) We use LTO_trees instead of LTO_tree_scc to wrap unmergeable SCC components.
>     It is still necessary to mark them because of forward references.  LTO_trees
>     has simple header with number of trees and then things are streamed same way
>     as for LTO_tree_scc. That is tree headers first followed by pickled references
>     so things may point to future.
> 
>     Of course it is not necessary for LTO_tree_scc to be single component and
>     streamer out may group more components together, but I decided to not snowball
>     the patch even more
>  3) In local streams when lto_output_tree is called and the topmost SCC components
>     turns out to be singleton we stream the tree directly 
>     instead of LTO_tree_scc, hash code, pickled tree, reference to just stremaed tree.
> 
>     LTO_trees is used to wrap all trees needed to represent tree being streamed.
>     It would make sense again to use only one LTO_trees rather than one per SCC
>     but I think this can be done incrementally.
> 
> In general local trees are now recognized by new predicate local_tree_p
> 
> Bit subtle is handing of TRANLSATION_UNIT_DECL, INTEGER_CST and
> IDENTIFIER_NODE.  
> 
> TRANSLATION_UNIT_DECL a local tree but references to it does not make
> other trees local (because we also understand local decls now).
> So I check for it later after localness propagation is done.
> 
> INTEGER_CST and IDENTIFIER_NODEs are merged but not via the tree merging
> machinery. So it makes sense to stream them as unmergeable trees but we
> still need to compute their hashes so SCCs referring them do not get too
> large collision chains.  For this reason they are checked just prior
> stream out.
> 
> lto-bootstrapped/regteted x86_64-linux, OK?

The patch looks reasonable - the wins are not entirely clear to me
since you mix in LTO streaming format optimizations.  You promised
to improve collisions on the SCC merging side but those are not
spectacular (if present at all) in the above data?

The patch itself looks really nice.  The only comment I have is
with respect to the new local_tree_p predicate and its relation
to tree_is_indexable.  We should be able to assert that
local trees are not indexable and indexable trees never are local.
Does this make the predicates equal? ;)  It would be nice if
they would cross-reference each other (maybe even assert at least
in one way), maybe put next to each other and share parts?
We can do this as followup.

Thanks,
Richard.

> gcc/ChangeLog:
> 
> 2020-05-19  Jan Hubicka  <hubicka@ucw.cz>
> 
> 	* lto-streamer-in.c (lto_input_scc): Add SHARED_SCC parameter.
> 	(lto_input_tree_1): Strenghten sanity check.
> 	(lto_input_tree): Update call of lto_input_scc.
> 	* lto-streamer-out.c: Include ipa-utils.h
> 	(create_output_block): Initialize local_trees if merigng is going
> 	to happen.
> 	(destroy_output_block): Destroy local_trees.
> 	(DFS): Add max_local_entry.
> 	(local_tree_p): New function.
> 	(DFS::DFS): Initialize and maintain it.
> 	(DFS::DFS_write_tree): Decide on streaming format.
> 	(lto_output_tree): Stream inline singleton SCCs
> 	* lto-streamer.h (enum LTO_tags): Add LTO_trees.
> 	(struct output_block): Add local_trees.
> 	(lto_input_scc): Update prototype.
> 
> gcc/lto/ChangeLog:
> 
> 2020-05-19  Jan Hubicka  <hubicka@ucw.cz>
> 
> 	* lto-common.c (compare_tree_sccs_1): Sanity check that we never
> 	read TRANSLATION_UNIT_DECL.
> 	(process_dref): Break out from ...
> 	(unify_scc): ... here.
> 	(process_new_tree): Break out from ...
> 	(lto_read_decls): ... here; handle streaming of singleton trees.
> 	(print_lto_report_1): Update statistics.
> 
> diff --git a/gcc/lto-streamer-in.c b/gcc/lto-streamer-in.c
> index 244f5b8aa5c..6d571c4344a 100644
> --- a/gcc/lto-streamer-in.c
> +++ b/gcc/lto-streamer-in.c
> @@ -1424,16 +1424,17 @@ lto_read_tree (class lto_input_block *ib, class data_in *data_in,
>  
>  
>  /* Populate the reader cache with trees materialized from the SCC
> -   following in the IB, DATA_IN stream.  */
> +   following in the IB, DATA_IN stream.
> +   If SHARED_SCC is true we input LTO_tree_scc.  */
>  
>  hashval_t
>  lto_input_scc (class lto_input_block *ib, class data_in *data_in,
> -	       unsigned *len, unsigned *entry_len)
> +	       unsigned *len, unsigned *entry_len, bool shared_scc)
>  {
>    /* A blob of unnamed tree nodes, fill the cache from it and
>       recurse.  */
>    unsigned size = streamer_read_uhwi (ib);
> -  hashval_t scc_hash = streamer_read_uhwi (ib);
> +  hashval_t scc_hash = shared_scc ? streamer_read_uhwi (ib) : 0;
>    unsigned scc_entry_len = 1;
>  
>    if (size == 1)
> @@ -1456,7 +1457,8 @@ lto_input_scc (class lto_input_block *ib, class data_in *data_in,
>  	      || (tag >= LTO_field_decl_ref && tag <= LTO_global_decl_ref)
>  	      || tag == LTO_tree_pickle_reference
>  	      || tag == LTO_integer_cst
> -	      || tag == LTO_tree_scc)
> +	      || tag == LTO_tree_scc
> +	      || tag == LTO_trees)
>  	    gcc_unreachable ();
>  
>  	  result = streamer_alloc_tree (ib, data_in, tag);
> @@ -1522,7 +1524,7 @@ lto_input_tree_1 (class lto_input_block *ib, class data_in *data_in,
>  				 (a, len, TYPE_PRECISION (type)));
>        streamer_tree_cache_append (data_in->reader_cache, result, hash);
>      }
> -  else if (tag == LTO_tree_scc)
> +  else if (tag == LTO_tree_scc || tag == LTO_trees)
>      gcc_unreachable ();
>    else
>      {
> @@ -1538,11 +1540,11 @@ lto_input_tree (class lto_input_block *ib, class data_in *data_in)
>  {
>    enum LTO_tags tag;
>  
> -  /* Input and skip SCCs.  */
> -  while ((tag = streamer_read_record_start (ib)) == LTO_tree_scc)
> +  /* Input pickled trees needed to stream in the refrence.  */
> +  while ((tag = streamer_read_record_start (ib)) == LTO_trees)
>      {
>        unsigned len, entry_len;
> -      lto_input_scc (ib, data_in, &len, &entry_len);
> +      lto_input_scc (ib, data_in, &len, &entry_len, false);
>  
>        /* Register DECLs with the debuginfo machinery.  */
>        while (!dref_queue.is_empty ())
> @@ -1551,7 +1553,15 @@ lto_input_tree (class lto_input_block *ib, class data_in *data_in)
>  	  debug_hooks->register_external_die (e.decl, e.sym, e.off);
>  	}
>      }
> -  return lto_input_tree_1 (ib, data_in, tag, 0);
> +  tree t = lto_input_tree_1 (ib, data_in, tag, 0);
> +
> +  if (!dref_queue.is_empty ())
> +    {
> +      dref_entry e = dref_queue.pop ();
> +      debug_hooks->register_external_die (e.decl, e.sym, e.off);
> +      gcc_checking_assert (dref_queue.is_empty ());
> +    }
> +  return t;
>  }
>  
>  
> diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
> index 3d94324881f..1c5b8f42c9b 100644
> --- a/gcc/lto-streamer-out.c
> +++ b/gcc/lto-streamer-out.c
> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-dfa.h"
>  #include "file-prefix-map.h" /* remap_debug_filename()  */
>  #include "output.h"
> +#include "ipa-utils.h"
>  
>  
>  static void lto_write_tree (struct output_block*, tree, bool);
> @@ -75,6 +76,10 @@ create_output_block (enum lto_section_type section_type)
>  
>    ob->section_type = section_type;
>    ob->decl_state = lto_get_out_decl_state ();
> +  /* Only global decl stream in non-wpa will ever be considered by tree
> +     merging.  */
> +  if (!flag_wpa && section_type == LTO_section_decls)
> +    ob->local_trees = new (hash_set <tree>);
>    ob->main_stream = XCNEW (struct lto_output_stream);
>    ob->string_stream = XCNEW (struct lto_output_stream);
>    ob->writer_cache = streamer_tree_cache_create (!flag_wpa, true, false);
> @@ -100,6 +105,7 @@ destroy_output_block (struct output_block *ob)
>  
>    delete ob->string_hash_table;
>    ob->string_hash_table = NULL;
> +  delete ob->local_trees;
>  
>    free (ob->main_stream);
>    free (ob->string_stream);
> @@ -532,6 +538,8 @@ private:
>      bool ref_p;
>      bool this_ref_p;
>    };
> +  /* Maximum index of scc stack containing a local tree.  */
> +  int max_local_entry;
>  
>    static int scc_entry_compare (const void *, const void *);
>  
> @@ -550,6 +558,33 @@ private:
>    struct obstack sccstate_obstack;
>  };
>  
> +/* Return true if type can be merged.
> +   TRANSLATION_UNIT_DECL is handled specially since references to it does
> +   not make other trees local as well.  */
> +
> +static bool
> +local_tree_p (tree t)
> +{
> +  switch (TREE_CODE (t))
> +    {
> +    case LABEL_DECL:
> +      return true;
> +    case NAMESPACE_DECL:
> +      return !DECL_NAME (t);
> +    case VAR_DECL:
> +    case FUNCTION_DECL:
> +      return !TREE_PUBLIC (t) && !DECL_EXTERNAL (t);
> +    case RECORD_TYPE:
> +    case UNION_TYPE:
> +    case ENUMERAL_TYPE:
> +      return TYPE_MAIN_VARIANT (t) == t
> +	     && odr_type_p (t) && type_with_linkage_p (t)
> +	     && type_in_anonymous_namespace_p (t);
> +    default:
> +      return false;
> +    }
> +}
> +
>  /* Emit the physical representation of tree node EXPR to output block OB,
>     using depth-first search on the subgraph.  If THIS_REF_P is true, the
>     leaves of EXPR are emitted as references via lto_output_tree_ref.
> @@ -560,6 +595,8 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
>  	  bool single_p)
>  {
>    unsigned int next_dfs_num = 1;
> +
> +  max_local_entry = -1;
>    gcc_obstack_init (&sccstate_obstack);
>    DFS_write_tree (ob, NULL, expr, ref_p, this_ref_p);
>    while (!worklist_vec.is_empty ())
> @@ -586,6 +623,8 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
>  	  scc_entry e = { expr, 0 };
>  	  /* Not yet visited.  DFS recurse and push it onto the stack.  */
>  	  *slot = cstate = XOBNEW (&sccstate_obstack, struct sccs);
> +	  if (ob->local_trees && local_tree_p (expr))
> +	    max_local_entry = sccstack.length ();
>  	  sccstack.safe_push (e);
>  	  cstate->dfsnum = next_dfs_num++;
>  	  cstate->low = cstate->dfsnum;
> @@ -640,7 +679,26 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
>  	     any merging there.  */
>  	  hashval_t scc_hash = 0;
>  	  unsigned scc_entry_len = 0;
> -	  if (!flag_wpa)
> +	  bool local_to_unit = !ob->local_trees
> +			       || max_local_entry >= (int)first;
> +
> +	  /* Remember that trees are local so info gets propagated to other
> +	     SCCs.  */
> +	  if (local_to_unit && ob->local_trees)
> +	    {
> +	      for (unsigned i = 0; i < size; ++i)
> +		ob->local_trees->add (sccstack[first + i].t);
> +	    }
> +
> +	  /* As a special case do not stream TRANSLATION_UNIT_DECL as shared
> +	     tree.  We can not mark it local because references to it does not
> +	     make other trees local (all global decls reffer to it via
> +	     CONTEXT).  */
> +	  if (size == 1
> +	      && TREE_CODE (sccstack[first].t) == TRANSLATION_UNIT_DECL)
> +	    local_to_unit = true;
> +
> +	  if (!local_to_unit)
>  	    {
>  	      scc_hash = hash_scc (ob, first, size, ref_p, this_ref_p);
>  
> @@ -672,10 +730,44 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
>  	      gcc_checking_assert (scc_entry_len == 1);
>  	    }
>  
> -	  /* Write LTO_tree_scc.  */
> -	  streamer_write_record_start (ob, LTO_tree_scc);
> -	  streamer_write_uhwi (ob, size);
> -	  streamer_write_uhwi (ob, scc_hash);
> +	  worklist_vec.pop ();
> +
> +	  /* If we are not streaming global section tree sharing will not
> +	     be done, but we must mark block of trees so streamer can
> +	     distinguish it from reference being streamed.  */
> +	  if (ob->section_type != LTO_section_decls)
> +	    {
> +	      /* If this is the original tree we stream and it forms SCC
> +		 by itself then we do not need to stream SCC at all.  */
> +	      if (worklist_vec.is_empty () && first == 0 && size == 1)
> +		 return;
> +	      streamer_write_record_start (ob, LTO_trees);
> +	      streamer_write_uhwi (ob, size);
> +	    }
> +	  /* Write LTO_tree_scc if sharing is going to be performned.  */
> +	  else if (!local_to_unit
> +		   /* These are special since sharing is not done by tree
> +		      merging machinery.  We can not special case them earlier
> +		      because we still need to compute hash for further sharing
> +		      of trees referring to them.  */
> +		   && (size != 1
> +		       || (TREE_CODE (sccstack[first].t) != IDENTIFIER_NODE
> +			   && (TREE_CODE (sccstack[first].t) != INTEGER_CST
> +			       || TREE_OVERFLOW (sccstack[first].t)))))
> +
> +	    {
> +	      gcc_checking_assert (ob->section_type == LTO_section_decls);
> +	      streamer_write_record_start (ob, LTO_tree_scc);
> +	      streamer_write_uhwi (ob, size);
> +	      streamer_write_uhwi (ob, scc_hash);
> +	    }
> +	  /* Non-trivial SCCs must be packed to trees blocks so forward
> +	     references work correctly.  */
> +	  else if (size != 1)
> +	    {
> +	       streamer_write_record_start (ob, LTO_trees);
> +	       streamer_write_uhwi (ob, size);
> +	    }
>  
>  	  /* Write size-1 SCCs without wrapping them inside SCC bundles.
>  	     All INTEGER_CSTs need to be handled this way as we need
> @@ -722,10 +814,11 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
>  
>  	  /* Finally truncate the vector.  */
>  	  sccstack.truncate (first);
> +	  if ((int)first <= max_local_entry)
> +	    max_local_entry = first - 1;
>  
>  	  if (from_state)
>  	    from_state->low = MIN (from_state->low, cstate->low);
> -	  worklist_vec.pop ();
>  	  continue;
>  	}
>  
> @@ -1569,7 +1662,14 @@ DFS::DFS_write_tree (struct output_block *ob, sccs *from_state,
>  
>    /* Check if we already streamed EXPR.  */
>    if (streamer_tree_cache_lookup (ob->writer_cache, expr, NULL))
> -    return;
> +    {
> +      /* Refernece to a local tree makes entry also local.  We always process
> +	 top of stack entry, so set max to number of entries in stack - 1.  */
> +      if (ob->local_trees
> +	  && ob->local_trees->contains (expr))
> +	max_local_entry = sccstack.length () - 1;
> +      return;
> +    }
>  
>    worklist w;
>    w.expr = expr;
> @@ -1641,15 +1741,20 @@ lto_output_tree (struct output_block *ob, tree expr,
>        DFS (ob, expr, ref_p, this_ref_p, false);
>        in_dfs_walk = false;
>  
> -      /* Finally append a reference to the tree we were writing.
> -	 ???  If expr ended up as a singleton we could have
> -	 inlined it here and avoid outputting a reference.  */
> +      /* Finally append a reference to the tree we were writing.  */
>        existed_p = streamer_tree_cache_lookup (ob->writer_cache, expr, &ix);
> -      gcc_assert (existed_p);
> -      streamer_write_record_start (ob, LTO_tree_pickle_reference);
> -      streamer_write_uhwi (ob, ix);
> -      streamer_write_enum (ob->main_stream, LTO_tags, LTO_NUM_TAGS,
> -			   lto_tree_code_to_tag (TREE_CODE (expr)));
> +
> +      /* DFS walk above possibly skipped streaming EXPR itself to let us inline
> +	 it.  */
> +      if (!existed_p)
> +	lto_output_tree_1 (ob, expr, 0, ref_p, this_ref_p);
> +      else
> +	{
> +	  streamer_write_record_start (ob, LTO_tree_pickle_reference);
> +	  streamer_write_uhwi (ob, ix);
> +	  streamer_write_enum (ob->main_stream, LTO_tags, LTO_NUM_TAGS,
> +			       lto_tree_code_to_tag (TREE_CODE (expr)));
> +	}
>        if (streamer_dump_file)
>  	{
>  	  print_node_brief (streamer_dump_file, "   Finished SCC of ",
> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> index 76aa6fe34b8..a466fb8b329 100644
> --- a/gcc/lto-streamer.h
> +++ b/gcc/lto-streamer.h
> @@ -178,6 +178,9 @@ enum LTO_tags
>    /* Special for global streamer.  A blob of unnamed tree nodes.  */
>    LTO_tree_scc,
>  
> +  /* Sequence of trees.  */
> +  LTO_trees,
> +
>    /* References to indexable tree nodes.  These objects are stored in
>       tables that are written separately from the function bodies that
>       reference them.  This way they can be instantiated even when the
> @@ -751,6 +754,9 @@ struct output_block
>    /* Cache of nodes written in this section.  */
>    struct streamer_tree_cache_d *writer_cache;
>  
> +  /* All trees identified as local to the unit streamed.  */
> +  hash_set<tree> *local_trees;
> +
>    /* All data persistent across whole duration of output block
>       can go here.  */
>    struct obstack obstack;
> @@ -901,7 +907,7 @@ tree lto_input_tree_ref (class lto_input_block *, class data_in *,
>  void lto_tag_check_set (enum LTO_tags, int, ...);
>  void lto_init_eh (void);
>  hashval_t lto_input_scc (class lto_input_block *, class data_in *,
> -			 unsigned *, unsigned *);
> +			 unsigned *, unsigned *, bool);
>  tree lto_input_tree_1 (class lto_input_block *, class data_in *,
>  		       enum LTO_tags, hashval_t hash);
>  tree lto_input_tree (class lto_input_block *, class data_in *);
> diff --git a/gcc/lto/lto-common.c b/gcc/lto/lto-common.c
> index 682a82d61ba..d04b1c9ca7b 100644
> --- a/gcc/lto/lto-common.c
> +++ b/gcc/lto/lto-common.c
> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "builtins.h"
>  #include "lto-common.h"
>  #include "tree-pretty-print.h"
> +#include "print-tree.h"
>  
>  /* True when no new types are going to be streamd from the global stream.  */
>  
> @@ -1054,6 +1055,7 @@ static unsigned long num_prevailing_types;
>  static unsigned long num_type_scc_trees;
>  static unsigned long total_scc_size;
>  static unsigned long num_sccs_read;
> +static unsigned long num_unshared_trees_read;
>  static unsigned long total_scc_size_merged;
>  static unsigned long num_sccs_merged;
>  static unsigned long num_scc_compares;
> @@ -1088,6 +1090,10 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
>    compare_values (TREE_CODE);
>    code = TREE_CODE (t1);
>  
> +  /* If we end up comparing translation unit decls we either forgot to mark
> +     some SCC as local or we compare too much.  */
> +  gcc_checking_assert (code != TRANSLATION_UNIT_DECL);
> +
>    if (!TYPE_P (t1))
>      {
>        compare_values (TREE_SIDE_EFFECTS);
> @@ -1626,6 +1632,28 @@ cmp_tree (const void *p1_, const void *p2_)
>    return ((uintptr_t)p1[1] < (uintptr_t)p2[1]) ? -1 : 1;
>  }
>  
> +/* New scc of size 1 containing T was streamed in from DATA_IN and not merged.
> +   Register it to reader cache at index FROM.  */
> +
> +static void
> +process_dref (class data_in *data_in, tree t, unsigned from)
> +{
> +  struct streamer_tree_cache_d *cache = data_in->reader_cache;
> +  /* If we got a debug reference queued, see if the prevailing
> +     tree has a debug reference and if not, register the one
> +     for the tree we are about to throw away.  */
> +  if (dref_queue.length () == 1)
> +    {
> +      dref_entry e = dref_queue.pop ();
> +      gcc_assert (e.decl
> +		  == streamer_tree_cache_get_tree (cache, from));
> +      const char *sym;
> +      unsigned HOST_WIDE_INT off;
> +      if (!debug_hooks->die_ref_for_decl (t, &sym, &off))
> +	debug_hooks->register_external_die (t, e.sym, e.off);
> +    }
> +}
> +
>  /* Try to unify the SCC with nodes FROM to FROM + LEN in CACHE and
>     hash value SCC_HASH with an already recorded SCC.  Return true if
>     that was successful, otherwise return false.  */
> @@ -1646,22 +1674,16 @@ unify_scc (class data_in *data_in, unsigned from,
>      {
>        tree t = streamer_tree_cache_get_tree (cache, from + i);
>        scc->entries[i] = t;
> -      /* Do not merge SCCs with local entities inside them.  Also do
> -	 not merge TRANSLATION_UNIT_DECLs and anonymous namespaces
> -	 and types therein types.  */
> -      if (TREE_CODE (t) == TRANSLATION_UNIT_DECL
> -	  || (VAR_OR_FUNCTION_DECL_P (t)
> -	      && !(TREE_PUBLIC (t) || DECL_EXTERNAL (t)))
> -	  || TREE_CODE (t) == LABEL_DECL
> -	  || (TREE_CODE (t) == NAMESPACE_DECL && !DECL_NAME (t))
> -	  || (TYPE_P (t)
> -	      && type_with_linkage_p (TYPE_MAIN_VARIANT (t))
> -	      && type_in_anonymous_namespace_p (TYPE_MAIN_VARIANT (t))))
> -	{
> -	  /* Avoid doing any work for these cases and do not worry to
> -	     record the SCCs for further merging.  */
> -	  return false;
> -	}
> +      /* These types should be streamed as unshared.  */
> +      gcc_checking_assert
> +	 (!(TREE_CODE (t) == TRANSLATION_UNIT_DECL
> +	    || (VAR_OR_FUNCTION_DECL_P (t)
> +		&& !(TREE_PUBLIC (t) || DECL_EXTERNAL (t)))
> +	    || TREE_CODE (t) == LABEL_DECL
> +	    || (TREE_CODE (t) == NAMESPACE_DECL && !DECL_NAME (t))
> +	    || (TYPE_P (t)
> +		&& type_with_linkage_p (TYPE_MAIN_VARIANT (t))
> +		&& type_in_anonymous_namespace_p (TYPE_MAIN_VARIANT (t)))));
>      }
>  
>    /* Look for the list of candidate SCCs to compare against.  */
> @@ -1712,21 +1734,7 @@ unify_scc (class data_in *data_in, unsigned from,
>  	     to the tree node mapping computed by compare_tree_sccs.  */
>  	  if (len == 1)
>  	    {
> -	      /* If we got a debug reference queued, see if the prevailing
> -		 tree has a debug reference and if not, register the one
> -		 for the tree we are about to throw away.  */
> -	      if (dref_queue.length () == 1)
> -		{
> -		  dref_entry e = dref_queue.pop ();
> -		  gcc_assert (e.decl
> -			      == streamer_tree_cache_get_tree (cache, from));
> -		  const char *sym;
> -		  unsigned HOST_WIDE_INT off;
> -		  if (!debug_hooks->die_ref_for_decl (pscc->entries[0], &sym,
> -						      &off))
> -		    debug_hooks->register_external_die (pscc->entries[0],
> -							e.sym, e.off);
> -		}
> +	      process_dref (data_in, pscc->entries[0], from);
>  	      lto_maybe_register_decl (data_in, pscc->entries[0], from);
>  	      streamer_tree_cache_replace_tree (cache, pscc->entries[0], from);
>  	    }
> @@ -1785,7 +1793,65 @@ unify_scc (class data_in *data_in, unsigned from,
>    return unified_p;
>  }
>  
> +typedef int_hash<unsigned, 0, UINT_MAX> code_id_hash;
> +
> +/* Do registering necessary once new tree fully streamed in (including all
> +   trees it reffers to).  */
> +
> +static void
> +process_new_tree (tree t, hash_map <code_id_hash, unsigned> *hm,
> +		  unsigned index, unsigned *total, class data_in *data_in)
> +{
> +  /* Reconstruct the type variant and pointer-to/reference-to
> +     chains.  */
> +  if (TYPE_P (t))
> +    {
> +      /* Map the tree types to their frequencies.  */
> +      if (flag_lto_dump_type_stats)
> +	{
> +	  unsigned key = (unsigned) TREE_CODE (t);
> +	  unsigned *countp = hm->get (key);
> +	  hm->put (key, countp ? (*countp) + 1 : 1);
> +	  (*total)++;
> +	}
> +
> +      num_prevailing_types++;
> +      lto_fixup_prevailing_type (t);
>  
> +      /* Compute the canonical type of all non-ODR types.
> +	 Delay ODR types for the end of merging process - the canonical
> +	 type for those can be computed using the (unique) name however
> +	 we want to do this only if units in other languages do not
> +	 contain structurally equivalent type.
> +
> +	 Because SCC components are streamed in random (hash) order
> +	 we may have encountered the type before while registering
> +	 type canonical of a derived type in the same SCC.  */
> +      if (!TYPE_CANONICAL (t))
> +	{
> +	  if (!RECORD_OR_UNION_TYPE_P (t)
> +	      || !TYPE_CXX_ODR_P (t))
> +	    gimple_register_canonical_type (t);
> +	  else if (COMPLETE_TYPE_P (t))
> +	    vec_safe_push (types_to_register, t);
> +	}
> +      if (TYPE_MAIN_VARIANT (t) == t && odr_type_p (t))
> +	register_odr_type (t);
> +    }
> +  /* Link shared INTEGER_CSTs into TYPE_CACHED_VALUEs of its
> +     type which is also member of this SCC.  */
> +  if (TREE_CODE (t) == INTEGER_CST
> +      && !TREE_OVERFLOW (t))
> +    cache_integer_cst (t);
> +  if (!flag_ltrans)
> +    {
> +      lto_maybe_register_decl (data_in, t, index);
> +      /* Scan the tree for references to global functions or
> +	 variables and record those for later fixup.  */
> +      if (mentions_vars_p (t))
> +	vec_safe_push (tree_with_vars, t);
> +    }
> +}
>  
>  /* Read all the symbols from buffer DATA, using descriptors in DECL_DATA.
>     RESOLUTIONS is the set of symbols picked by the linker (read from the
> @@ -1813,7 +1879,6 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
>    /* We do not uniquify the pre-loaded cache entries, those are middle-end
>       internal types that should not be merged.  */
>  
> -  typedef int_hash<unsigned, 0, UINT_MAX> code_id_hash;
>    hash_map <code_id_hash, unsigned> hm;
>    unsigned total = 0;
>  
> @@ -1824,31 +1889,41 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
>        unsigned from = data_in->reader_cache->nodes.length ();
>        /* Read and uniquify SCCs as in the input stream.  */
>        enum LTO_tags tag = streamer_read_record_start (&ib_main);
> -      if (tag == LTO_tree_scc)
> +      if (tag == LTO_tree_scc || tag == LTO_trees)
>  	{
>  	  unsigned len_;
>  	  unsigned scc_entry_len;
> +
> +	  /* Because we stream in SCC order we know that all unshared trees
> +	     are now fully streamed.  Process them.  */
>  	  hashval_t scc_hash = lto_input_scc (&ib_main, data_in, &len_,
> -					      &scc_entry_len);
> +					      &scc_entry_len,
> +					      tag == LTO_tree_scc);
>  	  unsigned len = data_in->reader_cache->nodes.length () - from;
>  	  gcc_assert (len == len_);
>  
> -	  total_scc_size += len;
> -	  num_sccs_read++;
> +	  if (tag == LTO_tree_scc)
> +	    {
> +	      total_scc_size += len;
> +	      num_sccs_read++;
> +	    }
> +	  else
> +	    num_unshared_trees_read += len;
>  
>  	  /* We have the special case of size-1 SCCs that are pre-merged
>  	     by means of identifier and string sharing for example.
>  	     ???  Maybe we should avoid streaming those as SCCs.  */
>  	  tree first = streamer_tree_cache_get_tree (data_in->reader_cache,
>  						     from);
> -	  if (len == 1
> -	      && (TREE_CODE (first) == IDENTIFIER_NODE
> -		  || (TREE_CODE (first) == INTEGER_CST
> -		      && !TREE_OVERFLOW (first))))
> -	    continue;
> +	  /* Identifier and integers are shared specially, they should never
> +	     go by the tree merging path.  */
> +	  gcc_checking_assert ((TREE_CODE (first) != IDENTIFIER_NODE
> +				&& (TREE_CODE (first) != INTEGER_CST
> +				    || TREE_OVERFLOW (first)))
> +			       || len != 1);
>  
>  	  /* Try to unify the SCC with already existing ones.  */
> -	  if (!flag_ltrans
> +	  if (!flag_ltrans && tag != LTO_trees
>  	      && unify_scc (data_in, from,
>  			    len, scc_entry_len, scc_hash))
>  	    continue;
> @@ -1862,56 +1937,9 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
>  	    {
>  	      tree t = streamer_tree_cache_get_tree (data_in->reader_cache,
>  						     from + i);
> -	      /* Reconstruct the type variant and pointer-to/reference-to
> -		 chains.  */
> +	      process_new_tree (t, &hm, from + i, &total, data_in);
>  	      if (TYPE_P (t))
> -		{
> -		  /* Map the tree types to their frequencies.  */
> -		  if (flag_lto_dump_type_stats)
> -		    {
> -		      unsigned key = (unsigned) TREE_CODE (t);
> -		      unsigned *countp = hm.get (key);
> -		      hm.put (key, countp ? (*countp) + 1 : 1);
> -		      total++;
> -		    }
> -
> -		  seen_type = true;
> -		  num_prevailing_types++;
> -		  lto_fixup_prevailing_type (t);
> -
> -		  /* Compute the canonical type of all non-ODR types.
> -		     Delay ODR types for the end of merging process - the canonical
> -		     type for those can be computed using the (unique) name however
> -		     we want to do this only if units in other languages do not
> -		     contain structurally equivalent type.
> -
> -		     Because SCC components are streamed in random (hash) order
> -		     we may have encountered the type before while registering
> -		     type canonical of a derived type in the same SCC.  */
> -		  if (!TYPE_CANONICAL (t))
> -		    {
> -		      if (!RECORD_OR_UNION_TYPE_P (t)
> -			  || !TYPE_CXX_ODR_P (t))
> -		        gimple_register_canonical_type (t);
> -		      else if (COMPLETE_TYPE_P (t))
> -			vec_safe_push (types_to_register, t);
> -		    }
> -		  if (TYPE_MAIN_VARIANT (t) == t && odr_type_p (t))
> -		    register_odr_type (t);
> -		}
> -	      /* Link shared INTEGER_CSTs into TYPE_CACHED_VALUEs of its
> -		 type which is also member of this SCC.  */
> -	      if (TREE_CODE (t) == INTEGER_CST
> -		  && !TREE_OVERFLOW (t))
> -		cache_integer_cst (t);
> -	      if (!flag_ltrans)
> -		{
> -		  lto_maybe_register_decl (data_in, t, from + i);
> -		  /* Scan the tree for references to global functions or
> -		     variables and record those for later fixup.  */
> -		  if (mentions_vars_p (t))
> -		    vec_safe_push (tree_with_vars, t);
> -		}
> +		seen_type = true;
>  	    }
>  
>  	  /* Register DECLs with the debuginfo machinery.  */
> @@ -1926,9 +1954,26 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
>  	}
>        else
>  	{
> -	  /* Pickle stray references.  */
>  	  t = lto_input_tree_1 (&ib_main, data_in, tag, 0);
> -	  gcc_assert (t && data_in->reader_cache->nodes.length () == from);
> +	  /* We streamed in new tree.  Add it to cache and process dref.  */
> +	  if (data_in->reader_cache->nodes.length () == from + 1)
> +	    {
> +	      num_unshared_trees_read++;
> +	      data_in->location_cache.accept_location_cache ();
> +	      process_dref (data_in, t, from);
> +	      if (TREE_CODE (t) == IDENTIFIER_NODE
> +		  || (TREE_CODE (t) == INTEGER_CST
> +		      && !TREE_OVERFLOW (t)))
> +		;
> +	      else
> +		{
> +		  lto_maybe_register_decl (data_in, t, from);
> +		  process_new_tree (t, &hm, from, &total, data_in);
> +		}
> +	    }
> +	  else
> +	    /* FIXME: It seems useless to pickle stray references.  */
> +	    gcc_assert (data_in->reader_cache->nodes.length () == from);
>  	}
>      }
>  
> @@ -2953,10 +2998,13 @@ print_lto_report_1 (void)
>    const char *pfx = (flag_lto) ? "LTO" : (flag_wpa) ? "WPA" : "LTRANS";
>    fprintf (stderr, "%s statistics\n", pfx);
>  
> -  fprintf (stderr, "[%s] read %lu SCCs of average size %f\n",
> +  fprintf (stderr, "[%s] read %lu unshared trees\n",
> +	   pfx, num_unshared_trees_read);
> +  fprintf (stderr, "[%s] read %lu mergeable SCCs of average size %f\n",
>  	   pfx, num_sccs_read, total_scc_size / (double)num_sccs_read);
> -  fprintf (stderr, "[%s] %lu tree bodies read in total\n", pfx, total_scc_size);
> -  if (flag_wpa && tree_scc_hash)
> +  fprintf (stderr, "[%s] %lu tree bodies read in total\n", pfx,
> +	   total_scc_size + num_unshared_trees_read);
> +  if (flag_wpa && tree_scc_hash && num_sccs_read)
>      {
>        fprintf (stderr, "[%s] tree SCC table: size %ld, %ld elements, "
>  	       "collision ratio: %f\n", pfx,
>
Jan Hubicka May 20, 2020, 11:49 a.m. UTC | #3
> On 5/19/20 10:46 PM, Jan Hubicka wrote:
> > Martin, the zstd compression breaks the compression statistics (it works when
> > GCC is configured for zlib)
> 
> Hello.
> 
> Can you please be please more concrete? I can help with, but I don't know
> what's broken.
Sure, the compression statistics are in -flto-report.
Currently I get:
[WPA] Compression: 99229066 input bytes, 0 uncompressed bytes (ratio: 0.000000)

It should print sane info about #of uncomporessed bytes and ration.
(it prints sane output with zlib compression on my other machine where i
did not install lzop).

On related note, I remember us discussing that std compression has
problem with bigger headers then zlib.  Since we stream our header that
says if section is compressed, I wonder if we could teach stream-out
phase to skip compression if it is not benefical, so the size info is
not duplicated?

Honza
> 
> Thanks,
> Martin
Jan Hubicka May 20, 2020, 11:57 a.m. UTC | #4
Hi,
> 
> The patch looks reasonable - the wins are not entirely clear to me
> since you mix in LTO streaming format optimizations.  You promised
> to improve collisions on the SCC merging side but those are not
> spectacular (if present at all) in the above data?

Thanks for looking into that. Collision ratios was imporving on Firefox
that has more anonymous types and their derivations, but I was currently
seeing an ICE building it.  I will debug that.

I think, quite surprisingly, most stream size benefits come from not
streaming hash codes for integer_csts and indentifier_nodes.
These are common and hash codes are large + hard to compress.

I was thinking about trying making them artifically smaller (32bit only)
once we nail down the unecessary scc chains.  I will look into that now
- last time i did stats it was mostly anonymous types and that is why I
started to worry about them first.
> 
> The patch itself looks really nice.  The only comment I have is
> with respect to the new local_tree_p predicate and its relation
> to tree_is_indexable.  We should be able to assert that
> local trees are not indexable and indexable trees never are local.

This is not true.  Anonymous namespace types are indexable (since they
are shared across function bodies) but they are local to translation
unit. Similarly non-public decls.

> Does this make the predicates equal? ;)  It would be nice if
> they would cross-reference each other (maybe even assert at least
> in one way), maybe put next to each other and share parts?
> We can do this as followup.

If you are OK doing more stuff as folowup, I would be happy.  I think it
is good base to fix other issues (track down longer chain in merging;
sanify scc streaming more etc).

I just tried to avoid snowballing, since the patch already mixes few
things together.

Thanks,
Honza
> 
> Thanks,
> Richard.
> 
> > gcc/ChangeLog:
> > 
> > 2020-05-19  Jan Hubicka  <hubicka@ucw.cz>
> > 
> > 	* lto-streamer-in.c (lto_input_scc): Add SHARED_SCC parameter.
> > 	(lto_input_tree_1): Strenghten sanity check.
> > 	(lto_input_tree): Update call of lto_input_scc.
> > 	* lto-streamer-out.c: Include ipa-utils.h
> > 	(create_output_block): Initialize local_trees if merigng is going
> > 	to happen.
> > 	(destroy_output_block): Destroy local_trees.
> > 	(DFS): Add max_local_entry.
> > 	(local_tree_p): New function.
> > 	(DFS::DFS): Initialize and maintain it.
> > 	(DFS::DFS_write_tree): Decide on streaming format.
> > 	(lto_output_tree): Stream inline singleton SCCs
> > 	* lto-streamer.h (enum LTO_tags): Add LTO_trees.
> > 	(struct output_block): Add local_trees.
> > 	(lto_input_scc): Update prototype.
> > 
> > gcc/lto/ChangeLog:
> > 
> > 2020-05-19  Jan Hubicka  <hubicka@ucw.cz>
> > 
> > 	* lto-common.c (compare_tree_sccs_1): Sanity check that we never
> > 	read TRANSLATION_UNIT_DECL.
> > 	(process_dref): Break out from ...
> > 	(unify_scc): ... here.
> > 	(process_new_tree): Break out from ...
> > 	(lto_read_decls): ... here; handle streaming of singleton trees.
> > 	(print_lto_report_1): Update statistics.
> > 
> > diff --git a/gcc/lto-streamer-in.c b/gcc/lto-streamer-in.c
> > index 244f5b8aa5c..6d571c4344a 100644
> > --- a/gcc/lto-streamer-in.c
> > +++ b/gcc/lto-streamer-in.c
> > @@ -1424,16 +1424,17 @@ lto_read_tree (class lto_input_block *ib, class data_in *data_in,
> >  
> >  
> >  /* Populate the reader cache with trees materialized from the SCC
> > -   following in the IB, DATA_IN stream.  */
> > +   following in the IB, DATA_IN stream.
> > +   If SHARED_SCC is true we input LTO_tree_scc.  */
> >  
> >  hashval_t
> >  lto_input_scc (class lto_input_block *ib, class data_in *data_in,
> > -	       unsigned *len, unsigned *entry_len)
> > +	       unsigned *len, unsigned *entry_len, bool shared_scc)
> >  {
> >    /* A blob of unnamed tree nodes, fill the cache from it and
> >       recurse.  */
> >    unsigned size = streamer_read_uhwi (ib);
> > -  hashval_t scc_hash = streamer_read_uhwi (ib);
> > +  hashval_t scc_hash = shared_scc ? streamer_read_uhwi (ib) : 0;
> >    unsigned scc_entry_len = 1;
> >  
> >    if (size == 1)
> > @@ -1456,7 +1457,8 @@ lto_input_scc (class lto_input_block *ib, class data_in *data_in,
> >  	      || (tag >= LTO_field_decl_ref && tag <= LTO_global_decl_ref)
> >  	      || tag == LTO_tree_pickle_reference
> >  	      || tag == LTO_integer_cst
> > -	      || tag == LTO_tree_scc)
> > +	      || tag == LTO_tree_scc
> > +	      || tag == LTO_trees)
> >  	    gcc_unreachable ();
> >  
> >  	  result = streamer_alloc_tree (ib, data_in, tag);
> > @@ -1522,7 +1524,7 @@ lto_input_tree_1 (class lto_input_block *ib, class data_in *data_in,
> >  				 (a, len, TYPE_PRECISION (type)));
> >        streamer_tree_cache_append (data_in->reader_cache, result, hash);
> >      }
> > -  else if (tag == LTO_tree_scc)
> > +  else if (tag == LTO_tree_scc || tag == LTO_trees)
> >      gcc_unreachable ();
> >    else
> >      {
> > @@ -1538,11 +1540,11 @@ lto_input_tree (class lto_input_block *ib, class data_in *data_in)
> >  {
> >    enum LTO_tags tag;
> >  
> > -  /* Input and skip SCCs.  */
> > -  while ((tag = streamer_read_record_start (ib)) == LTO_tree_scc)
> > +  /* Input pickled trees needed to stream in the refrence.  */
> > +  while ((tag = streamer_read_record_start (ib)) == LTO_trees)
> >      {
> >        unsigned len, entry_len;
> > -      lto_input_scc (ib, data_in, &len, &entry_len);
> > +      lto_input_scc (ib, data_in, &len, &entry_len, false);
> >  
> >        /* Register DECLs with the debuginfo machinery.  */
> >        while (!dref_queue.is_empty ())
> > @@ -1551,7 +1553,15 @@ lto_input_tree (class lto_input_block *ib, class data_in *data_in)
> >  	  debug_hooks->register_external_die (e.decl, e.sym, e.off);
> >  	}
> >      }
> > -  return lto_input_tree_1 (ib, data_in, tag, 0);
> > +  tree t = lto_input_tree_1 (ib, data_in, tag, 0);
> > +
> > +  if (!dref_queue.is_empty ())
> > +    {
> > +      dref_entry e = dref_queue.pop ();
> > +      debug_hooks->register_external_die (e.decl, e.sym, e.off);
> > +      gcc_checking_assert (dref_queue.is_empty ());
> > +    }
> > +  return t;
> >  }
> >  
> >  
> > diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
> > index 3d94324881f..1c5b8f42c9b 100644
> > --- a/gcc/lto-streamer-out.c
> > +++ b/gcc/lto-streamer-out.c
> > @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "tree-dfa.h"
> >  #include "file-prefix-map.h" /* remap_debug_filename()  */
> >  #include "output.h"
> > +#include "ipa-utils.h"
> >  
> >  
> >  static void lto_write_tree (struct output_block*, tree, bool);
> > @@ -75,6 +76,10 @@ create_output_block (enum lto_section_type section_type)
> >  
> >    ob->section_type = section_type;
> >    ob->decl_state = lto_get_out_decl_state ();
> > +  /* Only global decl stream in non-wpa will ever be considered by tree
> > +     merging.  */
> > +  if (!flag_wpa && section_type == LTO_section_decls)
> > +    ob->local_trees = new (hash_set <tree>);
> >    ob->main_stream = XCNEW (struct lto_output_stream);
> >    ob->string_stream = XCNEW (struct lto_output_stream);
> >    ob->writer_cache = streamer_tree_cache_create (!flag_wpa, true, false);
> > @@ -100,6 +105,7 @@ destroy_output_block (struct output_block *ob)
> >  
> >    delete ob->string_hash_table;
> >    ob->string_hash_table = NULL;
> > +  delete ob->local_trees;
> >  
> >    free (ob->main_stream);
> >    free (ob->string_stream);
> > @@ -532,6 +538,8 @@ private:
> >      bool ref_p;
> >      bool this_ref_p;
> >    };
> > +  /* Maximum index of scc stack containing a local tree.  */
> > +  int max_local_entry;
> >  
> >    static int scc_entry_compare (const void *, const void *);
> >  
> > @@ -550,6 +558,33 @@ private:
> >    struct obstack sccstate_obstack;
> >  };
> >  
> > +/* Return true if type can be merged.
> > +   TRANSLATION_UNIT_DECL is handled specially since references to it does
> > +   not make other trees local as well.  */
> > +
> > +static bool
> > +local_tree_p (tree t)
> > +{
> > +  switch (TREE_CODE (t))
> > +    {
> > +    case LABEL_DECL:
> > +      return true;
> > +    case NAMESPACE_DECL:
> > +      return !DECL_NAME (t);
> > +    case VAR_DECL:
> > +    case FUNCTION_DECL:
> > +      return !TREE_PUBLIC (t) && !DECL_EXTERNAL (t);
> > +    case RECORD_TYPE:
> > +    case UNION_TYPE:
> > +    case ENUMERAL_TYPE:
> > +      return TYPE_MAIN_VARIANT (t) == t
> > +	     && odr_type_p (t) && type_with_linkage_p (t)
> > +	     && type_in_anonymous_namespace_p (t);
> > +    default:
> > +      return false;
> > +    }
> > +}
> > +
> >  /* Emit the physical representation of tree node EXPR to output block OB,
> >     using depth-first search on the subgraph.  If THIS_REF_P is true, the
> >     leaves of EXPR are emitted as references via lto_output_tree_ref.
> > @@ -560,6 +595,8 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
> >  	  bool single_p)
> >  {
> >    unsigned int next_dfs_num = 1;
> > +
> > +  max_local_entry = -1;
> >    gcc_obstack_init (&sccstate_obstack);
> >    DFS_write_tree (ob, NULL, expr, ref_p, this_ref_p);
> >    while (!worklist_vec.is_empty ())
> > @@ -586,6 +623,8 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
> >  	  scc_entry e = { expr, 0 };
> >  	  /* Not yet visited.  DFS recurse and push it onto the stack.  */
> >  	  *slot = cstate = XOBNEW (&sccstate_obstack, struct sccs);
> > +	  if (ob->local_trees && local_tree_p (expr))
> > +	    max_local_entry = sccstack.length ();
> >  	  sccstack.safe_push (e);
> >  	  cstate->dfsnum = next_dfs_num++;
> >  	  cstate->low = cstate->dfsnum;
> > @@ -640,7 +679,26 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
> >  	     any merging there.  */
> >  	  hashval_t scc_hash = 0;
> >  	  unsigned scc_entry_len = 0;
> > -	  if (!flag_wpa)
> > +	  bool local_to_unit = !ob->local_trees
> > +			       || max_local_entry >= (int)first;
> > +
> > +	  /* Remember that trees are local so info gets propagated to other
> > +	     SCCs.  */
> > +	  if (local_to_unit && ob->local_trees)
> > +	    {
> > +	      for (unsigned i = 0; i < size; ++i)
> > +		ob->local_trees->add (sccstack[first + i].t);
> > +	    }
> > +
> > +	  /* As a special case do not stream TRANSLATION_UNIT_DECL as shared
> > +	     tree.  We can not mark it local because references to it does not
> > +	     make other trees local (all global decls reffer to it via
> > +	     CONTEXT).  */
> > +	  if (size == 1
> > +	      && TREE_CODE (sccstack[first].t) == TRANSLATION_UNIT_DECL)
> > +	    local_to_unit = true;
> > +
> > +	  if (!local_to_unit)
> >  	    {
> >  	      scc_hash = hash_scc (ob, first, size, ref_p, this_ref_p);
> >  
> > @@ -672,10 +730,44 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
> >  	      gcc_checking_assert (scc_entry_len == 1);
> >  	    }
> >  
> > -	  /* Write LTO_tree_scc.  */
> > -	  streamer_write_record_start (ob, LTO_tree_scc);
> > -	  streamer_write_uhwi (ob, size);
> > -	  streamer_write_uhwi (ob, scc_hash);
> > +	  worklist_vec.pop ();
> > +
> > +	  /* If we are not streaming global section tree sharing will not
> > +	     be done, but we must mark block of trees so streamer can
> > +	     distinguish it from reference being streamed.  */
> > +	  if (ob->section_type != LTO_section_decls)
> > +	    {
> > +	      /* If this is the original tree we stream and it forms SCC
> > +		 by itself then we do not need to stream SCC at all.  */
> > +	      if (worklist_vec.is_empty () && first == 0 && size == 1)
> > +		 return;
> > +	      streamer_write_record_start (ob, LTO_trees);
> > +	      streamer_write_uhwi (ob, size);
> > +	    }
> > +	  /* Write LTO_tree_scc if sharing is going to be performned.  */
> > +	  else if (!local_to_unit
> > +		   /* These are special since sharing is not done by tree
> > +		      merging machinery.  We can not special case them earlier
> > +		      because we still need to compute hash for further sharing
> > +		      of trees referring to them.  */
> > +		   && (size != 1
> > +		       || (TREE_CODE (sccstack[first].t) != IDENTIFIER_NODE
> > +			   && (TREE_CODE (sccstack[first].t) != INTEGER_CST
> > +			       || TREE_OVERFLOW (sccstack[first].t)))))
> > +
> > +	    {
> > +	      gcc_checking_assert (ob->section_type == LTO_section_decls);
> > +	      streamer_write_record_start (ob, LTO_tree_scc);
> > +	      streamer_write_uhwi (ob, size);
> > +	      streamer_write_uhwi (ob, scc_hash);
> > +	    }
> > +	  /* Non-trivial SCCs must be packed to trees blocks so forward
> > +	     references work correctly.  */
> > +	  else if (size != 1)
> > +	    {
> > +	       streamer_write_record_start (ob, LTO_trees);
> > +	       streamer_write_uhwi (ob, size);
> > +	    }
> >  
> >  	  /* Write size-1 SCCs without wrapping them inside SCC bundles.
> >  	     All INTEGER_CSTs need to be handled this way as we need
> > @@ -722,10 +814,11 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
> >  
> >  	  /* Finally truncate the vector.  */
> >  	  sccstack.truncate (first);
> > +	  if ((int)first <= max_local_entry)
> > +	    max_local_entry = first - 1;
> >  
> >  	  if (from_state)
> >  	    from_state->low = MIN (from_state->low, cstate->low);
> > -	  worklist_vec.pop ();
> >  	  continue;
> >  	}
> >  
> > @@ -1569,7 +1662,14 @@ DFS::DFS_write_tree (struct output_block *ob, sccs *from_state,
> >  
> >    /* Check if we already streamed EXPR.  */
> >    if (streamer_tree_cache_lookup (ob->writer_cache, expr, NULL))
> > -    return;
> > +    {
> > +      /* Refernece to a local tree makes entry also local.  We always process
> > +	 top of stack entry, so set max to number of entries in stack - 1.  */
> > +      if (ob->local_trees
> > +	  && ob->local_trees->contains (expr))
> > +	max_local_entry = sccstack.length () - 1;
> > +      return;
> > +    }
> >  
> >    worklist w;
> >    w.expr = expr;
> > @@ -1641,15 +1741,20 @@ lto_output_tree (struct output_block *ob, tree expr,
> >        DFS (ob, expr, ref_p, this_ref_p, false);
> >        in_dfs_walk = false;
> >  
> > -      /* Finally append a reference to the tree we were writing.
> > -	 ???  If expr ended up as a singleton we could have
> > -	 inlined it here and avoid outputting a reference.  */
> > +      /* Finally append a reference to the tree we were writing.  */
> >        existed_p = streamer_tree_cache_lookup (ob->writer_cache, expr, &ix);
> > -      gcc_assert (existed_p);
> > -      streamer_write_record_start (ob, LTO_tree_pickle_reference);
> > -      streamer_write_uhwi (ob, ix);
> > -      streamer_write_enum (ob->main_stream, LTO_tags, LTO_NUM_TAGS,
> > -			   lto_tree_code_to_tag (TREE_CODE (expr)));
> > +
> > +      /* DFS walk above possibly skipped streaming EXPR itself to let us inline
> > +	 it.  */
> > +      if (!existed_p)
> > +	lto_output_tree_1 (ob, expr, 0, ref_p, this_ref_p);
> > +      else
> > +	{
> > +	  streamer_write_record_start (ob, LTO_tree_pickle_reference);
> > +	  streamer_write_uhwi (ob, ix);
> > +	  streamer_write_enum (ob->main_stream, LTO_tags, LTO_NUM_TAGS,
> > +			       lto_tree_code_to_tag (TREE_CODE (expr)));
> > +	}
> >        if (streamer_dump_file)
> >  	{
> >  	  print_node_brief (streamer_dump_file, "   Finished SCC of ",
> > diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> > index 76aa6fe34b8..a466fb8b329 100644
> > --- a/gcc/lto-streamer.h
> > +++ b/gcc/lto-streamer.h
> > @@ -178,6 +178,9 @@ enum LTO_tags
> >    /* Special for global streamer.  A blob of unnamed tree nodes.  */
> >    LTO_tree_scc,
> >  
> > +  /* Sequence of trees.  */
> > +  LTO_trees,
> > +
> >    /* References to indexable tree nodes.  These objects are stored in
> >       tables that are written separately from the function bodies that
> >       reference them.  This way they can be instantiated even when the
> > @@ -751,6 +754,9 @@ struct output_block
> >    /* Cache of nodes written in this section.  */
> >    struct streamer_tree_cache_d *writer_cache;
> >  
> > +  /* All trees identified as local to the unit streamed.  */
> > +  hash_set<tree> *local_trees;
> > +
> >    /* All data persistent across whole duration of output block
> >       can go here.  */
> >    struct obstack obstack;
> > @@ -901,7 +907,7 @@ tree lto_input_tree_ref (class lto_input_block *, class data_in *,
> >  void lto_tag_check_set (enum LTO_tags, int, ...);
> >  void lto_init_eh (void);
> >  hashval_t lto_input_scc (class lto_input_block *, class data_in *,
> > -			 unsigned *, unsigned *);
> > +			 unsigned *, unsigned *, bool);
> >  tree lto_input_tree_1 (class lto_input_block *, class data_in *,
> >  		       enum LTO_tags, hashval_t hash);
> >  tree lto_input_tree (class lto_input_block *, class data_in *);
> > diff --git a/gcc/lto/lto-common.c b/gcc/lto/lto-common.c
> > index 682a82d61ba..d04b1c9ca7b 100644
> > --- a/gcc/lto/lto-common.c
> > +++ b/gcc/lto/lto-common.c
> > @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "builtins.h"
> >  #include "lto-common.h"
> >  #include "tree-pretty-print.h"
> > +#include "print-tree.h"
> >  
> >  /* True when no new types are going to be streamd from the global stream.  */
> >  
> > @@ -1054,6 +1055,7 @@ static unsigned long num_prevailing_types;
> >  static unsigned long num_type_scc_trees;
> >  static unsigned long total_scc_size;
> >  static unsigned long num_sccs_read;
> > +static unsigned long num_unshared_trees_read;
> >  static unsigned long total_scc_size_merged;
> >  static unsigned long num_sccs_merged;
> >  static unsigned long num_scc_compares;
> > @@ -1088,6 +1090,10 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
> >    compare_values (TREE_CODE);
> >    code = TREE_CODE (t1);
> >  
> > +  /* If we end up comparing translation unit decls we either forgot to mark
> > +     some SCC as local or we compare too much.  */
> > +  gcc_checking_assert (code != TRANSLATION_UNIT_DECL);
> > +
> >    if (!TYPE_P (t1))
> >      {
> >        compare_values (TREE_SIDE_EFFECTS);
> > @@ -1626,6 +1632,28 @@ cmp_tree (const void *p1_, const void *p2_)
> >    return ((uintptr_t)p1[1] < (uintptr_t)p2[1]) ? -1 : 1;
> >  }
> >  
> > +/* New scc of size 1 containing T was streamed in from DATA_IN and not merged.
> > +   Register it to reader cache at index FROM.  */
> > +
> > +static void
> > +process_dref (class data_in *data_in, tree t, unsigned from)
> > +{
> > +  struct streamer_tree_cache_d *cache = data_in->reader_cache;
> > +  /* If we got a debug reference queued, see if the prevailing
> > +     tree has a debug reference and if not, register the one
> > +     for the tree we are about to throw away.  */
> > +  if (dref_queue.length () == 1)
> > +    {
> > +      dref_entry e = dref_queue.pop ();
> > +      gcc_assert (e.decl
> > +		  == streamer_tree_cache_get_tree (cache, from));
> > +      const char *sym;
> > +      unsigned HOST_WIDE_INT off;
> > +      if (!debug_hooks->die_ref_for_decl (t, &sym, &off))
> > +	debug_hooks->register_external_die (t, e.sym, e.off);
> > +    }
> > +}
> > +
> >  /* Try to unify the SCC with nodes FROM to FROM + LEN in CACHE and
> >     hash value SCC_HASH with an already recorded SCC.  Return true if
> >     that was successful, otherwise return false.  */
> > @@ -1646,22 +1674,16 @@ unify_scc (class data_in *data_in, unsigned from,
> >      {
> >        tree t = streamer_tree_cache_get_tree (cache, from + i);
> >        scc->entries[i] = t;
> > -      /* Do not merge SCCs with local entities inside them.  Also do
> > -	 not merge TRANSLATION_UNIT_DECLs and anonymous namespaces
> > -	 and types therein types.  */
> > -      if (TREE_CODE (t) == TRANSLATION_UNIT_DECL
> > -	  || (VAR_OR_FUNCTION_DECL_P (t)
> > -	      && !(TREE_PUBLIC (t) || DECL_EXTERNAL (t)))
> > -	  || TREE_CODE (t) == LABEL_DECL
> > -	  || (TREE_CODE (t) == NAMESPACE_DECL && !DECL_NAME (t))
> > -	  || (TYPE_P (t)
> > -	      && type_with_linkage_p (TYPE_MAIN_VARIANT (t))
> > -	      && type_in_anonymous_namespace_p (TYPE_MAIN_VARIANT (t))))
> > -	{
> > -	  /* Avoid doing any work for these cases and do not worry to
> > -	     record the SCCs for further merging.  */
> > -	  return false;
> > -	}
> > +      /* These types should be streamed as unshared.  */
> > +      gcc_checking_assert
> > +	 (!(TREE_CODE (t) == TRANSLATION_UNIT_DECL
> > +	    || (VAR_OR_FUNCTION_DECL_P (t)
> > +		&& !(TREE_PUBLIC (t) || DECL_EXTERNAL (t)))
> > +	    || TREE_CODE (t) == LABEL_DECL
> > +	    || (TREE_CODE (t) == NAMESPACE_DECL && !DECL_NAME (t))
> > +	    || (TYPE_P (t)
> > +		&& type_with_linkage_p (TYPE_MAIN_VARIANT (t))
> > +		&& type_in_anonymous_namespace_p (TYPE_MAIN_VARIANT (t)))));
> >      }
> >  
> >    /* Look for the list of candidate SCCs to compare against.  */
> > @@ -1712,21 +1734,7 @@ unify_scc (class data_in *data_in, unsigned from,
> >  	     to the tree node mapping computed by compare_tree_sccs.  */
> >  	  if (len == 1)
> >  	    {
> > -	      /* If we got a debug reference queued, see if the prevailing
> > -		 tree has a debug reference and if not, register the one
> > -		 for the tree we are about to throw away.  */
> > -	      if (dref_queue.length () == 1)
> > -		{
> > -		  dref_entry e = dref_queue.pop ();
> > -		  gcc_assert (e.decl
> > -			      == streamer_tree_cache_get_tree (cache, from));
> > -		  const char *sym;
> > -		  unsigned HOST_WIDE_INT off;
> > -		  if (!debug_hooks->die_ref_for_decl (pscc->entries[0], &sym,
> > -						      &off))
> > -		    debug_hooks->register_external_die (pscc->entries[0],
> > -							e.sym, e.off);
> > -		}
> > +	      process_dref (data_in, pscc->entries[0], from);
> >  	      lto_maybe_register_decl (data_in, pscc->entries[0], from);
> >  	      streamer_tree_cache_replace_tree (cache, pscc->entries[0], from);
> >  	    }
> > @@ -1785,7 +1793,65 @@ unify_scc (class data_in *data_in, unsigned from,
> >    return unified_p;
> >  }
> >  
> > +typedef int_hash<unsigned, 0, UINT_MAX> code_id_hash;
> > +
> > +/* Do registering necessary once new tree fully streamed in (including all
> > +   trees it reffers to).  */
> > +
> > +static void
> > +process_new_tree (tree t, hash_map <code_id_hash, unsigned> *hm,
> > +		  unsigned index, unsigned *total, class data_in *data_in)
> > +{
> > +  /* Reconstruct the type variant and pointer-to/reference-to
> > +     chains.  */
> > +  if (TYPE_P (t))
> > +    {
> > +      /* Map the tree types to their frequencies.  */
> > +      if (flag_lto_dump_type_stats)
> > +	{
> > +	  unsigned key = (unsigned) TREE_CODE (t);
> > +	  unsigned *countp = hm->get (key);
> > +	  hm->put (key, countp ? (*countp) + 1 : 1);
> > +	  (*total)++;
> > +	}
> > +
> > +      num_prevailing_types++;
> > +      lto_fixup_prevailing_type (t);
> >  
> > +      /* Compute the canonical type of all non-ODR types.
> > +	 Delay ODR types for the end of merging process - the canonical
> > +	 type for those can be computed using the (unique) name however
> > +	 we want to do this only if units in other languages do not
> > +	 contain structurally equivalent type.
> > +
> > +	 Because SCC components are streamed in random (hash) order
> > +	 we may have encountered the type before while registering
> > +	 type canonical of a derived type in the same SCC.  */
> > +      if (!TYPE_CANONICAL (t))
> > +	{
> > +	  if (!RECORD_OR_UNION_TYPE_P (t)
> > +	      || !TYPE_CXX_ODR_P (t))
> > +	    gimple_register_canonical_type (t);
> > +	  else if (COMPLETE_TYPE_P (t))
> > +	    vec_safe_push (types_to_register, t);
> > +	}
> > +      if (TYPE_MAIN_VARIANT (t) == t && odr_type_p (t))
> > +	register_odr_type (t);
> > +    }
> > +  /* Link shared INTEGER_CSTs into TYPE_CACHED_VALUEs of its
> > +     type which is also member of this SCC.  */
> > +  if (TREE_CODE (t) == INTEGER_CST
> > +      && !TREE_OVERFLOW (t))
> > +    cache_integer_cst (t);
> > +  if (!flag_ltrans)
> > +    {
> > +      lto_maybe_register_decl (data_in, t, index);
> > +      /* Scan the tree for references to global functions or
> > +	 variables and record those for later fixup.  */
> > +      if (mentions_vars_p (t))
> > +	vec_safe_push (tree_with_vars, t);
> > +    }
> > +}
> >  
> >  /* Read all the symbols from buffer DATA, using descriptors in DECL_DATA.
> >     RESOLUTIONS is the set of symbols picked by the linker (read from the
> > @@ -1813,7 +1879,6 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
> >    /* We do not uniquify the pre-loaded cache entries, those are middle-end
> >       internal types that should not be merged.  */
> >  
> > -  typedef int_hash<unsigned, 0, UINT_MAX> code_id_hash;
> >    hash_map <code_id_hash, unsigned> hm;
> >    unsigned total = 0;
> >  
> > @@ -1824,31 +1889,41 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
> >        unsigned from = data_in->reader_cache->nodes.length ();
> >        /* Read and uniquify SCCs as in the input stream.  */
> >        enum LTO_tags tag = streamer_read_record_start (&ib_main);
> > -      if (tag == LTO_tree_scc)
> > +      if (tag == LTO_tree_scc || tag == LTO_trees)
> >  	{
> >  	  unsigned len_;
> >  	  unsigned scc_entry_len;
> > +
> > +	  /* Because we stream in SCC order we know that all unshared trees
> > +	     are now fully streamed.  Process them.  */
> >  	  hashval_t scc_hash = lto_input_scc (&ib_main, data_in, &len_,
> > -					      &scc_entry_len);
> > +					      &scc_entry_len,
> > +					      tag == LTO_tree_scc);
> >  	  unsigned len = data_in->reader_cache->nodes.length () - from;
> >  	  gcc_assert (len == len_);
> >  
> > -	  total_scc_size += len;
> > -	  num_sccs_read++;
> > +	  if (tag == LTO_tree_scc)
> > +	    {
> > +	      total_scc_size += len;
> > +	      num_sccs_read++;
> > +	    }
> > +	  else
> > +	    num_unshared_trees_read += len;
> >  
> >  	  /* We have the special case of size-1 SCCs that are pre-merged
> >  	     by means of identifier and string sharing for example.
> >  	     ???  Maybe we should avoid streaming those as SCCs.  */
> >  	  tree first = streamer_tree_cache_get_tree (data_in->reader_cache,
> >  						     from);
> > -	  if (len == 1
> > -	      && (TREE_CODE (first) == IDENTIFIER_NODE
> > -		  || (TREE_CODE (first) == INTEGER_CST
> > -		      && !TREE_OVERFLOW (first))))
> > -	    continue;
> > +	  /* Identifier and integers are shared specially, they should never
> > +	     go by the tree merging path.  */
> > +	  gcc_checking_assert ((TREE_CODE (first) != IDENTIFIER_NODE
> > +				&& (TREE_CODE (first) != INTEGER_CST
> > +				    || TREE_OVERFLOW (first)))
> > +			       || len != 1);
> >  
> >  	  /* Try to unify the SCC with already existing ones.  */
> > -	  if (!flag_ltrans
> > +	  if (!flag_ltrans && tag != LTO_trees
> >  	      && unify_scc (data_in, from,
> >  			    len, scc_entry_len, scc_hash))
> >  	    continue;
> > @@ -1862,56 +1937,9 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
> >  	    {
> >  	      tree t = streamer_tree_cache_get_tree (data_in->reader_cache,
> >  						     from + i);
> > -	      /* Reconstruct the type variant and pointer-to/reference-to
> > -		 chains.  */
> > +	      process_new_tree (t, &hm, from + i, &total, data_in);
> >  	      if (TYPE_P (t))
> > -		{
> > -		  /* Map the tree types to their frequencies.  */
> > -		  if (flag_lto_dump_type_stats)
> > -		    {
> > -		      unsigned key = (unsigned) TREE_CODE (t);
> > -		      unsigned *countp = hm.get (key);
> > -		      hm.put (key, countp ? (*countp) + 1 : 1);
> > -		      total++;
> > -		    }
> > -
> > -		  seen_type = true;
> > -		  num_prevailing_types++;
> > -		  lto_fixup_prevailing_type (t);
> > -
> > -		  /* Compute the canonical type of all non-ODR types.
> > -		     Delay ODR types for the end of merging process - the canonical
> > -		     type for those can be computed using the (unique) name however
> > -		     we want to do this only if units in other languages do not
> > -		     contain structurally equivalent type.
> > -
> > -		     Because SCC components are streamed in random (hash) order
> > -		     we may have encountered the type before while registering
> > -		     type canonical of a derived type in the same SCC.  */
> > -		  if (!TYPE_CANONICAL (t))
> > -		    {
> > -		      if (!RECORD_OR_UNION_TYPE_P (t)
> > -			  || !TYPE_CXX_ODR_P (t))
> > -		        gimple_register_canonical_type (t);
> > -		      else if (COMPLETE_TYPE_P (t))
> > -			vec_safe_push (types_to_register, t);
> > -		    }
> > -		  if (TYPE_MAIN_VARIANT (t) == t && odr_type_p (t))
> > -		    register_odr_type (t);
> > -		}
> > -	      /* Link shared INTEGER_CSTs into TYPE_CACHED_VALUEs of its
> > -		 type which is also member of this SCC.  */
> > -	      if (TREE_CODE (t) == INTEGER_CST
> > -		  && !TREE_OVERFLOW (t))
> > -		cache_integer_cst (t);
> > -	      if (!flag_ltrans)
> > -		{
> > -		  lto_maybe_register_decl (data_in, t, from + i);
> > -		  /* Scan the tree for references to global functions or
> > -		     variables and record those for later fixup.  */
> > -		  if (mentions_vars_p (t))
> > -		    vec_safe_push (tree_with_vars, t);
> > -		}
> > +		seen_type = true;
> >  	    }
> >  
> >  	  /* Register DECLs with the debuginfo machinery.  */
> > @@ -1926,9 +1954,26 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
> >  	}
> >        else
> >  	{
> > -	  /* Pickle stray references.  */
> >  	  t = lto_input_tree_1 (&ib_main, data_in, tag, 0);
> > -	  gcc_assert (t && data_in->reader_cache->nodes.length () == from);
> > +	  /* We streamed in new tree.  Add it to cache and process dref.  */
> > +	  if (data_in->reader_cache->nodes.length () == from + 1)
> > +	    {
> > +	      num_unshared_trees_read++;
> > +	      data_in->location_cache.accept_location_cache ();
> > +	      process_dref (data_in, t, from);
> > +	      if (TREE_CODE (t) == IDENTIFIER_NODE
> > +		  || (TREE_CODE (t) == INTEGER_CST
> > +		      && !TREE_OVERFLOW (t)))
> > +		;
> > +	      else
> > +		{
> > +		  lto_maybe_register_decl (data_in, t, from);
> > +		  process_new_tree (t, &hm, from, &total, data_in);
> > +		}
> > +	    }
> > +	  else
> > +	    /* FIXME: It seems useless to pickle stray references.  */
> > +	    gcc_assert (data_in->reader_cache->nodes.length () == from);
> >  	}
> >      }
> >  
> > @@ -2953,10 +2998,13 @@ print_lto_report_1 (void)
> >    const char *pfx = (flag_lto) ? "LTO" : (flag_wpa) ? "WPA" : "LTRANS";
> >    fprintf (stderr, "%s statistics\n", pfx);
> >  
> > -  fprintf (stderr, "[%s] read %lu SCCs of average size %f\n",
> > +  fprintf (stderr, "[%s] read %lu unshared trees\n",
> > +	   pfx, num_unshared_trees_read);
> > +  fprintf (stderr, "[%s] read %lu mergeable SCCs of average size %f\n",
> >  	   pfx, num_sccs_read, total_scc_size / (double)num_sccs_read);
> > -  fprintf (stderr, "[%s] %lu tree bodies read in total\n", pfx, total_scc_size);
> > -  if (flag_wpa && tree_scc_hash)
> > +  fprintf (stderr, "[%s] %lu tree bodies read in total\n", pfx,
> > +	   total_scc_size + num_unshared_trees_read);
> > +  if (flag_wpa && tree_scc_hash && num_sccs_read)
> >      {
> >        fprintf (stderr, "[%s] tree SCC table: size %ld, %ld elements, "
> >  	       "collision ratio: %f\n", pfx,
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Richard Biener May 20, 2020, 12:01 p.m. UTC | #5
On Wed, 20 May 2020, Jan Hubicka wrote:

> Hi,
> > 
> > The patch looks reasonable - the wins are not entirely clear to me
> > since you mix in LTO streaming format optimizations.  You promised
> > to improve collisions on the SCC merging side but those are not
> > spectacular (if present at all) in the above data?
> 
> Thanks for looking into that. Collision ratios was imporving on Firefox
> that has more anonymous types and their derivations, but I was currently
> seeing an ICE building it.  I will debug that.
> 
> I think, quite surprisingly, most stream size benefits come from not
> streaming hash codes for integer_csts and indentifier_nodes.
> These are common and hash codes are large + hard to compress.
> 
> I was thinking about trying making them artifically smaller (32bit only)
> once we nail down the unecessary scc chains.  I will look into that now
> - last time i did stats it was mostly anonymous types and that is why I
> started to worry about them first.
> > 
> > The patch itself looks really nice.  The only comment I have is
> > with respect to the new local_tree_p predicate and its relation
> > to tree_is_indexable.  We should be able to assert that
> > local trees are not indexable and indexable trees never are local.
> 
> This is not true.  Anonymous namespace types are indexable (since they
> are shared across function bodies) but they are local to translation
> unit. Similarly non-public decls.
> 
> > Does this make the predicates equal? ;)  It would be nice if
> > they would cross-reference each other (maybe even assert at least
> > in one way), maybe put next to each other and share parts?
> > We can do this as followup.
> 
> If you are OK doing more stuff as folowup, I would be happy.  I think it
> is good base to fix other issues (track down longer chain in merging;
> sanify scc streaming more etc).
> 
> I just tried to avoid snowballing, since the patch already mixes few
> things together.

Yeah, let's do everything else as followup - the patch as-is looks
good to me.

Thanks,
Richard.

> Thanks,
> Honza
> > 
> > Thanks,
> > Richard.
> > 
> > > gcc/ChangeLog:
> > > 
> > > 2020-05-19  Jan Hubicka  <hubicka@ucw.cz>
> > > 
> > > 	* lto-streamer-in.c (lto_input_scc): Add SHARED_SCC parameter.
> > > 	(lto_input_tree_1): Strenghten sanity check.
> > > 	(lto_input_tree): Update call of lto_input_scc.
> > > 	* lto-streamer-out.c: Include ipa-utils.h
> > > 	(create_output_block): Initialize local_trees if merigng is going
> > > 	to happen.
> > > 	(destroy_output_block): Destroy local_trees.
> > > 	(DFS): Add max_local_entry.
> > > 	(local_tree_p): New function.
> > > 	(DFS::DFS): Initialize and maintain it.
> > > 	(DFS::DFS_write_tree): Decide on streaming format.
> > > 	(lto_output_tree): Stream inline singleton SCCs
> > > 	* lto-streamer.h (enum LTO_tags): Add LTO_trees.
> > > 	(struct output_block): Add local_trees.
> > > 	(lto_input_scc): Update prototype.
> > > 
> > > gcc/lto/ChangeLog:
> > > 
> > > 2020-05-19  Jan Hubicka  <hubicka@ucw.cz>
> > > 
> > > 	* lto-common.c (compare_tree_sccs_1): Sanity check that we never
> > > 	read TRANSLATION_UNIT_DECL.
> > > 	(process_dref): Break out from ...
> > > 	(unify_scc): ... here.
> > > 	(process_new_tree): Break out from ...
> > > 	(lto_read_decls): ... here; handle streaming of singleton trees.
> > > 	(print_lto_report_1): Update statistics.
> > > 
> > > diff --git a/gcc/lto-streamer-in.c b/gcc/lto-streamer-in.c
> > > index 244f5b8aa5c..6d571c4344a 100644
> > > --- a/gcc/lto-streamer-in.c
> > > +++ b/gcc/lto-streamer-in.c
> > > @@ -1424,16 +1424,17 @@ lto_read_tree (class lto_input_block *ib, class data_in *data_in,
> > >  
> > >  
> > >  /* Populate the reader cache with trees materialized from the SCC
> > > -   following in the IB, DATA_IN stream.  */
> > > +   following in the IB, DATA_IN stream.
> > > +   If SHARED_SCC is true we input LTO_tree_scc.  */
> > >  
> > >  hashval_t
> > >  lto_input_scc (class lto_input_block *ib, class data_in *data_in,
> > > -	       unsigned *len, unsigned *entry_len)
> > > +	       unsigned *len, unsigned *entry_len, bool shared_scc)
> > >  {
> > >    /* A blob of unnamed tree nodes, fill the cache from it and
> > >       recurse.  */
> > >    unsigned size = streamer_read_uhwi (ib);
> > > -  hashval_t scc_hash = streamer_read_uhwi (ib);
> > > +  hashval_t scc_hash = shared_scc ? streamer_read_uhwi (ib) : 0;
> > >    unsigned scc_entry_len = 1;
> > >  
> > >    if (size == 1)
> > > @@ -1456,7 +1457,8 @@ lto_input_scc (class lto_input_block *ib, class data_in *data_in,
> > >  	      || (tag >= LTO_field_decl_ref && tag <= LTO_global_decl_ref)
> > >  	      || tag == LTO_tree_pickle_reference
> > >  	      || tag == LTO_integer_cst
> > > -	      || tag == LTO_tree_scc)
> > > +	      || tag == LTO_tree_scc
> > > +	      || tag == LTO_trees)
> > >  	    gcc_unreachable ();
> > >  
> > >  	  result = streamer_alloc_tree (ib, data_in, tag);
> > > @@ -1522,7 +1524,7 @@ lto_input_tree_1 (class lto_input_block *ib, class data_in *data_in,
> > >  				 (a, len, TYPE_PRECISION (type)));
> > >        streamer_tree_cache_append (data_in->reader_cache, result, hash);
> > >      }
> > > -  else if (tag == LTO_tree_scc)
> > > +  else if (tag == LTO_tree_scc || tag == LTO_trees)
> > >      gcc_unreachable ();
> > >    else
> > >      {
> > > @@ -1538,11 +1540,11 @@ lto_input_tree (class lto_input_block *ib, class data_in *data_in)
> > >  {
> > >    enum LTO_tags tag;
> > >  
> > > -  /* Input and skip SCCs.  */
> > > -  while ((tag = streamer_read_record_start (ib)) == LTO_tree_scc)
> > > +  /* Input pickled trees needed to stream in the refrence.  */
> > > +  while ((tag = streamer_read_record_start (ib)) == LTO_trees)
> > >      {
> > >        unsigned len, entry_len;
> > > -      lto_input_scc (ib, data_in, &len, &entry_len);
> > > +      lto_input_scc (ib, data_in, &len, &entry_len, false);
> > >  
> > >        /* Register DECLs with the debuginfo machinery.  */
> > >        while (!dref_queue.is_empty ())
> > > @@ -1551,7 +1553,15 @@ lto_input_tree (class lto_input_block *ib, class data_in *data_in)
> > >  	  debug_hooks->register_external_die (e.decl, e.sym, e.off);
> > >  	}
> > >      }
> > > -  return lto_input_tree_1 (ib, data_in, tag, 0);
> > > +  tree t = lto_input_tree_1 (ib, data_in, tag, 0);
> > > +
> > > +  if (!dref_queue.is_empty ())
> > > +    {
> > > +      dref_entry e = dref_queue.pop ();
> > > +      debug_hooks->register_external_die (e.decl, e.sym, e.off);
> > > +      gcc_checking_assert (dref_queue.is_empty ());
> > > +    }
> > > +  return t;
> > >  }
> > >  
> > >  
> > > diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
> > > index 3d94324881f..1c5b8f42c9b 100644
> > > --- a/gcc/lto-streamer-out.c
> > > +++ b/gcc/lto-streamer-out.c
> > > @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not see
> > >  #include "tree-dfa.h"
> > >  #include "file-prefix-map.h" /* remap_debug_filename()  */
> > >  #include "output.h"
> > > +#include "ipa-utils.h"
> > >  
> > >  
> > >  static void lto_write_tree (struct output_block*, tree, bool);
> > > @@ -75,6 +76,10 @@ create_output_block (enum lto_section_type section_type)
> > >  
> > >    ob->section_type = section_type;
> > >    ob->decl_state = lto_get_out_decl_state ();
> > > +  /* Only global decl stream in non-wpa will ever be considered by tree
> > > +     merging.  */
> > > +  if (!flag_wpa && section_type == LTO_section_decls)
> > > +    ob->local_trees = new (hash_set <tree>);
> > >    ob->main_stream = XCNEW (struct lto_output_stream);
> > >    ob->string_stream = XCNEW (struct lto_output_stream);
> > >    ob->writer_cache = streamer_tree_cache_create (!flag_wpa, true, false);
> > > @@ -100,6 +105,7 @@ destroy_output_block (struct output_block *ob)
> > >  
> > >    delete ob->string_hash_table;
> > >    ob->string_hash_table = NULL;
> > > +  delete ob->local_trees;
> > >  
> > >    free (ob->main_stream);
> > >    free (ob->string_stream);
> > > @@ -532,6 +538,8 @@ private:
> > >      bool ref_p;
> > >      bool this_ref_p;
> > >    };
> > > +  /* Maximum index of scc stack containing a local tree.  */
> > > +  int max_local_entry;
> > >  
> > >    static int scc_entry_compare (const void *, const void *);
> > >  
> > > @@ -550,6 +558,33 @@ private:
> > >    struct obstack sccstate_obstack;
> > >  };
> > >  
> > > +/* Return true if type can be merged.
> > > +   TRANSLATION_UNIT_DECL is handled specially since references to it does
> > > +   not make other trees local as well.  */
> > > +
> > > +static bool
> > > +local_tree_p (tree t)
> > > +{
> > > +  switch (TREE_CODE (t))
> > > +    {
> > > +    case LABEL_DECL:
> > > +      return true;
> > > +    case NAMESPACE_DECL:
> > > +      return !DECL_NAME (t);
> > > +    case VAR_DECL:
> > > +    case FUNCTION_DECL:
> > > +      return !TREE_PUBLIC (t) && !DECL_EXTERNAL (t);
> > > +    case RECORD_TYPE:
> > > +    case UNION_TYPE:
> > > +    case ENUMERAL_TYPE:
> > > +      return TYPE_MAIN_VARIANT (t) == t
> > > +	     && odr_type_p (t) && type_with_linkage_p (t)
> > > +	     && type_in_anonymous_namespace_p (t);
> > > +    default:
> > > +      return false;
> > > +    }
> > > +}
> > > +
> > >  /* Emit the physical representation of tree node EXPR to output block OB,
> > >     using depth-first search on the subgraph.  If THIS_REF_P is true, the
> > >     leaves of EXPR are emitted as references via lto_output_tree_ref.
> > > @@ -560,6 +595,8 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
> > >  	  bool single_p)
> > >  {
> > >    unsigned int next_dfs_num = 1;
> > > +
> > > +  max_local_entry = -1;
> > >    gcc_obstack_init (&sccstate_obstack);
> > >    DFS_write_tree (ob, NULL, expr, ref_p, this_ref_p);
> > >    while (!worklist_vec.is_empty ())
> > > @@ -586,6 +623,8 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
> > >  	  scc_entry e = { expr, 0 };
> > >  	  /* Not yet visited.  DFS recurse and push it onto the stack.  */
> > >  	  *slot = cstate = XOBNEW (&sccstate_obstack, struct sccs);
> > > +	  if (ob->local_trees && local_tree_p (expr))
> > > +	    max_local_entry = sccstack.length ();
> > >  	  sccstack.safe_push (e);
> > >  	  cstate->dfsnum = next_dfs_num++;
> > >  	  cstate->low = cstate->dfsnum;
> > > @@ -640,7 +679,26 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
> > >  	     any merging there.  */
> > >  	  hashval_t scc_hash = 0;
> > >  	  unsigned scc_entry_len = 0;
> > > -	  if (!flag_wpa)
> > > +	  bool local_to_unit = !ob->local_trees
> > > +			       || max_local_entry >= (int)first;
> > > +
> > > +	  /* Remember that trees are local so info gets propagated to other
> > > +	     SCCs.  */
> > > +	  if (local_to_unit && ob->local_trees)
> > > +	    {
> > > +	      for (unsigned i = 0; i < size; ++i)
> > > +		ob->local_trees->add (sccstack[first + i].t);
> > > +	    }
> > > +
> > > +	  /* As a special case do not stream TRANSLATION_UNIT_DECL as shared
> > > +	     tree.  We can not mark it local because references to it does not
> > > +	     make other trees local (all global decls reffer to it via
> > > +	     CONTEXT).  */
> > > +	  if (size == 1
> > > +	      && TREE_CODE (sccstack[first].t) == TRANSLATION_UNIT_DECL)
> > > +	    local_to_unit = true;
> > > +
> > > +	  if (!local_to_unit)
> > >  	    {
> > >  	      scc_hash = hash_scc (ob, first, size, ref_p, this_ref_p);
> > >  
> > > @@ -672,10 +730,44 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
> > >  	      gcc_checking_assert (scc_entry_len == 1);
> > >  	    }
> > >  
> > > -	  /* Write LTO_tree_scc.  */
> > > -	  streamer_write_record_start (ob, LTO_tree_scc);
> > > -	  streamer_write_uhwi (ob, size);
> > > -	  streamer_write_uhwi (ob, scc_hash);
> > > +	  worklist_vec.pop ();
> > > +
> > > +	  /* If we are not streaming global section tree sharing will not
> > > +	     be done, but we must mark block of trees so streamer can
> > > +	     distinguish it from reference being streamed.  */
> > > +	  if (ob->section_type != LTO_section_decls)
> > > +	    {
> > > +	      /* If this is the original tree we stream and it forms SCC
> > > +		 by itself then we do not need to stream SCC at all.  */
> > > +	      if (worklist_vec.is_empty () && first == 0 && size == 1)
> > > +		 return;
> > > +	      streamer_write_record_start (ob, LTO_trees);
> > > +	      streamer_write_uhwi (ob, size);
> > > +	    }
> > > +	  /* Write LTO_tree_scc if sharing is going to be performned.  */
> > > +	  else if (!local_to_unit
> > > +		   /* These are special since sharing is not done by tree
> > > +		      merging machinery.  We can not special case them earlier
> > > +		      because we still need to compute hash for further sharing
> > > +		      of trees referring to them.  */
> > > +		   && (size != 1
> > > +		       || (TREE_CODE (sccstack[first].t) != IDENTIFIER_NODE
> > > +			   && (TREE_CODE (sccstack[first].t) != INTEGER_CST
> > > +			       || TREE_OVERFLOW (sccstack[first].t)))))
> > > +
> > > +	    {
> > > +	      gcc_checking_assert (ob->section_type == LTO_section_decls);
> > > +	      streamer_write_record_start (ob, LTO_tree_scc);
> > > +	      streamer_write_uhwi (ob, size);
> > > +	      streamer_write_uhwi (ob, scc_hash);
> > > +	    }
> > > +	  /* Non-trivial SCCs must be packed to trees blocks so forward
> > > +	     references work correctly.  */
> > > +	  else if (size != 1)
> > > +	    {
> > > +	       streamer_write_record_start (ob, LTO_trees);
> > > +	       streamer_write_uhwi (ob, size);
> > > +	    }
> > >  
> > >  	  /* Write size-1 SCCs without wrapping them inside SCC bundles.
> > >  	     All INTEGER_CSTs need to be handled this way as we need
> > > @@ -722,10 +814,11 @@ DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
> > >  
> > >  	  /* Finally truncate the vector.  */
> > >  	  sccstack.truncate (first);
> > > +	  if ((int)first <= max_local_entry)
> > > +	    max_local_entry = first - 1;
> > >  
> > >  	  if (from_state)
> > >  	    from_state->low = MIN (from_state->low, cstate->low);
> > > -	  worklist_vec.pop ();
> > >  	  continue;
> > >  	}
> > >  
> > > @@ -1569,7 +1662,14 @@ DFS::DFS_write_tree (struct output_block *ob, sccs *from_state,
> > >  
> > >    /* Check if we already streamed EXPR.  */
> > >    if (streamer_tree_cache_lookup (ob->writer_cache, expr, NULL))
> > > -    return;
> > > +    {
> > > +      /* Refernece to a local tree makes entry also local.  We always process
> > > +	 top of stack entry, so set max to number of entries in stack - 1.  */
> > > +      if (ob->local_trees
> > > +	  && ob->local_trees->contains (expr))
> > > +	max_local_entry = sccstack.length () - 1;
> > > +      return;
> > > +    }
> > >  
> > >    worklist w;
> > >    w.expr = expr;
> > > @@ -1641,15 +1741,20 @@ lto_output_tree (struct output_block *ob, tree expr,
> > >        DFS (ob, expr, ref_p, this_ref_p, false);
> > >        in_dfs_walk = false;
> > >  
> > > -      /* Finally append a reference to the tree we were writing.
> > > -	 ???  If expr ended up as a singleton we could have
> > > -	 inlined it here and avoid outputting a reference.  */
> > > +      /* Finally append a reference to the tree we were writing.  */
> > >        existed_p = streamer_tree_cache_lookup (ob->writer_cache, expr, &ix);
> > > -      gcc_assert (existed_p);
> > > -      streamer_write_record_start (ob, LTO_tree_pickle_reference);
> > > -      streamer_write_uhwi (ob, ix);
> > > -      streamer_write_enum (ob->main_stream, LTO_tags, LTO_NUM_TAGS,
> > > -			   lto_tree_code_to_tag (TREE_CODE (expr)));
> > > +
> > > +      /* DFS walk above possibly skipped streaming EXPR itself to let us inline
> > > +	 it.  */
> > > +      if (!existed_p)
> > > +	lto_output_tree_1 (ob, expr, 0, ref_p, this_ref_p);
> > > +      else
> > > +	{
> > > +	  streamer_write_record_start (ob, LTO_tree_pickle_reference);
> > > +	  streamer_write_uhwi (ob, ix);
> > > +	  streamer_write_enum (ob->main_stream, LTO_tags, LTO_NUM_TAGS,
> > > +			       lto_tree_code_to_tag (TREE_CODE (expr)));
> > > +	}
> > >        if (streamer_dump_file)
> > >  	{
> > >  	  print_node_brief (streamer_dump_file, "   Finished SCC of ",
> > > diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> > > index 76aa6fe34b8..a466fb8b329 100644
> > > --- a/gcc/lto-streamer.h
> > > +++ b/gcc/lto-streamer.h
> > > @@ -178,6 +178,9 @@ enum LTO_tags
> > >    /* Special for global streamer.  A blob of unnamed tree nodes.  */
> > >    LTO_tree_scc,
> > >  
> > > +  /* Sequence of trees.  */
> > > +  LTO_trees,
> > > +
> > >    /* References to indexable tree nodes.  These objects are stored in
> > >       tables that are written separately from the function bodies that
> > >       reference them.  This way they can be instantiated even when the
> > > @@ -751,6 +754,9 @@ struct output_block
> > >    /* Cache of nodes written in this section.  */
> > >    struct streamer_tree_cache_d *writer_cache;
> > >  
> > > +  /* All trees identified as local to the unit streamed.  */
> > > +  hash_set<tree> *local_trees;
> > > +
> > >    /* All data persistent across whole duration of output block
> > >       can go here.  */
> > >    struct obstack obstack;
> > > @@ -901,7 +907,7 @@ tree lto_input_tree_ref (class lto_input_block *, class data_in *,
> > >  void lto_tag_check_set (enum LTO_tags, int, ...);
> > >  void lto_init_eh (void);
> > >  hashval_t lto_input_scc (class lto_input_block *, class data_in *,
> > > -			 unsigned *, unsigned *);
> > > +			 unsigned *, unsigned *, bool);
> > >  tree lto_input_tree_1 (class lto_input_block *, class data_in *,
> > >  		       enum LTO_tags, hashval_t hash);
> > >  tree lto_input_tree (class lto_input_block *, class data_in *);
> > > diff --git a/gcc/lto/lto-common.c b/gcc/lto/lto-common.c
> > > index 682a82d61ba..d04b1c9ca7b 100644
> > > --- a/gcc/lto/lto-common.c
> > > +++ b/gcc/lto/lto-common.c
> > > @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
> > >  #include "builtins.h"
> > >  #include "lto-common.h"
> > >  #include "tree-pretty-print.h"
> > > +#include "print-tree.h"
> > >  
> > >  /* True when no new types are going to be streamd from the global stream.  */
> > >  
> > > @@ -1054,6 +1055,7 @@ static unsigned long num_prevailing_types;
> > >  static unsigned long num_type_scc_trees;
> > >  static unsigned long total_scc_size;
> > >  static unsigned long num_sccs_read;
> > > +static unsigned long num_unshared_trees_read;
> > >  static unsigned long total_scc_size_merged;
> > >  static unsigned long num_sccs_merged;
> > >  static unsigned long num_scc_compares;
> > > @@ -1088,6 +1090,10 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
> > >    compare_values (TREE_CODE);
> > >    code = TREE_CODE (t1);
> > >  
> > > +  /* If we end up comparing translation unit decls we either forgot to mark
> > > +     some SCC as local or we compare too much.  */
> > > +  gcc_checking_assert (code != TRANSLATION_UNIT_DECL);
> > > +
> > >    if (!TYPE_P (t1))
> > >      {
> > >        compare_values (TREE_SIDE_EFFECTS);
> > > @@ -1626,6 +1632,28 @@ cmp_tree (const void *p1_, const void *p2_)
> > >    return ((uintptr_t)p1[1] < (uintptr_t)p2[1]) ? -1 : 1;
> > >  }
> > >  
> > > +/* New scc of size 1 containing T was streamed in from DATA_IN and not merged.
> > > +   Register it to reader cache at index FROM.  */
> > > +
> > > +static void
> > > +process_dref (class data_in *data_in, tree t, unsigned from)
> > > +{
> > > +  struct streamer_tree_cache_d *cache = data_in->reader_cache;
> > > +  /* If we got a debug reference queued, see if the prevailing
> > > +     tree has a debug reference and if not, register the one
> > > +     for the tree we are about to throw away.  */
> > > +  if (dref_queue.length () == 1)
> > > +    {
> > > +      dref_entry e = dref_queue.pop ();
> > > +      gcc_assert (e.decl
> > > +		  == streamer_tree_cache_get_tree (cache, from));
> > > +      const char *sym;
> > > +      unsigned HOST_WIDE_INT off;
> > > +      if (!debug_hooks->die_ref_for_decl (t, &sym, &off))
> > > +	debug_hooks->register_external_die (t, e.sym, e.off);
> > > +    }
> > > +}
> > > +
> > >  /* Try to unify the SCC with nodes FROM to FROM + LEN in CACHE and
> > >     hash value SCC_HASH with an already recorded SCC.  Return true if
> > >     that was successful, otherwise return false.  */
> > > @@ -1646,22 +1674,16 @@ unify_scc (class data_in *data_in, unsigned from,
> > >      {
> > >        tree t = streamer_tree_cache_get_tree (cache, from + i);
> > >        scc->entries[i] = t;
> > > -      /* Do not merge SCCs with local entities inside them.  Also do
> > > -	 not merge TRANSLATION_UNIT_DECLs and anonymous namespaces
> > > -	 and types therein types.  */
> > > -      if (TREE_CODE (t) == TRANSLATION_UNIT_DECL
> > > -	  || (VAR_OR_FUNCTION_DECL_P (t)
> > > -	      && !(TREE_PUBLIC (t) || DECL_EXTERNAL (t)))
> > > -	  || TREE_CODE (t) == LABEL_DECL
> > > -	  || (TREE_CODE (t) == NAMESPACE_DECL && !DECL_NAME (t))
> > > -	  || (TYPE_P (t)
> > > -	      && type_with_linkage_p (TYPE_MAIN_VARIANT (t))
> > > -	      && type_in_anonymous_namespace_p (TYPE_MAIN_VARIANT (t))))
> > > -	{
> > > -	  /* Avoid doing any work for these cases and do not worry to
> > > -	     record the SCCs for further merging.  */
> > > -	  return false;
> > > -	}
> > > +      /* These types should be streamed as unshared.  */
> > > +      gcc_checking_assert
> > > +	 (!(TREE_CODE (t) == TRANSLATION_UNIT_DECL
> > > +	    || (VAR_OR_FUNCTION_DECL_P (t)
> > > +		&& !(TREE_PUBLIC (t) || DECL_EXTERNAL (t)))
> > > +	    || TREE_CODE (t) == LABEL_DECL
> > > +	    || (TREE_CODE (t) == NAMESPACE_DECL && !DECL_NAME (t))
> > > +	    || (TYPE_P (t)
> > > +		&& type_with_linkage_p (TYPE_MAIN_VARIANT (t))
> > > +		&& type_in_anonymous_namespace_p (TYPE_MAIN_VARIANT (t)))));
> > >      }
> > >  
> > >    /* Look for the list of candidate SCCs to compare against.  */
> > > @@ -1712,21 +1734,7 @@ unify_scc (class data_in *data_in, unsigned from,
> > >  	     to the tree node mapping computed by compare_tree_sccs.  */
> > >  	  if (len == 1)
> > >  	    {
> > > -	      /* If we got a debug reference queued, see if the prevailing
> > > -		 tree has a debug reference and if not, register the one
> > > -		 for the tree we are about to throw away.  */
> > > -	      if (dref_queue.length () == 1)
> > > -		{
> > > -		  dref_entry e = dref_queue.pop ();
> > > -		  gcc_assert (e.decl
> > > -			      == streamer_tree_cache_get_tree (cache, from));
> > > -		  const char *sym;
> > > -		  unsigned HOST_WIDE_INT off;
> > > -		  if (!debug_hooks->die_ref_for_decl (pscc->entries[0], &sym,
> > > -						      &off))
> > > -		    debug_hooks->register_external_die (pscc->entries[0],
> > > -							e.sym, e.off);
> > > -		}
> > > +	      process_dref (data_in, pscc->entries[0], from);
> > >  	      lto_maybe_register_decl (data_in, pscc->entries[0], from);
> > >  	      streamer_tree_cache_replace_tree (cache, pscc->entries[0], from);
> > >  	    }
> > > @@ -1785,7 +1793,65 @@ unify_scc (class data_in *data_in, unsigned from,
> > >    return unified_p;
> > >  }
> > >  
> > > +typedef int_hash<unsigned, 0, UINT_MAX> code_id_hash;
> > > +
> > > +/* Do registering necessary once new tree fully streamed in (including all
> > > +   trees it reffers to).  */
> > > +
> > > +static void
> > > +process_new_tree (tree t, hash_map <code_id_hash, unsigned> *hm,
> > > +		  unsigned index, unsigned *total, class data_in *data_in)
> > > +{
> > > +  /* Reconstruct the type variant and pointer-to/reference-to
> > > +     chains.  */
> > > +  if (TYPE_P (t))
> > > +    {
> > > +      /* Map the tree types to their frequencies.  */
> > > +      if (flag_lto_dump_type_stats)
> > > +	{
> > > +	  unsigned key = (unsigned) TREE_CODE (t);
> > > +	  unsigned *countp = hm->get (key);
> > > +	  hm->put (key, countp ? (*countp) + 1 : 1);
> > > +	  (*total)++;
> > > +	}
> > > +
> > > +      num_prevailing_types++;
> > > +      lto_fixup_prevailing_type (t);
> > >  
> > > +      /* Compute the canonical type of all non-ODR types.
> > > +	 Delay ODR types for the end of merging process - the canonical
> > > +	 type for those can be computed using the (unique) name however
> > > +	 we want to do this only if units in other languages do not
> > > +	 contain structurally equivalent type.
> > > +
> > > +	 Because SCC components are streamed in random (hash) order
> > > +	 we may have encountered the type before while registering
> > > +	 type canonical of a derived type in the same SCC.  */
> > > +      if (!TYPE_CANONICAL (t))
> > > +	{
> > > +	  if (!RECORD_OR_UNION_TYPE_P (t)
> > > +	      || !TYPE_CXX_ODR_P (t))
> > > +	    gimple_register_canonical_type (t);
> > > +	  else if (COMPLETE_TYPE_P (t))
> > > +	    vec_safe_push (types_to_register, t);
> > > +	}
> > > +      if (TYPE_MAIN_VARIANT (t) == t && odr_type_p (t))
> > > +	register_odr_type (t);
> > > +    }
> > > +  /* Link shared INTEGER_CSTs into TYPE_CACHED_VALUEs of its
> > > +     type which is also member of this SCC.  */
> > > +  if (TREE_CODE (t) == INTEGER_CST
> > > +      && !TREE_OVERFLOW (t))
> > > +    cache_integer_cst (t);
> > > +  if (!flag_ltrans)
> > > +    {
> > > +      lto_maybe_register_decl (data_in, t, index);
> > > +      /* Scan the tree for references to global functions or
> > > +	 variables and record those for later fixup.  */
> > > +      if (mentions_vars_p (t))
> > > +	vec_safe_push (tree_with_vars, t);
> > > +    }
> > > +}
> > >  
> > >  /* Read all the symbols from buffer DATA, using descriptors in DECL_DATA.
> > >     RESOLUTIONS is the set of symbols picked by the linker (read from the
> > > @@ -1813,7 +1879,6 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
> > >    /* We do not uniquify the pre-loaded cache entries, those are middle-end
> > >       internal types that should not be merged.  */
> > >  
> > > -  typedef int_hash<unsigned, 0, UINT_MAX> code_id_hash;
> > >    hash_map <code_id_hash, unsigned> hm;
> > >    unsigned total = 0;
> > >  
> > > @@ -1824,31 +1889,41 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
> > >        unsigned from = data_in->reader_cache->nodes.length ();
> > >        /* Read and uniquify SCCs as in the input stream.  */
> > >        enum LTO_tags tag = streamer_read_record_start (&ib_main);
> > > -      if (tag == LTO_tree_scc)
> > > +      if (tag == LTO_tree_scc || tag == LTO_trees)
> > >  	{
> > >  	  unsigned len_;
> > >  	  unsigned scc_entry_len;
> > > +
> > > +	  /* Because we stream in SCC order we know that all unshared trees
> > > +	     are now fully streamed.  Process them.  */
> > >  	  hashval_t scc_hash = lto_input_scc (&ib_main, data_in, &len_,
> > > -					      &scc_entry_len);
> > > +					      &scc_entry_len,
> > > +					      tag == LTO_tree_scc);
> > >  	  unsigned len = data_in->reader_cache->nodes.length () - from;
> > >  	  gcc_assert (len == len_);
> > >  
> > > -	  total_scc_size += len;
> > > -	  num_sccs_read++;
> > > +	  if (tag == LTO_tree_scc)
> > > +	    {
> > > +	      total_scc_size += len;
> > > +	      num_sccs_read++;
> > > +	    }
> > > +	  else
> > > +	    num_unshared_trees_read += len;
> > >  
> > >  	  /* We have the special case of size-1 SCCs that are pre-merged
> > >  	     by means of identifier and string sharing for example.
> > >  	     ???  Maybe we should avoid streaming those as SCCs.  */
> > >  	  tree first = streamer_tree_cache_get_tree (data_in->reader_cache,
> > >  						     from);
> > > -	  if (len == 1
> > > -	      && (TREE_CODE (first) == IDENTIFIER_NODE
> > > -		  || (TREE_CODE (first) == INTEGER_CST
> > > -		      && !TREE_OVERFLOW (first))))
> > > -	    continue;
> > > +	  /* Identifier and integers are shared specially, they should never
> > > +	     go by the tree merging path.  */
> > > +	  gcc_checking_assert ((TREE_CODE (first) != IDENTIFIER_NODE
> > > +				&& (TREE_CODE (first) != INTEGER_CST
> > > +				    || TREE_OVERFLOW (first)))
> > > +			       || len != 1);
> > >  
> > >  	  /* Try to unify the SCC with already existing ones.  */
> > > -	  if (!flag_ltrans
> > > +	  if (!flag_ltrans && tag != LTO_trees
> > >  	      && unify_scc (data_in, from,
> > >  			    len, scc_entry_len, scc_hash))
> > >  	    continue;
> > > @@ -1862,56 +1937,9 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
> > >  	    {
> > >  	      tree t = streamer_tree_cache_get_tree (data_in->reader_cache,
> > >  						     from + i);
> > > -	      /* Reconstruct the type variant and pointer-to/reference-to
> > > -		 chains.  */
> > > +	      process_new_tree (t, &hm, from + i, &total, data_in);
> > >  	      if (TYPE_P (t))
> > > -		{
> > > -		  /* Map the tree types to their frequencies.  */
> > > -		  if (flag_lto_dump_type_stats)
> > > -		    {
> > > -		      unsigned key = (unsigned) TREE_CODE (t);
> > > -		      unsigned *countp = hm.get (key);
> > > -		      hm.put (key, countp ? (*countp) + 1 : 1);
> > > -		      total++;
> > > -		    }
> > > -
> > > -		  seen_type = true;
> > > -		  num_prevailing_types++;
> > > -		  lto_fixup_prevailing_type (t);
> > > -
> > > -		  /* Compute the canonical type of all non-ODR types.
> > > -		     Delay ODR types for the end of merging process - the canonical
> > > -		     type for those can be computed using the (unique) name however
> > > -		     we want to do this only if units in other languages do not
> > > -		     contain structurally equivalent type.
> > > -
> > > -		     Because SCC components are streamed in random (hash) order
> > > -		     we may have encountered the type before while registering
> > > -		     type canonical of a derived type in the same SCC.  */
> > > -		  if (!TYPE_CANONICAL (t))
> > > -		    {
> > > -		      if (!RECORD_OR_UNION_TYPE_P (t)
> > > -			  || !TYPE_CXX_ODR_P (t))
> > > -		        gimple_register_canonical_type (t);
> > > -		      else if (COMPLETE_TYPE_P (t))
> > > -			vec_safe_push (types_to_register, t);
> > > -		    }
> > > -		  if (TYPE_MAIN_VARIANT (t) == t && odr_type_p (t))
> > > -		    register_odr_type (t);
> > > -		}
> > > -	      /* Link shared INTEGER_CSTs into TYPE_CACHED_VALUEs of its
> > > -		 type which is also member of this SCC.  */
> > > -	      if (TREE_CODE (t) == INTEGER_CST
> > > -		  && !TREE_OVERFLOW (t))
> > > -		cache_integer_cst (t);
> > > -	      if (!flag_ltrans)
> > > -		{
> > > -		  lto_maybe_register_decl (data_in, t, from + i);
> > > -		  /* Scan the tree for references to global functions or
> > > -		     variables and record those for later fixup.  */
> > > -		  if (mentions_vars_p (t))
> > > -		    vec_safe_push (tree_with_vars, t);
> > > -		}
> > > +		seen_type = true;
> > >  	    }
> > >  
> > >  	  /* Register DECLs with the debuginfo machinery.  */
> > > @@ -1926,9 +1954,26 @@ lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
> > >  	}
> > >        else
> > >  	{
> > > -	  /* Pickle stray references.  */
> > >  	  t = lto_input_tree_1 (&ib_main, data_in, tag, 0);
> > > -	  gcc_assert (t && data_in->reader_cache->nodes.length () == from);
> > > +	  /* We streamed in new tree.  Add it to cache and process dref.  */
> > > +	  if (data_in->reader_cache->nodes.length () == from + 1)
> > > +	    {
> > > +	      num_unshared_trees_read++;
> > > +	      data_in->location_cache.accept_location_cache ();
> > > +	      process_dref (data_in, t, from);
> > > +	      if (TREE_CODE (t) == IDENTIFIER_NODE
> > > +		  || (TREE_CODE (t) == INTEGER_CST
> > > +		      && !TREE_OVERFLOW (t)))
> > > +		;
> > > +	      else
> > > +		{
> > > +		  lto_maybe_register_decl (data_in, t, from);
> > > +		  process_new_tree (t, &hm, from, &total, data_in);
> > > +		}
> > > +	    }
> > > +	  else
> > > +	    /* FIXME: It seems useless to pickle stray references.  */
> > > +	    gcc_assert (data_in->reader_cache->nodes.length () == from);
> > >  	}
> > >      }
> > >  
> > > @@ -2953,10 +2998,13 @@ print_lto_report_1 (void)
> > >    const char *pfx = (flag_lto) ? "LTO" : (flag_wpa) ? "WPA" : "LTRANS";
> > >    fprintf (stderr, "%s statistics\n", pfx);
> > >  
> > > -  fprintf (stderr, "[%s] read %lu SCCs of average size %f\n",
> > > +  fprintf (stderr, "[%s] read %lu unshared trees\n",
> > > +	   pfx, num_unshared_trees_read);
> > > +  fprintf (stderr, "[%s] read %lu mergeable SCCs of average size %f\n",
> > >  	   pfx, num_sccs_read, total_scc_size / (double)num_sccs_read);
> > > -  fprintf (stderr, "[%s] %lu tree bodies read in total\n", pfx, total_scc_size);
> > > -  if (flag_wpa && tree_scc_hash)
> > > +  fprintf (stderr, "[%s] %lu tree bodies read in total\n", pfx,
> > > +	   total_scc_size + num_unshared_trees_read);
> > > +  if (flag_wpa && tree_scc_hash && num_sccs_read)
> > >      {
> > >        fprintf (stderr, "[%s] tree SCC table: size %ld, %ld elements, "
> > >  	       "collision ratio: %f\n", pfx,
> > > 
> > 
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 
>
Martin Liška May 20, 2020, 12:43 p.m. UTC | #6
On 5/20/20 1:49 PM, Jan Hubicka wrote:
> On related note, I remember us discussing that std compression has
> problem with bigger headers then zlib.  Since we stream our header that
> says if section is compressed, I wonder if we could teach stream-out
> phase to skip compression if it is not benefical, so the size info is
> not duplicated?

I can take a look at collect some statistics.

Martin
Martin Liška May 21, 2020, 11:58 a.m. UTC | #7
On 5/20/20 1:49 PM, Jan Hubicka wrote:
> On related note, I remember us discussing that std compression has
> problem with bigger headers then zlib.  Since we stream our header that
> says if section is compressed, I wonder if we could teach stream-out
> phase to skip compression if it is not benefical, so the size info is
> not duplicated?

Hello.

I've collected statistics for that on tramp3d (with -O2 -flto) and I see:

sections: 3041, not beneficial: 16
before: 13804315, after: 4753009
can save: -128 (-0.00%)

So there are only 16 sections that are bigger when compressed with ZSTD
(including zstd header size). The benefit is negligible.

Martin
diff mbox series

Patch

diff --git a/gcc/lto-streamer-in.c b/gcc/lto-streamer-in.c
index 244f5b8aa5c..6d571c4344a 100644
--- a/gcc/lto-streamer-in.c
+++ b/gcc/lto-streamer-in.c
@@ -1424,16 +1424,17 @@  lto_read_tree (class lto_input_block *ib, class data_in *data_in,
 
 
 /* Populate the reader cache with trees materialized from the SCC
-   following in the IB, DATA_IN stream.  */
+   following in the IB, DATA_IN stream.
+   If SHARED_SCC is true we input LTO_tree_scc.  */
 
 hashval_t
 lto_input_scc (class lto_input_block *ib, class data_in *data_in,
-	       unsigned *len, unsigned *entry_len)
+	       unsigned *len, unsigned *entry_len, bool shared_scc)
 {
   /* A blob of unnamed tree nodes, fill the cache from it and
      recurse.  */
   unsigned size = streamer_read_uhwi (ib);
-  hashval_t scc_hash = streamer_read_uhwi (ib);
+  hashval_t scc_hash = shared_scc ? streamer_read_uhwi (ib) : 0;
   unsigned scc_entry_len = 1;
 
   if (size == 1)
@@ -1456,7 +1457,8 @@  lto_input_scc (class lto_input_block *ib, class data_in *data_in,
 	      || (tag >= LTO_field_decl_ref && tag <= LTO_global_decl_ref)
 	      || tag == LTO_tree_pickle_reference
 	      || tag == LTO_integer_cst
-	      || tag == LTO_tree_scc)
+	      || tag == LTO_tree_scc
+	      || tag == LTO_trees)
 	    gcc_unreachable ();
 
 	  result = streamer_alloc_tree (ib, data_in, tag);
@@ -1522,7 +1524,7 @@  lto_input_tree_1 (class lto_input_block *ib, class data_in *data_in,
 				 (a, len, TYPE_PRECISION (type)));
       streamer_tree_cache_append (data_in->reader_cache, result, hash);
     }
-  else if (tag == LTO_tree_scc)
+  else if (tag == LTO_tree_scc || tag == LTO_trees)
     gcc_unreachable ();
   else
     {
@@ -1538,11 +1540,11 @@  lto_input_tree (class lto_input_block *ib, class data_in *data_in)
 {
   enum LTO_tags tag;
 
-  /* Input and skip SCCs.  */
-  while ((tag = streamer_read_record_start (ib)) == LTO_tree_scc)
+  /* Input pickled trees needed to stream in the refrence.  */
+  while ((tag = streamer_read_record_start (ib)) == LTO_trees)
     {
       unsigned len, entry_len;
-      lto_input_scc (ib, data_in, &len, &entry_len);
+      lto_input_scc (ib, data_in, &len, &entry_len, false);
 
       /* Register DECLs with the debuginfo machinery.  */
       while (!dref_queue.is_empty ())
@@ -1551,7 +1553,15 @@  lto_input_tree (class lto_input_block *ib, class data_in *data_in)
 	  debug_hooks->register_external_die (e.decl, e.sym, e.off);
 	}
     }
-  return lto_input_tree_1 (ib, data_in, tag, 0);
+  tree t = lto_input_tree_1 (ib, data_in, tag, 0);
+
+  if (!dref_queue.is_empty ())
+    {
+      dref_entry e = dref_queue.pop ();
+      debug_hooks->register_external_die (e.decl, e.sym, e.off);
+      gcc_checking_assert (dref_queue.is_empty ());
+    }
+  return t;
 }
 
 
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index 3d94324881f..1c5b8f42c9b 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -46,6 +46,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-dfa.h"
 #include "file-prefix-map.h" /* remap_debug_filename()  */
 #include "output.h"
+#include "ipa-utils.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -75,6 +76,10 @@  create_output_block (enum lto_section_type section_type)
 
   ob->section_type = section_type;
   ob->decl_state = lto_get_out_decl_state ();
+  /* Only global decl stream in non-wpa will ever be considered by tree
+     merging.  */
+  if (!flag_wpa && section_type == LTO_section_decls)
+    ob->local_trees = new (hash_set <tree>);
   ob->main_stream = XCNEW (struct lto_output_stream);
   ob->string_stream = XCNEW (struct lto_output_stream);
   ob->writer_cache = streamer_tree_cache_create (!flag_wpa, true, false);
@@ -100,6 +105,7 @@  destroy_output_block (struct output_block *ob)
 
   delete ob->string_hash_table;
   ob->string_hash_table = NULL;
+  delete ob->local_trees;
 
   free (ob->main_stream);
   free (ob->string_stream);
@@ -532,6 +538,8 @@  private:
     bool ref_p;
     bool this_ref_p;
   };
+  /* Maximum index of scc stack containing a local tree.  */
+  int max_local_entry;
 
   static int scc_entry_compare (const void *, const void *);
 
@@ -550,6 +558,33 @@  private:
   struct obstack sccstate_obstack;
 };
 
+/* Return true if type can be merged.
+   TRANSLATION_UNIT_DECL is handled specially since references to it does
+   not make other trees local as well.  */
+
+static bool
+local_tree_p (tree t)
+{
+  switch (TREE_CODE (t))
+    {
+    case LABEL_DECL:
+      return true;
+    case NAMESPACE_DECL:
+      return !DECL_NAME (t);
+    case VAR_DECL:
+    case FUNCTION_DECL:
+      return !TREE_PUBLIC (t) && !DECL_EXTERNAL (t);
+    case RECORD_TYPE:
+    case UNION_TYPE:
+    case ENUMERAL_TYPE:
+      return TYPE_MAIN_VARIANT (t) == t
+	     && odr_type_p (t) && type_with_linkage_p (t)
+	     && type_in_anonymous_namespace_p (t);
+    default:
+      return false;
+    }
+}
+
 /* Emit the physical representation of tree node EXPR to output block OB,
    using depth-first search on the subgraph.  If THIS_REF_P is true, the
    leaves of EXPR are emitted as references via lto_output_tree_ref.
@@ -560,6 +595,8 @@  DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
 	  bool single_p)
 {
   unsigned int next_dfs_num = 1;
+
+  max_local_entry = -1;
   gcc_obstack_init (&sccstate_obstack);
   DFS_write_tree (ob, NULL, expr, ref_p, this_ref_p);
   while (!worklist_vec.is_empty ())
@@ -586,6 +623,8 @@  DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
 	  scc_entry e = { expr, 0 };
 	  /* Not yet visited.  DFS recurse and push it onto the stack.  */
 	  *slot = cstate = XOBNEW (&sccstate_obstack, struct sccs);
+	  if (ob->local_trees && local_tree_p (expr))
+	    max_local_entry = sccstack.length ();
 	  sccstack.safe_push (e);
 	  cstate->dfsnum = next_dfs_num++;
 	  cstate->low = cstate->dfsnum;
@@ -640,7 +679,26 @@  DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
 	     any merging there.  */
 	  hashval_t scc_hash = 0;
 	  unsigned scc_entry_len = 0;
-	  if (!flag_wpa)
+	  bool local_to_unit = !ob->local_trees
+			       || max_local_entry >= (int)first;
+
+	  /* Remember that trees are local so info gets propagated to other
+	     SCCs.  */
+	  if (local_to_unit && ob->local_trees)
+	    {
+	      for (unsigned i = 0; i < size; ++i)
+		ob->local_trees->add (sccstack[first + i].t);
+	    }
+
+	  /* As a special case do not stream TRANSLATION_UNIT_DECL as shared
+	     tree.  We can not mark it local because references to it does not
+	     make other trees local (all global decls reffer to it via
+	     CONTEXT).  */
+	  if (size == 1
+	      && TREE_CODE (sccstack[first].t) == TRANSLATION_UNIT_DECL)
+	    local_to_unit = true;
+
+	  if (!local_to_unit)
 	    {
 	      scc_hash = hash_scc (ob, first, size, ref_p, this_ref_p);
 
@@ -672,10 +730,44 @@  DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
 	      gcc_checking_assert (scc_entry_len == 1);
 	    }
 
-	  /* Write LTO_tree_scc.  */
-	  streamer_write_record_start (ob, LTO_tree_scc);
-	  streamer_write_uhwi (ob, size);
-	  streamer_write_uhwi (ob, scc_hash);
+	  worklist_vec.pop ();
+
+	  /* If we are not streaming global section tree sharing will not
+	     be done, but we must mark block of trees so streamer can
+	     distinguish it from reference being streamed.  */
+	  if (ob->section_type != LTO_section_decls)
+	    {
+	      /* If this is the original tree we stream and it forms SCC
+		 by itself then we do not need to stream SCC at all.  */
+	      if (worklist_vec.is_empty () && first == 0 && size == 1)
+		 return;
+	      streamer_write_record_start (ob, LTO_trees);
+	      streamer_write_uhwi (ob, size);
+	    }
+	  /* Write LTO_tree_scc if sharing is going to be performned.  */
+	  else if (!local_to_unit
+		   /* These are special since sharing is not done by tree
+		      merging machinery.  We can not special case them earlier
+		      because we still need to compute hash for further sharing
+		      of trees referring to them.  */
+		   && (size != 1
+		       || (TREE_CODE (sccstack[first].t) != IDENTIFIER_NODE
+			   && (TREE_CODE (sccstack[first].t) != INTEGER_CST
+			       || TREE_OVERFLOW (sccstack[first].t)))))
+
+	    {
+	      gcc_checking_assert (ob->section_type == LTO_section_decls);
+	      streamer_write_record_start (ob, LTO_tree_scc);
+	      streamer_write_uhwi (ob, size);
+	      streamer_write_uhwi (ob, scc_hash);
+	    }
+	  /* Non-trivial SCCs must be packed to trees blocks so forward
+	     references work correctly.  */
+	  else if (size != 1)
+	    {
+	       streamer_write_record_start (ob, LTO_trees);
+	       streamer_write_uhwi (ob, size);
+	    }
 
 	  /* Write size-1 SCCs without wrapping them inside SCC bundles.
 	     All INTEGER_CSTs need to be handled this way as we need
@@ -722,10 +814,11 @@  DFS::DFS (struct output_block *ob, tree expr, bool ref_p, bool this_ref_p,
 
 	  /* Finally truncate the vector.  */
 	  sccstack.truncate (first);
+	  if ((int)first <= max_local_entry)
+	    max_local_entry = first - 1;
 
 	  if (from_state)
 	    from_state->low = MIN (from_state->low, cstate->low);
-	  worklist_vec.pop ();
 	  continue;
 	}
 
@@ -1569,7 +1662,14 @@  DFS::DFS_write_tree (struct output_block *ob, sccs *from_state,
 
   /* Check if we already streamed EXPR.  */
   if (streamer_tree_cache_lookup (ob->writer_cache, expr, NULL))
-    return;
+    {
+      /* Refernece to a local tree makes entry also local.  We always process
+	 top of stack entry, so set max to number of entries in stack - 1.  */
+      if (ob->local_trees
+	  && ob->local_trees->contains (expr))
+	max_local_entry = sccstack.length () - 1;
+      return;
+    }
 
   worklist w;
   w.expr = expr;
@@ -1641,15 +1741,20 @@  lto_output_tree (struct output_block *ob, tree expr,
       DFS (ob, expr, ref_p, this_ref_p, false);
       in_dfs_walk = false;
 
-      /* Finally append a reference to the tree we were writing.
-	 ???  If expr ended up as a singleton we could have
-	 inlined it here and avoid outputting a reference.  */
+      /* Finally append a reference to the tree we were writing.  */
       existed_p = streamer_tree_cache_lookup (ob->writer_cache, expr, &ix);
-      gcc_assert (existed_p);
-      streamer_write_record_start (ob, LTO_tree_pickle_reference);
-      streamer_write_uhwi (ob, ix);
-      streamer_write_enum (ob->main_stream, LTO_tags, LTO_NUM_TAGS,
-			   lto_tree_code_to_tag (TREE_CODE (expr)));
+
+      /* DFS walk above possibly skipped streaming EXPR itself to let us inline
+	 it.  */
+      if (!existed_p)
+	lto_output_tree_1 (ob, expr, 0, ref_p, this_ref_p);
+      else
+	{
+	  streamer_write_record_start (ob, LTO_tree_pickle_reference);
+	  streamer_write_uhwi (ob, ix);
+	  streamer_write_enum (ob->main_stream, LTO_tags, LTO_NUM_TAGS,
+			       lto_tree_code_to_tag (TREE_CODE (expr)));
+	}
       if (streamer_dump_file)
 	{
 	  print_node_brief (streamer_dump_file, "   Finished SCC of ",
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 76aa6fe34b8..a466fb8b329 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -178,6 +178,9 @@  enum LTO_tags
   /* Special for global streamer.  A blob of unnamed tree nodes.  */
   LTO_tree_scc,
 
+  /* Sequence of trees.  */
+  LTO_trees,
+
   /* References to indexable tree nodes.  These objects are stored in
      tables that are written separately from the function bodies that
      reference them.  This way they can be instantiated even when the
@@ -751,6 +754,9 @@  struct output_block
   /* Cache of nodes written in this section.  */
   struct streamer_tree_cache_d *writer_cache;
 
+  /* All trees identified as local to the unit streamed.  */
+  hash_set<tree> *local_trees;
+
   /* All data persistent across whole duration of output block
      can go here.  */
   struct obstack obstack;
@@ -901,7 +907,7 @@  tree lto_input_tree_ref (class lto_input_block *, class data_in *,
 void lto_tag_check_set (enum LTO_tags, int, ...);
 void lto_init_eh (void);
 hashval_t lto_input_scc (class lto_input_block *, class data_in *,
-			 unsigned *, unsigned *);
+			 unsigned *, unsigned *, bool);
 tree lto_input_tree_1 (class lto_input_block *, class data_in *,
 		       enum LTO_tags, hashval_t hash);
 tree lto_input_tree (class lto_input_block *, class data_in *);
diff --git a/gcc/lto/lto-common.c b/gcc/lto/lto-common.c
index 682a82d61ba..d04b1c9ca7b 100644
--- a/gcc/lto/lto-common.c
+++ b/gcc/lto/lto-common.c
@@ -56,6 +56,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "lto-common.h"
 #include "tree-pretty-print.h"
+#include "print-tree.h"
 
 /* True when no new types are going to be streamd from the global stream.  */
 
@@ -1054,6 +1055,7 @@  static unsigned long num_prevailing_types;
 static unsigned long num_type_scc_trees;
 static unsigned long total_scc_size;
 static unsigned long num_sccs_read;
+static unsigned long num_unshared_trees_read;
 static unsigned long total_scc_size_merged;
 static unsigned long num_sccs_merged;
 static unsigned long num_scc_compares;
@@ -1088,6 +1090,10 @@  compare_tree_sccs_1 (tree t1, tree t2, tree **map)
   compare_values (TREE_CODE);
   code = TREE_CODE (t1);
 
+  /* If we end up comparing translation unit decls we either forgot to mark
+     some SCC as local or we compare too much.  */
+  gcc_checking_assert (code != TRANSLATION_UNIT_DECL);
+
   if (!TYPE_P (t1))
     {
       compare_values (TREE_SIDE_EFFECTS);
@@ -1626,6 +1632,28 @@  cmp_tree (const void *p1_, const void *p2_)
   return ((uintptr_t)p1[1] < (uintptr_t)p2[1]) ? -1 : 1;
 }
 
+/* New scc of size 1 containing T was streamed in from DATA_IN and not merged.
+   Register it to reader cache at index FROM.  */
+
+static void
+process_dref (class data_in *data_in, tree t, unsigned from)
+{
+  struct streamer_tree_cache_d *cache = data_in->reader_cache;
+  /* If we got a debug reference queued, see if the prevailing
+     tree has a debug reference and if not, register the one
+     for the tree we are about to throw away.  */
+  if (dref_queue.length () == 1)
+    {
+      dref_entry e = dref_queue.pop ();
+      gcc_assert (e.decl
+		  == streamer_tree_cache_get_tree (cache, from));
+      const char *sym;
+      unsigned HOST_WIDE_INT off;
+      if (!debug_hooks->die_ref_for_decl (t, &sym, &off))
+	debug_hooks->register_external_die (t, e.sym, e.off);
+    }
+}
+
 /* Try to unify the SCC with nodes FROM to FROM + LEN in CACHE and
    hash value SCC_HASH with an already recorded SCC.  Return true if
    that was successful, otherwise return false.  */
@@ -1646,22 +1674,16 @@  unify_scc (class data_in *data_in, unsigned from,
     {
       tree t = streamer_tree_cache_get_tree (cache, from + i);
       scc->entries[i] = t;
-      /* Do not merge SCCs with local entities inside them.  Also do
-	 not merge TRANSLATION_UNIT_DECLs and anonymous namespaces
-	 and types therein types.  */
-      if (TREE_CODE (t) == TRANSLATION_UNIT_DECL
-	  || (VAR_OR_FUNCTION_DECL_P (t)
-	      && !(TREE_PUBLIC (t) || DECL_EXTERNAL (t)))
-	  || TREE_CODE (t) == LABEL_DECL
-	  || (TREE_CODE (t) == NAMESPACE_DECL && !DECL_NAME (t))
-	  || (TYPE_P (t)
-	      && type_with_linkage_p (TYPE_MAIN_VARIANT (t))
-	      && type_in_anonymous_namespace_p (TYPE_MAIN_VARIANT (t))))
-	{
-	  /* Avoid doing any work for these cases and do not worry to
-	     record the SCCs for further merging.  */
-	  return false;
-	}
+      /* These types should be streamed as unshared.  */
+      gcc_checking_assert
+	 (!(TREE_CODE (t) == TRANSLATION_UNIT_DECL
+	    || (VAR_OR_FUNCTION_DECL_P (t)
+		&& !(TREE_PUBLIC (t) || DECL_EXTERNAL (t)))
+	    || TREE_CODE (t) == LABEL_DECL
+	    || (TREE_CODE (t) == NAMESPACE_DECL && !DECL_NAME (t))
+	    || (TYPE_P (t)
+		&& type_with_linkage_p (TYPE_MAIN_VARIANT (t))
+		&& type_in_anonymous_namespace_p (TYPE_MAIN_VARIANT (t)))));
     }
 
   /* Look for the list of candidate SCCs to compare against.  */
@@ -1712,21 +1734,7 @@  unify_scc (class data_in *data_in, unsigned from,
 	     to the tree node mapping computed by compare_tree_sccs.  */
 	  if (len == 1)
 	    {
-	      /* If we got a debug reference queued, see if the prevailing
-		 tree has a debug reference and if not, register the one
-		 for the tree we are about to throw away.  */
-	      if (dref_queue.length () == 1)
-		{
-		  dref_entry e = dref_queue.pop ();
-		  gcc_assert (e.decl
-			      == streamer_tree_cache_get_tree (cache, from));
-		  const char *sym;
-		  unsigned HOST_WIDE_INT off;
-		  if (!debug_hooks->die_ref_for_decl (pscc->entries[0], &sym,
-						      &off))
-		    debug_hooks->register_external_die (pscc->entries[0],
-							e.sym, e.off);
-		}
+	      process_dref (data_in, pscc->entries[0], from);
 	      lto_maybe_register_decl (data_in, pscc->entries[0], from);
 	      streamer_tree_cache_replace_tree (cache, pscc->entries[0], from);
 	    }
@@ -1785,7 +1793,65 @@  unify_scc (class data_in *data_in, unsigned from,
   return unified_p;
 }
 
+typedef int_hash<unsigned, 0, UINT_MAX> code_id_hash;
+
+/* Do registering necessary once new tree fully streamed in (including all
+   trees it reffers to).  */
+
+static void
+process_new_tree (tree t, hash_map <code_id_hash, unsigned> *hm,
+		  unsigned index, unsigned *total, class data_in *data_in)
+{
+  /* Reconstruct the type variant and pointer-to/reference-to
+     chains.  */
+  if (TYPE_P (t))
+    {
+      /* Map the tree types to their frequencies.  */
+      if (flag_lto_dump_type_stats)
+	{
+	  unsigned key = (unsigned) TREE_CODE (t);
+	  unsigned *countp = hm->get (key);
+	  hm->put (key, countp ? (*countp) + 1 : 1);
+	  (*total)++;
+	}
+
+      num_prevailing_types++;
+      lto_fixup_prevailing_type (t);
 
+      /* Compute the canonical type of all non-ODR types.
+	 Delay ODR types for the end of merging process - the canonical
+	 type for those can be computed using the (unique) name however
+	 we want to do this only if units in other languages do not
+	 contain structurally equivalent type.
+
+	 Because SCC components are streamed in random (hash) order
+	 we may have encountered the type before while registering
+	 type canonical of a derived type in the same SCC.  */
+      if (!TYPE_CANONICAL (t))
+	{
+	  if (!RECORD_OR_UNION_TYPE_P (t)
+	      || !TYPE_CXX_ODR_P (t))
+	    gimple_register_canonical_type (t);
+	  else if (COMPLETE_TYPE_P (t))
+	    vec_safe_push (types_to_register, t);
+	}
+      if (TYPE_MAIN_VARIANT (t) == t && odr_type_p (t))
+	register_odr_type (t);
+    }
+  /* Link shared INTEGER_CSTs into TYPE_CACHED_VALUEs of its
+     type which is also member of this SCC.  */
+  if (TREE_CODE (t) == INTEGER_CST
+      && !TREE_OVERFLOW (t))
+    cache_integer_cst (t);
+  if (!flag_ltrans)
+    {
+      lto_maybe_register_decl (data_in, t, index);
+      /* Scan the tree for references to global functions or
+	 variables and record those for later fixup.  */
+      if (mentions_vars_p (t))
+	vec_safe_push (tree_with_vars, t);
+    }
+}
 
 /* Read all the symbols from buffer DATA, using descriptors in DECL_DATA.
    RESOLUTIONS is the set of symbols picked by the linker (read from the
@@ -1813,7 +1879,6 @@  lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
   /* We do not uniquify the pre-loaded cache entries, those are middle-end
      internal types that should not be merged.  */
 
-  typedef int_hash<unsigned, 0, UINT_MAX> code_id_hash;
   hash_map <code_id_hash, unsigned> hm;
   unsigned total = 0;
 
@@ -1824,31 +1889,41 @@  lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
       unsigned from = data_in->reader_cache->nodes.length ();
       /* Read and uniquify SCCs as in the input stream.  */
       enum LTO_tags tag = streamer_read_record_start (&ib_main);
-      if (tag == LTO_tree_scc)
+      if (tag == LTO_tree_scc || tag == LTO_trees)
 	{
 	  unsigned len_;
 	  unsigned scc_entry_len;
+
+	  /* Because we stream in SCC order we know that all unshared trees
+	     are now fully streamed.  Process them.  */
 	  hashval_t scc_hash = lto_input_scc (&ib_main, data_in, &len_,
-					      &scc_entry_len);
+					      &scc_entry_len,
+					      tag == LTO_tree_scc);
 	  unsigned len = data_in->reader_cache->nodes.length () - from;
 	  gcc_assert (len == len_);
 
-	  total_scc_size += len;
-	  num_sccs_read++;
+	  if (tag == LTO_tree_scc)
+	    {
+	      total_scc_size += len;
+	      num_sccs_read++;
+	    }
+	  else
+	    num_unshared_trees_read += len;
 
 	  /* We have the special case of size-1 SCCs that are pre-merged
 	     by means of identifier and string sharing for example.
 	     ???  Maybe we should avoid streaming those as SCCs.  */
 	  tree first = streamer_tree_cache_get_tree (data_in->reader_cache,
 						     from);
-	  if (len == 1
-	      && (TREE_CODE (first) == IDENTIFIER_NODE
-		  || (TREE_CODE (first) == INTEGER_CST
-		      && !TREE_OVERFLOW (first))))
-	    continue;
+	  /* Identifier and integers are shared specially, they should never
+	     go by the tree merging path.  */
+	  gcc_checking_assert ((TREE_CODE (first) != IDENTIFIER_NODE
+				&& (TREE_CODE (first) != INTEGER_CST
+				    || TREE_OVERFLOW (first)))
+			       || len != 1);
 
 	  /* Try to unify the SCC with already existing ones.  */
-	  if (!flag_ltrans
+	  if (!flag_ltrans && tag != LTO_trees
 	      && unify_scc (data_in, from,
 			    len, scc_entry_len, scc_hash))
 	    continue;
@@ -1862,56 +1937,9 @@  lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
 	    {
 	      tree t = streamer_tree_cache_get_tree (data_in->reader_cache,
 						     from + i);
-	      /* Reconstruct the type variant and pointer-to/reference-to
-		 chains.  */
+	      process_new_tree (t, &hm, from + i, &total, data_in);
 	      if (TYPE_P (t))
-		{
-		  /* Map the tree types to their frequencies.  */
-		  if (flag_lto_dump_type_stats)
-		    {
-		      unsigned key = (unsigned) TREE_CODE (t);
-		      unsigned *countp = hm.get (key);
-		      hm.put (key, countp ? (*countp) + 1 : 1);
-		      total++;
-		    }
-
-		  seen_type = true;
-		  num_prevailing_types++;
-		  lto_fixup_prevailing_type (t);
-
-		  /* Compute the canonical type of all non-ODR types.
-		     Delay ODR types for the end of merging process - the canonical
-		     type for those can be computed using the (unique) name however
-		     we want to do this only if units in other languages do not
-		     contain structurally equivalent type.
-
-		     Because SCC components are streamed in random (hash) order
-		     we may have encountered the type before while registering
-		     type canonical of a derived type in the same SCC.  */
-		  if (!TYPE_CANONICAL (t))
-		    {
-		      if (!RECORD_OR_UNION_TYPE_P (t)
-			  || !TYPE_CXX_ODR_P (t))
-		        gimple_register_canonical_type (t);
-		      else if (COMPLETE_TYPE_P (t))
-			vec_safe_push (types_to_register, t);
-		    }
-		  if (TYPE_MAIN_VARIANT (t) == t && odr_type_p (t))
-		    register_odr_type (t);
-		}
-	      /* Link shared INTEGER_CSTs into TYPE_CACHED_VALUEs of its
-		 type which is also member of this SCC.  */
-	      if (TREE_CODE (t) == INTEGER_CST
-		  && !TREE_OVERFLOW (t))
-		cache_integer_cst (t);
-	      if (!flag_ltrans)
-		{
-		  lto_maybe_register_decl (data_in, t, from + i);
-		  /* Scan the tree for references to global functions or
-		     variables and record those for later fixup.  */
-		  if (mentions_vars_p (t))
-		    vec_safe_push (tree_with_vars, t);
-		}
+		seen_type = true;
 	    }
 
 	  /* Register DECLs with the debuginfo machinery.  */
@@ -1926,9 +1954,26 @@  lto_read_decls (struct lto_file_decl_data *decl_data, const void *data,
 	}
       else
 	{
-	  /* Pickle stray references.  */
 	  t = lto_input_tree_1 (&ib_main, data_in, tag, 0);
-	  gcc_assert (t && data_in->reader_cache->nodes.length () == from);
+	  /* We streamed in new tree.  Add it to cache and process dref.  */
+	  if (data_in->reader_cache->nodes.length () == from + 1)
+	    {
+	      num_unshared_trees_read++;
+	      data_in->location_cache.accept_location_cache ();
+	      process_dref (data_in, t, from);
+	      if (TREE_CODE (t) == IDENTIFIER_NODE
+		  || (TREE_CODE (t) == INTEGER_CST
+		      && !TREE_OVERFLOW (t)))
+		;
+	      else
+		{
+		  lto_maybe_register_decl (data_in, t, from);
+		  process_new_tree (t, &hm, from, &total, data_in);
+		}
+	    }
+	  else
+	    /* FIXME: It seems useless to pickle stray references.  */
+	    gcc_assert (data_in->reader_cache->nodes.length () == from);
 	}
     }
 
@@ -2953,10 +2998,13 @@  print_lto_report_1 (void)
   const char *pfx = (flag_lto) ? "LTO" : (flag_wpa) ? "WPA" : "LTRANS";
   fprintf (stderr, "%s statistics\n", pfx);
 
-  fprintf (stderr, "[%s] read %lu SCCs of average size %f\n",
+  fprintf (stderr, "[%s] read %lu unshared trees\n",
+	   pfx, num_unshared_trees_read);
+  fprintf (stderr, "[%s] read %lu mergeable SCCs of average size %f\n",
 	   pfx, num_sccs_read, total_scc_size / (double)num_sccs_read);
-  fprintf (stderr, "[%s] %lu tree bodies read in total\n", pfx, total_scc_size);
-  if (flag_wpa && tree_scc_hash)
+  fprintf (stderr, "[%s] %lu tree bodies read in total\n", pfx,
+	   total_scc_size + num_unshared_trees_read);
+  if (flag_wpa && tree_scc_hash && num_sccs_read)
     {
       fprintf (stderr, "[%s] tree SCC table: size %ld, %ld elements, "
 	       "collision ratio: %f\n", pfx,