diff mbox series

RFC: variant and ODR based type merging during LTO streaming

Message ID 20180928144000.GB95723@kam.mff.cuni.cz
State New
Headers show
Series RFC: variant and ODR based type merging during LTO streaming | expand

Commit Message

Jan Hubicka Sept. 28, 2018, 2:40 p.m. UTC
Hi,
this is a proof-of-concept patch for type merging during LTO streaming. It
does two things
1) replace type variant by first compatible one in TYPE_NEXT_VARIANT list
   This is useful at compilation time because frontends produce more variants
   than needed. The effect of this is about 2% of decl stream
2) replace ODR types by their prevailing variant if ODR violation was not
   detected.  This is useful at WPA time and saves about 50% of global decl
   stream. For Firefox it reduces the ltrans.o files size from 2.6GB to 2.0GB.

Here are stats of number of nodes hitting global stream during WPA->ltrans
streaming of firefox:

Before  after
   7354    7280 namespace_decl
  46593   14084 union_type
  18496   15405 translation_unit_decl
  22816   21548 integer_type
 107047   31681 enumeral_type
  67072   46390 array_type
 541220   99572 pointer_plus_expr
 542360  100712 addr_expr
 167657  154019 var_decl
 864769  182691 tree_binfo
 240200  206783 reference_type
 410403  316877 function_type
1862985  522524 type_decl
1954864  652664 record_type
1070333  880209 method_type
1582055 1014270 pointer_type
1495367 1406670 tree_list
7926385 1483545 field_decl
1715133 1725612 function_decl
3384810 3191406 identifier_node

So largest savings are in field_decls (to 18% and they are not even merged
fully by the patch), pointer tpes (to 64%), record types (to 30%) and type
decls (to 28%), binfos addr_expr/pointer_plus_expr (the later are used to
reffer to vtables to 21%). Patch bootstraps, regtestes and seems to
work reliably.

The merging is currently done during streaming by simply looking up prevailing
type and populating the cache (so we do not get too many repeated lookups).
Similar tricks can be added to checksum calculation.

I am not sure this is best possible place, especially for 2) since it provides
really quite major reduction in the IL size.  I am quite sure we do not want
to do that at stream in time in parallel to SCC merging because it affects the
SCC components and also we do not want to merge this way types containing ODR
violations as QOI issue. (merging two completely different types would lead
to ICEs but merging two TBAA incompatible types is probably equally bad).

So we need to do following
 a) read in the the types & do tree mering
 b) populate odr type hash, do ODR violation checks and decide on what types
    are consistent
 c) do the ODR based merging sometime before the trees hit ltrans stream.

I was thinking that perhaps c) can also be added to mentions_vars_p machinery
which would make it somewhat more costy but we could free those duplicates and
save some more memory during WPA.  I am also not usre if longer term we
would not want to have mode that bypasses WPA->ltrans stream completely and
just compile out of WPA process in threads or forks.
Drawback is that building vector recording every single type pointer is going
to also consume quite some memory.

Patch is also not complete because it does not merge field decls. Analogous
tricks can be added to streamer, but I just tought I would like to discuss
alternative implementations first.

WPA compile time improves by about 11%, so merging benefits overcome the extra
complexity in sreamer.

I hope that with this patch I will be able to increase default number of partitions
since the global stream is getting less percentage of the ltrans files. 33%
is still quite a lot, but far better than what we had previously. Hopefully
we could still dramatically cut this down in followup.

Also obvious drawback is that this helps to C++ only. I looked into stats from
C programs and also Toon's stats on Fortran and I tend to believe that C++
is really much worse than those two.

Honza

Comments

Richard Biener Oct. 1, 2018, 10 a.m. UTC | #1
On Fri, 28 Sep 2018, Jan Hubicka wrote:

> Hi,
> this is a proof-of-concept patch for type merging during LTO streaming. It
> does two things
> 1) replace type variant by first compatible one in TYPE_NEXT_VARIANT list
>    This is useful at compilation time because frontends produce more variants
>    than needed. The effect of this is about 2% of decl stream
> 2) replace ODR types by their prevailing variant if ODR violation was not
>    detected.  This is useful at WPA time and saves about 50% of global decl
>    stream. For Firefox it reduces the ltrans.o files size from 2.6GB to 2.0GB.
> 
> Here are stats of number of nodes hitting global stream during WPA->ltrans
> streaming of firefox:
> 
> Before  after
>    7354    7280 namespace_decl
>   46593   14084 union_type
>   18496   15405 translation_unit_decl
>   22816   21548 integer_type
>  107047   31681 enumeral_type
>   67072   46390 array_type
>  541220   99572 pointer_plus_expr
>  542360  100712 addr_expr
>  167657  154019 var_decl
>  864769  182691 tree_binfo
>  240200  206783 reference_type
>  410403  316877 function_type
> 1862985  522524 type_decl
> 1954864  652664 record_type
> 1070333  880209 method_type
> 1582055 1014270 pointer_type
> 1495367 1406670 tree_list
> 7926385 1483545 field_decl
> 1715133 1725612 function_decl
> 3384810 3191406 identifier_node
> 
> So largest savings are in field_decls (to 18% and they are not even merged
> fully by the patch), pointer tpes (to 64%), record types (to 30%) and type
> decls (to 28%), binfos addr_expr/pointer_plus_expr (the later are used to
> reffer to vtables to 21%). Patch bootstraps, regtestes and seems to
> work reliably.
> 
> The merging is currently done during streaming by simply looking up prevailing
> type and populating the cache (so we do not get too many repeated lookups).
> Similar tricks can be added to checksum calculation.
> 
> I am not sure this is best possible place, especially for 2) since it provides
> really quite major reduction in the IL size.  I am quite sure we do not want
> to do that at stream in time in parallel to SCC merging because it affects the
> SCC components and also we do not want to merge this way types containing ODR
> violations as QOI issue. (merging two completely different types would lead
> to ICEs but merging two TBAA incompatible types is probably equally bad).
> 
> So we need to do following
>  a) read in the the types & do tree mering
>  b) populate odr type hash, do ODR violation checks and decide on what types
>     are consistent
>  c) do the ODR based merging sometime before the trees hit ltrans stream.
> 
> I was thinking that perhaps c) can also be added to mentions_vars_p machinery
> which would make it somewhat more costy but we could free those duplicates and
> save some more memory during WPA.  I am also not usre if longer term we
> would not want to have mode that bypasses WPA->ltrans stream completely and
> just compile out of WPA process in threads or forks.
> Drawback is that building vector recording every single type pointer is going
> to also consume quite some memory.
> 
> Patch is also not complete because it does not merge field decls. Analogous
> tricks can be added to streamer, but I just tought I would like to discuss
> alternative implementations first.
> 
> WPA compile time improves by about 11%, so merging benefits overcome the extra
> complexity in sreamer.
> 
> I hope that with this patch I will be able to increase default number of partitions
> since the global stream is getting less percentage of the ltrans files. 33%
> is still quite a lot, but far better than what we had previously. Hopefully
> we could still dramatically cut this down in followup.
> 
> Also obvious drawback is that this helps to C++ only. I looked into stats from
> C programs and also Toon's stats on Fortran and I tend to believe that C++
> is really much worse than those two.

The ODR savings really look good but as you say the implementation is
somewhat "tricky".

I'd like to see the type-variant done separately (of course) and
also differently.  This is because when enabling 
free-lang-data by default we should be able to get benefits for
non-LTO compilations as well.  Specifically I'd like us to
(in free-lang-data) "canonicalize" existing variants applying the
rules you derived for the lazier compare.  At the point you then
drop non-fld-walked variants we could weed out the resulting
duplicates, keeping only a prevailing one in the list and (ab-)using
GC to fixup references.  Like with ggc_register_fixup (void *from, void 
*to) which would when following edges at mark time, adjust them
according to this map (I'm not sure if we'd want to do it manually
and record a vector of possibly affected TREE_TYPE uses during the fld
walk).  We could also (easily?) optimize the streaming itself by
streaming only differences to the main variant of a type (but then
we could change our tree data structure to make that trivial).

I wonder how much "ODR" merging we'd get for free when we canonicalize
types some more - that is, how do ODR equal but tree unequal types
usually differ?  I guess it might be most of the time fields with
pointer to incomplete vs. complete type?

Thanks,
Richard.


> Honza
> 
> Index: ipa-devirt.c
> ===================================================================
> --- ipa-devirt.c	(revision 264689)
> +++ ipa-devirt.c	(working copy)
> @@ -2111,6 +2111,29 @@ get_odr_type (tree type, bool insert)
>    return val;
>  }
>  
> +/* Return the main variant of the odr type.  This is used for straming out
> +   to reduce number of type duplicates hitting the WPA->LTRANS streams.
> +   Do not do so when ODR violation was detected since the type may be
> +   structurally different then.  */
> +
> +tree
> +prevailing_odr_type (tree t)
> +{
> +  t = TYPE_MAIN_VARIANT (t);
> +  /* In need_assembler_name_p we also mangle assembler names of INTEGER_TYPE.
> +     We can not merge these because this does not honnor precision and
> +     signedness.  */
> +  if (!type_with_linkage_p (t)
> +      || type_in_anonymous_namespace_p (t)
> +      || TREE_CODE (t) == INTEGER_TYPE
> +      || !COMPLETE_TYPE_P (t))
> +    return t;
> +  odr_type ot = get_odr_type (t, true);
> +  if (!ot || !ot->odr_violated)
> +    return ot->type;
> +  return t;
> +}
> +
>  /* Add TYPE od ODR type hash.  */
>  
>  void
> Index: ipa-utils.h
> ===================================================================
> --- ipa-utils.h	(revision 264689)
> +++ ipa-utils.h	(working copy)
> @@ -90,6 +90,7 @@ void warn_types_mismatch (tree t1, tree
>  			  location_t loc2 = UNKNOWN_LOCATION);
>  bool odr_or_derived_type_p (const_tree t);
>  bool odr_types_equivalent_p (tree type1, tree type2);
> +tree prevailing_odr_type (tree t);
>  
>  /* Return vector containing possible targets of polymorphic call E.
>     If COMPLETEP is non-NULL, store true if the list is complete. 
> Index: lto/lto.c
> ===================================================================
> --- lto/lto.c	(revision 264689)
> +++ lto/lto.c	(working copy)
> @@ -485,6 +485,8 @@ gimple_register_canonical_type_1 (tree t
>  static void
>  gimple_register_canonical_type (tree t)
>  {
> +  if (flag_checking)
> +    verify_type (t);
>    if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t)
>        || !canonical_type_used_p (t))
>      return;
> Index: lto-streamer-out.c
> ===================================================================
> --- lto-streamer-out.c	(revision 264689)
> +++ lto-streamer-out.c	(working copy)
> @@ -43,6 +43,8 @@ along with GCC; see the file COPYING3.
>  #include "debug.h"
>  #include "omp-offload.h"
>  #include "print-tree.h"
> +#include "attribs.h"
> +#include "ipa-utils.h"
>  
>  
>  static void lto_write_tree (struct output_block*, tree, bool);
> @@ -109,14 +111,188 @@ destroy_output_block (struct output_bloc
>    free (ob);
>  }
>  
> +/* Do same comparsion as check_qualified_type skipping lang part of type
> +   and be more permissive about type names: we only care that names are
> +   same (for diagnostics) and that ODR names are the same.  */
>  
> -/* Look up NODE in the type table and write the index for it to OB.  */
> +static bool
> +types_equal_p (tree t, tree v)
> +{
> +  if (TYPE_QUALS (t) != TYPE_QUALS (v))
> +    return false;
> +
> +  if (TYPE_NAME (t) != TYPE_NAME (v)
> +      && (!TYPE_NAME (t) || !TYPE_NAME (v)
> +	  || TREE_CODE (TYPE_NAME (t)) != TYPE_DECL
> +	  || TREE_CODE (TYPE_NAME (v)) != TYPE_DECL
> +	  || DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (t))
> +	     != DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (v))
> +	  || DECL_NAME (TYPE_NAME (t)) != DECL_NAME (TYPE_NAME (v))))
> +     return false;
> +
> +  if (TYPE_ALIGN (t) != TYPE_ALIGN (v))
> +    return false;
> +
> +  if (!attribute_list_equal (TYPE_ATTRIBUTES (t),
> +			     TYPE_ATTRIBUTES (v)))
> +     return false;
> +
> +  /* Do not replace complete type by incomplete.  */
> +  if (COMPLETE_TYPE_P (t) != COMPLETE_TYPE_P (v))
> +    return false;
> +
> +  if (TYPE_ADDR_SPACE (t) != TYPE_ADDR_SPACE (v))
> +    return false;
> +
> +  gcc_assert (TREE_CODE (t) == TREE_CODE (v));
> +
> +  /* For pointer types and array types we also care about the type they
> +     reffer to.  */
> +  if (TREE_TYPE (t))
> +    return types_equal_p (TREE_TYPE (t), TREE_TYPE (v));
> +
> +  return true;
> +}
> +
> +/* Search list of type variants and look up first one that looks same.
> +   At compile time, this removes duplicates of types created by front-ends.
> +   At WPA time we also merge duplicated ODR types.  */
> +
> +static tree
> +first_compatible_type_variant (tree t)
> +{
> +  tree first = NULL;
> +  if (POINTER_TYPE_P (t))
> +    {
> +      tree t2 = first_compatible_type_variant (TREE_TYPE (t));
> +      if (t2 != TREE_TYPE (t))
> +	{
> +	  if (TREE_CODE (t) == POINTER_TYPE)
> +	    first = build_pointer_type_for_mode (t2, TYPE_MODE (t),
> +						TYPE_REF_CAN_ALIAS_ALL (t));
> +	  else
> +	    first = build_reference_type_for_mode (t2, TYPE_MODE (t),
> +						TYPE_REF_CAN_ALIAS_ALL (t));
> +	}
> +      else
> +	first = TYPE_MAIN_VARIANT (t);
> +      gcc_assert (TYPE_MAIN_VARIANT (first) == first);
> +    }
> +  else
> +    first = prevailing_odr_type (t);
> +
> +  gcc_checking_assert (gimple_canonical_types_compatible_p (t, first, false));
> +
> +  for (tree v = first; v; v = TYPE_NEXT_VARIANT (v))
> +    if (types_equal_p (t, v))
> +      {
> +	gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
> +	gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v));
> +	gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
> +        return v;
> +      }
> +
> +  /* Most of the time we will find v itself in the list.  With ODR type merging
> +     we however merge across TYPE_MAIN_VARIANTs and in that case we may not
> +     have corresponding type in the list.  */
> +  tree v = build_variant_type_copy (first);
> +  TYPE_READONLY (v) = TYPE_READONLY (t);
> +  TYPE_VOLATILE (v) = TYPE_VOLATILE (t);
> +  TYPE_ATOMIC (v) = TYPE_ATOMIC (t);
> +  TYPE_RESTRICT (v) = TYPE_RESTRICT (t);
> +  TYPE_ADDR_SPACE (v) = TYPE_ADDR_SPACE (t);
> +  SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
> +  TYPE_NAME (v) = TYPE_NAME (t);
> +  TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t);
> +  TREE_TYPE (v) = TREE_TYPE (t);
> +  gcc_checking_assert (types_equal_p (t, v));
> +  gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
> +  gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v));
> +  gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
> +  return v;
> +}
> +
> +
> +/* Look up NODE in the type table and write the index for it to OB.
> +   Eliminate unnecesary type variants.  */
>  
>  static void
>  output_type_ref (struct output_block *ob, tree node)
>  {
> +  bool existed_p;
> +  unsigned int &index
> +    = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
> +       get_or_insert (node, &existed_p);
>    streamer_write_record_start (ob, LTO_type_ref);
> -  lto_output_type_ref_index (ob->decl_state, ob->main_stream, node);
> +
> +  if (existed_p)
> +    {
> +      streamer_write_uhwi_stream (ob->main_stream, index);
> +      return;
> +    }
> +
> +  tree node2 = first_compatible_type_variant (node);
> +
> +  /* If NODE is first variant, just add it into encoder.  */
> +  if (node2 == node)
> +    {
> +      index = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
> +      if (streamer_dump_file)
> +	{
> +	  print_node_brief (streamer_dump_file,
> +			    "    Encoding indexable ", node, 4);
> +	  fprintf (streamer_dump_file, "  as %i\n", index);
> +	}
> +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node);
> +      streamer_write_uhwi_stream (ob->main_stream, index);
> +      return;
> +    }
> +
> +  index = 0xdead;
> +
> +  if (streamer_dump_file)
> +    {
> +      print_node_brief (streamer_dump_file, "    Type ", node, 4);
> +      fprintf (streamer_dump_file, "  prevailed by ");
> +      print_node_brief (streamer_dump_file, "", node2, 4);
> +      fprintf (streamer_dump_file, "\n");
> +    }
> +
> +  /* See if we already assgined one to the first variant.  If that is the case
> +     only duplicate the entry in the hashtable so we do not need to call
> +     first_compatible_type_variant redundantly.  */
> +  unsigned int &index2
> +    = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
> +	get_or_insert (node2, &existed_p);
> +  /* The reference may be changed by hash table expansion.  */
> +  unsigned int i = index2;
> +
> +  if (existed_p)
> +    {
> +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
> +	tree_hash_table->get_or_insert (node, &existed_p) = i;
> +      streamer_write_uhwi_stream (ob->main_stream, i);
> +      return;
> +    }
> +  else
> +    {
> +      index2 = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
> +      i = index2;
> +      if (streamer_dump_file)
> +	{
> +	  print_node_brief (streamer_dump_file,
> +			    "    Encoding indexable ", node2, 4);
> +	  fprintf (streamer_dump_file, "  as %i \n", i);
> +	}
> +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node2);
> +      /* We need to search again to watch hash table resize.  */
> +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
> +	tree_hash_table->get_or_insert (node, &existed_p) = i;
> +      gcc_assert (existed_p);
> +      streamer_write_uhwi_stream (ob->main_stream, i);
> +    }
> +  return;
>  }
>  
>  
> @@ -978,12 +1171,15 @@ hash_tree (struct streamer_tree_cache_d
>  #define visit(SIBLING) \
>    do { \
>      unsigned ix; \
> -    if (!SIBLING) \
> +    tree t2 = SIBLING; \
> +    if (t2 && TYPE_P (t2)) \
> +      t2 = first_compatible_type_variant (t2); \
> +    if (!t2) \
>        hstate.add_int (0); \
> -    else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \
> +    else if (streamer_tree_cache_lookup (cache, t2, &ix)) \
>        hstate.add_int (streamer_tree_cache_get_hash (cache, ix)); \
>      else if (map) \
> -      hstate.add_int (*map->get (SIBLING)); \
> +      hstate.add_int (*map->get (t2)); \
>      else \
>        hstate.add_int (1); \
>    } while (0)
> @@ -1554,6 +1750,10 @@ DFS::DFS_write_tree (struct output_block
>    if (this_ref_p && tree_is_indexable (expr))
>      return;
>  
> +  /* Replace type by its prevailing variant.  */
> +  if (TYPE_P (expr))
> +    expr = first_compatible_type_variant (expr);
> +
>    /* Check if we already streamed EXPR.  */
>    if (streamer_tree_cache_lookup (ob->writer_cache, expr, NULL))
>      return;
> @@ -1591,6 +1791,11 @@ lto_output_tree (struct output_block *ob
>        return;
>      }
>  
> +  if (TYPE_P (expr))
> +    expr = first_compatible_type_variant (expr);
> +
> +  gcc_checking_assert (!this_ref_p || !tree_is_indexable (expr));
> +
>    existed_p = streamer_tree_cache_lookup (ob->writer_cache, expr, &ix);
>    if (existed_p)
>      {
> @@ -2334,7 +2539,18 @@ copy_function_or_variable (struct symtab
>        gcc_assert (lto_tree_ref_encoder_size (encoder) == 0);
>        encoder->trees.reserve_exact (n);
>        for (j = 0; j < n; j++)
> -	encoder->trees.safe_push ((*trees)[j]);
> +	{
> +	  tree t = (*trees)[j];
> +	  if (TYPE_P (t))
> +	    t = first_compatible_type_variant (t);
> +	  if (streamer_dump_file)
> +	    {
> +	      print_node_brief (streamer_dump_file,
> +				"  Adding reference to ", t, 4);
> +	      fprintf (streamer_dump_file, "  as %i \n", (int)j);
> +	    }
> +	  encoder->trees.safe_push (t);
> +	}
>      }
>  
>    lto_free_raw_section_data (file_data, LTO_section_function_body, name,
> @@ -2507,6 +2723,8 @@ write_global_stream (struct output_block
>  	  print_node_brief (streamer_dump_file, "", t, 4);
>            fprintf (streamer_dump_file, "\n");
>  	}
> +      gcc_checking_assert (!TYPE_P (t)
> +			   || t == first_compatible_type_variant (t));
>        if (!streamer_tree_cache_lookup (ob->writer_cache, t, NULL))
>  	stream_write_tree (ob, t, false);
>      }
> @@ -2537,6 +2755,8 @@ write_global_references (struct output_b
>        t = lto_tree_ref_encoder_get_tree (encoder, index);
>        streamer_tree_cache_lookup (ob->writer_cache, t, &slot_num);
>        gcc_assert (slot_num != (unsigned)-1);
> +      gcc_checking_assert (!TYPE_P (t)
> +			   || t == first_compatible_type_variant (t));
>        data[index + 1] = slot_num;
>      }
>  
> Index: tree-ssa-alias.c
> ===================================================================
> --- tree-ssa-alias.c	(revision 264689)
> +++ tree-ssa-alias.c	(working copy)
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-dfa.h"
>  #include "ipa-reference.h"
>  #include "varasm.h"
> +#include "ipa-utils.h"
>  
>  /* Broad overview of how alias analysis on gimple works:
>  
> @@ -955,6 +956,12 @@ nonoverlapping_component_refs_of_decl_p
>  	     ??? Bitfields can overlap at RTL level so punt on them.  */
>  	  if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
>  	    return false;
> +	  /* In LTO we may end up merging different variants of ODR types
> +	     but keep corresponding field decls unmerged.  */
> +	  if (in_lto_p
> +	      && type_with_linkage_p (type1)
> +	      && DECL_NAME (field1) == DECL_NAME (field2))
> +	    return false;
>  	  return true;
>  	}
>      }
> @@ -1049,7 +1056,9 @@ nonoverlapping_component_refs_p (const_t
>  	{
>  	  /* We're left with accessing different fields of a structure,
>  	     no possible overlap.  */
> -	  if (fieldx != fieldy)
> +	  if (fieldx != fieldy
> +	      && (in_lto_p || !type_with_linkage_p (typex)
> +		  || DECL_NAME (fieldx) == DECL_NAME (fieldy)))
>  	    {
>  	      /* A field and its representative need to be considered the
>  		 same.  */
> Index: tree.c
> ===================================================================
> --- tree.c	(revision 264689)
> +++ tree.c	(working copy)
> @@ -5845,7 +5861,12 @@ free_lang_data_in_cgraph (void)
>  
>    /* Traverse every type found freeing its language data.  */
>    FOR_EACH_VEC_ELT (fld.types, i, t)
> -    free_lang_data_in_type (t);
> +    {
> +      free_lang_data_in_type (t);
> +      while (TYPE_NEXT_VARIANT (t)
> +	     && !fld.pset.contains (TYPE_NEXT_VARIANT (t)))
> +	TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (TYPE_NEXT_VARIANT (t));
> +    }
>    if (flag_checking)
>      {
>        FOR_EACH_VEC_ELT (fld.types, i, t)
> 
>
Jan Hubicka Oct. 1, 2018, 10:08 a.m. UTC | #2
> 
> The ODR savings really look good but as you say the implementation is
> somewhat "tricky".
> 
> I'd like to see the type-variant done separately (of course) and
> also differently.  This is because when enabling 
> free-lang-data by default we should be able to get benefits for
> non-LTO compilations as well.  Specifically I'd like us to
> (in free-lang-data) "canonicalize" existing variants applying the
> rules you derived for the lazier compare.  At the point you then
> drop non-fld-walked variants we could weed out the resulting
> duplicates, keeping only a prevailing one in the list and (ab-)using
> GC to fixup references.  Like with ggc_register_fixup (void *from, void 
> *to) which would when following edges at mark time, adjust them
> according to this map (I'm not sure if we'd want to do it manually
> and record a vector of possibly affected TREE_TYPE uses during the fld
> walk).  We could also (easily?) optimize the streaming itself by
> streaming only differences to the main variant of a type (but then
> we could change our tree data structure to make that trivial).

I had patch for streaming the differences only some time ago. My recollection
is that we got it into tree and then reverted as there was some issues with
cycles.  I can look into this again.

I would really like to avoid using ggc for IL rewriting. One reason is
that eventually we would like to see GGC to go and thus we should not
wire it more into the essential parts of the compiler and other is that
GGC is often not run. 

Saving type variants seems to have relatively minor effect on the IL size, so
we need to have solution that is not too expensive to be justified.  I suppose
free lang data is sort of visiting all the relevant datastructures for
middle-end so we could do rewriting there.  It would also make sense to
canonicalize types more based on knowledge whether they are used for memory
access (i.e. for non-accesses go to main variant and perhaps invent something
like main variant WRT useless conversions).

We could ggc_free the old variant and then ggc will ICE on dangling pointers.
I can give it a try with my patch to prune the variant tree.
> 
> I wonder how much "ODR" merging we'd get for free when we canonicalize
> types some more - that is, how do ODR equal but tree unequal types
> usually differ?  I guess it might be most of the time fields with
> pointer to incomplete vs. complete type?

Most of the time it is complete vs incomplete pointer type.
I had statistics of type duplicates before, but they did not look as bad
as reality. The reason is that smaller types tends to be merged while
bigger types tends to be duplicated many times.  In GCC this is typically
RTL, gimple and derived types which are really many.

Honza
> 
> Thanks,
> Richard.
> 
> 
> > Honza
> > 
> > Index: ipa-devirt.c
> > ===================================================================
> > --- ipa-devirt.c	(revision 264689)
> > +++ ipa-devirt.c	(working copy)
> > @@ -2111,6 +2111,29 @@ get_odr_type (tree type, bool insert)
> >    return val;
> >  }
> >  
> > +/* Return the main variant of the odr type.  This is used for straming out
> > +   to reduce number of type duplicates hitting the WPA->LTRANS streams.
> > +   Do not do so when ODR violation was detected since the type may be
> > +   structurally different then.  */
> > +
> > +tree
> > +prevailing_odr_type (tree t)
> > +{
> > +  t = TYPE_MAIN_VARIANT (t);
> > +  /* In need_assembler_name_p we also mangle assembler names of INTEGER_TYPE.
> > +     We can not merge these because this does not honnor precision and
> > +     signedness.  */
> > +  if (!type_with_linkage_p (t)
> > +      || type_in_anonymous_namespace_p (t)
> > +      || TREE_CODE (t) == INTEGER_TYPE
> > +      || !COMPLETE_TYPE_P (t))
> > +    return t;
> > +  odr_type ot = get_odr_type (t, true);
> > +  if (!ot || !ot->odr_violated)
> > +    return ot->type;
> > +  return t;
> > +}
> > +
> >  /* Add TYPE od ODR type hash.  */
> >  
> >  void
> > Index: ipa-utils.h
> > ===================================================================
> > --- ipa-utils.h	(revision 264689)
> > +++ ipa-utils.h	(working copy)
> > @@ -90,6 +90,7 @@ void warn_types_mismatch (tree t1, tree
> >  			  location_t loc2 = UNKNOWN_LOCATION);
> >  bool odr_or_derived_type_p (const_tree t);
> >  bool odr_types_equivalent_p (tree type1, tree type2);
> > +tree prevailing_odr_type (tree t);
> >  
> >  /* Return vector containing possible targets of polymorphic call E.
> >     If COMPLETEP is non-NULL, store true if the list is complete. 
> > Index: lto/lto.c
> > ===================================================================
> > --- lto/lto.c	(revision 264689)
> > +++ lto/lto.c	(working copy)
> > @@ -485,6 +485,8 @@ gimple_register_canonical_type_1 (tree t
> >  static void
> >  gimple_register_canonical_type (tree t)
> >  {
> > +  if (flag_checking)
> > +    verify_type (t);
> >    if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t)
> >        || !canonical_type_used_p (t))
> >      return;
> > Index: lto-streamer-out.c
> > ===================================================================
> > --- lto-streamer-out.c	(revision 264689)
> > +++ lto-streamer-out.c	(working copy)
> > @@ -43,6 +43,8 @@ along with GCC; see the file COPYING3.
> >  #include "debug.h"
> >  #include "omp-offload.h"
> >  #include "print-tree.h"
> > +#include "attribs.h"
> > +#include "ipa-utils.h"
> >  
> >  
> >  static void lto_write_tree (struct output_block*, tree, bool);
> > @@ -109,14 +111,188 @@ destroy_output_block (struct output_bloc
> >    free (ob);
> >  }
> >  
> > +/* Do same comparsion as check_qualified_type skipping lang part of type
> > +   and be more permissive about type names: we only care that names are
> > +   same (for diagnostics) and that ODR names are the same.  */
> >  
> > -/* Look up NODE in the type table and write the index for it to OB.  */
> > +static bool
> > +types_equal_p (tree t, tree v)
> > +{
> > +  if (TYPE_QUALS (t) != TYPE_QUALS (v))
> > +    return false;
> > +
> > +  if (TYPE_NAME (t) != TYPE_NAME (v)
> > +      && (!TYPE_NAME (t) || !TYPE_NAME (v)
> > +	  || TREE_CODE (TYPE_NAME (t)) != TYPE_DECL
> > +	  || TREE_CODE (TYPE_NAME (v)) != TYPE_DECL
> > +	  || DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (t))
> > +	     != DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (v))
> > +	  || DECL_NAME (TYPE_NAME (t)) != DECL_NAME (TYPE_NAME (v))))
> > +     return false;
> > +
> > +  if (TYPE_ALIGN (t) != TYPE_ALIGN (v))
> > +    return false;
> > +
> > +  if (!attribute_list_equal (TYPE_ATTRIBUTES (t),
> > +			     TYPE_ATTRIBUTES (v)))
> > +     return false;
> > +
> > +  /* Do not replace complete type by incomplete.  */
> > +  if (COMPLETE_TYPE_P (t) != COMPLETE_TYPE_P (v))
> > +    return false;
> > +
> > +  if (TYPE_ADDR_SPACE (t) != TYPE_ADDR_SPACE (v))
> > +    return false;
> > +
> > +  gcc_assert (TREE_CODE (t) == TREE_CODE (v));
> > +
> > +  /* For pointer types and array types we also care about the type they
> > +     reffer to.  */
> > +  if (TREE_TYPE (t))
> > +    return types_equal_p (TREE_TYPE (t), TREE_TYPE (v));
> > +
> > +  return true;
> > +}
> > +
> > +/* Search list of type variants and look up first one that looks same.
> > +   At compile time, this removes duplicates of types created by front-ends.
> > +   At WPA time we also merge duplicated ODR types.  */
> > +
> > +static tree
> > +first_compatible_type_variant (tree t)
> > +{
> > +  tree first = NULL;
> > +  if (POINTER_TYPE_P (t))
> > +    {
> > +      tree t2 = first_compatible_type_variant (TREE_TYPE (t));
> > +      if (t2 != TREE_TYPE (t))
> > +	{
> > +	  if (TREE_CODE (t) == POINTER_TYPE)
> > +	    first = build_pointer_type_for_mode (t2, TYPE_MODE (t),
> > +						TYPE_REF_CAN_ALIAS_ALL (t));
> > +	  else
> > +	    first = build_reference_type_for_mode (t2, TYPE_MODE (t),
> > +						TYPE_REF_CAN_ALIAS_ALL (t));
> > +	}
> > +      else
> > +	first = TYPE_MAIN_VARIANT (t);
> > +      gcc_assert (TYPE_MAIN_VARIANT (first) == first);
> > +    }
> > +  else
> > +    first = prevailing_odr_type (t);
> > +
> > +  gcc_checking_assert (gimple_canonical_types_compatible_p (t, first, false));
> > +
> > +  for (tree v = first; v; v = TYPE_NEXT_VARIANT (v))
> > +    if (types_equal_p (t, v))
> > +      {
> > +	gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
> > +	gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v));
> > +	gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
> > +        return v;
> > +      }
> > +
> > +  /* Most of the time we will find v itself in the list.  With ODR type merging
> > +     we however merge across TYPE_MAIN_VARIANTs and in that case we may not
> > +     have corresponding type in the list.  */
> > +  tree v = build_variant_type_copy (first);
> > +  TYPE_READONLY (v) = TYPE_READONLY (t);
> > +  TYPE_VOLATILE (v) = TYPE_VOLATILE (t);
> > +  TYPE_ATOMIC (v) = TYPE_ATOMIC (t);
> > +  TYPE_RESTRICT (v) = TYPE_RESTRICT (t);
> > +  TYPE_ADDR_SPACE (v) = TYPE_ADDR_SPACE (t);
> > +  SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
> > +  TYPE_NAME (v) = TYPE_NAME (t);
> > +  TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t);
> > +  TREE_TYPE (v) = TREE_TYPE (t);
> > +  gcc_checking_assert (types_equal_p (t, v));
> > +  gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
> > +  gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v));
> > +  gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
> > +  return v;
> > +}
> > +
> > +
> > +/* Look up NODE in the type table and write the index for it to OB.
> > +   Eliminate unnecesary type variants.  */
> >  
> >  static void
> >  output_type_ref (struct output_block *ob, tree node)
> >  {
> > +  bool existed_p;
> > +  unsigned int &index
> > +    = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
> > +       get_or_insert (node, &existed_p);
> >    streamer_write_record_start (ob, LTO_type_ref);
> > -  lto_output_type_ref_index (ob->decl_state, ob->main_stream, node);
> > +
> > +  if (existed_p)
> > +    {
> > +      streamer_write_uhwi_stream (ob->main_stream, index);
> > +      return;
> > +    }
> > +
> > +  tree node2 = first_compatible_type_variant (node);
> > +
> > +  /* If NODE is first variant, just add it into encoder.  */
> > +  if (node2 == node)
> > +    {
> > +      index = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
> > +      if (streamer_dump_file)
> > +	{
> > +	  print_node_brief (streamer_dump_file,
> > +			    "    Encoding indexable ", node, 4);
> > +	  fprintf (streamer_dump_file, "  as %i\n", index);
> > +	}
> > +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node);
> > +      streamer_write_uhwi_stream (ob->main_stream, index);
> > +      return;
> > +    }
> > +
> > +  index = 0xdead;
> > +
> > +  if (streamer_dump_file)
> > +    {
> > +      print_node_brief (streamer_dump_file, "    Type ", node, 4);
> > +      fprintf (streamer_dump_file, "  prevailed by ");
> > +      print_node_brief (streamer_dump_file, "", node2, 4);
> > +      fprintf (streamer_dump_file, "\n");
> > +    }
> > +
> > +  /* See if we already assgined one to the first variant.  If that is the case
> > +     only duplicate the entry in the hashtable so we do not need to call
> > +     first_compatible_type_variant redundantly.  */
> > +  unsigned int &index2
> > +    = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
> > +	get_or_insert (node2, &existed_p);
> > +  /* The reference may be changed by hash table expansion.  */
> > +  unsigned int i = index2;
> > +
> > +  if (existed_p)
> > +    {
> > +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
> > +	tree_hash_table->get_or_insert (node, &existed_p) = i;
> > +      streamer_write_uhwi_stream (ob->main_stream, i);
> > +      return;
> > +    }
> > +  else
> > +    {
> > +      index2 = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
> > +      i = index2;
> > +      if (streamer_dump_file)
> > +	{
> > +	  print_node_brief (streamer_dump_file,
> > +			    "    Encoding indexable ", node2, 4);
> > +	  fprintf (streamer_dump_file, "  as %i \n", i);
> > +	}
> > +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node2);
> > +      /* We need to search again to watch hash table resize.  */
> > +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
> > +	tree_hash_table->get_or_insert (node, &existed_p) = i;
> > +      gcc_assert (existed_p);
> > +      streamer_write_uhwi_stream (ob->main_stream, i);
> > +    }
> > +  return;
> >  }
> >  
> >  
> > @@ -978,12 +1171,15 @@ hash_tree (struct streamer_tree_cache_d
> >  #define visit(SIBLING) \
> >    do { \
> >      unsigned ix; \
> > -    if (!SIBLING) \
> > +    tree t2 = SIBLING; \
> > +    if (t2 && TYPE_P (t2)) \
> > +      t2 = first_compatible_type_variant (t2); \
> > +    if (!t2) \
> >        hstate.add_int (0); \
> > -    else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \
> > +    else if (streamer_tree_cache_lookup (cache, t2, &ix)) \
> >        hstate.add_int (streamer_tree_cache_get_hash (cache, ix)); \
> >      else if (map) \
> > -      hstate.add_int (*map->get (SIBLING)); \
> > +      hstate.add_int (*map->get (t2)); \
> >      else \
> >        hstate.add_int (1); \
> >    } while (0)
> > @@ -1554,6 +1750,10 @@ DFS::DFS_write_tree (struct output_block
> >    if (this_ref_p && tree_is_indexable (expr))
> >      return;
> >  
> > +  /* Replace type by its prevailing variant.  */
> > +  if (TYPE_P (expr))
> > +    expr = first_compatible_type_variant (expr);
> > +
> >    /* Check if we already streamed EXPR.  */
> >    if (streamer_tree_cache_lookup (ob->writer_cache, expr, NULL))
> >      return;
> > @@ -1591,6 +1791,11 @@ lto_output_tree (struct output_block *ob
> >        return;
> >      }
> >  
> > +  if (TYPE_P (expr))
> > +    expr = first_compatible_type_variant (expr);
> > +
> > +  gcc_checking_assert (!this_ref_p || !tree_is_indexable (expr));
> > +
> >    existed_p = streamer_tree_cache_lookup (ob->writer_cache, expr, &ix);
> >    if (existed_p)
> >      {
> > @@ -2334,7 +2539,18 @@ copy_function_or_variable (struct symtab
> >        gcc_assert (lto_tree_ref_encoder_size (encoder) == 0);
> >        encoder->trees.reserve_exact (n);
> >        for (j = 0; j < n; j++)
> > -	encoder->trees.safe_push ((*trees)[j]);
> > +	{
> > +	  tree t = (*trees)[j];
> > +	  if (TYPE_P (t))
> > +	    t = first_compatible_type_variant (t);
> > +	  if (streamer_dump_file)
> > +	    {
> > +	      print_node_brief (streamer_dump_file,
> > +				"  Adding reference to ", t, 4);
> > +	      fprintf (streamer_dump_file, "  as %i \n", (int)j);
> > +	    }
> > +	  encoder->trees.safe_push (t);
> > +	}
> >      }
> >  
> >    lto_free_raw_section_data (file_data, LTO_section_function_body, name,
> > @@ -2507,6 +2723,8 @@ write_global_stream (struct output_block
> >  	  print_node_brief (streamer_dump_file, "", t, 4);
> >            fprintf (streamer_dump_file, "\n");
> >  	}
> > +      gcc_checking_assert (!TYPE_P (t)
> > +			   || t == first_compatible_type_variant (t));
> >        if (!streamer_tree_cache_lookup (ob->writer_cache, t, NULL))
> >  	stream_write_tree (ob, t, false);
> >      }
> > @@ -2537,6 +2755,8 @@ write_global_references (struct output_b
> >        t = lto_tree_ref_encoder_get_tree (encoder, index);
> >        streamer_tree_cache_lookup (ob->writer_cache, t, &slot_num);
> >        gcc_assert (slot_num != (unsigned)-1);
> > +      gcc_checking_assert (!TYPE_P (t)
> > +			   || t == first_compatible_type_variant (t));
> >        data[index + 1] = slot_num;
> >      }
> >  
> > Index: tree-ssa-alias.c
> > ===================================================================
> > --- tree-ssa-alias.c	(revision 264689)
> > +++ tree-ssa-alias.c	(working copy)
> > @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.
> >  #include "tree-dfa.h"
> >  #include "ipa-reference.h"
> >  #include "varasm.h"
> > +#include "ipa-utils.h"
> >  
> >  /* Broad overview of how alias analysis on gimple works:
> >  
> > @@ -955,6 +956,12 @@ nonoverlapping_component_refs_of_decl_p
> >  	     ??? Bitfields can overlap at RTL level so punt on them.  */
> >  	  if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
> >  	    return false;
> > +	  /* In LTO we may end up merging different variants of ODR types
> > +	     but keep corresponding field decls unmerged.  */
> > +	  if (in_lto_p
> > +	      && type_with_linkage_p (type1)
> > +	      && DECL_NAME (field1) == DECL_NAME (field2))
> > +	    return false;
> >  	  return true;
> >  	}
> >      }
> > @@ -1049,7 +1056,9 @@ nonoverlapping_component_refs_p (const_t
> >  	{
> >  	  /* We're left with accessing different fields of a structure,
> >  	     no possible overlap.  */
> > -	  if (fieldx != fieldy)
> > +	  if (fieldx != fieldy
> > +	      && (in_lto_p || !type_with_linkage_p (typex)
> > +		  || DECL_NAME (fieldx) == DECL_NAME (fieldy)))
> >  	    {
> >  	      /* A field and its representative need to be considered the
> >  		 same.  */
> > Index: tree.c
> > ===================================================================
> > --- tree.c	(revision 264689)
> > +++ tree.c	(working copy)
> > @@ -5845,7 +5861,12 @@ free_lang_data_in_cgraph (void)
> >  
> >    /* Traverse every type found freeing its language data.  */
> >    FOR_EACH_VEC_ELT (fld.types, i, t)
> > -    free_lang_data_in_type (t);
> > +    {
> > +      free_lang_data_in_type (t);
> > +      while (TYPE_NEXT_VARIANT (t)
> > +	     && !fld.pset.contains (TYPE_NEXT_VARIANT (t)))
> > +	TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (TYPE_NEXT_VARIANT (t));
> > +    }
> >    if (flag_checking)
> >      {
> >        FOR_EACH_VEC_ELT (fld.types, i, t)
> > 
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Richard Biener Oct. 1, 2018, 10:29 a.m. UTC | #3
On Mon, 1 Oct 2018, Jan Hubicka wrote:

> > 
> > The ODR savings really look good but as you say the implementation is
> > somewhat "tricky".
> > 
> > I'd like to see the type-variant done separately (of course) and
> > also differently.  This is because when enabling 
> > free-lang-data by default we should be able to get benefits for
> > non-LTO compilations as well.  Specifically I'd like us to
> > (in free-lang-data) "canonicalize" existing variants applying the
> > rules you derived for the lazier compare.  At the point you then
> > drop non-fld-walked variants we could weed out the resulting
> > duplicates, keeping only a prevailing one in the list and (ab-)using
> > GC to fixup references.  Like with ggc_register_fixup (void *from, void 
> > *to) which would when following edges at mark time, adjust them
> > according to this map (I'm not sure if we'd want to do it manually
> > and record a vector of possibly affected TREE_TYPE uses during the fld
> > walk).  We could also (easily?) optimize the streaming itself by
> > streaming only differences to the main variant of a type (but then
> > we could change our tree data structure to make that trivial).
> 
> I had patch for streaming the differences only some time ago. My recollection
> is that we got it into tree and then reverted as there was some issues with
> cycles.  I can look into this again.
> 
> I would really like to avoid using ggc for IL rewriting. One reason is
> that eventually we would like to see GGC to go and thus we should not
> wire it more into the essential parts of the compiler and other is that
> GGC is often not run. 

Heh.

> Saving type variants seems to have relatively minor effect on the IL size, so
> we need to have solution that is not too expensive to be justified.  I suppose
> free lang data is sort of visiting all the relevant datastructures for
> middle-end so we could do rewriting there.  It would also make sense to
> canonicalize types more based on knowledge whether they are used for memory
> access (i.e. for non-accesses go to main variant and perhaps invent something
> like main variant WRT useless conversions).

I guess for that it would be better to put things like memory access
alignment into the memory-references rather than using TYPE_ALIGN.
Likewise for other things affecting semantics (TYPE_REF_CAN_ALIAS_ALL).

Being able to simply drop to TYPE_MAIN_VARIANT would be very appealing...
(and simplify things).  The hard thing is to figure out where we look
into those variant differences during late compilation...

Type variants was always my first "easy" middle-end type related cleanup.
And for non-LTO ODR merging probably gets us nothing (the FE should do
the merging).

> We could ggc_free the old variant and then ggc will ICE on dangling pointers.
> I can give it a try with my patch to prune the variant tree.

Yes, definitely do that.  I also _really_ like us to do FLD 
unconditionally - maybe we can start by guarding individual pieces with
a for_lto flag we pass down.  But esp. disabling langhooks and stuff like
the variant type purging would be good to get enabled for non-LTO
to not have that modes diverge more and more...

> > 
> > I wonder how much "ODR" merging we'd get for free when we canonicalize
> > types some more - that is, how do ODR equal but tree unequal types
> > usually differ?  I guess it might be most of the time fields with
> > pointer to incomplete vs. complete type?
> 
> Most of the time it is complete vs incomplete pointer type.
> I had statistics of type duplicates before, but they did not look as bad
> as reality. The reason is that smaller types tends to be merged while
> bigger types tends to be duplicated many times.  In GCC this is typically
> RTL, gimple and derived types which are really many.

I see.  So one possible canonicalization is to make _all_
pointer-typed FIELD_DECLs point to incomplete variants since the memory
accesses should already have the "proper" access types.  Can you
get statistics on that?  Not sure how to get an "incomplete" type
though (iff we can simply copy the type and NULL TYPE_FIELDs and
TYPE_SIZE and friends) - again I'd do that at FLD time.

Richard.

> Honza
> > 
> > Thanks,
> > Richard.
> > 
> > 
> > > Honza
> > > 
> > > Index: ipa-devirt.c
> > > ===================================================================
> > > --- ipa-devirt.c	(revision 264689)
> > > +++ ipa-devirt.c	(working copy)
> > > @@ -2111,6 +2111,29 @@ get_odr_type (tree type, bool insert)
> > >    return val;
> > >  }
> > >  
> > > +/* Return the main variant of the odr type.  This is used for straming out
> > > +   to reduce number of type duplicates hitting the WPA->LTRANS streams.
> > > +   Do not do so when ODR violation was detected since the type may be
> > > +   structurally different then.  */
> > > +
> > > +tree
> > > +prevailing_odr_type (tree t)
> > > +{
> > > +  t = TYPE_MAIN_VARIANT (t);
> > > +  /* In need_assembler_name_p we also mangle assembler names of INTEGER_TYPE.
> > > +     We can not merge these because this does not honnor precision and
> > > +     signedness.  */
> > > +  if (!type_with_linkage_p (t)
> > > +      || type_in_anonymous_namespace_p (t)
> > > +      || TREE_CODE (t) == INTEGER_TYPE
> > > +      || !COMPLETE_TYPE_P (t))
> > > +    return t;
> > > +  odr_type ot = get_odr_type (t, true);
> > > +  if (!ot || !ot->odr_violated)
> > > +    return ot->type;
> > > +  return t;
> > > +}
> > > +
> > >  /* Add TYPE od ODR type hash.  */
> > >  
> > >  void
> > > Index: ipa-utils.h
> > > ===================================================================
> > > --- ipa-utils.h	(revision 264689)
> > > +++ ipa-utils.h	(working copy)
> > > @@ -90,6 +90,7 @@ void warn_types_mismatch (tree t1, tree
> > >  			  location_t loc2 = UNKNOWN_LOCATION);
> > >  bool odr_or_derived_type_p (const_tree t);
> > >  bool odr_types_equivalent_p (tree type1, tree type2);
> > > +tree prevailing_odr_type (tree t);
> > >  
> > >  /* Return vector containing possible targets of polymorphic call E.
> > >     If COMPLETEP is non-NULL, store true if the list is complete. 
> > > Index: lto/lto.c
> > > ===================================================================
> > > --- lto/lto.c	(revision 264689)
> > > +++ lto/lto.c	(working copy)
> > > @@ -485,6 +485,8 @@ gimple_register_canonical_type_1 (tree t
> > >  static void
> > >  gimple_register_canonical_type (tree t)
> > >  {
> > > +  if (flag_checking)
> > > +    verify_type (t);
> > >    if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t)
> > >        || !canonical_type_used_p (t))
> > >      return;
> > > Index: lto-streamer-out.c
> > > ===================================================================
> > > --- lto-streamer-out.c	(revision 264689)
> > > +++ lto-streamer-out.c	(working copy)
> > > @@ -43,6 +43,8 @@ along with GCC; see the file COPYING3.
> > >  #include "debug.h"
> > >  #include "omp-offload.h"
> > >  #include "print-tree.h"
> > > +#include "attribs.h"
> > > +#include "ipa-utils.h"
> > >  
> > >  
> > >  static void lto_write_tree (struct output_block*, tree, bool);
> > > @@ -109,14 +111,188 @@ destroy_output_block (struct output_bloc
> > >    free (ob);
> > >  }
> > >  
> > > +/* Do same comparsion as check_qualified_type skipping lang part of type
> > > +   and be more permissive about type names: we only care that names are
> > > +   same (for diagnostics) and that ODR names are the same.  */
> > >  
> > > -/* Look up NODE in the type table and write the index for it to OB.  */
> > > +static bool
> > > +types_equal_p (tree t, tree v)
> > > +{
> > > +  if (TYPE_QUALS (t) != TYPE_QUALS (v))
> > > +    return false;
> > > +
> > > +  if (TYPE_NAME (t) != TYPE_NAME (v)
> > > +      && (!TYPE_NAME (t) || !TYPE_NAME (v)
> > > +	  || TREE_CODE (TYPE_NAME (t)) != TYPE_DECL
> > > +	  || TREE_CODE (TYPE_NAME (v)) != TYPE_DECL
> > > +	  || DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (t))
> > > +	     != DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (v))
> > > +	  || DECL_NAME (TYPE_NAME (t)) != DECL_NAME (TYPE_NAME (v))))
> > > +     return false;
> > > +
> > > +  if (TYPE_ALIGN (t) != TYPE_ALIGN (v))
> > > +    return false;
> > > +
> > > +  if (!attribute_list_equal (TYPE_ATTRIBUTES (t),
> > > +			     TYPE_ATTRIBUTES (v)))
> > > +     return false;
> > > +
> > > +  /* Do not replace complete type by incomplete.  */
> > > +  if (COMPLETE_TYPE_P (t) != COMPLETE_TYPE_P (v))
> > > +    return false;
> > > +
> > > +  if (TYPE_ADDR_SPACE (t) != TYPE_ADDR_SPACE (v))
> > > +    return false;
> > > +
> > > +  gcc_assert (TREE_CODE (t) == TREE_CODE (v));
> > > +
> > > +  /* For pointer types and array types we also care about the type they
> > > +     reffer to.  */
> > > +  if (TREE_TYPE (t))
> > > +    return types_equal_p (TREE_TYPE (t), TREE_TYPE (v));
> > > +
> > > +  return true;
> > > +}
> > > +
> > > +/* Search list of type variants and look up first one that looks same.
> > > +   At compile time, this removes duplicates of types created by front-ends.
> > > +   At WPA time we also merge duplicated ODR types.  */
> > > +
> > > +static tree
> > > +first_compatible_type_variant (tree t)
> > > +{
> > > +  tree first = NULL;
> > > +  if (POINTER_TYPE_P (t))
> > > +    {
> > > +      tree t2 = first_compatible_type_variant (TREE_TYPE (t));
> > > +      if (t2 != TREE_TYPE (t))
> > > +	{
> > > +	  if (TREE_CODE (t) == POINTER_TYPE)
> > > +	    first = build_pointer_type_for_mode (t2, TYPE_MODE (t),
> > > +						TYPE_REF_CAN_ALIAS_ALL (t));
> > > +	  else
> > > +	    first = build_reference_type_for_mode (t2, TYPE_MODE (t),
> > > +						TYPE_REF_CAN_ALIAS_ALL (t));
> > > +	}
> > > +      else
> > > +	first = TYPE_MAIN_VARIANT (t);
> > > +      gcc_assert (TYPE_MAIN_VARIANT (first) == first);
> > > +    }
> > > +  else
> > > +    first = prevailing_odr_type (t);
> > > +
> > > +  gcc_checking_assert (gimple_canonical_types_compatible_p (t, first, false));
> > > +
> > > +  for (tree v = first; v; v = TYPE_NEXT_VARIANT (v))
> > > +    if (types_equal_p (t, v))
> > > +      {
> > > +	gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
> > > +	gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v));
> > > +	gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
> > > +        return v;
> > > +      }
> > > +
> > > +  /* Most of the time we will find v itself in the list.  With ODR type merging
> > > +     we however merge across TYPE_MAIN_VARIANTs and in that case we may not
> > > +     have corresponding type in the list.  */
> > > +  tree v = build_variant_type_copy (first);
> > > +  TYPE_READONLY (v) = TYPE_READONLY (t);
> > > +  TYPE_VOLATILE (v) = TYPE_VOLATILE (t);
> > > +  TYPE_ATOMIC (v) = TYPE_ATOMIC (t);
> > > +  TYPE_RESTRICT (v) = TYPE_RESTRICT (t);
> > > +  TYPE_ADDR_SPACE (v) = TYPE_ADDR_SPACE (t);
> > > +  SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
> > > +  TYPE_NAME (v) = TYPE_NAME (t);
> > > +  TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t);
> > > +  TREE_TYPE (v) = TREE_TYPE (t);
> > > +  gcc_checking_assert (types_equal_p (t, v));
> > > +  gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
> > > +  gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v));
> > > +  gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
> > > +  return v;
> > > +}
> > > +
> > > +
> > > +/* Look up NODE in the type table and write the index for it to OB.
> > > +   Eliminate unnecesary type variants.  */
> > >  
> > >  static void
> > >  output_type_ref (struct output_block *ob, tree node)
> > >  {
> > > +  bool existed_p;
> > > +  unsigned int &index
> > > +    = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
> > > +       get_or_insert (node, &existed_p);
> > >    streamer_write_record_start (ob, LTO_type_ref);
> > > -  lto_output_type_ref_index (ob->decl_state, ob->main_stream, node);
> > > +
> > > +  if (existed_p)
> > > +    {
> > > +      streamer_write_uhwi_stream (ob->main_stream, index);
> > > +      return;
> > > +    }
> > > +
> > > +  tree node2 = first_compatible_type_variant (node);
> > > +
> > > +  /* If NODE is first variant, just add it into encoder.  */
> > > +  if (node2 == node)
> > > +    {
> > > +      index = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
> > > +      if (streamer_dump_file)
> > > +	{
> > > +	  print_node_brief (streamer_dump_file,
> > > +			    "    Encoding indexable ", node, 4);
> > > +	  fprintf (streamer_dump_file, "  as %i\n", index);
> > > +	}
> > > +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node);
> > > +      streamer_write_uhwi_stream (ob->main_stream, index);
> > > +      return;
> > > +    }
> > > +
> > > +  index = 0xdead;
> > > +
> > > +  if (streamer_dump_file)
> > > +    {
> > > +      print_node_brief (streamer_dump_file, "    Type ", node, 4);
> > > +      fprintf (streamer_dump_file, "  prevailed by ");
> > > +      print_node_brief (streamer_dump_file, "", node2, 4);
> > > +      fprintf (streamer_dump_file, "\n");
> > > +    }
> > > +
> > > +  /* See if we already assgined one to the first variant.  If that is the case
> > > +     only duplicate the entry in the hashtable so we do not need to call
> > > +     first_compatible_type_variant redundantly.  */
> > > +  unsigned int &index2
> > > +    = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
> > > +	get_or_insert (node2, &existed_p);
> > > +  /* The reference may be changed by hash table expansion.  */
> > > +  unsigned int i = index2;
> > > +
> > > +  if (existed_p)
> > > +    {
> > > +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
> > > +	tree_hash_table->get_or_insert (node, &existed_p) = i;
> > > +      streamer_write_uhwi_stream (ob->main_stream, i);
> > > +      return;
> > > +    }
> > > +  else
> > > +    {
> > > +      index2 = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
> > > +      i = index2;
> > > +      if (streamer_dump_file)
> > > +	{
> > > +	  print_node_brief (streamer_dump_file,
> > > +			    "    Encoding indexable ", node2, 4);
> > > +	  fprintf (streamer_dump_file, "  as %i \n", i);
> > > +	}
> > > +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node2);
> > > +      /* We need to search again to watch hash table resize.  */
> > > +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
> > > +	tree_hash_table->get_or_insert (node, &existed_p) = i;
> > > +      gcc_assert (existed_p);
> > > +      streamer_write_uhwi_stream (ob->main_stream, i);
> > > +    }
> > > +  return;
> > >  }
> > >  
> > >  
> > > @@ -978,12 +1171,15 @@ hash_tree (struct streamer_tree_cache_d
> > >  #define visit(SIBLING) \
> > >    do { \
> > >      unsigned ix; \
> > > -    if (!SIBLING) \
> > > +    tree t2 = SIBLING; \
> > > +    if (t2 && TYPE_P (t2)) \
> > > +      t2 = first_compatible_type_variant (t2); \
> > > +    if (!t2) \
> > >        hstate.add_int (0); \
> > > -    else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \
> > > +    else if (streamer_tree_cache_lookup (cache, t2, &ix)) \
> > >        hstate.add_int (streamer_tree_cache_get_hash (cache, ix)); \
> > >      else if (map) \
> > > -      hstate.add_int (*map->get (SIBLING)); \
> > > +      hstate.add_int (*map->get (t2)); \
> > >      else \
> > >        hstate.add_int (1); \
> > >    } while (0)
> > > @@ -1554,6 +1750,10 @@ DFS::DFS_write_tree (struct output_block
> > >    if (this_ref_p && tree_is_indexable (expr))
> > >      return;
> > >  
> > > +  /* Replace type by its prevailing variant.  */
> > > +  if (TYPE_P (expr))
> > > +    expr = first_compatible_type_variant (expr);
> > > +
> > >    /* Check if we already streamed EXPR.  */
> > >    if (streamer_tree_cache_lookup (ob->writer_cache, expr, NULL))
> > >      return;
> > > @@ -1591,6 +1791,11 @@ lto_output_tree (struct output_block *ob
> > >        return;
> > >      }
> > >  
> > > +  if (TYPE_P (expr))
> > > +    expr = first_compatible_type_variant (expr);
> > > +
> > > +  gcc_checking_assert (!this_ref_p || !tree_is_indexable (expr));
> > > +
> > >    existed_p = streamer_tree_cache_lookup (ob->writer_cache, expr, &ix);
> > >    if (existed_p)
> > >      {
> > > @@ -2334,7 +2539,18 @@ copy_function_or_variable (struct symtab
> > >        gcc_assert (lto_tree_ref_encoder_size (encoder) == 0);
> > >        encoder->trees.reserve_exact (n);
> > >        for (j = 0; j < n; j++)
> > > -	encoder->trees.safe_push ((*trees)[j]);
> > > +	{
> > > +	  tree t = (*trees)[j];
> > > +	  if (TYPE_P (t))
> > > +	    t = first_compatible_type_variant (t);
> > > +	  if (streamer_dump_file)
> > > +	    {
> > > +	      print_node_brief (streamer_dump_file,
> > > +				"  Adding reference to ", t, 4);
> > > +	      fprintf (streamer_dump_file, "  as %i \n", (int)j);
> > > +	    }
> > > +	  encoder->trees.safe_push (t);
> > > +	}
> > >      }
> > >  
> > >    lto_free_raw_section_data (file_data, LTO_section_function_body, name,
> > > @@ -2507,6 +2723,8 @@ write_global_stream (struct output_block
> > >  	  print_node_brief (streamer_dump_file, "", t, 4);
> > >            fprintf (streamer_dump_file, "\n");
> > >  	}
> > > +      gcc_checking_assert (!TYPE_P (t)
> > > +			   || t == first_compatible_type_variant (t));
> > >        if (!streamer_tree_cache_lookup (ob->writer_cache, t, NULL))
> > >  	stream_write_tree (ob, t, false);
> > >      }
> > > @@ -2537,6 +2755,8 @@ write_global_references (struct output_b
> > >        t = lto_tree_ref_encoder_get_tree (encoder, index);
> > >        streamer_tree_cache_lookup (ob->writer_cache, t, &slot_num);
> > >        gcc_assert (slot_num != (unsigned)-1);
> > > +      gcc_checking_assert (!TYPE_P (t)
> > > +			   || t == first_compatible_type_variant (t));
> > >        data[index + 1] = slot_num;
> > >      }
> > >  
> > > Index: tree-ssa-alias.c
> > > ===================================================================
> > > --- tree-ssa-alias.c	(revision 264689)
> > > +++ tree-ssa-alias.c	(working copy)
> > > @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.
> > >  #include "tree-dfa.h"
> > >  #include "ipa-reference.h"
> > >  #include "varasm.h"
> > > +#include "ipa-utils.h"
> > >  
> > >  /* Broad overview of how alias analysis on gimple works:
> > >  
> > > @@ -955,6 +956,12 @@ nonoverlapping_component_refs_of_decl_p
> > >  	     ??? Bitfields can overlap at RTL level so punt on them.  */
> > >  	  if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
> > >  	    return false;
> > > +	  /* In LTO we may end up merging different variants of ODR types
> > > +	     but keep corresponding field decls unmerged.  */
> > > +	  if (in_lto_p
> > > +	      && type_with_linkage_p (type1)
> > > +	      && DECL_NAME (field1) == DECL_NAME (field2))
> > > +	    return false;
> > >  	  return true;
> > >  	}
> > >      }
> > > @@ -1049,7 +1056,9 @@ nonoverlapping_component_refs_p (const_t
> > >  	{
> > >  	  /* We're left with accessing different fields of a structure,
> > >  	     no possible overlap.  */
> > > -	  if (fieldx != fieldy)
> > > +	  if (fieldx != fieldy
> > > +	      && (in_lto_p || !type_with_linkage_p (typex)
> > > +		  || DECL_NAME (fieldx) == DECL_NAME (fieldy)))
> > >  	    {
> > >  	      /* A field and its representative need to be considered the
> > >  		 same.  */
> > > Index: tree.c
> > > ===================================================================
> > > --- tree.c	(revision 264689)
> > > +++ tree.c	(working copy)
> > > @@ -5845,7 +5861,12 @@ free_lang_data_in_cgraph (void)
> > >  
> > >    /* Traverse every type found freeing its language data.  */
> > >    FOR_EACH_VEC_ELT (fld.types, i, t)
> > > -    free_lang_data_in_type (t);
> > > +    {
> > > +      free_lang_data_in_type (t);
> > > +      while (TYPE_NEXT_VARIANT (t)
> > > +	     && !fld.pset.contains (TYPE_NEXT_VARIANT (t)))
> > > +	TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (TYPE_NEXT_VARIANT (t));
> > > +    }
> > >    if (flag_checking)
> > >      {
> > >        FOR_EACH_VEC_ELT (fld.types, i, t)
> > > 
> > > 
> > 
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> 
>
Richard Biener Oct. 1, 2018, 11:05 a.m. UTC | #4
On Mon, 1 Oct 2018, Richard Biener wrote:

> On Mon, 1 Oct 2018, Jan Hubicka wrote:
> 
> > > 
> > > The ODR savings really look good but as you say the implementation is
> > > somewhat "tricky".
> > > 
> > > I'd like to see the type-variant done separately (of course) and
> > > also differently.  This is because when enabling 
> > > free-lang-data by default we should be able to get benefits for
> > > non-LTO compilations as well.  Specifically I'd like us to
> > > (in free-lang-data) "canonicalize" existing variants applying the
> > > rules you derived for the lazier compare.  At the point you then
> > > drop non-fld-walked variants we could weed out the resulting
> > > duplicates, keeping only a prevailing one in the list and (ab-)using
> > > GC to fixup references.  Like with ggc_register_fixup (void *from, void 
> > > *to) which would when following edges at mark time, adjust them
> > > according to this map (I'm not sure if we'd want to do it manually
> > > and record a vector of possibly affected TREE_TYPE uses during the fld
> > > walk).  We could also (easily?) optimize the streaming itself by
> > > streaming only differences to the main variant of a type (but then
> > > we could change our tree data structure to make that trivial).
> > 
> > I had patch for streaming the differences only some time ago. My recollection
> > is that we got it into tree and then reverted as there was some issues with
> > cycles.  I can look into this again.
> > 
> > I would really like to avoid using ggc for IL rewriting. One reason is
> > that eventually we would like to see GGC to go and thus we should not
> > wire it more into the essential parts of the compiler and other is that
> > GGC is often not run. 
> 
> Heh.
> 
> > Saving type variants seems to have relatively minor effect on the IL size, so
> > we need to have solution that is not too expensive to be justified.  I suppose
> > free lang data is sort of visiting all the relevant datastructures for
> > middle-end so we could do rewriting there.  It would also make sense to
> > canonicalize types more based on knowledge whether they are used for memory
> > access (i.e. for non-accesses go to main variant and perhaps invent something
> > like main variant WRT useless conversions).
> 
> I guess for that it would be better to put things like memory access
> alignment into the memory-references rather than using TYPE_ALIGN.
> Likewise for other things affecting semantics (TYPE_REF_CAN_ALIAS_ALL).
> 
> Being able to simply drop to TYPE_MAIN_VARIANT would be very appealing...
> (and simplify things).  The hard thing is to figure out where we look
> into those variant differences during late compilation...
> 
> Type variants was always my first "easy" middle-end type related cleanup.
> And for non-LTO ODR merging probably gets us nothing (the FE should do
> the merging).
> 
> > We could ggc_free the old variant and then ggc will ICE on dangling pointers.
> > I can give it a try with my patch to prune the variant tree.
> 
> Yes, definitely do that.  I also _really_ like us to do FLD 
> unconditionally - maybe we can start by guarding individual pieces with
> a for_lto flag we pass down.  But esp. disabling langhooks and stuff like
> the variant type purging would be good to get enabled for non-LTO
> to not have that modes diverge more and more...
> 
> > > 
> > > I wonder how much "ODR" merging we'd get for free when we canonicalize
> > > types some more - that is, how do ODR equal but tree unequal types
> > > usually differ?  I guess it might be most of the time fields with
> > > pointer to incomplete vs. complete type?
> > 
> > Most of the time it is complete vs incomplete pointer type.
> > I had statistics of type duplicates before, but they did not look as bad
> > as reality. The reason is that smaller types tends to be merged while
> > bigger types tends to be duplicated many times.  In GCC this is typically
> > RTL, gimple and derived types which are really many.
> 
> I see.  So one possible canonicalization is to make _all_
> pointer-typed FIELD_DECLs point to incomplete variants since the memory
> accesses should already have the "proper" access types.  Can you
> get statistics on that?  Not sure how to get an "incomplete" type
> though (iff we can simply copy the type and NULL TYPE_FIELDs and
> TYPE_SIZE and friends) - again I'd do that at FLD time.

So sth like

 tp = build_distinct_type_copy (t);
 TYPE_FIELDS (tp) = NULL_TREE;
 TYPE_SIZE (tp) = NULL_TREE;
 TYPE_SIZE_UNIT (tp) = NULL_TREE;
 tp = type_hash_canon (tp);

of course we "leak" the original type in used COMPONENT_REFs
(may also cause some verifier ICEs here if the types mismatch that
of the FIELD_DECLs) and in aggregate copies, etc.  But I wonder
how much "unused" unnecessary types we have.  That is, I'd paper
over the ICEs this causes and not fixup the IL stream at first for
example.

Richard.

> Richard.
> 
> > Honza
> > > 
> > > Thanks,
> > > Richard.
> > > 
> > > 
> > > > Honza
> > > > 
> > > > Index: ipa-devirt.c
> > > > ===================================================================
> > > > --- ipa-devirt.c	(revision 264689)
> > > > +++ ipa-devirt.c	(working copy)
> > > > @@ -2111,6 +2111,29 @@ get_odr_type (tree type, bool insert)
> > > >    return val;
> > > >  }
> > > >  
> > > > +/* Return the main variant of the odr type.  This is used for straming out
> > > > +   to reduce number of type duplicates hitting the WPA->LTRANS streams.
> > > > +   Do not do so when ODR violation was detected since the type may be
> > > > +   structurally different then.  */
> > > > +
> > > > +tree
> > > > +prevailing_odr_type (tree t)
> > > > +{
> > > > +  t = TYPE_MAIN_VARIANT (t);
> > > > +  /* In need_assembler_name_p we also mangle assembler names of INTEGER_TYPE.
> > > > +     We can not merge these because this does not honnor precision and
> > > > +     signedness.  */
> > > > +  if (!type_with_linkage_p (t)
> > > > +      || type_in_anonymous_namespace_p (t)
> > > > +      || TREE_CODE (t) == INTEGER_TYPE
> > > > +      || !COMPLETE_TYPE_P (t))
> > > > +    return t;
> > > > +  odr_type ot = get_odr_type (t, true);
> > > > +  if (!ot || !ot->odr_violated)
> > > > +    return ot->type;
> > > > +  return t;
> > > > +}
> > > > +
> > > >  /* Add TYPE od ODR type hash.  */
> > > >  
> > > >  void
> > > > Index: ipa-utils.h
> > > > ===================================================================
> > > > --- ipa-utils.h	(revision 264689)
> > > > +++ ipa-utils.h	(working copy)
> > > > @@ -90,6 +90,7 @@ void warn_types_mismatch (tree t1, tree
> > > >  			  location_t loc2 = UNKNOWN_LOCATION);
> > > >  bool odr_or_derived_type_p (const_tree t);
> > > >  bool odr_types_equivalent_p (tree type1, tree type2);
> > > > +tree prevailing_odr_type (tree t);
> > > >  
> > > >  /* Return vector containing possible targets of polymorphic call E.
> > > >     If COMPLETEP is non-NULL, store true if the list is complete. 
> > > > Index: lto/lto.c
> > > > ===================================================================
> > > > --- lto/lto.c	(revision 264689)
> > > > +++ lto/lto.c	(working copy)
> > > > @@ -485,6 +485,8 @@ gimple_register_canonical_type_1 (tree t
> > > >  static void
> > > >  gimple_register_canonical_type (tree t)
> > > >  {
> > > > +  if (flag_checking)
> > > > +    verify_type (t);
> > > >    if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t)
> > > >        || !canonical_type_used_p (t))
> > > >      return;
> > > > Index: lto-streamer-out.c
> > > > ===================================================================
> > > > --- lto-streamer-out.c	(revision 264689)
> > > > +++ lto-streamer-out.c	(working copy)
> > > > @@ -43,6 +43,8 @@ along with GCC; see the file COPYING3.
> > > >  #include "debug.h"
> > > >  #include "omp-offload.h"
> > > >  #include "print-tree.h"
> > > > +#include "attribs.h"
> > > > +#include "ipa-utils.h"
> > > >  
> > > >  
> > > >  static void lto_write_tree (struct output_block*, tree, bool);
> > > > @@ -109,14 +111,188 @@ destroy_output_block (struct output_bloc
> > > >    free (ob);
> > > >  }
> > > >  
> > > > +/* Do same comparsion as check_qualified_type skipping lang part of type
> > > > +   and be more permissive about type names: we only care that names are
> > > > +   same (for diagnostics) and that ODR names are the same.  */
> > > >  
> > > > -/* Look up NODE in the type table and write the index for it to OB.  */
> > > > +static bool
> > > > +types_equal_p (tree t, tree v)
> > > > +{
> > > > +  if (TYPE_QUALS (t) != TYPE_QUALS (v))
> > > > +    return false;
> > > > +
> > > > +  if (TYPE_NAME (t) != TYPE_NAME (v)
> > > > +      && (!TYPE_NAME (t) || !TYPE_NAME (v)
> > > > +	  || TREE_CODE (TYPE_NAME (t)) != TYPE_DECL
> > > > +	  || TREE_CODE (TYPE_NAME (v)) != TYPE_DECL
> > > > +	  || DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (t))
> > > > +	     != DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (v))
> > > > +	  || DECL_NAME (TYPE_NAME (t)) != DECL_NAME (TYPE_NAME (v))))
> > > > +     return false;
> > > > +
> > > > +  if (TYPE_ALIGN (t) != TYPE_ALIGN (v))
> > > > +    return false;
> > > > +
> > > > +  if (!attribute_list_equal (TYPE_ATTRIBUTES (t),
> > > > +			     TYPE_ATTRIBUTES (v)))
> > > > +     return false;
> > > > +
> > > > +  /* Do not replace complete type by incomplete.  */
> > > > +  if (COMPLETE_TYPE_P (t) != COMPLETE_TYPE_P (v))
> > > > +    return false;
> > > > +
> > > > +  if (TYPE_ADDR_SPACE (t) != TYPE_ADDR_SPACE (v))
> > > > +    return false;
> > > > +
> > > > +  gcc_assert (TREE_CODE (t) == TREE_CODE (v));
> > > > +
> > > > +  /* For pointer types and array types we also care about the type they
> > > > +     reffer to.  */
> > > > +  if (TREE_TYPE (t))
> > > > +    return types_equal_p (TREE_TYPE (t), TREE_TYPE (v));
> > > > +
> > > > +  return true;
> > > > +}
> > > > +
> > > > +/* Search list of type variants and look up first one that looks same.
> > > > +   At compile time, this removes duplicates of types created by front-ends.
> > > > +   At WPA time we also merge duplicated ODR types.  */
> > > > +
> > > > +static tree
> > > > +first_compatible_type_variant (tree t)
> > > > +{
> > > > +  tree first = NULL;
> > > > +  if (POINTER_TYPE_P (t))
> > > > +    {
> > > > +      tree t2 = first_compatible_type_variant (TREE_TYPE (t));
> > > > +      if (t2 != TREE_TYPE (t))
> > > > +	{
> > > > +	  if (TREE_CODE (t) == POINTER_TYPE)
> > > > +	    first = build_pointer_type_for_mode (t2, TYPE_MODE (t),
> > > > +						TYPE_REF_CAN_ALIAS_ALL (t));
> > > > +	  else
> > > > +	    first = build_reference_type_for_mode (t2, TYPE_MODE (t),
> > > > +						TYPE_REF_CAN_ALIAS_ALL (t));
> > > > +	}
> > > > +      else
> > > > +	first = TYPE_MAIN_VARIANT (t);
> > > > +      gcc_assert (TYPE_MAIN_VARIANT (first) == first);
> > > > +    }
> > > > +  else
> > > > +    first = prevailing_odr_type (t);
> > > > +
> > > > +  gcc_checking_assert (gimple_canonical_types_compatible_p (t, first, false));
> > > > +
> > > > +  for (tree v = first; v; v = TYPE_NEXT_VARIANT (v))
> > > > +    if (types_equal_p (t, v))
> > > > +      {
> > > > +	gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
> > > > +	gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v));
> > > > +	gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
> > > > +        return v;
> > > > +      }
> > > > +
> > > > +  /* Most of the time we will find v itself in the list.  With ODR type merging
> > > > +     we however merge across TYPE_MAIN_VARIANTs and in that case we may not
> > > > +     have corresponding type in the list.  */
> > > > +  tree v = build_variant_type_copy (first);
> > > > +  TYPE_READONLY (v) = TYPE_READONLY (t);
> > > > +  TYPE_VOLATILE (v) = TYPE_VOLATILE (t);
> > > > +  TYPE_ATOMIC (v) = TYPE_ATOMIC (t);
> > > > +  TYPE_RESTRICT (v) = TYPE_RESTRICT (t);
> > > > +  TYPE_ADDR_SPACE (v) = TYPE_ADDR_SPACE (t);
> > > > +  SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
> > > > +  TYPE_NAME (v) = TYPE_NAME (t);
> > > > +  TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t);
> > > > +  TREE_TYPE (v) = TREE_TYPE (t);
> > > > +  gcc_checking_assert (types_equal_p (t, v));
> > > > +  gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
> > > > +  gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v));
> > > > +  gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
> > > > +  return v;
> > > > +}
> > > > +
> > > > +
> > > > +/* Look up NODE in the type table and write the index for it to OB.
> > > > +   Eliminate unnecesary type variants.  */
> > > >  
> > > >  static void
> > > >  output_type_ref (struct output_block *ob, tree node)
> > > >  {
> > > > +  bool existed_p;
> > > > +  unsigned int &index
> > > > +    = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
> > > > +       get_or_insert (node, &existed_p);
> > > >    streamer_write_record_start (ob, LTO_type_ref);
> > > > -  lto_output_type_ref_index (ob->decl_state, ob->main_stream, node);
> > > > +
> > > > +  if (existed_p)
> > > > +    {
> > > > +      streamer_write_uhwi_stream (ob->main_stream, index);
> > > > +      return;
> > > > +    }
> > > > +
> > > > +  tree node2 = first_compatible_type_variant (node);
> > > > +
> > > > +  /* If NODE is first variant, just add it into encoder.  */
> > > > +  if (node2 == node)
> > > > +    {
> > > > +      index = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
> > > > +      if (streamer_dump_file)
> > > > +	{
> > > > +	  print_node_brief (streamer_dump_file,
> > > > +			    "    Encoding indexable ", node, 4);
> > > > +	  fprintf (streamer_dump_file, "  as %i\n", index);
> > > > +	}
> > > > +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node);
> > > > +      streamer_write_uhwi_stream (ob->main_stream, index);
> > > > +      return;
> > > > +    }
> > > > +
> > > > +  index = 0xdead;
> > > > +
> > > > +  if (streamer_dump_file)
> > > > +    {
> > > > +      print_node_brief (streamer_dump_file, "    Type ", node, 4);
> > > > +      fprintf (streamer_dump_file, "  prevailed by ");
> > > > +      print_node_brief (streamer_dump_file, "", node2, 4);
> > > > +      fprintf (streamer_dump_file, "\n");
> > > > +    }
> > > > +
> > > > +  /* See if we already assgined one to the first variant.  If that is the case
> > > > +     only duplicate the entry in the hashtable so we do not need to call
> > > > +     first_compatible_type_variant redundantly.  */
> > > > +  unsigned int &index2
> > > > +    = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
> > > > +	get_or_insert (node2, &existed_p);
> > > > +  /* The reference may be changed by hash table expansion.  */
> > > > +  unsigned int i = index2;
> > > > +
> > > > +  if (existed_p)
> > > > +    {
> > > > +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
> > > > +	tree_hash_table->get_or_insert (node, &existed_p) = i;
> > > > +      streamer_write_uhwi_stream (ob->main_stream, i);
> > > > +      return;
> > > > +    }
> > > > +  else
> > > > +    {
> > > > +      index2 = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
> > > > +      i = index2;
> > > > +      if (streamer_dump_file)
> > > > +	{
> > > > +	  print_node_brief (streamer_dump_file,
> > > > +			    "    Encoding indexable ", node2, 4);
> > > > +	  fprintf (streamer_dump_file, "  as %i \n", i);
> > > > +	}
> > > > +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node2);
> > > > +      /* We need to search again to watch hash table resize.  */
> > > > +      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
> > > > +	tree_hash_table->get_or_insert (node, &existed_p) = i;
> > > > +      gcc_assert (existed_p);
> > > > +      streamer_write_uhwi_stream (ob->main_stream, i);
> > > > +    }
> > > > +  return;
> > > >  }
> > > >  
> > > >  
> > > > @@ -978,12 +1171,15 @@ hash_tree (struct streamer_tree_cache_d
> > > >  #define visit(SIBLING) \
> > > >    do { \
> > > >      unsigned ix; \
> > > > -    if (!SIBLING) \
> > > > +    tree t2 = SIBLING; \
> > > > +    if (t2 && TYPE_P (t2)) \
> > > > +      t2 = first_compatible_type_variant (t2); \
> > > > +    if (!t2) \
> > > >        hstate.add_int (0); \
> > > > -    else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \
> > > > +    else if (streamer_tree_cache_lookup (cache, t2, &ix)) \
> > > >        hstate.add_int (streamer_tree_cache_get_hash (cache, ix)); \
> > > >      else if (map) \
> > > > -      hstate.add_int (*map->get (SIBLING)); \
> > > > +      hstate.add_int (*map->get (t2)); \
> > > >      else \
> > > >        hstate.add_int (1); \
> > > >    } while (0)
> > > > @@ -1554,6 +1750,10 @@ DFS::DFS_write_tree (struct output_block
> > > >    if (this_ref_p && tree_is_indexable (expr))
> > > >      return;
> > > >  
> > > > +  /* Replace type by its prevailing variant.  */
> > > > +  if (TYPE_P (expr))
> > > > +    expr = first_compatible_type_variant (expr);
> > > > +
> > > >    /* Check if we already streamed EXPR.  */
> > > >    if (streamer_tree_cache_lookup (ob->writer_cache, expr, NULL))
> > > >      return;
> > > > @@ -1591,6 +1791,11 @@ lto_output_tree (struct output_block *ob
> > > >        return;
> > > >      }
> > > >  
> > > > +  if (TYPE_P (expr))
> > > > +    expr = first_compatible_type_variant (expr);
> > > > +
> > > > +  gcc_checking_assert (!this_ref_p || !tree_is_indexable (expr));
> > > > +
> > > >    existed_p = streamer_tree_cache_lookup (ob->writer_cache, expr, &ix);
> > > >    if (existed_p)
> > > >      {
> > > > @@ -2334,7 +2539,18 @@ copy_function_or_variable (struct symtab
> > > >        gcc_assert (lto_tree_ref_encoder_size (encoder) == 0);
> > > >        encoder->trees.reserve_exact (n);
> > > >        for (j = 0; j < n; j++)
> > > > -	encoder->trees.safe_push ((*trees)[j]);
> > > > +	{
> > > > +	  tree t = (*trees)[j];
> > > > +	  if (TYPE_P (t))
> > > > +	    t = first_compatible_type_variant (t);
> > > > +	  if (streamer_dump_file)
> > > > +	    {
> > > > +	      print_node_brief (streamer_dump_file,
> > > > +				"  Adding reference to ", t, 4);
> > > > +	      fprintf (streamer_dump_file, "  as %i \n", (int)j);
> > > > +	    }
> > > > +	  encoder->trees.safe_push (t);
> > > > +	}
> > > >      }
> > > >  
> > > >    lto_free_raw_section_data (file_data, LTO_section_function_body, name,
> > > > @@ -2507,6 +2723,8 @@ write_global_stream (struct output_block
> > > >  	  print_node_brief (streamer_dump_file, "", t, 4);
> > > >            fprintf (streamer_dump_file, "\n");
> > > >  	}
> > > > +      gcc_checking_assert (!TYPE_P (t)
> > > > +			   || t == first_compatible_type_variant (t));
> > > >        if (!streamer_tree_cache_lookup (ob->writer_cache, t, NULL))
> > > >  	stream_write_tree (ob, t, false);
> > > >      }
> > > > @@ -2537,6 +2755,8 @@ write_global_references (struct output_b
> > > >        t = lto_tree_ref_encoder_get_tree (encoder, index);
> > > >        streamer_tree_cache_lookup (ob->writer_cache, t, &slot_num);
> > > >        gcc_assert (slot_num != (unsigned)-1);
> > > > +      gcc_checking_assert (!TYPE_P (t)
> > > > +			   || t == first_compatible_type_variant (t));
> > > >        data[index + 1] = slot_num;
> > > >      }
> > > >  
> > > > Index: tree-ssa-alias.c
> > > > ===================================================================
> > > > --- tree-ssa-alias.c	(revision 264689)
> > > > +++ tree-ssa-alias.c	(working copy)
> > > > @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.
> > > >  #include "tree-dfa.h"
> > > >  #include "ipa-reference.h"
> > > >  #include "varasm.h"
> > > > +#include "ipa-utils.h"
> > > >  
> > > >  /* Broad overview of how alias analysis on gimple works:
> > > >  
> > > > @@ -955,6 +956,12 @@ nonoverlapping_component_refs_of_decl_p
> > > >  	     ??? Bitfields can overlap at RTL level so punt on them.  */
> > > >  	  if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
> > > >  	    return false;
> > > > +	  /* In LTO we may end up merging different variants of ODR types
> > > > +	     but keep corresponding field decls unmerged.  */
> > > > +	  if (in_lto_p
> > > > +	      && type_with_linkage_p (type1)
> > > > +	      && DECL_NAME (field1) == DECL_NAME (field2))
> > > > +	    return false;
> > > >  	  return true;
> > > >  	}
> > > >      }
> > > > @@ -1049,7 +1056,9 @@ nonoverlapping_component_refs_p (const_t
> > > >  	{
> > > >  	  /* We're left with accessing different fields of a structure,
> > > >  	     no possible overlap.  */
> > > > -	  if (fieldx != fieldy)
> > > > +	  if (fieldx != fieldy
> > > > +	      && (in_lto_p || !type_with_linkage_p (typex)
> > > > +		  || DECL_NAME (fieldx) == DECL_NAME (fieldy)))
> > > >  	    {
> > > >  	      /* A field and its representative need to be considered the
> > > >  		 same.  */
> > > > Index: tree.c
> > > > ===================================================================
> > > > --- tree.c	(revision 264689)
> > > > +++ tree.c	(working copy)
> > > > @@ -5845,7 +5861,12 @@ free_lang_data_in_cgraph (void)
> > > >  
> > > >    /* Traverse every type found freeing its language data.  */
> > > >    FOR_EACH_VEC_ELT (fld.types, i, t)
> > > > -    free_lang_data_in_type (t);
> > > > +    {
> > > > +      free_lang_data_in_type (t);
> > > > +      while (TYPE_NEXT_VARIANT (t)
> > > > +	     && !fld.pset.contains (TYPE_NEXT_VARIANT (t)))
> > > > +	TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (TYPE_NEXT_VARIANT (t));
> > > > +    }
> > > >    if (flag_checking)
> > > >      {
> > > >        FOR_EACH_VEC_ELT (fld.types, i, t)
> > > > 
> > > > 
> > > 
> > > -- 
> > > Richard Biener <rguenther@suse.de>
> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> > 
> > 
> 
>
Jan Hubicka Oct. 1, 2018, 1:27 p.m. UTC | #5
> > I see.  So one possible canonicalization is to make _all_
> > pointer-typed FIELD_DECLs point to incomplete variants since the memory
> > accesses should already have the "proper" access types.  Can you
> > get statistics on that?  Not sure how to get an "incomplete" type
> > though (iff we can simply copy the type and NULL TYPE_FIELDs and
> > TYPE_SIZE and friends) - again I'd do that at FLD time.
> 
> So sth like
> 
>  tp = build_distinct_type_copy (t);
>  TYPE_FIELDS (tp) = NULL_TREE;
>  TYPE_SIZE (tp) = NULL_TREE;
>  TYPE_SIZE_UNIT (tp) = NULL_TREE;
>  tp = type_hash_canon (tp);
> 
> of course we "leak" the original type in used COMPONENT_REFs
> (may also cause some verifier ICEs here if the types mismatch that
> of the FIELD_DECLs) and in aggregate copies, etc.  But I wonder
> how much "unused" unnecessary types we have.  That is, I'd paper
> over the ICEs this causes and not fixup the IL stream at first for
> example.

I had patch to play with this as well, let me see if I can revive it.
One problem here is that we will lose info about ODR violations that happens
through pointers.

Honza
Richard Biener Oct. 1, 2018, 1:32 p.m. UTC | #6
On Mon, 1 Oct 2018, Jan Hubicka wrote:

> > > I see.  So one possible canonicalization is to make _all_
> > > pointer-typed FIELD_DECLs point to incomplete variants since the memory
> > > accesses should already have the "proper" access types.  Can you
> > > get statistics on that?  Not sure how to get an "incomplete" type
> > > though (iff we can simply copy the type and NULL TYPE_FIELDs and
> > > TYPE_SIZE and friends) - again I'd do that at FLD time.
> > 
> > So sth like
> > 
> >  tp = build_distinct_type_copy (t);
> >  TYPE_FIELDS (tp) = NULL_TREE;
> >  TYPE_SIZE (tp) = NULL_TREE;
> >  TYPE_SIZE_UNIT (tp) = NULL_TREE;
> >  tp = type_hash_canon (tp);
> > 
> > of course we "leak" the original type in used COMPONENT_REFs
> > (may also cause some verifier ICEs here if the types mismatch that
> > of the FIELD_DECLs) and in aggregate copies, etc.  But I wonder
> > how much "unused" unnecessary types we have.  That is, I'd paper
> > over the ICEs this causes and not fixup the IL stream at first for
> > example.
> 
> I had patch to play with this as well, let me see if I can revive it.
> One problem here is that we will lose info about ODR violations that happens
> through pointers.

How so, if we keep the mangled name of the pointed-to types?

Richard.

> Honza
> 
>
Jan Hubicka Oct. 1, 2018, 1:39 p.m. UTC | #7
> On Mon, 1 Oct 2018, Jan Hubicka wrote:
> 
> > > > I see.  So one possible canonicalization is to make _all_
> > > > pointer-typed FIELD_DECLs point to incomplete variants since the memory
> > > > accesses should already have the "proper" access types.  Can you
> > > > get statistics on that?  Not sure how to get an "incomplete" type
> > > > though (iff we can simply copy the type and NULL TYPE_FIELDs and
> > > > TYPE_SIZE and friends) - again I'd do that at FLD time.
> > > 
> > > So sth like
> > > 
> > >  tp = build_distinct_type_copy (t);
> > >  TYPE_FIELDS (tp) = NULL_TREE;
> > >  TYPE_SIZE (tp) = NULL_TREE;
> > >  TYPE_SIZE_UNIT (tp) = NULL_TREE;
> > >  tp = type_hash_canon (tp);
> > > 
> > > of course we "leak" the original type in used COMPONENT_REFs
> > > (may also cause some verifier ICEs here if the types mismatch that
> > > of the FIELD_DECLs) and in aggregate copies, etc.  But I wonder
> > > how much "unused" unnecessary types we have.  That is, I'd paper
> > > over the ICEs this causes and not fixup the IL stream at first for
> > > example.
> > 
> > I had patch to play with this as well, let me see if I can revive it.
> > One problem here is that we will lose info about ODR violations that happens
> > through pointers.
> 
> How so, if we keep the mangled name of the pointed-to types?

If you have "struct a" which violates ODR rule between units foo and
bar, then you need to make difference between struct a *ptr defined in
foo and one in bar. If tree merging merges them, we lose this info.
It is true that we can ten just declare any structure containing "struct a*"
as ODR violating that is probably good enough.

honza
> 
> Richard.
> 
> > Honza
> > 
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
diff mbox series

Patch

Index: ipa-devirt.c
===================================================================
--- ipa-devirt.c	(revision 264689)
+++ ipa-devirt.c	(working copy)
@@ -2111,6 +2111,29 @@  get_odr_type (tree type, bool insert)
   return val;
 }
 
+/* Return the main variant of the odr type.  This is used for straming out
+   to reduce number of type duplicates hitting the WPA->LTRANS streams.
+   Do not do so when ODR violation was detected since the type may be
+   structurally different then.  */
+
+tree
+prevailing_odr_type (tree t)
+{
+  t = TYPE_MAIN_VARIANT (t);
+  /* In need_assembler_name_p we also mangle assembler names of INTEGER_TYPE.
+     We can not merge these because this does not honnor precision and
+     signedness.  */
+  if (!type_with_linkage_p (t)
+      || type_in_anonymous_namespace_p (t)
+      || TREE_CODE (t) == INTEGER_TYPE
+      || !COMPLETE_TYPE_P (t))
+    return t;
+  odr_type ot = get_odr_type (t, true);
+  if (!ot || !ot->odr_violated)
+    return ot->type;
+  return t;
+}
+
 /* Add TYPE od ODR type hash.  */
 
 void
Index: ipa-utils.h
===================================================================
--- ipa-utils.h	(revision 264689)
+++ ipa-utils.h	(working copy)
@@ -90,6 +90,7 @@  void warn_types_mismatch (tree t1, tree
 			  location_t loc2 = UNKNOWN_LOCATION);
 bool odr_or_derived_type_p (const_tree t);
 bool odr_types_equivalent_p (tree type1, tree type2);
+tree prevailing_odr_type (tree t);
 
 /* Return vector containing possible targets of polymorphic call E.
    If COMPLETEP is non-NULL, store true if the list is complete. 
Index: lto/lto.c
===================================================================
--- lto/lto.c	(revision 264689)
+++ lto/lto.c	(working copy)
@@ -485,6 +485,8 @@  gimple_register_canonical_type_1 (tree t
 static void
 gimple_register_canonical_type (tree t)
 {
+  if (flag_checking)
+    verify_type (t);
   if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t)
       || !canonical_type_used_p (t))
     return;
Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c	(revision 264689)
+++ lto-streamer-out.c	(working copy)
@@ -43,6 +43,8 @@  along with GCC; see the file COPYING3.
 #include "debug.h"
 #include "omp-offload.h"
 #include "print-tree.h"
+#include "attribs.h"
+#include "ipa-utils.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -109,14 +111,188 @@  destroy_output_block (struct output_bloc
   free (ob);
 }
 
+/* Do same comparsion as check_qualified_type skipping lang part of type
+   and be more permissive about type names: we only care that names are
+   same (for diagnostics) and that ODR names are the same.  */
 
-/* Look up NODE in the type table and write the index for it to OB.  */
+static bool
+types_equal_p (tree t, tree v)
+{
+  if (TYPE_QUALS (t) != TYPE_QUALS (v))
+    return false;
+
+  if (TYPE_NAME (t) != TYPE_NAME (v)
+      && (!TYPE_NAME (t) || !TYPE_NAME (v)
+	  || TREE_CODE (TYPE_NAME (t)) != TYPE_DECL
+	  || TREE_CODE (TYPE_NAME (v)) != TYPE_DECL
+	  || DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (t))
+	     != DECL_ASSEMBLER_NAME_RAW (TYPE_NAME (v))
+	  || DECL_NAME (TYPE_NAME (t)) != DECL_NAME (TYPE_NAME (v))))
+     return false;
+
+  if (TYPE_ALIGN (t) != TYPE_ALIGN (v))
+    return false;
+
+  if (!attribute_list_equal (TYPE_ATTRIBUTES (t),
+			     TYPE_ATTRIBUTES (v)))
+     return false;
+
+  /* Do not replace complete type by incomplete.  */
+  if (COMPLETE_TYPE_P (t) != COMPLETE_TYPE_P (v))
+    return false;
+
+  if (TYPE_ADDR_SPACE (t) != TYPE_ADDR_SPACE (v))
+    return false;
+
+  gcc_assert (TREE_CODE (t) == TREE_CODE (v));
+
+  /* For pointer types and array types we also care about the type they
+     reffer to.  */
+  if (TREE_TYPE (t))
+    return types_equal_p (TREE_TYPE (t), TREE_TYPE (v));
+
+  return true;
+}
+
+/* Search list of type variants and look up first one that looks same.
+   At compile time, this removes duplicates of types created by front-ends.
+   At WPA time we also merge duplicated ODR types.  */
+
+static tree
+first_compatible_type_variant (tree t)
+{
+  tree first = NULL;
+  if (POINTER_TYPE_P (t))
+    {
+      tree t2 = first_compatible_type_variant (TREE_TYPE (t));
+      if (t2 != TREE_TYPE (t))
+	{
+	  if (TREE_CODE (t) == POINTER_TYPE)
+	    first = build_pointer_type_for_mode (t2, TYPE_MODE (t),
+						TYPE_REF_CAN_ALIAS_ALL (t));
+	  else
+	    first = build_reference_type_for_mode (t2, TYPE_MODE (t),
+						TYPE_REF_CAN_ALIAS_ALL (t));
+	}
+      else
+	first = TYPE_MAIN_VARIANT (t);
+      gcc_assert (TYPE_MAIN_VARIANT (first) == first);
+    }
+  else
+    first = prevailing_odr_type (t);
+
+  gcc_checking_assert (gimple_canonical_types_compatible_p (t, first, false));
+
+  for (tree v = first; v; v = TYPE_NEXT_VARIANT (v))
+    if (types_equal_p (t, v))
+      {
+	gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
+	gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v));
+	gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
+        return v;
+      }
+
+  /* Most of the time we will find v itself in the list.  With ODR type merging
+     we however merge across TYPE_MAIN_VARIANTs and in that case we may not
+     have corresponding type in the list.  */
+  tree v = build_variant_type_copy (first);
+  TYPE_READONLY (v) = TYPE_READONLY (t);
+  TYPE_VOLATILE (v) = TYPE_VOLATILE (t);
+  TYPE_ATOMIC (v) = TYPE_ATOMIC (t);
+  TYPE_RESTRICT (v) = TYPE_RESTRICT (t);
+  TYPE_ADDR_SPACE (v) = TYPE_ADDR_SPACE (t);
+  SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
+  TYPE_NAME (v) = TYPE_NAME (t);
+  TYPE_ATTRIBUTES (v) = TYPE_ATTRIBUTES (t);
+  TREE_TYPE (v) = TREE_TYPE (t);
+  gcc_checking_assert (types_equal_p (t, v));
+  gcc_checking_assert (gimple_canonical_types_compatible_p (t, v, false));
+  gcc_checking_assert (!in_lto_p || TYPE_CANONICAL (t) == TYPE_CANONICAL (v));
+  gcc_checking_assert (get_alias_set (t) == get_alias_set (v));
+  return v;
+}
+
+
+/* Look up NODE in the type table and write the index for it to OB.
+   Eliminate unnecesary type variants.  */
 
 static void
 output_type_ref (struct output_block *ob, tree node)
 {
+  bool existed_p;
+  unsigned int &index
+    = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
+       get_or_insert (node, &existed_p);
   streamer_write_record_start (ob, LTO_type_ref);
-  lto_output_type_ref_index (ob->decl_state, ob->main_stream, node);
+
+  if (existed_p)
+    {
+      streamer_write_uhwi_stream (ob->main_stream, index);
+      return;
+    }
+
+  tree node2 = first_compatible_type_variant (node);
+
+  /* If NODE is first variant, just add it into encoder.  */
+  if (node2 == node)
+    {
+      index = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
+      if (streamer_dump_file)
+	{
+	  print_node_brief (streamer_dump_file,
+			    "    Encoding indexable ", node, 4);
+	  fprintf (streamer_dump_file, "  as %i\n", index);
+	}
+      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node);
+      streamer_write_uhwi_stream (ob->main_stream, index);
+      return;
+    }
+
+  index = 0xdead;
+
+  if (streamer_dump_file)
+    {
+      print_node_brief (streamer_dump_file, "    Type ", node, 4);
+      fprintf (streamer_dump_file, "  prevailed by ");
+      print_node_brief (streamer_dump_file, "", node2, 4);
+      fprintf (streamer_dump_file, "\n");
+    }
+
+  /* See if we already assgined one to the first variant.  If that is the case
+     only duplicate the entry in the hashtable so we do not need to call
+     first_compatible_type_variant redundantly.  */
+  unsigned int &index2
+    = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].tree_hash_table->
+	get_or_insert (node2, &existed_p);
+  /* The reference may be changed by hash table expansion.  */
+  unsigned int i = index2;
+
+  if (existed_p)
+    {
+      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
+	tree_hash_table->get_or_insert (node, &existed_p) = i;
+      streamer_write_uhwi_stream (ob->main_stream, i);
+      return;
+    }
+  else
+    {
+      index2 = ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.length ();
+      i = index2;
+      if (streamer_dump_file)
+	{
+	  print_node_brief (streamer_dump_file,
+			    "    Encoding indexable ", node2, 4);
+	  fprintf (streamer_dump_file, "  as %i \n", i);
+	}
+      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].trees.safe_push (node2);
+      /* We need to search again to watch hash table resize.  */
+      ob->decl_state->streams[LTO_DECL_STREAM_TYPE].
+	tree_hash_table->get_or_insert (node, &existed_p) = i;
+      gcc_assert (existed_p);
+      streamer_write_uhwi_stream (ob->main_stream, i);
+    }
+  return;
 }
 
 
@@ -978,12 +1171,15 @@  hash_tree (struct streamer_tree_cache_d
 #define visit(SIBLING) \
   do { \
     unsigned ix; \
-    if (!SIBLING) \
+    tree t2 = SIBLING; \
+    if (t2 && TYPE_P (t2)) \
+      t2 = first_compatible_type_variant (t2); \
+    if (!t2) \
       hstate.add_int (0); \
-    else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \
+    else if (streamer_tree_cache_lookup (cache, t2, &ix)) \
       hstate.add_int (streamer_tree_cache_get_hash (cache, ix)); \
     else if (map) \
-      hstate.add_int (*map->get (SIBLING)); \
+      hstate.add_int (*map->get (t2)); \
     else \
       hstate.add_int (1); \
   } while (0)
@@ -1554,6 +1750,10 @@  DFS::DFS_write_tree (struct output_block
   if (this_ref_p && tree_is_indexable (expr))
     return;
 
+  /* Replace type by its prevailing variant.  */
+  if (TYPE_P (expr))
+    expr = first_compatible_type_variant (expr);
+
   /* Check if we already streamed EXPR.  */
   if (streamer_tree_cache_lookup (ob->writer_cache, expr, NULL))
     return;
@@ -1591,6 +1791,11 @@  lto_output_tree (struct output_block *ob
       return;
     }
 
+  if (TYPE_P (expr))
+    expr = first_compatible_type_variant (expr);
+
+  gcc_checking_assert (!this_ref_p || !tree_is_indexable (expr));
+
   existed_p = streamer_tree_cache_lookup (ob->writer_cache, expr, &ix);
   if (existed_p)
     {
@@ -2334,7 +2539,18 @@  copy_function_or_variable (struct symtab
       gcc_assert (lto_tree_ref_encoder_size (encoder) == 0);
       encoder->trees.reserve_exact (n);
       for (j = 0; j < n; j++)
-	encoder->trees.safe_push ((*trees)[j]);
+	{
+	  tree t = (*trees)[j];
+	  if (TYPE_P (t))
+	    t = first_compatible_type_variant (t);
+	  if (streamer_dump_file)
+	    {
+	      print_node_brief (streamer_dump_file,
+				"  Adding reference to ", t, 4);
+	      fprintf (streamer_dump_file, "  as %i \n", (int)j);
+	    }
+	  encoder->trees.safe_push (t);
+	}
     }
 
   lto_free_raw_section_data (file_data, LTO_section_function_body, name,
@@ -2507,6 +2723,8 @@  write_global_stream (struct output_block
 	  print_node_brief (streamer_dump_file, "", t, 4);
           fprintf (streamer_dump_file, "\n");
 	}
+      gcc_checking_assert (!TYPE_P (t)
+			   || t == first_compatible_type_variant (t));
       if (!streamer_tree_cache_lookup (ob->writer_cache, t, NULL))
 	stream_write_tree (ob, t, false);
     }
@@ -2537,6 +2755,8 @@  write_global_references (struct output_b
       t = lto_tree_ref_encoder_get_tree (encoder, index);
       streamer_tree_cache_lookup (ob->writer_cache, t, &slot_num);
       gcc_assert (slot_num != (unsigned)-1);
+      gcc_checking_assert (!TYPE_P (t)
+			   || t == first_compatible_type_variant (t));
       data[index + 1] = slot_num;
     }
 
Index: tree-ssa-alias.c
===================================================================
--- tree-ssa-alias.c	(revision 264689)
+++ tree-ssa-alias.c	(working copy)
@@ -38,6 +38,7 @@  along with GCC; see the file COPYING3.
 #include "tree-dfa.h"
 #include "ipa-reference.h"
 #include "varasm.h"
+#include "ipa-utils.h"
 
 /* Broad overview of how alias analysis on gimple works:
 
@@ -955,6 +956,12 @@  nonoverlapping_component_refs_of_decl_p
 	     ??? Bitfields can overlap at RTL level so punt on them.  */
 	  if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2))
 	    return false;
+	  /* In LTO we may end up merging different variants of ODR types
+	     but keep corresponding field decls unmerged.  */
+	  if (in_lto_p
+	      && type_with_linkage_p (type1)
+	      && DECL_NAME (field1) == DECL_NAME (field2))
+	    return false;
 	  return true;
 	}
     }
@@ -1049,7 +1056,9 @@  nonoverlapping_component_refs_p (const_t
 	{
 	  /* We're left with accessing different fields of a structure,
 	     no possible overlap.  */
-	  if (fieldx != fieldy)
+	  if (fieldx != fieldy
+	      && (in_lto_p || !type_with_linkage_p (typex)
+		  || DECL_NAME (fieldx) == DECL_NAME (fieldy)))
 	    {
 	      /* A field and its representative need to be considered the
 		 same.  */
Index: tree.c
===================================================================
--- tree.c	(revision 264689)
+++ tree.c	(working copy)
@@ -5845,7 +5861,12 @@  free_lang_data_in_cgraph (void)
 
   /* Traverse every type found freeing its language data.  */
   FOR_EACH_VEC_ELT (fld.types, i, t)
-    free_lang_data_in_type (t);
+    {
+      free_lang_data_in_type (t);
+      while (TYPE_NEXT_VARIANT (t)
+	     && !fld.pset.contains (TYPE_NEXT_VARIANT (t)))
+	TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (TYPE_NEXT_VARIANT (t));
+    }
   if (flag_checking)
     {
       FOR_EACH_VEC_ELT (fld.types, i, t)