Prune TYPE_FIELDS lists more in free_lang_data

Message ID 20151211085231.GC5527@kam.mff.cuni.cz
State New

Commit Message

Jan Hubicka Dec. 11, 2015, 8:52 a.m. UTC
Hi,
this patch further reduces the memory use and time of the WPA stage, especially without -g:
 phase opt and generate  :  75.66 (39%) usr   1.78 (14%) sys  77.44 (37%) wall  855644 kB (21%) ggc
 phase stream in         :  34.62 (18%) usr   1.95 (16%) sys  36.57 (18%) wall 3245604 kB (79%) ggc
 phase stream out        :  81.89 (42%) usr   8.49 (69%) sys  90.37 (44%) wall      50 kB ( 0%) ggc
 ipa dead code removal   :   4.33 ( 2%) usr   0.06 ( 0%) sys   4.24 ( 2%) wall       0 kB ( 0%) ggc
 ipa virtual call target :  25.15 (13%) usr   0.14 ( 1%) sys  25.42 (12%) wall       0 kB ( 0%) ggc
 ipa cp                  :   3.92 ( 2%) usr   0.21 ( 2%) sys   4.18 ( 2%) wall  340698 kB ( 8%) ggc
 ipa inlining heuristics :  24.12 (12%) usr   0.38 ( 3%) sys  24.37 (12%) wall  500427 kB (12%) ggc
 lto stream inflate      :   7.07 ( 4%) usr   0.38 ( 3%) sys   7.33 ( 4%) wall       0 kB ( 0%) ggc
 ipa lto gimple in       :   1.95 ( 1%) usr   0.61 ( 5%) sys   2.42 ( 1%) wall  324875 kB ( 8%) ggc
 ipa lto gimple out      :   9.16 ( 5%) usr   1.64 (13%) sys  10.49 ( 5%) wall      50 kB ( 0%) ggc
 ipa lto decl in         :  21.25 (11%) usr   1.01 ( 8%) sys  22.37 (11%) wall 2348869 kB (57%) ggc
 ipa lto decl out        :  67.33 (34%) usr   1.66 (13%) sys  68.96 (33%) wall       0 kB ( 0%) ggc
 ipa lto constructors out:   1.39 ( 1%) usr   0.38 ( 3%) sys   2.18 ( 1%) wall       0 kB ( 0%) ggc
 ipa lto decl merge      :   2.12 ( 2%) usr   0.00 ( 0%) sys   2.12 ( 2%) wall   13737 kB ( 0%) ggc
 ipa reference           :   2.14 ( 2%) usr   0.00 ( 0%) sys   2.13 ( 2%) wall       0 kB ( 0%) ggc
 ipa pure const          :   2.29 ( 2%) usr   0.01 ( 0%) sys   2.35 ( 2%) wall       0 kB ( 0%) ggc
 ipa icf                 :   9.02 ( 7%) usr   0.18 ( 2%) sys   9.72 ( 7%) wall   19203 kB ( 0%) ggc
 TOTAL                 : 195.27            12.37           207.64            4103297 kB

to:

 phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1996 kB ( 0%) ggc
 phase opt and generate  :  77.17 (53%) usr   1.69 ( 9%) sys  79.45 (48%) wall  856874 kB (26%) ggc
 phase stream in         :  25.92 (18%) usr   1.75 (10%) sys  27.66 (17%) wall 2418654 kB (74%) ggc
 phase stream out        :  39.90 (27%) usr  14.74 (81%) sys  54.82 (33%) wall      50 kB ( 0%) ggc
 phase finalize          :   2.52 ( 2%) usr   0.11 ( 1%) sys   2.63 ( 2%) wall       0 kB ( 0%) ggc
 garbage collection      :   4.56 ( 3%) usr   0.01 ( 0%) sys   4.56 ( 3%) wall       0 kB ( 0%) ggc
 ipa dead code removal   :   4.32 ( 3%) usr   0.03 ( 0%) sys   4.59 ( 3%) wall       2 kB ( 0%) ggc
 ipa virtual call target :  23.19 (16%) usr   0.18 ( 1%) sys  23.31 (14%) wall       0 kB ( 0%) ggc
 ipa cp                  :   4.06 ( 3%) usr   0.18 ( 1%) sys   4.10 ( 2%) wall  339974 kB (10%) ggc
 ipa inlining heuristics :  25.05 (17%) usr   0.32 ( 2%) sys  25.86 (16%) wall  500986 kB (15%) ggc
 lto stream inflate      :   5.50 ( 4%) usr   0.42 ( 2%) sys   5.73 ( 3%) wall       0 kB ( 0%) ggc
 ipa lto gimple in       :   1.97 ( 1%) usr   0.51 ( 3%) sys   2.70 ( 2%) wall  324937 kB (10%) ggc
 ipa lto gimple out      :   9.00 ( 6%) usr   1.59 ( 9%) sys  10.22 ( 6%) wall      50 kB ( 0%) ggc
 ipa lto decl in         :  14.29 (10%) usr   0.73 ( 4%) sys  15.18 ( 9%) wall 1522854 kB (46%) ggc
 ipa lto decl out        :  25.35 (17%) usr   0.59 ( 3%) sys  25.91 (16%) wall       0 kB ( 0%) ggc
 ipa lto constructors out:   1.48 ( 1%) usr   0.51 ( 3%) sys   2.38 ( 1%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O      :   0.74 ( 1%) usr   0.22 ( 1%) sys   0.97 ( 1%) wall  408576 kB (12%) ggc
 ipa lto decl merge      :   1.94 ( 1%) usr   0.00 ( 0%) sys   1.95 ( 1%) wall   13556 kB ( 0%) ggc
 whopr wpa I/O           :   2.95 ( 2%) usr  12.03 (66%) sys  15.17 ( 9%) wall       0 kB ( 0%) ggc
 whopr partitioning      :   3.99 ( 3%) usr   0.03 ( 0%) sys   4.01 ( 2%) wall   13619 kB ( 0%) ggc
 ipa reference           :   2.45 ( 2%) usr   0.01 ( 0%) sys   2.46 ( 1%) wall       0 kB ( 0%) ggc
 ipa pure const          :   2.30 ( 2%) usr   0.03 ( 0%) sys   2.33 ( 1%) wall       0 kB ( 0%) ggc
 ipa icf                 :   8.30 ( 6%) usr   0.26 ( 1%) sys   8.37 ( 5%) wall   19276 kB ( 1%) ggc
 TOTAL                 : 145.51            18.29           164.57            3277576 kB

With debug output the numbers are not that impressive, but decl in is still about 17% down.
It also leads to about 63% code size reduction for global decl streams.
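
For reference, the relative reductions can be recomputed from the two timing reports above. A minimal sketch in C; the helper names reduction_percent and print_reductions are made up for illustration and are not part of GCC, and the input numbers are copied from the non-debug reports:

```c
#include <assert.h>
#include <stdio.h>

/* Relative reduction from BEFORE to AFTER, in percent.  */
static double
reduction_percent (double before, double after)
{
  assert (before > 0.0);
  return 100.0 * (before - after) / before;
}

/* Print the reduction for a few rows of the two reports.  */
static void
print_reductions (void)
{
  printf ("ipa lto decl in, ggc kB  : %.1f%%\n",
	  reduction_percent (2348869, 1522854));
  printf ("ipa lto decl out, usr s  : %.1f%%\n",
	  reduction_percent (67.33, 25.35));
  printf ("TOTAL, wall s            : %.1f%%\n",
	  reduction_percent (207.64, 164.57));
  printf ("TOTAL, ggc kB            : %.1f%%\n",
	  reduction_percent (4103297, 3277576));
}
```

By this arithmetic decl in memory drops by roughly 35% and total WPA memory and wall time by roughly 20% each in the non-debug build.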

I built WPA with -flto-partition=max and looked into one of the partitions that seemed most absurd.
We used about 180k type decls to produce about 700 lines of assembler that mostly contained
calls to various methods. The thing is that each method brought in a lot of declarations,
so I looked into why and noticed that TYPE_FIELDS contains TYPE_DECLs that are mostly ignored
by the back end except for dwarf2out, and dwarf2out actually ignores a good portion of them, too.

I thus made a predicate to tell what decls are going to be useful for dwarf2out and removed
the rest in free_lang_data.  Clearly with early debug, we will be able to remove them all.

Honza


	* tree.c (free_lang_data_in_type): Skip irrelevant typedecls.
	(find_decls_types_r): Likewise.
	* tree.h (type_decl_relevant_for_debug_p): Declare.
	* dwarf2out.c (type_decl_relevant_for_debug_p): New function.
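
The pruning described in the commit message amounts to filtering a singly linked member chain in place. The following is only a simplified, self-contained model of that idea, not the GCC code: struct member, the enum values, keep_member_p, prune_members and count_members are all hypothetical stand-ins (chain models TREE_CHAIN, relevant_for_debug models the result of the new predicate).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the tree codes found on a member chain.  */
enum member_code { FIELD_MEMBER, TYPE_DECL_MEMBER, OTHER_MEMBER };

/* Hypothetical model of one entry of a TYPE_FIELDS-like chain.  */
struct member
{
  enum member_code code;
  bool relevant_for_debug;  /* models the predicate's answer */
  struct member *chain;     /* models TREE_CHAIN */
};

/* Keep fields always; keep type decls only if debug output needs them.  */
static bool
keep_member_p (const struct member *m)
{
  assert (m != NULL);
  return (m->code == FIELD_MEMBER
	  || (m->code == TYPE_DECL_MEMBER && m->relevant_for_debug));
}

/* Unlink every member that is not kept; return the new head.  */
static struct member *
prune_members (struct member *first)
{
  struct member *head = NULL, *prev = NULL;

  for (struct member *m = first; m; m = m->chain)
    if (keep_member_p (m))
      {
	if (prev)
	  prev->chain = m;
	else
	  head = m;
	prev = m;
      }
  if (prev)
    prev->chain = NULL;  /* terminate after the last kept member */
  return head;
}

/* Length of a chain, handy for checking the effect of pruning.  */
static size_t
count_members (const struct member *m)
{
  size_t n = 0;
  for (; m; m = m->chain)
    n++;
  return n;
}
```

Every member dropped here is one declaration that no longer has to be streamed, which is where the savings in the decl streams would come from.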

Comments

Richard Biener Dec. 11, 2015, 9:14 a.m. UTC | #1
On Fri, 11 Dec 2015, Jan Hubicka wrote:

> Index: tree.c
> ===================================================================
> --- tree.c	(revision 231546)
> +++ tree.c	(working copy)
> @@ -5191,7 +5191,8 @@ free_lang_data_in_type (tree type)
>        while (member)
>  	{
>  	  if (TREE_CODE (member) == FIELD_DECL
> -	      || TREE_CODE (member) == TYPE_DECL)
> +	      || (TREE_CODE (member) == TYPE_DECL
> +		  && type_decl_relevant_for_debug_p (member)))
>  	    {
>  	      if (prev)
>  		TREE_CHAIN (prev) = member;
> @@ -5666,7 +5667,8 @@ find_decls_types_r (tree *tp, int *ws, v
>  	  while (tem)
>  	    {
>  	      if (TREE_CODE (tem) == FIELD_DECL
> -		  || TREE_CODE (tem) == TYPE_DECL)
> +		  || (TREE_CODE (tem) == TYPE_DECL
> +		      && type_decl_relevant_for_debug_p (tem)))
>  		fld_worklist_push (tem, fld);
>  	      tem = TREE_CHAIN (tem);
>  	    }
> Index: tree.h
> ===================================================================
> --- tree.h	(revision 231546)
> +++ tree.h	(working copy)
> @@ -5417,4 +5417,6 @@ desired_pro_or_demotion_p (const_tree to
>    return to_type_precision <= TYPE_PRECISION (from_type);
>  }
>  
> +extern bool type_decl_relevant_for_debug_p (const_tree);
> +
>  #endif  /* GCC_TREE_H  */
> Index: dwarf2out.c
> ===================================================================
> --- dwarf2out.c	(revision 231546)
> +++ dwarf2out.c	(working copy)
> @@ -21134,6 +21134,15 @@ is_redundant_typedef (const_tree decl)
>    return 0;
>  }
>  
> +/* Return true if DECL is going to be useful for debug output.  */
> +bool
> +type_decl_relevant_for_debug_p (const_tree decl)
> +{
> +  if (debug_info_level <= DINFO_LEVEL_TERSE)
> +    return false;

We explicitly do not use debug-info-level tests in free-lang-data
to allow mixing -g and -g0 objects.  Are you sure doing the above
doesn't mess up tree merging enough to effectively enlarge WPA
memory use and the merged decl sections?

[I'm quite sure firefox build system manages to mess up -g vs. -g0
in some places ;)]

> +  return (!DECL_IGNORED_P (decl) && !is_redundant_typedef (decl));
> +}
> +

The patch would be ok if you simply export is_redundant_typedef
and inline the DECL_IGNORED_P check into free-lang-data.
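
The difference between the posted predicate and the suggested check can be sketched with a small model. This is an illustration under stated assumptions, not GCC code: enum debug_level, struct type_decl_model and both function names are hypothetical, with the two flags standing in for DECL_IGNORED_P and the result of is_redundant_typedef.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the debug_info_level values.  */
enum debug_level { LEVEL_NONE, LEVEL_TERSE, LEVEL_NORMAL };

/* Hypothetical model of a TYPE_DECL: only what the thread discusses.  */
struct type_decl_model
{
  bool ignored;            /* models DECL_IGNORED_P */
  bool redundant_typedef;  /* models is_redundant_typedef's result */
};

/* As posted: the answer depends on the unit's debug level.  */
static bool
relevant_as_posted (const struct type_decl_model *decl,
		    enum debug_level level)
{
  assert (decl != NULL);
  if (level <= LEVEL_TERSE)
    return false;
  return !decl->ignored && !decl->redundant_typedef;
}

/* As suggested in review: the same answer for -g and -g0 units.  */
static bool
relevant_as_suggested (const struct type_decl_model *decl)
{
  assert (decl != NULL);
  return !decl->ignored && !decl->redundant_typedef;
}
```

In this model the posted form keeps a given TYPE_DECL in a -g unit and drops it in a -g0 unit, so two copies of the same type end up with different member lists, which is the situation the question about tree merging is concerned with. The suggested form prunes both copies identically.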

Thanks,
Richard.
Jan Hubicka Dec. 11, 2015, 9:25 a.m. UTC | #2
> 
> We explicitly do not use debug-info-level tests in free-lang-data
> to allow mixing -g and -g0 objects.  Are you sure doing the above
> doesn't mess up tree merging enough to effectively enlarge WPA
> memory use and the merged decl sections?
> 
> [I'm quite sure firefox build system manages to mess up -g vs. -g0
> in some places ;)]

Hmm, I will try the debug build with firefox on this.  -fdump-ipa-devirt
now dumps all main variants that are duplicates of one ODR type.
We definitely have types with hundreds of duplicates, so there are
quite common cases where tree merging does not fire.
> 
> > +  return (!DECL_IGNORED_P (decl) && !is_redundant_typedef (decl));
> > +}
> > +
> 
> The patch would be ok if you simply export is_redundant_typedef
> and inline the DECL_IGNORED_P check into free-lang-data.

OK, I had that originally and will go back to it.
is_redundant_typedef is declared inline.  Putting it into tree.h drags in
a bit too many dwarf2out internals, but I suppose it is OK to just
make it non-inline.  It is the type of function where the inliner should be
able to decide.

Honza
> 
> Thanks,
> Richard.
Richard Biener Dec. 11, 2015, 9:28 a.m. UTC | #3
On Fri, 11 Dec 2015, Jan Hubicka wrote:

> > 
> > We explicitly do not use debug-info-level tests in free-lang-data
> > to allow mixing -g and -g0 objects.  Are you sure doing the above
> > doesn't mess up tree merging enough to effectively enlarge WPA
> > memory use and the merged decl sections?
> > 
> > [I'm quite sure firefox build system manages to mess up -g vs. -g0
> > in some places ;)]
> 
> Hmm, I will try the debug build with firefox on this.  -fdump-ipa-devirt
> now dumps all main variants that are duplicates of one ODR type.
> We definitely have types with hundreds of duplicates, so there are
> quite common cases where tree merging does not fire.
> > 
> > > +  return (!DECL_IGNORED_P (decl) && !is_redundant_typedef (decl));
> > > +}
> > > +
> > 
> > The patch would be ok if you simply export is_redundant_typedef
> > and inline the DECL_IGNORED_P check into free-lang-data.
> 
> OK, I had that originally and will go back to it.
> is_redundant_typedef is declared inline.  Putting it into tree.h drags in
> a bit too many dwarf2out internals, but I suppose it is OK to just
> make it non-inline.  It is the type of function where the inliner should be
> able to decide.

Yeah.

Richard.

> Honza
> > 
> > Thanks,
> > Richard.
Patch

Index: tree.c
===================================================================
--- tree.c	(revision 231546)
+++ tree.c	(working copy)
@@ -5191,7 +5191,8 @@  free_lang_data_in_type (tree type)
       while (member)
 	{
 	  if (TREE_CODE (member) == FIELD_DECL
-	      || TREE_CODE (member) == TYPE_DECL)
+	      || (TREE_CODE (member) == TYPE_DECL
+		  && type_decl_relevant_for_debug_p (member)))
 	    {
 	      if (prev)
 		TREE_CHAIN (prev) = member;
@@ -5666,7 +5667,8 @@  find_decls_types_r (tree *tp, int *ws, v
 	  while (tem)
 	    {
 	      if (TREE_CODE (tem) == FIELD_DECL
-		  || TREE_CODE (tem) == TYPE_DECL)
+		  || (TREE_CODE (tem) == TYPE_DECL
+		      && type_decl_relevant_for_debug_p (tem)))
 		fld_worklist_push (tem, fld);
 	      tem = TREE_CHAIN (tem);
 	    }
Index: tree.h
===================================================================
--- tree.h	(revision 231546)
+++ tree.h	(working copy)
@@ -5417,4 +5417,6 @@  desired_pro_or_demotion_p (const_tree to
   return to_type_precision <= TYPE_PRECISION (from_type);
 }
 
+extern bool type_decl_relevant_for_debug_p (const_tree);
+
 #endif  /* GCC_TREE_H  */
Index: dwarf2out.c
===================================================================
--- dwarf2out.c	(revision 231546)
+++ dwarf2out.c	(working copy)
@@ -21134,6 +21134,15 @@  is_redundant_typedef (const_tree decl)
   return 0;
 }
 
+/* Return true if DECL is going to be useful for debug output.  */
+bool
+type_decl_relevant_for_debug_p (const_tree decl)
+{
+  if (debug_info_level <= DINFO_LEVEL_TERSE)
+    return false;
+  return (!DECL_IGNORED_P (decl) && !is_redundant_typedef (decl));
+}
+
 /* Return TRUE if TYPE is a typedef that names a type for linkage
    purposes. This kind of typedefs is produced by the C++ FE for
    constructs like: