diff mbox

Prune BLOCK_VARs lists in free_lang_data

Message ID 20160115110802.GD77658@kam.mff.cuni.cz
State New
Headers show

Commit Message

Jan Hubicka Jan. 15, 2016, 11:08 a.m. UTC
Hi,
this is last of my patches to improve partitionability of programs. 
This patch compacts BLOCK_VARs lists and trows away TYPE_DECLs for -g0
and redundant TYPE_DECLs on all levels. It does make noticeable difference
on firefox, but I managed to erase the numbers (can re-test them if requested)

Bootstrapped/regtested x86_64-linux, OK?

	* tree.c (needed_in_block_vars_p): New function.
	(free_lang_data_in_decl): Use it.

Comments

Richard Biener Jan. 15, 2016, 11:37 a.m. UTC | #1
On Fri, 15 Jan 2016, Jan Hubicka wrote:

> Hi,
> this is last of my patches to improve partitionability of programs. 
> This patch compacts BLOCK_VARs lists and trows away TYPE_DECLs for -g0
> and redundant TYPE_DECLs on all levels. It does make noticeable difference
> on firefox, but I managed to erase the numbers (can re-test them if requested)
> 
> Bootstrapped/regtested x86_64-linux, OK?

Hmm.  So I wonder why remove_unused_locals does not do this then
or if it does not because it would change code-gen based on -g/-g0
I wonder why your patch wouldn't introduce that issue with LTO.

That said - improve remove_unused_locals instead please?

Thanks,
Richard.

> 	* tree.c (needed_in_block_vars_p): New function.
> 	(free_lang_data_in_decl): Use it.
> Index: tree.c
> ===================================================================
> --- tree.c	(revision 232407)
> +++ tree.c	(working copy)
> @@ -5343,6 +5343,42 @@ need_assembler_name_p (tree decl)
>    return true;
>  }
>  
> +/* Return true if DECL should stay in BLOCK_VARs list.  */
> +
> +static inline bool
> +needed_in_block_vars_p (const_tree decl)
> +{
> +  if (TREE_CODE (decl) == VAR_DECL && TREE_USED (decl)
> +      && (!TREE_STATIC (decl) || DECL_EXTERNAL (decl)))
> +    return true;
> +  if (TREE_CODE (decl) == RESULT_DECL)
> +    return true;
> +  /* Dwarf2out output non-fortran and non-ada const decls when the
> +     enum is output and it will not consider language to be ada or fortran
> +     in LTO.  */
> +  if (TREE_CODE (decl) == CONST_DECL)
> +    return false;
> +  if (debug_info_level >= DINFO_LEVEL_TERSE)
> +    return false;
> +  /* Strip builtins from the translation-unit BLOCK.  We still have targets
> +     without builtin_decl_explicit support and also builtins are shared
> +     nodes and thus we can't use TREE_CHAIN in multiple lists.  */
> +  if (TREE_CODE (decl) == FUNCTION_DECL
> +      && DECL_BUILT_IN (decl))
> +    return false;
> +  if (DECL_IGNORED_P (decl))
> +    return false;
> +  if (TREE_CODE (decl) == TYPE_DECL
> +      && is_redundant_typedef (decl))
> +    return false;
> +  gcc_checking_assert (TREE_CODE (decl) == TYPE_DECL
> +		       || TREE_CODE (decl) == VAR_DECL
> +		       || TREE_CODE (decl) == FUNCTION_DECL
> +		       || TREE_CODE (decl) == LABEL_DECL
> +		       || TREE_CODE (decl) == RESULT_DECL);
> +  return true;
> +}
> +
>  
>  /* Reset all language specific information still present in symbol
>     DECL.  */
> @@ -5445,15 +5481,11 @@ free_lang_data_in_decl (tree decl)
>             && DECL_INITIAL (decl)
>             && TREE_CODE (DECL_INITIAL (decl)) == BLOCK)
>      {
> -      /* Strip builtins from the translation-unit BLOCK.  We still have targets
> -	 without builtin_decl_explicit support and also builtins are shared
> -	 nodes and thus we can't use TREE_CHAIN in multiple lists.  */
>        tree *nextp = &BLOCK_VARS (DECL_INITIAL (decl));
>        while (*nextp)
>          {
>            tree var = *nextp;
> -          if (TREE_CODE (var) == FUNCTION_DECL
> -              && DECL_BUILT_IN (var))
> +          if (needed_in_block_vars_p (var))
>  	    *nextp = TREE_CHAIN (var);
>  	  else
>  	    nextp = &TREE_CHAIN (var);
> 
>
Jan Hubicka Jan. 16, 2016, 10:27 a.m. UTC | #2
> On Fri, 15 Jan 2016, Jan Hubicka wrote:
> 
> > Hi,
> > this is last of my patches to improve partitionability of programs. 
> > This patch compacts BLOCK_VARs lists and trows away TYPE_DECLs for -g0
> > and redundant TYPE_DECLs on all levels. It does make noticeable difference
> > on firefox, but I managed to erase the numbers (can re-test them if requested)
> > 
> > Bootstrapped/regtested x86_64-linux, OK?
> 
> Hmm.  So I wonder why remove_unused_locals does not do this then
> or if it does not because it would change code-gen based on -g/-g0
> I wonder why your patch wouldn't introduce that issue with LTO.

Usually the things stay to make DECL_UID stable across -g and -g0.  This is not
the case with LTO, so ATM we do not have same codegen with lto and -g/-g0.  I
guess it is something we can start shooting for once early debug is in.
> 
> That said - improve remove_unused_locals instead please?

Yep, that makes sense (though i am not sure every BLOCK_VARs seen by lto
streamer is also seen by this function).  Will update the patch and re-measure
the effect.

Honza
Richard Biener Jan. 16, 2016, 10:31 a.m. UTC | #3
On January 16, 2016 11:27:09 AM GMT+01:00, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Fri, 15 Jan 2016, Jan Hubicka wrote:
>> 
>> > Hi,
>> > this is last of my patches to improve partitionability of programs.
>
>> > This patch compacts BLOCK_VARs lists and trows away TYPE_DECLs for
>-g0
>> > and redundant TYPE_DECLs on all levels. It does make noticeable
>difference
>> > on firefox, but I managed to erase the numbers (can re-test them if
>requested)
>> > 
>> > Bootstrapped/regtested x86_64-linux, OK?
>> 
>> Hmm.  So I wonder why remove_unused_locals does not do this then
>> or if it does not because it would change code-gen based on -g/-g0
>> I wonder why your patch wouldn't introduce that issue with LTO.
>
>Usually the things stay to make DECL_UID stable across -g and -g0. 

Only DECL_UID order needs to be stable I think.

>This is not
>the case with LTO,

I don't see how LTO does behave differently here.

 so ATM we do not have same codegen with lto and
>-g/-g0.  I
>guess it is something we can start shooting for once early debug is in.
>> 
>> That said - improve remove_unused_locals instead please?
>
>Yep, that makes sense (though i am not sure every BLOCK_VARs seen by
>lto
>streamer is also seen by this function).  Will update the patch and
>re-measure
>the effect.
>
>Honza
Jakub Jelinek Jan. 16, 2016, 10:39 a.m. UTC | #4
On Sat, Jan 16, 2016 at 11:31:49AM +0100, Richard Biener wrote:
> On January 16, 2016 11:27:09 AM GMT+01:00, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> On Fri, 15 Jan 2016, Jan Hubicka wrote:
> >> 
> >> > Hi,
> >> > this is last of my patches to improve partitionability of programs.
> >
> >> > This patch compacts BLOCK_VARs lists and trows away TYPE_DECLs for
> >-g0
> >> > and redundant TYPE_DECLs on all levels. It does make noticeable
> >difference
> >> > on firefox, but I managed to erase the numbers (can re-test them if
> >requested)
> >> > 
> >> > Bootstrapped/regtested x86_64-linux, OK?
> >> 
> >> Hmm.  So I wonder why remove_unused_locals does not do this then
> >> or if it does not because it would change code-gen based on -g/-g0
> >> I wonder why your patch wouldn't introduce that issue with LTO.
> >
> >Usually the things stay to make DECL_UID stable across -g and -g0. 
> 
> Only DECL_UID order needs to be stable I think.

There is certainly something we only care about order and not exact numbers,
but I'm not sure if it is DECL_UIDs - otherwise we'd happily use allocate_decl_uid ()
even for DECL_DEBUG_EXPRs, but we instead use a different counter for those.
Alex?

	Jakub
Jan Hubicka Jan. 16, 2016, 10:41 a.m. UTC | #5
> >Usually the things stay to make DECL_UID stable across -g and -g0. 
> 
> Only DECL_UID order needs to be stable I think.

Well, when you remove a reference to a declaration, the DECL_UID order will
change.  We stream declarations in the order they are inserted to the encoder
and that depends on specific references from the IL. Creating the decl and 
not inserting it to il/optimizing it out early is safe.  We do so with debug
statements, block local vars and other stuff.

The difference to non-LTO is that in non-LTO you only need to be sure you
create same decls in the same order.  With LTO we need to be sure to keep all
references streamed in the same order.  I suppose we could avoid some of it by
streaming DECL_UIDs and merging them same way as we do symbol_table->order but
that will still have fun effects wrt merging.
> 
> >This is not
> >the case with LTO,
> 
> I don't see how LTO does behave differently here.
> 
>  so ATM we do not have same codegen with lto and
> >-g/-g0.  I
> >guess it is something we can start shooting for once early debug is in.
> >> 
> >> That said - improve remove_unused_locals instead please?

I looked into remove_unused_locals and in fact it is removing those decls
aready to some degree, it just keeps all TYPE_DECLs when debug info is on.  I
will extend it to skip redundant typedefs.

Honza
diff mbox

Patch

Index: tree.c
===================================================================
--- tree.c	(revision 232407)
+++ tree.c	(working copy)
@@ -5343,6 +5343,42 @@  need_assembler_name_p (tree decl)
   return true;
 }
 
+/* Return true if DECL should stay in BLOCK_VARs list.  */
+
+static inline bool
+needed_in_block_vars_p (const_tree decl)
+{
+  if (TREE_CODE (decl) == VAR_DECL && TREE_USED (decl)
+      && (!TREE_STATIC (decl) || DECL_EXTERNAL (decl)))
+    return true;
+  if (TREE_CODE (decl) == RESULT_DECL)
+    return true;
+  /* Dwarf2out output non-fortran and non-ada const decls when the
+     enum is output and it will not consider language to be ada or fortran
+     in LTO.  */
+  if (TREE_CODE (decl) == CONST_DECL)
+    return false;
+  if (debug_info_level >= DINFO_LEVEL_TERSE)
+    return false;
+  /* Strip builtins from the translation-unit BLOCK.  We still have targets
+     without builtin_decl_explicit support and also builtins are shared
+     nodes and thus we can't use TREE_CHAIN in multiple lists.  */
+  if (TREE_CODE (decl) == FUNCTION_DECL
+      && DECL_BUILT_IN (decl))
+    return false;
+  if (DECL_IGNORED_P (decl))
+    return false;
+  if (TREE_CODE (decl) == TYPE_DECL
+      && is_redundant_typedef (decl))
+    return false;
+  gcc_checking_assert (TREE_CODE (decl) == TYPE_DECL
+		       || TREE_CODE (decl) == VAR_DECL
+		       || TREE_CODE (decl) == FUNCTION_DECL
+		       || TREE_CODE (decl) == LABEL_DECL
+		       || TREE_CODE (decl) == RESULT_DECL);
+  return true;
+}
+
 
 /* Reset all language specific information still present in symbol
    DECL.  */
@@ -5445,15 +5481,11 @@  free_lang_data_in_decl (tree decl)
            && DECL_INITIAL (decl)
            && TREE_CODE (DECL_INITIAL (decl)) == BLOCK)
     {
-      /* Strip builtins from the translation-unit BLOCK.  We still have targets
-	 without builtin_decl_explicit support and also builtins are shared
-	 nodes and thus we can't use TREE_CHAIN in multiple lists.  */
       tree *nextp = &BLOCK_VARS (DECL_INITIAL (decl));
       while (*nextp)
         {
           tree var = *nextp;
-          if (TREE_CODE (var) == FUNCTION_DECL
-              && DECL_BUILT_IN (var))
+          if (needed_in_block_vars_p (var))
 	    *nextp = TREE_CHAIN (var);
 	  else
 	    nextp = &TREE_CHAIN (var);