diff mbox

Add used_by_single_function flag for static variables

Message ID 20140623042426.GB15978@kam.mff.cuni.cz
State New
Headers show

Commit Message

Jan Hubicka June 23, 2014, 4:24 a.m. UTC
Hi,
this is patch to add the used_by_single_function flag to varpool.  In full generality
it is a simple dataflow problem, since the variable may be referred by other variables
as long as all of them are used by one function only.  I have bootstrapped/regtested it
on x86_64-linux and lto-bootstrapped with the following variant of earlier Richard's
patch to tree-ssa-dce:

Comments

Richard Biener June 25, 2014, 9:47 a.m. UTC | #1
On Mon, 23 Jun 2014, Jan Hubicka wrote:

> Hi,
> this is patch to add the used_by_single_function flag to varpool.  In full generality
> it is a simple dataflow problem, since the variable may be referred by other variables
> as long as all of them are used by one function only.  I have bootstrapped/regtested it
> on x86_64-linux and lto-bootstrapped with the following variant of earlier Richard's
> patch to tree-ssa-dce:
> Index: tree-ssa-dce.c
> ===================================================================
> --- tree-ssa-dce.c	(revision 211881)
> +++ tree-ssa-dce.c	(working copy)
> @@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.
>  #include "flags.h"
>  #include "cfgloop.h"
>  #include "tree-scalar-evolution.h"
> +#include "cgraph.h"
>  
>  static struct stmt_stats
>  {
> @@ -278,10 +279,21 @@ mark_stmt_if_obviously_necessary (gimple
>        break;
>  
>      case GIMPLE_ASSIGN:
> -      if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
> -	  && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
> -	return;
> -      break;
> +      {
> +	tree lhs = gimple_assign_lhs (stmt);
> +	if (TREE_CODE (lhs) == SSA_NAME
> +	    && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
> +	  return;
> +	lhs = get_base_address (lhs);
> +	if (TREE_CODE (lhs) == VAR_DECL
> +	    && !TREE_ADDRESSABLE (lhs)
> +	    && !DECL_NONALIASED (lhs)
> +	    && !TREE_THIS_VOLATILE (lhs)
> +	    && (TREE_STATIC (lhs) || DECL_EXTERNAL (lhs))
> +	    && varpool_node_for_decl (lhs)->used_by_single_function)
> +	  return;
> +	break;
> +      }
>  
>      default:
>        break;
> 
> I will keep the dce change to Richard - in full generality we may want to
> handle asm statements (that seems to be all considered volatile by current
> implementation) and calls, too.

Damn, the DCE algorithm doesn't really handle the situation.

int foo ()
{
  static int i;
  if (i++)
    return i;
  return 0;
}

  <bb 2>:
  i.0_3 = i;
  i.2_4 = i.0_3 + 1;
  i = i.2_4;
  if (i.0_3 != 0)

and DCE will happily remove the store i = i.2_4 as there is no
user downstream (it of course doesn't need to consider upstream
uses).

So we can't really wire it into the propagator this way unless
we first compute if there is _any_ read of the decl in the
function (that is, if the decl is write-only).  Maybe we want
to special-case those globals in the propagator though (the
use may be dead after all).

So, more work needed, queued.

Richard.
Bernhard Reutner-Fischer Nov. 13, 2014, 8:56 p.m. UTC | #2
Honza,

On 23 June 2014 06:24, Jan Hubicka <hubicka@ucw.cz> wrote:

> --- lto-cgraph.c        (revision 211881)
> +++ lto-cgraph.c        (working copy)
> @@ -614,6 +614,7 @@ lto_output_varpool_node (struct lto_simp
>           /* in_other_partition.  */
>      }
>    bp_pack_value (&bp, node->tls_model, 3);
> +  bp_pack_value (&bp, node->used_by_single_function, 1);
>    streamer_write_bitpack (&bp);
>
>    group = node->get_comdat_group ();
> @@ -1275,6 +1276,7 @@ input_varpool_node (struct lto_file_decl
>    if (node->alias && !node->analyzed && node->weakref)
>      node->alias_target = get_alias_symbol (node->decl);
>    node->tls_model = (enum tls_model)bp_unpack_value (&bp, 3);
> +  node->used_by_single_function = (enum tls_model)bp_unpack_value (&bp, 1);
>    group = read_identifier (ib);

Let's please remove the (wrong) cast to tls_model for the
used_by_single_function bit.
PS: lto-cgraph should seemingly be switched to use bp_unpack_enum(), no?
PPS: input_ref() speculative setting should also remove the wrong enum
ipa_ref_use cast.
I better stop reading here ;)
cheers,
Jan Hubicka Nov. 13, 2014, 11:44 p.m. UTC | #3
> Honza,
> 
> On 23 June 2014 06:24, Jan Hubicka <hubicka@ucw.cz> wrote:
> 
> > --- lto-cgraph.c        (revision 211881)
> > +++ lto-cgraph.c        (working copy)
> > @@ -614,6 +614,7 @@ lto_output_varpool_node (struct lto_simp
> >           /* in_other_partition.  */
> >      }
> >    bp_pack_value (&bp, node->tls_model, 3);
> > +  bp_pack_value (&bp, node->used_by_single_function, 1);
> >    streamer_write_bitpack (&bp);
> >
> >    group = node->get_comdat_group ();
> > @@ -1275,6 +1276,7 @@ input_varpool_node (struct lto_file_decl
> >    if (node->alias && !node->analyzed && node->weakref)
> >      node->alias_target = get_alias_symbol (node->decl);
> >    node->tls_model = (enum tls_model)bp_unpack_value (&bp, 3);
> > +  node->used_by_single_function = (enum tls_model)bp_unpack_value (&bp, 1);
> >    group = read_identifier (ib);
> 
> Let's please remove the (wrong) cast to tls_model for the
> used_by_single_function bit.

Yep, it is obiovus pasto :)

> PS: lto-cgraph should seemingly be switched to use bp_unpack_enum(), no?

Yes, in genral lto-cgraph needs a lot of cleanups (most of that code was
written in early LTO days and needs a rewrite, it just never broke badly enough
to force it), I will try to schedule these early next stage 1.

> PPS: input_ref() speculative setting should also remove the wrong enum
> ipa_ref_use cast.
> I better stop reading here ;)
Hehe, just go ahead and keep me posted ;)

Honza
> cheers,
Bernhard Reutner-Fischer Nov. 14, 2014, 1:32 p.m. UTC | #4
On 14 November 2014 00:44, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Honza,
>>
>> On 23 June 2014 06:24, Jan Hubicka <hubicka@ucw.cz> wrote:
>>
>> > --- lto-cgraph.c        (revision 211881)
>> > +++ lto-cgraph.c        (working copy)
>> > @@ -614,6 +614,7 @@ lto_output_varpool_node (struct lto_simp
>> >           /* in_other_partition.  */
>> >      }
>> >    bp_pack_value (&bp, node->tls_model, 3);
>> > +  bp_pack_value (&bp, node->used_by_single_function, 1);
>> >    streamer_write_bitpack (&bp);
>> >
>> >    group = node->get_comdat_group ();
>> > @@ -1275,6 +1276,7 @@ input_varpool_node (struct lto_file_decl
>> >    if (node->alias && !node->analyzed && node->weakref)
>> >      node->alias_target = get_alias_symbol (node->decl);
>> >    node->tls_model = (enum tls_model)bp_unpack_value (&bp, 3);
>> > +  node->used_by_single_function = (enum tls_model)bp_unpack_value (&bp, 1);
>> >    group = read_identifier (ib);
>>
>> Let's please remove the (wrong) cast to tls_model for the
>> used_by_single_function bit.
>
> Yep, it is obiovus pasto :)
>
>> PS: lto-cgraph should seemingly be switched to use bp_unpack_enum(), no?
>
> Yes, in genral lto-cgraph needs a lot of cleanups (most of that code was
> written in early LTO days and needs a rewrite, it just never broke badly enough
> to force it), I will try to schedule these early next stage 1.
>
>> PPS: input_ref() speculative setting should also remove the wrong enum
>> ipa_ref_use cast.
>> I better stop reading here ;)
> Hehe, just go ahead and keep me posted ;)

lto_output_node does not really like
bp_pack_enum (&bp, node_frequency, 2+1, node->frequency);
though. I'll try to have a look in the evening.

btw, i've seen that struct symtab_node has two huge gaps, 30bit and 4bytes
[initially meant to play around with a simple pahole-like plugin to
point those out].
So i started to play around in layout_struct to automatically reorder
member elts to fill
eventual gaps, just to see how/if offsetof and addressof and ada break
if i put a
2D packing step there.
But that raises the question if we have hit-rate data, perhaps in
profile-mode?, for struct
member-access yet? Would be nice to be able to weight the hotter
members higher, towards
the start of the struct. Even neglecting ABI concerns, I fear this
obvious idea is a bit
more time-consuming than i'd like it to be..
diff mbox

Patch

Index: tree-ssa-dce.c
===================================================================
--- tree-ssa-dce.c	(revision 211881)
+++ tree-ssa-dce.c	(working copy)
@@ -73,6 +73,7 @@  along with GCC; see the file COPYING3.
 #include "flags.h"
 #include "cfgloop.h"
 #include "tree-scalar-evolution.h"
+#include "cgraph.h"
 
 static struct stmt_stats
 {
@@ -278,10 +279,21 @@  mark_stmt_if_obviously_necessary (gimple
       break;
 
     case GIMPLE_ASSIGN:
-      if (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME
-	  && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
-	return;
-      break;
+      {
+	tree lhs = gimple_assign_lhs (stmt);
+	if (TREE_CODE (lhs) == SSA_NAME
+	    && TREE_CLOBBER_P (gimple_assign_rhs1 (stmt)))
+	  return;
+	lhs = get_base_address (lhs);
+	if (TREE_CODE (lhs) == VAR_DECL
+	    && !TREE_ADDRESSABLE (lhs)
+	    && !DECL_NONALIASED (lhs)
+	    && !TREE_THIS_VOLATILE (lhs)
+	    && (TREE_STATIC (lhs) || DECL_EXTERNAL (lhs))
+	    && varpool_node_for_decl (lhs)->used_by_single_function)
+	  return;
+	break;
+      }
 
     default:
       break;

I will keep the dce change to Richard - in full generality we may want to
handle asm statements (that seems to be all considered volatile by current
implementation) and calls, too.

The propagation part is not realy consumed by dce but it is useful for other
analysis, like PTA.  Richard, I can also compute flag whether the function
may recurse meaning that it is in SCC region or may call external function or
indirect call (transitively). I guess this may be useful to determine whether calls
can use these variables if they are proven to be non-escaping.

Will commit it shortly,
Honza

Index: varpool.c
===================================================================
--- varpool.c	(revision 211881)
+++ varpool.c	(working copy)
@@ -211,6 +211,8 @@  dump_varpool_node (FILE *f, varpool_node
     fprintf (f, " initialized");
   if (node->output)
     fprintf (f, " output");
+  if (node->used_by_single_function)
+    fprintf (f, " used-by-single-function");
   if (TREE_READONLY (node->decl))
     fprintf (f, " read-only");
   if (ctor_for_folding (node->decl) != error_mark_node)
Index: tree-pass.h
===================================================================
--- tree-pass.h	(revision 211881)
+++ tree-pass.h	(working copy)
@@ -471,6 +471,7 @@  extern simple_ipa_opt_pass *make_pass_ip
 extern simple_ipa_opt_pass *make_pass_omp_simd_clone (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_profile (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_cdtor_merge (gcc::context *ctxt);
+extern ipa_opt_pass_d *make_pass_ipa_single_use (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_comdats (gcc::context *ctxt);
 
 extern gimple_opt_pass *make_pass_cleanup_cfg_post_optimizing (gcc::context
Index: cgraph.h
===================================================================
--- cgraph.h	(revision 211881)
+++ cgraph.h	(working copy)
@@ -719,6 +719,12 @@  public:
   unsigned dynamically_initialized : 1;
 
   ENUM_BITFIELD(tls_model) tls_model : 3;
+
+  /* Set if the variable is known to be used by single function only.
+     This is computed by ipa_signle_use pass and used by late optimizations
+     in places where optimization would be valid for local static variable
+     if we did not do any inter-procedural code movement.  */
+  unsigned used_by_single_function : 1;
 };
 
 /* Every top level asm statement is put into a asm_node.  */
Index: lto-cgraph.c
===================================================================
--- lto-cgraph.c	(revision 211881)
+++ lto-cgraph.c	(working copy)
@@ -614,6 +614,7 @@  lto_output_varpool_node (struct lto_simp
 	  /* in_other_partition.  */
     }
   bp_pack_value (&bp, node->tls_model, 3);
+  bp_pack_value (&bp, node->used_by_single_function, 1);
   streamer_write_bitpack (&bp);
 
   group = node->get_comdat_group ();
@@ -1275,6 +1276,7 @@  input_varpool_node (struct lto_file_decl
   if (node->alias && !node->analyzed && node->weakref)
     node->alias_target = get_alias_symbol (node->decl);
   node->tls_model = (enum tls_model)bp_unpack_value (&bp, 3);
+  node->used_by_single_function = (enum tls_model)bp_unpack_value (&bp, 1);
   group = read_identifier (ib);
   if (group)
     {
Index: passes.def
===================================================================
--- passes.def	(revision 211881)
+++ passes.def	(working copy)
@@ -109,6 +109,8 @@  along with GCC; see the file COPYING3.
   NEXT_PASS (pass_ipa_inline);
   NEXT_PASS (pass_ipa_pure_const);
   NEXT_PASS (pass_ipa_reference);
+  /* This pass needs to be scheduled after any IP code duplication.   */
+  NEXT_PASS (pass_ipa_single_use);
   /* Comdat privatization come last, as direct references to comdat local
      symbols are not allowed outside of the comdat group.  Privatizing early
      would result in missed optimizations due to this restriction.  */
Index: ipa.c
===================================================================
--- ipa.c	(revision 211881)
+++ ipa.c	(working copy)
@@ -1096,3 +1101,227 @@  make_pass_ipa_cdtor_merge (gcc::context
 {
   return new pass_ipa_cdtor_merge (ctxt);
 }
+
+/* Invalid pointer representing BODDOM for single user dataflow.  */
+#define BOTTOM ((cgraph_node *)(size_t) 2)
+
+/* Meet operation for single user dataflow.
+   Here we want to associate variables with sigle function that may access it.
+
+   FUNCTION is current single user of a variable, VAR is variable that uses it.
+   Latttice is stored in SINGLE_USER_MAP.
+
+   We represent: 
+    - TOP by no entry in SIGNLE_USER_MAP
+    - BOTTOM by BOTTOM in AUX pointer (to save lookups)
+    - known single user by cgraph pointer in SINGLE_USER_MAP.  */
+
+cgraph_node *
+meet (cgraph_node *function, varpool_node *var,
+       pointer_map<cgraph_node *> &single_user_map)
+{
+  struct cgraph_node *user, **f;
+
+  if (var->aux == BOTTOM)
+    return BOTTOM;
+
+  f = single_user_map.contains (var);
+  if (!f)
+    return function;
+  user = *f;
+  if (!function)
+    return user;
+  else if (function != user)
+    return BOTTOM;
+  else
+    return function;
+}
+
+/* Propagation step of single-use dataflow.
+
+   Check all uses of VNODE and see if they are used by single function FUNCTION.
+   SINGLE_USER_MAP represents the dataflow lattice.  */
+
+cgraph_node *
+propagate_single_user (varpool_node *vnode, cgraph_node *function,
+		       pointer_map<cgraph_node *> &single_user_map)
+{
+  int i;
+  struct ipa_ref *ref;
+
+  gcc_assert (!vnode->externally_visible);
+
+  /* If node is an alias, first meet with its target.  */
+  if (vnode->alias)
+    function = meet (function, varpool_alias_target (vnode), single_user_map);
+
+  /* Check all users and see if they correspond to a single function.  */
+  for (i = 0;
+       ipa_ref_list_referring_iterate (&vnode->ref_list, i, ref)
+       && function != BOTTOM; i++)
+    {
+      struct cgraph_node *cnode = dyn_cast <cgraph_node *> (ref->referring);
+      if (cnode)
+	{
+	  if (cnode->global.inlined_to)
+	    cnode = cnode->global.inlined_to;
+	  if (!function)
+	    function = cnode;
+	  else if (function != cnode)
+	    function = BOTTOM;
+	}
+      else
+        function = meet (function, dyn_cast <varpool_node *> (ref->referring), single_user_map);
+    }
+  return function;
+}
+
+/* Pass setting used_by_single_function flag.
+   This flag is set on variable when there is only one function that may possibly
+   referr to it.  */
+
+static unsigned int
+ipa_single_use (void)
+{
+  varpool_node *first = (varpool_node *) (void *) 1;
+  varpool_node *var;
+  pointer_map<cgraph_node *> single_user_map;
+
+  FOR_EACH_DEFINED_VARIABLE (var)
+    if (!varpool_all_refs_explicit_p (var))
+      var->aux = BOTTOM;
+    else
+      {
+	/* Enqueue symbol for dataflow.  */
+        var->aux = first;
+	first = var;
+      }
+
+  /* The actual dataflow.  */
+
+  while (first != (void *) 1)
+    {
+      cgraph_node *user, *orig_user, **f;
+
+      var = first;
+      first = (varpool_node *)first->aux;
+
+      f = single_user_map.contains (var);
+      if (f)
+	orig_user = *f;
+      else
+	orig_user = NULL;
+      user = propagate_single_user (var, orig_user, single_user_map);
+
+      gcc_checking_assert (var->aux != BOTTOM);
+
+      /* If user differs, enqueue all references.  */
+      if (user != orig_user)
+	{
+	  unsigned int i;
+	  ipa_ref *ref;
+
+	  *single_user_map.insert (var) = user;
+
+	  /* Enqueue all aliases for re-processing.  */
+	  for (i = 0;
+	       ipa_ref_list_referring_iterate (&var->ref_list, i, ref); i++)
+	    if (ref->use == IPA_REF_ALIAS
+		&& !ref->referring->aux)
+	      {
+		ref->referring->aux = first;
+		first = dyn_cast <varpool_node *> (ref->referring);
+	      }
+	  /* Enqueue all users for re-processing.  */
+	  for (i = 0;
+	       ipa_ref_list_reference_iterate (&var->ref_list, i, ref); i++)
+	    if (!ref->referred->aux
+	        && ref->referred->definition
+		&& is_a <varpool_node *> (ref->referred))
+	      {
+		ref->referred->aux = first;
+		first = dyn_cast <varpool_node *> (ref->referred);
+	      }
+
+	  /* If user is BOTTOM, just punt on this var.  */
+	  if (user == BOTTOM)
+	    var->aux = BOTTOM;
+	  else
+	    var->aux = NULL;
+	}
+      else
+	var->aux = NULL;
+    }
+
+  FOR_EACH_DEFINED_VARIABLE (var)
+    {
+      if (var->aux != BOTTOM)
+	{
+#ifdef ENABLE_CHECKING
+	  if (!single_user_map.contains (var))
+          gcc_assert (single_user_map.contains (var));
+#endif
+	  if (dump_file)
+	    {
+	      fprintf (dump_file, "Variable %s/%i is used by single function\n",
+		       var->name (), var->order);
+	    }
+	  var->used_by_single_function = true;
+	}
+      var->aux = NULL;
+    }
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_ipa_single_use =
+{
+  IPA_PASS, /* type */
+  "single-use", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  true, /* has_execute */
+  TV_CGRAPHOPT, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_ipa_single_use : public ipa_opt_pass_d
+{
+public:
+  pass_ipa_single_use (gcc::context *ctxt)
+    : ipa_opt_pass_d (pass_data_ipa_single_use, ctxt,
+		      NULL, /* generate_summary */
+		      NULL, /* write_summary */
+		      NULL, /* read_summary */
+		      NULL, /* write_optimization_summary */
+		      NULL, /* read_optimization_summary */
+		      NULL, /* stmt_fixup */
+		      0, /* function_transform_todo_flags_start */
+		      NULL, /* function_transform */
+		      NULL) /* variable_transform */
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *);
+  virtual unsigned int execute (function *) { return ipa_single_use (); }
+
+}; // class pass_ipa_single_use
+
+bool
+pass_ipa_single_use::gate (function *)
+{
+  return optimize;
+}
+
+} // anon namespace
+
+ipa_opt_pass_d *
+make_pass_ipa_single_use (gcc::context *ctxt)
+{
+  return new pass_ipa_single_use (ctxt);
+}