diff mbox

[2/2] Reducing the overhead of dwarf2 location tracking

Message ID g4aahbvviw.fsf@linaro.org
State New
Headers show

Commit Message

Richard Sandiford March 4, 2011, 1:56 p.m. UTC
Suppose we have a function F that is inlined several times.  Suppose too
that F has a local variable X, and that no "real" (as opposed to debug)
references to X remain after pre-inlining optimisations.  In this case,
we will add X to BLOCK_NONLOCALIZED_VARS rather than duplicate it each
time F is inlined.  Location notes for each inlining of F will then
refer to the same non-localised X decl.  This in turn means that each
inlining of F has the same location list for X, with the list specifying
the location of X for all inlinings of F.

Jakub confirms that this indeed the intended behaviour, and I haven't
seen any problem with the output.  However, the way we generate the
lists within dwarf2out.c is then quadratic in the number of times
F is inlined.  We generate the location list every time F is inlined,
then (at the end) run an optimisation pass to collapse these duplicate
lists back together.  Compile times are usually OK, but on this ARM
testcase:

    https://bugs.launchpad.net/gcc-linaro/+bug/714921/+attachment/1837418/+files/gcc-oom.i

the duplicate lists take up a huge amount of memory.  It isn't
actually possible build the thing with -O2 -g on most 32-bit hosts,
whereas with 4.4 it was.

The patch below caches the list for each BLOCK_NONLOCALIZED_VAR
so that the list can be shared immediately.  The testcase then goes
from needing over 2GB of virtual memory to needing less than 1GB.

I didn't want to grow var_loc_list_def by putting the cache there,
since most uses aren't like this.  I therefore went for a separate
hash table instead.

(I wondered at first whether we could store the cache in the existing
var_loc_list_def fields, e.g. by setting "first" to null and putting the
cache pointer in a union with "last".  This would be the only case in
which "first" was null and the second (union) field was nonnull.
However, it didn't look like this would be safe in cases where another
location list references the decl, perhaps with a different want_address.
In other words, I didn't think it would be safe in cases where dw_loc_list
is not called indirectly from add_location_or_const_value_attribute.
I don't think it's safe in general to cache the list in
loc_list_from_tree, since some callers modify it.)

The patch also makes sure that resolve_addr copes with cases where
location lists are already shared.

Bootstrapped & regression-tested on x86_64-linux-gnu.  Also checked
to the make sure that the "after" output for the testcase only differs
from the "before" output (which had 1/2 applied) in the numbering of
the location list labels.

OK for 4.6?  How about 4.5?

Richard


gcc/
	* dwarf2out.c (dw_loc_list_node): Add resolved_addr and replaced.
	(cached_dw_loc_list_def): New structure.
	(cached_dw_loc_list): New typedef.
	(cached_dw_loc_list_table): New variable.
	(cached_dw_loc_list_table_hash): New function.
	(cached_dw_loc_list_table_eq): Likewise.
	(add_location_or_const_value_attribute): Take a bool cache_p.
	Cache the list when the parameter is true.
	(gen_formal_parameter_die): Update caller.
	(gen_variable_die): Likewise.
	(dwarf2out_finish): Likewise.
	(dwarf2out_function_decl): Clear cached_dw_loc_list_table.
	(dwarf2out_init): Initialize cached_dw_loc_list_table.
	(resolve_addr): Cache the result of resolving a chain of
	location lists.

Comments

Richard Biener March 4, 2011, 2:30 p.m. UTC | #1
On Fri, Mar 4, 2011 at 2:56 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Suppose we have a function F that is inlined several times.  Suppose too
> that F has a local variable X, and that no "real" (as opposed to debug)
> references to X remain after pre-inlining optimisations.  In this case,
> we will add X to BLOCK_NONLOCALIZED_VARS rather than duplicate it each
> time F is inlined.  Location notes for each inlining of F will then
> refer to the same non-localised X decl.  This in turn means that each
> inlining of F has the same location list for X, with the list specifying
> the location of X for all inlinings of F.
>
> Jakub confirms that this indeed the intended behaviour, and I haven't
> seen any problem with the output.

Hm, but isn't it incorrect debug info?  I would have expected this
non-localized var to be the abstract origin of a copy for location
list purposes.

Well, for

static inline int foo (int p) { int q = p; return q; }
int bar1 (int i)
{
  return foo (i);
}
int bar2 (int j)
{
  return foo (j);
}

I don't even see location lists for q, but maybe I'm blind ;)

Richard.
Richard Sandiford March 4, 2011, 2:47 p.m. UTC | #2
Richard Guenther <richard.guenther@gmail.com> writes:
> On Fri, Mar 4, 2011 at 2:56 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Suppose we have a function F that is inlined several times.  Suppose too
>> that F has a local variable X, and that no "real" (as opposed to debug)
>> references to X remain after pre-inlining optimisations.  In this case,
>> we will add X to BLOCK_NONLOCALIZED_VARS rather than duplicate it each
>> time F is inlined.  Location notes for each inlining of F will then
>> refer to the same non-localised X decl.  This in turn means that each
>> inlining of F has the same location list for X, with the list specifying
>> the location of X for all inlinings of F.
>>
>> Jakub confirms that this indeed the intended behaviour, and I haven't
>> seen any problem with the output.
>
> Hm, but isn't it incorrect debug info?  I would have expected this
> non-localized var to be the abstract origin of a copy for location
> list purposes.
>
> Well, for
>
> static inline int foo (int p) { int q = p; return q; }
> int bar1 (int i)
> {
>   return foo (i);
> }
> int bar2 (int j)
> {
>   return foo (j);
> }
>
> I don't even see location lists for q, but maybe I'm blind ;)

Yeah, it does seem to be a bit selective.

FWIW, I used the attached while writing the patch.  The count* variables
are affected.  I was using a ulimit of 1GB (IIRC): without the patch,
the OOM_BEFORE definition of A2 caused an OOM, whereas after it,
the longer definition was OK.

Richard


volatile int g;

static inline __attribute__((always_inline)) int
f (int x)
{
#define C0(X, Y) X = Y
#define C1(X, Y) C0 (X##0, Y), C0 (X##1, Y | X##0), C0 (X##2, Y + X##0), C0 (X##3, Y & X##0)
#define C2(X, Y) C1 (X##0, Y), C1 (X##1, Y), C1 (X##2, Y), C1 (X##3, Y)
#define C3(X, Y) C2 (X##0, Y), C2 (X##1, Y), C2 (X##2, Y), C2 (X##3, Y)
#define A0 C3 (count, 1), x += g, C3 (count, x), x += g
#define A1 A0, A0, A0, A0, A0, A0, A0, A0
#ifdef OOM_BEFORE
#define A2 A1, A1, A1, A1
#else
#define A2 A1, A1, A1, A1, A1
#endif
  int C3 (count, 0);
  A2;
  return x;
}

int
b (int *ptr)
{
#define B0 x = f (x)
#define B1 B0, B0, B0, B0, B0, B0, B0, B0
#define B2 B1, B1, B1
  int x = 0;
  B2;
  return x;
}
Jakub Jelinek March 4, 2011, 2:49 p.m. UTC | #3
On Fri, Mar 04, 2011 at 03:30:34PM +0100, Richard Guenther wrote:
> On Fri, Mar 4, 2011 at 2:56 PM, Richard Sandiford
> > Jakub confirms that this indeed the intended behaviour, and I haven't
> > seen any problem with the output.
> 
> Hm, but isn't it incorrect debug info?  I would have expected this
> non-localized var to be the abstract origin of a copy for location
> list purposes.

I thought we've discussed that on IRC.  For -fcompare-debug reasons
the decision on what decl is localized and what goes into
BLOCK_NONLOCALIZED_VARS can't take into account debug stmts (otherwise
there would be -fcompare-debug failures).
Thus, we sometimes decide to have some BLOCK_NONLOCALIZED_VARS vars
referenced in debug stmts.
The way this works is we do normal var-tracking against those, and then
at dwarf2out.c time just pick their locations up.
See e.g.
http://gcc.gnu.org/ml/gcc-patches/2010-03/msg00445.html
for some of the details.

	Jakub
Richard Biener March 4, 2011, 3 p.m. UTC | #4
On Fri, Mar 4, 2011 at 3:49 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Mar 04, 2011 at 03:30:34PM +0100, Richard Guenther wrote:
>> On Fri, Mar 4, 2011 at 2:56 PM, Richard Sandiford
>> > Jakub confirms that this indeed the intended behaviour, and I haven't
>> > seen any problem with the output.
>>
>> Hm, but isn't it incorrect debug info?  I would have expected this
>> non-localized var to be the abstract origin of a copy for location
>> list purposes.
>
> I thought we've discussed that on IRC.  For -fcompare-debug reasons
> the decision on what decl is localized and what goes into
> BLOCK_NONLOCALIZED_VARS can't take into account debug stmts (otherwise
> there would be -fcompare-debug failures).
> Thus, we sometimes decide to have some BLOCK_NONLOCALIZED_VARS vars
> referenced in debug stmts.
> The way this works is we do normal var-tracking against those, and then
> at dwarf2out.c time just pick their locations up.
> See e.g.
> http://gcc.gnu.org/ml/gcc-patches/2010-03/msg00445.html
> for some of the details.

Sure.

But then location lists for those variables depend on the function context
and cannot be shared for different callers.  Am I missing something?
Likewise I don't see how they could be shared for multiple inline instances
in the same function.

Richard.

>        Jakub
>
Jakub Jelinek March 4, 2011, 3:10 p.m. UTC | #5
On Fri, Mar 04, 2011 at 04:00:06PM +0100, Richard Guenther wrote:
> But then location lists for those variables depend on the function context
> and cannot be shared for different callers.  Am I missing something?

Sure, they can't be shared between different functions.  If Richard's patch
does that, it is wrong.

> Likewise I don't see how they could be shared for multiple inline instances
> in the same function.

But there is no reason why it can't be shared within one function, across
many inline instances.  There is just a single nonlocalized decl in all
those cases you cache, and even if it appears in multiple BLOCKs
BLOCK_NONLOCALIZED_VARS, VAR_LOCATION notes referencing it are still
all queued for that DECL_UID from the whole function and any time you
want to create a location list from what has been queued in the VAR_LOCATION
notes, you will get the same list.

	Jakub
Richard Biener March 4, 2011, 3:14 p.m. UTC | #6
On Fri, Mar 4, 2011 at 4:10 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Mar 04, 2011 at 04:00:06PM +0100, Richard Guenther wrote:
>> But then location lists for those variables depend on the function context
>> and cannot be shared for different callers.  Am I missing something?
>
> Sure, they can't be shared between different functions.  If Richard's patch
> does that, it is wrong.
>
>> Likewise I don't see how they could be shared for multiple inline instances
>> in the same function.
>
> But there is no reason why it can't be shared within one function, across
> many inline instances.  There is just a single nonlocalized decl in all
> those cases you cache, and even if it appears in multiple BLOCKs
> BLOCK_NONLOCALIZED_VARS, VAR_LOCATION notes referencing it are still
> all queued for that DECL_UID from the whole function and any time you
> want to create a location list from what has been queued in the VAR_LOCATION
> notes, you will get the same list.

Ah, ok - that makes sense.  We have a single location list covering all
BLOCKs where the decl appears in BLOCK_NONLOCALIZED_VARS.
Somehow I thought this would already happen.

Richard.

>        Jakub
>
Jakub Jelinek March 4, 2011, 3:22 p.m. UTC | #7
On Fri, Mar 04, 2011 at 01:56:55PM +0000, Richard Sandiford wrote:
> 	* dwarf2out.c (dw_loc_list_node): Add resolved_addr and replaced.
> 	(cached_dw_loc_list_def): New structure.
> 	(cached_dw_loc_list): New typedef.
> 	(cached_dw_loc_list_table): New variable.
> 	(cached_dw_loc_list_table_hash): New function.
> 	(cached_dw_loc_list_table_eq): Likewise.
> 	(add_location_or_const_value_attribute): Take a bool cache_p.
> 	Cache the list when the parameter is true.
> 	(gen_formal_parameter_die): Update caller.
> 	(gen_variable_die): Likewise.
> 	(dwarf2out_finish): Likewise.
> 	(dwarf2out_function_decl): Clear cached_dw_loc_list_table.
> 	(dwarf2out_init): Initialize cached_dw_loc_list_table.
> 	(resolve_addr): Cache the result of resolving a chain of
> 	location lists.

I think you should handle the cached_dw_loc_list_table in
dwarf2out_abstract_function similarly to say decl_loc_table, i.e.
save/clear for the duration of the of recursive dwarf2out_decl
call, restore afterwards and in the places where you actually use
it guard it also with cached_dw_loc_list_table != NULL.

	Jakub
Richard Sandiford March 4, 2011, 3:48 p.m. UTC | #8
Jakub Jelinek <jakub@redhat.com> writes:
> On Fri, Mar 04, 2011 at 04:00:06PM +0100, Richard Guenther wrote:
>> But then location lists for those variables depend on the function context
>> and cannot be shared for different callers.  Am I missing something?
>
> Sure, they can't be shared between different functions.  If Richard's patch
> does that, it is wrong.

For the record, it isn't supposed to, although maybe it would without the
dwarf2out_abstract_function change you asked for.  I'll test that next
week and resubmit.

Richard
diff mbox

Patch

Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2011-03-04 13:15:28.000000000 +0000
+++ gcc/dwarf2out.c	2011-03-04 13:50:33.000000000 +0000
@@ -4464,6 +4464,11 @@  typedef struct GTY(()) dw_loc_list_struc
   const char *section; /* Section this loclist is relative to */
   dw_loc_descr_ref expr;
   hashval_t hash;
+  /* True if all addresses in this and subsequent lists are known to be
+     resolved.  */
+  bool resolved_addr;
+  /* True if this list has been replaced by dw_loc_next.  */
+  bool replaced;
   bool emitted;
 } dw_loc_list_node;
 
@@ -6119,6 +6124,19 @@  typedef struct var_loc_list_def var_loc_
 /* Table of decl location linked lists.  */
 static GTY ((param_is (var_loc_list))) htab_t decl_loc_table;
 
+/* A cached location list.  */
+struct GTY (()) cached_dw_loc_list_def {
+  /* The DECL_UID of the decl that this entry describes.  */
+  unsigned int decl_id;
+
+  /* The cached location list.  */
+  dw_loc_list_ref loc_list;
+};
+typedef struct cached_dw_loc_list_def cached_dw_loc_list;
+
+/* Table of cached location lists.  */
+static GTY ((param_is (cached_dw_loc_list))) htab_t cached_dw_loc_list_table;
+
 /* A pointer to the base of a list of references to DIE's that
    are uniquely identified by their tag, presence/absence of
    children DIE's, and list of attribute/value pairs.  */
@@ -6479,7 +6497,7 @@  static void insert_int (HOST_WIDE_INT, u
 static void insert_double (double_int, unsigned char *);
 static void insert_float (const_rtx, unsigned char *);
 static rtx rtl_for_decl_location (tree);
-static bool add_location_or_const_value_attribute (dw_die_ref, tree,
+static bool add_location_or_const_value_attribute (dw_die_ref, tree, bool,
 						   enum dwarf_attribute);
 static bool tree_add_const_value_attribute (dw_die_ref, tree);
 static bool tree_add_const_value_attribute_for_decl (dw_die_ref, tree);
@@ -8201,6 +8219,24 @@  lookup_decl_loc (const_tree decl)
     htab_find_with_hash (decl_loc_table, decl, DECL_UID (decl));
 }
 
+/* Returns a hash value for X (which really is a cached_dw_loc_list_list).  */
+
+static hashval_t
+cached_dw_loc_list_table_hash (const void *x)
+{
+  return (hashval_t) ((const cached_dw_loc_list *) x)->decl_id;
+}
+
+/* Return nonzero if decl_id of cached_dw_loc_list X is the same as
+   UID of decl *Y.  */
+
+static int
+cached_dw_loc_list_table_eq (const void *x, const void *y)
+{
+  return (((const cached_dw_loc_list *) x)->decl_id
+	  == DECL_UID ((const_tree) y));
+}
+
 /* Equate a DIE to a particular declaration.  */
 
 static void
@@ -16978,15 +17014,22 @@  fortran_common (tree decl, HOST_WIDE_INT
    these things can crop up in other ways also.)  Note that one type of
    constant value which can be passed into an inlined function is a constant
    pointer.  This can happen for example if an actual argument in an inlined
-   function call evaluates to a compile-time constant address.  */
+   function call evaluates to a compile-time constant address.
+
+   CACHE_P is true if it is worth caching the location list for DECL,
+   so that future calls can reuse it rather than regenerate it from scratch.
+   This is true for BLOCK_NONLOCALIZED_VARS in inlined subroutines,
+   since we will need to refer to them each time the function is inlined.  */
 
 static bool
-add_location_or_const_value_attribute (dw_die_ref die, tree decl,
+add_location_or_const_value_attribute (dw_die_ref die, tree decl, bool cache_p,
 				       enum dwarf_attribute attr)
 {
   rtx rtl;
   dw_loc_list_ref list;
   var_loc_list *loc_list;
+  cached_dw_loc_list *cache;
+  void **slot;
 
   if (TREE_CODE (decl) == ERROR_MARK)
     return false;
@@ -17023,7 +17066,31 @@  add_location_or_const_value_attribute (d
 	  && add_const_value_attribute (die, rtl))
 	 return true;
     }
-  list = loc_list_from_tree (decl, decl_by_reference_p (decl) ? 0 : 2);
+  list = NULL;
+  /* If this decl is from BLOCK_NONLOCALIZED_VARS, we might need its
+     list several times.  See if we've already cached the contents.  */
+  if (cache_p && loc_list)
+    {
+      cache = (cached_dw_loc_list *)
+	htab_find_with_hash (cached_dw_loc_list_table, decl, DECL_UID (decl));
+      if (cache)
+	list = cache->loc_list;
+    }
+  if (list == NULL)
+    {
+      list = loc_list_from_tree (decl, decl_by_reference_p (decl) ? 0 : 2);
+      /* It is usually worth caching this result if the decl is from
+	 BLOCK_NONLOCALIZED_VARS and if the list has at least two elements.  */
+      if (cache_p && loc_list && list && list->dw_loc_next)
+	{
+	  slot = htab_find_slot_with_hash (cached_dw_loc_list_table, decl,
+					   DECL_UID (decl), INSERT);
+	  cache = ggc_alloc_cleared_cached_dw_loc_list ();
+	  cache->decl_id = DECL_UID (decl);
+	  cache->loc_list = list;
+	  *slot = cache;
+	}
+    }
   if (list)
     {
       add_AT_location_description (die, attr, list);
@@ -18684,7 +18751,7 @@  gen_formal_parameter_die (tree node, tre
         equate_decl_number_to_die (node, parm_die);
       if (! DECL_ABSTRACT (node_or_origin))
 	add_location_or_const_value_attribute (parm_die, node_or_origin,
-					       DW_AT_location);
+					       node == NULL, DW_AT_location);
 
       break;
 
@@ -19723,9 +19790,8 @@  gen_variable_die (tree decl, tree origin
           && !TREE_SYMBOL_REFERENCED (DECL_ASSEMBLER_NAME (decl_or_origin)))
 	defer_location (decl_or_origin, var_die);
       else
-        add_location_or_const_value_attribute (var_die,
-					       decl_or_origin,
-					       DW_AT_location);
+        add_location_or_const_value_attribute (var_die, decl_or_origin,
+					       decl == NULL, DW_AT_location);
       add_pubname (decl_or_origin, var_die);
     }
   else
@@ -21501,6 +21567,7 @@  dwarf2out_function_decl (tree decl)
   dwarf2out_decl (decl);
 
   htab_empty (decl_loc_table);
+  htab_empty (cached_dw_loc_list_table);
 }
 
 /* Output a marker (i.e. a label) for the beginning of the generated code for
@@ -22210,6 +22277,11 @@  dwarf2out_init (const char *filename ATT
   decl_loc_table = htab_create_ggc (10, decl_loc_table_hash,
 				    decl_loc_table_eq, NULL);
 
+  /* Allocate the cached_dw_loc_list_table.  */
+  cached_dw_loc_list_table
+    = htab_create_ggc (10, cached_dw_loc_list_table_hash,
+		       cached_dw_loc_list_table_eq, NULL);
+
   /* Allocate the initial hunk of the decl_scope_table.  */
   decl_scope_table = VEC_alloc (tree, gc, 256);
 
@@ -22852,30 +22924,53 @@  resolve_addr (dw_die_ref die)
 {
   dw_die_ref c;
   dw_attr_ref a;
-  dw_loc_list_ref *curr;
+  dw_loc_list_ref *curr, *start, loc;
   unsigned ix;
 
   FOR_EACH_VEC_ELT (dw_attr_node, die->die_attr, ix, a)
     switch (AT_class (a))
       {
       case dw_val_class_loc_list:
-	curr = AT_loc_list_ptr (a);
-	while (*curr)
+	start = curr = AT_loc_list_ptr (a);
+	loc = *curr;
+	gcc_assert (loc);
+	/* The same list can be referenced more than once.  See if we have
+	   already recorded the result from a previous pass.  */
+	if (loc->replaced)
+	  *curr = loc->dw_loc_next;
+	else if (!loc->resolved_addr)
 	  {
-	    if (!resolve_addr_in_expr ((*curr)->expr))
+	    /* As things stand, we do not expect or allow one die to
+	       reference a suffix of another die's location list chain.
+	       References must be identical or completely separate.
+	       There is therefore no need to cache the result of this
+	       pass on any list other than the first; doing so
+	       would lead to unnecessary writes.  */
+	    while (*curr)
 	      {
-		dw_loc_list_ref next = (*curr)->dw_loc_next;
-		if (next && (*curr)->ll_symbol)
+		gcc_assert (!(*curr)->replaced && !(*curr)->resolved_addr);
+		if (!resolve_addr_in_expr ((*curr)->expr))
 		  {
-		    gcc_assert (!next->ll_symbol);
-		    next->ll_symbol = (*curr)->ll_symbol;
+		    dw_loc_list_ref next = (*curr)->dw_loc_next;
+		    if (next && (*curr)->ll_symbol)
+		      {
+			gcc_assert (!next->ll_symbol);
+			next->ll_symbol = (*curr)->ll_symbol;
+		      }
+		    *curr = next;
 		  }
-		*curr = next;
+		else
+		  curr = &(*curr)->dw_loc_next;
 	      }
+	    if (loc == *start)
+	      loc->resolved_addr = 1;
 	    else
-	      curr = &(*curr)->dw_loc_next;
+	      {
+		loc->replaced = 1;
+		loc->dw_loc_next = *start;
+	      }
 	  }
-	if (!AT_loc_list (a))
+	if (!*start)
 	  {
 	    remove_AT (die, a->dw_attr);
 	    ix--;
@@ -23304,6 +23399,7 @@  dwarf2out_finish (const char *filename)
       add_location_or_const_value_attribute (
         VEC_index (deferred_locations, deferred_locations_list, i)->die,
         VEC_index (deferred_locations, deferred_locations_list, i)->variable,
+	false,
 	DW_AT_location);
     }