Patchwork Implement -fcallgraph-info option

Submitter Eric Botcazou
Date Oct. 29, 2010, 10:43 a.m.
Message ID <201010291243.53912.ebotcazou@adacore.com>
Permalink /patch/69570/
State New

Comments

Eric Botcazou - Oct. 29, 2010, 10:43 a.m.
Hi,

this is again something that has been in our tree for a while and appears to
be of interest to some other people:
  http://gcc.gnu.org/ml/gcc/2010-10/msg00179.html

The command line option -fcallgraph-info is added and makes the compiler 
generate another output file (xxx.ci) for each compilation unit, which is a 
valid VCG file (you can launch your favorite VCG viewer on it unmodified) and 
contains the "final" callgraph of the unit.  "final" is a bit of a misnomer 
as this is actually the callgraph at RTL expansion time, but since most 
high-level optimizations are done at the Tree level and RTL doesn't usually 
fiddle with calls, it's final in almost all cases.  Moreover, the nodes can 
be decorated with additional info: -fcallgraph-info=su adds stack usage info 
and -fcallgraph-info=da adds dynamic allocation info.
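To make the .ci layout concrete, here is a minimal standalone sketch of the kind of VCG text that dump_cgraph_final_vcg in the patch writes: a graph header, one node per function (labelled with its stack usage, as with -fcallgraph-info=su) and one edge per call site. The structures ci_func and ci_call and both functions are hypothetical simplifications, not the patch's cgraph_final_* types; nodes for external callees and the indirect-call placeholder are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the per-function data that the
   patch keeps in struct cgraph_final_info and struct cgraph_final_edge.  */
struct ci_call
{
  const char *callee;    /* Name of the called function.  */
  const char *location;  /* Call site as "file:line:column".  */
};

struct ci_func
{
  const char *name;            /* Possibly scoped, e.g. "pkg.proc".  */
  long stack_usage;            /* Bytes.  */
  const char *stack_kind;      /* "static", "dynamic" or "dynamic,bounded".  */
  const struct ci_call *calls;
  size_t n_calls;
};

/* Emit one VCG node for FN, then one VCG edge per call it makes.  The
   title keeps the full name; the label drops any scope prefix up to the
   last '.', the same trimming the patch applies to the names of
   dynamically allocated objects.  */
static void
emit_func_vcg (FILE *f, const struct ci_func *fn)
{
  const char *dot, *label;
  size_t i;

  assert (fn != NULL && fn->name != NULL);
  dot = strrchr (fn->name, '.');
  label = dot ? dot + 1 : fn->name;

  fprintf (f, "node: { title: \"%s\" label: \"%s\\n%ld bytes (%s)\" }\n",
	   fn->name, label, fn->stack_usage, fn->stack_kind);
  for (i = 0; i < fn->n_calls; i++)
    fprintf (f, "edge: { sourcename: \"%s\" targetname: \"%s\""
	     " label: \"%s\" }\n",
	     fn->name, fn->calls[i].callee, fn->calls[i].location);
}

/* Emit the whole unit: graph header, every function, closing brace.  */
static void
emit_graph_vcg (FILE *f, const char *unit, const struct ci_func *fns,
		size_t n)
{
  size_t i;

  fprintf (f, "graph: { title: \"%s\"\n", unit);
  for (i = 0; i < n; i++)
    emit_func_vcg (f, &fns[i]);
  fputs ("}\n", f);
}
```

The "\\n" inside the label is written as a literal backslash-n, which VCG viewers render as a line break within the node.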

This is again strictly orthogonal to code and debug info generation.  There 
are a few non-obvious changes to libfuncs.h, builtins.c, expr.c and optabs.c 
to deal with quirks of the RTL expander, but this mostly removes dead code.
I can submit them separately if this is deemed better.

I've also attached an example .ci file and its conversion to PNG format,
obtained by compiling unwind-dw2.c with -fcallgraph-info=su on x86-64/Linux, 
as well as a small Perl script to manipulate/analyze them.
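As an illustration of the kind of analysis that stack usage decorations enable, here is a hedged C sketch (not a translation of the attached Perl script, which is not shown here). It computes the worst-case stack usage along any call chain from a root function, assuming the .ci file has already been parsed into a hypothetical ci_graph adjacency matrix. It returns -1 when recursion rules out a static bound, and it ignores indirect calls and the "dynamic" qualifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 64

/* Hypothetical in-memory form of a parsed .ci file: function I uses
   STACK[I] bytes and calls function J when CALLS[I][J] is true.  */
struct ci_graph
{
  size_t n;
  long stack[MAX_FUNCS];
  bool calls[MAX_FUNCS][MAX_FUNCS];
};

enum visit_state { UNVISITED, IN_PROGRESS, DONE };

/* Depth-first search returning the largest sum of frame sizes along any
   call chain starting at F, or -1 if a cycle (recursion) is reachable.
   Nodes that are IN_PROGRESS are exactly those on the current DFS path,
   so reaching one again means a cycle.  */
static long
max_depth (const struct ci_graph *g, size_t f, enum visit_state *state,
	   long *memo)
{
  long best = 0;
  size_t j;

  if (state[f] == DONE)
    return memo[f];
  if (state[f] == IN_PROGRESS)
    return -1;

  state[f] = IN_PROGRESS;
  for (j = 0; j < g->n; j++)
    if (g->calls[f][j])
      {
	long d = max_depth (g, j, state, memo);
	if (d < 0)
	  return -1;  /* The whole query is abandoned, so stale states are harmless.  */
	if (d > best)
	  best = d;
      }
  state[f] = DONE;
  memo[f] = g->stack[f] + best;
  return memo[f];
}

/* Worst-case stack usage of ROOT, or -1 if unbounded due to recursion.  */
static long
worst_case_stack (const struct ci_graph *g, size_t root)
{
  enum visit_state state[MAX_FUNCS] = { UNVISITED };
  long memo[MAX_FUNCS] = { 0 };

  assert (g->n <= MAX_FUNCS && root < g->n);
  return max_depth (g, root, state, memo);
}
```

For example, with frames of 16, 48 and 32 bytes along the chain 0 -> 1 -> 2, the result for function 0 is 96; memoization keeps the search linear in the number of edges.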

Tested on x86_64-suse-linux, OK for mainline?


2010-10-29  Eric Botcazou  <ebotcazou@adacore.com>

	Callgraph info support
	* common.opt (-fcallgraph-info[=]): New option.
	* doc/invoke.texi (Debugging options): Document it.
	* flags.h (flag_stack_usage_info): New flag.
	(flag_callgraph_info): Likewise.
	* opts.c (common_handle_option): Handle -fcallgraph-info[=].
	Set flag_stack_usage_info to 1 if -fstack-usage.
	* builtins.c (set_builtin_user_assembler_name): Do not initialize
	memcpy_libfunc and memset_libfunc.
	* calls.c (expand_call): If -fcallgraph-info, record the call.  Turn
	flag_stack_usage into flag_stack_usage_info.
	(emit_library_call_value_1): Likewise.
	* cgraph.h (struct cgraph_final_info): New structure.
	(struct cgraph_dynamic_alloc): Likewise.
	(struct cgraph_final_edge): Likewise.
	(cgraph_node): Add 'final' field.
	(dump_cgraph_final_vcg): Declare.
	(cgraph_final_record_call): Likewise.
	(cgraph_final_record_dynamic_alloc): Likewise.
	(cgraph_final_info): Likewise.
	* cgraph.c (cgraph_create_node): Initialize 'final' field.
	(final_create_edge): New static function.
	(cgraph_final_record_call): New global function.
	(cgraph_final_record_dynamic_alloc): Likewise.
	(cgraph_final_info): Likewise.
	(dump_cgraph_final_indirect_call_node_vcg): New static function.
	(dump_cgraph_final_edge_vcg): Likewise.
	(dump_cgraph_final_node_vcg): Likewise.
	(external_node_needed_p): Likewise.
	(dump_cgraph_final_vcg): New global function.
	* explow.c (allocate_dynamic_stack_space): Turn flag_stack_usage into
	flag_stack_usage_info.
	* expr.c (emit_block_move_via_libcall): Set input_location on the call.
	(set_storage_via_libcall): Likewise.
	(block_move_fn): Make global.
	(block_clear_fn): Likewise.
	Do not include gt-expr.h.
	* function.c (instantiate_virtual_regs): Turn flag_stack_usage into
	flag_stack_usage_info.
	(prepare_function_start): Likewise.
	(rest_of_handle_thread_prologue_and_epilogue): Likewise.
	* gimplify.c (gimplify_decl_expr): Record dynamically-allocated object
	by calling cgraph_final_record_dynamic_alloc if -fcallgraph-info=da.
	* libfuncs.h (libfunc_index): Remove LTI_memcpy and LTI_memset.
	(memcpy_libfunc): Delete.
	(memset_libfunc): Likewise.
	* optabs.c (build_libfunc_function): Do not zap the SYMBOL_REF_DECL.
	(init_optabs): Do not initialize memcpy_libfunc and memset_libfunc.
	* print-tree.c (print_decl_identifier): New function.
	* toplev.h (stack_usage_qual): Declare.
	* toplev.c (flag_callgraph_info): New flag.
	(flag_stack_usage_info): Likewise.
	(callgraph_info_file): New file pointer.
	(stack_usage_qual): New global variable.
	(output_stack_usage): If -fcallgraph-info=su, set stack_usage_kind
	and stack_usage of associated callgraph node.  If -fstack-usage, use
	print_decl_identifier for pretty-printing.
	(lang_dependent_init): Open file if -fcallgraph-info.
	(finalize): If callgraph_info_file is not null, invoke
	dump_cgraph_final_vcg and close file.
	* tree.h (print_decl_identifier): Declare it.
	(PRINT_DECL_ORIGIN, PRINT_DECL_NAME, PRINT_DECL_UNIQUE_NAME): New.
	(block_move_fn): Declare.
	(block_clear_fn): Likewise.
	* Makefile.in (expr.o): Remove gt-expr.h.
	* config/alpha/alpha.c (alpha_expand_prologue): Turn flag_stack_usage
	into flag_stack_usage_info.
	* config/avr/avr.c (expand_prologue): Likewise.
	* config/i386/i386.c (ix86_expand_prologue): Likewise.
	* config/ia64/ia64.c (ia64_expand_prologue): Likewise.
	* config/mips/mips.c (mips_expand_prologue): Likewise.
	* config/pa/pa.c (hppa_expand_prologue): Likewise.
	* config/rs6000/rs6000.c (rs6000_emit_prologue): Likewise.
	* config/sh/sh.c (sh_expand_prologue): Likewise.
	* config/sparc/sparc.c (sparc_expand_prologue): Likewise.
	* config/picochip/picochip.c: Do not include libfuncs.h.
	* config/m68hc11/m68hc11.c (m68hc11_init_libfuncs): Do not initialize
	memcpy_libfunc and memset_libfunc.
	* config/vms/vms-crtl.h (MEM_LIBFUNCS_INIT): Likewise.
	* config/vms/vms-crtl-64.h (MEM_LIBFUNCS_INIT): Likewise.
ada/
	* gcc-interface/misc.c (callgraph_info_file): Delete.
Joseph S. Myers - Oct. 29, 2010, 11:36 a.m.
On Fri, 29 Oct 2010, Eric Botcazou wrote:

> +/* Compute stack usage information on a per-function basis.  */
> +extern int flag_stack_usage_info;
> +
> +/* Output callgraph information on a per-file basis.  */
> +#define CALLGRAPH_INFO_NAKED         0x1
> +#define CALLGRAPH_INFO_STACK_USAGE   0x2
> +#define CALLGRAPH_INFO_DYNAMIC_ALLOC 0x4
> +extern int flag_callgraph_info;

Please use Variable declarations in common.opt for new option variables, 
so they go in the gcc_options structure, instead of manual declarations in 
flags.h and toplev.c.
Richard Guenther - Oct. 29, 2010, 1:35 p.m.
On Fri, Oct 29, 2010 at 12:43 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> Hi,
>
> this is again something that has been for a while in our tree and appears to
> be of interest to some other people:
>  http://gcc.gnu.org/ml/gcc/2010-10/msg00179.html
>
> The command line option -fcallgraph-info is added and makes the compiler
> generate another output file (xxx.ci) for each compilation unit, which is a
> valid VCG file (you can launch your favorite VCG viewer on it unmodified) and
> contains the "final" callgraph of the unit.  "final" is a bit of a misnomer
> as this is actually the callgraph at RTL expansion time, but since most
> high-level optimizations are done at the Tree level and RTL doesn't usually
> fiddle with calls, it's final in almost all cases.  Moreover, the nodes can
> be decorated with additional info: -fcallgraph-info=su adds stack usage info
> and -fcallgraph-info=da dynamic allocation info.
>
> This is again strictly orthogonal to code and debug info generation.  There
> are a few non-obvious changes to libfuncs.h, builtins.c, expr.c and optabs.c
> to deal with quirks of the RTL expander, but this mostly removes dead code.
> I can submit them separately if this is deemed better.
>
> I've also attached an example of .ci file and its conversion to PNG format,
> obtained by compiling unwind-dw2.c with -fcallgraph-info=su on x86-64/Linux,
> as well as a small Perl script to manipulate/analyze them.
>
> Tested on x86_64-suse-linux, OK for mainline?

Hm, the patch looks a bit excessive for just a dumping facility ... can't we
simply incrementally output the VCG (and I'd prefer DOT here)?  After
inline clone materialization the graph edges to the callees are final and
you can add properties in final.c for stack size.  So I don't really see the
need for the additional final cgraph structures.

Not to say that either a plugin or a simple gdb python script would be
the preferred way to implement all this (or a postprocessing script for
the .000i cgraph dump we already have).

Richard.
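A minimal sketch of the incremental, DOT-based alternative raised here, under the assumption that a hook runs once per function after its outgoing calls are final: each call appends a node and its edges immediately, so nothing has to be kept until the end of the unit. The three callgraph_dot_* functions are hypothetical and are not part of GCC.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical hook: called once when the unit starts.  */
static void
callgraph_dot_begin (FILE *f, const char *unit)
{
  fprintf (f, "digraph \"%s\" {\n", unit);
}

/* Hypothetical hook: called once per function as soon as its calls are
   final; writes the node (with its stack usage) and its outgoing edges.  */
static void
callgraph_dot_function (FILE *f, const char *name, long stack_usage,
			const char *const *callees, size_t n_callees)
{
  size_t i;

  assert (name != NULL && (n_callees == 0 || callees != NULL));
  fprintf (f, "  \"%s\" [label=\"%s\\n%ld bytes\"];\n",
	   name, name, stack_usage);
  for (i = 0; i < n_callees; i++)
    fprintf (f, "  \"%s\" -> \"%s\";\n", name, callees[i]);
}

/* Hypothetical hook: called once when the unit ends.  */
static void
callgraph_dot_end (FILE *f)
{
  fputs ("}\n", f);
}
```

Graphviz creates nodes implicitly for edge endpoints that are never declared, so external callees such as memcpy need no separate node statement in this sketch.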

Eric Botcazou - Oct. 29, 2010, 1:50 p.m.
> Please use Variable declarations in common.opt for new option variables,
> so they go in the gcc_options structure, instead of manual declarations in
> flags.h and toplev.c.

flag_stack_usage_info isn't associated with any option though.  As for the 
other one, where should I put the defines then?
Joseph S. Myers - Oct. 29, 2010, 3:10 p.m.
On Fri, 29 Oct 2010, Eric Botcazou wrote:

> > Please use Variable declarations in common.opt for new option variables,
> > so they go in the gcc_options structure, instead of manual declarations in
> > flags.h and toplev.c.
> 
> flag_stack_usage_info isn't associated with any option though.  As for the 
> other one, where should I put the defines then?

The defines can stay in flags.h.

The aim is for common_handle_option (and all target option handlers) to 
cease to set any global state.  Everything it sets should end up either 
being set through the opts pointer passed to the function, or being set in 
subsequent code after the main option handling.  If a variable is closely 
related to option state, and could reasonably exist in both the driver and 
the core compilers, then it's appropriate for it to go in the gcc_options 
structure using a Variable declaration, whether or not it directly 
describes the state of a single option.  I think that applies to 
flag_stack_usage_info.
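The direction described here can be sketched as follows: the handler for -fcallgraph-info[=su,da] records everything through the options structure it is given instead of writing globals, while the bit values stay as in flags.h. The struct below is a simplified stand-in for the generated gcc_options (only the x_ field prefix mimics it), handle_callgraph_info is a hypothetical name, and two behaviours are assumptions about the intended semantics: the bare option always sets CALLGRAPH_INFO_NAKED, and the su marker also turns on flag_stack_usage_info.

```c
#include <assert.h>
#include <string.h>

/* Bit values as defined in the patch's flags.h.  */
#define CALLGRAPH_INFO_NAKED         0x1
#define CALLGRAPH_INFO_STACK_USAGE   0x2
#define CALLGRAPH_INFO_DYNAMIC_ALLOC 0x4

/* Simplified stand-in for the generated gcc_options structure; this is
   not the real definition.  */
struct gcc_options
{
  int x_flag_callgraph_info;
  int x_flag_stack_usage_info;
};

/* Hypothetical handler for -fcallgraph-info[=MARKERS].  ARG is NULL for
   the bare option, otherwise a comma-separated list such as "su,da".
   All state is written through OPTS; no globals are touched.  Returns 0
   on an unknown marker, 1 otherwise.  */
static int
handle_callgraph_info (struct gcc_options *opts, const char *arg)
{
  /* Assumption: NAKED marks that callgraph info was requested at all.  */
  opts->x_flag_callgraph_info |= CALLGRAPH_INFO_NAKED;

  while (arg != NULL && *arg != '\0')
    {
      const char *comma = strchr (arg, ',');
      size_t len = comma ? (size_t) (comma - arg) : strlen (arg);

      if (len == 2 && strncmp (arg, "su", 2) == 0)
	{
	  opts->x_flag_callgraph_info |= CALLGRAPH_INFO_STACK_USAGE;
	  /* Assumption: stack sizes must be computed for su to work.  */
	  opts->x_flag_stack_usage_info = 1;
	}
      else if (len == 2 && strncmp (arg, "da", 2) == 0)
	opts->x_flag_callgraph_info |= CALLGRAPH_INFO_DYNAMIC_ALLOC;
      else
	return 0;

      arg = comma ? comma + 1 : NULL;
    }
  return 1;
}
```

Because the handler only sees OPTS, the same code can run in the driver and in the core compilers without either one depending on the other's globals.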

Patch

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 165899)
+++ doc/invoke.texi	(working copy)
@@ -309,7 +309,7 @@  Objective-C and Objective-C++ Dialects}.
 -fcompare-debug@r{[}=@var{opts}@r{]}  -fcompare-debug-second @gol
 -feliminate-dwarf2-dups -feliminate-unused-debug-types @gol
 -feliminate-unused-debug-symbols -femit-class-debug-always @gol
--fenable-icf-debug @gol
+-fcallgraph-info@r{[}=su,da@r{]}  -fenable-icf-debug @gol
 -fmem-report -fpre-ipa-mem-report -fpost-ipa-mem-report -fprofile-arcs @gol
 -frandom-seed=@var{string} -fsched-verbose=@var{n} @gol
 -fsel-sched-verbose -fsel-sched-dump-cfg -fsel-sched-pipelining-verbose @gol
@@ -4840,6 +4840,18 @@  the function.  If it is not present, the
 not bounded at compile-time and the second field only represents the
 bounded part.
 
+@item -fcallgraph-info
+@itemx -fcallgraph-info=@var{MARKERS}
+@opindex fcallgraph-info
+Makes the compiler output callgraph information for the program, on a
+per-file basis.  The information is generated in the common VCG format.
+It can be decorated with additional, per-node and/or per-edge information,
+if a list of comma-separated markers is additionally specified.  When the
+@code{su} marker is specified, the callgraph is decorated with stack usage
+information; it is equivalent to @option{-fstack-usage}.  When the @code{da}
+marker is specified, the callgraph is decorated with information about
+dynamically allocated objects.
+
 @item -fprofile-arcs
 @opindex fprofile-arcs
 Add code so that program flow @dfn{arcs} are instrumented.  During
Index: flags.h
===================================================================
--- flags.h	(revision 165899)
+++ flags.h	(working copy)
@@ -191,6 +191,15 @@  extern enum graph_dump_types graph_dump_
 
 extern enum stack_check_type flag_stack_check;
 
+/* Compute stack usage information on a per-function basis.  */
+extern int flag_stack_usage_info;
+
+/* Output callgraph information on a per-file basis.  */
+#define CALLGRAPH_INFO_NAKED         0x1
+#define CALLGRAPH_INFO_STACK_USAGE   0x2
+#define CALLGRAPH_INFO_DYNAMIC_ALLOC 0x4
+extern int flag_callgraph_info;
+
 /* Returns TRUE if generated code should match ABI version N or
    greater is in use.  */
 
Index: cgraph.c
===================================================================
--- cgraph.c	(revision 165899)
+++ cgraph.c	(working copy)
@@ -465,6 +465,8 @@  cgraph_create_node (void)
   node->previous = NULL;
   node->global.estimated_growth = INT_MIN;
   node->frequency = NODE_FREQUENCY_NORMAL;
+  if (flag_callgraph_info)
+    node->final = ggc_alloc_cleared_cgraph_final_info ();
   ipa_empty_ref_list (&node->ref_list);
   cgraph_nodes = node;
   cgraph_n_nodes++;
@@ -1727,6 +1729,59 @@  cgraph_mark_address_taken_node (struct c
   node->address_taken = 1;
 }
 
+/* Create edge from CALLER to CALLEE in the final cgraph.  */
+
+static struct cgraph_final_edge *
+final_create_edge (struct cgraph_node *caller, struct cgraph_node *callee,
+		   location_t location)
+{
+  struct cgraph_final_edge *e
+    = ggc_alloc_cleared_cgraph_final_edge ();
+  e->location = location;
+  e->caller = caller;
+  e->callee = callee;
+  e->next = caller->final->calls;
+  caller->final->calls = e;
+  return e;
+}
+
+/* Record call from SOURCE to DEST in the final cgraph.  */
+
+void
+cgraph_final_record_call (tree source, tree dest, location_t location)
+{
+  struct cgraph_node *callee;
+
+  if (dest)
+    {
+      callee = cgraph_node (dest);
+      callee->final->called = true;
+    }
+  else
+    callee = NULL;
+
+  (void) final_create_edge (cgraph_node (source), callee, location);
+}
+
+/* Record a dynamically-allocated DECL in the final cgraph of FNDECL.  */
+
+void
+cgraph_final_record_dynamic_alloc (tree fndecl, tree decl)
+{
+  const char *dot;
+  struct cgraph_final_info *cfi = cgraph_final_info (fndecl);
+  struct cgraph_dynamic_alloc *cda
+    = ggc_alloc_cleared_cgraph_dynamic_alloc ();
+  cda->location = DECL_SOURCE_LOCATION (decl);
+  cda->name = lang_hooks.decl_printable_name (decl, 2);
+  dot = strrchr (cda->name, '.');
+  if (dot)
+    cda->name = dot + 1;
+  cda->name = ggc_strdup (cda->name);
+  cda->next = cfi->dynamic_allocs;
+  cfi->dynamic_allocs = cda;
+}
+
 /* Return local info for the compiled function.  */
 
 struct cgraph_local_info *
@@ -1784,6 +1839,20 @@  cgraph_inline_failed_string (cgraph_inli
   return cif_string_table[reason];
 }
 
+/* Return final info for the compiled function.  */
+
+struct cgraph_final_info *
+cgraph_final_info (tree decl)
+{
+  struct cgraph_node *node;
+  gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
+  node = cgraph_node (decl);
+  if (decl != current_function_decl
+      && !TREE_ASM_WRITTEN (node->decl))
+    return NULL;
+  return node->final;
+}
+
 /* Return name of the node used in debug output.  */
 const char *
 cgraph_node_name (struct cgraph_node *node)
@@ -1989,6 +2058,178 @@  debug_cgraph (void)
 }
 
 
+/* Dump placeholder node for indirect calls in VCG format.  */
+
+#define INDIRECT_CALL_NAME  "__indirect_call"
+
+static void
+dump_cgraph_final_indirect_call_node_vcg (FILE *f)
+{
+  static bool emitted = false;
+  if (emitted)
+    return;
+
+  fputs ("node: { title: \"", f);
+  fputs (INDIRECT_CALL_NAME, f);
+  fputs ("\" label: \"", f);
+  fputs ("Indirect Call Placeholder", f);
+  fputs ("\" shape : ellipse }\n", f);
+  emitted = true;
+}
+
+/* Dump final cgraph edge in VCG format.  */
+
+static void
+dump_cgraph_final_edge_vcg (FILE *f, struct cgraph_final_edge *edge)
+{
+  fputs ("edge: { sourcename: \"", f);
+  print_decl_identifier (f, edge->caller->decl, PRINT_DECL_UNIQUE_NAME);
+  fputs ("\" targetname: \"", f);
+  if (edge->callee)
+    print_decl_identifier (f, edge->callee->decl, PRINT_DECL_UNIQUE_NAME);
+  else
+    fputs (INDIRECT_CALL_NAME, f);
+  if (edge->location != UNKNOWN_LOCATION)
+    {
+      expanded_location loc;
+      fputs ("\" label: \"", f);
+      loc = expand_location (edge->location);
+      fprintf (f, "%s:%d:%d", loc.file, loc.line, loc.column);
+    }
+  fputs ("\" }\n", f);
+
+  if (!edge->callee)
+    dump_cgraph_final_indirect_call_node_vcg (f);
+}
+
+/* Dump final cgraph node in VCG format.  */
+
+static void
+dump_cgraph_final_node_vcg (FILE *f, struct cgraph_node *node)
+{
+  struct cgraph_final_edge *edge;
+
+  fputs ("node: { title: \"", f);
+  print_decl_identifier (f, node->decl, PRINT_DECL_UNIQUE_NAME);
+  fputs ("\" label: \"", f);
+  print_decl_identifier (f, node->decl, PRINT_DECL_NAME);
+  fputs ("\\n", f);
+  print_decl_identifier (f, node->decl, PRINT_DECL_ORIGIN);
+
+  if (DECL_EXTERNAL (node->decl))
+    {
+      fputs ("\" shape : ellipse }\n", f);
+      return;
+    }
+
+  if (flag_callgraph_info & CALLGRAPH_INFO_STACK_USAGE)
+    {
+      if (node->final->stack_usage)
+	fprintf (f, "\\n"HOST_WIDE_INT_PRINT_DEC" bytes (%s)",
+		 node->final->stack_usage,
+		 stack_usage_qual[node->final->stack_usage_kind]);
+      else
+	fputs ("\\n0 bytes", f);
+    }
+
+  if (flag_callgraph_info & CALLGRAPH_INFO_DYNAMIC_ALLOC)
+    {
+      if (node->final->dynamic_allocs)
+	{
+	  struct cgraph_dynamic_alloc *cda, *next;
+	  unsigned int count = 1;
+
+	  /* Reverse the linked list and count members.  */
+	  cda = node->final->dynamic_allocs;
+	  next = cda->next;
+	  cda->next = NULL;
+	  while (next)
+	    {
+	      struct cgraph_dynamic_alloc *tmp = next;
+	      next = next->next;
+	      tmp->next = cda;
+	      cda = tmp;
+	      count++;
+	    }
+	  node->final->dynamic_allocs = cda;
+
+	  fprintf (f, "\\n%d dynamic objects", count);
+
+	  for (cda = node->final->dynamic_allocs; cda; cda = cda->next)
+	    {
+	      expanded_location loc = expand_location (cda->location);
+	      fprintf (f, "\\n %s", cda->name);
+	      fprintf (f, " %s:%d:%d", loc.file, loc.line, loc.column);
+	    }
+	}
+      else
+	fputs ("\\n0 dynamic objects", f);
+    }
+
+  fputs ("\" }\n", f);
+
+  for (edge = node->final->calls; edge; edge = edge->next)
+    dump_cgraph_final_edge_vcg (f, edge);
+}
+
+/* Return true if NODE is needed in the final callgraph.  */
+
+static inline bool
+external_node_needed_p (struct cgraph_node *node)
+{
+  static bool memcpy_node_seen = false;
+  static bool memset_node_seen = false;
+
+  /* External nodes that are eventually not called are not needed.  */
+  if (!node->final->called)
+    return false;
+
+  /* Take care of not emitting the MEMCPY node twice because of the
+     late creation of a clone by the RTL expander.  */
+  if ((DECL_BUILT_IN_CLASS (node->decl) == BUILT_IN_NORMAL
+       && DECL_FUNCTION_CODE (node->decl) == BUILT_IN_MEMCPY)
+      || node->decl == block_move_fn)
+    {
+      if (memcpy_node_seen)
+	return false;
+      else
+	memcpy_node_seen = true;
+    }
+
+  /* Likewise for the MEMSET node.  */
+  if ((DECL_BUILT_IN_CLASS (node->decl) == BUILT_IN_NORMAL
+       && DECL_FUNCTION_CODE (node->decl) == BUILT_IN_MEMSET)
+      || node->decl == block_clear_fn)
+    {
+      if (memset_node_seen)
+	return false;
+      else
+	memset_node_seen = true;
+    }
+
+  return true;
+}
+
+/* Dump the final cgraph in VCG format.  */
+
+void
+dump_cgraph_final_vcg (FILE *f)
+{
+  struct cgraph_node *node;
+
+  /* Write the file header.  */
+  fprintf (f, "graph: { title: \"%s\"\n", main_input_filename);
+
+  /* Output only nodes that have been written in the final code.  */
+  for (node = cgraph_nodes; node; node = node->next)
+    if ((DECL_EXTERNAL (node->decl) && external_node_needed_p (node))
+	|| TREE_ASM_WRITTEN (node->decl))
+      dump_cgraph_final_node_vcg (f, node);
+
+  fputs ("}\n", f);
+}
+
+
 /* Set the DECL_ASSEMBLER_NAME and update cgraph hashtables.  */
 
 void
Index: cgraph.h
===================================================================
--- cgraph.h	(revision 165899)
+++ cgraph.h	(working copy)
@@ -194,6 +194,34 @@  enum node_frequency {
 };
 
 
+/* Information about the function that is computed by various parts of
+   the compiler.  Available only for functions that have been already
+   assembled and if -fcallgraph-info was specified.  */
+
+struct GTY((chain_next ("%h.next"))) cgraph_final_edge
+{
+  location_t location;
+  struct cgraph_node *caller;
+  struct cgraph_node *callee;
+  struct cgraph_final_edge *next;
+};
+
+struct GTY((chain_next ("%h.next"))) cgraph_dynamic_alloc
+{
+  location_t location;
+  const char *name;
+  struct cgraph_dynamic_alloc *next;
+};
+
+struct GTY(()) cgraph_final_info
+{
+  struct cgraph_final_edge *calls;
+  int stack_usage_kind;
+  HOST_WIDE_INT stack_usage;
+  struct cgraph_dynamic_alloc *dynamic_allocs;
+  bool called;
+};
+
 /* The cgraph data structure.
    Each function decl has assigned cgraph_node listing callees and callers.  */
 
@@ -244,6 +272,8 @@  struct GTY((chain_next ("%h.next"), chai
   struct cgraph_clone_info clone;
   struct cgraph_thunk_info thunk;
 
+  struct cgraph_final_info *final;
+
   /* Expected number of executions: calculated in profile.c.  */
   gcov_type count;
   /* Unique id of the node.  */
@@ -547,6 +577,7 @@  void dump_cgraph (FILE *);
 void debug_cgraph (void);
 void dump_cgraph_node (FILE *, struct cgraph_node *);
 void debug_cgraph_node (struct cgraph_node *);
+void dump_cgraph_final_vcg (FILE *);
 void cgraph_insert_node_to_hashtable (struct cgraph_node *node);
 void cgraph_remove_edge (struct cgraph_edge *);
 void cgraph_remove_node (struct cgraph_node *);
@@ -574,9 +605,12 @@  void cgraph_create_edge_including_clones
 					  gimple, gimple, gcov_type, int, int,
 					  cgraph_inline_failed_t);
 void cgraph_update_edges_for_call_stmt (gimple, tree, gimple);
+void cgraph_final_record_call (tree, tree, location_t);
+void cgraph_final_record_dynamic_alloc (tree, tree);
 struct cgraph_local_info *cgraph_local_info (tree);
 struct cgraph_global_info *cgraph_global_info (tree);
 struct cgraph_rtl_info *cgraph_rtl_info (tree);
+struct cgraph_final_info *cgraph_final_info (tree);
 const char * cgraph_node_name (struct cgraph_node *);
 struct cgraph_edge * cgraph_clone_edge (struct cgraph_edge *,
 					struct cgraph_node *, gimple,
Index: libfuncs.h
===================================================================
--- libfuncs.h	(revision 165899)
+++ libfuncs.h	(working copy)
@@ -26,10 +26,8 @@  along with GCC; see the file COPYING3.
 enum libfunc_index
 {
   LTI_abort,
-  LTI_memcpy,
   LTI_memmove,
   LTI_memcmp,
-  LTI_memset,
   LTI_setbits,
 
   LTI_setjmp,
@@ -79,10 +77,8 @@  extern struct target_libfuncs *this_targ
 /* Accessor macros for libfunc_table.  */
 
 #define abort_libfunc	(libfunc_table[LTI_abort])
-#define memcpy_libfunc	(libfunc_table[LTI_memcpy])
 #define memmove_libfunc	(libfunc_table[LTI_memmove])
 #define memcmp_libfunc	(libfunc_table[LTI_memcmp])
-#define memset_libfunc	(libfunc_table[LTI_memset])
 #define setbits_libfunc	(libfunc_table[LTI_setbits])
 
 #define setjmp_libfunc	(libfunc_table[LTI_setjmp])
Index: optabs.c
===================================================================
--- optabs.c	(revision 165899)
+++ optabs.c	(working copy)
@@ -6005,10 +6005,6 @@  build_libfunc_function (const char *name
   TREE_PUBLIC (decl) = 1;
   gcc_assert (DECL_ASSEMBLER_NAME (decl));
 
-  /* Zap the nonsensical SYMBOL_REF_DECL for this.  What we're left with
-     are the flags assigned by targetm.encode_section_info.  */
-  SET_SYMBOL_REF_DECL (XEXP (DECL_RTL (decl), 0), NULL);
-
   return decl;
 }
 
@@ -6536,10 +6532,8 @@  init_optabs (void)
     set_optab_libfunc (abs_optab, TYPE_MODE (complex_double_type_node), "cabs");
 
   abort_libfunc = init_one_libfunc ("abort");
-  memcpy_libfunc = init_one_libfunc ("memcpy");
   memmove_libfunc = init_one_libfunc ("memmove");
   memcmp_libfunc = init_one_libfunc ("memcmp");
-  memset_libfunc = init_one_libfunc ("memset");
   setbits_libfunc = init_one_libfunc ("__setbits");
 
 #ifndef DONT_USE_BUILTIN_SETJMP
Index: tree.h
===================================================================
--- tree.h	(revision 165899)
+++ tree.h	(working copy)
@@ -5169,6 +5169,10 @@  extern void print_vec_tree (FILE *, cons
 extern void print_node_brief (FILE *, const char *, const_tree, int);
 extern void indent_to (FILE *, int);
 #endif
+#define PRINT_DECL_ORIGIN       0x1
+#define PRINT_DECL_NAME         0x2
+#define PRINT_DECL_UNIQUE_NAME  0x4
+extern void print_decl_identifier (FILE *, tree, int flags);
 
 /* In tree-inline.c:  */
 extern bool debug_find_tree (tree, tree);
@@ -5474,6 +5478,8 @@  extern void fini_object_sizes (void);
 extern unsigned HOST_WIDE_INT compute_builtin_object_size (tree, int);
 
 /* In expr.c.  */
+extern GTY(()) tree block_move_fn;
+extern GTY(()) tree block_clear_fn;
 extern unsigned HOST_WIDE_INT highest_pow2_factor (const_tree);
 extern tree build_personality_function (const char *);
 
Index: builtins.c
===================================================================
--- builtins.c	(revision 165899)
+++ builtins.c	(working copy)
@@ -13763,11 +13763,9 @@  set_builtin_user_assembler_name (tree de
     {
     case BUILT_IN_MEMCPY:
       init_block_move_fn (asmspec);
-      memcpy_libfunc = set_user_assembler_libfunc ("memcpy", asmspec);
       break;
     case BUILT_IN_MEMSET:
       init_block_clear_fn (asmspec);
-      memset_libfunc = set_user_assembler_libfunc ("memset", asmspec);
       break;
     case BUILT_IN_MEMMOVE:
       memmove_libfunc = set_user_assembler_libfunc ("memmove", asmspec);
Index: toplev.c
===================================================================
--- toplev.c	(revision 165899)
+++ toplev.c	(working copy)
@@ -261,6 +261,12 @@  rtx stack_limit_rtx;
 /* Type of stack check.  */
 enum stack_check_type flag_stack_check = NO_STACK_CHECK;
 
+/* Output callgraph information on a per-file basis.  */
+int flag_callgraph_info;
+
+/* Compute stack usage information on a per-function basis.  */
+int flag_stack_usage_info;
+
 /* True if the user has tagged the function with the 'section'
    attribute.  */
 
@@ -297,6 +303,7 @@  static const param_info lang_independent
 
 FILE *asm_out_file;
 FILE *aux_info_file;
+FILE *callgraph_info_file = NULL;
 FILE *stack_usage_file = NULL;
 FILE *dump_file = NULL;
 const char *dump_file_name;
@@ -1532,21 +1539,15 @@  alloc_for_identifier_to_locale (size_t l
   return ggc_alloc_atomic (len);
 }
 
+const char *stack_usage_qual[] = { "static", "dynamic", "dynamic,bounded" };
+
 /* Output stack usage information.  */
 void
 output_stack_usage (void)
 {
   static bool warning_issued = false;
-  enum stack_usage_kind_type { STATIC = 0, DYNAMIC, DYNAMIC_BOUNDED };
-  const char *stack_usage_kind_str[] = {
-    "static",
-    "dynamic",
-    "dynamic,bounded"
-  };
   HOST_WIDE_INT stack_usage = current_function_static_stack_size;
-  enum stack_usage_kind_type stack_usage_kind;
-  expanded_location loc;
-  const char *raw_id, *id;
+  int stack_usage_kind;
 
   if (stack_usage < 0)
     {
@@ -1558,45 +1559,42 @@  output_stack_usage (void)
       return;
     }
 
-  stack_usage_kind = STATIC;
+  stack_usage_kind = SU_STATIC;
 
   /* Add the maximum amount of space pushed onto the stack.  */
   if (current_function_pushed_stack_size > 0)
     {
       stack_usage += current_function_pushed_stack_size;
-      stack_usage_kind = DYNAMIC_BOUNDED;
+      stack_usage_kind = SU_DYNAMIC_BOUNDED;
     }
 
   /* Now on to the tricky part: dynamic stack allocation.  */
   if (current_function_allocates_dynamic_stack_space)
     {
       if (current_function_has_unbounded_dynamic_stack_size)
-	stack_usage_kind = DYNAMIC;
+	stack_usage_kind = SU_DYNAMIC;
       else
-	stack_usage_kind = DYNAMIC_BOUNDED;
+	stack_usage_kind = SU_DYNAMIC_BOUNDED;
 
       /* Add the size even in the unbounded case, this can't hurt.  */
       stack_usage += current_function_dynamic_stack_size;
     }
 
-  loc = expand_location (DECL_SOURCE_LOCATION (current_function_decl));
-
-  /* Strip the scope prefix if any.  */
-  raw_id = lang_hooks.decl_printable_name (current_function_decl, 2);
-  id = strrchr (raw_id, '.');
-  if (id)
-    id++;
-  else
-    id = raw_id;
+  if (flag_callgraph_info & CALLGRAPH_INFO_STACK_USAGE)
+    {
+      struct cgraph_final_info *cfi
+	= cgraph_final_info (current_function_decl);
+      cfi->stack_usage = stack_usage;
+      cfi->stack_usage_kind = stack_usage_kind;
+    }
 
-  fprintf (stack_usage_file,
-	   "%s:%d:%d:%s\t"HOST_WIDE_INT_PRINT_DEC"\t%s\n",
-	   lbasename (loc.file),
-	   loc.line,
-	   loc.column,
-	   id,
-	   stack_usage,
-	   stack_usage_kind_str[stack_usage_kind]);
+  if (flag_stack_usage)
+    {
+      print_decl_identifier (stack_usage_file, current_function_decl,
+			     PRINT_DECL_ORIGIN | PRINT_DECL_NAME);
+      fprintf (stack_usage_file, "\t"HOST_WIDE_INT_PRINT_DEC"\t%s\n",
+	       stack_usage, stack_usage_qual[stack_usage_kind]);
+    }
 }
 
 /* Open an auxiliary output file.  */
@@ -2234,6 +2232,10 @@  lang_dependent_init (const char *name)
   if (flag_stack_usage)
     stack_usage_file = open_auxiliary_file ("su");
 
+  /* If call graph information is desired, open the output file.  */
+  if (flag_callgraph_info)
+    callgraph_info_file = open_auxiliary_file ("ci");
+
   /* This creates various _DECL nodes, so needs to be called after the
      front end is initialized.  */
   init_eh ();
@@ -2318,6 +2320,12 @@  finalize (void)
   if (stack_usage_file)
     fclose (stack_usage_file);
 
+  if (callgraph_info_file)
+    {
+      dump_cgraph_final_vcg (callgraph_info_file);
+      fclose (callgraph_info_file);
+    }
+
   statistics_fini ();
   finish_optimization_passes ();
 
Index: expr.c
===================================================================
--- expr.c	(revision 165899)
+++ expr.c	(working copy)
@@ -1364,6 +1364,7 @@  emit_block_move_via_libcall (rtx dst, rt
   fn = emit_block_move_libcall_fn (true);
   call_expr = build_call_expr (fn, 3, dst_tree, src_tree, size_tree);
   CALL_EXPR_TAILCALL (call_expr) = tailcall;
+  SET_EXPR_LOCATION (call_expr, input_location);
 
   retval = expand_normal (call_expr);
 
@@ -1374,7 +1375,7 @@  emit_block_move_via_libcall (rtx dst, rt
    for the function we use for block copies.  The first time FOR_CALL
    is true, we call assemble_external.  */
 
-static GTY(()) tree block_move_fn;
+tree block_move_fn;
 
 void
 init_block_move_fn (const char *asmspec)
@@ -2669,6 +2670,7 @@  set_storage_via_libcall (rtx object, rtx
   fn = clear_storage_libcall_fn (true);
   call_expr = build_call_expr (fn, 3, object_tree, val_tree, size_tree);
   CALL_EXPR_TAILCALL (call_expr) = tailcall;
+  SET_EXPR_LOCATION (call_expr, input_location);
 
   retval = expand_normal (call_expr);
 
@@ -10325,5 +10327,3 @@  get_personality_function (tree decl)
 
   return XEXP (DECL_RTL (personality), 0);
 }
-
-#include "gt-expr.h"
Index: opts.c
===================================================================
--- opts.c	(revision 165899)
+++ opts.c	(working copy)
@@ -1882,6 +1882,32 @@  common_handle_option (struct gcc_options
       add_debug_prefix_map (arg);
       break;
 
+    case OPT_fcallgraph_info:
+      flag_callgraph_info = CALLGRAPH_INFO_NAKED;
+      break;
+
+    case OPT_fcallgraph_info_:
+      {
+	char *my_arg, *p;
+	my_arg = xstrdup (arg);
+	p = strtok (my_arg, ",");
+	while (p)
+	  {
+	    if (strcmp (p, "su") == 0)
+	      {
+		flag_callgraph_info |= CALLGRAPH_INFO_STACK_USAGE;
+		flag_stack_usage_info = 1;
+	      }
+	    else if (strcmp (p, "da") == 0)
+	      flag_callgraph_info |= CALLGRAPH_INFO_DYNAMIC_ALLOC;
+	    else
+	      return 0;
+	    p = strtok (NULL, ",");
+	  }
+	free (my_arg);
+      }
+      break;
+
     case OPT_fdiagnostics_show_location_:
       if (!strcmp (arg, "once"))
 	diagnostic_prefixing_rule (global_dc) = DIAGNOSTICS_SHOW_PREFIX_ONCE;
@@ -2109,6 +2135,12 @@  common_handle_option (struct gcc_options
       stack_limit_rtx = gen_rtx_SYMBOL_REF (Pmode, ggc_strdup (arg));
       break;
 
+    case OPT_fstack_usage:
+      flag_stack_usage = value;
+      if (value)
+	flag_stack_usage_info = 1;
+      break;
+
     case OPT_ftree_vectorizer_verbose_:
       vect_set_verbosity_level (arg);
       break;
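The handler for `OPT_fcallgraph_info_` splits its argument on commas and maps each keyword to a flag bit. Below is a minimal standalone sketch of the same parsing. It assumes hypothetical `CI_*` bit values in place of the `CALLGRAPH_INFO_*` constants, which are defined outside this excerpt. `parse_callgraph_info` is also a hypothetical helper, not a GCC function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical bit values standing in for the CALLGRAPH_INFO_* constants,
   whose actual definitions are not part of this excerpt.  */
enum
{
  CI_STACK_USAGE = 1 << 0,
  CI_DYNAMIC_ALLOC = 1 << 1
};

/* Parse a comma-separated -fcallgraph-info= argument such as "su,da",
   OR-ing the matching bits into *FLAGS.  "su" also requests that stack
   usage be computed.  Return 1 on success, 0 on an unknown keyword.  */
static int
parse_callgraph_info (const char *arg, int *flags, int *stack_usage_info)
{
  size_t len = strlen (arg) + 1;
  char *copy = malloc (len);   /* strtok writes into its argument */
  char *p;
  int ok = 1;

  if (copy == NULL)
    return 0;
  memcpy (copy, arg, len);

  for (p = strtok (copy, ","); p != NULL; p = strtok (NULL, ","))
    {
      if (strcmp (p, "su") == 0)
        {
          *flags |= CI_STACK_USAGE;
          *stack_usage_info = 1;
        }
      else if (strcmp (p, "da") == 0)
        *flags |= CI_DYNAMIC_ALLOC;
      else
        {
          ok = 0;
          break;
        }
    }

  /* Release the scratch copy on the error path as well.  */
  free (copy);
  return ok;
}
```

The copy is needed because `strtok` overwrites the separators in place. The sketch frees it before returning on both the success path and the error path.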
Index: ada/gcc-interface/misc.c
===================================================================
--- ada/gcc-interface/misc.c	(revision 165899)
+++ ada/gcc-interface/misc.c	(working copy)
@@ -56,9 +56,6 @@ 
 #include "ada-tree.h"
 #include "gigi.h"
 
-/* This symbol needs to be defined for the front-end.  */
-void *callgraph_info_file = NULL;
-
 /* Command-line argc and argv.  These variables are global since they are
    imported in back_end.adb.  */
 unsigned int save_argc;
Index: function.c
===================================================================
--- function.c	(revision 165899)
+++ function.c	(working copy)
@@ -1907,7 +1907,7 @@  instantiate_virtual_regs (void)
 
   /* See allocate_dynamic_stack_space for the rationale.  */
 #ifdef SETJMP_VIA_SAVE_AREA
-  if (flag_stack_usage && cfun->calls_setjmp)
+  if (flag_stack_usage_info && cfun->calls_setjmp)
     {
       int align = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;
       dynamic_offset = (dynamic_offset + align - 1) / align * align;
@@ -4384,7 +4384,7 @@  prepare_function_start (void)
   init_expr ();
   default_rtl_profile ();
 
-  if (flag_stack_usage)
+  if (flag_stack_usage_info)
     {
       cfun->su = ggc_alloc_cleared_stack_usage ();
       cfun->su->static_stack_size = -1;
@@ -5813,7 +5813,7 @@  rest_of_handle_thread_prologue_and_epilo
   thread_prologue_and_epilogue_insns ();
 
   /* The stack usage info is finalized during prologue expansion.  */
-  if (flag_stack_usage)
+  if (flag_stack_usage_info)
     output_stack_usage ();
 
   return 0;
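Both `instantiate_virtual_regs` (in the `SETJMP_VIA_SAVE_AREA` case) and `allocate_dynamic_stack_space` round a byte count up to the preferred stack alignment with the expression `(x + align - 1) / align * align`. The standalone illustration below uses hypothetical helper names, and 16 bytes is only an example alignment.

```c
/* Round SIZE up to a multiple of ALIGN (ALIGN > 0, SIZE >= 0) with the
   division-based idiom used in the patch; it works for any ALIGN.  */
static long
round_up (long size, long align)
{
  return (size + align - 1) / align * align;
}

/* Mask-based equivalent, valid only when ALIGN is a power of two,
   which is normally the case for stack boundaries.  */
static long
round_up_pow2 (long size, long align)
{
  return (size + align - 1) & ~(align - 1);
}
```

With a 16-byte boundary, a 17-byte request is therefore accounted as 32 bytes. The division form does not depend on the alignment being a power of two.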
Index: gimplify.c
===================================================================
--- gimplify.c	(revision 165899)
+++ gimplify.c	(working copy)
@@ -1339,6 +1339,9 @@  gimplify_vla_decl (tree decl, gimple_seq
   /* Indicate that we need to restore the stack level when the
      enclosing BIND_EXPR is exited.  */
   gimplify_ctxp->save_stack = true;
+
+  if (flag_callgraph_info & CALLGRAPH_INFO_DYNAMIC_ALLOC)
+    cgraph_final_record_dynamic_alloc (current_function_decl, decl);
 }
 
 
Index: calls.c
===================================================================
--- calls.c	(revision 165899)
+++ calls.c	(working copy)
@@ -2388,6 +2388,10 @@  expand_call (tree exp, rtx target, int i
 
   preferred_unit_stack_boundary = preferred_stack_boundary / BITS_PER_UNIT;
 
+  if (flag_callgraph_info)
+    cgraph_final_record_call (current_function_decl, fndecl,
+			      EXPR_LOCATION (exp));
+
   /* We want to make two insn chains; one for a sibling call, the other
      for a normal call.  We will select one of the two chains after
      initial RTL generation is complete.  */
@@ -2499,7 +2503,7 @@  expand_call (tree exp, rtx target, int i
 	      stack_arg_under_construction = 0;
 	    }
 	  argblock = push_block (ARGS_SIZE_RTX (adjusted_args_size), 0, 0);
-	  if (flag_stack_usage)
+	  if (flag_stack_usage_info)
 	    current_function_has_unbounded_dynamic_stack_size = 1;
 	}
       else
@@ -2709,7 +2713,7 @@  expand_call (tree exp, rtx target, int i
       /* Record the maximum pushed stack space size.  We need to delay
 	 doing it this far to take into account the optimization done
 	 by combine_pending_stack_adjustment_and_call.  */
-      if (flag_stack_usage
+      if (flag_stack_usage_info
 	  && !ACCUMULATE_OUTGOING_ARGS
 	  && pass
 	  && adjusted_args_size.var == 0)
@@ -3576,7 +3580,7 @@  emit_library_call_value_1 (int retval, r
   if (args_size.constant > crtl->outgoing_args_size)
     crtl->outgoing_args_size = args_size.constant;
 
-  if (flag_stack_usage && !ACCUMULATE_OUTGOING_ARGS)
+  if (flag_stack_usage_info && !ACCUMULATE_OUTGOING_ARGS)
     {
       int pushed = args_size.constant + pending_stack_adjust;
       if (pushed > current_function_pushed_stack_size)
@@ -3856,6 +3860,10 @@  emit_library_call_value_1 (int retval, r
 
   before_call = get_last_insn ();
 
+  if (flag_callgraph_info)
+    cgraph_final_record_call (current_function_decl,
+			      SYMBOL_REF_DECL (orgfun), input_location);
+
   /* We pass the old value of inhibit_defer_pop + 1 to emit_call_1, which
      will set inhibit_defer_pop to that value.  */
   /* The return type is needed to decide how many bytes the function pops.
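The calls recorded by `cgraph_final_record_call` end up in the `.ci` file written by `dump_cgraph_final_vcg`, whose body is not part of this excerpt. The sketch below is a rough, hypothetical illustration of the VCG format such a dump targets. It writes a list of caller/callee pairs as a VCG graph, with one `node:` per distinct function and one `edge:` per call. `struct call_edge`, `seen_before` and `dump_vcg` are invented for the example. The sketch does not reproduce the patch's node decorations (stack usage, dynamic allocations).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record of one call, identified by caller and callee names.
   Names are assumed to contain no double quotes, since those would
   terminate a VCG string.  */
struct call_edge
{
  const char *caller;
  const char *callee;
};

/* Return nonzero if NAME appears as a caller or callee in EDGES[0..N-1].  */
static int
seen_before (const struct call_edge *edges, size_t n, const char *name)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (strcmp (edges[i].caller, name) == 0
        || strcmp (edges[i].callee, name) == 0)
      return 1;
  return 0;
}

/* Write EDGES[0..N-1] to F as a VCG graph titled TITLE: one node per
   distinct function (in order of first appearance), one edge per call.  */
static void
dump_vcg (FILE *f, const char *title, const struct call_edge *edges, size_t n)
{
  size_t i;

  fprintf (f, "graph: { title: \"%s\"\n", title);
  for (i = 0; i < n; i++)
    {
      if (!seen_before (edges, i, edges[i].caller))
        fprintf (f, "node: { title: \"%s\" }\n", edges[i].caller);
      if (strcmp (edges[i].callee, edges[i].caller) != 0
          && !seen_before (edges, i, edges[i].callee))
        fprintf (f, "node: { title: \"%s\" }\n", edges[i].callee);
    }
  for (i = 0; i < n; i++)
    fprintf (f, "edge: { sourcename: \"%s\" targetname: \"%s\" }\n",
             edges[i].caller, edges[i].callee);
  fputs ("}\n", f);
}
```

Each `edge:` names its endpoints by the `title:` of a previously declared node, so every function must be emitted once as a node before any edge refers to it.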
Index: explow.c
===================================================================
--- explow.c	(revision 165899)
+++ explow.c	(working copy)
@@ -1163,7 +1163,7 @@  allocate_dynamic_stack_space (rtx size,
   /* If stack usage info is requested, look into the size we are passed.
      We need to do so this early to avoid the obfuscation that may be
      introduced later by the various alignment operations.  */
-  if (flag_stack_usage)
+  if (flag_stack_usage_info)
     {
       if (CONST_INT_P (size))
 	stack_usage_size = INTVAL (size);
@@ -1251,7 +1251,7 @@  allocate_dynamic_stack_space (rtx size,
       size = plus_constant (size, extra);
       size = force_operand (size, NULL_RTX);
 
-      if (flag_stack_usage)
+      if (flag_stack_usage_info)
 	stack_usage_size += extra;
 
       if (extra && size_align > extra_align)
@@ -1282,7 +1282,7 @@  allocate_dynamic_stack_space (rtx size,
       /* The above dynamic offset cannot be computed statically at this
 	 point, but it will be possible to do so after RTL expansion is
 	 done.  Record how many times we will need to add it.  */
-      if (flag_stack_usage)
+      if (flag_stack_usage_info)
 	current_function_dynamic_alloc_count++;
 
       /* ??? Can we infer a minimum of STACK_BOUNDARY here?  */
@@ -1307,7 +1307,7 @@  allocate_dynamic_stack_space (rtx size,
     {
       size = round_push (size);
 
-      if (flag_stack_usage)
+      if (flag_stack_usage_info)
 	{
 	  int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
 	  stack_usage_size = (stack_usage_size + align - 1) / align * align;
@@ -1318,7 +1318,7 @@  allocate_dynamic_stack_space (rtx size,
 
   /* The size is supposed to be fully adjusted at this point so record it
      if stack usage info is requested.  */
-  if (flag_stack_usage)
+  if (flag_stack_usage_info)
     {
       current_function_dynamic_stack_size += stack_usage_size;
 
Index: print-tree.c
===================================================================
--- print-tree.c	(revision 165899)
+++ print-tree.c	(working copy)
@@ -26,6 +26,7 @@  along with GCC; see the file COPYING3.
 #include "tm.h"
 #include "tree.h"
 #include "ggc.h"
+#include "toplev.h"
 #include "langhooks.h"
 #include "tree-iterator.h"
 #include "diagnostic.h"
@@ -995,6 +996,72 @@  print_node (FILE *file, const char *pref
   fprintf (file, ">");
 }
 
+/* Print the identifier for DECL according to FLAGS.  */
+
+void
+print_decl_identifier (FILE *file, tree decl, int flags)
+{
+  bool needs_colon = false;
+  const char *name;
+  char *malloced_name = NULL;
+  char c;
+
+  if (flags & PRINT_DECL_ORIGIN)
+    {
+      if (DECL_IS_BUILTIN (decl))
+	fputs ("<built-in>", file);
+      else
+	{
+	  expanded_location loc
+	    = expand_location (DECL_SOURCE_LOCATION (decl));
+	  fprintf (file, "%s:%d:%d", loc.file, loc.line, loc.column);
+	}
+      needs_colon = true;
+    }
+
+  if (flags & PRINT_DECL_UNIQUE_NAME)
+    {
+      name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+      if (!TREE_PUBLIC (decl)
+	  || (DECL_WEAK (decl) && !DECL_EXTERNAL (decl)))
+        /* The symbol has internal or weak linkage so its assembler name
+	   is not necessarily unique among the compilation units of the
+	   program.  We therefore have to further mangle it.  But we can't
+	   simply use DECL_SOURCE_FILE because it contains the name of the
+	   file the symbol originates from so, e.g. for function templates
+	   in C++ where the templates are defined in a header file, we can
+	   have symbols with the same assembler name and DECL_SOURCE_FILE.
+	   That's why we use the name of the top-level source file of the
+	   compilation unit.  */
+	name = malloced_name = concat (main_input_filename, ":", name, NULL);
+    }
+  else if (flags & PRINT_DECL_NAME)
+    {
+      const char *dot;
+
+      name = lang_hooks.decl_printable_name (decl, 2);
+      dot = strrchr (name, '.');
+      if (dot)
+	name = dot + 1;
+    }
+  else
+    return;
+
+  if (needs_colon)
+    fputc (':', file);
+
+  while ((c = *name++) != '\0')
+    {
+      /* Strip double-quotes because of VCG.  */
+      if (c == '"')
+	continue;
+      fputc (c, file);
+    }
+
+  if (malloced_name)
+    free (malloced_name);
+}
+
 /* Print the tree vector VEC in full on file FILE, preceded by PREFIX,
    starting in column INDENT.  */
 
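`output_stack_usage` calls `print_decl_identifier` with `PRINT_DECL_ORIGIN | PRINT_DECL_NAME`. With those flags, it writes the source location, a colon, and the printable name. It removes any scope prefix up to the last `.` and drops double quotes, which would break VCG strings. The standalone sketch below shows that formatting. It uses a hypothetical `struct decl_info` in place of a tree DECL and a hypothetical `print_origin_and_name` helper. It does not cover the `PRINT_DECL_UNIQUE_NAME` path.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the DECL fields read by print_decl_identifier.  */
struct decl_info
{
  const char *file;
  int line;
  int column;
  const char *printable_name;   /* possibly scope-qualified, e.g. "pkg.proc" */
};

/* Write "FILE:LINE:COLUMN:NAME" to OUT, keeping only the part of the name
   after the last '.' and dropping double quotes.  */
static void
print_origin_and_name (FILE *out, const struct decl_info *d)
{
  const char *name = d->printable_name;
  const char *dot = strrchr (name, '.');

  if (dot != NULL)
    name = dot + 1;

  fprintf (out, "%s:%d:%d:", d->file, d->line, d->column);
  for (; *name != '\0'; name++)
    if (*name != '"')
      fputc (*name, out);
}
```

In the `.su` output, this prefix is followed by a tab, the stack usage figure, another tab and the qualifier string.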
Index: common.opt
===================================================================
--- common.opt	(revision 165899)
+++ common.opt	(working copy)
@@ -670,6 +670,14 @@  fcombine-stack-adjustments
 Common Report Var(flag_combine_stack_adjustments) Optimization
 Looks for opportunities to reduce stack adjustments and stack references.
 
+fcallgraph-info
+Common RejectNegative
+Output callgraph information on a per-file basis
+
+fcallgraph-info=
+Common RejectNegative Joined
+Output callgraph information on a per-file basis with decorations
+
 fcommon
 Common Report Var(flag_no_common,0) Optimization
 Do not put uninitialized globals in the common section
Index: output.h
===================================================================
--- output.h	(revision 165899)
+++ output.h	(working copy)
@@ -640,7 +640,9 @@  extern int maybe_assemble_visibility (tr
 
 extern int default_address_cost (rtx, bool);
 
-/* Output stack usage information.  */
+/* Stack usage.  */
+enum { SU_STATIC = 0, SU_DYNAMIC, SU_DYNAMIC_BOUNDED };
+extern const char *stack_usage_qual[];
 extern void output_stack_usage (void);
 
 /* dbxout helper functions */
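The `SU_*` values and the `stack_usage_qual` table declared here are what `output_stack_usage` uses to qualify each function's figure. The decision logic can be sketched in isolation as follows. `struct su_input` and `classify_stack_usage` are hypothetical stand-ins for the `current_function_*` accessors, not GCC interfaces.

```c
/* Same enumeration and strings as the patch.  */
enum { SU_STATIC = 0, SU_DYNAMIC, SU_DYNAMIC_BOUNDED };
const char *stack_usage_qual[] = { "static", "dynamic", "dynamic,bounded" };

/* Hypothetical per-function inputs, all sizes in bytes.  */
struct su_input
{
  long static_size;        /* frame size set by the target prologue */
  long pushed_size;        /* maximum space pushed for outgoing arguments */
  int allocates_dynamic;   /* the function allocates stack dynamically */
  int unbounded_dynamic;   /* the dynamic size has no static bound */
  long dynamic_size;       /* accumulated size of the dynamic allocations */
};

/* Return the total stack usage and store its qualifier in *KIND,
   checking the inputs in the same order as output_stack_usage.  */
static long
classify_stack_usage (const struct su_input *in, int *kind)
{
  long usage = in->static_size;

  *kind = SU_STATIC;

  if (in->pushed_size > 0)
    {
      usage += in->pushed_size;
      *kind = SU_DYNAMIC_BOUNDED;
    }

  if (in->allocates_dynamic)
    {
      *kind = in->unbounded_dynamic ? SU_DYNAMIC : SU_DYNAMIC_BOUNDED;
      /* Added even in the unbounded case, as the patch does.  */
      usage += in->dynamic_size;
    }

  return usage;
}
```

Take a 48-byte frame with 16 bytes pushed and a bounded 32-byte dynamic allocation. The sketch yields 96 bytes qualified as `dynamic,bounded`. If the dynamic part is unbounded, the same total is reported as `dynamic`.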
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 165899)
+++ Makefile.in	(working copy)
@@ -2347,8 +2347,8 @@  tree-inline.o : tree-inline.c $(CONFIG_H
    debug.h $(DIAGNOSTIC_H) $(EXCEPT_H) $(TREE_FLOW_H) tree-iterator.h tree-mudflap.h \
    $(IPA_PROP_H) value-prof.h $(TREE_PASS_H) $(TARGET_H) $(INTEGRATE_H) \
    tree-pretty-print.h
-print-tree.o : print-tree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
-   $(GGC_H) langhooks.h tree-iterator.h \
+print-tree.o : print-tree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
+   $(TREE_H) $(TOPLEV_H) $(GGC_H) langhooks.h tree-iterator.h \
    $(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_PASS_H) gimple-pretty-print.h
 stor-layout.o : stor-layout.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
    $(TREE_H) $(PARAMS_H) $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) output.h $(RTL_H) \
@@ -2917,7 +2917,7 @@  expr.o : expr.c $(CONFIG_H) $(SYSTEM_H)
    $(LIBFUNCS_H) $(INSN_ATTR_H) insn-config.h $(RECOG_H) output.h \
    typeclass.h hard-reg-set.h $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) hard-reg-set.h $(EXCEPT_H) \
    reload.h langhooks.h intl.h $(TM_P_H) $(TARGET_H) \
-   tree-iterator.h gt-expr.h $(MACHMODE_H) $(TIMEVAR_H) $(TREE_FLOW_H) \
+   tree-iterator.h $(MACHMODE_H) $(TIMEVAR_H) $(TREE_FLOW_H) \
    $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H)
 dojump.o : dojump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
    $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) $(OPTABS_H) $(INSN_ATTR_H) insn-config.h \
Index: config/alpha/alpha.c
===================================================================
--- config/alpha/alpha.c	(revision 165899)
+++ config/alpha/alpha.c	(working copy)
@@ -7805,7 +7805,7 @@  alpha_expand_prologue (void)
   sa_size = alpha_sa_size ();
   frame_size = compute_frame_size (get_frame_size (), sa_size);
 
-  if (flag_stack_usage)
+  if (flag_stack_usage_info)
     current_function_static_stack_size = frame_size;
 
   if (TARGET_ABI_OPEN_VMS)
Index: config/sparc/sparc.c
===================================================================
--- config/sparc/sparc.c	(revision 165899)
+++ config/sparc/sparc.c	(working copy)
@@ -4438,7 +4438,7 @@  sparc_expand_prologue (void)
   /* Advertise that the data calculated just above are now valid.  */
   sparc_prologue_data_valid_p = true;
 
-  if (flag_stack_usage)
+  if (flag_stack_usage_info)
     current_function_static_stack_size = actual_fsize;
 
   if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK && actual_fsize)
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 165899)
+++ config/i386/i386.c	(working copy)
@@ -9696,7 +9696,7 @@  ix86_expand_prologue (void)
 
   allocate = frame.stack_pointer_offset - m->fs.sp_offset;
 
-  if (flag_stack_usage)
+  if (flag_stack_usage_info)
     {
       /* We start to count from ARG_POINTER.  */
       HOST_WIDE_INT stack_size = frame.stack_pointer_offset;
Index: config/sh/sh.c
===================================================================
--- config/sh/sh.c	(revision 165899)
+++ config/sh/sh.c	(working copy)
@@ -7207,7 +7207,7 @@  sh_expand_prologue (void)
       emit_insn (gen_shcompact_incoming_args ());
     }
 
-  if (flag_stack_usage)
+  if (flag_stack_usage_info)
     current_function_static_stack_size = stack_usage;
 }
 
Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c	(revision 165899)
+++ config/avr/avr.c	(working copy)
@@ -766,7 +766,7 @@  expand_prologue (void)
         }
     }
 
-  if (flag_stack_usage)
+  if (flag_stack_usage_info)
     current_function_static_stack_size = cfun->machine->stack_usage;
 }
 
Index: config/m68hc11/m68hc11.c
===================================================================
--- config/m68hc11/m68hc11.c	(revision 165899)
+++ config/m68hc11/m68hc11.c	(working copy)
@@ -5145,9 +5145,7 @@  m68hc11_reorg (void)
 static void
 m68hc11_init_libfuncs (void)
 {
-  memcpy_libfunc = init_one_libfunc ("__memcpy");
   memcmp_libfunc = init_one_libfunc ("__memcmp");
-  memset_libfunc = init_one_libfunc ("__memset");
 }
 
 
Index: config/vms/vms-crtl-64.h
===================================================================
--- config/vms/vms-crtl-64.h	(revision 165899)
+++ config/vms/vms-crtl-64.h	(working copy)
@@ -189,7 +189,5 @@  along with GCC; see the file COPYING3.
 #undef MEM_LIBFUNCS_INIT
 #define MEM_LIBFUNCS_INIT                                 \
 do {                                                      \
-  memcpy_libfunc = init_one_libfunc ("decc$_memcpy64");   \
   memmove_libfunc = init_one_libfunc ("decc$_memmove64"); \
-  memset_libfunc = init_one_libfunc ("decc$_memset64");   \
 } while (0)
Index: config/vms/vms-crtl.h
===================================================================
--- config/vms/vms-crtl.h	(revision 165899)
+++ config/vms/vms-crtl.h	(working copy)
@@ -185,7 +185,5 @@  along with GCC; see the file COPYING3.
 
 #define MEM_LIBFUNCS_INIT                              \
 do {                                                   \
-  memcpy_libfunc = init_one_libfunc ("decc$memcpy");   \
   memmove_libfunc = init_one_libfunc ("decc$memmove"); \
-  memset_libfunc = init_one_libfunc ("decc$memset");   \
 } while (0)
Index: config/ia64/ia64.c
===================================================================
--- config/ia64/ia64.c	(revision 165899)
+++ config/ia64/ia64.c	(working copy)
@@ -3140,7 +3140,7 @@  ia64_expand_prologue (void)
   ia64_compute_frame_size (get_frame_size ());
   last_scratch_gr_reg = 15;
 
-  if (flag_stack_usage)
+  if (flag_stack_usage_info)
     current_function_static_stack_size = current_frame_info.total_size;
 
   if (dump_file) 
Index: config/rs6000/rs6000.c
===================================================================
--- config/rs6000/rs6000.c	(revision 165899)
+++ config/rs6000/rs6000.c	(working copy)
@@ -19836,7 +19836,7 @@  rs6000_emit_prologue (void)
 			      && call_used_regs[STATIC_CHAIN_REGNUM]);
   HOST_WIDE_INT sp_offset = 0;
 
-  if (flag_stack_usage)
+  if (flag_stack_usage_info)
     current_function_static_stack_size = info->total_size;
 
   if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK && info->total_size)
Index: config/picochip/picochip.c
===================================================================
--- config/picochip/picochip.c	(revision 165899)
+++ config/picochip/picochip.c	(working copy)
@@ -60,7 +60,6 @@  along with GCC; see the file COPYING3.
 #include "optabs.h"		/* For GEN_FCN */
 #include "basic-block.h"	/* UPDATE_LIFE_GLOBAL* for picochip_reorg. */
 #include "timevar.h"		/* For TV_SCHED2, in picochip_reorg. */
-#include "libfuncs.h"		/* For memcpy_libfuncs, etc. */
 #include "df.h"			/* For df_regs_ever_live_df_regs_ever_live_pp, etc. */
 
 
Index: config/pa/pa.c
===================================================================
--- config/pa/pa.c	(revision 165899)
+++ config/pa/pa.c	(working copy)
@@ -3757,7 +3757,7 @@  hppa_expand_prologue (void)
     local_fsize += STARTING_FRAME_OFFSET;
 
   actual_fsize = compute_frame_size (size, &save_fregs);
-  if (flag_stack_usage)
+  if (flag_stack_usage_info)
     current_function_static_stack_size = actual_fsize;
 
   /* Compute a few things we will use often.  */
Index: config/mips/mips.c
===================================================================
--- config/mips/mips.c	(revision 165899)
+++ config/mips/mips.c	(working copy)
@@ -10082,7 +10082,7 @@  mips_expand_prologue (void)
   frame = &cfun->machine->frame;
   size = frame->total_size;
 
-  if (flag_stack_usage)
+  if (flag_stack_usage_info)
     current_function_static_stack_size = size;
 
   /* Save the registers.  Allocate up to MIPS_MAX_FIRST_STACK_STEP