Patchwork add PMU profiling support (issue4638047)

Submitter Sharad Singhai
Date June 19, 2011, 7:29 a.m.
Message ID <20110619072929.6DBDF15C19F@nabu.mtv.corp.google.com>
Permalink /patch/100944/
State New

Comments

Sharad Singhai - June 19, 2011, 7:29 a.m.
This patch is a work in progress. It adds support for collecting PMU
profiles via an external tool (currently pfmon) and writes the samples
collected by the PMU tool into a single gcda file.  The patch is
x86-specific, since it collects hardware performance counter data, and
it currently supports only predefined events.  The PMU profile
information can be displayed with the gcov tool.
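On the gcov side described above, each sample has to be attached to a
source line: entries from the single PMU gcda file are canonicalized,
matched against the name gcov uses for a source file, and ordered by
line number (the patch does this in gcov_canonical_filename and
filter_pmu_data_lines).  The standalone sketch below illustrates that
flow under simplifying assumptions: struct pmu_sample, canonical_name,
compare_by_line and filter_samples are hypothetical names, only a
filename/line/count record is modeled, and qsort stands in for the
bubble sort used in the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the patch's per-sample records
   (gcov_pmu_ll_info_t and gcov_pmu_brm_info_t carry more fields).  */
struct pmu_sample
{
  const char *filename;   /* source file reported by the symbolizer */
  unsigned line;          /* source line of the sampled address */
  unsigned counts;        /* number of samples at this address */
};

/* Skip a leading "/proc/self/cwd/./" or "/proc/self/cwd/" so that
   symbolizer paths compare equal to gcov's source names.  Returns a
   pointer into NAME; unlike the patch, NAME is not modified.  */
static const char *
canonical_name (const char *name)
{
  static const char *const prefixes[] = { "/proc/self/cwd/./",
                                          "/proc/self/cwd/" };
  size_t i;

  /* Try the longer prefix first, as the patch does.  */
  for (i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++)
    {
      size_t len = strlen (prefixes[i]);
      if (strncmp (name, prefixes[i], len) == 0)
        return name + len;
    }
  return name;
}

/* qsort comparator: order sample pointers by ascending line number.  */
static int
compare_by_line (const void *a, const void *b)
{
  const struct pmu_sample *x = *(const struct pmu_sample *const *) a;
  const struct pmu_sample *y = *(const struct pmu_sample *const *) b;

  return (x->line > y->line) - (x->line < y->line);
}

/* Store into OUT (capacity at least N) pointers to those of the N
   samples in ALL that belong to SRC_NAME, sorted by line number.
   Returns the number of matching samples.  */
static size_t
filter_samples (const struct pmu_sample *all, size_t n,
                const char *src_name, const struct pmu_sample **out)
{
  size_t i, count = 0;

  for (i = 0; i < n; i++)
    if (strcmp (canonical_name (all[i].filename), src_name) == 0)
      out[count++] = &all[i];
  qsort (out, count, sizeof *out, compare_by_line);
  return count;
}
```

Because the result is in ascending line order, an output pass can walk
it alongside the source lines, as output_pmu_data does.  Note that qsort
is not stable, so samples sharing a line may come out in any order; the
bubble sort in the patch keeps them in file order.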

The optimization phases may later use the PMU profile data to make
better decisions during the profile-use phase.  That part is yet to be
developed.

Okay for google/main?

Sharad


2011-06-18   Sharad Singhai  <singhai@google.com>

	* libgcc/Makefile.in: Add pmu-profile.c.
	* gcc/doc/invoke.texi: Document new pmu profile related options.
	* gcc/doc/gcov.texi: Document new options -m and -q.
	* gcc/gcc.c: Link libgcov for -fpmu-profile-generate option.
	* gcc/gcov.c (filter_pmu_data_lines): New function.
	(output_pmu_data_header): ditto.
	(output_pmu_data): ditto.
	(output_load_latency_line): ditto.
	(output_branch_mispredict_line): ditto.
	(process_pmu_profile): ditto.
	* gcc/gcov-io.c (gcov_canonical_filename): New function.
	(gcov_read_pmu_load_latency_info): ditto.
	(gcov_read_pmu_branch_mispredict_info): ditto.
	(gcov_read_pmu_tool_header): ditto.
	(gcov_string_length): ditto.
	(convert_unsigned_to_pct): ditto.
	(print_load_latency_line): ditto.
	(print_branch_mispredict_line): ditto.
	(print_pmu_tool_header): ditto.
	(destroy_pmu_tool_header): ditto.
	(gcov_read_string): Make it available unconditionally.
	* gcc/gcov-io.h (struct gcov_pmu_info): New structure.
	* gcc/opts.c: New option -fpmu-profile-generate.
	* gcc/pmu-profile.c (enum pmu_tool_type): New enum.
	(enum pmu_event_type): ditto.
	(enum pmu_state): ditto.
	(enum cpu_vendor_signature): ditto.
	(struct pmu_tool_info): New structure.
	(get_x86cpu_vendor): New function.
	(parse_pmu_profile_options): ditto.
	(start_addr2line_symbolizer): ditto.
	(reset_symbolizer_parent_pipes): ditto.
	(reset_symbolizer_child_pipes): ditto.
	(end_addr2line_symbolizer): ditto.
	(symbolize_addr2line): ditto.
	(start_pfmon_module): ditto.
	(convert_pct_to_unsigned): ditto.
	(parse_load_latency_line): ditto.
	(parse_branch_mispredict_line): ditto.
	(destroy_load_latency_infos): ditto.
	(destroy_branch_mispredict_infos): ditto.
	(parse_pfmon_load_latency): ditto.
	(parse_pfmon_tool_header): ditto.
	(parse_pfmon_branch_mispredicts): ditto.
	(pmu_start): ditto.
	(init_pmu_load_latency): ditto.
	(init_pmu_branch_mispredict): ditto.
	(init_pmu_tool): ditto.
	(__gcov_init_pmu_profiler): ditto.
	(__gcov_start_pmu_profiler): ditto.
	(__gcov_stop_pmu_profiler): ditto.
	(gcov_write_ll_line): ditto.
	(gcov_write_branch_mispredict_line): ditto.
	(gcov_write_load_latency_infos): ditto.
	(gcov_write_branch_mispredict_infos): ditto.
	(gcov_tag_pmu_tool_header_length): ditto.
	(gcov_write_tool_header): ditto.
	(__gcov_end_pmu_profiler): ditto.
	* gcc/coverage.c (get_const_string_type): New function.
	(create_coverage): Do the coverage processing even if only
	flag_pmu_profile_generate is specified.
	(coverage_init): Call gimple_init_instrumentation_sampling from here
	instead of from tree-profile.c:gimple_init_edge_profiler.
	(get_da_file_name): Make extern.
	(profiling_enabled_p): New function.
	(init_pmu_profiling): ditto.
	(check_pmu_profile_options): ditto.
	* gcc/coverage.h (get_da_file_name): Make it extern.
	* gcc/common.opt: Add new options -fpmu-profile-generate and
	-fpmu-profile-use.
	* gcc/tree-profile.c (gimple_init_instrumentation_sampling): Make
	extern. Move the call from gimple_init_edge_profiler to
	coverage.c:coverage_init.
	* gcc/libgcov.c (gcov_alloc_filename): Move earlier in file.
	(pmu_profile_stop): New function.
	(gcov_dump_module_info): Replace gcov_strip_leading_dirs by a macro.
	(__gcov_init): Add initialization of PMU profiler.
	(gcov_exit): Add finalization of PMU profiler.
	(gcov_get_filename): Clean up whitespace.
	* gcc/params.def: New parameter pmu_profile_n_addresses.
	* gcc/gcov-dump.c (tag_pmu_load_latency_info): New function.
	(tag_pmu_branch_mispredict_info): ditto.
	(tag_pmu_tool_header): ditto.


--
This patch is available for review at http://codereview.appspot.com/4638047

Patch

Index: libgcc/Makefile.in
===================================================================
--- libgcc/Makefile.in	(revision 175188)
+++ libgcc/Makefile.in	(working copy)
@@ -747,10 +747,13 @@ 
 dyn-ipa.o: %$(objext): $(gcc_srcdir)/libgcov.c
 	$(gcc_compile)  -c $(gcc_srcdir)/dyn-ipa.c
 
+pmu-profile.o: %$(objext): $(gcc_srcdir)/pmu-profile.c
+	$(gcc_compile)  -c $(gcc_srcdir)/pmu-profile.c
 
+
 # Static libraries.
 libgcc.a: $(libgcc-objects)
-libgcov.a: $(libgcov-objects) dyn-ipa$(objext)
+libgcov.a: $(libgcov-objects) dyn-ipa$(objext) pmu-profile$(objext)
 libunwind.a: $(libunwind-objects)
 libgcc_eh.a: $(libgcc-eh-objects)
 
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 175188)
+++ gcc/doc/invoke.texi	(working copy)
@@ -388,6 +388,8 @@ 
 -fprofile-correction -fprofile-dir=@var{path} -fprofile-generate @gol
 -fprofile-generate=@var{path} -fprofile-generate-sampling @gol
 -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
+-fpmu-profile-generate=@var{pmuoption} @gol
+-fpmu-profile-use=@var{pmuoption} @gol
 -freciprocal-math -fregmove -frename-registers -freorder-blocks @gol
 -freorder-blocks-and-partition -freorder-functions @gol
 -frerun-cse-after-loop -freschedule-modulo-scheduled-loops @gol
@@ -8088,6 +8090,26 @@ 
 If @var{path} is specified, GCC will look at the @var{path} to find
 the profile feedback data files. See @option{-fprofile-dir}.
 
+@item -fpmu-profile-generate=@var{pmuoption}
+@opindex fpmu-profile-generate
+
+Enable performance monitoring unit (PMU) profiling.  This collects
+hardware counter data corresponding to @var{pmuoption}.  Currently
+only @samp{load-latency} and @samp{branch-mispredict} are supported,
+using the @command{pfmon} tool.  You must use
+@option{-fpmu-profile-generate} both when compiling and when linking
+your program.  The PMU profile data may later be used by the compiler
+during optimization, and it can be displayed with the coverage tool
+@command{gcov}.  The parameter @samp{pmu_profile_n_addresses} can be
+set to limit the number of addresses for which PMU data is collected.
+
+@item -fpmu-profile-use=@var{pmuoption}
+@opindex fpmu-profile-use
+
+Enable performance monitoring unit (PMU) profiling based
+optimizations.  Currently only @samp{load-latency} and
+@samp{branch-mispredict} are supported.
+
 @item -fripa
 @opindex fripa
 Perform dynamic inter-procedural analysis. This is used in conjunction with
Index: gcc/doc/gcov.texi
===================================================================
--- gcc/doc/gcov.texi	(revision 175188)
+++ gcc/doc/gcov.texi	(working copy)
@@ -124,9 +124,11 @@ 
      [@option{-a}|@option{--all-blocks}]
      [@option{-b}|@option{--branch-probabilities}]
      [@option{-c}|@option{--branch-counts}]
+     [@option{-m}|@option{--pmu-profile}]
      [@option{-n}|@option{--no-output}]
      [@option{-l}|@option{--long-file-names}]
      [@option{-p}|@option{--preserve-paths}]
+     [@option{-q}|@option{--pmu_profile-path} @var{path}]
      [@option{-f}|@option{--function-summaries}]
      [@option{-o}|@option{--object-directory} @var{directory|file}] @var{sourcefiles}
      [@option{-u}|@option{--unconditional-branches}]
@@ -169,6 +171,14 @@ 
 Write branch frequencies as the number of branches taken, rather than
 the percentage of branches taken.
 
+@item -m
+@itemx --pmu-profile
+Output the additional PMU profile information if available.
+
+@item -q @var{path}
+@itemx --pmu_profile-path @var{path}
+Read the PMU profile data from @var{path} (default @file{pmuprofile.gcda}).
+
 @item -n
 @itemx --no-output
 Do not create the @command{gcov} output file.
Index: gcc/gcc.c
===================================================================
--- gcc/gcc.c	(revision 175188)
+++ gcc/gcc.c	(working copy)
@@ -662,7 +662,7 @@ 
     %{static:} %{L*} %(mfwrap) %(link_libgcc) %o\
     %{fopenmp|ftree-parallelize-loops=*:%:include(libgomp.spec)%(link_gomp)}\
     %(mflib) " STACK_SPLIT_SPEC "\
-    %{fprofile-arcs|fprofile-generate*|coverage:-lgcov}\
+    %{fprofile-arcs|fprofile-generate*|fpmu-profile-generate*|coverage:-lgcov}\
     %{!nostdlib:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}\
     %{!nostdlib:%{!nostartfiles:%E}} %{T*} }}}}}}"
 #endif
Index: gcc/gcov.c
===================================================================
--- gcc/gcov.c	(revision 175188)
+++ gcc/gcov.c	(working copy)
@@ -209,6 +209,15 @@ 
   char *name;
 } coverage_t;
 
+/* Describes PMU profile data for either one source file or for the
+   entire program.  */
+
+typedef struct pmu_data
+{
+  ll_infos_t ll_infos;
+  brm_infos_t brm_infos;
+} pmu_data_t;
+
 /* Describes a single line of source. Contains a chain of basic blocks
    with code on it.  */
 
@@ -242,6 +251,8 @@ 
 
   coverage_t coverage;
 
+  pmu_data_t *pmu_data;    /* PMU profile information for this file.  */
+
   /* Functions in this source file.  These are in ascending line
      number order.  */
   function_t *functions;
@@ -301,6 +312,10 @@ 
 /* Show unconditional branches too.  */
 static int flag_unconditional = 0;
 
+/* Output performance monitoring unit (PMU) data, if available.  */
+
+static int flag_pmu_profile = 0;
+
 /* Output a gcov file if this is true.  This is on by default, and can
    be turned off by the -n option.  */
 
@@ -345,6 +360,18 @@ 
 
 static int flag_counts = 0;
 
+/* PMU profile default filename.  */
+
+static char pmu_profile_default_filename[] = "pmuprofile.gcda";
+
+/* PMU profile filename where the PMU profile data is read from.  */
+
+static char *pmu_profile_filename = 0;
+
+/* PMU data for the entire program.  */
+
+static pmu_data_t pmu_global_info;
+
 /* Forward declarations.  */
 static void fnotice (FILE *, const char *, ...) ATTRIBUTE_PRINTF_2;
 static int process_args (int, char **);
@@ -366,6 +393,17 @@ 
 static void output_lines (FILE *, const source_t *);
 static char *make_gcov_file_name (const char *, const char *);
 static void release_structures (void);
+static void process_pmu_profile (void);
+static void filter_pmu_data_lines (source_t *src);
+static void output_pmu_data_header (FILE *gcov_file, pmu_data_t *pmu_data);
+static void output_pmu_data (FILE *gcov_file, const source_t *src,
+                             const unsigned line_num);
+static void output_load_latency_line (FILE *fp,
+                                      const gcov_pmu_ll_info_t *ll_info,
+                                      gcov_pmu_tool_header_t *tool_header);
+static void output_branch_mispredict_line (FILE *fp,
+                                           const gcov_pmu_brm_info_t *brm_info);
+
 extern int main (int, char **);
 
 int
@@ -389,6 +427,15 @@ 
   if (argc - argno > 1)
     multiple_files = 1;
 
+  /* Read the PMU profile first, because its entries are later
+     filtered by source file and line number.  */
+  if (flag_pmu_profile)
+    {
+      if (!pmu_profile_filename)
+        pmu_profile_filename = pmu_profile_default_filename;
+      process_pmu_profile ();
+    }
+
   first_arg = argno;
   
   for (; argno != argc; argno++)
@@ -433,12 +480,14 @@ 
   fnotice (file, "  -b, --branch-probabilities      Include branch probabilities in output\n");
   fnotice (file, "  -c, --branch-counts             Given counts of branches taken\n\
                                     rather than percentages\n");
+  fnotice (file, "  -m, --pmu-profile               Output PMU profile data if available\n");
   fnotice (file, "  -n, --no-output                 Do not create an output file\n");
   fnotice (file, "  -l, --long-file-names           Use long output file names for included\n\
                                     source files\n");
   fnotice (file, "  -f, --function-summaries        Output summaries for each function\n");
   fnotice (file, "  -o, --object-directory DIR|FILE Search for object files in DIR or called FILE\n");
   fnotice (file, "  -p, --preserve-paths            Preserve all pathname components\n");
+  fnotice (file, "  -q, --pmu_profile-path          Path for PMU profile (default pmuprofile.gcda)\n");
   fnotice (file, "  -u, --unconditional-branches    Show unconditional branch counts too\n");
   fnotice (file, "  -i, --intermediate-format       Output .gcov file in an intermediate text\n\
                                     format that can be used by 'lcov' or other\n\
@@ -473,6 +522,7 @@ 
   { "all-blocks",           no_argument,       NULL, 'a' },
   { "branch-probabilities", no_argument,       NULL, 'b' },
   { "branch-counts",        no_argument,       NULL, 'c' },
+  { "pmu-profile",          no_argument,       NULL, 'm' },
   { "no-output",            no_argument,       NULL, 'n' },
   { "long-file-names",      no_argument,       NULL, 'l' },
   { "function-summaries",   no_argument,       NULL, 'f' },
@@ -480,6 +530,7 @@ 
   { "object-directory",     required_argument, NULL, 'o' },
   { "object-file",          required_argument, NULL, 'o' },
   { "unconditional-branches", no_argument,     NULL, 'u' },
+  { "pmu_profile-path",     required_argument, NULL, 'q' },
   { "display-progress",     no_argument,       NULL, 'd' },
   { "intermediate-format",  no_argument,       NULL, 'i' },
   { 0, 0, 0, 0 }
@@ -492,7 +543,7 @@ 
 {
   int opt;
 
-  while ((opt = getopt_long (argc, argv, "abcdfhilno:puv", options, NULL)) !=
+  while ((opt = getopt_long (argc, argv, "abcdfhilmno:pq:uv", options, NULL)) !=
          -1)
     {
       switch (opt)
@@ -515,6 +566,9 @@ 
 	case 'l':
 	  flag_long_names = 1;
 	  break;
+	case 'm':
+	  flag_pmu_profile = 1;
+	  break;
 	case 'n':
 	  flag_gcov_file = 0;
 	  break;
@@ -524,6 +578,9 @@ 
 	case 'p':
 	  flag_preserve_paths = 1;
 	  break;
+	case 'q':
+	  pmu_profile_filename = optarg;
+	  break;
 	case 'u':
 	  flag_unconditional = 1;
 	  break;
@@ -766,6 +823,8 @@ 
 {
   function_t *fn;
   source_t *src;
+  ll_infos_t *ll_infos = &pmu_global_info.ll_infos;
+  brm_infos_t *brm_infos = &pmu_global_info.brm_infos;
 
   while ((src = sources))
     {
@@ -773,6 +832,14 @@ 
 
       free (src->name);
       free (src->lines);
+      if (src->pmu_data)
+        {
+          if (src->pmu_data->ll_infos.ll_array)
+            free (src->pmu_data->ll_infos.ll_array);
+          if (src->pmu_data->brm_infos.brm_array)
+            free (src->pmu_data->brm_infos.brm_array);
+          free (src->pmu_data);
+        }
     }
 
   while ((fn = functions))
@@ -794,6 +861,42 @@ 
       free (fn->blocks);
       free (fn->counts);
     }
+
+  /* Cleanup PMU load latency info.  */
+  if (ll_infos->ll_count)
+    {
+      unsigned i;
+
+      /* delete each element */
+      for (i = 0; i < ll_infos->ll_count; ++i)
+        {
+          if (ll_infos->ll_array[i]->filename)
+            XDELETE (ll_infos->ll_array[i]->filename);
+          XDELETE (ll_infos->ll_array[i]);
+        }
+      /* delete the array itself */
+      XDELETE (ll_infos->ll_array);
+      ll_infos->ll_array = NULL;
+      ll_infos->ll_count = 0;
+    }
+
+  /* Cleanup PMU branch mispredict info.  */
+  if (brm_infos->brm_count)
+    {
+      unsigned i;
+
+      /* delete each element */
+      for (i = 0; i < brm_infos->brm_count; ++i)
+        {
+          if (brm_infos->brm_array[i]->filename)
+            XDELETE (brm_infos->brm_array[i]->filename);
+          XDELETE (brm_infos->brm_array[i]);
+        }
+      /* delete the array itself */
+      XDELETE (brm_infos->brm_array);
+      brm_infos->brm_array = NULL;
+      brm_infos->brm_count = 0;
+    }
 }
 
 /* Generate the names of the graph and data files. If OBJECT_DIRECTORY
@@ -890,6 +993,7 @@ 
       src->coverage.name = src->name;
       src->index = source_index++;
       src->next = sources;
+      src->pmu_data = 0;
       sources = src;
 
       if (!stat (file_name, &status))
@@ -1806,6 +1910,148 @@ 
     fnotice (stderr, "%s:no lines for '%s'\n", bbg_file_name, fn->name);
 }
 
+/* Filter the global PMU profile data for entries belonging to SRC.
+   Save the PMU info matching the source file and sort it by line
+   number for later line-by-line processing.  */
+
+static void
+filter_pmu_data_lines (source_t *src)
+{
+  unsigned i;
+  int changed;
+  ll_infos_t *ll_infos;         /* load latency information for this source */
+  brm_infos_t *brm_infos;  /* branch mispredict information for this source */
+
+  if (pmu_global_info.ll_infos.ll_count == 0 &&
+      pmu_global_info.brm_infos.brm_count == 0)
+    /* If there are no global entries, there is nothing to filter.  */
+    return;
+
+  src->pmu_data = XCNEW (pmu_data_t);
+  ll_infos = &src->pmu_data->ll_infos;
+  brm_infos = &src->pmu_data->brm_infos;
+  ll_infos->pmu_tool_header = pmu_global_info.ll_infos.pmu_tool_header;
+  brm_infos->pmu_tool_header = pmu_global_info.brm_infos.pmu_tool_header;
+  ll_infos->ll_array = 0;
+  brm_infos->brm_array = 0;
+
+  /* Go over all the load latency entries and save the ones
+     corresponding to this source file.  */
+  for (i = 0; i < pmu_global_info.ll_infos.ll_count; ++i)
+    {
+      gcov_pmu_ll_info_t *ll_info = pmu_global_info.ll_infos.ll_array[i];
+      if (0 == strcmp (src->name, ll_info->filename))
+        {
+          if (!ll_infos->ll_array)
+            {
+              ll_infos->ll_count = 0;
+              ll_infos->alloc_ll_count = 64;
+              ll_infos->ll_array = XCNEWVEC (gcov_pmu_ll_info_t *,
+                                             ll_infos->alloc_ll_count);
+            }
+          /* Found a matching entry, save it.  */
+          ll_infos->ll_count++;
+          if (ll_infos->ll_count >= ll_infos->alloc_ll_count)
+            {
+              /* Grow the array.  XRESIZEVEC scales the size by the
+                 element type and, like xrealloc, aborts rather than
+                 returning NULL when memory is exhausted.  */
+              ll_infos->alloc_ll_count *= 2;
+              ll_infos->ll_array = XRESIZEVEC (gcov_pmu_ll_info_t *,
+                                               ll_infos->ll_array,
+                                               ll_infos->alloc_ll_count);
+            }
+          ll_infos->ll_array[ll_infos->ll_count - 1] = ll_info;
+        }
+    }
+
+  /* Go over all the branch mispredict entries and save the ones
+     corresponding to this source file.  */
+  for (i = 0; i < pmu_global_info.brm_infos.brm_count; ++i)
+    {
+      gcov_pmu_brm_info_t *brm_info = pmu_global_info.brm_infos.brm_array[i];
+      if (0 == strcmp (src->name, brm_info->filename))
+        {
+          if (!brm_infos->brm_array)
+            {
+              brm_infos->brm_count = 0;
+              brm_infos->alloc_brm_count = 64;
+              brm_infos->brm_array = XCNEWVEC (gcov_pmu_brm_info_t *,
+                                               brm_infos->alloc_brm_count);
+            }
+          /* Found a matching entry, save it.  */
+          brm_infos->brm_count++;
+          if (brm_infos->brm_count >= brm_infos->alloc_brm_count)
+            {
+              /* Grow the array.  XRESIZEVEC scales the size by the
+                 element type and, like xrealloc, aborts rather than
+                 returning NULL when memory is exhausted.  */
+              brm_infos->alloc_brm_count *= 2;
+              brm_infos->brm_array = XRESIZEVEC (gcov_pmu_brm_info_t *,
+                                                 brm_infos->brm_array,
+                                                 brm_infos->alloc_brm_count);
+            }
+          brm_infos->brm_array[brm_infos->brm_count - 1] = brm_info;
+        }
+    }
+
+  /* Sort the load latency data according to the line numbers because
+     we later iterate over sources in line number order. Normally we
+     expect the PMU tool to provide sorted data, but a few entries can
+     be out of order. Thus we use a very simple bubble sort here.  */
+  if (ll_infos->ll_count > 1)
+    {
+      changed = 1;
+      while (changed)
+        {
+          changed = 0;
+          for (i = 0; i < ll_infos->ll_count - 1; ++i)
+            {
+              gcov_pmu_ll_info_t *item1 = ll_infos->ll_array[i];
+              gcov_pmu_ll_info_t *item2 = ll_infos->ll_array[i+1];
+              if (item1->line > item2->line)
+                {
+                  /* swap */
+                  gcov_pmu_ll_info_t *tmp = ll_infos->ll_array[i];
+                  ll_infos->ll_array[i] = ll_infos->ll_array[i+1];
+                  ll_infos->ll_array[i+1] = tmp;
+                  changed = 1;
+                }
+            }
+        }
+    }
+
+  /* Similarly, sort branch mispredict info as well.  */
+  if (brm_infos->brm_count > 1)
+    {
+      changed = 1;
+      while (changed)
+        {
+          changed = 0;
+          for (i = 0; i < brm_infos->brm_count - 1; ++i)
+            {
+              gcov_pmu_brm_info_t *item1 = brm_infos->brm_array[i];
+              gcov_pmu_brm_info_t *item2 = brm_infos->brm_array[i+1];
+              if (item1->line > item2->line)
+                {
+                  /* swap */
+                  gcov_pmu_brm_info_t *tmp = brm_infos->brm_array[i];
+                  brm_infos->brm_array[i] = brm_infos->brm_array[i+1];
+                  brm_infos->brm_array[i+1] = tmp;
+                  changed = 1;
+                }
+            }
+        }
+    }
+
+  /* If no matching PMU info was found, release the structures.  */
+  if (!brm_infos->brm_array && !ll_infos->ll_array)
+    {
+      free (src->pmu_data);
+      src->pmu_data = 0;
+    }
+}
+
 /* Accumulate the line counts of a file.  */
 
 static void
@@ -1815,6 +2061,10 @@ 
   function_t *fn, *fn_p, *fn_n;
   unsigned ix;
 
+  if (flag_pmu_profile)
+    /* Filter PMU profile by source files and save into matching line(s).  */
+    filter_pmu_data_lines (src);
+
   /* Reverse the function order.  */
   for (fn = src->functions, fn_p = NULL; fn;
        fn_p = fn, fn = fn_n)
@@ -2062,6 +2312,9 @@ 
   else if (src->file_time == 0)
     fprintf (gcov_file, "%9s:%5d:Source is newer than graph\n", "-", 0);
 
+  if (src->pmu_data)
+    output_pmu_data_header (gcov_file, src->pmu_data);
+
   if (flag_branches)
     fn = src->functions;
 
@@ -2139,6 +2392,10 @@ 
 	  for (ix = 0, arc = line->u.branches; arc; arc = arc->line_next)
 	    ix += output_branch_count (gcov_file, ix, arc);
 	}
+
+      /* Output PMU profile info if available.  */
+      if (flag_pmu_profile)
+        output_pmu_data (gcov_file, src, line_num);
     }
 
   /* Handle all remaining source lines.  There may be lines after the
@@ -2162,3 +2419,244 @@ 
   if (source_file)
     fclose (source_file);
 }
+
+/* Print an explanatory header for PMU_DATA into GCOV_FILE.  */
+
+static void
+output_pmu_data_header (FILE *gcov_file, pmu_data_t *pmu_data)
+{
+  /* Print header for the applicable PMU events.  */
+  fprintf (gcov_file, "%9s:%5d\n", "-", 0);
+  if (pmu_data->ll_infos.ll_count)
+    {
+      char *text = pmu_data->ll_infos.pmu_tool_header->column_description;
+      char c;
+      fprintf (gcov_file, "%9s:%5u: %s", "PMU_LL", 0,
+               pmu_data->ll_infos.pmu_tool_header->column_header);
+      /* The column description is multiline text and we want to print
+         each line separately after formatting it.  */
+      fprintf (gcov_file, "%9s:%5u: ", "PMU_LL", 0);
+      while ((c = *text++))
+        {
+          fprintf (gcov_file, "%c", c);
+          /* Do not print a new header after the trailing newline.  */
+          if (c == '\n' && *text)
+            fprintf (gcov_file, "%9s:%5u: ", "PMU_LL", 0);
+        }
+      fprintf (gcov_file, "%9s:%5d\n", "-", 0);
+    }
+
+  if (pmu_data->brm_infos.brm_count)
+    {
+
+      fprintf (gcov_file, "%9s:%5d:PMU BRM: line: %s %s %s\n",
+               "-", 0, "count", "self", "address");
+      fprintf (gcov_file, "%9s:%5d:         "
+               "count: number of branch mispredicts sampled at this address\n",
+               "-", 0);
+      fprintf (gcov_file, "%9s:%5d:         "
+               "self: branch mispredicts as percentage of the entire program\n",
+               "-", 0);
+      fprintf (gcov_file, "%9s:%5d\n", "-", 0);
+    }
+}
+
+/* Output pmu data corresponding to SRC and LINE_NUM into GCOV_FILE.  */
+
+static void
+output_pmu_data (FILE *gcov_file, const source_t *src, const unsigned line_num)
+{
+  unsigned i;
+  ll_infos_t *ll_infos;
+  brm_infos_t *brm_infos;
+  gcov_pmu_tool_header_t *tool_header;
+
+  if (!src->pmu_data)
+    return;
+
+  ll_infos = &src->pmu_data->ll_infos;
+  brm_infos = &src->pmu_data->brm_infos;
+
+  if (ll_infos->ll_array)
+    {
+      tool_header = src->pmu_data->ll_infos.pmu_tool_header;
+
+      /* Search PMU load latency data for the matching line
+         numbers. There could be multiple entries with the same line
+         number. We use the fact that line numbers are sorted in
+         ll_array.  */
+      for (i = 0; i < ll_infos->ll_count &&
+             ll_infos->ll_array[i]->line <= line_num; ++i)
+        {
+          gcov_pmu_ll_info_t *ll_info = ll_infos->ll_array[i];
+          if (ll_info->line == line_num)
+            output_load_latency_line (gcov_file, ll_info, tool_header);
+        }
+    }
+
+  if (brm_infos->brm_array)
+    {
+      tool_header = src->pmu_data->brm_infos.pmu_tool_header;
+
+      /* Search PMU branch mispredict data for the matching line
+         numbers. There could be multiple entries with the same line
+         number. We use the fact that line numbers are sorted in
+         brm_array.  */
+      for (i = 0; i < brm_infos->brm_count &&
+             brm_infos->brm_array[i]->line <= line_num; ++i)
+        {
+          gcov_pmu_brm_info_t *brm_info = brm_infos->brm_array[i];
+          if (brm_info->line == line_num)
+            output_branch_mispredict_line (gcov_file, brm_info);
+        }
+    }
+}
+
+
+/* Output formatted load latency info pointed to by LL_INFO into the
+   open file FP.  TOOL_HEADER contains additional explanation of
+   fields.  */
+
+static void
+output_load_latency_line (FILE *fp, const gcov_pmu_ll_info_t *ll_info,
+                          gcov_pmu_tool_header_t *tool_header ATTRIBUTE_UNUSED)
+{
+  fprintf (fp, "%9s:%5u:      ", "PMU_LL", ll_info->line);
+  fprintf (fp, " %u %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% "
+           "%.2f%% %.2f%% " HOST_WIDEST_INT_PRINT_HEX "\n",
+           ll_info->counts,
+           convert_unsigned_to_pct (ll_info->self),
+           convert_unsigned_to_pct (ll_info->cum),
+           convert_unsigned_to_pct (ll_info->lt_10),
+           convert_unsigned_to_pct (ll_info->lt_32),
+           convert_unsigned_to_pct (ll_info->lt_64),
+           convert_unsigned_to_pct (ll_info->lt_256),
+           convert_unsigned_to_pct (ll_info->lt_1024),
+           convert_unsigned_to_pct (ll_info->gt_1024),
+           convert_unsigned_to_pct (ll_info->wself),
+           ll_info->code_addr);
+}
+
+
+/* Output formatted branch mispredict info pointed to by BRM_INFO into
+   the open file FP.  */
+
+static void
+output_branch_mispredict_line (FILE *fp,
+                               const gcov_pmu_brm_info_t *brm_info)
+{
+  fprintf (fp, "%9s:%5u: count: %u self: %.2f%% addr: "
+           HOST_WIDEST_INT_PRINT_HEX "\n",
+           "PMU BRM",
+           brm_info->line,
+           brm_info->counts,
+           convert_unsigned_to_pct (brm_info->self),
+           brm_info->code_addr);
+}
+
+/* Read in the PMU profile information from the global PMU profile file.  */
+
+static void process_pmu_profile (void)
+{
+  unsigned tag;
+  unsigned version;
+  int error = 0;
+  ll_infos_t *ll_infos = &pmu_global_info.ll_infos;
+  brm_infos_t *brm_infos = &pmu_global_info.brm_infos;
+
+  /* Construct path for pmuprofile.gcda filename. */
+  create_file_names (pmu_profile_filename);
+  if (!gcov_open (da_file_name, 1))
+    {
+      fnotice (stderr, "%s:cannot open pmu profile file\n",
+               pmu_profile_filename);
+      return;
+    }
+  if (!gcov_magic (gcov_read_unsigned (), GCOV_DATA_MAGIC))
+    {
+      fnotice (stderr, "%s:not a gcov data file\n", da_file_name);
+    cleanup:;
+      gcov_close ();
+      return;
+    }
+  version = gcov_read_unsigned ();
+  if (version != GCOV_VERSION)
+    {
+      char v[4], e[4];
+
+      GCOV_UNSIGNED2STRING (v, version);
+      GCOV_UNSIGNED2STRING (e, GCOV_VERSION);
+      fnotice (stderr, "%s:version '%.4s', prefer version '%.4s'\n",
+	       da_file_name, v, e);
+    }
+  /* read stamp */
+  tag = gcov_read_unsigned ();
+
+  /* Initialize PMU data fields. */
+  ll_infos->ll_count = 0;
+  ll_infos->alloc_ll_count = 64;
+  ll_infos->ll_array = XCNEWVEC (gcov_pmu_ll_info_t *, ll_infos->alloc_ll_count);
+
+  brm_infos->brm_count = 0;
+  brm_infos->alloc_brm_count = 64;
+  brm_infos->brm_array = XCNEWVEC (gcov_pmu_brm_info_t *,
+                                   brm_infos->alloc_brm_count);
+
+  while ((tag = gcov_read_unsigned ()))
+    {
+      unsigned length = gcov_read_unsigned ();
+      unsigned long base = gcov_position ();
+
+      if (tag == GCOV_TAG_PMU_LOAD_LATENCY_INFO)
+        {
+          gcov_pmu_ll_info_t *ll_info = XCNEW (gcov_pmu_ll_info_t);
+          gcov_read_pmu_load_latency_info (ll_info, length);
+          ll_infos->ll_count++;
+          if (ll_infos->ll_count >= ll_infos->alloc_ll_count)
+            {
+              /* Grow the array.  XRESIZEVEC scales the size by the
+                 element type and, like xrealloc, aborts rather than
+                 returning NULL when memory is exhausted.  */
+              ll_infos->alloc_ll_count *= 2;
+              ll_infos->ll_array = XRESIZEVEC (gcov_pmu_ll_info_t *,
+                                               ll_infos->ll_array,
+                                               ll_infos->alloc_ll_count);
+            }
+          ll_infos->ll_array[ll_infos->ll_count - 1] = ll_info;
+        }
+      else if (tag == GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO)
+        {
+          gcov_pmu_brm_info_t *brm_info = XCNEW (gcov_pmu_brm_info_t);
+          gcov_read_pmu_branch_mispredict_info (brm_info, length);
+          brm_infos->brm_count++;
+          if (brm_infos->brm_count >= brm_infos->alloc_brm_count)
+            {
+              /* Grow the array.  XRESIZEVEC scales the size by the
+                 element type and, like xrealloc, aborts rather than
+                 returning NULL when memory is exhausted.  */
+              brm_infos->alloc_brm_count *= 2;
+              brm_infos->brm_array = XRESIZEVEC (gcov_pmu_brm_info_t *,
+                                                 brm_infos->brm_array,
+                                                 brm_infos->alloc_brm_count);
+            }
+          brm_infos->brm_array[brm_infos->brm_count - 1] = brm_info;
+        }
+      else if (tag == GCOV_TAG_PMU_TOOL_HEADER)
+        {
+          gcov_pmu_tool_header_t *tool_header = XCNEW (gcov_pmu_tool_header_t);
+          gcov_read_pmu_tool_header (tool_header, length);
+          ll_infos->pmu_tool_header = tool_header;
+          brm_infos->pmu_tool_header = tool_header;
+        }
+
+      gcov_sync (base, length);
+      if ((error = gcov_is_error ()))
+	{
+	  fnotice (stderr, error < 0 ? "%s:overflowed\n" : "%s:corrupted\n",
+		   da_file_name);
+	  goto cleanup;
+	}
+    }
+
+  gcov_close ();
+}
Index: gcc/gcov-io.c
===================================================================
--- gcc/gcov-io.c	(revision 175188)
+++ gcc/gcov-io.c	(working copy)
@@ -23,6 +23,12 @@ 
 /* Routines declared in gcov-io.h.  This file should be #included by
    another source file, after having #included gcov-io.h.  */
 
+/* Redefine these here, rather than using the ones in system.h since
+ * including system.h leads to conflicting definitions of other
+ * symbols and macros.  */
+#undef MIN
+#define MIN(X,Y) ((X) < (Y) ? (X) : (Y))
+
 #if !IN_GCOV
 static void gcov_write_block (unsigned);
 static gcov_unsigned_t *gcov_write_words (unsigned);
@@ -197,6 +203,104 @@ 
 }
 
 #if !IN_LIBGCOV
+/* Modify FILENAME in place to a canonical form by stripping known
+   prefixes: a leading '/proc/self/cwd/./' or '/proc/self/cwd/' is
+   removed.  Returns the modified FILENAME.  */
+
+GCOV_LINKAGE char *
+gcov_canonical_filename (char *filename)
+{
+  static char cwd_dot_str[] = "/proc/self/cwd/./";
+  int cwd_dot_len = strlen (cwd_dot_str);
+  int cwd_len = cwd_dot_len - 2; /* without trailing './' */
+  int filename_len = strlen (filename);
+  /* delete the longer prefix first */
+  if (0 == strncmp (filename, cwd_dot_str, cwd_dot_len))
+    {
+      memmove (filename, filename + cwd_dot_len, filename_len - cwd_dot_len);
+      filename[filename_len - cwd_dot_len] = '\0';
+      return filename;
+    }
+
+  if (0 == strncmp (filename, cwd_dot_str, cwd_len))
+    {
+      memmove (filename, filename + cwd_len, filename_len - cwd_len);
+      filename[filename_len - cwd_len] = '\0';
+      return filename;
+    }
+  return filename;
+}
+
+/* Read LEN words and construct load latency info LL_INFO.  */
+
+GCOV_LINKAGE void
+gcov_read_pmu_load_latency_info (gcov_pmu_ll_info_t *ll_info,
+                                 gcov_unsigned_t len ATTRIBUTE_UNUSED)
+{
+  const char *filename;
+  ll_info->counts = gcov_read_unsigned ();
+  ll_info->self = gcov_read_unsigned ();
+  ll_info->cum = gcov_read_unsigned ();
+  ll_info->lt_10 = gcov_read_unsigned ();
+  ll_info->lt_32 = gcov_read_unsigned ();
+  ll_info->lt_64 = gcov_read_unsigned ();
+  ll_info->lt_256 = gcov_read_unsigned ();
+  ll_info->lt_1024 = gcov_read_unsigned ();
+  ll_info->gt_1024 = gcov_read_unsigned ();
+  ll_info->wself = gcov_read_unsigned ();
+  ll_info->code_addr = gcov_read_counter ();
+  ll_info->line = gcov_read_unsigned ();
+  ll_info->discriminator = gcov_read_unsigned ();
+  filename = gcov_read_string ();
+  if (filename)
+    ll_info->filename = gcov_canonical_filename (xstrdup (filename));
+  else
+    ll_info->filename = 0;
+}
+
+/* Read LEN words and construct branch mispredict info BRM_INFO.  */
+
+GCOV_LINKAGE void
+gcov_read_pmu_branch_mispredict_info (gcov_pmu_brm_info_t *brm_info,
+                                      gcov_unsigned_t len ATTRIBUTE_UNUSED)
+{
+  const char *filename;
+  brm_info->counts = gcov_read_unsigned ();
+  brm_info->self = gcov_read_unsigned ();
+  brm_info->cum = gcov_read_unsigned ();
+  brm_info->code_addr = gcov_read_counter ();
+  brm_info->line = gcov_read_unsigned ();
+  brm_info->discriminator = gcov_read_unsigned ();
+  filename = gcov_read_string ();
+  if (filename)
+    brm_info->filename = gcov_canonical_filename (xstrdup (filename));
+  else
+    brm_info->filename = 0;
+}
+
+/* Read LEN words from an open gcov file into PMU tool header HEADER.  */
+
+GCOV_LINKAGE void
+gcov_read_pmu_tool_header (gcov_pmu_tool_header_t *header,
+                           gcov_unsigned_t len ATTRIBUTE_UNUSED)
+{
+  const char *str;
+  str = gcov_read_string ();
+  header->host_cpu = str ? xstrdup (str) : 0;
+  str = gcov_read_string ();
+  header->hostname = str ? xstrdup (str) : 0;
+  str = gcov_read_string ();
+  header->kernel_version = str ? xstrdup (str) : 0;
+  str = gcov_read_string ();
+  header->column_header = str ? xstrdup (str) : 0;
+  str = gcov_read_string ();
+  header->column_description = str ? xstrdup (str) : 0;
+  str = gcov_read_string ();
+  header->full_header = str ? xstrdup (str) : 0;
+}
+#endif
+
+#if !IN_LIBGCOV
 /* Check if MAGIC is EXPECTED. Use it to determine endianness of the
    file. Returns +1 for same endian, -1 for other endian and zero for
    not EXPECTED.  */
@@ -245,6 +349,24 @@ 
   gcov_var.offset -= size;
 }
 
+#if IN_LIBGCOV
+/* Return the number of words STRING would need including the length
+   field in the output stream itself.  This should be identical to
+   "alloc" calculation in gcov_write_string().  */
+
+GCOV_LINKAGE gcov_unsigned_t
+gcov_string_length (const char *string)
+{
+  /* A NULL string is written as a lone zero length word.  */
+  gcov_unsigned_t alloc = string ? (strlen (string) + 4) >> 2 : 0;
+  /* + 1 because of the length field.  */
+  alloc += 1;
+  /* Cannot yet write a string longer than GCOV_BLOCK_SIZE words.  */
+  gcc_assert (alloc < GCOV_BLOCK_SIZE);
+  return alloc;
+}
+#endif
+
 /* Allocate space to write BYTES bytes to the gcov file. Return a
    pointer to those bytes, or NULL on failure.  */
 
@@ -255,13 +377,15 @@ 
 
   gcc_assert (gcov_var.mode < 0);
 #if IN_LIBGCOV
-  if (gcov_var.offset >= GCOV_BLOCK_SIZE)
+  if (gcov_var.offset + words >= GCOV_BLOCK_SIZE)
     {
-      gcov_write_block (GCOV_BLOCK_SIZE);
+      gcov_write_block (MIN (gcov_var.offset, GCOV_BLOCK_SIZE));
       if (gcov_var.offset)
 	{
-	  gcc_assert (gcov_var.offset == 1);
-	  memcpy (gcov_var.buffer, gcov_var.buffer + GCOV_BLOCK_SIZE, 4);
+	  gcc_assert (gcov_var.offset < GCOV_BLOCK_SIZE);
+	  memcpy (gcov_var.buffer,
+                  gcov_var.buffer + GCOV_BLOCK_SIZE,
+                  gcov_var.offset << 2);
 	}
     }
 #else
@@ -302,7 +426,6 @@ 
 }
 #endif /* IN_LIBGCOV */
 
-#if !IN_LIBGCOV
 /* Write STRING to coverage file.  Sets error flag on file
    error, overflow flag on overflow */
 
@@ -325,7 +448,6 @@ 
   buffer[alloc] = 0;
   memcpy (&buffer[1], string, length);
 }
-#endif
 
 #if !IN_LIBGCOV
 /* Write a tag TAG and reserve space for the record length. Return a
@@ -413,14 +535,15 @@ 
   unsigned excess = gcov_var.length - gcov_var.offset;
 
   gcc_assert (gcov_var.mode > 0);
+  gcc_assert (words < GCOV_BLOCK_SIZE);
   if (excess < words)
     {
       gcov_var.start += gcov_var.offset;
 #if IN_LIBGCOV
       if (excess)
 	{
-	  gcc_assert (excess == 1);
-	  memcpy (gcov_var.buffer, gcov_var.buffer + gcov_var.offset, 4);
+	  gcc_assert (excess < GCOV_BLOCK_SIZE);
+	  memmove (gcov_var.buffer, gcov_var.buffer + gcov_var.offset, excess * 4);
 	}
 #else
       memmove (gcov_var.buffer, gcov_var.buffer + gcov_var.offset, excess * 4);
@@ -428,8 +551,7 @@ 
       gcov_var.offset = 0;
       gcov_var.length = excess;
 #if IN_LIBGCOV
-      gcc_assert (!gcov_var.length || gcov_var.length == 1);
-      excess = GCOV_BLOCK_SIZE;
+      excess = (sizeof (gcov_var.buffer) / sizeof (gcov_var.buffer[0])) - gcov_var.length;
 #else
       if (gcov_var.length + words > gcov_var.alloc)
 	gcov_allocate (gcov_var.length + words);
@@ -489,7 +611,6 @@ 
    buffer, or NULL on empty string. You must copy the string before
    calling another gcov function.  */
 
-#if !IN_LIBGCOV
 GCOV_LINKAGE const char *
 gcov_read_string (void)
 {
@@ -500,7 +621,6 @@ 
 
   return (const char *) gcov_read_words (length);
 }
-#endif
 
 GCOV_LINKAGE void
 gcov_read_summary (struct gcov_summary *summary)
@@ -629,6 +749,87 @@ 
 }
 #endif
 
+/* Convert NUMBER, a percentage scaled by 100 (i.e. out of 10000),
+   back to a percentage by dividing it by 100.  */
+
+GCOV_LINKAGE float
+convert_unsigned_to_pct (const unsigned number)
+{
+  return (float)number / 100.0;
+}
+
+#if !IN_LIBGCOV && IN_GCOV != 1
+/* Print the load latency information in LL_INFO in human-readable
+   form to the open output file FP.  If PRINT_NEWLINE is nonzero, a
+   trailing newline is also printed.  */
+
+GCOV_LINKAGE void
+print_load_latency_line (FILE *fp, const gcov_pmu_ll_info_t *ll_info,
+                         const int print_newline)
+{
+  if (!ll_info)
+    return;
+  fprintf (fp, " %u %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% "
+           "%.2f%% %.2f%% " HOST_WIDEST_INT_PRINT_HEX " %s %u %u",
+           ll_info->counts,
+           convert_unsigned_to_pct (ll_info->self),
+           convert_unsigned_to_pct (ll_info->cum),
+           convert_unsigned_to_pct (ll_info->lt_10),
+           convert_unsigned_to_pct (ll_info->lt_32),
+           convert_unsigned_to_pct (ll_info->lt_64),
+           convert_unsigned_to_pct (ll_info->lt_256),
+           convert_unsigned_to_pct (ll_info->lt_1024),
+           convert_unsigned_to_pct (ll_info->gt_1024),
+           convert_unsigned_to_pct (ll_info->wself),
+           ll_info->code_addr,
+           ll_info->filename,
+           ll_info->line,
+           ll_info->discriminator);
+  if (print_newline)
+    fprintf (fp, "\n");
+}
+
+/* Print BRM_INFO to the open output file FP.  If PRINT_NEWLINE is
+   nonzero, a trailing newline is also printed.  */
+
+GCOV_LINKAGE void
+print_branch_mispredict_line (FILE *fp, const gcov_pmu_brm_info_t *brm_info,
+                              const int print_newline)
+{
+  if (!brm_info)
+    return;
+  fprintf (fp, " %u %.2f%% %.2f%% " HOST_WIDEST_INT_PRINT_HEX " %s %u %u",
+           brm_info->counts,
+           convert_unsigned_to_pct (brm_info->self),
+           convert_unsigned_to_pct (brm_info->cum),
+           brm_info->code_addr,
+           brm_info->filename,
+           brm_info->line,
+           brm_info->discriminator);
+  if (print_newline)
+    fprintf (fp, "\n");
+}
+
+/* Print TOOL_HEADER to the open output file FP.  If PRINT_NEWLINE is
+   nonzero, a trailing newline is also printed.  */
+
+GCOV_LINKAGE void
+print_pmu_tool_header (FILE *fp, gcov_pmu_tool_header_t *tool_header,
+                       const int print_newline)
+{
+  if (!tool_header)
+    return;
+  fprintf (fp, "\nhost_cpu: %s\n", tool_header->host_cpu);
+  fprintf (fp, "hostname: %s\n", tool_header->hostname);
+  fprintf (fp, "kernel_version: %s\n", tool_header->kernel_version);
+  fprintf (fp, "column_header: %s\n", tool_header->column_header);
+  fprintf (fp, "column_description: %s\n", tool_header->column_description);
+  fprintf (fp, "full_header: %s\n", tool_header->full_header);
+  if (print_newline)
+    fprintf (fp, "\n");
+}
+#endif
+
 #if IN_GCOV > 0
 /* Return the modification time of the current gcov file.  */
 
@@ -715,7 +916,7 @@ 
   if (vsize <= vpos)
     {
       printk (KERN_ERR
-          "GCOV_KERNEL: something wrong: vbuf=%p vsize=%u vpos=%u\n",
+         "GCOV_KERNEL: something wrong: vbuf=%p vsize=%u vpos=%u\n",
           vbuf, vsize, vpos);
       return 0;
     }
@@ -744,4 +945,29 @@ 
   gcc_assert (0);  /* should not reach here */
   return 0;
 }
+#else /* __GCOV_KERNEL__ */
+
+#if IN_GCOV != 1
+/* Free the strings in TOOL_HEADER; the structure itself is not freed.  */
+
+GCOV_LINKAGE void
+destroy_pmu_tool_header (gcov_pmu_tool_header_t *tool_header)
+{
+  if (!tool_header)
+    return;
+  if (tool_header->host_cpu)
+    free (tool_header->host_cpu);
+  if (tool_header->hostname)
+    free (tool_header->hostname);
+  if (tool_header->kernel_version)
+    free (tool_header->kernel_version);
+  if (tool_header->column_header)
+    free (tool_header->column_header);
+  if (tool_header->column_description)
+    free (tool_header->column_description);
+  if (tool_header->full_header)
+    free (tool_header->full_header);
+}
+#endif
+
 #endif /* GCOV_KERNEL */
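
gcov_string_length is meant to agree with gcov_write_string, which stores a string as one length word followed by the string bytes, NUL-terminated and zero-padded to a word boundary, and stores a NULL string as a lone zero length word. The sketch below models that layout with two hypothetical helpers, string_words and encode_string (neither is part of gcov-io.c), so the word counts, including the NULL case, can be checked outside of libgcov.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of 32-bit words STRING occupies in a
   gcov data file, counting the leading length word.  A string of LEN
   bytes plus its terminating NUL is padded to (LEN + 4) / 4 words;
   a NULL string has no data words.  */
static uint32_t
string_words (const char *string)
{
  uint32_t data_words = string ? (uint32_t) ((strlen (string) + 4) >> 2) : 0;
  return 1 + data_words;
}

/* Encode STRING into OUT, which must hold string_words (STRING) words,
   using the same layout.  Returns the number of words written.  */
static uint32_t
encode_string (const char *string, uint32_t *out)
{
  uint32_t words = string_words (string);
  size_t len = string ? strlen (string) : 0;

  assert (words >= 1);
  out[0] = words - 1;                 /* length field, in words */
  if (words > 1)
    {
      out[words - 1] = 0;             /* zero the last word first ... */
      memcpy (&out[1], string, len);  /* ... so it supplies the NUL */
    }
  return words;
}
```

For example, "abc" fits in one data word with its NUL, while "abcd" needs a second data word just for the terminator.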
Index: gcc/gcov-io.h
===================================================================
--- gcc/gcov-io.h	(revision 175188)
+++ gcc/gcov-io.h	(working copy)
@@ -313,6 +313,7 @@ 
 
 typedef unsigned gcov_unsigned_t;
 typedef unsigned gcov_position_t;
+
 /* gcov_type is typedef'd elsewhere for the compiler */
 #if IN_GCOV
 #define GCOV_LINKAGE static
@@ -363,15 +364,24 @@ 
 #define gcov_write_counter __gcov_write_counter
 #define gcov_write_summary __gcov_write_summary
 #define gcov_write_module_info __gcov_write_module_info
+#define gcov_write_string __gcov_write_string
+#define gcov_string_length __gcov_string_length
 #define gcov_read_unsigned __gcov_read_unsigned
 #define gcov_read_counter __gcov_read_counter
+#define gcov_read_string __gcov_read_string
 #define gcov_read_summary __gcov_read_summary
 #define gcov_read_module_info __gcov_read_module_info
 #define gcov_sort_n_vals __gcov_sort_n_vals
+#define gcov_canonical_filename __gcov_canonical_filename
+#define gcov_read_pmu_load_latency_info __gcov_read_pmu_load_latency_info
+#define gcov_read_pmu_branch_mispredict_info __gcov_read_pmu_branch_mispredict_info
+#define gcov_read_pmu_tool_header __gcov_read_pmu_tool_header
+#define destroy_pmu_tool_header __destroy_pmu_tool_header
 
+
 /* Poison these, so they don't accidentally slip in.  */
-#pragma GCC poison gcov_write_string gcov_write_tag gcov_write_length
-#pragma GCC poison gcov_read_string gcov_sync gcov_time gcov_magic
+#pragma GCC poison gcov_write_tag gcov_write_length
+#pragma GCC poison gcov_sync gcov_time gcov_magic
 
 #ifdef HAVE_GAS_HIDDEN
 #define ATTRIBUTE_HIDDEN  __attribute__ ((__visibility__ ("hidden")))
@@ -432,6 +442,13 @@ 
 #define GCOV_TAG_SUMMARY_LENGTH  \
 	(1 + GCOV_COUNTERS_SUMMABLE * (2 + 3 * 2))
 #define GCOV_TAG_MODULE_INFO ((gcov_unsigned_t)0xa4000000)
+#define GCOV_TAG_PMU_LOAD_LATENCY_INFO ((gcov_unsigned_t)0xa5000000)
+#define GCOV_TAG_PMU_LOAD_LATENCY_LENGTH(filename)  \
+  (gcov_string_length (filename) + 12 + 2)
+#define GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO ((gcov_unsigned_t)0xa7000000)
+#define GCOV_TAG_PMU_BRANCH_MISPREDICT_LENGTH(filename)  \
+  (gcov_string_length (filename) + 5 + 2)
+#define GCOV_TAG_PMU_TOOL_HEADER ((gcov_unsigned_t)0xa9000000)
 
 /* Counters that are collected.  */
 #define GCOV_COUNTER_ARCS 	0  /* Arc transitions.  */
@@ -576,6 +593,91 @@ 
    && !((module_infos[0]->lang & GCOV_MODULE_ASM_STMTS)			\
 	&& flag_ripa_disallow_asm_modules))
 
+/* Information about the hardware performance monitoring unit.  */
+struct gcov_pmu_info
+{
+  const char *pmu_profile_filename;	/* pmu profile filename  */
+  const char *pmu_tool;  	/* canonical pmu tool options  */
+  gcov_unsigned_t pmu_top_n_address;  /* how many top addresses to symbolize */
+};
+
+/* Information about the PMU tool header.  */
+typedef struct gcov_pmu_tool_header {
+  char *host_cpu;
+  char *hostname;
+  char *kernel_version;
+  char *column_header;
+  char *column_description;
+  char *full_header;
+} gcov_pmu_tool_header_t;
+
+/* Available only for PMUs that support PEBS or IBS, via the pfmon
+   tool.  If any field here is changed, the length computation in
+   GCOV_TAG_PMU_LOAD_LATENCY_LENGTH must be updated as well.  All
+   percentages are multiplied by 100, making them out of 10000, and
+   only the integer part is kept.  */
+typedef struct gcov_pmu_load_latency_info
+{
+  gcov_unsigned_t counts;     /* raw count of samples */
+  gcov_unsigned_t self;       /* per 10k of total samples */
+  gcov_unsigned_t cum;        /* per 10k cumulative weight */
+  gcov_unsigned_t lt_10;      /* per 10k with latency <= 10 cycles */
+  gcov_unsigned_t lt_32;      /* per 10k with latency <= 32 cycles */
+  gcov_unsigned_t lt_64;      /* per 10k with latency <= 64 cycles */
+  gcov_unsigned_t lt_256;     /* per 10k with latency <= 256 cycles */
+  gcov_unsigned_t lt_1024;    /* per 10k with latency <= 1024 cycles */
+  gcov_unsigned_t gt_1024;    /* per 10k with latency > 1024 cycles */
+  gcov_unsigned_t wself;      /* weighted average cost of this miss in cycles */
+  gcov_type code_addr;        /* the actual miss address (pc+1 for Intel) */
+  gcov_unsigned_t line;       /* line number corresponding to this miss */
+  gcov_unsigned_t discriminator;   /* discriminator information for this miss */
+  char *filename;       /* filename corresponding to this miss */
+} gcov_pmu_ll_info_t;
+
+/* This structure is used during runtime as well as in gcov.  */
+typedef struct load_latency_infos
+{
+  /* Array of pointers to the load latency entries.  */
+  gcov_pmu_ll_info_t **ll_array;
+  /* The total number of entries in the load latency array.  */
+  unsigned ll_count;
+  /* The total number of entries currently allocated in the array.
+     Used for bookkeeping.  */
+  unsigned alloc_ll_count;
+  /* PMU tool header */
+  gcov_pmu_tool_header_t *pmu_tool_header;
+} ll_infos_t;
+
+/* Available only on PMUs supported by the pfmon tool.  If any field
+   here is changed, the length computation in
+   GCOV_TAG_PMU_BRANCH_MISPREDICT_LENGTH must be updated as well.  All
+   percentages are multiplied by 100, making them out of 10000, and
+   only the integer part is kept.  */
+typedef struct gcov_pmu_branch_mispredict_info
+{
+  gcov_unsigned_t counts;     /* raw count of samples */
+  gcov_unsigned_t self;       /* per 10k of total samples */
+  gcov_unsigned_t cum;        /* per 10k cumulative weight */
+  gcov_type code_addr;        /* the actual mispredict address */
+  gcov_unsigned_t line;       /* line number corresponding to this event */
+  gcov_unsigned_t discriminator;   /* discriminator for this event */
+  char *filename;       /* filename corresponding to this event */
+} gcov_pmu_brm_info_t;
+
+/* This structure is used during runtime as well as in gcov.  */
+typedef struct branch_mispredict_infos
+{
+  /* Array of pointers to the branch mispredict entries.  */
+  gcov_pmu_brm_info_t **brm_array;
+  /* The total number of entries in the above array.  */
+  unsigned brm_count;
+  /* The total number of entries currently allocated in the array.
+     Used for bookkeeping.  */
+  unsigned alloc_brm_count;
+  /* PMU tool header */
+  gcov_pmu_tool_header_t *pmu_tool_header;
+} brm_infos_t;
+
 /* Structures embedded in coveraged program.  The structures generated
    by write_profile must match these.  */
 
@@ -635,9 +737,6 @@ 
 /* Register a new object file module.  */
 extern void __gcov_init (struct gcov_info *) ATTRIBUTE_HIDDEN;
 
-/* Set sampling rate to RATE.  */
-extern void __gcov_set_sampling_rate (unsigned int rate);
-
 /* Called before fork, to avoid double counting.  */
 extern void __gcov_flush (void) ATTRIBUTE_HIDDEN;
 
@@ -674,6 +773,12 @@ 
 extern void __gcov_ior_profiler (gcov_type *, gcov_type);
 extern void __gcov_sort_n_vals (gcov_type *value_array, int n);
 
+/* Initialize/start/stop/dump performance monitoring unit (PMU) profile */
+void __gcov_init_pmu_profiler (struct gcov_pmu_info *) ATTRIBUTE_HIDDEN;
+void __gcov_start_pmu_profiler (void) ATTRIBUTE_HIDDEN;
+void __gcov_stop_pmu_profiler (void) ATTRIBUTE_HIDDEN;
+void __gcov_end_pmu_profiler (void) ATTRIBUTE_HIDDEN;
+
 #ifndef inhibit_libc
 /* The wrappers around some library functions..  */
 extern pid_t __gcov_fork (void) ATTRIBUTE_HIDDEN;
@@ -746,14 +851,42 @@ 
 static gcov_position_t gcov_position (void);
 static int gcov_is_error (void);
 
+GCOV_LINKAGE const char *gcov_read_string (void) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE gcov_unsigned_t gcov_read_unsigned (void) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE gcov_type gcov_read_counter (void) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE void gcov_read_summary (struct gcov_summary *) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE char *gcov_canonical_filename (char *filename) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE void
+gcov_read_pmu_load_latency_info (gcov_pmu_ll_info_t *ll_info,
+                                 gcov_unsigned_t len) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE void
+gcov_read_pmu_branch_mispredict_info (gcov_pmu_brm_info_t *brm_info,
+                                      gcov_unsigned_t len) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE void
+gcov_read_pmu_tool_header (gcov_pmu_tool_header_t *tool_header,
+                           gcov_unsigned_t len) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE float convert_unsigned_to_pct (const unsigned number)
+  ATTRIBUTE_HIDDEN;
+
 #if !IN_LIBGCOV && IN_GCOV != 1
 GCOV_LINKAGE void gcov_read_module_info (struct gcov_module_info *mod_info,
 					 gcov_unsigned_t len) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE void print_load_latency_line (FILE *fp,
+                                           const gcov_pmu_ll_info_t *ll_info,
+                                           const int print_newline);
+GCOV_LINKAGE void
+print_branch_mispredict_line (FILE *fp, const gcov_pmu_brm_info_t *brm_info,
+                              const int print_newline);
+GCOV_LINKAGE void print_pmu_tool_header (FILE *fp,
+                                         gcov_pmu_tool_header_t *tool_header,
+                                         const int print_newline);
 #endif
 
+#if IN_GCOV != 1
+GCOV_LINKAGE void destroy_pmu_tool_header (gcov_pmu_tool_header_t *tool_header)
+  ATTRIBUTE_HIDDEN;
+#endif
+
 #if IN_LIBGCOV
 /* Available only in libgcov */
 GCOV_LINKAGE void gcov_write_counter (gcov_type) ATTRIBUTE_HIDDEN;
@@ -771,10 +904,10 @@ 
 static void gcov_rewrite (void);
 GCOV_LINKAGE void gcov_seek (gcov_position_t /*position*/) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE void gcov_truncate (void) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE gcov_unsigned_t gcov_string_length (const char *) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE unsigned gcov_gcda_file_size (struct gcov_info *);
 #else
 /* Available outside libgcov */
-GCOV_LINKAGE const char *gcov_read_string (void);
 GCOV_LINKAGE void gcov_sync (gcov_position_t /*base*/,
 			     gcov_unsigned_t /*length */);
 #endif
@@ -782,11 +915,11 @@ 
 #if !IN_GCOV
 /* Available outside gcov */
 GCOV_LINKAGE void gcov_write_unsigned (gcov_unsigned_t) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE void gcov_write_string (const char *) ATTRIBUTE_HIDDEN;
 #endif
 
 #if !IN_GCOV && !IN_LIBGCOV
 /* Available only in compiler */
-GCOV_LINKAGE void gcov_write_string (const char *);
 GCOV_LINKAGE gcov_position_t gcov_write_tag (gcov_unsigned_t);
 GCOV_LINKAGE void gcov_write_length (gcov_position_t /*position*/);
 #endif
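
The structure comments above describe a fixed-point encoding: a percentage is multiplied by 100, so it is stored out of 10000, and only the integer part is kept; convert_unsigned_to_pct in gcov-io.c reverses this by dividing by 100. The body of convert_pct_to_unsigned in pmu-profile.c is not part of this excerpt, so the encoder below, pct_to_per10k, is a hypothetical sketch based only on that comment, paired with a decoder that does the same division as convert_unsigned_to_pct.

```c
#include <assert.h>

/* Hypothetical encoder: store percentage PCT as an integer out of
   10000 by multiplying by 100 and truncating to the integer part.  */
static unsigned
pct_to_per10k (float pct)
{
  assert (pct >= 0.0f);   /* percentages are never negative */
  return (unsigned) (pct * 100.0f);
}

/* Decoder: divide by 100, as convert_unsigned_to_pct does.  */
static float
per10k_to_pct (unsigned number)
{
  return (float) number / 100.0f;
}
```

Because of the truncation, a round trip can lose up to 0.01 percentage points: 0.125% is stored as 12 and printed back with "%.2f%%" as 0.12%.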
Index: gcc/opts.c
===================================================================
--- gcc/opts.c	(revision 175188)
+++ gcc/opts.c	(working copy)
@@ -36,6 +36,9 @@ 
 #include "insn-attr.h"		/* For INSN_SCHEDULING and DELAY_SLOTS.  */
 #include "target.h"
 
+/* Defined in coverage.c.  */
+extern int check_pmu_profile_options (const char *options);
+
 /* Parse the -femit-struct-debug-detailed option value
    and set the flag variables. */
 
@@ -1597,6 +1600,15 @@ 
         opts->x_flag_ipa_reference = false;
       break;
 
+    case OPT_fpmu_profile_generate_:
+      /* Ideally this should be used together with -fprofile-dir or
+         -fprofile-generate, so that a profile directory is
+         specified.  */
+      if (check_pmu_profile_options (arg))
+        error ("unrecognized -fpmu-profile-generate value %qs", arg);
+      flag_pmu_profile_generate = xstrdup (arg);
+      break;
+
     case OPT_fshow_column:
       dc->show_column = value;
       break;
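
check_pmu_profile_options is defined in coverage.c, outside this excerpt, so the syntax it accepts is not visible here. Purely as a hypothetical illustration, assuming the option value must match one of the event configuration names listed in all_pmu_tool_fns in pmu-profile.c, the check could be a lookup in a NULL-terminated table that returns nonzero on failure, which is the convention the opts.c hunk relies on. The table known_pmu_events and the function check_pmu_event_name are illustrative names, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of accepted values, copied from the names in
   all_pmu_tool_fns.  NULL marks the end of the table.  */
static const char *const known_pmu_events[] = {
  "intel-load-latency",
  "amd-load-latency",
  "intel-branch-mispredict",
  "amd-branch-mispredict",
  NULL
};

/* Return 0 if OPTIONS names a known event and nonzero otherwise,
   so that a nonzero result signals an error to the caller.  */
static int
check_pmu_event_name (const char *options)
{
  size_t i;

  if (options == NULL)
    return 1;
  for (i = 0; known_pmu_events[i] != NULL; i++)
    if (strcmp (options, known_pmu_events[i]) == 0)
      return 0;
  return 1;
}
```

A linear scan is adequate for a table this small; the real function may also accept extra parameters, which this sketch does not attempt to model.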
Index: gcc/pmu-profile.c
===================================================================
--- gcc/pmu-profile.c	(revision 0)
+++ gcc/pmu-profile.c	(revision 0)
@@ -0,0 +1,1552 @@ 
+/* Performance monitoring unit (PMU) profiler.  If available, use an
+   external tool to collect hardware performance counter data and
+   write it into the .gcda files.
+
+   Copyright (C) 2011 Free Software Foundation, Inc.
+   Contributed by Sharad Singhai <singhai@google.com>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#include "tconfig.h"
+#include "tsystem.h"
+#include "coretypes.h"
+#include "tm.h"
+#if (defined (__x86_64__) || defined (__i386__))
+#include "cpuid.h"
+#endif
+
+#if defined(inhibit_libc)
+#define IN_LIBGCOV (-1)
+#else
+#include <stdio.h>
+#include <stdlib.h>
+#define IN_LIBGCOV 1
+  #if defined(L_gcov)
+  #define GCOV_LINKAGE /* nothing */
+  #endif
+#endif
+#include "gcov-io.h"
+#ifdef TARGET_POSIX_IO
+  #include <fcntl.h>
+  #include <signal.h>
+  #include <sys/stat.h>
+  #include <sys/types.h>
+#endif
+
+#if defined(inhibit_libc)
+#else
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#define XNEWVEC(type,ne) (type *)calloc((ne),sizeof(type))
+#define XNEW(type) (type *)malloc(sizeof(type))
+#define XDELETEVEC(p) free(p)
+#define XDELETE(p) free(p)
+
+#define PFMON_CMD "/usr/bin/pfmon"
+#define ADDR2LINE_CMD "/usr/bin/addr2line"
+#define PMU_TOOL_MAX_ARGS (20)
+static char default_addr2line[] = "??:0";
+static const char pfmon_ll_header[] = "#     counts   %self    %cum     "
+    "<10     <32     <64    <256   <1024  >=1024  %wself          "
+    "code addr symbol\n";
+static const char pfmon_bm_header[] =
+    "#     counts   %self    %cum          code addr symbol\n";
+
+const char *pfmon_intel_ll_args[PMU_TOOL_MAX_ARGS] = {
+  PFMON_CMD,
+  "--aggregate-results",
+  "--follow-all",
+  "--with-header",
+  "--smpl-module=pebs-ll",
+  "--ld-lat-threshold=4",
+  "--pebs-ll-dcmiss-code",
+  "--resolve-addresses",
+  "-emem_inst_retired:LATENCY_ABOVE_THRESHOLD",
+  "--long-smpl-periods=10000",
+  0  /* terminating NULL must be present */
+};
+
+const char *pfmon_amd_ll_args[PMU_TOOL_MAX_ARGS] = {
+  PFMON_CMD,
+  "--aggregate-results",
+  "--follow-all",
+  "-uk",
+  "--with-header",
+  "--smpl-module=ibs",
+  "--resolve-addresses",
+  "-eibsop_event:uops",
+  "--ibs-dcmiss-code",
+  "--long-smpl-periods=0xffff0",
+  0  /* terminating NULL must be present */
+};
+
+const char *pfmon_intel_brm_args[PMU_TOOL_MAX_ARGS] = {
+  PFMON_CMD,
+  "--aggregate-results",
+  "--follow-all",
+  "--with-header",
+  "--resolve-addresses",
+  "-eMISPREDICTED_BRANCH_RETIRED",
+  "--long-smpl-periods=10000",
+  0  /* terminating NULL must be present */
+};
+
+const char *pfmon_amd_brm_args[PMU_TOOL_MAX_ARGS] = {
+  PFMON_CMD,
+  "--aggregate-results",
+  "--follow-all",
+  "--with-header",
+  "--resolve-addresses",
+  "-eRETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS",
+  "--long-smpl-periods=10000",
+  0  /* terminating NULL must be present */
+};
+
+const char *addr2line_args[PMU_TOOL_MAX_ARGS] = {
+  ADDR2LINE_CMD,
+  "-e",
+  0  /* terminating NULL must be present */
+};
+
+
+enum pmu_tool_type
+{
+  PTT_PFMON,
+  PTT_LAST
+};
+
+enum pmu_event_type
+{
+  PET_INTEL_LOAD_LATENCY,
+  PET_AMD_LOAD_LATENCY,
+  PET_INTEL_BRANCH_MISPREDICT,
+  PET_AMD_BRANCH_MISPREDICT,
+  PET_LAST
+};
+
+typedef struct pmu_tool_fns {
+  const char *name;     /* name of the pmu tool */
+  /* pmu tool commandline argument.  */
+  const char **arg_array;
+  /* Initialize pmu module.  */
+  void *(*init_pmu_module) (void);
+  /* Start profiling.  */
+  void (*start_pmu_module) (pid_t ppid, char *tmpfile, const char **args);
+  /* Stop profiling.  */
+  void (*stop_pmu_module) (void);
+  /* How to parse the output generated by the PMU tool.  */
+  int (*parse_pmu_output) (char *filename, void *pmu_data);
+  /* How to write parsed pmu data into gcda file.  */
+  void (*gcov_write_pmu_data) (void *data);
+  /* How to cleanup any data structure created during parsing.  */
+  void (*cleanup_pmu_data) (void *data);
+  /* How to initialize symbolizer for the PPID.  */
+  int (*start_symbolizer) (pid_t ppid);
+  void (*end_symbolizer) (void);
+  char *(*symbolize) (void *addr);
+} pmu_tool_fns;
+
+enum pmu_state
+{
+  PMU_NONE,             /* Not configured at all.  */
+  PMU_INITIALIZED,      /* Configured and initialized.  */
+  PMU_ERROR,            /* Configuration error. Cannot recover.  */
+  PMU_ON,               /* Currently profiling.  */
+  PMU_OFF               /* Currently stopped, but can be restarted.  */
+};
+
+enum cpu_vendor_signature
+{
+  CPU_VENDOR_UKNOWN = 0,
+  CPU_VENDOR_INTEL  = 0x756e6547, /* Genu */
+  CPU_VENDOR_AMD    = 0x68747541 /* Auth */
+};
+
+/* Info about pmu tool during the run time.  */
+struct pmu_tool_info
+{
+  /* Current pmu tool.  */
+  enum pmu_tool_type tool;
+  /* Current event.  */
+  enum pmu_event_type event;
+  /* filename for storing the pmu profile.  */
+  char *pmu_profile_filename;
+  /* Intermediate file where the tool stores the PMU data.  */
+  char *raw_pmu_profile_filename;
+  /* Where PMU tool's stderr should be stored.  */
+  char *tool_stderr_filename;
+  enum pmu_state pmu_profiling_state;
+  enum cpu_vendor_signature cpu_vendor; /* as discovered by cpuid */
+  pid_t pmu_tool_pid;   /* process id of the pmu tool */
+  pid_t symbolizer_pid; /* process id of the symbolizer */
+  int symbolizer_to_pipefd[2]; /* pipe for writing to the symbolizer */
+  int symbolizer_from_pipefd[2];  /* pipe for reading from the symbolizer */
+  void *pmu_data;       /* an opaque pointer for the tool to store pmu data */
+  int verbose;          /* turn on additional debugging */
+  unsigned top_n_address;  /* how many addresses to symbolize */
+  pmu_tool_fns *tool_details;  /* list of functions how to start/stop/parse */
+};
+
+/* Global struct for recordkeeping.  */
+static struct pmu_tool_info *the_pmu_tool_info;
+
+/* Additional info is printed if these are non-zero.  */
+static int tool_debug = 0;
+static int sym_debug = 0;
+
+static int parse_load_latency_line (char *line, gcov_pmu_ll_info_t *ll_info);
+static int parse_branch_mispredict_line (char *line,
+                                         gcov_pmu_brm_info_t *brm_info);
+static unsigned convert_pct_to_unsigned (float pct);
+static void start_pfmon_module (pid_t ppid, char *tmpfile, const char **pfmon_args);
+static void *init_pmu_load_latency (void);
+static void *init_pmu_branch_mispredict (void);
+static void destroy_load_latency_infos (void *info);
+static void destroy_branch_mispredict_infos (void *info);
+static int parse_pfmon_load_latency (char *filename, void *pmu_data);
+static int parse_pfmon_branch_mispredicts (char *filename, void *pmu_data);
+static gcov_unsigned_t gcov_tag_pmu_tool_header_length (gcov_pmu_tool_header_t
+                                                        *header);
+static void gcov_write_tool_header (gcov_pmu_tool_header_t *header);
+static void gcov_write_load_latency_infos (void *info);
+static void gcov_write_branch_mispredict_infos (void *info);
+static void gcov_write_ll_line (const gcov_pmu_ll_info_t *ll_info);
+static void gcov_write_branch_mispredict_line (const gcov_pmu_brm_info_t
+                                               *brm_info);
+static int start_addr2line_symbolizer (pid_t pid);
+static void end_addr2line_symbolizer (void);
+static char *symbolize_addr2line (void *p);
+static void reset_symbolizer_parent_pipes (void);
+static void reset_symbolizer_child_pipes (void);
+/* Parse and cache relevant tool info.  */
+static int parse_pmu_profile_options (const char *options);
+static gcov_pmu_tool_header_t *parse_pfmon_tool_header (FILE *fp,
+                                                        const char *end_header);
+
+
+/* How to access the necessary functions for the PMU tools.  */
+pmu_tool_fns all_pmu_tool_fns[PTT_LAST][PET_LAST] = {
+  {
+    {
+      "intel-load-latency",             /* name */
+      pfmon_intel_ll_args,              /* tool args */
+      init_pmu_load_latency,            /* initialization */
+      start_pfmon_module,               /* start */
+      0,                                /* stop */
+      parse_pfmon_load_latency,         /* parse */
+      gcov_write_load_latency_infos,    /* write */
+      destroy_load_latency_infos,       /* cleanup */
+      start_addr2line_symbolizer,       /* start symbolizer */
+      end_addr2line_symbolizer,         /* end symbolizer */
+      symbolize_addr2line,              /* symbolize */
+    },
+    {
+      "amd-load-latency",               /* name */
+      pfmon_amd_ll_args,                /* tool args */
+      init_pmu_load_latency,            /* initialization */
+      start_pfmon_module,               /* start */
+      0,                                /* stop */
+      parse_pfmon_load_latency,         /* parse */
+      gcov_write_load_latency_infos,    /* write */
+      destroy_load_latency_infos,       /* cleanup */
+      start_addr2line_symbolizer,       /* start symbolizer */
+      end_addr2line_symbolizer,         /* end symbolizer */
+      symbolize_addr2line,              /* symbolize */
+    },
+    {
+      "intel-branch-mispredict",        /* name */
+      pfmon_intel_brm_args,             /* tool args */
+      init_pmu_branch_mispredict,       /* initialization */
+      start_pfmon_module,               /* start */
+      0,                                /* stop */
+      parse_pfmon_branch_mispredicts,   /* parse */
+      gcov_write_branch_mispredict_infos,/* write */
+      destroy_branch_mispredict_infos,  /* cleanup */
+      start_addr2line_symbolizer,       /* start symbolizer */
+      end_addr2line_symbolizer,         /* end symbolizer */
+      symbolize_addr2line,              /* symbolize */
+    },
+    {
+      "amd-branch-mispredict",          /* name */
+      pfmon_amd_brm_args,               /* tool args */
+      init_pmu_branch_mispredict,       /* initialization */
+      start_pfmon_module,               /* start */
+      0,                                /* stop */
+      parse_pfmon_branch_mispredicts,   /* parse */
+      gcov_write_branch_mispredict_infos,/* write */
+      destroy_branch_mispredict_infos,  /* cleanup */
+      start_addr2line_symbolizer,       /* start symbolizer */
+      end_addr2line_symbolizer,         /* end symbolizer */
+      symbolize_addr2line,              /* symbolize */
+    }
+  }
+};
+
+/* Determine the CPU vendor.  Currently this only distinguishes x86-based
+   CPUs whose vendor is Intel or AMD; anything else, including non-x86
+   hosts, yields CPU_VENDOR_UKNOWN.  Returns one of the enum
+   cpu_vendor_signatures values.  */
+
+static unsigned int
+get_x86cpu_vendor (void)
+{
+  unsigned int vendor = CPU_VENDOR_UKNOWN;
+
+#if (defined (__x86_64__) || defined (__i386__))
+  if (__get_cpuid_max (0, &vendor) < 1)
+    return CPU_VENDOR_UKNOWN;      /* Cannot determine cpu type.  */
+#endif
+
+  if (vendor == CPU_VENDOR_INTEL || vendor == CPU_VENDOR_AMD)
+    return vendor;
+  else
+    return CPU_VENDOR_UKNOWN;
+}
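An aside on why comparing a single register is enough: CPUID leaf 0 returns the 12-character vendor string split across EBX, EDX and ECX, and the EBX parts of "GenuineIntel" and "AuthenticAMD" ("Genu" and "Auth") already differ. The sketch below rebuilds the string portably from register values. The function and enum names are hypothetical, and on real hardware the values would come from a CPUID call such as GCC's `__get_cpuid (0, ...)` in `<cpuid.h>`.

```c
#include <stdint.h>

/* Hypothetical vendor classification, mirroring the patch's
   CPU_VENDOR_* constants without depending on them.  */
enum x86_vendor { X86_VENDOR_UNKNOWN, X86_VENDOR_INTEL, X86_VENDOR_AMD };

/* Copy the four bytes of REG into OUT, least significant byte first,
   which is how CPUID packs the vendor characters.  */
static void
copy_reg_chars (uint32_t reg, char *out)
{
  int i;
  for (i = 0; i < 4; i++)
    out[i] = (char) ((reg >> (8 * i)) & 0xff);
}

/* Rebuild the vendor string from the CPUID leaf 0 registers, in the
   order EBX, EDX, ECX.  OUT must have room for 13 bytes.  */
void
cpuid_vendor_string (uint32_t ebx, uint32_t edx, uint32_t ecx, char *out)
{
  copy_reg_chars (ebx, out);
  copy_reg_chars (edx, out + 4);
  copy_reg_chars (ecx, out + 8);
  out[12] = '\0';
}

/* Classify the vendor from EBX alone, as get_x86cpu_vendor does.  */
enum x86_vendor
classify_vendor_ebx (uint32_t ebx)
{
  if (ebx == 0x756e6547u)       /* "Genu" */
    return X86_VENDOR_INTEL;
  if (ebx == 0x68747541u)       /* "Auth" */
    return X86_VENDOR_AMD;
  return X86_VENDOR_UNKNOWN;
}
```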
+
+
+/* Parse the PMU tool option string OPTIONS provided on the command
+   line and store the information in the_pmu_tool_info.  Return 0 on
+   success, otherwise return 1.  Any changes to this should be synced
+   with check_pmu_profile_options(), which does the compile-time check.  */
+
+static int
+parse_pmu_profile_options (const char *options)
+{
+  enum pmu_tool_type ptt = the_pmu_tool_info->tool;
+  enum pmu_event_type pet = PET_LAST;
+  const char *pmutool_path;
+
+  /* Determine the platform we are running on.  */
+  the_pmu_tool_info->cpu_vendor = get_x86cpu_vendor ();
+  if (the_pmu_tool_info->cpu_vendor == CPU_VENDOR_UKNOWN)
+    {
+      /* Cpuid failed or unknown vendor.  */
+      the_pmu_tool_info->pmu_profiling_state = PMU_ERROR;
+      return 1;
+    }
+
+  /* Validate the options.  */
+  if (strcmp(options, "load-latency") &&
+      strcmp(options, "load-latency-verbose") &&
+      strcmp(options, "branch-mispredict") &&
+      strcmp(options, "branch-mispredict-verbose"))
+    return 1;
+
+  /* Check if are aksed to collect load latency PMU data.  */
+  if (!strcmp(options, "load-latency") ||
+      !strcmp(options, "load-latency-verbose"))
+    {
+      if (the_pmu_tool_info->cpu_vendor == CPU_VENDOR_INTEL)
+        pet = PET_INTEL_LOAD_LATENCY;
+      else
+        pet = PET_AMD_LOAD_LATENCY;
+      if (!strcmp(options, "load-latency-verbose"))
+        the_pmu_tool_info->verbose = 1;
+    }
+
+  /* Check if are aksed to collect branch mispredict PMU data.  */
+  if (!strcmp(options, "branch-mispredict") ||
+      !strcmp(options, "branch-mispredict-verbose"))
+    {
+      if (the_pmu_tool_info->cpu_vendor == CPU_VENDOR_INTEL)
+        pet = PET_INTEL_BRANCH_MISPREDICT;
+      else
+        pet = PET_AMD_BRANCH_MISPREDICT;
+      if (!strcmp(options, "branch-mispredict-verbose"))
+        the_pmu_tool_info->verbose = 1;
+    }
+
+  the_pmu_tool_info->tool_details = &all_pmu_tool_fns[ptt][pet];
+  the_pmu_tool_info->event = pet;
+
+  /* Allow users to override the default tool path.  */
+  pmutool_path = getenv ("GCOV_PMUTOOL_PATH");
+  if (pmutool_path && strlen (pmutool_path))
+    the_pmu_tool_info->tool_details->arg_array[0] = pmutool_path;
+
+  return 0;
+}
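The four accepted option strings could also live in one lookup table, so that keeping check_pmu_profile_options() in sync becomes a matter of sharing the table. A minimal sketch, assuming hypothetical `event_kind` values in place of the patch's `pmu_event_type` enumerators:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the patch's pmu_event_type values.  */
enum event_kind { EV_INTEL_LL, EV_AMD_LL, EV_INTEL_BRM, EV_AMD_BRM, EV_NONE };

struct option_entry
{
  const char *name;        /* option text accepted on the command line */
  enum event_kind intel;   /* event to use on Intel CPUs */
  enum event_kind amd;     /* event to use on AMD CPUs */
  int verbose;             /* nonzero for the -verbose variants */
};

static const struct option_entry option_table[] = {
  { "load-latency",              EV_INTEL_LL,  EV_AMD_LL,  0 },
  { "load-latency-verbose",      EV_INTEL_LL,  EV_AMD_LL,  1 },
  { "branch-mispredict",         EV_INTEL_BRM, EV_AMD_BRM, 0 },
  { "branch-mispredict-verbose", EV_INTEL_BRM, EV_AMD_BRM, 1 },
};

/* Look up OPTIONS; IS_INTEL selects the vendor column.  On a match,
   store the event and verbosity and return 0; otherwise return 1.  */
int
lookup_pmu_option (const char *options, int is_intel,
                   enum event_kind *event, int *verbose)
{
  size_t i;
  for (i = 0; i < sizeof option_table / sizeof option_table[0]; i++)
    if (strcmp (options, option_table[i].name) == 0)
      {
        *event = is_intel ? option_table[i].intel : option_table[i].amd;
        *verbose = option_table[i].verbose;
        return 0;
      }
  return 1;
}
```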
+
+/* Start an addr2line symbolizer for the process given by TASK_PID.
+   Fork an addr2line process and create two pipes: addresses are
+   written to one and source_filename:line_num entries are read back
+   from the other.  Returns 0 on success, non-zero otherwise.  */
+
+static int
+start_addr2line_symbolizer (pid_t task_pid)
+{
+  pid_t pid;
+  char *addr2line_path;
+
+  /* Allow users to override the default addr2line path.  */
+  addr2line_path = getenv ("GCOV_ADDR2LINE_PATH");
+  if (addr2line_path && strlen (addr2line_path))
+    addr2line_args[0] = addr2line_path;
+
+  if (pipe (the_pmu_tool_info->symbolizer_from_pipefd) == -1)
+    {
+      fprintf (stderr, "Cannot create symbolizer write pipe.\n");
+      return 1;
+    }
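+  if (pipe (the_pmu_tool_info->symbolizer_to_pipefd) == -1)
+    {
+      fprintf (stderr, "Cannot create symbolizer read pipe.\n");
+      /* Do not leak the descriptors of the pipe created above.  */
+      reset_symbolizer_parent_pipes ();
+      reset_symbolizer_child_pipes ();
+      return 1;
+    }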
+
+  pid = fork ();
+  if (pid == -1)
+    {
+      /* error condition */
+      fprintf (stderr, "Cannot create symbolizer process.\n");
+      reset_symbolizer_parent_pipes ();
+      reset_symbolizer_child_pipes ();
+      return 1;
+    }
+
+  if (pid == 0)
+    {
+      /* Child: connect the pipes to stdin/stdout, then exec addr2line.  */
+      unsigned n_args = 0;
+      char proc_exe_buf[128];
+      int new_write_fd, new_read_fd;
+      int i;
+
+      /* Go over the current addr2line args.  */
+      for (i = 0; i < PMU_TOOL_MAX_ARGS && addr2line_args[i]; ++i)
+        n_args++;
+
+      /* We are going to add one more arg for the /proc/pid/exe */
+      if (n_args >= (PMU_TOOL_MAX_ARGS - 1))
+        {
+          fprintf (stderr, "too many addr2line args: %u\n", n_args);
+          _exit (1);
+        }
+      snprintf (proc_exe_buf, sizeof (proc_exe_buf), "/proc/%d/exe",
+                (int) task_pid);
+
+      /* Add the extra arg for the process id.  */
+      addr2line_args[n_args] = proc_exe_buf;
+      n_args++;
+
+      addr2line_args[n_args] = (const char *)NULL;  /* terminating NULL */
+
+      if (sym_debug)
+        {
+          fprintf (stderr, "addr2line args:");
+          for (i = 0; i < PMU_TOOL_MAX_ARGS && addr2line_args[i]; ++i)
+            fprintf (stderr, " %s", addr2line_args[i]);
+          fprintf (stderr, "\n");
+        }
+
+      /* Close unused ends of the two pipes.  */
+      reset_symbolizer_child_pipes ();
+
+      /* Connect the pipes to stdin/stdout of the child process.  */
+      new_read_fd = dup2 (the_pmu_tool_info->symbolizer_to_pipefd[0], 0);
+      new_write_fd = dup2 (the_pmu_tool_info->symbolizer_from_pipefd[1], 1);
+      if (new_read_fd == -1 || new_write_fd == -1)
+        {
+          fprintf (stderr, "could not dup symbolizer fds\n");
+          reset_symbolizer_parent_pipes ();
+          reset_symbolizer_child_pipes ();
+          _exit (1);
+        }
+      the_pmu_tool_info->symbolizer_to_pipefd[0] = new_read_fd;
+      the_pmu_tool_info->symbolizer_from_pipefd[1] = new_write_fd;
+
+      /* Exec with an empty environment.  POSIX requires ENVP to be a
+         valid array, so pass one that holds only the terminator.  */
+      {
+        char *const empty_env[] = { NULL };
+        execve (addr2line_args[0], (char * const*)addr2line_args,
+                empty_env);
+      }
+      /* exec returned, an error condition.  */
+      fprintf (stderr, "could not create symbolizer process: %s\n",
+               addr2line_args[0]);
+      reset_symbolizer_parent_pipes ();
+      reset_symbolizer_child_pipes ();
+      _exit (1);
+    }
+  else
+    {
+      /* parent */
+      the_pmu_tool_info->symbolizer_pid = pid;
+      /* Close unused ends of the two pipes.  */
+      reset_symbolizer_parent_pipes ();
+      return 0;
+    }
+  return 0;
+}
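The pipe, fork and dup2 sequence above is the usual coprocess pattern. The sketch below is a self-contained version with hypothetical names (`struct coproc`, `coproc_start`, `coproc_stop`); unlike the patch, it searches PATH with `execvp` and closes the original pipe descriptors in the child once they have been duplicated. Running `cat` as the child gives an echo server that can stand in for addr2line in a test.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A child process whose stdin/stdout are connected to two pipes.  */
struct coproc
{
  pid_t pid;
  int to_child;     /* parent writes requests here */
  int from_child;   /* parent reads replies here */
};

/* Start ARGV[0] (searched in PATH) as a coprocess.  Return 0 on success.  */
int
coproc_start (struct coproc *cp, char *const argv[])
{
  int to[2], from[2];

  if (pipe (to) == -1)
    return 1;
  if (pipe (from) == -1)
    {
      close (to[0]);
      close (to[1]);
      return 1;
    }

  cp->pid = fork ();
  if (cp->pid == -1)
    {
      close (to[0]);
      close (to[1]);
      close (from[0]);
      close (from[1]);
      return 1;
    }

  if (cp->pid == 0)
    {
      /* Child: requests arrive on stdin, replies leave on stdout.  */
      dup2 (to[0], STDIN_FILENO);
      dup2 (from[1], STDOUT_FILENO);
      /* The original descriptors are redundant after dup2.  */
      close (to[0]);
      close (to[1]);
      close (from[0]);
      close (from[1]);
      execvp (argv[0], argv);
      _exit (127);              /* exec failed */
    }

  /* Parent: keep only the ends it uses.  */
  close (to[0]);
  close (from[1]);
  cp->to_child = to[1];
  cp->from_child = from[0];
  return 0;
}

/* Close both pipes, terminate the child and reap it.  */
void
coproc_stop (struct coproc *cp)
{
  int status;
  close (cp->to_child);
  close (cp->from_child);
  kill (cp->pid, SIGTERM);
  waitpid (cp->pid, &status, 0);
}
```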
+
+/* Close unused write end of the from-pipe and read end of the
+   to-pipe.  */
+
+static void
+reset_symbolizer_parent_pipes (void)
+{
+  if (the_pmu_tool_info->symbolizer_from_pipefd[1] != -1)
+    {
+      close (the_pmu_tool_info->symbolizer_from_pipefd[1]);
+      the_pmu_tool_info->symbolizer_from_pipefd[1] = -1;
+    }
+  if (the_pmu_tool_info->symbolizer_to_pipefd[0] != -1)
+    {
+      close (the_pmu_tool_info->symbolizer_to_pipefd[0]);
+      the_pmu_tool_info->symbolizer_to_pipefd[0] = -1;
+    }
+}
+
+/* Close unused write end of the to-pipe and read end of the
+   from-pipe.  */
+
+static void
+reset_symbolizer_child_pipes (void)
+{
+  if (the_pmu_tool_info->symbolizer_to_pipefd[1] != -1)
+    {
+      close (the_pmu_tool_info->symbolizer_to_pipefd[1]);
+      the_pmu_tool_info->symbolizer_to_pipefd[1] = -1;
+    }
+  if (the_pmu_tool_info->symbolizer_from_pipefd[0] != -1)
+    {
+      close (the_pmu_tool_info->symbolizer_from_pipefd[0]);
+      the_pmu_tool_info->symbolizer_from_pipefd[0] = -1;
+    }
+}
+
+
+/* Perform cleanup for the symbolizer process.  */
+
+static void
+end_addr2line_symbolizer (void)
+{
+  int pid_status = 0;
+  int wait_status;
+  pid_t pid = the_pmu_tool_info->symbolizer_pid;
+
+  /* Symbolizer was not running.  */
+  if (!pid)
+    return;
+
+  reset_symbolizer_parent_pipes ();
+  reset_symbolizer_child_pipes ();
+  kill (pid, SIGTERM);
+  wait_status = waitpid (pid, &pid_status, 0);
+  if (sym_debug)
+    {
+      if (wait_status == pid)
+        fprintf (stderr, "Normal exit. symbolizer terminated.\n");
+      else
+        fprintf (stderr, "Abnormal exit. symbolizer status, %d.\n",
+                 pid_status);
+    }
+  the_pmu_tool_info->symbolizer_pid = 0;  /* Symbolizer no longer running.  */
+}
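end_addr2line_symbolizer only reports whether waitpid returned the expected pid and prints the raw status word. The standard `WIFEXITED` and `WIFSIGNALED` macros decode that word; after `kill (pid, SIGTERM)` the expected result is termination by SIGTERM, unless addr2line had already exited. A sketch with a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Write a human-readable description of waitpid STATUS into BUF.  */
void
describe_wait_status (int status, char *buf, size_t len)
{
  if (WIFEXITED (status))
    snprintf (buf, len, "exited with status %d", WEXITSTATUS (status));
  else if (WIFSIGNALED (status))
    snprintf (buf, len, "killed by signal %d", WTERMSIG (status));
  else
    snprintf (buf, len, "stopped or unknown (raw status %d)", status);
}
```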
+
+
+/* Given an address ADDR, return a string of the form
+   source_filename:line_num.  The string is heap allocated when the
+   symbolizer answered; otherwise default_addr2line is returned.  */
+
+static char *
+symbolize_addr2line (void *addr)
+{
+  char buf[32];  /* holds the ascii version of address */
+  int write_count;
+  int read_count;
+  char *srcfile_linenum;
+  size_t max_length = 1024;
+
+  if (!the_pmu_tool_info->symbolizer_pid)
+    return default_addr2line;    /* symbolizer is not running */
+
+  write_count = snprintf (buf, sizeof (buf) - 1, "%p\n", addr);
+
+  /* Write the address into the pipe.  */
+  if (write (the_pmu_tool_info->symbolizer_to_pipefd[1], buf, write_count)
+      < write_count)
+    {
+      if (sym_debug)
+        fprintf (stderr, "Cannot write symbolizer pipe.\n");
+      return default_addr2line;
+    }
+
+  srcfile_linenum = XNEWVEC (char, max_length);
+  /* Leave room for the terminating NUL added below.  */
+  read_count = read (the_pmu_tool_info->symbolizer_from_pipefd[0],
+                     srcfile_linenum, max_length - 1);
+  if (read_count == -1)
+    {
+      if (sym_debug)
+        fprintf (stderr, "Cannot read symbolizer pipe.\n");
+      XDELETEVEC (srcfile_linenum);
+      return default_addr2line;
+    }
+
+  srcfile_linenum[read_count] = 0;
+  if (sym_debug)
+    fprintf (stderr, "symbolizer: for address %p, read_count %d, got %s\n",
+             addr, read_count, srcfile_linenum);
+  return srcfile_linenum;
+}
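symbolize_addr2line issues a single read(), which may return only part of a reply. One simple alternative is to read byte by byte up to the newline, always leaving room for the terminating NUL; the extra system calls matter little for one short line per sampled address. A sketch with a hypothetical helper name:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read one newline-terminated reply from FD into BUF, which holds CAP
   bytes (CAP > 0).  Reading one byte at a time never consumes data
   past the newline.  The newline is dropped and BUF is always
   NUL-terminated; a reply longer than CAP - 1 bytes is truncated and
   the remainder is left unread.  Returns the length stored, or -1 if
   EOF or an error occurred before any byte was read.  */
long
read_reply_line (int fd, char *buf, size_t cap)
{
  size_t n = 0;
  int saw_newline = 0;

  while (n + 1 < cap)
    {
      char c;
      ssize_t r = read (fd, &c, 1);
      if (r == -1 && errno == EINTR)
        continue;               /* interrupted by a signal; retry */
      if (r <= 0)
        break;                  /* error or EOF */
      if (c == '\n')
        {
          saw_newline = 1;
          break;
        }
      buf[n++] = c;
    }
  buf[n] = '\0';
  return (n > 0 || saw_newline) ? (long) n : -1;
}
```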
+
+/* Start monitoring process PPID with pfmon, storing the raw samples
+   in TMPFILE and using PFMON_ARGS as the base command line.  On
+   success this execs pfmon and does not return.  */
+
+static void
+start_pfmon_module (pid_t ppid, char *tmpfile, const char **pfmon_args)
+{
+  int i;
+  unsigned int n_args = 0;
+  unsigned n_chars;
+  char pid_buf[64];
+  char filename_buf[1024];
+  char top_n_buf[24];
+  unsigned extra_args;
+
+  /* Go over the current pfmon args */
+  for (i = 0; i < PMU_TOOL_MAX_ARGS && pfmon_args[i]; ++i)
+    n_args++;
+
+  if (the_pmu_tool_info->verbose)
+    extra_args = 4; /* account for additional --verbose */
+  else
+    extra_args = 3;
+
+  /* We are going to add args.  */
+  if (n_args >= (PMU_TOOL_MAX_ARGS - extra_args))
+    {
+      fprintf (stderr, "too many pfmon args: %u\n", n_args);
+      _exit (1);
+    }
+
+  n_chars = snprintf (pid_buf, sizeof (pid_buf), "--attach-task=%ld",
+                      (long)ppid);
+  if (n_chars >= sizeof (pid_buf))
+    {
+      fprintf (stderr, "pfmon task id too long: %s\n", pid_buf);
+      return;
+    }
+  pfmon_args[n_args] = pid_buf;
+  n_args++;
+
+  n_chars = snprintf (filename_buf, sizeof (filename_buf), "--smpl-outfile=%s",
+                      tmpfile);
+  if (n_chars >= sizeof (filename_buf))
+    {
+      fprintf (stderr, "pfmon filename too long: %s\n", filename_buf);
+      return;
+    }
+  pfmon_args[n_args] = filename_buf;
+  n_args++;
+
+  n_chars = snprintf (top_n_buf, sizeof (top_n_buf), "--smpl-show-top=%d",
+                      the_pmu_tool_info->top_n_address);
+  if (n_chars >= sizeof (top_n_buf))
+    {
+      fprintf (stderr, "pfmon option too long: %s\n", top_n_buf);
+      return;
+    }
+  pfmon_args[n_args] = top_n_buf;
+  n_args++;
+
+  if (the_pmu_tool_info->verbose)
+    {
+      /* Add --verbose as well.  */
+      pfmon_args[n_args] = "--verbose";
+      n_args++;
+    }
+  pfmon_args[n_args] = (char *)NULL;
+
+  if (tool_debug)
+    {
+      fprintf (stderr, "pfmon args:");
+      for (i = 0; i < PMU_TOOL_MAX_ARGS && pfmon_args[i]; ++i)
+        fprintf (stderr, " %s", pfmon_args[i]);
+      fprintf (stderr, "\n");
+    }
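+  /* Exec with an empty environment.  POSIX requires ENVP to be a
+     valid array, so pass one that holds only the terminator.  */
+  {
+    char *const empty_env[] = { NULL };
+    execve (pfmon_args[0], (char *const *)pfmon_args, empty_env);
+  }
+  /* Only reached if execve failed; the caller reports the error.  */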
+}
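start_pfmon_module appends `--attach-task`, `--smpl-outfile`, `--smpl-show-top` and optionally `--verbose`, checking each snprintf result for truncation. A small builder that owns its storage keeps those checks in one place. In this sketch, `ARGV_MAX`, `ARG_LEN` and the builder type are hypothetical, and the resulting `argv` array stays NULL-terminated as execve expects:

```c
#include <stdarg.h>
#include <stdio.h>

#define ARGV_MAX 16   /* hypothetical limit, in the spirit of PMU_TOOL_MAX_ARGS */
#define ARG_LEN 256   /* hypothetical per-argument buffer size */

/* An argv under construction: fixed storage plus a NULL-terminated
   pointer array.  Zero-initialize before use.  */
struct argv_builder
{
  char storage[ARGV_MAX][ARG_LEN];
  char *argv[ARGV_MAX + 1];
  int argc;
};

/* Append a printf-formatted argument.  Returns 0 on success, or 1 if
   the array is full or the argument would be truncated.  */
int
argv_add (struct argv_builder *b, const char *fmt, ...)
{
  va_list ap;
  int n;

  if (b->argc >= ARGV_MAX)
    return 1;
  va_start (ap, fmt);
  n = vsnprintf (b->storage[b->argc], ARG_LEN, fmt, ap);
  va_end (ap);
  if (n < 0 || n >= ARG_LEN)
    return 1;                   /* encoding error or truncated */
  b->argv[b->argc] = b->storage[b->argc];
  b->argc++;
  b->argv[b->argc] = NULL;      /* keep the array terminated */
  return 0;
}
```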
+
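+/* Convert a percentage PCT (for example 24.06) to an unsigned integer
+   in hundredths of a percent (2406).  Round to nearest rather than
+   truncate, because the float nearest to the decimal value may sit
+   slightly below it.  */
+
+static unsigned
+convert_pct_to_unsigned (float pct)
+{
+  return (unsigned)(pct * 100.0 + 0.5);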
+}
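Why rounding matters here: 24.06 has no exact float representation, and the nearest float is about 24.0599995, so `(unsigned)(pct * 100.0)` yields 2405 rather than 2406. A sketch that parses a pfmon percentage field and rounds to hundredths (the helper name is hypothetical):

```c
#include <stdio.h>

/* Parse a pfmon percentage such as "24.06%" and return it in
   hundredths of a percent, rounded to nearest.  Returns -1 if TEXT
   does not start with a non-negative number.  */
long
pct_text_to_hundredths (const char *text)
{
  float pct;
  if (sscanf (text, "%f", &pct) != 1 || pct < 0.0f)
    return -1;
  /* Adding 0.5 before the cast rounds instead of truncating.  */
  return (long) (pct * 100.0 + 0.5);
}
```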
+
+/* Parse the load latency info pointed by LINE and save it into
+   LL_INFO. Returns 0 if the line was parsed successfully, non-zero
+   otherwise.
+
+   An example header line and data line look like this:
+   "counts   %self    %cum     <10     <32     <64    <256   <1024  >=1024
+   %wself          code addr symbol"
+   "218  24.06%  24.06% 100.00%   0.00%   0.00%   0.00%   0.00%   0.00%  22.70%
+   0x0000000000413e75 CalcSSIM(...)+965</tmp/psnr>"
+*/
+
+static int
+parse_load_latency_line (char *line, gcov_pmu_ll_info_t *ll_info)
+{
+  unsigned counts;
+  /* These are percentages parsed as floats, but then converted to
+     integers after multiplying by 100.  */
+  float self, cum, lt_10, lt_32, lt_64, lt_256, lt_1024, gt_1024, wself;
+  long unsigned int p;
+  int n_values;
+  pmu_tool_fns *tool_details = the_pmu_tool_info->tool_details;
+
+  n_values = sscanf (line, "%u%f%%%f%%%f%%%f%%%f%%%f%%%f%%%f%%%f%%%lx",
+                     &counts, &self, &cum, &lt_10, &lt_32, &lt_64, &lt_256,
+                     &lt_1024, &gt_1024, &wself, &p);
+  if (n_values != 11)
+    return 1;
+
+  /* Values read successfully.  Store them, converting the
+     percentages into integers.  */
+  ll_info->counts = counts;
+  ll_info->self = convert_pct_to_unsigned (self);
+  ll_info->cum = convert_pct_to_unsigned (cum);
+  ll_info->lt_10 = convert_pct_to_unsigned (lt_10);
+  ll_info->lt_32 = convert_pct_to_unsigned (lt_32);
+  ll_info->lt_64 = convert_pct_to_unsigned (lt_64);
+  ll_info->lt_256 = convert_pct_to_unsigned (lt_256);
+  ll_info->lt_1024 = convert_pct_to_unsigned (lt_1024);
+  ll_info->gt_1024 = convert_pct_to_unsigned (gt_1024);
+  ll_info->wself = convert_pct_to_unsigned (wself);
+  ll_info->code_addr = p;
+
+  /* Run the raw address through the symbolizer.  */
+  if (tool_details->symbolize)
+    {
+      char *sym_info = tool_details->symbolize ((void *)p);
+      /* sym_info is of the form src_filename:linenum.  Discriminator is
+         currently not supported by addr2line.  */
+      char *sep = strchr (sym_info, ':');
+      if (!sep)
+        {
+          /* Assume entire string is srcfile.  */
+          ll_info->filename = (char *)sym_info;
+          ll_info->line = 0;
+        }
+      else
+        {
+          /* Terminate the filename string at the separator.  */
+          *sep = 0;
+          ll_info->filename = (char *)sym_info;
+          /* Convert rest of the sym info to a line number.  */
+          ll_info->line = atol (sep+1);
+        }
+      ll_info->discriminator = 0;
+    }
+  else
+    {
+      /* No symbolizer available.  */
+      ll_info->filename = NULL;
+      ll_info->line = 0;
+      ll_info->discriminator = 0;
+    }
+  return 0;
+}
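The sscanf format above must contain exactly nine `%f%%` pairs between the count and the address for the `n_values != 11` check to hold. Splitting the format into literal pieces makes the count easy to verify. This sketch (hypothetical `struct ll_sample`) parses the example line from the comment, joined back into a single line:

```c
#include <stdio.h>

/* Hypothetical parsed form of one pfmon load-latency sample line.  */
struct ll_sample
{
  unsigned counts;
  float pct[9];            /* %self %cum <10 <32 <64 <256 <1024 >=1024 %wself */
  unsigned long addr;      /* code address */
};

/* Parse LINE: a count, nine "N.NN%" fields, then a hex address.
   Returns 0 if all 11 values were read, 1 otherwise.  */
int
parse_ll_sample (const char *line, struct ll_sample *s)
{
  int n = sscanf (line,
                  "%u"
                  "%f%%%f%%%f%%"        /* %self %cum <10 */
                  "%f%%%f%%%f%%"        /* <32 <64 <256 */
                  "%f%%%f%%%f%%"        /* <1024 >=1024 %wself */
                  "%lx",                /* code address */
                  &s->counts, &s->pct[0], &s->pct[1], &s->pct[2],
                  &s->pct[3], &s->pct[4], &s->pct[5], &s->pct[6],
                  &s->pct[7], &s->pct[8], &s->addr);
  return n == 11 ? 0 : 1;
}
```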
+
+/* Parse the branch mispredict info pointed by LINE and save it into
+   BRM_INFO. Returns 0 if the line was parsed successfully, non-zero
+   otherwise.
+
+   An example header line and data line look like this:
+   "counts   %self    %cum          code addr symbol"
+   "6869  37.67%  37.67% 0x00000000004007e5 sum(std::vector<int*,
+    std::allocator<int*> > const&)+51</root/tmp/array>"
+*/
+
+static int
+parse_branch_mispredict_line (char *line, gcov_pmu_brm_info_t *brm_info)
+{
+  unsigned counts;
+  /* These are percentages parsed as floats, but then converted to
+     ints after multiplying by 100.  */
+  float self, cum;
+  long unsigned int p;
+  int n_values;
+  pmu_tool_fns *tool_details = the_pmu_tool_info->tool_details;
+
+  n_values = sscanf (line, "%u%f%%%f%%%lx",
+                     &counts, &self, &cum, &p);
+  if (n_values != 4)
+    return 1;
+
+  /* Values read successfully.  Store them, converting the
+     percentages into integers.  */
+  brm_info->counts = counts;
+  brm_info->self = convert_pct_to_unsigned (self);
+  brm_info->cum = convert_pct_to_unsigned (cum);
+  brm_info->code_addr = p;
+
+  /* Run the raw address through the symbolizer.  */
+  if (tool_details->symbolize)
+    {
+      char *sym_info = tool_details->symbolize ((void *)p);
+      /* sym_info is of the form src_filename:linenum.  Discriminator is
+         currently not supported by addr2line.  */
+      char *sep = strchr (sym_info, ':');
+      if (!sep)
+        {
+          /* Assume entire string is srcfile.  */
+          brm_info->filename = sym_info;
+          brm_info->line = 0;
+        }
+      else
+        {
+          /* Terminate the filename string at the separator.  */
+          *sep = 0;
+          brm_info->filename = sym_info;
+          /* Convert rest of the sym info to a line number.  */
+          brm_info->line = atol (sep+1);
+        }
+      brm_info->discriminator = 0;
+    }
+  else
+    {
+      /* No symbolizer available.  */
+      brm_info->filename = NULL;
+      brm_info->line = 0;
+      brm_info->discriminator = 0;
+    }
+  return 0;
+}
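Both parsers split the symbolizer reply in place at the first ':', which requires a writable string and cuts short any path that contains a colon. A sketch of an alternative under those assumptions: split at the last ':' instead, and also accept a trailing ` (discriminator N)` suffix, which some addr2line versions print. The function name is hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Split addr2line-style output "FILE:LINE" in place.  The last ':' is
   used so that a ':' inside the path does not end the filename early.
   A trailing " (discriminator N)" suffix, when present, is parsed into
   *DISCRIMINATOR; otherwise *DISCRIMINATOR is 0.  */
void
split_file_line (char *text, const char **file, unsigned long *line,
                 unsigned long *discriminator)
{
  static const char disc_tag[] = " (discriminator ";
  char *disc = strstr (text, disc_tag);
  char *sep;

  *discriminator = 0;
  if (disc)
    {
      *discriminator = strtoul (disc + strlen (disc_tag), NULL, 10);
      *disc = '\0';             /* drop the suffix before splitting */
    }

  *file = text;
  *line = 0;
  sep = strrchr (text, ':');
  if (sep)
    {
      *sep = '\0';
      *line = strtoul (sep + 1, NULL, 10);
    }
}
```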
+
+/* Delete load latency info structures INFO.  */
+
+static void
+destroy_load_latency_infos (void *info)
+{
+  unsigned i;
+  ll_infos_t* ll_infos = (ll_infos_t *)info;
+
+  /* delete each element */
+  for (i = 0; i < ll_infos->ll_count; ++i)
+    XDELETE (ll_infos->ll_array[i]);
+  /* delete the array itself */
+  XDELETE (ll_infos->ll_array);
+  __destroy_pmu_tool_header (ll_infos->pmu_tool_header);
+  free (ll_infos->pmu_tool_header);
+  ll_infos->ll_array = 0;
+  ll_infos->ll_count = 0;
+}
+
+/* Delete branch mispredict structure INFO.  */
+
+static void
+destroy_branch_mispredict_infos (void *info)
+{
+  unsigned i;
+  brm_infos_t* brm_infos = (brm_infos_t *)info;
+
+  /* delete each element */
+  for (i = 0; i < brm_infos->brm_count; ++i)
+    XDELETE (brm_infos->brm_array[i]);
+  /* delete the array itself */
+  XDELETE (brm_infos->brm_array);
+  __destroy_pmu_tool_header (brm_infos->pmu_tool_header);
+  free (brm_infos->pmu_tool_header);
+  brm_infos->brm_array = 0;
+  brm_infos->brm_count = 0;
+}
+
+/* Parse FILENAME for load latency lines into a structure
+   PMU_DATA.  Returns 0 on success.  Returns non-zero on
+   failure.  */
+
+static int
+parse_pfmon_load_latency (char *filename, void *pmu_data)
+{
+  FILE *fp;
+  size_t buflen = 2*1024;
+  char *buf;
+  ll_infos_t *load_latency_infos = (ll_infos_t *)pmu_data;
+  gcov_pmu_tool_header_t *tool_header = 0;
+
+  if ((fp = fopen (filename, "r")) == NULL)
+    {
+      fprintf (stderr, "cannot open pmu data file: %s\n", filename);
+      return 1;
+    }
+
+  if (!(tool_header = parse_pfmon_tool_header (fp, pfmon_ll_header)))
+    {
+      fprintf (stderr, "cannot parse pmu data file header: %s\n", filename);
+      fclose (fp);
+      return 1;
+    }
+
+  buf = (char *) malloc (buflen);
+  while (fgets (buf, buflen, fp))
+    {
+      gcov_pmu_ll_info_t *ll_info = XNEW (gcov_pmu_ll_info_t);
+      if (!parse_load_latency_line (buf, ll_info))
+        {
+          /* valid line, add to the array */
+          load_latency_infos->ll_count++;
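+          if (load_latency_infos->ll_count >=
+              load_latency_infos->alloc_ll_count)
+            {
+              /* Double the capacity.  realloc takes a size in bytes,
+                 and the old array stays valid if it fails.  */
+              gcov_pmu_ll_info_t **new_array =
+                realloc (load_latency_infos->ll_array,
+                         2 * load_latency_infos->alloc_ll_count
+                         * sizeof (gcov_pmu_ll_info_t *));
+              if (new_array == NULL)
+                {
+                  fprintf (stderr, "Cannot allocate load latency memory.\n");
+                  load_latency_infos->ll_count--;
+                  XDELETE (ll_info);
+                  __destroy_pmu_tool_header (tool_header);
+                  free (tool_header);
+                  free (buf);
+                  fclose (fp);
+                  return 1;
+                }
+              load_latency_infos->ll_array = new_array;
+              load_latency_infos->alloc_ll_count *= 2;
+            }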
+          load_latency_infos->ll_array[load_latency_infos->ll_count - 1] =
+            ll_info;
+        }
+      else
+        /* Delete invalid line.  */
+        XDELETE (ll_info);
+    }
+  free (buf);
+  fclose (fp);
+  load_latency_infos->pmu_tool_header = tool_header;
+  return 0;
+}
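The array growth in both file parsers is meant to double the capacity as samples arrive. The two details that are easy to get wrong are that realloc takes a size in bytes, not an element count, and that the old block stays valid when realloc fails. A generic sketch with a hypothetical `struct ptr_vec`:

```c
#include <stdlib.h>

/* A growable array of pointers; zero-initialize before use.  */
struct ptr_vec
{
  void **items;
  size_t count;
  size_t capacity;
};

/* Append ITEM, doubling the capacity (starting at 64) when full.
   Returns 0 on success; on allocation failure returns 1 and leaves V
   and its existing items unchanged.  */
int
ptr_vec_push (struct ptr_vec *v, void *item)
{
  if (v->count == v->capacity)
    {
      size_t new_cap = v->capacity ? 2 * v->capacity : 64;
      /* realloc takes a size in bytes, not an element count.  */
      void **grown = realloc (v->items, new_cap * sizeof *grown);
      if (!grown)
        return 1;               /* the old block is still valid */
      v->items = grown;
      v->capacity = new_cap;
    }
  v->items[v->count++] = item;
  return 0;
}
```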
+
+/* Parse open file FP until END_HEADER is seen. The data matching
+   gcov_pmu_tool_header_t fields is saved and returned in a new
+   struct. In case of failure, it returns NULL.  */
+
+static gcov_pmu_tool_header_t *
+parse_pfmon_tool_header (FILE *fp, const char *end_header)
+{
+  static const char tag_hostname[] = "# hostname: ";
+  static const char tag_kversion[] = "# kernel version: ";
+  static const char tag_hostcpu[] = "# host CPUs:  ";
+  static const char tag_column_desc_start[] = "# description of columns:";
+  static const char tag_column_desc_end[] =
+      "#	other columns are self-explanatory";
+  size_t buflen = 4*1024;
+  char *buf, *buf_start, *buf_end;
+  gcov_pmu_tool_header_t *tool_header = XNEWVEC (gcov_pmu_tool_header_t, 1);
+  char *hostname = 0;
+  char *kversion = 0;
+  char *hostcpu = 0;
+  char *column_description = 0;
+  char *column_desc_start = 0;
+  char *column_desc_end = 0;
+  const char *column_header = 0;
+  int got_hostname = 0;
+  int got_kversion = 0 ;
+  int got_hostcpu = 0;
+  int got_end_header = 0;
+  int got_column_description = 0;
+
+  buf = (char *) malloc (buflen);
+  buf_start = buf;
+  buf_end = buf + buflen;
+  while (buf < (buf_end - 1) && fgets (buf, buf_end - buf, fp))
+    {
+      if (strncmp (end_header, buf, buf_end - buf) == 0)
+      {
+        got_end_header = 1;
+        break;
+      }
+      if (!got_hostname &&
+          strncmp (buf, tag_hostname, strlen (tag_hostname)) == 0)
+        {
+          size_t len = strlen (buf) - strlen (tag_hostname);
+          hostname = (char *)malloc (len);
+          memcpy (hostname, buf + strlen (tag_hostname), len);
+          hostname[len - 1] = 0;
+          tool_header->hostname = hostname;
+          got_hostname = 1;
+        }
+
+      if (!got_kversion &&
+          strncmp (buf, tag_kversion, strlen (tag_kversion)) == 0)
+        {
+          size_t len = strlen (buf) - strlen (tag_kversion);
+          kversion = (char *)malloc (len);
+          memcpy (kversion, buf + strlen (tag_kversion), len);
+          kversion[len - 1] = 0;
+          tool_header->kernel_version = kversion;
+          got_kversion = 1;
+        }
+
+      if (!got_hostcpu &&
+          strncmp (buf, tag_hostcpu, strlen (tag_hostcpu)) == 0)
+        {
+          size_t len = strlen (buf) - strlen (tag_hostcpu);
+          hostcpu = (char *)malloc (len);
+          memcpy (hostcpu, buf + strlen (tag_hostcpu), len);
+          hostcpu[len - 1] = 0;
+          tool_header->host_cpu = hostcpu;
+          got_hostcpu = 1;
+        }
+      if (!got_column_description &&
+          strncmp (buf, tag_column_desc_start, strlen (tag_column_desc_start))
+          == 0)
+        {
+          column_desc_start = buf;
+          column_desc_end = 0;
+          /* Continue reading until end of the column descriptor.  */
+          while (buf < (buf_end - 1) && fgets (buf, buf_end - buf, fp))
+            {
+              if (strncmp (buf, tag_column_desc_end,
+                           strlen (tag_column_desc_end)) == 0)
+                {
+                  column_desc_end = buf + strlen (tag_column_desc_end);
+                  break;
+                }
+              buf += strlen (buf);
+            }
+          if (column_desc_end)
+            {
+              /* Found the end; copy exactly the bytes from the start
+                 tag through the end tag into a new string.  */
+              size_t desc_len = column_desc_end - column_desc_start;
+              column_description = (char *)malloc (desc_len + 1);
+              memcpy (column_description, column_desc_start, desc_len);
+              column_description[desc_len] = 0;
+              got_column_description = 1;
+              tool_header->column_description = column_description;
+            }
+        }
+      buf += strlen (buf);
+    }
+
+  /* If we are missing any of the fields, return NULL.  */
+  if (!got_end_header || !got_hostname || !got_kversion || !got_hostcpu
+      || !got_column_description)
+    {
+      if (hostname)
+        free (hostname);
+      if (kversion)
+        free (kversion);
+      if (hostcpu)
+        free (hostcpu);
+      if (column_description)
+        free (column_description);
+      free (buf_start);
+      free (tool_header);
+      return NULL;
+    }
+
+  switch (the_pmu_tool_info->event)
+    {
+    case PET_INTEL_LOAD_LATENCY:
+    case PET_AMD_LOAD_LATENCY:
+      column_header = pfmon_ll_header;
+      break;
+    case PET_INTEL_BRANCH_MISPREDICT:
+    case PET_AMD_BRANCH_MISPREDICT:
+      column_header = pfmon_bm_header;
+      break;
+    default:
+      break;
+    }
+  tool_header->column_header = strdup (column_header);
+  tool_header->full_header = buf_start;
+  return tool_header;
+}
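Each header field above is copied with `hostname[len - 1] = 0`, which assumes the line ends in a newline. A variant that stops at the newline when there is one and at the end of the string otherwise, sketched with a hypothetical helper name (the tags in the usage are the ones the patch matches):

```c
#include <stdlib.h>
#include <string.h>

/* If LINE starts with TAG, return a newly allocated copy of the rest
   of the line without its trailing newline; otherwise, or if memory
   runs out, return NULL.  Works whether or not LINE ends in '\n'.  */
char *
header_tag_value (const char *line, const char *tag)
{
  size_t tag_len = strlen (tag);
  size_t len;
  char *value;

  if (strncmp (line, tag, tag_len) != 0)
    return NULL;
  line += tag_len;
  len = strcspn (line, "\n");   /* stop at the newline or the end */
  value = malloc (len + 1);
  if (!value)
    return NULL;
  memcpy (value, line, len);
  value[len] = '\0';
  return value;
}
```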
+
+
+/* Parse FILENAME for branch mispredict lines into a structure
+   PMU_DATA.  Returns 0 on success.  Returns non-zero on
+   failure.  */
+
+static int
+parse_pfmon_branch_mispredicts (char *filename, void *pmu_data)
+{
+  FILE *fp;
+  size_t buflen = 2*1024;
+  char *buf;
+  brm_infos_t *brm_infos = (brm_infos_t *)pmu_data;
+  gcov_pmu_tool_header_t *tool_header = 0;
+
+  if ((fp = fopen (filename, "r")) == NULL)
+    {
+      fprintf (stderr, "cannot open pmu data file: %s\n", filename);
+      return 1;
+    }
+
+  if (!(tool_header = parse_pfmon_tool_header (fp, pfmon_bm_header)))
+    {
+      fprintf (stderr, "cannot parse pmu data file header: %s\n", filename);
+      fclose (fp);
+      return 1;
+    }
+
+  buf = (char *) malloc (buflen);
+  while (fgets (buf, buflen, fp))
+    {
+      gcov_pmu_brm_info_t *brm = XNEW (gcov_pmu_brm_info_t);
+      if (!parse_branch_mispredict_line (buf, brm))
+        {
+          /* Valid line, add to the array.  */
+          brm_infos->brm_count++;
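+          if (brm_infos->brm_count >= brm_infos->alloc_brm_count)
+            {
+              /* Double the capacity.  realloc takes a size in bytes,
+                 and the old array stays valid if it fails.  */
+              gcov_pmu_brm_info_t **new_array =
+                realloc (brm_infos->brm_array,
+                         2 * brm_infos->alloc_brm_count
+                         * sizeof (gcov_pmu_brm_info_t *));
+              if (new_array == NULL)
+                {
+                  fprintf (stderr,
+                           "Cannot allocate memory for br mispredicts.\n");
+                  brm_infos->brm_count--;
+                  XDELETE (brm);
+                  __destroy_pmu_tool_header (tool_header);
+                  free (tool_header);
+                  free (buf);
+                  fclose (fp);
+                  return 1;
+                }
+              brm_infos->brm_array = new_array;
+              brm_infos->alloc_brm_count *= 2;
+            }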
+          brm_infos->brm_array[brm_infos->brm_count - 1] = brm;
+        }
+      else
+        /* Delete invalid line.  */
+        XDELETE (brm);
+    }
+  free (buf);
+  fclose (fp);
+  brm_infos->pmu_tool_header = tool_header;
+  return 0;
+}
+
+/* Start the monitoring process using pmu tool. Return 0 on success,
+   non-zero otherwise.  */
+
+static int
+pmu_start (void)
+{
+  pid_t pid;
+
+  /* no start function */
+  if (!the_pmu_tool_info->tool_details->start_pmu_module)
+    return 1;
+
+  pid = fork ();
+  if (pid == -1)
+    {
+      /* error condition */
+      fprintf (stderr, "Cannot create PMU profiling process.\n");
+      return 1;
+    }
+  else if (pid == 0)
+    {
+      /* child */
+      pid_t ppid = getppid ();
+      char *tmpfile = the_pmu_tool_info->raw_pmu_profile_filename;
+      const char **pfmon_args = the_pmu_tool_info->tool_details->arg_array;
+      int new_stderr_fd;
+
+      /* Redirect stderr from the child process into a separate file.  */
+      new_stderr_fd = creat (the_pmu_tool_info->tool_stderr_filename,
+                             S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH);
+      if (new_stderr_fd != -1)
+        {
+          dup2 (new_stderr_fd, 2);
+          if (new_stderr_fd != 2)
+            close (new_stderr_fd);
+        }
+      /* The following does an exec and thus is not expected to return.  */
+      the_pmu_tool_info->tool_details->start_pmu_module (ppid, tmpfile,
+                                                         pfmon_args);
+      /* exec returned, an error condition.  */
+      fprintf (stderr, "could not create profiling process: %s\n",
+               the_pmu_tool_info->tool_details->arg_array[0]);
+      _exit (1);
+    }
+  else
+    {
+      /* parent */
+      the_pmu_tool_info->pmu_tool_pid = pid;
+      return 0;
+    }
+}
+
+/* Allocate and initialize pmu load latency structure.  */
+
+static void *
+init_pmu_load_latency (void)
+{
+  ll_infos_t *load_latency = XNEWVEC (ll_infos_t, 1);
+  load_latency->ll_count = 0;
+  load_latency->alloc_ll_count = 64;
+  load_latency->ll_array = XNEWVEC (gcov_pmu_ll_info_t *,
+                                    load_latency->alloc_ll_count);
+  return (void *)load_latency;
+}
+
+/* Allocate and initialize pmu branch mispredict structure.  */
+
+static void *
+init_pmu_branch_mispredict (void)
+{
+  brm_infos_t *brm_info = XNEWVEC (brm_infos_t, 1);
+  brm_info->brm_count = 0;
+  brm_info->alloc_brm_count = 64;
+  brm_info->brm_array = XNEWVEC (gcov_pmu_brm_info_t *,
+                                 brm_info->alloc_brm_count);
+  return (void *)brm_info;
+}
+
+/* Initialize the PMU tool based upon PMU_INFO and set the appropriate
+   tool type in the global the_pmu_tool_info.  Returns 0 on success,
+   1 on error.  */
+
+static int
+init_pmu_tool (struct gcov_pmu_info *pmu_info)
+{
+  the_pmu_tool_info->pmu_profiling_state = PMU_NONE;
+  the_pmu_tool_info->verbose = 0;
+  the_pmu_tool_info->tool = PTT_PFMON;  /* we support only pfmon */
+  the_pmu_tool_info->pmu_tool_pid = 0;
+  the_pmu_tool_info->top_n_address = pmu_info->pmu_top_n_address;
+  the_pmu_tool_info->symbolizer_pid = 0;
+  the_pmu_tool_info->symbolizer_to_pipefd[0] = -1;
+  the_pmu_tool_info->symbolizer_to_pipefd[1] = -1;
+  the_pmu_tool_info->symbolizer_from_pipefd[0] = -1;
+  the_pmu_tool_info->symbolizer_from_pipefd[1] = -1;
+
+  /* parse_pmu_profile_options sets PMU_ERROR and returns non-zero for
+     an unsupported CPU, so report every failure here.  */
+  if (parse_pmu_profile_options (pmu_info->pmu_tool))
+    {
+      fprintf (stderr, "Unsupported PMU module: %s, disabling PMU profiling.\n",
+               pmu_info->pmu_tool);
+      return 1;
+    }
+
+  if (the_pmu_tool_info->tool_details->init_pmu_module)
+    /* Initialize the module.  */
+    the_pmu_tool_info->pmu_data =
+      the_pmu_tool_info->tool_details->init_pmu_module ();
+  return 0;
+}
+
+/* Initialize PMU profiling based upon the information passed in
+   PMU_INFO, using PMU_INFO->pmu_profile_filename as the file in which
+   to store the PMU profile.  This is called multiple times from
+   libgcov, once per object file, so the initialization is done only
+   the first time; subsequent invocations are no-ops.  */
+
+void
+__gcov_init_pmu_profiler (struct gcov_pmu_info *pmu_info)
+{
+  char *raw_pmu_profile_filename;
+  char *tool_stderr_filename;
+  if (!pmu_info || !pmu_info->pmu_profile_filename || !pmu_info->pmu_tool)
+    return;
+
+  /* Allocate the global structure on first invocation.  */
+  if (!the_pmu_tool_info)
+    {
+      the_pmu_tool_info = XNEWVEC (struct pmu_tool_info, 1);
+      if (!the_pmu_tool_info)
+        {
+          fprintf (stderr, "Error allocating memory for PMU tool\n");
+          return;
+        }
+      if (init_pmu_tool (pmu_info))
+        {
+          /* Initialization error.  */
+          XDELETE (the_pmu_tool_info);
+          the_pmu_tool_info = 0;
+          return;
+        }
+    }
+
+  switch (the_pmu_tool_info->pmu_profiling_state)
+    {
+    case PMU_NONE:
+      the_pmu_tool_info->pmu_profile_filename =
+        strdup (pmu_info->pmu_profile_filename);
+      /* Construct an intermediate filename by substituting trailing
+         '.gcda' with '.pmud'.  */
+      raw_pmu_profile_filename = strdup (pmu_info->pmu_profile_filename);
+      if (raw_pmu_profile_filename == NULL)
+        {
+          fprintf (stderr, "Cannot allocate memory\n");
+          exit (1);
+        }
+      strcpy (raw_pmu_profile_filename + strlen (raw_pmu_profile_filename) - 4,
+              "pmud");
+
+      /* Construct a filename for collecting PMU tool's stderr by
+         substituting trailing '.gcda' with '.stderr'.  */
+      tool_stderr_filename =
+        XNEWVEC (char, strlen (pmu_info->pmu_profile_filename) + 1 + 2);
+      strcpy (tool_stderr_filename, pmu_info->pmu_profile_filename);
+      strcpy (tool_stderr_filename + strlen (tool_stderr_filename) - 4,
+              "stderr");
+      the_pmu_tool_info->raw_pmu_profile_filename = raw_pmu_profile_filename;
+      the_pmu_tool_info->tool_stderr_filename = tool_stderr_filename;
+      the_pmu_tool_info->pmu_profiling_state = PMU_INITIALIZED;
+      break;
+
+    case PMU_INITIALIZED:
+    case PMU_OFF:
+    case PMU_ON:
+    case PMU_ERROR:
+      break;
+    default:
+      break;
+    }
+}
+
+/* Start PMU profiling.  It updates the current state.  */
+
+void
+__gcov_start_pmu_profiler (void)
+{
+  if (!the_pmu_tool_info)
+    return;
+
+  switch (the_pmu_tool_info->pmu_profiling_state)
+    {
+    case PMU_INITIALIZED:
+      if (!pmu_start ())
+        the_pmu_tool_info->pmu_profiling_state = PMU_ON;
+      else
+        the_pmu_tool_info->pmu_profiling_state = PMU_ERROR;
+      break;
+
+    case PMU_NONE:
+      /* PMU was not properly initialized; don't attempt to start it.  */
+      the_pmu_tool_info->pmu_profiling_state = PMU_ERROR;
+      break;
+
+    case PMU_OFF:
+      /* Restarting PMU is not yet supported.  */
+    case PMU_ON:
+      /* Do nothing.  */
+    case PMU_ERROR:
+      break;
+
+    default:
+      break;
+    }
+}
+
+/* Stop PMU profiling.  Apart from invoking the tool's optional
+   stop_pmu_module hook, this only updates the profiling state.  */
+
+void
+__gcov_stop_pmu_profiler (void)
+{
+  if (!the_pmu_tool_info)
+    return;
+
+  if (the_pmu_tool_info->tool_details->stop_pmu_module)
+    the_pmu_tool_info->tool_details->stop_pmu_module ();
+  if (the_pmu_tool_info->pmu_profiling_state == PMU_ON)
+    the_pmu_tool_info->pmu_profiling_state = PMU_OFF;
+}
+
+/* Write the load latency information LL_INFO into the gcda file.  */
+
+static void
+gcov_write_ll_line (const gcov_pmu_ll_info_t *ll_info)
+{
+  gcov_unsigned_t len = GCOV_TAG_PMU_LOAD_LATENCY_LENGTH (ll_info->filename);
+  gcov_write_tag_length (GCOV_TAG_PMU_LOAD_LATENCY_INFO, len);
+  gcov_write_unsigned (ll_info->counts);
+  gcov_write_unsigned (ll_info->self);
+  gcov_write_unsigned (ll_info->cum);
+  gcov_write_unsigned (ll_info->lt_10);
+  gcov_write_unsigned (ll_info->lt_32);
+  gcov_write_unsigned (ll_info->lt_64);
+  gcov_write_unsigned (ll_info->lt_256);
+  gcov_write_unsigned (ll_info->lt_1024);
+  gcov_write_unsigned (ll_info->gt_1024);
+  gcov_write_unsigned (ll_info->wself);
+  gcov_write_counter (ll_info->code_addr);
+  gcov_write_unsigned (ll_info->line);
+  gcov_write_unsigned (ll_info->discriminator);
+  gcov_write_string (ll_info->filename);
+}
+
+
+/* Write the branch mispredict information BRM_INFO into the gcda file.  */
+
+static void
+gcov_write_branch_mispredict_line (const gcov_pmu_brm_info_t *brm_info)
+{
+  gcov_unsigned_t len
+    = GCOV_TAG_PMU_BRANCH_MISPREDICT_LENGTH (brm_info->filename);
+  gcov_write_tag_length (GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO, len);
+  gcov_write_unsigned (brm_info->counts);
+  gcov_write_unsigned (brm_info->self);
+  gcov_write_unsigned (brm_info->cum);
+  gcov_write_counter (brm_info->code_addr);
+  gcov_write_unsigned (brm_info->line);
+  gcov_write_unsigned (brm_info->discriminator);
+  gcov_write_string (brm_info->filename);
+}
+
+/* Write load latency information INFO into the gcda file.  The gcda
+   file has already been opened and is available for writing.  */
+
+static void
+gcov_write_load_latency_infos (void *info)
+{
+  unsigned i;
+  const ll_infos_t *ll_infos = (const ll_infos_t *)info;
+  gcov_unsigned_t stamp = 0;  /* Don't use stamp as we don't support merge.  */
+  /* We don't support merge, and instead always rewrite the file.  */
+  gcov_rewrite ();
+  gcov_write_tag_length (GCOV_DATA_MAGIC, GCOV_VERSION);
+  gcov_write_unsigned (stamp);
+  if (ll_infos->pmu_tool_header)
+    gcov_write_tool_header (ll_infos->pmu_tool_header);
+  for (i = 0; i < ll_infos->ll_count; ++i)
+    {
+      /* Write each line.  */
+      gcov_write_ll_line (ll_infos->ll_array[i]);
+    }
+  gcov_truncate ();
+}
+
+/* Write branch mispredict information INFO into the gcda file.  The
+   gcda file has already been opened and is available for writing.  */
+
+static void
+gcov_write_branch_mispredict_infos (void *info)
+{
+  unsigned i;
+  const brm_infos_t *brm_infos = (const brm_infos_t *)info;
+  gcov_unsigned_t stamp = 0;  /* Don't use stamp as we don't support merge.  */
+  /* We don't support merge, and instead always rewrite the file.  */
+  gcov_rewrite ();
+  gcov_write_tag_length (GCOV_DATA_MAGIC, GCOV_VERSION);
+  gcov_write_unsigned (stamp);
+  if (brm_infos->pmu_tool_header)
+    gcov_write_tool_header (brm_infos->pmu_tool_header);
+  for (i = 0; i < brm_infos->brm_count; ++i)
+    {
+      /* Write each line.  */
+      gcov_write_branch_mispredict_line (brm_infos->brm_array[i]);
+    }
+  gcov_truncate ();
+}
+
+/* Compute the record length of HEADER for writing into the gcda file.  */
+
+static gcov_unsigned_t
+gcov_tag_pmu_tool_header_length (gcov_pmu_tool_header_t *header)
+{
+  gcov_unsigned_t len = 0;
+  if (header)
+    {
+      len += gcov_string_length (header->host_cpu);
+      len += gcov_string_length (header->hostname);
+      len += gcov_string_length (header->kernel_version);
+      len += gcov_string_length (header->column_header);
+      len += gcov_string_length (header->column_description);
+      len += gcov_string_length (header->full_header);
+    }
+  return len;
+}
+
+/* Write the tool HEADER into the gcda file.  It assumes that the gcda file
+   has already been opened and is available for writing.  */
+
+static void
+gcov_write_tool_header (gcov_pmu_tool_header_t *header)
+{
+  gcov_unsigned_t len = gcov_tag_pmu_tool_header_length (header);
+  gcov_write_tag_length (GCOV_TAG_PMU_TOOL_HEADER, len);
+  gcov_write_string (header->host_cpu);
+  gcov_write_string (header->hostname);
+  gcov_write_string (header->kernel_version);
+  gcov_write_string (header->column_header);
+  gcov_write_string (header->column_description);
+  gcov_write_string (header->full_header);
+}
+
+
+/* End PMU profiling and write data into appropriate gcda file.  */
+
+void
+__gcov_end_pmu_profiler (void)
+{
+  int pid_status;
+  int wait_status;
+  pid_t pid;
+  pmu_tool_fns *tool_details;
+
+  if (!the_pmu_tool_info)
+    return;
+
+  tool_details = the_pmu_tool_info->tool_details;
+  pid = the_pmu_tool_info->pmu_tool_pid;
+  if (pid)
+    {
+      if (tool_debug)
+        fprintf (stderr, "terminating PMU profiling process %ld\n", (long)pid);
+      kill (pid, SIGTERM);
+      if (tool_debug)
+        fprintf (stderr, "parent: waiting for pmu process to end\n");
+      wait_status = waitpid (pid, &pid_status, 0);
+      if (tool_debug) {
+        if (wait_status == pid)
+          fprintf (stderr, "Normal exit. Child terminated.\n");
+        else
+          fprintf (stderr, "Abnormal exit. Child status: %d.\n", pid_status);
+      }
+    }
+
+  if (the_pmu_tool_info->pmu_profiling_state != PMU_OFF)
+    {
+      /* Data is written only after profiling was started and stopped.  */
+      fprintf (stderr,
+               "__gcov_end_pmu_profiler: incorrect pmu state: %d, pid: %ld\n",
+               the_pmu_tool_info->pmu_profiling_state,
+               (long) pid);
+      return;
+    }
+
+  if (!tool_details->parse_pmu_output)
+    return;
+
+  /* Since we are going to parse the output, we also need symbolizer.  */
+  if (tool_details->start_symbolizer)
+    tool_details->start_symbolizer (getpid ());
+
+  if (!tool_details->parse_pmu_output
+      (the_pmu_tool_info->raw_pmu_profile_filename,
+       the_pmu_tool_info->pmu_data))
+    {
+      if (tool_details->gcov_write_pmu_data)
+        /* Write tool output into the gcda file.  */
+        tool_details->gcov_write_pmu_data (the_pmu_tool_info->pmu_data);
+    }
+
+  if (tool_details->end_symbolizer)
+    tool_details->end_symbolizer ();
+
+  if (tool_details->cleanup_pmu_data)
+    tool_details->cleanup_pmu_data (the_pmu_tool_info->pmu_data);
+}
+
+#endif
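The PMU profiler in pmu-profile.c is driven by a small state machine. Its transitions are spread across __gcov_init_pmu_profiler, __gcov_start_pmu_profiler, __gcov_stop_pmu_profiler and __gcov_end_pmu_profiler. The standalone sketch below restates those transitions as pure functions so they can be checked in isolation. The enum and function names are hypothetical, and the sketch does not call pfmon or any libgcov code.

```c
#include <assert.h>

/* Hypothetical mirror of the PMU profiling states in pmu-profile.c.  */
enum pmu_state
{
  ST_NONE,
  ST_INITIALIZED,
  ST_ON,
  ST_OFF,
  ST_ERROR
};

/* __gcov_init_pmu_profiler: only the first call (from NONE) has an effect.  */
static enum pmu_state
state_after_init (enum pmu_state s)
{
  return s == ST_NONE ? ST_INITIALIZED : s;
}

/* __gcov_start_pmu_profiler: START_OK stands in for pmu_start () == 0.  */
static enum pmu_state
state_after_start (enum pmu_state s, int start_ok)
{
  switch (s)
    {
    case ST_INITIALIZED:
      return start_ok ? ST_ON : ST_ERROR;
    case ST_NONE:
      /* Not initialized: refuse to start.  */
      return ST_ERROR;
    default:
      /* OFF (restart unsupported), ON and ERROR are left unchanged.  */
      return s;
    }
}

/* __gcov_stop_pmu_profiler: only a running profiler moves to OFF.  */
static enum pmu_state
state_after_stop (enum pmu_state s)
{
  return s == ST_ON ? ST_OFF : s;
}

/* __gcov_end_pmu_profiler writes profile data only from the OFF state.  */
static int
can_write_profile (enum pmu_state s)
{
  return s == ST_OFF;
}
```

Consider two init calls (one per object file), followed by a successful start and then a stop. That sequence leaves the state at OFF, which is the only state from which __gcov_end_pmu_profiler writes data. The same rules apply to the flush path, which wraps gcov_exit with __gcov_stop_pmu_profiler and __gcov_start_pmu_profiler. That path leaves the profiler stopped, because starting from OFF is a no-op.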
Index: gcc/coverage.c
===================================================================
--- gcc/coverage.c	(revision 175188)
+++ gcc/coverage.c	(working copy)
@@ -62,6 +62,9 @@ 
 #include "dbgcnt.h"
 #include "input.h"
 
+/* Defined in tree-profile.c.  */
+void gimple_init_instrumentation_sampling (void);
+
 struct function_list
 {
   struct function_list *next;	 /* next function */
@@ -120,6 +123,9 @@ 
 static char *da_base_file_name;
 static char *main_input_file_name;
 
+/* Base name of the global PMU profile file.  */
+static char pmu_profile_filename[] = "pmuprofile";
+
 /* Hash table of count data.  */
 static htab_t counts_hash = NULL;
 
@@ -146,6 +152,16 @@ 
 /* True if the current module has any asm statements.  */
 static bool has_asm_statement;
 
+/* extern const char * __gcov_pmu_profile_filename */
+static tree gcov_pmu_filename_decl = NULL_TREE;
+/* extern const char * __gcov_pmu_profile_options */
+static tree gcov_pmu_options_decl = NULL_TREE;
+/* extern gcov_unsigned_t  __gcov_pmu_top_n_address */
+static tree gcov_pmu_top_n_address_decl = NULL_TREE;
+
+/* To ensure that the above variables are initialized only once.  */
+static int pmu_profiling_initialized = 0;
+
 /* Forward declarations.  */
 static hashval_t htab_counts_entry_hash (const void *);
 static int htab_counts_entry_eq (const void *, const void *);
@@ -157,7 +173,8 @@ 
 static tree build_ctr_info_value (unsigned, tree);
 static tree build_gcov_info (void);
 static void create_coverage (void);
-static char * get_da_file_name (const char *);
+static void init_pmu_profiling (void);
+static bool profiling_enabled_p (void);
 
 /* Return the type node for gcov_type.  */
 
@@ -175,6 +192,15 @@ 
   return lang_hooks.types.type_for_size (32, true);
 }
 
+/* Return the type node for const char *.  */
+
+static tree
+get_const_string_type (void)
+{
+  return build_pointer_type
+    (build_qualified_type (char_type_node, TYPE_QUAL_CONST));
+}
+
 static hashval_t
 htab_counts_entry_hash (const void *of)
 {
@@ -1688,7 +1714,7 @@ 
 
   no_coverage = 1; /* Disable any further coverage.  */
 
-  if (!prg_ctr_mask)
+  if (!prg_ctr_mask && !flag_pmu_profile_generate)
     return;
 
   t = build_gcov_info ();
@@ -1725,7 +1751,7 @@ 
 
 /* Get the da file name, given base file name.  */
 
-static char *
+char *
 get_da_file_name (const char *base_file_name)
 {
   char *da_file_name;
@@ -1910,8 +1936,122 @@ 
 	read_counts_file (get_da_file_name (module_infos[i]->da_filename),
 			  module_infos[i]->ident);
     }
+
+  /* Define variables which are referenced at runtime by libgcov.  */
+  if (profiling_enabled_p ())
+    {
+      init_pmu_profiling ();
+      gimple_init_instrumentation_sampling ();
+    }
 }
 
+/* Return true if any type of profiling that requires linking in
+   libgcov is enabled, otherwise return false.  */
+
+static bool
+profiling_enabled_p (void)
+{
+  return (flag_pmu_profile_generate || profile_arc_flag
+          || flag_profile_generate_sampling || flag_test_coverage
+          || flag_branch_probabilities || flag_profile_reusedist);
+}
+
+/* Construct variables for PMU profiling.
+   1) __gcov_pmu_profile_filename,
+   2) __gcov_pmu_profile_options,
+   3) __gcov_pmu_top_n_address.  */
+
+static void
+init_pmu_profiling (void)
+{
+  if (!pmu_profiling_initialized)
+    {
+      unsigned top_n_addr = PARAM_VALUE (PARAM_PMU_PROFILE_N_ADDRESS);
+      tree filename_ptr, options_ptr;
+
+      /* Construct an initializer for __gcov_pmu_profile_filename.  */
+      gcov_pmu_filename_decl =
+        build_decl (UNKNOWN_LOCATION, VAR_DECL,
+                    get_identifier ("__gcov_pmu_profile_filename"),
+                    get_const_string_type ());
+      TREE_PUBLIC (gcov_pmu_filename_decl) = 1;
+      DECL_ARTIFICIAL (gcov_pmu_filename_decl) = 1;
+      make_decl_one_only (gcov_pmu_filename_decl,
+                          DECL_ASSEMBLER_NAME (gcov_pmu_filename_decl));
+      TREE_STATIC (gcov_pmu_filename_decl) = 1;
+
+      if (flag_pmu_profile_generate)
+        {
+          const char *filename = get_da_file_name (pmu_profile_filename);
+          int file_name_len;
+          tree filename_string;
+          file_name_len = strlen (filename);
+          filename_string = build_string (file_name_len + 1, filename);
+          TREE_TYPE (filename_string) = build_array_type
+            (char_type_node, build_index_type
+             (build_int_cst (NULL_TREE, file_name_len)));
+          filename_ptr = build1 (ADDR_EXPR, get_const_string_type (),
+                                 filename_string);
+        }
+      else
+        filename_ptr = null_pointer_node;
+
+      DECL_INITIAL (gcov_pmu_filename_decl) = filename_ptr;
+      assemble_variable (gcov_pmu_filename_decl, 0, 0, 0);
+
+      /* Construct an initializer for __gcov_pmu_profile_options.  */
+      gcov_pmu_options_decl =
+        build_decl (UNKNOWN_LOCATION, VAR_DECL,
+                    get_identifier ("__gcov_pmu_profile_options"),
+                    get_const_string_type ());
+      TREE_PUBLIC (gcov_pmu_options_decl) = 1;
+      DECL_ARTIFICIAL (gcov_pmu_options_decl) = 1;
+      make_decl_one_only (gcov_pmu_options_decl,
+                          DECL_ASSEMBLER_NAME (gcov_pmu_options_decl));
+      TREE_STATIC (gcov_pmu_options_decl) = 1;
+
+      /* If PMU profiling is disabled, emit a null pointer so that
+         libgcov does not start the profiler.  */
+      if (flag_pmu_profile_generate)
+        {
+          const char *pmu_options = flag_pmu_profile_generate;
+          int pmu_options_len;
+          tree pmu_options_string;
+
+          pmu_options_len = strlen (pmu_options);
+          pmu_options_string = build_string (pmu_options_len + 1, pmu_options);
+          TREE_TYPE (pmu_options_string) = build_array_type
+            (char_type_node, build_index_type (build_int_cst
+                                               (NULL_TREE, pmu_options_len)));
+          options_ptr = build1 (ADDR_EXPR, get_const_string_type (),
+                                pmu_options_string);
+        }
+      else
+        options_ptr = null_pointer_node;
+
+      DECL_INITIAL (gcov_pmu_options_decl) = options_ptr;
+      assemble_variable (gcov_pmu_options_decl, 0, 0, 0);
+
+      /* Construct an initializer for __gcov_pmu_top_n_address.  This
+         doesn't need to be guarded by flag_pmu_profile_generate
+         because libgcov ignores __gcov_pmu_top_n_address when PMU
+         profiling is disabled.  */
+      gcov_pmu_top_n_address_decl =
+        build_decl (UNKNOWN_LOCATION, VAR_DECL,
+                    get_identifier ("__gcov_pmu_top_n_address"),
+                    get_gcov_unsigned_t ());
+      TREE_PUBLIC (gcov_pmu_top_n_address_decl) = 1;
+      DECL_ARTIFICIAL (gcov_pmu_top_n_address_decl) = 1;
+      make_decl_one_only (gcov_pmu_top_n_address_decl,
+                          DECL_ASSEMBLER_NAME (gcov_pmu_top_n_address_decl));
+      TREE_STATIC (gcov_pmu_top_n_address_decl) = 1;
+      DECL_INITIAL (gcov_pmu_top_n_address_decl) =
+        build_int_cstu (get_gcov_unsigned_t (), top_n_addr);
+      assemble_variable (gcov_pmu_top_n_address_decl, 0, 0, 0);
+    }
+  pmu_profiling_initialized = 1;
+}
+
 /* Performs file-level cleanup.  Close graph file, generate coverage
    variables and constructor.  */
 
@@ -1989,4 +2129,19 @@ 
   has_asm_statement = flag_ripa_disallow_asm_modules;
 }
 
+/* Check the command line OPTIONS passed to
+   -fpmu-profile-generate. Return 0 if the options are valid, non-zero
+   otherwise.  */
+
+int
+check_pmu_profile_options (const char *options)
+{
+  if (strcmp (options, "load-latency")
+      && strcmp (options, "load-latency-verbose")
+      && strcmp (options, "branch-mispredict")
+      && strcmp (options, "branch-mispredict-verbose"))
+    return 1;
+  return 0;
+}
+
 #include "gt-coverage.h"
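check_pmu_profile_options accepts exactly four values. An equivalent table-driven formulation keeps the accepted values in one array, so supporting a new event needs only one new entry. It is shown here under the hypothetical name check_pmu_profile_options_table, and it preserves the original return convention: 0 for a valid value, non-zero otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The four values accepted by check_pmu_profile_options.  */
static const char *const pmu_profile_option_values[] =
{
  "load-latency",
  "load-latency-verbose",
  "branch-mispredict",
  "branch-mispredict-verbose"
};

#define N_PMU_PROFILE_OPTION_VALUES \
  (sizeof pmu_profile_option_values / sizeof pmu_profile_option_values[0])

/* Return 0 if OPTIONS is an accepted value, non-zero otherwise.  */
static int
check_pmu_profile_options_table (const char *options)
{
  size_t i;

  for (i = 0; i < N_PMU_PROFILE_OPTION_VALUES; i++)
    if (strcmp (options, pmu_profile_option_values[i]) == 0)
      return 0;
  return 1;
}
```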
Index: gcc/coverage.h
===================================================================
--- gcc/coverage.h	(revision 175188)
+++ gcc/coverage.h	(working copy)
@@ -77,4 +77,13 @@ 
 /* Mark this module as containing asm statements.  */
 extern void coverage_has_asm_stmt (void);
 
+/* Get the da file name, given base file name.  */
+extern char * get_da_file_name (const char *base_file_name);
+
+/* Check if the specified options are valid for PMU profiling.  */
+extern int check_pmu_profile_options (const char *options);
+
+/* Defined in tree-profile.c.  */
+extern void gimple_init_instrumentation_sampling (void);
+
 #endif
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 175188)
+++ gcc/common.opt	(working copy)
@@ -1606,6 +1606,14 @@ 
 Common Joined RejectNegative Var(common_deferred_options) Defer
 -fplugin-arg-<name>-<key>[=<value>]	Specify argument <key>=<value> for plugin <name>
 
+fpmu-profile-generate=
+Common Joined RejectNegative Var(flag_pmu_profile_generate)
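+-fpmu-profile-generate=[load-latency|load-latency-verbose|branch-mispredict|branch-mispredict-verbose]  Generate PMU profile for cache misses or branch mispredictions.  Currently profiles are collected only with pfmon, and load latency profiling is supported on Intel/PEBS and AMD/IBS platforms.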
+
+fpmu-profile-use=
+Common Joined RejectNegative Var(flag_pmu_profile_use)
+-fpmu-profile-use=[load-latency]  Use PMU profile data while optimizing.  Currently only perfmon-based load latency profiling is supported on Intel/PEBS and AMD/IBS platforms.
+
 fpredictive-commoning
 Common Report Var(flag_predictive_commoning) Optimization
 Run predictive commoning optimization.
Index: gcc/tree-profile.c
===================================================================
--- gcc/tree-profile.c	(revision 175188)
+++ gcc/tree-profile.c	(working copy)
@@ -168,6 +168,9 @@ 
 /* extern gcov_unsigned_t __gcov_sampling_rate  */
 static tree gcov_sampling_rate_decl = NULL_TREE;
 
+/* Forward declaration.  */
+void gimple_init_instrumentation_sampling (void);
+
 /* Insert STMT_IF around given sequence of consecutive statements in the
    same basic block starting with STMT_START, ending with STMT_END.  */
 
@@ -287,7 +290,7 @@ 
     }
 }
 
-static void
+void
 gimple_init_instrumentation_sampling (void)
 {
   if (!gcov_sampling_rate_decl)
@@ -341,8 +344,6 @@ 
   tree dc_profiler_fn_type;
   tree average_profiler_fn_type;
 
-  gimple_init_instrumentation_sampling ();
-
   if (!gcov_type_node)
     {
       char name_buf[32];
Index: gcc/libgcov.c
===================================================================
--- gcc/libgcov.c	(revision 175188)
+++ gcc/libgcov.c	(working copy)
@@ -124,9 +124,15 @@ 
 }
 
 #ifndef __GCOV_KERNEL__
+/* Emitted in coverage.c.  */
+extern char * __gcov_pmu_profile_filename;
+extern char * __gcov_pmu_profile_options;
+extern gcov_unsigned_t __gcov_pmu_top_n_address;
+
 /* Sampling rate.  */
 extern gcov_unsigned_t __gcov_sampling_rate;
 static int gcov_sampling_rate_initialized = 0;
+void __gcov_set_sampling_rate (unsigned int rate);
 
 /* Set sampling rate to RATE.  */
 
@@ -352,6 +358,99 @@ 
     strcpy (gi_filename_up, filename);
 }
 
+/* Allocate the space to store the current file name.  */
+
+static void
+gcov_alloc_filename (void)
+{
+  /* Get file name relocation prefix.  Non-absolute values are ignored.  */
+  char *gcov_prefix = 0;
+
+  prefix_length = 0;
+  gcov_prefix_strip = 0;
+
+  {
+    /* Check if the level of dirs to strip off specified. */
+    char *tmp = getenv ("GCOV_PREFIX_STRIP");
+    if (tmp)
+      {
+        gcov_prefix_strip = atoi (tmp);
+        /* Do not consider negative values. */
+        if (gcov_prefix_strip < 0)
+          gcov_prefix_strip = 0;
+      }
+  }
+  /* Get file name relocation prefix.  Non-absolute values are ignored. */
+  gcov_prefix = getenv ("GCOV_PREFIX");
+  if (gcov_prefix)
+    {
+      prefix_length = strlen (gcov_prefix);
+
+      /* Remove an unnecessary trailing '/'.  */
+      if (prefix_length && IS_DIR_SEPARATOR (gcov_prefix[prefix_length - 1]))
+        prefix_length--;
+    }
+  else
+    prefix_length = 0;
+
+  /* If a prefix strip count was given but no prefix, assume paths
+     relative to the current directory.  */
+  if (gcov_prefix_strip != 0 && prefix_length == 0)
+    {
+      gcov_prefix = ".";
+      prefix_length = 1;
+    }
+
+  /* Allocate and initialize the filename scratch space.  */
+  gi_filename = (char *) malloc (prefix_length + gcov_max_filename + 2);
+  if (prefix_length)
+    memcpy (gi_filename, gcov_prefix, prefix_length);
+}
+
+/* Stop the PMU profiler and dump the PMU profile into the global file.  */
+
+static void
+pmu_profile_stop (void)
+{
+  const char *pmu_profile_filename = __gcov_pmu_profile_filename;
+  const char *pmu_options = __gcov_pmu_profile_options;
+  size_t filename_length;
+
+  if (!pmu_profile_filename || !pmu_options)
+    return;
+
+  __gcov_stop_pmu_profiler ();
+
+  filename_length = strlen (pmu_profile_filename);
+  if (filename_length > gcov_max_filename)
+    gcov_max_filename = filename_length;
+  /* Allocate and initialize the filename scratch space.  */
+  gcov_alloc_filename ();
+  gi_filename_up = gi_filename + prefix_length;
+  GCOV_GET_FILENAME (prefix_length, gcov_prefix_strip, pmu_profile_filename,
+                     gi_filename_up);
+  /* Open the gcda file for writing.  We don't support merging yet.  */
+  if (!gcov_open (gi_filename))
+    {
+#ifdef TARGET_POSIX_IO
+      /* The open likely failed because of a missing directory.
+         Create the directory and retry opening the file.  */
+      if (create_file_directory (gi_filename))
+        {
+          gcov_error ("pmu profiling:%s:Skip\n", gi_filename);
+          return;
+        }
+#endif
+      if (!gcov_open (gi_filename))
+        {
+          gcov_error ("pmu profiling:%s:Cannot open\n", gi_filename);
+          return;
+        }
+    }
+  __gcov_end_pmu_profiler ();
+  gcov_close ();
+}
+
 /* Sort N entries in VALUE_ARRAY in descending order.
    Each entry in VALUE_ARRAY has two values. The sorting
    is based on the second value.  */
@@ -438,56 +537,7 @@ 
     }
 }
 
-/* This function allocates the space to store current file name.  */
-
 static void
-gcov_alloc_filename (void)
-{
-  /* Get file name relocation prefix.  Non-absolute values are ignored.  */
-  char *gcov_prefix = 0;
-
-  prefix_length = 0;
-  gcov_prefix_strip = 0;
-
-  {
-    /* Check if the level of dirs to strip off specified. */
-    char *tmp = getenv ("GCOV_PREFIX_STRIP");
-    if (tmp)
-      {
-        gcov_prefix_strip = atoi (tmp);
-        /* Do not consider negative values. */
-        if (gcov_prefix_strip < 0)
-          gcov_prefix_strip = 0;
-      }
-  }
-  /* Get file name relocation prefix.  Non-absolute values are ignored. */
-  gcov_prefix = getenv ("GCOV_PREFIX");
-  if (gcov_prefix)
-    {
-      prefix_length = strlen(gcov_prefix);
-
-      /* Remove an unnecessary trailing '/' */
-      if (IS_DIR_SEPARATOR (gcov_prefix[prefix_length - 1]))
-        prefix_length--;
-    }
-  else
-    prefix_length = 0;
-
-  /* If no prefix was specified and a prefix stip, then we assume
-     relative.  */
-  if (gcov_prefix_strip != 0 && prefix_length == 0)
-    {
-      gcov_prefix = ".";
-      prefix_length = 1;
-    }
-
-  /* Aelocate and initialize the filename scratch space.  */
-  gi_filename = (char *) malloc (prefix_length + gcov_max_filename + 2);
-  if (prefix_length)
-    memcpy (gi_filename, gcov_prefix, prefix_length);
-}
-
-static void
 gcov_dump_module_info (void)
 {
   struct gcov_info *gi_ptr;
@@ -499,8 +549,8 @@ 
   {
     int error;
 
-    gcov_strip_leading_dirs (prefix_length, gcov_prefix_strip, 
-                             gi_ptr->filename, gi_filename_up);
+    GCOV_GET_FILENAME (prefix_length, gcov_prefix_strip, gi_ptr->filename,
+                       gi_filename_up);
     error = gcov_open_by_filename (gi_filename);
     if (error != 0)
       continue;
@@ -536,6 +586,8 @@ 
 
   dump_module_info = gcov_exit_init ();
 
+  /* Stop and write the PMU profile data into the global file.  */
+  pmu_profile_stop ();
 
   for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next)
     gcov_dump_one_gcov (gi_ptr);
@@ -572,11 +624,25 @@ 
       const char *ptr = info->filename;
       gcov_unsigned_t crc32 = gcov_crc32;
       size_t filename_length = strlen (info->filename);
+      struct gcov_pmu_info pmu_info;
 
       /* Refresh the longest file name information.  */
       if (filename_length > gcov_max_filename)
         gcov_max_filename = filename_length;
 
+      /* Initialize the pmu profiler.  */
+      pmu_info.pmu_profile_filename = __gcov_pmu_profile_filename;
+      pmu_info.pmu_tool = __gcov_pmu_profile_options;
+      pmu_info.pmu_top_n_address = __gcov_pmu_top_n_address;
+      __gcov_init_pmu_profiler (&pmu_info);
+      if (pmu_info.pmu_profile_filename)
+        {
+          /* Refresh the longest file name information.  */
+          filename_length = strlen (pmu_info.pmu_profile_filename);
+          if (filename_length > gcov_max_filename)
+            gcov_max_filename = filename_length;
+        }
+
       /* Assign the module ID (starting at 1).  */
       info->mod_info->ident = (++gcov_cur_module_id);
       gcc_assert (EXTRACT_MODULE_ID_FROM_GLOBAL_ID (GEN_FUNC_GLOBAL_ID (
@@ -601,7 +667,11 @@ 
       gcov_crc32 = crc32;
 
       if (!__gcov_list)
-        atexit (gcov_exit);
+        {
+          atexit (gcov_exit);
+          /* Start the PMU profiler.  */
+          __gcov_start_pmu_profiler ();
+        }
 
       info->next = __gcov_list;
       __gcov_list = info;
@@ -618,6 +688,7 @@ 
 {
   const struct gcov_info *gi_ptr;
 
+  __gcov_stop_pmu_profiler ();
   gcov_exit ();
   for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next)
     {
@@ -631,6 +702,7 @@ 
 	    ci_ptr++;
 	  }
     }
+  __gcov_start_pmu_profiler ();
 }
 
 #else /* __GCOV_KERNEL__ */
@@ -640,8 +712,8 @@ 
 /* Copy the filename to the buffer.  */
 
 static inline void
-gcov_get_filename (int prefix_length __attribute__ ((unused)), 
-                   int gcov_prefix_strip __attribute__ ((unused)), 
+gcov_get_filename (int prefix_length __attribute__ ((unused)),
+                   int gcov_prefix_strip __attribute__ ((unused)),
                    const char *filename, char *gi_filename_up)
 {
     strcpy (gi_filename_up, filename);
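gcov_alloc_filename and GCOV_GET_FILENAME together relocate each gcda path, including the global PMU profile, according to GCOV_PREFIX and GCOV_PREFIX_STRIP. The sketch below is a standalone approximation of that behavior, not libgcov code. It assumes the documented GCC semantics: strip N leading directories, then prepend the prefix. It handles only '/' separators (no drive letters), and relocate_gcda_name is a hypothetical name. It also guards the empty-prefix case, where the trailing-separator check must not read before the start of the string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the relocated name of FILENAME into OUT (OUT_SIZE bytes).
   PREFIX may be NULL (GCOV_PREFIX unset); STRIP is GCOV_PREFIX_STRIP.
   Return 0 on success, -1 if OUT is too small.  */
static int
relocate_gcda_name (const char *prefix, int strip, const char *filename,
                    char *out, size_t out_size)
{
  size_t prefix_length = prefix ? strlen (prefix) : 0;
  const char *s;
  int level = 0;
  int n;

  /* Drop one trailing '/' from the prefix, guarding the empty case.  */
  if (prefix_length && prefix[prefix_length - 1] == '/')
    prefix_length--;

  /* A strip count without a prefix means "relative to the cwd".  */
  if (strip > 0 && prefix_length == 0)
    {
      prefix = ".";
      prefix_length = 1;
    }

  /* Skip STRIP leading directory components of FILENAME.  */
  if (strip > 0)
    for (s = filename + 1; *s != '\0' && level < strip; s++)
      if (*s == '/')
        {
          filename = s;
          level++;
        }

  /* Join prefix and name, inserting '/' only when one is missing.  */
  n = snprintf (out, out_size, "%.*s%s%s", (int) prefix_length,
                prefix_length ? prefix : "",
                (prefix_length && *filename != '/') ? "/" : "", filename);
  return (n < 0 || (size_t) n >= out_size) ? -1 : 0;
}
```

For example, with GCOV_PREFIX=/target/run and GCOV_PREFIX_STRIP=1, /user/build/foo.gcda becomes /target/run/build/foo.gcda.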
Index: gcc/params.def
===================================================================
--- gcc/params.def	(revision 175188)
+++ gcc/params.def	(working copy)
@@ -1011,6 +1011,11 @@ 
           ".note.callgraph.text section",
 	  0, 0, 0)
 
+DEFPARAM (PARAM_PMU_PROFILE_N_ADDRESS,
+	  "pmu_profile_n_addresses",
+	  "Number of top addresses to symbolize during PMU profiling.",
+	  50, 1, 10000)
+
 /*
 Local variables:
 mode:c
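PARAM_PMU_PROFILE_N_ADDRESS (default 50, range 1 to 10000) bounds how many of the most frequently sampled addresses are symbolized. The selection code itself is not part of this section, so the following is only a sketch of one straightforward approach. It sorts (address, count) samples by descending count and keeps the first N. struct addr_sample and select_top_n are illustrative names, not part of the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sample record: a code address and its sample count.  */
struct addr_sample
{
  unsigned long long addr;
  unsigned count;
};

/* qsort comparator: higher counts first; ties broken by ascending
   address so the resulting order is deterministic.  */
static int
cmp_by_count_desc (const void *a, const void *b)
{
  const struct addr_sample *x = a;
  const struct addr_sample *y = b;

  if (x->count != y->count)
    return x->count < y->count ? 1 : -1;
  if (x->addr != y->addr)
    return x->addr < y->addr ? -1 : 1;
  return 0;
}

/* Reorder SAMPLES so the most frequently sampled addresses come first,
   and return how many of them (at most TOP_N) to symbolize.  */
static size_t
select_top_n (struct addr_sample *samples, size_t n_samples, size_t top_n)
{
  qsort (samples, n_samples, sizeof *samples, cmp_by_count_desc);
  return n_samples < top_n ? n_samples : top_n;
}
```

A full qsort costs O(n log n). When the sample set is much larger than N, a bounded min-heap of size N avoids sorting everything.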
Index: gcc/gcov-dump.c
===================================================================
--- gcc/gcov-dump.c	(revision 175188)
+++ gcc/gcov-dump.c	(working copy)
@@ -39,6 +39,10 @@ 
 static void tag_counters (const char *, unsigned, unsigned);
 static void tag_summary (const char *, unsigned, unsigned);
 static void tag_module_info (const char *, unsigned, unsigned);
+static void tag_pmu_load_latency_info (const char *, unsigned, unsigned);
+static void tag_pmu_branch_mispredict_info (const char *, unsigned, unsigned);
+static void tag_pmu_tool_header (const char *, unsigned, unsigned);
+
 extern int main (int, char **);
 
 typedef struct tag_format
@@ -73,6 +77,11 @@ 
   {GCOV_TAG_OBJECT_SUMMARY, "OBJECT_SUMMARY", tag_summary},
   {GCOV_TAG_PROGRAM_SUMMARY, "PROGRAM_SUMMARY", tag_summary},
   {GCOV_TAG_MODULE_INFO, "MODULE INFO", tag_module_info},
+  {GCOV_TAG_PMU_LOAD_LATENCY_INFO, "PMU_LOAD_LATENCY_INFO",
+   tag_pmu_load_latency_info},
+  {GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO, "PMU_BRANCH_MISPREDICT_INFO",
+   tag_pmu_branch_mispredict_info},
+  {GCOV_TAG_PMU_TOOL_HEADER, "PMU_TOOL_HEADER", tag_pmu_tool_header},
   {0, NULL, NULL}
 };
 
@@ -519,3 +528,45 @@ 
       printf (": %s [%s]", mod_info->source_filename, suffix);
     }
 }
+
+/* Read gcov tag GCOV_TAG_PMU_LOAD_LATENCY_INFO from the gcda file and
+   print the contents in a human-readable form.  */
+
+static void
+tag_pmu_load_latency_info (const char *filename ATTRIBUTE_UNUSED,
+                           unsigned tag ATTRIBUTE_UNUSED, unsigned length)
+{
+  gcov_pmu_ll_info_t ll_info;
+  gcov_read_pmu_load_latency_info (&ll_info, length);
+  print_load_latency_line (stdout, &ll_info, 0);
+  if (ll_info.filename)
+    free (ll_info.filename);
+}
+
+/* Read gcov tag GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO from the gcda
+   file and print the contents in a human-readable form.  */
+
+static void
+tag_pmu_branch_mispredict_info (const char *filename ATTRIBUTE_UNUSED,
+                                unsigned tag ATTRIBUTE_UNUSED, unsigned length)
+{
+  gcov_pmu_brm_info_t brm_info;
+  gcov_read_pmu_branch_mispredict_info (&brm_info, length);
+  print_branch_mispredict_line (stdout, &brm_info, 0);
+  if (brm_info.filename)
+    free (brm_info.filename);
+}
+
+
+/* Read gcov tag GCOV_TAG_PMU_TOOL_HEADER from the gcda file and print
+   the contents in a human-readable form.  */
+
+static void
+tag_pmu_tool_header (const char *filename ATTRIBUTE_UNUSED,
+                     unsigned tag ATTRIBUTE_UNUSED, unsigned length)
+{
+  gcov_pmu_tool_header_t tool_header;
+  gcov_read_pmu_tool_header (&tool_header, length);
+  print_pmu_tool_header (stdout, &tool_header, 0);
+  destroy_pmu_tool_header (&tool_header);
+}
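Each PMU record in the gcda file carries its length in the tag header. For the tool header, that length is the sum of gcov_string_length over its six strings. gcov_string_length lives in gcov-io.c and is not shown here, so the sketch below rests on an assumption. It assumes the length matches the encoding used by gcov_write_string in this era of gcov-io.c: one length word, then the characters plus at least one NUL, padded to a 4-byte word. That gives 1 + (strlen + 4) / 4 words, or a single word for a null string. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of 4-byte words used to store STRING: one length word plus the
   characters and at least one NUL, rounded up to a whole word.  */
static unsigned
sketch_string_words (const char *string)
{
  return string ? 1 + (unsigned) ((strlen (string) + 4) / 4) : 1;
}

/* Record length, in words, of a tool header holding the six strings
   that gcov_write_tool_header emits, in the same order.  */
static unsigned
sketch_tool_header_words (const char *const strings[6])
{
  unsigned i, len = 0;

  for (i = 0; i < 6; i++)
    len += sketch_string_words (strings[i]);
  return len;
}
```

This count is what tag_pmu_tool_header receives as LENGTH. If the writer and gcov_read_pmu_tool_header disagreed on it, gcov-dump would resume reading at the wrong offset for the next record.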