Patchwork [google] Updated patch for PMU profiling (issue4638047)

Submitter Sharad Singhai
Date June 20, 2011, 10:17 p.m.
Message ID <20110620221714.E35B115C1C1@nabu.mtv.corp.google.com>
Permalink /patch/101218/
State New

Comments

Sharad Singhai - June 20, 2011, 10:17 p.m.
Hi David,

Thanks for your comments. I have addressed them in the following
updated patch. There were a few places in pmu-profile.c where I
couldn't use 'xrealloc' or 'xstrdup' as this file is part of libgcov
where these functions are not available. That is why XNEW etc. are
redefined in that file.
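
For context, libgcov is linked into the profiled program, so it cannot
depend on libiberty's aborting allocators. A minimal sketch of what such
local redefinitions could look like, assuming only malloc/calloc/realloc
from libc. The macro bodies and the helper name local_strdup are
illustrative assumptions and are not copied from the patch:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the libiberty allocation macros, built on
   libc only.  Unlike xmalloc and friends these can yield NULL, so every
   caller has to check the result.  */
#define XNEW(T)             ((T *) malloc (sizeof (T)))
#define XNEWVEC(T, N)       ((T *) malloc (sizeof (T) * (N)))
#define XCNEWVEC(T, N)      ((T *) calloc ((N), sizeof (T)))
#define XRESIZEVEC(T, P, N) ((T *) realloc ((void *) (P), sizeof (T) * (N)))
#define XDELETE(P)          free ((void *) (P))

/* Hypothetical replacement for xstrdup.  Returns a heap copy of S, or
   NULL if the allocation fails.  */
static char *
local_strdup (const char *s)
{
  size_t len;
  char *copy;

  assert (s != NULL);
  len = strlen (s) + 1;          /* Include the terminating NUL.  */
  copy = XNEWVEC (char, len);
  if (copy)
    memcpy (copy, s, len);
  return copy;
}
```

The main difference from the libiberty versions is the failure mode:
these return NULL instead of aborting, so the calling code needs explicit
checks.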

Regards,
Sharad


2011-06-20   Sharad Singhai  <singhai@google.com>

	* libgcc/Makefile.in: Add pmu-profile.o to libgcov.
	* gcc/doc/invoke.texi: Document new pmu profile related options.
	* gcc/doc/gcov.texi: Document new options -m and -q.
	* gcc/gcc.c: Link libgcov for -fpmu-profile-generate option.
	* gcc/gcov.c (filter_pmu_data_lines): New function.
	(output_pmu_data_header): ditto.
	(output_pmu_data): ditto.
	(output_load_latency_line): ditto.
	(output_branch_mispredict_line): ditto.
	(process_pmu_profile): ditto.
	* gcc/gcov-io.c (gcov_canonical_filename): New function.
	(gcov_read_pmu_load_latency_info): ditto.
	(gcov_read_pmu_branch_mispredict_info): ditto.
	(gcov_read_pmu_tool_header): ditto.
	(gcov_string_length): ditto.
	(convert_unsigned_to_pct): ditto.
	(print_load_latency_line): ditto.
	(print_branch_mispredict_line): ditto.
	(print_pmu_tool_header): ditto.
	(destroy_pmu_tool_header): ditto.
	(gcov_read_string): Make it available unconditionally.
	* gcc/gcov-io.h (struct gcov_pmu_info): New structure.
	* gcc/opts.c: New option -fpmu-profile-generate.
	* gcc/pmu-profile.c (enum pmu_tool_type): New enum.
	(enum pmu_event_type): ditto.
	(enum pmu_state): ditto.
	(enum cpu_vendor_signature): ditto.
	(struct pmu_tool_info): New structure.
	(get_x86cpu_vendor): New function.
	(parse_pmu_profile_options): ditto.
	(start_addr2line_symbolizer): ditto.
	(reset_symbolizer_parent_pipes): ditto.
	(reset_symbolizer_child_pipes): ditto.
	(end_addr2line_symbolizer): ditto.
	(symbolize_addr2line): ditto.
	(start_pfmon_module): ditto.
	(convert_pct_to_unsigned): ditto.
	(parse_load_latency_line): ditto.
	(parse_branch_mispredict_line): ditto.
	(destroy_load_latency_infos): ditto.
	(destroy_branch_mispredict_infos): ditto.
	(parse_pfmon_load_latency): ditto.
	(parse_pfmon_tool_header): ditto.
	(parse_pfmon_branch_mispredicts): ditto.
	(pmu_start): ditto.
	(init_pmu_load_latency): ditto.
	(init_pmu_branch_mispredict): ditto.
	(init_pmu_tool): ditto.
	(__gcov_init_pmu_profiler): ditto.
	(__gcov_start_pmu_profiler): ditto.
	(__gcov_stop_pmu_profiler): ditto.
	(gcov_write_ll_line): ditto.
	(gcov_write_branch_mispredict_line): ditto.
	(gcov_write_load_latency_infos): ditto.
	(gcov_write_branch_mispredict_infos): ditto.
	(gcov_tag_pmu_tool_header_length): ditto.
	(gcov_write_tool_header): ditto.
	(__gcov_end_pmu_profiler): ditto.
	* gcc/coverage.c (get_const_string_type): New function.
	(create_coverage): Do the coverage processing even if only
	flag_pmu_profile_generate is specified.
	(coverage_init): Call gimple_init_instrumentation_sampling from here instead
	of from tree-profile.c:gimple_init_edge_profiler.
	(get_da_file_name): Make extern.
	(profiling_enabled_p): New function.
	(init_pmu_profiling): ditto.
	(check_pmu_profile_options): ditto.
	* gcc/coverage.h (get_da_file_name): Make it extern.
	* gcc/common.opt: Add new options -fpmu-profile-generate and
	-fpmu-profile-use.
	* gcc/tree-profile.c (gimple_init_instrumentation_sampling): Make
	extern. Move the call from gimple_init_edge_profiler to
	coverage.c:coverage_init.
	* gcc/libgcov.c (gcov_alloc_filename): Move earlier in file.
	(pmu_profile_stop): New function.
	(gcov_dump_module_info): Replace gcov_strip_leading_dirs with a macro.
	(__gcov_init): Add initialization of PMU profiler.
	(gcov_exit): Add finalization of PMU profiler.
	(gcov_get_filename): Clean up whitespace.
	* gcc/params.def: New parameter pmu_profile_n_addresses.
	* gcc/gcov-dump.c (tag_pmu_load_latency_info): New function.
	(tag_pmu_branch_mispredict_info): ditto.
	(tag_pmu_tool_header): ditto.


--
This patch is available for review at http://codereview.appspot.com/4638047
Xinliang David Li - June 20, 2011, 11:37 p.m.
OK for google/main once the required FDO/LIPO testing is done.

David


On Mon, Jun 20, 2011 at 3:17 PM, Sharad Singhai <singhai@google.com> wrote:
> Index: libgcc/Makefile.in
> ===================================================================
> --- libgcc/Makefile.in  (revision 175226)
> +++ libgcc/Makefile.in  (working copy)
> @@ -747,10 +747,13 @@
>  dyn-ipa.o: %$(objext): $(gcc_srcdir)/libgcov.c
>        $(gcc_compile)  -c $(gcc_srcdir)/dyn-ipa.c
>
> +pmu-profile.o: %$(objext): $(gcc_srcdir)/libgcov.c
> +       $(gcc_compile)  -c $(gcc_srcdir)/pmu-profile.c
>
> +
>  # Static libraries.
>  libgcc.a: $(libgcc-objects)
> -libgcov.a: $(libgcov-objects) dyn-ipa$(objext)
> +libgcov.a: $(libgcov-objects) dyn-ipa$(objext) pmu-profile$(objext)
>  libunwind.a: $(libunwind-objects)
>  libgcc_eh.a: $(libgcc-eh-objects)
>
> Index: gcc/doc/invoke.texi
> ===================================================================
> --- gcc/doc/invoke.texi (revision 175226)
> +++ gcc/doc/invoke.texi (working copy)
> @@ -388,6 +388,8 @@
>  -fprofile-correction -fprofile-dir=@var{path} -fprofile-generate @gol
>  -fprofile-generate=@var{path} -fprofile-generate-sampling @gol
>  -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
> +-fpmu-profile-generate=@var{pmuoption} @gol
> +-fpmu-profile-use=@var{pmuoption} @gol
>  -freciprocal-math -fregmove -frename-registers -freorder-blocks @gol
>  -freorder-blocks-and-partition -freorder-functions @gol
>  -frerun-cse-after-loop -freschedule-modulo-scheduled-loops @gol
> @@ -8088,6 +8090,26 @@
>  If @var{path} is specified, GCC will look at the @var{path} to find
>  the profile feedback data files. See @option{-fprofile-dir}.
>
> +@item -fpmu-profile-generate=@var{pmuoption}
> +@opindex fpmu-profile-generate
> +
> +Enable performance monitoring unit (PMU) profiling.  This collects
> +hardware counter data corresponding to @var{pmuoption}.  Currently
> +only @var{load-latency} and @var{branch-mispredict} are supported,
> +using the pfmon tool.  You must use @option{-fpmu-profile-generate} both
> +when compiling and when linking your program.  The PMU profile data
> +may later be used by the compiler during optimizations and can also be
> +displayed using the coverage tool gcov.  The params variable
> +"pmu_profile_n_addresses" can be used to restrict PMU data collection
> +to at most that many addresses.
> +
> +@item -fpmu-profile-use=@var{pmuoption}
> +@opindex fpmu-profile-use
> +
> +Enable optimizations based on performance monitoring unit (PMU)
> +profile data.  Currently only @var{load-latency} and
> +@var{branch-mispredict} are supported.
> +
>  @item -fripa
>  @opindex fripa
>  Perform dynamic inter-procedural analysis. This is used in conjunction with
> Index: gcc/doc/gcov.texi
> ===================================================================
> --- gcc/doc/gcov.texi   (revision 175226)
> +++ gcc/doc/gcov.texi   (working copy)
> @@ -124,9 +124,11 @@
>      [@option{-a}|@option{--all-blocks}]
>      [@option{-b}|@option{--branch-probabilities}]
>      [@option{-c}|@option{--branch-counts}]
> +     [@option{-m}|@option{--pmu-profile}]
>      [@option{-n}|@option{--no-output}]
>      [@option{-l}|@option{--long-file-names}]
>      [@option{-p}|@option{--preserve-paths}]
> +     [@option{-q}|@option{--pmu_profile-path}]
>      [@option{-f}|@option{--function-summaries}]
>      [@option{-o}|@option{--object-directory} @var{directory|file}] @var{sourcefiles}
>      [@option{-u}|@option{--unconditional-branches}]
> @@ -169,6 +171,14 @@
>  Write branch frequencies as the number of branches taken, rather than
>  the percentage of branches taken.
>
> +@item -m
> +@itemx --pmu-profile
> +Output the additional PMU profile information if available.
> +
> +@item -q
> +@itemx --pmu_profile-path
> +PMU profile path (default @file{pmuprofile.gcda}).
> +
>  @item -n
>  @itemx --no-output
>  Do not create the @command{gcov} output file.
> Index: gcc/gcc.c
> ===================================================================
> --- gcc/gcc.c   (revision 175226)
> +++ gcc/gcc.c   (working copy)
> @@ -662,7 +662,7 @@
>     %{static:} %{L*} %(mfwrap) %(link_libgcc) %o\
>     %{fopenmp|ftree-parallelize-loops=*:%:include(libgomp.spec)%(link_gomp)}\
>     %(mflib) " STACK_SPLIT_SPEC "\
> -    %{fprofile-arcs|fprofile-generate*|coverage:-lgcov}\
> +    %{fprofile-arcs|fprofile-generate*|fpmu-profile-generate*|coverage:-lgcov}\
>     %{!nostdlib:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}\
>     %{!nostdlib:%{!nostartfiles:%E}} %{T*} }}}}}}"
>  #endif
> Index: gcc/gcov.c
> ===================================================================
> --- gcc/gcov.c  (revision 175226)
> +++ gcc/gcov.c  (working copy)
> @@ -209,6 +209,15 @@
>   char *name;
>  } coverage_t;
>
> +/* Describes PMU profile data for either one source file or for the
> +   entire program.  */
> +
> +typedef struct pmu_data
> +{
> +  ll_infos_t ll_infos;
> +  brm_infos_t brm_infos;
> +} pmu_data_t;
> +
>  /* Describes a single line of source. Contains a chain of basic blocks
>    with code on it.  */
>
> @@ -242,6 +251,8 @@
>
>   coverage_t coverage;
>
> +  pmu_data_t *pmu_data;    /* PMU profile information for this file.  */
> +
>   /* Functions in this source file.  These are in ascending line
>      number order.  */
>   function_t *functions;
> @@ -301,6 +312,10 @@
>  /* Show unconditional branches too.  */
>  static int flag_unconditional = 0;
>
> +/* Output performance monitoring unit (PMU) data, if available.  */
> +
> +static int flag_pmu_profile = 0;
> +
>  /* Output a gcov file if this is true.  This is on by default, and can
>    be turned off by the -n option.  */
>
> @@ -345,6 +360,18 @@
>
>  static int flag_counts = 0;
>
> +/* PMU profile default filename.  */
> +
> +static char pmu_profile_default_filename[] = "pmuprofile.gcda";
> +
> +/* PMU profile filename where the PMU profile data is read from.  */
> +
> +static char *pmu_profile_filename = 0;
> +
> +/* PMU data for the entire program.  */
> +
> +static pmu_data_t pmu_global_info;
> +
>  /* Forward declarations.  */
>  static void fnotice (FILE *, const char *, ...) ATTRIBUTE_PRINTF_2;
>  static int process_args (int, char **);
> @@ -366,6 +393,17 @@
>  static void output_lines (FILE *, const source_t *);
>  static char *make_gcov_file_name (const char *, const char *);
>  static void release_structures (void);
> +static void process_pmu_profile (void);
> +static void filter_pmu_data_lines (source_t *src);
> +static void output_pmu_data_header (FILE *gcov_file, pmu_data_t *pmu_data);
> +static void output_pmu_data (FILE *gcov_file, const source_t *src,
> +                             const unsigned line_num);
> +static void output_load_latency_line (FILE *fp,
> +                                      const gcov_pmu_ll_info_t *ll_info,
> +                                      gcov_pmu_tool_header_t *tool_header);
> +static void output_branch_mispredict_line (FILE *fp,
> +                                           const gcov_pmu_brm_info_t *brm_info);
> +
>  extern int main (int, char **);
>
>  int
> @@ -389,6 +427,15 @@
>   if (argc - argno > 1)
>     multiple_files = 1;
>
> +  /*  We read pmu profile first because we later filter
> +      src:line_numbers for each source.  */
> +  if (flag_pmu_profile)
> +    {
> +      if (!pmu_profile_filename)
> +        pmu_profile_filename = pmu_profile_default_filename;
> +      process_pmu_profile ();
> +    }
> +
>   first_arg = argno;
>
>   for (; argno != argc; argno++)
> @@ -433,12 +480,14 @@
>   fnotice (file, "  -b, --branch-probabilities      Include branch probabilities in output\n");
>   fnotice (file, "  -c, --branch-counts             Given counts of branches taken\n\
>                                     rather than percentages\n");
> +  fnotice (file, "  -m, --pmu-profile               Output PMU profile data if available\n");
>   fnotice (file, "  -n, --no-output                 Do not create an output file\n");
>   fnotice (file, "  -l, --long-file-names           Use long output file names for included\n\
>                                     source files\n");
>   fnotice (file, "  -f, --function-summaries        Output summaries for each function\n");
>   fnotice (file, "  -o, --object-directory DIR|FILE Search for object files in DIR or called FILE\n");
>   fnotice (file, "  -p, --preserve-paths            Preserve all pathname components\n");
> +  fnotice (file, "  -q, --pmu_profile-path          Path for PMU profile (default pmuprofile.gcda)\n");
>   fnotice (file, "  -u, --unconditional-branches    Show unconditional branch counts too\n");
>   fnotice (file, "  -i, --intermediate-format       Output .gcov file in an intermediate text\n\
>                                     format that can be used by 'lcov' or other\n\
> @@ -473,6 +522,7 @@
>   { "all-blocks",           no_argument,       NULL, 'a' },
>   { "branch-probabilities", no_argument,       NULL, 'b' },
>   { "branch-counts",        no_argument,       NULL, 'c' },
> +  { "pmu-profile",          no_argument,       NULL, 'm' },
>   { "no-output",            no_argument,       NULL, 'n' },
>   { "long-file-names",      no_argument,       NULL, 'l' },
>   { "function-summaries",   no_argument,       NULL, 'f' },
> @@ -480,6 +530,7 @@
>   { "object-directory",     required_argument, NULL, 'o' },
>   { "object-file",          required_argument, NULL, 'o' },
>   { "unconditional-branches", no_argument,     NULL, 'u' },
> +  { "pmu_profile-path",     required_argument, NULL, 'q' },
>   { "display-progress",     no_argument,       NULL, 'd' },
>   { "intermediate-format",  no_argument,       NULL, 'i' },
>   { 0, 0, 0, 0 }
> @@ -492,7 +543,7 @@
>  {
>   int opt;
>
> -  while ((opt = getopt_long (argc, argv, "abcdfhilno:puv", options, NULL)) !=
> +  while ((opt = getopt_long (argc, argv, "abcdfhilmno:pq:uv", options, NULL)) !=
>          -1)
>     {
>       switch (opt)
> @@ -515,6 +566,9 @@
>        case 'l':
>          flag_long_names = 1;
>          break;
> +       case 'm':
> +         flag_pmu_profile = 1;
> +         break;
>        case 'n':
>          flag_gcov_file = 0;
>          break;
> @@ -524,6 +578,9 @@
>        case 'p':
>          flag_preserve_paths = 1;
>          break;
> +       case 'q':
> +         pmu_profile_filename = optarg;
> +         break;
>        case 'u':
>          flag_unconditional = 1;
>          break;
> @@ -766,6 +823,8 @@
>  {
>   function_t *fn;
>   source_t *src;
> +  ll_infos_t *ll_infos = &pmu_global_info.ll_infos;
> +  brm_infos_t *brm_infos = &pmu_global_info.brm_infos;
>
>   while ((src = sources))
>     {
> @@ -773,6 +832,14 @@
>
>       free (src->name);
>       free (src->lines);
> +      if (src->pmu_data)
> +        {
> +          if (src->pmu_data->ll_infos.ll_array)
> +            free (src->pmu_data->ll_infos.ll_array);
> +          if (src->pmu_data->brm_infos.brm_array)
> +            free (src->pmu_data->brm_infos.brm_array);
> +          free (src->pmu_data);
> +        }
>     }
>
>   while ((fn = functions))
> @@ -794,6 +861,42 @@
>       free (fn->blocks);
>       free (fn->counts);
>     }
> +
> +  /* Cleanup PMU load latency info.  */
> +  if (ll_infos->ll_count)
> +    {
> +      unsigned i;
> +
> +      /* delete each element */
> +      for (i = 0; i < ll_infos->ll_count; ++i)
> +        {
> +          if (ll_infos->ll_array[i]->filename)
> +            XDELETE (ll_infos->ll_array[i]->filename);
> +          XDELETE (ll_infos->ll_array[i]);
> +        }
> +      /* delete the array itself */
> +      XDELETE (ll_infos->ll_array);
> +      ll_infos->ll_array = NULL;
> +      ll_infos->ll_count = 0;
> +    }
> +
> +  /* Cleanup PMU branch mispredict info.  */
> +  if (brm_infos->brm_count)
> +    {
> +      unsigned i;
> +
> +      /* delete each element */
> +      for (i = 0; i < brm_infos->brm_count; ++i)
> +        {
> +          if (brm_infos->brm_array[i]->filename)
> +            XDELETE (brm_infos->brm_array[i]->filename);
> +          XDELETE (brm_infos->brm_array[i]);
> +        }
> +      /* delete the array itself */
> +      XDELETE (brm_infos->brm_array);
> +      brm_infos->brm_array = NULL;
> +      brm_infos->brm_count = 0;
> +    }
>  }
>
>  /* Generate the names of the graph and data files. If OBJECT_DIRECTORY
> @@ -890,6 +993,7 @@
>       src->coverage.name = src->name;
>       src->index = source_index++;
>       src->next = sources;
> +      src->pmu_data = 0;
>       sources = src;
>
>       if (!stat (file_name, &status))
> @@ -1806,6 +1910,140 @@
>     fnotice (stderr, "%s:no lines for '%s'\n", bbg_file_name, fn->name);
>  }
>
> +/* Filter PMU profile global data for lines for SRC.  Save PMU info
> +   matching the source file and sort them by line number for later
> +   line by line processing.  */
> +
> +static void
> +filter_pmu_data_lines (source_t *src)
> +{
> +  unsigned i;
> +  int changed;
> +  ll_infos_t *ll_infos;         /* load latency information for this source */
> +  brm_infos_t *brm_infos;  /* branch mispredict information for this source */
> +
> +  if (pmu_global_info.ll_infos.ll_count == 0 &&
> +      pmu_global_info.brm_infos.brm_count == 0)
> +    /* If there are no global entries, there is nothing to filter.  */
> +    return;
> +
> +  src->pmu_data = XCNEW (pmu_data_t);
> +  ll_infos = &src->pmu_data->ll_infos;
> +  brm_infos = &src->pmu_data->brm_infos;
> +  ll_infos->pmu_tool_header = pmu_global_info.ll_infos.pmu_tool_header;
> +  brm_infos->pmu_tool_header = pmu_global_info.brm_infos.pmu_tool_header;
> +  ll_infos->ll_array = 0;
> +  brm_infos->brm_array = 0;
> +
> +  /* Go over all the load latency entries and save the ones
> +     corresponding to this source file.  */
> +  for (i = 0; i < pmu_global_info.ll_infos.ll_count; ++i)
> +    {
> +      gcov_pmu_ll_info_t *ll_info = pmu_global_info.ll_infos.ll_array[i];
> +      if (0 == strcmp (src->name, ll_info->filename))
> +        {
> +          if (!ll_infos->ll_array)
> +            {
> +              ll_infos->ll_count = 0;
> +              ll_infos->alloc_ll_count = 64;
> +              ll_infos->ll_array = XCNEWVEC (gcov_pmu_ll_info_t *,
> +                                             ll_infos->alloc_ll_count);
> +            }
> +          /* Found a matching entry, save it.  */
> +          ll_infos->ll_count++;
> +          if (ll_infos->ll_count >= ll_infos->alloc_ll_count)
> +            {
> +              ll_infos->alloc_ll_count *= 2;  /* Double the capacity.  */
> +              ll_infos->ll_array = XRESIZEVEC (gcov_pmu_ll_info_t *,
> +                ll_infos->ll_array, ll_infos->alloc_ll_count);
> +            }
> +          ll_infos->ll_array[ll_infos->ll_count - 1] = ll_info;
> +        }
> +    }
> +
> +  /* Go over all the branch mispredict entries and save the ones
> +     corresponding to this source file.  */
> +  for (i = 0; i < pmu_global_info.brm_infos.brm_count; ++i)
> +    {
> +      gcov_pmu_brm_info_t *brm_info = pmu_global_info.brm_infos.brm_array[i];
> +      if (0 == strcmp (src->name, brm_info->filename))
> +        {
> +          if (!brm_infos->brm_array)
> +            {
> +              brm_infos->brm_count = 0;
> +              brm_infos->alloc_brm_count = 64;
> +              brm_infos->brm_array = XCNEWVEC (gcov_pmu_brm_info_t *,
> +                                               brm_infos->alloc_brm_count);
> +            }
> +          /* Found a matching entry, save it.  */
> +          brm_infos->brm_count++;
> +          if (brm_infos->brm_count >= brm_infos->alloc_brm_count)
> +            {
> +              brm_infos->alloc_brm_count *= 2;  /* Double the capacity.  */
> +              brm_infos->brm_array = XRESIZEVEC (gcov_pmu_brm_info_t *,
> +                brm_infos->brm_array, brm_infos->alloc_brm_count);
> +            }
> +          brm_infos->brm_array[brm_infos->brm_count - 1] = brm_info;
> +        }
> +    }
> +
> +  /* Sort the load latency data according to the line numbers because
> +     we later iterate over sources in line number order. Normally we
> +     expect the PMU tool to provide sorted data, but a few entries can
> +     be out of order. Thus we use a very simple bubble sort here.  */
> +  if (ll_infos->ll_count > 1)
> +    {
> +      changed = 1;
> +      while (changed)
> +        {
> +          changed = 0;
> +          for (i = 0; i < ll_infos->ll_count - 1; ++i)
> +            {
> +              gcov_pmu_ll_info_t *item1 = ll_infos->ll_array[i];
> +              gcov_pmu_ll_info_t *item2 = ll_infos->ll_array[i+1];
> +              if (item1->line > item2->line)
> +                {
> +                  /* swap */
> +                  gcov_pmu_ll_info_t *tmp = ll_infos->ll_array[i];
> +                  ll_infos->ll_array[i] = ll_infos->ll_array[i+1];
> +                  ll_infos->ll_array[i+1] = tmp;
> +                  changed = 1;
> +                }
> +            }
> +        }
> +    }
> +
> +  /* Similarly, sort branch mispredict info as well.  */
> +  if (brm_infos->brm_count > 1)
> +    {
> +      changed = 1;
> +      while (changed)
> +        {
> +          changed = 0;
> +          for (i = 0; i < brm_infos->brm_count - 1; ++i)
> +            {
> +              gcov_pmu_brm_info_t *item1 = brm_infos->brm_array[i];
> +              gcov_pmu_brm_info_t *item2 = brm_infos->brm_array[i+1];
> +              if (item1->line > item2->line)
> +                {
> +                  /* swap */
> +                  gcov_pmu_brm_info_t *tmp = brm_infos->brm_array[i];
> +                  brm_infos->brm_array[i] = brm_infos->brm_array[i+1];
> +                  brm_infos->brm_array[i+1] = tmp;
> +                  changed = 1;
> +                }
> +            }
> +        }
> +    }
> +
> +  /* If no matching PMU info was found, release the structures.  */
> +  if (!brm_infos->brm_array && !ll_infos->ll_array)
> +  {
> +    free (src->pmu_data);
> +    src->pmu_data = 0;
> +  }
> +}
> +
>  /* Accumulate the line counts of a file.  */
>
>  static void
> @@ -1815,6 +2053,10 @@
>   function_t *fn, *fn_p, *fn_n;
>   unsigned ix;
>
> +  if (flag_pmu_profile)
> +    /* Filter PMU profile by source files and save into matching line(s).  */
> +    filter_pmu_data_lines (src);
> +
>   /* Reverse the function order.  */
>   for (fn = src->functions, fn_p = NULL; fn;
>        fn_p = fn, fn = fn_n)
> @@ -2062,6 +2304,9 @@
>   else if (src->file_time == 0)
>     fprintf (gcov_file, "%9s:%5d:Source is newer than graph\n", "-", 0);
>
> +  if (src->pmu_data)
> +    output_pmu_data_header (gcov_file, src->pmu_data);
> +
>   if (flag_branches)
>     fn = src->functions;
>
> @@ -2139,6 +2384,10 @@
>          for (ix = 0, arc = line->u.branches; arc; arc = arc->line_next)
>            ix += output_branch_count (gcov_file, ix, arc);
>        }
> +
> +      /* Output PMU profile info if available.  */
> +      if (flag_pmu_profile)
> +        output_pmu_data (gcov_file, src, line_num);
>     }
>
>   /* Handle all remaining source lines.  There may be lines after the
> @@ -2162,3 +2411,236 @@
>   if (source_file)
>     fclose (source_file);
>  }
> +
> +/* Print an explanatory header for PMU_DATA into GCOV_FILE.  */
> +
> +static void
> +output_pmu_data_header (FILE *gcov_file, pmu_data_t *pmu_data)
> +{
> +  /* Print header for the applicable PMU events.  */
> +  fprintf (gcov_file, "%9s:%5d\n", "-", 0);
> +  if (pmu_data->ll_infos.ll_count)
> +    {
> +      char *text = pmu_data->ll_infos.pmu_tool_header->column_description;
> +      char c;
> +      fprintf (gcov_file, "%9s:%5u: %s", "PMU_LL", 0,
> +               pmu_data->ll_infos.pmu_tool_header->column_header);
> +      /* The column description is multiline text and we want to print
> +         each line separately after formatting it.  */
> +      fprintf (gcov_file, "%9s:%5u: ", "PMU_LL", 0);
> +      while ((c = *text++))
> +        {
> +          fprintf (gcov_file, "%c", c);
> +          /* Do not print a new header on trailing newline.   */
> +          if (c == '\n' && *text)
> +            fprintf (gcov_file, "%9s:%5u: ", "PMU_LL", 0);
> +        }
> +      fprintf (gcov_file, "%9s:%5d\n", "-", 0);
> +    }
> +
> +  if (pmu_data->brm_infos.brm_count)
> +    {
> +
> +      fprintf (gcov_file, "%9s:%5d:PMU BRM: line: %s %s %s\n",
> +               "-", 0, "count", "self", "address");
> +      fprintf (gcov_file, "%9s:%5d:         "
> +               "count: number of branch mispredicts sampled at this address\n",
> +               "-", 0);
> +      fprintf (gcov_file, "%9s:%5d:         "
> +               "self: branch mispredicts as percentage of the entire program\n",
> +               "-", 0);
> +      fprintf (gcov_file, "%9s:%5d\n", "-", 0);
> +    }
> +}
> +
> +/* Output pmu data corresponding to SRC and LINE_NUM into GCOV_FILE.  */
> +
> +static void
> +output_pmu_data (FILE *gcov_file, const source_t *src, const unsigned line_num)
> +{
> +  unsigned i;
> +  ll_infos_t *ll_infos;
> +  brm_infos_t *brm_infos;
> +  gcov_pmu_tool_header_t *tool_header;
> +
> +  if (!src->pmu_data)
> +    return;
> +
> +  ll_infos = &src->pmu_data->ll_infos;
> +  brm_infos = &src->pmu_data->brm_infos;
> +
> +  if (ll_infos->ll_array)
> +    {
> +      tool_header = src->pmu_data->ll_infos.pmu_tool_header;
> +
> +      /* Search PMU load latency data for the matching line
> +         numbers. There could be multiple entries with the same line
> +         number. We use the fact that line numbers are sorted in
> +         ll_array.  */
> +      for (i = 0; i < ll_infos->ll_count &&
> +             ll_infos->ll_array[i]->line <= line_num; ++i)
> +        {
> +          gcov_pmu_ll_info_t *ll_info = ll_infos->ll_array[i];
> +          if (ll_info->line == line_num)
> +            output_load_latency_line (gcov_file, ll_info, tool_header);
> +        }
> +    }
> +
> +  if (brm_infos->brm_array)
> +    {
> +      tool_header = src->pmu_data->brm_infos.pmu_tool_header;
> +
> +      /* Search PMU branch mispredict data for the matching line
> +         numbers. There could be multiple entries with the same line
> +         number. We use the fact that line numbers are sorted in
> +         brm_array.  */
> +      for (i = 0; i < brm_infos->brm_count &&
> +             brm_infos->brm_array[i]->line <= line_num; ++i)
> +        {
> +          gcov_pmu_brm_info_t *brm_info = brm_infos->brm_array[i];
> +          if (brm_info->line == line_num)
> +            output_branch_mispredict_line (gcov_file, brm_info);
> +        }
> +    }
> +}
> +
> +
> +/* Output formatted load latency info pointed to by LL_INFO into the
> +   open file FP.  TOOL_HEADER contains additional explanation of
> +   fields.  */
> +
> +static void
> +output_load_latency_line (FILE *fp, const gcov_pmu_ll_info_t *ll_info,
> +                          gcov_pmu_tool_header_t *tool_header ATTRIBUTE_UNUSED)
> +{
> +  fprintf (fp, "%9s:%5u:      ", "PMU_LL", ll_info->line);
> +  fprintf (fp, " %u %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% "
> +           "%.2f%% %.2f%% " HOST_WIDEST_INT_PRINT_HEX "\n",
> +           ll_info->counts,
> +           convert_unsigned_to_pct (ll_info->self),
> +           convert_unsigned_to_pct (ll_info->cum),
> +           convert_unsigned_to_pct (ll_info->lt_10),
> +           convert_unsigned_to_pct (ll_info->lt_32),
> +           convert_unsigned_to_pct (ll_info->lt_64),
> +           convert_unsigned_to_pct (ll_info->lt_256),
> +           convert_unsigned_to_pct (ll_info->lt_1024),
> +           convert_unsigned_to_pct (ll_info->gt_1024),
> +           convert_unsigned_to_pct (ll_info->wself),
> +           ll_info->code_addr);
> +}
> +
> +
> +/* Output formatted branch mispredict info pointed to by BRM_INFO into
> +   the open file FP.  */
> +
> +static void
> +output_branch_mispredict_line (FILE *fp,
> +                               const gcov_pmu_brm_info_t *brm_info)
> +{
> +  fprintf (fp, "%9s:%5u: count: %u self: %.2f%% addr: "
> +           HOST_WIDEST_INT_PRINT_HEX "\n",
> +           "PMU BRM",
> +           brm_info->line,
> +           brm_info->counts,
> +           convert_unsigned_to_pct (brm_info->self),
> +           brm_info->code_addr);
> +}
> +
> +/* Read in the PMU profile information from the global PMU profile file.  */
> +
> +static void process_pmu_profile (void)
> +{
> +  unsigned tag;
> +  unsigned version;
> +  int error = 0;
> +  ll_infos_t *ll_infos = &pmu_global_info.ll_infos;
> +  brm_infos_t *brm_infos = &pmu_global_info.brm_infos;
> +
> +  /* Construct path for pmuprofile.gcda filename. */
> +  create_file_names (pmu_profile_filename);
> +  if (!gcov_open (da_file_name, 1))
> +    {
> +      fnotice (stderr, "%s:cannot open pmu profile file\n",
> +               pmu_profile_filename);
> +      return;
> +    }
> +  if (!gcov_magic (gcov_read_unsigned (), GCOV_DATA_MAGIC))
> +    {
> +      fnotice (stderr, "%s:not a gcov data file\n", da_file_name);
> +    cleanup:;
> +      gcov_close ();
> +      return;
> +    }
> +  version = gcov_read_unsigned ();
> +  if (version != GCOV_VERSION)
> +    {
> +      char v[4], e[4];
> +
> +      GCOV_UNSIGNED2STRING (v, version);
> +      GCOV_UNSIGNED2STRING (e, GCOV_VERSION);
> +      fnotice (stderr, "%s:version '%.4s', prefer version '%.4s'\n",
> +              da_file_name, v, e);
> +    }
> +  /* read stamp */
> +  tag = gcov_read_unsigned ();
> +
> +  /* Initialize PMU data fields. */
> +  ll_infos->ll_count = 0;
> +  ll_infos->alloc_ll_count = 64;
> +  ll_infos->ll_array = XCNEWVEC (gcov_pmu_ll_info_t *, ll_infos->alloc_ll_count);
> +
> +  brm_infos->brm_count = 0;
> +  brm_infos->alloc_brm_count = 64;
> +  brm_infos->brm_array = XCNEWVEC (gcov_pmu_brm_info_t *,
> +                                   brm_infos->alloc_brm_count);
> +
> +  while ((tag = gcov_read_unsigned ()))
> +    {
> +      unsigned length = gcov_read_unsigned ();
> +      unsigned long base = gcov_position ();
> +
> +      if (tag == GCOV_TAG_PMU_LOAD_LATENCY_INFO)
> +        {
> +          gcov_pmu_ll_info_t *ll_info = XCNEW (gcov_pmu_ll_info_t);
> +          gcov_read_pmu_load_latency_info (ll_info, length);
> +          ll_infos->ll_count++;
> +          if (ll_infos->ll_count >= ll_infos->alloc_ll_count)
> +            {
> +              ll_infos->alloc_ll_count *= 2;  /* grow by doubling */
> +              ll_infos->ll_array = XRESIZEVEC (gcov_pmu_ll_info_t *,
> +                ll_infos->ll_array, ll_infos->alloc_ll_count);
> +            }
> +          ll_infos->ll_array[ll_infos->ll_count - 1] = ll_info;
> +        }
> +      else if (tag == GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO)
> +        {
> +          gcov_pmu_brm_info_t *brm_info = XCNEW (gcov_pmu_brm_info_t);
> +          gcov_read_pmu_branch_mispredict_info (brm_info, length);
> +          brm_infos->brm_count++;
> +          if (brm_infos->brm_count >= brm_infos->alloc_brm_count)
> +            {
> +              brm_infos->alloc_brm_count *= 2;  /* grow by doubling */
> +              brm_infos->brm_array = XRESIZEVEC (gcov_pmu_brm_info_t *,
> +                brm_infos->brm_array, brm_infos->alloc_brm_count);
> +            }
> +          brm_infos->brm_array[brm_infos->brm_count - 1] = brm_info;
> +        }
> +      else if (tag == GCOV_TAG_PMU_TOOL_HEADER)
> +        {
> +          gcov_pmu_tool_header_t *tool_header = XCNEW (gcov_pmu_tool_header_t);
> +          gcov_read_pmu_tool_header (tool_header, length);
> +          ll_infos->pmu_tool_header = tool_header;
> +          brm_infos->pmu_tool_header = tool_header;
> +        }
> +
> +      gcov_sync (base, length);
> +      if ((error = gcov_is_error ()))
> +       {
> +         fnotice (stderr, error < 0 ? "%s:overflowed\n" : "%s:corrupted\n",
> +                  da_file_name);
> +         goto cleanup;
> +       }
> +    }
> +
> +  gcov_close ();
> +}
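
The read loop above appends entries to arrays that grow on demand. When growing by doubling, two details matter: the size handed to the allocator is in bytes, so it must be the new capacity times the element size, and the stored capacity has to be updated along with the pointer. A minimal sketch using plain `realloc`; the `ptr_vec_t` type and `ptr_vec_push` function are hypothetical and not part of the patch.

```c
#include <stdlib.h>

/* Hypothetical growable array of pointers; names are illustrative only.  */
typedef struct
{
  void **items;
  unsigned count;   /* entries in use */
  unsigned alloc;   /* entries allocated */
} ptr_vec_t;

/* Append ITEM to VEC, doubling the capacity when the array is full.
   Returns 0 on success and -1 if memory could not be obtained.  */
static int
ptr_vec_push (ptr_vec_t *vec, void *item)
{
  if (vec->count == vec->alloc)
    {
      unsigned new_alloc = vec->alloc ? 2 * vec->alloc : 64;
      /* realloc takes a size in bytes, so scale by the element size.  */
      void **p = (void **) realloc (vec->items,
                                    new_alloc * sizeof (void *));
      if (!p)
        return -1;
      vec->items = p;
      vec->alloc = new_alloc;   /* keep the bookkeeping in sync */
    }
  vec->items[vec->count++] = item;
  return 0;
}
```

Inside gcov itself, the libiberty macro XRESIZEVEC does the element-size scaling.
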
> Index: gcc/gcov-io.c
> ===================================================================
> --- gcc/gcov-io.c       (revision 175226)
> +++ gcc/gcov-io.c       (working copy)
> @@ -23,6 +23,12 @@
>  /* Routines declared in gcov-io.h.  This file should be #included by
>    another source file, after having #included gcov-io.h.  */
>
> +/* Redefine these here, rather than using the ones in system.h since
> +   including system.h leads to conflicting definitions of other
> +   symbols and macros.  */
> +#undef MIN
> +#define MIN(X,Y) ((X) < (Y) ? (X) : (Y))
> +
>  #if !IN_GCOV
>  static void gcov_write_block (unsigned);
>  static gcov_unsigned_t *gcov_write_words (unsigned);
> @@ -197,6 +203,104 @@
>  }
>
>  #if !IN_LIBGCOV
> +/* Modify FILENAME to a canonical form by stripping known prefixes in
> +   place.  It removes a leading '/proc/self/cwd/./' or
> +   '/proc/self/cwd/'.  Returns the in-place modified filename.  */
> +
> +GCOV_LINKAGE char *
> +gcov_canonical_filename (char *filename)
> +{
> +  static char cwd_dot_str[] = "/proc/self/cwd/./";
> +  int cwd_dot_len = strlen (cwd_dot_str);
> +  int cwd_len = cwd_dot_len - 2; /* without trailing './' */
> +  int filename_len = strlen (filename);
> +  /* delete the longer prefix first */
> +  if (0 == strncmp (filename, cwd_dot_str, cwd_dot_len))
> +    {
> +      memmove (filename, filename + cwd_dot_len, filename_len - cwd_dot_len);
> +      filename[filename_len - cwd_dot_len] = '\0';
> +      return filename;
> +    }
> +
> +  if (0 == strncmp (filename, cwd_dot_str, cwd_len))
> +    {
> +      memmove (filename, filename + cwd_len, filename_len - cwd_len);
> +      filename[filename_len - cwd_len] = '\0';
> +      return filename;
> +    }
> +  return filename;
> +}
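
As written, the function strips a leading "/proc/self/cwd/./" or "/proc/self/cwd/", trying the longer prefix first. The same behavior can be sketched with a small prefix table, which makes the ordering explicit and moves the terminating NUL together with the tail. `strip_cwd_prefix` is a hypothetical name, not a proposed replacement.

```c
#include <string.h>

/* Strip a leading "/proc/self/cwd/./" or "/proc/self/cwd/" from PATH in
   place and return PATH.  Hypothetical helper for illustration.  */
static char *
strip_cwd_prefix (char *path)
{
  static const char *const prefixes[] = {
    "/proc/self/cwd/./",   /* try the longer prefix first */
    "/proc/self/cwd/"
  };
  size_t i;

  for (i = 0; i < sizeof (prefixes) / sizeof (prefixes[0]); i++)
    {
      size_t plen = strlen (prefixes[i]);
      if (strncmp (path, prefixes[i], plen) == 0)
        {
          /* The + 1 moves the terminating NUL along with the tail.  */
          memmove (path, path + plen, strlen (path + plen) + 1);
          break;
        }
    }
  return path;
}
```

For example, "/proc/self/cwd/./foo/bar.c" becomes "foo/bar.c", while a path without either prefix is returned unchanged.
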
> +
> +/* Read LEN words and construct load latency info LL_INFO.  */
> +
> +GCOV_LINKAGE void
> +gcov_read_pmu_load_latency_info (gcov_pmu_ll_info_t *ll_info,
> +                                 gcov_unsigned_t len ATTRIBUTE_UNUSED)
> +{
> +  const char *filename;
> +  ll_info->counts = gcov_read_unsigned ();
> +  ll_info->self = gcov_read_unsigned ();
> +  ll_info->cum = gcov_read_unsigned ();
> +  ll_info->lt_10 = gcov_read_unsigned ();
> +  ll_info->lt_32 = gcov_read_unsigned ();
> +  ll_info->lt_64 = gcov_read_unsigned ();
> +  ll_info->lt_256 = gcov_read_unsigned ();
> +  ll_info->lt_1024 = gcov_read_unsigned ();
> +  ll_info->gt_1024 = gcov_read_unsigned ();
> +  ll_info->wself = gcov_read_unsigned ();
> +  ll_info->code_addr = gcov_read_counter ();
> +  ll_info->line = gcov_read_unsigned ();
> +  ll_info->discriminator = gcov_read_unsigned ();
> +  filename = gcov_read_string ();
> +  if (filename)
> +    ll_info->filename = gcov_canonical_filename (xstrdup (filename));
> +  else
> +    ll_info->filename = 0;
> +}
> +
> +/* Read LEN words and construct branch mispredict info BRM_INFO.  */
> +
> +GCOV_LINKAGE void
> +gcov_read_pmu_branch_mispredict_info (gcov_pmu_brm_info_t *brm_info,
> +                                      gcov_unsigned_t len ATTRIBUTE_UNUSED)
> +{
> +  const char *filename;
> +  brm_info->counts = gcov_read_unsigned ();
> +  brm_info->self = gcov_read_unsigned ();
> +  brm_info->cum = gcov_read_unsigned ();
> +  brm_info->code_addr = gcov_read_counter ();
> +  brm_info->line = gcov_read_unsigned ();
> +  brm_info->discriminator = gcov_read_unsigned ();
> +  filename = gcov_read_string ();
> +  if (filename)
> +    brm_info->filename = gcov_canonical_filename (xstrdup (filename));
> +  else
> +    brm_info->filename = 0;
> +}
> +
> +/* Read LEN words from an open gcov file and construct data into pmu
> +   tool header HEADER.  */
> +
> +GCOV_LINKAGE void gcov_read_pmu_tool_header (gcov_pmu_tool_header_t *header,
> +                                           gcov_unsigned_t len ATTRIBUTE_UNUSED)
> +{
> +  const char *str;
> +  str = gcov_read_string ();
> +  header->host_cpu = str ? xstrdup (str) : 0;
> +  str = gcov_read_string ();
> +  header->hostname = str ? xstrdup (str) : 0;
> +  str = gcov_read_string ();
> +  header->kernel_version = str ? xstrdup (str) : 0;
> +  str = gcov_read_string ();
> +  header->column_header = str ? xstrdup (str) : 0;
> +  str = gcov_read_string ();
> +  header->column_description = str ? xstrdup (str) : 0;
> +  str = gcov_read_string ();
> +  header->full_header = str ? xstrdup (str) : 0;
> +}
> +#endif
> +
> +#if !IN_LIBGCOV
>  /* Check if MAGIC is EXPECTED. Use it to determine endianness of the
>    file. Returns +1 for same endian, -1 for other endian and zero for
>    not EXPECTED.  */
> @@ -245,6 +349,24 @@
>   gcov_var.offset -= size;
>  }
>
> +#if IN_LIBGCOV
> +/* Return the number of words STRING would need including the length
> +   field in the output stream itself.  This should be identical to
> +   "alloc" calculation in gcov_write_string().  */
> +
> +GCOV_LINKAGE gcov_unsigned_t
> +gcov_string_length (const char *string)
> +{
> +  gcov_unsigned_t len = (string) ? strlen (string) : 0;
> +  /* + 1 because of the length field.  */
> +  gcov_unsigned_t alloc = 1 + ((len + 4) >> 2);
> +
> +  /* Cannot yet write a string bigger than GCOV_BLOCK_SIZE.  */
> +  gcc_assert (alloc < GCOV_BLOCK_SIZE);
> +  return alloc;
> +}
> +#endif
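
To make the arithmetic in gcov_string_length concrete: a non-NULL string takes one length word plus its characters padded with at least one NUL up to a 4-byte boundary, which is where `(len + 4) >> 2` comes from. A standalone sketch of the same formula follows. `string_words` is a hypothetical name, and the sketch leaves out the GCOV_BLOCK_SIZE assertion.

```c
#include <string.h>

/* Words needed to store STR as a gcov-style string record: one length
   word, plus the characters padded with at least one NUL up to a
   4-byte boundary.  Mirrors the formula in the quoted function.  */
static unsigned
string_words (const char *str)
{
  unsigned len = str ? (unsigned) strlen (str) : 0;
  return 1 + ((len + 4) >> 2);
}
```

So "abc" needs 2 words (length word plus "abc\0"), while "abcd" needs 3, because the terminating NUL spills into a second data word.
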
> +
>  /* Allocate space to write BYTES bytes to the gcov file. Return a
>    pointer to those bytes, or NULL on failure.  */
>
> @@ -255,13 +377,15 @@
>
>   gcc_assert (gcov_var.mode < 0);
>  #if IN_LIBGCOV
> -  if (gcov_var.offset >= GCOV_BLOCK_SIZE)
> +  if (gcov_var.offset + words >= GCOV_BLOCK_SIZE)
>     {
> -      gcov_write_block (GCOV_BLOCK_SIZE);
> +      gcov_write_block (MIN (gcov_var.offset, GCOV_BLOCK_SIZE));
>       if (gcov_var.offset)
>        {
> -         gcc_assert (gcov_var.offset == 1);
> -         memcpy (gcov_var.buffer, gcov_var.buffer + GCOV_BLOCK_SIZE, 4);
> +         gcc_assert (gcov_var.offset < GCOV_BLOCK_SIZE);
> +         memcpy (gcov_var.buffer,
> +                  gcov_var.buffer + GCOV_BLOCK_SIZE,
> +                  gcov_var.offset << 2);
>        }
>     }
>  #else
> @@ -302,7 +426,6 @@
>  }
>  #endif /* IN_LIBGCOV */
>
> -#if !IN_LIBGCOV
>  /* Write STRING to coverage file.  Sets error flag on file
>    error, overflow flag on overflow */
>
> @@ -325,7 +448,6 @@
>   buffer[alloc] = 0;
>   memcpy (&buffer[1], string, length);
>  }
> -#endif
>
>  #if !IN_LIBGCOV
>  /* Write a tag TAG and reserve space for the record length. Return a
> @@ -413,14 +535,15 @@
>   unsigned excess = gcov_var.length - gcov_var.offset;
>
>   gcc_assert (gcov_var.mode > 0);
> +  gcc_assert (words < GCOV_BLOCK_SIZE);
>   if (excess < words)
>     {
>       gcov_var.start += gcov_var.offset;
>  #if IN_LIBGCOV
>       if (excess)
>        {
> -         gcc_assert (excess == 1);
> -         memcpy (gcov_var.buffer, gcov_var.buffer + gcov_var.offset, 4);
> +         gcc_assert (excess < GCOV_BLOCK_SIZE);
> +         memmove (gcov_var.buffer, gcov_var.buffer + gcov_var.offset, excess * 4);
>        }
>  #else
>       memmove (gcov_var.buffer, gcov_var.buffer + gcov_var.offset, excess * 4);
> @@ -428,8 +551,7 @@
>       gcov_var.offset = 0;
>       gcov_var.length = excess;
>  #if IN_LIBGCOV
> -      gcc_assert (!gcov_var.length || gcov_var.length == 1);
> -      excess = GCOV_BLOCK_SIZE;
> +      excess = (sizeof (gcov_var.buffer) / sizeof (gcov_var.buffer[0])) - gcov_var.length;
>  #else
>       if (gcov_var.length + words > gcov_var.alloc)
>        gcov_allocate (gcov_var.length + words);
> @@ -489,7 +611,6 @@
>    buffer, or NULL on empty string. You must copy the string before
>    calling another gcov function.  */
>
> -#if !IN_LIBGCOV
>  GCOV_LINKAGE const char *
>  gcov_read_string (void)
>  {
> @@ -500,7 +621,6 @@
>
>   return (const char *) gcov_read_words (length);
>  }
> -#endif
>
>  GCOV_LINKAGE void
>  gcov_read_summary (struct gcov_summary *summary)
> @@ -629,6 +749,87 @@
>  }
>  #endif
>
> +/* Convert an unsigned NUMBER to a percentage after dividing by
> +   100.  */
> +
> +GCOV_LINKAGE float
> +convert_unsigned_to_pct (const unsigned number)
> +{
> +  return (float)number / 100.0f;
> +}
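
The percentages are stored as integers out of 10000, so dividing by 100 recovers a value with two decimal places, and the "%.2f%%" format used by the printing routines shows it exactly as stored. A small sketch of the decode-and-format step follows; `fixed_to_pct` and `format_pct` are hypothetical names.

```c
#include <stdio.h>

/* Decode N, a percentage scaled by 100 (so 1234 means 12.34%).  */
static float
fixed_to_pct (unsigned n)
{
  return (float) n / 100.0f;
}

/* Format N as a percentage with two decimals into BUF of SIZE bytes
   and return BUF.  */
static char *
format_pct (char *buf, size_t size, unsigned n)
{
  snprintf (buf, size, "%.2f%%", fixed_to_pct (n));
  return buf;
}
```

With this encoding, 1234 prints as "12.34%" and 10000 as "100.00%".
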
> +
> +#if !IN_LIBGCOV && IN_GCOV != 1
> +/* Print load latency information given by LL_INFO in a human readable
> +   format into an open output file pointed to by FP.  NEWLINE specifies
> +   whether or not to print a trailing newline.  */
> +
> +GCOV_LINKAGE void
> +print_load_latency_line (FILE *fp, const gcov_pmu_ll_info_t *ll_info,
> +                         const enum print_newline newline)
> +{
> +  if (!ll_info)
> +    return;
> +  fprintf (fp, " %u %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% "
> +           "%.2f%% %.2f%% " HOST_WIDEST_INT_PRINT_HEX " %s %u %u",
> +           ll_info->counts,
> +           convert_unsigned_to_pct (ll_info->self),
> +           convert_unsigned_to_pct (ll_info->cum),
> +           convert_unsigned_to_pct (ll_info->lt_10),
> +           convert_unsigned_to_pct (ll_info->lt_32),
> +           convert_unsigned_to_pct (ll_info->lt_64),
> +           convert_unsigned_to_pct (ll_info->lt_256),
> +           convert_unsigned_to_pct (ll_info->lt_1024),
> +           convert_unsigned_to_pct (ll_info->gt_1024),
> +           convert_unsigned_to_pct (ll_info->wself),
> +           ll_info->code_addr,
> +           ll_info->filename,
> +           ll_info->line,
> +           ll_info->discriminator);
> +  if (newline == add_newline)
> +    fprintf (fp, "\n");
> +}
> +
> +/* Print BRM_INFO into the file pointed to by FP.  NEWLINE specifies
> +   whether or not to print a trailing newline.  */
> +
> +GCOV_LINKAGE void
> +print_branch_mispredict_line (FILE *fp, const gcov_pmu_brm_info_t *brm_info,
> +                              const enum print_newline newline)
> +{
> +  if (!brm_info)
> +    return;
> +  fprintf (fp, " %u %.2f%% %.2f%% " HOST_WIDEST_INT_PRINT_HEX " %s %u %u",
> +           brm_info->counts,
> +           convert_unsigned_to_pct (brm_info->self),
> +           convert_unsigned_to_pct (brm_info->cum),
> +           brm_info->code_addr,
> +           brm_info->filename,
> +           brm_info->line,
> +           brm_info->discriminator);
> +  if (newline == add_newline)
> +    fprintf (fp, "\n");
> +}
> +
> +/* Print TOOL_HEADER into the file pointed to by FP.  NEWLINE specifies
> +   whether or not to print a trailing newline.  */
> +
> +GCOV_LINKAGE void
> +print_pmu_tool_header (FILE *fp, gcov_pmu_tool_header_t *tool_header,
> +                       const enum print_newline newline)
> +{
> +  if (!tool_header)
> +    return;
> +  fprintf (fp, "\nhost_cpu: %s\n", tool_header->host_cpu);
> +  fprintf (fp, "hostname: %s\n", tool_header->hostname);
> +  fprintf (fp, "kernel_version: %s\n", tool_header->kernel_version);
> +  fprintf (fp, "column_header: %s\n", tool_header->column_header);
> +  fprintf (fp, "column_description: %s\n", tool_header->column_description);
> +  fprintf (fp, "full_header: %s\n", tool_header->full_header);
> +  if (newline == add_newline)
> +    fprintf (fp, "\n");
> +}
> +#endif
> +
>  #if IN_GCOV > 0
>  /* Return the modification time of the current gcov file.  */
>
> @@ -715,7 +916,7 @@
>   if (vsize <= vpos)
>     {
>       printk (KERN_ERR
> -          "GCOV_KERNEL: something wrong: vbuf=%p vsize=%u vpos=%u\n",
> +         "GCOV_KERNEL: something wrong: vbuf=%p vsize=%u vpos=%u\n",
>           vbuf, vsize, vpos);
>       return 0;
>     }
> @@ -744,4 +945,29 @@
>   gcc_assert (0);  /* should not reach here */
>   return 0;
>  }
> +#else /* __GCOV_KERNEL__ */
> +
> +#if IN_GCOV != 1
> +/* Delete pmu tool header TOOL_HEADER.  */
> +
> +GCOV_LINKAGE void
> +destroy_pmu_tool_header (gcov_pmu_tool_header_t *tool_header)
> +{
> +  if (!tool_header)
> +    return;
> +  if (tool_header->host_cpu)
> +    free (tool_header->host_cpu);
> +  if (tool_header->hostname)
> +    free (tool_header->hostname);
> +  if (tool_header->kernel_version)
> +    free (tool_header->kernel_version);
> +  if (tool_header->column_header)
> +    free (tool_header->column_header);
> +  if (tool_header->column_description)
> +    free (tool_header->column_description);
> +  if (tool_header->full_header)
> +    free (tool_header->full_header);
> +}
> +#endif
> +
>  #endif /* GCOV_KERNEL */
> Index: gcc/gcov-io.h
> ===================================================================
> --- gcc/gcov-io.h       (revision 175226)
> +++ gcc/gcov-io.h       (working copy)
> @@ -313,6 +313,7 @@
>
>  typedef unsigned gcov_unsigned_t;
>  typedef unsigned gcov_position_t;
> +
>  /* gcov_type is typedef'd elsewhere for the compiler */
>  #if IN_GCOV
>  #define GCOV_LINKAGE static
> @@ -363,15 +364,24 @@
>  #define gcov_write_counter __gcov_write_counter
>  #define gcov_write_summary __gcov_write_summary
>  #define gcov_write_module_info __gcov_write_module_info
> +#define gcov_write_string __gcov_write_string
> +#define gcov_string_length __gcov_string_length
>  #define gcov_read_unsigned __gcov_read_unsigned
>  #define gcov_read_counter __gcov_read_counter
> +#define gcov_read_string __gcov_read_string
>  #define gcov_read_summary __gcov_read_summary
>  #define gcov_read_module_info __gcov_read_module_info
>  #define gcov_sort_n_vals __gcov_sort_n_vals
> +#define gcov_canonical_filename __gcov_canonical_filename
> +#define gcov_read_pmu_load_latency_info __gcov_read_pmu_load_latency_info
> +#define gcov_read_pmu_branch_mispredict_info __gcov_read_pmu_branch_mispredict_info
> +#define gcov_read_pmu_tool_header __gcov_read_pmu_tool_header
> +#define destroy_pmu_tool_header __destroy_pmu_tool_header
>
> +
>  /* Poison these, so they don't accidentally slip in.  */
> -#pragma GCC poison gcov_write_string gcov_write_tag gcov_write_length
> -#pragma GCC poison gcov_read_string gcov_sync gcov_time gcov_magic
> +#pragma GCC poison gcov_write_tag gcov_write_length
> +#pragma GCC poison gcov_sync gcov_time gcov_magic
>
>  #ifdef HAVE_GAS_HIDDEN
>  #define ATTRIBUTE_HIDDEN  __attribute__ ((__visibility__ ("hidden")))
> @@ -432,6 +442,13 @@
>  #define GCOV_TAG_SUMMARY_LENGTH  \
>        (1 + GCOV_COUNTERS_SUMMABLE * (2 + 3 * 2))
>  #define GCOV_TAG_MODULE_INFO ((gcov_unsigned_t)0xa4000000)
> +#define GCOV_TAG_PMU_LOAD_LATENCY_INFO ((gcov_unsigned_t)0xa5000000)
> +#define GCOV_TAG_PMU_LOAD_LATENCY_LENGTH(filename)  \
> +  (gcov_string_length (filename) + 12 + 2)
> +#define GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO ((gcov_unsigned_t)0xa7000000)
> +#define GCOV_TAG_PMU_BRANCH_MISPREDICT_LENGTH(filename)  \
> +  (gcov_string_length (filename) + 5 + 2)
> +#define GCOV_TAG_PMU_TOOL_HEADER ((gcov_unsigned_t)0xa9000000)
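
The constants in the two length macros can be checked against the record layouts. This assumes the usual gcov encoding, where a gcov_unsigned_t takes one 32-bit word and a gcov_type counter takes two. A load latency record has twelve one-word fields (counts through wself, plus line and discriminator) and one code address. A branch mispredict record has five one-word fields and one code address. The sketch below restates that accounting; the function names are hypothetical, and the filename size is passed in as a word count.

```c
/* Words in a load latency record whose filename string occupies
   FILENAME_WORDS words: twelve one-word fields plus a two-word
   code address.  */
static unsigned
ll_record_words (unsigned filename_words)
{
  return filename_words + 12 + 2;
}

/* Words in a branch mispredict record: five one-word fields
   (counts, self, cum, line, discriminator) plus a two-word
   code address.  */
static unsigned
brm_record_words (unsigned filename_words)
{
  return filename_words + 5 + 2;
}
```

For a filename that takes 3 words, this gives 17 words for a load latency record and 10 for a branch mispredict record.
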
>
>  /* Counters that are collected.  */
>  #define GCOV_COUNTER_ARCS      0  /* Arc transitions.  */
> @@ -545,6 +562,8 @@
>  #define GCOV_MODULE_ASM_STMTS (1 << 16)
>  #define GCOV_MODULE_LANG_MASK 0xffff
>
> +enum print_newline {no_newline, add_newline};
> +
>  /* Source module info. The data structure is used in
>    both runtime and profile-use phase. Make sure to allocate
>    enough space for the variable length member.  */
> @@ -576,6 +595,91 @@
>    && !((module_infos[0]->lang & GCOV_MODULE_ASM_STMTS)                        \
>        && flag_ripa_disallow_asm_modules))
>
> +/* Information about the hardware performance monitoring unit.  */
> +struct gcov_pmu_info
> +{
> +  const char *pmu_profile_filename;    /* pmu profile filename  */
> +  const char *pmu_tool;        /* canonical pmu tool options  */
> +  gcov_unsigned_t pmu_top_n_address;  /* how many top addresses to symbolize */
> +};
> +
> +/* Information about the PMU tool header.  */
> +typedef struct gcov_pmu_tool_header {
> +  char *host_cpu;
> +  char *hostname;
> +  char *kernel_version;
> +  char *column_header;
> +  char *column_description;
> +  char *full_header;
> +} gcov_pmu_tool_header_t;
> +
> +/* Available only for PMUs which support PEBS or IBS using pfmon
> +   tool. If any field here is changed, the length computation in
> +   GCOV_TAG_PMU_LOAD_LATENCY_LENGTH must be updated as well. All
> +   percentages are multiplied by 100 to make them out of 10000 and
> +   only integer part is kept.  */
> +typedef struct gcov_pmu_load_latency_info
> +{
> +  gcov_unsigned_t counts;     /* raw count of samples */
> +  gcov_unsigned_t self;       /* per 10k of total samples */
> +  gcov_unsigned_t cum;        /* per 10k cumulative weight */
> +  gcov_unsigned_t lt_10;      /* per 10k with latency <= 10 cycles */
> +  gcov_unsigned_t lt_32;      /* per 10k with latency <= 32 cycles */
> +  gcov_unsigned_t lt_64;      /* per 10k with latency <= 64 cycles */
> +  gcov_unsigned_t lt_256;     /* per 10k with latency <= 256 cycles */
> +  gcov_unsigned_t lt_1024;    /* per 10k with latency <= 1024 cycles */
> +  gcov_unsigned_t gt_1024;    /* per 10k with latency > 1024 cycles */
> +  gcov_unsigned_t wself;      /* weighted average cost of this miss in cycles */
> +  gcov_type code_addr;        /* the actual miss address (pc+1 for Intel) */
> +  gcov_unsigned_t line;       /* line number corresponding to this miss */
> +  gcov_unsigned_t discriminator;   /* discriminator information for this miss */
> +  char *filename;       /* filename corresponding to this miss */
> +} gcov_pmu_ll_info_t;
> +
> +/* This structure is used during runtime as well as in gcov.  */
> +typedef struct load_latency_infos
> +{
> +  /* An array of pointers to the load latency entries.  */
> +  gcov_pmu_ll_info_t **ll_array;
> +  /* The total number of entries in the load latency array.  */
> +  unsigned ll_count;
> +  /* The total number of entries currently allocated in the array.
> +     Used for bookkeeping.  */
> +  unsigned alloc_ll_count;
> +  /* PMU tool header */
> +  gcov_pmu_tool_header_t *pmu_tool_header;
> +} ll_infos_t;
> +
> +/* Available only for PMUs which support PEBS or IBS using pfmon
> +   tool. If any field here is changed, the length computation in
> +   GCOV_TAG_PMU_BRANCH_MISPREDICT_LENGTH must be updated as well. All
> +   percentages are multiplied by 100 to make them out of 10000 and
> +   only integer part is kept.  */
> +typedef struct gcov_pmu_branch_mispredict_info
> +{
> +  gcov_unsigned_t counts;     /* raw count of samples */
> +  gcov_unsigned_t self;       /* per 10k of total samples */
> +  gcov_unsigned_t cum;        /* per 10k cumulative weight */
> +  gcov_type code_addr;        /* the actual mispredict address */
> +  gcov_unsigned_t line;       /* line number corresponding to this event */
> +  gcov_unsigned_t discriminator;   /* discriminator for this event */
> +  char *filename;       /* filename corresponding to this event */
> +} gcov_pmu_brm_info_t;
> +
> +/* This structure is used during runtime as well as in gcov.  */
> +typedef struct branch_mispredict_infos
> +{
> +  /* An array of pointers to the branch mispredict entries.  */
> +  gcov_pmu_brm_info_t **brm_array;
> +  /* The total number of entries in the above array.  */
> +  unsigned brm_count;
> +  /* The total number of entries currently allocated in the array.
> +     Used for bookkeeping.  */
> +  unsigned alloc_brm_count;
> +  /* PMU tool header */
> +  gcov_pmu_tool_header_t *pmu_tool_header;
> +} brm_infos_t;
> +
>  /* Structures embedded in coveraged program.  The structures generated
>    by write_profile must match these.  */
>
> @@ -635,9 +739,6 @@
>  /* Register a new object file module.  */
>  extern void __gcov_init (struct gcov_info *) ATTRIBUTE_HIDDEN;
>
> -/* Set sampling rate to RATE.  */
> -extern void __gcov_set_sampling_rate (unsigned int rate);
> -
>  /* Called before fork, to avoid double counting.  */
>  extern void __gcov_flush (void) ATTRIBUTE_HIDDEN;
>
> @@ -674,6 +775,12 @@
>  extern void __gcov_ior_profiler (gcov_type *, gcov_type);
>  extern void __gcov_sort_n_vals (gcov_type *value_array, int n);
>
> +/* Initialize/start/stop/dump performance monitoring unit (PMU) profile */
> +void __gcov_init_pmu_profiler (struct gcov_pmu_info *) ATTRIBUTE_HIDDEN;
> +void __gcov_start_pmu_profiler (void) ATTRIBUTE_HIDDEN;
> +void __gcov_stop_pmu_profiler (void) ATTRIBUTE_HIDDEN;
> +void __gcov_end_pmu_profiler (int gcda_error) ATTRIBUTE_HIDDEN;
> +
>  #ifndef inhibit_libc
>  /* The wrappers around some library functions..  */
>  extern pid_t __gcov_fork (void) ATTRIBUTE_HIDDEN;
> @@ -746,14 +853,42 @@
>  static gcov_position_t gcov_position (void);
>  static int gcov_is_error (void);
>
> +GCOV_LINKAGE const char *gcov_read_string (void) ATTRIBUTE_HIDDEN;
>  GCOV_LINKAGE gcov_unsigned_t gcov_read_unsigned (void) ATTRIBUTE_HIDDEN;
>  GCOV_LINKAGE gcov_type gcov_read_counter (void) ATTRIBUTE_HIDDEN;
>  GCOV_LINKAGE void gcov_read_summary (struct gcov_summary *) ATTRIBUTE_HIDDEN;
> +GCOV_LINKAGE char *gcov_canonical_filename (char *filename) ATTRIBUTE_HIDDEN;
> +GCOV_LINKAGE void
> +gcov_read_pmu_load_latency_info (gcov_pmu_ll_info_t *ll_info,
> +                                 gcov_unsigned_t len) ATTRIBUTE_HIDDEN;
> +GCOV_LINKAGE void
> +gcov_read_pmu_branch_mispredict_info (gcov_pmu_brm_info_t *brm_info,
> +                                      gcov_unsigned_t len) ATTRIBUTE_HIDDEN;
> +GCOV_LINKAGE void
> +gcov_read_pmu_tool_header (gcov_pmu_tool_header_t *tool_header,
> +                           gcov_unsigned_t len) ATTRIBUTE_HIDDEN;
> +GCOV_LINKAGE float convert_unsigned_to_pct (
> +    const unsigned number) ATTRIBUTE_HIDDEN;
> +
>  #if !IN_LIBGCOV && IN_GCOV != 1
>  GCOV_LINKAGE void gcov_read_module_info (struct gcov_module_info *mod_info,
>                                         gcov_unsigned_t len) ATTRIBUTE_HIDDEN;
> +GCOV_LINKAGE void print_load_latency_line (FILE *fp,
> +                                           const gcov_pmu_ll_info_t *ll_info,
> +                                           const enum print_newline);
> +GCOV_LINKAGE void
> +print_branch_mispredict_line (FILE *fp, const gcov_pmu_brm_info_t *brm_info,
> +                              const enum print_newline);
> +GCOV_LINKAGE void print_pmu_tool_header (FILE *fp,
> +                                         gcov_pmu_tool_header_t *tool_header,
> +                                         const enum print_newline);
>  #endif
>
> +#if IN_GCOV != 1
> +GCOV_LINKAGE void destroy_pmu_tool_header (gcov_pmu_tool_header_t *tool_header)
> +  ATTRIBUTE_HIDDEN;
> +#endif
> +
>  #if IN_LIBGCOV
>  /* Available only in libgcov */
>  GCOV_LINKAGE void gcov_write_counter (gcov_type) ATTRIBUTE_HIDDEN;
> @@ -771,10 +906,10 @@
>  static void gcov_rewrite (void);
>  GCOV_LINKAGE void gcov_seek (gcov_position_t /*position*/) ATTRIBUTE_HIDDEN;
>  GCOV_LINKAGE void gcov_truncate (void) ATTRIBUTE_HIDDEN;
> +GCOV_LINKAGE gcov_unsigned_t gcov_string_length (const char *) ATTRIBUTE_HIDDEN;
>  GCOV_LINKAGE unsigned gcov_gcda_file_size (struct gcov_info *);
>  #else
>  /* Available outside libgcov */
> -GCOV_LINKAGE const char *gcov_read_string (void);
>  GCOV_LINKAGE void gcov_sync (gcov_position_t /*base*/,
>                             gcov_unsigned_t /*length */);
>  #endif
> @@ -782,11 +917,11 @@
>  #if !IN_GCOV
>  /* Available outside gcov */
>  GCOV_LINKAGE void gcov_write_unsigned (gcov_unsigned_t) ATTRIBUTE_HIDDEN;
> +GCOV_LINKAGE void gcov_write_string (const char *) ATTRIBUTE_HIDDEN;
>  #endif
>
>  #if !IN_GCOV && !IN_LIBGCOV
>  /* Available only in compiler */
> -GCOV_LINKAGE void gcov_write_string (const char *);
>  GCOV_LINKAGE gcov_position_t gcov_write_tag (gcov_unsigned_t);
>  GCOV_LINKAGE void gcov_write_length (gcov_position_t /*position*/);
>  #endif
> Index: gcc/opts.c
> ===================================================================
> --- gcc/opts.c  (revision 175226)
> +++ gcc/opts.c  (working copy)
> @@ -36,6 +36,9 @@
>  #include "insn-attr.h"         /* For INSN_SCHEDULING and DELAY_SLOTS.  */
>  #include "target.h"
>
> +/* Defined in coverage.c.  */
> +extern int check_pmu_profile_options (const char *options);
> +
>  /* Parse the -femit-struct-debug-detailed option value
>    and set the flag variables. */
>
> @@ -1597,6 +1600,15 @@
>         opts->x_flag_ipa_reference = false;
>       break;
>
> +    case OPT_fpmu_profile_generate_:
> +      /* Ideally this should be turned on in conjunction with
> +         -fprofile-dir or -fprofile-generate in order to specify a
> +         profile directory.  */
> +      if (check_pmu_profile_options (arg))
> +        error ("unrecognized pmu_profile_generate value %qs", arg);
> +      flag_pmu_profile_generate = xstrdup (arg);
> +      break;
> +
>     case OPT_fshow_column:
>       dc->show_column = value;
>       break;
> Index: gcc/pmu-profile.c
> ===================================================================
> --- gcc/pmu-profile.c   (revision 0)
> +++ gcc/pmu-profile.c   (revision 0)
> @@ -0,0 +1,1552 @@
> +/* Performance monitoring unit (PMU) profiler. If available, use an
> +   external tool to collect hardware performance counter data and
> +   write it in the .gcda files.
> +
> +   Copyright (C) 2011 Free Software Foundation, Inc.
> +   Contributed by Sharad Singhai <singhai@google.com>.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "tconfig.h"
> +#include "tsystem.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#if (defined (__x86_64__) || defined (__i386__))
> +#include "cpuid.h"
> +#endif
> +
> +#if defined(inhibit_libc)
> +#define IN_LIBGCOV (-1)
> +#else
> +#include <stdio.h>
> +#include <stdlib.h>
> +#define IN_LIBGCOV 1
> +  #if defined(L_gcov)
> +  #define GCOV_LINKAGE /* nothing */
> +  #endif
> +#endif
> +#include "gcov-io.h"
> +#ifdef TARGET_POSIX_IO
> +  #include <fcntl.h>
> +  #include <signal.h>
> +  #include <sys/stat.h>
> +  #include <sys/types.h>
> +#endif
> +
> +#if defined(inhibit_libc)
> +#else
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +
> +#define XNEWVEC(type,ne) ((type *) calloc ((ne), sizeof (type)))
> +#define XNEW(type) ((type *) malloc (sizeof (type)))
> +#define XDELETEVEC(p) free(p)
> +#define XDELETE(p) free(p)
> +
> +#define PFMON_CMD "/usr/bin/pfmon"
> +#define ADDR2LINE_CMD "/usr/bin/addr2line"
> +#define PMU_TOOL_MAX_ARGS (20)
> +static char default_addr2line[] = "??:0";
> +static const char pfmon_ll_header[] = "#     counts   %self    %cum     "
> +    "<10     <32     <64    <256   <1024  >=1024  %wself          "
> +    "code addr symbol\n";
> +static const char pfmon_bm_header[] =
> +    "#     counts   %self    %cum          code addr symbol\n";
> +
> +const char *pfmon_intel_ll_args[PMU_TOOL_MAX_ARGS] = {
> +  PFMON_CMD,
> +  "--aggregate-results",
> +  "--follow-all",
> +  "--with-header",
> +  "--smpl-module=pebs-ll",
> +  "--ld-lat-threshold=4",
> +  "--pebs-ll-dcmiss-code",
> +  "--resolve-addresses",
> +  "-emem_inst_retired:LATENCY_ABOVE_THRESHOLD",
> +  "--long-smpl-periods=10000",
> +  0  /* terminating NULL must be present */
> +};
> +
> +const char *pfmon_amd_ll_args[PMU_TOOL_MAX_ARGS] = {
> +  PFMON_CMD,
> +  "--aggregate-results",
> +  "--follow-all",
> +  "-uk",
> +  "--with-header",
> +  "--smpl-module=ibs",
> +  "--resolve-addresses",
> +  "-eibsop_event:uops",
> +  "--ibs-dcmiss-code",
> +  "--long-smpl-periods=0xffff0",
> +  0  /* terminating NULL must be present */
> +};
> +
> +const char *pfmon_intel_brm_args[PMU_TOOL_MAX_ARGS] = {
> +  PFMON_CMD,
> +  "--aggregate-results",
> +  "--follow-all",
> +  "--with-header",
> +  "--resolve-addresses",
> +  "-eMISPREDICTED_BRANCH_RETIRED",
> +  "--long-smpl-periods=10000",
> +  0  /* terminating NULL must be present */
> +};
> +
> +const char *pfmon_amd_brm_args[PMU_TOOL_MAX_ARGS] = {
> +  PFMON_CMD,
> +  "--aggregate-results",
> +  "--follow-all",
> +  "--with-header",
> +  "--resolve-addresses",
> +  "-eRETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS",
> +  "--long-smpl-periods=10000",
> +  0  /* terminating NULL must be present */
> +};
> +
> +const char *addr2line_args[PMU_TOOL_MAX_ARGS] = {
> +  ADDR2LINE_CMD,
> +  "-e",
> +  0  /* terminating NULL must be present */
> +};
> +
> +
> +enum pmu_tool_type
> +{
> +  PTT_PFMON,
> +  PTT_LAST
> +};
> +
> +enum pmu_event_type
> +{
> +  PET_INTEL_LOAD_LATENCY,
> +  PET_AMD_LOAD_LATENCY,
> +  PET_INTEL_BRANCH_MISPREDICT,
> +  PET_AMD_BRANCH_MISPREDICT,
> +  PET_LAST
> +};
> +
> +typedef struct pmu_tool_fns {
> +  const char *name;     /* name of the pmu tool */
> +  /* pmu tool commandline argument.  */
> +  const char **arg_array;
> +  /* Initialize pmu module.  */
> +  void *(*init_pmu_module) (void);
> +  /* Start profiling.  */
> +  void (*start_pmu_module) (pid_t ppid, char *tmpfile, const char **args);
> +  /* Stop profiling.  */
> +  void (*stop_pmu_module) (void);
> +  /* How to parse the output generated by the PMU tool.  */
> +  int (*parse_pmu_output) (char *filename, void *pmu_data);
> +  /* How to write parsed pmu data into gcda file.  */
> +  void (*gcov_write_pmu_data) (void *data);
> +  /* How to cleanup any data structure created during parsing.  */
> +  void (*cleanup_pmu_data) (void *data);
> +  /* How to initialize symbolizer for the PPID.  */
> +  int (*start_symbolizer) (pid_t ppid);
> +  void (*end_symbolizer) (void);
> +  char *(*symbolize) (void *addr);
> +} pmu_tool_fns;
> +
> +enum pmu_state
> +{
> +  PMU_NONE,             /* Not configured at all.  */
> +  PMU_INITIALIZED,      /* Configured and initialized.  */
> +  PMU_ERROR,            /* Configuration error. Cannot recover.  */
> +  PMU_ON,               /* Currently profiling.  */
> +  PMU_OFF               /* Currently stopped, but can be restarted.  */
> +};
> +
> +enum cpu_vendor_signature
> +{
> +  CPU_VENDOR_UKNOWN = 0,
> +  CPU_VENDOR_INTEL  = 0x756e6547, /* Genu */
> +  CPU_VENDOR_AMD    = 0x68747541 /* Auth */
> +};
> +
> +/* Info about pmu tool during the run time.  */
> +struct pmu_tool_info
> +{
> +  /* Current pmu tool.  */
> +  enum pmu_tool_type tool;
> +  /* Current event.  */
> +  enum pmu_event_type event;
> +  /* filename for storing the pmu profile.  */
> +  char *pmu_profile_filename;
> +  /* Intermediate file where the tool stores the PMU data.  */
> +  char *raw_pmu_profile_filename;
> +  /* Where PMU tool's stderr should be stored.  */
> +  char *tool_stderr_filename;
> +  enum pmu_state pmu_profiling_state;
> +  enum cpu_vendor_signature cpu_vendor; /* as discovered by cpuid */
> +  pid_t pmu_tool_pid;   /* process id of the pmu tool */
> +  pid_t symbolizer_pid; /* process id of the symbolizer */
> +  int symbolizer_to_pipefd[2]; /* pipe for writing to the symbolizer */
> +  int symbolizer_from_pipefd[2];  /* pipe for reading from the symbolizer */
> +  void *pmu_data;       /* an opaque pointer for the tool to store pmu data */
> +  int verbose;          /* turn on additional debugging */
> +  unsigned top_n_address;  /* how many addresses to symbolize */
> +  pmu_tool_fns *tool_details;  /* list of functions how to start/stop/parse */
> +};
> +
> +/* Global struct for recordkeeping.  */
> +static struct pmu_tool_info *the_pmu_tool_info;
> +
> +/* Additional info is printed if these are non-zero.  */
> +static int tool_debug = 0;
> +static int sym_debug = 0;
> +
> +static int parse_load_latency_line (char *line, gcov_pmu_ll_info_t *ll_info);
> +static int parse_branch_mispredict_line (char *line,
> +                                         gcov_pmu_brm_info_t *brm_info);
> +static unsigned convert_pct_to_unsigned (float pct);
> +static void start_pfmon_module (pid_t ppid, char *tmpfile, const char **pfmon_args);
> +static void *init_pmu_load_latency (void);
> +static void *init_pmu_branch_mispredict (void);
> +static void destroy_load_latency_infos (void *info);
> +static void destroy_branch_mispredict_infos (void *info);
> +static int parse_pfmon_load_latency (char *filename, void *pmu_data);
> +static int parse_pfmon_branch_mispredicts (char *filename, void *pmu_data);
> +static gcov_unsigned_t gcov_tag_pmu_tool_header_length (gcov_pmu_tool_header_t
> +                                                        *header);
> +static void gcov_write_tool_header (gcov_pmu_tool_header_t *header);
> +static void gcov_write_load_latency_infos (void *info);
> +static void gcov_write_branch_mispredict_infos (void *info);
> +static void gcov_write_ll_line (const gcov_pmu_ll_info_t *ll_info);
> +static void gcov_write_branch_mispredict_line (const gcov_pmu_brm_info_t
> +                                               *brm_info);
> +static int start_addr2line_symbolizer (pid_t pid);
> +static void end_addr2line_symbolizer (void);
> +static char *symbolize_addr2line (void *p);
> +static void reset_symbolizer_parent_pipes (void);
> +static void reset_symbolizer_child_pipes (void);
> +/* parse and cache relevant tool info.  */
> +static int parse_pmu_profile_options (const char *options);
> +static gcov_pmu_tool_header_t *parse_pfmon_tool_header (FILE *fp,
> +                                                        const char *end_header);
> +
> +
> +/* How to access the necessary functions for the PMU tools.  */
> +pmu_tool_fns all_pmu_tool_fns[PTT_LAST][PET_LAST] = {
> +  {
> +    {
> +      "intel-load-latency",             /* name */
> +      pfmon_intel_ll_args,              /* tool args */
> +      init_pmu_load_latency,            /* initialization */
> +      start_pfmon_module,               /* start */
> +      0,                                /* stop */
> +      parse_pfmon_load_latency,         /* parse */
> +      gcov_write_load_latency_infos,    /* write */
> +      destroy_load_latency_infos,       /* cleanup */
> +      start_addr2line_symbolizer,       /* start symbolizer */
> +      end_addr2line_symbolizer,         /* end symbolizer */
> +      symbolize_addr2line,              /* symbolize */
> +    },
> +    {
> +      "amd-load-latency",               /* name */
> +      pfmon_amd_ll_args,                /* tool args */
> +      init_pmu_load_latency,            /* initialization */
> +      start_pfmon_module,               /* start */
> +      0,                                /* stop */
> +      parse_pfmon_load_latency,         /* parse */
> +      gcov_write_load_latency_infos,    /* write */
> +      destroy_load_latency_infos,       /* cleanup */
> +      start_addr2line_symbolizer,       /* start symbolizer */
> +      end_addr2line_symbolizer,         /* end symbolizer */
> +      symbolize_addr2line,              /* symbolize */
> +    },
> +    {
> +      "intel-branch-mispredict",        /* name */
> +      pfmon_intel_brm_args,             /* tool args */
> +      init_pmu_branch_mispredict,       /* initialization */
> +      start_pfmon_module,               /* start */
> +      0,                                /* stop */
> +      parse_pfmon_branch_mispredicts,   /* parse */
> +      gcov_write_branch_mispredict_infos,/* write */
> +      destroy_branch_mispredict_infos,  /* cleanup */
> +      start_addr2line_symbolizer,       /* start symbolizer */
> +      end_addr2line_symbolizer,         /* end symbolizer */
> +      symbolize_addr2line,              /* symbolize */
> +    },
> +    {
> +      "amd-branch-mispredict",          /* name */
> +      pfmon_amd_brm_args,               /* tool args */
> +      init_pmu_branch_mispredict,       /* initialization */
> +      start_pfmon_module,               /* start */
> +      0,                                /* stop */
> +      parse_pfmon_branch_mispredicts,   /* parse */
> +      gcov_write_branch_mispredict_infos,/* write */
> +      destroy_branch_mispredict_infos,  /* cleanup */
> +      start_addr2line_symbolizer,       /* start symbolizer */
> +      end_addr2line_symbolizer,         /* end symbolizer */
> +      symbolize_addr2line,              /* symbolize */
> +    }
> +  }
> +};
> +
> +/* Determine the CPU vendor.  Currently only distinguishes x86 based
> +   cpus where the vendor is either Intel or AMD.  Returns one of the
> +   enum cpu_vendor_signature values.  */
> +
> +static unsigned int
> +get_x86cpu_vendor (void)
> +{
> +  unsigned int vendor = CPU_VENDOR_UKNOWN;
> +
> +#if (defined (__x86_64__) || defined (__i386__))
> +  if (__get_cpuid_max (0, &vendor) < 1)
> +    return CPU_VENDOR_UKNOWN;      /* Cannot determine cpu type.  */
> +#endif
> +
> +  if (vendor == CPU_VENDOR_INTEL || vendor == CPU_VENDOR_AMD)
> +    return vendor;
> +  else
> +    return CPU_VENDOR_UKNOWN;
> +}
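For anyone checking the two magic constants in enum cpu_vendor_signature:
CPUID leaf 0 returns the first four characters of the vendor string in EBX,
with the first character in the lowest byte. The helper below is only a
sketch for verifying the values by hand; vendor_sig_to_str is a hypothetical
name and is not part of the patch.

```c
#include <string.h>

/* Hypothetical helper, not part of the patch: unpack the 32-bit value that
   CPUID leaf 0 returns in EBX into its four ASCII characters.  The lowest
   byte of the register holds the first character.  */
static void
vendor_sig_to_str (unsigned int sig, char out[5])
{
  int i;

  for (i = 0; i < 4; i++)
    out[i] = (char) ((sig >> (8 * i)) & 0xff);
  out[4] = '\0';
}
```

Feeding it 0x756e6547 and 0x68747541 gives the "Genu" and "Auth" prefixes
that the comments in the enum mention.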
> +
> +
> +/* Parse PMU tool option string provided on the command line and store
> +   information in global structure.  Return 0 on success, otherwise
> +   return 1.  Any changes to this should be synced with
> +   check_pmu_profile_options() which does compile time check.  */
> +
> +static int
> +parse_pmu_profile_options (const char *options)
> +{
> +  enum pmu_tool_type ptt = the_pmu_tool_info->tool;
> +  enum pmu_event_type pet = PET_LAST;
> +  const char *pmutool_path;
> +  the_pmu_tool_info->cpu_vendor =  get_x86cpu_vendor ();
> +  /* Determine the platform we are running on.  */
> +  if (the_pmu_tool_info->cpu_vendor == CPU_VENDOR_UKNOWN)
> +    {
> +      /* Cpuid failed or unknown vendor.  */
> +      the_pmu_tool_info->pmu_profiling_state = PMU_ERROR;
> +      return 1;
> +    }
> +
> +  /* Validate the options.  */
> +  if (strcmp(options, "load-latency") &&
> +      strcmp(options, "load-latency-verbose") &&
> +      strcmp(options, "branch-mispredict") &&
> +      strcmp(options, "branch-mispredict-verbose"))
> +    return 1;
> +
> +  /* Check if we are asked to collect load latency PMU data.  */
> +  if (!strcmp(options, "load-latency") ||
> +      !strcmp(options, "load-latency-verbose"))
> +    {
> +      if (the_pmu_tool_info->cpu_vendor == CPU_VENDOR_INTEL)
> +        pet = PET_INTEL_LOAD_LATENCY;
> +      else
> +        pet = PET_AMD_LOAD_LATENCY;
> +      if (!strcmp(options, "load-latency-verbose"))
> +        the_pmu_tool_info->verbose = 1;
> +    }
> +
> +  /* Check if we are asked to collect branch mispredict PMU data.  */
> +  if (!strcmp(options, "branch-mispredict") ||
> +      !strcmp(options, "branch-mispredict-verbose"))
> +    {
> +      if (the_pmu_tool_info->cpu_vendor == CPU_VENDOR_INTEL)
> +        pet = PET_INTEL_BRANCH_MISPREDICT;
> +      else
> +        pet = PET_AMD_BRANCH_MISPREDICT;
> +      if (!strcmp(options, "branch-mispredict-verbose"))
> +        the_pmu_tool_info->verbose = 1;
> +    }
> +
> +  the_pmu_tool_info->tool_details = &all_pmu_tool_fns[ptt][pet];
> +  the_pmu_tool_info->event = pet;
> +
> +  /* Allow users to override the default tool path.  */
> +  pmutool_path = getenv ("GCOV_PMUTOOL_PATH");
> +  if (pmutool_path && strlen (pmutool_path))
> +    the_pmu_tool_info->tool_details->arg_array[0] = pmutool_path;
> +
> +  return 0;
> +}
> +
> +/* Do the initialization of addr2line symbolizer for the process id
> +   given by TASK_PID.  It forks an addr2line process and creates two
> +   pipes where addresses can be written and source_filename:line_num
> +   entries can be read.  Returns 0 on success, non-zero otherwise.  */
> +
> +static int
> +start_addr2line_symbolizer (pid_t task_pid)
> +{
> +  pid_t pid;
> +  char *addr2line_path;
> +
> +  /* Allow users to override the default addr2line path.  */
> +  addr2line_path = getenv ("GCOV_ADDR2LINE_PATH");
> +  if (addr2line_path && strlen (addr2line_path))
> +    addr2line_args[0] = addr2line_path;
> +
> +  if (pipe (the_pmu_tool_info->symbolizer_from_pipefd) == -1)
> +    {
> +      fprintf (stderr, "Cannot create symbolizer write pipe.\n");
> +      return 1;
> +    }
> +  if (pipe (the_pmu_tool_info->symbolizer_to_pipefd) == -1)
> +    {
> +      fprintf (stderr, "Cannot create symbolizer read pipe.\n");
> +      return 1;
> +    }
> +
> +  pid = fork ();
> +  if (pid == -1)
> +    {
> +      /* error condition */
> +      fprintf (stderr, "Cannot create symbolizer process.\n");
> +      reset_symbolizer_parent_pipes ();
> +      reset_symbolizer_child_pipes ();
> +      return 1;
> +    }
> +
> +  if (pid == 0)
> +    {
> +      /* child does an exec and then connects to/from the pipe */
> +      unsigned n_args = 0;
> +      char proc_exe_buf[128];
> +      int new_write_fd, new_read_fd;
> +      int i;
> +
> +      /* Go over the current addr2line args.  */
> +      for (i = 0; i < PMU_TOOL_MAX_ARGS && addr2line_args[i]; ++i)
> +        n_args++;
> +
> +      /* We are going to add one more arg for the /proc/pid/exe */
> +      if (n_args >= (PMU_TOOL_MAX_ARGS - 1))
> +        {
> +          fprintf (stderr, "too many addr2line args: %u\n", n_args);
> +          _exit (0);
> +        }
> +      snprintf (proc_exe_buf, sizeof (proc_exe_buf), "/proc/%d/exe",
> +                task_pid);
> +
> +      /* Add the extra arg for the process id.  */
> +      addr2line_args[n_args] = proc_exe_buf;
> +      n_args++;
> +
> +      addr2line_args[n_args] = (const char *)NULL;  /* terminating NULL */
> +
> +      if (sym_debug)
> +        {
> +          fprintf (stderr, "addr2line args:");
> +          for (i = 0; i < PMU_TOOL_MAX_ARGS && addr2line_args[i]; ++i)
> +            fprintf (stderr, " %s", addr2line_args[i]);
> +          fprintf (stderr, "\n");
> +        }
> +
> +      /* Close unused ends of the two pipes.  */
> +      reset_symbolizer_child_pipes ();
> +
> +      /* Connect the pipes to stdin/stdout of the child process.  */
> +      new_read_fd = dup2 (the_pmu_tool_info->symbolizer_to_pipefd[0], 0);
> +      new_write_fd = dup2 (the_pmu_tool_info->symbolizer_from_pipefd[1], 1);
> +      if (new_read_fd == -1 || new_write_fd == -1)
> +        {
> +          fprintf (stderr, "could not dup symbolizer fds\n");
> +          reset_symbolizer_parent_pipes ();
> +          reset_symbolizer_child_pipes ();
> +          _exit (0);
> +        }
> +      the_pmu_tool_info->symbolizer_to_pipefd[0] = new_read_fd;
> +      the_pmu_tool_info->symbolizer_from_pipefd[1] = new_write_fd;
> +
> +      /* Do execve with NULL env. */
> +      execve (addr2line_args[0], (char * const*)addr2line_args,
> +              (char * const*)NULL);
> +      /* exec returned, an error condition.  */
> +      fprintf (stderr, "could not create symbolizer process: %s\n",
> +               addr2line_args[0]);
> +      reset_symbolizer_parent_pipes ();
> +      reset_symbolizer_child_pipes ();
> +      _exit (0);
> +    }
> +  else
> +    {
> +      /* parent */
> +      the_pmu_tool_info->symbolizer_pid = pid;
> +      /* Close unused ends of the two pipes.  */
> +      reset_symbolizer_parent_pipes ();
> +      return 0;
> +    }
> +  return 0;
> +}
> +
> +/* Close unused write end of the from-pipe and read end of the
> +   to-pipe.  */
> +
> +static void
> +reset_symbolizer_parent_pipes (void)
> +{
> +  if (the_pmu_tool_info->symbolizer_from_pipefd[1] != -1)
> +    {
> +      close (the_pmu_tool_info->symbolizer_from_pipefd[1]);
> +      the_pmu_tool_info->symbolizer_from_pipefd[1] = -1;
> +    }
> +  if (the_pmu_tool_info->symbolizer_to_pipefd[0] != -1)
> +    {
> +      close (the_pmu_tool_info->symbolizer_to_pipefd[0]);
> +      the_pmu_tool_info->symbolizer_to_pipefd[0] = -1;
> +    }
> +}
> +
> +/* Close unused write end of the to-pipe and read end of the
> +   from-pipe.  */
> +
> +static void
> +reset_symbolizer_child_pipes (void)
> +{
> +  if (the_pmu_tool_info->symbolizer_to_pipefd[1] != -1)
> +    {
> +      close (the_pmu_tool_info->symbolizer_to_pipefd[1]);
> +      the_pmu_tool_info->symbolizer_to_pipefd[1] = -1;
> +    }
> +  if (the_pmu_tool_info->symbolizer_from_pipefd[0] != -1)
> +    {
> +      close (the_pmu_tool_info->symbolizer_from_pipefd[0]);
> +      the_pmu_tool_info->symbolizer_from_pipefd[0] = -1;
> +    }
> +}
> +
> +
> +/* Perform cleanup for the symbolizer process.  */
> +
> +static void
> +end_addr2line_symbolizer (void)
> +{
> +  int pid_status;
> +  int wait_status;
> +  pid_t pid = the_pmu_tool_info->symbolizer_pid;
> +
> +  /* Symbolizer was not running.  */
> +  if (!pid)
> +    return;
> +
> +  reset_symbolizer_parent_pipes ();
> +  reset_symbolizer_child_pipes ();
> +  kill (pid, SIGTERM);
> +  wait_status = waitpid (pid, &pid_status, 0);
> +  if (sym_debug)
> +  {
> +    if (wait_status == pid)
> +      fprintf (stderr, "Normal exit. symbolizer terminated.\n");
> +    else
> +      fprintf (stderr, "Abnormal exit. symbolizer status, %d.\n", pid_status);
> +  }
> +  the_pmu_tool_info->symbolizer_pid = 0;  /* Symbolizer no longer running.  */
> +}
> +
> +
> +/* Given an address ADDR, return a string containing
> +   source_filename:line_num entries.  */
> +
> +static char *
> +symbolize_addr2line (void *addr)
> +{
> +  char buf[32];  /* holds the ascii version of address */
> +  int write_count;
> +  int read_count;
> +  char *srcfile_linenum;
> +  size_t max_length = 1024;
> +
> +  if (!the_pmu_tool_info->symbolizer_pid)
> +    return default_addr2line;    /* symbolizer is not running */
> +
> +  write_count = snprintf (buf, sizeof (buf), "%p\n", addr);
> +
> +  /* Write the address into the pipe.  */
> +  if (write (the_pmu_tool_info->symbolizer_to_pipefd[1], buf, write_count)
> +      < write_count)
> +    {
> +      if (sym_debug)
> +        fprintf (stderr, "Cannot write symbolizer pipe.\n");
> +      return default_addr2line;
> +    }
> +
> +  srcfile_linenum = XNEWVEC (char, max_length);
> +  /* Leave room for the terminating NUL written below.  */
> +  read_count = read (the_pmu_tool_info->symbolizer_from_pipefd[0],
> +                     srcfile_linenum, max_length - 1);
> +  if (read_count == -1)
> +    {
> +      if (sym_debug)
> +        fprintf (stderr, "Cannot read symbolizer pipe.\n");
> +      XDELETEVEC (srcfile_linenum);
> +      return default_addr2line;
> +    }
> +
> +  srcfile_linenum[read_count] = 0;
> +  if (sym_debug)
> +    fprintf (stderr, "symbolizer: for address %p, read_count %d, got %s\n",
> +             addr, read_count, srcfile_linenum);
> +  return srcfile_linenum;
> +}
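A single read () on a pipe is not guaranteed to return a whole line, so the
reply from addr2line could in principle arrive in pieces. One way to handle
that is to keep reading until the newline shows up. The following is only a
sketch under that assumption; read_line_from_fd is a hypothetical helper, it
reads one byte at a time for simplicity, and it does not retry on EINTR.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not part of the patch: read from FD until a newline
   arrives, the buffer is full, or the writer closes the pipe.  BUF is always
   NUL-terminated and the newline is dropped.  Returns the number of bytes
   stored, or -1 on a read error.  Assumes SIZE is at least 1.  */
static long
read_line_from_fd (int fd, char *buf, size_t size)
{
  size_t used = 0;

  while (used < size - 1)
    {
      ssize_t n = read (fd, buf + used, 1);
      if (n < 0)
        return -1;
      if (n == 0 || buf[used] == '\n')
        break;
      used++;
    }
  buf[used] = '\0';
  return (long) used;
}
```

Because the buffer is always terminated inside the loop bound, the caller
does not need a separate bounds check before writing the NUL.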
> +
> +/* Start monitoring PPID process via pfmon tool using TMPFILE as a
> +   file to store the raw data and using PFMON_ARGS as the command line
> +   arguments.  */
> +
> +static void
> +start_pfmon_module (pid_t ppid, char *tmpfile, const char **pfmon_args)
> +{
> +  int i;
> +  unsigned int n_args = 0;
> +  unsigned n_chars;
> +  char pid_buf[64];
> +  char filename_buf[1024];
> +  char top_n_buf[24];
> +  unsigned extra_args;
> +
> +  /* Go over the current pfmon args */
> +  for (i = 0; i < PMU_TOOL_MAX_ARGS && pfmon_args[i]; ++i)
> +    n_args++;
> +
> +  if (the_pmu_tool_info->verbose)
> +    extra_args = 4; /* account for additional --verbose */
> +  else
> +    extra_args = 3;
> +
> +  /* We are going to add args.  */
> +  if (n_args >= (PMU_TOOL_MAX_ARGS - extra_args))
> +    {
> +      fprintf (stderr, "too many pfmon args: %u\n", n_args);
> +      _exit (0);
> +    }
> +
> +  n_chars = snprintf (pid_buf, sizeof (pid_buf), "--attach-task=%ld",
> +                      (long)ppid);
> +  if (n_chars >= sizeof (pid_buf))
> +    {
> +      fprintf (stderr, "pfmon task id too long: %s\n", pid_buf);
> +      return;
> +    }
> +  pfmon_args[n_args] = pid_buf;
> +  n_args++;
> +
> +  n_chars = snprintf (filename_buf, sizeof (filename_buf), "--smpl-outfile=%s",
> +                      tmpfile);
> +  if (n_chars >= sizeof (filename_buf))
> +    {
> +      fprintf (stderr, "pfmon filename too long: %s\n", filename_buf);
> +      return;
> +    }
> +  pfmon_args[n_args] = filename_buf;
> +  n_args++;
> +
> +  n_chars = snprintf (top_n_buf, sizeof (top_n_buf), "--smpl-show-top=%u",
> +                      the_pmu_tool_info->top_n_address);
> +  if (n_chars >= sizeof (top_n_buf))
> +    {
> +      fprintf (stderr, "pfmon option too long: %s\n", top_n_buf);
> +      return;
> +    }
> +  pfmon_args[n_args] = top_n_buf;
> +  n_args++;
> +
> +  if (the_pmu_tool_info->verbose)
> +    {
> +      /* Add --verbose as well.  */
> +      pfmon_args[n_args] = "--verbose";
> +      n_args++;
> +    }
> +  pfmon_args[n_args] = (char *)NULL;
> +
> +  if (tool_debug)
> +    {
> +      fprintf (stderr, "pfmon args:");
> +      for (i = 0; i < PMU_TOOL_MAX_ARGS && pfmon_args[i]; ++i)
> +        fprintf (stderr, " %s", pfmon_args[i]);
> +      fprintf (stderr, "\n");
> +    }
> +  /* Do execve with NULL env.  */
> +  execve (pfmon_args[0], (char *const *)pfmon_args, (char * const*)NULL);
> +  /* execve returns only on failure.  */
> +}
> +
> +/* Convert a fractional PCT to an unsigned integer after
> +   multiplying by 100.  */
> +
> +static unsigned
> +convert_pct_to_unsigned (float pct)
> +{
> +  return (unsigned)(pct * 100.0f);
> +}
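The cast above truncates, so a product that lands just below an integer
would lose a hundredth. I have not checked whether that actually happens for
pfmon's two-decimal output. If it matters, a rounding variant could look
like the sketch below; pct_to_hundredths is a hypothetical name, and the
sketch assumes the percentage is never negative.

```c
/* Hypothetical variant, not part of the patch: scale a non-negative
   percentage by 100 and round to the nearest integer instead of
   truncating.  */
static unsigned
pct_to_hundredths (float pct)
{
  return (unsigned) (pct * 100.0f + 0.5f);
}
```

Adding 0.5f before the cast avoids pulling in libm, which may matter for
code that lives in libgcov.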
> +
> +/* Parse the load latency info pointed by LINE and save it into
> +   LL_INFO. Returns 0 if the line was parsed successfully, non-zero
> +   otherwise.
> +
> +   An example header+line look like these:
> +   "counts   %self    %cum     <10     <32     <64    <256   <1024  >=1024
> +   %wself          code addr symbol"
> +   "218  24.06%  24.06% 100.00%   0.00%   0.00%   0.00%   0.00%   0.00%  22.70%
> +   0x0000000000413e75 CalcSSIM(...)+965</tmp/psnr>"
> +*/
> +
> +static int
> +parse_load_latency_line (char *line, gcov_pmu_ll_info_t *ll_info)
> +{
> +  unsigned counts;
> +  /* These are percentages parsed as floats, but then converted to
> +     integers after multiplying by 100.  */
> +  float self, cum, lt_10, lt_32, lt_64, lt_256, lt_1024, gt_1024, wself;
> +  long unsigned int p;
> +  int n_values;
> +  pmu_tool_fns *tool_details = the_pmu_tool_info->tool_details;
> +
> +  n_values = sscanf (line, "%u%f%%%f%%%f%%%f%%%f%%%f%%%f%%%f%%%f%%%lx",
> +                     &counts, &self, &cum, &lt_10, &lt_32, &lt_64, &lt_256,
> +                     &lt_1024, &gt_1024, &wself, &p);
> +  if (n_values != 11)
> +    return 1;
> +
> +  /* Values read successfully. Do the assignment after converting
> +   * percentages into ints.  */
> +  ll_info->counts = counts;
> +  ll_info->self = convert_pct_to_unsigned (self);
> +  ll_info->cum = convert_pct_to_unsigned (cum);
> +  ll_info->lt_10 = convert_pct_to_unsigned (lt_10);
> +  ll_info->lt_32 = convert_pct_to_unsigned (lt_32);
> +  ll_info->lt_64 = convert_pct_to_unsigned (lt_64);
> +  ll_info->lt_256 = convert_pct_to_unsigned (lt_256);
> +  ll_info->lt_1024 = convert_pct_to_unsigned (lt_1024);
> +  ll_info->gt_1024 = convert_pct_to_unsigned (gt_1024);
> +  ll_info->wself = convert_pct_to_unsigned (wself);
> +  ll_info->code_addr = p;
> +
> +  /* Run the raw address through the symbolizer.  */
> +  if (tool_details->symbolize)
> +    {
> +      char *sym_info = tool_details->symbolize ((void *)p);
> +      /* sym_info is of the form src_filename:linenum.  Discriminator is
> +         currently not supported by addr2line.  */
> +      char *sep = strchr (sym_info, ':');
> +      if (!sep)
> +        {
> +          /* Assume entire string is srcfile.  */
> +          ll_info->filename = (char *)sym_info;
> +          ll_info->line = 0;
> +        }
> +      else
> +        {
> +          /* Terminate the filename string at the separator.  */
> +          *sep = 0;
> +          ll_info->filename = (char *)sym_info;
> +          /* Convert rest of the sym info to a line number.  */
> +          ll_info->line = atol (sep+1);
> +        }
> +      ll_info->discriminator = 0;
> +    }
> +  else
> +    {
> +      /* No symbolizer available.  */
> +      ll_info->filename = NULL;
> +      ll_info->line = 0;
> +      ll_info->discriminator = 0;
> +    }
> +  return 0;
> +}
> +
> +/* Parse the branch mispredict info pointed by LINE and save it into
> +   BRM_INFO. Returns 0 if the line was parsed successfully, non-zero
> +   otherwise.
> +
> +   An example header+line look like these:
> +   "counts   %self    %cum          code addr symbol"
> +   "6869  37.67%  37.67% 0x00000000004007e5 sum(std::vector<int*,
> +    std::allocator<int*> > const&)+51</root/tmp/array>"
> +*/
> +
> +static int
> +parse_branch_mispredict_line (char *line, gcov_pmu_brm_info_t *brm_info)
> +{
> +  unsigned counts;
> +  /* These are percentages parsed as floats, but then converted to
> +     ints after multiplying by 100.  */
> +  float self, cum;
> +  long unsigned int p;
> +  int n_values;
> +  pmu_tool_fns *tool_details = the_pmu_tool_info->tool_details;
> +
> +  n_values = sscanf (line, "%u%f%%%f%%%lx",
> +                     &counts, &self, &cum, &p);
> +  if (n_values != 4)
> +    return 1;
> +
> +  /* Values read successfully. Do the assignment after converting
> +   * percentages into ints.  */
> +  brm_info->counts = counts;
> +  brm_info->self = convert_pct_to_unsigned (self);
> +  brm_info->cum = convert_pct_to_unsigned (cum);
> +  brm_info->code_addr = p;
> +
> +  /* Run the raw address through the symbolizer.  */
> +  if (tool_details->symbolize)
> +    {
> +      char *sym_info = tool_details->symbolize ((void *)p);
> +      /* sym_info is of the form src_filename:linenum.  Discriminator is
> +         currently not supported by addr2line.  */
> +      char *sep = strchr (sym_info, ':');
> +      if (!sep)
> +        {
> +          /* Assume entire string is srcfile.  */
> +          brm_info->filename = sym_info;
> +          brm_info->line = 0;
> +        }
> +      else
> +        {
> +          /* Terminate the filename string at the separator.  */
> +          *sep = 0;
> +          brm_info->filename = sym_info;
> +          /* Convert rest of the sym info to a line number.  */
> +          brm_info->line = atol (sep+1);
> +        }
> +      brm_info->discriminator = 0;
> +    }
> +  else
> +    {
> +      /* No symbolizer available.  */
> +      brm_info->filename = NULL;
> +      brm_info->line = 0;
> +      brm_info->discriminator = 0;
> +    }
> +  return 0;
> +}
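To make the sscanf format easier to review in isolation, here is a
standalone sketch of the same parse applied to a line shaped like the
example in the comment. struct brm_sample and parse_brm_sample are
hypothetical stand-ins for gcov_pmu_brm_info_t and the function above; the
symbolizer step is left out.

```c
#include <stdio.h>

/* Hypothetical struct for illustration; the patch stores these fields in
   gcov_pmu_brm_info_t.  */
struct brm_sample
{
  unsigned counts;
  float self_pct;
  float cum_pct;
  unsigned long code_addr;
};

/* Parse one pfmon branch mispredict line.  Returns 0 on success and 1 if
   the line does not start with the four expected fields.  */
static int
parse_brm_sample (const char *line, struct brm_sample *out)
{
  int n = sscanf (line, "%u%f%%%f%%%lx", &out->counts, &out->self_pct,
                  &out->cum_pct, &out->code_addr);
  return n == 4 ? 0 : 1;
}
```

Header lines that start with '#' fail at the first %u conversion, so they
are rejected without any extra check. The symbol text after the address is
simply left unread.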
> +
> +/* Delete load latency info structures INFO.  */
> +
> +static void
> +destroy_load_latency_infos (void *info)
> +{
> +  unsigned i;
> +  ll_infos_t* ll_infos = (ll_infos_t *)info;
> +
> +  /* delete each element */
> +  for (i = 0; i < ll_infos->ll_count; ++i)
> +    XDELETE (ll_infos->ll_array[i]);
> +  /* delete the array itself */
> +  XDELETE (ll_infos->ll_array);
> +  __destroy_pmu_tool_header (ll_infos->pmu_tool_header);
> +  free (ll_infos->pmu_tool_header);
> +  ll_infos->ll_array = 0;
> +  ll_infos->ll_count = 0;
> +}
> +
> +/* Delete branch mispredict structure INFO.  */
> +
> +static void
> +destroy_branch_mispredict_infos (void *info)
> +{
> +  unsigned i;
> +  brm_infos_t* brm_infos = (brm_infos_t *)info;
> +
> +  /* delete each element */
> +  for (i = 0; i < brm_infos->brm_count; ++i)
> +    XDELETE (brm_infos->brm_array[i]);
> +  /* delete the array itself */
> +  XDELETE (brm_infos->brm_array);
> +  __destroy_pmu_tool_header (brm_infos->pmu_tool_header);
> +  free (brm_infos->pmu_tool_header);
> +  brm_infos->brm_array = 0;
> +  brm_infos->brm_count = 0;
> +}
> +
> +/* Parse FILENAME for load latency lines into a structure
> +   PMU_DATA.  Returns 0 on success.  Returns non-zero on
> +   failure.  */
> +
> +static int
> +parse_pfmon_load_latency (char *filename, void *pmu_data)
> +{
> +  FILE *fp;
> +  size_t buflen = 2*1024;
> +  char *buf;
> +  ll_infos_t *load_latency_infos = (ll_infos_t *)pmu_data;
> +  gcov_pmu_tool_header_t *tool_header = 0;
> +
> +  if ((fp = fopen (filename, "r")) == NULL)
> +    {
> +      fprintf (stderr, "cannot open pmu data file: %s\n", filename);
> +      return 1;
> +    }
> +
> +  if (!(tool_header = parse_pfmon_tool_header (fp, pfmon_ll_header)))
> +    {
> +      fprintf (stderr, "cannot parse pmu data file header: %s\n", filename);
> +      fclose (fp);
> +      return 1;
> +    }
> +
> +  buf = XNEWVEC (char, buflen);
> +  while (fgets (buf, buflen, fp))
> +    {
> +      gcov_pmu_ll_info_t *ll_info = XNEW (gcov_pmu_ll_info_t);
> +      if (!parse_load_latency_line (buf, ll_info))
> +        {
> +          /* valid line, add to the array */
> +          load_latency_infos->ll_count++;
> +          if (load_latency_infos->ll_count >=
> +              load_latency_infos->alloc_ll_count)
> +            {
> +              /* Need to realloc.  The size argument is in bytes, so
> +                 scale the element count by the element size.  */
> +              load_latency_infos->ll_array =
> +                realloc (load_latency_infos->ll_array,
> +                         2 * load_latency_infos->alloc_ll_count
> +                         * sizeof (load_latency_infos->ll_array[0]));
> +              if (load_latency_infos->ll_array == NULL)
> +                {
> +                  fprintf (stderr, "Cannot allocate load latency memory.\n");
> +                  __destroy_pmu_tool_header (tool_header);
> +                  free (buf);
> +                  fclose (fp);
> +                  return 1;
> +                }
> +              load_latency_infos->alloc_ll_count *= 2;
> +            }
> +          load_latency_infos->ll_array[load_latency_infos->ll_count - 1] =
> +            ll_info;
> +        }
> +      else
> +        /* Delete invalid line.  */
> +        XDELETE (ll_info);
> +    }
> +  free (buf);
> +  fclose (fp);
> +  load_latency_infos->pmu_tool_header = tool_header;
> +  return 0;
> +}
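The grow-on-demand logic in the loop above is easy to get wrong because
realloc () takes a size in bytes and the recorded capacity has to be updated
along with the pointer. A self-contained sketch of the pattern follows.
struct ptr_vec and ptr_vec_push are hypothetical names, not the patch's
ll_infos_t, and the starting capacity of 4 is an arbitrary choice.

```c
#include <stdlib.h>

/* Hypothetical container for illustration; the patch uses ll_infos_t.  */
struct ptr_vec
{
  void **items;
  size_t count;
  size_t capacity;
};

/* Append ITEM, doubling the capacity when full.  Returns 0 on success and
   1 on allocation failure, in which case VEC is left unchanged.  */
static int
ptr_vec_push (struct ptr_vec *vec, void *item)
{
  if (vec->count == vec->capacity)
    {
      size_t new_cap = vec->capacity ? 2 * vec->capacity : 4;
      /* realloc takes a size in bytes, hence the sizeof factor.  */
      void **grown = realloc (vec->items, new_cap * sizeof (*grown));
      if (grown == NULL)
        return 1;
      vec->items = grown;
      vec->capacity = new_cap;
    }
  vec->items[vec->count++] = item;
  return 0;
}
```

Assigning the realloc result to a temporary keeps the old array reachable
when the allocation fails, so the caller can still free what it has.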
> +
> +/* Parse open file FP until END_HEADER is seen. The data matching
> +   gcov_pmu_tool_header_t fields is saved and returned in a new
> +   struct. In case of failure, it returns NULL.  */
> +
> +static gcov_pmu_tool_header_t *
> +parse_pfmon_tool_header (FILE *fp, const char *end_header)
> +{
> +  static const char tag_hostname[] = "# hostname: ";
> +  static const char tag_kversion[] = "# kernel version: ";
> +  static const char tag_hostcpu[] = "# host CPUs:  ";
> +  static const char tag_column_desc_start[] = "# description of columns:";
> +  static const char tag_column_desc_end[] =
> +      "#       other columns are self-explanatory";
> +  size_t buflen = 4*1024;
> +  char *buf, *buf_start, *buf_end;
> +  gcov_pmu_tool_header_t *tool_header = XNEWVEC (gcov_pmu_tool_header_t, 1);
> +  char *hostname = 0;
> +  char *kversion = 0;
> +  char *hostcpu = 0;
> +  char *column_description = 0;
> +  char *column_desc_start = 0;
> +  char *column_desc_end = 0;
> +  const char *column_header = 0;
> +  int got_hostname = 0;
> +  int got_kversion = 0 ;
> +  int got_hostcpu = 0;
> +  int got_end_header = 0;
> +  int got_column_description = 0;
> +
> +  buf = XNEWVEC (char, buflen);
> +  buf_start = buf;
> +  buf_end = buf + buflen;
> +  while (buf < (buf_end - 1) && fgets (buf, buf_end - buf, fp))
> +    {
> +      if (strncmp (end_header, buf, buf_end - buf) == 0)
> +      {
> +        got_end_header = 1;
> +        break;
> +      }
> +      if (!got_hostname &&
> +          strncmp (buf, tag_hostname, strlen (tag_hostname)) == 0)
> +        {
> +          size_t len = strlen (buf) - strlen (tag_hostname);
> +          hostname = XNEWVEC (char, len);
> +          memcpy (hostname, buf + strlen (tag_hostname), len);
> +          hostname[len - 1] = 0;
> +          tool_header->hostname = hostname;
> +          got_hostname = 1;
> +        }
> +
> +      if (!got_kversion &&
> +          strncmp (buf, tag_kversion, strlen (tag_kversion)) == 0)
> +        {
> +          size_t len = strlen (buf) - strlen (tag_kversion);
> +          kversion = XNEWVEC (char, len);
> +          memcpy (kversion, buf + strlen (tag_kversion), len);
> +          kversion[len - 1] = 0;
> +          tool_header->kernel_version = kversion;
> +          got_kversion = 1;
> +        }
> +
> +      if (!got_hostcpu &&
> +          strncmp (buf, tag_hostcpu, strlen (tag_hostcpu)) == 0)
> +        {
> +          size_t len = strlen (buf) - strlen (tag_hostcpu);
> +          hostcpu = XNEWVEC (char, len);
> +          memcpy (hostcpu, buf + strlen (tag_hostcpu), len);
> +          hostcpu[len - 1] = 0;
> +          tool_header->host_cpu = hostcpu;
> +          got_hostcpu = 1;
> +        }
> +      if (!got_column_description &&
> +          strncmp (buf, tag_column_desc_start, strlen (tag_column_desc_start))
> +          == 0)
> +        {
> +          column_desc_start = buf;
> +          column_desc_end = 0;
> +          /* Continue reading until end of the column descriptor.  */
> +          while (buf < (buf_end - 1) && fgets (buf, buf_end - buf, fp))
> +            {
> +              if (strncmp (buf, tag_column_desc_end,
> +                           strlen (tag_column_desc_end)) == 0)
> +                {
> +                  column_desc_end = buf + strlen (tag_column_desc_end);
> +                  break;
> +                }
> +              buf += strlen (buf);
> +            }
> +          if (column_desc_end)
> +            {
> +              /* Found the end, copy it into a new string.  */
> +              column_description = XNEWVEC (char, column_desc_end -
> +                                            column_desc_start + 1);
> +              got_column_description = 1;
> +              strcpy (column_description, column_desc_start);
> +              tool_header->column_description = column_description;
> +            }
> +        }
> +      buf += strlen (buf);
> +    }
> +
> +  /* If we are missing any of the fields, return NULL.  */
> +  if (!got_end_header || !got_hostname || !got_kversion || !got_hostcpu
> +      || !got_column_description)
> +    {
> +      free (hostname);
> +      free (kversion);
> +      free (hostcpu);
> +      free (column_description);
> +      free (buf_start);
> +      free (tool_header);
> +      return NULL;
> +    }
> +
> +  switch (the_pmu_tool_info->event)
> +    {
> +    case PET_INTEL_LOAD_LATENCY:
> +    case PET_AMD_LOAD_LATENCY:
> +      column_header = pfmon_ll_header;
> +      break;
> +    case PET_INTEL_BRANCH_MISPREDICT:
> +    case PET_AMD_BRANCH_MISPREDICT:
> +      column_header = pfmon_bm_header;
> +      break;
> +    default:
> +      break;
> +    }
> +  tool_header->column_header = strdup (column_header);
> +  tool_header->full_header = buf_start;
> +  return tool_header;
> +}
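The hostname, kernel version and host CPU branches above all follow the same steps: match a tag prefix, copy the rest of the line, and drop the trailing newline. A minimal sketch of a helper that could factor this out is below. The name `dup_tagged_value` is hypothetical and not part of the patch, and it uses plain malloc on the assumption that xstrdup is unavailable in libgcov.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the patch.  If LINE starts with TAG,
   return a malloc'ed copy of the rest of the line with one trailing
   newline removed.  Return NULL on mismatch or allocation failure.  */
static char *
dup_tagged_value (const char *line, const char *tag)
{
  size_t tag_len, len;
  char *value;

  assert (line != NULL && tag != NULL);
  tag_len = strlen (tag);
  if (strncmp (line, tag, tag_len) != 0)
    return NULL;

  /* Length of the value, excluding a trailing newline if present.  */
  len = strlen (line + tag_len);
  if (len > 0 && line[tag_len + len - 1] == '\n')
    len--;

  value = (char *) malloc (len + 1);
  if (value == NULL)
    return NULL;
  memcpy (value, line + tag_len, len);
  value[len] = '\0';
  return value;
}
```

This sketch also copes with a final line that has no trailing newline or nothing after the tag, since it never indexes with `len - 1` unless `len` is positive.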
> +
> +
> +/* Parse FILENAME for branch mispredict lines into a structure
> +   PMU_DATA.  Returns 0 on success.  Returns non-zero on
> +   failure.  */
> +
> +static int
> +parse_pfmon_branch_mispredicts (char *filename, void *pmu_data)
> +{
> +  FILE *fp;
> +  size_t buflen = 2*1024;
> +  char *buf;
> +  brm_infos_t *brm_infos = (brm_infos_t *)pmu_data;
> +  gcov_pmu_tool_header_t *tool_header = 0;
> +
> +  if ((fp = fopen (filename, "r")) == NULL)
> +    {
> +      fprintf (stderr, "cannot open pmu data file: %s\n", filename);
> +      return 1;
> +    }
> +
> +  if (!(tool_header = parse_pfmon_tool_header (fp, pfmon_bm_header)))
> +    {
> +      fprintf (stderr, "cannot parse pmu data file header: %s\n", filename);
> +      return 1;
> +    }
> +
> +  buf = XNEWVEC (char, buflen);
> +  while (fgets (buf, buflen, fp))
> +    {
> +      gcov_pmu_brm_info_t *brm = XNEW (gcov_pmu_brm_info_t);
> +      if (!parse_branch_mispredict_line (buf, brm))
> +        {
> +          /* Valid line, add to the array.  */
> +          brm_infos->brm_count++;
> +          if (brm_infos->brm_count >= brm_infos->alloc_brm_count)
> +            {
> +              /* Double the capacity; realloc takes a byte count.  */
> +              brm_infos->alloc_brm_count *= 2;
> +              brm_infos->brm_array = realloc (brm_infos->brm_array,
> +                sizeof (*brm_infos->brm_array) * brm_infos->alloc_brm_count);
> +              if (brm_infos->brm_array == NULL) {
> +                fprintf (stderr,
> +                         "Cannot allocate memory for br mispredicts.\n");
> +                __destroy_pmu_tool_header (tool_header);
> +                free (buf);
> +                fclose (fp);
> +                return 1;
> +              }
> +            }
> +          brm_infos->brm_array[brm_infos->brm_count - 1] = brm;
> +        }
> +      else
> +        /* Delete invalid line.  */
> +        XDELETE (brm);
> +    }
> +  free (buf);
> +  fclose (fp);
> +  brm_infos->pmu_tool_header = tool_header;
> +  return 0;
> +}
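Appending parsed records to brm_array is a grow-on-demand array of pointers. A minimal sketch of that pattern with capacity doubling follows. The names `ptr_vec` and `ptr_vec_push` are hypothetical, and the initial capacity of 64 only mirrors the value used by the init functions in this patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of record pointers (illustration only).  */
struct ptr_vec
{
  void **items;
  size_t count;     /* Slots in use.  */
  size_t capacity;  /* Slots allocated.  */
};

/* Append ITEM to VEC, doubling the capacity when full.  Return 0 on
   success, 1 on allocation failure; VEC is left unchanged on failure.  */
static int
ptr_vec_push (struct ptr_vec *vec, void *item)
{
  assert (vec != NULL);
  if (vec->count == vec->capacity)
    {
      size_t new_cap = vec->capacity ? 2 * vec->capacity : 64;
      /* realloc takes a size in bytes.  Its result goes to a temporary
         so the old block is not lost if the call fails.  */
      void **grown = (void **) realloc (vec->items,
                                        new_cap * sizeof (void *));
      if (grown == NULL)
        return 1;
      vec->items = grown;
      vec->capacity = new_cap;
    }
  vec->items[vec->count++] = item;
  return 0;
}
```

Keeping the stored capacity in step with the allocation is what lets the "is it full" test stay correct after the first growth.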
> +
> +/* Start the monitoring process using pmu tool. Return 0 on success,
> +   non-zero otherwise.  */
> +
> +static int
> +pmu_start (void)
> +{
> +  pid_t pid;
> +
> +  /* no start function */
> +  if (!the_pmu_tool_info->tool_details->start_pmu_module)
> +    return 1;
> +
> +  pid = fork ();
> +  if (pid == -1)
> +    {
> +      /* error condition */
> +      fprintf (stderr, "Cannot create PMU profiling process, exiting.\n");
> +      return 1;
> +    }
> +  else if (pid == 0)
> +    {
> +      /* child */
> +      pid_t ppid = getppid();
> +      char *tmpfile = the_pmu_tool_info->raw_pmu_profile_filename;
> +      const char **pfmon_args = the_pmu_tool_info->tool_details->arg_array;
> +      int new_stderr_fd;
> +
> +      /* Redirect stderr from the child process into a separate file.  */
> +      new_stderr_fd = creat (the_pmu_tool_info->tool_stderr_filename,
> +                             S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH);
> +      if (new_stderr_fd != -1)
> +          dup2 (new_stderr_fd, 2);
> +      /* The following does an exec and thus is not expected to return.  */
> +      the_pmu_tool_info->tool_details->start_pmu_module(ppid, tmpfile,
> +                                                        pfmon_args);
> +      /* exec returned, an error condition.  */
> +      fprintf (stderr, "could not create profiling process: %s\n",
> +               the_pmu_tool_info->tool_details->arg_array[0]);
> +      _exit (0);
> +    }
> +  else
> +    {
> +      /* parent */
> +      the_pmu_tool_info->pmu_tool_pid = pid;
> +      return 0;
> +    }
> +}
> +
> +/* Allocate and initialize pmu load latency structure.  */
> +
> +static void *
> +init_pmu_load_latency (void)
> +{
> +  ll_infos_t *load_latency = XNEWVEC (ll_infos_t, 1);
> +  load_latency->ll_count = 0;
> +  load_latency->alloc_ll_count = 64;
> +  load_latency->ll_array = XNEWVEC (gcov_pmu_ll_info_t *,
> +                                    load_latency->alloc_ll_count);
> +  return (void *)load_latency;
> +}
> +
> +/* Allocate and initialize pmu branch mispredict structure.  */
> +
> +static void *
> +init_pmu_branch_mispredict (void)
> +{
> +  brm_infos_t *brm_info = XNEWVEC (brm_infos_t, 1);
> +  brm_info->brm_count = 0;
> +  brm_info->alloc_brm_count = 64;
> +  brm_info->brm_array = XNEWVEC (gcov_pmu_brm_info_t *,
> +                                 brm_info->alloc_brm_count);
> +  return (void *)brm_info;
> +}
> +
> +/* Initialize pmu tool based upon PMU_INFO. Sets the appropriate tool
> +   type in the global the_pmu_tool_info.  */
> +
> +static int
> +init_pmu_tool (struct gcov_pmu_info *pmu_info)
> +{
> +  the_pmu_tool_info->pmu_profiling_state = PMU_NONE;
> +  the_pmu_tool_info->verbose = 0;
> +  the_pmu_tool_info->tool = PTT_PFMON;  /* we support only pfmon */
> +  the_pmu_tool_info->pmu_tool_pid = 0;
> +  the_pmu_tool_info->top_n_address = pmu_info->pmu_top_n_address;
> +  the_pmu_tool_info->symbolizer_pid = 0;
> +  the_pmu_tool_info->symbolizer_to_pipefd[0] = -1;
> +  the_pmu_tool_info->symbolizer_to_pipefd[1] = -1;
> +  the_pmu_tool_info->symbolizer_from_pipefd[0] = -1;
> +  the_pmu_tool_info->symbolizer_from_pipefd[1] = -1;
> +
> +  if (parse_pmu_profile_options (pmu_info->pmu_tool))
> +    return 1;
> +
> +  if (the_pmu_tool_info->pmu_profiling_state == PMU_ERROR)
> +    {
> +      fprintf (stderr, "Unsupported PMU module: %s, disabling PMU profiling.\n",
> +               pmu_info->pmu_tool);
> +      return 1;
> +    }
> +
> +  if (the_pmu_tool_info->tool_details->init_pmu_module)
> +    /* initialize module */
> +    the_pmu_tool_info->pmu_data =
> +      the_pmu_tool_info->tool_details->init_pmu_module();
> +  return 0;
> +}
> +
> +/* Initialize PMU profiling based upon the information passed in
> +   PMU_INFO and use pmu_profile_filename as the file to store the PMU
> +   profile.  This is called multiple times from libgcov, once per
> +   object file.  We need to make sure to do the necessary
> +   initialization only the first time.  For subsequent invocations it
> +   behaves as a NOOP.  */
> +
> +void
> +__gcov_init_pmu_profiler (struct gcov_pmu_info *pmu_info)
> +{
> +  char *raw_pmu_profile_filename;
> +  char *tool_stderr_filename;
> +  if (!pmu_info || !pmu_info->pmu_profile_filename || !pmu_info->pmu_tool)
> +    return;
> +
> +  /* Allocate the global structure on first invocation.  */
> +  if (!the_pmu_tool_info)
> +    {
> +      the_pmu_tool_info = XNEWVEC (struct pmu_tool_info, 1);
> +      if (!the_pmu_tool_info)
> +        {
> +          fprintf (stderr, "Error allocating memory for PMU tool\n");
> +          return;
> +        }
> +      if (init_pmu_tool (pmu_info))
> +        {
> +          /* Initialization error.  */
> +          XDELETE (the_pmu_tool_info);
> +          the_pmu_tool_info = 0;
> +          return;
> +        }
> +    }
> +
> +  switch (the_pmu_tool_info->pmu_profiling_state)
> +    {
> +    case PMU_NONE:
> +      the_pmu_tool_info->pmu_profile_filename =
> +        strdup (pmu_info->pmu_profile_filename);
> +      /* Construct an intermediate filename by substituting trailing
> +         '.gcda' with '.pmud'.  */
> +      raw_pmu_profile_filename = strdup (pmu_info->pmu_profile_filename);
> +      if (raw_pmu_profile_filename == NULL)
> +        {
> +          fprintf (stderr, "Cannot allocate memory\n");
> +          exit (1);
> +        }
> +      strcpy (raw_pmu_profile_filename + strlen (raw_pmu_profile_filename) - 4,
> +              "pmud");
> +
> +      /* Construct a filename for collecting PMU tool's stderr by
> +         substituting trailing '.gcda' with '.stderr'.  */
> +      tool_stderr_filename =
> +        XNEWVEC (char, strlen (pmu_info->pmu_profile_filename) + 1 + 2);
> +      strcpy (tool_stderr_filename, pmu_info->pmu_profile_filename);
> +      strcpy (tool_stderr_filename + strlen (tool_stderr_filename) - 4,
> +              "stderr");
> +      the_pmu_tool_info->raw_pmu_profile_filename = raw_pmu_profile_filename;
> +      the_pmu_tool_info->tool_stderr_filename = tool_stderr_filename;
> +      the_pmu_tool_info->pmu_profiling_state = PMU_INITIALIZED;
> +      break;
> +
> +    case PMU_INITIALIZED:
> +    case PMU_OFF:
> +    case PMU_ON:
> +    case PMU_ERROR:
> +      break;
> +    default:
> +      break;
> +    }
> +}
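Both intermediate file names above are derived from the .gcda name by swapping its extension. A sketch of a helper that checks the old suffix and sizes the buffer from the new one is below. The name `replace_suffix` is hypothetical, and "pmuprofile.gcda" in the usage note is only an assumed example of what get_da_file_name might produce.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (illustration only).  Return a malloc'ed copy of
   NAME with its trailing OLD_SUFFIX replaced by NEW_SUFFIX, or NULL if
   NAME does not end in OLD_SUFFIX or allocation fails.  */
static char *
replace_suffix (const char *name, const char *old_suffix,
                const char *new_suffix)
{
  size_t name_len, old_len, new_len, stem_len;
  char *result;

  assert (name != NULL && old_suffix != NULL && new_suffix != NULL);
  name_len = strlen (name);
  old_len = strlen (old_suffix);
  new_len = strlen (new_suffix);
  if (name_len < old_len
      || strcmp (name + name_len - old_len, old_suffix) != 0)
    return NULL;

  stem_len = name_len - old_len;
  result = (char *) malloc (stem_len + new_len + 1);
  if (result == NULL)
    return NULL;
  memcpy (result, name, stem_len);
  /* Copy the new suffix together with its terminating NUL.  */
  memcpy (result + stem_len, new_suffix, new_len + 1);
  return result;
}
```

With this, `replace_suffix ("pmuprofile.gcda", ".gcda", ".stderr")` would yield "pmuprofile.stderr" without a hand-computed size adjustment.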
> +
> +/* Start PMU profiling.  It updates the current state.  */
> +
> +void
> +__gcov_start_pmu_profiler (void)
> +{
> +  if (!the_pmu_tool_info)
> +    return;
> +
> +  switch (the_pmu_tool_info->pmu_profiling_state)
> +    {
> +    case PMU_INITIALIZED:
> +      if (!pmu_start ())
> +        the_pmu_tool_info->pmu_profiling_state = PMU_ON;
> +      else
> +        the_pmu_tool_info->pmu_profiling_state = PMU_ERROR;
> +      break;
> +
> +    case PMU_NONE:
> +      /* PMU was not properly initialized, don't attempt to start it.  */
> +      the_pmu_tool_info->pmu_profiling_state = PMU_ERROR;
> +      break;
> +
> +    case PMU_OFF:
> +      /* Restarting PMU is not yet supported.  */
> +    case PMU_ON:
> +      /* Do nothing.  */
> +    case PMU_ERROR:
> +      break;
> +
> +    default:
> +      break;
> +    }
> +}
> +
> +/* Stop PMU profiling.  Currently it doesn't do anything except
> +   bookkeeping.  */
> +
> +void
> +__gcov_stop_pmu_profiler (void)
> +{
> +  if (!the_pmu_tool_info)
> +    return;
> +
> +  if (the_pmu_tool_info->tool_details->stop_pmu_module)
> +    the_pmu_tool_info->tool_details->stop_pmu_module();
> +  if (the_pmu_tool_info->pmu_profiling_state == PMU_ON)
> +    the_pmu_tool_info->pmu_profiling_state = PMU_OFF;
> +}
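The state handling in the start and stop entry points above can be summarized as two small transition functions. This is only a sketch for reading the patch: the enum below is an illustrative stand-in for pmu_state, and its names and ordering are assumptions.

```c
#include <assert.h>

/* Illustrative copies of the profiler states; names and order are
   assumptions, not the patch's enum pmu_state.  */
enum sketch_state { S_NONE, S_INITIALIZED, S_ON, S_OFF, S_ERROR };

/* State after a start request.  START_FAILED is nonzero if launching
   the tool failed; it only matters in S_INITIALIZED.  */
static enum sketch_state
after_start (enum sketch_state s, int start_failed)
{
  assert (s >= S_NONE && s <= S_ERROR);
  switch (s)
    {
    case S_INITIALIZED:
      return start_failed ? S_ERROR : S_ON;
    case S_NONE:
      return S_ERROR;   /* Never initialized.  */
    default:
      return s;         /* ON, OFF (no restart) and ERROR are unchanged.  */
    }
}

/* State after a stop request: only a running profiler changes state.  */
static enum sketch_state
after_stop (enum sketch_state s)
{
  return s == S_ON ? S_OFF : s;
}
```

Read this way, the only path on which the end function writes data is NONE, INITIALIZED, ON, OFF in that order.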
> +
> +/* Write the load latency information LL_INFO into the gcda file.  */
> +
> +static void
> +gcov_write_ll_line (const gcov_pmu_ll_info_t *ll_info)
> +{
> +  gcov_unsigned_t len = GCOV_TAG_PMU_LOAD_LATENCY_LENGTH (ll_info->filename);
> +  gcov_write_tag_length (GCOV_TAG_PMU_LOAD_LATENCY_INFO, len);
> +  gcov_write_unsigned (ll_info->counts);
> +  gcov_write_unsigned (ll_info->self);
> +  gcov_write_unsigned (ll_info->cum);
> +  gcov_write_unsigned (ll_info->lt_10);
> +  gcov_write_unsigned (ll_info->lt_32);
> +  gcov_write_unsigned (ll_info->lt_64);
> +  gcov_write_unsigned (ll_info->lt_256);
> +  gcov_write_unsigned (ll_info->lt_1024);
> +  gcov_write_unsigned (ll_info->gt_1024);
> +  gcov_write_unsigned (ll_info->wself);
> +  gcov_write_counter (ll_info->code_addr);
> +  gcov_write_unsigned (ll_info->line);
> +  gcov_write_unsigned (ll_info->discriminator);
> +  gcov_write_string (ll_info->filename);
> +}
> +
> +
> +/* Write the branch mispredict information BRM_INFO into the gcda file.  */
> +
> +static void
> +gcov_write_branch_mispredict_line (const gcov_pmu_brm_info_t *brm_info)
> +{
> +  gcov_unsigned_t len = GCOV_TAG_PMU_BRANCH_MISPREDICT_LENGTH (
> +      brm_info->filename);
> +  gcov_write_tag_length (GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO, len);
> +  gcov_write_unsigned (brm_info->counts);
> +  gcov_write_unsigned (brm_info->self);
> +  gcov_write_unsigned (brm_info->cum);
> +  gcov_write_counter (brm_info->code_addr);
> +  gcov_write_unsigned (brm_info->line);
> +  gcov_write_unsigned (brm_info->discriminator);
> +  gcov_write_string (brm_info->filename);
> +}
> +
> +/* Write load latency information INFO into the gcda file.  The gcda
> +   file has already been opened and is available for writing.  */
> +
> +static void
> +gcov_write_load_latency_infos (void *info)
> +{
> +  unsigned i;
> +  const ll_infos_t *ll_infos = (const ll_infos_t *)info;
> +  gcov_unsigned_t stamp = 0;  /* Don't use stamp as we don't support merge.  */
> +  /* We don't support merge, and instead always rewrite the file.  But
> +     to rewrite a gcov file we must first read it, however the read
> +     value is ignored.  */
> +  gcov_read_unsigned ();
> +  gcov_rewrite ();
> +  gcov_write_tag_length (GCOV_DATA_MAGIC, GCOV_VERSION);
> +  gcov_write_unsigned (stamp);
> +  if (ll_infos->pmu_tool_header)
> +    gcov_write_tool_header (ll_infos->pmu_tool_header);
> +  for (i = 0; i < ll_infos->ll_count; ++i)
> +    {
> +      /* Write each line.  */
> +      gcov_write_ll_line (ll_infos->ll_array[i]);
> +    }
> +  gcov_truncate ();
> +}
> +
> +/* Write branch mispredict information INFO into the gcda file.  The
> +   gcda file has already been opened and is available for writing.  */
> +
> +static void
> +gcov_write_branch_mispredict_infos (void *info)
> +{
> +  unsigned i;
> +  const brm_infos_t *brm_infos = (const brm_infos_t *)info;
> +  gcov_unsigned_t stamp = 0;  /* Don't use stamp as we don't support merge. */
> +  /* We don't support merge, and instead always rewrite the file.  */
> +  gcov_rewrite ();
> +  gcov_write_tag_length (GCOV_DATA_MAGIC, GCOV_VERSION);
> +  gcov_write_unsigned (stamp);
> +  if (brm_infos->pmu_tool_header)
> +    gcov_write_tool_header (brm_infos->pmu_tool_header);
> +  for (i = 0; i < brm_infos->brm_count; ++i)
> +    {
> +      /* Write each line.  */
> +      gcov_write_branch_mispredict_line (brm_infos->brm_array[i]);
> +    }
> +  gcov_truncate ();
> +}
> +
> +/* Compute TOOL_HEADER length for writing into the gcov file.  */
> +
> +static gcov_unsigned_t
> +gcov_tag_pmu_tool_header_length (gcov_pmu_tool_header_t *header)
> +{
> +  gcov_unsigned_t len = 0;
> +  if (header)
> +    {
> +      len += gcov_string_length (header->host_cpu);
> +      len += gcov_string_length (header->hostname);
> +      len += gcov_string_length (header->kernel_version);
> +      len += gcov_string_length (header->column_header);
> +      len += gcov_string_length (header->column_description);
> +      len += gcov_string_length (header->full_header);
> +    }
> +  return len;
> +}
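The header length above is a sum of per-string lengths measured in gcov words. gcov_string_length itself is not shown in this part of the patch, so the following is only a sketch under the assumption that it mirrors the encoding used by gcov_write_string: one length word, followed by the string padded with NULs to a 4-byte boundary, and a lone zero length word for a NULL string. The name `string_words` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for gcov_string_length (assumption: same layout
   as gcov_write_string).  Return the number of 4-byte words needed to
   store S, including the leading length word.  */
static unsigned
string_words (const char *s)
{
  unsigned alloc = 0;

  if (s != NULL)
    /* Round up so there is always room for at least one NUL byte.  */
    alloc = (unsigned) ((strlen (s) + 4) >> 2);
  return 1 + alloc;
}
```

Under that assumption a three-character string takes two words, and a four-character string takes three because the terminating NUL spills into a new word.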
> +
> +/* Write tool header into the gcda file. It assumes that the gcda file
> +   has already been opened and is available for writing.  */
> +
> +static void
> +gcov_write_tool_header (gcov_pmu_tool_header_t *header)
> +{
> +  gcov_unsigned_t len = gcov_tag_pmu_tool_header_length (header);
> +  gcov_write_tag_length (GCOV_TAG_PMU_TOOL_HEADER, len);
> +  gcov_write_string (header->host_cpu);
> +  gcov_write_string (header->hostname);
> +  gcov_write_string (header->kernel_version);
> +  gcov_write_string (header->column_header);
> +  gcov_write_string (header->column_description);
> +  gcov_write_string (header->full_header);
> +}
> +
> +
> +/* End PMU profiling.  If GCDA_ERROR is zero then write profiling data into
> +   the already open gcda file.  */
> +
> +void
> +__gcov_end_pmu_profiler (int gcda_error)
> +{
> +  int pid_status;
> +  int wait_status;
> +  pid_t pid;
> +  pmu_tool_fns *tool_details;
> +
> +  if (!the_pmu_tool_info)
> +    return;
> +
> +  tool_details = the_pmu_tool_info->tool_details;
> +  pid = the_pmu_tool_info->pmu_tool_pid;
> +  if (pid)
> +    {
> +      if (tool_debug)
> +        fprintf (stderr, "terminating PMU profiling process %ld\n", (long)pid);
> +      kill (pid, SIGTERM);
> +      if (tool_debug)
> +        fprintf (stderr, "parent: waiting for pmu process to end\n");
> +      wait_status = waitpid (pid, &pid_status, 0);
> +      if (tool_debug) {
> +        if (wait_status == pid)
> +          fprintf (stderr, "Normal exit. Child terminated.\n");
> +        else
> +          fprintf (stderr, "Abnormal exit. child status, %d.\n", pid_status);
> +      }
> +    }
> +
> +  if (the_pmu_tool_info->pmu_profiling_state != PMU_OFF)
> +    {
> +      /* nothing to do */
> +      fprintf (stderr,
> +               "__gcov_end_pmu_profiler: incorrect pmu state: %d, pid: %ld\n",
> +               the_pmu_tool_info->pmu_profiling_state,
> +               (long)pid);
> +      return;
> +    }
> +
> +  if (!tool_details->parse_pmu_output)
> +    return;
> +
> +  /* Since we are going to parse the output, we also need symbolizer.  */
> +  if (tool_details->start_symbolizer)
> +    tool_details->start_symbolizer (getpid ());
> +
> +  if (!tool_details->parse_pmu_output
> +      (the_pmu_tool_info->raw_pmu_profile_filename,
> +       the_pmu_tool_info->pmu_data))
> +    {
> +      if (!gcda_error && tool_details->gcov_write_pmu_data)
> +        /* Write tool output into the gcda file.  */
> +        tool_details->gcov_write_pmu_data (the_pmu_tool_info->pmu_data);
> +    }
> +
> +  if (tool_details->end_symbolizer)
> +    tool_details->end_symbolizer ();
> +
> +  if (tool_details->cleanup_pmu_data)
> +    tool_details->cleanup_pmu_data (the_pmu_tool_info->pmu_data);
> +}
> +
> +#endif
> Index: gcc/coverage.c
> ===================================================================
> --- gcc/coverage.c      (revision 175226)
> +++ gcc/coverage.c      (working copy)
> @@ -62,6 +62,9 @@
>  #include "dbgcnt.h"
>  #include "input.h"
>
> +/* Defined in tree-profile.c.  */
> +void gimple_init_instrumentation_sampling (void);
> +
>  struct function_list
>  {
>   struct function_list *next;   /* next function */
> @@ -120,6 +123,9 @@
>  static char *da_base_file_name;
>  static char *main_input_file_name;
>
> +/* Filename for the global pmu profile */
> +static char pmu_profile_filename[] = "pmuprofile";
> +
>  /* Hash table of count data.  */
>  static htab_t counts_hash = NULL;
>
> @@ -146,6 +152,16 @@
>  /* True if the current module has any asm statements.  */
>  static bool has_asm_statement;
>
> +/* extern const char * __gcov_pmu_profile_filename */
> +static tree gcov_pmu_filename_decl = NULL_TREE;
> +/* extern const char * __gcov_pmu_profile_options */
> +static tree gcov_pmu_options_decl = NULL_TREE;
> +/* extern gcov_unsigned_t  __gcov_pmu_top_n_address */
> +static tree gcov_pmu_top_n_address_decl = NULL_TREE;
> +
> +/* To ensure that the above variables are initialized only once.  */
> +static int pmu_profiling_initialized = 0;
> +
>  /* Forward declarations.  */
>  static hashval_t htab_counts_entry_hash (const void *);
>  static int htab_counts_entry_eq (const void *, const void *);
> @@ -157,7 +173,8 @@
>  static tree build_ctr_info_value (unsigned, tree);
>  static tree build_gcov_info (void);
>  static void create_coverage (void);
> -static char * get_da_file_name (const char *);
> +static void init_pmu_profiling (void);
> +static bool profiling_enabled_p (void);
>
>  /* Return the type node for gcov_type.  */
>
> @@ -175,6 +192,15 @@
>   return lang_hooks.types.type_for_size (32, true);
>  }
>
> +/* Return the type node for const char *.  */
> +
> +static tree
> +get_const_string_type (void)
> +{
> +  return build_pointer_type
> +    (build_qualified_type (char_type_node, TYPE_QUAL_CONST));
> +}
> +
>  static hashval_t
>  htab_counts_entry_hash (const void *of)
>  {
> @@ -1688,7 +1714,7 @@
>
>   no_coverage = 1; /* Disable any further coverage.  */
>
> -  if (!prg_ctr_mask)
> +  if (!prg_ctr_mask && !flag_pmu_profile_generate)
>     return;
>
>   t = build_gcov_info ();
> @@ -1725,7 +1751,7 @@
>
>  /* Get the da file name, given base file name.  */
>
> -static char *
> +char *
>  get_da_file_name (const char *base_file_name)
>  {
>   char *da_file_name;
> @@ -1910,8 +1936,122 @@
>        read_counts_file (get_da_file_name (module_infos[i]->da_filename),
>                          module_infos[i]->ident);
>     }
> +
> +  /* Define variables which are referenced at runtime by libgcov.  */
> +  if (profiling_enabled_p ())
> +  {
> +    init_pmu_profiling ();
> +    gimple_init_instrumentation_sampling ();
> +  }
>  }
>
> +/* Return True if any type of profiling is enabled which requires linking
> +   in libgcov, otherwise return False.  */
> +
> +static bool
> +profiling_enabled_p (void)
> +{
> +  return flag_pmu_profile_generate || profile_arc_flag ||
> +      flag_profile_generate_sampling || flag_test_coverage ||
> +      flag_branch_probabilities || flag_profile_reusedist;
> +}
> +
> +/* Construct variables for PMU profiling.
> +   1) __gcov_pmu_profile_filename,
> +   2) __gcov_pmu_profile_options,
> +   3) __gcov_pmu_top_n_address.  */
> +
> +static void
> +init_pmu_profiling (void)
> +{
> +  if (!pmu_profiling_initialized)
> +    {
> +      unsigned top_n_addr = PARAM_VALUE (PARAM_PMU_PROFILE_N_ADDRESS);
> +      tree filename_ptr, options_ptr;
> +
> +      /* Construct an initializer for __gcov_pmu_profile_filename.  */
> +      gcov_pmu_filename_decl =
> +        build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +                    get_identifier ("__gcov_pmu_profile_filename"),
> +                    get_const_string_type ());
> +      TREE_PUBLIC (gcov_pmu_filename_decl) = 1;
> +      DECL_ARTIFICIAL (gcov_pmu_filename_decl) = 1;
> +      make_decl_one_only (gcov_pmu_filename_decl,
> +                          DECL_ASSEMBLER_NAME (gcov_pmu_filename_decl));
> +      TREE_STATIC (gcov_pmu_filename_decl) = 1;
> +
> +      if (flag_pmu_profile_generate)
> +        {
> +          const char *filename = get_da_file_name (pmu_profile_filename);
> +          int file_name_len;
> +          tree filename_string;
> +          file_name_len = strlen (filename);
> +          filename_string = build_string (file_name_len + 1, filename);
> +          TREE_TYPE (filename_string) = build_array_type
> +            (char_type_node, build_index_type
> +             (build_int_cst (NULL_TREE, file_name_len)));
> +          filename_ptr = build1 (ADDR_EXPR, get_const_string_type (),
> +                                 filename_string);
> +        }
> +      else
> +        filename_ptr = null_pointer_node;
> +
> +      DECL_INITIAL (gcov_pmu_filename_decl) = filename_ptr;
> +      assemble_variable (gcov_pmu_filename_decl, 0, 0, 0);
> +
> +      /* Construct an initializer for __gcov_pmu_profile_options.  */
> +      gcov_pmu_options_decl =
> +        build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +                    get_identifier ("__gcov_pmu_profile_options"),
> +                    get_const_string_type ());
> +      TREE_PUBLIC (gcov_pmu_options_decl) = 1;
> +      DECL_ARTIFICIAL (gcov_pmu_options_decl) = 1;
> +      make_decl_one_only (gcov_pmu_options_decl,
> +                          DECL_ASSEMBLER_NAME (gcov_pmu_options_decl));
> +      TREE_STATIC (gcov_pmu_options_decl) = 1;
> +
> +      /* If the flag is false we generate a null pointer to indicate
> +         that we are not doing the pmu profiling.  */
> +      if (flag_pmu_profile_generate)
> +        {
> +          const char *pmu_options = flag_pmu_profile_generate;
> +          int pmu_options_len;
> +          tree pmu_options_string;
> +
> +          pmu_options_len = strlen (pmu_options);
> +          pmu_options_string = build_string (pmu_options_len + 1, pmu_options);
> +          TREE_TYPE (pmu_options_string) = build_array_type
> +            (char_type_node, build_index_type (build_int_cst
> +                                               (NULL_TREE, pmu_options_len)));
> +          options_ptr = build1 (ADDR_EXPR, get_const_string_type (),
> +                                pmu_options_string);
> +        }
> +      else
> +        options_ptr = null_pointer_node;
> +
> +      DECL_INITIAL (gcov_pmu_options_decl) = options_ptr;
> +      assemble_variable (gcov_pmu_options_decl, 0, 0, 0);
> +
> +      /* Construct an initializer for __gcov_pmu_top_n_address.  We
> +         don't need to guard this with flag_pmu_profile_generate
> +         because the value of __gcov_pmu_top_n_address is ignored when
> +         not doing profiling.  */
> +      gcov_pmu_top_n_address_decl =
> +        build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +                    get_identifier ("__gcov_pmu_top_n_address"),
> +                    get_gcov_unsigned_t ());
> +      TREE_PUBLIC (gcov_pmu_top_n_address_decl) = 1;
> +      DECL_ARTIFICIAL (gcov_pmu_top_n_address_decl) = 1;
> +      make_decl_one_only (gcov_pmu_top_n_address_decl,
> +                          DECL_ASSEMBLER_NAME (gcov_pmu_top_n_address_decl));
> +      TREE_STATIC (gcov_pmu_top_n_address_decl) = 1;
> +      DECL_INITIAL (gcov_pmu_top_n_address_decl) =
> +        build_int_cstu (get_gcov_unsigned_t (), top_n_addr);
> +      assemble_variable (gcov_pmu_top_n_address_decl, 0, 0, 0);
> +    }
> +  pmu_profiling_initialized = 1;
> +}
> +
>  /* Performs file-level cleanup.  Close graph file, generate coverage
>    variables and constructor.  */
>
> @@ -1989,4 +2129,19 @@
>   has_asm_statement = flag_ripa_disallow_asm_modules;
>  }
>
> +/* Check the command line OPTIONS passed to
> +   -fpmu-profile-generate. Return 0 if the options are valid, non-zero
> +   otherwise.  */
> +
> +int
> +check_pmu_profile_options (const char *options)
> +{
> +  if (strcmp (options, "load-latency") &&
> +      strcmp (options, "load-latency-verbose") &&
> +      strcmp (options, "branch-mispredict") &&
> +      strcmp (options, "branch-mispredict-verbose"))
> +    return 1;
> +  return 0;
> +}
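The check above accepts exactly four option strings. If more events are added later, a table-driven variant may be easier to extend. A minimal sketch, with the hypothetical names `valid_pmu_options` and `check_pmu_options_sketch`, keeping the same return convention (0 for valid, non-zero otherwise):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The option strings accepted by the quoted check.  */
static const char *const valid_pmu_options[] = {
  "load-latency",
  "load-latency-verbose",
  "branch-mispredict",
  "branch-mispredict-verbose",
};

/* Hypothetical table-driven variant: return 0 if OPTIONS is one of the
   known strings, non-zero otherwise.  */
static int
check_pmu_options_sketch (const char *options)
{
  size_t i;
  size_t n = sizeof valid_pmu_options / sizeof valid_pmu_options[0];

  assert (options != NULL);
  for (i = 0; i < n; i++)
    if (strcmp (options, valid_pmu_options[i]) == 0)
      return 0;
  return 1;
}
```

The same table could also feed the option's help text so the accepted values are listed in one place.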
> +
>  #include "gt-coverage.h"
> Index: gcc/coverage.h
> ===================================================================
> --- gcc/coverage.h      (revision 175226)
> +++ gcc/coverage.h      (working copy)
> @@ -77,4 +77,13 @@
>  /* Mark this module as containing asm statements.  */
>  extern void coverage_has_asm_stmt (void);
>
> +/* Get the da file name, given base file name.  */
> +extern char * get_da_file_name (const char *base_file_name);
> +
> +/* Check if the specified options are valid for pmu profiling.  */
> +extern int check_pmu_profile_options (const char *options);
> +
> +/* Defined in tree-profile.c.  */
> +extern void gimple_init_instrumentation_sampling (void);
> +
>  #endif
> Index: gcc/common.opt
> ===================================================================
> --- gcc/common.opt      (revision 175226)
> +++ gcc/common.opt      (working copy)
> @@ -1606,6 +1606,14 @@
>  Common Joined RejectNegative Var(common_deferred_options) Defer
>  -fplugin-arg-<name>-<key>[=<value>]    Specify argument <key>=<value> for plugin <name>
>
> +fpmu-profile-generate=
> +Common Joined RejectNegative Var(flag_pmu_profile_generate)
> +-fpmu-profile-generate=[load-latency]  Generate pmu profile for cache misses. Currently only pfmon based load latency profiling is supported on Intel/PEBS and AMD/IBS platforms.
> +
> +fpmu-profile-use=
> +Common Joined RejectNegative Var(flag_pmu_profile_use)
> +-fpmu-profile-use=[load-latency]  Use pmu profile data while optimizing.  Currently only perfmon based load latency profiling is supported on Intel/PEBS and AMD/IBS platforms.
> +
>  fpredictive-commoning
>  Common Report Var(flag_predictive_commoning) Optimization
>  Run predictive commoning optimization.
> Index: gcc/tree-profile.c
> ===================================================================
> --- gcc/tree-profile.c  (revision 175226)
> +++ gcc/tree-profile.c  (working copy)
> @@ -168,6 +168,9 @@
>  /* extern gcov_unsigned_t __gcov_sampling_rate  */
>  static tree gcov_sampling_rate_decl = NULL_TREE;
>
> +/* forward declaration.  */
> +void gimple_init_instrumentation_sampling (void);
> +
>  /* Insert STMT_IF around given sequence of consecutive statements in the
>    same basic block starting with STMT_START, ending with STMT_END.  */
>
> @@ -287,7 +290,7 @@
>     }
>  }
>
> -static void
> +void
>  gimple_init_instrumentation_sampling (void)
>  {
>   if (!gcov_sampling_rate_decl)
> @@ -341,8 +344,6 @@
>   tree dc_profiler_fn_type;
>   tree average_profiler_fn_type;
>
> -  gimple_init_instrumentation_sampling ();
> -
>   if (!gcov_type_node)
>     {
>       char name_buf[32];
> Index: gcc/libgcov.c
> ===================================================================
> --- gcc/libgcov.c       (revision 175226)
> +++ gcc/libgcov.c       (working copy)
> @@ -124,9 +124,15 @@
>  }
>
>  #ifndef __GCOV_KERNEL__
> +/* Emitted in coverage.c.  */
> +extern char * __gcov_pmu_profile_filename;
> +extern char * __gcov_pmu_profile_options;
> +extern gcov_unsigned_t __gcov_pmu_top_n_address;
> +
>  /* Sampling rate.  */
>  extern gcov_unsigned_t __gcov_sampling_rate;
>  static int gcov_sampling_rate_initialized = 0;
> +void __gcov_set_sampling_rate (unsigned int rate);
>
>  /* Set sampling rate to RATE.  */
>
> @@ -344,7 +350,7 @@
>   /* Update complete filename with stripped original. */
>   if (prefix_length != 0 && !IS_DIR_SEPARATOR (*filename))
>     {
> -      /* If prefix is given, add diretory separator.  */
> +      /* If prefix is given, add directory separator.  */
>       strcpy (gi_filename_up, "/");
>       strcpy (gi_filename_up + 1, filename);
>     }
> @@ -352,6 +358,88 @@
>     strcpy (gi_filename_up, filename);
>  }
>
> +/* This function allocates the space to store current file name.  */
> +
> +static void
> +gcov_alloc_filename (void)
> +{
> +  /* Get file name relocation prefix.  Non-absolute values are ignored.  */
> +  char *gcov_prefix = 0;
> +
> +  prefix_length = 0;
> +  gcov_prefix_strip = 0;
> +
> +  {
> +    /* Check if the level of dirs to strip off is specified. */
> +    char *tmp = getenv ("GCOV_PREFIX_STRIP");
> +    if (tmp)
> +      {
> +        gcov_prefix_strip = atoi (tmp);
> +        /* Do not consider negative values. */
> +        if (gcov_prefix_strip < 0)
> +          gcov_prefix_strip = 0;
> +      }
> +  }
> +  /* Get file name relocation prefix.  Non-absolute values are ignored. */
> +  gcov_prefix = getenv ("GCOV_PREFIX");
> +  if (gcov_prefix)
> +    {
> +      prefix_length = strlen(gcov_prefix);
> +
> +      /* Remove an unnecessary trailing '/' */
> +      if (IS_DIR_SEPARATOR (gcov_prefix[prefix_length - 1]))
> +        prefix_length--;
> +    }
> +  else
> +    prefix_length = 0;
> +
> +  /* If no prefix was specified and a prefix strip, then we assume
> +     relative.  */
> +  if (gcov_prefix_strip != 0 && prefix_length == 0)
> +    {
> +      gcov_prefix = ".";
> +      prefix_length = 1;
> +    }
> +
> +  /* Allocate and initialize the filename scratch space.  */
> +  gi_filename = (char *) malloc (prefix_length + gcov_max_filename + 2);
> +  if (prefix_length)
> +    memcpy (gi_filename, gcov_prefix, prefix_length);
> +
> +  gi_filename_up = gi_filename + prefix_length;
> +}
> +
> +/* Stop the pmu profiler and dump pmu profile info into the global file.  */
> +
> +static void
> +pmu_profile_stop (void)
> +{
> +  const char *pmu_profile_filename =  __gcov_pmu_profile_filename;
> +  const char *pmu_options = __gcov_pmu_profile_options;
> +  size_t filename_length;
> +  int gcda_error;
> +
> +  if (!pmu_profile_filename || !pmu_options)
> +    return;
> +
> +  __gcov_stop_pmu_profiler ();
> +
> +  filename_length = strlen (pmu_profile_filename);
> +  if (filename_length > gcov_max_filename)
> +    gcov_max_filename = filename_length;
> +  /* Allocate and initialize the filename scratch space.  */
> +  gcov_alloc_filename ();
> +  GCOV_GET_FILENAME (prefix_length, gcov_prefix_strip, pmu_profile_filename,
> +                     gi_filename_up);
> +  /* Open the gcda file for writing. We don't support merge yet. */
> +  gcda_error = gcov_open_by_filename (gi_filename);
> +  __gcov_end_pmu_profiler (gcda_error);
> +  if ((gcda_error = gcov_close ()))
> +    gcov_error (gcda_error  < 0 ?  "pmu_profile_stop:%s:Overflow writing\n" :
> +                "pmu_profile_stop:%s:Error writing\n",
> +                gi_filename);
> +}
> +
>  /* Sort N entries in VALUE_ARRAY in descending order.
>    Each entry in VALUE_ARRAY has two values. The sorting
>    is based on the second value.  */
> @@ -438,56 +526,7 @@
>     }
>  }
>
> -/* This function allocates the space to store current file name.  */
> -
>  static void
> -gcov_alloc_filename (void)
> -{
> -  /* Get file name relocation prefix.  Non-absolute values are ignored.  */
> -  char *gcov_prefix = 0;
> -
> -  prefix_length = 0;
> -  gcov_prefix_strip = 0;
> -
> -  {
> -    /* Check if the level of dirs to strip off specified. */
> -    char *tmp = getenv ("GCOV_PREFIX_STRIP");
> -    if (tmp)
> -      {
> -        gcov_prefix_strip = atoi (tmp);
> -        /* Do not consider negative values. */
> -        if (gcov_prefix_strip < 0)
> -          gcov_prefix_strip = 0;
> -      }
> -  }
> -  /* Get file name relocation prefix.  Non-absolute values are ignored. */
> -  gcov_prefix = getenv ("GCOV_PREFIX");
> -  if (gcov_prefix)
> -    {
> -      prefix_length = strlen(gcov_prefix);
> -
> -      /* Remove an unnecessary trailing '/' */
> -      if (IS_DIR_SEPARATOR (gcov_prefix[prefix_length - 1]))
> -        prefix_length--;
> -    }
> -  else
> -    prefix_length = 0;
> -
> -  /* If no prefix was specified and a prefix stip, then we assume
> -     relative.  */
> -  if (gcov_prefix_strip != 0 && prefix_length == 0)
> -    {
> -      gcov_prefix = ".";
> -      prefix_length = 1;
> -    }
> -
> -  /* Aelocate and initialize the filename scratch space.  */
> -  gi_filename = (char *) malloc (prefix_length + gcov_max_filename + 2);
> -  if (prefix_length)
> -    memcpy (gi_filename, gcov_prefix, prefix_length);
> -}
> -
> -static void
>  gcov_dump_module_info (void)
>  {
>   struct gcov_info *gi_ptr;
> @@ -499,8 +538,8 @@
>   {
>     int error;
>
> -    gcov_strip_leading_dirs (prefix_length, gcov_prefix_strip,
> -                             gi_ptr->filename, gi_filename_up);
> +    GCOV_GET_FILENAME (prefix_length, gcov_prefix_strip, gi_ptr->filename,
> +                       gi_filename_up);
>     error = gcov_open_by_filename (gi_filename);
>     if (error != 0)
>       continue;
> @@ -534,9 +573,11 @@
>   struct gcov_info *gi_ptr;
>   int dump_module_info;
>
> +  /* Stop and write the PMU profile data into the global file.  */
> +  pmu_profile_stop ();
> +
>   dump_module_info = gcov_exit_init ();
>
> -
>   for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next)
>     gcov_dump_one_gcov (gi_ptr);
>
> @@ -572,11 +613,25 @@
>       const char *ptr = info->filename;
>       gcov_unsigned_t crc32 = gcov_crc32;
>       size_t filename_length = strlen (info->filename);
> +      struct gcov_pmu_info pmu_info;
>
>       /* Refresh the longest file name information.  */
>       if (filename_length > gcov_max_filename)
>         gcov_max_filename = filename_length;
>
> +      /* Initialize the pmu profiler.  */
> +      pmu_info.pmu_profile_filename = __gcov_pmu_profile_filename;
> +      pmu_info.pmu_tool = __gcov_pmu_profile_options;
> +      pmu_info.pmu_top_n_address = __gcov_pmu_top_n_address;
> +      __gcov_init_pmu_profiler (&pmu_info);
> +      if (pmu_info.pmu_profile_filename)
> +        {
> +          /* Refresh the longest file name information.  */
> +          filename_length = strlen (pmu_info.pmu_profile_filename);
> +          if (filename_length > gcov_max_filename)
> +            gcov_max_filename = filename_length;
> +        }
> +
>       /* Assign the module ID (starting at 1).  */
>       info->mod_info->ident = (++gcov_cur_module_id);
>       gcc_assert (EXTRACT_MODULE_ID_FROM_GLOBAL_ID (GEN_FUNC_GLOBAL_ID (
> @@ -601,7 +656,11 @@
>       gcov_crc32 = crc32;
>
>       if (!__gcov_list)
> -        atexit (gcov_exit);
> +        {
> +          atexit (gcov_exit);
> +          /* Start pmu profiler. */
> +          __gcov_start_pmu_profiler ();
> +        }
>
>       info->next = __gcov_list;
>       __gcov_list = info;
> @@ -618,6 +677,7 @@
>  {
>   const struct gcov_info *gi_ptr;
>
> +  __gcov_stop_pmu_profiler ();
>   gcov_exit ();
>   for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next)
>     {
> @@ -631,6 +691,7 @@
>            ci_ptr++;
>          }
>     }
> +  __gcov_start_pmu_profiler ();
>  }
>
>  #else /* __GCOV_KERNEL__ */
> @@ -640,8 +701,8 @@
>  /* Copy the filename to the buffer.  */
>
>  static inline void
> -gcov_get_filename (int prefix_length __attribute__ ((unused)),
> -                   int gcov_prefix_strip __attribute__ ((unused)),
> +gcov_get_filename (int prefix_length __attribute__ ((unused)),
> +                   int gcov_prefix_strip __attribute__ ((unused)),
>                    const char *filename, char *gi_filename_up)
>  {
>     strcpy (gi_filename_up, filename);
> @@ -1090,7 +1151,6 @@
>     }
>
>   gcov_alloc_filename ();
> -  gi_filename_up = gi_filename + prefix_length;
>
>   return dump_module_info;
>  }
> Index: gcc/params.def
> ===================================================================
> --- gcc/params.def      (revision 175226)
> +++ gcc/params.def      (working copy)
> @@ -1011,6 +1011,11 @@
>           ".note.callgraph.text section",
>          0, 0, 0)
>
> +DEFPARAM (PARAM_PMU_PROFILE_N_ADDRESS,
> +         "pmu_profile_n_addresses",
> +         "While doing PMU profiling symbolize this many top addresses.",
> +         50, 1, 10000)
> +
>  /*
>  Local variables:
>  mode:c
> Index: gcc/gcov-dump.c
> ===================================================================
> --- gcc/gcov-dump.c     (revision 175226)
> +++ gcc/gcov-dump.c     (working copy)
> @@ -39,6 +39,10 @@
>  static void tag_counters (const char *, unsigned, unsigned);
>  static void tag_summary (const char *, unsigned, unsigned);
>  static void tag_module_info (const char *, unsigned, unsigned);
> +static void tag_pmu_load_latency_info (const char *, unsigned, unsigned);
> +static void tag_pmu_branch_mispredict_info (const char *, unsigned, unsigned);
> +static void tag_pmu_tool_header (const char *, unsigned, unsigned);
> +
>  extern int main (int, char **);
>
>  typedef struct tag_format
> @@ -73,6 +77,11 @@
>   {GCOV_TAG_OBJECT_SUMMARY, "OBJECT_SUMMARY", tag_summary},
>   {GCOV_TAG_PROGRAM_SUMMARY, "PROGRAM_SUMMARY", tag_summary},
>   {GCOV_TAG_MODULE_INFO, "MODULE INFO", tag_module_info},
> +  {GCOV_TAG_PMU_LOAD_LATENCY_INFO, "PMU_LOAD_LATENCY_INFO",
> +   tag_pmu_load_latency_info},
> +  {GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO, "PMU_BRANCH_MISPREDICT_INFO",
> +   tag_pmu_branch_mispredict_info},
> +  {GCOV_TAG_PMU_TOOL_HEADER, "PMU_TOOL_HEADER", tag_pmu_tool_header},
>   {0, NULL, NULL}
>  };
>
> @@ -519,3 +528,43 @@
>       printf (": %s [%s]", mod_info->source_filename, suffix);
>     }
>  }
> +
> +/* Read gcov tag GCOV_TAG_PMU_LOAD_LATENCY_INFO from the gcda file and
> +  print the contents in a human readable form.  */
> +
> +static void
> +tag_pmu_load_latency_info (const char *filename ATTRIBUTE_UNUSED,
> +                           unsigned tag ATTRIBUTE_UNUSED, unsigned length)
> +{
> +  gcov_pmu_ll_info_t ll_info;
> +  gcov_read_pmu_load_latency_info (&ll_info, length);
> +  print_load_latency_line (stdout, &ll_info, no_newline);
> +  free (ll_info.filename);
> +}
> +
> +/* Read gcov tag GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO from the gcda
> +  file and print the contents in a human readable form.  */
> +
> +static void
> +tag_pmu_branch_mispredict_info (const char *filename ATTRIBUTE_UNUSED,
> +                                unsigned tag ATTRIBUTE_UNUSED, unsigned length)
> +{
> +  gcov_pmu_brm_info_t brm_info;
> +  gcov_read_pmu_branch_mispredict_info (&brm_info, length);
> +  print_branch_mispredict_line (stdout, &brm_info, no_newline);
> +  free (brm_info.filename);
> +}
> +
> +
> +/* Read gcov tag GCOV_TAG_PMU_TOOL_HEADER from the gcda file and print
> +   the contents in a human readable form.  */
> +
> +static void
> +tag_pmu_tool_header (const char *filename ATTRIBUTE_UNUSED,
> +                     unsigned tag ATTRIBUTE_UNUSED, unsigned length)
> +{
> +  gcov_pmu_tool_header_t tool_header;
> +  gcov_read_pmu_tool_header (&tool_header, length);
> +  print_pmu_tool_header (stdout, &tool_header, no_newline);
> +  destroy_pmu_tool_header (&tool_header);
> +}
>
> --
> This patch is available for review at http://codereview.appspot.com/4638047
>
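
For readers following gcov_alloc_filename in the quoted libgcov.c changes above, here is a minimal, standalone sketch of how a GCOV_PREFIX value and a GCOV_PREFIX_STRIP count combine into an output path. The helper names (skip_leading_dirs, build_profile_path) are hypothetical and not part of libgcov. The rule that the final path component is never stripped is an assumption, since gcov_strip_leading_dirs itself is not shown in this patch. Unlike the patch, the sketch also guards against an empty prefix before looking at its last character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of libgcov: skip up to STRIP leading
   directory components of an absolute FILENAME and return a pointer to
   the remaining path, which still begins with '/'.  */
static const char *
skip_leading_dirs (const char *filename, int strip)
{
  const char *p = filename;

  while (strip > 0 && *p == '/')
    {
      const char *next = strchr (p + 1, '/');
      if (!next)
        break;                  /* Assumption: keep the final component.  */
      p = next;
      strip--;
    }
  return p;
}

/* Build PREFIX + stripped FILENAME into OUT (SIZE bytes).  PREFIX may be
   NULL.  Returns 0 on success, -1 if the result does not fit.  */
static int
build_profile_path (const char *prefix, int strip, const char *filename,
                    char *out, size_t size)
{
  size_t prefix_length = prefix ? strlen (prefix) : 0;
  const char *rest;
  int n;

  assert (filename && out && size > 0);

  /* Drop one trailing '/'; the length check avoids reading prefix[-1]
     when the prefix is an empty string.  */
  if (prefix_length > 0 && prefix[prefix_length - 1] == '/')
    prefix_length--;

  /* A strip count without a prefix is taken relative to ".".  */
  if (strip > 0 && prefix_length == 0)
    {
      prefix = ".";
      prefix_length = 1;
    }

  rest = skip_leading_dirs (filename, strip < 0 ? 0 : strip);

  /* Add a separator only when a prefix exists and REST lacks one.  */
  n = snprintf (out, size, "%.*s%s%s", (int) prefix_length,
                prefix ? prefix : "",
                (prefix_length && *rest != '/') ? "/" : "", rest);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

With a prefix of "/tmp/prof/" and a strip count of 2, the path "/home/user/build/foo.gcda" becomes "/tmp/prof/build/foo.gcda" under these assumptions.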

Patch

Index: libgcc/Makefile.in
===================================================================
--- libgcc/Makefile.in	(revision 175226)
+++ libgcc/Makefile.in	(working copy)
@@ -747,10 +747,13 @@ 
 dyn-ipa.o: %$(objext): $(gcc_srcdir)/libgcov.c
 	$(gcc_compile)  -c $(gcc_srcdir)/dyn-ipa.c
 
+pmu-profile.o: %$(objext): $(gcc_srcdir)/libgcov.c
+	$(gcc_compile)  -c $(gcc_srcdir)/pmu-profile.c
 
+
 # Static libraries.
 libgcc.a: $(libgcc-objects)
-libgcov.a: $(libgcov-objects) dyn-ipa$(objext)
+libgcov.a: $(libgcov-objects) dyn-ipa$(objext) pmu-profile$(objext)
 libunwind.a: $(libunwind-objects)
 libgcc_eh.a: $(libgcc-eh-objects)
 
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 175226)
+++ gcc/doc/invoke.texi	(working copy)
@@ -388,6 +388,8 @@ 
 -fprofile-correction -fprofile-dir=@var{path} -fprofile-generate @gol
 -fprofile-generate=@var{path} -fprofile-generate-sampling @gol
 -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
+-fpmu-profile-generate=@var{pmuoption} @gol
+-fpmu-profile-use=@var{pmuoption} @gol
 -freciprocal-math -fregmove -frename-registers -freorder-blocks @gol
 -freorder-blocks-and-partition -freorder-functions @gol
 -frerun-cse-after-loop -freschedule-modulo-scheduled-loops @gol
@@ -8088,6 +8090,26 @@ 
 If @var{path} is specified, GCC will look at the @var{path} to find
 the profile feedback data files. See @option{-fprofile-dir}.
 
+@item -fpmu-profile-generate=@var{pmuoption}
+@opindex fpmu-profile-generate
+
+Enable performance monitoring unit (PMU) profiling.  This collects
+hardware counter data corresponding to @var{pmuoption}.  Currently
+only @var{load-latency} and @var{branch-mispredict} are supported
+using the pfmon tool.  You must use @option{-fpmu-profile-generate} both
+when compiling and when linking your program.  This PMU profile data
+may later be used by the compiler during optimizations and can also be
+displayed using the coverage tool gcov.  The params variable
+"pmu_profile_n_addresses" can be used to restrict PMU data collection
+to only this many addresses.
+
+@item -fpmu-profile-use=@var{pmuoption}
+@opindex fpmu-profile-use
+
+Enable performance monitoring unit (PMU) profiling based
+optimizations.  Currently only @var{load-latency} and
+@var{branch-mispredict} are supported.
+
 @item -fripa
 @opindex fripa
 Perform dynamic inter-procedural analysis. This is used in conjunction with
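
The documentation above names two accepted values for pmuoption. As a rough illustration only, the following standalone sketch maps that text to an event kind and rejects anything else. The enum and function names are hypothetical; the patch's own parsing lives in parse_pmu_profile_options in pmu-profile.c, which is not reproduced here and may accept a richer syntax.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical event kinds; the patch's own enum is in pmu-profile.c.  */
enum pmu_event_kind
{
  PMU_EVENT_NONE = 0,
  PMU_EVENT_LOAD_LATENCY,
  PMU_EVENT_BRANCH_MISPREDICT
};

/* Map the PMUOPTION text to an event kind.  PMU_EVENT_NONE means the
   value is not one of the two documented options.  */
static enum pmu_event_kind
pmu_event_from_option (const char *pmuoption)
{
  assert (pmuoption);
  if (strcmp (pmuoption, "load-latency") == 0)
    return PMU_EVENT_LOAD_LATENCY;
  if (strcmp (pmuoption, "branch-mispredict") == 0)
    return PMU_EVENT_BRANCH_MISPREDICT;
  return PMU_EVENT_NONE;
}
```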
Index: gcc/doc/gcov.texi
===================================================================
--- gcc/doc/gcov.texi	(revision 175226)
+++ gcc/doc/gcov.texi	(working copy)
@@ -124,9 +124,11 @@ 
      [@option{-a}|@option{--all-blocks}]
      [@option{-b}|@option{--branch-probabilities}]
      [@option{-c}|@option{--branch-counts}]
+     [@option{-m}|@option{--pmu-profile}]
      [@option{-n}|@option{--no-output}]
      [@option{-l}|@option{--long-file-names}]
      [@option{-p}|@option{--preserve-paths}]
+     [@option{-q}|@option{--pmu_profile-path}]
      [@option{-f}|@option{--function-summaries}]
      [@option{-o}|@option{--object-directory} @var{directory|file}] @var{sourcefiles}
      [@option{-u}|@option{--unconditional-branches}]
@@ -169,6 +171,14 @@ 
 Write branch frequencies as the number of branches taken, rather than
 the percentage of branches taken.
 
+@item -m
+@itemx --pmu-profile
+Output the additional PMU profile information if available.
+
+@item -q
+@itemx --pmu_profile-path
+Read the PMU profile from the given path (default @file{pmuprofile.gcda}).
+
 @item -n
 @itemx --no-output
 Do not create the @command{gcov} output file.
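
The -m and -q options documented above are handled through getopt_long in gcov.c. The sketch below is a reduced, standalone parser for just these two options, written to show how the short-option string and the long-option table must agree. The struct and function names are hypothetical; only the option names and the default path come from the patch.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option set mirroring only the two new gcov options.  */
struct pmu_cli
{
  int pmu_profile;              /* Set by -m / --pmu-profile.  */
  const char *profile_path;     /* Set by -q / --pmu_profile-path.  */
};

static const struct option pmu_long_options[] =
{
  { "pmu-profile",      no_argument,       NULL, 'm' },
  { "pmu_profile-path", required_argument, NULL, 'q' },
  { 0, 0, 0, 0 }
};

/* Parse ARGV into CLI.  Returns 0 on success, -1 on an unknown option
   or a missing argument.  */
static int
parse_pmu_cli (int argc, char **argv, struct pmu_cli *cli)
{
  int opt;

  assert (cli);
  cli->pmu_profile = 0;
  cli->profile_path = "pmuprofile.gcda";   /* Documented default.  */

  optind = 1;                   /* Allow repeated calls.  */
  opterr = 0;                   /* Report errors via the return value.  */

  /* "m" takes no argument; "q:" requires one.  The short form of a
     letter missing from this string is rejected, although its long
     form still works.  */
  while ((opt = getopt_long (argc, argv, "mq:", pmu_long_options, NULL)) != -1)
    switch (opt)
      {
      case 'm':
        cli->pmu_profile = 1;
        break;
      case 'q':
        cli->profile_path = optarg;
        break;
      default:
        return -1;
      }
  return 0;
}
```

A call with the arguments "-m -q run.gcda foo.c" sets pmu_profile to 1 and profile_path to "run.gcda", leaving foo.c as a non-option argument.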
Index: gcc/gcc.c
===================================================================
--- gcc/gcc.c	(revision 175226)
+++ gcc/gcc.c	(working copy)
@@ -662,7 +662,7 @@ 
     %{static:} %{L*} %(mfwrap) %(link_libgcc) %o\
     %{fopenmp|ftree-parallelize-loops=*:%:include(libgomp.spec)%(link_gomp)}\
     %(mflib) " STACK_SPLIT_SPEC "\
-    %{fprofile-arcs|fprofile-generate*|coverage:-lgcov}\
+    %{fprofile-arcs|fprofile-generate*|fpmu-profile-generate*|coverage:-lgcov}\
     %{!nostdlib:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}\
     %{!nostdlib:%{!nostartfiles:%E}} %{T*} }}}}}}"
 #endif
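
The gcov.c changes that follow keep per-source PMU entries in pointer arrays that start at 64 slots and are meant to double when full (see filter_pmu_data_lines). As a minimal sketch of that growth pattern, assuming plain realloc instead of libiberty's xrealloc and a hypothetical ptr_vec type, the two details to get right are that the allocation size is given in bytes and that the new capacity is recorded after each growth.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of borrowed pointers.  */
struct ptr_vec
{
  void **items;
  size_t count;                 /* Slots in use.  */
  size_t capacity;              /* Slots allocated.  */
};

/* Append ITEM to VEC, doubling the capacity when full.  Returns 0 on
   success, -1 if memory cannot be allocated (VEC is left unchanged).  */
static int
ptr_vec_push (struct ptr_vec *vec, void *item)
{
  assert (vec);
  if (vec->count == vec->capacity)
    {
      size_t new_capacity = vec->capacity ? 2 * vec->capacity : 64;
      /* realloc takes a size in bytes, so scale by the element size.  */
      void **grown = realloc (vec->items, new_capacity * sizeof *grown);
      if (!grown)
        return -1;
      vec->items = grown;
      vec->capacity = new_capacity;   /* Record the new capacity.  */
    }
  vec->items[vec->count++] = item;
  return 0;
}

/* Release the array itself.  The pointed-to items are not owned.  */
static void
ptr_vec_free (struct ptr_vec *vec)
{
  free (vec->items);
  vec->items = NULL;
  vec->count = vec->capacity = 0;
}
```

Doubling keeps the amortized cost of an append constant, which is why the starting size of 64 matters little in practice.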
Index: gcc/gcov.c
===================================================================
--- gcc/gcov.c	(revision 175226)
+++ gcc/gcov.c	(working copy)
@@ -209,6 +209,15 @@ 
   char *name;
 } coverage_t;
 
+/* Describes PMU profile data for either one source file or for the
+   entire program.  */
+
+typedef struct pmu_data
+{
+  ll_infos_t ll_infos;
+  brm_infos_t brm_infos;
+} pmu_data_t;
+
 /* Describes a single line of source. Contains a chain of basic blocks
    with code on it.  */
 
@@ -242,6 +251,8 @@ 
 
   coverage_t coverage;
 
+  pmu_data_t *pmu_data;    /* PMU profile information for this file.  */
+
   /* Functions in this source file.  These are in ascending line
      number order.  */
   function_t *functions;
@@ -301,6 +312,10 @@ 
 /* Show unconditional branches too.  */
 static int flag_unconditional = 0;
 
+/* Output performance monitoring unit (PMU) data, if available.  */
+
+static int flag_pmu_profile = 0;
+
 /* Output a gcov file if this is true.  This is on by default, and can
    be turned off by the -n option.  */
 
@@ -345,6 +360,18 @@ 
 
 static int flag_counts = 0;
 
+/* PMU profile default filename.  */
+
+static char pmu_profile_default_filename[] = "pmuprofile.gcda";
+
+/* PMU profile filename where the PMU profile data is read from.  */
+
+static char *pmu_profile_filename = 0;
+
+/* PMU data for the entire program.  */
+
+static pmu_data_t pmu_global_info;
+
 /* Forward declarations.  */
 static void fnotice (FILE *, const char *, ...) ATTRIBUTE_PRINTF_2;
 static int process_args (int, char **);
@@ -366,6 +393,17 @@ 
 static void output_lines (FILE *, const source_t *);
 static char *make_gcov_file_name (const char *, const char *);
 static void release_structures (void);
+static void process_pmu_profile (void);
+static void filter_pmu_data_lines (source_t *src);
+static void output_pmu_data_header (FILE *gcov_file, pmu_data_t *pmu_data);
+static void output_pmu_data (FILE *gcov_file, const source_t *src,
+                             const unsigned line_num);
+static void output_load_latency_line (FILE *fp,
+                                      const gcov_pmu_ll_info_t *ll_info,
+                                      gcov_pmu_tool_header_t *tool_header);
+static void output_branch_mispredict_line (FILE *fp,
+                                           const gcov_pmu_brm_info_t *brm_info);
+
 extern int main (int, char **);
 
 int
@@ -389,6 +427,15 @@ 
   if (argc - argno > 1)
     multiple_files = 1;
 
+  /*  We read pmu profile first because we later filter
+      src:line_numbers for each source.  */
+  if (flag_pmu_profile)
+    {
+      if (!pmu_profile_filename)
+        pmu_profile_filename = pmu_profile_default_filename;
+      process_pmu_profile ();
+    }
+
   first_arg = argno;
   
   for (; argno != argc; argno++)
@@ -433,12 +480,14 @@ 
   fnotice (file, "  -b, --branch-probabilities      Include branch probabilities in output\n");
   fnotice (file, "  -c, --branch-counts             Given counts of branches taken\n\
                                     rather than percentages\n");
+  fnotice (file, "  -m, --pmu-profile               Output PMU profile data if available\n");
   fnotice (file, "  -n, --no-output                 Do not create an output file\n");
   fnotice (file, "  -l, --long-file-names           Use long output file names for included\n\
                                     source files\n");
   fnotice (file, "  -f, --function-summaries        Output summaries for each function\n");
   fnotice (file, "  -o, --object-directory DIR|FILE Search for object files in DIR or called FILE\n");
   fnotice (file, "  -p, --preserve-paths            Preserve all pathname components\n");
+  fnotice (file, "  -q, --pmu_profile-path          Path for PMU profile (default pmuprofile.gcda)\n");
   fnotice (file, "  -u, --unconditional-branches    Show unconditional branch counts too\n");
   fnotice (file, "  -i, --intermediate-format       Output .gcov file in an intermediate text\n\
                                     format that can be used by 'lcov' or other\n\
@@ -473,6 +522,7 @@ 
   { "all-blocks",           no_argument,       NULL, 'a' },
   { "branch-probabilities", no_argument,       NULL, 'b' },
   { "branch-counts",        no_argument,       NULL, 'c' },
+  { "pmu-profile",          no_argument,       NULL, 'm' },
   { "no-output",            no_argument,       NULL, 'n' },
   { "long-file-names",      no_argument,       NULL, 'l' },
   { "function-summaries",   no_argument,       NULL, 'f' },
@@ -480,6 +530,7 @@ 
   { "object-directory",     required_argument, NULL, 'o' },
   { "object-file",          required_argument, NULL, 'o' },
   { "unconditional-branches", no_argument,     NULL, 'u' },
+  { "pmu_profile-path",     required_argument, NULL, 'q' },
   { "display-progress",     no_argument,       NULL, 'd' },
   { "intermediate-format",  no_argument,       NULL, 'i' },
   { 0, 0, 0, 0 }
@@ -492,7 +543,7 @@ 
 {
   int opt;
 
-  while ((opt = getopt_long (argc, argv, "abcdfhilno:puv", options, NULL)) !=
+  while ((opt = getopt_long (argc, argv, "abcdfhilmno:pq:uv", options, NULL)) !=
          -1)
     {
       switch (opt)
@@ -515,6 +566,9 @@ 
 	case 'l':
 	  flag_long_names = 1;
 	  break;
+	case 'm':
+	  flag_pmu_profile = 1;
+	  break;
 	case 'n':
 	  flag_gcov_file = 0;
 	  break;
@@ -524,6 +578,9 @@ 
 	case 'p':
 	  flag_preserve_paths = 1;
 	  break;
+	case 'q':
+	  pmu_profile_filename = optarg;
+	  break;
 	case 'u':
 	  flag_unconditional = 1;
 	  break;
@@ -766,6 +823,8 @@ 
 {
   function_t *fn;
   source_t *src;
+  ll_infos_t *ll_infos = &pmu_global_info.ll_infos;
+  brm_infos_t *brm_infos = &pmu_global_info.brm_infos;
 
   while ((src = sources))
     {
@@ -773,6 +832,14 @@ 
 
       free (src->name);
       free (src->lines);
+      if (src->pmu_data)
+        {
+          if (src->pmu_data->ll_infos.ll_array)
+            free (src->pmu_data->ll_infos.ll_array);
+          if (src->pmu_data->brm_infos.brm_array)
+            free (src->pmu_data->brm_infos.brm_array);
+          free (src->pmu_data);
+        }
     }
 
   while ((fn = functions))
@@ -794,6 +861,42 @@ 
       free (fn->blocks);
       free (fn->counts);
     }
+
+  /* Cleanup PMU load latency info.  */
+  if (ll_infos->ll_count)
+    {
+      unsigned i;
+
+      /* delete each element */
+      for (i = 0; i < ll_infos->ll_count; ++i)
+        {
+          if (ll_infos->ll_array[i]->filename)
+            XDELETE (ll_infos->ll_array[i]->filename);
+          XDELETE (ll_infos->ll_array[i]);
+        }
+      /* delete the array itself */
+      XDELETE (ll_infos->ll_array);
+      ll_infos->ll_array = NULL;
+      ll_infos->ll_count = 0;
+    }
+
+  /* Cleanup PMU branch mispredict info.  */
+  if (brm_infos->brm_count)
+    {
+      unsigned i;
+
+      /* delete each element */
+      for (i = 0; i < brm_infos->brm_count; ++i)
+        {
+          if (brm_infos->brm_array[i]->filename)
+            XDELETE (brm_infos->brm_array[i]->filename);
+          XDELETE (brm_infos->brm_array[i]);
+        }
+      /* delete the array itself */
+      XDELETE (brm_infos->brm_array);
+      brm_infos->brm_array = NULL;
+      brm_infos->brm_count = 0;
+    }
 }
 
 /* Generate the names of the graph and data files. If OBJECT_DIRECTORY
@@ -890,6 +993,7 @@ 
       src->coverage.name = src->name;
       src->index = source_index++;
       src->next = sources;
+      src->pmu_data = 0;
       sources = src;
 
       if (!stat (file_name, &status))
@@ -1806,6 +1910,140 @@ 
     fnotice (stderr, "%s:no lines for '%s'\n", bbg_file_name, fn->name);
 }
 
+/* Filter PMU profile global data for lines for SRC.  Save PMU info
+   matching the source file and sort them by line number for later
+   line by line processing.  */
+
+static void
+filter_pmu_data_lines (source_t *src)
+{
+  unsigned i;
+  int changed;
+  ll_infos_t *ll_infos;         /* load latency information for this source */
+  brm_infos_t *brm_infos;  /* branch mispredict information for this source */
+
+  if (pmu_global_info.ll_infos.ll_count == 0 &&
+      pmu_global_info.brm_infos.brm_count == 0)
+    /* If there are no global entries, there is nothing to filter.  */
+    return;
+
+  src->pmu_data = XCNEW (pmu_data_t);
+  ll_infos = &src->pmu_data->ll_infos;
+  brm_infos = &src->pmu_data->brm_infos;
+  ll_infos->pmu_tool_header = pmu_global_info.ll_infos.pmu_tool_header;
+  brm_infos->pmu_tool_header = pmu_global_info.brm_infos.pmu_tool_header;
+  ll_infos->ll_array = 0;
+  brm_infos->brm_array = 0;
+
+  /* Go over all the load latency entries and save the ones
+     corresponding to this source file.  */
+  for (i = 0; i < pmu_global_info.ll_infos.ll_count; ++i)
+    {
+      gcov_pmu_ll_info_t *ll_info = pmu_global_info.ll_infos.ll_array[i];
+      if (0 == strcmp (src->name, ll_info->filename))
+        {
+          if (!ll_infos->ll_array)
+            {
+              ll_infos->ll_count = 0;
+              ll_infos->alloc_ll_count = 64;
+              ll_infos->ll_array = XCNEWVEC (gcov_pmu_ll_info_t *,
+                                             ll_infos->alloc_ll_count);
+            }
+          /* Found a matching entry, save it.  */
+          ll_infos->ll_count++;
+          if (ll_infos->ll_count >= ll_infos->alloc_ll_count)
+            {
+              ll_infos->alloc_ll_count *= 2;  /* Double the capacity.  */
+              ll_infos->ll_array = XRESIZEVEC (gcov_pmu_ll_info_t *,
+                ll_infos->ll_array, ll_infos->alloc_ll_count);
+            }
+          ll_infos->ll_array[ll_infos->ll_count - 1] = ll_info;
+        }
+    }
+
+  /* Go over all the branch mispredict entries and save the ones
+     corresponding to this source file.  */
+  for (i = 0; i < pmu_global_info.brm_infos.brm_count; ++i)
+    {
+      gcov_pmu_brm_info_t *brm_info = pmu_global_info.brm_infos.brm_array[i];
+      if (0 == strcmp (src->name, brm_info->filename))
+        {
+          if (!brm_infos->brm_array)
+            {
+              brm_infos->brm_count = 0;
+              brm_infos->alloc_brm_count = 64;
+              brm_infos->brm_array = XCNEWVEC (gcov_pmu_brm_info_t *,
+                                               brm_infos->alloc_brm_count);
+            }
+          /* Found a matching entry, save it.  */
+          brm_infos->brm_count++;
+          if (brm_infos->brm_count >= brm_infos->alloc_brm_count)
+            {
+              brm_infos->alloc_brm_count *= 2;  /* Double the capacity.  */
+              brm_infos->brm_array = XRESIZEVEC (gcov_pmu_brm_info_t *,
+                brm_infos->brm_array, brm_infos->alloc_brm_count);
+            }
+          brm_infos->brm_array[brm_infos->brm_count - 1] = brm_info;
+        }
+    }
+
+  /* Sort the load latency data according to the line numbers because
+     we later iterate over sources in line number order. Normally we
+     expect the PMU tool to provide sorted data, but a few entries can
+     be out of order. Thus we use a very simple bubble sort here.  */
+  if (ll_infos->ll_count > 1)
+    {
+      changed = 1;
+      while (changed)
+        {
+          changed = 0;
+          for (i = 0; i < ll_infos->ll_count - 1; ++i)
+            {
+              gcov_pmu_ll_info_t *item1 = ll_infos->ll_array[i];
+              gcov_pmu_ll_info_t *item2 = ll_infos->ll_array[i+1];
+              if (item1->line > item2->line)
+                {
+                  /* swap */
+                  gcov_pmu_ll_info_t *tmp = ll_infos->ll_array[i];
+                  ll_infos->ll_array[i] = ll_infos->ll_array[i+1];
+                  ll_infos->ll_array[i+1] = tmp;
+                  changed = 1;
+                }
+            }
+        }
+    }
+
+  /* Similarly, sort branch mispredict info as well.  */
+  if (brm_infos->brm_count > 1)
+    {
+      changed = 1;
+      while (changed)
+        {
+          changed = 0;
+          for (i = 0; i < brm_infos->brm_count - 1; ++i)
+            {
+              gcov_pmu_brm_info_t *item1 = brm_infos->brm_array[i];
+              gcov_pmu_brm_info_t *item2 = brm_infos->brm_array[i+1];
+              if (item1->line > item2->line)
+                {
+                  /* swap */
+                  gcov_pmu_brm_info_t *tmp = brm_infos->brm_array[i];
+                  brm_infos->brm_array[i] = brm_infos->brm_array[i+1];
+                  brm_infos->brm_array[i+1] = tmp;
+                  changed = 1;
+                }
+            }
+        }
+    }
+
+  /* If no matching PMU info was found, release the structures.  */
+  if (!brm_infos->brm_array && !ll_infos->ll_array)
+    {
+      free (src->pmu_data);
+      src->pmu_data = 0;
+    }
+}
+
 /* Accumulate the line counts of a file.  */
 
 static void
@@ -1815,6 +2053,10 @@ 
   function_t *fn, *fn_p, *fn_n;
   unsigned ix;
 
+  if (flag_pmu_profile)
+    /* Filter PMU profile by source files and save into matching line(s).  */
+    filter_pmu_data_lines (src);
+
   /* Reverse the function order.  */
   for (fn = src->functions, fn_p = NULL; fn;
        fn_p = fn, fn = fn_n)
@@ -2062,6 +2304,9 @@ 
   else if (src->file_time == 0)
     fprintf (gcov_file, "%9s:%5d:Source is newer than graph\n", "-", 0);
 
+  if (src->pmu_data)
+    output_pmu_data_header (gcov_file, src->pmu_data);
+
   if (flag_branches)
     fn = src->functions;
 
@@ -2139,6 +2384,10 @@ 
 	  for (ix = 0, arc = line->u.branches; arc; arc = arc->line_next)
 	    ix += output_branch_count (gcov_file, ix, arc);
 	}
+
+      /* Output PMU profile info if available.  */
+      if (flag_pmu_profile)
+        output_pmu_data (gcov_file, src, line_num);
     }
 
   /* Handle all remaining source lines.  There may be lines after the
@@ -2162,3 +2411,236 @@ 
   if (source_file)
     fclose (source_file);
 }
+
+/* Print an explanatory header for PMU_DATA into GCOV_FILE.  */
+
+static void
+output_pmu_data_header (FILE *gcov_file, pmu_data_t *pmu_data)
+{
+  /* Print header for the applicable PMU events.  */
+  fprintf (gcov_file, "%9s:%5d\n", "-", 0);
+  if (pmu_data->ll_infos.ll_count)
+    {
+      char *text = pmu_data->ll_infos.pmu_tool_header->column_description;
+      char c;
+      fprintf (gcov_file, "%9s:%5u: %s", "PMU_LL", 0,
+               pmu_data->ll_infos.pmu_tool_header->column_header);
+      /* The column description is multiline text and we want to print
+         each line separately after formatting it.  */
+      fprintf (gcov_file, "%9s:%5u: ", "PMU_LL", 0);
+      while ((c = *text++))
+        {
+          fprintf (gcov_file, "%c", c);
+          /* Do not print a new header on trailing newline.   */
+          if (c == '\n' && *text)
+            fprintf (gcov_file, "%9s:%5u: ", "PMU_LL", 0);
+        }
+      fprintf (gcov_file, "%9s:%5d\n", "-", 0);
+    }
+
+  if (pmu_data->brm_infos.brm_count)
+    {
+
+      fprintf (gcov_file, "%9s:%5d:PMU BRM: line: %s %s %s\n",
+               "-", 0, "count", "self", "address");
+      fprintf (gcov_file, "%9s:%5d:         "
+               "count: number of branch mispredicts sampled at this address\n",
+               "-", 0);
+      fprintf (gcov_file, "%9s:%5d:         "
+               "self: branch mispredicts as percentage of the entire program\n",
+               "-", 0);
+      fprintf (gcov_file, "%9s:%5d\n", "-", 0);
+    }
+}
+
+/* Output pmu data corresponding to SRC and LINE_NUM into GCOV_FILE.  */
+
+static void
+output_pmu_data (FILE *gcov_file, const source_t *src, const unsigned line_num)
+{
+  unsigned i;
+  ll_infos_t *ll_infos;
+  brm_infos_t *brm_infos;
+  gcov_pmu_tool_header_t *tool_header;
+
+  if (!src->pmu_data)
+    return;
+
+  ll_infos = &src->pmu_data->ll_infos;
+  brm_infos = &src->pmu_data->brm_infos;
+
+  if (ll_infos->ll_array)
+    {
+      tool_header = src->pmu_data->ll_infos.pmu_tool_header;
+
+      /* Search PMU load latency data for the matching line
+         numbers. There could be multiple entries with the same line
+         number. We use the fact that line numbers are sorted in
+         ll_array.  */
+      for (i = 0; i < ll_infos->ll_count &&
+             ll_infos->ll_array[i]->line <= line_num; ++i)
+        {
+          gcov_pmu_ll_info_t *ll_info = ll_infos->ll_array[i];
+          if (ll_info->line == line_num)
+            output_load_latency_line (gcov_file, ll_info, tool_header);
+        }
+    }
+
+  if (brm_infos->brm_array)
+    {
+      tool_header = src->pmu_data->brm_infos.pmu_tool_header;
+
+      /* Search PMU branch mispredict data for the matching line
+         numbers. There could be multiple entries with the same line
+         number. We use the fact that line numbers are sorted in
+         brm_array.  */
+      for (i = 0; i < brm_infos->brm_count &&
+             brm_infos->brm_array[i]->line <= line_num; ++i)
+        {
+          gcov_pmu_brm_info_t *brm_info = brm_infos->brm_array[i];
+          if (brm_info->line == line_num)
+            output_branch_mispredict_line (gcov_file, brm_info);
+        }
+    }
+}
+
+
+/* Output formatted load latency info pointed to by LL_INFO into the
+   open file FP.  TOOL_HEADER contains additional explanation of
+   fields.  */
+
+static void
+output_load_latency_line (FILE *fp, const gcov_pmu_ll_info_t *ll_info,
+                          gcov_pmu_tool_header_t *tool_header ATTRIBUTE_UNUSED)
+{
+  fprintf (fp, "%9s:%5u:      ", "PMU_LL", ll_info->line);
+  fprintf (fp, " %u %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% "
+           "%.2f%% %.2f%% " HOST_WIDEST_INT_PRINT_HEX "\n",
+           ll_info->counts,
+           convert_unsigned_to_pct (ll_info->self),
+           convert_unsigned_to_pct (ll_info->cum),
+           convert_unsigned_to_pct (ll_info->lt_10),
+           convert_unsigned_to_pct (ll_info->lt_32),
+           convert_unsigned_to_pct (ll_info->lt_64),
+           convert_unsigned_to_pct (ll_info->lt_256),
+           convert_unsigned_to_pct (ll_info->lt_1024),
+           convert_unsigned_to_pct (ll_info->gt_1024),
+           convert_unsigned_to_pct (ll_info->wself),
+           ll_info->code_addr);
+}
+
+
+/* Output formatted branch mispredict info pointed to by BRM_INFO into
+   the open file FP.  */
+
+static void
+output_branch_mispredict_line (FILE *fp,
+                               const gcov_pmu_brm_info_t *brm_info)
+{
+  fprintf (fp, "%9s:%5u: count: %u self: %.2f%% addr: "
+           HOST_WIDEST_INT_PRINT_HEX "\n",
+           "PMU BRM",
+           brm_info->line,
+           brm_info->counts,
+           convert_unsigned_to_pct (brm_info->self),
+           brm_info->code_addr);
+}
+
+/* Read in the PMU profile information from the global PMU profile file.  */
+
+static void process_pmu_profile (void)
+{
+  unsigned tag;
+  unsigned version;
+  int error = 0;
+  ll_infos_t *ll_infos = &pmu_global_info.ll_infos;
+  brm_infos_t *brm_infos = &pmu_global_info.brm_infos;
+
+  /* Construct the path of the pmuprofile.gcda file.  */
+  create_file_names (pmu_profile_filename);
+  if (!gcov_open (da_file_name, 1))
+    {
+      fnotice (stderr, "%s:cannot open pmu profile file\n",
+               pmu_profile_filename);
+      return;
+    }
+  if (!gcov_magic (gcov_read_unsigned (), GCOV_DATA_MAGIC))
+    {
+      fnotice (stderr, "%s:not a gcov data file\n", da_file_name);
+    cleanup:;
+      gcov_close ();
+      return;
+    }
+  version = gcov_read_unsigned ();
+  if (version != GCOV_VERSION)
+    {
+      char v[4], e[4];
+
+      GCOV_UNSIGNED2STRING (v, version);
+      GCOV_UNSIGNED2STRING (e, GCOV_VERSION);
+      fnotice (stderr, "%s:version '%.4s', prefer version '%.4s'\n",
+	       da_file_name, v, e);
+    }
+  /* Read the stamp; its value is not used.  */
+  tag = gcov_read_unsigned ();
+
+  /* Initialize PMU data fields. */
+  ll_infos->ll_count = 0;
+  ll_infos->alloc_ll_count = 64;
+  ll_infos->ll_array = XCNEWVEC (gcov_pmu_ll_info_t *, ll_infos->alloc_ll_count);
+
+  brm_infos->brm_count = 0;
+  brm_infos->alloc_brm_count = 64;
+  brm_infos->brm_array = XCNEWVEC (gcov_pmu_brm_info_t *,
+                                   brm_infos->alloc_brm_count);
+
+  while ((tag = gcov_read_unsigned ()))
+    {
+      unsigned length = gcov_read_unsigned ();
+      unsigned long base = gcov_position ();
+
+      if (tag == GCOV_TAG_PMU_LOAD_LATENCY_INFO)
+        {
+          gcov_pmu_ll_info_t *ll_info = XCNEW (gcov_pmu_ll_info_t);
+          gcov_read_pmu_load_latency_info (ll_info, length);
+          ll_infos->ll_count++;
+          if (ll_infos->ll_count >= ll_infos->alloc_ll_count)
+            {
+              ll_infos->alloc_ll_count *= 2;  /* Double the capacity.  */
+              ll_infos->ll_array = XRESIZEVEC (gcov_pmu_ll_info_t *,
+                ll_infos->ll_array, ll_infos->alloc_ll_count);
+            }
+          ll_infos->ll_array[ll_infos->ll_count - 1] = ll_info;
+        }
+      else if (tag == GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO)
+        {
+          gcov_pmu_brm_info_t *brm_info = XCNEW (gcov_pmu_brm_info_t);
+          gcov_read_pmu_branch_mispredict_info (brm_info, length);
+          brm_infos->brm_count++;
+          if (brm_infos->brm_count >= brm_infos->alloc_brm_count)
+            {
+              brm_infos->alloc_brm_count *= 2;  /* Double the capacity.  */
+              brm_infos->brm_array = XRESIZEVEC (gcov_pmu_brm_info_t *,
+                brm_infos->brm_array, brm_infos->alloc_brm_count);
+            }
+          brm_infos->brm_array[brm_infos->brm_count - 1] = brm_info;
+        }
+      else if (tag == GCOV_TAG_PMU_TOOL_HEADER)
+        {
+          gcov_pmu_tool_header_t *tool_header = XCNEW (gcov_pmu_tool_header_t);
+          gcov_read_pmu_tool_header (tool_header, length);
+          ll_infos->pmu_tool_header = tool_header;
+          brm_infos->pmu_tool_header = tool_header;
+        }
+
+      gcov_sync (base, length);
+      if ((error = gcov_is_error ()))
+	{
+	  fnotice (stderr, error < 0 ? "%s:overflowed\n" : "%s:corrupted\n",
+		   da_file_name);
+	  goto cleanup;
+	}
+    }
+
+  gcov_close ();
+}
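
Note on the record loop in process_pmu_profile above: each load latency or branch mispredict record is appended to an array of pointers that must grow once its initial 64 slots are used up. The following is a minimal standalone sketch of that append-with-doubling pattern, not code from the patch. It assumes plain realloc instead of GCC's xrealloc, and struct ptr_vec and ptr_vec_push are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of pointers; not a type from the patch.  */
struct ptr_vec
{
  void **items;      /* backing storage */
  unsigned count;    /* entries in use */
  unsigned alloc;    /* entries allocated */
};

/* Append ITEM to VEC, doubling the capacity when the array is full.
   Returns 0 on success, -1 if memory could not be obtained.  */
static int
ptr_vec_push (struct ptr_vec *vec, void *item)
{
  assert (vec != NULL);
  if (vec->count == vec->alloc)
    {
      unsigned new_alloc = vec->alloc ? 2 * vec->alloc : 64;
      /* realloc takes a size in bytes, so scale by the element size.  */
      void **grown = realloc (vec->items, new_alloc * sizeof (void *));
      if (!grown)
        return -1;
      vec->items = grown;
      vec->alloc = new_alloc;
    }
  vec->items[vec->count++] = item;
  return 0;
}
```

The two points the sketch is meant to show are that the allocated capacity is updated together with the pointer, and that the size handed to realloc is the element count multiplied by the element size.
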
Index: gcc/gcov-io.c
===================================================================
--- gcc/gcov-io.c	(revision 175226)
+++ gcc/gcov-io.c	(working copy)
@@ -23,6 +23,12 @@ 
 /* Routines declared in gcov-io.h.  This file should be #included by
    another source file, after having #included gcov-io.h.  */
 
+/* Redefine MIN here rather than using the one in system.h, since
+   including system.h leads to conflicting definitions of other
+   symbols and macros.  */
+#undef MIN
+#define MIN(X,Y) ((X) < (Y) ? (X) : (Y))
+
 #if !IN_GCOV
 static void gcov_write_block (unsigned);
 static gcov_unsigned_t *gcov_write_words (unsigned);
@@ -197,6 +203,104 @@ 
 }
 
 #if !IN_LIBGCOV
+/* Modify FILENAME to a canonical form by stripping known prefixes
+   in place.  It removes a leading '/proc/self/cwd/./' or
+   '/proc/self/cwd/'.  Returns the in-place modified filename.  */
+
+GCOV_LINKAGE char *
+gcov_canonical_filename (char *filename)
+{
+  static char cwd_dot_str[] = "/proc/self/cwd/./";
+  int cwd_dot_len = strlen (cwd_dot_str);
+  int cwd_len = cwd_dot_len - 2; /* without trailing './' */
+  int filename_len = strlen (filename);
+  /* delete the longer prefix first */
+  if (0 == strncmp (filename, cwd_dot_str, cwd_dot_len))
+    {
+      memmove (filename, filename + cwd_dot_len, filename_len - cwd_dot_len);
+      filename[filename_len - cwd_dot_len] = '\0';
+      return filename;
+    }
+
+  if (0 == strncmp (filename, cwd_dot_str, cwd_len))
+    {
+      memmove (filename, filename + cwd_len, filename_len - cwd_len);
+      filename[filename_len - cwd_len] = '\0';
+      return filename;
+    }
+  return filename;
+}
+
+/* Read LEN words and construct load latency info LL_INFO.  */
+
+GCOV_LINKAGE void
+gcov_read_pmu_load_latency_info (gcov_pmu_ll_info_t *ll_info,
+                                 gcov_unsigned_t len ATTRIBUTE_UNUSED)
+{
+  const char *filename;
+  ll_info->counts = gcov_read_unsigned ();
+  ll_info->self = gcov_read_unsigned ();
+  ll_info->cum = gcov_read_unsigned ();
+  ll_info->lt_10 = gcov_read_unsigned ();
+  ll_info->lt_32 = gcov_read_unsigned ();
+  ll_info->lt_64 = gcov_read_unsigned ();
+  ll_info->lt_256 = gcov_read_unsigned ();
+  ll_info->lt_1024 = gcov_read_unsigned ();
+  ll_info->gt_1024 = gcov_read_unsigned ();
+  ll_info->wself = gcov_read_unsigned ();
+  ll_info->code_addr = gcov_read_counter ();
+  ll_info->line = gcov_read_unsigned ();
+  ll_info->discriminator = gcov_read_unsigned ();
+  filename = gcov_read_string ();
+  if (filename)
+    ll_info->filename = gcov_canonical_filename (xstrdup (filename));
+  else
+    ll_info->filename = 0;
+}
+
+/* Read LEN words and construct branch mispredict info BRM_INFO.  */
+
+GCOV_LINKAGE void
+gcov_read_pmu_branch_mispredict_info (gcov_pmu_brm_info_t *brm_info,
+                                      gcov_unsigned_t len ATTRIBUTE_UNUSED)
+{
+  const char *filename;
+  brm_info->counts = gcov_read_unsigned ();
+  brm_info->self = gcov_read_unsigned ();
+  brm_info->cum = gcov_read_unsigned ();
+  brm_info->code_addr = gcov_read_counter ();
+  brm_info->line = gcov_read_unsigned ();
+  brm_info->discriminator = gcov_read_unsigned ();
+  filename = gcov_read_string ();
+  if (filename)
+    brm_info->filename = gcov_canonical_filename (xstrdup (filename));
+  else
+    brm_info->filename = 0;
+}
+
+/* Read LEN words from an open gcov file and fill in the PMU tool
+   header HEADER.  */
+
+GCOV_LINKAGE void gcov_read_pmu_tool_header (gcov_pmu_tool_header_t *header,
+                                           gcov_unsigned_t len ATTRIBUTE_UNUSED)
+{
+  const char *str;
+  str = gcov_read_string ();
+  header->host_cpu = str ? xstrdup (str) : 0;
+  str = gcov_read_string ();
+  header->hostname = str ? xstrdup (str) : 0;
+  str = gcov_read_string ();
+  header->kernel_version = str ? xstrdup (str) : 0;
+  str = gcov_read_string ();
+  header->column_header = str ? xstrdup (str) : 0;
+  str = gcov_read_string ();
+  header->column_description = str ? xstrdup (str) : 0;
+  str = gcov_read_string ();
+  header->full_header = str ? xstrdup (str) : 0;
+}
+#endif
+
+#if !IN_LIBGCOV
 /* Check if MAGIC is EXPECTED. Use it to determine endianness of the
    file. Returns +1 for same endian, -1 for other endian and zero for
    not EXPECTED.  */
@@ -245,6 +349,24 @@ 
   gcov_var.offset -= size;
 }
 
+#if IN_LIBGCOV
+/* Return the number of words STRING would need including the length
+   field in the output stream itself.  This should be identical to
+   "alloc" calculation in gcov_write_string().  */
+
+GCOV_LINKAGE gcov_unsigned_t
+gcov_string_length (const char *string)
+{
+  gcov_unsigned_t len = (string) ? strlen (string) : 0;
+  /* + 1 because of the length field.  */
+  gcov_unsigned_t alloc = 1 + ((len + 4) >> 2);
+
+  /* Cannot yet write a string bigger than GCOV_BLOCK_SIZE.  */
+  gcc_assert (alloc < GCOV_BLOCK_SIZE);
+  return alloc;
+}
+#endif
+
 /* Allocate space to write BYTES bytes to the gcov file. Return a
    pointer to those bytes, or NULL on failure.  */
 
@@ -255,13 +377,15 @@ 
 
   gcc_assert (gcov_var.mode < 0);
 #if IN_LIBGCOV
-  if (gcov_var.offset >= GCOV_BLOCK_SIZE)
+  if (gcov_var.offset + words >= GCOV_BLOCK_SIZE)
     {
-      gcov_write_block (GCOV_BLOCK_SIZE);
+      gcov_write_block (MIN (gcov_var.offset, GCOV_BLOCK_SIZE));
       if (gcov_var.offset)
 	{
-	  gcc_assert (gcov_var.offset == 1);
-	  memcpy (gcov_var.buffer, gcov_var.buffer + GCOV_BLOCK_SIZE, 4);
+	  gcc_assert (gcov_var.offset < GCOV_BLOCK_SIZE);
+	  memcpy (gcov_var.buffer,
+                  gcov_var.buffer + GCOV_BLOCK_SIZE,
+                  gcov_var.offset << 2);
 	}
     }
 #else
@@ -302,7 +426,6 @@ 
 }
 #endif /* IN_LIBGCOV */
 
-#if !IN_LIBGCOV
 /* Write STRING to coverage file.  Sets error flag on file
    error, overflow flag on overflow */
 
@@ -325,7 +448,6 @@ 
   buffer[alloc] = 0;
   memcpy (&buffer[1], string, length);
 }
-#endif
 
 #if !IN_LIBGCOV
 /* Write a tag TAG and reserve space for the record length. Return a
@@ -413,14 +535,15 @@ 
   unsigned excess = gcov_var.length - gcov_var.offset;
 
   gcc_assert (gcov_var.mode > 0);
+  gcc_assert (words < GCOV_BLOCK_SIZE);
   if (excess < words)
     {
       gcov_var.start += gcov_var.offset;
 #if IN_LIBGCOV
       if (excess)
 	{
-	  gcc_assert (excess == 1);
-	  memcpy (gcov_var.buffer, gcov_var.buffer + gcov_var.offset, 4);
+	  gcc_assert (excess < GCOV_BLOCK_SIZE);
+	  memmove (gcov_var.buffer, gcov_var.buffer + gcov_var.offset, excess * 4);
 	}
 #else
       memmove (gcov_var.buffer, gcov_var.buffer + gcov_var.offset, excess * 4);
@@ -428,8 +551,7 @@ 
       gcov_var.offset = 0;
       gcov_var.length = excess;
 #if IN_LIBGCOV
-      gcc_assert (!gcov_var.length || gcov_var.length == 1);
-      excess = GCOV_BLOCK_SIZE;
+      excess = (sizeof (gcov_var.buffer) / sizeof (gcov_var.buffer[0])) - gcov_var.length;
 #else
       if (gcov_var.length + words > gcov_var.alloc)
 	gcov_allocate (gcov_var.length + words);
@@ -489,7 +611,6 @@ 
    buffer, or NULL on empty string. You must copy the string before
    calling another gcov function.  */
 
-#if !IN_LIBGCOV
 GCOV_LINKAGE const char *
 gcov_read_string (void)
 {
@@ -500,7 +621,6 @@ 
 
   return (const char *) gcov_read_words (length);
 }
-#endif
 
 GCOV_LINKAGE void
 gcov_read_summary (struct gcov_summary *summary)
@@ -629,6 +749,87 @@ 
 }
 #endif
 
+/* Convert NUMBER, a percentage scaled by 100 (i.e. parts per 10000),
+   to a floating-point percentage by dividing it by 100.  */
+
+GCOV_LINKAGE float
+convert_unsigned_to_pct (const unsigned number)
+{
+  return (float)number / 100.0f;
+}
+
+#if !IN_LIBGCOV && IN_GCOV != 1
+/* Print load latency information given by LL_INFO in a human readable
+   format into the open output file pointed to by FP.  NEWLINE specifies
+   whether or not to print a trailing newline.  */
+
+GCOV_LINKAGE void
+print_load_latency_line (FILE *fp, const gcov_pmu_ll_info_t *ll_info,
+                         const enum print_newline newline)
+{
+  if (!ll_info)
+    return;
+  fprintf (fp, " %u %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% %.2f%% "
+           "%.2f%% %.2f%% " HOST_WIDEST_INT_PRINT_HEX " %s %u %u",
+           ll_info->counts,
+           convert_unsigned_to_pct (ll_info->self),
+           convert_unsigned_to_pct (ll_info->cum),
+           convert_unsigned_to_pct (ll_info->lt_10),
+           convert_unsigned_to_pct (ll_info->lt_32),
+           convert_unsigned_to_pct (ll_info->lt_64),
+           convert_unsigned_to_pct (ll_info->lt_256),
+           convert_unsigned_to_pct (ll_info->lt_1024),
+           convert_unsigned_to_pct (ll_info->gt_1024),
+           convert_unsigned_to_pct (ll_info->wself),
+           ll_info->code_addr,
+           ll_info->filename,
+           ll_info->line,
+           ll_info->discriminator);
+  if (newline == add_newline)
+    fprintf (fp, "\n");
+}
+
+/* Print BRM_INFO into the file pointed to by FP.  NEWLINE specifies
+   whether or not to print a trailing newline.  */
+
+GCOV_LINKAGE void
+print_branch_mispredict_line (FILE *fp, const gcov_pmu_brm_info_t *brm_info,
+                              const enum print_newline newline)
+{
+  if (!brm_info)
+    return;
+  fprintf (fp, " %u %.2f%% %.2f%% " HOST_WIDEST_INT_PRINT_HEX " %s %u %u",
+           brm_info->counts,
+           convert_unsigned_to_pct (brm_info->self),
+           convert_unsigned_to_pct (brm_info->cum),
+           brm_info->code_addr,
+           brm_info->filename,
+           brm_info->line,
+           brm_info->discriminator);
+  if (newline == add_newline)
+    fprintf (fp, "\n");
+}
+
+/* Print TOOL_HEADER into the file pointed to by FP.  NEWLINE specifies
+   whether or not to print a trailing newline.  */
+
+GCOV_LINKAGE void
+print_pmu_tool_header (FILE *fp, gcov_pmu_tool_header_t *tool_header,
+                       const enum print_newline newline)
+{
+  if (!tool_header)
+    return;
+  fprintf (fp, "\nhost_cpu: %s\n", tool_header->host_cpu);
+  fprintf (fp, "hostname: %s\n", tool_header->hostname);
+  fprintf (fp, "kernel_version: %s\n", tool_header->kernel_version);
+  fprintf (fp, "column_header: %s\n", tool_header->column_header);
+  fprintf (fp, "column_description: %s\n", tool_header->column_description);
+  fprintf (fp, "full_header: %s\n", tool_header->full_header);
+  if (newline == add_newline)
+    fprintf (fp, "\n");
+}
+#endif
+
 #if IN_GCOV > 0
 /* Return the modification time of the current gcov file.  */
 
@@ -715,7 +916,7 @@ 
   if (vsize <= vpos)
     {
       printk (KERN_ERR
-          "GCOV_KERNEL: something wrong: vbuf=%p vsize=%u vpos=%u\n",
+         "GCOV_KERNEL: something wrong: vbuf=%p vsize=%u vpos=%u\n",
           vbuf, vsize, vpos);
       return 0;
     }
@@ -744,4 +945,29 @@ 
   gcc_assert (0);  /* should not reach here */
   return 0;
 }
+#else /* __GCOV_KERNEL__ */
+
+#if IN_GCOV != 1
+/* Delete pmu tool header TOOL_HEADER.  */
+
+GCOV_LINKAGE void
+destroy_pmu_tool_header (gcov_pmu_tool_header_t *tool_header)
+{
+  if (!tool_header)
+    return;
+  if (tool_header->host_cpu)
+    free (tool_header->host_cpu);
+  if (tool_header->hostname)
+    free (tool_header->hostname);
+  if (tool_header->kernel_version)
+    free (tool_header->kernel_version);
+  if (tool_header->column_header)
+    free (tool_header->column_header);
+  if (tool_header->column_description)
+    free (tool_header->column_description);
+  if (tool_header->full_header)
+    free (tool_header->full_header);
+}
+#endif
+
 #endif /* GCOV_KERNEL */
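
Note on gcov_canonical_filename in the diff above: the function strips one of two known prefixes in place and tries the longer one first. The following is a minimal standalone sketch of that idea, not code from the patch. The helper names strip_prefix and canonical_filename are hypothetical, and the sketch moves the terminating NUL along with the text instead of writing it separately.

```c
#include <assert.h>
#include <string.h>

/* Strip PREFIX from the front of PATH in place if PATH starts with it.
   Returns 1 if the prefix was removed, 0 otherwise.  */
static int
strip_prefix (char *path, const char *prefix)
{
  size_t prefix_len, path_len;

  assert (path != NULL && prefix != NULL);
  prefix_len = strlen (prefix);
  path_len = strlen (path);
  if (strncmp (path, prefix, prefix_len) != 0)
    return 0;
  /* Source and destination overlap, so memmove is required; the + 1
     also moves the terminating NUL.  */
  memmove (path, path + prefix_len, path_len - prefix_len + 1);
  return 1;
}

/* Try the longer prefix first so that "/proc/self/cwd/./x" does not
   come out as "./x".  */
static char *
canonical_filename (char *path)
{
  if (!strip_prefix (path, "/proc/self/cwd/./"))
    strip_prefix (path, "/proc/self/cwd/");
  return path;
}
```

Because the string is modified in place, a caller holding a pointer into a shared read buffer would need to copy it first, which is what the xstrdup calls in the readers above do.
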
Index: gcc/gcov-io.h
===================================================================
--- gcc/gcov-io.h	(revision 175226)
+++ gcc/gcov-io.h	(working copy)
@@ -313,6 +313,7 @@ 
 
 typedef unsigned gcov_unsigned_t;
 typedef unsigned gcov_position_t;
+
 /* gcov_type is typedef'd elsewhere for the compiler */
 #if IN_GCOV
 #define GCOV_LINKAGE static
@@ -363,15 +364,24 @@ 
 #define gcov_write_counter __gcov_write_counter
 #define gcov_write_summary __gcov_write_summary
 #define gcov_write_module_info __gcov_write_module_info
+#define gcov_write_string __gcov_write_string
+#define gcov_string_length __gcov_string_length
 #define gcov_read_unsigned __gcov_read_unsigned
 #define gcov_read_counter __gcov_read_counter
+#define gcov_read_string __gcov_read_string
 #define gcov_read_summary __gcov_read_summary
 #define gcov_read_module_info __gcov_read_module_info
 #define gcov_sort_n_vals __gcov_sort_n_vals
+#define gcov_canonical_filename __gcov_canonical_filename
+#define gcov_read_pmu_load_latency_info __gcov_read_pmu_load_latency_info
+#define gcov_read_pmu_branch_mispredict_info __gcov_read_pmu_branch_mispredict_info
+#define gcov_read_pmu_tool_header __gcov_read_pmu_tool_header
+#define destroy_pmu_tool_header __destroy_pmu_tool_header
 
+
 /* Poison these, so they don't accidentally slip in.  */
-#pragma GCC poison gcov_write_string gcov_write_tag gcov_write_length
-#pragma GCC poison gcov_read_string gcov_sync gcov_time gcov_magic
+#pragma GCC poison gcov_write_tag gcov_write_length
+#pragma GCC poison gcov_sync gcov_time gcov_magic
 
 #ifdef HAVE_GAS_HIDDEN
 #define ATTRIBUTE_HIDDEN  __attribute__ ((__visibility__ ("hidden")))
@@ -432,6 +442,13 @@ 
 #define GCOV_TAG_SUMMARY_LENGTH  \
 	(1 + GCOV_COUNTERS_SUMMABLE * (2 + 3 * 2))
 #define GCOV_TAG_MODULE_INFO ((gcov_unsigned_t)0xa4000000)
+#define GCOV_TAG_PMU_LOAD_LATENCY_INFO ((gcov_unsigned_t)0xa5000000)
+#define GCOV_TAG_PMU_LOAD_LATENCY_LENGTH(filename)  \
+  (gcov_string_length (filename) + 12 + 2)
+#define GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO ((gcov_unsigned_t)0xa7000000)
+#define GCOV_TAG_PMU_BRANCH_MISPREDICT_LENGTH(filename)  \
+  (gcov_string_length (filename) + 5 + 2)
+#define GCOV_TAG_PMU_TOOL_HEADER ((gcov_unsigned_t)0xa9000000)
 
 /* Counters that are collected.  */
 #define GCOV_COUNTER_ARCS 	0  /* Arc transitions.  */
@@ -545,6 +562,8 @@ 
 #define GCOV_MODULE_ASM_STMTS (1 << 16)
 #define GCOV_MODULE_LANG_MASK 0xffff
 
+enum print_newline {no_newline, add_newline};
+
 /* Source module info. The data structure is used in
    both runtime and profile-use phase. Make sure to allocate
    enough space for the variable length member.  */
@@ -576,6 +595,91 @@ 
    && !((module_infos[0]->lang & GCOV_MODULE_ASM_STMTS)			\
 	&& flag_ripa_disallow_asm_modules))
 
+/* Information about the hardware performance monitoring unit.  */
+struct gcov_pmu_info
+{
+  const char *pmu_profile_filename;	/* pmu profile filename  */
+  const char *pmu_tool;  	/* canonical pmu tool options  */
+  gcov_unsigned_t pmu_top_n_address;  /* how many top addresses to symbolize */
+};
+
+/* Information about the PMU tool header.  */
+typedef struct gcov_pmu_tool_header {
+  char *host_cpu;
+  char *hostname;
+  char *kernel_version;
+  char *column_header;
+  char *column_description;
+  char *full_header;
+} gcov_pmu_tool_header_t;
+
+/* Available only for PMUs which support PEBS or IBS using the pfmon
+   tool.  If any field here is changed, the length computation in
+   GCOV_TAG_PMU_LOAD_LATENCY_LENGTH must be updated as well.  All
+   percentages are multiplied by 100 to make them out of 10000 and
+   only the integer part is kept.  */
+typedef struct gcov_pmu_load_latency_info
+{
+  gcov_unsigned_t counts;     /* raw count of samples */
+  gcov_unsigned_t self;       /* per 10k of total samples */
+  gcov_unsigned_t cum;        /* per 10k cumulative weight */
+  gcov_unsigned_t lt_10;      /* per 10k with latency <= 10 cycles */
+  gcov_unsigned_t lt_32;      /* per 10k with latency <= 32 cycles */
+  gcov_unsigned_t lt_64;      /* per 10k with latency <= 64 cycles */
+  gcov_unsigned_t lt_256;     /* per 10k with latency <= 256 cycles */
+  gcov_unsigned_t lt_1024;    /* per 10k with latency <= 1024 cycles */
+  gcov_unsigned_t gt_1024;    /* per 10k with latency > 1024 cycles */
+  gcov_unsigned_t wself;      /* weighted average cost of this miss in cycles */
+  gcov_type code_addr;        /* the actual miss address (pc+1 for Intel) */
+  gcov_unsigned_t line;       /* line number corresponding to this miss */
+  gcov_unsigned_t discriminator;   /* discriminator information for this miss */
+  char *filename;       /* filename corresponding to this miss */
+} gcov_pmu_ll_info_t;
+
+/* This structure is used during runtime as well as in gcov.  */
+typedef struct load_latency_infos
+{
+  /* An array of pointers to the load latency entries.  */
+  gcov_pmu_ll_info_t **ll_array;
+  /* The total number of entries in the load latency array.  */
+  unsigned ll_count;
+  /* The total number of entries currently allocated in the array.
+     Used for bookkeeping.  */
+  unsigned alloc_ll_count;
+  /* PMU tool header */
+  gcov_pmu_tool_header_t *pmu_tool_header;
+} ll_infos_t;
+
+/* Available only for PMUs which support PEBS or IBS using the pfmon
+   tool.  If any field here is changed, the length computation in
+   GCOV_TAG_PMU_BRANCH_MISPREDICT_LENGTH must be updated as well.  All
+   percentages are multiplied by 100 to make them out of 10000 and
+   only the integer part is kept.  */
+typedef struct gcov_pmu_branch_mispredict_info
+{
+  gcov_unsigned_t counts;     /* raw count of samples */
+  gcov_unsigned_t self;       /* per 10k of total samples */
+  gcov_unsigned_t cum;        /* per 10k cumulative weight */
+  gcov_type code_addr;        /* the actual mispredict address */
+  gcov_unsigned_t line;       /* line number corresponding to this event */
+  gcov_unsigned_t discriminator;   /* discriminator for this event */
+  char *filename;       /* filename corresponding to this event */
+} gcov_pmu_brm_info_t;
+
+/* This structure is used during runtime as well as in gcov.  */
+typedef struct branch_mispredict_infos
+{
+  /* An array of pointers to the branch mispredict entries.  */
+  gcov_pmu_brm_info_t **brm_array;
+  /* The total number of entries in the above array.  */
+  unsigned brm_count;
+  /* The total number of entries currently allocated in the array.
+     Used for bookkeeping.  */
+  unsigned alloc_brm_count;
+  /* PMU tool header */
+  gcov_pmu_tool_header_t *pmu_tool_header;
+} brm_infos_t;
+
 /* Structures embedded in coveraged program.  The structures generated
    by write_profile must match these.  */
 
@@ -635,9 +739,6 @@ 
 /* Register a new object file module.  */
 extern void __gcov_init (struct gcov_info *) ATTRIBUTE_HIDDEN;
 
-/* Set sampling rate to RATE.  */
-extern void __gcov_set_sampling_rate (unsigned int rate);
-
 /* Called before fork, to avoid double counting.  */
 extern void __gcov_flush (void) ATTRIBUTE_HIDDEN;
 
@@ -674,6 +775,12 @@ 
 extern void __gcov_ior_profiler (gcov_type *, gcov_type);
 extern void __gcov_sort_n_vals (gcov_type *value_array, int n);
 
+/* Initialize/start/stop/dump performance monitoring unit (PMU) profile.  */
+void __gcov_init_pmu_profiler (struct gcov_pmu_info *) ATTRIBUTE_HIDDEN;
+void __gcov_start_pmu_profiler (void) ATTRIBUTE_HIDDEN;
+void __gcov_stop_pmu_profiler (void) ATTRIBUTE_HIDDEN;
+void __gcov_end_pmu_profiler (int gcda_error) ATTRIBUTE_HIDDEN;
+
 #ifndef inhibit_libc
 /* The wrappers around some library functions..  */
 extern pid_t __gcov_fork (void) ATTRIBUTE_HIDDEN;
@@ -746,14 +853,42 @@ 
 static gcov_position_t gcov_position (void);
 static int gcov_is_error (void);
 
+GCOV_LINKAGE const char *gcov_read_string (void) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE gcov_unsigned_t gcov_read_unsigned (void) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE gcov_type gcov_read_counter (void) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE void gcov_read_summary (struct gcov_summary *) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE char *gcov_canonical_filename (char *filename) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE void
+gcov_read_pmu_load_latency_info (gcov_pmu_ll_info_t *ll_info,
+                                 gcov_unsigned_t len) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE void
+gcov_read_pmu_branch_mispredict_info (gcov_pmu_brm_info_t *brm_info,
+                                      gcov_unsigned_t len) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE void
+gcov_read_pmu_tool_header (gcov_pmu_tool_header_t *tool_header,
+                           gcov_unsigned_t len) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE float convert_unsigned_to_pct (
+    const unsigned number) ATTRIBUTE_HIDDEN;
+
 #if !IN_LIBGCOV && IN_GCOV != 1
 GCOV_LINKAGE void gcov_read_module_info (struct gcov_module_info *mod_info,
 					 gcov_unsigned_t len) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE void print_load_latency_line (FILE *fp,
+                                           const gcov_pmu_ll_info_t *ll_info,
+                                           const enum print_newline);
+GCOV_LINKAGE void
+print_branch_mispredict_line (FILE *fp, const gcov_pmu_brm_info_t *brm_info,
+                              const enum print_newline);
+GCOV_LINKAGE void print_pmu_tool_header (FILE *fp,
+                                         gcov_pmu_tool_header_t *tool_header,
+                                         const enum print_newline);
 #endif
 
+#if IN_GCOV != 1
+GCOV_LINKAGE void destroy_pmu_tool_header (gcov_pmu_tool_header_t *tool_header)
+  ATTRIBUTE_HIDDEN;
+#endif
+
 #if IN_LIBGCOV
 /* Available only in libgcov */
 GCOV_LINKAGE void gcov_write_counter (gcov_type) ATTRIBUTE_HIDDEN;
@@ -771,10 +906,10 @@ 
 static void gcov_rewrite (void);
 GCOV_LINKAGE void gcov_seek (gcov_position_t /*position*/) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE void gcov_truncate (void) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE gcov_unsigned_t gcov_string_length (const char *) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE unsigned gcov_gcda_file_size (struct gcov_info *);
 #else
 /* Available outside libgcov */
-GCOV_LINKAGE const char *gcov_read_string (void);
 GCOV_LINKAGE void gcov_sync (gcov_position_t /*base*/,
 			     gcov_unsigned_t /*length */);
 #endif
@@ -782,11 +917,11 @@ 
 #if !IN_GCOV
 /* Available outside gcov */
 GCOV_LINKAGE void gcov_write_unsigned (gcov_unsigned_t) ATTRIBUTE_HIDDEN;
+GCOV_LINKAGE void gcov_write_string (const char *) ATTRIBUTE_HIDDEN;
 #endif
 
 #if !IN_GCOV && !IN_LIBGCOV
 /* Available only in compiler */
-GCOV_LINKAGE void gcov_write_string (const char *);
 GCOV_LINKAGE gcov_position_t gcov_write_tag (gcov_unsigned_t);
 GCOV_LINKAGE void gcov_write_length (gcov_position_t /*position*/);
 #endif
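
Note on the GCOV_TAG_PMU_*_LENGTH macros in the diff above: a record length is the number of 32-bit words taken by the fixed fields plus the words needed for the filename string. The following is a minimal standalone sketch of that arithmetic, not code from the patch. The function names are hypothetical, the string formula is the one shown in gcov_string_length, and the sketch assumes a non-null filename.

```c
#include <assert.h>
#include <string.h>

/* Words needed to store STR in a gcov stream: one word for the length
   field plus the text padded with at least one NUL byte to a multiple
   of four bytes.  STR must not be null in this sketch.  */
static unsigned
string_words (const char *str)
{
  assert (str != NULL);
  return 1 + (((unsigned) strlen (str) + 4) >> 2);
}

/* Load latency record: 12 single-word fields (counts through wself,
   line, discriminator) plus 2 words for the 64-bit code address.  */
static unsigned
ll_record_words (const char *filename)
{
  return string_words (filename) + 12 + 2;
}

/* Branch mispredict record: 5 single-word fields (counts, self, cum,
   line, discriminator) plus 2 words for the 64-bit code address.  */
static unsigned
brm_record_words (const char *filename)
{
  return string_words (filename) + 5 + 2;
}
```

For a filename such as "a.c" the string takes 2 words (length field plus one padded word), so a load latency record takes 16 words and a branch mispredict record takes 9.
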
Index: gcc/opts.c
===================================================================
--- gcc/opts.c	(revision 175226)
+++ gcc/opts.c	(working copy)
@@ -36,6 +36,9 @@ 
 #include "insn-attr.h"		/* For INSN_SCHEDULING and DELAY_SLOTS.  */
 #include "target.h"
 
+/* Defined in coverage.c.  */
+extern int check_pmu_profile_options (const char *options);
+
 /* Parse the -femit-struct-debug-detailed option value
    and set the flag variables. */
 
@@ -1597,6 +1600,15 @@ 
         opts->x_flag_ipa_reference = false;
       break;
 
+    case OPT_fpmu_profile_generate_:
+      /* Ideally this should be turned on in conjunction with
+         -fprofile-dir or -fprofile-generate in order to specify a
+         profile directory.  */
+      if (check_pmu_profile_options (arg))
+        error ("unrecognized pmu_profile_generate value \"%s\"", arg);
+      flag_pmu_profile_generate = xstrdup (arg);
+      break;
+
     case OPT_fshow_column:
       dc->show_column = value;
       break;
Index: gcc/pmu-profile.c
===================================================================
--- gcc/pmu-profile.c	(revision 0)
+++ gcc/pmu-profile.c	(revision 0)
@@ -0,0 +1,1552 @@ 
+/* Performance monitoring unit (PMU) profiler. If available, use an
+   external tool to collect hardware performance counter data and
+   write it in the .gcda files.
+
+   Copyright (C) 2011 Free Software Foundation, Inc.
+   Contributed by Sharad Singhai <singhai@google.com>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#include "tconfig.h"
+#include "tsystem.h"
+#include "coretypes.h"
+#include "tm.h"
+#if (defined (__x86_64__) || defined (__i386__))
+#include "cpuid.h"
+#endif
+
+#if defined(inhibit_libc)
+#define IN_LIBGCOV (-1)
+#else
+#include <stdio.h>
+#include <stdlib.h>
+#define IN_LIBGCOV 1
+  #if defined(L_gcov)
+  #define GCOV_LINKAGE /* nothing */
+  #endif
+#endif
+#include "gcov-io.h"
+#ifdef TARGET_POSIX_IO
+  #include <fcntl.h>
+  #include <signal.h>
+  #include <sys/stat.h>
+  #include <sys/types.h>
+#endif
+
+#if defined(inhibit_libc)
+#else
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#define XNEWVEC(type,ne) (type *)calloc((ne),sizeof(type))
+#define XNEW(type) (type *)malloc(sizeof(type))
+#define XDELETEVEC(p) free(p)
+#define XDELETE(p) free(p)
+
+#define PFMON_CMD "/usr/bin/pfmon"
+#define ADDR2LINE_CMD "/usr/bin/addr2line"
+#define PMU_TOOL_MAX_ARGS (20)
+static char default_addr2line[] = "??:0";
+static const char pfmon_ll_header[] = "#     counts   %self    %cum     "
+    "<10     <32     <64    <256   <1024  >=1024  %wself          "
+    "code addr symbol\n";
+static const char pfmon_bm_header[] =
+    "#     counts   %self    %cum          code addr symbol\n";
+
+const char *pfmon_intel_ll_args[PMU_TOOL_MAX_ARGS] = {
+  PFMON_CMD,
+  "--aggregate-results",
+  "--follow-all",
+  "--with-header",
+  "--smpl-module=pebs-ll",
+  "--ld-lat-threshold=4",
+  "--pebs-ll-dcmiss-code",
+  "--resolve-addresses",
+  "-emem_inst_retired:LATENCY_ABOVE_THRESHOLD",
+  "--long-smpl-periods=10000",
+  0  /* terminating NULL must be present */
+};
+
+const char *pfmon_amd_ll_args[PMU_TOOL_MAX_ARGS] = {
+  PFMON_CMD,
+  "--aggregate-results",
+  "--follow-all",
+  "-uk",
+  "--with-header",
+  "--smpl-module=ibs",
+  "--resolve-addresses",
+  "-eibsop_event:uops",
+  "--ibs-dcmiss-code",
+  "--long-smpl-periods=0xffff0",
+  0  /* terminating NULL must be present */
+};
+
+const char *pfmon_intel_brm_args[PMU_TOOL_MAX_ARGS] = {
+  PFMON_CMD,
+  "--aggregate-results",
+  "--follow-all",
+  "--with-header",
+  "--resolve-addresses",
+  "-eMISPREDICTED_BRANCH_RETIRED",
+  "--long-smpl-periods=10000",
+  0  /* terminating NULL must be present */
+};
+
+const char *pfmon_amd_brm_args[PMU_TOOL_MAX_ARGS] = {
+  PFMON_CMD,
+  "--aggregate-results",
+  "--follow-all",
+  "--with-header",
+  "--resolve-addresses",
+  "-eRETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS",
+  "--long-smpl-periods=10000",
+  0  /* terminating NULL must be present */
+};
+
+const char *addr2line_args[PMU_TOOL_MAX_ARGS] = {
+  ADDR2LINE_CMD,
+  "-e",
+  0  /* terminating NULL must be present */
+};
+
+
+enum pmu_tool_type
+{
+  PTT_PFMON,
+  PTT_LAST
+};
+
+enum pmu_event_type
+{
+  PET_INTEL_LOAD_LATENCY,
+  PET_AMD_LOAD_LATENCY,
+  PET_INTEL_BRANCH_MISPREDICT,
+  PET_AMD_BRANCH_MISPREDICT,
+  PET_LAST
+};
+
+typedef struct pmu_tool_fns {
+  const char *name;     /* name of the pmu tool */
+  /* pmu tool commandline argument.  */
+  const char **arg_array;
+  /* Initialize pmu module.  */
+  void *(*init_pmu_module) (void);
+  /* Start profiling.  */
+  void (*start_pmu_module) (pid_t ppid, char *tmpfile, const char **args);
+  /* Stop profiling.  */
+  void (*stop_pmu_module) (void);
+  /* How to parse the output generated by the PMU tool.  */
+  int (*parse_pmu_output) (char *filename, void *pmu_data);
+  /* How to write parsed pmu data into gcda file.  */
+  void (*gcov_write_pmu_data) (void *data);
+  /* How to cleanup any data structure created during parsing.  */
+  void (*cleanup_pmu_data) (void *data);
+  /* How to initialize symbolizer for the PPID.  */
+  int (*start_symbolizer) (pid_t ppid);
+  void (*end_symbolizer) (void);
+  char *(*symbolize) (void *addr);
+} pmu_tool_fns;
+
+enum pmu_state
+{
+  PMU_NONE,             /* Not configured at all.  */
+  PMU_INITIALIZED,      /* Configured and initialized.  */
+  PMU_ERROR,            /* Configuration error. Cannot recover.  */
+  PMU_ON,               /* Currently profiling.  */
+  PMU_OFF               /* Currently stopped, but can be restarted.  */
+};
+
+enum cpu_vendor_signature
+{
+  CPU_VENDOR_UKNOWN = 0,
+  CPU_VENDOR_INTEL  = 0x756e6547, /* Genu */
+  CPU_VENDOR_AMD    = 0x68747541 /* Auth */
+};
+
+/* Info about pmu tool during the run time.  */
+struct pmu_tool_info
+{
+  /* Current pmu tool.  */
+  enum pmu_tool_type tool;
+  /* Current event.  */
+  enum pmu_event_type event;
+  /* filename for storing the pmu profile.  */
+  char *pmu_profile_filename;
+  /* Intermediate file where the tool stores the PMU data.  */
+  char *raw_pmu_profile_filename;
+  /* Where PMU tool's stderr should be stored.  */
+  char *tool_stderr_filename;
+  enum pmu_state pmu_profiling_state;
+  enum cpu_vendor_signature cpu_vendor; /* as discovered by cpuid */
+  pid_t pmu_tool_pid;   /* process id of the pmu tool */
+  pid_t symbolizer_pid; /* process id of the symbolizer */
+  int symbolizer_to_pipefd[2]; /* pipe for writing to the symbolizer */
+  int symbolizer_from_pipefd[2];  /* pipe for reading from the symbolizer */
+  void *pmu_data;       /* an opaque pointer for the tool to store pmu data */
+  int verbose;          /* turn on additional debugging */
+  unsigned top_n_address;  /* how many addresses to symbolize */
+  pmu_tool_fns *tool_details;  /* functions to start/stop/parse */
+};
+
+/* Global struct for recordkeeping.  */
+static struct pmu_tool_info *the_pmu_tool_info;
+
+/* Additional info is printed if these are non-zero.  */
+static int tool_debug = 0;
+static int sym_debug = 0;
+
+static int parse_load_latency_line (char *line, gcov_pmu_ll_info_t *ll_info);
+static int parse_branch_mispredict_line (char *line,
+                                         gcov_pmu_brm_info_t *brm_info);
+static unsigned convert_pct_to_unsigned (float pct);
+static void start_pfmon_module (pid_t ppid, char *tmpfile, const char **pfmon_args);
+static void *init_pmu_load_latency (void);
+static void *init_pmu_branch_mispredict (void);
+static void destroy_load_latency_infos (void *info);
+static void destroy_branch_mispredict_infos (void *info);
+static int parse_pfmon_load_latency (char *filename, void *pmu_data);
+static int parse_pfmon_branch_mispredicts (char *filename, void *pmu_data);
+static gcov_unsigned_t gcov_tag_pmu_tool_header_length (gcov_pmu_tool_header_t
+                                                        *header);
+static void gcov_write_tool_header (gcov_pmu_tool_header_t *header);
+static void gcov_write_load_latency_infos (void *info);
+static void gcov_write_branch_mispredict_infos (void *info);
+static void gcov_write_ll_line (const gcov_pmu_ll_info_t *ll_info);
+static void gcov_write_branch_mispredict_line (const gcov_pmu_brm_info_t
+                                               *brm_info);
+static int start_addr2line_symbolizer (pid_t pid);
+static void end_addr2line_symbolizer (void);
+static char *symbolize_addr2line (void *p);
+static void reset_symbolizer_parent_pipes (void);
+static void reset_symbolizer_child_pipes (void);
+/* parse and cache relevant tool info.  */
+static int parse_pmu_profile_options (const char *options);
+static gcov_pmu_tool_header_t *parse_pfmon_tool_header (FILE *fp,
+                                                        const char *end_header);
+
+
+/* How to access the necessary functions for the PMU tools.  */
+pmu_tool_fns all_pmu_tool_fns[PTT_LAST][PET_LAST] = {
+  {
+    {
+      "intel-load-latency",             /* name */
+      pfmon_intel_ll_args,              /* tool args */
+      init_pmu_load_latency,            /* initialization */
+      start_pfmon_module,               /* start */
+      0,                                /* stop */
+      parse_pfmon_load_latency,         /* parse */
+      gcov_write_load_latency_infos,    /* write */
+      destroy_load_latency_infos,       /* cleanup */
+      start_addr2line_symbolizer,       /* start symbolizer */
+      end_addr2line_symbolizer,         /* end symbolizer */
+      symbolize_addr2line,              /* symbolize */
+    },
+    {
+      "amd-load-latency",               /* name */
+      pfmon_amd_ll_args,                /* tool args */
+      init_pmu_load_latency,            /* initialization */
+      start_pfmon_module,               /* start */
+      0,                                /* stop */
+      parse_pfmon_load_latency,         /* parse */
+      gcov_write_load_latency_infos,    /* write */
+      destroy_load_latency_infos,       /* cleanup */
+      start_addr2line_symbolizer,       /* start symbolizer */
+      end_addr2line_symbolizer,         /* end symbolizer */
+      symbolize_addr2line,              /* symbolize */
+    },
+    {
+      "intel-branch-mispredict",        /* name */
+      pfmon_intel_brm_args,             /* tool args */
+      init_pmu_branch_mispredict,       /* initialization */
+      start_pfmon_module,               /* start */
+      0,                                /* stop */
+      parse_pfmon_branch_mispredicts,   /* parse */
+      gcov_write_branch_mispredict_infos,/* write */
+      destroy_branch_mispredict_infos,  /* cleanup */
+      start_addr2line_symbolizer,       /* start symbolizer */
+      end_addr2line_symbolizer,         /* end symbolizer */
+      symbolize_addr2line,              /* symbolize */
+    },
+    {
+      "amd-branch-mispredict",          /* name */
+      pfmon_amd_brm_args,               /* tool args */
+      init_pmu_branch_mispredict,       /* initialization */
+      start_pfmon_module,               /* start */
+      0,                                /* stop */
+      parse_pfmon_branch_mispredicts,   /* parse */
+      gcov_write_branch_mispredict_infos,/* write */
+      destroy_branch_mispredict_infos,  /* cleanup */
+      start_addr2line_symbolizer,       /* start symbolizer */
+      end_addr2line_symbolizer,         /* end symbolizer */
+      symbolize_addr2line,              /* symbolize */
+    }
+  }
+};
+
+/* Determine the CPU vendor.  Currently only distinguishes x86 based
+   cpus where the vendor is either Intel or AMD.  Returns one of the
+   enum cpu_vendor_signature values.  */
+
+static unsigned int
+get_x86cpu_vendor (void)
+{
+  unsigned int vendor = CPU_VENDOR_UKNOWN;
+
+#if (defined (__x86_64__) || defined (__i386__))
+  if (__get_cpuid_max (0, &vendor) < 1)
+    return CPU_VENDOR_UKNOWN;      /* Cannot determine cpu type.  */
+#endif
+
+  if (vendor == CPU_VENDOR_INTEL || vendor == CPU_VENDOR_AMD)
+    return vendor;
+  else
+    return CPU_VENDOR_UKNOWN;
+}
+
+
+/* Parse PMU tool option string provided on the command line and store
+   information in global structure.  Return 0 on success, otherwise
+   return 1.  Any changes to this should be synced with
+   check_pmu_profile_options() which does compile time check.  */
+
+static int
+parse_pmu_profile_options (const char *options)
+{
+  enum pmu_tool_type ptt = the_pmu_tool_info->tool;
+  enum pmu_event_type pet = PET_LAST;
+  const char *pmutool_path;
+  the_pmu_tool_info->cpu_vendor =  get_x86cpu_vendor ();
+  /* Determine the platform we are running on.  */
+  if (the_pmu_tool_info->cpu_vendor == CPU_VENDOR_UKNOWN)
+    {
+      /* Cpuid failed or unknown vendor.  */
+      the_pmu_tool_info->pmu_profiling_state = PMU_ERROR;
+      return 1;
+    }
+
+  /* Validate the options.  */
+  if (strcmp(options, "load-latency") &&
+      strcmp(options, "load-latency-verbose") &&
+      strcmp(options, "branch-mispredict") &&
+      strcmp(options, "branch-mispredict-verbose"))
+    return 1;
+
+  /* Check if we are asked to collect load latency PMU data.  */
+  if (!strcmp(options, "load-latency") ||
+      !strcmp(options, "load-latency-verbose"))
+    {
+      if (the_pmu_tool_info->cpu_vendor == CPU_VENDOR_INTEL)
+        pet = PET_INTEL_LOAD_LATENCY;
+      else
+        pet = PET_AMD_LOAD_LATENCY;
+      if (!strcmp(options, "load-latency-verbose"))
+        the_pmu_tool_info->verbose = 1;
+    }
+
+  /* Check if we are asked to collect branch mispredict PMU data.  */
+  if (!strcmp(options, "branch-mispredict") ||
+      !strcmp(options, "branch-mispredict-verbose"))
+    {
+      if (the_pmu_tool_info->cpu_vendor == CPU_VENDOR_INTEL)
+        pet = PET_INTEL_BRANCH_MISPREDICT;
+      else
+        pet = PET_AMD_BRANCH_MISPREDICT;
+      if (!strcmp(options, "branch-mispredict-verbose"))
+        the_pmu_tool_info->verbose = 1;
+    }
+
+  the_pmu_tool_info->tool_details = &all_pmu_tool_fns[ptt][pet];
+  the_pmu_tool_info->event = pet;
+
+  /* Allow users to override the default tool path.  */
+  pmutool_path = getenv ("GCOV_PMUTOOL_PATH");
+  if (pmutool_path && strlen (pmutool_path))
+    the_pmu_tool_info->tool_details->arg_array[0] = pmutool_path;
+
+  return 0;
+}
+
+/* Do the initialization of addr2line symbolizer for the process id
+   given by TASK_PID.  It forks an addr2line process and creates two
+   pipes where addresses can be written and source_filename:line_num
+   entries can be read.  Returns 0 on success, non-zero otherwise.  */
+
+static int
+start_addr2line_symbolizer (pid_t task_pid)
+{
+  pid_t pid;
+  char *addr2line_path;
+
+  /* Allow users to override the default addr2line path.  */
+  addr2line_path = getenv ("GCOV_ADDR2LINE_PATH");
+  if (addr2line_path && strlen (addr2line_path))
+    addr2line_args[0] = addr2line_path;
+
+  if (pipe (the_pmu_tool_info->symbolizer_from_pipefd) == -1)
+    {
+      fprintf (stderr, "Cannot create pipe for reading from symbolizer.\n");
+      return 1;
+    }
+  if (pipe (the_pmu_tool_info->symbolizer_to_pipefd) == -1)
+    {
+      fprintf (stderr, "Cannot create pipe for writing to symbolizer.\n");
+      return 1;
+    }
+
+  pid = fork ();
+  if (pid == -1)
+    {
+      /* error condition */
+      fprintf (stderr, "Cannot create symbolizer process.\n");
+      reset_symbolizer_parent_pipes ();
+      reset_symbolizer_child_pipes ();
+      return 1;
+    }
+
+  if (pid == 0)
+    {
+      /* Child connects the pipes to stdin/stdout, then does an exec.  */
+      unsigned n_args = 0;
+      char proc_exe_buf[128];
+      int new_write_fd, new_read_fd;
+      int i;
+
+      /* Go over the current addr2line args.  */
+      for (i = 0; i < PMU_TOOL_MAX_ARGS && addr2line_args[i]; ++i)
+        n_args++;
+
+      /* We are going to add one more arg for the /proc/pid/exe */
+      if (n_args >= (PMU_TOOL_MAX_ARGS - 1))
+        {
+          fprintf (stderr, "too many addr2line args: %u\n", n_args);
+          _exit (0);
+        }
+      snprintf (proc_exe_buf, sizeof (proc_exe_buf), "/proc/%ld/exe",
+                (long)task_pid);
+
+      /* Add the extra arg for the process id.  */
+      addr2line_args[n_args] = proc_exe_buf;
+      n_args++;
+
+      addr2line_args[n_args] = (const char *)NULL;  /* terminating NULL */
+
+      if (sym_debug)
+        {
+          fprintf (stderr, "addr2line args:");
+          for (i = 0; i < PMU_TOOL_MAX_ARGS && addr2line_args[i]; ++i)
+            fprintf (stderr, " %s", addr2line_args[i]);
+          fprintf (stderr, "\n");
+        }
+
+      /* Close unused ends of the two pipes.  */
+      reset_symbolizer_child_pipes ();
+
+      /* Connect the pipes to stdin/stdout of the child process.  */
+      new_read_fd = dup2 (the_pmu_tool_info->symbolizer_to_pipefd[0], 0);
+      new_write_fd = dup2 (the_pmu_tool_info->symbolizer_from_pipefd[1], 1);
+      if (new_read_fd == -1 || new_write_fd == -1)
+        {
+          fprintf (stderr, "could not dup symbolizer fds\n");
+          reset_symbolizer_parent_pipes ();
+          reset_symbolizer_child_pipes ();
+          _exit (0);
+        }
+      the_pmu_tool_info->symbolizer_to_pipefd[0] = new_read_fd;
+      the_pmu_tool_info->symbolizer_from_pipefd[1] = new_write_fd;
+
+      /* Do execve with NULL env. */
+      execve (addr2line_args[0], (char * const*)addr2line_args,
+              (char * const*)NULL);
+      /* exec returned, an error condition.  */
+      fprintf (stderr, "could not create symbolizer process: %s\n",
+               addr2line_args[0]);
+      reset_symbolizer_parent_pipes ();
+      reset_symbolizer_child_pipes ();
+      _exit (0);
+    }
+  else
+    {
+      /* parent */
+      the_pmu_tool_info->symbolizer_pid = pid;
+      /* Close unused ends of the two pipes.  */
+      reset_symbolizer_parent_pipes ();
+      return 0;
+    }
+  return 0;
+}
+
+/* Close unused write end of the from-pipe and read end of the
+   to-pipe.  */
+
+static void
+reset_symbolizer_parent_pipes (void)
+{
+  if (the_pmu_tool_info->symbolizer_from_pipefd[1] != -1)
+    {
+      close (the_pmu_tool_info->symbolizer_from_pipefd[1]);
+      the_pmu_tool_info->symbolizer_from_pipefd[1] = -1;
+    }
+  if (the_pmu_tool_info->symbolizer_to_pipefd[0] != -1)
+    {
+      close (the_pmu_tool_info->symbolizer_to_pipefd[0]);
+      the_pmu_tool_info->symbolizer_to_pipefd[0] = -1;
+    }
+}
+
+/* Close unused write end of the to-pipe and read end of the
+   from-pipe.  */
+
+static void
+reset_symbolizer_child_pipes (void)
+{
+  if (the_pmu_tool_info->symbolizer_to_pipefd[1] != -1)
+    {
+      close (the_pmu_tool_info->symbolizer_to_pipefd[1]);
+      the_pmu_tool_info->symbolizer_to_pipefd[1] = -1;
+    }
+  if (the_pmu_tool_info->symbolizer_from_pipefd[0] != -1)
+    {
+      close (the_pmu_tool_info->symbolizer_from_pipefd[0]);
+      the_pmu_tool_info->symbolizer_from_pipefd[0] = -1;
+    }
+}
+
+
+/* Perform cleanup for the symbolizer process.  */
+
+static void
+end_addr2line_symbolizer (void)
+{
+  int pid_status;
+  int wait_status;
+  pid_t pid = the_pmu_tool_info->symbolizer_pid;
+
+  /* Symbolizer was not running.  */
+  if (!pid)
+    return;
+
+  reset_symbolizer_parent_pipes ();
+  reset_symbolizer_child_pipes ();
+  kill (pid, SIGTERM);
+  wait_status = waitpid (pid, &pid_status, 0);
+  if (sym_debug)
+  {
+    if (wait_status == pid)
+      fprintf (stderr, "Normal exit. symbolizer terminated.\n");
+    else
+      fprintf (stderr, "Abnormal exit. symbolizer status, %d.\n", pid_status);
+  }
+  the_pmu_tool_info->symbolizer_pid = 0;  /* Symbolizer no longer running.  */
+}
+
+
+/* Given an address ADDR, return a string containing
+   source_filename:line_num entries.  */
+
+static char *
+symbolize_addr2line (void *addr)
+{
+  char buf[32];  /* holds the ascii version of address */
+  int write_count;
+  int read_count;
+  char *srcfile_linenum;
+  size_t max_length = 1024;
+
+  if (!the_pmu_tool_info->symbolizer_pid)
+    return default_addr2line;    /* symbolizer is not running */
+
+  write_count = snprintf (buf, sizeof (buf), "%p\n", addr);
+
+  /* Write the address into the pipe.  */
+  if (write (the_pmu_tool_info->symbolizer_to_pipefd[1], buf, write_count)
+      < write_count)
+    {
+      if (sym_debug)
+        fprintf (stderr, "Cannot write symbolizer pipe.\n");
+      return default_addr2line;
+    }
+
+  srcfile_linenum = XNEWVEC (char, max_length);
+  read_count = read (the_pmu_tool_info->symbolizer_from_pipefd[0],
+                     srcfile_linenum, max_length - 1);
+  if (read_count == -1)
+    {
+      if (sym_debug)
+        fprintf (stderr, "Cannot read symbolizer pipe.\n");
+      XDELETEVEC (srcfile_linenum);
+      return default_addr2line;
+    }
+
+  srcfile_linenum[read_count] = 0;
+  if (sym_debug)
+    fprintf (stderr, "symbolizer: for address %p, read_count %d, got %s\n",
+             addr, read_count, srcfile_linenum);
+  return srcfile_linenum;
+}
+
+/* Start monitoring PPID process via pfmon tool using TMPFILE as a
+   file to store the raw data and using PFMON_ARGS as the command line
+   arguments.  */
+
+static void
+start_pfmon_module (pid_t ppid, char *tmpfile, const char **pfmon_args)
+{
+  int i;
+  unsigned int n_args = 0;
+  unsigned n_chars;
+  char pid_buf[64];
+  char filename_buf[1024];
+  char top_n_buf[24];
+  unsigned extra_args;
+
+  /* Go over the current pfmon args */
+  for (i = 0; i < PMU_TOOL_MAX_ARGS && pfmon_args[i]; ++i)
+    n_args++;
+
+  if (the_pmu_tool_info->verbose)
+    extra_args = 4; /* account for additional --verbose */
+  else
+    extra_args = 3;
+
+  /* We are going to add args.  */
+  if (n_args >= (PMU_TOOL_MAX_ARGS - extra_args))
+    {
+      fprintf (stderr, "too many pfmon args: %u\n", n_args);
+      _exit (0);
+    }
+
+  n_chars = snprintf (pid_buf, sizeof (pid_buf), "--attach-task=%ld",
+                      (long)ppid);
+  if (n_chars >= sizeof (pid_buf))
+    {
+      fprintf (stderr, "pfmon task id too long: %s\n", pid_buf);
+      return;
+    }
+  pfmon_args[n_args] = pid_buf;
+  n_args++;
+
+  n_chars = snprintf (filename_buf, sizeof (filename_buf), "--smpl-outfile=%s",
+                      tmpfile);
+  if (n_chars >= sizeof (filename_buf))
+    {
+      fprintf (stderr, "pfmon filename too long: %s\n", filename_buf);
+      return;
+    }
+  pfmon_args[n_args] = filename_buf;
+  n_args++;
+
+  n_chars = snprintf (top_n_buf, sizeof (top_n_buf), "--smpl-show-top=%u",
+                      the_pmu_tool_info->top_n_address);
+  if (n_chars >= sizeof (top_n_buf))
+    {
+      fprintf (stderr, "pfmon option too long: %s\n", top_n_buf);
+      return;
+    }
+  pfmon_args[n_args] = top_n_buf;
+  n_args++;
+
+  if (the_pmu_tool_info->verbose) {
+    /* Add --verbose as well.  */
+    pfmon_args[n_args] = "--verbose";
+    n_args++;
+  }
+  pfmon_args[n_args] = (char *)NULL;
+
+  if (tool_debug)
+    {
+      fprintf (stderr, "pfmon args:");
+      for (i = 0; i < PMU_TOOL_MAX_ARGS && pfmon_args[i]; ++i)
+        fprintf (stderr, " %s", pfmon_args[i]);
+      fprintf (stderr, "\n");
+    }
+  /* Do execve with NULL env.  */
+  execve (pfmon_args[0], (char *const *)pfmon_args, (char * const*)NULL);
+  /* execve returns only on failure; the caller reports the error.  */
+}
+
+/* Convert a fractional PCT to an unsigned integer after
+   multiplying by 100.  */
+
+static unsigned
+convert_pct_to_unsigned (float pct)
+{
+  return (unsigned)(pct * 100.0f);
+}
+
+/* Parse the load latency info pointed to by LINE and save it into
+   LL_INFO. Returns 0 if the line was parsed successfully, non-zero
+   otherwise.
+
+   An example header and line look like this:
+   "counts   %self    %cum     <10     <32     <64    <256   <1024  >=1024
+   %wself          code addr symbol"
+   "218  24.06%  24.06% 100.00%   0.00%   0.00%   0.00%   0.00%   0.00%  22.70%
+   0x0000000000413e75 CalcSSIM(...)+965</tmp/psnr>"
+*/
+
+static int
+parse_load_latency_line (char *line, gcov_pmu_ll_info_t *ll_info)
+{
+  unsigned counts;
+  /* These are percentages parsed as floats, but then converted to
+     integers after multiplying by 100.  */
+  float self, cum, lt_10, lt_32, lt_64, lt_256, lt_1024, gt_1024, wself;
+  long unsigned int p;
+  int n_values;
+  pmu_tool_fns *tool_details = the_pmu_tool_info->tool_details;
+
+  n_values = sscanf (line, "%u%f%%%f%%%f%%%f%%%f%%%f%%%f%%%f%%%f%%%lx",
+                     &counts, &self, &cum, &lt_10, &lt_32, &lt_64, &lt_256,
+                     &lt_1024, &gt_1024, &wself, &p);
+  if (n_values != 11)
+    return 1;
+
+  /* Values read successfully. Do the assignment after converting
+   * percentages into ints.  */
+  ll_info->counts = counts;
+  ll_info->self = convert_pct_to_unsigned (self);
+  ll_info->cum = convert_pct_to_unsigned (cum);
+  ll_info->lt_10 = convert_pct_to_unsigned (lt_10);
+  ll_info->lt_32 = convert_pct_to_unsigned (lt_32);
+  ll_info->lt_64 = convert_pct_to_unsigned (lt_64);
+  ll_info->lt_256 = convert_pct_to_unsigned (lt_256);
+  ll_info->lt_1024 = convert_pct_to_unsigned (lt_1024);
+  ll_info->gt_1024 = convert_pct_to_unsigned (gt_1024);
+  ll_info->wself = convert_pct_to_unsigned (wself);
+  ll_info->code_addr = p;
+
+  /* Run the raw address through the symbolizer.  */
+  if (tool_details->symbolize)
+    {
+      char *sym_info = tool_details->symbolize ((void *)p);
+      /* sym_info is of the form src_filename:linenum.  Discriminator is
+         currently not supported by addr2line.  */
+      char *sep = strchr (sym_info, ':');
+      if (!sep)
+        {
+          /* Assume entire string is srcfile.  */
+          ll_info->filename = (char *)sym_info;
+          ll_info->line = 0;
+        }
+      else
+        {
+          /* Terminate the filename string at the separator.  */
+          *sep = 0;
+          ll_info->filename = (char *)sym_info;
+          /* Convert rest of the sym info to a line number.  */
+          ll_info->line = atol (sep+1);
+        }
+      ll_info->discriminator = 0;
+    }
+  else
+    {
+      /* No symbolizer available.  */
+      ll_info->filename = NULL;
+      ll_info->line = 0;
+      ll_info->discriminator = 0;
+    }
+  return 0;
+}
+
+/* Parse the branch mispredict info pointed to by LINE and save it into
+   BRM_INFO. Returns 0 if the line was parsed successfully, non-zero
+   otherwise.
+
+   An example header and line look like this:
+   "counts   %self    %cum          code addr symbol"
+   "6869  37.67%  37.67% 0x00000000004007e5 sum(std::vector<int*,
+    std::allocator<int*> > const&)+51</root/tmp/array>"
+*/
+
+static int
+parse_branch_mispredict_line (char *line, gcov_pmu_brm_info_t *brm_info)
+{
+  unsigned counts;
+  /* These are percentages parsed as floats, but then converted to
+     ints after multiplying by 100.  */
+  float self, cum;
+  long unsigned int p;
+  int n_values;
+  pmu_tool_fns *tool_details = the_pmu_tool_info->tool_details;
+
+  n_values = sscanf (line, "%u%f%%%f%%%lx",
+                     &counts, &self, &cum, &p);
+  if (n_values != 4)
+    return 1;
+
+  /* Values read successfully. Do the assignment after converting
+   * percentages into ints.  */
+  brm_info->counts = counts;
+  brm_info->self = convert_pct_to_unsigned (self);
+  brm_info->cum = convert_pct_to_unsigned (cum);
+  brm_info->code_addr = p;
+
+  /* Run the raw address through the symbolizer.  */
+  if (tool_details->symbolize)
+    {
+      char *sym_info = tool_details->symbolize ((void *)p);
+      /* sym_info is of the form src_filename:linenum.  Discriminator is
+         currently not supported by addr2line.  */
+      char *sep = strchr (sym_info, ':');
+      if (!sep)
+        {
+          /* Assume entire string is srcfile.  */
+          brm_info->filename = sym_info;
+          brm_info->line = 0;
+        }
+      else
+        {
+          /* Terminate the filename string at the separator.  */
+          *sep = 0;
+          brm_info->filename = sym_info;
+          /* Convert rest of the sym info to a line number.  */
+          brm_info->line = atol (sep+1);
+        }
+      brm_info->discriminator = 0;
+    }
+  else
+    {
+      /* No symbolizer available.  */
+      brm_info->filename = NULL;
+      brm_info->line = 0;
+      brm_info->discriminator = 0;
+    }
+  return 0;
+}
+
+/* Delete load latency info structures INFO.  */
+
+static void
+destroy_load_latency_infos (void *info)
+{
+  unsigned i;
+  ll_infos_t* ll_infos = (ll_infos_t *)info;
+
+  /* delete each element */
+  for (i = 0; i < ll_infos->ll_count; ++i)
+    XDELETE (ll_infos->ll_array[i]);
+  /* delete the array itself */
+  XDELETE (ll_infos->ll_array);
+  __destroy_pmu_tool_header (ll_infos->pmu_tool_header);
+  free (ll_infos->pmu_tool_header);
+  ll_infos->ll_array = 0;
+  ll_infos->ll_count = 0;
+}
+
+/* Delete branch mispredict structure INFO.  */
+
+static void
+destroy_branch_mispredict_infos (void *info)
+{
+  unsigned i;
+  brm_infos_t* brm_infos = (brm_infos_t *)info;
+
+  /* delete each element */
+  for (i = 0; i < brm_infos->brm_count; ++i)
+    XDELETE (brm_infos->brm_array[i]);
+  /* delete the array itself */
+  XDELETE (brm_infos->brm_array);
+  __destroy_pmu_tool_header (brm_infos->pmu_tool_header);
+  free (brm_infos->pmu_tool_header);
+  brm_infos->brm_array = 0;
+  brm_infos->brm_count = 0;
+}
+
+/* Parse FILENAME for load latency lines into a structure
+   PMU_DATA. Returns 0 on success.  Returns non-zero on
+   failure.  */
+
+static int
+parse_pfmon_load_latency (char *filename, void *pmu_data)
+{
+  FILE *fp;
+  size_t buflen = 2*1024;
+  char *buf;
+  ll_infos_t *load_latency_infos = (ll_infos_t *)pmu_data;
+  gcov_pmu_tool_header_t *tool_header = 0;
+
+  if ((fp = fopen (filename, "r")) == NULL)
+    {
+      fprintf (stderr, "cannot open pmu data file: %s\n", filename);
+      return 1;
+    }
+
+  if (!(tool_header = parse_pfmon_tool_header (fp, pfmon_ll_header)))
+    {
+      fprintf (stderr, "cannot parse pmu data file header: %s\n", filename);
+      fclose (fp);
+      return 1;
+    }
+
+  buf = XNEWVEC (char, buflen);
+  while (fgets (buf, buflen, fp))
+    {
+      gcov_pmu_ll_info_t *ll_info = XNEW (gcov_pmu_ll_info_t);
+      if (!parse_load_latency_line (buf, ll_info))
+        {
+          /* valid line, add to the array */
+          load_latency_infos->ll_count++;
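+          if (load_latency_infos->ll_count >=
+              load_latency_infos->alloc_ll_count)
+            {
+              /* Need to realloc: double the capacity.  The size passed
+                 to realloc is in bytes, not in elements.  */
+              load_latency_infos->alloc_ll_count *= 2;
+              load_latency_infos->ll_array =
+                realloc (load_latency_infos->ll_array,
+                         load_latency_infos->alloc_ll_count
+                         * sizeof (load_latency_infos->ll_array[0]));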
+              if (load_latency_infos->ll_array == NULL)
+                {
+                  fprintf (stderr, "Cannot allocate load latency memory.\n");
+                  __destroy_pmu_tool_header (tool_header);
+                  free (buf);
+                  fclose (fp);
+                  return 1;
+                }
+            }
+          load_latency_infos->ll_array[load_latency_infos->ll_count - 1] =
+            ll_info;
+        }
+      else
+        /* Delete invalid line.  */
+        XDELETE (ll_info);
+    }
+  free (buf);
+  fclose (fp);
+  load_latency_infos->pmu_tool_header = tool_header;
+  return 0;
+}
+
+/* Parse open file FP until END_HEADER is seen. The data matching
+   gcov_pmu_tool_header_t fields is saved and returned in a new
+   struct. In case of failure, it returns NULL.  */
+
+static gcov_pmu_tool_header_t *
+parse_pfmon_tool_header (FILE *fp, const char *end_header)
+{
+  static const char tag_hostname[] = "# hostname: ";
+  static const char tag_kversion[] = "# kernel version: ";
+  static const char tag_hostcpu[] = "# host CPUs:  ";
+  static const char tag_column_desc_start[] = "# description of columns:";
+  static const char tag_column_desc_end[] =
+      "#	other columns are self-explanatory";
+  size_t buflen = 4*1024;
+  char *buf, *buf_start, *buf_end;
+  gcov_pmu_tool_header_t *tool_header = XNEWVEC (gcov_pmu_tool_header_t, 1);
+  char *hostname = 0;
+  char *kversion = 0;
+  char *hostcpu = 0;
+  char *column_description = 0;
+  char *column_desc_start = 0;
+  char *column_desc_end = 0;
+  const char *column_header = 0;
+  int got_hostname = 0;
+  int got_kversion = 0;
+  int got_hostcpu = 0;
+  int got_end_header = 0;
+  int got_column_description = 0;
+
+  buf = XNEWVEC (char, buflen);
+  buf_start = buf;
+  buf_end = buf + buflen;
+  while (buf < (buf_end - 1) && fgets (buf, buf_end - buf, fp))
+    {
+      if (strncmp (end_header, buf, buf_end - buf) == 0)
+      {
+        got_end_header = 1;
+        break;
+      }
+      if (!got_hostname &&
+          strncmp (buf, tag_hostname, strlen (tag_hostname)) == 0)
+        {
+          size_t len = strlen (buf) - strlen (tag_hostname);
+          hostname = XNEWVEC (char, len);
+          memcpy (hostname, buf + strlen (tag_hostname), len);
+          hostname[len - 1] = 0;
+          tool_header->hostname = hostname;
+          got_hostname = 1;
+        }
+
+      if (!got_kversion &&
+          strncmp (buf, tag_kversion, strlen (tag_kversion)) == 0)
+        {
+          size_t len = strlen (buf) - strlen (tag_kversion);
+          kversion = XNEWVEC (char, len);
+          memcpy (kversion, buf + strlen (tag_kversion), len);
+          kversion[len - 1] = 0;
+          tool_header->kernel_version = kversion;
+          got_kversion = 1;
+        }
+
+      if (!got_hostcpu &&
+          strncmp (buf, tag_hostcpu, strlen (tag_hostcpu)) == 0)
+        {
+          size_t len = strlen (buf) - strlen (tag_hostcpu);
+          hostcpu = XNEWVEC (char, len);
+          memcpy (hostcpu, buf + strlen (tag_hostcpu), len);
+          hostcpu[len - 1] = 0;
+          tool_header->host_cpu = hostcpu;
+          got_hostcpu = 1;
+        }
+      if (!got_column_description &&
+          strncmp (buf, tag_column_desc_start, strlen (tag_column_desc_start))
+          == 0)
+        {
+          column_desc_start = buf;
+          column_desc_end = 0;
+          /* Continue reading until end of the column descriptor.  */
+          while (buf < (buf_end - 1) && fgets (buf, buf_end - buf, fp))
+            {
+              if (strncmp (buf, tag_column_desc_end,
+                           strlen (tag_column_desc_end)) == 0)
+                {
+                  column_desc_end = buf + strlen (tag_column_desc_end);
+                  break;
+                }
+              buf += strlen (buf);
+            }
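+          if (column_desc_end)
+            {
+              /* Found the end, copy it into a new string.  Copy exactly
+                 up to COLUMN_DESC_END; strcpy would also copy the rest
+                 of the last line and overrun the allocation.  */
+              size_t desc_len = column_desc_end - column_desc_start;
+              column_description = XNEWVEC (char, desc_len + 1);
+              got_column_description = 1;
+              memcpy (column_description, column_desc_start, desc_len);
+              column_description[desc_len] = 0;
+              tool_header->column_description = column_description;
+            }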
+        }
+      buf += strlen (buf);
+    }
+
+  /* If we are missing any of the fields, return NULL.  */
+  if (!got_end_header || !got_hostname || !got_kversion || !got_hostcpu
+      || !got_column_description)
+    {
+      free (hostname);
+      free (kversion);
+      free (hostcpu);
+      free (column_description);
+      free (buf_start);
+      free (tool_header);
+      return NULL;
+    }
+
+  switch (the_pmu_tool_info->event)
+    {
+    case PET_INTEL_LOAD_LATENCY:
+    case PET_AMD_LOAD_LATENCY:
+      column_header = pfmon_ll_header;
+      break;
+    case PET_INTEL_BRANCH_MISPREDICT:
+    case PET_AMD_BRANCH_MISPREDICT:
+      column_header = pfmon_bm_header;
+      break;
+    default:
+      break;
+    }
+  tool_header->column_header = strdup (column_header);
+  tool_header->full_header = buf_start;
+  return tool_header;
+}
+
+
+/* Parse FILENAME for branch mispredict lines into a structure
+   PMU_DATA. Returns 0 on success.  Returns non-zero on
+   failure.  */
+
+static int
+parse_pfmon_branch_mispredicts (char *filename, void *pmu_data)
+{
+  FILE *fp;
+  size_t buflen = 2*1024;
+  char *buf;
+  brm_infos_t *brm_infos = (brm_infos_t *)pmu_data;
+  gcov_pmu_tool_header_t *tool_header = 0;
+
+  if ((fp = fopen (filename, "r")) == NULL)
+    {
+      fprintf (stderr, "cannot open pmu data file: %s\n", filename);
+      return 1;
+    }
+
+  if (!(tool_header = parse_pfmon_tool_header (fp, pfmon_bm_header)))
+    {
+      fprintf (stderr, "cannot parse pmu data file header: %s\n", filename);
+      fclose (fp);
+      return 1;
+    }
+
+  buf = XNEWVEC (char, buflen);
+  while (fgets (buf, buflen, fp))
+    {
+      gcov_pmu_brm_info_t *brm = XNEW (gcov_pmu_brm_info_t);
+      if (!parse_branch_mispredict_line (buf, brm))
+        {
+          /* Valid line, add to the array.  */
+          brm_infos->brm_count++;
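+          if (brm_infos->brm_count >= brm_infos->alloc_brm_count)
+            {
+              /* Need to realloc: double the capacity.  The size passed
+                 to realloc is in bytes, not in elements.  */
+              brm_infos->alloc_brm_count *= 2;
+              brm_infos->brm_array =
+                realloc (brm_infos->brm_array,
+                         brm_infos->alloc_brm_count
+                         * sizeof (brm_infos->brm_array[0]));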
+              if (brm_infos->brm_array == NULL) {
+                fprintf (stderr,
+                         "Cannot allocate memory for br mispredicts.\n");
+                __destroy_pmu_tool_header (tool_header);
+                free (buf);
+                fclose (fp);
+                return 1;
+              }
+            }
+          brm_infos->brm_array[brm_infos->brm_count - 1] = brm;
+        }
+      else
+        /* Delete invalid line.  */
+        XDELETE (brm);
+    }
+  free (buf);
+  fclose (fp);
+  brm_infos->pmu_tool_header = tool_header;
+  return 0;
+}
+
+/* Start the monitoring process using pmu tool. Return 0 on success,
+   non-zero otherwise.  */
+
+static int
+pmu_start (void)
+{
+  pid_t pid;
+
+  /* no start function */
+  if (!the_pmu_tool_info->tool_details->start_pmu_module)
+    return 1;
+
+  pid = fork ();
+  if (pid == -1)
+    {
+      /* error condition */
+      fprintf (stderr, "Cannot create PMU profiling process, exiting.\n");
+      return 1;
+    }
+  else if (pid == 0)
+    {
+      /* child */
+      pid_t ppid = getppid ();
+      char *tmpfile = the_pmu_tool_info->raw_pmu_profile_filename;
+      const char **pfmon_args = the_pmu_tool_info->tool_details->arg_array;
+      int new_stderr_fd;
+
+      /* Redirect stderr from the child process into a separate file.  */
+      new_stderr_fd = creat (the_pmu_tool_info->tool_stderr_filename,
+                             S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH);
+      if (new_stderr_fd != -1)
+        dup2 (new_stderr_fd, 2);
+      /* The following does an exec and thus is not expected to return.  */
+      the_pmu_tool_info->tool_details->start_pmu_module (ppid, tmpfile,
+                                                         pfmon_args);
+      /* exec returned, an error condition.  */
+      fprintf (stderr, "could not create profiling process: %s\n",
+               the_pmu_tool_info->tool_details->arg_array[0]);
+      _exit (1);
+    }
+  else
+    {
+      /* parent */
+      the_pmu_tool_info->pmu_tool_pid = pid;
+      return 0;
+    }
+}
+
+/* Allocate and initialize pmu load latency structure.  */
+
+static void *
+init_pmu_load_latency (void)
+{
+  ll_infos_t *load_latency = XNEWVEC (ll_infos_t, 1);
+  load_latency->ll_count = 0;
+  load_latency->alloc_ll_count = 64;
+  load_latency->ll_array = XNEWVEC (gcov_pmu_ll_info_t *,
+                                    load_latency->alloc_ll_count);
+  return (void *)load_latency;
+}
+
+/* Allocate and initialize pmu branch mispredict structure.  */
+
+static void *
+init_pmu_branch_mispredict (void)
+{
+  brm_infos_t *brm_info = XNEWVEC (brm_infos_t, 1);
+  brm_info->brm_count = 0;
+  brm_info->alloc_brm_count = 64;
+  brm_info->brm_array = XNEWVEC (gcov_pmu_brm_info_t *,
+                                 brm_info->alloc_brm_count);
+  return (void *)brm_info;
+}
+
+/* Initialize pmu tool based upon PMU_INFO. Sets the appropriate tool
+   type in the global the_pmu_tool_info.  */
+
+static int
+init_pmu_tool (struct gcov_pmu_info *pmu_info)
+{
+  the_pmu_tool_info->pmu_profiling_state = PMU_NONE;
+  the_pmu_tool_info->verbose = 0;
+  the_pmu_tool_info->tool = PTT_PFMON;  /* we support only pfmon */
+  the_pmu_tool_info->pmu_tool_pid = 0;
+  the_pmu_tool_info->top_n_address = pmu_info->pmu_top_n_address;
+  the_pmu_tool_info->symbolizer_pid = 0;
+  the_pmu_tool_info->symbolizer_to_pipefd[0] = -1;
+  the_pmu_tool_info->symbolizer_to_pipefd[1] = -1;
+  the_pmu_tool_info->symbolizer_from_pipefd[0] = -1;
+  the_pmu_tool_info->symbolizer_from_pipefd[1] = -1;
+
+  if (parse_pmu_profile_options (pmu_info->pmu_tool))
+    return 1;
+
+  if (the_pmu_tool_info->pmu_profiling_state == PMU_ERROR)
+    {
+      fprintf (stderr, "Unsupported PMU module: %s, disabling PMU profiling.\n",
+               pmu_info->pmu_tool);
+      return 1;
+    }
+
+  if (the_pmu_tool_info->tool_details->init_pmu_module)
+    /* initialize module */
+    the_pmu_tool_info->pmu_data =
+      the_pmu_tool_info->tool_details->init_pmu_module ();
+  return 0;
+}
+
+/* Initialize PMU profiling based upon the information passed in
+   PMU_INFO and use pmu_profile_filename as the file to store the PMU
+   profile.  This is called multiple times from libgcov, once per
+   object file.  We need to make sure to do the necessary
+   initialization only the first time.  For subsequent invocations it
+   behaves as a NOOP.  */
+
+void
+__gcov_init_pmu_profiler (struct gcov_pmu_info *pmu_info)
+{
+  char *raw_pmu_profile_filename;
+  char *tool_stderr_filename;
+  if (!pmu_info || !pmu_info->pmu_profile_filename || !pmu_info->pmu_tool)
+    return;
+
+  /* Allocate the global structure on first invocation.  */
+  if (!the_pmu_tool_info)
+    {
+      the_pmu_tool_info = XNEWVEC (struct pmu_tool_info, 1);
+      if (!the_pmu_tool_info)
+        {
+          fprintf (stderr, "Error allocating memory for PMU tool\n");
+          return;
+        }
+      if (init_pmu_tool (pmu_info))
+        {
+          /* Initialization error.  */
+          XDELETE (the_pmu_tool_info);
+          the_pmu_tool_info = 0;
+          return;
+        }
+    }
+
+  switch (the_pmu_tool_info->pmu_profiling_state)
+    {
+    case PMU_NONE:
+      the_pmu_tool_info->pmu_profile_filename =
+        strdup (pmu_info->pmu_profile_filename);
+      /* Construct an intermediate filename by substituting trailing
+         '.gcda' with '.pmud'.  */
+      raw_pmu_profile_filename = strdup (pmu_info->pmu_profile_filename);
+      if (raw_pmu_profile_filename == NULL)
+        {
+          fprintf (stderr, "Cannot allocate memory\n");
+          exit (1);
+        }
+      strcpy (raw_pmu_profile_filename + strlen (raw_pmu_profile_filename) - 4,
+              "pmud");
+
+      /* Construct a filename for collecting PMU tool's stderr by
+         substituting trailing '.gcda' with '.stderr'.  */
+      tool_stderr_filename =
+        XNEWVEC (char, strlen (pmu_info->pmu_profile_filename) + 1 + 2);
+      strcpy (tool_stderr_filename, pmu_info->pmu_profile_filename);
+      strcpy (tool_stderr_filename + strlen (tool_stderr_filename) - 4,
+              "stderr");
+      the_pmu_tool_info->raw_pmu_profile_filename = raw_pmu_profile_filename;
+      the_pmu_tool_info->tool_stderr_filename = tool_stderr_filename;
+      the_pmu_tool_info->pmu_profiling_state = PMU_INITIALIZED;
+      break;
+
+    case PMU_INITIALIZED:
+    case PMU_OFF:
+    case PMU_ON:
+    case PMU_ERROR:
+      break;
+    default:
+      break;
+    }
+}
+
+/* Start PMU profiling.  It updates the current state.  */
+
+void
+__gcov_start_pmu_profiler (void)
+{
+  if (!the_pmu_tool_info)
+    return;
+
+  switch (the_pmu_tool_info->pmu_profiling_state)
+    {
+    case PMU_INITIALIZED:
+      if (!pmu_start ())
+        the_pmu_tool_info->pmu_profiling_state = PMU_ON;
+      else
+        the_pmu_tool_info->pmu_profiling_state = PMU_ERROR;
+      break;
+
+    case PMU_NONE:
+      /* PMU was not properly initialized, don't attempt to start it.  */
+      the_pmu_tool_info->pmu_profiling_state = PMU_ERROR;
+      break;
+
+    case PMU_OFF:
+      /* Restarting PMU is not yet supported.  */
+    case PMU_ON:
+      /* Do nothing.  */
+    case PMU_ERROR:
+      break;
+
+    default:
+      break;
+    }
+}
+
+/* Stop PMU profiling.  Currently it doesn't do anything except
+   bookkeeping.  */
+
+void
+__gcov_stop_pmu_profiler (void)
+{
+  if (!the_pmu_tool_info)
+    return;
+
+  if (the_pmu_tool_info->tool_details->stop_pmu_module)
+    the_pmu_tool_info->tool_details->stop_pmu_module ();
+  if (the_pmu_tool_info->pmu_profiling_state == PMU_ON)
+    the_pmu_tool_info->pmu_profiling_state = PMU_OFF;
+}
+
+/* Write the load latency information LL_INFO into the gcda file.  */
+
+static void
+gcov_write_ll_line (const gcov_pmu_ll_info_t *ll_info)
+{
+  gcov_unsigned_t len = GCOV_TAG_PMU_LOAD_LATENCY_LENGTH (ll_info->filename);
+  gcov_write_tag_length (GCOV_TAG_PMU_LOAD_LATENCY_INFO, len);
+  gcov_write_unsigned (ll_info->counts);
+  gcov_write_unsigned (ll_info->self);
+  gcov_write_unsigned (ll_info->cum);
+  gcov_write_unsigned (ll_info->lt_10);
+  gcov_write_unsigned (ll_info->lt_32);
+  gcov_write_unsigned (ll_info->lt_64);
+  gcov_write_unsigned (ll_info->lt_256);
+  gcov_write_unsigned (ll_info->lt_1024);
+  gcov_write_unsigned (ll_info->gt_1024);
+  gcov_write_unsigned (ll_info->wself);
+  gcov_write_counter (ll_info->code_addr);
+  gcov_write_unsigned (ll_info->line);
+  gcov_write_unsigned (ll_info->discriminator);
+  gcov_write_string (ll_info->filename);
+}
+
+
+/* Write the branch mispredict information BRM_INFO into the gcda file.  */
+
+static void
+gcov_write_branch_mispredict_line (const gcov_pmu_brm_info_t *brm_info)
+{
+  gcov_unsigned_t len = GCOV_TAG_PMU_BRANCH_MISPREDICT_LENGTH (
+      brm_info->filename);
+  gcov_write_tag_length (GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO, len);
+  gcov_write_unsigned (brm_info->counts);
+  gcov_write_unsigned (brm_info->self);
+  gcov_write_unsigned (brm_info->cum);
+  gcov_write_counter (brm_info->code_addr);
+  gcov_write_unsigned (brm_info->line);
+  gcov_write_unsigned (brm_info->discriminator);
+  gcov_write_string (brm_info->filename);
+}
+
+/* Write load latency information INFO into the gcda file.  The gcda
+   file has already been opened and is available for writing.  */
+
+static void
+gcov_write_load_latency_infos (void *info)
+{
+  unsigned i;
+  const ll_infos_t *ll_infos = (const ll_infos_t *)info;
+  gcov_unsigned_t stamp = 0;  /* Don't use stamp as we don't support merge.  */
+  /* We don't support merge, and instead always rewrite the file.  But
+     to rewrite a gcov file we must first read it, however the read
+     value is ignored.  */
+  gcov_read_unsigned ();
+  gcov_rewrite ();
+  gcov_write_tag_length (GCOV_DATA_MAGIC, GCOV_VERSION);
+  gcov_write_unsigned (stamp);
+  if (ll_infos->pmu_tool_header)
+    gcov_write_tool_header (ll_infos->pmu_tool_header);
+  for (i = 0; i < ll_infos->ll_count; ++i)
+    {
+      /* Write each line.  */
+      gcov_write_ll_line (ll_infos->ll_array[i]);
+    }
+  gcov_truncate ();
+}
+
+/* Write branch mispredict information INFO into the gcda file.  The
+   gcda file has already been opened and is available for writing.  */
+
+static void
+gcov_write_branch_mispredict_infos (void *info)
+{
+  unsigned i;
+  const brm_infos_t *brm_infos = (const brm_infos_t *)info;
+  gcov_unsigned_t stamp = 0;  /* Don't use stamp as we don't support merge. */
+  /* We don't support merge, and instead always rewrite the file.  */
+  gcov_rewrite ();
+  gcov_write_tag_length (GCOV_DATA_MAGIC, GCOV_VERSION);
+  gcov_write_unsigned (stamp);
+  if (brm_infos->pmu_tool_header)
+    gcov_write_tool_header (brm_infos->pmu_tool_header);
+  for (i = 0; i < brm_infos->brm_count; ++i)
+    {
+      /* Write each line.  */
+      gcov_write_branch_mispredict_line (brm_infos->brm_array[i]);
+    }
+  gcov_truncate ();
+}
+
+/* Compute TOOL_HEADER length for writing into the gcov file.  */
+
+static gcov_unsigned_t
+gcov_tag_pmu_tool_header_length (gcov_pmu_tool_header_t *header)
+{
+  gcov_unsigned_t len = 0;
+  if (header)
+    {
+      len += gcov_string_length (header->host_cpu);
+      len += gcov_string_length (header->hostname);
+      len += gcov_string_length (header->kernel_version);
+      len += gcov_string_length (header->column_header);
+      len += gcov_string_length (header->column_description);
+      len += gcov_string_length (header->full_header);
+    }
+  return len;
+}
+
+/* Write tool header into the gcda file. It assumes that the gcda file
+   has already been opened and is available for writing.  */
+
+static void
+gcov_write_tool_header (gcov_pmu_tool_header_t *header)
+{
+  gcov_unsigned_t len = gcov_tag_pmu_tool_header_length (header);
+  gcov_write_tag_length (GCOV_TAG_PMU_TOOL_HEADER, len);
+  gcov_write_string (header->host_cpu);
+  gcov_write_string (header->hostname);
+  gcov_write_string (header->kernel_version);
+  gcov_write_string (header->column_header);
+  gcov_write_string (header->column_description);
+  gcov_write_string (header->full_header);
+}
+
+
+/* End PMU profiling.  If GCDA_ERROR is zero, write the profiling data
+   into the already open gcda file.  */
+
+void
+__gcov_end_pmu_profiler (int gcda_error)
+{
+  int pid_status;
+  int wait_status;
+  pid_t pid;
+  pmu_tool_fns *tool_details;
+
+  if (!the_pmu_tool_info)
+    return;
+
+  tool_details = the_pmu_tool_info->tool_details;
+  pid = the_pmu_tool_info->pmu_tool_pid;
+  if (pid)
+    {
+      if (tool_debug)
+        fprintf (stderr, "terminating PMU profiling process %ld\n", (long)pid);
+      kill (pid, SIGTERM);
+      if (tool_debug)
+        fprintf (stderr, "parent: waiting for pmu process to end\n");
+      wait_status = waitpid (pid, &pid_status, 0);
+      if (tool_debug) {
+        if (wait_status == pid)
+          fprintf (stderr, "Normal exit. Child terminated.\n");
+        else
+          fprintf (stderr, "Abnormal exit. child status, %d.\n", pid_status);
+      }
+    }
+
+  if (the_pmu_tool_info->pmu_profiling_state != PMU_OFF)
+    {
+      /* nothing to do */
+      fprintf (stderr,
+               "__gcov_end_pmu_profiler: incorrect pmu state: %d, pid: %ld\n",
+               the_pmu_tool_info->pmu_profiling_state,
+               (long)pid);
+      return;
+    }
+
+  if (!tool_details->parse_pmu_output)
+    return;
+
+  /* Since we are going to parse the output, we also need symbolizer.  */
+  if (tool_details->start_symbolizer)
+    tool_details->start_symbolizer (getpid ());
+
+  if (!tool_details->parse_pmu_output
+      (the_pmu_tool_info->raw_pmu_profile_filename,
+       the_pmu_tool_info->pmu_data))
+    {
+      if (!gcda_error && tool_details->gcov_write_pmu_data)
+        /* Write tool output into the gcda file.  */
+        tool_details->gcov_write_pmu_data (the_pmu_tool_info->pmu_data);
+    }
+
+  if (tool_details->end_symbolizer)
+    tool_details->end_symbolizer ();
+
+  if (tool_details->cleanup_pmu_data)
+    tool_details->cleanup_pmu_data (the_pmu_tool_info->pmu_data);
+}
+
+#endif
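Reviewer note (illustrative only, not part of the patch):
parse_pfmon_branch_mispredicts appends parsed entries to brm_array and
grows the array as it fills.  Below is a minimal standalone sketch of
that append-and-grow step.  It assumes a hypothetical struct ptr_vec in
place of brm_infos_t and a hypothetical helper ptr_vec_push; neither
name exists in the patch.  The sketch sizes the realloc request in bytes,
records the new capacity, and keeps the old pointer until the call
succeeds.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for brm_infos_t; only the fields needed here.  */
struct ptr_vec
{
  void **items;     /* Array of element pointers.  */
  size_t count;     /* Number of slots in use.  */
  size_t capacity;  /* Number of slots allocated.  */
};

/* Append ITEM to VEC, doubling the capacity when the array is full.
   Return 0 on success, non-zero on allocation failure; on failure VEC
   is left unchanged.  */
int
ptr_vec_push (struct ptr_vec *vec, void *item)
{
  assert (vec != NULL);
  if (vec->count == vec->capacity)
    {
      size_t new_capacity = vec->capacity ? 2 * vec->capacity : 64;
      /* realloc takes a size in bytes, hence the sizeof factor.  The
         result goes into a temporary so the old block is not lost if
         the call fails.  */
      void **grown = realloc (vec->items, new_capacity * sizeof (*grown));
      if (grown == NULL)
        return 1;
      vec->items = grown;
      vec->capacity = new_capacity;
    }
  vec->items[vec->count++] = item;
  return 0;
}
```

A caller would start from a zero-initialized struct ptr_vec, push each
parsed record, and free items when done.  Doubling keeps the number of
reallocations logarithmic in the number of records.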
Index: gcc/coverage.c
===================================================================
--- gcc/coverage.c	(revision 175226)
+++ gcc/coverage.c	(working copy)
@@ -62,6 +62,9 @@ 
 #include "dbgcnt.h"
 #include "input.h"
 
+/* Defined in tree-profile.c.  */
+void gimple_init_instrumentation_sampling (void);
+
 struct function_list
 {
   struct function_list *next;	 /* next function */
@@ -120,6 +123,9 @@ 
 static char *da_base_file_name;
 static char *main_input_file_name;
 
+/* Filename for the global pmu profile.  */
+static char pmu_profile_filename[] = "pmuprofile";
+
 /* Hash table of count data.  */
 static htab_t counts_hash = NULL;
 
@@ -146,6 +152,16 @@ 
 /* True if the current module has any asm statements.  */
 static bool has_asm_statement;
 
+/* extern const char * __gcov_pmu_profile_filename */
+static tree gcov_pmu_filename_decl = NULL_TREE;
+/* extern const char * __gcov_pmu_profile_options */
+static tree gcov_pmu_options_decl = NULL_TREE;
+/* extern gcov_unsigned_t  __gcov_pmu_top_n_address */
+static tree gcov_pmu_top_n_address_decl = NULL_TREE;
+
+/* To ensure that the above variables are initialized only once.  */
+static int pmu_profiling_initialized = 0;
+
 /* Forward declarations.  */
 static hashval_t htab_counts_entry_hash (const void *);
 static int htab_counts_entry_eq (const void *, const void *);
@@ -157,7 +173,8 @@ 
 static tree build_ctr_info_value (unsigned, tree);
 static tree build_gcov_info (void);
 static void create_coverage (void);
-static char * get_da_file_name (const char *);
+static void init_pmu_profiling (void);
+static bool profiling_enabled_p (void);
 
 /* Return the type node for gcov_type.  */
 
@@ -175,6 +192,15 @@ 
   return lang_hooks.types.type_for_size (32, true);
 }
 
+/* Return the type node for const char *.  */
+
+static tree
+get_const_string_type (void)
+{
+  return build_pointer_type
+    (build_qualified_type (char_type_node, TYPE_QUAL_CONST));
+}
+
 static hashval_t
 htab_counts_entry_hash (const void *of)
 {
@@ -1688,7 +1714,7 @@ 
 
   no_coverage = 1; /* Disable any further coverage.  */
 
-  if (!prg_ctr_mask)
+  if (!prg_ctr_mask && !flag_pmu_profile_generate)
     return;
 
   t = build_gcov_info ();
@@ -1725,7 +1751,7 @@ 
 
 /* Get the da file name, given base file name.  */
 
-static char *
+char *
 get_da_file_name (const char *base_file_name)
 {
   char *da_file_name;
@@ -1910,8 +1936,122 @@ 
 	read_counts_file (get_da_file_name (module_infos[i]->da_filename),
 			  module_infos[i]->ident);
     }
+
+  /* Define variables which are referenced at runtime by libgcov.  */
+  if (profiling_enabled_p ())
+    {
+      init_pmu_profiling ();
+      gimple_init_instrumentation_sampling ();
+    }
 }
 
+/* Return true if any type of profiling that requires linking in libgcov
+   is enabled, otherwise return false.  */
+
+static bool
+profiling_enabled_p (void)
+{
+  return flag_pmu_profile_generate || profile_arc_flag ||
+      flag_profile_generate_sampling || flag_test_coverage ||
+      flag_branch_probabilities || flag_profile_reusedist;
+}
+
+/* Construct variables for PMU profiling.
+   1) __gcov_pmu_profile_filename,
+   2) __gcov_pmu_profile_options,
+   3) __gcov_pmu_top_n_address.  */
+
+static void
+init_pmu_profiling (void)
+{
+  if (!pmu_profiling_initialized)
+    {
+      unsigned top_n_addr = PARAM_VALUE (PARAM_PMU_PROFILE_N_ADDRESS);
+      tree filename_ptr, options_ptr;
+
+      /* Construct an initializer for __gcov_pmu_profile_filename.  */
+      gcov_pmu_filename_decl =
+        build_decl (UNKNOWN_LOCATION, VAR_DECL,
+                    get_identifier ("__gcov_pmu_profile_filename"),
+                    get_const_string_type ());
+      TREE_PUBLIC (gcov_pmu_filename_decl) = 1;
+      DECL_ARTIFICIAL (gcov_pmu_filename_decl) = 1;
+      make_decl_one_only (gcov_pmu_filename_decl,
+                          DECL_ASSEMBLER_NAME (gcov_pmu_filename_decl));
+      TREE_STATIC (gcov_pmu_filename_decl) = 1;
+
+      if (flag_pmu_profile_generate)
+        {
+          const char *filename = get_da_file_name (pmu_profile_filename);
+          int file_name_len;
+          tree filename_string;
+          file_name_len = strlen (filename);
+          filename_string = build_string (file_name_len + 1, filename);
+          TREE_TYPE (filename_string) = build_array_type
+            (char_type_node, build_index_type
+             (build_int_cst (NULL_TREE, file_name_len)));
+          filename_ptr = build1 (ADDR_EXPR, get_const_string_type (),
+                                 filename_string);
+        }
+      else
+        filename_ptr = null_pointer_node;
+
+      DECL_INITIAL (gcov_pmu_filename_decl) = filename_ptr;
+      assemble_variable (gcov_pmu_filename_decl, 0, 0, 0);
+
+      /* Construct an initializer for __gcov_pmu_profile_options.  */
+      gcov_pmu_options_decl =
+        build_decl (UNKNOWN_LOCATION, VAR_DECL,
+                    get_identifier ("__gcov_pmu_profile_options"),
+                    get_const_string_type ());
+      TREE_PUBLIC (gcov_pmu_options_decl) = 1;
+      DECL_ARTIFICIAL (gcov_pmu_options_decl) = 1;
+      make_decl_one_only (gcov_pmu_options_decl,
+                          DECL_ASSEMBLER_NAME (gcov_pmu_options_decl));
+      TREE_STATIC (gcov_pmu_options_decl) = 1;
+
+      /* If the flag is false we generate a null pointer to indicate
+         that we are not doing the pmu profiling.  */
+      if (flag_pmu_profile_generate)
+        {
+          const char *pmu_options = flag_pmu_profile_generate;
+          int pmu_options_len;
+          tree pmu_options_string;
+
+          pmu_options_len = strlen (pmu_options);
+          pmu_options_string = build_string (pmu_options_len + 1, pmu_options);
+          TREE_TYPE (pmu_options_string) = build_array_type
+            (char_type_node, build_index_type (build_int_cst
+                                               (NULL_TREE, pmu_options_len)));
+          options_ptr = build1 (ADDR_EXPR, get_const_string_type (),
+                                pmu_options_string);
+        }
+      else
+        options_ptr = null_pointer_node;
+
+      DECL_INITIAL (gcov_pmu_options_decl) = options_ptr;
+      assemble_variable (gcov_pmu_options_decl, 0, 0, 0);
+
+      /* Construct an initializer for __gcov_pmu_top_n_address.  We
+         don't need to guard this with the flag_pmu_profile_generate
+         because the value of __gcov_pmu_top_n_address is ignored when
+         not doing profiling.  */
+      gcov_pmu_top_n_address_decl =
+        build_decl (UNKNOWN_LOCATION, VAR_DECL,
+                    get_identifier ("__gcov_pmu_top_n_address"),
+                    get_gcov_unsigned_t ());
+      TREE_PUBLIC (gcov_pmu_top_n_address_decl) = 1;
+      DECL_ARTIFICIAL (gcov_pmu_top_n_address_decl) = 1;
+      make_decl_one_only (gcov_pmu_top_n_address_decl,
+                          DECL_ASSEMBLER_NAME (gcov_pmu_top_n_address_decl));
+      TREE_STATIC (gcov_pmu_top_n_address_decl) = 1;
+      DECL_INITIAL (gcov_pmu_top_n_address_decl) =
+        build_int_cstu (get_gcov_unsigned_t (), top_n_addr);
+      assemble_variable (gcov_pmu_top_n_address_decl, 0, 0, 0);
+    }
+  pmu_profiling_initialized = 1;
+}
+
 /* Performs file-level cleanup.  Close graph file, generate coverage
    variables and constructor.  */
 
@@ -1989,4 +2129,19 @@ 
   has_asm_statement = flag_ripa_disallow_asm_modules;
 }
 
+/* Check the command line OPTIONS passed to
+   -fpmu-profile-generate. Return 0 if the options are valid, non-zero
+   otherwise.  */
+
+int
+check_pmu_profile_options (const char *options)
+{
+  if (strcmp (options, "load-latency")
+      && strcmp (options, "load-latency-verbose")
+      && strcmp (options, "branch-mispredict")
+      && strcmp (options, "branch-mispredict-verbose"))
+    return 1;
+  return 0;
+}
+
 #include "gt-coverage.h"
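Reviewer note (illustrative only, not part of the patch):
check_pmu_profile_options accepts exactly four option spellings.  The
same check can be sketched as a table lookup, which keeps the accepted
names in one place if more events are added later.  The function name
check_pmu_options_sketch and the table pmu_option_names are hypothetical,
and the NULL handling is an assumption that the patch does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The four spellings accepted by check_pmu_profile_options.  */
static const char *const pmu_option_names[] = {
  "load-latency",
  "load-latency-verbose",
  "branch-mispredict",
  "branch-mispredict-verbose"
};

/* Hypothetical table-driven variant.  Return 0 if OPTIONS is one of the
   accepted names, non-zero otherwise (including when OPTIONS is NULL).  */
int
check_pmu_options_sketch (const char *options)
{
  size_t n = sizeof (pmu_option_names) / sizeof (pmu_option_names[0]);
  size_t i;

  if (options == NULL)
    return 1;
  for (i = 0; i < n; i++)
    if (strcmp (options, pmu_option_names[i]) == 0)
      return 0;
  return 1;
}
```

The return convention matches the patch: zero means valid, so a caller
can reject the option with a plain if on the result.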
Index: gcc/coverage.h
===================================================================
--- gcc/coverage.h	(revision 175226)
+++ gcc/coverage.h	(working copy)
@@ -77,4 +77,13 @@ 
 /* Mark this module as containing asm statements.  */
 extern void coverage_has_asm_stmt (void);
 
+/* Get the da file name, given base file name.  */
+extern char * get_da_file_name (const char *base_file_name);
+
+/* Check if the specified options are valid for pmu profiling.  */
+extern int check_pmu_profile_options (const char *options);
+
+/* Defined in tree-profile.c.  */
+extern void gimple_init_instrumentation_sampling (void);
+
 #endif
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 175226)
+++ gcc/common.opt	(working copy)
@@ -1606,6 +1606,14 @@ 
 Common Joined RejectNegative Var(common_deferred_options) Defer
 -fplugin-arg-<name>-<key>[=<value>]	Specify argument <key>=<value> for plugin <name>
 
+fpmu-profile-generate=
+Common Joined RejectNegative Var(flag_pmu_profile_generate)
+-fpmu-profile-generate=[load-latency|branch-mispredict]  Generate pmu profile for cache misses or branch mispredictions.  Currently only pfmon based profiling is supported on Intel/PEBS and AMD/IBS platforms.
+
+fpmu-profile-use=
+Common Joined RejectNegative Var(flag_pmu_profile_use)
+-fpmu-profile-use=[load-latency]  Use pmu profile data while optimizing.  Currently only perfmon based load latency profiling is supported on Intel/PEBS and AMD/IBS platforms.
+
 fpredictive-commoning
 Common Report Var(flag_predictive_commoning) Optimization
 Run predictive commoning optimization.
Index: gcc/tree-profile.c
===================================================================
--- gcc/tree-profile.c	(revision 175226)
+++ gcc/tree-profile.c	(working copy)
@@ -168,6 +168,9 @@ 
 /* extern gcov_unsigned_t __gcov_sampling_rate  */
 static tree gcov_sampling_rate_decl = NULL_TREE;
 
+/* Forward declaration.  */
+void gimple_init_instrumentation_sampling (void);
+
 /* Insert STMT_IF around given sequence of consecutive statements in the
    same basic block starting with STMT_START, ending with STMT_END.  */
 
@@ -287,7 +290,7 @@ 
     }
 }
 
-static void
+void
 gimple_init_instrumentation_sampling (void)
 {
   if (!gcov_sampling_rate_decl)
@@ -341,8 +344,6 @@ 
   tree dc_profiler_fn_type;
   tree average_profiler_fn_type;
 
-  gimple_init_instrumentation_sampling ();
-
   if (!gcov_type_node)
     {
       char name_buf[32];
Index: gcc/libgcov.c
===================================================================
--- gcc/libgcov.c	(revision 175226)
+++ gcc/libgcov.c	(working copy)
@@ -124,9 +124,15 @@ 
 }
 
 #ifndef __GCOV_KERNEL__
+/* Emitted in coverage.c.  */
+extern char * __gcov_pmu_profile_filename;
+extern char * __gcov_pmu_profile_options;
+extern gcov_unsigned_t __gcov_pmu_top_n_address;
+
 /* Sampling rate.  */
 extern gcov_unsigned_t __gcov_sampling_rate;
 static int gcov_sampling_rate_initialized = 0;
+void __gcov_set_sampling_rate (unsigned int rate);
 
 /* Set sampling rate to RATE.  */
 
@@ -344,7 +350,7 @@ 
   /* Update complete filename with stripped original. */
   if (prefix_length != 0 && !IS_DIR_SEPARATOR (*filename))
     {
-      /* If prefix is given, add diretory separator.  */
+      /* If prefix is given, add directory separator.  */
       strcpy (gi_filename_up, "/");
       strcpy (gi_filename_up + 1, filename);
     }
@@ -352,6 +358,88 @@ 
     strcpy (gi_filename_up, filename);
 }
 
+/* This function allocates the space to store current file name.  */
+
+static void
+gcov_alloc_filename (void)
+{
+  /* Get file name relocation prefix.  Non-absolute values are ignored.  */
+  char *gcov_prefix = 0;
+
+  prefix_length = 0;
+  gcov_prefix_strip = 0;
+
+  {
+    /* Check if the level of dirs to strip off specified. */
+    char *tmp = getenv ("GCOV_PREFIX_STRIP");
+    if (tmp)
+      {
+        gcov_prefix_strip = atoi (tmp);
+        /* Do not consider negative values. */
+        if (gcov_prefix_strip < 0)
+          gcov_prefix_strip = 0;
+      }
+  }
+  /* Get file name relocation prefix.  Non-absolute values are ignored. */
+  gcov_prefix = getenv ("GCOV_PREFIX");
+  if (gcov_prefix)
+    {
+      prefix_length = strlen (gcov_prefix);
+
+      /* Remove an unnecessary trailing '/'.  */
+      if (prefix_length && IS_DIR_SEPARATOR (gcov_prefix[prefix_length - 1]))
+        prefix_length--;
+    }
+  else
+    prefix_length = 0;
+
+  /* If a prefix strip was specified but no prefix, then we assume
+     relative.  */
+  if (gcov_prefix_strip != 0 && prefix_length == 0)
+    {
+      gcov_prefix = ".";
+      prefix_length = 1;
+    }
+
+  /* Allocate and initialize the filename scratch space.  */
+  gi_filename = (char *) malloc (prefix_length + gcov_max_filename + 2);
+  if (prefix_length)
+    memcpy (gi_filename, gcov_prefix, prefix_length);
+
+  gi_filename_up = gi_filename + prefix_length;
+}
+
+/* Stop the pmu profiler and dump pmu profile info into the global file.  */
+
+static void
+pmu_profile_stop (void)
+{
+  const char *pmu_profile_filename = __gcov_pmu_profile_filename;
+  const char *pmu_options = __gcov_pmu_profile_options;
+  size_t filename_length;
+  int gcda_error;
+
+  if (!pmu_profile_filename || !pmu_options)
+    return;
+
+  __gcov_stop_pmu_profiler ();
+
+  filename_length = strlen (pmu_profile_filename);
+  if (filename_length > gcov_max_filename)
+    gcov_max_filename = filename_length;
+  /* Allocate and initialize the filename scratch space.  */
+  gcov_alloc_filename ();
+  GCOV_GET_FILENAME (prefix_length, gcov_prefix_strip, pmu_profile_filename,
+                     gi_filename_up);
+  /* Open the gcda file for writing. We don't support merge yet. */
+  gcda_error = gcov_open_by_filename (gi_filename);
+  __gcov_end_pmu_profiler (gcda_error);
+  if ((gcda_error = gcov_close ()))
+    gcov_error (gcda_error < 0 ? "pmu_profile_stop:%s:Overflow writing\n" :
+                "pmu_profile_stop:%s:Error writing\n",
+                gi_filename);
+}
+
 /* Sort N entries in VALUE_ARRAY in descending order.
    Each entry in VALUE_ARRAY has two values. The sorting
    is based on the second value.  */
@@ -438,56 +526,7 @@ 
     }
 }
 
-/* This function allocates the space to store current file name.  */
-
 static void
-gcov_alloc_filename (void)
-{
-  /* Get file name relocation prefix.  Non-absolute values are ignored.  */
-  char *gcov_prefix = 0;
-
-  prefix_length = 0;
-  gcov_prefix_strip = 0;
-
-  {
-    /* Check if the level of dirs to strip off specified. */
-    char *tmp = getenv ("GCOV_PREFIX_STRIP");
-    if (tmp)
-      {
-        gcov_prefix_strip = atoi (tmp);
-        /* Do not consider negative values. */
-        if (gcov_prefix_strip < 0)
-          gcov_prefix_strip = 0;
-      }
-  }
-  /* Get file name relocation prefix.  Non-absolute values are ignored. */
-  gcov_prefix = getenv ("GCOV_PREFIX");
-  if (gcov_prefix)
-    {
-      prefix_length = strlen(gcov_prefix);
-
-      /* Remove an unnecessary trailing '/' */
-      if (IS_DIR_SEPARATOR (gcov_prefix[prefix_length - 1]))
-        prefix_length--;
-    }
-  else
-    prefix_length = 0;
-
-  /* If no prefix was specified and a prefix stip, then we assume
-     relative.  */
-  if (gcov_prefix_strip != 0 && prefix_length == 0)
-    {
-      gcov_prefix = ".";
-      prefix_length = 1;
-    }
-
-  /* Aelocate and initialize the filename scratch space.  */
-  gi_filename = (char *) malloc (prefix_length + gcov_max_filename + 2);
-  if (prefix_length)
-    memcpy (gi_filename, gcov_prefix, prefix_length);
-}
-
-static void
 gcov_dump_module_info (void)
 {
   struct gcov_info *gi_ptr;
@@ -499,8 +538,8 @@ 
   {
     int error;
 
-    gcov_strip_leading_dirs (prefix_length, gcov_prefix_strip, 
-                             gi_ptr->filename, gi_filename_up);
+    GCOV_GET_FILENAME (prefix_length, gcov_prefix_strip, gi_ptr->filename,
+                       gi_filename_up);
     error = gcov_open_by_filename (gi_filename);
     if (error != 0)
       continue;
@@ -534,9 +573,11 @@ 
   struct gcov_info *gi_ptr;
   int dump_module_info;
 
+  /* Stop and write the PMU profile data into the global file.  */
+  pmu_profile_stop ();
+
   dump_module_info = gcov_exit_init ();
 
-
   for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next)
     gcov_dump_one_gcov (gi_ptr);
 
@@ -572,11 +613,25 @@ 
       const char *ptr = info->filename;
       gcov_unsigned_t crc32 = gcov_crc32;
       size_t filename_length = strlen (info->filename);
+      struct gcov_pmu_info pmu_info;
 
       /* Refresh the longest file name information.  */
       if (filename_length > gcov_max_filename)
         gcov_max_filename = filename_length;
 
+      /* Initialize the pmu profiler.  */
+      pmu_info.pmu_profile_filename = __gcov_pmu_profile_filename;
+      pmu_info.pmu_tool = __gcov_pmu_profile_options;
+      pmu_info.pmu_top_n_address = __gcov_pmu_top_n_address;
+      __gcov_init_pmu_profiler (&pmu_info);
+      if (pmu_info.pmu_profile_filename)
+        {
+          /* Refresh the longest file name information.  */
+          filename_length = strlen (pmu_info.pmu_profile_filename);
+          if (filename_length > gcov_max_filename)
+            gcov_max_filename = filename_length;
+        }
+
       /* Assign the module ID (starting at 1).  */
       info->mod_info->ident = (++gcov_cur_module_id);
       gcc_assert (EXTRACT_MODULE_ID_FROM_GLOBAL_ID (GEN_FUNC_GLOBAL_ID (
@@ -601,7 +656,11 @@ 
       gcov_crc32 = crc32;
 
       if (!__gcov_list)
-        atexit (gcov_exit);
+        {
+          atexit (gcov_exit);
+          /* Start pmu profiler. */
+          __gcov_start_pmu_profiler ();
+        }
 
       info->next = __gcov_list;
       __gcov_list = info;
@@ -618,6 +677,7 @@ 
 {
   const struct gcov_info *gi_ptr;
 
+  __gcov_stop_pmu_profiler ();
   gcov_exit ();
   for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next)
     {
@@ -631,6 +691,7 @@ 
 	    ci_ptr++;
 	  }
     }
+  __gcov_start_pmu_profiler ();
 }
 
 #else /* __GCOV_KERNEL__ */
@@ -640,8 +701,8 @@ 
 /* Copy the filename to the buffer.  */
 
 static inline void
-gcov_get_filename (int prefix_length __attribute__ ((unused)), 
-                   int gcov_prefix_strip __attribute__ ((unused)), 
+gcov_get_filename (int prefix_length __attribute__ ((unused)),
+                   int gcov_prefix_strip __attribute__ ((unused)),
                    const char *filename, char *gi_filename_up)
 {
     strcpy (gi_filename_up, filename);
@@ -1090,7 +1151,6 @@ 
     }
 
   gcov_alloc_filename ();
-  gi_filename_up = gi_filename + prefix_length;
 
   return dump_module_info;
 }
Index: gcc/params.def
===================================================================
--- gcc/params.def	(revision 175226)
+++ gcc/params.def	(working copy)
@@ -1011,6 +1011,11 @@ 
           ".note.callgraph.text section",
 	  0, 0, 0)
 
+DEFPARAM (PARAM_PMU_PROFILE_N_ADDRESS,
+	  "pmu_profile_n_addresses",
+	  "While doing PMU profiling, symbolize this many top addresses.",
+	  50, 1, 10000)
+
 /*
 Local variables:
 mode:c
Index: gcc/gcov-dump.c
===================================================================
--- gcc/gcov-dump.c	(revision 175226)
+++ gcc/gcov-dump.c	(working copy)
@@ -39,6 +39,10 @@ 
 static void tag_counters (const char *, unsigned, unsigned);
 static void tag_summary (const char *, unsigned, unsigned);
 static void tag_module_info (const char *, unsigned, unsigned);
+static void tag_pmu_load_latency_info (const char *, unsigned, unsigned);
+static void tag_pmu_branch_mispredict_info (const char *, unsigned, unsigned);
+static void tag_pmu_tool_header (const char *, unsigned, unsigned);
+
 extern int main (int, char **);
 
 typedef struct tag_format
@@ -73,6 +77,11 @@ 
   {GCOV_TAG_OBJECT_SUMMARY, "OBJECT_SUMMARY", tag_summary},
   {GCOV_TAG_PROGRAM_SUMMARY, "PROGRAM_SUMMARY", tag_summary},
   {GCOV_TAG_MODULE_INFO, "MODULE INFO", tag_module_info},
+  {GCOV_TAG_PMU_LOAD_LATENCY_INFO, "PMU_LOAD_LATENCY_INFO",
+   tag_pmu_load_latency_info},
+  {GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO, "PMU_BRANCH_MISPREDICT_INFO",
+   tag_pmu_branch_mispredict_info},
+  {GCOV_TAG_PMU_TOOL_HEADER, "PMU_TOOL_HEADER", tag_pmu_tool_header},
   {0, NULL, NULL}
 };
 
@@ -519,3 +528,43 @@ 
       printf (": %s [%s]", mod_info->source_filename, suffix);
     }
 }
+
+/* Read gcov tag GCOV_TAG_PMU_LOAD_LATENCY_INFO from the gcda file and
+   print the contents in a human readable form.  */
+
+static void
+tag_pmu_load_latency_info (const char *filename ATTRIBUTE_UNUSED,
+                           unsigned tag ATTRIBUTE_UNUSED, unsigned length)
+{
+  gcov_pmu_ll_info_t ll_info;
+  gcov_read_pmu_load_latency_info (&ll_info, length);
+  print_load_latency_line (stdout, &ll_info, no_newline);
+  free (ll_info.filename);
+}
+
+/* Read gcov tag GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO from the gcda
+   file and print the contents in a human readable form.  */
+
+static void
+tag_pmu_branch_mispredict_info (const char *filename ATTRIBUTE_UNUSED,
+                                unsigned tag ATTRIBUTE_UNUSED, unsigned length)
+{
+  gcov_pmu_brm_info_t brm_info;
+  gcov_read_pmu_branch_mispredict_info (&brm_info, length);
+  print_branch_mispredict_line (stdout, &brm_info, no_newline);
+  free (brm_info.filename);
+}
+
+
+/* Read gcov tag GCOV_TAG_PMU_TOOL_HEADER from the gcda file and print
+   the contents in a human readable form.  */
+
+static void
+tag_pmu_tool_header (const char *filename ATTRIBUTE_UNUSED,
+                     unsigned tag ATTRIBUTE_UNUSED, unsigned length)
+{
+  gcov_pmu_tool_header_t tool_header;
+  gcov_read_pmu_tool_header (&tool_header, length);
+  print_pmu_tool_header (stdout, &tool_header, no_newline);
+  destroy_pmu_tool_header (&tool_header);
+}