[google/main] Backport counter histogram in fdo summary from trunk (issue6513045)

Message ID 20120914200919.9E5F960B47@tjsboxrox.mtv.corp.google.com
State New
Commit Message

Teresa Johnson Sept. 14, 2012, 8:09 p.m. UTC
Backport from trunk r190952 to add counter histogram to gcov program summary,
and follow-on fixes for PR gcov-profile/54487 (r191074 and r191238).

Tested on x86_64-unknown-linux-gnu. Ok for google branches?

2012-09-14  Teresa Johnson  <tejohnson@google.com>

	* libgcc/libgcov.c (gcov_histogram_insert): New function.
	(gcov_compute_histogram): Ditto.
	(sort_by_reverse_gcov_value): Remove function.
	(gcov_compute_cutoff_values): Ditto.
	(gcov_merge_gcda_file): Merge histogram while merging summary.
	(gcov_gcda_file_size): Include histogram in summary size computation.
	(gcov_write_gcda_file): Remove assert that is no longer valid.
	(gcov_exit_init): Invoke gcov_compute_histogram.
	* gcc/gcov-io.c (gcov_write_summary): Write out non-zero histogram
        entries to function summary along with an occupancy bit vector.
	(gcov_read_summary): Read in the histogram entries.
	(gcov_histo_index): New function.
	(gcov_histogram_merge): Ditto.
	* gcc/gcov-io.h (gcov_type_unsigned): New type.
        (struct gcov_bucket_type): Ditto.
        (struct gcov_ctr_summary): Include histogram.
        (GCOV_TAG_SUMMARY_LENGTH): Update to include histogram entries.
        (GCOV_HISTOGRAM_SIZE): New macro.
        (GCOV_HISTOGRAM_BITVECTOR_SIZE): Ditto.
        (gcov_gcda_file_size): New parameter.
	* gcc/profile.c (NUM_GCOV_WORKING_SETS): New macro.
        (gcov_working_sets): New global variable.
	(compute_working_sets): New function.
	(find_working_set): Ditto.
	(get_exec_counts): Invoke compute_working_sets.
	* gcc/loop-unroll.c (code_size_limit_factor): Call new function
        find_working_set to obtain working set information.
	* gcc/coverage.c (read_counts_file): Merge histograms, and
        fix bug with accessing summary info for non-summable counters.
	* gcc/basic-block.h (gcov_type_unsigned): New type.
        (struct gcov_working_set_info): Ditto.
        (find_working_set): Declare.
	* gcc/gcov-dump.c (tag_summary): Dump out histogram.
	* gcc/configure.ac (HOST_HAS_F_SETLKW): Set based on compile
        test using F_SETLKW with fcntl.
	* gcc/configure, gcc/config.in: Regenerate.
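
For readers skimming the ChangeLog, the bucket layout behind
gcov_histo_index can be illustrated with a minimal standalone sketch.
It is not the patch code: the names histo_index and HISTO_SIZE are
hypothetical, the counter is assumed to be 64 bits wide, and it relies
on the GCC/Clang builtin __builtin_clzll.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror GCOV_HISTOGRAM_SIZE: 63 log2 ranges * 4 sub-buckets.  */
#define HISTO_SIZE 252

/* Hypothetical standalone version of the bucket computation: a log2-scale
   histogram in which every power-of-two range is split into four linear
   sub-buckets.  */
static unsigned
histo_index (uint64_t v)
{
  unsigned r = 0;
  unsigned index;

  /* Position of the most significant set bit (stays 0 when v == 0).  */
  if (v > 0)
    r = 63 - (unsigned) __builtin_clzll ((unsigned long long) v);

  /* Values 0 - 3 index the lowest four buckets directly.  */
  if (r < 2)
    return (unsigned) v;

  /* The two bits just below the top bit pick one of four sub-buckets.  */
  index = (r - 1) * 4 + (unsigned) ((v >> (r - 2)) & 0x3);
  assert (index < HISTO_SIZE);
  return index;
}
```

Under these assumptions, a counter value of 1000 (top bit at position 9,
next two bits 11) lands in bucket 35, and the largest 64-bit value lands
in bucket 251, the last of the 252 entries.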


--
This patch is available for review at http://codereview.appspot.com/6513045
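
The summary record only stores non-zero histogram buckets, preceded by
an occupancy bit vector. The sketch below illustrates that encoding and
the resulting record length under stated assumptions: the helper names
(pack_occupancy, unpack_occupancy, summary_length) are hypothetical, and
the constants are copied from the gcov-io.h hunk rather than included
from it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HISTO_SIZE 252                            /* as GCOV_HISTOGRAM_SIZE */
#define BITVECTOR_WORDS ((HISTO_SIZE + 31) / 32)  /* ceiling divide: 8 words */

/* Hypothetical helper: set bit i for every non-empty bucket i and return
   the number of non-empty buckets.  */
static unsigned
pack_occupancy (const uint32_t num_counters[HISTO_SIZE],
                uint32_t bitvector[BITVECTOR_WORDS])
{
  unsigned i, count = 0;

  memset (bitvector, 0, sizeof (uint32_t) * BITVECTOR_WORDS);
  for (i = 0; i < HISTO_SIZE; i++)
    if (num_counters[i] > 0)
      {
        /* Unsigned shift so that bit 31 is well defined.  */
        bitvector[i / 32] |= (uint32_t) 1 << (i % 32);
        count++;
      }
  return count;
}

/* Hypothetical helper: recover the bucket indices, in ascending order,
   whose bits are set.  Returns how many were found.  */
static unsigned
unpack_occupancy (const uint32_t bitvector[BITVECTOR_WORDS],
                  unsigned indices[HISTO_SIZE])
{
  unsigned w, b, count = 0;

  for (w = 0; w < BITVECTOR_WORDS; w++)
    for (b = 0; b < 32; b++)
      if ((bitvector[w] >> b) & 1u)
        {
          unsigned ix = w * 32 + b;
          if (ix < HISTO_SIZE)
            indices[count++] = ix;
        }
  return count;
}

/* Record length in 32-bit words, following GCOV_TAG_SUMMARY_LENGTH(NUM):
   1 checksum word; per summable counter num + runs + 8 bit vector words
   (10) plus three 64-bit counters (3 * 2); and 5 words per non-empty
   bucket (one 32-bit count, two 64-bit values).  */
static unsigned
summary_length (unsigned summable_counters, unsigned num_buckets)
{
  return 1 + summable_counters * (10 + 3 * 2) + num_buckets * 5;
}
```

With one summable counter and three non-empty buckets this gives a
record of 32 words, versus 17 words when the histogram is empty.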

Comments

Diego Novillo Sept. 14, 2012, 8:10 p.m. UTC | #1
On Fri, Sep 14, 2012 at 4:09 PM, Teresa Johnson <tejohnson@google.com> wrote:
> Backport from trunk r190952 to add counter histogram to gcov program summary,
> and follow-on fixes for PR gcov-profile/54487 (r191074 and r191238).

Why don't we just get this via the trunk -> google/main merges?


Diego.

Teresa Johnson Sept. 14, 2012, 8:17 p.m. UTC | #2
On Fri, Sep 14, 2012 at 1:10 PM, Diego Novillo <dnovillo@google.com> wrote:
> On Fri, Sep 14, 2012 at 4:09 PM, Teresa Johnson <tejohnson@google.com> wrote:
>> Backport from trunk r190952 to add counter histogram to gcov program summary,
>> and follow-on fixes for PR gcov-profile/54487 (r191074 and r191238).
>
> Why don't we just get this via the trunk -> google/main merges?
>
>
> Diego.

Should I just put it onto google/4_7 and 4_6 directly then?

Teresa

Diego Novillo Sept. 14, 2012, 8:19 p.m. UTC | #3
On Fri Sep 14 16:17:25 2012, Teresa Johnson wrote:

> Should I just put it onto google/4_7 and 4_6 directly then?

Yeah.  Not sure it's really needed in 4_6, though.


Diego.

Xinliang David Li Sept. 14, 2012, 8:19 p.m. UTC | #4
Yes. The google/main update will happen next quarter.

David

On Fri, Sep 14, 2012 at 1:17 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Fri, Sep 14, 2012 at 1:10 PM, Diego Novillo <dnovillo@google.com> wrote:
>> On Fri, Sep 14, 2012 at 4:09 PM, Teresa Johnson <tejohnson@google.com> wrote:
>>> Backport from trunk r190952 to add counter histogram to gcov program summary,
>>> and follow-on fixes for PR gcov-profile/54487 (r191074 and r191238).
>>
>> Why don't we just get this via the trunk -> google/main merges?
>>
>>
>> Diego.
>
> Should I just put it onto google/4_7 and 4_6 directly then?
>
> Teresa
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

Teresa Johnson Sept. 14, 2012, 8:20 p.m. UTC | #5
On Fri, Sep 14, 2012 at 1:19 PM, Diego Novillo <dnovillo@google.com> wrote:
> On Fri Sep 14 16:17:25 2012, Teresa Johnson wrote:
>
>> Should I just put it onto google/4_7 and 4_6 directly then?
>
>
> Yeah.  Not sure it's really needed in 4_6, though.

Ok. There are only trivial differences between the patch I uploaded
for google/main and the google/4_7 patch that I have also already
created and tested. Ok for google/4_7?

Thanks,
Teresa

>
>
> Diego.

Xinliang David Li Sept. 14, 2012, 8:21 p.m. UTC | #6
yes.

thanks,

David

On Fri, Sep 14, 2012 at 1:20 PM, Teresa Johnson <tejohnson@google.com> wrote:
> On Fri, Sep 14, 2012 at 1:19 PM, Diego Novillo <dnovillo@google.com> wrote:
>> On Fri Sep 14 16:17:25 2012, Teresa Johnson wrote:
>>
>>> Should I just put it onto google/4_7 and 4_6 directly then?
>>
>>
>> Yeah.  Not sure it's really needed in 4_6, though.
>
> Ok. There are only trivial differences between the patch I uploaded
> for google/main and the google/4_7 patch that I have also already
> created and tested. Ok for google/4_7?
>
> Thanks,
> Teresa
>
>>
>>
>> Diego.
>
>
>
> --
> Teresa Johnson | Software Engineer | tejohnson@google.com | 408-460-2413

Patch

Index: libgcc/libgcov.c
===================================================================
--- libgcc/libgcov.c	(revision 191302)
+++ libgcc/libgcov.c	(working copy)
@@ -585,6 +585,76 @@  gcov_dump_module_info (void)
   __gcov_finalize_dyn_callgraph ();
 }
 
+/* Insert counter VALUE into HISTOGRAM.  */
+
+static void
+gcov_histogram_insert(gcov_bucket_type *histogram, gcov_type value)
+{
+  unsigned i;
+
+  i = gcov_histo_index(value);
+  histogram[i].num_counters++;
+  histogram[i].cum_value += value;
+  if (value < histogram[i].min_value)
+    histogram[i].min_value = value;
+}
+
+/* Computes a histogram of the arc counters to place in the summary SUM.  */
+
+static void
+gcov_compute_histogram (struct gcov_summary *sum)
+{
+  struct gcov_info *gi_ptr;
+  const struct gcov_fn_info *gfi_ptr;
+  const struct gcov_ctr_info *ci_ptr;
+  struct gcov_ctr_summary *cs_ptr;
+  unsigned t_ix, f_ix, ctr_info_ix, ix;
+  int h_ix;
+
+  /* This currently only applies to arc counters.  */
+  t_ix = GCOV_COUNTER_ARCS;
+
+  /* First check if there are any counts recorded for this counter.  */
+  cs_ptr = &(sum->ctrs[t_ix]);
+  if (!cs_ptr->num)
+    return;
+
+  for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++)
+    {
+      cs_ptr->histogram[h_ix].num_counters = 0;
+      cs_ptr->histogram[h_ix].min_value = cs_ptr->run_max;
+      cs_ptr->histogram[h_ix].cum_value = 0;
+    }
+
+  /* Walk through all the per-object structures and record each of
+     the count values in histogram.  */
+  for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next)
+    {
+      if (!gi_ptr->merge[t_ix])
+        continue;
+
+      /* Find the appropriate index into the gcov_ctr_info array
+         for the counter we are currently working on based on the
+         existence of the merge function pointer for this object.  */
+      for (ix = 0, ctr_info_ix = 0; ix < t_ix; ix++)
+        {
+          if (gi_ptr->merge[ix])
+            ctr_info_ix++;
+        }
+      for (f_ix = 0; f_ix != gi_ptr->n_functions; f_ix++)
+        {
+          gfi_ptr = gi_ptr->functions[f_ix];
+
+          if (!gfi_ptr || gfi_ptr->key != gi_ptr)
+            continue;
+
+          ci_ptr = &gfi_ptr->ctrs[ctr_info_ix];
+          for (ix = 0; ix < ci_ptr->num; ix++)
+            gcov_histogram_insert (cs_ptr->histogram, ci_ptr->values[ix]);
+        }
+    }
+}
+
 /* Dump the coverage counts. We merge with existing counts when
    possible, to avoid growing the .da files ad infinitum. We use this
    program's checksum to make sure we only accumulate whole program
@@ -758,118 +828,6 @@  gcov_sort_topn_counter_arrays (const struct gcov_i
      }
 }
 
-/* Used by qsort to sort gcov values in descending order.  */
-
-static int
-sort_by_reverse_gcov_value (const void *pa, const void *pb)
-{
-  const gcov_type a = *(gcov_type const *)pa;
-  const gcov_type b = *(gcov_type const *)pb;
-
-  if (b > a)
-    return 1;
-  else if (b == a)
-    return 0;
-  else
-    return -1;
-}
-
-/* Determines the number of counters required to cover a given percentage
-   of the total sum of execution counts in the summary, which is then also
-   recorded in SUM.  */
-
-static void
-gcov_compute_cutoff_values (struct gcov_summary *sum)
-{
-  struct gcov_info *gi_ptr;
-  const struct gcov_fn_info *gfi_ptr;
-  const struct gcov_ctr_info *ci_ptr;
-  struct gcov_ctr_summary *cs_ptr;
-  unsigned t_ix, f_ix, i, ctr_info_ix, index;
-  gcov_unsigned_t c_num;
-  gcov_type *value_array;
-  gcov_type cum, cum_cutoff;
-  char *cutoff_str;
-  unsigned cutoff_perc;
-
-#define CUM_CUTOFF_PERCENT_TIMES_10 999
-  cutoff_str = getenv ("GCOV_HOTCODE_CUTOFF_TIMES_10");
-  if (cutoff_str && strlen (cutoff_str))
-    cutoff_perc = atoi (cutoff_str);
-  else
-    cutoff_perc = CUM_CUTOFF_PERCENT_TIMES_10;
-
-  /* This currently only applies to arc counters.  */
-  t_ix = GCOV_COUNTER_ARCS;
-
-  /* First check if there are any counts recorded for this counter.  */
-  cs_ptr = &(sum->ctrs[t_ix]);
-  if (!cs_ptr->num)
-    return;
-
-  /* Determine the cumulative counter value at the specified cutoff
-     percentage and record the percentage for use by gcov consumers.
-     Check for overflow when sum_all is multiplied by the cutoff_perc,
-     and if so, do the divide first.  */
-  if (cs_ptr->sum_all*cutoff_perc < cs_ptr->sum_all)
-    /* Overflow, do the divide first.  */
-    cum_cutoff = cs_ptr->sum_all / 1000 * cutoff_perc;
-  else
-    /* Otherwise multiply first to get the correct value for small
-       values of sum_all.  */
-    cum_cutoff = (cs_ptr->sum_all * cutoff_perc) / 1000;
-
-  /* Next, walk through all the per-object structures and save each of
-     the count values in value_array.  */
-  index = 0;
-  value_array = (gcov_type *) malloc (sizeof (gcov_type) * cs_ptr->num);
-  for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next)
-    {
-      if (!gi_ptr->merge[t_ix])
-        continue;
-
-      /* Find the appropriate index into the gcov_ctr_info array
-         for the counter we are currently working on based on the
-         existence of the merge function pointer for this object.  */
-      for (i = 0, ctr_info_ix = 0; i < t_ix; i++)
-        {
-          if (gi_ptr->merge[i])
-            ctr_info_ix++;
-        }
-      for (f_ix = 0; f_ix != gi_ptr->n_functions; f_ix++)
-        {
-          gfi_ptr = gi_ptr->functions[f_ix];
-
-          if (!gfi_ptr || gfi_ptr->key != gi_ptr)
-            continue;
-
-          ci_ptr = &gfi_ptr->ctrs[ctr_info_ix];
-          /* Sanity check that there are enough entries in value_arry
-            for this function's counters. Gracefully handle the case when
-            there are not, in case something in the profile info is
-            corrupted.  */
-          c_num = ci_ptr->num;
-          if (index + c_num > cs_ptr->num)
-            c_num = cs_ptr->num - index;
-          /* Copy over this function's counter values.  */
-          memcpy (&value_array[index], ci_ptr->values,
-                  sizeof (gcov_type) * c_num);
-          index += c_num;
-        }
-    }
-
-  /* Sort all the counter values by descending value and finally
-     accumulate the values from hottest on down until reaching
-     the cutoff value computed earlier.  */
-  qsort (value_array, cs_ptr->num, sizeof (gcov_type),
-         sort_by_reverse_gcov_value);
-  for (cum = 0, c_num = 0; c_num < cs_ptr->num && cum < cum_cutoff; c_num++)
-    cum += value_array[c_num];
-  /* Record the number of counters required to reach the cutoff value.  */
-  cs_ptr->num_hot_counters = c_num;
-  free (value_array);
-}
-
 /* Compute object summary recored in gcov_info INFO. The result is
    stored in OBJ_SUM. Note that the caller is responsible for
    zeroing out OBJ_SUM, otherwise the summary is accumulated.  */
@@ -971,8 +929,6 @@  gcov_merge_gcda_file (struct gcov_info *gi_ptr)
            break;
 
          length = gcov_read_unsigned ();
-         if (length != GCOV_TAG_SUMMARY_LENGTH)
-           goto read_mismatch;
          gcov_read_summary (&tmp);
          if ((error = gcov_is_error ()))
            goto read_error;
@@ -1070,12 +1026,15 @@  rewrite:;
           {
             if (!cs_prg->runs++)
               cs_prg->num = cs_tprg->num;
-            if (cs_tprg->num_hot_counters > cs_prg->num_hot_counters)
-              cs_prg->num_hot_counters = cs_tprg->num_hot_counters;
             cs_prg->sum_all += cs_tprg->sum_all;
             if (cs_prg->run_max < cs_tprg->run_max)
               cs_prg->run_max = cs_tprg->run_max;
             cs_prg->sum_max += cs_tprg->run_max;
+            if (cs_prg->runs == 1)
+              memcpy (cs_prg->histogram, cs_tprg->histogram,
+                      sizeof (gcov_bucket_type) * GCOV_HISTOGRAM_SIZE);
+            else
+              gcov_histogram_merge (cs_prg->histogram, cs_tprg->histogram);
           }
         else if (cs_prg->runs)
           goto read_mismatch;
@@ -1084,7 +1043,13 @@  rewrite:;
           memcpy (cs_all, cs_prg, sizeof (*cs_all));
         else if (!all.checksum
                  && (!GCOV_LOCKED || cs_all->runs == cs_prg->runs)
-                 && memcmp (cs_all, cs_prg, sizeof (*cs_all)))
+                   /* Don't compare the histograms, which may have slight
+                      variations depending on the order they were updated
+                      due to the truncating integer divides used in the
+                      merge.  */
+                   && memcmp (cs_all, cs_prg,
+                              sizeof (*cs_all) - (sizeof (gcov_bucket_type)
+                                                  * GCOV_HISTOGRAM_SIZE)))
           {
             fprintf (stderr, "profiling:%s:Invocation mismatch - "
                 "some data files may have been removed%s\n",
@@ -1103,19 +1068,28 @@  rewrite:;
    the size is in units of gcov_type.  */
 
 GCOV_LINKAGE unsigned
-gcov_gcda_file_size (struct gcov_info *gi_ptr)
+gcov_gcda_file_size (struct gcov_info *gi_ptr,
+                     struct gcov_summary *sum)
 {
   unsigned size;
   const struct gcov_fn_info *fi_ptr;
-  unsigned f_ix, t_ix;
+  unsigned f_ix, t_ix, h_ix, h_cnt = 0;
   unsigned n_counts;
   const struct gcov_ctr_info *ci_ptr;
+  const struct gcov_ctr_summary *csum;
 
   /* GCOV_DATA_MAGIC, GCOV_VERSION and time_stamp.  */
   size = 3;
 
-  /* Program summary.  */
-  size += 2 + GCOV_TAG_SUMMARY_LENGTH;
+  /* Program summary, which depends on the number of non-zero
+     histogram entries.  */
+  csum = &sum->ctrs[GCOV_COUNTER_ARCS];
+  for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++)
+    {
+      if (csum->histogram[h_ix].num_counters > 0)
+        h_cnt++;
+    }
+  size += 2 + GCOV_TAG_SUMMARY_LENGTH(h_cnt);
 
   /* size for each function.  */
   for (f_ix = 0; f_ix < gi_ptr->n_functions; f_ix++)
@@ -1195,9 +1169,6 @@  gcov_write_gcda_file (struct gcov_info *gi_ptr)
         }
       eof_pos1 = gcov_position ();
     }
-    gcc_assert (!eof_pos ||
-                (eof_pos == gcov_position () && eof_pos1 == eof_pos));
-
     eof_pos = eof_pos1;
     /* Write the end marker  */
     gcov_write_unsigned (0);
@@ -1237,7 +1208,7 @@  gcov_exit_init (void)
          is FDO/LIPO.  */
       dump_module_info |= gi_ptr->mod_info->is_primary;
     }
-  gcov_compute_cutoff_values (&this_program);
+  gcov_compute_histogram (&this_program);
 
   gcov_alloc_filename ();
 
Index: gcc/configure
===================================================================
--- gcc/configure	(revision 191302)
+++ gcc/configure	(working copy)
@@ -11026,6 +11026,46 @@  $as_echo "#define HAVE_CLOCK_T 1" >>confdefs.h
 
 fi
 
+# Check if F_SETLKW is supported by fcntl.
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for F_SETLKW" >&5
+$as_echo_n "checking for F_SETLKW... " >&6; }
+if test "${ac_cv_f_setlkw+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+#include <fcntl.h>
+int
+main ()
+{
+
+struct flock fl;
+fl.l_whence = 0;
+fl.l_start = 0;
+fl.l_len = 0;
+fl.l_pid = 0;
+return fcntl (1, F_SETLKW, &fl);
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  ac_cv_f_setlkw=yes
+else
+  ac_cv_f_setlkw=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_f_setlkw" >&5
+$as_echo "$ac_cv_f_setlkw" >&6; }
+if test $ac_cv_f_setlkw = yes; then
+
+$as_echo "#define HOST_HAS_F_SETLKW 1" >>confdefs.h
+
+fi
+
 # Restore CFLAGS, CXXFLAGS from before the gcc_AC_NEED_DECLARATIONS tests.
 CFLAGS="$saved_CFLAGS"
 CXXFLAGS="$saved_CXXFLAGS"
@@ -18029,7 +18069,7 @@  else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18032 "configure"
+#line 18072 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18135,7 +18175,7 @@  else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18138 "configure"
+#line 18178 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
Index: gcc/gcov-io.c
===================================================================
--- gcc/gcov-io.c	(revision 191302)
+++ gcc/gcov-io.c	(working copy)
@@ -516,19 +516,49 @@  gcov_write_tag_length (gcov_unsigned_t tag, gcov_u
 GCOV_LINKAGE void
 gcov_write_summary (gcov_unsigned_t tag, const struct gcov_summary *summary)
 {
-  unsigned ix;
+  unsigned ix, h_ix, bv_ix, h_cnt = 0;
   const struct gcov_ctr_summary *csum;
+  unsigned histo_bitvector[GCOV_HISTOGRAM_BITVECTOR_SIZE];
 
-  gcov_write_tag_length (tag, GCOV_TAG_SUMMARY_LENGTH);
+  /* Count number of non-zero histogram entries, and fill in a bit vector
+     of non-zero indices. The histogram is only currently computed for arc
+     counters.  */
+  for (bv_ix = 0; bv_ix < GCOV_HISTOGRAM_BITVECTOR_SIZE; bv_ix++)
+    histo_bitvector[bv_ix] = 0;
+  csum = &summary->ctrs[GCOV_COUNTER_ARCS];
+  for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++)
+    {
+      if (csum->histogram[h_ix].num_counters > 0)
+        {
+          histo_bitvector[h_ix / 32] |= 1U << (h_ix % 32);
+          h_cnt++;
+        }
+    }
+  gcov_write_tag_length (tag, GCOV_TAG_SUMMARY_LENGTH(h_cnt));
   gcov_write_unsigned (summary->checksum);
   for (csum = summary->ctrs, ix = GCOV_COUNTERS_SUMMABLE; ix--; csum++)
     {
       gcov_write_unsigned (csum->num);
-      gcov_write_unsigned (csum->num_hot_counters);
       gcov_write_unsigned (csum->runs);
       gcov_write_counter (csum->sum_all);
       gcov_write_counter (csum->run_max);
       gcov_write_counter (csum->sum_max);
+      if (ix != GCOV_COUNTER_ARCS)
+        {
+          for (bv_ix = 0; bv_ix < GCOV_HISTOGRAM_BITVECTOR_SIZE; bv_ix++)
+            gcov_write_unsigned (0);
+          continue;
+        }
+      for (bv_ix = 0; bv_ix < GCOV_HISTOGRAM_BITVECTOR_SIZE; bv_ix++)
+        gcov_write_unsigned (histo_bitvector[bv_ix]);
+      for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++)
+        {
+          if (!csum->histogram[h_ix].num_counters)
+            continue;
+          gcov_write_unsigned (csum->histogram[h_ix].num_counters);
+          gcov_write_counter (csum->histogram[h_ix].min_value);
+          gcov_write_counter (csum->histogram[h_ix].cum_value);
+        }
     }
 }
 #endif /* IN_LIBGCOV */
@@ -635,18 +665,56 @@  gcov_read_string (void)
 GCOV_LINKAGE void
 gcov_read_summary (struct gcov_summary *summary)
 {
-  unsigned ix;
+  unsigned ix, h_ix, bv_ix, h_cnt = 0;
   struct gcov_ctr_summary *csum;
+  unsigned histo_bitvector[GCOV_HISTOGRAM_BITVECTOR_SIZE];
+  unsigned cur_bitvector;
 
   summary->checksum = gcov_read_unsigned ();
   for (csum = summary->ctrs, ix = GCOV_COUNTERS_SUMMABLE; ix--; csum++)
     {
       csum->num = gcov_read_unsigned ();
-      csum->num_hot_counters = gcov_read_unsigned ();
       csum->runs = gcov_read_unsigned ();
       csum->sum_all = gcov_read_counter ();
       csum->run_max = gcov_read_counter ();
       csum->sum_max = gcov_read_counter ();
+      memset (csum->histogram, 0,
+              sizeof (gcov_bucket_type) * GCOV_HISTOGRAM_SIZE);
+      for (bv_ix = 0; bv_ix < GCOV_HISTOGRAM_BITVECTOR_SIZE; bv_ix++)
+        {
+          histo_bitvector[bv_ix] = gcov_read_unsigned ();
+          h_cnt += __builtin_popcountll (histo_bitvector[bv_ix]);
+        }
+      bv_ix = 0;
+      h_ix = 0;
+      cur_bitvector = 0;
+      while (h_cnt--)
+        {
+          /* Find the index corresponding to the next entry we will read in.
+             First find the next non-zero bitvector and re-initialize
+             the histogram index accordingly, then right shift and increment
+             the index until we find a set bit.  */
+          while (!cur_bitvector)
+            {
+              h_ix = bv_ix * 32;
+              cur_bitvector = histo_bitvector[bv_ix++];
+              gcc_assert(bv_ix <= GCOV_HISTOGRAM_BITVECTOR_SIZE);
+            }
+          while (!(cur_bitvector & 0x1))
+            {
+              h_ix++;
+              cur_bitvector >>= 1;
+            }
+          gcc_assert(h_ix < GCOV_HISTOGRAM_SIZE);
+
+          csum->histogram[h_ix].num_counters = gcov_read_unsigned ();
+          csum->histogram[h_ix].min_value = gcov_read_counter ();
+          csum->histogram[h_ix].cum_value = gcov_read_counter ();
+          /* Shift off the index we are done with and increment to the
+             corresponding next histogram entry.  */
+          cur_bitvector >>= 1;
+          h_ix++;
+        }
     }
 }
 
@@ -872,6 +940,187 @@  gcov_time (void)
 }
 #endif /* IN_GCOV */
 
+#if IN_LIBGCOV || !IN_GCOV
+/* Determine the index into histogram for VALUE. */
+
+static unsigned
+gcov_histo_index(gcov_type value)
+{
+  gcov_type_unsigned v = (gcov_type_unsigned)value;
+  unsigned r = 0;
+  unsigned prev2bits = 0;
+
+  /* Find index into log2 scale histogram, where each of the log2
+     sized buckets is divided into 4 linear sub-buckets for better
+     focus in the higher buckets.  */
+
+  /* Find the place of the most-significant bit set.  */
+  if (v > 0)
+    r = 63 - __builtin_clzll (v);
+
+  /* If at most the 2 least significant bits are set (value is
+     0 - 3) then that value is our index into the lowest set of
+     four buckets.  */
+  if (r < 2)
+    return (unsigned)value;
+
+  gcc_assert (r < 64);
+
+  /* Find the two next most significant bits to determine which
+     of the four linear sub-buckets to select.  */
+  prev2bits = (v >> (r - 2)) & 0x3;
+  /* Finally, compose the final bucket index from the log2 index and
+     the next 2 bits. The minimum r value at this point is 2 since we
+     returned above if r was less than 2, so the minimum bucket at this
+     point is 4.  */
+  return (r - 1) * 4 + prev2bits;
+}
+
+/* Merge SRC_HISTO into TGT_HISTO. The counters are assumed to be in
+   the same relative order in both histograms, and are matched up
+   and merged in reverse order. Each counter is assigned an equal portion of
+   its entry's original cumulative counter value when computing the
+   new merged cum_value.  */
+
+static void gcov_histogram_merge(gcov_bucket_type *tgt_histo,
+                                 gcov_bucket_type *src_histo)
+{
+  int src_i, tgt_i, tmp_i = 0;
+  unsigned src_num, tgt_num, merge_num;
+  gcov_type src_cum, tgt_cum, merge_src_cum, merge_tgt_cum, merge_cum;
+  gcov_type merge_min;
+  gcov_bucket_type tmp_histo[GCOV_HISTOGRAM_SIZE];
+  int src_done = 0;
+
+  memset(tmp_histo, 0, sizeof (gcov_bucket_type) * GCOV_HISTOGRAM_SIZE);
+
+  /* Assume that the counters are in the same relative order in both
+     histograms. Walk the histograms from largest to smallest entry,
+     matching up and combining counters in order.  */
+  src_num = 0;
+  src_cum = 0;
+  src_i = GCOV_HISTOGRAM_SIZE - 1;
+  for (tgt_i = GCOV_HISTOGRAM_SIZE - 1; tgt_i >= 0 && !src_done; tgt_i--)
+    {
+      tgt_num = tgt_histo[tgt_i].num_counters;
+      tgt_cum = tgt_histo[tgt_i].cum_value;
+      /* Keep going until all of the target histogram's counters at this
+         position have been matched and merged with counters from the
+         source histogram.  */
+      while (tgt_num > 0 && !src_done)
+        {
+          /* If this is either the first time through this loop or we just
+             exhausted the previous non-zero source histogram entry, look
+             for the next non-zero source histogram entry.  */
+          if (!src_num)
+            {
+              /* Locate the next non-zero entry.  */
+              while (src_i >= 0 && !src_histo[src_i].num_counters)
+                src_i--;
+              /* If source histogram has fewer counters, then just copy over the
+                 remaining target counters and quit.  */
+              if (src_i < 0)
+                {
+                  tmp_histo[tgt_i].num_counters += tgt_num;
+                  tmp_histo[tgt_i].cum_value += tgt_cum;
+                  if (!tmp_histo[tgt_i].min_value ||
+                      tgt_histo[tgt_i].min_value < tmp_histo[tgt_i].min_value)
+                    tmp_histo[tgt_i].min_value = tgt_histo[tgt_i].min_value;
+                  while (--tgt_i >= 0)
+                    {
+                      tmp_histo[tgt_i].num_counters
+                          += tgt_histo[tgt_i].num_counters;
+                      tmp_histo[tgt_i].cum_value += tgt_histo[tgt_i].cum_value;
+                      if (!tmp_histo[tgt_i].min_value ||
+                          tgt_histo[tgt_i].min_value
+                          < tmp_histo[tgt_i].min_value)
+                        tmp_histo[tgt_i].min_value = tgt_histo[tgt_i].min_value;
+                    }
+
+                  src_done = 1;
+                  break;
+                }
+
+              src_num = src_histo[src_i].num_counters;
+              src_cum = src_histo[src_i].cum_value;
+            }
+
+          /* The number of counters to merge on this pass is the minimum
+             of the remaining counters from the current target and source
+             histogram entries.  */
+          merge_num = tgt_num;
+          if (src_num < merge_num)
+            merge_num = src_num;
+
+          /* The merged min_value is the sum of the min_values from target
+             and source.  */
+          merge_min = tgt_histo[tgt_i].min_value + src_histo[src_i].min_value;
+
+          /* Compute the portion of source and target entries' cum_value
+             that will be apportioned to the counters being merged.
+             The total remaining cum_value from each entry is divided
+             equally among the counters from that histogram entry if we
+             are not merging all of them.  */
+          merge_src_cum = src_cum;
+          if (merge_num < src_num)
+            merge_src_cum = merge_num * src_cum / src_num;
+          merge_tgt_cum = tgt_cum;
+          if (merge_num < tgt_num)
+            merge_tgt_cum = merge_num * tgt_cum / tgt_num;
+          /* The merged cum_value is the sum of the source and target
+             components.  */
+          merge_cum = merge_src_cum + merge_tgt_cum;
+
+          /* Update the remaining number of counters and cum_value left
+             to be merged from this source and target entry.  */
+          src_cum -= merge_src_cum;
+          tgt_cum -= merge_tgt_cum;
+          src_num -= merge_num;
+          tgt_num -= merge_num;
+
+          /* The merged counters get placed in the new merged histogram
+             at the entry for the merged min_value.  */
+          tmp_i = gcov_histo_index(merge_min);
+          gcc_assert (tmp_i < GCOV_HISTOGRAM_SIZE);
+          tmp_histo[tmp_i].num_counters += merge_num;
+          tmp_histo[tmp_i].cum_value += merge_cum;
+          if (!tmp_histo[tmp_i].min_value ||
+              merge_min < tmp_histo[tmp_i].min_value)
+            tmp_histo[tmp_i].min_value = merge_min;
+
+          /* Ensure the search for the next non-zero src_histo entry starts
+             at the next smallest histogram bucket.  */
+          if (!src_num)
+            src_i--;
+        }
+    }
+
+  gcc_assert (tgt_i < 0);
+
+  /* In the case where there were more counters in the source histogram,
+     accumulate the remaining unmerged cumulative counter values. Add
+     those to the smallest non-zero target histogram entry. Otherwise,
+     the total cumulative counter values in the histogram will be smaller
+     than the sum_all stored in the summary, which will complicate
+     computing the working set information from the histogram later on.  */
+  if (src_num)
+    src_i--;
+  while (src_i >= 0)
+    {
+      src_cum += src_histo[src_i].cum_value;
+      src_i--;
+    }
+  /* At this point, tmp_i should be the smallest non-zero entry in the
+     tmp_histo.  */
+  gcc_assert(tmp_i >= 0 && tmp_i < GCOV_HISTOGRAM_SIZE
+             && tmp_histo[tmp_i].num_counters > 0);
+  tmp_histo[tmp_i].cum_value += src_cum;
+
+  /* Finally, copy the merged histogram into tgt_histo.  */
+  memcpy(tgt_histo, tmp_histo, sizeof (gcov_bucket_type) * GCOV_HISTOGRAM_SIZE);
+}
+#endif /* IN_LIBGCOV || !IN_GCOV */
+
 #ifdef __GCOV_KERNEL__
 
 /* File fclose operation in kernel mode.  */
Index: gcc/gcov-io.h
===================================================================
--- gcc/gcov-io.h	(revision 191302)
+++ gcc/gcov-io.h	(working copy)
@@ -139,7 +139,9 @@  see the files COPYING3 and COPYING.RUNTIME respect
 	counts: header int64:count*
 	summary: int32:checksum {count-summary}GCOV_COUNTERS_SUMMABLE
 	count-summary:	int32:num int32:runs int64:sum
-			int64:max int64:sum_max
+			int64:max int64:sum_max histogram
+        histogram: {int32:bitvector}8 histogram-buckets*
+        histogram-buckets: int32:num int64:min int64:sum
 
    The ANNOUNCE_FUNCTION record is the same as that in the note file,
    but without the source location.  The COUNTS gives the
@@ -256,10 +258,12 @@  typedef unsigned gcov_unsigned_t __attribute__ ((m
 typedef unsigned gcov_position_t __attribute__ ((mode (SI)));
 #if LONG_LONG_TYPE_SIZE > 32
 typedef signed gcov_type __attribute__ ((mode (DI)));
+typedef unsigned gcov_type_unsigned __attribute__ ((mode (DI)));
 #define FUNC_ID_WIDTH 32
 #define FUNC_ID_MASK ((1ll << FUNC_ID_WIDTH) - 1)
 #else
 typedef signed gcov_type __attribute__ ((mode (SI)));
+typedef unsigned gcov_type_unsigned __attribute__ ((mode (SI)));
 #define FUNC_ID_WIDTH 16
 #define FUNC_ID_MASK ((1 << FUNC_ID_WIDTH) - 1)
 #endif
@@ -269,10 +273,12 @@  typedef unsigned gcov_unsigned_t __attribute__ ((m
 typedef unsigned gcov_position_t __attribute__ ((mode (HI)));
 #if LONG_LONG_TYPE_SIZE > 32
 typedef signed gcov_type __attribute__ ((mode (SI)));
+typedef unsigned gcov_type_unsigned __attribute__ ((mode (SI)));
 #define FUNC_ID_WIDTH 32
 #define FUNC_ID_MASK ((1ll << FUNC_ID_WIDTH) - 1)
 #else
 typedef signed gcov_type __attribute__ ((mode (HI)));
+typedef unsigned gcov_type_unsigned __attribute__ ((mode (HI)));
 #define FUNC_ID_WIDTH 16
 #define FUNC_ID_MASK ((1 << FUNC_ID_WIDTH) - 1)
 #endif
@@ -281,10 +287,12 @@  typedef unsigned gcov_unsigned_t __attribute__ ((m
 typedef unsigned gcov_position_t __attribute__ ((mode (QI)));
 #if LONG_LONG_TYPE_SIZE > 32
 typedef signed gcov_type __attribute__ ((mode (HI)));
+typedef unsigned gcov_type_unsigned __attribute__ ((mode (HI)));
 #define FUNC_ID_WIDTH 32
 #define FUNC_ID_MASK ((1ll << FUNC_ID_WIDTH) - 1)
 #else
 typedef signed gcov_type __attribute__ ((mode (QI)));
+typedef unsigned gcov_type_unsigned __attribute__ ((mode (QI)));
 #define FUNC_ID_WIDTH 16
 #define FUNC_ID_MASK ((1 << FUNC_ID_WIDTH) - 1)
 #endif
@@ -318,6 +326,7 @@  typedef unsigned gcov_position_t;
 #if IN_GCOV
 #define GCOV_LINKAGE static
 typedef HOST_WIDEST_INT gcov_type;
+typedef unsigned HOST_WIDEST_INT gcov_type_unsigned;
 #if IN_GCOV > 0
 #include <sys/types.h>
 #endif
@@ -439,8 +448,8 @@  typedef HOST_WIDEST_INT gcov_type;
 #define GCOV_TAG_COUNTER_NUM(LENGTH) ((LENGTH) / 2)
 #define GCOV_TAG_OBJECT_SUMMARY  ((gcov_unsigned_t)0xa1000000) /* Obsolete */
 #define GCOV_TAG_PROGRAM_SUMMARY ((gcov_unsigned_t)0xa3000000)
-#define GCOV_TAG_SUMMARY_LENGTH  \
-	(1 + GCOV_COUNTERS_SUMMABLE * (3 + 3 * 2))
+#define GCOV_TAG_SUMMARY_LENGTH(NUM)  \
+	(1 + GCOV_COUNTERS_SUMMABLE * (10 + 3 * 2) + (NUM) * 5)
 #define GCOV_TAG_PMU_LOAD_LATENCY_INFO ((gcov_unsigned_t)0xa5000000)
 #define GCOV_TAG_PMU_LOAD_LATENCY_LENGTH (15)
 #define GCOV_TAG_PMU_BRANCH_MISPREDICT_INFO ((gcov_unsigned_t)0xa7000000)
@@ -538,16 +547,39 @@  typedef HOST_WIDEST_INT gcov_type;
 
 /* Structured records.  */
 
+/* Structure used for each bucket of the log2 histogram of counter values.  */
+typedef struct
+{
+  /* Number of counters whose profile count falls within the bucket.  */
+  gcov_unsigned_t num_counters;
+  /* Smallest profile count included in this bucket.  */
+  gcov_type min_value;
+  /* Cumulative value of the profile counts in this bucket.  */
+  gcov_type cum_value;
+} gcov_bucket_type;
+
+/* For a log2 scale histogram with each range split into 4
+   linear sub-ranges, there will be at most 64 (max gcov_type bit size) - 1 log2
+   ranges since the lowest 2 log2 values share the lowest 4 linear
+   sub-range (values 0 - 3).  This is 252 total entries (63*4).  */
+
+#define GCOV_HISTOGRAM_SIZE 252
+
+/* How many unsigned ints are required to hold a bit vector of non-zero
+   histogram entries when the histogram is written to the gcov file.
+   This is essentially a ceiling divide by 32 bits.  */
+#define GCOV_HISTOGRAM_BITVECTOR_SIZE ((GCOV_HISTOGRAM_SIZE + 31) / 32)
+
 /* Cumulative counter data.  */
 struct gcov_ctr_summary
 {
   gcov_unsigned_t num;		/* number of counters.  */
-  gcov_unsigned_t num_hot_counters;/* number of counters to reach a given
-                                      percent of sum_all.  */
   gcov_unsigned_t runs;		/* number of program runs */
   gcov_type sum_all;		/* sum of all counters accumulated.  */
   gcov_type run_max;		/* maximum value on a single run.  */
   gcov_type sum_max;    	/* sum of individual run max values.  */
+  gcov_bucket_type histogram[GCOV_HISTOGRAM_SIZE]; /* histogram of
+                                                      counter values.  */
 };
 
 /* Object & program summary record.  */
@@ -952,7 +984,8 @@  static void gcov_rewrite (void);
 GCOV_LINKAGE void gcov_seek (gcov_position_t /*position*/) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE void gcov_truncate (void) ATTRIBUTE_HIDDEN;
 GCOV_LINKAGE gcov_unsigned_t gcov_string_length (const char *) ATTRIBUTE_HIDDEN;
-GCOV_LINKAGE unsigned gcov_gcda_file_size (struct gcov_info *);
+GCOV_LINKAGE unsigned gcov_gcda_file_size (struct gcov_info *,
+                                           struct gcov_summary *);
 #else
 /* Available outside libgcov */
 GCOV_LINKAGE void gcov_sync (gcov_position_t /*base*/,
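A note on the sizes in the gcov-io.h changes above: the constants 252, 8 and the "10 + 3 * 2" and "(NUM) * 5" terms of GCOV_TAG_SUMMARY_LENGTH can be checked with a small standalone sketch. The names histo_index, floor_log2_u64 and summary_length_words are hypothetical and are not part of the patch. The bucket indexing shown is only one scheme consistent with the "log2 scale with 4 linear sub-ranges" comment; it is an assumption, not a copy of gcov_histo_index.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the patch (GCOV_HISTOGRAM_SIZE and the ceiling
   divide for the bit vector).  The macro is fully parenthesized so it
   is safe inside larger expressions.  */
#define HISTOGRAM_SIZE 252
#define HISTOGRAM_BITVECTOR_SIZE ((HISTOGRAM_SIZE + 31) / 32)

/* Hypothetical helper: position of the most significant set bit,
   or 0 when V is 0.  */
static unsigned
floor_log2_u64 (uint64_t v)
{
  unsigned r = 0;
  while (v >>= 1)
    r++;
  return r;
}

/* Assumed indexing scheme: values 0-3 map to buckets 0-3; above that,
   each power-of-two range is split into 4 linear sub-buckets chosen
   by the two bits below the most significant bit.  */
static unsigned
histo_index (uint64_t v)
{
  unsigned r = floor_log2_u64 (v);
  unsigned idx;

  if (r < 2)
    return (unsigned) v;
  idx = (r - 1) * 4 + (unsigned) ((v >> (r - 2)) & 0x3);
  assert (idx < HISTOGRAM_SIZE);
  return idx;
}

/* Summary record length in 32-bit words, following the file format
   comment: one checksum word, then per summable counter two int32
   fields (num, runs), three int64 fields (2 words each) and the bit
   vector, plus 5 words (int32 + 2 * int64) per non-zero bucket.  */
static unsigned
summary_length_words (unsigned n_summable, unsigned n_nonzero_buckets)
{
  unsigned per_counter = 2 + HISTOGRAM_BITVECTOR_SIZE + 3 * 2;
  return 1 + n_summable * per_counter + n_nonzero_buckets * 5;
}
```

Under this scheme the largest 64-bit value lands in bucket 251, which is why 252 entries suffice, and 252 bits need 8 unsigned ints, which together with num and runs gives the 10 in the updated length macro.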
Index: gcc/config.in
===================================================================
--- gcc/config.in	(revision 191302)
+++ gcc/config.in	(working copy)
@@ -1594,6 +1594,12 @@ 
 #endif
 
 
+/* Define if F_SETLKW supported by fcntl. */
+#ifndef USED_FOR_TARGET
+#undef HOST_HAS_F_SETLKW
+#endif
+
+
 /* Define as const if the declaration of iconv() needs const. */
 #ifndef USED_FOR_TARGET
 #undef ICONV_CONST
Index: gcc/configure.ac
===================================================================
--- gcc/configure.ac	(revision 191302)
+++ gcc/configure.ac	(working copy)
@@ -1217,6 +1217,22 @@  if test $gcc_cv_type_clock_t = yes; then
   [Define if <time.h> defines clock_t.])
 fi
 
+# Check if F_SETLKW is supported by fcntl.
+AC_CACHE_CHECK(for F_SETLKW, ac_cv_f_setlkw, [
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <fcntl.h>]], [[
+struct flock fl;
+fl.l_whence = 0;
+fl.l_start = 0;
+fl.l_len = 0;
+fl.l_pid = 0;
+return fcntl (1, F_SETLKW, &fl);]])],
+[ac_cv_f_setlkw=yes],[ac_cv_f_setlkw=no])])
+if test $ac_cv_f_setlkw = yes; then
+  AC_DEFINE(HOST_HAS_F_SETLKW, 1,
+  [Define if F_SETLKW supported by fcntl.])
+fi
+
 # Restore CFLAGS, CXXFLAGS from before the gcc_AC_NEED_DECLARATIONS tests.
 CFLAGS="$saved_CFLAGS"
 CXXFLAGS="$saved_CXXFLAGS"
Index: gcc/profile.c
===================================================================
--- gcc/profile.c	(revision 191302)
+++ gcc/profile.c	(working copy)
@@ -87,6 +87,15 @@  struct bb_info {
 
 const struct gcov_ctr_summary *profile_info;
 
+/* Number of data points in the working set summary array. Using 128
+   provides information for at least every 1% increment of the total
+   profile size. The last entry is hardwired to 99.9% of the total.  */
+#define NUM_GCOV_WORKING_SETS 128
+
+/* Counter working set information computed from the current counter
+   summary. Not initialized unless profile_info summary is non-NULL.  */
+static gcov_working_set_t gcov_working_sets[NUM_GCOV_WORKING_SETS];
+
 /* Collect statistics on the performance of this pass for the entire source
    file.  */
 
@@ -239,6 +248,152 @@  instrument_values (histogram_values values)
 }
 
 
+/* Compute the working set information from the counter histogram in
+   the profile summary. This is an array of information corresponding to a
+   range of percentages of the total execution count (sum_all), and includes
+   the number of counters required to cover that working set percentage and
+   the minimum counter value in that working set.  */
+
+static void
+compute_working_sets (void)
+{
+  gcov_type working_set_cum_values[NUM_GCOV_WORKING_SETS];
+  gcov_type ws_cum_hotness_incr;
+  gcov_type cum, tmp_cum;
+  const gcov_bucket_type *histo_bucket;
+  unsigned ws_ix, c_num, count, pctinc, pct;
+  int h_ix;
+  gcov_working_set_t *ws_info;
+
+  if (!profile_info)
+    return;
+
+  /* Compute the amount of sum_all that the cumulative hotness grows
+     by in each successive working set entry, which depends on the
+     number of working set entries.  */
+  ws_cum_hotness_incr = profile_info->sum_all / NUM_GCOV_WORKING_SETS;
+
+  /* Next fill in an array of the cumulative hotness values corresponding
+     to each working set summary entry we are going to compute below.
+     Skip 0% statistics, which can be extrapolated from the
+     rest of the summary data.  */
+  cum = ws_cum_hotness_incr;
+  for (ws_ix = 0; ws_ix < NUM_GCOV_WORKING_SETS;
+       ws_ix++, cum += ws_cum_hotness_incr)
+    working_set_cum_values[ws_ix] = cum;
+  /* The last summary entry is reserved for (roughly) 99.9% of the
+     working set. Divide by 1024 so it becomes a shift, which gives
+     almost exactly 99.9%.  */
+  working_set_cum_values[NUM_GCOV_WORKING_SETS-1]
+      = profile_info->sum_all - profile_info->sum_all/1024;
+
+  /* Next, walk through the histogram in descending order of hotness
+     and compute the statistics for the working set summary array.
+     As histogram entries are accumulated, we check to see which
+     working set entries have had their expected cum_value reached
+     and fill them in, walking the working set entries in increasing
+     size of cum_value.  */
+  ws_ix = 0; /* The current entry into the working set array.  */
+  cum = 0; /* The current accumulated counter sum.  */
+  count = 0; /* The current accumulated count of block counters.  */
+  for (h_ix = GCOV_HISTOGRAM_SIZE - 1;
+       h_ix >= 0 && ws_ix < NUM_GCOV_WORKING_SETS; h_ix--)
+    {
+      histo_bucket = &profile_info->histogram[h_ix];
+
+      /* If we haven't reached the required cumulative counter value for
+         the current working set percentage, simply accumulate this histogram
+         entry into the running sums and continue to the next histogram
+         entry.  */
+      if (cum + histo_bucket->cum_value < working_set_cum_values[ws_ix])
+        {
+          cum += histo_bucket->cum_value;
+          count += histo_bucket->num_counters;
+          continue;
+        }
+
+      /* If adding the current histogram entry's cumulative counter value
+         causes us to exceed the current working set size, then estimate
+         how many of this histogram entry's counter values are required to
+         reach the working set size, and fill in working set entries
+         as we reach their expected cumulative value.  */
+      for (c_num = 0, tmp_cum = cum;
+           c_num < histo_bucket->num_counters && ws_ix < NUM_GCOV_WORKING_SETS;
+           c_num++)
+        {
+          count++;
+          /* If we haven't reached the last histogram entry counter, add
+             in the minimum value again. This will underestimate the
+             cumulative sum so far, because many of the counter values in this
+             entry may have been larger than the minimum. We could add in the
+             average value every time, but that would require an expensive
+             divide operation.  */
+          if (c_num + 1 < histo_bucket->num_counters)
+            tmp_cum += histo_bucket->min_value;
+          /* If we have reached the last histogram entry counter, then add
+             in the entire cumulative value.  */
+          else
+            tmp_cum = cum + histo_bucket->cum_value;
+
+          /* Next walk through successive working set entries and fill in
+            the statistics for any whose size we have reached by accumulating
+            this histogram counter.  */
+          while (ws_ix < NUM_GCOV_WORKING_SETS
+                 && tmp_cum >= working_set_cum_values[ws_ix])
+            {
+              gcov_working_sets[ws_ix].num_counters = count;
+              gcov_working_sets[ws_ix].min_counter
+                  = histo_bucket->min_value;
+              ws_ix++;
+            }
+        }
+      /* Finally, update the running cumulative value since we were
+         using a temporary above.  */
+      cum += histo_bucket->cum_value;
+    }
+  gcc_assert (ws_ix == NUM_GCOV_WORKING_SETS);
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "Counter working sets:\n");
+      /* Multiply the percentage by 100 to avoid float.  */
+      pctinc = 100 * 100 / NUM_GCOV_WORKING_SETS;
+      for (ws_ix = 0, pct = pctinc; ws_ix < NUM_GCOV_WORKING_SETS;
+           ws_ix++, pct += pctinc)
+        {
+          if (ws_ix == NUM_GCOV_WORKING_SETS - 1)
+            pct = 9990;
+          ws_info = &gcov_working_sets[ws_ix];
+          /* Print out the percentage using int arithmetic to avoid float.  */
+          fprintf (dump_file, "\t\t%u.%02u%%: num counts=%u, min counter="
+                   HOST_WIDEST_INT_PRINT_DEC "\n",
+                   pct / 100, pct - (pct / 100 * 100),
+                   ws_info->num_counters,
+                   (HOST_WIDEST_INT)ws_info->min_counter);
+        }
+    }
+}
+
+/* Given the desired percentage of the full profile (sum_all from the
+   summary), multiplied by 10 in PCT_TIMES_10 to avoid float, return
+   the corresponding working set information. If an exact match for
+   the percentage isn't found, the closest value is used.  */
+
+gcov_working_set_t *
+find_working_set (unsigned pct_times_10)
+{
+  unsigned i;
+  if (!profile_info)
+    return NULL;
+  gcc_assert (pct_times_10 <= 1000);
+  if (pct_times_10 >= 999)
+    return &gcov_working_sets[NUM_GCOV_WORKING_SETS - 1];
+  i = pct_times_10 * NUM_GCOV_WORKING_SETS / 1000;
+  if (!i)
+    return &gcov_working_sets[0];
+  return &gcov_working_sets[i - 1];
+}
+
 /* Computes hybrid profile for all matching entries in da_file.  
    
    CFG_CHECKSUM is the precomputed checksum for the CFG.  */
@@ -266,6 +421,8 @@  get_exec_counts (unsigned cfg_checksum, unsigned l
   if (!counts)
     return NULL;
 
+  compute_working_sets ();
+
   if (dump_file && profile_info)
     fprintf(dump_file, "Merged %u profiles with maximal count %u.\n",
 	    profile_info->runs, (unsigned) profile_info->sum_max);
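For reference, the index arithmetic of find_working_set and the cumulative targets filled in by compute_working_sets in the profile.c changes above can be exercised in isolation. This is a minimal sketch under stated assumptions: working_set_index and working_set_target are hypothetical names, long long stands in for gcov_type, and plain assert replaces gcc_assert.

```c
#include <assert.h>

/* Mirrors NUM_GCOV_WORKING_SETS from the patch.  */
#define NUM_WORKING_SETS 128

/* Hypothetical helper: the array slot find_working_set would return
   for a percentage scaled by 10 (e.g. 999 means 99.9%).  */
static unsigned
working_set_index (unsigned pct_times_10)
{
  unsigned i;

  assert (pct_times_10 <= 1000);
  /* The last slot is reserved for roughly 99.9% of sum_all.  */
  if (pct_times_10 >= 999)
    return NUM_WORKING_SETS - 1;
  i = pct_times_10 * NUM_WORKING_SETS / 1000;
  /* Slot 0 holds the first increment, so shift down by one.  */
  return i ? i - 1 : 0;
}

/* Hypothetical helper: the cumulative counter sum that slot WS_IX is
   expected to cover.  Slots grow by sum_all / 128 each; the last one
   is sum_all minus sum_all / 1024.  */
static long long
working_set_target (long long sum_all, unsigned ws_ix)
{
  assert (ws_ix < NUM_WORKING_SETS);
  if (ws_ix == NUM_WORKING_SETS - 1)
    return sum_all - sum_all / 1024;
  return (long long) (ws_ix + 1) * (sum_all / NUM_WORKING_SETS);
}
```

With sum_all = 1024000, a request for 50.0% (pct_times_10 = 500) maps to slot 63, whose target is 512000, exactly half of the total; the 99.9% request used by code_size_limit_factor maps to slot 127 with target 1023000.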
Index: gcc/loop-unroll.c
===================================================================
--- gcc/loop-unroll.c	(revision 191302)
+++ gcc/loop-unroll.c	(working copy)
@@ -194,12 +194,18 @@  report_unroll_peel(struct loop *loop, location_t l
 static int
 code_size_limit_factor(struct loop *loop)
 {
-  unsigned size_threshold;
+  unsigned size_threshold, num_hot_counters;
   struct niter_desc *desc = get_simple_loop_desc (loop);
   gcov_type sum_to_header_ratio;
   int hotness_ratio_threshold;
   int limit_factor;
+  gcov_working_set_t *ws;
 
+  ws = find_working_set (999);
+  if (!ws)
+    return 1;
+  num_hot_counters = ws->num_counters;
+
   /* First check if the application has a large codesize footprint.
      This is estimated from FDO profile summary information for the
      program, where the num_hot_counters indicates the number of hottest
@@ -208,7 +214,7 @@  code_size_limit_factor(struct loop *loop)
      profile where icache misses may be a concern.  */
   size_threshold = PARAM_VALUE (PARAM_UNROLLPEEL_CODESIZE_THRESHOLD);
   if (!profile_info
-      || profile_info->num_hot_counters <= size_threshold
+      || num_hot_counters <= size_threshold
       || !profile_info->sum_all)
     return 1;
 
@@ -223,7 +229,7 @@  code_size_limit_factor(struct loop *loop)
   /* Next, set the value of the codesize-based unroll factor divisor which in
      most loops will need to be set to a value that will reduce or eliminate
      unrolling/peeling.  */
-  if (profile_info->num_hot_counters < size_threshold * 2
+  if (num_hot_counters < size_threshold * 2
       && loop->header->count > 0)
     {
       /* For applications that are less than twice the codesize limit, allow
@@ -231,7 +237,7 @@  code_size_limit_factor(struct loop *loop)
       sum_to_header_ratio = profile_info->sum_all / loop->header->count;
       hotness_ratio_threshold = PARAM_VALUE (PARAM_UNROLLPEEL_HOTNESS_THRESHOLD);
       /* When the profile count sum to loop entry header ratio is smaller than
-         the threshold (i.e. the loop entry is hot enough, the divisor is set
+         the threshold (i.e. the loop entry is hot enough), the divisor is set
          to 1 so the unroll/peel factor is not reduced. When it is bigger
          than the ratio, increase the divisor by the amount this ratio
          is over the threshold, which will quickly reduce the unroll/peel
Index: gcc/coverage.c
===================================================================
--- gcc/coverage.c	(revision 191302)
+++ gcc/coverage.c	(working copy)
@@ -529,14 +529,19 @@  read_counts_file (const char *da_file_name, unsign
 	  gcov_read_summary (&sum);
 	  for (ix = 0; ix != GCOV_COUNTERS_SUMMABLE; ix++)
 	    {
-	      summary.ctrs[ix].num_hot_counters
-                  += sum.ctrs[ix].num_hot_counters;
 	      summary.ctrs[ix].runs += sum.ctrs[ix].runs;
 	      summary.ctrs[ix].sum_all += sum.ctrs[ix].sum_all;
 	      if (summary.ctrs[ix].run_max < sum.ctrs[ix].run_max)
 		summary.ctrs[ix].run_max = sum.ctrs[ix].run_max;
 	      summary.ctrs[ix].sum_max += sum.ctrs[ix].sum_max;
 	    }
+          if (new_summary)
+            memcpy (summary.ctrs[GCOV_COUNTER_ARCS].histogram,
+                    sum.ctrs[GCOV_COUNTER_ARCS].histogram,
+                    sizeof (gcov_bucket_type) * GCOV_HISTOGRAM_SIZE);
+          else
+            gcov_histogram_merge (summary.ctrs[GCOV_COUNTER_ARCS].histogram,
+                                  sum.ctrs[GCOV_COUNTER_ARCS].histogram);
 	  new_summary = 0;
 	}
       else if (GCOV_TAG_IS_COUNTER (tag) && fn_ident)
@@ -558,8 +563,9 @@  read_counts_file (const char *da_file_name, unsign
 	      entry->ctr = elt.ctr;
 	      entry->lineno_checksum = lineno_checksum;
 	      entry->cfg_checksum = cfg_checksum;
-	      entry->summary = summary.ctrs[elt.ctr];
-	      entry->summary.num = n_counts;
+              if (elt.ctr < GCOV_COUNTERS_SUMMABLE)
+                entry->summary = summary.ctrs[elt.ctr];
+              entry->summary.num = n_counts;
 	      entry->counts = XCNEWVEC (gcov_type, n_counts);
 	    }
 	  else if (entry->lineno_checksum != lineno_checksum
Index: gcc/basic-block.h
===================================================================
--- gcc/basic-block.h	(revision 191302)
+++ gcc/basic-block.h	(working copy)
@@ -31,6 +31,7 @@  along with GCC; see the file COPYING3.  If not see
    flow graph is manipulated by various optimizations.  A signed type
    makes those easy to detect.  */
 typedef HOST_WIDEST_INT gcov_type;
+typedef unsigned HOST_WIDEST_INT gcov_type_unsigned;
 
 /* Control flow edge information.  */
 struct GTY(()) edge_def {
@@ -98,6 +99,16 @@  DEF_VEC_ALLOC_P(edge,heap);
    profile.c.  */
 extern const struct gcov_ctr_summary *profile_info;
 
+/* Working set size statistics for a given percentage of the entire
+   profile (sum_all from the counter summary).  */
+typedef struct gcov_working_set_info
+{
+  /* Number of hot counters included in this working set.  */
+  unsigned num_counters;
+  /* Smallest counter included in this working set.  */
+  gcov_type min_counter;
+} gcov_working_set_t;
+
 /* Declared in cfgloop.h.  */
 struct loop;
 
@@ -942,4 +953,7 @@  extern void rtl_profile_for_bb (basic_block);
 extern void rtl_profile_for_edge (edge);
 extern void default_rtl_profile (void);
 
+/* In profile.c.  */
+extern gcov_working_set_t *find_working_set (unsigned pct_times_10);
+
 #endif /* GCC_BASIC_BLOCK_H */
Index: gcc/gcov-dump.c
===================================================================
--- gcc/gcov-dump.c	(revision 191302)
+++ gcc/gcov-dump.c	(working copy)
@@ -520,7 +520,8 @@  tag_summary (const char *filename ATTRIBUTE_UNUSED
 	     unsigned tag ATTRIBUTE_UNUSED, unsigned length ATTRIBUTE_UNUSED)
 {
   struct gcov_summary summary;
-  unsigned ix;
+  unsigned ix, h_ix;
+  gcov_bucket_type *histo_bucket;
 
   gcov_read_summary (&summary);
   printf (" checksum=0x%08x", summary.checksum);
@@ -529,9 +530,8 @@  tag_summary (const char *filename ATTRIBUTE_UNUSED
     {
       printf ("\n");
       print_prefix (filename, 0, 0);
-      printf ("\t\tcounts=%u (num hot counts=%u), runs=%u",
+      printf ("\t\tcounts=%u, runs=%u",
 	      summary.ctrs[ix].num,
-	      summary.ctrs[ix].num_hot_counters,
 	      summary.ctrs[ix].runs);
 
       printf (", sum_all=" HOST_WIDEST_INT_PRINT_DEC,
@@ -540,6 +540,25 @@  tag_summary (const char *filename ATTRIBUTE_UNUSED
 	      (HOST_WIDEST_INT)summary.ctrs[ix].run_max);
       printf (", sum_max=" HOST_WIDEST_INT_PRINT_DEC,
 	      (HOST_WIDEST_INT)summary.ctrs[ix].sum_max);
+      if (ix != GCOV_COUNTER_ARCS)
+        continue;
+      printf ("\n");
+      print_prefix (filename, 0, 0);
+      printf ("\t\tcounter histogram:");
+      for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++)
+        {
+          histo_bucket = &summary.ctrs[ix].histogram[h_ix];
+          if (!histo_bucket->num_counters)
+            continue;
+          printf ("\n");
+          print_prefix (filename, 0, 0);
+          printf ("\t\t%u: num counts=%u, min counter="
+              HOST_WIDEST_INT_PRINT_DEC ", cum_counter="
+              HOST_WIDEST_INT_PRINT_DEC,
+	      h_ix, histo_bucket->num_counters,
+              (HOST_WIDEST_INT)histo_bucket->min_value,
+              (HOST_WIDEST_INT)histo_bucket->cum_value);
+        }
     }
 }