diff mbox series

[ovs-dev] dpif-netdev-perf: Clarify frequency number.

Message ID 20180925160040.7039-1-i.maximets@samsung.com
State Changes Requested
Delegated to: Ian Stokes
Headers show
Series [ovs-dev] dpif-netdev-perf: Clarify frequency number. | expand

Commit Message

Ilya Maximets Sept. 25, 2018, 4 p.m. UTC
'dpif-netdev/pmd-perf-show' command prints the frequency number
calculated from the total number of cycles spent for iterations
for the measured period. This number could be confusing, because
users may think that it should be equal to CPU frequency, especially
on non-x86 systems where TSC frequency likely does not match with
CPU one.

Moreover, counted TSC cycles could differ from the HW TSC cycles
in case of a large number of PMD reloads, because cycles spent
outside of the main polling loop are not taken into account anywhere.
In this case the frequency will not match even TSC frequency.

Let's clarify the meaning in order to avoid this misunderstanding.
'Cycles' replaced with 'Used TSC cycles', which describes how many TSC
cycles consumed by the main polling loop. % of the total TSC cycles
now printed instead of GHz frequency, because GHz is unclear for
understanding, especially without knowing the exact TSC frequency.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---
 lib/dpif-netdev-perf.c      | 36 ++++++++++++++++++------------------
 lib/dpif-netdev-unixctl.man | 28 +++++++++++++++-------------
 2 files changed, 33 insertions(+), 31 deletions(-)

Comments

Flavio Leitner Sept. 27, 2018, 5:40 p.m. UTC | #1
On Tue, Sep 25, 2018 at 07:00:40PM +0300, Ilya Maximets wrote:
> 'dpif-netdev/pmd-perf-show' command prints the frequency number
> calculated from the total number of cycles spent for iterations
> for the measured period. This number could be confusing, because
> users may think that it should be equal to CPU frequency, especially
> on non-x86 systems where TSC frequency likely does not match with
> CPU one.
> 
> Moreover, counted TSC cycles could differ from the HW TSC cycles
> in case of a large number of PMD reloads, because cycles spent
> outside of the main polling loop are not taken into account anywhere.
> In this case the frequency will not match even TSC frequency.
> 
> Let's clarify the meaning in order to avoid this misunderstanding.
> 'Cycles' replaced with 'Used TSC cycles', which describes how many TSC
> cycles consumed by the main polling loop. % of the total TSC cycles
> now printed instead of GHz frequency, because GHz is unclear for
> understanding, especially without knowing the exact TSC frequency.
> 
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---

LGTM
Acked-by: Flavio Leitner <fbl@sysclose.org>
Stokes, Ian Oct. 10, 2018, 3:43 p.m. UTC | #2
> On Tue, Sep 25, 2018 at 07:00:40PM +0300, Ilya Maximets wrote:
> > 'dpif-netdev/pmd-perf-show' command prints the frequency number
> > calculated from the total number of cycles spent for iterations for
> > the measured period. This number could be confusing, because users may
> > think that it should be equal to CPU frequency, especially on non-x86
> > systems where TSC frequency likely does not match with CPU one.
> >
> > Moreover, counted TSC cycles could differ from the HW TSC cycles in
> > case of a large number of PMD reloads, because cycles spent outside of
> > the main polling loop are not taken into account anywhere.
> > In this case the frequency will not match even TSC frequency.
> >
> > Let's clarify the meaning in order to avoid this misunderstanding.
> > 'Cycles' replaced with 'Used TSC cycles', which describes how many TSC
> > cycles consumed by the main polling loop. % of the total TSC cycles
> > now printed instead of GHz frequency, because GHz is unclear for
> > understanding, especially without knowing the exact TSC frequency.
> >
> > Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> > ---
> 
> LGTM
> Acked-by: Flavio Leitner <fbl@sysclose.org>
> 

Thanks Ilya, looks good from my side also. I'll apply this as part of this week's pull request.

Thanks
Ian
Stokes, Ian Oct. 10, 2018, 4:34 p.m. UTC | #3
> 'dpif-netdev/pmd-perf-show' command prints the frequency number calculated
> from the total number of cycles spent for iterations for the measured
> period. This number could be confusing, because users may think that it
> should be equal to CPU frequency, especially on non-x86 systems where TSC
> frequency likely does not match with CPU one.
> 
> Moreover, counted TSC cycles could differ from the HW TSC cycles in case
> of a large number of PMD reloads, because cycles spent outside of the main
> polling loop are not taken into account anywhere.
> In this case the frequency will not match even TSC frequency.
> 
> Let's clarify the meaning in order to avoid this misunderstanding.
> 'Cycles' replaced with 'Used TSC cycles', which describes how many TSC
> cycles consumed by the main polling loop. % of the total TSC cycles now
> printed instead of GHz frequency, because GHz is unclear for
> understanding, especially without knowing the exact TSC frequency.
> 

Hi Ilya,

I had signed off on this patch but I just spotted its causing the travis build for OS X to fail. It seems .EX and .EE macros are undefined when used in the manpages.

vswitchd/ovs-vswitchd.8:992: warning: macro `EX' not defined
vswitchd/ovs-vswitchd.8:1010: warning: macro `EE' not defined

Digging through some info on .EX and .EE its reported that they are supported on many but not all Unix based systems which may be the issue here. Thoughts?

https://travis-ci.org/istokes/ovs/jobs/439579443

Ian


> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
>  lib/dpif-netdev-perf.c      | 36 ++++++++++++++++++------------------
>  lib/dpif-netdev-unixctl.man | 28 +++++++++++++++-------------
>  2 files changed, 33 insertions(+), 31 deletions(-)
> 
> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index
> 13f1010c9..8ff4091e5 100644
> --- a/lib/dpif-netdev-perf.c
> +++ b/lib/dpif-netdev-perf.c
> @@ -170,7 +170,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct
> pmd_perf_stats *s,
>                                double duration)  {
>      uint64_t stats[PMD_N_STATS];
> -    double us_per_cycle = 1000000.0 / get_tsc_hz();
> +    uint64_t tsc_hz = get_tsc_hz();
> +    double us_per_cycle = 1000000.0 / tsc_hz;
> 
>      if (duration == 0) {
>          return;
> @@ -191,25 +192,25 @@ pmd_perf_format_overall_stats(struct ds *str, struct
> pmd_perf_stats *s,
>      uint64_t busy_iter = tot_iter >= idle_iter ? tot_iter - idle_iter :
> 0;
> 
>      ds_put_format(str,
> -            "  Cycles:          %12"PRIu64"  (%.2f GHz)\n"
> -            "  Iterations:      %12"PRIu64"  (%.2f us/it)\n"
> -            "  - idle:          %12"PRIu64"  (%4.1f %% cycles)\n"
> -            "  - busy:          %12"PRIu64"  (%4.1f %% cycles)\n",
> -            tot_cycles, (tot_cycles / duration) / 1E9,
> +            "  Iterations:        %12"PRIu64"  (%.2f us/it)\n"
> +            "  - Used TSC cycles: %12"PRIu64"  (%5.1f %% of total
> cycles)\n"
> +            "  - idle iterations: %12"PRIu64"  (%5.1f %% of used
> cycles)\n"
> +            "  - busy iterations: %12"PRIu64"  (%5.1f %% of used
> + cycles)\n",
>              tot_iter, tot_cycles * us_per_cycle / tot_iter,
> +            tot_cycles, 100.0 * (tot_cycles / duration) / tsc_hz,
>              idle_iter,
>              100.0 * stats[PMD_CYCLES_ITER_IDLE] / tot_cycles,
>              busy_iter,
>              100.0 * stats[PMD_CYCLES_ITER_BUSY] / tot_cycles);
>      if (rx_packets > 0) {
>          ds_put_format(str,
> -            "  Rx packets:      %12"PRIu64"  (%.0f Kpps, %.0f
> cycles/pkt)\n"
> -            "  Datapath passes: %12"PRIu64"  (%.2f passes/pkt)\n"
> -            "  - EMC hits:      %12"PRIu64"  (%4.1f %%)\n"
> -            "  - Megaflow hits: %12"PRIu64"  (%4.1f %%, %.2f subtbl
> lookups/"
> -
> "hit)\n"
> -            "  - Upcalls:       %12"PRIu64"  (%4.1f %%, %.1f
> us/upcall)\n"
> -            "  - Lost upcalls:  %12"PRIu64"  (%4.1f %%)\n",
> +            "  Rx packets:        %12"PRIu64"  (%.0f Kpps, %.0f
> cycles/pkt)\n"
> +            "  Datapath passes:   %12"PRIu64"  (%.2f passes/pkt)\n"
> +            "  - EMC hits:        %12"PRIu64"  (%5.1f %%)\n"
> +            "  - Megaflow hits:   %12"PRIu64"  (%5.1f %%, %.2f "
> +                                                "subtbl lookups/hit)\n"
> +            "  - Upcalls:         %12"PRIu64"  (%5.1f %%, %.1f
> us/upcall)\n"
> +            "  - Lost upcalls:    %12"PRIu64"  (%5.1f %%)\n",
>              rx_packets, (rx_packets / duration) / 1000,
>              1.0 * stats[PMD_CYCLES_ITER_BUSY] / rx_packets,
>              passes, rx_packets ? 1.0 * passes / rx_packets : 0, @@ -
> 225,17 +226,16 @@ pmd_perf_format_overall_stats(struct ds *str, struct
> pmd_perf_stats *s,
>              stats[PMD_STAT_LOST],
>              100.0 * stats[PMD_STAT_LOST] / passes);
>      } else {
> -        ds_put_format(str, "  Rx packets:      %12d\n", 0);
> +        ds_put_format(str, "  Rx packets:        %12d\n", 0);
>      }
>      if (tx_packets > 0) {
>          ds_put_format(str,
> -            "  Tx packets:      %12"PRIu64"  (%.0f Kpps)\n"
> -            "  Tx batches:      %12"PRIu64"  (%.2f pkts/batch)"
> -            "\n",
> +            "  Tx packets:        %12"PRIu64"  (%.0f Kpps)\n"
> +            "  Tx batches:        %12"PRIu64"  (%.2f pkts/batch)\n",
>              tx_packets, (tx_packets / duration) / 1000,
>              tx_batches, 1.0 * tx_packets / tx_batches);
>      } else {
> -        ds_put_format(str, "  Tx packets:      %12d\n\n", 0);
> +        ds_put_format(str, "  Tx packets:        %12d\n\n", 0);
>      }
>  }
> 
> diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
> index ab8619e41..53f33d83f 100644
> --- a/lib/dpif-netdev-unixctl.man
> +++ b/lib/dpif-netdev-unixctl.man
> @@ -123,23 +123,25 @@ Display the last ms_len millisecond stats  The
> output always contains the following global PMD statistics:
>  .RS
>  .IP
> -Time: 15:24:55.270 .br
> +.EX
> +Time: 15:24:55.270
>  Measurement duration: 1.008 s
> 
>  pmd thread numa_id 0 core_id 1:
> 
> -  Cycles:            2419034712  (2.40 GHz)
> -  Iterations:            572817  (1.76 us/it)
> -  - idle:                486808  (15.9 % cycles)
> -  - busy:                 86009  (84.1 % cycles)
> -  Rx packets:           2399607  (2381 Kpps, 848 cycles/pkt)
> -  Datapath passes:      3599415  (1.50 passes/pkt)
> -  - EMC hits:            336472  ( 9.3 %)
> -  - Megaflow hits:      3262943  (90.7 %, 1.00 subtbl lookups/hit)
> -  - Upcalls:                  0  ( 0.0 %, 0.0 us/upcall)
> -  - Lost upcalls:             0  ( 0.0 %)
> -  Tx packets:           2399607  (2381 Kpps)
> -  Tx batches:            171400  (14.00 pkts/batch)
> +  Iterations:              572817  (1.76 us/it)
> +  - Used TSC cycles:   2419034712  ( 99.9 % of total cycles)
> +  - idle iterations:       486808  ( 15.9 % of used cycles)
> +  - busy iterations:        86009  ( 84.1 % of used cycles)
> +  Rx packets:             2399607  (2381 Kpps, 848 cycles/pkt)
> +  Datapath passes:        3599415  (1.50 passes/pkt)
> +  - EMC hits:              336472  (  9.3 %)
> +  - Megaflow hits:        3262943  ( 90.7 %, 1.00 subtbl lookups/hit)
> +  - Upcalls:                    0  (  0.0 %, 0.0 us/upcall)
> +  - Lost upcalls:               0  (  0.0 %)
> +  Tx packets:             2399607  (2381 Kpps)
> +  Tx batches:              171400  (14.00 pkts/batch)
> +.EE
>  .RE
>  .IP
>  Here "Rx packets" actually reflects the number of packets forwarded by
> the
> --
> 2.17.1
Ilya Maximets Oct. 11, 2018, 11:48 a.m. UTC | #4
On 10.10.2018 19:34, Stokes, Ian wrote:
>> 'dpif-netdev/pmd-perf-show' command prints the frequency number calculated
>> from the total number of cycles spent for iterations for the measured
>> period. This number could be confusing, because users may think that it
>> should be equal to CPU frequency, especially on non-x86 systems where TSC
>> frequency likely does not match with CPU one.
>>
>> Moreover, counted TSC cycles could differ from the HW TSC cycles in case
>> of a large number of PMD reloads, because cycles spent outside of the main
>> polling loop are not taken into account anywhere.
>> In this case the frequency will not match even TSC frequency.
>>
>> Let's clarify the meaning in order to avoid this misunderstanding.
>> 'Cycles' replaced with 'Used TSC cycles', which describes how many TSC
>> cycles consumed by the main polling loop. % of the total TSC cycles now
>> printed instead of GHz frequency, because GHz is unclear for
>> understanding, especially without knowing the exact TSC frequency.
>>
> 
> Hi Ilya,
> 
> I had signed off on this patch but I just spotted its causing the travis build for OS X to fail. It seems .EX and .EE macros are undefined when used in the manpages.
> 
> vswitchd/ovs-vswitchd.8:992: warning: macro `EX' not defined
> vswitchd/ovs-vswitchd.8:1010: warning: macro `EE' not defined
> 
> Digging through some info on .EX and .EE its reported that they are supported on many but not all Unix based systems which may be the issue here. Thoughts?
> 
> https://travis-ci.org/istokes/ovs/jobs/439579443
> 
> Ian

Hi Ian.
Thanks for spotting this. The issue could be solved by including
lib/ovs.tmac file which has required definitions.
I'll send v2 with this change.

Later I'll send the patch to include this file to all the remaining
man roots to avoid such problems in the future. This also will reduce
code duplication as many files has equal redefinitions of a common
macroses.

Best regards, Ilya Maximets.

> 
>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>> ---
>>  lib/dpif-netdev-perf.c      | 36 ++++++++++++++++++------------------
>>  lib/dpif-netdev-unixctl.man | 28 +++++++++++++++-------------
>>  2 files changed, 33 insertions(+), 31 deletions(-)
>>
>> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index
>> 13f1010c9..8ff4091e5 100644
>> --- a/lib/dpif-netdev-perf.c
>> +++ b/lib/dpif-netdev-perf.c
>> @@ -170,7 +170,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct
>> pmd_perf_stats *s,
>>                                double duration)  {
>>      uint64_t stats[PMD_N_STATS];
>> -    double us_per_cycle = 1000000.0 / get_tsc_hz();
>> +    uint64_t tsc_hz = get_tsc_hz();
>> +    double us_per_cycle = 1000000.0 / tsc_hz;
>>
>>      if (duration == 0) {
>>          return;
>> @@ -191,25 +192,25 @@ pmd_perf_format_overall_stats(struct ds *str, struct
>> pmd_perf_stats *s,
>>      uint64_t busy_iter = tot_iter >= idle_iter ? tot_iter - idle_iter :
>> 0;
>>
>>      ds_put_format(str,
>> -            "  Cycles:          %12"PRIu64"  (%.2f GHz)\n"
>> -            "  Iterations:      %12"PRIu64"  (%.2f us/it)\n"
>> -            "  - idle:          %12"PRIu64"  (%4.1f %% cycles)\n"
>> -            "  - busy:          %12"PRIu64"  (%4.1f %% cycles)\n",
>> -            tot_cycles, (tot_cycles / duration) / 1E9,
>> +            "  Iterations:        %12"PRIu64"  (%.2f us/it)\n"
>> +            "  - Used TSC cycles: %12"PRIu64"  (%5.1f %% of total
>> cycles)\n"
>> +            "  - idle iterations: %12"PRIu64"  (%5.1f %% of used
>> cycles)\n"
>> +            "  - busy iterations: %12"PRIu64"  (%5.1f %% of used
>> + cycles)\n",
>>              tot_iter, tot_cycles * us_per_cycle / tot_iter,
>> +            tot_cycles, 100.0 * (tot_cycles / duration) / tsc_hz,
>>              idle_iter,
>>              100.0 * stats[PMD_CYCLES_ITER_IDLE] / tot_cycles,
>>              busy_iter,
>>              100.0 * stats[PMD_CYCLES_ITER_BUSY] / tot_cycles);
>>      if (rx_packets > 0) {
>>          ds_put_format(str,
>> -            "  Rx packets:      %12"PRIu64"  (%.0f Kpps, %.0f
>> cycles/pkt)\n"
>> -            "  Datapath passes: %12"PRIu64"  (%.2f passes/pkt)\n"
>> -            "  - EMC hits:      %12"PRIu64"  (%4.1f %%)\n"
>> -            "  - Megaflow hits: %12"PRIu64"  (%4.1f %%, %.2f subtbl
>> lookups/"
>> -
>> "hit)\n"
>> -            "  - Upcalls:       %12"PRIu64"  (%4.1f %%, %.1f
>> us/upcall)\n"
>> -            "  - Lost upcalls:  %12"PRIu64"  (%4.1f %%)\n",
>> +            "  Rx packets:        %12"PRIu64"  (%.0f Kpps, %.0f
>> cycles/pkt)\n"
>> +            "  Datapath passes:   %12"PRIu64"  (%.2f passes/pkt)\n"
>> +            "  - EMC hits:        %12"PRIu64"  (%5.1f %%)\n"
>> +            "  - Megaflow hits:   %12"PRIu64"  (%5.1f %%, %.2f "
>> +                                                "subtbl lookups/hit)\n"
>> +            "  - Upcalls:         %12"PRIu64"  (%5.1f %%, %.1f
>> us/upcall)\n"
>> +            "  - Lost upcalls:    %12"PRIu64"  (%5.1f %%)\n",
>>              rx_packets, (rx_packets / duration) / 1000,
>>              1.0 * stats[PMD_CYCLES_ITER_BUSY] / rx_packets,
>>              passes, rx_packets ? 1.0 * passes / rx_packets : 0, @@ -
>> 225,17 +226,16 @@ pmd_perf_format_overall_stats(struct ds *str, struct
>> pmd_perf_stats *s,
>>              stats[PMD_STAT_LOST],
>>              100.0 * stats[PMD_STAT_LOST] / passes);
>>      } else {
>> -        ds_put_format(str, "  Rx packets:      %12d\n", 0);
>> +        ds_put_format(str, "  Rx packets:        %12d\n", 0);
>>      }
>>      if (tx_packets > 0) {
>>          ds_put_format(str,
>> -            "  Tx packets:      %12"PRIu64"  (%.0f Kpps)\n"
>> -            "  Tx batches:      %12"PRIu64"  (%.2f pkts/batch)"
>> -            "\n",
>> +            "  Tx packets:        %12"PRIu64"  (%.0f Kpps)\n"
>> +            "  Tx batches:        %12"PRIu64"  (%.2f pkts/batch)\n",
>>              tx_packets, (tx_packets / duration) / 1000,
>>              tx_batches, 1.0 * tx_packets / tx_batches);
>>      } else {
>> -        ds_put_format(str, "  Tx packets:      %12d\n\n", 0);
>> +        ds_put_format(str, "  Tx packets:        %12d\n\n", 0);
>>      }
>>  }
>>
>> diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
>> index ab8619e41..53f33d83f 100644
>> --- a/lib/dpif-netdev-unixctl.man
>> +++ b/lib/dpif-netdev-unixctl.man
>> @@ -123,23 +123,25 @@ Display the last ms_len millisecond stats  The
>> output always contains the following global PMD statistics:
>>  .RS
>>  .IP
>> -Time: 15:24:55.270 .br
>> +.EX
>> +Time: 15:24:55.270
>>  Measurement duration: 1.008 s
>>
>>  pmd thread numa_id 0 core_id 1:
>>
>> -  Cycles:            2419034712  (2.40 GHz)
>> -  Iterations:            572817  (1.76 us/it)
>> -  - idle:                486808  (15.9 % cycles)
>> -  - busy:                 86009  (84.1 % cycles)
>> -  Rx packets:           2399607  (2381 Kpps, 848 cycles/pkt)
>> -  Datapath passes:      3599415  (1.50 passes/pkt)
>> -  - EMC hits:            336472  ( 9.3 %)
>> -  - Megaflow hits:      3262943  (90.7 %, 1.00 subtbl lookups/hit)
>> -  - Upcalls:                  0  ( 0.0 %, 0.0 us/upcall)
>> -  - Lost upcalls:             0  ( 0.0 %)
>> -  Tx packets:           2399607  (2381 Kpps)
>> -  Tx batches:            171400  (14.00 pkts/batch)
>> +  Iterations:              572817  (1.76 us/it)
>> +  - Used TSC cycles:   2419034712  ( 99.9 % of total cycles)
>> +  - idle iterations:       486808  ( 15.9 % of used cycles)
>> +  - busy iterations:        86009  ( 84.1 % of used cycles)
>> +  Rx packets:             2399607  (2381 Kpps, 848 cycles/pkt)
>> +  Datapath passes:        3599415  (1.50 passes/pkt)
>> +  - EMC hits:              336472  (  9.3 %)
>> +  - Megaflow hits:        3262943  ( 90.7 %, 1.00 subtbl lookups/hit)
>> +  - Upcalls:                    0  (  0.0 %, 0.0 us/upcall)
>> +  - Lost upcalls:               0  (  0.0 %)
>> +  Tx packets:             2399607  (2381 Kpps)
>> +  Tx batches:              171400  (14.00 pkts/batch)
>> +.EE
>>  .RE
>>  .IP
>>  Here "Rx packets" actually reflects the number of packets forwarded by
>> the
>> --
>> 2.17.1
> 
> 
>
diff mbox series

Patch

diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
index 13f1010c9..8ff4091e5 100644
--- a/lib/dpif-netdev-perf.c
+++ b/lib/dpif-netdev-perf.c
@@ -170,7 +170,8 @@  pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
                               double duration)
 {
     uint64_t stats[PMD_N_STATS];
-    double us_per_cycle = 1000000.0 / get_tsc_hz();
+    uint64_t tsc_hz = get_tsc_hz();
+    double us_per_cycle = 1000000.0 / tsc_hz;
 
     if (duration == 0) {
         return;
@@ -191,25 +192,25 @@  pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
     uint64_t busy_iter = tot_iter >= idle_iter ? tot_iter - idle_iter : 0;
 
     ds_put_format(str,
-            "  Cycles:          %12"PRIu64"  (%.2f GHz)\n"
-            "  Iterations:      %12"PRIu64"  (%.2f us/it)\n"
-            "  - idle:          %12"PRIu64"  (%4.1f %% cycles)\n"
-            "  - busy:          %12"PRIu64"  (%4.1f %% cycles)\n",
-            tot_cycles, (tot_cycles / duration) / 1E9,
+            "  Iterations:        %12"PRIu64"  (%.2f us/it)\n"
+            "  - Used TSC cycles: %12"PRIu64"  (%5.1f %% of total cycles)\n"
+            "  - idle iterations: %12"PRIu64"  (%5.1f %% of used cycles)\n"
+            "  - busy iterations: %12"PRIu64"  (%5.1f %% of used cycles)\n",
             tot_iter, tot_cycles * us_per_cycle / tot_iter,
+            tot_cycles, 100.0 * (tot_cycles / duration) / tsc_hz,
             idle_iter,
             100.0 * stats[PMD_CYCLES_ITER_IDLE] / tot_cycles,
             busy_iter,
             100.0 * stats[PMD_CYCLES_ITER_BUSY] / tot_cycles);
     if (rx_packets > 0) {
         ds_put_format(str,
-            "  Rx packets:      %12"PRIu64"  (%.0f Kpps, %.0f cycles/pkt)\n"
-            "  Datapath passes: %12"PRIu64"  (%.2f passes/pkt)\n"
-            "  - EMC hits:      %12"PRIu64"  (%4.1f %%)\n"
-            "  - Megaflow hits: %12"PRIu64"  (%4.1f %%, %.2f subtbl lookups/"
-                                                                     "hit)\n"
-            "  - Upcalls:       %12"PRIu64"  (%4.1f %%, %.1f us/upcall)\n"
-            "  - Lost upcalls:  %12"PRIu64"  (%4.1f %%)\n",
+            "  Rx packets:        %12"PRIu64"  (%.0f Kpps, %.0f cycles/pkt)\n"
+            "  Datapath passes:   %12"PRIu64"  (%.2f passes/pkt)\n"
+            "  - EMC hits:        %12"PRIu64"  (%5.1f %%)\n"
+            "  - Megaflow hits:   %12"PRIu64"  (%5.1f %%, %.2f "
+                                                "subtbl lookups/hit)\n"
+            "  - Upcalls:         %12"PRIu64"  (%5.1f %%, %.1f us/upcall)\n"
+            "  - Lost upcalls:    %12"PRIu64"  (%5.1f %%)\n",
             rx_packets, (rx_packets / duration) / 1000,
             1.0 * stats[PMD_CYCLES_ITER_BUSY] / rx_packets,
             passes, rx_packets ? 1.0 * passes / rx_packets : 0,
@@ -225,17 +226,16 @@  pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
             stats[PMD_STAT_LOST],
             100.0 * stats[PMD_STAT_LOST] / passes);
     } else {
-        ds_put_format(str, "  Rx packets:      %12d\n", 0);
+        ds_put_format(str, "  Rx packets:        %12d\n", 0);
     }
     if (tx_packets > 0) {
         ds_put_format(str,
-            "  Tx packets:      %12"PRIu64"  (%.0f Kpps)\n"
-            "  Tx batches:      %12"PRIu64"  (%.2f pkts/batch)"
-            "\n",
+            "  Tx packets:        %12"PRIu64"  (%.0f Kpps)\n"
+            "  Tx batches:        %12"PRIu64"  (%.2f pkts/batch)\n",
             tx_packets, (tx_packets / duration) / 1000,
             tx_batches, 1.0 * tx_packets / tx_batches);
     } else {
-        ds_put_format(str, "  Tx packets:      %12d\n\n", 0);
+        ds_put_format(str, "  Tx packets:        %12d\n\n", 0);
     }
 }
 
diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
index ab8619e41..53f33d83f 100644
--- a/lib/dpif-netdev-unixctl.man
+++ b/lib/dpif-netdev-unixctl.man
@@ -123,23 +123,25 @@  Display the last ms_len millisecond stats
 The output always contains the following global PMD statistics:
 .RS
 .IP
-Time: 15:24:55.270 .br
+.EX
+Time: 15:24:55.270
 Measurement duration: 1.008 s
 
 pmd thread numa_id 0 core_id 1:
 
-  Cycles:            2419034712  (2.40 GHz)
-  Iterations:            572817  (1.76 us/it)
-  - idle:                486808  (15.9 % cycles)
-  - busy:                 86009  (84.1 % cycles)
-  Rx packets:           2399607  (2381 Kpps, 848 cycles/pkt)
-  Datapath passes:      3599415  (1.50 passes/pkt)
-  - EMC hits:            336472  ( 9.3 %)
-  - Megaflow hits:      3262943  (90.7 %, 1.00 subtbl lookups/hit)
-  - Upcalls:                  0  ( 0.0 %, 0.0 us/upcall)
-  - Lost upcalls:             0  ( 0.0 %)
-  Tx packets:           2399607  (2381 Kpps)
-  Tx batches:            171400  (14.00 pkts/batch)
+  Iterations:              572817  (1.76 us/it)
+  - Used TSC cycles:   2419034712  ( 99.9 % of total cycles)
+  - idle iterations:       486808  ( 15.9 % of used cycles)
+  - busy iterations:        86009  ( 84.1 % of used cycles)
+  Rx packets:             2399607  (2381 Kpps, 848 cycles/pkt)
+  Datapath passes:        3599415  (1.50 passes/pkt)
+  - EMC hits:              336472  (  9.3 %)
+  - Megaflow hits:        3262943  ( 90.7 %, 1.00 subtbl lookups/hit)
+  - Upcalls:                    0  (  0.0 %, 0.0 us/upcall)
+  - Lost upcalls:               0  (  0.0 %)
+  Tx packets:             2399607  (2381 Kpps)
+  Tx batches:              171400  (14.00 pkts/batch)
+.EE
 .RE
 .IP
 Here "Rx packets" actually reflects the number of packets forwarded by the