
[ovs-dev,v3,1/3] dpif-netdev: Refactor PMD performance into dpif-netdev-perf

Message ID CFF8EF42F1132E4CBE2BF0AB6C21C58D8CECD1C6@ESESSMB107.ericsson.se
State Superseded
Delegated to: Ian Stokes
Series [ovs-dev,v3,1/3] dpif-netdev: Refactor PMD performance into dpif-netdev-perf

Commit Message

Jan Scheurich Nov. 21, 2017, 12:38 a.m. UTC
Add module dpif-netdev-perf to host all PMD performance-related
data structures and functions in dpif-netdev. Refactor the PMD
stats handling in dpif-netdev and delegate whatever is possible to
the new module, using clean interfaces to shield dpif-netdev from
the implementation details. Accordingly, all PMD statistics
members are moved from the main struct dp_netdev_pmd_thread into
a dedicated member of type struct pmd_perf_stats.

Include Darrell's prior refactoring of PMD stats contained in
[PATCH v5,2/3] dpif-netdev: Refactor some pmd stats:

1. The cycles per packet counts are now based on packets
received rather than on packet passes through the datapath.

2. Packet counters are now kept for packets received and
packets recirculated. These are kept as separate counters for
maintainability reasons. The cost of incrementing these counters
is negligible.  These new counters are also displayed to the user.

3. A display statistic is added for the average number of
datapath passes per packet (see the sketch after this list). This
should help users debug and understand packet processing.

4. The user-visible 'miss' counter now counts only successful upcalls,
rather than the sum of successful and unsuccessful upcalls. Hence,
this becomes what the user historically understands by an OVS 'miss
upcall'. The user display is annotated to make this clear as well.

5. The user-visible 'lost' counter continues to count failed upcalls,
but is annotated to make its meaning clear.

6. The enum pmd_stat_type is annotated to make the usage of the
stats counters clear.

7. The subtable lookup stat is renamed to make it clear that it
relates to masked lookups.

8. The PMD stats test is updated to handle the new user stats of
packets received, packets recirculated and average number of datapath
passes per packet.
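
For illustration, points 3 and 7 translate into the following derived
display metrics; this mirrors the computation in pmd_info_show_stats()
below, with variable names used here only for exposition:

    /* Avg. datapath passes per packet: each received packet makes one
     * initial pass plus one pass per recirculation. */
    double passes_per_pkt = stats[PMD_STAT_RECV] > 0
        ? (1.0 * (stats[PMD_STAT_RECV] + stats[PMD_STAT_RECIRC]))
          / stats[PMD_STAT_RECV] : 0;

    /* Avg. subtable lookups per megaflow hit: each MASKED_HIT requires
     * at least one MASKED_LOOKUP. */
    double lookups_per_hit = stats[PMD_STAT_MASKED_HIT] > 0
        ? (1.0 * stats[PMD_STAT_MASKED_LOOKUP])
          / stats[PMD_STAT_MASKED_HIT] : 0;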

On top of that, introduce a "-pmd <core>" option to the PMD info
commands to filter the output for a single PMD.
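
For example, to show stats for a single PMD (assuming a PMD thread runs
on core 3):

    ovs-appctl dpif-netdev/pmd-stats-show -pmd 3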

Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>

---
 lib/automake.mk        |   2 +
 lib/dpif-netdev-perf.c |  67 +++++++++
 lib/dpif-netdev-perf.h | 123 ++++++++++++++++
 lib/dpif-netdev.c      | 371 ++++++++++++++++++++-----------------------------
 tests/pmd.at           |  22 +--
 5 files changed, 358 insertions(+), 227 deletions(-)
 create mode 100644 lib/dpif-netdev-perf.c
 create mode 100644 lib/dpif-netdev-perf.h

Comments

Billy O'Mahony Dec. 8, 2017, 5:25 p.m. UTC | #1
Hi Jan,

I had problems applying the later patches in this series, so I'm just reviewing this one for now. I tried applying them against several revisions.

The second patch ([ovs-dev,v3,2/3] dpif-netdev: Detailed performance stats for PMDs ) fails with 
fatal: patch fragment without header at line 708: @@ -1073,6 +1155,12 @@ dpif_netdev_init(void)

Overall, not only is the user-visible output clearer, but the code is also more consistent and easier to understand.

I tested this patch by applying it to: 3728b3b Ben Pfaff 2017-11-20 Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into HEAD

These are the issues I did find:
1. make check #1159 "ofproto-dpif patch ports" consistently fails for me with this patch applied

2. ./utilities/checkpatch.py reports some line length issues:
== Checking "dpif-netdev-perf/ovs-dev-v3-1-3-dpif-netdev-Refactor-PMD-performance-into-dpif-netdev-perf.patch" ==
ERROR: Too many signoffs; are you missing Co-authored-by lines?
WARNING: Line length is >79-characters long
#346 FILE: lib/dpif-netdev.c:544:
        struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */

WARNING: Line has non-spaces leading whitespace
WARNING: Line has trailing whitespace
#347 FILE: lib/dpif-netdev.c:545:


WARNING: Line length is >79-characters long
#349 FILE: lib/dpif-netdev.c:547:
         * XPS disabled for this netdev. All static_tx_qid's are unique and less

WARNING: Line has non-spaces leading whitespace
WARNING: Line has trailing whitespace
#352 FILE: lib/dpif-netdev.c:550:


WARNING: Line length is >79-characters long
WARNING: Line has trailing whitespace
#413 FILE: lib/dpif-netdev.c:610:
    OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct pmd_perf_stats perf_stats;

Lines checked: 976, Warnings: 8, Errors: 1

3. Does the new 'pmd' arg to pmd-stats-show interfere with the existing [dp] arg? 

    sudo ./utilities/ovs-appctl dpif-netdev/pmd-stats-show -pmd 1 netdev@dpif-netdev
    "dpif-netdev/pmd-stats-show" command takes at most 2 arguments
     ovs-appctl: ovs-vswitchd: server returned an error

Otherwise it looks like a really useful patch. And the remainder of the series more so.

Thanks,
Billy. 

> -----Original Message-----
> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-
> bounces@openvswitch.org] On Behalf Of Jan Scheurich
> Sent: Tuesday, November 21, 2017 12:38 AM
> To: 'ovs-dev@openvswitch.org' <ovs-dev@openvswitch.org>
> Subject: [ovs-dev] [PATCH v3 1/3] dpif-netdev: Refactor PMD performance into
> dpif-netdev-perf
> 
> Add module dpif-netdev-perf to host all PMD performance-related data
> structures and functions in dpif-netdev. Refactor the PMD stats handling in dpif-
> netdev and delegate whatever possible into the new module, using clean
> interfaces to shield dpif-netdev from the implementation details. Accordingly,
> all PMD statistics members are moved from the main struct
> dp_netdev_pmd_thread into a dedicated member of type struct pmd_perf_stats.
> 
> Include Darrell's prior refactoring of PMD stats contained in [PATCH v5,2/3] dpif-
> netdev: Refactor some pmd stats:
> 
> 1. The cycles per packet counts are now based on packets received rather than
> packet passes through the datapath.
> 
> 2. Packet counters are now kept for packets received and packets recirculated.
> These are kept as separate counters for maintainability reasons. The cost of
> incrementing these counters is negligible.  These new counters are also
> displayed to the user.
> 
> 3. A display statistic is added for the average number of datapath passes per
> packet. This should be useful for user debugging and understanding of packet
> processing.
> 
> 4. The user visible 'miss' counter is used for successful upcalls, rather than the
> sum of successful and unsuccessful upcalls. Hence, this becomes what the user
> historically understands by OVS 'miss upcall'.
> The user display is annotated to make this clear as well.
> 
> 5. The user visible 'lost' counter remains as failed upcalls, but is annotated to
> make it clear what the meaning is.
> 
> 6. The enum pmd_stat_type is annotated to make the usage of the stats
> counters clear.
> 
> 7. The subtable lookup stat is renamed to make it clear that it relates to masked
> lookups.
> 
> 8. The PMD stats test is updated to handle the new user stats of packets
> received, packets recirculated and average number of datapath passes per
> packet.
> 
> On top of that introduce a "-pmd <core>" option to the PMD info commands to
> filter the output for a single PMD.
> 
> Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
> Signed-off-by: Darrell Ball <dlu998@gmail.com>
> 
> ---
>  lib/automake.mk        |   2 +
>  lib/dpif-netdev-perf.c |  67 +++++++++
>  lib/dpif-netdev-perf.h | 123 ++++++++++++++++
>  lib/dpif-netdev.c      | 371 ++++++++++++++++++++-----------------------------
>  tests/pmd.at           |  22 +--
>  5 files changed, 358 insertions(+), 227 deletions(-)  create mode 100644 lib/dpif-
> netdev-perf.c  create mode 100644 lib/dpif-netdev-perf.h
> 
> diff --git a/lib/automake.mk b/lib/automake.mk index effe5b5..0b8df0e 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -80,6 +80,8 @@ lib_libopenvswitch_la_SOURCES = \
>  	lib/dpdk.h \
>  	lib/dpif-netdev.c \
>  	lib/dpif-netdev.h \
> +	lib/dpif-netdev-perf.c \
> +	lib/dpif-netdev-perf.h \
>  	lib/dpif-provider.h \
>  	lib/dpif.c \
>  	lib/dpif.h \
> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c new file mode 100644
> index 0000000..6a8f7c4
> --- /dev/null
> +++ b/lib/dpif-netdev-perf.c
> @@ -0,0 +1,67 @@
> +/*
> + * Copyright (c) 2017 Ericsson AB.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <config.h>
> +
> +#ifdef DPDK_NETDEV
> +#include <rte_cycles.h>
> +#endif
> +
> +#include "openvswitch/dynamic-string.h"
> +#include "openvswitch/vlog.h"
> +#include "dpif-netdev-perf.h"
> +#include "timeval.h"
> +
> +VLOG_DEFINE_THIS_MODULE(pmd_perf);
> +
> +void
> +pmd_perf_stats_init(struct pmd_perf_stats *s) {
> +    memset(s, 0 , sizeof(*s));
> +    s->start_ms = time_msec();
> +}
> +
> +void
> +pmd_perf_read_counters(struct pmd_perf_stats *s,
> +                       uint64_t stats[PMD_N_STATS]) {
> +    uint64_t val;
> +
> +    /* These loops subtracts reference values ('*_zero') from the counters.
> +     * Since loads and stores are relaxed, it might be possible for a '*_zero'
> +     * value to be more recent than the current value we're reading from the
> +     * counter.  This is not a big problem, since these numbers are not
> +     * supposed to be too accurate, but we should at least make sure that
> +     * the result is not negative. */
> +    for (int i = 0; i < PMD_N_STATS; i++) {
> +        atomic_read_relaxed(&s->counters.n[i], &val);
> +        if (val > s->counters.zero[i]) {
> +            stats[i] = val - s->counters.zero[i];
> +        } else {
> +            stats[i] = 0;
> +        }
> +    }
> +}
> +
> +void
> +pmd_perf_stats_clear(struct pmd_perf_stats *s) {
> +    s->start_ms = 0; /* Marks start of clearing. */
> +    for (int i = 0; i < PMD_N_STATS; i++) {
> +        atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]);
> +    }
> +    s->start_ms = time_msec(); /* Clearing finished. */ }
> +
> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h new file mode 100644
> index 0000000..f55f7a2
> --- /dev/null
> +++ b/lib/dpif-netdev-perf.h
> @@ -0,0 +1,123 @@
> +/*
> + * Copyright (c) 2017 Ericsson AB.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef PMD_PERF_H
> +#define PMD_PERF_H 1
> +
> +#include <stdbool.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <string.h>
> +#include <math.h>
> +
> +#include "openvswitch/vlog.h"
> +#include "ovs-atomic.h"
> +#include "timeval.h"
> +#include "unixctl.h"
> +#include "util.h"
> +
> +#ifdef  __cplusplus
> +extern "C" {
> +#endif
> +
> +enum pmd_stat_type {
> +    PMD_STAT_EXACT_HIT,     /* Packets that had an exact match (emc). */
> +    PMD_STAT_MASKED_HIT,    /* Packets that matched in the flow table. */
> +    PMD_STAT_MISS,          /* Packets that did not match and upcall was ok. */
> +    PMD_STAT_LOST,          /* Packets that did not match and upcall failed. */
> +                           /* The above statistics account for the total
> +                            * number of packet passes through the datapath
> +                            * pipeline and should not be overlapping with each
> +                            * other. */
> +    PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table
> +                              hits. Each MASKED_HIT hit will have >= 1
> +                              MASKED_LOOKUP(s). */
> +    PMD_STAT_RECV,          /* Packets entering the datapath pipeline from an
> +                             * interface. */
> +    PMD_STAT_RECIRC,        /* Packets reentering the datapath pipeline due to
> +                             * recirculation. */
> +    PMD_CYCLES_POLL_IDLE,   /* Cycles spent unsuccessful polling. */
> +    PMD_CYCLES_POLL_BUSY,   /* Cycles spent successfully polling and
> +                             * processing polled packets. */
> +    PMD_CYCLES_OVERHEAD,    /* Cycles spent for other tasks. */
> +    PMD_CYCLES_UPCALL,      /* Cycles spent in upcalls. Included in
> +                               PMD_CYCLES_POLL_BUSY. */
> +    PMD_CYCLES_ITER_IDLE,   /* Cycles spent in idle iterations. */
> +    PMD_CYCLES_ITER_BUSY,   /* Cycles spent in busy iterations. */
> +    PMD_N_STATS
> +};
> +
> +struct pmd_counters {
> +    atomic_uint64_t n[PMD_N_STATS];     /* Indexed by PMD_STAT_*. */
> +    uint64_t zero[PMD_N_STATS];
> +};
> +
> +struct pmd_perf_stats {
> +    uint64_t start_ms;
> +    uint64_t last_tsc;
> +    struct pmd_counters counters;
> +};
> +
> +enum pmd_info_type;
> +struct dp_netdev_pmd_thread;
> +
> +struct pmd_perf_params {
> +    int command_type;
> +    bool histograms;
> +    size_t iter_hist_len;
> +    size_t ms_hist_len;
> +};
> +
> +void pmd_perf_stats_init(struct pmd_perf_stats *s); void
> +pmd_perf_stats_clear(struct pmd_perf_stats *s); void
> +pmd_perf_read_counters(struct pmd_perf_stats *s,
> +                            uint64_t stats[PMD_N_STATS]);
> +
> +static inline void
> +pmd_perf_update_counter(struct pmd_perf_stats *s,
> +                        enum pmd_stat_type counter, int delta) {
> +    uint64_t tmp;
> +    atomic_read_relaxed(&s->counters.n[counter], &tmp);
> +    tmp += delta;
> +    atomic_store_relaxed(&s->counters.n[counter], tmp); }
> +
> +static inline void
> +pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc) {
> +    if (OVS_UNLIKELY(s->start_ms == 0)) {
> +        /* Stats not yet fully initialised. */
> +        return;
> +    }
> +    s->last_tsc = now_tsc;
> +}
> +
> +static inline void
> +pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc,
> +                       int packets)
> +{
> +    uint64_t cycles = now_tsc - s->last_tsc;
> +
> +    /* No need for atomic updates as only invoked within PMD thread. */
> +    if (packets > 0) {
> +        s->counters.n[PMD_CYCLES_ITER_BUSY] += cycles;
> +    } else {
> +        s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles;
> +    }
> +}
> +
> +#endif /* pmd-perf.h */
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 0a62630..bf99f81 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -43,6 +43,7 @@
>  #include "csum.h"
>  #include "dp-packet.h"
>  #include "dpif.h"
> +#include "dpif-netdev-perf.h"
>  #include "dpif-provider.h"
>  #include "dummy.h"
>  #include "fat-rwlock.h"
> @@ -330,23 +331,6 @@ static struct dp_netdev_port
> *dp_netdev_lookup_port(const struct dp_netdev *dp,
>                                                      odp_port_t)
>      OVS_REQUIRES(dp->port_mutex);
> 
> -enum dp_stat_type {
> -    DP_STAT_EXACT_HIT,          /* Packets that had an exact match (emc). */
> -    DP_STAT_MASKED_HIT,         /* Packets that matched in the flow table. */
> -    DP_STAT_MISS,               /* Packets that did not match. */
> -    DP_STAT_LOST,               /* Packets not passed up to the client. */
> -    DP_STAT_LOOKUP_HIT,         /* Number of subtable lookups for flow table
> -                                   hits */
> -    DP_N_STATS
> -};
> -
> -enum pmd_cycles_counter_type {
> -    PMD_CYCLES_IDLE,            /* Cycles spent idle or unsuccessful polling */
> -    PMD_CYCLES_PROCESSING,      /* Cycles spent successfully polling and
> -                                 * processing polled packets */
> -    PMD_N_CYCLES
> -};
> -
>  enum rxq_cycles_counter_type {
>      RXQ_CYCLES_PROC_CURR,       /* Cycles spent successfully polling and
>                                     processing packets during the current @@ -496,18 +480,6
> @@ struct dp_netdev_actions *dp_netdev_flow_get_actions(
>      const struct dp_netdev_flow *);
>  static void dp_netdev_actions_free(struct dp_netdev_actions *);
> 
> -/* Contained by struct dp_netdev_pmd_thread's 'stats' member.  */ -struct
> dp_netdev_pmd_stats {
> -    /* Indexed by DP_STAT_*. */
> -    atomic_ullong n[DP_N_STATS];
> -};
> -
> -/* Contained by struct dp_netdev_pmd_thread's 'cycle' member.  */ -struct
> dp_netdev_pmd_cycles {
> -    /* Indexed by PMD_CYCLES_*. */
> -    atomic_ullong n[PMD_N_CYCLES];
> -};
> -
>  struct polled_queue {
>      struct dp_netdev_rxq *rxq;
>      odp_port_t port_no;
> @@ -566,20 +538,22 @@ struct dp_netdev_pmd_thread {
>       * need to be protected by 'non_pmd_mutex'.  Every other instance
>       * will only be accessed by its own pmd thread. */
>      OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache;
> -    struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */
> -
> -    /* Queue id used by this pmd thread to send packets on all netdevs if
> -     * XPS disabled for this netdev. All static_tx_qid's are unique and less
> -     * than 'cmap_count(dp->poll_threads)'. */
> -    uint32_t static_tx_qid;
> -
> -    /* Flow-Table and classifiers
> -     *
> -     * Writers of 'flow_table' must take the 'flow_mutex'.  Corresponding
> -     * changes to 'classifiers' must be made while still holding the
> -     * 'flow_mutex'.
> -     */
> -    struct ovs_mutex flow_mutex;
> +    PADDED_MEMBERS(CACHE_LINE_SIZE,
> +        struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */
> +
> +        /* Queue id used by this pmd thread to send packets on all netdevs if
> +         * XPS disabled for this netdev. All static_tx_qid's are unique and less
> +         * than 'cmap_count(dp->poll_threads)'. */
> +        uint32_t static_tx_qid;
> +
> +        /* Flow-Table and classifiers
> +         *
> +         * Writers of 'flow_table' must take the 'flow_mutex'.  Corresponding
> +         * changes to 'classifiers' must be made while still holding the
> +         * 'flow_mutex'.
> +         */
> +        struct ovs_mutex flow_mutex;
> +    );
>      PADDED_MEMBERS(CACHE_LINE_SIZE,
>          struct cmap flow_table OVS_GUARDED; /* Flow table. */
> 
> @@ -591,20 +565,10 @@ struct dp_netdev_pmd_thread {
>             are stored for each polled rxq. */
>          long long int rxq_next_cycle_store;
> 
> -        /* Cycles counters */
> -        struct dp_netdev_pmd_cycles cycles;
> -
>          /* Used to count cycles. See 'cycles_counter_end()'. */
>          unsigned long long last_cycles;
>          struct latch exit_latch;        /* For terminating the pmd thread. */
> -    );
> -
> -    PADDED_MEMBERS(CACHE_LINE_SIZE,
> -        /* Statistics. */
> -        struct dp_netdev_pmd_stats stats;
> 
> -        struct seq *reload_seq;
> -        uint64_t last_reload_seq;
>          atomic_bool reload;             /* Do we need to reload ports? */
>          bool isolated;
> 
> @@ -612,11 +576,11 @@ struct dp_netdev_pmd_thread {
>          bool need_reload;
>          /* 5 pad bytes. */
>      );
> -
>      PADDED_MEMBERS(CACHE_LINE_SIZE,
> +        struct seq *reload_seq;
> +        uint64_t last_reload_seq;
>          struct ovs_mutex port_mutex;    /* Mutex for 'poll_list'
>                                             and 'tx_ports'. */
> -        /* 16 pad bytes. */
>      );
>      PADDED_MEMBERS(CACHE_LINE_SIZE,
>          /* List of rx queues to poll. */ @@ -642,15 +606,8 @@ struct
> dp_netdev_pmd_thread {
>          struct hmap send_port_cache;
>      );
> 
> -    PADDED_MEMBERS(CACHE_LINE_SIZE,
> -        /* Only a pmd thread can write on its own 'cycles' and 'stats'.
> -         * The main thread keeps 'stats_zero' and 'cycles_zero' as base
> -         * values and subtracts them from 'stats' and 'cycles' before
> -         * reporting to the user */
> -        unsigned long long stats_zero[DP_N_STATS];
> -        uint64_t cycles_zero[PMD_N_CYCLES];
> -        /* 8 pad bytes. */
> -    );
> +    /* Keep track of detailed PMD performance statistics. */
> +    OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct pmd_perf_stats perf_stats;
>  };
> 
>  /* Interface to netdev-based datapath. */ @@ -813,46 +770,11 @@ enum
> pmd_info_type {  };
> 
>  static void
> -pmd_info_show_stats(struct ds *reply,
> -                    struct dp_netdev_pmd_thread *pmd,
> -                    unsigned long long stats[DP_N_STATS],
> -                    uint64_t cycles[PMD_N_CYCLES])
> +format_pmd_thread(struct ds *reply,
> +                  struct dp_netdev_pmd_thread *pmd)
>  {
> -    unsigned long long total_packets;
> -    uint64_t total_cycles = 0;
> -    int i;
> -
> -    /* These loops subtracts reference values ('*_zero') from the counters.
> -     * Since loads and stores are relaxed, it might be possible for a '*_zero'
> -     * value to be more recent than the current value we're reading from the
> -     * counter.  This is not a big problem, since these numbers are not
> -     * supposed to be too accurate, but we should at least make sure that
> -     * the result is not negative. */
> -    for (i = 0; i < DP_N_STATS; i++) {
> -        if (stats[i] > pmd->stats_zero[i]) {
> -            stats[i] -= pmd->stats_zero[i];
> -        } else {
> -            stats[i] = 0;
> -        }
> -    }
> -
> -    /* Sum of all the matched and not matched packets gives the total.  */
> -    total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT]
> -                    + stats[DP_STAT_MISS];
> -
> -    for (i = 0; i < PMD_N_CYCLES; i++) {
> -        if (cycles[i] > pmd->cycles_zero[i]) {
> -           cycles[i] -= pmd->cycles_zero[i];
> -        } else {
> -            cycles[i] = 0;
> -        }
> -
> -        total_cycles += cycles[i];
> -    }
> -
>      ds_put_cstr(reply, (pmd->core_id == NON_PMD_CORE_ID)
>                          ? "main thread" : "pmd thread");
> -
>      if (pmd->numa_id != OVS_NUMA_UNSPEC) {
>          ds_put_format(reply, " numa_id %d", pmd->numa_id);
>      }
> @@ -860,16 +782,39 @@ pmd_info_show_stats(struct ds *reply,
>          ds_put_format(reply, " core_id %u", pmd->core_id);
>      }
>      ds_put_cstr(reply, ":\n");
> +}
> +
> +static void
> +pmd_info_show_stats(struct ds *reply,
> +                    struct dp_netdev_pmd_thread *pmd) {
> +    uint64_t stats[PMD_N_STATS];
> +    uint64_t total_cycles = 0;
> +
> +    pmd_perf_read_counters(&pmd->perf_stats, stats);
> +    total_cycles = stats[PMD_CYCLES_ITER_IDLE]
> +                         + stats[PMD_CYCLES_ITER_BUSY];
> +
> +    format_pmd_thread(reply, pmd);
> 
>      ds_put_format(reply,
> -                  "\temc hits:%llu\n\tmegaflow hits:%llu\n"
> -                  "\tavg. subtable lookups per hit:%.2f\n"
> -                  "\tmiss:%llu\n\tlost:%llu\n",
> -                  stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
> -                  stats[DP_STAT_MASKED_HIT] > 0
> -                  ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT]
> -                  : 0,
> -                  stats[DP_STAT_MISS], stats[DP_STAT_LOST]);
> +                  "\tpackets received:%"PRIu64"\n"
> +                  "\tpacket recirculations:%"PRIu64"\n"
> +                  "\tavg. datapath passes per packet:%.2f\n"
> +                  "\temc hits:%"PRIu64"\n"
> +                  "\tmegaflow hits:%"PRIu64"\n"
> +                  "\tavg. subtable lookups per megaflow hit:%.2f\n"
> +                  "\tmiss(i.e. lookup miss with success upcall):%"PRIu64"\n"
> +                  "\tlost(i.e. lookup miss with failed upcall):%"PRIu64"\n",
> +                  stats[PMD_STAT_RECV], stats[PMD_STAT_RECIRC],
> +                  stats[PMD_STAT_RECV] > 0
> +                  ? (1.0 * (stats[PMD_STAT_RECV] + stats[PMD_STAT_RECIRC]))
> +                     / stats[PMD_STAT_RECV] : 0,
> +                  stats[PMD_STAT_EXACT_HIT], stats[PMD_STAT_MASKED_HIT],
> +                  stats[PMD_STAT_MASKED_HIT] > 0
> +                  ? (1.0 * stats[PMD_STAT_MASKED_LOOKUP])
> +                     / stats[PMD_STAT_MASKED_HIT]
> +                  : 0, stats[PMD_STAT_MISS], stats[PMD_STAT_LOST]);
> 
>      if (total_cycles == 0) {
>          return;
> @@ -878,46 +823,26 @@ pmd_info_show_stats(struct ds *reply,
>      ds_put_format(reply,
>                    "\tidle cycles:%"PRIu64" (%.02f%%)\n"
>                    "\tprocessing cycles:%"PRIu64" (%.02f%%)\n",
> -                  cycles[PMD_CYCLES_IDLE],
> -                  cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100,
> -                  cycles[PMD_CYCLES_PROCESSING],
> -                  cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100);
> +                  stats[PMD_CYCLES_ITER_IDLE],
> +                  stats[PMD_CYCLES_ITER_IDLE] / (double)total_cycles * 100,
> +                  stats[PMD_CYCLES_ITER_BUSY],
> +                  stats[PMD_CYCLES_ITER_BUSY] / (double)total_cycles *
> + 100);
> 
> +    uint64_t total_packets = stats[PMD_STAT_RECV];
>      if (total_packets == 0) {
>          return;
>      }
> 
>      ds_put_format(reply,
> -                  "\tavg cycles per packet: %.02f (%"PRIu64"/%llu)\n",
> +                  "\tavg cycles per packet: %.02f
> + (%"PRIu64"/%"PRIu64")\n",
>                    total_cycles / (double)total_packets,
>                    total_cycles, total_packets);
> 
>      ds_put_format(reply,
>                    "\tavg processing cycles per packet: "
> -                  "%.02f (%"PRIu64"/%llu)\n",
> -                  cycles[PMD_CYCLES_PROCESSING] / (double)total_packets,
> -                  cycles[PMD_CYCLES_PROCESSING], total_packets);
> -}
> -
> -static void
> -pmd_info_clear_stats(struct ds *reply OVS_UNUSED,
> -                    struct dp_netdev_pmd_thread *pmd,
> -                    unsigned long long stats[DP_N_STATS],
> -                    uint64_t cycles[PMD_N_CYCLES])
> -{
> -    int i;
> -
> -    /* We cannot write 'stats' and 'cycles' (because they're written by other
> -     * threads) and we shouldn't change 'stats' (because they're used to count
> -     * datapath stats, which must not be cleared here).  Instead, we save the
> -     * current values and subtract them from the values to be displayed in the
> -     * future */
> -    for (i = 0; i < DP_N_STATS; i++) {
> -        pmd->stats_zero[i] = stats[i];
> -    }
> -    for (i = 0; i < PMD_N_CYCLES; i++) {
> -        pmd->cycles_zero[i] = cycles[i];
> -    }
> +                  "%.02f (%"PRIu64"/%"PRIu64")\n",
> +                  stats[PMD_CYCLES_POLL_BUSY] / (double)total_packets,
> +                  stats[PMD_CYCLES_POLL_BUSY], total_packets);
>  }
> 
>  static int
> @@ -1077,23 +1002,34 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn,
> int argc, const char *argv[],
>      struct ds reply = DS_EMPTY_INITIALIZER;
>      struct dp_netdev_pmd_thread **pmd_list;
>      struct dp_netdev *dp = NULL;
> -    size_t n;
>      enum pmd_info_type type = *(enum pmd_info_type *) aux;
> +    int core_id = -1;
> +    size_t n;
> 
>      ovs_mutex_lock(&dp_netdev_mutex);
> 
> -    if (argc == 2) {
> -        dp = shash_find_data(&dp_netdevs, argv[1]);
> -    } else if (shash_count(&dp_netdevs) == 1) {
> -        /* There's only one datapath */
> -        dp = shash_first(&dp_netdevs)->data;
> +    while (argc > 1) {
> +        if (!strcmp(argv[1], "-pmd")) {
> +            core_id = strtol(argv[2], NULL, 10);
> +            argc -= 2;
> +            argv += 2;
> +        } else {
> +            dp = shash_find_data(&dp_netdevs, argv[1]);
> +            argc -= 1;
> +            argv += 1;
> +        }
>      }
> 
>      if (!dp) {
> -        ovs_mutex_unlock(&dp_netdev_mutex);
> -        unixctl_command_reply_error(conn,
> -                                    "please specify an existing datapath");
> -        return;
> +        if (shash_count(&dp_netdevs) == 1) {
> +            /* There's only one datapath */
> +            dp = shash_first(&dp_netdevs)->data;
> +        } else {
> +            ovs_mutex_unlock(&dp_netdev_mutex);
> +            unixctl_command_reply_error(conn,
> +                                        "please specify an existing datapath");
> +            return;
> +        }
>      }
> 
>      sorted_poll_thread_list(dp, &pmd_list, &n); @@ -1102,26 +1038,15 @@
> dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
>          if (!pmd) {
>              break;
>          }
> -
> +        if (core_id != -1 && pmd->core_id != core_id) {
> +            continue;
> +        }
>          if (type == PMD_INFO_SHOW_RXQ) {
>              pmd_info_show_rxq(&reply, pmd);
> -        } else {
> -            unsigned long long stats[DP_N_STATS];
> -            uint64_t cycles[PMD_N_CYCLES];
> -
> -            /* Read current stats and cycle counters */
> -            for (size_t j = 0; j < ARRAY_SIZE(stats); j++) {
> -                atomic_read_relaxed(&pmd->stats.n[j], &stats[j]);
> -            }
> -            for (size_t j = 0; j < ARRAY_SIZE(cycles); j++) {
> -                atomic_read_relaxed(&pmd->cycles.n[j], &cycles[j]);
> -            }
> -
> -            if (type == PMD_INFO_CLEAR_STATS) {
> -                pmd_info_clear_stats(&reply, pmd, stats, cycles);
> -            } else if (type == PMD_INFO_SHOW_STATS) {
> -                pmd_info_show_stats(&reply, pmd, stats, cycles);
> -            }
> +        } else if (type == PMD_INFO_CLEAR_STATS) {
> +            pmd_perf_stats_clear(&pmd->perf_stats);
> +        } else if (type == PMD_INFO_SHOW_STATS) {
> +            pmd_info_show_stats(&reply, pmd);
>          }
>      }
>      free(pmd_list);
> @@ -1139,14 +1064,14 @@ dpif_netdev_init(void)
>                                clear_aux = PMD_INFO_CLEAR_STATS,
>                                poll_aux = PMD_INFO_SHOW_RXQ;
> 
> -    unixctl_command_register("dpif-netdev/pmd-stats-show", "[dp]",
> -                             0, 1, dpif_netdev_pmd_info,
> +    unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core]
> [dp]",
> +                             0, 2, dpif_netdev_pmd_info,
>                               (void *)&show_aux);
> -    unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]",
> -                             0, 1, dpif_netdev_pmd_info,
> +    unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core]
> [dp]",
> +                             0, 2, dpif_netdev_pmd_info,
>                               (void *)&clear_aux);
> -    unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]",
> -                             0, 1, dpif_netdev_pmd_info,
> +    unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] [dp]",
> +                             0, 2, dpif_netdev_pmd_info,
>                               (void *)&poll_aux);
>      unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
>                               0, 1, dpif_netdev_pmd_rebalance, @@ -1482,23 +1407,19 @@
> dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)  {
>      struct dp_netdev *dp = get_dp_netdev(dpif);
>      struct dp_netdev_pmd_thread *pmd;
> +    uint64_t pmd_stats[PMD_N_STATS];
> 
> -    stats->n_flows = stats->n_hit = stats->n_missed = stats->n_lost = 0;
> +    stats->n_flows = stats->n_hit =
> +            stats->n_mask_hit = stats->n_missed = stats->n_lost = 0;
>      CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> -        unsigned long long n;
>          stats->n_flows += cmap_count(&pmd->flow_table);
> -
> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_MASKED_HIT], &n);
> -        stats->n_hit += n;
> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n);
> -        stats->n_hit += n;
> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n);
> -        stats->n_missed += n;
> -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n);
> -        stats->n_lost += n;
> +        pmd_perf_read_counters(&pmd->perf_stats, pmd_stats);
> +        stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT];
> +        stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT];
> +        stats->n_missed += pmd_stats[PMD_STAT_MISS];
> +        stats->n_lost += pmd_stats[PMD_STAT_LOST];
>      }
>      stats->n_masks = UINT32_MAX;
> -    stats->n_mask_hit = UINT64_MAX;
> 
>      return 0;
>  }
> @@ -3174,28 +3095,28 @@ cycles_count_start(struct dp_netdev_pmd_thread
> *pmd)
>  /* Stop counting cycles and add them to the counter 'type' */  static inline void
> cycles_count_end(struct dp_netdev_pmd_thread *pmd,
> -                 enum pmd_cycles_counter_type type)
> +                 enum pmd_stat_type type)
>      OVS_RELEASES(&cycles_counter_fake_mutex)
>      OVS_NO_THREAD_SAFETY_ANALYSIS
>  {
>      unsigned long long interval = cycles_counter() - pmd->last_cycles;
> 
> -    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
> +    pmd_perf_update_counter(&pmd->perf_stats, type, interval);
>  }
> 
>  /* Calculate the intermediate cycle result and add to the counter 'type' */  static
> inline void  cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
>                            struct dp_netdev_rxq *rxq,
> -                          enum pmd_cycles_counter_type type)
> +                          enum pmd_stat_type type)
>      OVS_NO_THREAD_SAFETY_ANALYSIS
>  {
>      unsigned long long new_cycles = cycles_counter();
>      unsigned long long interval = new_cycles - pmd->last_cycles;
>      pmd->last_cycles = new_cycles;
> 
> -    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
> -    if (rxq && (type == PMD_CYCLES_PROCESSING)) {
> +    pmd_perf_update_counter(&pmd->perf_stats, type, interval);
> +    if (rxq && (type == PMD_CYCLES_POLL_BUSY)) {
>          /* Add to the amount of current processing cycles. */
>          non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR],
> interval);
>      }
> @@ -3885,12 +3806,12 @@ dpif_netdev_run(struct dpif *dpif)
>                                                     port->port_no);
>                      cycles_count_intermediate(non_pmd, NULL,
>                                                process_packets
> -                                              ? PMD_CYCLES_PROCESSING
> -                                              : PMD_CYCLES_IDLE);
> +                                              ? PMD_CYCLES_POLL_BUSY
> +                                              : PMD_CYCLES_POLL_IDLE);
>                  }
>              }
>          }
> -        cycles_count_end(non_pmd, PMD_CYCLES_IDLE);
> +        cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE);
>          dpif_netdev_xps_revalidate_pmd(non_pmd, time_msec(), false);
>          ovs_mutex_unlock(&dp->non_pmd_mutex);
> 
> @@ -4034,6 +3955,7 @@ static void *
>  pmd_thread_main(void *f_)
>  {
>      struct dp_netdev_pmd_thread *pmd = f_;
> +    struct pmd_perf_stats *s = &pmd->perf_stats;
>      unsigned int lc = 0;
>      struct polled_queue *poll_list;
>      bool exiting;
> @@ -4069,13 +3991,17 @@ reload:
> 
>      cycles_count_start(pmd);
>      for (;;) {
> +        uint64_t iter_packets = 0;
> +        pmd_perf_start_iteration(s, pmd->last_cycles);
>          for (i = 0; i < poll_cnt; i++) {
>              process_packets =
>                  dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
>                                             poll_list[i].port_no);
>              cycles_count_intermediate(pmd, poll_list[i].rxq,
> -                                      process_packets ? PMD_CYCLES_PROCESSING
> -                                                      : PMD_CYCLES_IDLE);
> +                                      process_packets
> +                                      ? PMD_CYCLES_POLL_BUSY
> +                                      : PMD_CYCLES_POLL_IDLE);
> +            iter_packets += process_packets;
>          }
> 
>          if (lc++ > 1024) {
> @@ -4093,10 +4019,12 @@ reload:
>              if (reload) {
>                  break;
>              }
> +            cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD);
>          }
> +        pmd_perf_end_iteration(s, pmd->last_cycles, iter_packets);
>      }
> 
> -    cycles_count_end(pmd, PMD_CYCLES_IDLE);
> +    cycles_count_end(pmd, PMD_CYCLES_OVERHEAD);
> 
>      poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
>      exiting = latch_is_set(&pmd->exit_latch); @@ -4549,6 +4477,7 @@
> dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct
> dp_netdev *dp,
>      }
>      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd-
> >node),
>                  hash_int(core_id, 0));
> +    pmd_perf_stats_init(&pmd->perf_stats);
>  }
> 
>  static void
> @@ -4746,13 +4675,6 @@ dp_netdev_flow_used(struct dp_netdev_flow
> *netdev_flow, int cnt, int size,
>      atomic_store_relaxed(&netdev_flow->stats.tcp_flags, flags);  }
> 
> -static void
> -dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd,
> -                       enum dp_stat_type type, int cnt)
> -{
> -    non_atomic_ullong_add(&pmd->stats.n[type], cnt);
> -}
> -
>  static int
>  dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet
> *packet_,
>                   struct flow *flow, struct flow_wildcards *wc, ovs_u128 *ufid, @@ -
> 4926,6 +4848,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>      int i;
> 
>      atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
> +    pmd_perf_update_counter(&pmd->perf_stats,
> +                            md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
> +                            cnt);
> 
>      DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
>          struct dp_netdev_flow *flow;
> @@ -4974,24 +4899,24 @@ emc_processing(struct dp_netdev_pmd_thread
> *pmd,
>          }
>      }
> 
> -    dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT,
> -                           cnt - n_dropped - n_missed);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT,
> +                            cnt - n_dropped - n_missed);
> 
>      return dp_packet_batch_size(packets_);  }
> 
> -static inline void
> +static inline int
>  handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>                       struct dp_packet *packet,
>                       const struct netdev_flow_key *key,
>                       struct ofpbuf *actions, struct ofpbuf *put_actions,
> -                     int *lost_cnt, long long now)
> +                     long long now)
>  {
>      struct ofpbuf *add_actions;
>      struct dp_packet_batch b;
>      struct match match;
>      ovs_u128 ufid;
> -    int error;
> +    int error = 0;
> 
>      match.tun_md.valid = false;
>      miniflow_expand(&key->mf, &match.flow); @@ -5005,8 +4930,7 @@
> handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>                               put_actions);
>      if (OVS_UNLIKELY(error && error != ENOSPC)) {
>          dp_packet_delete(packet);
> -        (*lost_cnt)++;
> -        return;
> +        return error;
>      }
> 
>      /* The Netlink encoding of datapath flow keys cannot express @@ -5046,6
> +4970,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>          ovs_mutex_unlock(&pmd->flow_mutex);
>          emc_probabilistic_insert(pmd, key, netdev_flow);
>      }
> +    return error;
>  }
> 
>  static inline void
> @@ -5067,7 +4992,7 @@ fast_path_processing(struct dp_netdev_pmd_thread
> *pmd,
>      struct dpcls *cls;
>      struct dpcls_rule *rules[PKT_ARRAY_SIZE];
>      struct dp_netdev *dp = pmd->dp;
> -    int miss_cnt = 0, lost_cnt = 0;
> +    int upcall_ok_cnt = 0, upcall_fail_cnt = 0;
>      int lookup_cnt = 0, add_lookup_cnt;
>      bool any_miss;
>      size_t i;
> @@ -5109,9 +5034,14 @@ fast_path_processing(struct dp_netdev_pmd_thread
> *pmd,
>                  continue;
>              }
> 
> -            miss_cnt++;
> -            handle_packet_upcall(pmd, packet, &keys[i], &actions,
> -                                 &put_actions, &lost_cnt, now);
> +            int error = handle_packet_upcall(pmd, packet, &keys[i],
> +                           &actions, &put_actions, now);
> +
> +            if (OVS_UNLIKELY(error)) {
> +                upcall_fail_cnt++;
> +            } else {
> +                upcall_ok_cnt++;
> +            }
>          }
> 
>          ofpbuf_uninit(&actions);
> @@ -5121,8 +5051,7 @@ fast_path_processing(struct dp_netdev_pmd_thread
> *pmd,
>          DP_PACKET_BATCH_FOR_EACH (packet, packets_) {
>              if (OVS_UNLIKELY(!rules[i])) {
>                  dp_packet_delete(packet);
> -                lost_cnt++;
> -                miss_cnt++;
> +                upcall_fail_cnt++;
>              }
>          }
>      }
> @@ -5140,10 +5069,14 @@ fast_path_processing(struct
> dp_netdev_pmd_thread *pmd,
>          dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
>      }
> 
> -    dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt);
> -    dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
> -    dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
> -    dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT,
> +                            cnt - upcall_ok_cnt - upcall_fail_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats,
> PMD_STAT_MASKED_LOOKUP,
> +                            lookup_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MISS,
> +                            upcall_ok_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST,
> +                            upcall_fail_cnt);
>  }
> 
>  /* Packets enter the datapath from a port (or from recirculation) here.
> diff --git a/tests/pmd.at b/tests/pmd.at index e39a23a..0356f87 100644
> --- a/tests/pmd.at
> +++ b/tests/pmd.at
> @@ -170,13 +170,16 @@ dummy@ovs-dummy: hit:0 missed:0
>  		p0 7/1: (dummy-pmd: configured_rx_queues=4,
> configured_tx_queues=<cleared>, requested_rx_queues=4,
> requested_tx_queues=<cleared>)
>  ])
> 
> -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed
> SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed
> +SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
>  pmd thread numa_id <cleared> core_id <cleared>:
> +	packets received:0
> +	packet recirculations:0
> +	avg. datapath passes per packet:0.00
>  	emc hits:0
>  	megaflow hits:0
> -	avg. subtable lookups per hit:0.00
> -	miss:0
> -	lost:0
> +	avg. subtable lookups per megaflow hit:0.00
> +	miss(i.e. lookup miss with success upcall):0
> +	lost(i.e. lookup miss with failed upcall):0
>  ])
> 
>  ovs-appctl time/stop
> @@ -197,13 +200,16 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install
> | strip_xout], [0], [dnl
> recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50
> :54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: <del>
>  ])
> 
> -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed
> SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed
> +SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
>  pmd thread numa_id <cleared> core_id <cleared>:
> +	packets received:20
> +	packet recirculations:0
> +	avg. datapath passes per packet:1.00
>  	emc hits:19
>  	megaflow hits:0
> -	avg. subtable lookups per hit:0.00
> -	miss:1
> -	lost:0
> +	avg. subtable lookups per megaflow hit:0.00
> +	miss(i.e. lookup miss with success upcall):1
> +	lost(i.e. lookup miss with failed upcall):0
>  ])
> 
>  OVS_VSWITCHD_STOP
> --
> 1.9.1
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Jan Scheurich Dec. 11, 2017, 9:05 a.m. UTC | #2
Hi Billy,

My patches frequently get corrupted by our email system. When I submit v4 of the series, adapted to the reverted dp_netdev_pmd_thread struct changes, I will try out another method of sending them.

For now, could you please try the attached patch files? For me they apply cleanly on today's master; I haven't tested them yet.

I'll fix the whitespace errors and line-length issues in the next version. More answers inline.

Thanks, Jan

> -----Original Message-----
> From: O Mahony, Billy [mailto:billy.o.mahony@intel.com]
> Sent: Friday, 08 December, 2017 18:25
> To: Jan Scheurich <jan.scheurich@ericsson.com>; 'ovs-dev@openvswitch.org' <ovs-dev@openvswitch.org>
> Subject: RE: [ovs-dev] [PATCH v3 1/3] dpif-netdev: Refactor PMD performance into dpif-netdev-perf
> 
> Hi Jan,
> 
> I had problems applying the later patches in this series, so I'm just reviewing this one for now. I tried applying them against several revisions.
> 
> The second patch ([ovs-dev,v3,2/3] dpif-netdev: Detailed performance stats for PMDs ) fails with
> fatal: patch fragment without header at line 708: @@ -1073,6 +1155,12 @@ dpif_netdev_init(void)
> 
> Overall, not only is the user-visible output clearer, but the code is also more consistent and easier to understand.
> 
> I tested this patch by applying it to: 3728b3b Ben Pfaff 2017-11-20 Merge branch 'dpdk_merge' of https://github.com/istokes/ovs into
> HEAD
> 
> These are the issues I did find:
> 1. make check #1159 "ofproto-dpif patch ports" consistently fails for me with this patch applied
> 
> 2. ./utilities/checkpatch.py reports some line length issues:
> == Checking "dpif-netdev-perf/ovs-dev-v3-1-3-dpif-netdev-Refactor-PMD-performance-into-dpif-netdev-perf.patch" ==
> ERROR: Too many signoffs; are you missing Co-authored-by lines?

I'm still confused by the tagging rule. I guess I need to add a
Co-authored-by: Darrell Ball <dlu998@gmail.com>
line to comply, right?
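
I.e. the tag block would presumably end up as:

    Co-authored-by: Darrell Ball <dlu998@gmail.com>
    Signed-off-by: Darrell Ball <dlu998@gmail.com>
    Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>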

> WARNING: Line length is >79-characters long
> #346 FILE: lib/dpif-netdev.c:544:
>         struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */
> 
> WARNING: Line has non-spaces leading whitespace
> WARNING: Line has trailing whitespace
> #347 FILE: lib/dpif-netdev.c:545:
> 
> 
> WARNING: Line length is >79-characters long
> #349 FILE: lib/dpif-netdev.c:547:
>          * XPS disabled for this netdev. All static_tx_qid's are unique and less
> 
> WARNING: Line has non-spaces leading whitespace
> WARNING: Line has trailing whitespace
> #352 FILE: lib/dpif-netdev.c:550:
> 
> 
> WARNING: Line length is >79-characters long
> WARNING: Line has trailing whitespace
> #413 FILE: lib/dpif-netdev.c:610:
>     OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct pmd_perf_stats perf_stats;
> 
> Lines checked: 976, Warnings: 8, Errors: 1
> 
> 3. Does the new 'pmd' arg to pmd-stats-show interfere with the existing [dp] arg?
> 
>     sudo ./utilities/ovs-appctl dpif-netdev/pmd-stats-show -pmd 1 netdev@dpif-netdev
>     "dpif-netdev/pmd-stats-show" command takes at most 2 arguments
>      ovs-appctl: ovs-vswitchd: server returned an error

Good question. I'd say the two options should be mutually exclusive: specifying a PMD should imply the datapath netdev@dpif-netdev.
I can make this clear in the usage string: "dpif-netdev/pmd-stats-show" "[-pmd core | dp]".
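
Looking at the patch again, I suspect the immediate error you hit is the
argument limit rather than the parsing itself: the commands are registered
with at most 2 arguments ("0, 2"), so "-pmd 1 netdev@dpif-netdev" (three
tokens) is rejected before the parsing loop even runs. Allowing both
options together would need the registration to permit 3, e.g.:

    unixctl_command_register("dpif-netdev/pmd-stats-show",
                             "[-pmd core] [dp]",
                             0, 3, dpif_netdev_pmd_info,
                             (void *)&show_aux);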

Actually, I have never really understood the dp argument for these commands. Can there ever be more than one dpif-netdev datapath?
I know that one must specify netdev@dpif-netdev if there is also a kernel datapath in OVS, but I still don't fully get why.

> 
> Otherwise it looks like a really useful patch. And the remainder of the series more so.
> 
> Thanks,
> Billy.
> 
> > -----Original Message-----
> > From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-
> > bounces@openvswitch.org] On Behalf Of Jan Scheurich
> > Sent: Tuesday, November 21, 2017 12:38 AM
> > To: 'ovs-dev@openvswitch.org' <ovs-dev@openvswitch.org>
> > Subject: [ovs-dev] [PATCH v3 1/3] dpif-netdev: Refactor PMD performance into
> > dpif-netdev-perf
> >
> > Add module dpif-netdev-perf to host all PMD performance-related data
> > structures and functions in dpif-netdev. Refactor the PMD stats handling in dpif-
> > netdev and delegate whatever possible into the new module, using clean
> > interfaces to shield dpif-netdev from the implementation details. Accordingly,
> > the all PMD statistics members are moved from the main struct
> > dp_netdev_pmd_thread into a dedicated member of type struct pmd_perf_stats.
> >
> > Include Darrel's prior refactoring of PMD stats contained in [PATCH v5,2/3] dpif-
> > netdev: Refactor some pmd stats:
> >
> > 1. The cycles per packet counts are now based on packets received rather than
> > packet passes through the datapath.
> >
> > 2. Packet counters are now kept for packets received and packets recirculated.
> > These are kept as separate counters for maintainability reasons. The cost of
> > incrementing these counters is negligible.  These new counters are also
> > displayed to the user.
> >
> > 3. A display statistic is added for the average number of datapath passes per
> > packet. This should be useful for user debugging and understanding of packet
> > processing.
> >
> > 4. The user visible 'miss' counter is used for successful upcalls, rather than the
> > sum of sucessful and unsuccessful upcalls. Hence, this becomes what user
> > historically understands by OVS 'miss upcall'.
> > The user display is annotated to make this clear as well.
> >
> > 5. The user visible 'lost' counter remains as failed upcalls, but is annotated to
> > make it clear what the meaning is.
> >
> > 6. The enum pmd_stat_type is annotated to make the usage of the stats
> > counters clear.
> >
> > 7. The subtable lookup stats is renamed to make it clear that it relates to masked
> > lookups.
> >
> > 8. The PMD stats test is updated to handle the new user stats of packets
> > received, packets recirculated and average number of datapath passes per
> > packet.
> >
> > On top of that introduce a "-pmd <core>" option to the PMD info commands to
> > filter the output for a single PMD.
> >
> > Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
> > Signed-off-by: Darrell Ball <dlu998@gmail.com>
> >
> > ---
> >  lib/automake.mk        |   2 +
> >  lib/dpif-netdev-perf.c |  67 +++++++++
> >  lib/dpif-netdev-perf.h | 123 ++++++++++++++++
> >  lib/dpif-netdev.c      | 371 ++++++++++++++++++++-----------------------------
> >  tests/pmd.at           |  22 +--
> >  5 files changed, 358 insertions(+), 227 deletions(-)  create mode 100644 lib/dpif-
> > netdev-perf.c  create mode 100644 lib/dpif-netdev-perf.h
> >
> > diff --git a/lib/automake.mk b/lib/automake.mk index effe5b5..0b8df0e 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -80,6 +80,8 @@ lib_libopenvswitch_la_SOURCES = \
> >  	lib/dpdk.h \
> >  	lib/dpif-netdev.c \
> >  	lib/dpif-netdev.h \
> > +	lib/dpif-netdev-perf.c \
> > +	lib/dpif-netdev-perf.h \
> >  	lib/dpif-provider.h \
> >  	lib/dpif.c \
> >  	lib/dpif.h \
> > diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c new file mode 100644
> > index 0000000..6a8f7c4
> > --- /dev/null
> > +++ b/lib/dpif-netdev-perf.c
> > @@ -0,0 +1,67 @@
> > +/*
> > + * Copyright (c) 2017 Ericsson AB.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> > implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#include <config.h>
> > +
> > +#ifdef DPDK_NETDEV
> > +#include <rte_cycles.h>
> > +#endif
> > +
> > +#include "openvswitch/dynamic-string.h"
> > +#include "openvswitch/vlog.h"
> > +#include "dpif-netdev-perf.h"
> > +#include "timeval.h"
> > +
> > +VLOG_DEFINE_THIS_MODULE(pmd_perf);
> > +
> > +void
> > +pmd_perf_stats_init(struct pmd_perf_stats *s) {
> > +    memset(s, 0 , sizeof(*s));
> > +    s->start_ms = time_msec();
> > +}
> > +
> > +void
> > +pmd_perf_read_counters(struct pmd_perf_stats *s,
> > +                       uint64_t stats[PMD_N_STATS]) {
> > +    uint64_t val;
> > +
> > +    /* These loops subtracts reference values ('*_zero') from the counters.
> > +     * Since loads and stores are relaxed, it might be possible for a '*_zero'
> > +     * value to be more recent than the current value we're reading from the
> > +     * counter.  This is not a big problem, since these numbers are not
> > +     * supposed to be too accurate, but we should at least make sure that
> > +     * the result is not negative. */
> > +    for (int i = 0; i < PMD_N_STATS; i++) {
> > +        atomic_read_relaxed(&s->counters.n[i], &val);
> > +        if (val > s->counters.zero[i]) {
> > +            stats[i] = val - s->counters.zero[i];
> > +        } else {
> > +            stats[i] = 0;
> > +        }
> > +    }
> > +}
> > +
> > +void
> > +pmd_perf_stats_clear(struct pmd_perf_stats *s) {
> > +    s->start_ms = 0; /* Marks start of clearing. */
> > +    for (int i = 0; i < PMD_N_STATS; i++) {
> > +        atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]);
> > +    }
> > +    s->start_ms = time_msec(); /* Clearing finished. */ }
> > +
> > diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h new file mode 100644
> > index 0000000..f55f7a2
> > --- /dev/null
> > +++ b/lib/dpif-netdev-perf.h
> > @@ -0,0 +1,123 @@
> > +/*
> > + * Copyright (c) 2017 Ericsson AB.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> > implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#ifndef PMD_PERF_H
> > +#define PMD_PERF_H 1
> > +
> > +#include <stdbool.h>
> > +#include <stddef.h>
> > +#include <stdint.h>
> > +#include <string.h>
> > +#include <math.h>
> > +
> > +#include "openvswitch/vlog.h"
> > +#include "ovs-atomic.h"
> > +#include "timeval.h"
> > +#include "unixctl.h"
> > +#include "util.h"
> > +
> > +#ifdef  __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +enum pmd_stat_type {
> > +    PMD_STAT_EXACT_HIT,     /* Packets that had an exact match (emc). */
> > +    PMD_STAT_MASKED_HIT,    /* Packets that matched in the flow table. */
> > +    PMD_STAT_MISS,          /* Packets that did not match and upcall was ok. */
> > +    PMD_STAT_LOST,          /* Packets that did not match and upcall failed. */
> > +                           /* The above statistics account for the total
> > +                            * number of packet passes through the datapath
> > +                            * pipeline and should not be overlapping with each
> > +                            * other. */
> > +    PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table
> > +                              hits. Each MASKED_HIT hit will have >= 1
> > +                              MASKED_LOOKUP(s). */
> > +    PMD_STAT_RECV,          /* Packets entering the datapath pipeline from an
> > +                             * interface. */
> > +    PMD_STAT_RECIRC,        /* Packets reentering the datapath pipeline due to
> > +                             * recirculation. */
> > +    PMD_CYCLES_POLL_IDLE,   /* Cycles spent unsuccessful polling. */
> > +    PMD_CYCLES_POLL_BUSY,   /* Cycles spent successfully polling and
> > +                             * processing polled packets. */
> > +    PMD_CYCLES_OVERHEAD,    /* Cycles spent for other tasks. */
> > +    PMD_CYCLES_UPCALL,      /* Cycles spent in upcalls. Included in
> > +                               PMD_CYCLES_POLL_BUSY. */
> > +    PMD_CYCLES_ITER_IDLE,   /* Cycles spent in idle iterations. */
> > +    PMD_CYCLES_ITER_BUSY,   /* Cycles spent in busy iterations. */
> > +    PMD_N_STATS
> > +};
> > +
> > +struct pmd_counters {
> > +    atomic_uint64_t n[PMD_N_STATS];     /* Indexed by PMD_STAT_*. */
> > +    uint64_t zero[PMD_N_STATS];
> > +};
> > +
> > +struct pmd_perf_stats {
> > +    uint64_t start_ms;
> > +    uint64_t last_tsc;
> > +    struct pmd_counters counters;
> > +};
> > +
> > +enum pmd_info_type;
> > +struct dp_netdev_pmd_thread;
> > +
> > +struct pmd_perf_params {
> > +    int command_type;
> > +    bool histograms;
> > +    size_t iter_hist_len;
> > +    size_t ms_hist_len;
> > +};
> > +
> > +void pmd_perf_stats_init(struct pmd_perf_stats *s);
> > +void pmd_perf_stats_clear(struct pmd_perf_stats *s);
> > +void pmd_perf_read_counters(struct pmd_perf_stats *s,
> > +                            uint64_t stats[PMD_N_STATS]);
> > +
> > +static inline void
> > +pmd_perf_update_counter(struct pmd_perf_stats *s,
> > +                        enum pmd_stat_type counter, int delta)
> > +{
> > +    uint64_t tmp;
> > +    atomic_read_relaxed(&s->counters.n[counter], &tmp);
> > +    tmp += delta;
> > +    atomic_store_relaxed(&s->counters.n[counter], tmp);
> > +}
> > +
> > +static inline void
> > +pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc)
> > +{
> > +    if (OVS_UNLIKELY(s->start_ms == 0)) {
> > +        /* Stats not yet fully initialised. */
> > +        return;
> > +    }
> > +    s->last_tsc = now_tsc;
> > +}
> > +
> > +static inline void
> > +pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc,
> > +                       int packets)
> > +{
> > +    uint64_t cycles = now_tsc - s->last_tsc;
> > +
> > +    /* No need for atomic updates as only invoked within PMD thread. */
> > +    if (packets > 0) {
> > +        s->counters.n[PMD_CYCLES_ITER_BUSY] += cycles;
> > +    } else {
> > +        s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles;
> > +    }
> > +}
> > +
> > +#endif /* pmd-perf.h */
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 0a62630..bf99f81 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -43,6 +43,7 @@
> >  #include "csum.h"
> >  #include "dp-packet.h"
> >  #include "dpif.h"
> > +#include "dpif-netdev-perf.h"
> >  #include "dpif-provider.h"
> >  #include "dummy.h"
> >  #include "fat-rwlock.h"
> > @@ -330,23 +331,6 @@ static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
> >                                                      odp_port_t)
> >      OVS_REQUIRES(dp->port_mutex);
> >
> > -enum dp_stat_type {
> > -    DP_STAT_EXACT_HIT,          /* Packets that had an exact match (emc). */
> > -    DP_STAT_MASKED_HIT,         /* Packets that matched in the flow table. */
> > -    DP_STAT_MISS,               /* Packets that did not match. */
> > -    DP_STAT_LOST,               /* Packets not passed up to the client. */
> > -    DP_STAT_LOOKUP_HIT,         /* Number of subtable lookups for flow table
> > -                                   hits */
> > -    DP_N_STATS
> > -};
> > -
> > -enum pmd_cycles_counter_type {
> > -    PMD_CYCLES_IDLE,            /* Cycles spent idle or unsuccessful polling */
> > -    PMD_CYCLES_PROCESSING,      /* Cycles spent successfully polling and
> > -                                 * processing polled packets */
> > -    PMD_N_CYCLES
> > -};
> > -
> >  enum rxq_cycles_counter_type {
> >      RXQ_CYCLES_PROC_CURR,       /* Cycles spent successfully polling and
> >                                     processing packets during the current
> > @@ -496,18 +480,6 @@ struct dp_netdev_actions *dp_netdev_flow_get_actions(
> >      const struct dp_netdev_flow *);
> >  static void dp_netdev_actions_free(struct dp_netdev_actions *);
> >
> > -/* Contained by struct dp_netdev_pmd_thread's 'stats' member.  */
> > -struct dp_netdev_pmd_stats {
> > -    /* Indexed by DP_STAT_*. */
> > -    atomic_ullong n[DP_N_STATS];
> > -};
> > -
> > -/* Contained by struct dp_netdev_pmd_thread's 'cycle' member.  */
> > -struct dp_netdev_pmd_cycles {
> > -    /* Indexed by PMD_CYCLES_*. */
> > -    atomic_ullong n[PMD_N_CYCLES];
> > -};
> > -
> >  struct polled_queue {
> >      struct dp_netdev_rxq *rxq;
> >      odp_port_t port_no;
> > @@ -566,20 +538,22 @@ struct dp_netdev_pmd_thread {
> >       * need to be protected by 'non_pmd_mutex'.  Every other instance
> >       * will only be accessed by its own pmd thread. */
> >      OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache;
> > -    struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */
> > -
> > -    /* Queue id used by this pmd thread to send packets on all netdevs if
> > -     * XPS disabled for this netdev. All static_tx_qid's are unique and less
> > -     * than 'cmap_count(dp->poll_threads)'. */
> > -    uint32_t static_tx_qid;
> > -
> > -    /* Flow-Table and classifiers
> > -     *
> > -     * Writers of 'flow_table' must take the 'flow_mutex'.  Corresponding
> > -     * changes to 'classifiers' must be made while still holding the
> > -     * 'flow_mutex'.
> > -     */
> > -    struct ovs_mutex flow_mutex;
> > +    PADDED_MEMBERS(CACHE_LINE_SIZE,
> > +        struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */
> > +
> > +        /* Queue id used by this pmd thread to send packets on all netdevs if
> > +         * XPS disabled for this netdev. All static_tx_qid's are unique and less
> > +         * than 'cmap_count(dp->poll_threads)'. */
> > +        uint32_t static_tx_qid;
> > +
> > +        /* Flow-Table and classifiers
> > +         *
> > +         * Writers of 'flow_table' must take the 'flow_mutex'.  Corresponding
> > +         * changes to 'classifiers' must be made while still holding the
> > +         * 'flow_mutex'.
> > +         */
> > +        struct ovs_mutex flow_mutex;
> > +    );
> >      PADDED_MEMBERS(CACHE_LINE_SIZE,
> >          struct cmap flow_table OVS_GUARDED; /* Flow table. */
> >
> > @@ -591,20 +565,10 @@ struct dp_netdev_pmd_thread {
> >             are stored for each polled rxq. */
> >          long long int rxq_next_cycle_store;
> >
> > -        /* Cycles counters */
> > -        struct dp_netdev_pmd_cycles cycles;
> > -
> >          /* Used to count cycles. See 'cycles_counter_end()'. */
> >          unsigned long long last_cycles;
> >          struct latch exit_latch;        /* For terminating the pmd thread. */
> > -    );
> > -
> > -    PADDED_MEMBERS(CACHE_LINE_SIZE,
> > -        /* Statistics. */
> > -        struct dp_netdev_pmd_stats stats;
> >
> > -        struct seq *reload_seq;
> > -        uint64_t last_reload_seq;
> >          atomic_bool reload;             /* Do we need to reload ports? */
> >          bool isolated;
> >
> > @@ -612,11 +576,11 @@ struct dp_netdev_pmd_thread {
> >          bool need_reload;
> >          /* 5 pad bytes. */
> >      );
> > -
> >      PADDED_MEMBERS(CACHE_LINE_SIZE,
> > +        struct seq *reload_seq;
> > +        uint64_t last_reload_seq;
> >          struct ovs_mutex port_mutex;    /* Mutex for 'poll_list'
> >                                             and 'tx_ports'. */
> > -        /* 16 pad bytes. */
> >      );
> >      PADDED_MEMBERS(CACHE_LINE_SIZE,
> >          /* List of rx queues to poll. */
> > @@ -642,15 +606,8 @@ struct dp_netdev_pmd_thread {
> >          struct hmap send_port_cache;
> >      );
> >
> > -    PADDED_MEMBERS(CACHE_LINE_SIZE,
> > -        /* Only a pmd thread can write on its own 'cycles' and 'stats'.
> > -         * The main thread keeps 'stats_zero' and 'cycles_zero' as base
> > -         * values and subtracts them from 'stats' and 'cycles' before
> > -         * reporting to the user */
> > -        unsigned long long stats_zero[DP_N_STATS];
> > -        uint64_t cycles_zero[PMD_N_CYCLES];
> > -        /* 8 pad bytes. */
> > -    );
> > +    /* Keep track of detailed PMD performance statistics. */
> > +    OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct pmd_perf_stats perf_stats;
> >  };
> >
> >  /* Interface to netdev-based datapath. */
> > @@ -813,46 +770,11 @@ enum pmd_info_type {
> >  };
> >
> >  static void
> > -pmd_info_show_stats(struct ds *reply,
> > -                    struct dp_netdev_pmd_thread *pmd,
> > -                    unsigned long long stats[DP_N_STATS],
> > -                    uint64_t cycles[PMD_N_CYCLES])
> > +format_pmd_thread(struct ds *reply,
> > +                  struct dp_netdev_pmd_thread *pmd)
> >  {
> > -    unsigned long long total_packets;
> > -    uint64_t total_cycles = 0;
> > -    int i;
> > -
> > -    /* These loops subtracts reference values ('*_zero') from the counters.
> > -     * Since loads and stores are relaxed, it might be possible for a '*_zero'
> > -     * value to be more recent than the current value we're reading from the
> > -     * counter.  This is not a big problem, since these numbers are not
> > -     * supposed to be too accurate, but we should at least make sure that
> > -     * the result is not negative. */
> > -    for (i = 0; i < DP_N_STATS; i++) {
> > -        if (stats[i] > pmd->stats_zero[i]) {
> > -            stats[i] -= pmd->stats_zero[i];
> > -        } else {
> > -            stats[i] = 0;
> > -        }
> > -    }
> > -
> > -    /* Sum of all the matched and not matched packets gives the total.  */
> > -    total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT]
> > -                    + stats[DP_STAT_MISS];
> > -
> > -    for (i = 0; i < PMD_N_CYCLES; i++) {
> > -        if (cycles[i] > pmd->cycles_zero[i]) {
> > -           cycles[i] -= pmd->cycles_zero[i];
> > -        } else {
> > -            cycles[i] = 0;
> > -        }
> > -
> > -        total_cycles += cycles[i];
> > -    }
> > -
> >      ds_put_cstr(reply, (pmd->core_id == NON_PMD_CORE_ID)
> >                          ? "main thread" : "pmd thread");
> > -
> >      if (pmd->numa_id != OVS_NUMA_UNSPEC) {
> >          ds_put_format(reply, " numa_id %d", pmd->numa_id);
> >      }
> > @@ -860,16 +782,39 @@ pmd_info_show_stats(struct ds *reply,
> >          ds_put_format(reply, " core_id %u", pmd->core_id);
> >      }
> >      ds_put_cstr(reply, ":\n");
> > +}
> > +
> > +static void
> > +pmd_info_show_stats(struct ds *reply,
> > +                    struct dp_netdev_pmd_thread *pmd)
> > +{
> > +    uint64_t stats[PMD_N_STATS];
> > +    uint64_t total_cycles = 0;
> > +
> > +    pmd_perf_read_counters(&pmd->perf_stats, stats);
> > +    total_cycles = stats[PMD_CYCLES_ITER_IDLE]
> > +                         + stats[PMD_CYCLES_ITER_BUSY];
> > +
> > +    format_pmd_thread(reply, pmd);
> >
> >      ds_put_format(reply,
> > -                  "\temc hits:%llu\n\tmegaflow hits:%llu\n"
> > -                  "\tavg. subtable lookups per hit:%.2f\n"
> > -                  "\tmiss:%llu\n\tlost:%llu\n",
> > -                  stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
> > -                  stats[DP_STAT_MASKED_HIT] > 0
> > -                  ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT]
> > -                  : 0,
> > -                  stats[DP_STAT_MISS], stats[DP_STAT_LOST]);
> > +                  "\tpackets received:%"PRIu64"\n"
> > +                  "\tpacket recirculations:%"PRIu64"\n"
> > +                  "\tavg. datapath passes per packet:%.2f\n"
> > +                  "\temc hits:%"PRIu64"\n"
> > +                  "\tmegaflow hits:%"PRIu64"\n"
> > +                  "\tavg. subtable lookups per megaflow hit:%.2f\n"
> > +                  "\tmiss(i.e. lookup miss with success upcall):%"PRIu64"\n"
> > +                  "\tlost(i.e. lookup miss with failed upcall):%"PRIu64"\n",
> > +                  stats[PMD_STAT_RECV], stats[PMD_STAT_RECIRC],
> > +                  stats[PMD_STAT_RECV] > 0
> > +                  ? (1.0 * (stats[PMD_STAT_RECV] + stats[PMD_STAT_RECIRC]))
> > +                     / stats[PMD_STAT_RECV] : 0,
> > +                  stats[PMD_STAT_EXACT_HIT], stats[PMD_STAT_MASKED_HIT],
> > +                  stats[PMD_STAT_MASKED_HIT] > 0
> > +                  ? (1.0 * stats[PMD_STAT_MASKED_LOOKUP])
> > +                     / stats[PMD_STAT_MASKED_HIT]
> > +                  : 0, stats[PMD_STAT_MISS], stats[PMD_STAT_LOST]);
> >
> >      if (total_cycles == 0) {
> >          return;
> > @@ -878,46 +823,26 @@ pmd_info_show_stats(struct ds *reply,
> >      ds_put_format(reply,
> >                    "\tidle cycles:%"PRIu64" (%.02f%%)\n"
> >                    "\tprocessing cycles:%"PRIu64" (%.02f%%)\n",
> > -                  cycles[PMD_CYCLES_IDLE],
> > -                  cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100,
> > -                  cycles[PMD_CYCLES_PROCESSING],
> > -                  cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100);
> > +                  stats[PMD_CYCLES_ITER_IDLE],
> > +                  stats[PMD_CYCLES_ITER_IDLE] / (double)total_cycles * 100,
> > +                  stats[PMD_CYCLES_ITER_BUSY],
> > +                  stats[PMD_CYCLES_ITER_BUSY] / (double)total_cycles * 100);
> >
> > +    uint64_t total_packets = stats[PMD_STAT_RECV];
> >      if (total_packets == 0) {
> >          return;
> >      }
> >
> >      ds_put_format(reply,
> > -                  "\tavg cycles per packet: %.02f (%"PRIu64"/%llu)\n",
> > +                  "\tavg cycles per packet: %.02f (%"PRIu64"/%"PRIu64")\n",
> >                    total_cycles / (double)total_packets,
> >                    total_cycles, total_packets);
> >
> >      ds_put_format(reply,
> >                    "\tavg processing cycles per packet: "
> > -                  "%.02f (%"PRIu64"/%llu)\n",
> > -                  cycles[PMD_CYCLES_PROCESSING] / (double)total_packets,
> > -                  cycles[PMD_CYCLES_PROCESSING], total_packets);
> > -}
> > -
> > -static void
> > -pmd_info_clear_stats(struct ds *reply OVS_UNUSED,
> > -                    struct dp_netdev_pmd_thread *pmd,
> > -                    unsigned long long stats[DP_N_STATS],
> > -                    uint64_t cycles[PMD_N_CYCLES])
> > -{
> > -    int i;
> > -
> > -    /* We cannot write 'stats' and 'cycles' (because they're written by other
> > -     * threads) and we shouldn't change 'stats' (because they're used to count
> > -     * datapath stats, which must not be cleared here).  Instead, we save the
> > -     * current values and subtract them from the values to be displayed in the
> > -     * future */
> > -    for (i = 0; i < DP_N_STATS; i++) {
> > -        pmd->stats_zero[i] = stats[i];
> > -    }
> > -    for (i = 0; i < PMD_N_CYCLES; i++) {
> > -        pmd->cycles_zero[i] = cycles[i];
> > -    }
> > +                  "%.02f (%"PRIu64"/%"PRIu64")\n",
> > +                  stats[PMD_CYCLES_POLL_BUSY] / (double)total_packets,
> > +                  stats[PMD_CYCLES_POLL_BUSY], total_packets);
> >  }
> >
> >  static int
> > @@ -1077,23 +1002,34 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
> >      struct ds reply = DS_EMPTY_INITIALIZER;
> >      struct dp_netdev_pmd_thread **pmd_list;
> >      struct dp_netdev *dp = NULL;
> > -    size_t n;
> >      enum pmd_info_type type = *(enum pmd_info_type *) aux;
> > +    int core_id = -1;
> > +    size_t n;
> >
> >      ovs_mutex_lock(&dp_netdev_mutex);
> >
> > -    if (argc == 2) {
> > -        dp = shash_find_data(&dp_netdevs, argv[1]);
> > -    } else if (shash_count(&dp_netdevs) == 1) {
> > -        /* There's only one datapath */
> > -        dp = shash_first(&dp_netdevs)->data;
> > +    while (argc > 1) {
> > +        if (!strcmp(argv[1], "-pmd")) {
> > +            core_id = strtol(argv[2], NULL, 10);
> > +            argc -= 2;
> > +            argv += 2;
> > +        } else {
> > +            dp = shash_find_data(&dp_netdevs, argv[1]);
> > +            argc -= 1;
> > +            argv += 1;
> > +        }
> >      }
> >
> >      if (!dp) {
> > -        ovs_mutex_unlock(&dp_netdev_mutex);
> > -        unixctl_command_reply_error(conn,
> > -                                    "please specify an existing datapath");
> > -        return;
> > +        if (shash_count(&dp_netdevs) == 1) {
> > +            /* There's only one datapath */
> > +            dp = shash_first(&dp_netdevs)->data;
> > +        } else {
> > +            ovs_mutex_unlock(&dp_netdev_mutex);
> > +            unixctl_command_reply_error(conn,
> > +                                        "please specify an existing datapath");
> > +            return;
> > +        }
> >      }
> >
> >      sorted_poll_thread_list(dp, &pmd_list, &n);
> > @@ -1102,26 +1038,15 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
> >          if (!pmd) {
> >              break;
> >          }
> > -
> > +        if (core_id != -1 && pmd->core_id != core_id) {
> > +            continue;
> > +        }
> >          if (type == PMD_INFO_SHOW_RXQ) {
> >              pmd_info_show_rxq(&reply, pmd);
> > -        } else {
> > -            unsigned long long stats[DP_N_STATS];
> > -            uint64_t cycles[PMD_N_CYCLES];
> > -
> > -            /* Read current stats and cycle counters */
> > -            for (size_t j = 0; j < ARRAY_SIZE(stats); j++) {
> > -                atomic_read_relaxed(&pmd->stats.n[j], &stats[j]);
> > -            }
> > -            for (size_t j = 0; j < ARRAY_SIZE(cycles); j++) {
> > -                atomic_read_relaxed(&pmd->cycles.n[j], &cycles[j]);
> > -            }
> > -
> > -            if (type == PMD_INFO_CLEAR_STATS) {
> > -                pmd_info_clear_stats(&reply, pmd, stats, cycles);
> > -            } else if (type == PMD_INFO_SHOW_STATS) {
> > -                pmd_info_show_stats(&reply, pmd, stats, cycles);
> > -            }
> > +        } else if (type == PMD_INFO_CLEAR_STATS) {
> > +            pmd_perf_stats_clear(&pmd->perf_stats);
> > +        } else if (type == PMD_INFO_SHOW_STATS) {
> > +            pmd_info_show_stats(&reply, pmd);
> >          }
> >      }
> >      free(pmd_list);
> > @@ -1139,14 +1064,14 @@ dpif_netdev_init(void)
> >                                clear_aux = PMD_INFO_CLEAR_STATS,
> >                                poll_aux = PMD_INFO_SHOW_RXQ;
> >
> > -    unixctl_command_register("dpif-netdev/pmd-stats-show", "[dp]",
> > -                             0, 1, dpif_netdev_pmd_info,
> > +    unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core] [dp]",
> > +                             0, 2, dpif_netdev_pmd_info,
> >                               (void *)&show_aux);
> > -    unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]",
> > -                             0, 1, dpif_netdev_pmd_info,
> > +    unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core] [dp]",
> > +                             0, 2, dpif_netdev_pmd_info,
> >                               (void *)&clear_aux);
> > -    unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]",
> > -                             0, 1, dpif_netdev_pmd_info,
> > +    unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] [dp]",
> > +                             0, 2, dpif_netdev_pmd_info,
> >                               (void *)&poll_aux);
> >      unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
> >                               0, 1, dpif_netdev_pmd_rebalance,
> > @@ -1482,23 +1407,19 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
> >  {
> >      struct dp_netdev *dp = get_dp_netdev(dpif);
> >      struct dp_netdev_pmd_thread *pmd;
> > +    uint64_t pmd_stats[PMD_N_STATS];
> >
> > -    stats->n_flows = stats->n_hit = stats->n_missed = stats->n_lost = 0;
> > +    stats->n_flows = stats->n_hit =
> > +            stats->n_mask_hit = stats->n_missed = stats->n_lost = 0;
> >      CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> > -        unsigned long long n;
> >          stats->n_flows += cmap_count(&pmd->flow_table);
> > -
> > -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_MASKED_HIT], &n);
> > -        stats->n_hit += n;
> > -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n);
> > -        stats->n_hit += n;
> > -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n);
> > -        stats->n_missed += n;
> > -        atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n);
> > -        stats->n_lost += n;
> > +        pmd_perf_read_counters(&pmd->perf_stats, pmd_stats);
> > +        stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT];
> > +        stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT];
> > +        stats->n_missed += pmd_stats[PMD_STAT_MISS];
> > +        stats->n_lost += pmd_stats[PMD_STAT_LOST];
> >      }
> >      stats->n_masks = UINT32_MAX;
> > -    stats->n_mask_hit = UINT64_MAX;
> >
> >      return 0;
> >  }
> > @@ -3174,28 +3095,28 @@ cycles_count_start(struct dp_netdev_pmd_thread *pmd)
> >  /* Stop counting cycles and add them to the counter 'type' */
> >  static inline void
> >  cycles_count_end(struct dp_netdev_pmd_thread *pmd,
> > -                 enum pmd_cycles_counter_type type)
> > +                 enum pmd_stat_type type)
> >      OVS_RELEASES(&cycles_counter_fake_mutex)
> >      OVS_NO_THREAD_SAFETY_ANALYSIS
> >  {
> >      unsigned long long interval = cycles_counter() - pmd->last_cycles;
> >
> > -    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
> > +    pmd_perf_update_counter(&pmd->perf_stats, type, interval);
> >  }
> >
> >  /* Calculate the intermediate cycle result and add to the counter 'type' */
> >  static inline void
> >  cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
> >                            struct dp_netdev_rxq *rxq,
> > -                          enum pmd_cycles_counter_type type)
> > +                          enum pmd_stat_type type)
> >      OVS_NO_THREAD_SAFETY_ANALYSIS
> >  {
> >      unsigned long long new_cycles = cycles_counter();
> >      unsigned long long interval = new_cycles - pmd->last_cycles;
> >      pmd->last_cycles = new_cycles;
> >
> > -    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
> > -    if (rxq && (type == PMD_CYCLES_PROCESSING)) {
> > +    pmd_perf_update_counter(&pmd->perf_stats, type, interval);
> > +    if (rxq && (type == PMD_CYCLES_POLL_BUSY)) {
> >          /* Add to the amount of current processing cycles. */
> >          non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval);
> >      }
> > @@ -3885,12 +3806,12 @@ dpif_netdev_run(struct dpif *dpif)
> >                                                     port->port_no);
> >                      cycles_count_intermediate(non_pmd, NULL,
> >                                                process_packets
> > -                                              ? PMD_CYCLES_PROCESSING
> > -                                              : PMD_CYCLES_IDLE);
> > +                                              ? PMD_CYCLES_POLL_BUSY
> > +                                              : PMD_CYCLES_POLL_IDLE);
> >                  }
> >              }
> >          }
> > -        cycles_count_end(non_pmd, PMD_CYCLES_IDLE);
> > +        cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE);
> >          dpif_netdev_xps_revalidate_pmd(non_pmd, time_msec(), false);
> >          ovs_mutex_unlock(&dp->non_pmd_mutex);
> >
> > @@ -4034,6 +3955,7 @@ static void *
> >  pmd_thread_main(void *f_)
> >  {
> >      struct dp_netdev_pmd_thread *pmd = f_;
> > +    struct pmd_perf_stats *s = &pmd->perf_stats;
> >      unsigned int lc = 0;
> >      struct polled_queue *poll_list;
> >      bool exiting;
> > @@ -4069,13 +3991,17 @@ reload:
> >
> >      cycles_count_start(pmd);
> >      for (;;) {
> > +        uint64_t iter_packets = 0;
> > +        pmd_perf_start_iteration(s, pmd->last_cycles);
> >          for (i = 0; i < poll_cnt; i++) {
> >              process_packets =
> >                  dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
> >                                             poll_list[i].port_no);
> >              cycles_count_intermediate(pmd, poll_list[i].rxq,
> > -                                      process_packets ? PMD_CYCLES_PROCESSING
> > -                                                      : PMD_CYCLES_IDLE);
> > +                                      process_packets
> > +                                      ? PMD_CYCLES_POLL_BUSY
> > +                                      : PMD_CYCLES_POLL_IDLE);
> > +            iter_packets += process_packets;
> >          }
> >
> >          if (lc++ > 1024) {
> > @@ -4093,10 +4019,12 @@ reload:
> >              if (reload) {
> >                  break;
> >              }
> > +            cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD);
> >          }
> > +        pmd_perf_end_iteration(s, pmd->last_cycles, iter_packets);
> >      }
> >
> > -    cycles_count_end(pmd, PMD_CYCLES_IDLE);
> > +    cycles_count_end(pmd, PMD_CYCLES_OVERHEAD);
> >
> >      poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
> >      exiting = latch_is_set(&pmd->exit_latch);
> > @@ -4549,6 +4477,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
> >      }
> >      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
> >                  hash_int(core_id, 0));
> > +    pmd_perf_stats_init(&pmd->perf_stats);
> >  }
> >
> >  static void
> > @@ -4746,13 +4675,6 @@ dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, int cnt, int size,
> >      atomic_store_relaxed(&netdev_flow->stats.tcp_flags, flags);
> >  }
> >
> > -static void
> > -dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd,
> > -                       enum dp_stat_type type, int cnt)
> > -{
> > -    non_atomic_ullong_add(&pmd->stats.n[type], cnt);
> > -}
> > -
> >  static int
> >  dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_,
> >                   struct flow *flow, struct flow_wildcards *wc, ovs_u128 *ufid,
> > @@ -4926,6 +4848,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
> >      int i;
> >
> >      atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
> > +    pmd_perf_update_counter(&pmd->perf_stats,
> > +                            md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
> > +                            cnt);
> >
> >      DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
> >          struct dp_netdev_flow *flow;
> > @@ -4974,24 +4899,24 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
> >          }
> >      }
> >
> > -    dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT,
> > -                           cnt - n_dropped - n_missed);
> > +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT,
> > +                            cnt - n_dropped - n_missed);
> >
> >      return dp_packet_batch_size(packets_);
> >  }
> >
> > -static inline void
> > +static inline int
> >  handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> >                       struct dp_packet *packet,
> >                       const struct netdev_flow_key *key,
> >                       struct ofpbuf *actions, struct ofpbuf *put_actions,
> > -                     int *lost_cnt, long long now)
> > +                     long long now)
> >  {
> >      struct ofpbuf *add_actions;
> >      struct dp_packet_batch b;
> >      struct match match;
> >      ovs_u128 ufid;
> > -    int error;
> > +    int error = 0;
> >
> >      match.tun_md.valid = false;
> >      miniflow_expand(&key->mf, &match.flow);
> > @@ -5005,8 +4930,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> >                               put_actions);
> >      if (OVS_UNLIKELY(error && error != ENOSPC)) {
> >          dp_packet_delete(packet);
> > -        (*lost_cnt)++;
> > -        return;
> > +        return error;
> >      }
> >
> >      /* The Netlink encoding of datapath flow keys cannot express
> > @@ -5046,6 +4970,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> >          ovs_mutex_unlock(&pmd->flow_mutex);
> >          emc_probabilistic_insert(pmd, key, netdev_flow);
> >      }
> > +    return error;
> >  }
> >
> >  static inline void
> > @@ -5067,7 +4992,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> >      struct dpcls *cls;
> >      struct dpcls_rule *rules[PKT_ARRAY_SIZE];
> >      struct dp_netdev *dp = pmd->dp;
> > -    int miss_cnt = 0, lost_cnt = 0;
> > +    int upcall_ok_cnt = 0, upcall_fail_cnt = 0;
> >      int lookup_cnt = 0, add_lookup_cnt;
> >      bool any_miss;
> >      size_t i;
> > @@ -5109,9 +5034,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> >                  continue;
> >              }
> >
> > -            miss_cnt++;
> > -            handle_packet_upcall(pmd, packet, &keys[i], &actions,
> > -                                 &put_actions, &lost_cnt, now);
> > +            int error = handle_packet_upcall(pmd, packet, &keys[i],
> > +                           &actions, &put_actions, now);
> > +
> > +            if (OVS_UNLIKELY(error)) {
> > +                upcall_fail_cnt++;
> > +            } else {
> > +                upcall_ok_cnt++;
> > +            }
> >          }
> >
> >          ofpbuf_uninit(&actions);
> > @@ -5121,8 +5051,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> >          DP_PACKET_BATCH_FOR_EACH (packet, packets_) {
> >              if (OVS_UNLIKELY(!rules[i])) {
> >                  dp_packet_delete(packet);
> > -                lost_cnt++;
> > -                miss_cnt++;
> > +                upcall_fail_cnt++;
> >              }
> >          }
> >      }
> > @@ -5140,10 +5069,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> >          dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
> >      }
> >
> > -    dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt);
> > -    dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
> > -    dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
> > -    dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
> > +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT,
> > +                            cnt - upcall_ok_cnt - upcall_fail_cnt);
> > +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_LOOKUP,
> > +                            lookup_cnt);
> > +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MISS,
> > +                            upcall_ok_cnt);
> > +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST,
> > +                            upcall_fail_cnt);
> >  }
> >
> >  /* Packets enter the datapath from a port (or from recirculation) here.
> > diff --git a/tests/pmd.at b/tests/pmd.at
> > index e39a23a..0356f87 100644
> > --- a/tests/pmd.at
> > +++ b/tests/pmd.at
> > @@ -170,13 +170,16 @@ dummy@ovs-dummy: hit:0 missed:0
> >  		p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=<cleared>, requested_rx_queues=4, requested_tx_queues=<cleared>)
> >  ])
> >
> > -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> > +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
> >  pmd thread numa_id <cleared> core_id <cleared>:
> > +	packets received:0
> > +	packet recirculations:0
> > +	avg. datapath passes per packet:0.00
> >  	emc hits:0
> >  	megaflow hits:0
> > -	avg. subtable lookups per hit:0.00
> > -	miss:0
> > -	lost:0
> > +	avg. subtable lookups per megaflow hit:0.00
> > +	miss(i.e. lookup miss with success upcall):0
> > +	lost(i.e. lookup miss with failed upcall):0
> >  ])
> >
> >  ovs-appctl time/stop
> > @@ -197,13 +200,16 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl
> > recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: <del>
> >  ])
> >
> > -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
> > +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
> >  pmd thread numa_id <cleared> core_id <cleared>:
> > +	packets received:20
> > +	packet recirculations:0
> > +	avg. datapath passes per packet:1.00
> >  	emc hits:19
> >  	megaflow hits:0
> > -	avg. subtable lookups per hit:0.00
> > -	miss:1
> > -	lost:0
> > +	avg. subtable lookups per megaflow hit:0.00
> > +	miss(i.e. lookup miss with success upcall):1
> > +	lost(i.e. lookup miss with failed upcall):0
> >  ])
> >
> >  OVS_VSWITCHD_STOP
> > --
> > 1.9.1
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Aaron Conole Dec. 13, 2017, 4:07 p.m. UTC | #3
Jan Scheurich <jan.scheurich@ericsson.com> writes:

> Hi Billy,
>
> My patches frequently get corrupted by our email system. When I submit
> v4 of the series, adapted to the reverted dp_netdev_pmd_thread struct
> changes, I will try another method of sending them.
>
> For now, could you please try the attached patch files? For me they
> apply cleanly on today's master. Haven't tested them yet.

I wanted to follow up on this, but it seems I don't have any attachments
for this email.  Did my mail client break something?

> I'll fix the white-space errors and line-length issues in the next
> version. More answers in-line.
>
> Thanks, Jan

Patch

diff --git a/lib/automake.mk b/lib/automake.mk
index effe5b5..0b8df0e 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -80,6 +80,8 @@  lib_libopenvswitch_la_SOURCES = \
 	lib/dpdk.h \
 	lib/dpif-netdev.c \
 	lib/dpif-netdev.h \
+	lib/dpif-netdev-perf.c \
+	lib/dpif-netdev-perf.h \
 	lib/dpif-provider.h \
 	lib/dpif.c \
 	lib/dpif.h \
diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
new file mode 100644
index 0000000..6a8f7c4
--- /dev/null
+++ b/lib/dpif-netdev-perf.c
@@ -0,0 +1,67 @@ 
+/*
+ * Copyright (c) 2017 Ericsson AB.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+
+#ifdef DPDK_NETDEV
+#include <rte_cycles.h>
+#endif
+
+#include "openvswitch/dynamic-string.h"
+#include "openvswitch/vlog.h"
+#include "dpif-netdev-perf.h"
+#include "timeval.h"
+
+VLOG_DEFINE_THIS_MODULE(pmd_perf);
+
+void
+pmd_perf_stats_init(struct pmd_perf_stats *s) {
+    memset(s, 0, sizeof(*s));
+    s->start_ms = time_msec();
+}
+
+void
+pmd_perf_read_counters(struct pmd_perf_stats *s,
+                       uint64_t stats[PMD_N_STATS])
+{
+    uint64_t val;
+
+    /* These loops subtracts reference values ('*_zero') from the counters.
+     * Since loads and stores are relaxed, it might be possible for a '*_zero'
+     * value to be more recent than the current value we're reading from the
+     * counter.  This is not a big problem, since these numbers are not
+     * supposed to be too accurate, but we should at least make sure that
+     * the result is not negative. */
+    for (int i = 0; i < PMD_N_STATS; i++) {
+        atomic_read_relaxed(&s->counters.n[i], &val);
+        if (val > s->counters.zero[i]) {
+            stats[i] = val - s->counters.zero[i];
+        } else {
+            stats[i] = 0;
+        }
+    }
+}
+
+void
+pmd_perf_stats_clear(struct pmd_perf_stats *s)
+{
+    s->start_ms = 0; /* Marks start of clearing. */
+    for (int i = 0; i < PMD_N_STATS; i++) {
+        atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]);
+    }
+    s->start_ms = time_msec(); /* Clearing finished. */
+}
+
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
new file mode 100644
index 0000000..f55f7a2
--- /dev/null
+++ b/lib/dpif-netdev-perf.h
@@ -0,0 +1,123 @@ 
+/*
+ * Copyright (c) 2017 Ericsson AB.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef PMD_PERF_H
+#define PMD_PERF_H 1
+
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+#include <math.h>
+
+#include "openvswitch/vlog.h"
+#include "ovs-atomic.h"
+#include "timeval.h"
+#include "unixctl.h"
+#include "util.h"
+
+#ifdef  __cplusplus
+extern "C" {
+#endif
+
+enum pmd_stat_type {
+    PMD_STAT_EXACT_HIT,     /* Packets that had an exact match (emc). */
+    PMD_STAT_MASKED_HIT,    /* Packets that matched in the flow table. */
+    PMD_STAT_MISS,          /* Packets that did not match and upcall was ok. */
+    PMD_STAT_LOST,          /* Packets that did not match and upcall failed. */
+                           /* The above statistics account for the total
+                            * number of packet passes through the datapath
+                            * pipeline and should not be overlapping with each
+                            * other. */
+    PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table
+                              hits. Each MASKED_HIT hit will have >= 1
+                              MASKED_LOOKUP(s). */
+    PMD_STAT_RECV,          /* Packets entering the datapath pipeline from an
+                             * interface. */
+    PMD_STAT_RECIRC,        /* Packets reentering the datapath pipeline due to
+                             * recirculation. */
+    PMD_CYCLES_POLL_IDLE,   /* Cycles spent in unsuccessful polling. */
+    PMD_CYCLES_POLL_BUSY,   /* Cycles spent successfully polling and
+                             * processing polled packets. */
+    PMD_CYCLES_OVERHEAD,    /* Cycles spent for other tasks. */
+    PMD_CYCLES_UPCALL,      /* Cycles spent in upcalls. Included in
+                               PMD_CYCLES_POLL_BUSY. */
+    PMD_CYCLES_ITER_IDLE,   /* Cycles spent in idle iterations. */
+    PMD_CYCLES_ITER_BUSY,   /* Cycles spent in busy iterations. */
+    PMD_N_STATS
+};
+
+struct pmd_counters {
+    atomic_uint64_t n[PMD_N_STATS];     /* Indexed by PMD_STAT_*. */
+    uint64_t zero[PMD_N_STATS];
+};
+
+struct pmd_perf_stats {
+    uint64_t start_ms;
+    uint64_t last_tsc;
+    struct pmd_counters counters;
+};
+
+enum pmd_info_type;
+struct dp_netdev_pmd_thread;
+
+struct pmd_perf_params {
+    int command_type;
+    bool histograms;
+    size_t iter_hist_len;
+    size_t ms_hist_len;
+};
+
+void pmd_perf_stats_init(struct pmd_perf_stats *s);
+void pmd_perf_stats_clear(struct pmd_perf_stats *s);
+void pmd_perf_read_counters(struct pmd_perf_stats *s,
+                            uint64_t stats[PMD_N_STATS]);
+
+static inline void
+pmd_perf_update_counter(struct pmd_perf_stats *s,
+                        enum pmd_stat_type counter, int delta)
+{
+    uint64_t tmp;
+    atomic_read_relaxed(&s->counters.n[counter], &tmp);
+    tmp += delta;
+    atomic_store_relaxed(&s->counters.n[counter], tmp);
+}
+
+static inline void
+pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc)
+{
+    if (OVS_UNLIKELY(s->start_ms == 0)) {
+        /* Stats not yet fully initialised. */
+        return;
+    }
+    s->last_tsc = now_tsc;
+}
+
+static inline void
+pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc,
+                       int packets)
+{
+    uint64_t cycles = now_tsc - s->last_tsc;
+
+    /* No need for atomic updates as only invoked within PMD thread. */
+    if (packets > 0) {
+        s->counters.n[PMD_CYCLES_ITER_BUSY] += cycles;
+    } else {
+        s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles;
+    }
+}
+
+#endif /* pmd-perf.h */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 0a62630..bf99f81 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -43,6 +43,7 @@ 
 #include "csum.h"
 #include "dp-packet.h"
 #include "dpif.h"
+#include "dpif-netdev-perf.h"
 #include "dpif-provider.h"
 #include "dummy.h"
 #include "fat-rwlock.h"
@@ -330,23 +331,6 @@  static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
                                                     odp_port_t)
     OVS_REQUIRES(dp->port_mutex);
 
-enum dp_stat_type {
-    DP_STAT_EXACT_HIT,          /* Packets that had an exact match (emc). */
-    DP_STAT_MASKED_HIT,         /* Packets that matched in the flow table. */
-    DP_STAT_MISS,               /* Packets that did not match. */
-    DP_STAT_LOST,               /* Packets not passed up to the client. */
-    DP_STAT_LOOKUP_HIT,         /* Number of subtable lookups for flow table
-                                   hits */
-    DP_N_STATS
-};
-
-enum pmd_cycles_counter_type {
-    PMD_CYCLES_IDLE,            /* Cycles spent idle or unsuccessful polling */
-    PMD_CYCLES_PROCESSING,      /* Cycles spent successfully polling and
-                                 * processing polled packets */
-    PMD_N_CYCLES
-};
-
 enum rxq_cycles_counter_type {
     RXQ_CYCLES_PROC_CURR,       /* Cycles spent successfully polling and
                                    processing packets during the current
@@ -496,18 +480,6 @@  struct dp_netdev_actions *dp_netdev_flow_get_actions(
     const struct dp_netdev_flow *);
 static void dp_netdev_actions_free(struct dp_netdev_actions *);
 
-/* Contained by struct dp_netdev_pmd_thread's 'stats' member.  */
-struct dp_netdev_pmd_stats {
-    /* Indexed by DP_STAT_*. */
-    atomic_ullong n[DP_N_STATS];
-};
-
-/* Contained by struct dp_netdev_pmd_thread's 'cycle' member.  */
-struct dp_netdev_pmd_cycles {
-    /* Indexed by PMD_CYCLES_*. */
-    atomic_ullong n[PMD_N_CYCLES];
-};
-
 struct polled_queue {
     struct dp_netdev_rxq *rxq;
     odp_port_t port_no;
@@ -566,20 +538,22 @@  struct dp_netdev_pmd_thread {
      * need to be protected by 'non_pmd_mutex'.  Every other instance
      * will only be accessed by its own pmd thread. */
     OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache;
-    struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */
-
-    /* Queue id used by this pmd thread to send packets on all netdevs if
-     * XPS disabled for this netdev. All static_tx_qid's are unique and less
-     * than 'cmap_count(dp->poll_threads)'. */
-    uint32_t static_tx_qid;
-
-    /* Flow-Table and classifiers
-     *
-     * Writers of 'flow_table' must take the 'flow_mutex'.  Corresponding
-     * changes to 'classifiers' must be made while still holding the
-     * 'flow_mutex'.
-     */
-    struct ovs_mutex flow_mutex;
+    PADDED_MEMBERS(CACHE_LINE_SIZE,
+        struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */
+    
+        /* Queue id used by this pmd thread to send packets on all netdevs if
+         * XPS disabled for this netdev. All static_tx_qid's are unique and less
+         * than 'cmap_count(dp->poll_threads)'. */
+        uint32_t static_tx_qid;
+    
+        /* Flow-Table and classifiers
+         *
+         * Writers of 'flow_table' must take the 'flow_mutex'.  Corresponding
+         * changes to 'classifiers' must be made while still holding the
+         * 'flow_mutex'.
+         */
+        struct ovs_mutex flow_mutex;
+    );
     PADDED_MEMBERS(CACHE_LINE_SIZE,
         struct cmap flow_table OVS_GUARDED; /* Flow table. */
 
@@ -591,20 +565,10 @@  struct dp_netdev_pmd_thread {
            are stored for each polled rxq. */
         long long int rxq_next_cycle_store;
 
-        /* Cycles counters */
-        struct dp_netdev_pmd_cycles cycles;
-
         /* Used to count cycles. See 'cycles_counter_end()'. */
         unsigned long long last_cycles;
         struct latch exit_latch;        /* For terminating the pmd thread. */
-    );
-
-    PADDED_MEMBERS(CACHE_LINE_SIZE,
-        /* Statistics. */
-        struct dp_netdev_pmd_stats stats;
 
-        struct seq *reload_seq;
-        uint64_t last_reload_seq;
         atomic_bool reload;             /* Do we need to reload ports? */
         bool isolated;
 
@@ -612,11 +576,11 @@  struct dp_netdev_pmd_thread {
         bool need_reload;
         /* 5 pad bytes. */
     );
-
     PADDED_MEMBERS(CACHE_LINE_SIZE,
+        struct seq *reload_seq;
+        uint64_t last_reload_seq;
         struct ovs_mutex port_mutex;    /* Mutex for 'poll_list'
                                            and 'tx_ports'. */
-        /* 16 pad bytes. */
     );
     PADDED_MEMBERS(CACHE_LINE_SIZE,
         /* List of rx queues to poll. */
@@ -642,15 +606,8 @@  struct dp_netdev_pmd_thread {
         struct hmap send_port_cache;
     );
 
-    PADDED_MEMBERS(CACHE_LINE_SIZE,
-        /* Only a pmd thread can write on its own 'cycles' and 'stats'.
-         * The main thread keeps 'stats_zero' and 'cycles_zero' as base
-         * values and subtracts them from 'stats' and 'cycles' before
-         * reporting to the user */
-        unsigned long long stats_zero[DP_N_STATS];
-        uint64_t cycles_zero[PMD_N_CYCLES];
-        /* 8 pad bytes. */
-    );
+    /* Keep track of detailed PMD performance statistics. */
+    OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct pmd_perf_stats perf_stats;            
 };
 
 /* Interface to netdev-based datapath. */
@@ -813,46 +770,11 @@  enum pmd_info_type {
 };
 
 static void
-pmd_info_show_stats(struct ds *reply,
-                    struct dp_netdev_pmd_thread *pmd,
-                    unsigned long long stats[DP_N_STATS],
-                    uint64_t cycles[PMD_N_CYCLES])
+format_pmd_thread(struct ds *reply,
+                  struct dp_netdev_pmd_thread *pmd)
 {
-    unsigned long long total_packets;
-    uint64_t total_cycles = 0;
-    int i;
-
-    /* These loops subtracts reference values ('*_zero') from the counters.
-     * Since loads and stores are relaxed, it might be possible for a '*_zero'
-     * value to be more recent than the current value we're reading from the
-     * counter.  This is not a big problem, since these numbers are not
-     * supposed to be too accurate, but we should at least make sure that
-     * the result is not negative. */
-    for (i = 0; i < DP_N_STATS; i++) {
-        if (stats[i] > pmd->stats_zero[i]) {
-            stats[i] -= pmd->stats_zero[i];
-        } else {
-            stats[i] = 0;
-        }
-    }
-
-    /* Sum of all the matched and not matched packets gives the total.  */
-    total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT]
-                    + stats[DP_STAT_MISS];
-
-    for (i = 0; i < PMD_N_CYCLES; i++) {
-        if (cycles[i] > pmd->cycles_zero[i]) {
-           cycles[i] -= pmd->cycles_zero[i];
-        } else {
-            cycles[i] = 0;
-        }
-
-        total_cycles += cycles[i];
-    }
-
     ds_put_cstr(reply, (pmd->core_id == NON_PMD_CORE_ID)
                         ? "main thread" : "pmd thread");
-
     if (pmd->numa_id != OVS_NUMA_UNSPEC) {
         ds_put_format(reply, " numa_id %d", pmd->numa_id);
     }
@@ -860,16 +782,39 @@  pmd_info_show_stats(struct ds *reply,
         ds_put_format(reply, " core_id %u", pmd->core_id);
     }
     ds_put_cstr(reply, ":\n");
+}
+
+static void
+pmd_info_show_stats(struct ds *reply,
+                    struct dp_netdev_pmd_thread *pmd)
+{
+    uint64_t stats[PMD_N_STATS];
+    uint64_t total_cycles = 0;
+
+    pmd_perf_read_counters(&pmd->perf_stats, stats);
+    total_cycles = stats[PMD_CYCLES_ITER_IDLE]
+                         + stats[PMD_CYCLES_ITER_BUSY];
+
+    format_pmd_thread(reply, pmd);
 
     ds_put_format(reply,
-                  "\temc hits:%llu\n\tmegaflow hits:%llu\n"
-                  "\tavg. subtable lookups per hit:%.2f\n"
-                  "\tmiss:%llu\n\tlost:%llu\n",
-                  stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
-                  stats[DP_STAT_MASKED_HIT] > 0
-                  ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT]
-                  : 0,
-                  stats[DP_STAT_MISS], stats[DP_STAT_LOST]);
+                  "\tpackets received:%"PRIu64"\n"
+                  "\tpacket recirculations:%"PRIu64"\n"
+                  "\tavg. datapath passes per packet:%.2f\n"
+                  "\temc hits:%"PRIu64"\n"
+                  "\tmegaflow hits:%"PRIu64"\n"
+                  "\tavg. subtable lookups per megaflow hit:%.2f\n"
+                  "\tmiss(i.e. lookup miss with success upcall):%"PRIu64"\n"
+                  "\tlost(i.e. lookup miss with failed upcall):%"PRIu64"\n",
+                  stats[PMD_STAT_RECV], stats[PMD_STAT_RECIRC],
+                  stats[PMD_STAT_RECV] > 0
+                  ? (1.0 * (stats[PMD_STAT_RECV] + stats[PMD_STAT_RECIRC]))
+                     / stats[PMD_STAT_RECV] : 0,
+                  stats[PMD_STAT_EXACT_HIT], stats[PMD_STAT_MASKED_HIT],
+                  stats[PMD_STAT_MASKED_HIT] > 0
+                  ? (1.0 * stats[PMD_STAT_MASKED_LOOKUP])
+                     / stats[PMD_STAT_MASKED_HIT]
+                  : 0, stats[PMD_STAT_MISS], stats[PMD_STAT_LOST]);
 
     if (total_cycles == 0) {
         return;
@@ -878,46 +823,26 @@  pmd_info_show_stats(struct ds *reply,
     ds_put_format(reply,
                   "\tidle cycles:%"PRIu64" (%.02f%%)\n"
                   "\tprocessing cycles:%"PRIu64" (%.02f%%)\n",
-                  cycles[PMD_CYCLES_IDLE],
-                  cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100,
-                  cycles[PMD_CYCLES_PROCESSING],
-                  cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100);
+                  stats[PMD_CYCLES_ITER_IDLE],
+                  stats[PMD_CYCLES_ITER_IDLE] / (double)total_cycles * 100,
+                  stats[PMD_CYCLES_ITER_BUSY],
+                  stats[PMD_CYCLES_ITER_BUSY] / (double)total_cycles * 100);
 
+    uint64_t total_packets = stats[PMD_STAT_RECV];
     if (total_packets == 0) {
         return;
     }
 
     ds_put_format(reply,
-                  "\tavg cycles per packet: %.02f (%"PRIu64"/%llu)\n",
+                  "\tavg cycles per packet: %.02f (%"PRIu64"/%"PRIu64")\n",
                   total_cycles / (double)total_packets,
                   total_cycles, total_packets);
 
     ds_put_format(reply,
                   "\tavg processing cycles per packet: "
-                  "%.02f (%"PRIu64"/%llu)\n",
-                  cycles[PMD_CYCLES_PROCESSING] / (double)total_packets,
-                  cycles[PMD_CYCLES_PROCESSING], total_packets);
-}
-
-static void
-pmd_info_clear_stats(struct ds *reply OVS_UNUSED,
-                    struct dp_netdev_pmd_thread *pmd,
-                    unsigned long long stats[DP_N_STATS],
-                    uint64_t cycles[PMD_N_CYCLES])
-{
-    int i;
-
-    /* We cannot write 'stats' and 'cycles' (because they're written by other
-     * threads) and we shouldn't change 'stats' (because they're used to count
-     * datapath stats, which must not be cleared here).  Instead, we save the
-     * current values and subtract them from the values to be displayed in the
-     * future */
-    for (i = 0; i < DP_N_STATS; i++) {
-        pmd->stats_zero[i] = stats[i];
-    }
-    for (i = 0; i < PMD_N_CYCLES; i++) {
-        pmd->cycles_zero[i] = cycles[i];
-    }
+                  "%.02f (%"PRIu64"/%"PRIu64")\n",
+                  stats[PMD_CYCLES_POLL_BUSY] / (double)total_packets,
+                  stats[PMD_CYCLES_POLL_BUSY], total_packets);
 }
 
 static int
@@ -1077,23 +1002,34 @@  dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
     struct ds reply = DS_EMPTY_INITIALIZER;
     struct dp_netdev_pmd_thread **pmd_list;
     struct dp_netdev *dp = NULL;
-    size_t n;
     enum pmd_info_type type = *(enum pmd_info_type *) aux;
+    int core_id = -1;
+    size_t n;
 
     ovs_mutex_lock(&dp_netdev_mutex);
 
-    if (argc == 2) {
-        dp = shash_find_data(&dp_netdevs, argv[1]);
-    } else if (shash_count(&dp_netdevs) == 1) {
-        /* There's only one datapath */
-        dp = shash_first(&dp_netdevs)->data;
+    while (argc > 1) {
+        if (!strcmp(argv[1], "-pmd")) {
+            core_id = strtol(argv[2], NULL, 10);
+            argc -= 2;
+            argv += 2;
+        } else {
+            dp = shash_find_data(&dp_netdevs, argv[1]);
+            argc -= 1;
+            argv += 1;
+        }
     }
 
     if (!dp) {
-        ovs_mutex_unlock(&dp_netdev_mutex);
-        unixctl_command_reply_error(conn,
-                                    "please specify an existing datapath");
-        return;
+        if (shash_count(&dp_netdevs) == 1) {
+            /* There's only one datapath */
+            dp = shash_first(&dp_netdevs)->data;
+        } else {
+            ovs_mutex_unlock(&dp_netdev_mutex);
+            unixctl_command_reply_error(conn,
+                                        "please specify an existing datapath");
+            return;
+        }
     }
 
     sorted_poll_thread_list(dp, &pmd_list, &n);
@@ -1102,26 +1038,15 @@  dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
         if (!pmd) {
             break;
         }
-
+        if (core_id != -1 && pmd->core_id != core_id) {
+            continue;
+        }
         if (type == PMD_INFO_SHOW_RXQ) {
             pmd_info_show_rxq(&reply, pmd);
-        } else {
-            unsigned long long stats[DP_N_STATS];
-            uint64_t cycles[PMD_N_CYCLES];
-
-            /* Read current stats and cycle counters */
-            for (size_t j = 0; j < ARRAY_SIZE(stats); j++) {
-                atomic_read_relaxed(&pmd->stats.n[j], &stats[j]);
-            }
-            for (size_t j = 0; j < ARRAY_SIZE(cycles); j++) {
-                atomic_read_relaxed(&pmd->cycles.n[j], &cycles[j]);
-            }
-
-            if (type == PMD_INFO_CLEAR_STATS) {
-                pmd_info_clear_stats(&reply, pmd, stats, cycles);
-            } else if (type == PMD_INFO_SHOW_STATS) {
-                pmd_info_show_stats(&reply, pmd, stats, cycles);
-            }
+        } else if (type == PMD_INFO_CLEAR_STATS) {
+            pmd_perf_stats_clear(&pmd->perf_stats);
+        } else if (type == PMD_INFO_SHOW_STATS) {
+            pmd_info_show_stats(&reply, pmd);
         }
     }
     free(pmd_list);
@@ -1139,14 +1064,14 @@  dpif_netdev_init(void)
                               clear_aux = PMD_INFO_CLEAR_STATS,
                               poll_aux = PMD_INFO_SHOW_RXQ;
 
-    unixctl_command_register("dpif-netdev/pmd-stats-show", "[dp]",
-                             0, 1, dpif_netdev_pmd_info,
+    unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core] [dp]",
+                             0, 2, dpif_netdev_pmd_info,
                              (void *)&show_aux);
-    unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]",
-                             0, 1, dpif_netdev_pmd_info,
+    unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core] [dp]",
+                             0, 2, dpif_netdev_pmd_info,
                              (void *)&clear_aux);
-    unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]",
-                             0, 1, dpif_netdev_pmd_info,
+    unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] [dp]",
+                             0, 2, dpif_netdev_pmd_info,
                              (void *)&poll_aux);
     unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
                              0, 1, dpif_netdev_pmd_rebalance,
@@ -1482,23 +1407,19 @@  dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
 {
     struct dp_netdev *dp = get_dp_netdev(dpif);
     struct dp_netdev_pmd_thread *pmd;
+    uint64_t pmd_stats[PMD_N_STATS];
 
-    stats->n_flows = stats->n_hit = stats->n_missed = stats->n_lost = 0;
+    stats->n_flows = stats->n_hit =
+            stats->n_mask_hit = stats->n_missed = stats->n_lost = 0;
     CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
-        unsigned long long n;
         stats->n_flows += cmap_count(&pmd->flow_table);
-
-        atomic_read_relaxed(&pmd->stats.n[DP_STAT_MASKED_HIT], &n);
-        stats->n_hit += n;
-        atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n);
-        stats->n_hit += n;
-        atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n);
-        stats->n_missed += n;
-        atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n);
-        stats->n_lost += n;
+        pmd_perf_read_counters(&pmd->perf_stats, pmd_stats);
+        stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT];
+        stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT];
+        stats->n_missed += pmd_stats[PMD_STAT_MISS];
+        stats->n_lost += pmd_stats[PMD_STAT_LOST];
     }
     stats->n_masks = UINT32_MAX;
-    stats->n_mask_hit = UINT64_MAX;
 
     return 0;
 }
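
pmd_perf_read_counters() lives in the new dpif-netdev-perf module earlier in this patch. Assuming the counters stay relaxed atomics and that pmd-stats-clear records a per-counter zero offset rather than writing the counters (so the PMD thread never races with the clearer), its core loop could look like this; the field names counters.n[] and counters.zero[] are illustrative:

    void
    pmd_perf_read_counters(struct pmd_perf_stats *s,
                           uint64_t stats[PMD_N_STATS])
    {
        uint64_t val;

        for (int i = 0; i < PMD_N_STATS; i++) {
            /* Relaxed reads suffice; no snapshot consistency needed. */
            atomic_read_relaxed(&s->counters.n[i], &val);
            stats[i] = val > s->counters.zero[i]
                       ? val - s->counters.zero[i] : 0;
        }
    }

Note also that n_mask_hit is now populated from PMD_STAT_MASKED_HIT instead of being reported as UINT64_MAX, so datapath-level consumers finally see a real megaflow hit count.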
@@ -3174,28 +3095,28 @@  cycles_count_start(struct dp_netdev_pmd_thread *pmd)
 /* Stop counting cycles and add them to the counter 'type'. */
 static inline void
 cycles_count_end(struct dp_netdev_pmd_thread *pmd,
-                 enum pmd_cycles_counter_type type)
+                 enum pmd_stat_type type)
     OVS_RELEASES(&cycles_counter_fake_mutex)
     OVS_NO_THREAD_SAFETY_ANALYSIS
 {
     unsigned long long interval = cycles_counter() - pmd->last_cycles;
 
-    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
+    pmd_perf_update_counter(&pmd->perf_stats, type, interval);
 }
 
 /* Calculate the intermediate cycle result and add it to the counter 'type'. */
 static inline void
 cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
                           struct dp_netdev_rxq *rxq,
-                          enum pmd_cycles_counter_type type)
+                          enum pmd_stat_type type)
     OVS_NO_THREAD_SAFETY_ANALYSIS
 {
     unsigned long long new_cycles = cycles_counter();
     unsigned long long interval = new_cycles - pmd->last_cycles;
     pmd->last_cycles = new_cycles;
 
-    non_atomic_ullong_add(&pmd->cycles.n[type], interval);
-    if (rxq && (type == PMD_CYCLES_PROCESSING)) {
+    pmd_perf_update_counter(&pmd->perf_stats, type, interval);
+    if (rxq && (type == PMD_CYCLES_POLL_BUSY)) {
         /* Add to the amount of current processing cycles. */
         non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval);
     }
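
pmd_perf_update_counter() replaces the open-coded non_atomic_ullong_add() on pmd->cycles.n[] and pmd->stats.n[]. Because each PMD thread is the only writer of its own counters, the non-atomic read-modify-write remains safe; a one-line sketch under the same layout assumption as above:

    static inline void
    pmd_perf_update_counter(struct pmd_perf_stats *s,
                            enum pmd_stat_type counter, int delta)
    {
        /* Owner-thread only: no atomic RMW required, just an
         * atomic-typed variable so readers see whole values. */
        non_atomic_ullong_add(&s->counters.n[counter], delta);
    }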
@@ -3885,12 +3806,12 @@  dpif_netdev_run(struct dpif *dpif)
                                                    port->port_no);
                     cycles_count_intermediate(non_pmd, NULL,
                                               process_packets
-                                              ? PMD_CYCLES_PROCESSING
-                                              : PMD_CYCLES_IDLE);
+                                              ? PMD_CYCLES_POLL_BUSY
+                                              : PMD_CYCLES_POLL_IDLE);
                 }
             }
         }
-        cycles_count_end(non_pmd, PMD_CYCLES_IDLE);
+        cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE);
         dpif_netdev_xps_revalidate_pmd(non_pmd, time_msec(), false);
         ovs_mutex_unlock(&dp->non_pmd_mutex);
 
@@ -4034,6 +3955,7 @@  static void *
 pmd_thread_main(void *f_)
 {
     struct dp_netdev_pmd_thread *pmd = f_;
+    struct pmd_perf_stats *s = &pmd->perf_stats;
     unsigned int lc = 0;
     struct polled_queue *poll_list;
     bool exiting;
@@ -4069,13 +3991,17 @@  reload:
 
     cycles_count_start(pmd);
     for (;;) {
+        uint64_t iter_packets = 0;
+        pmd_perf_start_iteration(s, pmd->last_cycles);
         for (i = 0; i < poll_cnt; i++) {
             process_packets =
                 dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
                                            poll_list[i].port_no);
             cycles_count_intermediate(pmd, poll_list[i].rxq,
-                                      process_packets ? PMD_CYCLES_PROCESSING
-                                                      : PMD_CYCLES_IDLE);
+                                      process_packets
+                                      ? PMD_CYCLES_POLL_BUSY
+                                      : PMD_CYCLES_POLL_IDLE);
+            iter_packets += process_packets;
         }
 
         if (lc++ > 1024) {
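
pmd_perf_start_iteration() and pmd_perf_end_iteration() bracket one pass over the poll list, timestamped with pmd->last_cycles as maintained by the cycles_count_* helpers. In this patch they need to be no more than cheap stubs that the detailed per-iteration statistics in patch 2/3 can later build on; a sketch, with the start_tsc member name assumed:

    static inline void
    pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc)
    {
        s->start_tsc = now_tsc;
    }

    static inline void
    pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc,
                           int rx_packets)
    {
        /* Iteration length in TSC cycles; unused until the detailed
         * stats patch, but cheap to compute every iteration. */
        uint64_t cycles = now_tsc - s->start_tsc;

        (void) cycles;
        (void) rx_packets;
    }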
@@ -4093,10 +4019,12 @@  reload:
             if (reload) {
                 break;
             }
+            cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD);
         }
+        pmd_perf_end_iteration(s, pmd->last_cycles, iter_packets);
     }
 
-    cycles_count_end(pmd, PMD_CYCLES_IDLE);
+    cycles_count_end(pmd, PMD_CYCLES_OVERHEAD);
 
     poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
     exiting = latch_is_set(&pmd->exit_latch);
@@ -4549,6 +4477,7 @@  dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
     }
     cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
                 hash_int(core_id, 0));
+    pmd_perf_stats_init(&pmd->perf_stats);
 }
 
 static void
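
pmd_perf_stats_init() is called once per PMD at configuration time; zero-initializing the whole structure is all it needs to do. A minimal sketch:

    void
    pmd_perf_stats_init(struct pmd_perf_stats *s)
    {
        memset(s, 0, sizeof *s);
    }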
@@ -4746,13 +4675,6 @@  dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, int cnt, int size,
     atomic_store_relaxed(&netdev_flow->stats.tcp_flags, flags);
 }
 
-static void
-dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd,
-                       enum dp_stat_type type, int cnt)
-{
-    non_atomic_ullong_add(&pmd->stats.n[type], cnt);
-}
-
 static int
 dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_,
                  struct flow *flow, struct flow_wildcards *wc, ovs_u128 *ufid,
@@ -4926,6 +4848,9 @@  emc_processing(struct dp_netdev_pmd_thread *pmd,
     int i;
 
     atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min);
+    pmd_perf_update_counter(&pmd->perf_stats,
+                            md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
+                            cnt);
 
     DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
         struct dp_netdev_flow *flow;
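
Counting PMD_STAT_RECV and PMD_STAT_RECIRC at the entry of emc_processing() is what makes the new "avg. datapath passes per packet" figure computable: every batch passes through here exactly once per datapath pass, and md_is_valid distinguishes recirculated packets from freshly received ones. The metric then falls out as

    passes = (RECV + RECIRC) / RECV

so, for instance, 20 received packets of which 5 are recirculated once yield (20 + 5) / 20 = 1.25 passes.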
@@ -4974,24 +4899,24 @@  emc_processing(struct dp_netdev_pmd_thread *pmd,
         }
     }
 
-    dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT,
-                           cnt - n_dropped - n_missed);
+    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT,
+                            cnt - n_dropped - n_missed);
 
     return dp_packet_batch_size(packets_);
 }
 
-static inline void
+static inline int
 handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
                      struct dp_packet *packet,
                      const struct netdev_flow_key *key,
                      struct ofpbuf *actions, struct ofpbuf *put_actions,
-                     int *lost_cnt, long long now)
+                     long long now)
 {
     struct ofpbuf *add_actions;
     struct dp_packet_batch b;
     struct match match;
     ovs_u128 ufid;
-    int error;
+    int error = 0;
 
     match.tun_md.valid = false;
     miniflow_expand(&key->mf, &match.flow);
@@ -5005,8 +4930,7 @@  handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
                              put_actions);
     if (OVS_UNLIKELY(error && error != ENOSPC)) {
         dp_packet_delete(packet);
-        (*lost_cnt)++;
-        return;
+        return error;
     }
 
     /* The Netlink encoding of datapath flow keys cannot express
@@ -5046,6 +4970,7 @@  handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
         ovs_mutex_unlock(&pmd->flow_mutex);
         emc_probabilistic_insert(pmd, key, netdev_flow);
     }
+    return error;
 }
 
 static inline void
@@ -5067,7 +4992,7 @@  fast_path_processing(struct dp_netdev_pmd_thread *pmd,
     struct dpcls *cls;
     struct dpcls_rule *rules[PKT_ARRAY_SIZE];
     struct dp_netdev *dp = pmd->dp;
-    int miss_cnt = 0, lost_cnt = 0;
+    int upcall_ok_cnt = 0, upcall_fail_cnt = 0;
     int lookup_cnt = 0, add_lookup_cnt;
     bool any_miss;
     size_t i;
@@ -5109,9 +5034,14 @@  fast_path_processing(struct dp_netdev_pmd_thread *pmd,
                 continue;
             }
 
-            miss_cnt++;
-            handle_packet_upcall(pmd, packet, &keys[i], &actions,
-                                 &put_actions, &lost_cnt, now);
+            int error = handle_packet_upcall(pmd, packet, &keys[i],
+                                             &actions, &put_actions, now);
+
+            if (OVS_UNLIKELY(error)) {
+                upcall_fail_cnt++;
+            } else {
+                upcall_ok_cnt++;
+            }
         }
 
         ofpbuf_uninit(&actions);
@@ -5121,8 +5051,7 @@  fast_path_processing(struct dp_netdev_pmd_thread *pmd,
         DP_PACKET_BATCH_FOR_EACH (packet, packets_) {
             if (OVS_UNLIKELY(!rules[i])) {
                 dp_packet_delete(packet);
-                lost_cnt++;
-                miss_cnt++;
+                upcall_fail_cnt++;
             }
         }
     }
@@ -5140,10 +5069,14 @@  fast_path_processing(struct dp_netdev_pmd_thread *pmd,
         dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
     }
 
-    dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt);
-    dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
-    dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
-    dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
+    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT,
+                            cnt - upcall_ok_cnt - upcall_fail_cnt);
+    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_LOOKUP,
+                            lookup_cnt);
+    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MISS,
+                            upcall_ok_cnt);
+    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST,
+                            upcall_fail_cnt);
 }
 
 /* Packets enter the datapath from a port (or from recirculation) here.
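
Taken together with the emc_processing() changes, the reworked fast_path_processing() counters obey a simple per-PMD conservation rule (modulo packets dropped on parse or other errors before classification):

    RECV + RECIRC == EXACT_HIT + MASKED_HIT + MISS + LOST

MISS now counts only lookup misses whose upcall succeeded, LOST counts lookup misses whose upcall failed (the packet is deleted), and MASKED_HIT is derived as cnt - upcall_ok_cnt - upcall_fail_cnt instead of being incremented directly.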
diff --git a/tests/pmd.at b/tests/pmd.at
index e39a23a..0356f87 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -170,13 +170,16 @@  dummy@ovs-dummy: hit:0 missed:0
 		p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=<cleared>, requested_rx_queues=4, requested_tx_queues=<cleared>)
 ])
 
-AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
+AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
 pmd thread numa_id <cleared> core_id <cleared>:
+	packets received:0
+	packet recirculations:0
+	avg. datapath passes per packet:0.00
 	emc hits:0
 	megaflow hits:0
-	avg. subtable lookups per hit:0.00
-	miss:0
-	lost:0
+	avg. subtable lookups per megaflow hit:0.00
+	miss(i.e. lookup miss with success upcall):0
+	lost(i.e. lookup miss with failed upcall):0
 ])
 
 ovs-appctl time/stop
@@ -197,13 +200,16 @@  AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl
 recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: <del>
 ])
 
-AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl
+AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl
 pmd thread numa_id <cleared> core_id <cleared>:
+	packets received:20
+	packet recirculations:0
+	avg. datapath passes per packet:1.00
 	emc hits:19
 	megaflow hits:0
-	avg. subtable lookups per hit:0.00
-	miss:1
-	lost:0
+	avg. subtable lookups per megaflow hit:0.00
+	miss(i.e. lookup miss with success upcall):1
+	lost(i.e. lookup miss with failed upcall):0
 ])
 
 OVS_VSWITCHD_STOP
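
The updated expectations follow directly from the new counter definitions: 20 packets received with no recirculation gives 20/20 = 1.00 average passes, and those 20 packets split into 19 emc hits plus 1 miss whose upcall succeeded, leaving lost at 0. Widening "grep pmd -A 5" to "-A 8" simply accommodates the three extra lines of per-PMD output.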