From patchwork Thu Jan 4 12:07:44 2018
X-Patchwork-Submitter: Jan Scheurich
X-Patchwork-Id: 855576
X-Patchwork-Delegate: ian.stokes@intel.com
From: Jan Scheurich
To: dev@openvswitch.org
Date: Thu, 4 Jan 2018 13:07:44 +0100
Message-Id: <1515067666-12398-2-git-send-email-jan.scheurich@ericsson.com>
In-Reply-To: <1515067666-12398-1-git-send-email-jan.scheurich@ericsson.com>
References: <1515067666-12398-1-git-send-email-jan.scheurich@ericsson.com>
Subject: [ovs-dev] [PATCH v5 1/3] dpif-netdev: Refactor PMD performance into dpif-netdev-perf

Add module dpif-netdev-perf to host all PMD performance-related data structures and functions in dpif-netdev.
Refactor the PMD stats handling in dpif-netdev and delegate as much as possible to the new module, using clean interfaces to shield dpif-netdev from the implementation details. Accordingly, all PMD statistics members are moved from the main struct dp_netdev_pmd_thread into a dedicated member of type struct pmd_perf_stats.

Include Darrell's prior refactoring of PMD stats contained in [PATCH v5 2/3] "dpif-netdev: Refactor some pmd stats":

1. The cycles per packet counts are now based on packets received rather than packet passes through the datapath.

2. Packet counters are now kept for packets received and packets recirculated. These are kept as separate counters for maintainability reasons. The cost of incrementing these counters is negligible. These new counters are also displayed to the user.

3. A display statistic is added for the average number of datapath passes per packet. This should be useful for user debugging and understanding of packet processing.

4. The user-visible 'miss' counter now counts only successful upcalls, rather than the sum of successful and unsuccessful upcalls. Hence, this becomes what the user historically understands by an OVS 'miss upcall'. The user display is annotated to make this clear as well.

5. The user-visible 'lost' counter remains as failed upcalls, but is annotated to make its meaning clear.

6. The enum pmd_stat_type is annotated to make the usage of the stats counters clear.

7. The subtable lookup stat is renamed to make it clear that it relates to masked lookups.

8. The PMD stats test is updated to handle the new user stats of packets received, packets recirculated and average number of datapath passes per packet.

On top of that, introduce a "-pmd <core>" option to the PMD info commands to filter the output for a single PMD.
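The clear/read scheme used by pmd_perf_stats_clear() and pmd_perf_read_counters() below can be sketched in miniature. This is a simplified, single-threaded sketch with plain uint64_t instead of OVS's relaxed atomics; the names (N_STATS, perf_clear, perf_read) are illustrative, not the patch's API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_STATS 4   /* stands in for PMD_N_STATS */

/* Live counters plus a baseline snapshot, mirroring struct pmd_counters. */
struct perf_counters {
    uint64_t n[N_STATS];    /* written only by the PMD thread itself */
    uint64_t zero[N_STATS]; /* baseline taken when the user clears stats */
};

/* "Clearing" never zeroes the live counters (they also feed datapath-wide
 * stats); it just records the current values as the new baseline. */
static void
perf_clear(struct perf_counters *c)
{
    memcpy(c->zero, c->n, sizeof c->zero);
}

/* Reading reports the delta since the last clear, clamped at zero in case
 * a baseline happens to be newer than the counter value just read. */
static void
perf_read(const struct perf_counters *c, uint64_t out[N_STATS])
{
    for (int i = 0; i < N_STATS; i++) {
        out[i] = c->n[i] > c->zero[i] ? c->n[i] - c->zero[i] : 0;
    }
}
```

In the patch the loads and stores go through atomic_read_relaxed()/atomic_store_relaxed() because the main thread reads while the PMD thread writes; the clamping in perf_read() corresponds to the comment in pmd_perf_read_counters().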
Signed-off-by: Jan Scheurich Co-authored-by: Darrell Ball Signed-off-by: Darrell Ball --- lib/automake.mk | 2 + lib/dpif-netdev-perf.c | 66 +++++++++ lib/dpif-netdev-perf.h | 129 ++++++++++++++++++ lib/dpif-netdev.c | 353 ++++++++++++++++++++----------------------------- tests/pmd.at | 22 +-- 5 files changed, 357 insertions(+), 215 deletions(-) create mode 100644 lib/dpif-netdev-perf.c create mode 100644 lib/dpif-netdev-perf.h diff --git a/lib/automake.mk b/lib/automake.mk index 4b38a11..159319f 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -80,6 +80,8 @@ lib_libopenvswitch_la_SOURCES = \ lib/dpdk.h \ lib/dpif-netdev.c \ lib/dpif-netdev.h \ + lib/dpif-netdev-perf.c \ + lib/dpif-netdev-perf.h \ lib/dpif-provider.h \ lib/dpif.c \ lib/dpif.h \ diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c new file mode 100644 index 0000000..7d8b7b2 --- /dev/null +++ b/lib/dpif-netdev-perf.c @@ -0,0 +1,66 @@ +/* + * Copyright (c) 2017 Ericsson AB. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#include <config.h> + +#ifdef DPDK_NETDEV +#include <rte_cycles.h> +#endif + +#include "openvswitch/dynamic-string.h" +#include "openvswitch/vlog.h" +#include "dpif-netdev-perf.h" +#include "timeval.h" + +VLOG_DEFINE_THIS_MODULE(pmd_perf); + +void +pmd_perf_stats_init(struct pmd_perf_stats *s) { + memset(s, 0, sizeof(*s)); + s->start_ms = time_msec(); +} + +void +pmd_perf_read_counters(struct pmd_perf_stats *s, + uint64_t stats[PMD_N_STATS]) +{ + uint64_t val; + + /* This loop subtracts reference values ('.zero[*]') from the counters. + * Since loads and stores are relaxed, it might be possible for a '.zero[*]' + * value to be more recent than the current value we're reading from the + * counter. This is not a big problem, since these numbers are not + * supposed to be too accurate, but we should at least make sure that + * the result is not negative. */ + for (int i = 0; i < PMD_N_STATS; i++) { + atomic_read_relaxed(&s->counters.n[i], &val); + if (val > s->counters.zero[i]) { + stats[i] = val - s->counters.zero[i]; + } else { + stats[i] = 0; + } + } +} + +void +pmd_perf_stats_clear(struct pmd_perf_stats *s) +{ + s->start_ms = 0; /* Marks start of clearing. */ + for (int i = 0; i < PMD_N_STATS; i++) { + atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]); + } + s->start_ms = time_msec(); /* Clearing finished. */ +} diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h new file mode 100644 index 0000000..fa5cf37 --- /dev/null +++ b/lib/dpif-netdev-perf.h @@ -0,0 +1,129 @@ +/* + * Copyright (c) 2017 Ericsson AB. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef DPIF_NETDEV_PERF_H +#define DPIF_NETDEV_PERF_H 1 + +#include +#include +#include +#include +#include + +#include "openvswitch/vlog.h" +#include "ovs-atomic.h" +#include "timeval.h" +#include "unixctl.h" +#include "util.h" + +#ifdef __cplusplus
extern "C" { +#endif + +enum pmd_stat_type { + PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */ + PMD_STAT_MASKED_HIT, /* Packets that matched in the flow table. */ + PMD_STAT_MISS, /* Packets that did not match and upcall was ok. */ + PMD_STAT_LOST, /* Packets that did not match and upcall failed. */ + /* The above statistics account for the total + * number of packet passes through the datapath + * pipeline and should not be overlapping with each + * other. */ + PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table + hits. Each MASKED_HIT will have >= 1 + MASKED_LOOKUP(s). */ + PMD_STAT_RECV, /* Packets entering the datapath pipeline from an + * interface. */ + PMD_STAT_RECIRC, /* Packets reentering the datapath pipeline due to + * recirculation. */ + PMD_STAT_SENT_PKTS, /* Packets that have been sent. */ + PMD_STAT_SENT_BATCHES, /* Number of batches sent. */ + PMD_CYCLES_POLL_IDLE, /* Cycles spent in unsuccessful polling. */ + PMD_CYCLES_POLL_BUSY, /* Cycles spent successfully polling and + * processing polled packets. */ + PMD_CYCLES_OVERHEAD, /* Cycles spent for other tasks. */ + PMD_CYCLES_UPCALL, /* Cycles spent in upcalls. Included in + PMD_CYCLES_POLL_BUSY. */ + PMD_CYCLES_ITER_IDLE, /* Cycles spent in idle iterations. */ + PMD_CYCLES_ITER_BUSY, /* Cycles spent in busy iterations. */ + PMD_N_STATS +}; + +struct pmd_counters { + atomic_uint64_t n[PMD_N_STATS]; /* Indexed by PMD_STAT_*. 
*/ + uint64_t zero[PMD_N_STATS]; +}; + +struct pmd_perf_stats { + uint64_t start_ms; + uint64_t last_tsc; + struct pmd_counters counters; +}; + +enum pmd_info_type; +struct dp_netdev_pmd_thread; + +struct pmd_perf_params { + int command_type; + bool histograms; + size_t iter_hist_len; + size_t ms_hist_len; +}; + +void pmd_perf_stats_init(struct pmd_perf_stats *s); +void pmd_perf_stats_clear(struct pmd_perf_stats *s); +void pmd_perf_read_counters(struct pmd_perf_stats *s, + uint64_t stats[PMD_N_STATS]); + +static inline void +pmd_perf_update_counter(struct pmd_perf_stats *s, + enum pmd_stat_type counter, int delta) +{ + uint64_t tmp; + atomic_read_relaxed(&s->counters.n[counter], &tmp); + tmp += delta; + atomic_store_relaxed(&s->counters.n[counter], tmp); +} + +static inline void +pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc) +{ + if (OVS_UNLIKELY(s->start_ms == 0)) { + /* Stats not yet fully initialised. */ + return; + } + s->last_tsc = now_tsc; +} + +static inline void +pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, + int packets) +{ + uint64_t cycles = now_tsc - s->last_tsc; + + /* No need for atomic updates as only invoked within PMD thread. */ + if (packets > 0) { + s->counters.n[PMD_CYCLES_ITER_BUSY] += cycles; + } else { + s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles; + } +} + +#ifdef __cplusplus +} +#endif + +#endif /* DPIF_NETDEV_PERF_H */ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 349e98a..8b9698a 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -44,6 +44,7 @@ #include "csum.h" #include "dp-packet.h" #include "dpif.h" +#include "dpif-netdev-perf.h" #include "dpif-provider.h" #include "dummy.h" #include "fat-rwlock.h" @@ -331,25 +332,6 @@ static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp, odp_port_t) OVS_REQUIRES(dp->port_mutex); -enum dp_stat_type { - DP_STAT_EXACT_HIT, /* Packets that had an exact match (emc). 
*/ - DP_STAT_MASKED_HIT, /* Packets that matched in the flow table. */ - DP_STAT_MISS, /* Packets that did not match. */ - DP_STAT_LOST, /* Packets not passed up to the client. */ - DP_STAT_LOOKUP_HIT, /* Number of subtable lookups for flow table - hits */ - DP_STAT_SENT_PKTS, /* Packets that has been sent. */ - DP_STAT_SENT_BATCHES, /* Number of batches sent. */ - DP_N_STATS -}; - -enum pmd_cycles_counter_type { - PMD_CYCLES_IDLE, /* Cycles spent idle or unsuccessful polling */ - PMD_CYCLES_PROCESSING, /* Cycles spent successfully polling and - * processing polled packets */ - PMD_N_CYCLES -}; - enum rxq_cycles_counter_type { RXQ_CYCLES_PROC_CURR, /* Cycles spent successfully polling and processing packets during the current @@ -499,21 +481,6 @@ struct dp_netdev_actions *dp_netdev_flow_get_actions( const struct dp_netdev_flow *); static void dp_netdev_actions_free(struct dp_netdev_actions *); -/* Contained by struct dp_netdev_pmd_thread's 'stats' member. */ -struct dp_netdev_pmd_stats { - /* Indexed by DP_STAT_*. */ - atomic_ullong n[DP_N_STATS]; -}; - -/* Contained by struct dp_netdev_pmd_thread's 'cycle' member. */ -struct dp_netdev_pmd_cycles { - /* Indexed by PMD_CYCLES_*. */ - atomic_ullong n[PMD_N_CYCLES]; -}; - -static void dp_netdev_count_packet(struct dp_netdev_pmd_thread *, - enum dp_stat_type type, int cnt); - struct polled_queue { struct dp_netdev_rxq *rxq; odp_port_t port_no; @@ -595,12 +562,6 @@ struct dp_netdev_pmd_thread { are stored for each polled rxq. */ long long int rxq_next_cycle_store; - /* Statistics. */ - struct dp_netdev_pmd_stats stats; - - /* Cycles counters */ - struct dp_netdev_pmd_cycles cycles; - /* Current context of the PMD thread. */ struct dp_netdev_pmd_thread_ctx ctx; @@ -638,12 +599,8 @@ struct dp_netdev_pmd_thread { struct hmap tnl_port_cache; struct hmap send_port_cache; - /* Only a pmd thread can write on its own 'cycles' and 'stats'. 
- * The main thread keeps 'stats_zero' and 'cycles_zero' as base - * values and subtracts them from 'stats' and 'cycles' before - * reporting to the user */ - unsigned long long stats_zero[DP_N_STATS]; - uint64_t cycles_zero[PMD_N_CYCLES]; + /* Keep track of detailed PMD performance statistics. */ + struct pmd_perf_stats perf_stats; /* Set to true if the pmd thread needs to be reloaded. */ bool need_reload; @@ -833,47 +790,10 @@ enum pmd_info_type { }; static void -pmd_info_show_stats(struct ds *reply, - struct dp_netdev_pmd_thread *pmd, - unsigned long long stats[DP_N_STATS], - uint64_t cycles[PMD_N_CYCLES]) +format_pmd_thread(struct ds *reply, struct dp_netdev_pmd_thread *pmd) { - unsigned long long total_packets; - uint64_t total_cycles = 0; - double lookups_per_hit = 0, packets_per_batch = 0; - int i; - - /* These loops subtracts reference values ('*_zero') from the counters. - * Since loads and stores are relaxed, it might be possible for a '*_zero' - * value to be more recent than the current value we're reading from the - * counter. This is not a big problem, since these numbers are not - * supposed to be too accurate, but we should at least make sure that - * the result is not negative. */ - for (i = 0; i < DP_N_STATS; i++) { - if (stats[i] > pmd->stats_zero[i]) { - stats[i] -= pmd->stats_zero[i]; - } else { - stats[i] = 0; - } - } - - /* Sum of all the matched and not matched packets gives the total. */ - total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT] - + stats[DP_STAT_MISS]; - - for (i = 0; i < PMD_N_CYCLES; i++) { - if (cycles[i] > pmd->cycles_zero[i]) { - cycles[i] -= pmd->cycles_zero[i]; - } else { - cycles[i] = 0; - } - - total_cycles += cycles[i]; - } - ds_put_cstr(reply, (pmd->core_id == NON_PMD_CORE_ID) ? 
"main thread" : "pmd thread"); - if (pmd->numa_id != OVS_NUMA_UNSPEC) { ds_put_format(reply, " numa_id %d", pmd->numa_id); } @@ -881,23 +801,51 @@ pmd_info_show_stats(struct ds *reply, ds_put_format(reply, " core_id %u", pmd->core_id); } ds_put_cstr(reply, ":\n"); +} + +static void +pmd_info_show_stats(struct ds *reply, + struct dp_netdev_pmd_thread *pmd) +{ + uint64_t stats[PMD_N_STATS]; + uint64_t total_cycles = 0; + double passes_per_pkt = 0; + double lookups_per_hit = 0; + double packets_per_batch = 0; + + pmd_perf_read_counters(&pmd->perf_stats, stats); + total_cycles = stats[PMD_CYCLES_ITER_IDLE] + + stats[PMD_CYCLES_ITER_BUSY]; - if (stats[DP_STAT_MASKED_HIT] > 0) { - lookups_per_hit = stats[DP_STAT_LOOKUP_HIT] - / (double) stats[DP_STAT_MASKED_HIT]; + format_pmd_thread(reply, pmd); + + if (stats[PMD_STAT_RECV] > 0) { + passes_per_pkt = (stats[PMD_STAT_RECV] + stats[PMD_STAT_RECIRC]) + / (double) stats[PMD_STAT_RECV]; + } + if (stats[PMD_STAT_MASKED_HIT] > 0) { + lookups_per_hit = stats[PMD_STAT_MASKED_LOOKUP] + / (double) stats[PMD_STAT_MASKED_HIT]; } - if (stats[DP_STAT_SENT_BATCHES] > 0) { - packets_per_batch = stats[DP_STAT_SENT_PKTS] - / (double) stats[DP_STAT_SENT_BATCHES]; + if (stats[PMD_STAT_SENT_BATCHES] > 0) { + packets_per_batch = stats[PMD_STAT_SENT_PKTS] + / (double) stats[PMD_STAT_SENT_BATCHES]; } ds_put_format(reply, - "\temc hits:%llu\n\tmegaflow hits:%llu\n" - "\tavg. subtable lookups per hit:%.2f\n" - "\tmiss:%llu\n\tlost:%llu\n" + "\tpackets received:%"PRIu64"\n" + "\tpacket recirculations:%"PRIu64"\n" + "\tavg. datapath passes per packet:%.2f\n" + "\temc hits:%"PRIu64"\n" + "\tmegaflow hits:%"PRIu64"\n" + "\tavg. subtable lookups per megaflow hit:%.2f\n" + "\tmiss(i.e. lookup miss with success upcall):%"PRIu64"\n" + "\tlost(i.e. lookup miss with failed upcall):%"PRIu64"\n" "\tavg. 
packets per output batch: %.2f\n", - stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT], - lookups_per_hit, stats[DP_STAT_MISS], stats[DP_STAT_LOST], + stats[PMD_STAT_RECV], stats[PMD_STAT_RECIRC], + passes_per_pkt, stats[PMD_STAT_EXACT_HIT], + stats[PMD_STAT_MASKED_HIT], lookups_per_hit, + stats[PMD_STAT_MISS], stats[PMD_STAT_LOST], packets_per_batch); if (total_cycles == 0) { @@ -907,46 +855,26 @@ pmd_info_show_stats(struct ds *reply, ds_put_format(reply, "\tidle cycles:%"PRIu64" (%.02f%%)\n" "\tprocessing cycles:%"PRIu64" (%.02f%%)\n", - cycles[PMD_CYCLES_IDLE], - cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100, - cycles[PMD_CYCLES_PROCESSING], - cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100); + stats[PMD_CYCLES_ITER_IDLE], + stats[PMD_CYCLES_ITER_IDLE] / (double)total_cycles * 100, + stats[PMD_CYCLES_ITER_BUSY], + stats[PMD_CYCLES_ITER_BUSY] / (double)total_cycles * 100); + uint64_t total_packets = stats[PMD_STAT_RECV]; if (total_packets == 0) { return; } ds_put_format(reply, - "\tavg cycles per packet: %.02f (%"PRIu64"/%llu)\n", + "\tavg cycles per packet: %.02f (%"PRIu64"/%"PRIu64")\n", total_cycles / (double)total_packets, total_cycles, total_packets); ds_put_format(reply, "\tavg processing cycles per packet: " - "%.02f (%"PRIu64"/%llu)\n", - cycles[PMD_CYCLES_PROCESSING] / (double)total_packets, - cycles[PMD_CYCLES_PROCESSING], total_packets); -} - -static void -pmd_info_clear_stats(struct ds *reply OVS_UNUSED, - struct dp_netdev_pmd_thread *pmd, - unsigned long long stats[DP_N_STATS], - uint64_t cycles[PMD_N_CYCLES]) -{ - int i; - - /* We cannot write 'stats' and 'cycles' (because they're written by other - * threads) and we shouldn't change 'stats' (because they're used to count - * datapath stats, which must not be cleared here). 
Instead, we save the - * current values and subtract them from the values to be displayed in the - * future */ - for (i = 0; i < DP_N_STATS; i++) { - pmd->stats_zero[i] = stats[i]; - } - for (i = 0; i < PMD_N_CYCLES; i++) { - pmd->cycles_zero[i] = cycles[i]; - } + "%.02f (%"PRIu64"/%"PRIu64")\n", + stats[PMD_CYCLES_POLL_BUSY] / (double)total_packets, + stats[PMD_CYCLES_POLL_BUSY], total_packets); } static int @@ -1106,23 +1034,34 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], struct ds reply = DS_EMPTY_INITIALIZER; struct dp_netdev_pmd_thread **pmd_list; struct dp_netdev *dp = NULL; - size_t n; enum pmd_info_type type = *(enum pmd_info_type *) aux; + int core_id = -1; + size_t n; ovs_mutex_lock(&dp_netdev_mutex); - if (argc == 2) { - dp = shash_find_data(&dp_netdevs, argv[1]); - } else if (shash_count(&dp_netdevs) == 1) { - /* There's only one datapath */ - dp = shash_first(&dp_netdevs)->data; + while (argc > 1) { + if (!strcmp(argv[1], "-pmd")) { + core_id = strtol(argv[2], NULL, 10); + argc -= 2; + argv += 2; + } else { + dp = shash_find_data(&dp_netdevs, argv[1]); + argc -= 1; + argv += 1; + } } if (!dp) { - ovs_mutex_unlock(&dp_netdev_mutex); - unixctl_command_reply_error(conn, - "please specify an existing datapath"); - return; + if (shash_count(&dp_netdevs) == 1) { + /* There's only one datapath */ + dp = shash_first(&dp_netdevs)->data; + } else { + ovs_mutex_unlock(&dp_netdev_mutex); + unixctl_command_reply_error(conn, + "please specify an existing datapath"); + return; + } } sorted_poll_thread_list(dp, &pmd_list, &n); @@ -1131,26 +1070,15 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], if (!pmd) { break; } - + if (core_id != -1 && pmd->core_id != core_id) { + continue; + } if (type == PMD_INFO_SHOW_RXQ) { pmd_info_show_rxq(&reply, pmd); - } else { - unsigned long long stats[DP_N_STATS]; - uint64_t cycles[PMD_N_CYCLES]; - - /* Read current stats and cycle counters */ - for (size_t j = 0; j < 
ARRAY_SIZE(stats); j++) { - atomic_read_relaxed(&pmd->stats.n[j], &stats[j]); - } - for (size_t j = 0; j < ARRAY_SIZE(cycles); j++) { - atomic_read_relaxed(&pmd->cycles.n[j], &cycles[j]); - } - - if (type == PMD_INFO_CLEAR_STATS) { - pmd_info_clear_stats(&reply, pmd, stats, cycles); - } else if (type == PMD_INFO_SHOW_STATS) { - pmd_info_show_stats(&reply, pmd, stats, cycles); - } + } else if (type == PMD_INFO_CLEAR_STATS) { + pmd_perf_stats_clear(&pmd->perf_stats); + } else if (type == PMD_INFO_SHOW_STATS) { + pmd_info_show_stats(&reply, pmd); } } free(pmd_list); @@ -1168,14 +1096,14 @@ dpif_netdev_init(void) clear_aux = PMD_INFO_CLEAR_STATS, poll_aux = PMD_INFO_SHOW_RXQ; - unixctl_command_register("dpif-netdev/pmd-stats-show", "[dp]", - 0, 1, dpif_netdev_pmd_info, + unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core | dp]", + 0, 2, dpif_netdev_pmd_info, (void *)&show_aux); - unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]", - 0, 1, dpif_netdev_pmd_info, + unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core | dp]", + 0, 2, dpif_netdev_pmd_info, (void *)&clear_aux); - unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]", - 0, 1, dpif_netdev_pmd_info, + unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core | dp]", + 0, 2, dpif_netdev_pmd_info, (void *)&poll_aux); unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", 0, 1, dpif_netdev_pmd_rebalance, @@ -1511,23 +1439,19 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats) { struct dp_netdev *dp = get_dp_netdev(dpif); struct dp_netdev_pmd_thread *pmd; + uint64_t pmd_stats[PMD_N_STATS]; - stats->n_flows = stats->n_hit = stats->n_missed = stats->n_lost = 0; + stats->n_flows = stats->n_hit = + stats->n_mask_hit = stats->n_missed = stats->n_lost = 0; CMAP_FOR_EACH (pmd, node, &dp->poll_threads) { - unsigned long long n; stats->n_flows += cmap_count(&pmd->flow_table); - - 
atomic_read_relaxed(&pmd->stats.n[DP_STAT_MASKED_HIT], &n); - stats->n_hit += n; - atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n); - stats->n_hit += n; - atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n); - stats->n_missed += n; - atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n); - stats->n_lost += n; + pmd_perf_read_counters(&pmd->perf_stats, pmd_stats); + stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT]; + stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT]; + stats->n_missed += pmd_stats[PMD_STAT_MISS]; + stats->n_lost += pmd_stats[PMD_STAT_LOST]; } stats->n_masks = UINT32_MAX; - stats->n_mask_hit = UINT64_MAX; return 0; } @@ -3209,28 +3133,28 @@ cycles_count_start(struct dp_netdev_pmd_thread *pmd) /* Stop counting cycles and add them to the counter 'type' */ static inline void cycles_count_end(struct dp_netdev_pmd_thread *pmd, - enum pmd_cycles_counter_type type) + enum pmd_stat_type type) OVS_RELEASES(&cycles_counter_fake_mutex) OVS_NO_THREAD_SAFETY_ANALYSIS { unsigned long long interval = cycles_counter() - pmd->ctx.last_cycles; - non_atomic_ullong_add(&pmd->cycles.n[type], interval); + pmd_perf_update_counter(&pmd->perf_stats, type, interval); } /* Calculate the intermediate cycle result and add to the counter 'type' */ static inline void cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_rxq *rxq, - enum pmd_cycles_counter_type type) + enum pmd_stat_type type) OVS_NO_THREAD_SAFETY_ANALYSIS { unsigned long long new_cycles = cycles_counter(); unsigned long long interval = new_cycles - pmd->ctx.last_cycles; pmd->ctx.last_cycles = new_cycles; - non_atomic_ullong_add(&pmd->cycles.n[type], interval); - if (rxq && (type == PMD_CYCLES_PROCESSING)) { + pmd_perf_update_counter(&pmd->perf_stats, type, interval); + if (rxq && (type == PMD_CYCLES_POLL_BUSY)) { /* Add to the amount of current processing cycles. 
*/ non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval); } @@ -3289,8 +3213,10 @@ dp_netdev_pmd_flush_output_on_port(struct dp_netdev_pmd_thread *pmd, netdev_send(p->port->netdev, tx_qid, &p->output_pkts, dynamic_txqs); dp_packet_batch_init(&p->output_pkts); - dp_netdev_count_packet(pmd, DP_STAT_SENT_PKTS, output_cnt); - dp_netdev_count_packet(pmd, DP_STAT_SENT_BATCHES, 1); + pmd_perf_update_counter(&pmd->perf_stats, + PMD_STAT_SENT_PKTS, output_cnt); + pmd_perf_update_counter(&pmd->perf_stats, + PMD_STAT_SENT_BATCHES, 1); } static void @@ -3971,12 +3897,12 @@ dpif_netdev_run(struct dpif *dpif) port->port_no); cycles_count_intermediate(non_pmd, NULL, process_packets - ? PMD_CYCLES_PROCESSING - : PMD_CYCLES_IDLE); + ? PMD_CYCLES_POLL_BUSY + : PMD_CYCLES_POLL_IDLE); } } } - cycles_count_end(non_pmd, PMD_CYCLES_IDLE); + cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE); pmd_thread_ctx_time_update(non_pmd); dpif_netdev_xps_revalidate_pmd(non_pmd, false); ovs_mutex_unlock(&dp->non_pmd_mutex); @@ -4121,6 +4047,7 @@ static void * pmd_thread_main(void *f_) { struct dp_netdev_pmd_thread *pmd = f_; + struct pmd_perf_stats *s = &pmd->perf_stats; unsigned int lc = 0; struct polled_queue *poll_list; bool exiting; @@ -4156,13 +4083,17 @@ reload: cycles_count_start(pmd); for (;;) { + uint64_t iter_packets = 0; + pmd_perf_start_iteration(s, pmd->ctx.last_cycles); for (i = 0; i < poll_cnt; i++) { process_packets = dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx, poll_list[i].port_no); cycles_count_intermediate(pmd, poll_list[i].rxq, - process_packets ? PMD_CYCLES_PROCESSING - : PMD_CYCLES_IDLE); + process_packets + ? 
PMD_CYCLES_POLL_BUSY + : PMD_CYCLES_POLL_IDLE); + iter_packets += process_packets; } if (lc++ > 1024) { @@ -4183,10 +4114,12 @@ reload: if (reload) { break; } + cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD); } + pmd_perf_end_iteration(s, pmd->ctx.last_cycles, iter_packets); } - cycles_count_end(pmd, PMD_CYCLES_IDLE); + cycles_count_end(pmd, PMD_CYCLES_OVERHEAD); poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list); exiting = latch_is_set(&pmd->exit_latch); @@ -4640,6 +4573,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp, } cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node), hash_int(core_id, 0)); + pmd_perf_stats_init(&pmd->perf_stats); } static void @@ -4838,13 +4772,6 @@ dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, int cnt, int size, atomic_store_relaxed(&netdev_flow->stats.tcp_flags, flags); } -static void -dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd, - enum dp_stat_type type, int cnt) -{ - non_atomic_ullong_add(&pmd->stats.n[type], cnt); -} - static int dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_, struct flow *flow, struct flow_wildcards *wc, ovs_u128 *ufid, @@ -5017,6 +4944,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, int i; atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min); + pmd_perf_update_counter(&pmd->perf_stats, + md_is_valid ? 
PMD_STAT_RECIRC : PMD_STAT_RECV, + cnt); DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) { struct dp_netdev_flow *flow; @@ -5065,18 +4995,17 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, } } - dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT, - cnt - n_dropped - n_missed); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT, + cnt - n_dropped - n_missed); return dp_packet_batch_size(packets_); } -static inline void +static inline int handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet, const struct netdev_flow_key *key, - struct ofpbuf *actions, struct ofpbuf *put_actions, - int *lost_cnt) + struct ofpbuf *actions, struct ofpbuf *put_actions) { struct ofpbuf *add_actions; struct dp_packet_batch b; @@ -5096,8 +5025,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, put_actions); if (OVS_UNLIKELY(error && error != ENOSPC)) { dp_packet_delete(packet); - (*lost_cnt)++; - return; + return error; } /* The Netlink encoding of datapath flow keys cannot express @@ -5137,6 +5065,9 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, ovs_mutex_unlock(&pmd->flow_mutex); emc_probabilistic_insert(pmd, key, netdev_flow); } + /* Only error ENOSPC can reach here. We process the packet but do not + * install a datapath flow. Treat as successful. 
*/ + return 0; } static inline void @@ -5158,7 +5089,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, struct dpcls *cls; struct dpcls_rule *rules[PKT_ARRAY_SIZE]; struct dp_netdev *dp = pmd->dp; - int miss_cnt = 0, lost_cnt = 0; + int upcall_ok_cnt = 0, upcall_fail_cnt = 0; int lookup_cnt = 0, add_lookup_cnt; bool any_miss; size_t i; @@ -5200,9 +5131,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, continue; } - miss_cnt++; - handle_packet_upcall(pmd, packet, &keys[i], &actions, - &put_actions, &lost_cnt); + int error = handle_packet_upcall(pmd, packet, &keys[i], + &actions, &put_actions); + + if (OVS_UNLIKELY(error)) { + upcall_fail_cnt++; + } else { + upcall_ok_cnt++; + } } ofpbuf_uninit(&actions); @@ -5212,8 +5148,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, DP_PACKET_BATCH_FOR_EACH (packet, packets_) { if (OVS_UNLIKELY(!rules[i])) { dp_packet_delete(packet); - lost_cnt++; - miss_cnt++; + upcall_fail_cnt++; } } } @@ -5231,10 +5166,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches); } - dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt); - dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt); - dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt); - dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT, + cnt - upcall_ok_cnt - upcall_fail_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_LOOKUP, + lookup_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MISS, + upcall_ok_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST, + upcall_fail_cnt); } /* Packets enter the datapath from a port (or from recirculation) here. 
diff --git a/tests/pmd.at b/tests/pmd.at index e39a23a..0356f87 100644 --- a/tests/pmd.at +++ b/tests/pmd.at @@ -170,13 +170,16 @@ dummy@ovs-dummy: hit:0 missed:0 p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=, requested_rx_queues=4, requested_tx_queues=) ]) -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl pmd thread numa_id core_id : + packets received:0 + packet recirculations:0 + avg. datapath passes per packet:0.00 emc hits:0 megaflow hits:0 - avg. subtable lookups per hit:0.00 - miss:0 - lost:0 + avg. subtable lookups per megaflow hit:0.00 + miss(i.e. lookup miss with success upcall):0 + lost(i.e. lookup miss with failed upcall):0 ]) ovs-appctl time/stop @@ -197,13 +200,16 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: ]) -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl pmd thread numa_id core_id : + packets received:20 + packet recirculations:0 + avg. datapath passes per packet:1.00 emc hits:19 megaflow hits:0 - avg. subtable lookups per hit:0.00 - miss:1 - lost:0 + avg. subtable lookups per megaflow hit:0.00 + miss(i.e. lookup miss with success upcall):1 + lost(i.e. 
lookup miss with failed upcall):0
])

OVS_VSWITCHD_STOP

From patchwork Thu Jan 4 12:07:45 2018
X-Patchwork-Submitter: Jan Scheurich
X-Patchwork-Id: 855574
X-Patchwork-Delegate: ian.stokes@intel.com
From: Jan Scheurich
To: dev@openvswitch.org
Date: Thu, 4 Jan 2018 13:07:45 +0100
Message-Id: <1515067666-12398-3-git-send-email-jan.scheurich@ericsson.com>
In-Reply-To: <1515067666-12398-1-git-send-email-jan.scheurich@ericsson.com>
References: <1515067666-12398-1-git-send-email-jan.scheurich@ericsson.com>
Subject: [ovs-dev] [PATCH v5 2/3] dpif-netdev: Detailed performance stats for PMDs

This patch instruments the dpif-netdev datapath to record detailed
statistics of what is happening in every iteration of a PMD thread.
The collection of detailed statistics can be controlled by a new
configuration parameter "other_config:pmd-perf-metrics". By default it
is disabled. The run-time overhead, when enabled, is in the order of 1%.

The covered metrics per iteration are:
  - cycles
  - packets
  - (rx) batches
  - packets/batch
  - max. vhostuser qlen
  - upcalls
  - cycles spent in upcalls

This raw recorded data is used threefold:

1. In histograms for each of the following metrics:
   - cycles/iteration (log.)
   - packets/iteration (log.)
   - cycles/packet
   - packets/batch
   - max. vhostuser qlen (log.)
   - upcalls
   - cycles/upcall (log.)
   The histogram bins are divided linearly or logarithmically.

2. A cyclic history of the above statistics for the last 1000 iterations.

3. A cyclic history of the cumulative/average values per millisecond
   wall clock for the last 1000 milliseconds:
   - number of iterations
   - avg. cycles/iteration
   - packets (Kpps)
   - avg. packets/batch
   - avg. max vhost qlen
   - upcalls
   - avg. cycles/upcall

The gathered performance statistics can be printed at any time with the
new CLI command

   ovs-appctl dpif-netdev/pmd-perf-show [-nh] [-it iter_len] [-ms ms_len]
       [-pmd core] [dp]

The options are
   -nh:           Suppress the histograms
   -it iter_len:  Display the last iter_len iteration stats
   -ms ms_len:    Display the last ms_len millisecond stats
   -pmd core:     Display only the specified PMD

The performance statistics are reset with the existing
dpif-netdev/pmd-stats-clear command.
The output always contains the following global PMD statistics, similar to the pmd-stats-show command: Time: 15:24:55.270 Measurement duration: 1.008 s pmd thread numa_id 0 core_id 1: Cycles: 2419034712 (2.40 GHz) Iterations: 572817 (1.76 us/it) - idle: 486808 (15.9 % cycles) - busy: 86009 (84.1 % cycles) Packets: 2399607 (2381 Kpps, 848 cycles/pkt) Datapath passes: 3599415 (1.50 passes/pkt) - EMC hits: 336472 ( 9.3 %) - Megaflow hits: 3262943 (90.7 %, 1.00 subtbl lookups/hit) - Upcalls: 0 ( 0.0 %, 0.0 us/upcall) - Lost upcalls: 0 ( 0.0 %) Signed-off-by: Jan Scheurich --- lib/dp-packet.h | 2 + lib/dpif-netdev-perf.c | 309 ++++++++++++++++++++++++++++++++++++++++++++++++- lib/dpif-netdev-perf.h | 173 ++++++++++++++++++++++++++- lib/dpif-netdev.c | 158 +++++++++++++++++++++++-- lib/netdev-dpdk.c | 23 +++- lib/netdev-dpdk.h | 14 +++ ofproto/ofproto-dpif.c | 3 +- 7 files changed, 664 insertions(+), 18 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index b4b721c..7950247 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -695,8 +695,10 @@ enum { NETDEV_MAX_BURST = 32 }; /* Maximum number packets in a batch. */ struct dp_packet_batch { size_t count; + size_t qfill; /* Number of packets in queue when reading rx burst. */ bool trunc; /* true if the batch needs truncate. 
*/ struct dp_packet *packets[NETDEV_MAX_BURST]; + }; static inline void diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index 7d8b7b2..a66a48c 100644 --- a/lib/dpif-netdev-perf.c +++ b/lib/dpif-netdev-perf.c @@ -15,6 +15,7 @@ */ #include +#include #ifdef DPDK_NETDEV #include @@ -27,13 +28,307 @@ VLOG_DEFINE_THIS_MODULE(pmd_perf); +#ifdef DPDK_NETDEV +static uint64_t +get_tsc_hz(void) +{ + return rte_get_tsc_hz(); +} +#else +static uint64_t +read_tsc(void) +{ + register uint64_t tsc asm("eax"); + asm volatile (".byte 15, 49" : : : "eax", "edx"); + return tsc; +} + +static uint64_t +get_tsc_hz(void) +{ + uint64_t tsc_start = read_tsc(); + xsleep(1); + return read_tsc() - tsc_start; +} +#endif + +static void +histogram_walls_set_lin(struct histogram *hist, uint32_t min, uint32_t max) +{ + int i; + + ovs_assert(min < max); + for (i = 0; i < NUM_BINS-1; i++) { + hist->wall[i] = min + (i * (max - min)) / (NUM_BINS - 2); + } + hist->wall[NUM_BINS-1] = UINT32_MAX; +} + +static void +histogram_walls_set_log(struct histogram *hist, uint32_t min, uint32_t max) +{ + int i, start, bins, wall; + double log_min, log_max; + + ovs_assert(min < max); + if (min > 0) { + log_min = log(min); + log_max = log(max); + start = 0; + bins = NUM_BINS - 1; + } else { + hist->wall[0] = 0; + log_min = log(1); + log_max = log(max); + start = 1; + bins = NUM_BINS - 2; + } + wall = start; + for (i = 0; i < bins; i++) { + /* Make sure each wall is monotonically increasing. 
*/ + wall = MAX(wall, exp(log_min + (i * (log_max - log_min)) / (bins-1))); + hist->wall[start + i] = wall++; + } + if (hist->wall[NUM_BINS-2] < max) { + hist->wall[NUM_BINS-2] = max; + } + hist->wall[NUM_BINS-1] = UINT32_MAX; +} + +uint64_t +histogram_samples(const struct histogram *hist) +{ + uint64_t samples = 0; + + for (int i = 0; i < NUM_BINS; i++) { + samples += hist->bin[i]; + } + return samples; +} + +static void +histogram_clear(struct histogram *hist) +{ + int i; + + for (i = 0; i < NUM_BINS; i++) { + hist->bin[i] = 0; + } +} + +static void +history_init(struct history *h) +{ + memset(h, 0, sizeof(*h)); +} + void pmd_perf_stats_init(struct pmd_perf_stats *s) { - memset(s, 0 , sizeof(*s)); + /* The struct has been zeroed at allocation. */ + histogram_walls_set_log(&s->cycles, 500, 24000000); + histogram_walls_set_log(&s->pkts, 0, 1000); + histogram_walls_set_lin(&s->cycles_per_pkt, 100, 30000); + histogram_walls_set_lin(&s->pkts_per_batch, 0, 32); + histogram_walls_set_lin(&s->upcalls, 0, 30); + histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000); + histogram_walls_set_log(&s->max_vhost_qfill, 0, 512); s->start_ms = time_msec(); } void +pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s, + double duration) +{ + uint64_t stats[PMD_N_STATS]; + double us_per_cycle = 1000000.0 / get_tsc_hz(); + + if (duration == 0) { + return; + } + + pmd_perf_read_counters(s, stats); + uint64_t tot_cycles = stats[PMD_CYCLES_ITER_IDLE] + + stats[PMD_CYCLES_ITER_BUSY]; + uint64_t packets = stats[PMD_STAT_RECV]; + uint64_t passes = stats[PMD_STAT_RECV] + + stats[PMD_STAT_RECIRC]; + uint64_t upcalls = stats[PMD_STAT_MISS]; + uint64_t upcall_cycles = stats[PMD_CYCLES_UPCALL]; + uint64_t tot_iter = histogram_samples(&s->pkts); + uint64_t idle_iter = s->pkts.bin[0]; + uint64_t busy_iter = tot_iter >= idle_iter ? 
tot_iter - idle_iter : 0; + + ds_put_format(str, + " Cycles: %12"PRIu64" (%.2f GHz)\n" + " Iterations: %12"PRIu64" (%.2f us/it)\n" + " - idle: %12"PRIu64" (%4.1f %% cycles)\n" + " - busy: %12"PRIu64" (%4.1f %% cycles)\n", + tot_cycles, (tot_cycles / duration) / 1E9, + tot_iter, tot_cycles * us_per_cycle / tot_iter, + idle_iter, + 100.0 * stats[PMD_CYCLES_ITER_IDLE] / tot_cycles, + busy_iter, + 100.0 * stats[PMD_CYCLES_ITER_BUSY] / tot_cycles); + if (packets > 0) { + ds_put_format(str, + " Packets: %12"PRIu64" (%.0f Kpps, %.0f cycles/pkt)\n" + " Datapath passes: %12"PRIu64" (%.2f passes/pkt)\n" + " - EMC hits: %12"PRIu64" (%4.1f %%)\n" + " - Megaflow hits: %12"PRIu64" (%4.1f %%, %.2f subtbl lookups/" + "hit)\n" + " - Upcalls: %12"PRIu64" (%4.1f %%, %.1f us/upcall)\n" + " - Lost upcalls: %12"PRIu64" (%4.1f %%)\n" + "\n", + packets, (packets / duration) / 1000, + 1.0 * stats[PMD_CYCLES_ITER_BUSY] / packets, + passes, packets ? 1.0 * passes / packets : 0, + stats[PMD_STAT_EXACT_HIT], + 100.0 * stats[PMD_STAT_EXACT_HIT] / passes, + stats[PMD_STAT_MASKED_HIT], + 100.0 * stats[PMD_STAT_MASKED_HIT] / passes, + stats[PMD_STAT_MASKED_HIT] + ? 1.0 * stats[PMD_STAT_MASKED_LOOKUP] / stats[PMD_STAT_MASKED_HIT] + : 0, + upcalls, 100.0 * upcalls / passes, + upcalls ? 
(upcall_cycles * us_per_cycle) / upcalls : 0, + stats[PMD_STAT_LOST], + 100.0 * stats[PMD_STAT_LOST] / passes); + } else { + ds_put_format(str, + " Packets: %12"PRIu64"\n" + "\n", + 0UL); + } +} + +void +pmd_perf_format_histograms(struct ds *str, struct pmd_perf_stats *s) +{ + int i; + + ds_put_cstr(str, "Histograms\n"); + ds_put_format(str, + " %-21s %-21s %-21s %-21s %-21s %-21s %-21s\n", + "cycles/it", "packets/it", "cycles/pkt", "pkts/batch", + "max vhost qlen", "upcalls/it", "cycles/upcall"); + for (i = 0; i < NUM_BINS-1; i++) { + ds_put_format(str, + " %-9d %-11"PRIu64" %-9d %-11"PRIu64" %-9d %-11"PRIu64 + " %-9d %-11"PRIu64" %-9d %-11"PRIu64" %-9d %-11"PRIu64 + " %-9d %-11"PRIu64"\n", + s->cycles.wall[i], s->cycles.bin[i], + s->pkts.wall[i],s->pkts.bin[i], + s->cycles_per_pkt.wall[i], s->cycles_per_pkt.bin[i], + s->pkts_per_batch.wall[i], s->pkts_per_batch.bin[i], + s->max_vhost_qfill.wall[i], s->max_vhost_qfill.bin[i], + s->upcalls.wall[i], s->upcalls.bin[i], + s->cycles_per_upcall.wall[i], s->cycles_per_upcall.bin[i]); + } + ds_put_format(str, + " %-9s %-11"PRIu64" %-9s %-11"PRIu64" %-9s %-11"PRIu64 + " %-9s %-11"PRIu64" %-9s %-11"PRIu64" %-9s %-11"PRIu64 + " %-9s %-11"PRIu64"\n", + ">", s->cycles.bin[i], + ">", s->pkts.bin[i], + ">", s->cycles_per_pkt.bin[i], + ">", s->pkts_per_batch.bin[i], + ">", s->max_vhost_qfill.bin[i], + ">", s->upcalls.bin[i], + ">", s->cycles_per_upcall.bin[i]); + if (s->totals.iterations > 0) { + ds_put_cstr(str, + "-----------------------------------------------------" + "-----------------------------------------------------" + "------------------------------------------------\n"); + ds_put_format(str, + " %-21s %-21s %-21s %-21s %-21s %-21s %-21s\n", + "cycles/it", "packets/it", "cycles/pkt", "pkts/batch", + "vhost qlen", "upcalls/it", "cycles/upcall"); + ds_put_format(str, + " %-21"PRIu64" %-21.5f %-21"PRIu64 + " %-21.5f %-21.5f %-21.5f %-21"PRIu32"\n", + s->totals.cycles / s->totals.iterations, + 1.0 * s->totals.pkts / 
s->totals.iterations, + s->totals.pkts + ? s->totals.busy_cycles / s->totals.pkts : 0, + s->totals.batches + ? 1.0 * s->totals.pkts / s->totals.batches : 0, + 1.0 * s->totals.max_vhost_qfill / s->totals.iterations, + 1.0 * s->totals.upcalls / s->totals.iterations, + s->totals.upcalls + ? s->totals.upcall_cycles / s->totals.upcalls : 0); + } +} + +void +pmd_perf_format_iteration_history(struct ds *str, struct pmd_perf_stats *s, + int n_iter) +{ + struct iter_stats *is; + size_t index; + int i; + + if (n_iter == 0) { + return; + } + ds_put_format(str, " %-17s %-10s %-10s %-10s %-10s " + "%-10s %-10s %-10s\n", + "tsc", "cycles", "packets", "cycles/pkt", "pkts/batch", + "vhost qlen", "upcalls", "cycles/upcall"); + for (i = 1; i <= n_iter; i++) { + index = (s->iterations.idx + HISTORY_LEN - i) % HISTORY_LEN; + is = &s->iterations.sample[index]; + ds_put_format(str, + " %-17"PRIu64" %-11"PRIu64" %-11"PRIu32 + " %-11"PRIu64" %-11"PRIu32" %-11"PRIu32 + " %-11"PRIu32" %-11"PRIu32"\n", + is->timestamp, + is->cycles, + is->pkts, + is->pkts ? is->cycles / is->pkts : 0, + is->batches ? is->pkts / is->batches : 0, + is->max_vhost_qfill, + is->upcalls, + is->upcalls ? is->upcall_cycles / is->upcalls : 0); + } +} + +void +pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s, int n_ms) +{ + struct iter_stats *is; + size_t index; + int i; + + if (n_ms == 0) { + return; + } + ds_put_format(str, + " %-12s %-10s %-10s %-10s %-10s" + " %-10s %-10s %-10s %-10s\n", + "ms", "iterations", "cycles/it", "Kpps", "cycles/pkt", + "pkts/batch", "vhost qlen", "upcalls", "cycles/upcall"); + for (i = 1; i <= n_ms; i++) { + index = (s->milliseconds.idx + HISTORY_LEN - i) % HISTORY_LEN; + is = &s->milliseconds.sample[index]; + ds_put_format(str, + " %-12"PRIu64" %-11"PRIu32" %-11"PRIu64 + " %-11"PRIu32" %-11"PRIu64" %-11"PRIu32 + " %-11"PRIu32" %-11"PRIu32" %-11"PRIu32"\n", + is->timestamp, + is->iterations, + is->iterations ? is->cycles / is->iterations : 0, + is->pkts, + is->pkts ? 
is->busy_cycles / is->pkts : 0, + is->batches ? is->pkts / is->batches : 0, + is->iterations + ? is->max_vhost_qfill / is->iterations : 0, + is->upcalls, + is->upcalls ? is->upcall_cycles / is->upcalls : 0); + } +} + +void pmd_perf_read_counters(struct pmd_perf_stats *s, uint64_t stats[PMD_N_STATS]) { @@ -59,8 +354,20 @@ void pmd_perf_stats_clear(struct pmd_perf_stats *s) { s->start_ms = 0; /* Marks start of clearing. */ + memset(&s->current, 0 , sizeof(struct iter_stats)); + memset(&s->totals, 0 , sizeof(struct iter_stats)); for (int i = 0; i < PMD_N_STATS; i++) { atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]); } + histogram_clear(&s->cycles); + histogram_clear(&s->pkts); + histogram_clear(&s->cycles_per_pkt); + histogram_clear(&s->upcalls); + histogram_clear(&s->cycles_per_upcall); + histogram_clear(&s->pkts_per_batch); + histogram_clear(&s->max_vhost_qfill); + history_init(&s->iterations); + history_init(&s->milliseconds); s->start_ms = time_msec(); /* Clearing finished. */ + s->milliseconds.sample[0].timestamp = s->start_ms; } diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index fa5cf37..3611723 100644 --- a/lib/dpif-netdev-perf.h +++ b/lib/dpif-netdev-perf.h @@ -33,6 +33,11 @@ extern "C" { #endif +#define NUM_BINS 32 /* Number of histogram bins. */ +#define HISTORY_LEN 1000 /* Length of recorded history + (iterations and ms). */ +#define DEF_HIST_SHOW 20 /* Default number of history samples. */ + enum pmd_stat_type { PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */ PMD_STAT_MASKED_HIT, /* Packets that matched in the flow table. 
*/ @@ -62,6 +67,28 @@ enum pmd_stat_type { PMD_N_STATS }; +struct histogram { + uint32_t wall[NUM_BINS]; + uint64_t bin[NUM_BINS]; +}; + +struct iter_stats { + uint64_t timestamp; + uint64_t cycles; + uint64_t busy_cycles; + uint32_t iterations; + uint32_t pkts; + uint32_t upcalls; + uint32_t upcall_cycles; + uint32_t batches; + uint32_t max_vhost_qfill; +}; + +struct history { + uint64_t idx; + struct iter_stats sample[HISTORY_LEN]; +}; + struct pmd_counters { atomic_uint64_t n[PMD_N_STATS]; /* Indexed by PMD_STAT_*. */ uint64_t zero[PMD_N_STATS]; @@ -71,6 +98,17 @@ struct pmd_perf_stats { uint64_t start_ms; uint64_t last_tsc; struct pmd_counters counters; + struct iter_stats current; + struct iter_stats totals; + struct histogram cycles; + struct histogram pkts; + struct histogram cycles_per_pkt; + struct histogram upcalls; + struct histogram cycles_per_upcall; + struct histogram pkts_per_batch; + struct histogram max_vhost_qfill; + struct history iterations; + struct history milliseconds; }; enum pmd_info_type; @@ -98,6 +136,63 @@ pmd_perf_update_counter(struct pmd_perf_stats *s, atomic_store_relaxed(&s->counters.n[counter], tmp); } +void pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s, + double duration); +void pmd_perf_format_histograms(struct ds *str, struct pmd_perf_stats *s); +void pmd_perf_format_iteration_history(struct ds *str, + struct pmd_perf_stats *s, + int n_iter); +void pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s, + int n_ms); + +void pmd_perf_show(struct unixctl_conn *conn, int argc, + const char *argv[], + void *aux OVS_UNUSED); +static inline void +histogram_add_sample(struct histogram *hist, uint32_t val) +{ + /* TODO: Can do better with binary search? 
*/ + for (int i = 0; i < NUM_BINS-1; i++) { + if (val <= hist->wall[i]) { + hist->bin[i]++; + return; + } + } + hist->bin[NUM_BINS-1]++; +} + +uint64_t histogram_samples(const struct histogram *hist); + +static inline struct iter_stats * +history_current(struct history *h) +{ + return &h->sample[h->idx]; +} + +static inline struct iter_stats * +history_next(struct history *h) +{ + struct iter_stats *next; + + h->idx++; + if (h->idx == HISTORY_LEN) { + h->idx = 0; + } + next = &h->sample[h->idx]; + memset(next, 0, sizeof(*next)); + return next; +} + +static inline struct iter_stats * +history_store(struct history *h, struct iter_stats *is) +{ + if (is) { + h->sample[h->idx] = *is; + } + /* Advance the history pointer */ + return history_next(h); +} + static inline void pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc) { @@ -105,14 +200,25 @@ pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc) /* Stats not yet fully initialised. */ return; } - s->last_tsc = now_tsc; + memset(&s->current, 0, sizeof(struct iter_stats)); + s->current.timestamp = now_tsc; } static inline void pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, - int packets) + int packets, bool full_metrics) { - uint64_t cycles = now_tsc - s->last_tsc; + struct iter_stats *cum_ms; + uint64_t cycles, cycles_per_pkt = 0; + + if (OVS_UNLIKELY(s->current.timestamp == 0)) { + /* Stats were cleared during the ongoing iteration. */ + return; + } + + cycles = now_tsc - s->current.timestamp; + s->current.cycles = cycles; + s->current.pkts = packets; /* No need for atomic updates as only invoked within PMD thread. */ if (packets > 0) { @@ -120,7 +226,68 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, } else { s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles; } + /* Add iteration sample to histograms. 
*/ + histogram_add_sample(&s->cycles, cycles); + histogram_add_sample(&s->pkts, packets); + + if (!full_metrics) { + return; + } + + s->counters.n[PMD_CYCLES_UPCALL] += s->current.upcall_cycles; + + if (packets > 0) { + cycles_per_pkt = cycles / packets; + histogram_add_sample(&s->cycles_per_pkt, cycles_per_pkt); + } + if (s->current.batches > 0) { + histogram_add_sample(&s->pkts_per_batch, packets / s->current.batches); + } + histogram_add_sample(&s->upcalls, s->current.upcalls); + if (s->current.upcalls > 0) { + histogram_add_sample(&s->cycles_per_upcall, + s->current.upcall_cycles / s->current.upcalls); + } + histogram_add_sample(&s->max_vhost_qfill, s->current.max_vhost_qfill); + + /* Add iteration samples to millisecond stats. */ + cum_ms = history_current(&s->milliseconds); + cum_ms->iterations++; + cum_ms->cycles += cycles; + if (packets > 0) { + cum_ms->busy_cycles += cycles; + } + cum_ms->pkts += s->current.pkts; + cum_ms->upcalls += s->current.upcalls; + cum_ms->upcall_cycles += s->current.upcall_cycles; + cum_ms->batches += s->current.batches; + cum_ms->max_vhost_qfill += s->current.max_vhost_qfill; + + /* Store in iteration history. */ + history_store(&s->iterations, &s->current); + if (now_tsc - s->last_tsc > 10000) { + /* Check if ms is completed and store in milliseconds history. */ + uint64_t now = time_msec(); + if (now != cum_ms->timestamp) { + /* Add ms stats to totals. 
*/ + s->totals.iterations += cum_ms->iterations; + s->totals.cycles += cum_ms->cycles; + s->totals.busy_cycles += cum_ms->busy_cycles; + s->totals.pkts += cum_ms->pkts; + s->totals.upcalls += cum_ms->upcalls; + s->totals.upcall_cycles += cum_ms->upcall_cycles; + s->totals.batches += cum_ms->batches; + s->totals.max_vhost_qfill += cum_ms->max_vhost_qfill; + cum_ms = history_next(&s->milliseconds); + cum_ms->timestamp = now; + } + s->last_tsc = now_tsc; + } +} + +#ifdef __cplusplus } +#endif #ifdef __cplusplus } diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 8b9698a..8f64df3 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -53,6 +53,7 @@ #include "id-pool.h" #include "latch.h" #include "netdev.h" +#include "netdev-provider.h" #include "netdev-vport.h" #include "netlink.h" #include "odp-execute.h" @@ -278,6 +279,8 @@ struct dp_netdev { /* Probability of EMC insertions is a factor of 'emc_insert_min'.*/ OVS_ALIGNED_VAR(CACHE_LINE_SIZE) atomic_uint32_t emc_insert_min; + /* Enable collection of PMD performance metrics. */ + ATOMIC(bool) pmd_perf_metrics; /* Protects access to ofproto-dpif-upcall interface during revalidator * thread synchronization. */ @@ -703,6 +706,7 @@ static inline bool emc_entry_alive(struct emc_entry *ce); static void emc_clear_entry(struct emc_entry *ce); static void dp_netdev_request_reconfigure(struct dp_netdev *dp); +static inline bool pmd_perf_metrics_enabled(const struct dp_netdev_pmd_thread *pmd); static void emc_cache_init(struct emc_cache *flow_cache) @@ -786,7 +790,8 @@ get_dp_netdev(const struct dpif *dpif) enum pmd_info_type { PMD_INFO_SHOW_STATS, /* Show how cpu cycles are spent. */ PMD_INFO_CLEAR_STATS, /* Set the cycles count to 0. */ - PMD_INFO_SHOW_RXQ /* Show poll-lists of pmd threads. */ + PMD_INFO_SHOW_RXQ, /* Show poll-lists of pmd threads. */ + PMD_INFO_PERF_SHOW, /* Show pmd performance details. 
*/ }; static void @@ -873,8 +878,46 @@ pmd_info_show_stats(struct ds *reply, ds_put_format(reply, "\tavg processing cycles per packet: " "%.02f (%"PRIu64"/%"PRIu64")\n", - stats[PMD_CYCLES_POLL_BUSY] / (double)total_packets, - stats[PMD_CYCLES_POLL_BUSY], total_packets); + stats[PMD_CYCLES_ITER_BUSY] / (double)total_packets, + stats[PMD_CYCLES_ITER_BUSY], total_packets); +} + +static void +pmd_info_show_perf(struct ds *reply, + struct dp_netdev_pmd_thread *pmd, + struct pmd_perf_params *par) +{ + if (pmd->core_id != NON_PMD_CORE_ID) { + char *time_str = + xastrftime_msec("%H:%M:%S.###", time_wall_msec(), true); + long long now = time_msec(); + double duration = (now - pmd->perf_stats.start_ms) / 1000.0; + + ds_put_cstr(reply, "\n"); + ds_put_format(reply, "Time: %s\n", time_str); + ds_put_format(reply, "Measurement duration: %.3f s\n", duration); + ds_put_cstr(reply, "\n"); + format_pmd_thread(reply, pmd); + ds_put_cstr(reply, "\n"); + pmd_perf_format_overall_stats(reply, &pmd->perf_stats, duration); + if (pmd_perf_metrics_enabled(pmd)) { + if (par->histograms) { + ds_put_cstr(reply, "\n"); + pmd_perf_format_histograms(reply, &pmd->perf_stats); + } + if (par->iter_hist_len > 0) { + ds_put_cstr(reply, "\n"); + pmd_perf_format_iteration_history(reply, &pmd->perf_stats, + par->iter_hist_len); + } + if (par->ms_hist_len > 0) { + ds_put_cstr(reply, "\n"); + pmd_perf_format_ms_history(reply, &pmd->perf_stats, + par->ms_hist_len); + } + } + free(time_str); + } } static int @@ -1079,6 +1122,8 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], pmd_perf_stats_clear(&pmd->perf_stats); } else if (type == PMD_INFO_SHOW_STATS) { pmd_info_show_stats(&reply, pmd); + } else if (type == PMD_INFO_PERF_SHOW) { + pmd_info_show_perf(&reply, pmd, (struct pmd_perf_params *)aux); } } free(pmd_list); @@ -1088,6 +1133,48 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], unixctl_command_reply(conn, ds_cstr(&reply)); 
ds_destroy(&reply); } + +static void +pmd_perf_show_cmd(struct unixctl_conn *conn, int argc, + const char *argv[], + void *aux OVS_UNUSED) +{ + struct pmd_perf_params par; + long int it_hist = 0, ms_hist = 0; + par.histograms = true; + + while (argc > 1) { + if (!strcmp(argv[1], "-nh")) { + par.histograms = false; + argc -= 1; + argv += 1; + } else if (!strcmp(argv[1], "-it") && argc > 2) { + it_hist = strtol(argv[2], NULL, 10); + if (it_hist < 0) { + it_hist = 0; + } else if (it_hist > HISTORY_LEN) { + it_hist = HISTORY_LEN; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-ms") && argc > 2) { + ms_hist = strtol(argv[2], NULL, 10); + if (ms_hist < 0) { + ms_hist = 0; + } else if (ms_hist > HISTORY_LEN) { + ms_hist = HISTORY_LEN; + } + argc -= 2; + argv += 2; + } else { + break; + } + } + par.iter_hist_len = it_hist; + par.ms_hist_len = ms_hist; + par.command_type = PMD_INFO_PERF_SHOW; + dpif_netdev_pmd_info(conn, argc, argv, &par); +} static int dpif_netdev_init(void) @@ -1105,6 +1192,12 @@ dpif_netdev_init(void) unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core | dp]", 0, 2, dpif_netdev_pmd_info, (void *)&poll_aux); + unixctl_command_register("dpif-netdev/pmd-perf-show", + "[-nh] [-it iter-history-len]" + " [-ms ms-history-len]" + " [-pmd core | dp]", + 0, 7, pmd_perf_show_cmd, + NULL); unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", 0, 1, dpif_netdev_pmd_rebalance, NULL); @@ -2980,6 +3073,18 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config) } } + bool perf_enabled = smap_get_bool(other_config, "pmd-perf-metrics", false); + bool cur_perf_enabled; + atomic_read_relaxed(&dp->pmd_perf_metrics, &cur_perf_enabled); + if (perf_enabled != cur_perf_enabled) { + atomic_store_relaxed(&dp->pmd_perf_metrics, perf_enabled); + if (perf_enabled) { + VLOG_INFO("PMD performance metrics collection enabled"); + } else { + VLOG_INFO("PMD performance metrics collection disabled"); + } + } + return 0; } @@ 
-3160,6 +3265,14 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, } } +static inline bool +pmd_perf_metrics_enabled(const struct dp_netdev_pmd_thread *pmd) +{ + bool pmd_perf_enabled; + atomic_read_relaxed(&pmd->dp->pmd_perf_metrics, &pmd_perf_enabled); + return pmd_perf_enabled; +} + static void dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx, enum rxq_cycles_counter_type type, @@ -3236,6 +3349,7 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, struct netdev_rxq *rx, odp_port_t port_no) { + struct pmd_perf_stats *s = &pmd->perf_stats; struct dp_packet_batch batch; int error; int batch_cnt = 0; @@ -3245,8 +3359,25 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, if (!error) { *recirc_depth_get() = 0; pmd_thread_ctx_time_update(pmd); - batch_cnt = batch.count; + if (pmd_perf_metrics_enabled(pmd)) { + /* Update batch histogram. */ + s->current.batches++; + histogram_add_sample(&s->pkts_per_batch, batch_cnt); + /* Update the maximum Rx queue fill level. */ + uint32_t qfill = batch.qfill; + switch (netdev_dpdk_get_type(netdev_rxq_get_netdev(rx))) { + case DPDK_DEV_VHOST: + if (qfill > s->current.max_vhost_qfill) { + s->current.max_vhost_qfill = qfill; + } + break; + case DPDK_DEV_ETH: + default: + break; + } + } + /* Process packet batch. 
*/ dp_netdev_input(pmd, &batch, port_no); dp_netdev_pmd_flush_output_packets(pmd); } else if (error != EAGAIN && error != EOPNOTSUPP) { @@ -4084,7 +4215,9 @@ reload: cycles_count_start(pmd); for (;;) { uint64_t iter_packets = 0; + pmd_perf_start_iteration(s, pmd->ctx.last_cycles); + for (i = 0; i < poll_cnt; i++) { process_packets = dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx, @@ -4116,7 +4249,9 @@ reload: } cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD); } - pmd_perf_end_iteration(s, pmd->ctx.last_cycles, iter_packets); + + pmd_perf_end_iteration(s, pmd->ctx.last_cycles, iter_packets, + pmd_perf_metrics_enabled(pmd)); } cycles_count_end(pmd, PMD_CYCLES_OVERHEAD); @@ -5012,6 +5147,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, struct match match; ovs_u128 ufid; int error; + uint64_t cycles = cycles_counter(); match.tun_md.valid = false; miniflow_expand(&key->mf, &match.flow); @@ -5065,9 +5201,15 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, ovs_mutex_unlock(&pmd->flow_mutex); emc_probabilistic_insert(pmd, key, netdev_flow); } - /* Only error ENOSPC can reach here. We process the packet but do not - * install a datapath flow. Treat as successful. */ - return 0; + if (pmd_perf_metrics_enabled(pmd)) { + /* Update upcall stats. */ + cycles = cycles_counter() - cycles; + struct pmd_perf_stats *s = &pmd->perf_stats; + s->current.upcalls++; + s->current.upcall_cycles += cycles; + histogram_add_sample(&s->cycles_per_upcall, cycles); + } + return error; } static inline void diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 364f545..d701621 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include @@ -188,11 +189,6 @@ enum { DPDK_RING_SIZE = 256 }; BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE)); enum { DRAIN_TSC = 200000ULL }; -enum dpdk_dev_type { - DPDK_DEV_ETH = 0, - DPDK_DEV_VHOST = 1, -}; - /* Quality of Service */ /* An instance of a QoS configuration. 
Always associated with a particular @@ -835,6 +831,13 @@ netdev_dpdk_cast(const struct netdev *netdev) return CONTAINER_OF(netdev, struct netdev_dpdk, up); } +enum dpdk_dev_type +netdev_dpdk_get_type(const struct netdev *netdev) +{ + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + return dev->type; +} + static struct netdev * netdev_dpdk_alloc(void) { @@ -1644,6 +1647,14 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, return EAGAIN; } + batch->qfill = nb_rx; + + if (OVS_UNLIKELY(nb_rx == NETDEV_MAX_BURST)) { + batch->qfill += rte_vhost_rx_queue_count(netdev_dpdk_get_vid(dev), + qid * VIRTIO_QNUM + + VIRTIO_TXQ); + } + if (policer) { dropped = nb_rx; nb_rx = ingress_policer_run(policer, @@ -1683,6 +1694,8 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch) return EAGAIN; } + batch->qfill = nb_rx; + if (policer) { dropped = nb_rx; nb_rx = ingress_policer_run(policer, diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h index b7d02a7..2b357db 100644 --- a/lib/netdev-dpdk.h +++ b/lib/netdev-dpdk.h @@ -22,11 +22,18 @@ #include "openvswitch/compiler.h" struct dp_packet; +struct netdev; + +enum dpdk_dev_type { + DPDK_DEV_ETH = 0, + DPDK_DEV_VHOST = 1, +}; #ifdef DPDK_NETDEV void netdev_dpdk_register(void); void free_dpdk_buf(struct dp_packet *); +enum dpdk_dev_type netdev_dpdk_get_type(const struct netdev *netdev); #else @@ -41,6 +48,13 @@ free_dpdk_buf(struct dp_packet *buf OVS_UNUSED) /* Nothing */ } +static inline enum dpdk_dev_type +netdev_dpdk_get_type(const struct netdev *netdev OVS_UNUSED) +{ + /* Nothing to do. Return value zero to make compiler happy. 
*/ + return DPDK_DEV_ETH; +} + #endif #endif /* netdev-dpdk.h */ diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c index 43b7b89..3845b12 100644 --- a/ofproto/ofproto-dpif.c +++ b/ofproto/ofproto-dpif.c @@ -5317,7 +5317,8 @@ dpif_show_backer(const struct dpif_backer *backer, struct ds *ds) dpif_get_dp_stats(backer->dpif, &dp_stats); ds_put_format(ds, "%s: hit:%"PRIu64" missed:%"PRIu64"\n", - dpif_name(backer->dpif), dp_stats.n_hit, dp_stats.n_missed); + dpif_name(backer->dpif), + dp_stats.n_hit + dp_stats.n_mask_hit, dp_stats.n_missed); shash_init(&ofproto_shash); ofprotos = get_ofprotos(&ofproto_shash);

From patchwork Thu Jan 4 12:07:46 2018
X-Patchwork-Submitter: Jan Scheurich
X-Patchwork-Id: 855573
X-Patchwork-Delegate: ian.stokes@intel.com
From: Jan Scheurich
To: dev@openvswitch.org
Date: Thu, 4 Jan 2018 13:07:46 +0100
Message-Id: <1515067666-12398-4-git-send-email-jan.scheurich@ericsson.com>
In-Reply-To: <1515067666-12398-1-git-send-email-jan.scheurich@ericsson.com>
References: <1515067666-12398-1-git-send-email-jan.scheurich@ericsson.com>
Subject: [ovs-dev] [PATCH v5 3/3] dpif-netdev: Detection and logging of suspicious PMD iterations

This patch enhances dpif-netdev-perf to detect iterations with suspicious statistics according to the following criteria:

- The iteration lasts longer than US_THR microseconds (default 250). This can be used to capture events where a PMD is blocked or interrupted long enough to risk dropped packets on any of its Rx queues.

- The max vhost qlen exceeds a threshold Q_THR (default 128). This can be used to infer virtio queue overruns and dropped packets inside a VM, which are otherwise not visible in OVS.

Such suspicious iterations can be logged together with their iteration statistics so that they can be correlated with packet drops or other events outside OVS.

A new command is introduced to enable/disable logging at run-time and to adjust the above thresholds for suspicious iterations:

ovs-appctl dpif-netdev/pmd-perf-log-set [on|off] [-b before] [-a after] [-us usec] [-q qlen]

Turn logging on or off at run-time (on|off).
-b before: The number of iterations to log before the suspicious iteration (default 5).
-a after: The number of iterations to log after the suspicious iteration (default 5).
-q qlen: Suspicious vhost queue fill level threshold. Increase this to 512 if the Qemu supports a virtio queue length of 1024 (default 128).
-us usec: Change the duration threshold for a suspicious iteration (default 250 us).

If more than 100 iterations before or after a suspicious iteration have been logged once, OVS falls back to the safe default values (5/5) to prevent the logging itself from causing continuous further logging.
Signed-off-by: Jan Scheurich --- lib/dpif-netdev-perf.c | 142 +++++++++++++++++++++++++++++++++++++++++++++++++ lib/dpif-netdev-perf.h | 32 +++++++++++ lib/dpif-netdev.c | 5 ++ 3 files changed, 179 insertions(+) diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index a66a48c..3fd19b0 100644 --- a/lib/dpif-netdev-perf.c +++ b/lib/dpif-netdev-perf.c @@ -28,6 +28,23 @@ VLOG_DEFINE_THIS_MODULE(pmd_perf); +#define ITER_US_THRESHOLD 250 /* Warning threshold for iteration duration + in microseconds. */ +#define VHOST_QUEUE_FULL 128 /* Size of the virtio TX queue. */ +#define LOG_IT_BEFORE 5 /* Number of iteration to log before + suspicious iteration. */ +#define LOG_IT_AFTER 5 /* Number of iteration to log after + suspicious iteration. */ + +bool log_on = false; +static uint32_t log_it_before = LOG_IT_BEFORE; +static uint32_t log_it_after = LOG_IT_AFTER; +static uint32_t log_us_thr = ITER_US_THRESHOLD; +uint32_t log_q_thr = VHOST_QUEUE_FULL; +uint64_t iter_cycle_threshold; + +static struct vlog_rate_limit latency_rl = VLOG_RATE_LIMIT_INIT(600, 600); + #ifdef DPDK_NETDEV static uint64_t get_tsc_hz(void) @@ -133,6 +150,8 @@ pmd_perf_stats_init(struct pmd_perf_stats *s) { histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000); histogram_walls_set_log(&s->max_vhost_qfill, 0, 512); s->start_ms = time_msec(); + s->log_begin_it = UINT64_MAX; + s->log_end_it = UINT64_MAX; } void @@ -368,6 +387,129 @@ pmd_perf_stats_clear(struct pmd_perf_stats *s) histogram_clear(&s->max_vhost_qfill); history_init(&s->iterations); history_init(&s->milliseconds); + s->log_begin_it = UINT64_MAX; + s->log_end_it = UINT64_MAX; s->start_ms = time_msec(); /* Clearing finished. 
*/ s->milliseconds.sample[0].timestamp = s->start_ms; } + +void +pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s, + uint64_t cycles, + char *reason) +{ + VLOG_WARN_RL(&latency_rl, + "Suspicious iteration (%s): tsc=%"PRIu64 + " duration=%"PRIu64" us\n", + reason, s->current.timestamp, + (1000000L * cycles) / get_tsc_hz()); + if (log_it_before + log_it_after > 0) { + if (s->log_begin_it == UINT64_MAX) { + s->log_begin_it = + (s->iterations.idx + HISTORY_LEN - log_it_before) + % HISTORY_LEN; + } + s->log_end_it = + (s->iterations.idx + log_it_after) % HISTORY_LEN; + } +} + +void +pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s) +{ + struct ds log = DS_EMPTY_INITIALIZER; + pmd_perf_format_iteration_history(&log, s, + (s->log_end_it + HISTORY_LEN - s->log_begin_it) % HISTORY_LEN); + VLOG_WARN_RL(&latency_rl, + "Neighborhood of suspicious iteration:\n" + "%s", ds_cstr(&log)); + ds_destroy(&log); + s->log_end_it = s->log_begin_it = UINT64_MAX; + if (log_it_before > 100 || log_it_after > 100) { + /* Reset to safe default values to avoid disturbance. 
*/ + log_it_before = LOG_IT_BEFORE; + log_it_after = LOG_IT_AFTER; + } +} + +void +pmd_perf_log_set_cmd(struct unixctl_conn *conn, + int argc, const char *argv[], + void *aux OVS_UNUSED) +{ + int it_before, it_after, us_thr, q_thr; + bool on; + bool usage = false; + + on = log_on; + it_before = log_it_before; + it_after = log_it_after; + q_thr = log_q_thr; + us_thr = log_us_thr; + + while (argc > 1) { + if (!strcmp(argv[1], "on")) { + on = true; + argc--; + argv++; + } else if (!strcmp(argv[1], "off")) { + on = false; + argc--; + argv++; + } else if (!strcmp(argv[1], "-a")) { + it_after = strtol(argv[2], NULL, 10); + if (it_after < 0) { + it_after = 0; + } else if (it_before + it_after > HISTORY_LEN-1) { + it_after = HISTORY_LEN-1 - it_before; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-b")) { + it_before = strtol(argv[2], NULL, 10); + if (it_before < 0) { + it_before = 0; + } else if (it_before > HISTORY_LEN-10) { + it_before = HISTORY_LEN-10; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-q")) { + q_thr = strtol(argv[2], NULL, 10); + if (q_thr < 0) { + q_thr = 0; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-us")) { + us_thr = strtol(argv[2], NULL, 10); + if (us_thr < 0) { + us_thr = 0; + } + argc -= 2; + argv += 2; + } else { + usage = true; + break; + } + } + + if (usage) { + unixctl_command_reply_error(conn, + "Usage: ovs-appctl dpif-netdev/pmd-perf-log-set " + "[on|off] [-b before] [-a after] [-us usec] [-q qlen]"); + return; + } + + VLOG_INFO("pmd-perf-log-set: %s, before=%d, after=%d, us_thr=%d," + " q_thr=%d\n", + on ? 
"on" : "off", it_before, it_after, us_thr, q_thr); + log_on = on; + log_it_before = it_before; + log_it_after = it_after; + log_q_thr = q_thr; + log_us_thr = us_thr; + iter_cycle_threshold = (log_us_thr * get_tsc_hz()) / 1000000L; + + unixctl_command_reply(conn, ""); +} diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index 3611723..392ca80 100644 --- a/lib/dpif-netdev-perf.h +++ b/lib/dpif-netdev-perf.h @@ -109,6 +109,8 @@ struct pmd_perf_stats { struct histogram max_vhost_qfill; struct history iterations; struct history milliseconds; + uint64_t log_begin_it; + uint64_t log_end_it; }; enum pmd_info_type; @@ -121,6 +123,10 @@ struct pmd_perf_params { size_t ms_hist_len; }; +extern bool log_on; +extern uint32_t log_q_thr; +extern uint64_t iter_cycle_threshold; + void pmd_perf_stats_init(struct pmd_perf_stats *s); void pmd_perf_stats_clear(struct pmd_perf_stats *s); void pmd_perf_read_counters(struct pmd_perf_stats *s, @@ -145,9 +151,17 @@ void pmd_perf_format_iteration_history(struct ds *str, void pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s, int n_ms); +void pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s, + uint64_t cycles, + char *reason); +void pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s); void pmd_perf_show(struct unixctl_conn *conn, int argc, const char *argv[], void *aux OVS_UNUSED); +void pmd_perf_log_set_cmd(struct unixctl_conn *conn, + int argc, const char *argv[], + void *aux OVS_UNUSED); + static inline void histogram_add_sample(struct histogram *hist, uint32_t val) { @@ -210,6 +224,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, { struct iter_stats *cum_ms; uint64_t cycles, cycles_per_pkt = 0; + char *reason = NULL; if (OVS_UNLIKELY(s->current.timestamp == 0)) { /* Stats were cleared during the ongoing iteration. 
*/ @@ -263,6 +278,23 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, cum_ms->batches += s->current.batches; cum_ms->max_vhost_qfill += s->current.max_vhost_qfill; + if (log_on) { + /* Log suspicious iterations. */ + if (cycles > iter_cycle_threshold) { + reason = "Excessive total cycles"; + } else if (s->current.max_vhost_qfill >= log_q_thr) { + reason = "Vhost RX queue full"; + } + if (OVS_UNLIKELY(reason)) { + pmd_perf_log_suspicious_iteration(s, cycles, reason); + } + + /* Log iteration stats interval when required. */ + if (OVS_UNLIKELY(s->log_end_it == s->iterations.idx)) { + pmd_perf_log_iteration_neighborhood(s); + } + } + /* Store in iteration history. */ history_store(&s->iterations, &s->current); if (now_tsc - s->last_tsc > 10000) { diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 8f64df3..96cc4d5 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -1201,6 +1201,11 @@ dpif_netdev_init(void) unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", 0, 1, dpif_netdev_pmd_rebalance, NULL); + unixctl_command_register("dpif-netdev/pmd-perf-log-set", + "[on|off] [-b before] [-a after] " + "[-us usec] [-q qlen]", + 0, 10, pmd_perf_log_set_cmd, + NULL); return 0; }