From patchwork Wed Oct 18 10:46:16 2017
X-Patchwork-Submitter: Jan Scheurich
X-Patchwork-Id: 827523
X-Patchwork-Delegate: ian.stokes@intel.com
From: Jan Scheurich
To: "'ovs-dev@openvswitch.org'"
Date: Wed, 18 Oct 2017 10:46:16 +0000
Subject: [ovs-dev] [PATCH v2 1/3] dpif-netdev: Refactor PMD performance into

Add module dpif-netdev-perf to host all PMD performance-related data
structures and functions in dpif-netdev. Refactor the PMD stats handling
in dpif-netdev and delegate whatever possible into the new module, using
clean interfaces to shield dpif-netdev from the implementation details.

Include Darrell's prior refactoring of PMD stats contained in
[PATCH v5,2/3] dpif-netdev: Refactor some pmd stats:

1. The cycles per packet counts are now based on packets received
   rather than packet passes through the datapath.

2. Packet counters are now kept for packets received and packets
   recirculated. These are kept as separate counters for
   maintainability reasons. The cost of incrementing these counters is
   negligible. These new counters are also displayed to the user.

3. A display statistic is added for the average number of datapath
   passes per packet. This should be useful for user debugging and
   understanding of packet processing.

4. The user-visible 'miss' counter is used for successful upcalls only,
   rather than the sum of successful and unsuccessful upcalls. Hence,
   this becomes what the user historically understands by an OVS 'miss
   upcall'. The user display is annotated to make this clear as well.

5. The user-visible 'lost' counter remains as failed upcalls, but is
   annotated to make its meaning clear.

6. The enum pmd_stat_type is annotated to make the usage of the stats
   counters clear.

7. The subtable lookup stat is renamed to make it clear that it relates
   to masked lookups.

8. The PMD stats test is updated to handle the new user stats of
   packets received, packets recirculated and the average number of
   datapath passes per packet.

On top of that, introduce a "-pmd <core>" option to the PMD info
commands to filter the output for a single PMD.
Signed-off-by: Jan Scheurich
Signed-off-by: Darrell Ball
---
 lib/automake.mk        |   2 +
 lib/dpif-netdev-perf.c |  67 ++++++++++
 lib/dpif-netdev-perf.h | 123 ++++++++++++++++++
 lib/dpif-netdev.c      | 330 ++++++++++++++++++++-----------------------
 tests/pmd.at           |  22 ++--
 5 files changed, 340 insertions(+), 204 deletions(-)
 create mode 100644 lib/dpif-netdev-perf.c
 create mode 100644 lib/dpif-netdev-perf.h

diff --git a/lib/automake.mk b/lib/automake.mk
index 2415f4c..79199a0 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -80,6 +80,8 @@ lib_libopenvswitch_la_SOURCES = \
 	lib/dpdk.h \
 	lib/dpif-netdev.c \
 	lib/dpif-netdev.h \
+	lib/dpif-netdev-perf.c \
+	lib/dpif-netdev-perf.h \
 	lib/dpif-provider.h \
 	lib/dpif.c \
 	lib/dpif.h \
diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
new file mode 100644
index 0000000..6a8f7c4
--- /dev/null
+++ b/lib/dpif-netdev-perf.c
@@ -0,0 +1,67 @@
+/*
+ * Copyright (c) 2017 Ericsson AB.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include
+
+#ifdef DPDK_NETDEV
+#include
+#endif
+
+#include "openvswitch/dynamic-string.h"
+#include "openvswitch/vlog.h"
+#include "dpif-netdev-perf.h"
+#include "timeval.h"
+
+VLOG_DEFINE_THIS_MODULE(pmd_perf);
+
+void
+pmd_perf_stats_init(struct pmd_perf_stats *s) {
+    memset(s, 0, sizeof(*s));
+    s->start_ms = time_msec();
+}
+
+void
+pmd_perf_read_counters(struct pmd_perf_stats *s,
+                       uint64_t stats[PMD_N_STATS])
+{
+    uint64_t val;
+
+    /* These loops subtracts reference values ('*_zero') from the counters.
+     * Since loads and stores are relaxed, it might be possible for a '*_zero'
+     * value to be more recent than the current value we're reading from the
+     * counter. This is not a big problem, since these numbers are not
+     * supposed to be too accurate, but we should at least make sure that
+     * the result is not negative. */
+    for (int i = 0; i < PMD_N_STATS; i++) {
+        atomic_read_relaxed(&s->counters.n[i], &val);
+        if (val > s->counters.zero[i]) {
+            stats[i] = val - s->counters.zero[i];
+        } else {
+            stats[i] = 0;
+        }
+    }
+}
+
+void
+pmd_perf_stats_clear(struct pmd_perf_stats *s)
+{
+    s->start_ms = 0; /* Marks start of clearing. */
+    for (int i = 0; i < PMD_N_STATS; i++) {
+        atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]);
+    }
+    s->start_ms = time_msec(); /* Clearing finished. */
+}
+
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
new file mode 100644
index 0000000..f55f7a2
--- /dev/null
+++ b/lib/dpif-netdev-perf.h
@@ -0,0 +1,123 @@
+/*
+ * Copyright (c) 2017 Ericsson AB.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef PMD_PERF_H
+#define PMD_PERF_H 1
+
+#include
+#include
+#include
+#include
+#include
+
+#include "openvswitch/vlog.h"
+#include "ovs-atomic.h"
+#include "timeval.h"
+#include "unixctl.h"
+#include "util.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+enum pmd_stat_type {
+    PMD_STAT_EXACT_HIT,     /* Packets that had an exact match (emc). */
+    PMD_STAT_MASKED_HIT,    /* Packets that matched in the flow table. */
+    PMD_STAT_MISS,          /* Packets that did not match and upcall was ok. */
+    PMD_STAT_LOST,          /* Packets that did not match and upcall failed. */
+                            /* The above statistics account for the total
+                             * number of packet passes through the datapath
+                             * pipeline and should not be overlapping with each
+                             * other. */
+    PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table
+                               hits. Each MASKED_HIT hit will have >= 1
+                               MASKED_LOOKUP(s). */
+    PMD_STAT_RECV,          /* Packets entering the datapath pipeline from an
+                             * interface. */
+    PMD_STAT_RECIRC,        /* Packets reentering the datapath pipeline due to
+                             * recirculation. */
+    PMD_CYCLES_POLL_IDLE,   /* Cycles spent unsuccessful polling. */
+    PMD_CYCLES_POLL_BUSY,   /* Cycles spent successfully polling and
+                             * processing polled packets. */
+    PMD_CYCLES_OVERHEAD,    /* Cycles spent for other tasks. */
+    PMD_CYCLES_UPCALL,      /* Cycles spent in upcalls. Included in
+                               PMD_CYCLES_POLL_BUSY. */
+    PMD_CYCLES_ITER_IDLE,   /* Cycles spent in idle iterations. */
+    PMD_CYCLES_ITER_BUSY,   /* Cycles spent in busy iterations. */
+    PMD_N_STATS
+};
+
+struct pmd_counters {
+    atomic_uint64_t n[PMD_N_STATS]; /* Indexed by PMD_STAT_*. */
+    uint64_t zero[PMD_N_STATS];
+};
+
+struct pmd_perf_stats {
+    uint64_t start_ms;
+    uint64_t last_tsc;
+    struct pmd_counters counters;
+};
+
+enum pmd_info_type;
+struct dp_netdev_pmd_thread;
+
+struct pmd_perf_params {
+    int command_type;
+    bool histograms;
+    size_t iter_hist_len;
+    size_t ms_hist_len;
+};
+
+void pmd_perf_stats_init(struct pmd_perf_stats *s);
+void pmd_perf_stats_clear(struct pmd_perf_stats *s);
+void pmd_perf_read_counters(struct pmd_perf_stats *s,
+                            uint64_t stats[PMD_N_STATS]);
+
+static inline void
+pmd_perf_update_counter(struct pmd_perf_stats *s,
+                        enum pmd_stat_type counter, int delta)
+{
+    uint64_t tmp;
+    atomic_read_relaxed(&s->counters.n[counter], &tmp);
+    tmp += delta;
+    atomic_store_relaxed(&s->counters.n[counter], tmp);
+}
+
+static inline void
+pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc)
+{
+    if (OVS_UNLIKELY(s->start_ms == 0)) {
+        /* Stats not yet fully initialised. */
+        return;
+    }
+    s->last_tsc = now_tsc;
+}
+
+static inline void
+pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc,
+                       int packets)
+{
+    uint64_t cycles = now_tsc - s->last_tsc;
+
+    /* No need for atomic updates as only invoked within PMD thread. */
+    if (packets > 0) {
+        s->counters.n[PMD_CYCLES_ITER_BUSY] += cycles;
+    } else {
+        s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles;
+    }
+}
+
+#endif /* pmd-perf.h */
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d5eb830..2347f55 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -43,6 +43,7 @@
 #include "csum.h"
 #include "dp-packet.h"
 #include "dpif.h"
+#include "dpif-netdev-perf.h"
 #include "dpif-provider.h"
 #include "dummy.h"
 #include "fat-rwlock.h"
@@ -330,23 +331,6 @@ static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
                                                     odp_port_t)
     OVS_REQUIRES(dp->port_mutex);
 
-enum dp_stat_type {
-    DP_STAT_EXACT_HIT,      /* Packets that had an exact match (emc). */
-    DP_STAT_MASKED_HIT,     /* Packets that matched in the flow table.
*/ - DP_STAT_MISS, /* Packets that did not match. */ - DP_STAT_LOST, /* Packets not passed up to the client. */ - DP_STAT_LOOKUP_HIT, /* Number of subtable lookups for flow table - hits */ - DP_N_STATS -}; - -enum pmd_cycles_counter_type { - PMD_CYCLES_IDLE, /* Cycles spent idle or unsuccessful polling */ - PMD_CYCLES_PROCESSING, /* Cycles spent successfully polling and - * processing polled packets */ - PMD_N_CYCLES -}; - enum rxq_cycles_counter_type { RXQ_CYCLES_PROC_CURR, /* Cycles spent successfully polling and processing packets during the current @@ -496,18 +480,6 @@ struct dp_netdev_actions *dp_netdev_flow_get_actions( const struct dp_netdev_flow *); static void dp_netdev_actions_free(struct dp_netdev_actions *); -/* Contained by struct dp_netdev_pmd_thread's 'stats' member. */ -struct dp_netdev_pmd_stats { - /* Indexed by DP_STAT_*. */ - atomic_ullong n[DP_N_STATS]; -}; - -/* Contained by struct dp_netdev_pmd_thread's 'cycle' member. */ -struct dp_netdev_pmd_cycles { - /* Indexed by PMD_CYCLES_*. */ - atomic_ullong n[PMD_N_CYCLES]; -}; - struct polled_queue { struct dp_netdev_rxq *rxq; odp_port_t port_no; @@ -577,12 +549,6 @@ struct dp_netdev_pmd_thread { are stored for each polled rxq. */ long long int rxq_interval; - /* Statistics. */ - struct dp_netdev_pmd_stats stats; - - /* Cycles counters */ - struct dp_netdev_pmd_cycles cycles; - /* Used to count cicles. See 'cycles_counter_end()' */ unsigned long long last_cycles; @@ -620,12 +586,8 @@ struct dp_netdev_pmd_thread { struct hmap tnl_port_cache; struct hmap send_port_cache; - /* Only a pmd thread can write on its own 'cycles' and 'stats'. - * The main thread keeps 'stats_zero' and 'cycles_zero' as base - * values and subtracts them from 'stats' and 'cycles' before - * reporting to the user */ - unsigned long long stats_zero[DP_N_STATS]; - uint64_t cycles_zero[PMD_N_CYCLES]; + /* Keep track of detailed PMD performance statistics. 
*/ + struct pmd_perf_stats perf_stats; /* Set to true if the pmd thread needs to be reloaded. */ bool need_reload; @@ -791,46 +753,11 @@ enum pmd_info_type { }; static void -pmd_info_show_stats(struct ds *reply, - struct dp_netdev_pmd_thread *pmd, - unsigned long long stats[DP_N_STATS], - uint64_t cycles[PMD_N_CYCLES]) +format_pmd_thread(struct ds *reply, + struct dp_netdev_pmd_thread *pmd) { - unsigned long long total_packets; - uint64_t total_cycles = 0; - int i; - - /* These loops subtracts reference values ('*_zero') from the counters. - * Since loads and stores are relaxed, it might be possible for a '*_zero' - * value to be more recent than the current value we're reading from the - * counter. This is not a big problem, since these numbers are not - * supposed to be too accurate, but we should at least make sure that - * the result is not negative. */ - for (i = 0; i < DP_N_STATS; i++) { - if (stats[i] > pmd->stats_zero[i]) { - stats[i] -= pmd->stats_zero[i]; - } else { - stats[i] = 0; - } - } - - /* Sum of all the matched and not matched packets gives the total. */ - total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT] - + stats[DP_STAT_MISS]; - - for (i = 0; i < PMD_N_CYCLES; i++) { - if (cycles[i] > pmd->cycles_zero[i]) { - cycles[i] -= pmd->cycles_zero[i]; - } else { - cycles[i] = 0; - } - - total_cycles += cycles[i]; - } - ds_put_cstr(reply, (pmd->core_id == NON_PMD_CORE_ID) ? 
"main thread" : "pmd thread"); - if (pmd->numa_id != OVS_NUMA_UNSPEC) { ds_put_format(reply, " numa_id %d", pmd->numa_id); } @@ -838,16 +765,39 @@ pmd_info_show_stats(struct ds *reply, ds_put_format(reply, " core_id %u", pmd->core_id); } ds_put_cstr(reply, ":\n"); +} + +static void +pmd_info_show_stats(struct ds *reply, + struct dp_netdev_pmd_thread *pmd) +{ + uint64_t stats[PMD_N_STATS]; + uint64_t total_cycles = 0; + + pmd_perf_read_counters(&pmd->perf_stats, stats); + total_cycles = stats[PMD_CYCLES_ITER_IDLE] + + stats[PMD_CYCLES_ITER_BUSY]; + + format_pmd_thread(reply, pmd); ds_put_format(reply, - "\temc hits:%llu\n\tmegaflow hits:%llu\n" - "\tavg. subtable lookups per hit:%.2f\n" - "\tmiss:%llu\n\tlost:%llu\n", - stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT], - stats[DP_STAT_MASKED_HIT] > 0 - ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT] - : 0, - stats[DP_STAT_MISS], stats[DP_STAT_LOST]); + "\tpackets received:%"PRIu64"\n" + "\tpacket recirculations:%"PRIu64"\n" + "\tavg. datapath passes per packet:%.2f\n" + "\temc hits:%"PRIu64"\n" + "\tmegaflow hits:%"PRIu64"\n" + "\tavg. subtable lookups per megaflow hit:%.2f\n" + "\tmiss(i.e. lookup miss with success upcall):%"PRIu64"\n" + "\tlost(i.e. lookup miss with failed upcall):%"PRIu64"\n", + stats[PMD_STAT_RECV], stats[PMD_STAT_RECIRC], + stats[PMD_STAT_RECV] > 0 + ? (1.0 * (stats[PMD_STAT_RECV] + stats[PMD_STAT_RECIRC])) + / stats[PMD_STAT_RECV] : 0, + stats[PMD_STAT_EXACT_HIT], stats[PMD_STAT_MASKED_HIT], + stats[PMD_STAT_MASKED_HIT] > 0 + ? 
(1.0 * stats[PMD_STAT_MASKED_LOOKUP]) + / stats[PMD_STAT_MASKED_HIT] + : 0, stats[PMD_STAT_MISS], stats[PMD_STAT_LOST]); if (total_cycles == 0) { return; @@ -856,46 +806,26 @@ pmd_info_show_stats(struct ds *reply, ds_put_format(reply, "\tidle cycles:%"PRIu64" (%.02f%%)\n" "\tprocessing cycles:%"PRIu64" (%.02f%%)\n", - cycles[PMD_CYCLES_IDLE], - cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100, - cycles[PMD_CYCLES_PROCESSING], - cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100); + stats[PMD_CYCLES_ITER_IDLE], + stats[PMD_CYCLES_ITER_IDLE] / (double)total_cycles * 100, + stats[PMD_CYCLES_ITER_BUSY], + stats[PMD_CYCLES_ITER_BUSY] / (double)total_cycles * 100); + uint64_t total_packets = stats[PMD_STAT_RECV]; if (total_packets == 0) { return; } ds_put_format(reply, - "\tavg cycles per packet: %.02f (%"PRIu64"/%llu)\n", + "\tavg cycles per packet: %.02f (%"PRIu64"/%"PRIu64")\n", total_cycles / (double)total_packets, total_cycles, total_packets); ds_put_format(reply, "\tavg processing cycles per packet: " - "%.02f (%"PRIu64"/%llu)\n", - cycles[PMD_CYCLES_PROCESSING] / (double)total_packets, - cycles[PMD_CYCLES_PROCESSING], total_packets); -} - -static void -pmd_info_clear_stats(struct ds *reply OVS_UNUSED, - struct dp_netdev_pmd_thread *pmd, - unsigned long long stats[DP_N_STATS], - uint64_t cycles[PMD_N_CYCLES]) -{ - int i; - - /* We cannot write 'stats' and 'cycles' (because they're written by other - * threads) and we shouldn't change 'stats' (because they're used to count - * datapath stats, which must not be cleared here). 
Instead, we save the - * current values and subtract them from the values to be displayed in the - * future */ - for (i = 0; i < DP_N_STATS; i++) { - pmd->stats_zero[i] = stats[i]; - } - for (i = 0; i < PMD_N_CYCLES; i++) { - pmd->cycles_zero[i] = cycles[i]; - } + "%.02f (%"PRIu64"/%"PRIu64")\n", + stats[PMD_CYCLES_POLL_BUSY] / (double)total_packets, + stats[PMD_CYCLES_POLL_BUSY], total_packets); } static int @@ -1055,23 +985,34 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], struct ds reply = DS_EMPTY_INITIALIZER; struct dp_netdev_pmd_thread **pmd_list; struct dp_netdev *dp = NULL; - size_t n; enum pmd_info_type type = *(enum pmd_info_type *) aux; + int core_id = -1; + size_t n; ovs_mutex_lock(&dp_netdev_mutex); - if (argc == 2) { - dp = shash_find_data(&dp_netdevs, argv[1]); - } else if (shash_count(&dp_netdevs) == 1) { - /* There's only one datapath */ - dp = shash_first(&dp_netdevs)->data; + while (argc > 1) { + if (!strcmp(argv[1], "-pmd")) { + core_id = strtol(argv[2], NULL, 10); + argc -= 2; + argv += 2; + } else { + dp = shash_find_data(&dp_netdevs, argv[1]); + argc -= 1; + argv += 1; + } } if (!dp) { - ovs_mutex_unlock(&dp_netdev_mutex); - unixctl_command_reply_error(conn, - "please specify an existing datapath"); - return; + if (shash_count(&dp_netdevs) == 1) { + /* There's only one datapath */ + dp = shash_first(&dp_netdevs)->data; + } else { + ovs_mutex_unlock(&dp_netdev_mutex); + unixctl_command_reply_error(conn, + "please specify an existing datapath"); + return; + } } sorted_poll_thread_list(dp, &pmd_list, &n); @@ -1080,26 +1021,15 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], if (!pmd) { break; } - + if (core_id != -1 && pmd->core_id != core_id) { + continue; + } if (type == PMD_INFO_SHOW_RXQ) { pmd_info_show_rxq(&reply, pmd); - } else { - unsigned long long stats[DP_N_STATS]; - uint64_t cycles[PMD_N_CYCLES]; - - /* Read current stats and cycle counters */ - for (size_t j = 0; j < 
ARRAY_SIZE(stats); j++) { - atomic_read_relaxed(&pmd->stats.n[j], &stats[j]); - } - for (size_t j = 0; j < ARRAY_SIZE(cycles); j++) { - atomic_read_relaxed(&pmd->cycles.n[j], &cycles[j]); - } - - if (type == PMD_INFO_CLEAR_STATS) { - pmd_info_clear_stats(&reply, pmd, stats, cycles); - } else if (type == PMD_INFO_SHOW_STATS) { - pmd_info_show_stats(&reply, pmd, stats, cycles); - } + } else if (type == PMD_INFO_CLEAR_STATS) { + pmd_perf_stats_clear(&pmd->perf_stats); + } else if (type == PMD_INFO_SHOW_STATS) { + pmd_info_show_stats(&reply, pmd); } } free(pmd_list); @@ -1117,14 +1047,14 @@ dpif_netdev_init(void) clear_aux = PMD_INFO_CLEAR_STATS, poll_aux = PMD_INFO_SHOW_RXQ; - unixctl_command_register("dpif-netdev/pmd-stats-show", "[dp]", - 0, 1, dpif_netdev_pmd_info, + unixctl_command_register("dpif-netdev/pmd-stats-show", "[-pmd core] [dp]", + 0, 2, dpif_netdev_pmd_info, (void *)&show_aux); - unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]", - 0, 1, dpif_netdev_pmd_info, + unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core] [dp]", + 0, 2, dpif_netdev_pmd_info, (void *)&clear_aux); - unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]", - 0, 1, dpif_netdev_pmd_info, + unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] [dp]", + 0, 2, dpif_netdev_pmd_info, (void *)&poll_aux); unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", 0, 1, dpif_netdev_pmd_rebalance, @@ -1460,23 +1390,19 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats) { struct dp_netdev *dp = get_dp_netdev(dpif); struct dp_netdev_pmd_thread *pmd; + uint64_t pmd_stats[PMD_N_STATS]; - stats->n_flows = stats->n_hit = stats->n_missed = stats->n_lost = 0; + stats->n_flows = stats->n_hit = + stats->n_mask_hit = stats->n_missed = stats->n_lost = 0; CMAP_FOR_EACH (pmd, node, &dp->poll_threads) { - unsigned long long n; stats->n_flows += cmap_count(&pmd->flow_table); - - 
atomic_read_relaxed(&pmd->stats.n[DP_STAT_MASKED_HIT], &n); - stats->n_hit += n; - atomic_read_relaxed(&pmd->stats.n[DP_STAT_EXACT_HIT], &n); - stats->n_hit += n; - atomic_read_relaxed(&pmd->stats.n[DP_STAT_MISS], &n); - stats->n_missed += n; - atomic_read_relaxed(&pmd->stats.n[DP_STAT_LOST], &n); - stats->n_lost += n; + pmd_perf_read_counters(&pmd->perf_stats, pmd_stats); + stats->n_hit += pmd_stats[PMD_STAT_EXACT_HIT]; + stats->n_mask_hit += pmd_stats[PMD_STAT_MASKED_HIT]; + stats->n_missed += pmd_stats[PMD_STAT_MISS]; + stats->n_lost += pmd_stats[PMD_STAT_LOST]; } stats->n_masks = UINT32_MAX; - stats->n_mask_hit = UINT64_MAX; return 0; } @@ -3152,28 +3078,28 @@ cycles_count_start(struct dp_netdev_pmd_thread *pmd) /* Stop counting cycles and add them to the counter 'type' */ static inline void cycles_count_end(struct dp_netdev_pmd_thread *pmd, - enum pmd_cycles_counter_type type) + enum pmd_stat_type type) OVS_RELEASES(&cycles_counter_fake_mutex) OVS_NO_THREAD_SAFETY_ANALYSIS { unsigned long long interval = cycles_counter() - pmd->last_cycles; - non_atomic_ullong_add(&pmd->cycles.n[type], interval); + pmd_perf_update_counter(&pmd->perf_stats, type, interval); } /* Calculate the intermediate cycle result and add to the counter 'type' */ static inline void cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_rxq *rxq, - enum pmd_cycles_counter_type type) + enum pmd_stat_type type) OVS_NO_THREAD_SAFETY_ANALYSIS { unsigned long long new_cycles = cycles_counter(); unsigned long long interval = new_cycles - pmd->last_cycles; pmd->last_cycles = new_cycles; - non_atomic_ullong_add(&pmd->cycles.n[type], interval); - if (rxq && (type == PMD_CYCLES_PROCESSING)) { + pmd_perf_update_counter(&pmd->perf_stats, type, interval); + if (rxq && (type == PMD_CYCLES_POLL_BUSY)) { /* Add to the amount of current processing cycles. 
*/ non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval); } @@ -3865,12 +3791,12 @@ dpif_netdev_run(struct dpif *dpif) port->port_no); cycles_count_intermediate(non_pmd, NULL, process_packets - ? PMD_CYCLES_PROCESSING - : PMD_CYCLES_IDLE); + ? PMD_CYCLES_POLL_BUSY + : PMD_CYCLES_POLL_IDLE); } } } - cycles_count_end(non_pmd, PMD_CYCLES_IDLE); + cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE); dpif_netdev_xps_revalidate_pmd(non_pmd, time_msec(), false); ovs_mutex_unlock(&dp->non_pmd_mutex); @@ -4014,6 +3940,7 @@ static void * pmd_thread_main(void *f_) { struct dp_netdev_pmd_thread *pmd = f_; + struct pmd_perf_stats *s = &pmd->perf_stats; unsigned int lc = 0; struct polled_queue *poll_list; bool exiting; @@ -4049,13 +3976,17 @@ reload: cycles_count_start(pmd); for (;;) { + uint64_t iter_packets = 0; + pmd_perf_start_iteration(s, pmd->last_cycles); for (i = 0; i < poll_cnt; i++) { process_packets = dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx, poll_list[i].port_no); cycles_count_intermediate(pmd, poll_list[i].rxq, - process_packets ? PMD_CYCLES_PROCESSING - : PMD_CYCLES_IDLE); + process_packets + ? 
PMD_CYCLES_POLL_BUSY + : PMD_CYCLES_POLL_IDLE); + iter_packets += process_packets; } if (lc++ > 1024) { @@ -4073,10 +4004,12 @@ reload: if (reload) { break; } + cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD); } + pmd_perf_end_iteration(s, pmd->last_cycles, iter_packets); } - cycles_count_end(pmd, PMD_CYCLES_IDLE); + cycles_count_end(pmd, PMD_CYCLES_OVERHEAD); poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list); exiting = latch_is_set(&pmd->exit_latch); @@ -4529,6 +4462,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp, } cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node), hash_int(core_id, 0)); + pmd_perf_stats_init(&pmd->perf_stats); } static void @@ -4726,13 +4660,6 @@ dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, int cnt, int size, atomic_store_relaxed(&netdev_flow->stats.tcp_flags, flags); } -static void -dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd, - enum dp_stat_type type, int cnt) -{ - non_atomic_ullong_add(&pmd->stats.n[type], cnt); -} - static int dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_, struct flow *flow, struct flow_wildcards *wc, ovs_u128 *ufid, @@ -4906,6 +4833,9 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, int i; atomic_read_relaxed(&pmd->dp->emc_insert_min, &cur_min); + pmd_perf_update_counter(&pmd->perf_stats, + md_is_valid ? 
PMD_STAT_RECIRC : PMD_STAT_RECV, + cnt); DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) { struct dp_netdev_flow *flow; @@ -4954,24 +4884,24 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, } } - dp_netdev_count_packet(pmd, DP_STAT_EXACT_HIT, - cnt - n_dropped - n_missed); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT, + cnt - n_dropped - n_missed); return dp_packet_batch_size(packets_); } -static inline void +static inline int handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet, const struct netdev_flow_key *key, struct ofpbuf *actions, struct ofpbuf *put_actions, - int *lost_cnt, long long now) + long long now) { struct ofpbuf *add_actions; struct dp_packet_batch b; struct match match; ovs_u128 ufid; - int error; + int error = 0; match.tun_md.valid = false; miniflow_expand(&key->mf, &match.flow); @@ -4985,8 +4915,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, put_actions); if (OVS_UNLIKELY(error && error != ENOSPC)) { dp_packet_delete(packet); - (*lost_cnt)++; - return; + return error; } /* The Netlink encoding of datapath flow keys cannot express @@ -5026,6 +4955,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, ovs_mutex_unlock(&pmd->flow_mutex); emc_probabilistic_insert(pmd, key, netdev_flow); } + return error; } static inline void @@ -5047,7 +4977,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, struct dpcls *cls; struct dpcls_rule *rules[PKT_ARRAY_SIZE]; struct dp_netdev *dp = pmd->dp; - int miss_cnt = 0, lost_cnt = 0; + int upcall_ok_cnt = 0, upcall_fail_cnt = 0; int lookup_cnt = 0, add_lookup_cnt; bool any_miss; size_t i; @@ -5089,9 +5019,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, continue; } - miss_cnt++; - handle_packet_upcall(pmd, packet, &keys[i], &actions, - &put_actions, &lost_cnt, now); + int error = handle_packet_upcall(pmd, packet, &keys[i], + &actions, &put_actions, now); + + if (OVS_UNLIKELY(error)) { + upcall_fail_cnt++; + } else { + 
upcall_ok_cnt++; + } } ofpbuf_uninit(&actions); @@ -5101,8 +5036,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, DP_PACKET_BATCH_FOR_EACH (packet, packets_) { if (OVS_UNLIKELY(!rules[i])) { dp_packet_delete(packet); - lost_cnt++; - miss_cnt++; + upcall_fail_cnt++; } } } @@ -5120,10 +5054,14 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd, dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches); } - dp_netdev_count_packet(pmd, DP_STAT_MASKED_HIT, cnt - miss_cnt); - dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt); - dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt); - dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT, + cnt - upcall_ok_cnt - upcall_fail_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_LOOKUP, + lookup_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MISS, + upcall_ok_cnt); + pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST, + upcall_fail_cnt); } /* Packets enter the datapath from a port (or from recirculation) here. diff --git a/tests/pmd.at b/tests/pmd.at index e39a23a..0356f87 100644 --- a/tests/pmd.at +++ b/tests/pmd.at @@ -170,13 +170,16 @@ dummy@ovs-dummy: hit:0 missed:0 p0 7/1: (dummy-pmd: configured_rx_queues=4, configured_tx_queues=, requested_rx_queues=4, requested_tx_queues=) ]) -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl pmd thread numa_id core_id : + packets received:0 + packet recirculations:0 + avg. datapath passes per packet:0.00 emc hits:0 megaflow hits:0 - avg. subtable lookups per hit:0.00 - miss:0 - lost:0 + avg. subtable lookups per megaflow hit:0.00 + miss(i.e. lookup miss with success upcall):0 + lost(i.e. 
lookup miss with failed upcall):0 ]) ovs-appctl time/stop @@ -197,13 +200,16 @@ AT_CHECK([cat ovs-vswitchd.log | filter_flow_install | strip_xout], [0], [dnl recirc_id(0),in_port(1),packet_type(ns=0,id=0),eth(src=50:54:00:00:00:77,dst=50:54:00:00:01:78),eth_type(0x0800),ipv4(frag=no), actions: ]) -AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 5], [0], [dnl +AT_CHECK([ovs-appctl dpif-netdev/pmd-stats-show | sed SED_NUMA_CORE_PATTERN | sed '/cycles/d' | grep pmd -A 8], [0], [dnl pmd thread numa_id core_id : + packets received:20 + packet recirculations:0 + avg. datapath passes per packet:1.00 emc hits:19 megaflow hits:0 - avg. subtable lookups per hit:0.00 - miss:1 - lost:0 + avg. subtable lookups per megaflow hit:0.00 + miss(i.e. lookup miss with success upcall):1 + lost(i.e. lookup miss with failed upcall):0 ]) OVS_VSWITCHD_STOP

From patchwork Wed Oct 18 10:46:20 2017
X-Patchwork-Submitter: Jan Scheurich
X-Patchwork-Id: 827524
X-Patchwork-Delegate: ian.stokes@intel.com
From: Jan Scheurich
To: "ovs-dev@openvswitch.org"
Date: Wed, 18 Oct 2017 10:46:20 +0000
Subject: [ovs-dev] [PATCH v2 2/3] dpif-netdev: Detailed performance stats for PMDs

This patch instruments the dpif-netdev datapath to record detailed statistics of what is happening in every iteration of a PMD thread.

The covered metrics per iteration are:
- cycles
- packets
- (rx) batches
- packets/batch
- max. vhostuser qlen
- upcalls
- cycles spent in upcalls

This raw recorded data is used threefold:

1. In histograms for each of the following metrics:
   - cycles/iteration (log.)
   - packets/iteration (log.)
   - cycles/packet
   - packets/batch
   - max. vhostuser qlen (log.)
   - upcalls
   - cycles/upcall (log.)
   The histogram bins are spaced either linearly or logarithmically.

2. A cyclic history of the above statistics for the last 1000 iterations.

3. A cyclic history of the cumulative/average values per millisecond wall clock for the last 1000 milliseconds:
   - number of iterations
   - avg. cycles/iteration
   - packets (Kpps)
   - avg. packets/batch
   - avg. max vhost qlen
   - upcalls
   - avg. cycles/upcall

The gathered performance statistics can be printed at any time with the new CLI command

ovs-appctl dpif-netdev/pmd-perf-show [-nh] [-it iter_len] [-ms ms_len] [-pmd core] [dp]

The options are
-nh: Suppress the histograms
-it iter_len: Display the last iter_len iteration stats
-ms ms_len: Display the last ms_len millisecond stats
-pmd core: Display only the PMD on the given core

The performance statistics are reset with the existing dpif-netdev/pmd-stats-clear command.
The output always contains the following global PMD statistics, similar to the pmd-stats-show command: Time: 15:24:55.270 Measurement duration: 1.008 s pmd thread numa_id 0 core_id 1: Cycles: 2419034712 (2.40 GHz) Iterations: 572817 (1.76 us/it) - idle: 486808 (15.9 % cycles) - busy: 86009 (84.1 % cycles) Packets: 2399607 (2381 Kpps, 848 cycles/pkt) Datapath passes: 3599415 (1.50 passes/pkt) - EMC hits: 336472 ( 9.3 %) - Megaflow hits: 3262943 (90.7 %, 1.00 subtbl lookups/hit) - Upcalls: 0 ( 0.0 %, 0.0 us/upcall) - Lost upcalls: 0 ( 0.0 %) The other output will be documented separately. Signed-off-by: Jan Scheurich --- lib/dp-packet.h | 2 + lib/dpif-netdev-perf.c | 306 ++++++++++++++++++++++++++++++++++++++++++++++++- lib/dpif-netdev-perf.h | 166 ++++++++++++++++++++++++++- lib/dpif-netdev.c | 115 ++++++++++++++++++- lib/netdev-dpdk.c | 28 ++++- lib/netdev-dpdk.h | 14 +++ ofproto/ofproto-dpif.c | 3 +- 7 files changed, 623 insertions(+), 11 deletions(-) diff --git a/lib/dp-packet.h b/lib/dp-packet.h index b4b721c..7950247 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -695,8 +695,10 @@ enum { NETDEV_MAX_BURST = 32 }; /* Maximum number packets in a batch. */ struct dp_packet_batch { size_t count; + size_t qfill; /* Number of packets in queue when reading rx burst. */ bool trunc; /* true if the batch needs truncate. 
*/ struct dp_packet *packets[NETDEV_MAX_BURST]; + }; static inline void diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index 6a8f7c4..42d93a2 100644 --- a/lib/dpif-netdev-perf.c +++ b/lib/dpif-netdev-perf.c @@ -15,6 +15,7 @@ */ #include +#include #ifdef DPDK_NETDEV #include @@ -27,13 +28,304 @@ VLOG_DEFINE_THIS_MODULE(pmd_perf); +#ifdef DPDK_NETDEV +static uint64_t +get_tsc_hz(void) +{ + return rte_get_tsc_hz(); +} +#else +static uint64_t +read_tsc(void) +{ + register uint64_t tsc asm("eax"); + asm volatile (".byte 15, 49" : : : "eax", "edx"); + return tsc; +} + +static uint64_t +get_tsc_hz(void) +{ + uint64_t tsc_start = read_tsc(); + xsleep(1); + return read_tsc() - tsc_start; +} +#endif + +static void +histogram_walls_set_lin(struct histogram *hist, uint32_t min, uint32_t max) +{ + int i; + + ovs_assert(min < max); + for (i = 0; i < NUM_BINS-1; i++) { + hist->wall[i] = min + (i * (max - min)) / (NUM_BINS - 2); + } + hist->wall[NUM_BINS-1] = UINT32_MAX; +} + +static void +histogram_walls_set_log(struct histogram *hist, uint32_t min, uint32_t max) +{ + int i, start, bins, wall; + double log_min, log_max; + + ovs_assert(min < max); + if (min > 0) { + log_min = log(min); + log_max = log(max); + start = 0; + bins = NUM_BINS - 1; + } else { + hist->wall[0] = 0; + log_min = log(1); + log_max = log(max); + start = 1; + bins = NUM_BINS - 2; + } + wall = start; + for (i = 0; i < bins; i++) { + /* Make sure each wall is monotonically increasing. 
*/ + wall = MAX(wall, exp(log_min + (i * (log_max - log_min)) / (bins-1))); + hist->wall[start + i] = wall++; + } + if (hist->wall[NUM_BINS-2] < max) { + hist->wall[NUM_BINS-2] = max; + } + hist->wall[NUM_BINS-1] = UINT32_MAX; +} + +uint64_t +histogram_samples(const struct histogram *hist) +{ + uint64_t samples = 0; + + for (int i = 0; i < NUM_BINS; i++) { + samples += hist->bin[i]; + } + return samples; +} + +static void +histogram_clear(struct histogram *hist) +{ + int i; + + for (i = 0; i < NUM_BINS; i++) { + hist->bin[i] = 0; + } +} + +static void +history_init(struct history *h) +{ + memset(h, 0, sizeof(*h)); +} + void pmd_perf_stats_init(struct pmd_perf_stats *s) { - memset(s, 0 , sizeof(*s)); + /* The struct has been zeroed at allocation. */ + histogram_walls_set_log(&s->cycles, 500, 24000000); + histogram_walls_set_log(&s->pkts, 0, 1000); + histogram_walls_set_lin(&s->cycles_per_pkt, 100, 30000); + histogram_walls_set_lin(&s->pkts_per_batch, 0, 32); + histogram_walls_set_lin(&s->upcalls, 0, 30); + histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000); + histogram_walls_set_log(&s->max_vhost_qfill, 0, 512); s->start_ms = time_msec(); } void +pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s, + double duration) +{ + uint64_t stats[PMD_N_STATS]; + double us_per_cycle = 1000000.0 / get_tsc_hz(); + + if (duration == 0) { + return; + } + + pmd_perf_read_counters(s, stats); + uint64_t tot_cycles = stats[PMD_CYCLES_ITER_IDLE] + + stats[PMD_CYCLES_ITER_BUSY]; + uint64_t packets = stats[PMD_STAT_RECV]; + uint64_t passes = stats[PMD_STAT_RECV] + + stats[PMD_STAT_RECIRC]; + uint64_t upcalls = stats[PMD_STAT_MISS]; + uint64_t upcall_cycles = stats[PMD_CYCLES_UPCALL]; + uint64_t tot_iter = histogram_samples(&s->pkts); + uint64_t idle_iter = s->pkts.bin[0]; + + ds_put_format(str, + " Cycles: %12"PRIu64" (%.2f GHz)\n" + " Iterations: %12"PRIu64" (%.2f us/it)\n" + " - idle: %12"PRIu64" (%4.1f %% cycles)\n" + " - busy: %12"PRIu64" (%4.1f %% 
cycles)\n", + tot_cycles, (tot_cycles / duration) / 1E9, + tot_iter, tot_cycles * us_per_cycle / tot_iter, + idle_iter, + 100.0 * stats[PMD_CYCLES_ITER_IDLE] / tot_cycles, + tot_iter - idle_iter, + 100.0 * stats[PMD_CYCLES_ITER_BUSY] / tot_cycles); + if (packets > 0) { + ds_put_format(str, + " Packets: %12"PRIu64" (%.0f Kpps, %.0f cycles/pkt)\n" + " Datapath passes: %12"PRIu64" (%.2f passes/pkt)\n" + " - EMC hits: %12"PRIu64" (%4.1f %%)\n" + " - Megaflow hits: %12"PRIu64" (%4.1f %%, %.2f subtbl lookups/" + "hit)\n" + " - Upcalls: %12"PRIu64" (%4.1f %%, %.1f us/upcall)\n" + " - Lost upcalls: %12"PRIu64" (%4.1f %%)\n" + "\n", + packets, (packets / duration) / 1000, + 1.0 * stats[PMD_CYCLES_ITER_BUSY] / packets, + passes, packets ? 1.0 * passes / packets : 0, + stats[PMD_STAT_EXACT_HIT], + 100.0 * stats[PMD_STAT_EXACT_HIT] / passes, + stats[PMD_STAT_MASKED_HIT], + 100.0 * stats[PMD_STAT_MASKED_HIT] / passes, + 1.0 * stats[PMD_STAT_MASKED_LOOKUP] / stats[PMD_STAT_MASKED_HIT], + upcalls, 100.0 * upcalls / passes, + upcalls ? 
(upcall_cycles * us_per_cycle) / upcalls : 0, + stats[PMD_STAT_LOST], + 100.0 * stats[PMD_STAT_LOST] / passes); + } else { + ds_put_format(str, + " Packets: %12"PRIu64"\n" + "\n", + 0UL); + } +} + +void +pmd_perf_format_histograms(struct ds *str, struct pmd_perf_stats *s) +{ + int i; + + ds_put_cstr(str, "Histograms\n"); + ds_put_format(str, + " %-21s %-21s %-21s %-21s %-21s %-21s %-21s\n", + "cycles/it", "packets/it", "cycles/pkt", "pkts/batch", + "max vhost qlen", "upcalls/it", "cycles/upcall"); + for (i = 0; i < NUM_BINS-1; i++) { + ds_put_format(str, + " %-9d %-11"PRIu64" %-9d %-11"PRIu64" %-9d %-11"PRIu64 + " %-9d %-11"PRIu64" %-9d %-11"PRIu64" %-9d %-11"PRIu64 + " %-9d %-11"PRIu64"\n", + s->cycles.wall[i], s->cycles.bin[i], + s->pkts.wall[i],s->pkts.bin[i], + s->cycles_per_pkt.wall[i], s->cycles_per_pkt.bin[i], + s->pkts_per_batch.wall[i], s->pkts_per_batch.bin[i], + s->max_vhost_qfill.wall[i], s->max_vhost_qfill.bin[i], + s->upcalls.wall[i], s->upcalls.bin[i], + s->cycles_per_upcall.wall[i], s->cycles_per_upcall.bin[i]); + } + ds_put_format(str, + " %-9s %-11"PRIu64" %-9s %-11"PRIu64" %-9s %-11"PRIu64 + " %-9s %-11"PRIu64" %-9s %-11"PRIu64" %-9s %-11"PRIu64 + " %-9s %-11"PRIu64"\n", + ">", s->cycles.bin[i], + ">", s->pkts.bin[i], + ">", s->cycles_per_pkt.bin[i], + ">", s->pkts_per_batch.bin[i], + ">", s->max_vhost_qfill.bin[i], + ">", s->upcalls.bin[i], + ">", s->cycles_per_upcall.bin[i]); + if (s->totals.iterations > 0) { + ds_put_cstr(str, + "-----------------------------------------------------" + "-----------------------------------------------------" + "------------------------------------------------\n"); + ds_put_format(str, + " %-21s %-21s %-21s %-21s %-21s %-21s %-21s\n", + "cycles/it", "packets/it", "cycles/pkt", "pkts/batch", + "vhost qlen", "upcalls/it", "cycles/upcall"); + ds_put_format(str, + " %-21"PRIu64" %-21.5f %-21"PRIu64 + " %-21.5f %-21.5f %-21.5f %-21"PRIu32"\n", + s->totals.cycles / s->totals.iterations, + 1.0 * s->totals.pkts / 
s->totals.iterations, + s->totals.pkts + ? s->totals.busy_cycles / s->totals.pkts : 0, + s->totals.batches + ? 1.0 * s->totals.pkts / s->totals.batches : 0, + 1.0 * s->totals.max_vhost_qfill / s->totals.iterations, + 1.0 * s->totals.upcalls / s->totals.iterations, + s->totals.upcalls + ? s->totals.upcall_cycles / s->totals.upcalls : 0); + } +} + +void +pmd_perf_format_iteration_history(struct ds *str, struct pmd_perf_stats *s, + int n_iter) +{ + struct iter_stats *is; + size_t index; + int i; + + if (n_iter == 0) { + return; + } + ds_put_format(str, " %-17s %-10s %-10s %-10s %-10s " + "%-10s %-10s %-10s\n", + "tsc", "cycles", "packets", "cycles/pkt", "pkts/batch", + "vhost qlen", "upcalls", "cycles/upcall"); + for (i = 1; i <= n_iter; i++) { + index = (s->iterations.idx + HISTORY_LEN - i) % HISTORY_LEN; + is = &s->iterations.sample[index]; + ds_put_format(str, + " %-17"PRIu64" %-11"PRIu64" %-11"PRIu32 + " %-11"PRIu64" %-11"PRIu32" %-11"PRIu32 + " %-11"PRIu32" %-11"PRIu32"\n", + is->timestamp, + is->cycles, + is->pkts, + is->pkts ? is->cycles / is->pkts : 0, + is->batches ? is->pkts / is->batches : 0, + is->max_vhost_qfill, + is->upcalls, + is->upcalls ? is->upcall_cycles / is->upcalls : 0); + } +} + +void +pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s, int n_ms) +{ + struct iter_stats *is; + size_t index; + int i; + + if (n_ms == 0) { + return; + } + ds_put_format(str, + " %-12s %-10s %-10s %-10s %-10s" + " %-10s %-10s %-10s %-10s\n", + "ms", "iterations", "cycles/it", "Kpps", "cycles/pkt", + "pkts/batch", "vhost qlen", "upcalls", "cycles/upcall"); + for (i = 1; i <= n_ms; i++) { + index = (s->milliseconds.idx + HISTORY_LEN - i) % HISTORY_LEN; + is = &s->milliseconds.sample[index]; + ds_put_format(str, + " %-12"PRIu64" %-11"PRIu32" %-11"PRIu64 + " %-11"PRIu32" %-11"PRIu64" %-11"PRIu32 + " %-11"PRIu32" %-11"PRIu32" %-11"PRIu32"\n", + is->timestamp, + is->iterations, + is->iterations ? is->cycles / is->iterations : 0, + is->pkts, + is->pkts ? 
is->busy_cycles / is->pkts : 0, + is->batches ? is->pkts / is->batches : 0, + is->iterations + ? is->max_vhost_qfill / is->iterations : 0, + is->upcalls, + is->upcalls ? is->upcall_cycles / is->upcalls : 0); + } +} + +void pmd_perf_read_counters(struct pmd_perf_stats *s, uint64_t stats[PMD_N_STATS]) { @@ -59,9 +351,21 @@ void pmd_perf_stats_clear(struct pmd_perf_stats *s) { s->start_ms = 0; /* Marks start of clearing. */ + memset(&s->current, 0 , sizeof(struct iter_stats)); + memset(&s->totals, 0 , sizeof(struct iter_stats)); for (int i = 0; i < PMD_N_STATS; i++) { atomic_read_relaxed(&s->counters.n[i], &s->counters.zero[i]); } + histogram_clear(&s->cycles); + histogram_clear(&s->pkts); + histogram_clear(&s->cycles_per_pkt); + histogram_clear(&s->upcalls); + histogram_clear(&s->cycles_per_upcall); + histogram_clear(&s->pkts_per_batch); + histogram_clear(&s->max_vhost_qfill); + history_init(&s->iterations); + history_init(&s->milliseconds); s->start_ms = time_msec(); /* Clearing finished. */ + s->milliseconds.sample[0].timestamp = s->start_ms; } diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index f55f7a2..01fde8d 100644 --- a/lib/dpif-netdev-perf.h +++ b/lib/dpif-netdev-perf.h @@ -33,6 +33,11 @@ extern "C" { #endif +#define NUM_BINS 32 /* Number of histogram bins. */ +#define HISTORY_LEN 1000 /* Length of recorded history + (iterations and ms). */ +#define DEF_HIST_SHOW 20 /* Default number of history samples. */ + enum pmd_stat_type { PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */ PMD_STAT_MASKED_HIT, /* Packets that matched in the flow table. 
*/ @@ -60,6 +65,28 @@ enum pmd_stat_type { PMD_N_STATS }; +struct histogram { + uint32_t wall[NUM_BINS]; + uint64_t bin[NUM_BINS]; +}; + +struct iter_stats { + uint64_t timestamp; + uint64_t cycles; + uint64_t busy_cycles; + uint32_t iterations; + uint32_t pkts; + uint32_t upcalls; + uint32_t upcall_cycles; + uint32_t batches; + uint32_t max_vhost_qfill; +}; + +struct history { + uint64_t idx; + struct iter_stats sample[HISTORY_LEN]; +}; + struct pmd_counters { atomic_uint64_t n[PMD_N_STATS]; /* Indexed by PMD_STAT_*. */ uint64_t zero[PMD_N_STATS]; @@ -69,6 +96,17 @@ struct pmd_perf_stats { uint64_t start_ms; uint64_t last_tsc; struct pmd_counters counters; + struct iter_stats current; + struct iter_stats totals; + struct histogram cycles; + struct histogram pkts; + struct histogram cycles_per_pkt; + struct histogram upcalls; + struct histogram cycles_per_upcall; + struct histogram pkts_per_batch; + struct histogram max_vhost_qfill; + struct history iterations; + struct history milliseconds; }; enum pmd_info_type; @@ -96,6 +134,63 @@ pmd_perf_update_counter(struct pmd_perf_stats *s, atomic_store_relaxed(&s->counters.n[counter], tmp); } +void pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s, + double duration); +void pmd_perf_format_histograms(struct ds *str, struct pmd_perf_stats *s); +void pmd_perf_format_iteration_history(struct ds *str, + struct pmd_perf_stats *s, + int n_iter); +void pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s, + int n_ms); + +void pmd_perf_show(struct unixctl_conn *conn, int argc, + const char *argv[], + void *aux OVS_UNUSED); +static inline void +histogram_add_sample(struct histogram *hist, uint32_t val) +{ + /* TODO: Can do better with binary search? 
*/ + for (int i = 0; i < NUM_BINS-1; i++) { + if (val <= hist->wall[i]) { + hist->bin[i]++; + return; + } + } + hist->bin[NUM_BINS-1]++; +} + +uint64_t histogram_samples(const struct histogram *hist); + +static inline struct iter_stats * +history_current(struct history *h) +{ + return &h->sample[h->idx]; +} + +static inline struct iter_stats * +history_next(struct history *h) +{ + struct iter_stats *next; + + h->idx++; + if (h->idx == HISTORY_LEN) { + h->idx = 0; + } + next = &h->sample[h->idx]; + memset(next, 0, sizeof(*next)); + return next; +} + +static inline struct iter_stats * +history_store(struct history *h, struct iter_stats *is) +{ + if (is) { + h->sample[h->idx] = *is; + } + /* Advance the history pointer */ + return history_next(h); +} + static inline void pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc) { @@ -103,14 +198,25 @@ pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc) /* Stats not yet fully initialised. */ return; } - s->last_tsc = now_tsc; + memset(&s->current, 0, sizeof(struct iter_stats)); + s->current.timestamp = now_tsc; } static inline void pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, int packets) { - uint64_t cycles = now_tsc - s->last_tsc; + struct iter_stats *cum_ms; + uint64_t cycles, cycles_per_pkt = 0; + + if (OVS_UNLIKELY(s->current.timestamp == 0)) { + /* Stats were cleared during the ongoing iteration. */ + return; + } + + cycles = now_tsc - s->current.timestamp; + s->current.cycles = cycles; + s->current.pkts = packets; /* No need for atomic updates as only invoked within PMD thread. */ if (packets > 0) { @@ -118,6 +224,62 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, } else { s->counters.n[PMD_CYCLES_ITER_IDLE] += cycles; } + s->counters.n[PMD_CYCLES_UPCALL] += s->current.upcall_cycles; + + /* Add iteration sample to histograms. 
*/ + histogram_add_sample(&s->cycles, cycles); + histogram_add_sample(&s->pkts, packets); + if (packets > 0) { + cycles_per_pkt = cycles / packets; + histogram_add_sample(&s->cycles_per_pkt, cycles_per_pkt); + } + if (s->current.batches > 0) { + histogram_add_sample(&s->pkts_per_batch, packets / s->current.batches); + } + histogram_add_sample(&s->upcalls, s->current.upcalls); + if (s->current.upcalls > 0) { + histogram_add_sample(&s->cycles_per_upcall, + s->current.upcall_cycles / s->current.upcalls); + } + histogram_add_sample(&s->max_vhost_qfill, s->current.max_vhost_qfill); + + /* Add iteration samples to millisecond stats. */ + cum_ms = history_current(&s->milliseconds); + cum_ms->iterations++; + cum_ms->cycles += cycles; + if (packets > 0) { + cum_ms->busy_cycles += cycles; + } + cum_ms->pkts += s->current.pkts; + cum_ms->upcalls += s->current.upcalls; + cum_ms->upcall_cycles += s->current.upcall_cycles; + cum_ms->batches += s->current.batches; + cum_ms->max_vhost_qfill += s->current.max_vhost_qfill; + + /* Store in iteration history. */ + history_store(&s->iterations, &s->current); + if (now_tsc - s->last_tsc > 10000) { + /* Check if ms is completed and store in milliseconds history. */ + uint64_t now = time_msec(); + if (now != cum_ms->timestamp) { + /* Add ms stats to totals. 
*/ + s->totals.iterations += cum_ms->iterations; + s->totals.cycles += cum_ms->cycles; + s->totals.busy_cycles += cum_ms->busy_cycles; + s->totals.pkts += cum_ms->pkts; + s->totals.upcalls += cum_ms->upcalls; + s->totals.upcall_cycles += cum_ms->upcall_cycles; + s->totals.batches += cum_ms->batches; + s->totals.max_vhost_qfill += cum_ms->max_vhost_qfill; + cum_ms = history_next(&s->milliseconds); + cum_ms->timestamp = now; + } + s->last_tsc = now_tsc; + } } +#ifdef __cplusplus +} +#endif + #endif /* pmd-perf.h */ diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 2347f55..81f090b 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -52,6 +52,7 @@ #include "id-pool.h" #include "latch.h" #include "netdev.h" +#include "netdev-provider.h" #include "netdev-vport.h" #include "netlink.h" #include "odp-execute.h" @@ -749,7 +750,8 @@ get_dp_netdev(const struct dpif *dpif) enum pmd_info_type { PMD_INFO_SHOW_STATS, /* Show how cpu cycles are spent. */ PMD_INFO_CLEAR_STATS, /* Set the cycles count to 0. */ - PMD_INFO_SHOW_RXQ /* Show poll-lists of pmd threads. */ + PMD_INFO_SHOW_RXQ, /* Show poll-lists of pmd threads. */ + PMD_INFO_PERF_SHOW, /* Show pmd performance details. 
*/ }; static void @@ -828,6 +830,42 @@ pmd_info_show_stats(struct ds *reply, stats[PMD_CYCLES_POLL_BUSY], total_packets); } +static void +pmd_info_show_perf(struct ds *reply, + struct dp_netdev_pmd_thread *pmd, + struct pmd_perf_params *par) +{ + if (pmd->core_id != NON_PMD_CORE_ID) { + char *time_str = + xastrftime_msec("%H:%M:%S.###", time_wall_msec(), true); + long long now = time_msec(); + double duration = (now - pmd->perf_stats.start_ms) / 1000.0; + + ds_put_cstr(reply, "\n"); + ds_put_format(reply, "Time: %s\n", time_str); + ds_put_format(reply, "Measurement duration: %.3f s\n", duration); + ds_put_cstr(reply, "\n"); + format_pmd_thread(reply, pmd); + ds_put_cstr(reply, "\n"); + pmd_perf_format_overall_stats(reply, &pmd->perf_stats, duration); + if (par->histograms) { + ds_put_cstr(reply, "\n"); + pmd_perf_format_histograms(reply, &pmd->perf_stats); + } + if (par->iter_hist_len > 0) { + ds_put_cstr(reply, "\n"); + pmd_perf_format_iteration_history(reply, &pmd->perf_stats, + par->iter_hist_len); + } + if (par->ms_hist_len > 0) { + ds_put_cstr(reply, "\n"); + pmd_perf_format_ms_history(reply, &pmd->perf_stats, + par->ms_hist_len); + } + free(time_str); + } +} + static int compare_poll_list(const void *a_, const void *b_) { @@ -1030,6 +1068,8 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], pmd_perf_stats_clear(&pmd->perf_stats); } else if (type == PMD_INFO_SHOW_STATS) { pmd_info_show_stats(&reply, pmd); + } else if (type == PMD_INFO_PERF_SHOW) { + pmd_info_show_perf(&reply, pmd, (struct pmd_perf_params *)aux); } } free(pmd_list); @@ -1039,6 +1079,48 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], unixctl_command_reply(conn, ds_cstr(&reply)); ds_destroy(&reply); } + +static void +pmd_perf_show_cmd(struct unixctl_conn *conn, int argc, + const char *argv[], + void *aux OVS_UNUSED) +{ + struct pmd_perf_params par; + long int it_hist = 0, ms_hist = 0; + par.histograms = true; + + while (argc > 1) { + 
if (!strcmp(argv[1], "-nh")) { + par.histograms = false; + argc -= 1; + argv += 1; + } else if (!strcmp(argv[1], "-it") && argc > 2) { + it_hist = strtol(argv[2], NULL, 10); + if (it_hist < 0) { + it_hist = 0; + } else if (it_hist > HISTORY_LEN) { + it_hist = HISTORY_LEN; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-ms") && argc > 2) { + ms_hist = strtol(argv[2], NULL, 10); + if (ms_hist < 0) { + ms_hist = 0; + } else if (ms_hist > HISTORY_LEN) { + ms_hist = HISTORY_LEN; + } + argc -= 2; + argv += 2; + } else { + break; + } + } + par.iter_hist_len = it_hist; + par.ms_hist_len = ms_hist; + par.command_type = PMD_INFO_PERF_SHOW; + dpif_netdev_pmd_info(conn, argc, argv, &par); +} static int dpif_netdev_init(void) @@ -1056,6 +1138,12 @@ dpif_netdev_init(void) unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] [dp]", 0, 2, dpif_netdev_pmd_info, (void *)&poll_aux); + unixctl_command_register("dpif-netdev/pmd-perf-show", + "[-nh] [-it iter-history-len]" + " [-ms ms-history-len]" + " [-pmd core] [dp]", + 0, 7, pmd_perf_show_cmd, + NULL); unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", 0, 1, dpif_netdev_pmd_rebalance, NULL); @@ -3143,6 +3231,7 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, struct netdev_rxq *rx, odp_port_t port_no) { + struct pmd_perf_stats *s = &pmd->perf_stats; struct dp_packet_batch batch; int error; int batch_cnt = 0; @@ -3151,8 +3240,23 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd, error = netdev_rxq_recv(rx, &batch); if (!error) { *recirc_depth_get() = 0; - + /* Update batch histogram. */ batch_cnt = batch.count; + s->current.batches++; + histogram_add_sample(&s->pkts_per_batch, batch_cnt); + /* Update the maximum Rx queue fill level. 
*/ + uint32_t qfill = batch.qfill; + switch (netdev_dpdk_get_type(netdev_rxq_get_netdev(rx))) { + case DPDK_DEV_VHOST: + if (qfill > s->current.max_vhost_qfill) { + s->current.max_vhost_qfill = qfill; + } + break; + case DPDK_DEV_ETH: + default: + break; + } + /* Process packet batch. */ dp_netdev_input(pmd, &batch, port_no); } else if (error != EAGAIN && error != EOPNOTSUPP) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); @@ -4902,6 +5006,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, struct match match; ovs_u128 ufid; int error = 0; + uint64_t cycles = cycles_counter(); match.tun_md.valid = false; miniflow_expand(&key->mf, &match.flow); @@ -4955,6 +5060,12 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd, ovs_mutex_unlock(&pmd->flow_mutex); emc_probabilistic_insert(pmd, key, netdev_flow); } + /* Update upcall stats. */ + cycles = cycles_counter() - cycles; + struct pmd_perf_stats *s = &pmd->perf_stats; + s->current.upcalls++; + s->current.upcall_cycles += cycles; + histogram_add_sample(&s->cycles_per_upcall, cycles); return error; } diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index c60f46f..f7be5dc 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -35,6 +35,7 @@ #include #include #include +#include #include #include "dirs.h" @@ -196,11 +197,6 @@ enum { DPDK_RING_SIZE = 256 }; BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE)); enum { DRAIN_TSC = 200000ULL }; -enum dpdk_dev_type { - DPDK_DEV_ETH = 0, - DPDK_DEV_VHOST = 1, -}; - /* Quality of Service */ /* An instance of a QoS configuration. 
Always associated with a particular @@ -846,6 +842,13 @@ netdev_dpdk_cast(const struct netdev *netdev) return CONTAINER_OF(netdev, struct netdev_dpdk, up); } +enum dpdk_dev_type +netdev_dpdk_get_type(const struct netdev *netdev) +{ + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + return dev->type; +} + static struct netdev * netdev_dpdk_alloc(void) { @@ -1656,6 +1659,19 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, return EAGAIN; } + batch->qfill = nb_rx; + + if (OVS_UNLIKELY(nb_rx == NETDEV_MAX_BURST)) { + /* This vhostuser API call is only implemented in DPDK 17.08. + * The need for this API call disappears when switching to the + * vhostuser PMD. */ +#if RTE_VERSION >= RTE_VERSION_NUM (17, 8, 0, RTE_VER_RELEASE) + batch->qfill += rte_vhost_rx_queue_count(netdev_dpdk_get_vid(dev), + qid * VIRTIO_QNUM + + VIRTIO_TXQ); +#endif + } + if (policer) { dropped = nb_rx; nb_rx = ingress_policer_run(policer, @@ -1695,6 +1711,8 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch) return EAGAIN; } + batch->qfill = nb_rx; + if (policer) { dropped = nb_rx; nb_rx = ingress_policer_run(policer, diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h index b7d02a7..2b357db 100644 --- a/lib/netdev-dpdk.h +++ b/lib/netdev-dpdk.h @@ -22,11 +22,18 @@ #include "openvswitch/compiler.h" struct dp_packet; +struct netdev; + +enum dpdk_dev_type { + DPDK_DEV_ETH = 0, + DPDK_DEV_VHOST = 1, +}; #ifdef DPDK_NETDEV void netdev_dpdk_register(void); void free_dpdk_buf(struct dp_packet *); +enum dpdk_dev_type netdev_dpdk_get_type(const struct netdev *netdev); #else @@ -41,6 +48,13 @@ free_dpdk_buf(struct dp_packet *buf OVS_UNUSED) /* Nothing */ } +static inline enum dpdk_dev_type +netdev_dpdk_get_type(const struct netdev *netdev OVS_UNUSED) +{ + /* Nothing to do. Return value zero to make compiler happy. 
*/ + return DPDK_DEV_ETH; +} + #endif #endif /* netdev-dpdk.h */ diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c index 43d670a..5f67999 100644 --- a/ofproto/ofproto-dpif.c +++ b/ofproto/ofproto-dpif.c @@ -5308,7 +5308,8 @@ dpif_show_backer(const struct dpif_backer *backer, struct ds *ds) dpif_get_dp_stats(backer->dpif, &dp_stats); ds_put_format(ds, "%s: hit:%"PRIu64" missed:%"PRIu64"\n", - dpif_name(backer->dpif), dp_stats.n_hit, dp_stats.n_missed); + dpif_name(backer->dpif), + dp_stats.n_hit + dp_stats.n_mask_hit, dp_stats.n_missed); shash_init(&ofproto_shash); ofprotos = get_ofprotos(&ofproto_shash);

From patchwork Wed Oct 18 10:46:22 2017
X-Patchwork-Submitter: Jan Scheurich
X-Patchwork-Id: 827525
X-Patchwork-Delegate: ian.stokes@intel.com
From: Jan Scheurich
To: "ovs-dev@openvswitch.org"
Date: Wed, 18 Oct 2017 10:46:22 +0000
Subject: [ovs-dev] [PATCH v2 3/3] dpif-netdev: Detection and logging of suspicious PMD

This patch enhances dpif-netdev-perf to detect iterations with suspicious statistics according to the following criteria:

- The iteration lasts longer than US_THR microseconds (default 250). This can be used to capture events where a PMD is blocked or interrupted for such a period of time that there is a risk for dropped packets on any of its Rx queues.

- The max vhost qlen exceeds a threshold Q_THR (default 128). This can be used to infer virtio queue overruns and dropped packets inside a VM, which are not visible in OVS otherwise.

Such suspicious iterations can be logged together with their iteration statistics so that they can be correlated with packet drops or other events outside OVS.

A new command is introduced to enable/disable logging at run-time and to adjust the above thresholds for suspicious iterations:

ovs-appctl dpif-netdev/pmd-perf-log-set [on|off] [-b before] [-a after] [-us usec] [-q qlen]

Turn logging on or off at run-time (on|off).
-b before: The number of iterations logged before a suspicious iteration (default 5).
-a after: The number of iterations logged after a suspicious iteration (default 5).
-q qlen: Suspicious vhost queue fill level threshold. Increase this to 512 if Qemu supports a virtio queue length of 1024 (default 128).
-us usec: Change the duration threshold for a suspicious iteration (default 250 us).

If more than 100 iterations before or after a suspicious iteration would be logged at once, OVS falls back to the safe default values (5/5) to prevent the logging itself from causing continuous further logging.
Signed-off-by: Jan Scheurich --- lib/dpif-netdev-perf.c | 141 +++++++++++++++++++++++++++++++++++++++++++++++++ lib/dpif-netdev-perf.h | 32 +++++++++++ lib/dpif-netdev.c | 5 ++ 3 files changed, 178 insertions(+) diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index 42d93a2..455c4b5 100644 --- a/lib/dpif-netdev-perf.c +++ b/lib/dpif-netdev-perf.c @@ -28,6 +28,23 @@ VLOG_DEFINE_THIS_MODULE(pmd_perf); +#define ITER_US_THRESHOLD 250 /* Warning threshold for iteration duration + in microseconds. */ +#define VHOST_QUEUE_FULL 128 /* Size of the virtio TX queue. */ +#define LOG_IT_BEFORE 5 /* Number of iterations to log before + suspicious iteration. */ +#define LOG_IT_AFTER 5 /* Number of iterations to log after + suspicious iteration. */ + +bool log_on = false; +static uint32_t log_it_before = LOG_IT_BEFORE; +static uint32_t log_it_after = LOG_IT_AFTER; +static uint32_t log_us_thr = ITER_US_THRESHOLD; +uint32_t log_q_thr = VHOST_QUEUE_FULL; +uint64_t iter_cycle_threshold; + +static struct vlog_rate_limit latency_rl = VLOG_RATE_LIMIT_INIT(600, 600); + #ifdef DPDK_NETDEV static uint64_t get_tsc_hz(void) @@ -133,6 +150,8 @@ pmd_perf_stats_init(struct pmd_perf_stats *s) { histogram_walls_set_log(&s->cycles_per_upcall, 1000, 1000000); histogram_walls_set_log(&s->max_vhost_qfill, 0, 512); s->start_ms = time_msec(); + s->log_begin_it = UINT64_MAX; + s->log_end_it = UINT64_MAX; } void @@ -365,7 +384,129 @@ pmd_perf_stats_clear(struct pmd_perf_stats *s) histogram_clear(&s->max_vhost_qfill); history_init(&s->iterations); history_init(&s->milliseconds); + s->log_begin_it = UINT64_MAX; + s->log_end_it = UINT64_MAX; s->start_ms = time_msec(); /* Clearing finished.
*/ s->milliseconds.sample[0].timestamp = s->start_ms; } +void +pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s, + uint64_t cycles, + char *reason) +{ + VLOG_WARN_RL(&latency_rl, + "Suspicious iteration (%s): tsc=%"PRIu64 + " duration=%"PRIu64" us\n", + reason, s->current.timestamp, + (1000000L * cycles) / get_tsc_hz()); + if (log_it_before + log_it_after > 0) { + if (s->log_begin_it == UINT64_MAX) { + s->log_begin_it = + (s->iterations.idx + HISTORY_LEN - log_it_before) + % HISTORY_LEN; + } + s->log_end_it = + (s->iterations.idx + log_it_after) % HISTORY_LEN; + } +} + +void +pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s) +{ + struct ds log = DS_EMPTY_INITIALIZER; + pmd_perf_format_iteration_history(&log, s, + (s->log_end_it + HISTORY_LEN - s->log_begin_it) % HISTORY_LEN); + VLOG_WARN_RL(&latency_rl, + "Neighborhood of suspicious iteration:\n" + "%s", ds_cstr(&log)); + ds_destroy(&log); + s->log_end_it = s->log_begin_it = UINT64_MAX; + if (log_it_before > 100 || log_it_after > 100) { + /* Reset to safe default values to avoid disturbance. 
*/ + log_it_before = LOG_IT_BEFORE; + log_it_after = LOG_IT_AFTER; + } +} + +void +pmd_perf_log_set_cmd(struct unixctl_conn *conn, + int argc, const char *argv[], + void *aux OVS_UNUSED) +{ + int it_before, it_after, us_thr, q_thr; + bool on; + bool usage = false; + + on = log_on; + it_before = log_it_before; + it_after = log_it_after; + q_thr = log_q_thr; + us_thr = log_us_thr; + + while (argc > 1) { + if (!strcmp(argv[1], "on")) { + on = true; + argc--; + argv++; + } else if (!strcmp(argv[1], "off")) { + on = false; + argc--; + argv++; + } else if (!strcmp(argv[1], "-a")) { + it_after = strtol(argv[2], NULL, 10); + if (it_after < 0) { + it_after = 0; + } else if (it_before + it_after > HISTORY_LEN-1) { + it_after = HISTORY_LEN-1 - it_before; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-b")) { + it_before = strtol(argv[2], NULL, 10); + if (it_before < 0) { + it_before = 0; + } else if (it_before > HISTORY_LEN-10) { + it_before = HISTORY_LEN-10; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-q")) { + q_thr = strtol(argv[2], NULL, 10); + if (q_thr < 0) { + q_thr = 0; + } + argc -= 2; + argv += 2; + } else if (!strcmp(argv[1], "-us")) { + us_thr = strtol(argv[2], NULL, 10); + if (us_thr < 0) { + us_thr = 0; + } + argc -= 2; + argv += 2; + } else { + usage = true; + break; + } + } + + if (usage) { + unixctl_command_reply_error(conn, + "Usage: ovs-appctl dpif-netdev/pmd-perf-log-set " + "[on|off] [-b before] [-a after] [-us usec] [-q qlen]"); + return; + } + + VLOG_INFO("pmd-perf-log-set: %s, before=%d, after=%d, us_thr=%d," + " q_thr=%d\n", + on ? 
"on" : "off", it_before, it_after, us_thr, q_thr); + log_on = on; + log_it_before = it_before; + log_it_after = it_after; + log_q_thr = q_thr; + log_us_thr = us_thr; + iter_cycle_threshold = (log_us_thr * get_tsc_hz()) / 1000000L; + + unixctl_command_reply(conn, ""); +} diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index 01fde8d..442c94b 100644 --- a/lib/dpif-netdev-perf.h +++ b/lib/dpif-netdev-perf.h @@ -107,6 +107,8 @@ struct pmd_perf_stats { struct histogram max_vhost_qfill; struct history iterations; struct history milliseconds; + uint64_t log_begin_it; + uint64_t log_end_it; }; enum pmd_info_type; @@ -119,6 +121,10 @@ struct pmd_perf_params { size_t ms_hist_len; }; +extern bool log_on; +extern uint32_t log_q_thr; +extern uint64_t iter_cycle_threshold; + void pmd_perf_stats_init(struct pmd_perf_stats *s); void pmd_perf_stats_clear(struct pmd_perf_stats *s); void pmd_perf_read_counters(struct pmd_perf_stats *s, @@ -143,9 +149,17 @@ void pmd_perf_format_iteration_history(struct ds *str, void pmd_perf_format_ms_history(struct ds *str, struct pmd_perf_stats *s, int n_ms); +void pmd_perf_log_suspicious_iteration(struct pmd_perf_stats *s, + uint64_t cycles, + char *reason); +void pmd_perf_log_iteration_neighborhood(struct pmd_perf_stats *s); void pmd_perf_show(struct unixctl_conn *conn, int argc, const char *argv[], void *aux OVS_UNUSED); +void pmd_perf_log_set_cmd(struct unixctl_conn *conn, + int argc, const char *argv[], + void *aux OVS_UNUSED); + static inline void histogram_add_sample(struct histogram *hist, uint32_t val) { @@ -208,6 +222,7 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, { struct iter_stats *cum_ms; uint64_t cycles, cycles_per_pkt = 0; + char *reason = NULL; if (OVS_UNLIKELY(s->current.timestamp == 0)) { /* Stats were cleared during the ongoing iteration. 
*/ @@ -256,6 +271,23 @@ pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc, cum_ms->batches += s->current.batches; cum_ms->max_vhost_qfill += s->current.max_vhost_qfill; + if (log_on) { + /* Log suspicious iterations. */ + if (cycles > iter_cycle_threshold) { + reason = "Excessive total cycles"; + } else if (s->current.max_vhost_qfill >= log_q_thr) { + reason = "Vhost RX queue full"; + } + if (OVS_UNLIKELY(reason)) { + pmd_perf_log_suspicious_iteration(s, cycles, reason); + } + + /* Log iteration stats interval when required. */ + if (OVS_UNLIKELY(s->log_end_it == s->iterations.idx)) { + pmd_perf_log_iteration_neighborhood(s); + } + } + /* Store in iteration history. */ history_store(&s->iterations, &s->current); if (now_tsc - s->last_tsc > 10000) { diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 81f090b..85fc65f 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -1147,6 +1147,11 @@ dpif_netdev_init(void) unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]", 0, 1, dpif_netdev_pmd_rebalance, NULL); + unixctl_command_register("dpif-netdev/pmd-perf-log-set", + "[on|off] [-b before] [-a after] " + "[-us usec] [-q qlen]", + 0, 10, pmd_perf_log_set_cmd, + NULL); return 0; }