diff mbox series

[ovs-dev,v9,2/2] netdev-dpdk:Detailed packet drop statistics

Message ID 20190921024008.11355-2-sriram.v@altencalsoftlabs.com
State Changes Requested
Delegated to: Ilya Maximets
Series [ovs-dev,v9,1/2] netdev-dpdk: Reuse vhost function for dpdk ETH custom stats.

Commit Message

Li,Rongqing via dev Sept. 21, 2019, 2:40 a.m. UTC
OVS may be unable to transmit packets for multiple reasons and
today there is a single counter to track packets dropped due to
any of those reasons. The most common reason is that a VM is
unable to read packets fast enough, causing the vhost-user port
transmit queue on the OVS side to become full. This manifests
as a problem with VNFs not receiving all packets. Having a
separate drop counter to track packets dropped because the
transmit queue is full will clearly indicate that the problem
is on the VM side and not in OVS. Similarly, maintaining separate
counters for all possible drop reasons helps pinpoint the actual
cause of packet drops.

This patch adds custom software stats counters to track packets
dropped at the port level; these counters are displayed along with
the other stats in the "ovs-vsctl get interface <iface> statistics"
command. The detailed stats are available for both DPDK and
vhost-user ports.
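
As an illustration (not part of the patch itself), querying a
vhost-user port after this change might produce output along the
following lines; the interface name and the counter values here are
hypothetical:

  $ ovs-vsctl get interface dpdkvhostuser0 statistics
  {..., netdev_dpdk_rx_qos_drops=0, netdev_dpdk_tx_failure_drops=4096,
   netdev_dpdk_tx_mtu_exceeded_drops=0, netdev_dpdk_tx_qos_drops=0,
   netdev_dpdk_tx_retries=512, tx_dropped=4096, ...}

A netdev_dpdk_tx_failure_drops value that accounts for most of
tx_dropped points at the guest not draining its virtqueues rather
than at a problem inside OVS.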

--
Changes since v8:
Addressed comments given by Ilya.

Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
---
 Documentation/topics/dpdk/vhost-user.rst      | 13 ++-
 lib/netdev-dpdk.c                             | 81 +++++++++++++++----
 utilities/bugtool/automake.mk                 |  3 +-
 utilities/bugtool/ovs-bugtool-get-port-stats  | 25 ++++++
 .../plugins/network-status/openvswitch.xml    |  1 +
 5 files changed, 105 insertions(+), 18 deletions(-)
 create mode 100755 utilities/bugtool/ovs-bugtool-get-port-stats

Comments

Li,Rongqing via dev Sept. 25, 2019, 4:28 a.m. UTC | #1
Hi,
Addressed comments given in patch v8. Kindly review patch v9.
 
Changes since patch v8:
Rebased my changes on top of Ilya's patch "Reuse vhost function for
dpdk ETH custom stats" (patch 1/2 of this series).
Addressed comments given in patch v8.

Thanks & Regards,
Sriram.

-----Original Message-----
From: Sriram Vatala <sriram.v@altencalsoftlabs.com> 
Sent: 21 September 2019 08:10
To: ovs-dev@openvswitch.org; i.maximets@samsung.com
Cc: ktraynor@redhat.com; Sriram Vatala <sriram.v@altencalsoftlabs.com>
Subject: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics

diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index 724aa62f6..89388a2bf 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -551,7 +551,18 @@ processing packets at the required rate.
 The amount of Tx retries on a vhost-user or vhost-user-client interface can be
 shown with::
 
-  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_retries
+  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:netdev_dpdk_tx_retries
+
+When the guest is not able to consume the packets fast enough, the transmit
+queue of the port fills up, i.e. the queue runs out of free descriptors.
+This is the most likely reason why the dpdk transmit API fails to send
+packets, besides other possible reasons.
+
+The amount of tx failure drops on a dpdk vhost/physical interface can be
+shown with::
+
+  $ ovs-vsctl get Interface dpdkvhostclient0 \
+                            statistics:netdev_dpdk_tx_failure_drops
 
 vhost-user Dequeue Zero Copy (experimental)
 -------------------------------------------
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 652b57e3b..fd8f9102e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -174,6 +174,20 @@ static const struct vhost_device_ops virtio_net_device_ops =
     .destroy_connection = destroy_connection,
 };
 
+/* Custom software stats for dpdk ports */
+struct netdev_dpdk_sw_stats {
+    /* No. of retries when unable to transmit. */
+    uint64_t tx_retries;
+    /* Packet drops when unable to transmit; Probably Tx queue is full. */
+    uint64_t tx_failure_drops;
+    /* Packet length greater than device MTU. */
+    uint64_t tx_mtu_exceeded_drops;
+    /* Packet drops in egress policer processing. */
+    uint64_t tx_qos_drops;
+    /* Packet drops in ingress policer processing. */
+    uint64_t rx_qos_drops;
+};
+
 enum { DPDK_RING_SIZE = 256 };
 BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE));
 enum { DRAIN_TSC = 200000ULL };
@@ -413,11 +427,10 @@ struct netdev_dpdk {
 
     PADDED_MEMBERS(CACHE_LINE_SIZE,
         struct netdev_stats stats;
-        /* Custom stat for retries when unable to transmit. */
-        uint64_t tx_retries;
+        struct netdev_dpdk_sw_stats *sw_stats;
         /* Protects stats */
         rte_spinlock_t stats_lock;
-        /* 4 pad bytes here. */
+        /* 36 pad bytes here. */
     );
 
     PADDED_MEMBERS(CACHE_LINE_SIZE,
@@ -1173,7 +1186,9 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
     dev->rte_xstats_ids = NULL;
     dev->rte_xstats_ids_size = 0;
 
-    dev->tx_retries = (dev->type == DPDK_DEV_VHOST) ? 0 : UINT64_MAX;
+    dev->sw_stats = (struct netdev_dpdk_sw_stats *)
+                        xzalloc(sizeof *dev->sw_stats);
+    dev->sw_stats->tx_retries = (dev->type == DPDK_DEV_VHOST) ? 0 : UINT64_MAX;
 
     return 0;
 }
@@ -1359,6 +1374,7 @@ common_destruct(struct netdev_dpdk *dev)
     ovs_list_remove(&dev->list_node);
     free(ovsrcu_get_protected(struct ingress_policer *,
                               &dev->ingress_policer));
+    free(dev->sw_stats);
     ovs_mutex_destroy(&dev->mutex);
 }
 
@@ -2209,6 +2225,7 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
     rte_spinlock_lock(&dev->stats_lock);
     netdev_dpdk_vhost_update_rx_counters(&dev->stats, batch->packets,
                                          nb_rx, dropped);
+    dev->sw_stats->rx_qos_drops += dropped;
     rte_spinlock_unlock(&dev->stats_lock);
 
     batch->count = nb_rx;
@@ -2258,6 +2275,7 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch,
     if (OVS_UNLIKELY(dropped)) {
         rte_spinlock_lock(&dev->stats_lock);
         dev->stats.rx_dropped += dropped;
+        dev->sw_stats->rx_qos_drops += dropped;
         rte_spinlock_unlock(&dev->stats_lock);
     }
 
@@ -2341,6 +2359,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
     struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
     unsigned int total_pkts = cnt;
     unsigned int dropped = 0;
+    unsigned int tx_failure;
+    unsigned int mtu_drops;
+    unsigned int qos_drops;
+    struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
     int i, retries = 0;
     int max_retries = VHOST_ENQ_RETRY_MIN;
     int vid = netdev_dpdk_get_vid(dev);
@@ -2358,9 +2380,12 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
     rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
 
     cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt);
+    mtu_drops = total_pkts - cnt;
+    qos_drops = cnt;
     /* Check has QoS has been configured for the netdev */
     cnt = netdev_dpdk_qos_run(dev, cur_pkts, cnt, true);
-    dropped = total_pkts - cnt;
+    qos_drops -= cnt;
+    dropped = qos_drops + mtu_drops;
 
     do {
         int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
@@ -2385,12 +2410,16 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
         }
     } while (cnt && (retries++ < max_retries));
 
+    tx_failure = cnt;
     rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
 
     rte_spinlock_lock(&dev->stats_lock);
     netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts,
                                          cnt + dropped);
-    dev->tx_retries += MIN(retries, max_retries);
+    sw_stats->tx_retries += MIN(retries, max_retries);
+    sw_stats->tx_failure_drops += tx_failure;
+    sw_stats->tx_mtu_exceeded_drops += mtu_drops;
+    sw_stats->tx_qos_drops += qos_drops;
     rte_spinlock_unlock(&dev->stats_lock);
 
 out:
@@ -2415,12 +2444,16 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
     struct rte_mbuf *pkts[PKT_ARRAY_SIZE];
     uint32_t cnt = batch_cnt;
     uint32_t dropped = 0;
+    uint32_t tx_failure = 0;
+    uint32_t mtu_drops = 0;
+    uint32_t qos_drops = 0;
+    struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
 
     if (dev->type != DPDK_DEV_VHOST) {
         /* Check if QoS has been configured for this netdev. */
         cnt = netdev_dpdk_qos_run(dev, (struct rte_mbuf **) batch->packets,
                                   batch_cnt, false);
-        dropped += batch_cnt - cnt;
+        qos_drops = batch_cnt - cnt;
     }
 
     uint32_t txcnt = 0;
@@ -2433,7 +2466,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
             VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d",
                          size, dev->max_packet_len);
 
-            dropped++;
+            mtu_drops++;
             continue;
         }
 
@@ -2456,13 +2489,17 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
             __netdev_dpdk_vhost_send(netdev, qid, (struct dp_packet **) pkts,
                                      txcnt);
         } else {
-            dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, txcnt);
+            tx_failure = netdev_dpdk_eth_tx_burst(dev, qid, pkts, txcnt);
         }
     }
 
+    dropped += qos_drops + mtu_drops + tx_failure;
     if (OVS_UNLIKELY(dropped)) {
         rte_spinlock_lock(&dev->stats_lock);
         dev->stats.tx_dropped += dropped;
+        sw_stats->tx_failure_drops += tx_failure;
+        sw_stats->tx_mtu_exceeded_drops += mtu_drops;
+        sw_stats->tx_qos_drops += qos_drops;
         rte_spinlock_unlock(&dev->stats_lock);
     }
 }
@@ -2503,19 +2540,27 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
         dpdk_do_tx_copy(netdev, qid, batch);
         dp_packet_delete_batch(batch, true);
     } else {
+        struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
         int tx_cnt, dropped;
+        int tx_failure, mtu_drops, qos_drops;
         int batch_cnt = dp_packet_batch_size(batch);
         struct rte_mbuf **pkts = (struct rte_mbuf **) batch->packets;
 
         tx_cnt = netdev_dpdk_filter_packet_len(dev, pkts, batch_cnt);
+        mtu_drops = batch_cnt - tx_cnt;
+        qos_drops = tx_cnt;
         tx_cnt = netdev_dpdk_qos_run(dev, pkts, tx_cnt, true);
-        dropped = batch_cnt - tx_cnt;
+        qos_drops -= tx_cnt;
 
-        dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, tx_cnt);
+        tx_failure = netdev_dpdk_eth_tx_burst(dev, qid, pkts, tx_cnt);
 
+        dropped = tx_failure + mtu_drops + qos_drops;
         if (OVS_UNLIKELY(dropped)) {
             rte_spinlock_lock(&dev->stats_lock);
             dev->stats.tx_dropped += dropped;
+            sw_stats->tx_failure_drops += tx_failure;
+            sw_stats->tx_mtu_exceeded_drops += mtu_drops;
+            sw_stats->tx_qos_drops += qos_drops;
             rte_spinlock_unlock(&dev->stats_lock);
         }
     }
@@ -2826,8 +2871,12 @@ netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
     int i, n;
 
-#define SW_CSTATS \
-    SW_CSTAT(tx_retries)
+#define SW_CSTATS                    \
+    SW_CSTAT(tx_retries)             \
+    SW_CSTAT(tx_failure_drops)       \
+    SW_CSTAT(tx_mtu_exceeded_drops)  \
+    SW_CSTAT(tx_qos_drops)           \
+    SW_CSTAT(rx_qos_drops)
 
 #define SW_CSTAT(NAME) + 1
     custom_stats->size = SW_CSTATS;
@@ -2840,7 +2889,7 @@ netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
     rte_spinlock_lock(&dev->stats_lock);
     i = 0;
 #define SW_CSTAT(NAME) \
-    custom_stats->counters[i++].value = dev->NAME;
+    custom_stats->counters[i++].value = dev->sw_stats->NAME;
     SW_CSTATS;
 #undef SW_CSTAT
     rte_spinlock_unlock(&dev->stats_lock);
@@ -2851,8 +2900,8 @@ netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
     n = 0;
 #define SW_CSTAT(NAME) \
     if (custom_stats->counters[i].value != UINT64_MAX) {                   \
-        ovs_strlcpy(custom_stats->counters[n].name, #NAME,                 \
-                    NETDEV_CUSTOM_STATS_NAME_SIZE);                        \
+        ovs_strlcpy(custom_stats->counters[n].name,                        \
+                   "netdev_dpdk_"#NAME, NETDEV_CUSTOM_STATS_NAME_SIZE);    \
         custom_stats->counters[n].value = custom_stats->counters[i].value; \
         n++;                                                               \
     }                                                                      \
diff --git a/utilities/bugtool/automake.mk b/utilities/bugtool/automake.mk
index 40980b367..dda58e0a1 100644
--- a/utilities/bugtool/automake.mk
+++ b/utilities/bugtool/automake.mk
@@ -22,7 +22,8 @@ bugtool_scripts = \
 	utilities/bugtool/ovs-bugtool-ovs-bridge-datapath-type \
 	utilities/bugtool/ovs-bugtool-ovs-vswitchd-threads-affinity \
 	utilities/bugtool/ovs-bugtool-qos-configs \
-	utilities/bugtool/ovs-bugtool-get-dpdk-nic-numa
+	utilities/bugtool/ovs-bugtool-get-dpdk-nic-numa \
+	utilities/bugtool/ovs-bugtool-get-port-stats
 
 scripts_SCRIPTS += $(bugtool_scripts)
 
diff --git a/utilities/bugtool/ovs-bugtool-get-port-stats b/utilities/bugtool/ovs-bugtool-get-port-stats
new file mode 100755
index 000000000..0fe175e6b
--- /dev/null
+++ b/utilities/bugtool/ovs-bugtool-get-port-stats
@@ -0,0 +1,25 @@
+#! /bin/bash
+
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of version 2.1 of the GNU Lesser General
+# Public License as published by the Free Software Foundation.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# Copyright (C) 2019 Ericsson AB
+
+for bridge in `ovs-vsctl -- --real list-br`
+do
+    echo -e "\nBridge : ${bridge}\n"
+    for iface in `ovs-vsctl list-ifaces ${bridge}`
+    do
+        echo -e "iface : ${iface}"
+        ovs-vsctl get interface ${iface} statistics
+        echo -e "\n"
+    done
+    echo -e "iface : ${bridge}"
+    ovs-vsctl get interface ${bridge} statistics
+done
diff --git a/utilities/bugtool/plugins/network-status/openvswitch.xml b/utilities/bugtool/plugins/network-status/openvswitch.xml
index d39867c6e..8c1ec643e 100644
--- a/utilities/bugtool/plugins/network-status/openvswitch.xml
+++ b/utilities/bugtool/plugins/network-status/openvswitch.xml
@@ -40,4 +40,5 @@
     <command label="ovs-ofctl-dump-groups"
filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over
-bridges "dump-groups"</command>
     <command label="ovs-ofctl-dump-group-stats" filters="ovs"
repeat="2">/usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-br
idges "dump-group-stats"</command>
     <command label="get_dpdk_nic_numa"
filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-get-dpdk-nic-numa</
command>
+    <command label="get_port_statistics" 
+ filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-get-port-stat
+ s</command>
 </collect>
--
2.20.1
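
For reference, a quick way to exercise the new bugtool helper once the
patch is installed is to run it directly; it walks every bridge and
prints the statistics map for each interface, including the new drop
counters. The bridge and interface names below are hypothetical:

  $ /usr/share/openvswitch/scripts/ovs-bugtool-get-port-stats

  Bridge : br0

  iface : dpdkvhostuser0
  {..., netdev_dpdk_rx_qos_drops=0, netdev_dpdk_tx_failure_drops=0, ...}
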
Li,Rongqing via dev Oct. 4, 2019, 8:06 a.m. UTC | #2
Hi,
Please consider this as a gentle reminder.

Thanks & Regards,
Sriram.

Li,Rongqing via dev Oct. 9, 2019, 6:23 a.m. UTC | #3
Hi All,
Please consider this as a gentle reminder.

Thanks & Regards,
Sriram.

Li,Rongqing via dev Oct. 11, 2019, 11:28 a.m. UTC | #4
Hi,
Please consider this as a gentle reminder.

Thanks & Regards,
Sriram.

-----Original Message-----
From: Sriram Vatala <sriram.v@altencalsoftlabs.com> 
Sent: 09 October 2019 11:53
To: 'ovs-dev@openvswitch.org' <ovs-dev@openvswitch.org>
Cc: 'ktraynor@redhat.com' <ktraynor@redhat.com>; 'Ilya Maximets'
<i.maximets@samsung.com>; 'i.maximets@ovn.org' <i.maximets@ovn.org>
Subject: RE: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics

Hi All,
Please consider this as a gentle remainder.

Thanks & Regards,
Sriram.

-----Original Message-----
From: Sriram Vatala <sriram.v@altencalsoftlabs.com> 
Sent: 04 October 2019 13:37
To: 'ovs-dev@openvswitch.org' <ovs-dev@openvswitch.org>
Cc: 'ktraynor@redhat.com' <ktraynor@redhat.com>; 'Ilya Maximets'
<i.maximets@samsung.com>
Subject: RE: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics

Hi,
Please consider this as a gentle remainder.

Thanks & Regards,
Sriram.

-----Original Message-----
From: Sriram Vatala <sriram.v@altencalsoftlabs.com> 
Sent: 25 September 2019 09:58
To: 'ovs-dev@openvswitch.org' <ovs-dev@openvswitch.org>
Cc: 'ktraynor@redhat.com' <ktraynor@redhat.com>; 'Ilya Maximets'
<i.maximets@samsung.com>
Subject: RE: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics

Hi,
Addressed comments given in patch v8. Kindly review patch v9.
 
Changes since patch v8 :
Rebased my changes on top of Ilya's patch " Reuse vhost function for dpdk
ETH custom stats". Patch set 1/2.
Addressed comments given in patch v8.

Thanks & Regards,
Sriram.

-----Original Message-----
From: Sriram Vatala <sriram.v@altencalsoftlabs.com> 
Sent: 21 September 2019 08:10
To: ovs-dev@openvswitch.org; i.maximets@samsung.com
Cc: ktraynor@redhat.com; Sriram Vatala <sriram.v@altencalsoftlabs.com>
Subject: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics

OVS may be unable to transmit packets for multiple reasons and today there
is a single counter to track packets dropped due to any of those reasons.
The most common reason is that a VM is unable to read packets fast enough
causing the vhostuser port transmit queue on the OVS side to become full.
This manifests as a problem with VNFs not receiving all packets. Having a
separate drop counter to track packets dropped because the transmit queue is
full will clearly indicate that the problem is on the VM side and not in
OVS. Similarly maintaining separate counters for all possible drops helps in
indicating sensible cause for packet drops.

This patch adds custom software stats counters to track packets dropped at
port level and these counters are displayed along with other stats in
"ovs-vsctl get interface <iface> statistics"
command. The detailed stats will be available for both dpdk and vhostuser
ports.

--
Changes since v8:
Addressed comments given by Ilya.

Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
---
 Documentation/topics/dpdk/vhost-user.rst      | 13 ++-
 lib/netdev-dpdk.c                             | 81 +++++++++++++++----
 utilities/bugtool/automake.mk                 |  3 +-
 utilities/bugtool/ovs-bugtool-get-port-stats  | 25 ++++++
 .../plugins/network-status/openvswitch.xml    |  1 +
 5 files changed, 105 insertions(+), 18 deletions(-)  create mode 100755
utilities/bugtool/ovs-bugtool-get-port-stats

diff --git a/Documentation/topics/dpdk/vhost-user.rst
b/Documentation/topics/dpdk/vhost-user.rst
index 724aa62f6..89388a2bf 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -551,7 +551,18 @@ processing packets at the required rate.
 The amount of Tx retries on a vhost-user or vhost-user-client interface can
be  shown with::
 
-  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_retries
+  $ ovs-vsctl get Interface dpdkvhostclient0 
+ statistics:netdev_dpdk_tx_retries
+
+When the guest is not able to consume the packets fast enough, the 
+transmit queue of the port gets filled up i.e queue runs out of free
descriptors.
+This is the most likely reason why dpdk transmit API will fail to send 
+packets besides other reasons.
+
+The amount of tx failure drops on a dpdk vhost/physical interface can 
+be shown with::
+
+  $ ovs-vsctl get Interface dpdkvhostclient0 \
+                            statistics:netdev_dpdk_tx_failure_drops
 
 vhost-user Dequeue Zero Copy (experimental)
 -------------------------------------------
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
652b57e3b..fd8f9102e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -174,6 +174,20 @@ static const struct vhost_device_ops
virtio_net_device_ops =
     .destroy_connection = destroy_connection,  };
 
+/* Custom software stats for dpdk ports */ struct netdev_dpdk_sw_stats 
+{
+    /* No. of retries when unable to transmit. */
+    uint64_t tx_retries;
+    /* Packet drops when unable to transmit; Probably Tx queue is full. */
+    uint64_t tx_failure_drops;
+    /* Packet length greater than device MTU. */
+    uint64_t tx_mtu_exceeded_drops;
+    /* Packet drops in egress policer processing. */
+    uint64_t tx_qos_drops;
+    /* Packet drops in ingress policer processing. */
+    uint64_t rx_qos_drops;
+};
+
 enum { DPDK_RING_SIZE = 256 };
 BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE));
 enum { DRAIN_TSC = 200000ULL };
@@ -413,11 +427,10 @@ struct netdev_dpdk {
 
     PADDED_MEMBERS(CACHE_LINE_SIZE,
         struct netdev_stats stats;
-        /* Custom stat for retries when unable to transmit. */
-        uint64_t tx_retries;
+        struct netdev_dpdk_sw_stats *sw_stats;
         /* Protects stats */
         rte_spinlock_t stats_lock;
-        /* 4 pad bytes here. */
+        /* 36 pad bytes here. */
     );
 
     PADDED_MEMBERS(CACHE_LINE_SIZE,
@@ -1173,7 +1186,9 @@ common_construct(struct netdev *netdev, dpdk_port_t
port_no,
     dev->rte_xstats_ids = NULL;
     dev->rte_xstats_ids_size = 0;
 
-    dev->tx_retries = (dev->type == DPDK_DEV_VHOST) ? 0 : UINT64_MAX;
+    dev->sw_stats = (struct netdev_dpdk_sw_stats *)
+                        xzalloc(sizeof *dev->sw_stats);
+    dev->sw_stats->tx_retries = (dev->type == DPDK_DEV_VHOST) ? 0 : 
+ UINT64_MAX;
 
     return 0;
 }
@@ -1359,6 +1374,7 @@ common_destruct(struct netdev_dpdk *dev)
     ovs_list_remove(&dev->list_node);
     free(ovsrcu_get_protected(struct ingress_policer *,
                               &dev->ingress_policer));
+    free(dev->sw_stats);
     ovs_mutex_destroy(&dev->mutex);
 }
 
@@ -2209,6 +2225,7 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
     rte_spinlock_lock(&dev->stats_lock);
     netdev_dpdk_vhost_update_rx_counters(&dev->stats, batch->packets,
                                          nb_rx, dropped);
+    dev->sw_stats->rx_qos_drops += dropped;
     rte_spinlock_unlock(&dev->stats_lock);
 
     batch->count = nb_rx;
@@ -2258,6 +2275,7 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct
dp_packet_batch *batch,
     if (OVS_UNLIKELY(dropped)) {
         rte_spinlock_lock(&dev->stats_lock);
         dev->stats.rx_dropped += dropped;
+        dev->sw_stats->rx_qos_drops += dropped;
         rte_spinlock_unlock(&dev->stats_lock);
     }
 
@@ -2341,6 +2359,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int
qid,
     struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
     unsigned int total_pkts = cnt;
     unsigned int dropped = 0;
+    unsigned int tx_failure;
+    unsigned int mtu_drops;
+    unsigned int qos_drops;
+    struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
     int i, retries = 0;
     int max_retries = VHOST_ENQ_RETRY_MIN;
     int vid = netdev_dpdk_get_vid(dev); @@ -2358,9 +2380,12 @@
__netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
     rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
 
     cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt);
+    mtu_drops = total_pkts - cnt;
+    qos_drops = cnt;
     /* Check has QoS has been configured for the netdev */
     cnt = netdev_dpdk_qos_run(dev, cur_pkts, cnt, true);
-    dropped = total_pkts - cnt;
+    qos_drops -= cnt;
+    dropped = qos_drops + mtu_drops;
 
     do {
         int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ; @@ -2385,12
+2410,16 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
         }
     } while (cnt && (retries++ < max_retries));
 
+    tx_failure = cnt;
     rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
 
     rte_spinlock_lock(&dev->stats_lock);
     netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts,
                                          cnt + dropped);
-    dev->tx_retries += MIN(retries, max_retries);
+    sw_stats->tx_retries += MIN(retries, max_retries);
+    sw_stats->tx_failure_drops += tx_failure;
+    sw_stats->tx_mtu_exceeded_drops += mtu_drops;
+    sw_stats->tx_qos_drops += qos_drops;
     rte_spinlock_unlock(&dev->stats_lock);
 
 out:
@@ -2415,12 +2444,16 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid,
struct dp_packet_batch *batch)
     struct rte_mbuf *pkts[PKT_ARRAY_SIZE];
     uint32_t cnt = batch_cnt;
     uint32_t dropped = 0;
+    uint32_t tx_failure = 0;
+    uint32_t mtu_drops = 0;
+    uint32_t qos_drops = 0;
+    struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
 
     if (dev->type != DPDK_DEV_VHOST) {
         /* Check if QoS has been configured for this netdev. */
         cnt = netdev_dpdk_qos_run(dev, (struct rte_mbuf **) batch->packets,
                                   batch_cnt, false);
-        dropped += batch_cnt - cnt;
+        qos_drops = batch_cnt - cnt;
     }
 
     uint32_t txcnt = 0;
@@ -2433,7 +2466,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct
dp_packet_batch *batch)
             VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d",
                          size, dev->max_packet_len);
 
-            dropped++;
+            mtu_drops++;
             continue;
         }
 
@@ -2456,13 +2489,17 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
             __netdev_dpdk_vhost_send(netdev, qid, (struct dp_packet **) pkts,
                                      txcnt);
         } else {
-            dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, txcnt);
+            tx_failure = netdev_dpdk_eth_tx_burst(dev, qid, pkts, txcnt);
         }
     }
 
+    dropped += qos_drops + mtu_drops + tx_failure;
     if (OVS_UNLIKELY(dropped)) {
         rte_spinlock_lock(&dev->stats_lock);
         dev->stats.tx_dropped += dropped;
+        sw_stats->tx_failure_drops += tx_failure;
+        sw_stats->tx_mtu_exceeded_drops += mtu_drops;
+        sw_stats->tx_qos_drops += qos_drops;
         rte_spinlock_unlock(&dev->stats_lock);
     }
 }
@@ -2503,19 +2540,27 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
         dpdk_do_tx_copy(netdev, qid, batch);
         dp_packet_delete_batch(batch, true);
     } else {
+        struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
         int tx_cnt, dropped;
+        int tx_failure, mtu_drops, qos_drops;
         int batch_cnt = dp_packet_batch_size(batch);
         struct rte_mbuf **pkts = (struct rte_mbuf **) batch->packets;
 
         tx_cnt = netdev_dpdk_filter_packet_len(dev, pkts, batch_cnt);
+        mtu_drops = batch_cnt - tx_cnt;
+        qos_drops = tx_cnt;
         tx_cnt = netdev_dpdk_qos_run(dev, pkts, tx_cnt, true);
-        dropped = batch_cnt - tx_cnt;
+        qos_drops -= tx_cnt;
 
-        dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, tx_cnt);
+        tx_failure = netdev_dpdk_eth_tx_burst(dev, qid, pkts, tx_cnt);
 
+        dropped = tx_failure + mtu_drops + qos_drops;
         if (OVS_UNLIKELY(dropped)) {
             rte_spinlock_lock(&dev->stats_lock);
             dev->stats.tx_dropped += dropped;
+            sw_stats->tx_failure_drops += tx_failure;
+            sw_stats->tx_mtu_exceeded_drops += mtu_drops;
+            sw_stats->tx_qos_drops += qos_drops;
             rte_spinlock_unlock(&dev->stats_lock);
         }
     }
@@ -2826,8 +2871,12 @@ netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
     int i, n;
 
-#define SW_CSTATS \
-    SW_CSTAT(tx_retries)
+#define SW_CSTATS                    \
+    SW_CSTAT(tx_retries)             \
+    SW_CSTAT(tx_failure_drops)       \
+    SW_CSTAT(tx_mtu_exceeded_drops)  \
+    SW_CSTAT(tx_qos_drops)           \
+    SW_CSTAT(rx_qos_drops)
 
 #define SW_CSTAT(NAME) + 1
     custom_stats->size = SW_CSTATS;
@@ -2840,7 +2889,7 @@ netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
     rte_spinlock_lock(&dev->stats_lock);
     i = 0;
 #define SW_CSTAT(NAME) \
-    custom_stats->counters[i++].value = dev->NAME;
+    custom_stats->counters[i++].value = dev->sw_stats->NAME;
     SW_CSTATS;
 #undef SW_CSTAT
     rte_spinlock_unlock(&dev->stats_lock);
@@ -2851,8 +2900,8 @@ netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
     n = 0;
 #define SW_CSTAT(NAME) \
     if (custom_stats->counters[i].value != UINT64_MAX) {                   \
-        ovs_strlcpy(custom_stats->counters[n].name, #NAME,                 \
-                    NETDEV_CUSTOM_STATS_NAME_SIZE);                        \
+        ovs_strlcpy(custom_stats->counters[n].name,                        \
+                   "netdev_dpdk_"#NAME, NETDEV_CUSTOM_STATS_NAME_SIZE);    \
         custom_stats->counters[n].value = custom_stats->counters[i].value; \
         n++;                                                               \
     }                                                                      \
diff --git a/utilities/bugtool/automake.mk b/utilities/bugtool/automake.mk
index 40980b367..dda58e0a1 100644
--- a/utilities/bugtool/automake.mk
+++ b/utilities/bugtool/automake.mk
@@ -22,7 +22,8 @@ bugtool_scripts = \
 	utilities/bugtool/ovs-bugtool-ovs-bridge-datapath-type \
 	utilities/bugtool/ovs-bugtool-ovs-vswitchd-threads-affinity \
 	utilities/bugtool/ovs-bugtool-qos-configs \
-	utilities/bugtool/ovs-bugtool-get-dpdk-nic-numa
+	utilities/bugtool/ovs-bugtool-get-dpdk-nic-numa \
+	utilities/bugtool/ovs-bugtool-get-port-stats
 
 scripts_SCRIPTS += $(bugtool_scripts)
 
diff --git a/utilities/bugtool/ovs-bugtool-get-port-stats b/utilities/bugtool/ovs-bugtool-get-port-stats
new file mode 100755
index 000000000..0fe175e6b
--- /dev/null
+++ b/utilities/bugtool/ovs-bugtool-get-port-stats
@@ -0,0 +1,25 @@
+#! /bin/bash
+
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of version 2.1 of the GNU Lesser General
+# Public License as published by the Free Software Foundation.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# Copyright (C) 2019 Ericsson AB
+
+for bridge in `ovs-vsctl -- --real list-br`
+do
+    echo -e "\nBridge : ${bridge}\n"
+    for iface in `ovs-vsctl list-ifaces ${bridge}`
+    do
+        echo -e "iface : ${iface}"
+        ovs-vsctl get interface ${iface} statistics
+        echo -e "\n"
+    done
+    echo -e "iface : ${bridge}"
+    ovs-vsctl get interface ${bridge} statistics
+done
diff --git a/utilities/bugtool/plugins/network-status/openvswitch.xml b/utilities/bugtool/plugins/network-status/openvswitch.xml
index d39867c6e..8c1ec643e 100644
--- a/utilities/bugtool/plugins/network-status/openvswitch.xml
+++ b/utilities/bugtool/plugins/network-status/openvswitch.xml
@@ -40,4 +40,5 @@
     <command label="ovs-ofctl-dump-groups"
filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over
-bridges "dump-groups"</command>
     <command label="ovs-ofctl-dump-group-stats" filters="ovs"
repeat="2">/usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-br
idges "dump-group-stats"</command>
     <command label="get_dpdk_nic_numa"
filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-get-dpdk-nic-numa</
command>
+    <command label="get_port_statistics" 
+ filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-get-port-stat
+ s</command>
 </collect>
--
2.20.1
Li,Rongqing via dev Oct. 15, 2019, 5:53 a.m. UTC | #5
Hi,
Please consider this as a gentle reminder and kindly review the patch set.

Thanks & Regards,
Sriram.

-----Original Message-----
From: Sriram Vatala <sriram.v@altencalsoftlabs.com> 
Sent: 11 October 2019 16:58
To: 'ovs-dev@openvswitch.org' <ovs-dev@openvswitch.org>;
'i.maximets@ovn.org' <i.maximets@ovn.org>
Subject: RE: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics

Hi,
Please consider this as a gentle reminder.

Thanks & Regards,
Sriram.

-----Original Message-----
From: Sriram Vatala <sriram.v@altencalsoftlabs.com> 
Sent: 09 October 2019 11:53
To: 'ovs-dev@openvswitch.org' <ovs-dev@openvswitch.org>
Cc: 'ktraynor@redhat.com' <ktraynor@redhat.com>; 'Ilya Maximets'
<i.maximets@samsung.com>; 'i.maximets@ovn.org' <i.maximets@ovn.org>
Subject: RE: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics

Hi All,
Please consider this as a gentle reminder.

Thanks & Regards,
Sriram.

-----Original Message-----
From: Sriram Vatala <sriram.v@altencalsoftlabs.com> 
Sent: 04 October 2019 13:37
To: 'ovs-dev@openvswitch.org' <ovs-dev@openvswitch.org>
Cc: 'ktraynor@redhat.com' <ktraynor@redhat.com>; 'Ilya Maximets'
<i.maximets@samsung.com>
Subject: RE: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics

Hi,
Please consider this as a gentle reminder.

Thanks & Regards,
Sriram.

-----Original Message-----
From: Sriram Vatala <sriram.v@altencalsoftlabs.com> 
Sent: 25 September 2019 09:58
To: 'ovs-dev@openvswitch.org' <ovs-dev@openvswitch.org>
Cc: 'ktraynor@redhat.com' <ktraynor@redhat.com>; 'Ilya Maximets'
<i.maximets@samsung.com>
Subject: RE: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics

Hi,
Addressed comments given in patch v8. Kindly review patch v9.
 
Changes since patch v8 :
Rebased my changes on top of Ilya's patch " Reuse vhost function for dpdk
ETH custom stats". Patch set 1/2.
Addressed comments given in patch v8.

Thanks & Regards,
Sriram.

-----Original Message-----
From: Sriram Vatala <sriram.v@altencalsoftlabs.com> 
Sent: 21 September 2019 08:10
To: ovs-dev@openvswitch.org; i.maximets@samsung.com
Cc: ktraynor@redhat.com; Sriram Vatala <sriram.v@altencalsoftlabs.com>
Subject: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics

OVS may be unable to transmit packets for multiple reasons and today there
is a single counter to track packets dropped due to any of those reasons.
The most common reason is that a VM is unable to read packets fast enough
causing the vhostuser port transmit queue on the OVS side to become full.
This manifests as a problem with VNFs not receiving all packets. Having a
separate drop counter to track packets dropped because the transmit queue is
full will clearly indicate that the problem is on the VM side and not in
OVS. Similarly maintaining separate counters for all possible drops helps in
indicating sensible cause for packet drops.

This patch adds custom software stats counters to track packets dropped at
port level and these counters are displayed along with other stats in
"ovs-vsctl get interface <iface> statistics"
command. The detailed stats will be available for both dpdk and vhostuser
ports.

--
Changes since v8:
Addressed comments given by Ilya.

Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
---
 Documentation/topics/dpdk/vhost-user.rst      | 13 ++-
 lib/netdev-dpdk.c                             | 81 +++++++++++++++----
 utilities/bugtool/automake.mk                 |  3 +-
 utilities/bugtool/ovs-bugtool-get-port-stats  | 25 ++++++
 .../plugins/network-status/openvswitch.xml    |  1 +
 5 files changed, 105 insertions(+), 18 deletions(-)
 create mode 100755 utilities/bugtool/ovs-bugtool-get-port-stats

Kevin Traynor Oct. 16, 2019, 10:15 a.m. UTC | #6
Hi Sriram,

Thanks for working on making more fine grained stats for debugging. As
mentioned yesterday, this patch needs rebase so I just reviewed docs and
netdev-dpdk.c which applied. Comments below.

On 21/09/2019 03:40, Sriram Vatala wrote:
> OVS may be unable to transmit packets for multiple reasons and
> today there is a single counter to track packets dropped due to
> any of those reasons. The most common reason is that a VM is
> unable to read packets fast enough causing the vhostuser port
> transmit queue on the OVS side to become full. This manifests
> as a problem with VNFs not receiving all packets. Having a
> separate drop counter to track packets dropped because the
> transmit queue is full will clearly indicate that the problem
> is on the VM side and not in OVS. Similarly maintaining separate

It reads like a bit of a contradiction. On one hand, "the *most common*
reason is that a VM is unable to read packets fast enough", while on the
other hand, having a stat "*will clearly indicate* that the problem is on
the VM side".

> counters for all possible drops helps in indicating sensible
> cause for packet drops.
> 
> This patch adds custom software stats counters to track packets
> dropped at port level and these counters are displayed along with
> other stats in "ovs-vsctl get interface <iface> statistics"
> command. The detailed stats will be available for both dpdk and
> vhostuser ports.
> 

I think the commit msg could be a bit clearer, suggest something like
below - feel free to modify (or reject):

OVS may be unable to transmit packets for multiple reasons on the DPDK
datapath and today there is a single counter to track packets dropped
due to any of those reasons.

This patch adds custom software stats for the different reasons packets
may be dropped during tx on the DPDK datapath in OVS.

- MTU drops : drops that occur due to a too large packet size
- Qos drops: drops that occur due to egress QOS
- Tx failures: drops as returned by the DPDK PMD send function

Note that the reason for tx failures is not specified in OVS. In
practice for vhost ports it is most common that tx failures are because
there are not enough available descriptors, which is usually caused by
misconfiguration of the guest queues and/or because the guest is not
consuming packets fast enough from the queues.

These counters are displayed along with other stats in "ovs-vsctl get
interface <iface> statistics" command and are available for dpdk and
vhostuser/vhostuserclient ports.

Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>

---

v9:...
v8:...


Note that signed-off, --- and version info should be like this ^^^.
otherwise the review version comments will be in the git commit log when
it is applied.

> --
> Changes since v8:
> Addressed comments given by Ilya.
> 
> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
> ---
>  Documentation/topics/dpdk/vhost-user.rst      | 13 ++-
>  lib/netdev-dpdk.c                             | 81 +++++++++++++++----
>  utilities/bugtool/automake.mk                 |  3 +-
>  utilities/bugtool/ovs-bugtool-get-port-stats  | 25 ++++++
>  .../plugins/network-status/openvswitch.xml    |  1 +
>  5 files changed, 105 insertions(+), 18 deletions(-)
>  create mode 100755 utilities/bugtool/ovs-bugtool-get-port-stats
> 
> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
> index 724aa62f6..89388a2bf 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -551,7 +551,18 @@ processing packets at the required rate.
>  The amount of Tx retries on a vhost-user or vhost-user-client interface can be
>  shown with::
>  
> -  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_retries
> +  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:netdev_dpdk_tx_retries

I think the "netdev_dpdk_" prefixes should be removed for a few reasons,

- These are custom stats that will only be displayed for the dpdk ports
so they don't need any extra prefix to say they are for dpdk ports

- The 'tx_retries' stat is part of OVS 2.12, I don't like to change its
name to a different one in OVS 2.13 unless there is a very good reason

- The existing stats don't have this type of prefixes (granted most of
them are general stats):
# ovs-vsctl get Interface dpdkvhost0 statistics
{"rx_1024_to_1522_packets"=0, "rx_128_to_255_packets"=0,
"rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=25622176,
"rx_256_to_511_packets"=0, "rx_512_to_1023_packets"=0,
"rx_65_to_127_packets"=0, rx_bytes=1537330560, rx_dropped=0,
rx_errors=0, rx_packets=25622176, tx_bytes=3829825920, tx_dropped=0,
tx_packets=63830432, tx_retries=0}

Also, just to note that if there are changes to existing
interfaces/behaviour it should really mention that in the commit message
so it is highlighted.

> +
> +When the guest is not able to consume the packets fast enough, the transmit
> +queue of the port gets filled up i.e queue runs out of free descriptors.
> +This is the most likely reason why dpdk transmit API will fail to send packets
> +besides other reasons.
> +
> +The amount of tx failure drops on a dpdk vhost/physical interface can be
> +shown with::
> +
> +  $ ovs-vsctl get Interface dpdkvhostclient0 \
> +                            statistics:netdev_dpdk_tx_failure_drops
>  

The commit msg/docs are only focusing on one stat for vhost ports, but
there are other stats and dpdk ports, so they should all get some mention.

>  vhost-user Dequeue Zero Copy (experimental)
>  -------------------------------------------
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 652b57e3b..fd8f9102e 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -174,6 +174,20 @@ static const struct vhost_device_ops virtio_net_device_ops =
>      .destroy_connection = destroy_connection,
>  };
>  
> +/* Custom software stats for dpdk ports */
> +struct netdev_dpdk_sw_stats {
> +    /* No. of retries when unable to transmit. */
> +    uint64_t tx_retries;
> +    /* Packet drops when unable to transmit; Probably Tx queue is full. */
> +    uint64_t tx_failure_drops;
> +    /* Packet length greater than device MTU. */
> +    uint64_t tx_mtu_exceeded_drops;
> +    /* Packet drops in egress policer processing. */
> +    uint64_t tx_qos_drops;
> +    /* Packet drops in ingress policer processing. */
> +    uint64_t rx_qos_drops;
> +};
> +
>  enum { DPDK_RING_SIZE = 256 };
>  BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE));
>  enum { DRAIN_TSC = 200000ULL };
> @@ -413,11 +427,10 @@ struct netdev_dpdk {
>  
>      PADDED_MEMBERS(CACHE_LINE_SIZE,
>          struct netdev_stats stats;
> -        /* Custom stat for retries when unable to transmit. */
> -        uint64_t tx_retries;
> +        struct netdev_dpdk_sw_stats *sw_stats;
>          /* Protects stats */
>          rte_spinlock_t stats_lock;
> -        /* 4 pad bytes here. */
> +        /* 36 pad bytes here. */

Thanks, I'm glad one of us can count :-)

>      );
>  
>      PADDED_MEMBERS(CACHE_LINE_SIZE,
> @@ -1173,7 +1186,9 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
>      dev->rte_xstats_ids = NULL;
>      dev->rte_xstats_ids_size = 0;
>  
> -    dev->tx_retries = (dev->type == DPDK_DEV_VHOST) ? 0 : UINT64_MAX;
> +    dev->sw_stats = (struct netdev_dpdk_sw_stats *)

You don't need the cast here.

Ilya mentioned on v8:

> This should be:
>        dev->sw_stats = xzalloc(sizeof *dev->sw_stats);
> Or
>        dev->sw_stats = dpdk_rte_mzalloc(sizeof *dev->sw_stats);
>
> The later variant will require proper error handling as it could
return NULL.
>
> See the Documentation/internals/contributing/coding-style.rst for details.


> +                        xzalloc(sizeof *dev->sw_stats);
> +    dev->sw_stats->tx_retries = (dev->type == DPDK_DEV_VHOST) ? 0 : UINT64_MAX;
>  
>      return 0;
>  }
> @@ -1359,6 +1374,7 @@ common_destruct(struct netdev_dpdk *dev)
>      ovs_list_remove(&dev->list_node);
>      free(ovsrcu_get_protected(struct ingress_policer *,
>                                &dev->ingress_policer));
> +    free(dev->sw_stats);
>      ovs_mutex_destroy(&dev->mutex);
>  }
>  
> @@ -2209,6 +2225,7 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
>      rte_spinlock_lock(&dev->stats_lock);
>      netdev_dpdk_vhost_update_rx_counters(&dev->stats, batch->packets,
>                                           nb_rx, dropped);
> +    dev->sw_stats->rx_qos_drops += dropped;
>      rte_spinlock_unlock(&dev->stats_lock);
>  
>      batch->count = nb_rx;
> @@ -2258,6 +2275,7 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch,
>      if (OVS_UNLIKELY(dropped)) {
>          rte_spinlock_lock(&dev->stats_lock);
>          dev->stats.rx_dropped += dropped;
> +        dev->sw_stats->rx_qos_drops += dropped;
>          rte_spinlock_unlock(&dev->stats_lock);
>      }
>  
> @@ -2341,6 +2359,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>      struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
>      unsigned int total_pkts = cnt;
>      unsigned int dropped = 0;
> +    unsigned int tx_failure;
> +    unsigned int mtu_drops;
> +    unsigned int qos_drops;

> +    struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;

I think generally the structs are grouped, so you can move this up to be
directly underneath the other structs above.
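
e.g. just shuffling the declarations that are already in this function
(sketch only):

    struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
    struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
    unsigned int total_pkts = cnt;
    unsigned int dropped = 0;
    unsigned int tx_failure;
    unsigned int mtu_drops;
    unsigned int qos_drops;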

>      int i, retries = 0;
>      int max_retries = VHOST_ENQ_RETRY_MIN;
>      int vid = netdev_dpdk_get_vid(dev);
> @@ -2358,9 +2380,12 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>      rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
>  
>      cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt);
> +    mtu_drops = total_pkts - cnt;
> +    qos_drops = cnt;
>      /* Check has QoS has been configured for the netdev */
>      cnt = netdev_dpdk_qos_run(dev, cur_pkts, cnt, true);
> -    dropped = total_pkts - cnt;
> +    qos_drops -= cnt;
> +    dropped = qos_drops + mtu_drops;
>  
>      do {
>          int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
> @@ -2385,12 +2410,16 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>          }
>      } while (cnt && (retries++ < max_retries));
>  
> +    tx_failure = cnt;
>      rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
>  
>      rte_spinlock_lock(&dev->stats_lock);
>      netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts,
>                                           cnt + dropped);
> -    dev->tx_retries += MIN(retries, max_retries);
> +    sw_stats->tx_retries += MIN(retries, max_retries);
> +    sw_stats->tx_failure_drops += tx_failure;
> +    sw_stats->tx_mtu_exceeded_drops += mtu_drops;
> +    sw_stats->tx_qos_drops += qos_drops;

I previously commented that I thought you could create a function for
storing some of these stats. If you can do something for the common ones
that can be reused elsewhere it might be worthwhile, but I won't insist.
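
Just to illustrate the idea, a rough sketch (untested; the helper name and
signature are made up here, they are not part of the patch):

    /* Accumulate the common tx drop counters.
     * Caller must hold dev->stats_lock. */
    static void
    netdev_dpdk_sw_stats_add_tx_drops(struct netdev_dpdk *dev,
                                      uint64_t tx_failure,
                                      uint64_t mtu_drops,
                                      uint64_t qos_drops)
    {
        struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;

        sw_stats->tx_failure_drops += tx_failure;
        sw_stats->tx_mtu_exceeded_drops += mtu_drops;
        sw_stats->tx_qos_drops += qos_drops;
    }

It could then be called from __netdev_dpdk_vhost_send(), dpdk_do_tx_copy()
and netdev_dpdk_send__() instead of repeating the three '+=' lines.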

>      rte_spinlock_unlock(&dev->stats_lock);
>  
>  out:
> @@ -2415,12 +2444,16 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
>      struct rte_mbuf *pkts[PKT_ARRAY_SIZE];
>      uint32_t cnt = batch_cnt;
>      uint32_t dropped = 0;
> +    uint32_t tx_failure = 0;
> +    uint32_t mtu_drops = 0;
> +    uint32_t qos_drops = 0;
> +    struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
>  

Can group the structs together

>      if (dev->type != DPDK_DEV_VHOST) {
>          /* Check if QoS has been configured for this netdev. */
>          cnt = netdev_dpdk_qos_run(dev, (struct rte_mbuf **) batch->packets,
>                                    batch_cnt, false);
> -        dropped += batch_cnt - cnt;
> +        qos_drops = batch_cnt - cnt;
>      }
>  
>      uint32_t txcnt = 0;
> @@ -2433,7 +2466,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
>              VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d",
>                           size, dev->max_packet_len);
>  
> -            dropped++;
> +            mtu_drops++;
>              continue;
>          }
>  

Just below this line is

        pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
        if (OVS_UNLIKELY(!pkts[txcnt])) {
            dropped += cnt - i;
                    ^^ you can change to '=' as it is not used before
and there is a break from the loop if you get here
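
i.e. something like this (sketch only, reconstructed from the code Kevin
quotes just above and the break he mentions):

        pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
        if (OVS_UNLIKELY(!pkts[txcnt])) {
            /* '=' is enough: 'dropped' has not been used before this
             * point and we break out of the loop right after. */
            dropped = cnt - i;
            break;
        }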


> @@ -2456,13 +2489,17 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
>              __netdev_dpdk_vhost_send(netdev, qid, (struct dp_packet **) pkts,
>                                       txcnt);
>          } else {
> -            dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, txcnt);
> +            tx_failure = netdev_dpdk_eth_tx_burst(dev, qid, pkts, txcnt);
>          }
>      }
>  
> +    dropped += qos_drops + mtu_drops + tx_failure;
>      if (OVS_UNLIKELY(dropped)) {
>          rte_spinlock_lock(&dev->stats_lock);
>          dev->stats.tx_dropped += dropped;
> +        sw_stats->tx_failure_drops += tx_failure;
> +        sw_stats->tx_mtu_exceeded_drops += mtu_drops;
> +        sw_stats->tx_qos_drops += qos_drops;
>          rte_spinlock_unlock(&dev->stats_lock);
>      }
>  }
> @@ -2503,19 +2540,27 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
>          dpdk_do_tx_copy(netdev, qid, batch);
>          dp_packet_delete_batch(batch, true);
>      } else {
> +        struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
>          int tx_cnt, dropped;
> +        int tx_failure, mtu_drops, qos_drops;
>          int batch_cnt = dp_packet_batch_size(batch);
>          struct rte_mbuf **pkts = (struct rte_mbuf **) batch->packets;
>  
>          tx_cnt = netdev_dpdk_filter_packet_len(dev, pkts, batch_cnt);
> +        mtu_drops = batch_cnt - tx_cnt;
> +        qos_drops = tx_cnt;
>          tx_cnt = netdev_dpdk_qos_run(dev, pkts, tx_cnt, true);
> -        dropped = batch_cnt - tx_cnt;
> +        qos_drops -= tx_cnt;
>  
> -        dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, tx_cnt);
> +        tx_failure = netdev_dpdk_eth_tx_burst(dev, qid, pkts, tx_cnt);
>  
> +        dropped = tx_failure + mtu_drops + qos_drops;
>          if (OVS_UNLIKELY(dropped)) {
>              rte_spinlock_lock(&dev->stats_lock);
>              dev->stats.tx_dropped += dropped;
> +            sw_stats->tx_failure_drops += tx_failure;
> +            sw_stats->tx_mtu_exceeded_drops += mtu_drops;
> +            sw_stats->tx_qos_drops += qos_drops;
>              rte_spinlock_unlock(&dev->stats_lock);
>          }
>      }
> @@ -2826,8 +2871,12 @@ netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>      int i, n;
>  
> -#define SW_CSTATS \
> -    SW_CSTAT(tx_retries)
> +#define SW_CSTATS                    \
> +    SW_CSTAT(tx_retries)             \
> +    SW_CSTAT(tx_failure_drops)       \
> +    SW_CSTAT(tx_mtu_exceeded_drops)  \
> +    SW_CSTAT(tx_qos_drops)           \
> +    SW_CSTAT(rx_qos_drops)
>  
>  #define SW_CSTAT(NAME) + 1
>      custom_stats->size = SW_CSTATS;
> @@ -2840,7 +2889,7 @@ netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
>      rte_spinlock_lock(&dev->stats_lock);
>      i = 0;
>  #define SW_CSTAT(NAME) \
> -    custom_stats->counters[i++].value = dev->NAME;
> +    custom_stats->counters[i++].value = dev->sw_stats->NAME;
>      SW_CSTATS;
>  #undef SW_CSTAT
>      rte_spinlock_unlock(&dev->stats_lock);
> @@ -2851,8 +2900,8 @@ netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
>      n = 0;
>  #define SW_CSTAT(NAME) \
>      if (custom_stats->counters[i].value != UINT64_MAX) {                   \
> -        ovs_strlcpy(custom_stats->counters[n].name, #NAME,                 \
> -                    NETDEV_CUSTOM_STATS_NAME_SIZE);                        \
> +        ovs_strlcpy(custom_stats->counters[n].name,                        \
> +                   "netdev_dpdk_"#NAME, NETDEV_CUSTOM_STATS_NAME_SIZE); \

can remove this "netdev_dpdk_" prefix as per comment above

>          custom_stats->counters[n].value = custom_stats->counters[i].value; \
>          n++;                                                               \
>      }                                                                      \
> diff --git a/utilities/bugtool/automake.mk b/utilities/bugtool/automake.mk
> index 40980b367..dda58e0a1 100644
> --- a/utilities/bugtool/automake.mk
> +++ b/utilities/bugtool/automake.mk
> @@ -22,7 +22,8 @@ bugtool_scripts = \
>  	utilities/bugtool/ovs-bugtool-ovs-bridge-datapath-type \
>  	utilities/bugtool/ovs-bugtool-ovs-vswitchd-threads-affinity \
>  	utilities/bugtool/ovs-bugtool-qos-configs \
> -	utilities/bugtool/ovs-bugtool-get-dpdk-nic-numa
> +	utilities/bugtool/ovs-bugtool-get-dpdk-nic-numa \
> +	utilities/bugtool/ovs-bugtool-get-port-stats
>  
>  scripts_SCRIPTS += $(bugtool_scripts)
>  
> diff --git a/utilities/bugtool/ovs-bugtool-get-port-stats b/utilities/bugtool/ovs-bugtool-get-port-stats
> new file mode 100755
> index 000000000..0fe175e6b
> --- /dev/null
> +++ b/utilities/bugtool/ovs-bugtool-get-port-stats
> @@ -0,0 +1,25 @@
> +#! /bin/bash
> +
> +# This library is free software; you can redistribute it and/or
> +# modify it under the terms of version 2.1 of the GNU Lesser General
> +# Public License as published by the Free Software Foundation.
> +#
> +# This library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# Copyright (C) 2019 Ericsson AB
> +
> +for bridge in `ovs-vsctl -- --real list-br`
> +do
> +    echo -e "\nBridge : ${bridge}\n"
> +    for iface in `ovs-vsctl list-ifaces ${bridge}`
> +    do
> +        echo -e "iface : ${iface}"
> +        ovs-vsctl get interface ${iface} statistics
> +        echo -e "\n"
> +    done
> +    echo -e "iface : ${bridge}"
> +    ovs-vsctl get interface ${bridge} statistics
> +done
> diff --git a/utilities/bugtool/plugins/network-status/openvswitch.xml b/utilities/bugtool/plugins/network-status/openvswitch.xml
> index d39867c6e..8c1ec643e 100644
> --- a/utilities/bugtool/plugins/network-status/openvswitch.xml
> +++ b/utilities/bugtool/plugins/network-status/openvswitch.xml
> @@ -40,4 +40,5 @@
>      <command label="ovs-ofctl-dump-groups" filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-bridges "dump-groups"</command>
>      <command label="ovs-ofctl-dump-group-stats" filters="ovs" repeat="2">/usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-bridges "dump-group-stats"</command>
>      <command label="get_dpdk_nic_numa" filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-get-dpdk-nic-numa</command>
> +    <command label="get_port_statistics" filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-get-port-stats</command>
>  </collect>
> 

Didn't review these last files due to issues with applying the patch

Kevin.
Ilya Maximets Oct. 16, 2019, 12:16 p.m. UTC | #7
Hi Kevin,

Thanks for review. Some comments inline.

On 16.10.2019 12:15, Kevin Traynor wrote:
> Hi Sriram,
> 
> Thanks for working on making more fine grained stats for debugging. As
> mentioned yesterday, this patch needs rebase so I just reviewed docs and
> netdev-dpdk.c which applied. Comments below.
> 
> On 21/09/2019 03:40, Sriram Vatala wrote:
>> OVS may be unable to transmit packets for multiple reasons and
>> today there is a single counter to track packets dropped due to
>> any of those reasons. The most common reason is that a VM is
>> unable to read packets fast enough causing the vhostuser port
>> transmit queue on the OVS side to become full. This manifests
>> as a problem with VNFs not receiving all packets. Having a
>> separate drop counter to track packets dropped because the
>> transmit queue is full will clearly indicate that the problem
>> is on the VM side and not in OVS. Similarly maintaining separate
> 
> It reads like a bit of a contradiction. On one hand the "The *most
> common* reason is that a VM is unable to read packets fast enough".
> While having a stat "*will clearly indicate* that the problem is on the
> VM side".
> 
>> counters for all possible drops helps in indicating sensible
>> cause for packet drops.
>>
>> This patch adds custom software stats counters to track packets
>> dropped at port level and these counters are displayed along with
>> other stats in "ovs-vsctl get interface <iface> statistics"
>> command. The detailed stats will be available for both dpdk and
>> vhostuser ports.
>>
> 
> I think the commit msg could be a bit clearer, suggest something like
> below - feel free to modify (or reject):
> 
> OVS may be unable to transmit packets for multiple reasons on the DPDK
> datapath and today there is a single counter to track packets dropped
> due to any of those reasons.
> 
> This patch adds custom software stats for the different reasons packets
> may be dropped during tx on the DPDK datapath in OVS.
> 
> - MTU drops : drops that occur due to a too large packet size
> - Qos drops: drops that occur due to egress QOS
> - Tx failures: drops as returned by the DPDK PMD send function
> 
> Note that the reason for tx failures is not specificied in OVS. In
> practice for vhost ports it is most common that tx failures are because
> there are not enough available descriptors, which is usually caused by
> misconfiguration of the guest queues and/or because the guest is not
> consuming packets fast enough from the queues.
> 
> These counters are displayed along with other stats in "ovs-vsctl get
> interface <iface> statistics" command and are available for dpdk and
> vhostuser/vhostuserclient ports.
> 
> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
> 
> ---
> 
> v9:...
> v8:...
> 
> 
> Note that signed-off, --- and version info should be like this ^^^.
> otherwise the review version comments will be in the git commit log when
> it is applied.
> 
>> --
>> Changes since v8:
>> Addressed comments given by Ilya.
>>
>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>> ---
>>   Documentation/topics/dpdk/vhost-user.rst      | 13 ++-
>>   lib/netdev-dpdk.c                             | 81 +++++++++++++++----
>>   utilities/bugtool/automake.mk                 |  3 +-
>>   utilities/bugtool/ovs-bugtool-get-port-stats  | 25 ++++++
>>   .../plugins/network-status/openvswitch.xml    |  1 +
>>   5 files changed, 105 insertions(+), 18 deletions(-)
>>   create mode 100755 utilities/bugtool/ovs-bugtool-get-port-stats
>>
>> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
>> index 724aa62f6..89388a2bf 100644
>> --- a/Documentation/topics/dpdk/vhost-user.rst
>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>> @@ -551,7 +551,18 @@ processing packets at the required rate.
>>   The amount of Tx retries on a vhost-user or vhost-user-client interface can be
>>   shown with::
>>   
>> -  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_retries
>> +  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:netdev_dpdk_tx_retries
> 
> I think the "netdev_dpdk_" prefixes should be removed for a few reasons,
> 
> - These are custom stats that will only be displayed for the dpdk ports
> so they don't need any extra prefix to say they are for dpdk ports
> 
> - The 'tx_retries' stat is part of OVS 2.12, I don't like to change its
> name to a different one in OVS 2.13 unless there is a very good reason
> 
> - The existing stats don't have this type of prefixes (granted most of
> them are general stats):

The main reason for the prefix, for me, is to distinguish those stats
from the stats that come from the driver/HW.  Our new stats have very
generic names that could look a lot like, or even be equal to, HW stats.
This might not be an issue for vhostuser, but for HW NICs this may be
really confusing for users.

I'm not insisting on 'netdev_dpdk_' prefix, but, IMHO, we need a
way to clearly distinguish those stats.  We may use different prefixes
like 'sw_' or just 'ovs_'.

> # ovs-vsctl get Interface dpdkvhost0 statistics
> {"rx_1024_to_1522_packets"=0, "rx_128_to_255_packets"=0,
> "rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=25622176,
> "rx_256_to_511_packets"=0, "rx_512_to_1023_packets"=0,
> "rx_65_to_127_packets"=0, rx_bytes=1537330560, rx_dropped=0,
> rx_errors=0, rx_packets=25622176, tx_bytes=3829825920, tx_dropped=0,
> tx_packets=63830432, tx_retries=0}
> 
> Also, just to note that if there are changes to existing
> interfaces/behaviour it should really mention that in the commit message
> so it is highlighted.
> 
>> +
>> +When the guest is not able to consume the packets fast enough, the transmit
>> +queue of the port gets filled up i.e queue runs out of free descriptors.
>> +This is the most likely reason why dpdk transmit API will fail to send packets
>> +besides other reasons.
>> +
>> +The amount of tx failure drops on a dpdk vhost/physical interface can be
>> +shown with::
>> +
>> +  $ ovs-vsctl get Interface dpdkvhostclient0 \
>> +                            statistics:netdev_dpdk_tx_failure_drops
>>   
> 
> The commit msg/docs are only focusing on one stat for vhost ports, but
> there are other stats and dpdk ports, so they should all get some mention.

I don't feel comfortable documenting custom stats.  This is simply because
we can't describe them all.  These are things that will change over time.
And we don't really know what some types of stats mean, or whether they mean
the same thing for different types of HW NICs (they definitely don't) or OVS
netdevs (even within netdev-dpdk).
And that is one more reason to have a prefix for OVS internal statistics,
over which we have at least partial control.

I think that the user documentation could mention some special statistics
while describing troubleshooting for some hard special cases, but it should
not be a glossary of all the possible custom stats.  From my point of view,
the names of the stats should be as self-descriptive as possible and should
not require additional documentation.

BTW, you will not find any description of the statistics provided by the Linux
or DPDK drivers.  You can only look at the driver code or the device spec.

Best regards, Ilya Maximets.
Li,Rongqing via dev Oct. 16, 2019, 2:20 p.m. UTC | #8
Hi Kevin & Ilya,
Thanks Kevin for reviewing the patch. Response inline.
Thanks Ilya for your response.

-----Original Message-----
From: Ilya Maximets <i.maximets@ovn.org>
Sent: 16 October 2019 17:46
To: Kevin Traynor <ktraynor@redhat.com>; Sriram Vatala 
<sriram.v@altencalsoftlabs.com>; ovs-dev@openvswitch.org; Ilya Maximets 
<i.maximets@ovn.org>
Cc: Stokes, Ian <ian.stokes@intel.com>
Subject: Re: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics

Hi Kevin,

Thanks for review. Some comments inline.

On 16.10.2019 12:15, Kevin Traynor wrote:
> Hi Sriram,
>
> Thanks for working on making more fine grained stats for debugging. As
> mentioned yesterday, this patch needs rebase so I just reviewed docs
> and netdev-dpdk.c which applied. Comments below.
>
> On 21/09/2019 03:40, Sriram Vatala wrote:
>> OVS may be unable to transmit packets for multiple reasons and today
>> there is a single counter to track packets dropped due to any of
>> those reasons. The most common reason is that a VM is unable to read
>> packets fast enough causing the vhostuser port transmit queue on the
>> OVS side to become full. This manifests as a problem with VNFs not
>> receiving all packets. Having a separate drop counter to track
>> packets dropped because the transmit queue is full will clearly
>> indicate that the problem is on the VM side and not in OVS. Similarly
>> maintaining separate
>

> It reads like a bit of a contradiction. On one hand, "the *most common*
> reason is that a VM is unable to read packets fast enough", while on the
> other hand, having a stat "*will clearly indicate* that the problem is on
> the VM side".
>

    I mentioned one use case to emphasize the need for separate drop stats
besides the combined drop counter. I will rephrase the commit message.

>> counters for all possible drops helps in indicating sensible cause
>> for packet drops.
>>
>> This patch adds custom software stats counters to track packets
>> dropped at port level and these counters are displayed along with
>> other stats in "ovs-vsctl get interface <iface> statistics"
>> command. The detailed stats will be available for both dpdk and
>> vhostuser ports.
>>
>
> I think the commit msg could be a bit clearer, suggest something like
> below - feel free to modify (or reject):
>
> OVS may be unable to transmit packets for multiple reasons on the DPDK
> datapath and today there is a single counter to track packets dropped
> due to any of those reasons.
>
> This patch adds custom software stats for the different reasons
> packets may be dropped during tx on the DPDK datapath in OVS.
>
> - MTU drops : drops that occur due to a too large packet size
> - Qos drops: drops that occur due to egress QOS
> - Tx failures: drops as returned by the DPDK PMD send function
>
> Note that the reason for tx failures is not specified in OVS. In
> practice for vhost ports it is most common that tx failures are
> because there are not enough available descriptors, which is usually
> caused by misconfiguration of the guest queues and/or because the
> guest is not consuming packets fast enough from the queues.
>
> These counters are displayed along with other stats in "ovs-vsctl get
> interface <iface> statistics" command and are available for dpdk and
> vhostuser/vhostuserclient ports.
>
    Looks good to me. Thanks for the suggestion.
> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>
> ---
>
> v9:...
> v8:...
>
>
> Note that signed-off, --- and version info should be like this ^^^.
> otherwise the review version comments will be in the git commit log
> when it is applied.
>
   I will take care of this in my next patch.
>> --
>> Changes since v8:
>> Addressed comments given by Ilya.
>>
>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>> ---
>>   Documentation/topics/dpdk/vhost-user.rst      | 13 ++-
>>   lib/netdev-dpdk.c                             | 81 +++++++++++++++----
>>   utilities/bugtool/automake.mk                 |  3 +-
>>   utilities/bugtool/ovs-bugtool-get-port-stats  | 25 ++++++
>>   .../plugins/network-status/openvswitch.xml    |  1 +
>>   5 files changed, 105 insertions(+), 18 deletions(-)
>>   create mode 100755 utilities/bugtool/ovs-bugtool-get-port-stats
>>
>> diff --git a/Documentation/topics/dpdk/vhost-user.rst
>> b/Documentation/topics/dpdk/vhost-user.rst
>> index 724aa62f6..89388a2bf 100644
>> --- a/Documentation/topics/dpdk/vhost-user.rst
>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>> @@ -551,7 +551,18 @@ processing packets at the required rate.
>>   The amount of Tx retries on a vhost-user or vhost-user-client interface 
>> can be
>>   shown with::
>>
>> -  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_retries
>> +  $ ovs-vsctl get Interface dpdkvhostclient0
>> + statistics:netdev_dpdk_tx_retries
>
> I think the "netdev_dpdk_" prefixes should be removed for a few
> reasons,
>
> - These are custom stats that will only be displayed for the dpdk
> ports so they don't need any extra prefix to say they are for dpdk
> ports
>
> - The 'tx_retries' stat is part of OVS 2.12, I don't like to change
> its name to a different one in OVS 2.13 unless there is a very good
> reason
>
> - The existing stats don't have this type of prefixes (granted most of
> them are general stats):

The main reason for the prefix, for me, is to distinguish those stats
from the stats that come from the driver/HW.  Our new stats have very
generic names that could look a lot like, or even be equal to, HW stats.
This might not be an issue for vhostuser, but for HW NICs this may be
really confusing for users.

I'm not insisting on 'netdev_dpdk_' prefix, but, IMHO, we need a way to 
clearly distinguish those stats.  We may use different prefixes like 'sw_' or 
just 'ovs_'.

As Ilya has already mentioned, the reason for using the prefix is to
distinguish the custom stats from the driver stats. I am not insisting on
this. Please share your thoughts; I can go with whichever approach works
for everyone.

> # ovs-vsctl get Interface dpdkvhost0 statistics
> {"rx_1024_to_1522_packets"=0, "rx_128_to_255_packets"=0,
> "rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=25622176,
> "rx_256_to_511_packets"=0, "rx_512_to_1023_packets"=0,
> "rx_65_to_127_packets"=0, rx_bytes=1537330560, rx_dropped=0,
> rx_errors=0, rx_packets=25622176, tx_bytes=3829825920, tx_dropped=0,
> tx_packets=63830432, tx_retries=0}
>
> Also, just to note that if there are changes to existing
> interfaces/behaviour it should really mention that in the commit
> message so it is highlighted.
>
OK. There are no changes to existing interface stats or behaviour except that
the name of "tx_retries" is changed. This might change based on the final
approach to stats naming (prefix/non-prefix) that we agree on. I will
definitely mention it in the commit message if there are any changes.
>> +
>> +When the guest is not able to consume the packets fast enough, the
>> +transmit queue of the port gets filled up i.e queue runs out of free 
>> descriptors.
>> +This is the most likely reason why dpdk transmit API will fail to
>> +send packets besides other reasons.
>> +
>> +The amount of tx failure drops on a dpdk vhost/physical interface
>> +can be shown with::
>> +
>> +  $ ovs-vsctl get Interface dpdkvhostclient0 \
>> +                            statistics:netdev_dpdk_tx_failure_drops
>>
>
> The commit msg/docs are only focusing on one stat for vhost ports, but
> there are other stats and dpdk ports, so they should all get some mention.

I don't feel comfortable documenting custom stats.  This is simply because
we can't describe them all.  These are things that will change over time.
And we don't really know what some types of stats mean, or whether they mean
the same thing for different types of HW NICs (they definitely don't) or OVS
netdevs (even within netdev-dpdk).
And that is one more reason to have a prefix for OVS internal statistics,
over which we have at least partial control.

I think that user documentation could mention some special statistics while 
describing troubleshooting for some hard special cases, but it should 
not be a glossary of all the possible custom stats.  From my point of view, 
the names of the stats should be as self-descriptive as possible and should not 
require additional documentation.

BTW, you will not find any description of the statistics provided by the Linux or 
DPDK drivers.  You can only look at the driver code or the device spec.

I felt that "tx failure drops" needs some explanation, unlike the other stats, 
because the names of the other stats are self-explanatory to the user; besides, 
I have added a brief description for every counter in comment lines above the 
stats structure. Hence I documented only one stat. I am not insisting on this. 
Please share your thoughts.

Thanks & Regards,
Sriram.
Kevin Traynor Oct. 16, 2019, 5:48 p.m. UTC | #9
On 16/10/2019 13:16, Ilya Maximets wrote:
> Hi Kevin,
> 
> Thanks for review. Some comments inline.
> 
> On 16.10.2019 12:15, Kevin Traynor wrote:
>> Hi Sriram,
>>
>> Thanks for working on making more fine grained stats for debugging. As
>> mentioned yesterday, this patch needs rebase so I just reviewed docs and
>> netdev-dpdk.c which applied. Comments below.
>>
>> On 21/09/2019 03:40, Sriram Vatala wrote:
>>> OVS may be unable to transmit packets for multiple reasons and
>>> today there is a single counter to track packets dropped due to
>>> any of those reasons. The most common reason is that a VM is
>>> unable to read packets fast enough causing the vhostuser port
>>> transmit queue on the OVS side to become full. This manifests
>>> as a problem with VNFs not receiving all packets. Having a
>>> separate drop counter to track packets dropped because the
>>> transmit queue is full will clearly indicate that the problem
>>> is on the VM side and not in OVS. Similarly maintaining separate
>>
>> It reads like a bit of a contradiction. On one hand the "The *most
>> common* reason is that a VM is unable to read packets fast enough".
>> While having a stat "*will clearly indicate* that the problem is on the
>> VM side".
>>
>>> counters for all possible drops helps in indicating sensible
>>> cause for packet drops.
>>>
>>> This patch adds custom software stats counters to track packets
>>> dropped at port level and these counters are displayed along with
>>> other stats in "ovs-vsctl get interface <iface> statistics"
>>> command. The detailed stats will be available for both dpdk and
>>> vhostuser ports.
>>>
>>
>> I think the commit msg could be a bit clearer, suggest something like
>> below - feel free to modify (or reject):
>>
>> OVS may be unable to transmit packets for multiple reasons on the DPDK
>> datapath and today there is a single counter to track packets dropped
>> due to any of those reasons.
>>
>> This patch adds custom software stats for the different reasons packets
>> may be dropped during tx on the DPDK datapath in OVS.
>>
>> - MTU drops : drops that occur due to a too large packet size
>> - Qos drops: drops that occur due to egress QOS
>> - Tx failures: drops as returned by the DPDK PMD send function
>>
>> Note that the reason for tx failures is not specificied in OVS. In
>> practice for vhost ports it is most common that tx failures are because
>> there are not enough available descriptors, which is usually caused by
>> misconfiguration of the guest queues and/or because the guest is not
>> consuming packets fast enough from the queues.
>>
>> These counters are displayed along with other stats in "ovs-vsctl get
>> interface <iface> statistics" command and are available for dpdk and
>> vhostuser/vhostuserclient ports.
>>
>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>>
>> ---
>>
>> v9:...
>> v8:...
>>
>>
>> Note that signed-off, --- and version info should be like this ^^^.
>> otherwise the review version comments will be in the git commit log when
>> it is applied.
>>
>>> --
>>> Changes since v8:
>>> Addressed comments given by Ilya.
>>>
>>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>>> ---
>>>   Documentation/topics/dpdk/vhost-user.rst      | 13 ++-
>>>   lib/netdev-dpdk.c                             | 81 +++++++++++++++----
>>>   utilities/bugtool/automake.mk                 |  3 +-
>>>   utilities/bugtool/ovs-bugtool-get-port-stats  | 25 ++++++
>>>   .../plugins/network-status/openvswitch.xml    |  1 +
>>>   5 files changed, 105 insertions(+), 18 deletions(-)
>>>   create mode 100755 utilities/bugtool/ovs-bugtool-get-port-stats
>>>
>>> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
>>> index 724aa62f6..89388a2bf 100644
>>> --- a/Documentation/topics/dpdk/vhost-user.rst
>>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>>> @@ -551,7 +551,18 @@ processing packets at the required rate.
>>>   The amount of Tx retries on a vhost-user or vhost-user-client interface can be
>>>   shown with::
>>>   
>>> -  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_retries
>>> +  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:netdev_dpdk_tx_retries
>>
>> I think the "netdev_dpdk_" prefixes should be removed for a few reasons,
>>
>> - These are custom stats that will only be displayed for the dpdk ports
>> so they don't need any extra prefix to say they are for dpdk ports
>>
>> - The 'tx_retries' stat is part of OVS 2.12, I don't like to change its
>> name to a different one in OVS 2.13 unless there is a very good reason
>>
>> - The existing stats don't have this type of prefixes (granted most of
>> them are general stats):
> 
> The main reason for the prefix for me is to distinguish those stats
> from the stats that comest from the driver/HW.  Our new stats has
> very generic names that could be named a lot like or even equal to
> HW stats.  This might be not an issue for vhostuser, but for HW
> NICs this may be really confusing for users.
> 
> I'm not insisting on 'netdev_dpdk_' prefix, but, IMHO, we need a
> way to clearly distinguish those stats.  We may use different prefixes
> like 'sw_' or just 'ovs_'.
> 

ok, I can see that we might want to distinguish custom stats to indicate
that they are specific for that device, but that also has the side effect
of making them less self-descriptive, and the user will have to be told
somewhere what the prefix means.

'sw_' and 'ovs_' seem better than 'netdev_dpdk_', but they might be
confusing too: for example, in the DPDK case, Tx drops are calculated in
sw/ovs the same way as qos/mtu/txfails but won't have that prefix. If we
really need a prefix, 'cstat_' is the best I can think of.

>> # ovs-vsctl get Interface dpdkvhost0 statistics
>> {"rx_1024_to_1522_packets"=0, "rx_128_to_255_packets"=0,
>> "rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=25622176,
>> "rx_256_to_511_packets"=0, "rx_512_to_1023_packets"=0,
>> "rx_65_to_127_packets"=0, rx_bytes=1537330560, rx_dropped=0,
>> rx_errors=0, rx_packets=25622176, tx_bytes=3829825920, tx_dropped=0,
>> tx_packets=63830432, tx_retries=0}
>>
>> Also, just to note that if there are changes to existing
>> interfaces/behaviour it should really mention that in the commit message
>> so it is highlighted.
>>
>>> +
>>> +When the guest is not able to consume the packets fast enough, the transmit
>>> +queue of the port gets filled up i.e queue runs out of free descriptors.
>>> +This is the most likely reason why dpdk transmit API will fail to send packets
>>> +besides other reasons.
>>> +
>>> +The amount of tx failure drops on a dpdk vhost/physical interface can be
>>> +shown with::
>>> +
>>> +  $ ovs-vsctl get Interface dpdkvhostclient0 \
>>> +                            statistics:netdev_dpdk_tx_failure_drops
>>>   
>>
>> The commit msg/docs are only focusing on one stat for vhost ports, but
>> there are other stats and dpdk ports, so they should all get some mention.
> 
> I don't feel comfortable for documenting custom stats.  This is just because
> we can't describe them all.  These are things to be changed over time.
> And we don't really know what some types of stats means and if they means the
> same for different types of HW NICs (they are definitely not) or OVS netdevs
> (even within netdev-dpdk).
> And that is one more reason to have a prefix for OVS internal statistics on
> which we have at least partial control.
> 
> I think, that user documentation could mention some special statistics while
> describing the troubleshooting for some hard special cases, but this should
> not be a glossary of all the possible custom stats.  From my point of view,
> names of the stats should be as possible self-descriptive and should not
> require additional documentation.
> 

For sure any prefix would need to be explained because that part would
not be intuitive.

thanks,
Kevin.

> BTW, you will not find any description for statistics provided by the linux
> or DPDK drivers.  You could only look at the driver code or device spec.
> 
> Best regards, Ilya Maximets.
>
Ben Pfaff Oct. 16, 2019, 10:20 p.m. UTC | #10
On Wed, Oct 16, 2019 at 02:16:14PM +0200, Ilya Maximets wrote:
> BTW, you will not find any description for statistics provided by the linux
> or DPDK drivers.  You could only look at the driver code or device spec.

I'm not sure the meanings are even consistent.  I've heard that
different Linux drivers count the number of bytes in a packet
differently, for example.
Li,Rongqing via dev Oct. 18, 2019, 11:59 a.m. UTC | #11
Hi,
Thanks, Kevin, for your suggestion.
Ilya, can you please also share your opinion so that we can reach a 
conclusion on the approach?

Thanks & Regards,
Sriram.

-----Original Message-----
From: Kevin Traynor <ktraynor@redhat.com>
Sent: 16 October 2019 23:18
To: Ilya Maximets <i.maximets@ovn.org>; Sriram Vatala 
<sriram.v@altencalsoftlabs.com>; ovs-dev@openvswitch.org
Cc: Stokes, Ian <ian.stokes@intel.com>
Subject: Re: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics

On 16/10/2019 13:16, Ilya Maximets wrote:
> Hi Kevin,
>
> Thanks for review. Some comments inline.
>
> On 16.10.2019 12:15, Kevin Traynor wrote:
>> Hi Sriram,
>>
>> Thanks for working on making more fine grained stats for debugging.
>> As mentioned yesterday, this patch needs rebase so I just reviewed
>> docs and netdev-dpdk.c which applied. Comments below.
>>
>> On 21/09/2019 03:40, Sriram Vatala wrote:
>>> OVS may be unable to transmit packets for multiple reasons and today
>>> there is a single counter to track packets dropped due to any of
>>> those reasons. The most common reason is that a VM is unable to read
>>> packets fast enough causing the vhostuser port transmit queue on the
>>> OVS side to become full. This manifests as a problem with VNFs not
>>> receiving all packets. Having a separate drop counter to track
>>> packets dropped because the transmit queue is full will clearly
>>> indicate that the problem is on the VM side and not in OVS.
>>> Similarly maintaining separate
>>
>> It reads like a bit of a contradiction. On one hand the "The *most
>> common* reason is that a VM is unable to read packets fast enough".
>> While having a stat "*will clearly indicate* that the problem is on
>> the VM side".
>>
>>> counters for all possible drops helps in indicating sensible cause
>>> for packet drops.
>>>
>>> This patch adds custom software stats counters to track packets
>>> dropped at port level and these counters are displayed along with
>>> other stats in "ovs-vsctl get interface <iface> statistics"
>>> command. The detailed stats will be available for both dpdk and
>>> vhostuser ports.
>>>
>>
>> I think the commit msg could be a bit clearer, suggest something like
>> below - feel free to modify (or reject):
>>
>> OVS may be unable to transmit packets for multiple reasons on the
>> DPDK datapath and today there is a single counter to track packets
>> dropped due to any of those reasons.
>>
>> This patch adds custom software stats for the different reasons
>> packets may be dropped during tx on the DPDK datapath in OVS.
>>
>> - MTU drops : drops that occur due to a too large packet size
>> - Qos drops: drops that occur due to egress QOS
>> - Tx failures: drops as returned by the DPDK PMD send function
>>
>> Note that the reason for tx failures is not specificied in OVS. In
>> practice for vhost ports it is most common that tx failures are
>> because there are not enough available descriptors, which is usually
>> caused by misconfiguration of the guest queues and/or because the
>> guest is not consuming packets fast enough from the queues.
>>
>> These counters are displayed along with other stats in "ovs-vsctl get
>> interface <iface> statistics" command and are available for dpdk and
>> vhostuser/vhostuserclient ports.
>>
>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>>
>> ---
>>
>> v9:...
>> v8:...
>>
>>
>> Note that signed-off, --- and version info should be like this ^^^.
>> otherwise the review version comments will be in the git commit log
>> when it is applied.
>>
>>> --
>>> Changes since v8:
>>> Addressed comments given by Ilya.
>>>
>>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>>> ---
>>>   Documentation/topics/dpdk/vhost-user.rst      | 13 ++-
>>>   lib/netdev-dpdk.c                             | 81 +++++++++++++++----
>>>   utilities/bugtool/automake.mk                 |  3 +-
>>>   utilities/bugtool/ovs-bugtool-get-port-stats  | 25 ++++++
>>>   .../plugins/network-status/openvswitch.xml    |  1 +
>>>   5 files changed, 105 insertions(+), 18 deletions(-)
>>>   create mode 100755 utilities/bugtool/ovs-bugtool-get-port-stats
>>>
>>> diff --git a/Documentation/topics/dpdk/vhost-user.rst
>>> b/Documentation/topics/dpdk/vhost-user.rst
>>> index 724aa62f6..89388a2bf 100644
>>> --- a/Documentation/topics/dpdk/vhost-user.rst
>>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>>> @@ -551,7 +551,18 @@ processing packets at the required rate.
>>>   The amount of Tx retries on a vhost-user or vhost-user-client interface 
>>> can be
>>>   shown with::
>>>
>>> -  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_retries
>>> +  $ ovs-vsctl get Interface dpdkvhostclient0
>>> + statistics:netdev_dpdk_tx_retries
>>
>> I think the "netdev_dpdk_" prefixes should be removed for a few
>> reasons,
>>
>> - These are custom stats that will only be displayed for the dpdk
>> ports so they don't need any extra prefix to say they are for dpdk
>> ports
>>
>> - The 'tx_retries' stat is part of OVS 2.12, I don't like to change
>> its name to a different one in OVS 2.13 unless there is a very good
>> reason
>>
>> - The existing stats don't have this type of prefixes (granted most
>> of them are general stats):
>
> The main reason for the prefix for me is to distinguish those stats
> from the stats that comest from the driver/HW.  Our new stats has very
> generic names that could be named a lot like or even equal to HW
> stats.  This might be not an issue for vhostuser, but for HW NICs this
> may be really confusing for users.
>
> I'm not insisting on 'netdev_dpdk_' prefix, but, IMHO, we need a way
> to clearly distinguish those stats.  We may use different prefixes
> like 'sw_' or just 'ovs_'.
>

ok, I can see that we might want to distinguish custom stats to indicate that 
they are specific for that device but that also has the side effect of making 
them less self-descriptive and the user will have to be told somewhere what 
the prefix means.

'sw_' and 'ovs_' seem better than 'netdev_dpdk_' but it might be confusing 
too, as for example in DPDK case, Tx drops are calculated in sw/ovs the same 
as qos/mtu/txfails but won't have that prefix. If we really need a prefix 
'cstat_' is the best I can think of.

>> # ovs-vsctl get Interface dpdkvhost0 statistics
>> {"rx_1024_to_1522_packets"=0, "rx_128_to_255_packets"=0,
>> "rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=25622176,
>> "rx_256_to_511_packets"=0, "rx_512_to_1023_packets"=0,
>> "rx_65_to_127_packets"=0, rx_bytes=1537330560, rx_dropped=0,
>> rx_errors=0, rx_packets=25622176, tx_bytes=3829825920, tx_dropped=0,
>> tx_packets=63830432, tx_retries=0}
>>
>> Also, just to note that if there are changes to existing
>> interfaces/behaviour it should really mention that in the commit
>> message so it is highlighted.
>>
>>> +
>>> +When the guest is not able to consume the packets fast enough, the
>>> +transmit queue of the port gets filled up i.e queue runs out of free 
>>> descriptors.
>>> +This is the most likely reason why dpdk transmit API will fail to
>>> +send packets besides other reasons.
>>> +
>>> +The amount of tx failure drops on a dpdk vhost/physical interface
>>> +can be shown with::
>>> +
>>> +  $ ovs-vsctl get Interface dpdkvhostclient0 \
>>> +                            statistics:netdev_dpdk_tx_failure_drops
>>>
>>
>> The commit msg/docs are only focusing on one stat for vhost ports,
>> but there are other stats and dpdk ports, so they should all get some 
>> mention.
>
> I don't feel comfortable for documenting custom stats.  This is just
> because we can't describe them all.  These are things to be changed over 
> time.
> And we don't really know what some types of stats means and if they
> means the same for different types of HW NICs (they are definitely
> not) or OVS netdevs (even within netdev-dpdk).
> And that is one more reason to have a prefix for OVS internal
> statistics on which we have at least partial control.
>
> I think, that user documentation could mention some special statistics
> while describing the troubleshooting for some hard special cases, but
> this should not be a glossary of all the possible custom stats.  From
> my point of view, names of the stats should be as possible
> self-descriptive and should not require additional documentation.
>

For sure any prefix would need to be explained because that part would not be 
intuitive.

thanks,
Kevin.

> BTW, you will not find any description for statistics provided by the
> linux or DPDK drivers.  You could only look at the driver code or device 
> spec.
>
> Best regards, Ilya Maximets.
>
Ilya Maximets Oct. 18, 2019, 12:59 p.m. UTC | #12
On 16.10.2019 19:48, Kevin Traynor wrote:
> On 16/10/2019 13:16, Ilya Maximets wrote:
>> Hi Kevin,
>>
>> Thanks for review. Some comments inline.
>>
>> On 16.10.2019 12:15, Kevin Traynor wrote:
>>> Hi Sriram,
>>>
>>> Thanks for working on making more fine grained stats for debugging. As
>>> mentioned yesterday, this patch needs rebase so I just reviewed docs and
>>> netdev-dpdk.c which applied. Comments below.
>>>
>>> On 21/09/2019 03:40, Sriram Vatala wrote:
>>>> OVS may be unable to transmit packets for multiple reasons and
>>>> today there is a single counter to track packets dropped due to
>>>> any of those reasons. The most common reason is that a VM is
>>>> unable to read packets fast enough causing the vhostuser port
>>>> transmit queue on the OVS side to become full. This manifests
>>>> as a problem with VNFs not receiving all packets. Having a
>>>> separate drop counter to track packets dropped because the
>>>> transmit queue is full will clearly indicate that the problem
>>>> is on the VM side and not in OVS. Similarly maintaining separate
>>>
>>> It reads like a bit of a contradiction. On one hand the "The *most
>>> common* reason is that a VM is unable to read packets fast enough".
>>> While having a stat "*will clearly indicate* that the problem is on the
>>> VM side".
>>>
>>>> counters for all possible drops helps in indicating sensible
>>>> cause for packet drops.
>>>>
>>>> This patch adds custom software stats counters to track packets
>>>> dropped at port level and these counters are displayed along with
>>>> other stats in "ovs-vsctl get interface <iface> statistics"
>>>> command. The detailed stats will be available for both dpdk and
>>>> vhostuser ports.
>>>>
>>>
>>> I think the commit msg could be a bit clearer, suggest something like
>>> below - feel free to modify (or reject):
>>>
>>> OVS may be unable to transmit packets for multiple reasons on the DPDK
>>> datapath and today there is a single counter to track packets dropped
>>> due to any of those reasons.
>>>
>>> This patch adds custom software stats for the different reasons packets
>>> may be dropped during tx on the DPDK datapath in OVS.
>>>
>>> - MTU drops : drops that occur due to a too large packet size
>>> - Qos drops: drops that occur due to egress QOS
>>> - Tx failures: drops as returned by the DPDK PMD send function
>>>
>>> Note that the reason for tx failures is not specificied in OVS. In
>>> practice for vhost ports it is most common that tx failures are because
>>> there are not enough available descriptors, which is usually caused by
>>> misconfiguration of the guest queues and/or because the guest is not
>>> consuming packets fast enough from the queues.
>>>
>>> These counters are displayed along with other stats in "ovs-vsctl get
>>> interface <iface> statistics" command and are available for dpdk and
>>> vhostuser/vhostuserclient ports.
>>>
>>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>>>
>>> ---
>>>
>>> v9:...
>>> v8:...
>>>
>>>
>>> Note that signed-off, --- and version info should be like this ^^^.
>>> otherwise the review version comments will be in the git commit log when
>>> it is applied.
>>>
>>>> --
>>>> Changes since v8:
>>>> Addressed comments given by Ilya.
>>>>
>>>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>>>> ---
>>>>    Documentation/topics/dpdk/vhost-user.rst      | 13 ++-
>>>>    lib/netdev-dpdk.c                             | 81 +++++++++++++++----
>>>>    utilities/bugtool/automake.mk                 |  3 +-
>>>>    utilities/bugtool/ovs-bugtool-get-port-stats  | 25 ++++++
>>>>    .../plugins/network-status/openvswitch.xml    |  1 +
>>>>    5 files changed, 105 insertions(+), 18 deletions(-)
>>>>    create mode 100755 utilities/bugtool/ovs-bugtool-get-port-stats
>>>>
>>>> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
>>>> index 724aa62f6..89388a2bf 100644
>>>> --- a/Documentation/topics/dpdk/vhost-user.rst
>>>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>>>> @@ -551,7 +551,18 @@ processing packets at the required rate.
>>>>    The amount of Tx retries on a vhost-user or vhost-user-client interface can be
>>>>    shown with::
>>>>    
>>>> -  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_retries
>>>> +  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:netdev_dpdk_tx_retries
>>>
>>> I think the "netdev_dpdk_" prefixes should be removed for a few reasons,
>>>
>>> - These are custom stats that will only be displayed for the dpdk ports
>>> so they don't need any extra prefix to say they are for dpdk ports
>>>
>>> - The 'tx_retries' stat is part of OVS 2.12, I don't like to change its
>>> name to a different one in OVS 2.13 unless there is a very good reason
>>>
>>> - The existing stats don't have this type of prefixes (granted most of
>>> them are general stats):
>>
>> The main reason for the prefix for me is to distinguish those stats
>> from the stats that comest from the driver/HW.  Our new stats has
>> very generic names that could be named a lot like or even equal to
>> HW stats.  This might be not an issue for vhostuser, but for HW
>> NICs this may be really confusing for users.
>>
>> I'm not insisting on 'netdev_dpdk_' prefix, but, IMHO, we need a
>> way to clearly distinguish those stats.  We may use different prefixes
>> like 'sw_' or just 'ovs_'.
>>
> 
> ok, I can see that we might want to distinguish custom stats to indicate
> that they are specific for that device but that also has the side effect
> of making them less self-descriptive and the user will have to be told
> somewhere what the prefix means.
> 
> 'sw_' and 'ovs_' seem better than 'netdev_dpdk_' but it might be
> confusing too, as for example in DPDK case, Tx drops are calculated in
> sw/ovs the same as qos/mtu/txfails but won't have that prefix. If we
> really need a prefix 'cstat_' is the best I can think of.

What we're trying to solve here is to distinguish stats that are counted
internally in OVS from the stats provided by third parties,
simply so that they don't have the same or similar names.

The usual stats from 'netdev_stats' are combined from various sources.
For example, 'rx_dropped' in netdev-dpdk is a combination of our
QoS drops from OVS and HW counters like rte_stats.rx_nombuf.
For me it looks like for these stats we have a kind of best-effort
policy to provide as full information as possible.  Trying to describe
how each of these counters is calculated doesn't make sense because
we do not know the origin of many of the components that come
from DPDK or HW.
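
As a purely illustrative, self-contained sketch (this is not the actual
netdev-dpdk code; the struct and field names below are hypothetical, apart
from the rx_nombuf idea), the kind of combination described above looks
roughly like this:

    /* Illustrative sketch only: shows how a single reported counter such as
     * 'rx_dropped' can combine OVS-internal drops with counters read from
     * the DPDK driver.  Struct and field names here are hypothetical. */
    #include <inttypes.h>
    #include <stdio.h>

    struct sw_stats {               /* drops that OVS itself counted */
        uint64_t rx_qos_drops;
    };

    struct drv_stats {              /* counters reported by the driver/HW */
        uint64_t rx_nombuf;         /* rx failures due to mbuf allocation */
    };

    int main(void)
    {
        struct sw_stats sw = { .rx_qos_drops = 7 };
        struct drv_stats drv = { .rx_nombuf = 3 };

        /* The combined value hides which component actually dropped packets,
         * which is why separately named custom stats are easier to read. */
        uint64_t rx_dropped = sw.rx_qos_drops + drv.rx_nombuf;

        printf("rx_dropped=%" PRIu64 "\n", rx_dropped);
        return 0;
    }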

> 
>>> # ovs-vsctl get Interface dpdkvhost0 statistics
>>> {"rx_1024_to_1522_packets"=0, "rx_128_to_255_packets"=0,
>>> "rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=25622176,
>>> "rx_256_to_511_packets"=0, "rx_512_to_1023_packets"=0,
>>> "rx_65_to_127_packets"=0, rx_bytes=1537330560, rx_dropped=0,
>>> rx_errors=0, rx_packets=25622176, tx_bytes=3829825920, tx_dropped=0,
>>> tx_packets=63830432, tx_retries=0}
>>>
>>> Also, just to note that if there are changes to existing
>>> interfaces/behaviour it should really mention that in the commit message
>>> so it is highlighted.
>>>
>>>> +
>>>> +When the guest is not able to consume the packets fast enough, the transmit
>>>> +queue of the port gets filled up i.e queue runs out of free descriptors.
>>>> +This is the most likely reason why dpdk transmit API will fail to send packets
>>>> +besides other reasons.
>>>> +
>>>> +The amount of tx failure drops on a dpdk vhost/physical interface can be
>>>> +shown with::
>>>> +
>>>> +  $ ovs-vsctl get Interface dpdkvhostclient0 \
>>>> +                            statistics:netdev_dpdk_tx_failure_drops
>>>>    
>>>
>>> The commit msg/docs are only focusing on one stat for vhost ports, but
>>> there are other stats and dpdk ports, so they should all get some mention.
>>
>> I don't feel comfortable for documenting custom stats.  This is just because
>> we can't describe them all.  These are things to be changed over time.
>> And we don't really know what some types of stats means and if they means the
>> same for different types of HW NICs (they are definitely not) or OVS netdevs
>> (even within netdev-dpdk).
>> And that is one more reason to have a prefix for OVS internal statistics on
>> which we have at least partial control.
>>
>> I think, that user documentation could mention some special statistics while
>> describing the troubleshooting for some hard special cases, but this should
>> not be a glossary of all the possible custom stats.  From my point of view,
>> names of the stats should be as possible self-descriptive and should not
>> require additional documentation.
>>
> 
> For sure any prefix would need to be explained because that part would
> not be intuitive.

We could add a note about that to the common DPDK guide in a single sentence:
"There are custom statistics that OVS accumulates itself and these stats
have 'your_prefix_here_' as a prefix."

'cstat_' or 'sw_' might need this, but 'netdev_dpdk_' or 'ovs_' look
self-descriptive enough not to need it.

> 
> thanks,
> Kevin.
> 
>> BTW, you will not find any description for statistics provided by the linux
>> or DPDK drivers.  You could only look at the driver code or device spec.
>>
>> Best regards, Ilya Maximets.
>>
>
Kevin Traynor Oct. 21, 2019, 2:52 p.m. UTC | #13
On 18/10/2019 13:59, Ilya Maximets wrote:
> On 16.10.2019 19:48, Kevin Traynor wrote:
>> On 16/10/2019 13:16, Ilya Maximets wrote:
>>> Hi Kevin,
>>>
>>> Thanks for review. Some comments inline.
>>>
>>> On 16.10.2019 12:15, Kevin Traynor wrote:
>>>> Hi Sriram,
>>>>
>>>> Thanks for working on making more fine grained stats for debugging. As
>>>> mentioned yesterday, this patch needs rebase so I just reviewed docs and
>>>> netdev-dpdk.c which applied. Comments below.
>>>>
>>>> On 21/09/2019 03:40, Sriram Vatala wrote:
>>>>> OVS may be unable to transmit packets for multiple reasons and
>>>>> today there is a single counter to track packets dropped due to
>>>>> any of those reasons. The most common reason is that a VM is
>>>>> unable to read packets fast enough causing the vhostuser port
>>>>> transmit queue on the OVS side to become full. This manifests
>>>>> as a problem with VNFs not receiving all packets. Having a
>>>>> separate drop counter to track packets dropped because the
>>>>> transmit queue is full will clearly indicate that the problem
>>>>> is on the VM side and not in OVS. Similarly maintaining separate
>>>>
>>>> It reads like a bit of a contradiction. On one hand the "The *most
>>>> common* reason is that a VM is unable to read packets fast enough".
>>>> While having a stat "*will clearly indicate* that the problem is on the
>>>> VM side".
>>>>
>>>>> counters for all possible drops helps in indicating sensible
>>>>> cause for packet drops.
>>>>>
>>>>> This patch adds custom software stats counters to track packets
>>>>> dropped at port level and these counters are displayed along with
>>>>> other stats in "ovs-vsctl get interface <iface> statistics"
>>>>> command. The detailed stats will be available for both dpdk and
>>>>> vhostuser ports.
>>>>>
>>>>
>>>> I think the commit msg could be a bit clearer, suggest something like
>>>> below - feel free to modify (or reject):
>>>>
>>>> OVS may be unable to transmit packets for multiple reasons on the DPDK
>>>> datapath and today there is a single counter to track packets dropped
>>>> due to any of those reasons.
>>>>
>>>> This patch adds custom software stats for the different reasons packets
>>>> may be dropped during tx on the DPDK datapath in OVS.
>>>>
>>>> - MTU drops : drops that occur due to a too large packet size
>>>> - Qos drops: drops that occur due to egress QOS
>>>> - Tx failures: drops as returned by the DPDK PMD send function
>>>>
>>>> Note that the reason for tx failures is not specificied in OVS. In
>>>> practice for vhost ports it is most common that tx failures are because
>>>> there are not enough available descriptors, which is usually caused by
>>>> misconfiguration of the guest queues and/or because the guest is not
>>>> consuming packets fast enough from the queues.
>>>>
>>>> These counters are displayed along with other stats in "ovs-vsctl get
>>>> interface <iface> statistics" command and are available for dpdk and
>>>> vhostuser/vhostuserclient ports.
>>>>
>>>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>>>>
>>>> ---
>>>>
>>>> v9:...
>>>> v8:...
>>>>
>>>>
>>>> Note that signed-off, --- and version info should be like this ^^^.
>>>> otherwise the review version comments will be in the git commit log when
>>>> it is applied.
>>>>
>>>>> --
>>>>> Changes since v8:
>>>>> Addressed comments given by Ilya.
>>>>>
>>>>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>>>>> ---
>>>>>    Documentation/topics/dpdk/vhost-user.rst      | 13 ++-
>>>>>    lib/netdev-dpdk.c                             | 81 +++++++++++++++----
>>>>>    utilities/bugtool/automake.mk                 |  3 +-
>>>>>    utilities/bugtool/ovs-bugtool-get-port-stats  | 25 ++++++
>>>>>    .../plugins/network-status/openvswitch.xml    |  1 +
>>>>>    5 files changed, 105 insertions(+), 18 deletions(-)
>>>>>    create mode 100755 utilities/bugtool/ovs-bugtool-get-port-stats
>>>>>
>>>>> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
>>>>> index 724aa62f6..89388a2bf 100644
>>>>> --- a/Documentation/topics/dpdk/vhost-user.rst
>>>>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>>>>> @@ -551,7 +551,18 @@ processing packets at the required rate.
>>>>>    The amount of Tx retries on a vhost-user or vhost-user-client interface can be
>>>>>    shown with::
>>>>>    
>>>>> -  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_retries
>>>>> +  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:netdev_dpdk_tx_retries
>>>>
>>>> I think the "netdev_dpdk_" prefixes should be removed for a few reasons,
>>>>
>>>> - These are custom stats that will only be displayed for the dpdk ports
>>>> so they don't need any extra prefix to say they are for dpdk ports
>>>>
>>>> - The 'tx_retries' stat is part of OVS 2.12, I don't like to change its
>>>> name to a different one in OVS 2.13 unless there is a very good reason
>>>>
>>>> - The existing stats don't have this type of prefixes (granted most of
>>>> them are general stats):
>>>
>>> The main reason for the prefix for me is to distinguish those stats
>>> from the stats that comest from the driver/HW.  Our new stats has
>>> very generic names that could be named a lot like or even equal to
>>> HW stats.  This might be not an issue for vhostuser, but for HW
>>> NICs this may be really confusing for users.
>>>
>>> I'm not insisting on 'netdev_dpdk_' prefix, but, IMHO, we need a
>>> way to clearly distinguish those stats.  We may use different prefixes
>>> like 'sw_' or just 'ovs_'.
>>>
>>
>> ok, I can see that we might want to distinguish custom stats to indicate
>> that they are specific for that device but that also has the side effect
>> of making them less self-descriptive and the user will have to be told
>> somewhere what the prefix means.
>>
>> 'sw_' and 'ovs_' seem better than 'netdev_dpdk_' but it might be
>> confusing too, as for example in DPDK case, Tx drops are calculated in
>> sw/ovs the same as qos/mtu/txfails but won't have that prefix. If we
>> really need a prefix 'cstat_' is the best I can think of.
> 
> What we're trying to solve here is to distinguish stats that counted
> internally in OVS from the stats provided by the third parties.
> Just to not have same/similar names
> 
> Usual stats from 'netdev_stats' are combined from various sources.
> For example, 'rx_dropped' in netdev-dpdk is a combination of our
> QoS dorps from OVS and HW counters like rte_stats.rx_nombuf.
> For me it looks like for these stats we have kind of best-effort
> policy to provide as full info as possible.  Trying to describe
> how each of these counters calculated doesn't make sense because
> we do not know the origin of many of the components that comes
> from DPDK or HW.
> 
>>
>>>> # ovs-vsctl get Interface dpdkvhost0 statistics
>>>> {"rx_1024_to_1522_packets"=0, "rx_128_to_255_packets"=0,
>>>> "rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=25622176,
>>>> "rx_256_to_511_packets"=0, "rx_512_to_1023_packets"=0,
>>>> "rx_65_to_127_packets"=0, rx_bytes=1537330560, rx_dropped=0,
>>>> rx_errors=0, rx_packets=25622176, tx_bytes=3829825920, tx_dropped=0,
>>>> tx_packets=63830432, tx_retries=0}
>>>>
>>>> Also, just to note that if there are changes to existing
>>>> interfaces/behaviour it should really mention that in the commit message
>>>> so it is highlighted.
>>>>
>>>>> +
>>>>> +When the guest is not able to consume the packets fast enough, the transmit
>>>>> +queue of the port gets filled up i.e queue runs out of free descriptors.
>>>>> +This is the most likely reason why dpdk transmit API will fail to send packets
>>>>> +besides other reasons.
>>>>> +
>>>>> +The amount of tx failure drops on a dpdk vhost/physical interface can be
>>>>> +shown with::
>>>>> +
>>>>> +  $ ovs-vsctl get Interface dpdkvhostclient0 \
>>>>> +                            statistics:netdev_dpdk_tx_failure_drops
>>>>>    
>>>>
>>>> The commit msg/docs are only focusing on one stat for vhost ports, but
>>>> there are other stats and dpdk ports, so they should all get some mention.
>>>
>>> I don't feel comfortable for documenting custom stats.  This is just because
>>> we can't describe them all.  These are things to be changed over time.
>>> And we don't really know what some types of stats means and if they means the
>>> same for different types of HW NICs (they are definitely not) or OVS netdevs
>>> (even within netdev-dpdk).
>>> And that is one more reason to have a prefix for OVS internal statistics on
>>> which we have at least partial control.
>>>
>>> I think, that user documentation could mention some special statistics while
>>> describing the troubleshooting for some hard special cases, but this should
>>> not be a glossary of all the possible custom stats.  From my point of view,
>>> names of the stats should be as possible self-descriptive and should not
>>> require additional documentation.
>>>
>>
>> For sure any prefix would need to be explained because that part would
>> not be intuitive.
> 
> We could add a note to the common DPDK guide about that in a single sentence:
> "There are custom statistics that OVS accumulates itself and these stats
> has 'your_prefix_here_' as a prefix."
> 

yes, that is what I had in mind. It clarifies that it is a custom stat,
which explains why something like vhost tx_dropped does not have this
prefix even though it is also counted in ovs/sw/netdev_dpdk.

Anyway, I don't want Sriram to be held up over one line of
documentation, so I'm ok with whatever is decided.

thanks,
Kevin.

> 'cstat_' or 'sw_' might need this, but 'netdev_dpdk_' or 'ovs_' looks
> self-descriptive enough to not do that.
> 
>>
>> thanks,
>> Kevin.
>>
>>> BTW, you will not find any description for statistics provided by the linux
>>> or DPDK drivers.  You could only look at the driver code or device spec.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>
Li,Rongqing via dev Oct. 22, 2019, 8:25 a.m. UTC | #14
Hi Ilya & Kevin,
Thanks for your suggestions. I am summarizing our discussion; please feel free 
to correct me if I am wrong.

1) All custom stats calculated in OVS will have a prefix, so that these 
stats will not clash with the names of other stats (driver/HW, etc.).
2) The prefix used for these custom stats is "cstat_" (a rough sketch of the 
resulting names is shown below).
3) Instead of documenting all the stats, I will add a generic note about the 
prefix used for custom stats in "Documentation/topics/dpdk/vhost-user.rst", 
where "tx_retries" is documented.

Thanks & Regards,
Sriram.

-----Original Message-----
From: Kevin Traynor <ktraynor@redhat.com>
Sent: 21 October 2019 20:22
To: Ilya Maximets <i.maximets@ovn.org>; Sriram Vatala 
<sriram.v@altencalsoftlabs.com>; ovs-dev@openvswitch.org
Cc: Stokes, Ian <ian.stokes@intel.com>
Subject: Re: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics

On 18/10/2019 13:59, Ilya Maximets wrote:
> On 16.10.2019 19:48, Kevin Traynor wrote:
>> On 16/10/2019 13:16, Ilya Maximets wrote:
>>> Hi Kevin,
>>>
>>> Thanks for review. Some comments inline.
>>>
>>> On 16.10.2019 12:15, Kevin Traynor wrote:
>>>> Hi Sriram,
>>>>
>>>> Thanks for working on making more fine grained stats for debugging.
>>>> As mentioned yesterday, this patch needs rebase so I just reviewed
>>>> docs and netdev-dpdk.c which applied. Comments below.
>>>>
>>>> On 21/09/2019 03:40, Sriram Vatala wrote:
>>>>> OVS may be unable to transmit packets for multiple reasons and
>>>>> today there is a single counter to track packets dropped due to
>>>>> any of those reasons. The most common reason is that a VM is
>>>>> unable to read packets fast enough causing the vhostuser port
>>>>> transmit queue on the OVS side to become full. This manifests as a
>>>>> problem with VNFs not receiving all packets. Having a separate
>>>>> drop counter to track packets dropped because the transmit queue
>>>>> is full will clearly indicate that the problem is on the VM side
>>>>> and not in OVS. Similarly maintaining separate
>>>>
>>>> It reads like a bit of a contradiction. On one hand the "The *most
>>>> common* reason is that a VM is unable to read packets fast enough".
>>>> While having a stat "*will clearly indicate* that the problem is on
>>>> the VM side".
>>>>
>>>>> counters for all possible drops helps in indicating sensible cause
>>>>> for packet drops.
>>>>>
>>>>> This patch adds custom software stats counters to track packets
>>>>> dropped at port level and these counters are displayed along with
>>>>> other stats in "ovs-vsctl get interface <iface> statistics"
>>>>> command. The detailed stats will be available for both dpdk and
>>>>> vhostuser ports.
>>>>>
>>>>
>>>> I think the commit msg could be a bit clearer, suggest something
>>>> like below - feel free to modify (or reject):
>>>>
>>>> OVS may be unable to transmit packets for multiple reasons on the
>>>> DPDK datapath and today there is a single counter to track packets
>>>> dropped due to any of those reasons.
>>>>
>>>> This patch adds custom software stats for the different reasons
>>>> packets may be dropped during tx on the DPDK datapath in OVS.
>>>>
>>>> - MTU drops : drops that occur due to a too large packet size
>>>> - Qos drops: drops that occur due to egress QOS
>>>> - Tx failures: drops as returned by the DPDK PMD send function
>>>>
>>>> Note that the reason for tx failures is not specificied in OVS. In
>>>> practice for vhost ports it is most common that tx failures are
>>>> because there are not enough available descriptors, which is
>>>> usually caused by misconfiguration of the guest queues and/or
>>>> because the guest is not consuming packets fast enough from the queues.
>>>>
>>>> These counters are displayed along with other stats in "ovs-vsctl
>>>> get interface <iface> statistics" command and are available for
>>>> dpdk and vhostuser/vhostuserclient ports.
>>>>
>>>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>>>>
>>>> ---
>>>>
>>>> v9:...
>>>> v8:...
>>>>
>>>>
>>>> Note that signed-off, --- and version info should be like this ^^^.
>>>> otherwise the review version comments will be in the git commit log
>>>> when it is applied.
>>>>
>>>>> --
>>>>> Changes since v8:
>>>>> Addressed comments given by Ilya.
>>>>>
>>>>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>>>>> ---
>>>>>    Documentation/topics/dpdk/vhost-user.rst      | 13 ++-
>>>>>    lib/netdev-dpdk.c                             | 81 
>>>>> +++++++++++++++----
>>>>>    utilities/bugtool/automake.mk                 |  3 +-
>>>>>    utilities/bugtool/ovs-bugtool-get-port-stats  | 25 ++++++
>>>>>    .../plugins/network-status/openvswitch.xml    |  1 +
>>>>>    5 files changed, 105 insertions(+), 18 deletions(-)
>>>>>    create mode 100755 utilities/bugtool/ovs-bugtool-get-port-stats
>>>>>
>>>>> diff --git a/Documentation/topics/dpdk/vhost-user.rst
>>>>> b/Documentation/topics/dpdk/vhost-user.rst
>>>>> index 724aa62f6..89388a2bf 100644
>>>>> --- a/Documentation/topics/dpdk/vhost-user.rst
>>>>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>>>>> @@ -551,7 +551,18 @@ processing packets at the required rate.
>>>>>    The amount of Tx retries on a vhost-user or vhost-user-client 
>>>>> interface can be
>>>>>    shown with::
>>>>>
>>>>> -  $ ovs-vsctl get Interface dpdkvhostclient0
>>>>> statistics:tx_retries
>>>>> +  $ ovs-vsctl get Interface dpdkvhostclient0
>>>>> + statistics:netdev_dpdk_tx_retries
>>>>
>>>> I think the "netdev_dpdk_" prefixes should be removed for a few
>>>> reasons,
>>>>
>>>> - These are custom stats that will only be displayed for the dpdk
>>>> ports so they don't need any extra prefix to say they are for dpdk
>>>> ports
>>>>
>>>> - The 'tx_retries' stat is part of OVS 2.12, I don't like to change
>>>> its name to a different one in OVS 2.13 unless there is a very good
>>>> reason
>>>>
>>>> - The existing stats don't have this type of prefixes (granted most
>>>> of them are general stats):
>>>
>>> The main reason for the prefix for me is to distinguish those stats
>>> from the stats that comest from the driver/HW.  Our new stats has
>>> very generic names that could be named a lot like or even equal to
>>> HW stats.  This might be not an issue for vhostuser, but for HW NICs
>>> this may be really confusing for users.
>>>
>>> I'm not insisting on 'netdev_dpdk_' prefix, but, IMHO, we need a way
>>> to clearly distinguish those stats.  We may use different prefixes
>>> like 'sw_' or just 'ovs_'.
>>>
>>
>> ok, I can see that we might want to distinguish custom stats to
>> indicate that they are specific for that device but that also has the
>> side effect of making them less self-descriptive and the user will
>> have to be told somewhere what the prefix means.
>>
>> 'sw_' and 'ovs_' seem better than 'netdev_dpdk_' but it might be
>> confusing too, as for example in DPDK case, Tx drops are calculated
>> in sw/ovs the same as qos/mtu/txfails but won't have that prefix. If
>> we really need a prefix 'cstat_' is the best I can think of.
>
> What we're trying to solve here is to distinguish stats that counted
> internally in OVS from the stats provided by the third parties.
> Just to not have same/similar names
>
> Usual stats from 'netdev_stats' are combined from various sources.
> For example, 'rx_dropped' in netdev-dpdk is a combination of our QoS
> dorps from OVS and HW counters like rte_stats.rx_nombuf.
> For me it looks like for these stats we have kind of best-effort
> policy to provide as full info as possible.  Trying to describe how
> each of these counters calculated doesn't make sense because we do not
> know the origin of many of the components that comes from DPDK or HW.
>
>>
>>>> # ovs-vsctl get Interface dpdkvhost0 statistics
>>>> {"rx_1024_to_1522_packets"=0, "rx_128_to_255_packets"=0,
>>>> "rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=25622176,
>>>> "rx_256_to_511_packets"=0, "rx_512_to_1023_packets"=0,
>>>> "rx_65_to_127_packets"=0, rx_bytes=1537330560, rx_dropped=0,
>>>> rx_errors=0, rx_packets=25622176, tx_bytes=3829825920,
>>>> tx_dropped=0, tx_packets=63830432, tx_retries=0}
>>>>
>>>> Also, just to note that if there are changes to existing
>>>> interfaces/behaviour it should really mention that in the commit
>>>> message so it is highlighted.
>>>>
>>>>> +
>>>>> +When the guest is not able to consume the packets fast enough,
>>>>> +the transmit queue of the port gets filled up i.e queue runs out of 
>>>>> free descriptors.
>>>>> +This is the most likely reason why dpdk transmit API will fail to
>>>>> +send packets besides other reasons.
>>>>> +
>>>>> +The amount of tx failure drops on a dpdk vhost/physical interface
>>>>> +can be shown with::
>>>>> +
>>>>> +  $ ovs-vsctl get Interface dpdkvhostclient0 \
>>>>> +
>>>>> + statistics:netdev_dpdk_tx_failure_drops
>>>>>
>>>>
>>>> The commit msg/docs are only focusing on one stat for vhost ports,
>>>> but there are other stats and dpdk ports, so they should all get some 
>>>> mention.
>>>
>>> I don't feel comfortable for documenting custom stats.  This is just
>>> because we can't describe them all.  These are things to be changed over 
>>> time.
>>> And we don't really know what some types of stats means and if they
>>> means the same for different types of HW NICs (they are definitely
>>> not) or OVS netdevs (even within netdev-dpdk).
>>> And that is one more reason to have a prefix for OVS internal
>>> statistics on which we have at least partial control.
>>>
>>> I think, that user documentation could mention some special
>>> statistics while describing the troubleshooting for some hard
>>> special cases, but this should not be a glossary of all the possible
>>> custom stats.  From my point of view, names of the stats should be
>>> as possible self-descriptive and should not require additional 
>>> documentation.
>>>
>>
>> For sure any prefix would need to be explained because that part
>> would not be intuitive.
>
> We could add a note to the common DPDK guide about that in a single 
> sentence:
> "There are custom statistics that OVS accumulates itself and these
> stats has 'your_prefix_here_' as a prefix."
>

yes, that is what I had in mind. It clarifies that it is a custom stat which 
explains why something like vhost tx_dropped does not have this prefix even 
though it is also counted in ovs/sw/netdev_dpdk.

Anyway, I don't want Sriram to be held up over one line of documentation, so 
i'm ok with whatever is decided.

thanks,
Kevin.

> 'cstat_' or 'sw_' might need this, but 'netdev_dpdk_' or 'ovs_' looks
> self-descriptive enough to not do that.
>
>>
>> thanks,
>> Kevin.
>>
>>> BTW, you will not find any description for statistics provided by
>>> the linux or DPDK drivers.  You could only look at the driver code or 
>>> device spec.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>
Ilya Maximets Oct. 22, 2019, 10:30 a.m. UTC | #15
On 22.10.2019 10:25, Sriram Vatala wrote:
> Hi Ilya & Kevin,
> Thanks for your suggestions. I am summarizing our discussion, please feel free
> to correct me, if I am wrong.
> 
> 1) All custom stats calculated in OVS will have the prefix , so that these
> stats will not intersect with the names of other stats (driver/HW etc).
> 2) The prefix used for these custom stats is "cstat_".
> 3) Instead of documenting all the stats, I will add a generic note about the
> prefix used for custom stats in "Documentation/topics/dpdk/vhost-user.rst
> where "tx_retries" is documented.

Looks OK.
One thing is that the prefix documentation should probably go in the
"Extended & Custom Statistics" section of Documentation/topics/dpdk/bridge.rst.
That looks more appropriate because we'll have those stats for more than just
vhostuser ports.

Best regards, Ilya Maximets.

> 
> Thanks & Regards,
> Sriram.
> 
> -----Original Message-----
> From: Kevin Traynor <ktraynor@redhat.com>
> Sent: 21 October 2019 20:22
> To: Ilya Maximets <i.maximets@ovn.org>; Sriram Vatala
> <sriram.v@altencalsoftlabs.com>; ovs-dev@openvswitch.org
> Cc: Stokes, Ian <ian.stokes@intel.com>
> Subject: Re: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics
> 
> On 18/10/2019 13:59, Ilya Maximets wrote:
>> On 16.10.2019 19:48, Kevin Traynor wrote:
>>> On 16/10/2019 13:16, Ilya Maximets wrote:
>>>> Hi Kevin,
>>>>
>>>> Thanks for review. Some comments inline.
>>>>
>>>> On 16.10.2019 12:15, Kevin Traynor wrote:
>>>>> Hi Sriram,
>>>>>
>>>>> Thanks for working on making more fine grained stats for debugging.
>>>>> As mentioned yesterday, this patch needs rebase so I just reviewed
>>>>> docs and netdev-dpdk.c which applied. Comments below.
>>>>>
>>>>> On 21/09/2019 03:40, Sriram Vatala wrote:
>>>>>> OVS may be unable to transmit packets for multiple reasons and
>>>>>> today there is a single counter to track packets dropped due to
>>>>>> any of those reasons. The most common reason is that a VM is
>>>>>> unable to read packets fast enough causing the vhostuser port
>>>>>> transmit queue on the OVS side to become full. This manifests as a
>>>>>> problem with VNFs not receiving all packets. Having a separate
>>>>>> drop counter to track packets dropped because the transmit queue
>>>>>> is full will clearly indicate that the problem is on the VM side
>>>>>> and not in OVS. Similarly maintaining separate
>>>>>
>>>>> It reads like a bit of a contradiction. On one hand the "The *most
>>>>> common* reason is that a VM is unable to read packets fast enough".
>>>>> While having a stat "*will clearly indicate* that the problem is on
>>>>> the VM side".
>>>>>
>>>>>> counters for all possible drops helps in indicating sensible cause
>>>>>> for packet drops.
>>>>>>
>>>>>> This patch adds custom software stats counters to track packets
>>>>>> dropped at port level and these counters are displayed along with
>>>>>> other stats in "ovs-vsctl get interface <iface> statistics"
>>>>>> command. The detailed stats will be available for both dpdk and
>>>>>> vhostuser ports.
>>>>>>
>>>>>
>>>>> I think the commit msg could be a bit clearer, suggest something
>>>>> like below - feel free to modify (or reject):
>>>>>
>>>>> OVS may be unable to transmit packets for multiple reasons on the
>>>>> DPDK datapath and today there is a single counter to track packets
>>>>> dropped due to any of those reasons.
>>>>>
>>>>> This patch adds custom software stats for the different reasons
>>>>> packets may be dropped during tx on the DPDK datapath in OVS.
>>>>>
>>>>> - MTU drops : drops that occur due to a too large packet size
>>>>> - Qos drops: drops that occur due to egress QOS
>>>>> - Tx failures: drops as returned by the DPDK PMD send function
>>>>>
>>>>> Note that the reason for tx failures is not specificied in OVS. In
>>>>> practice for vhost ports it is most common that tx failures are
>>>>> because there are not enough available descriptors, which is
>>>>> usually caused by misconfiguration of the guest queues and/or
>>>>> because the guest is not consuming packets fast enough from the queues.
>>>>>
>>>>> These counters are displayed along with other stats in "ovs-vsctl
>>>>> get interface <iface> statistics" command and are available for
>>>>> dpdk and vhostuser/vhostuserclient ports.
>>>>>
>>>>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>>>>>
>>>>> ---
>>>>>
>>>>> v9:...
>>>>> v8:...
>>>>>
>>>>>
>>>>> Note that signed-off, --- and version info should be like this ^^^.
>>>>> otherwise the review version comments will be in the git commit log
>>>>> when it is applied.
>>>>>
>>>>>> --
>>>>>> Changes since v8:
>>>>>> Addressed comments given by Ilya.
>>>>>>
>>>>>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>>>>>> ---
>>>>>>     Documentation/topics/dpdk/vhost-user.rst      | 13 ++-
>>>>>>     lib/netdev-dpdk.c                             | 81
>>>>>> +++++++++++++++----
>>>>>>     utilities/bugtool/automake.mk                 |  3 +-
>>>>>>     utilities/bugtool/ovs-bugtool-get-port-stats  | 25 ++++++
>>>>>>     .../plugins/network-status/openvswitch.xml    |  1 +
>>>>>>     5 files changed, 105 insertions(+), 18 deletions(-)
>>>>>>     create mode 100755 utilities/bugtool/ovs-bugtool-get-port-stats
>>>>>>
>>>>>> diff --git a/Documentation/topics/dpdk/vhost-user.rst
>>>>>> b/Documentation/topics/dpdk/vhost-user.rst
>>>>>> index 724aa62f6..89388a2bf 100644
>>>>>> --- a/Documentation/topics/dpdk/vhost-user.rst
>>>>>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>>>>>> @@ -551,7 +551,18 @@ processing packets at the required rate.
>>>>>>     The amount of Tx retries on a vhost-user or vhost-user-client
>>>>>> interface can be
>>>>>>     shown with::
>>>>>>
>>>>>> -  $ ovs-vsctl get Interface dpdkvhostclient0
>>>>>> statistics:tx_retries
>>>>>> +  $ ovs-vsctl get Interface dpdkvhostclient0
>>>>>> + statistics:netdev_dpdk_tx_retries
>>>>>
>>>>> I think the "netdev_dpdk_" prefixes should be removed for a few
>>>>> reasons,
>>>>>
>>>>> - These are custom stats that will only be displayed for the dpdk
>>>>> ports so they don't need any extra prefix to say they are for dpdk
>>>>> ports
>>>>>
>>>>> - The 'tx_retries' stat is part of OVS 2.12, I don't like to change
>>>>> its name to a different one in OVS 2.13 unless there is a very good
>>>>> reason
>>>>>
>>>>> - The existing stats don't have this type of prefixes (granted most
>>>>> of them are general stats):
>>>>
>>>> The main reason for the prefix for me is to distinguish those stats
>>>> from the stats that comest from the driver/HW.  Our new stats has
>>>> very generic names that could be named a lot like or even equal to
>>>> HW stats.  This might be not an issue for vhostuser, but for HW NICs
>>>> this may be really confusing for users.
>>>>
>>>> I'm not insisting on 'netdev_dpdk_' prefix, but, IMHO, we need a way
>>>> to clearly distinguish those stats.  We may use different prefixes
>>>> like 'sw_' or just 'ovs_'.
>>>>
>>>
>>> ok, I can see that we might want to distinguish custom stats to
>>> indicate that they are specific for that device but that also has the
>>> side effect of making them less self-descriptive and the user will
>>> have to be told somewhere what the prefix means.
>>>
>>> 'sw_' and 'ovs_' seem better than 'netdev_dpdk_' but it might be
>>> confusing too, as for example in DPDK case, Tx drops are calculated
>>> in sw/ovs the same as qos/mtu/txfails but won't have that prefix. If
>>> we really need a prefix 'cstat_' is the best I can think of.
>>
>> What we're trying to solve here is to distinguish stats that counted
>> internally in OVS from the stats provided by the third parties.
>> Just to not have same/similar names
>>
>> Usual stats from 'netdev_stats' are combined from various sources.
>> For example, 'rx_dropped' in netdev-dpdk is a combination of our QoS
>> dorps from OVS and HW counters like rte_stats.rx_nombuf.
>> For me it looks like for these stats we have kind of best-effort
>> policy to provide as full info as possible.  Trying to describe how
>> each of these counters calculated doesn't make sense because we do not
>> know the origin of many of the components that comes from DPDK or HW.
>>
>>>
>>>>> # ovs-vsctl get Interface dpdkvhost0 statistics
>>>>> {"rx_1024_to_1522_packets"=0, "rx_128_to_255_packets"=0,
>>>>> "rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=25622176,
>>>>> "rx_256_to_511_packets"=0, "rx_512_to_1023_packets"=0,
>>>>> "rx_65_to_127_packets"=0, rx_bytes=1537330560, rx_dropped=0,
>>>>> rx_errors=0, rx_packets=25622176, tx_bytes=3829825920,
>>>>> tx_dropped=0, tx_packets=63830432, tx_retries=0}
>>>>>
>>>>> Also, just to note that if there are changes to existing
>>>>> interfaces/behaviour it should really mention that in the commit
>>>>> message so it is highlighted.
>>>>>
>>>>>> +
>>>>>> +When the guest is not able to consume the packets fast enough,
>>>>>> +the transmit queue of the port gets filled up i.e queue runs out of
>>>>>> free descriptors.
>>>>>> +This is the most likely reason why dpdk transmit API will fail to
>>>>>> +send packets besides other reasons.
>>>>>> +
>>>>>> +The amount of tx failure drops on a dpdk vhost/physical interface
>>>>>> +can be shown with::
>>>>>> +
>>>>>> +  $ ovs-vsctl get Interface dpdkvhostclient0 \
>>>>>> +
>>>>>> + statistics:netdev_dpdk_tx_failure_drops
>>>>>>
>>>>>
>>>>> The commit msg/docs are only focusing on one stat for vhost ports,
>>>>> but there are other stats and dpdk ports, so they should all get some
>>>>> mention.
>>>>
>>>> I don't feel comfortable for documenting custom stats.  This is just
>>>> because we can't describe them all.  These are things to be changed over
>>>> time.
>>>> And we don't really know what some types of stats means and if they
>>>> means the same for different types of HW NICs (they are definitely
>>>> not) or OVS netdevs (even within netdev-dpdk).
>>>> And that is one more reason to have a prefix for OVS internal
>>>> statistics on which we have at least partial control.
>>>>
>>>> I think, that user documentation could mention some special
>>>> statistics while describing the troubleshooting for some hard
>>>> special cases, but this should not be a glossary of all the possible
>>>> custom stats.  From my point of view, names of the stats should be
>>>> as possible self-descriptive and should not require additional
>>>> documentation.
>>>>
>>>
>>> For sure any prefix would need to be explained because that part
>>> would not be intuitive.
>>
>> We could add a note to the common DPDK guide about that in a single
>> sentence:
>> "There are custom statistics that OVS accumulates itself and these
>> stats has 'your_prefix_here_' as a prefix."
>>
> 
> yes, that is what I had in mind. It clarifies that it is a custom stat which
> explains why something like vhost tx_dropped does not have this prefix even
> though it is also counted in ovs/sw/netdev_dpdk.
> 
> Anyway, I don't want Sriram to be held up over one line of documentation, so
> i'm ok with whatever is decided.
> 
> thanks,
> Kevin.
> 
>> 'cstat_' or 'sw_' might need this, but 'netdev_dpdk_' or 'ovs_' looks
>> self-descriptive enough to not do that.
>>
>>>
>>> thanks,
>>> Kevin.
>>>
>>>> BTW, you will not find any description for statistics provided by
>>>> the linux or DPDK drivers.  You could only look at the driver code or
>>>> device spec.
>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>
Kevin Traynor Oct. 22, 2019, 10:31 a.m. UTC | #16
On 22/10/2019 09:25, Sriram Vatala wrote:
> Hi Ilya & Kevin,
> Thanks for your suggestions. I am summarizing our discussion, please feel free 
> to correct me, if I am wrong.
> 

Hi Sriram,

I'll share my thoughts; Ilya may have a different view.

> 1) All custom stats calculated in OVS will have the prefix , so that these 
> stats will not intersect with the names of other stats (driver/HW etc).

yes

> 2) The prefix used for these custom stats is "cstat_".

I think with 3) (a note to explain the prefix), 'sw_' or 'ovs_' would
look neater in the logs, but I'm ok with any of them.

> 3) Instead of documenting all the stats, I will add a generic note about the 
> prefix used for custom stats in "Documentation/topics/dpdk/vhost-user.rst 
> where "tx_retries" is documented.
> 

yes, but these stats are not vhost specific so I think they should go
into topics/dpdk/bridge.rst in the 'Extended & Custom Statistics' section.

Kevin.

> Thanks & Regards,
> Sriram.
> 
> -----Original Message-----
> From: Kevin Traynor <ktraynor@redhat.com>
> Sent: 21 October 2019 20:22
> To: Ilya Maximets <i.maximets@ovn.org>; Sriram Vatala 
> <sriram.v@altencalsoftlabs.com>; ovs-dev@openvswitch.org
> Cc: Stokes, Ian <ian.stokes@intel.com>
> Subject: Re: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics
> 
> On 18/10/2019 13:59, Ilya Maximets wrote:
>> On 16.10.2019 19:48, Kevin Traynor wrote:
>>> On 16/10/2019 13:16, Ilya Maximets wrote:
>>>> Hi Kevin,
>>>>
>>>> Thanks for review. Some comments inline.
>>>>
>>>> On 16.10.2019 12:15, Kevin Traynor wrote:
>>>>> Hi Sriram,
>>>>>
>>>>> Thanks for working on making more fine grained stats for debugging.
>>>>> As mentioned yesterday, this patch needs rebase so I just reviewed
>>>>> docs and netdev-dpdk.c which applied. Comments below.
>>>>>
>>>>> On 21/09/2019 03:40, Sriram Vatala wrote:
>>>>>> OVS may be unable to transmit packets for multiple reasons and
>>>>>> today there is a single counter to track packets dropped due to
>>>>>> any of those reasons. The most common reason is that a VM is
>>>>>> unable to read packets fast enough causing the vhostuser port
>>>>>> transmit queue on the OVS side to become full. This manifests as a
>>>>>> problem with VNFs not receiving all packets. Having a separate
>>>>>> drop counter to track packets dropped because the transmit queue
>>>>>> is full will clearly indicate that the problem is on the VM side
>>>>>> and not in OVS. Similarly maintaining separate
>>>>>
>>>>> It reads like a bit of a contradiction. On one hand the "The *most
>>>>> common* reason is that a VM is unable to read packets fast enough".
>>>>> While having a stat "*will clearly indicate* that the problem is on
>>>>> the VM side".
>>>>>
>>>>>> counters for all possible drops helps in indicating sensible cause
>>>>>> for packet drops.
>>>>>>
>>>>>> This patch adds custom software stats counters to track packets
>>>>>> dropped at port level and these counters are displayed along with
>>>>>> other stats in "ovs-vsctl get interface <iface> statistics"
>>>>>> command. The detailed stats will be available for both dpdk and
>>>>>> vhostuser ports.
>>>>>>
>>>>>
>>>>> I think the commit msg could be a bit clearer, suggest something
>>>>> like below - feel free to modify (or reject):
>>>>>
>>>>> OVS may be unable to transmit packets for multiple reasons on the
>>>>> DPDK datapath and today there is a single counter to track packets
>>>>> dropped due to any of those reasons.
>>>>>
>>>>> This patch adds custom software stats for the different reasons
>>>>> packets may be dropped during tx on the DPDK datapath in OVS.
>>>>>
>>>>> - MTU drops : drops that occur due to a too large packet size
>>>>> - Qos drops: drops that occur due to egress QOS
>>>>> - Tx failures: drops as returned by the DPDK PMD send function
>>>>>
>>>>> Note that the reason for tx failures is not specificied in OVS. In
>>>>> practice for vhost ports it is most common that tx failures are
>>>>> because there are not enough available descriptors, which is
>>>>> usually caused by misconfiguration of the guest queues and/or
>>>>> because the guest is not consuming packets fast enough from the queues.
>>>>>
>>>>> These counters are displayed along with other stats in "ovs-vsctl
>>>>> get interface <iface> statistics" command and are available for
>>>>> dpdk and vhostuser/vhostuserclient ports.
>>>>>
>>>>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>>>>>
>>>>> ---
>>>>>
>>>>> v9:...
>>>>> v8:...
>>>>>
>>>>>
>>>>> Note that signed-off, --- and version info should be like this ^^^.
>>>>> otherwise the review version comments will be in the git commit log
>>>>> when it is applied.
>>>>>
>>>>>> --
>>>>>> Changes since v8:
>>>>>> Addressed comments given by Ilya.
>>>>>>
>>>>>> Signed-off-by: Sriram Vatala <sriram.v@altencalsoftlabs.com>
>>>>>> ---
>>>>>>    Documentation/topics/dpdk/vhost-user.rst      | 13 ++-
>>>>>>    lib/netdev-dpdk.c                             | 81 
>>>>>> +++++++++++++++----
>>>>>>    utilities/bugtool/automake.mk                 |  3 +-
>>>>>>    utilities/bugtool/ovs-bugtool-get-port-stats  | 25 ++++++
>>>>>>    .../plugins/network-status/openvswitch.xml    |  1 +
>>>>>>    5 files changed, 105 insertions(+), 18 deletions(-)
>>>>>>    create mode 100755 utilities/bugtool/ovs-bugtool-get-port-stats
>>>>>>
>>>>>> diff --git a/Documentation/topics/dpdk/vhost-user.rst
>>>>>> b/Documentation/topics/dpdk/vhost-user.rst
>>>>>> index 724aa62f6..89388a2bf 100644
>>>>>> --- a/Documentation/topics/dpdk/vhost-user.rst
>>>>>> +++ b/Documentation/topics/dpdk/vhost-user.rst
>>>>>> @@ -551,7 +551,18 @@ processing packets at the required rate.
>>>>>>    The amount of Tx retries on a vhost-user or vhost-user-client 
>>>>>> interface can be
>>>>>>    shown with::
>>>>>>
>>>>>> -  $ ovs-vsctl get Interface dpdkvhostclient0
>>>>>> statistics:tx_retries
>>>>>> +  $ ovs-vsctl get Interface dpdkvhostclient0
>>>>>> + statistics:netdev_dpdk_tx_retries
>>>>>
>>>>> I think the "netdev_dpdk_" prefixes should be removed for a few
>>>>> reasons,
>>>>>
>>>>> - These are custom stats that will only be displayed for the dpdk
>>>>> ports so they don't need any extra prefix to say they are for dpdk
>>>>> ports
>>>>>
>>>>> - The 'tx_retries' stat is part of OVS 2.12, I don't like to change
>>>>> its name to a different one in OVS 2.13 unless there is a very good
>>>>> reason
>>>>>
>>>>> - The existing stats don't have this type of prefixes (granted most
>>>>> of them are general stats):
>>>>
>>>> The main reason for the prefix for me is to distinguish those stats
>>>> from the stats that comest from the driver/HW.  Our new stats has
>>>> very generic names that could be named a lot like or even equal to
>>>> HW stats.  This might be not an issue for vhostuser, but for HW NICs
>>>> this may be really confusing for users.
>>>>
>>>> I'm not insisting on 'netdev_dpdk_' prefix, but, IMHO, we need a way
>>>> to clearly distinguish those stats.  We may use different prefixes
>>>> like 'sw_' or just 'ovs_'.
>>>>
>>>
>>> ok, I can see that we might want to distinguish custom stats to
>>> indicate that they are specific for that device but that also has the
>>> side effect of making them less self-descriptive and the user will
>>> have to be told somewhere what the prefix means.
>>>
>>> 'sw_' and 'ovs_' seem better than 'netdev_dpdk_' but it might be
>>> confusing too, as for example in DPDK case, Tx drops are calculated
>>> in sw/ovs the same as qos/mtu/txfails but won't have that prefix. If
>>> we really need a prefix 'cstat_' is the best I can think of.
>>
>> What we're trying to solve here is to distinguish stats that counted
>> internally in OVS from the stats provided by the third parties.
>> Just to not have same/similar names
>>
>> Usual stats from 'netdev_stats' are combined from various sources.
>> For example, 'rx_dropped' in netdev-dpdk is a combination of our QoS
>> dorps from OVS and HW counters like rte_stats.rx_nombuf.
>> For me it looks like for these stats we have kind of best-effort
>> policy to provide as full info as possible.  Trying to describe how
>> each of these counters calculated doesn't make sense because we do not
>> know the origin of many of the components that comes from DPDK or HW.
>>
>>>
>>>>> # ovs-vsctl get Interface dpdkvhost0 statistics
>>>>> {"rx_1024_to_1522_packets"=0, "rx_128_to_255_packets"=0,
>>>>> "rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=25622176,
>>>>> "rx_256_to_511_packets"=0, "rx_512_to_1023_packets"=0,
>>>>> "rx_65_to_127_packets"=0, rx_bytes=1537330560, rx_dropped=0,
>>>>> rx_errors=0, rx_packets=25622176, tx_bytes=3829825920,
>>>>> tx_dropped=0, tx_packets=63830432, tx_retries=0}
>>>>>
>>>>> Also, just to note that if there are changes to existing
>>>>> interfaces/behaviour it should really mention that in the commit
>>>>> message so it is highlighted.
>>>>>
>>>>>> +
>>>>>> +When the guest is not able to consume the packets fast enough,
>>>>>> +the transmit queue of the port gets filled up i.e queue runs out of 
>>>>>> free descriptors.
>>>>>> +This is the most likely reason why dpdk transmit API will fail to
>>>>>> +send packets besides other reasons.
>>>>>> +
>>>>>> +The amount of tx failure drops on a dpdk vhost/physical interface
>>>>>> +can be shown with::
>>>>>> +
>>>>>> +  $ ovs-vsctl get Interface dpdkvhostclient0 \
>>>>>> +
>>>>>> + statistics:netdev_dpdk_tx_failure_drops
>>>>>>
>>>>>
>>>>> The commit msg/docs are only focusing on one stat for vhost ports,
>>>>> but there are other stats and dpdk ports, so they should all get some 
>>>>> mention.
>>>>
>>>> I don't feel comfortable for documenting custom stats.  This is just
>>>> because we can't describe them all.  These are things to be changed over 
>>>> time.
>>>> And we don't really know what some types of stats means and if they
>>>> means the same for different types of HW NICs (they are definitely
>>>> not) or OVS netdevs (even within netdev-dpdk).
>>>> And that is one more reason to have a prefix for OVS internal
>>>> statistics on which we have at least partial control.
>>>>
>>>> I think, that user documentation could mention some special
>>>> statistics while describing the troubleshooting for some hard
>>>> special cases, but this should not be a glossary of all the possible
>>>> custom stats.  From my point of view, names of the stats should be
>>>> as possible self-descriptive and should not require additional 
>>>> documentation.
>>>>
>>>
>>> For sure any prefix would need to be explained because that part
>>> would not be intuitive.
>>
>> We could add a note to the common DPDK guide about that in a single 
>> sentence:
>> "There are custom statistics that OVS accumulates itself and these
>> stats has 'your_prefix_here_' as a prefix."
>>
> 
> yes, that is what I had in mind. It clarifies that it is a custom stat which 
> explains why something like vhost tx_dropped does not have this prefix even 
> though it is also counted in ovs/sw/netdev_dpdk.
> 
> Anyway, I don't want Sriram to be held up over one line of documentation, so 
> i'm ok with whatever is decided.
> 
> thanks,
> Kevin.
> 
>> 'cstat_' or 'sw_' might need this, but 'netdev_dpdk_' or 'ovs_' looks
>> self-descriptive enough to not do that.
>>
>>>
>>> thanks,
>>> Kevin.
>>>
>>>> BTW, you will not find any description for statistics provided by
>>>> the linux or DPDK drivers.  You could only look at the driver code or 
>>>> device spec.
>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>
Kevin Traynor Oct. 22, 2019, 10:33 a.m. UTC | #17
On 22/10/2019 11:31, Kevin Traynor wrote:
> On 22/10/2019 09:25, Sriram Vatala wrote:
>> Hi Ilya & Kevin,
>> Thanks for your suggestions. I am summarizing our discussion, please feel free 
>> to correct me, if I am wrong.
>>
> 
> Hi Sriram,
> 
> I'll share my thoughts; Ilya may have a different view.
> 
>> 1) All custom stats calculated in OVS will have the prefix , so that these 
>> stats will not intersect with the names of other stats (driver/HW etc).
> 
> yes
> 
>> 2) The prefix used for these custom stats is "cstat_".
> 
> I think with 3) (a note to explain the prefix), 'sw_' or 'ovs_' would
> look neater in the logs, but I'm ok with any of them.
> 
>> 3) Instead of documenting all the stats, I will add a generic note about the 
>> prefix used for custom stats in "Documentation/topics/dpdk/vhost-user.rst 
>> where "tx_retries" is documented.
>>
> 
> yes, but these stats are not vhost specific so I think they should go
> into topics/dpdk/bridge.rst in the 'Extended & Custom Statistics' section.
> 

crossed mails, but the same comment :-)

> Kevin.
Ilya Maximets Oct. 22, 2019, 10:37 a.m. UTC | #18
On 22.10.2019 12:31, Kevin Traynor wrote:
> On 22/10/2019 09:25, Sriram Vatala wrote:
>> Hi Ilya & Kevin,
>> Thanks for your suggestions. I am summarizing our discussion, please feel free
>> to correct me, if I am wrong.
>>
> 
> Hi Sriram,
> 
> I'll share my thoughts; Ilya may have a different view.
> 
>> 1) All custom stats calculated in OVS will have the prefix , so that these
>> stats will not intersect with the names of other stats (driver/HW etc).
> 
> yes
> 
>> 2) The prefix used for these custom stats is "cstat_".
> 
> I think with 3) (a note to explain the prefix), 'sw_' or 'ovs_' would
> look neater in the logs, but I'm ok with any of them.

OK. Let's go with the 'ovs_' prefix plus a prefix-explanation note in this case.
It looks like it has the lowest chance of overlapping with existing stats,
and it is also self-descriptive enough for those who don't read the docs.

Best regards, Ilya Maximets.
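
For illustration, assuming the 'ovs_' prefix agreed above is applied to the
counters in this patch's SW_CSTATS list, the documentation queries would look
roughly like this (a sketch only; the final names are whatever the next
revision of the series defines):

  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:ovs_tx_retries
  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:ovs_tx_failure_drops
  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:ovs_tx_mtu_exceeded_drops
  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:ovs_tx_qos_drops
  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:ovs_rx_qos_drops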
Li,Rongqing via dev Oct. 22, 2019, 12:18 p.m. UTC | #19
Thanks Ilya and Kevin.
I will send the updated patch set.

Thanks & Regards,
Sriram.

Patch

diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index 724aa62f6..89388a2bf 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -551,7 +551,18 @@  processing packets at the required rate.
 The amount of Tx retries on a vhost-user or vhost-user-client interface can be
 shown with::
 
-  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_retries
+  $ ovs-vsctl get Interface dpdkvhostclient0 statistics:netdev_dpdk_tx_retries
+
+When the guest is not able to consume the packets fast enough, the transmit
+queue of the port gets filled up i.e queue runs out of free descriptors.
+This is the most likely reason why dpdk transmit API will fail to send packets
+besides other reasons.
+
+The amount of tx failure drops on a dpdk vhost/physical interface can be
+shown with::
+
+  $ ovs-vsctl get Interface dpdkvhostclient0 \
+                            statistics:netdev_dpdk_tx_failure_drops
 
 vhost-user Dequeue Zero Copy (experimental)
 -------------------------------------------
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 652b57e3b..fd8f9102e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -174,6 +174,20 @@  static const struct vhost_device_ops virtio_net_device_ops =
     .destroy_connection = destroy_connection,
 };
 
+/* Custom software stats for dpdk ports */
+struct netdev_dpdk_sw_stats {
+    /* No. of retries when unable to transmit. */
+    uint64_t tx_retries;
+    /* Packet drops when unable to transmit; Probably Tx queue is full. */
+    uint64_t tx_failure_drops;
+    /* Packet length greater than device MTU. */
+    uint64_t tx_mtu_exceeded_drops;
+    /* Packet drops in egress policer processing. */
+    uint64_t tx_qos_drops;
+    /* Packet drops in ingress policer processing. */
+    uint64_t rx_qos_drops;
+};
+
 enum { DPDK_RING_SIZE = 256 };
 BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE));
 enum { DRAIN_TSC = 200000ULL };
@@ -413,11 +427,10 @@  struct netdev_dpdk {
 
     PADDED_MEMBERS(CACHE_LINE_SIZE,
         struct netdev_stats stats;
-        /* Custom stat for retries when unable to transmit. */
-        uint64_t tx_retries;
+        struct netdev_dpdk_sw_stats *sw_stats;
         /* Protects stats */
         rte_spinlock_t stats_lock;
-        /* 4 pad bytes here. */
+        /* 36 pad bytes here. */
     );
 
     PADDED_MEMBERS(CACHE_LINE_SIZE,
@@ -1173,7 +1186,9 @@  common_construct(struct netdev *netdev, dpdk_port_t port_no,
     dev->rte_xstats_ids = NULL;
     dev->rte_xstats_ids_size = 0;
 
-    dev->tx_retries = (dev->type == DPDK_DEV_VHOST) ? 0 : UINT64_MAX;
+    dev->sw_stats = (struct netdev_dpdk_sw_stats *)
+                        xzalloc(sizeof *dev->sw_stats);
+    dev->sw_stats->tx_retries = (dev->type == DPDK_DEV_VHOST) ? 0 : UINT64_MAX;
 
     return 0;
 }
@@ -1359,6 +1374,7 @@  common_destruct(struct netdev_dpdk *dev)
     ovs_list_remove(&dev->list_node);
     free(ovsrcu_get_protected(struct ingress_policer *,
                               &dev->ingress_policer));
+    free(dev->sw_stats);
     ovs_mutex_destroy(&dev->mutex);
 }
 
@@ -2209,6 +2225,7 @@  netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
     rte_spinlock_lock(&dev->stats_lock);
     netdev_dpdk_vhost_update_rx_counters(&dev->stats, batch->packets,
                                          nb_rx, dropped);
+    dev->sw_stats->rx_qos_drops += dropped;
     rte_spinlock_unlock(&dev->stats_lock);
 
     batch->count = nb_rx;
@@ -2258,6 +2275,7 @@  netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch,
     if (OVS_UNLIKELY(dropped)) {
         rte_spinlock_lock(&dev->stats_lock);
         dev->stats.rx_dropped += dropped;
+        dev->sw_stats->rx_qos_drops += dropped;
         rte_spinlock_unlock(&dev->stats_lock);
     }
 
@@ -2341,6 +2359,10 @@  __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
     struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
     unsigned int total_pkts = cnt;
     unsigned int dropped = 0;
+    unsigned int tx_failure;
+    unsigned int mtu_drops;
+    unsigned int qos_drops;
+    struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
     int i, retries = 0;
     int max_retries = VHOST_ENQ_RETRY_MIN;
     int vid = netdev_dpdk_get_vid(dev);
@@ -2358,9 +2380,12 @@  __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
     rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
 
     cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt);
+    mtu_drops = total_pkts - cnt;
+    qos_drops = cnt;
     /* Check has QoS has been configured for the netdev */
     cnt = netdev_dpdk_qos_run(dev, cur_pkts, cnt, true);
-    dropped = total_pkts - cnt;
+    qos_drops -= cnt;
+    dropped = qos_drops + mtu_drops;
 
     do {
         int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
@@ -2385,12 +2410,16 @@  __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
         }
     } while (cnt && (retries++ < max_retries));
 
+    tx_failure = cnt;
     rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
 
     rte_spinlock_lock(&dev->stats_lock);
     netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts, total_pkts,
                                          cnt + dropped);
-    dev->tx_retries += MIN(retries, max_retries);
+    sw_stats->tx_retries += MIN(retries, max_retries);
+    sw_stats->tx_failure_drops += tx_failure;
+    sw_stats->tx_mtu_exceeded_drops += mtu_drops;
+    sw_stats->tx_qos_drops += qos_drops;
     rte_spinlock_unlock(&dev->stats_lock);
 
 out:
@@ -2415,12 +2444,16 @@  dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
     struct rte_mbuf *pkts[PKT_ARRAY_SIZE];
     uint32_t cnt = batch_cnt;
     uint32_t dropped = 0;
+    uint32_t tx_failure = 0;
+    uint32_t mtu_drops = 0;
+    uint32_t qos_drops = 0;
+    struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
 
     if (dev->type != DPDK_DEV_VHOST) {
         /* Check if QoS has been configured for this netdev. */
         cnt = netdev_dpdk_qos_run(dev, (struct rte_mbuf **) batch->packets,
                                   batch_cnt, false);
-        dropped += batch_cnt - cnt;
+        qos_drops = batch_cnt - cnt;
     }
 
     uint32_t txcnt = 0;
@@ -2433,7 +2466,7 @@  dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
             VLOG_WARN_RL(&rl, "Too big size %u max_packet_len %d",
                          size, dev->max_packet_len);
 
-            dropped++;
+            mtu_drops++;
             continue;
         }
 
@@ -2456,13 +2489,17 @@  dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
             __netdev_dpdk_vhost_send(netdev, qid, (struct dp_packet **) pkts,
                                      txcnt);
         } else {
-            dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, txcnt);
+            tx_failure = netdev_dpdk_eth_tx_burst(dev, qid, pkts, txcnt);
         }
     }
 
+    dropped += qos_drops + mtu_drops + tx_failure;
     if (OVS_UNLIKELY(dropped)) {
         rte_spinlock_lock(&dev->stats_lock);
         dev->stats.tx_dropped += dropped;
+        sw_stats->tx_failure_drops += tx_failure;
+        sw_stats->tx_mtu_exceeded_drops += mtu_drops;
+        sw_stats->tx_qos_drops += qos_drops;
         rte_spinlock_unlock(&dev->stats_lock);
     }
 }
@@ -2503,19 +2540,27 @@  netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
         dpdk_do_tx_copy(netdev, qid, batch);
         dp_packet_delete_batch(batch, true);
     } else {
+        struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
         int tx_cnt, dropped;
+        int tx_failure, mtu_drops, qos_drops;
         int batch_cnt = dp_packet_batch_size(batch);
         struct rte_mbuf **pkts = (struct rte_mbuf **) batch->packets;
 
         tx_cnt = netdev_dpdk_filter_packet_len(dev, pkts, batch_cnt);
+        mtu_drops = batch_cnt - tx_cnt;
+        qos_drops = tx_cnt;
         tx_cnt = netdev_dpdk_qos_run(dev, pkts, tx_cnt, true);
-        dropped = batch_cnt - tx_cnt;
+        qos_drops -= tx_cnt;
 
-        dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, tx_cnt);
+        tx_failure = netdev_dpdk_eth_tx_burst(dev, qid, pkts, tx_cnt);
 
+        dropped = tx_failure + mtu_drops + qos_drops;
         if (OVS_UNLIKELY(dropped)) {
             rte_spinlock_lock(&dev->stats_lock);
             dev->stats.tx_dropped += dropped;
+            sw_stats->tx_failure_drops += tx_failure;
+            sw_stats->tx_mtu_exceeded_drops += mtu_drops;
+            sw_stats->tx_qos_drops += qos_drops;
             rte_spinlock_unlock(&dev->stats_lock);
         }
     }
@@ -2826,8 +2871,12 @@  netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
     int i, n;
 
-#define SW_CSTATS \
-    SW_CSTAT(tx_retries)
+#define SW_CSTATS                    \
+    SW_CSTAT(tx_retries)             \
+    SW_CSTAT(tx_failure_drops)       \
+    SW_CSTAT(tx_mtu_exceeded_drops)  \
+    SW_CSTAT(tx_qos_drops)           \
+    SW_CSTAT(rx_qos_drops)
 
 #define SW_CSTAT(NAME) + 1
     custom_stats->size = SW_CSTATS;
@@ -2840,7 +2889,7 @@  netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
     rte_spinlock_lock(&dev->stats_lock);
     i = 0;
 #define SW_CSTAT(NAME) \
-    custom_stats->counters[i++].value = dev->NAME;
+    custom_stats->counters[i++].value = dev->sw_stats->NAME;
     SW_CSTATS;
 #undef SW_CSTAT
     rte_spinlock_unlock(&dev->stats_lock);
@@ -2851,8 +2900,8 @@  netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
     n = 0;
 #define SW_CSTAT(NAME) \
     if (custom_stats->counters[i].value != UINT64_MAX) {                   \
-        ovs_strlcpy(custom_stats->counters[n].name, #NAME,                 \
-                    NETDEV_CUSTOM_STATS_NAME_SIZE);                        \
+        ovs_strlcpy(custom_stats->counters[n].name,                        \
+                   "netdev_dpdk_"#NAME, NETDEV_CUSTOM_STATS_NAME_SIZE); \
         custom_stats->counters[n].value = custom_stats->counters[i].value; \
         n++;                                                               \
     }                                                                      \
diff --git a/utilities/bugtool/automake.mk b/utilities/bugtool/automake.mk
index 40980b367..dda58e0a1 100644
--- a/utilities/bugtool/automake.mk
+++ b/utilities/bugtool/automake.mk
@@ -22,7 +22,8 @@  bugtool_scripts = \
 	utilities/bugtool/ovs-bugtool-ovs-bridge-datapath-type \
 	utilities/bugtool/ovs-bugtool-ovs-vswitchd-threads-affinity \
 	utilities/bugtool/ovs-bugtool-qos-configs \
-	utilities/bugtool/ovs-bugtool-get-dpdk-nic-numa
+	utilities/bugtool/ovs-bugtool-get-dpdk-nic-numa \
+	utilities/bugtool/ovs-bugtool-get-port-stats
 
 scripts_SCRIPTS += $(bugtool_scripts)
 
diff --git a/utilities/bugtool/ovs-bugtool-get-port-stats b/utilities/bugtool/ovs-bugtool-get-port-stats
new file mode 100755
index 000000000..0fe175e6b
--- /dev/null
+++ b/utilities/bugtool/ovs-bugtool-get-port-stats
@@ -0,0 +1,25 @@ 
+#! /bin/bash
+
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of version 2.1 of the GNU Lesser General
+# Public License as published by the Free Software Foundation.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# Copyright (C) 2019 Ericsson AB
+
+for bridge in `ovs-vsctl -- --real list-br`
+do
+    echo -e "\nBridge : ${bridge}\n"
+    for iface in `ovs-vsctl list-ifaces ${bridge}`
+    do
+        echo -e "iface : ${iface}"
+        ovs-vsctl get interface ${iface} statistics
+        echo -e "\n"
+    done
+    echo -e "iface : ${bridge}"
+    ovs-vsctl get interface ${bridge} statistics
+done
diff --git a/utilities/bugtool/plugins/network-status/openvswitch.xml b/utilities/bugtool/plugins/network-status/openvswitch.xml
index d39867c6e..8c1ec643e 100644
--- a/utilities/bugtool/plugins/network-status/openvswitch.xml
+++ b/utilities/bugtool/plugins/network-status/openvswitch.xml
@@ -40,4 +40,5 @@ 
     <command label="ovs-ofctl-dump-groups" filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-bridges "dump-groups"</command>
     <command label="ovs-ofctl-dump-group-stats" filters="ovs" repeat="2">/usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-bridges "dump-group-stats"</command>
     <command label="get_dpdk_nic_numa" filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-get-dpdk-nic-numa</command>
+    <command label="get_port_statistics" filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-get-port-stats</command>
 </collect>
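
A brief usage sketch (assumed invocation, not output produced by this patch):
once installed at the path referenced in the plugin XML above, the collector
can be run standalone, and the same per-port statistics can be pulled ad hoc
with a roughly equivalent shell loop:

  # Run the installed collector directly (path as per the plugin XML).
  $ /usr/share/openvswitch/scripts/ovs-bugtool-get-port-stats

  # Roughly equivalent ad hoc loop over all bridges and their interfaces.
  $ for br in $(ovs-vsctl -- --real list-br); do
        for iface in $(ovs-vsctl list-ifaces "$br") "$br"; do
            echo "iface : $iface"
            ovs-vsctl get interface "$iface" statistics
        done
    done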