
[ovs-dev,v5,1/5] netdev-dpdk: Introduce per rxq/txq Vhost-user statistics.

Message ID: 20220105081926.613684-2-maxime.coquelin@redhat.com
State: Changes Requested
Series: dpif-netdev: Hash-based Tx packet steering

Checks

Context                                 Check     Description
ovsrobot/apply-robot                    success   apply and check: success
ovsrobot/github-robot-_Build_and_Test   fail      github build: failed

Commit Message

Maxime Coquelin Jan. 5, 2022, 8:19 a.m. UTC
The hash-based Tx steering feature will enable steering Tx
packets across transmit queues based on their hashes. In order
to test that feature, we need to be able to get per-queue
statistics for Vhost-user ports.

This patch introduces "bytes", "packets" and "errors"
per-queue custom statistics for Vhost-user ports.
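
For illustration, with two queue pairs the new counters would appear
among the interface's custom statistics along these lines (the port
name and values below are made up):

    $ ovs-vsctl get Interface vhost-user0 statistics
    {..., rx_q0_bytes=1024, rx_q0_packets=16, rx_q0_errors=0,
     rx_q1_bytes=2048, rx_q1_packets=32, rx_q1_errors=0,
     tx_q0_bytes=512, tx_q0_packets=8, tx_q0_errors=0, ...}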

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
---
 lib/netdev-dpdk.c | 147 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 138 insertions(+), 9 deletions(-)

Comments

Ilya Maximets Jan. 17, 2022, 6:01 p.m. UTC | #1
On 1/5/22 09:19, Maxime Coquelin wrote:
> Hash-based Tx steering feature will enable steering Tx
> packets on transmit queues based on their hashes. In order
> to test the feature, it is needed to be able to get the
> per-queue statistics for Vhost-user ports.
> 
> This patch introduces "bytes", "packets" and "error"
> per-queue custom statistics for Vhost-user ports.
> 
> Suggested-by David Marchand <david.marchand@redhat.com>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> Reviewed-by: David Marchand <david.marchand@redhat.com>
> ---
>  lib/netdev-dpdk.c | 147 +++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 138 insertions(+), 9 deletions(-)

Hi, Maxime.

Thanks for the patch; I really think that it's an important
feature for debugging performance issues in real-world setups.

However, it causes a performance drop of about 2-2.5% for me
with VM-VM bidirectional traffic and 2 PMD threads.

The reason is the existing stats_lock.  Unfortunately, in the current
code, we're taking the same stats_lock on both the rx and tx paths,
and since rx and tx are likely performed by different threads at
the same time, they frequently block each other.

Under these circumstances, even a slight increase in the critical
section causes a visible performance drop.

One possible solution might be to split the stats_lock
in two (one for rx stats and one for tx stats).  We should also
split, or re-align to different cache lines, the rx and tx fields
of the generic struct netdev_stats, or count all the stats on a
per-queue basis.
A quick prototype of such a solution gives an extra 2-3% performance
boost over the current master and reduces the impact of the extra
stats in this patch to a minimum.
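
In sketch form, the split could look something like this (reusing the
PADDED_MEMBERS() pattern already present in struct netdev_dpdk; the
rx/tx stat groups and field names below are illustrative, not the
actual prototype):

    #include <stdint.h>
    #include <rte_spinlock.h>
    #include "util.h"   /* PADDED_MEMBERS(), CACHE_LINE_SIZE. */

    /* Illustrative rx-only and tx-only stat groups; the real change
     * could instead split or re-align struct netdev_stats itself. */
    struct rx_stats_group { uint64_t rx_packets, rx_bytes, rx_dropped; };
    struct tx_stats_group { uint64_t tx_packets, tx_bytes, tx_dropped; };

    struct example_split_stats {
        PADDED_MEMBERS(CACHE_LINE_SIZE,
            struct rx_stats_group rx;      /* Updated only from the rx path. */
            rte_spinlock_t rx_stats_lock;  /* Protects 'rx'. */
        );
        PADDED_MEMBERS(CACHE_LINE_SIZE,
            struct tx_stats_group tx;      /* Updated only from the tx path. */
            rte_spinlock_t tx_stats_lock;  /* Protects 'tx'. */
        );
    };

With this layout, the rx path only ever takes rx_stats_lock and the tx
path only tx_stats_lock, so the two threads no longer contend, and the
cache-line padding keeps the two groups from false-sharing a line.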

I'll polish and submit my prototype code sometime later.
For now, I think we won't be able to accept this change for 2.17,
since some more development is needed to avoid the regression.

There is also a memory leak in this code, but that can be easily
fixed:

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0e1efefe3..f8680a058 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1549,6 +1549,9 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
     dev->vhost_id = NULL;
     rte_free(dev->vhost_rxq_enabled);
 
+    free(dev->vhost_rxq_stats);
+    free(dev->vhost_txq_stats);
+
     common_destruct(dev);
 
     ovs_mutex_unlock(&dpdk_mutex);
---

Best regards, Ilya Maximets.
Maxime Coquelin Sept. 22, 2022, 12:56 p.m. UTC | #2
Hi Ilya,

On 1/17/22 19:01, Ilya Maximets wrote:
> On 1/5/22 09:19, Maxime Coquelin wrote:
>> The hash-based Tx steering feature will enable steering Tx
>> packets across transmit queues based on their hashes. In order
>> to test that feature, we need to be able to get per-queue
>> statistics for Vhost-user ports.
>>
>> This patch introduces "bytes", "packets" and "errors"
>> per-queue custom statistics for Vhost-user ports.
>>
>> Suggested-by: David Marchand <david.marchand@redhat.com>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Reviewed-by: David Marchand <david.marchand@redhat.com>
>> ---
>>   lib/netdev-dpdk.c | 147 +++++++++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 138 insertions(+), 9 deletions(-)
> 
> Hi, Maxime.
> 
> Thanks for the patch; I really think that it's an important
> feature for debugging performance issues in real-world setups.
> 
> However, it causes a performance drop of about 2-2.5% for me
> with VM-VM bidirectional traffic and 2 PMD threads.
> 
> The reason is the existing stats_lock.  Unfortunately, in the current
> code, we're taking the same stats_lock on both the rx and tx paths,
> and since rx and tx are likely performed by different threads at
> the same time, they frequently block each other.
> 
> Under these circumstances, even a slight increase in the critical
> section causes a visible performance drop.
> 
> One possible solution might be to split the stats_lock
> in two (one for rx stats and one for tx stats).  We should also
> split, or re-align to different cache lines, the rx and tx fields
> of the generic struct netdev_stats, or count all the stats on a
> per-queue basis.
> A quick prototype of such a solution gives an extra 2-3% performance
> boost over the current master and reduces the impact of the extra
> stats in this patch to a minimum.
> 
> I'll polish and submit my prototype code sometime later.
> For now, I think we won't be able to accept this change for 2.17,
> since some more development is needed to avoid the regression.

I'm currently working on supporting the new Vhost per-queue stats API in
OVS. Have you posted the prototype you did? I cannot find it, and I think
it would be better for it to be applied before my series.

Thanks,
Maxime

> There is also a memory leak in this code, but that can be easily
> fixed:
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 0e1efefe3..f8680a058 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1549,6 +1549,9 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
>       dev->vhost_id = NULL;
>       rte_free(dev->vhost_rxq_enabled);
>   
> +    free(dev->vhost_rxq_stats);
> +    free(dev->vhost_txq_stats);
> +
>       common_destruct(dev);
>   
>       ovs_mutex_unlock(&dpdk_mutex);
> ---
> 
> Best regards, Ilya Maximets.
>
Ilya Maximets Sept. 26, 2022, 8:42 p.m. UTC | #3
On 9/22/22 14:56, Maxime Coquelin wrote:
> Hi Ilya,
> 
> On 1/17/22 19:01, Ilya Maximets wrote:
>> On 1/5/22 09:19, Maxime Coquelin wrote:
>>> The hash-based Tx steering feature will enable steering Tx
>>> packets across transmit queues based on their hashes. In order
>>> to test that feature, we need to be able to get per-queue
>>> statistics for Vhost-user ports.
>>>
>>> This patch introduces "bytes", "packets" and "errors"
>>> per-queue custom statistics for Vhost-user ports.
>>>
>>> Suggested-by: David Marchand <david.marchand@redhat.com>
>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> Reviewed-by: David Marchand <david.marchand@redhat.com>
>>> ---
>>>   lib/netdev-dpdk.c | 147 +++++++++++++++++++++++++++++++++++++++++++---
>>>   1 file changed, 138 insertions(+), 9 deletions(-)
>>
>> Hi, Maxime.
>>
>> Thanks for the patch; I really think that it's an important
>> feature for debugging performance issues in real-world setups.
>>
>> However, it causes a performance drop of about 2-2.5% for me
>> with VM-VM bidirectional traffic and 2 PMD threads.
>>
>> The reason is the existing stats_lock.  Unfortunately, in the current
>> code, we're taking the same stats_lock on both the rx and tx paths,
>> and since rx and tx are likely performed by different threads at
>> the same time, they frequently block each other.
>>
>> Under these circumstances, even a slight increase in the critical
>> section causes a visible performance drop.
>>
>> One possible solution might be to split the stats_lock
>> in two (one for rx stats and one for tx stats).  We should also
>> split, or re-align to different cache lines, the rx and tx fields
>> of the generic struct netdev_stats, or count all the stats on a
>> per-queue basis.
>> A quick prototype of such a solution gives an extra 2-3% performance
>> boost over the current master and reduces the impact of the extra
>> stats in this patch to a minimum.
>>
>> I'll polish and submit my prototype code sometime later.
>> For now, I think we won't be able to accept this change for 2.17,
>> since some more development is needed to avoid the regression.
> 
> I'm currently working on supporting the new Vhost per-queue stats API in
> OVS. Have you posted the prototype you did? I cannot find it, and I think
> it would be better for it to be applied before my series.

Hi.  I never actually posted it, but here is the commit:
  https://github.com/igsilya/ovs/commit/cc3b03a8d1eb613bc42c9dc7c491efc42206f824

It's fairly simple.  I'm not sure about modifying the
public 'netdev_stats' structure, though.  It might be
better to keep two instances of that structure, one for
rx and one for tx, kept on separate cache lines along
with their locks.
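
In sketch form (placement inside struct netdev_dpdk is illustrative
and untested):

    /* Two full instances of the unchanged public structure, each on
     * its own cache line next to the lock that protects it. */
    PADDED_MEMBERS(CACHE_LINE_SIZE,
        struct netdev_stats rx_stats;    /* Only rx_* members are used. */
        rte_spinlock_t rx_stats_lock;    /* Taken from the rx path only. */
    );
    PADDED_MEMBERS(CACHE_LINE_SIZE,
        struct netdev_stats tx_stats;    /* Only tx_* members are used. */
        rte_spinlock_t tx_stats_lock;    /* Taken from the tx path only. */
    );

This trades some memory (each instance carries the other direction's
unused fields) for keeping struct netdev_stats untouched.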

Best regards, Ilya Maximets.

> 
> Thanks,
> Maxime
> 
>> There is also a memory leak in this code, but that can be easily
>> fixed:
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 0e1efefe3..f8680a058 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -1549,6 +1549,9 @@ netdev_dpdk_vhost_destruct(struct netdev *netdev)
>>       dev->vhost_id = NULL;
>>       rte_free(dev->vhost_rxq_enabled);
>>
>> +    free(dev->vhost_rxq_stats);
>> +    free(dev->vhost_txq_stats);
>> +
>>       common_destruct(dev);
>>
>>       ovs_mutex_unlock(&dpdk_mutex);
>> ---
>>
>> Best regards, Ilya Maximets.
>>
>

Patch

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 6782d3e8f..6d301cd2e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -192,6 +192,13 @@  static const struct rte_vhost_device_ops virtio_net_device_ops =
     .guest_notified = vhost_guest_notified,
 };
 
+/* Custom software per-queue stats for vhost ports */
+struct netdev_dpdk_vhost_q_stats {
+    uint64_t bytes;
+    uint64_t packets;
+    uint64_t errors;
+};
+
 /* Custom software stats for dpdk ports */
 struct netdev_dpdk_sw_stats {
     /* No. of retries when unable to transmit. */
@@ -479,9 +486,10 @@  struct netdev_dpdk {
     PADDED_MEMBERS(CACHE_LINE_SIZE,
         struct netdev_stats stats;
         struct netdev_dpdk_sw_stats *sw_stats;
+        struct netdev_dpdk_vhost_q_stats *vhost_txq_stats;
+        struct netdev_dpdk_vhost_q_stats *vhost_rxq_stats;
         /* Protects stats */
         rte_spinlock_t stats_lock;
-        /* 36 pad bytes here. */
     );
 
     PADDED_MEMBERS(CACHE_LINE_SIZE,
@@ -1276,6 +1284,13 @@  common_construct(struct netdev *netdev, dpdk_port_t port_no,
     dev->sw_stats = xzalloc(sizeof *dev->sw_stats);
     dev->sw_stats->tx_retries = (dev->type == DPDK_DEV_VHOST) ? 0 : UINT64_MAX;
 
+    if (dev->type == DPDK_DEV_VHOST) {
+        dev->vhost_txq_stats = xcalloc(netdev->n_txq,
+                                       sizeof *dev->vhost_txq_stats);
+        dev->vhost_rxq_stats = xcalloc(netdev->n_rxq,
+                                       sizeof *dev->vhost_rxq_stats);
+    }
+
     return 0;
 }
 
@@ -2354,17 +2369,21 @@  netdev_dpdk_vhost_update_rx_size_counters(struct netdev_stats *stats,
 }
 
 static inline void
-netdev_dpdk_vhost_update_rx_counters(struct netdev_dpdk *dev,
+netdev_dpdk_vhost_update_rx_counters(struct netdev_dpdk *dev, int qid,
                                      struct dp_packet **packets, int count,
                                      int qos_drops)
 {
+    struct netdev_dpdk_vhost_q_stats *q_stats = &dev->vhost_rxq_stats[qid];
     struct netdev_stats *stats = &dev->stats;
     struct dp_packet *packet;
     unsigned int packet_size;
     int i;
 
     stats->rx_packets += count;
+    q_stats->packets += count;
     stats->rx_dropped += qos_drops;
+    q_stats->errors += qos_drops;
+
     for (i = 0; i < count; i++) {
         packet = packets[i];
         packet_size = dp_packet_size(packet);
@@ -2375,6 +2394,7 @@  netdev_dpdk_vhost_update_rx_counters(struct netdev_dpdk *dev,
              * further processing. */
             stats->rx_errors++;
             stats->rx_length_errors++;
+            q_stats->errors++;
             continue;
         }
 
@@ -2386,6 +2406,7 @@  netdev_dpdk_vhost_update_rx_counters(struct netdev_dpdk *dev,
         }
 
         stats->rx_bytes += packet_size;
+        q_stats->bytes += packet_size;
     }
 
     if (OVS_UNLIKELY(qos_drops)) {
@@ -2438,7 +2459,7 @@  netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
     }
 
     rte_spinlock_lock(&dev->stats_lock);
-    netdev_dpdk_vhost_update_rx_counters(dev, batch->packets,
+    netdev_dpdk_vhost_update_rx_counters(dev, rxq->queue_id, batch->packets,
                                          nb_rx, qos_drops);
     rte_spinlock_unlock(&dev->stats_lock);
 
@@ -2552,11 +2573,12 @@  netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
 }
 
 static inline void
-netdev_dpdk_vhost_update_tx_counters(struct netdev_dpdk *dev,
+netdev_dpdk_vhost_update_tx_counters(struct netdev_dpdk *dev, int qid,
                                      struct dp_packet **packets,
                                      int attempted,
                                      struct netdev_dpdk_sw_stats *sw_stats_add)
 {
+    struct netdev_dpdk_vhost_q_stats *q_stats = &dev->vhost_txq_stats[qid];
     int dropped = sw_stats_add->tx_mtu_exceeded_drops +
                   sw_stats_add->tx_qos_drops +
                   sw_stats_add->tx_failure_drops +
@@ -2566,10 +2588,15 @@  netdev_dpdk_vhost_update_tx_counters(struct netdev_dpdk *dev,
     int i;
 
     stats->tx_packets += sent;
+    q_stats->packets += sent;
     stats->tx_dropped += dropped;
+    q_stats->errors += dropped;
 
     for (i = 0; i < sent; i++) {
-        stats->tx_bytes += dp_packet_size(packets[i]);
+        uint64_t bytes = dp_packet_size(packets[i]);
+
+        stats->tx_bytes += bytes;
+        q_stats->bytes += bytes;
     }
 
     if (OVS_UNLIKELY(dropped || sw_stats_add->tx_retries)) {
@@ -2657,7 +2684,7 @@  __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
     sw_stats_add.tx_retries = MIN(retries, max_retries);
 
     rte_spinlock_lock(&dev->stats_lock);
-    netdev_dpdk_vhost_update_tx_counters(dev, pkts, total_packets,
+    netdev_dpdk_vhost_update_tx_counters(dev, qid, pkts, total_packets,
                                          &sw_stats_add);
     rte_spinlock_unlock(&dev->stats_lock);
 
@@ -3287,6 +3314,76 @@  netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
     return 0;
 }
 
+static int
+netdev_dpdk_vhost_get_custom_stats(const struct netdev *netdev,
+                                struct netdev_custom_stats *custom_stats)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int sw_stats_size, i, j;
+
+    netdev_dpdk_get_sw_custom_stats(netdev, custom_stats);
+
+    ovs_mutex_lock(&dev->mutex);
+
+#define VHOST_Q_STATS     \
+    VHOST_Q_STAT(bytes)   \
+    VHOST_Q_STAT(packets) \
+    VHOST_Q_STAT(errors)
+
+    sw_stats_size = custom_stats->size;
+#define VHOST_Q_STAT(NAME) + netdev->n_rxq
+    custom_stats->size += VHOST_Q_STATS;
+#undef VHOST_Q_STAT
+#define VHOST_Q_STAT(NAME) + netdev->n_txq
+    custom_stats->size += VHOST_Q_STATS;
+#undef VHOST_Q_STAT
+    custom_stats->counters = xrealloc(custom_stats->counters,
+                                      custom_stats->size *
+                                      sizeof *custom_stats->counters);
+
+    j = 0;
+    for (i = 0; i < netdev->n_rxq; i++) {
+#define VHOST_Q_STAT(NAME) \
+        snprintf(custom_stats->counters[sw_stats_size + j++].name, \
+                 NETDEV_CUSTOM_STATS_NAME_SIZE, "rx_q%d_"#NAME, i);
+        VHOST_Q_STATS
+#undef VHOST_Q_STAT
+    }
+
+    for (i = 0; i < netdev->n_txq; i++) {
+#define VHOST_Q_STAT(NAME) \
+        snprintf(custom_stats->counters[sw_stats_size + j++].name, \
+                 NETDEV_CUSTOM_STATS_NAME_SIZE, "tx_q%d_"#NAME, i);
+        VHOST_Q_STATS
+#undef VHOST_Q_STAT
+    }
+
+    rte_spinlock_lock(&dev->stats_lock);
+
+    j = 0;
+    for (i = 0; i < netdev->n_rxq; i++) {
+#define VHOST_Q_STAT(NAME) \
+        custom_stats->counters[sw_stats_size + j++].value = \
+            dev->vhost_rxq_stats[i].NAME;
+        VHOST_Q_STATS
+#undef VHOST_Q_STAT
+    }
+
+    for (i = 0; i < netdev->n_txq; i++) {
+#define VHOST_Q_STAT(NAME) \
+        custom_stats->counters[sw_stats_size + j++].value = \
+            dev->vhost_txq_stats[i].NAME;
+        VHOST_Q_STATS
+#undef VHOST_Q_STAT
+    }
+
+    rte_spinlock_unlock(&dev->stats_lock);
+
+    ovs_mutex_unlock(&dev->mutex);
+
+    return 0;
+}
+
 static int
 netdev_dpdk_get_features(const struct netdev *netdev,
                          enum netdev_features *current,
@@ -3556,6 +3653,11 @@  netdev_dpdk_update_flags__(struct netdev_dpdk *dev,
             if (NETDEV_UP & on) {
                 rte_spinlock_lock(&dev->stats_lock);
                 memset(&dev->stats, 0, sizeof dev->stats);
+                memset(dev->sw_stats, 0, sizeof *dev->sw_stats);
+                memset(dev->vhost_rxq_stats, 0,
+                       dev->up.n_rxq * sizeof *dev->vhost_rxq_stats);
+                memset(dev->vhost_txq_stats, 0,
+                       dev->up.n_txq * sizeof *dev->vhost_txq_stats);
                 rte_spinlock_unlock(&dev->stats_lock);
             }
         }
@@ -5048,9 +5150,12 @@  static int
 dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
     OVS_REQUIRES(dev->mutex)
 {
+    int old_n_txq = dev->up.n_txq;
+    int old_n_rxq = dev->up.n_rxq;
+    int err;
+
     dev->up.n_txq = dev->requested_n_txq;
     dev->up.n_rxq = dev->requested_n_rxq;
-    int err;
 
     /* Always keep RX queue 0 enabled for implementations that won't
      * report vring states. */
@@ -5068,6 +5173,30 @@  dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
 
     netdev_dpdk_remap_txqs(dev);
 
+    /* Reset all stats if number of queues changed. */
+    if (dev->up.n_txq != old_n_txq || dev->up.n_rxq != old_n_rxq) {
+        struct netdev_dpdk_vhost_q_stats *old_txq_stats, *new_txq_stats;
+        struct netdev_dpdk_vhost_q_stats *old_rxq_stats, *new_rxq_stats;
+
+        new_txq_stats = xcalloc(dev->up.n_txq, sizeof *dev->vhost_txq_stats);
+        new_rxq_stats = xcalloc(dev->up.n_rxq, sizeof *dev->vhost_rxq_stats);
+
+        rte_spinlock_lock(&dev->stats_lock);
+
+        memset(&dev->stats, 0, sizeof dev->stats);
+        memset(dev->sw_stats, 0, sizeof *dev->sw_stats);
+
+        old_txq_stats = dev->vhost_txq_stats;
+        dev->vhost_txq_stats = new_txq_stats;
+        old_rxq_stats = dev->vhost_rxq_stats;
+        dev->vhost_rxq_stats = new_rxq_stats;
+
+        rte_spinlock_unlock(&dev->stats_lock);
+
+        free(old_txq_stats);
+        free(old_rxq_stats);
+    }
+
     err = netdev_dpdk_mempool_configure(dev);
     if (!err) {
         /* A new mempool was created or re-used. */
@@ -5473,7 +5602,7 @@  static const struct netdev_class dpdk_vhost_class = {
     .send = netdev_dpdk_vhost_send,
     .get_carrier = netdev_dpdk_vhost_get_carrier,
     .get_stats = netdev_dpdk_vhost_get_stats,
-    .get_custom_stats = netdev_dpdk_get_sw_custom_stats,
+    .get_custom_stats = netdev_dpdk_vhost_get_custom_stats,
     .get_status = netdev_dpdk_vhost_user_get_status,
     .reconfigure = netdev_dpdk_vhost_reconfigure,
     .rxq_recv = netdev_dpdk_vhost_rxq_recv,
@@ -5489,7 +5618,7 @@  static const struct netdev_class dpdk_vhost_client_class = {
     .send = netdev_dpdk_vhost_send,
     .get_carrier = netdev_dpdk_vhost_get_carrier,
     .get_stats = netdev_dpdk_vhost_get_stats,
-    .get_custom_stats = netdev_dpdk_get_sw_custom_stats,
+    .get_custom_stats = netdev_dpdk_vhost_get_custom_stats,
     .get_status = netdev_dpdk_vhost_user_get_status,
     .reconfigure = netdev_dpdk_vhost_client_reconfigure,
     .rxq_recv = netdev_dpdk_vhost_rxq_recv,
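
A note on the X-macro pattern in netdev_dpdk_vhost_get_custom_stats()
above: VHOST_Q_STAT() is redefined before each expansion of
VHOST_Q_STATS, so a single list of stat names drives the entry count,
the name formatting, and the value copies. Reduced to a standalone
sketch (not part of the patch):

    #include <stdio.h>

    /* One list of per-queue stat names drives everything below. */
    #define Q_STATS     \
        Q_STAT(bytes)   \
        Q_STAT(packets) \
        Q_STAT(errors)

    int
    main(void)
    {
        int n_rxq = 2;

        /* Counting: each Q_STAT() expands to "+ n_rxq", so Q_STATS
         * expands to "+ n_rxq + n_rxq + n_rxq", i.e. 3 * n_rxq. */
    #define Q_STAT(NAME) + n_rxq
        int n_entries = 0 Q_STATS;
    #undef Q_STAT

        /* Naming: each Q_STAT() emits one printf() of a stat name. */
        for (int i = 0; i < n_rxq; i++) {
    #define Q_STAT(NAME) printf("rx_q%d_" #NAME "\n", i);
            Q_STATS
    #undef Q_STAT
        }

        printf("%d entries\n", n_entries);  /* Prints "6 entries". */
        return 0;
    }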