From patchwork Thu Apr 4 19:49:53 2019
X-Patchwork-Submitter: David Marchand
X-Patchwork-Id: 1077643
From: David Marchand <david.marchand@redhat.com>
To: dev@openvswitch.org
Cc: i.maximets@samsung.com, maxime.coquelin@redhat.com
Date: Thu, 4 Apr 2019 21:49:53 +0200
Message-Id: <1554407393-24298-1-git-send-email-david.marchand@redhat.com>
Subject: [ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues

We currently poll all available queues based on the max queue count
exchanged with the vhost peer, relying on the DPDK vhost library to
check the vring status beneath.  This incurs needless overhead when
many queues are unused, which happens easily when more queues are
provisioned than the virtual machine actually needs.

To improve on this, make the rxq scheduling algorithm skip disabled
queues.  All we need is to catch the per-rxq enable/disable
notifications and trigger a rebalance by requesting a port
reconfigure (without changing the port configuration itself).

A consequence of this is that, before a device is connected to the
vhost port, no rxq is polled at all.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
We tried to lower the number of rebalances, but we do not have a
satisfying solution at the moment, so this patch rebalances on each
update.
---
 lib/dpif-netdev.c     |  4 +++
 lib/netdev-dpdk.c     | 71 +++++++++++++++++++++++++++++++++++++++++----------
 lib/netdev-provider.h |  5 ++++
 lib/netdev.c          | 10 ++++++++
 lib/netdev.h          |  1 +
 5 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 481ef50..7b34ccc 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4546,6 +4546,10 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
         for (int qid = 0; qid < port->n_rxq; qid++) {
             struct dp_netdev_rxq *q = &port->rxqs[qid];
 
+            /* Skip disabled rxq. */
+            if (!netdev_rxq_enabled(q->rx)) {
+                continue;
+            }
             if (pinned && q->core_id != OVS_CORE_UNSPEC) {
                 struct dp_netdev_pmd_thread *pmd;
 
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 4bf0ca9..8f8fd1a 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -326,6 +326,10 @@ struct dpdk_mp {
     struct ovs_list list_node OVS_GUARDED_BY(dpdk_mp_mutex);
 };
 
+struct dpdk_rx_queue {
+    bool enabled;
+};
+
 /* There should be one 'struct dpdk_tx_queue' created for
  * each cpu core. */
 struct dpdk_tx_queue {
@@ -433,6 +437,8 @@ struct netdev_dpdk {
         OVSRCU_TYPE(struct ingress_policer *) ingress_policer;
         uint32_t policer_rate;
         uint32_t policer_burst;
+
+        struct dpdk_rx_queue *rx_q;
     );
 
     PADDED_MEMBERS(CACHE_LINE_SIZE,
@@ -1119,6 +1125,12 @@ netdev_dpdk_alloc(void)
     return NULL;
 }
 
+static struct dpdk_rx_queue *
+netdev_dpdk_alloc_rxq(unsigned int n_rxqs)
+{
+    return dpdk_rte_mzalloc(n_rxqs * sizeof(struct dpdk_rx_queue));
+}
+
 static struct dpdk_tx_queue *
 netdev_dpdk_alloc_txq(unsigned int n_txqs)
 {
@@ -1245,6 +1257,10 @@ vhost_common_construct(struct netdev *netdev)
     int socket_id = rte_lcore_to_socket_id(rte_get_master_lcore());
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
 
+    dev->rx_q = netdev_dpdk_alloc_rxq(OVS_VHOST_MAX_QUEUE_NUM);
+    if (!dev->rx_q) {
+        return ENOMEM;
+    }
     dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM);
     if (!dev->tx_q) {
         return ENOMEM;
@@ -1360,6 +1376,7 @@ common_destruct(struct netdev_dpdk *dev)
     OVS_REQUIRES(dpdk_mutex)
     OVS_EXCLUDED(dev->mutex)
 {
+    rte_free(dev->rx_q);
     rte_free(dev->tx_q);
 
     dpdk_mp_put(dev->dpdk_mp);
@@ -2205,6 +2222,14 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
 }
 
 static int
+netdev_dpdk_vhost_rxq_enabled(struct netdev_rxq *rxq)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
+
+    return dev->rx_q[rxq->queue_id].enabled;
+}
+
+static int
 netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch,
                      int *qfill)
 {
@@ -3527,6 +3552,17 @@ new_device(int vid)
 
 /* Clears mapping for all available queues of vhost interface. */
 static void
+netdev_dpdk_rxq_map_clear(struct netdev_dpdk *dev)
+    OVS_REQUIRES(dev->mutex)
+{
+    int i;
+
+    for (i = 0; i < dev->up.n_rxq; i++) {
+        dev->rx_q[i].enabled = false;
+    }
+}
+
+static void
 netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
     OVS_REQUIRES(dev->mutex)
 {
@@ -3559,6 +3595,7 @@ destroy_device(int vid)
             ovs_mutex_lock(&dev->mutex);
             dev->vhost_reconfigured = false;
             ovsrcu_index_set(&dev->vid, -1);
+            netdev_dpdk_rxq_map_clear(dev);
             netdev_dpdk_txq_map_clear(dev);
 
             netdev_change_seq_changed(&dev->up);
@@ -3593,24 +3630,30 @@ vring_state_changed(int vid, uint16_t queue_id, int enable)
     struct netdev_dpdk *dev;
     bool exists = false;
     int qid = queue_id / VIRTIO_QNUM;
+    bool is_rx = (queue_id % VIRTIO_QNUM) == VIRTIO_TXQ;
     char ifname[IF_NAME_SZ];
 
     rte_vhost_get_ifname(vid, ifname, sizeof ifname);
 
-    if (queue_id % VIRTIO_QNUM == VIRTIO_TXQ) {
-        return 0;
-    }
-
     ovs_mutex_lock(&dpdk_mutex);
     LIST_FOR_EACH (dev, list_node, &dpdk_list) {
         ovs_mutex_lock(&dev->mutex);
         if (strncmp(ifname, dev->vhost_id, IF_NAME_SZ) == 0) {
-            if (enable) {
-                dev->tx_q[qid].map = qid;
+            if (is_rx) {
+                bool enabled = dev->rx_q[qid].enabled;
+
+                dev->rx_q[qid].enabled = enable != 0;
+                if (enabled ^ dev->rx_q[qid].enabled) {
+                    netdev_request_reconfigure(&dev->up);
+                }
             } else {
-                dev->tx_q[qid].map = OVS_VHOST_QUEUE_DISABLED;
+                if (enable) {
+                    dev->tx_q[qid].map = qid;
+                } else {
+                    dev->tx_q[qid].map = OVS_VHOST_QUEUE_DISABLED;
+                }
+                netdev_dpdk_remap_txqs(dev);
             }
-            netdev_dpdk_remap_txqs(dev);
             exists = true;
             ovs_mutex_unlock(&dev->mutex);
             break;
@@ -3620,9 +3663,9 @@ vring_state_changed(int vid, uint16_t queue_id, int enable)
     ovs_mutex_unlock(&dpdk_mutex);
 
     if (exists) {
-        VLOG_INFO("State of queue %d ( tx_qid %d ) of vhost device '%s'"
-                  "changed to \'%s\'", queue_id, qid, ifname,
-                  (enable == 1) ? "enabled" : "disabled");
+        VLOG_INFO("State of queue %d ( %s_qid %d ) of vhost device '%s' "
+                  "changed to \'%s\'", queue_id, is_rx ? "rx" : "tx",
+                  qid, ifname, (enable == 1) ? "enabled" : "disabled");
     } else {
         VLOG_INFO("vHost Device '%s' not found", ifname);
         return -1;
@@ -5014,7 +5057,8 @@ static const struct netdev_class dpdk_vhost_class = {
     .get_stats = netdev_dpdk_vhost_get_stats,
     .get_status = netdev_dpdk_vhost_user_get_status,
     .reconfigure = netdev_dpdk_vhost_reconfigure,
-    .rxq_recv = netdev_dpdk_vhost_rxq_recv
+    .rxq_recv = netdev_dpdk_vhost_rxq_recv,
+    .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,
 };
 
 static const struct netdev_class dpdk_vhost_client_class = {
@@ -5028,7 +5072,8 @@ static const struct netdev_class dpdk_vhost_client_class = {
     .get_stats = netdev_dpdk_vhost_get_stats,
     .get_status = netdev_dpdk_vhost_user_get_status,
     .reconfigure = netdev_dpdk_vhost_client_reconfigure,
-    .rxq_recv = netdev_dpdk_vhost_rxq_recv
+    .rxq_recv = netdev_dpdk_vhost_rxq_recv,
+    .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,
 };
 
 void
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index fb0c27e..39354db 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -789,6 +789,11 @@ struct netdev_class {
     void (*rxq_destruct)(struct netdev_rxq *);
     void (*rxq_dealloc)(struct netdev_rxq *);
 
+    /* A netdev can report if a queue won't get traffic and should be
+     * excluded from polling (no callback implicitly means that the
+     * queue is enabled). */
+    int (*rxq_enabled)(struct netdev_rxq *);
+
     /* Attempts to receive a batch of packets from 'rx'.  In 'batch', the
      * caller supplies 'packets' as the pointer to the beginning of an array
      * of NETDEV_MAX_BURST pointers to dp_packet.  If successful, the
diff --git a/lib/netdev.c b/lib/netdev.c
index 45b50f2..e0ef21f 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -683,6 +683,16 @@ netdev_rxq_close(struct netdev_rxq *rx)
     }
 }
 
+int netdev_rxq_enabled(struct netdev_rxq *rx)
+{
+    bool enabled = true;
+
+    if (rx->netdev->netdev_class->rxq_enabled) {
+        enabled = rx->netdev->netdev_class->rxq_enabled(rx);
+    }
+    return enabled;
+}
+
 /* Attempts to receive a batch of packets from 'rx'.  'batch' should point to
  * the beginning of an array of NETDEV_MAX_BURST pointers to dp_packet.  If
  * successful, this function stores pointers to up to NETDEV_MAX_BURST
diff --git a/lib/netdev.h b/lib/netdev.h
index d94817f..859f5ef 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -183,6 +183,7 @@ enum netdev_pt_mode netdev_get_pt_mode(const struct netdev *);
 /* Packet reception. */
 int netdev_rxq_open(struct netdev *, struct netdev_rxq **, int id);
 void netdev_rxq_close(struct netdev_rxq *);
+int netdev_rxq_enabled(struct netdev_rxq *);
 
 const char *netdev_rxq_get_name(const struct netdev_rxq *);
 int netdev_rxq_get_queue_id(const struct netdev_rxq *);