From patchwork Thu Apr 11 14:00:42 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Marchand X-Patchwork-Id: 1084057 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 44g2kn5vP1z9s47 for ; Fri, 12 Apr 2019 00:01:37 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id DE5372815; Thu, 11 Apr 2019 14:01:35 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id C4EDE2806 for ; Thu, 11 Apr 2019 14:00:59 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id DB36AF4 for ; Thu, 11 Apr 2019 14:00:58 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 704A319D229; Thu, 11 Apr 2019 14:00:58 +0000 (UTC) Received: from dmarchan.remote.csb (ovpn-204-129.brq.redhat.com [10.40.204.129]) by smtp.corp.redhat.com (Postfix) with ESMTP id 290336013A; Thu, 11 Apr 2019 14:00:52 +0000 (UTC) From: David Marchand To: dev@openvswitch.org Date: Thu, 11 Apr 2019 16:00:42 +0200 Message-Id: <1554991244-26307-1-git-send-email-david.marchand@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Thu, 11 Apr 2019 14:00:58 +0000 (UTC) X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Cc: i.maximets@samsung.com, maxime.coquelin@redhat.com Subject: [ovs-dev] [PATCH 1/3] dpif-netdev: only poll enabled vhost queues X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org We currently poll all available queues based on the max queue count exchanged with the vhost peer and rely on the vhost library in DPDK to check the vring status beneath. This can lead to some overhead when we have a lot of unused queues. To enhance the situation, we can skip the disabled queues. On rxq notifications, we make use of the netdev's change_seq number so that the pmd thread main loop can cache the queue state periodically. $ ovs-appctl dpif-netdev/pmd-rxq-show pmd thread numa_id 0 core_id 1: isolated : true port: dpdk0 queue-id: 0 pmd usage: 0 % polling: enabled pmd thread numa_id 0 core_id 2: isolated : true port: vhost1 queue-id: 0 pmd usage: 0 % polling: disabled port: vhost3 queue-id: 0 pmd usage: 0 % polling: disabled pmd thread numa_id 0 core_id 15: isolated : true port: dpdk1 queue-id: 0 pmd usage: 0 % polling: enabled pmd thread numa_id 0 core_id 16: isolated : true port: vhost0 queue-id: 0 pmd usage: 0 % polling: disabled port: vhost2 queue-id: 0 pmd usage: 0 % polling: disabled $ while true; do ovs-appctl dpif-netdev/pmd-rxq-show |awk ' /polling: / { tot++; if ($NF == "enabled") { en++; } } END { print "total: " tot ", enabled: " en }' sleep 1 done total: 6, enabled: 2 total: 6, enabled: 2 ... # Started vm, virtio devices are bound to kernel driver which enables # F_MQ + all queue pairs total: 6, enabled: 2 total: 66, enabled: 66 ... # Unbound vhost0 and vhost1 from the kernel driver total: 66, enabled: 66 total: 66, enabled: 34 ... # Configured kernel bound devices to use only 1 queue pair total: 66, enabled: 34 total: 66, enabled: 19 total: 66, enabled: 4 ... # While rebooting the vm total: 66, enabled: 4 total: 66, enabled: 2 ... total: 66, enabled: 66 ... # After shutting down the vm total: 66, enabled: 66 total: 66, enabled: 2 Signed-off-by: David Marchand --- lib/dpif-netdev.c | 27 ++++++++++++++++++++ lib/netdev-dpdk.c | 71 +++++++++++++++++++++++++++++++++++++++++---------- lib/netdev-provider.h | 5 ++++ lib/netdev.c | 10 ++++++++ lib/netdev.h | 1 + 5 files changed, 101 insertions(+), 13 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 4d6d0c3..c95612f 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -591,6 +591,8 @@ struct polled_queue { struct dp_netdev_rxq *rxq; odp_port_t port_no; bool emc_enabled; + bool enabled; + uint64_t change_seq; }; /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */ @@ -1171,6 +1173,9 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd) } else { ds_put_format(reply, "%s", "NOT AVAIL"); } + ds_put_format(reply, " polling: %s", + (netdev_rxq_enabled(list[i].rxq->rx)) + ? "enabled" : "disabled"); ds_put_cstr(reply, "\n"); } ovs_mutex_unlock(&pmd->port_mutex); @@ -5198,6 +5203,11 @@ dpif_netdev_run(struct dpif *dpif) } for (i = 0; i < port->n_rxq; i++) { + + if (!netdev_rxq_enabled(port->rxqs[i].rx)) { + continue; + } + if (dp_netdev_process_rxq_port(non_pmd, &port->rxqs[i], port->port_no)) { @@ -5371,6 +5381,9 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread *pmd, poll_list[i].rxq = poll->rxq; poll_list[i].port_no = poll->rxq->port->port_no; poll_list[i].emc_enabled = poll->rxq->port->emc_enabled; + poll_list[i].enabled = netdev_rxq_enabled(poll->rxq->rx); + poll_list[i].change_seq = + netdev_get_change_seq(poll->rxq->port->netdev); i++; } @@ -5436,6 +5449,10 @@ reload: for (i = 0; i < poll_cnt; i++) { + if (!poll_list[i].enabled) { + continue; + } + if (poll_list[i].emc_enabled) { atomic_read_relaxed(&pmd->dp->emc_insert_min, &pmd->ctx.emc_insert_min); @@ -5472,6 +5489,16 @@ reload: if (reload) { break; } + + for (i = 0; i < poll_cnt; i++) { + uint64_t current_seq = + netdev_get_change_seq(poll_list[i].rxq->port->netdev); + if (poll_list[i].change_seq != current_seq) { + poll_list[i].change_seq = current_seq; + poll_list[i].enabled = + netdev_rxq_enabled(poll_list[i].rxq->rx); + } + } } pmd_perf_end_iteration(s, rx_packets, tx_packets, pmd_perf_metrics_enabled(pmd)); diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 47153dc..9088ff4 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -317,6 +317,10 @@ struct dpdk_mp { struct ovs_list list_node OVS_GUARDED_BY(dpdk_mp_mutex); }; +struct dpdk_rx_queue { + bool enabled; +}; + /* There should be one 'struct dpdk_tx_queue' created for * each cpu core. */ struct dpdk_tx_queue { @@ -424,6 +428,8 @@ struct netdev_dpdk { OVSRCU_TYPE(struct ingress_policer *) ingress_policer; uint32_t policer_rate; uint32_t policer_burst; + + struct dpdk_rx_queue *rx_q; ); PADDED_MEMBERS(CACHE_LINE_SIZE, @@ -1109,6 +1115,12 @@ netdev_dpdk_alloc(void) return NULL; } +static struct dpdk_rx_queue * +netdev_dpdk_alloc_rxq(unsigned int n_rxqs) +{ + return dpdk_rte_mzalloc(n_rxqs * sizeof(struct dpdk_rx_queue)); +} + static struct dpdk_tx_queue * netdev_dpdk_alloc_txq(unsigned int n_txqs) { @@ -1235,6 +1247,10 @@ vhost_common_construct(struct netdev *netdev) int socket_id = rte_lcore_to_socket_id(rte_get_master_lcore()); struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + dev->rx_q = netdev_dpdk_alloc_rxq(OVS_VHOST_MAX_QUEUE_NUM); + if (!dev->rx_q) { + return ENOMEM; + } dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM); if (!dev->tx_q) { return ENOMEM; @@ -1354,6 +1370,7 @@ common_destruct(struct netdev_dpdk *dev) OVS_REQUIRES(dpdk_mutex) OVS_EXCLUDED(dev->mutex) { + rte_free(dev->rx_q); rte_free(dev->tx_q); dpdk_mp_put(dev->dpdk_mp); @@ -2201,6 +2218,14 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, } static int +netdev_dpdk_vhost_rxq_enabled(struct netdev_rxq *rxq) +{ + struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev); + + return dev->rx_q[rxq->queue_id].enabled; +} + +static int netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch, int *qfill) { @@ -3531,6 +3556,17 @@ new_device(int vid) /* Clears mapping for all available queues of vhost interface. */ static void +netdev_dpdk_rxq_map_clear(struct netdev_dpdk *dev) + OVS_REQUIRES(dev->mutex) +{ + int i; + + for (i = 0; i < dev->up.n_rxq; i++) { + dev->rx_q[i].enabled = false; + } +} + +static void netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex) { @@ -3563,6 +3599,7 @@ destroy_device(int vid) ovs_mutex_lock(&dev->mutex); dev->vhost_reconfigured = false; ovsrcu_index_set(&dev->vid, -1); + netdev_dpdk_rxq_map_clear(dev); netdev_dpdk_txq_map_clear(dev); netdev_change_seq_changed(&dev->up); @@ -3597,24 +3634,30 @@ vring_state_changed(int vid, uint16_t queue_id, int enable) struct netdev_dpdk *dev; bool exists = false; int qid = queue_id / VIRTIO_QNUM; + bool is_rx = (queue_id % VIRTIO_QNUM) == VIRTIO_TXQ; char ifname[IF_NAME_SZ]; rte_vhost_get_ifname(vid, ifname, sizeof ifname); - if (queue_id % VIRTIO_QNUM == VIRTIO_TXQ) { - return 0; - } - ovs_mutex_lock(&dpdk_mutex); LIST_FOR_EACH (dev, list_node, &dpdk_list) { ovs_mutex_lock(&dev->mutex); if (nullable_string_is_equal(ifname, dev->vhost_id)) { - if (enable) { - dev->tx_q[qid].map = qid; + if (is_rx) { + bool enabled = dev->rx_q[qid].enabled; + + dev->rx_q[qid].enabled = enable != 0; + if (enabled ^ dev->rx_q[qid].enabled) { + netdev_change_seq_changed(&dev->up); + } } else { - dev->tx_q[qid].map = OVS_VHOST_QUEUE_DISABLED; + if (enable) { + dev->tx_q[qid].map = qid; + } else { + dev->tx_q[qid].map = OVS_VHOST_QUEUE_DISABLED; + } + netdev_dpdk_remap_txqs(dev); } - netdev_dpdk_remap_txqs(dev); exists = true; ovs_mutex_unlock(&dev->mutex); break; @@ -3624,9 +3667,9 @@ vring_state_changed(int vid, uint16_t queue_id, int enable) ovs_mutex_unlock(&dpdk_mutex); if (exists) { - VLOG_INFO("State of queue %d ( tx_qid %d ) of vhost device '%s' " - "changed to \'%s\'", queue_id, qid, ifname, - (enable == 1) ? "enabled" : "disabled"); + VLOG_INFO("State of queue %d ( %s_qid %d ) of vhost device '%s' " + "changed to \'%s\'", queue_id, is_rx == true ? "rx" : "tx", + qid, ifname, (enable == 1) ? "enabled" : "disabled"); } else { VLOG_INFO("vHost Device '%s' not found", ifname); return -1; @@ -4297,7 +4340,8 @@ static const struct netdev_class dpdk_vhost_class = { .get_stats = netdev_dpdk_vhost_get_stats, .get_status = netdev_dpdk_vhost_user_get_status, .reconfigure = netdev_dpdk_vhost_reconfigure, - .rxq_recv = netdev_dpdk_vhost_rxq_recv + .rxq_recv = netdev_dpdk_vhost_rxq_recv, + .rxq_enabled = netdev_dpdk_vhost_rxq_enabled, }; static const struct netdev_class dpdk_vhost_client_class = { @@ -4311,7 +4355,8 @@ static const struct netdev_class dpdk_vhost_client_class = { .get_stats = netdev_dpdk_vhost_get_stats, .get_status = netdev_dpdk_vhost_user_get_status, .reconfigure = netdev_dpdk_vhost_client_reconfigure, - .rxq_recv = netdev_dpdk_vhost_rxq_recv + .rxq_recv = netdev_dpdk_vhost_rxq_recv, + .rxq_enabled = netdev_dpdk_vhost_rxq_enabled, }; void diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h index fb0c27e..39354db 100644 --- a/lib/netdev-provider.h +++ b/lib/netdev-provider.h @@ -789,6 +789,11 @@ struct netdev_class { void (*rxq_destruct)(struct netdev_rxq *); void (*rxq_dealloc)(struct netdev_rxq *); + /* A netdev can report if a queue won't get traffic and should be excluded + * from polling (no callback implicitely means that the queue is enabled). + */ + int (*rxq_enabled)(struct netdev_rxq *); + /* Attempts to receive a batch of packets from 'rx'. In 'batch', the * caller supplies 'packets' as the pointer to the beginning of an array * of NETDEV_MAX_BURST pointers to dp_packet. If successful, the diff --git a/lib/netdev.c b/lib/netdev.c index 7d7ecf6..88db6c7 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -682,6 +682,16 @@ netdev_rxq_close(struct netdev_rxq *rx) } } +int netdev_rxq_enabled(struct netdev_rxq *rx) +{ + bool enabled = true; + + if (rx->netdev->netdev_class->rxq_enabled) { + enabled = rx->netdev->netdev_class->rxq_enabled(rx); + } + return enabled; +} + /* Attempts to receive a batch of packets from 'rx'. 'batch' should point to * the beginning of an array of NETDEV_MAX_BURST pointers to dp_packet. If * successful, this function stores pointers to up to NETDEV_MAX_BURST diff --git a/lib/netdev.h b/lib/netdev.h index d94817f..859f5ef 100644 --- a/lib/netdev.h +++ b/lib/netdev.h @@ -183,6 +183,7 @@ enum netdev_pt_mode netdev_get_pt_mode(const struct netdev *); /* Packet reception. */ int netdev_rxq_open(struct netdev *, struct netdev_rxq **, int id); void netdev_rxq_close(struct netdev_rxq *); +int netdev_rxq_enabled(struct netdev_rxq *); const char *netdev_rxq_get_name(const struct netdev_rxq *); int netdev_rxq_get_queue_id(const struct netdev_rxq *);