From patchwork Wed Jun 9 13:09:31 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Gaetan Rivet
X-Patchwork-Id: 1489883
From: Gaetan Rivet
To: ovs-dev@openvswitch.org
Date: Wed, 9 Jun 2021 15:09:31 +0200
Message-Id: <533f7f8b9c2ddd343b884b2d6921b9285598ed37.1623234822.git.grive@u256.net>
X-Mailer: git-send-email 2.31.1
Cc: Eli Britstein, Maxime Coquelin
Subject: [ovs-dev] [PATCH v4 23/27] dpif-netdev: Use lockless queue to manage offloads

The dataplane threads (PMDs) send offloading commands to a dedicated
offload management thread.
The current implementation uses a lock, and benchmarks show high contention
on the queue in some cases. Under high contention, the mutex often forces
the locking thread to yield and wait via a syscall, which should be avoided
in a userland dataplane.

The mpsc-queue can be used instead. It uses fewer cycles and has lower
latency. Benchmarks show better behavior when multiple revalidators and one
or more PMDs write to a single queue while another thread polls it.

One trade-off with the new scheme, however, is that the offload thread is
forced to poll the queue: without a mutex, a cond_wait cannot be used for
signaling. The offload thread implements an exponential backoff and sleeps
in short increments when no data is available. This lets the thread yield,
at the price of some added latency when managing offloads after a period of
inactivity.

Signed-off-by: Gaetan Rivet
Reviewed-by: Eli Britstein
Reviewed-by: Maxime Coquelin
---
 lib/dpif-netdev.c | 109 ++++++++++++++++++++++++----------------------
 1 file changed, 57 insertions(+), 52 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1daaecb1c..68dcdf39a 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -53,6 +53,7 @@
 #include "id-pool.h"
 #include "ipf.h"
 #include "mov-avg.h"
+#include "mpsc-queue.h"
 #include "netdev.h"
 #include "netdev-offload.h"
 #include "netdev-provider.h"
@@ -452,25 +453,22 @@ union dp_offload_thread_data {
 };
 
 struct dp_offload_thread_item {
-    struct ovs_list node;
+    struct mpsc_queue_node node;
     enum dp_offload_type type;
     long long int timestamp;
     union dp_offload_thread_data data[0];
 };
 
 struct dp_offload_thread {
-    struct ovs_mutex mutex;
-    struct ovs_list list;
-    uint64_t enqueued_item;
+    struct mpsc_queue queue;
+    atomic_uint64_t enqueued_item;
     struct mov_avg_cma cma;
     struct mov_avg_ema ema;
-    pthread_cond_t cond;
 };
 
 static struct dp_offload_thread dp_offload_thread = {
-    .mutex = OVS_MUTEX_INITIALIZER,
-    .list  = OVS_LIST_INITIALIZER(&dp_offload_thread.list),
-    .enqueued_item = 0,
+    .queue = MPSC_QUEUE_INITIALIZER(&dp_offload_thread.queue),
+    .enqueued_item = ATOMIC_VAR_INIT(0),
     .cma = MOV_AVG_CMA_INITIALIZER,
     .ema = MOV_AVG_EMA_INITIALIZER(100),
 };
@@ -2697,11 +2695,8 @@ dp_netdev_free_offload(struct dp_offload_thread_item *offload)
 static void
 dp_netdev_append_offload(struct dp_offload_thread_item *offload)
 {
-    ovs_mutex_lock(&dp_offload_thread.mutex);
-    ovs_list_push_back(&dp_offload_thread.list, &offload->node);
-    dp_offload_thread.enqueued_item++;
-    xpthread_cond_signal(&dp_offload_thread.cond);
-    ovs_mutex_unlock(&dp_offload_thread.mutex);
+    mpsc_queue_insert(&dp_offload_thread.queue, &offload->node);
+    atomic_count_inc64(&dp_offload_thread.enqueued_item);
 }
 
 static int
@@ -2845,58 +2840,68 @@ dp_offload_flush(struct dp_offload_thread_item *item)
     ovs_barrier_block(flush->barrier);
 }
 
+#define DP_NETDEV_OFFLOAD_BACKOFF_MIN 1
+#define DP_NETDEV_OFFLOAD_BACKOFF_MAX 64
 #define DP_NETDEV_OFFLOAD_QUIESCE_INTERVAL_US (10 * 1000) /* 10 ms */
 
 static void *
 dp_netdev_flow_offload_main(void *data OVS_UNUSED)
 {
     struct dp_offload_thread_item *offload;
-    struct ovs_list *list;
+    struct mpsc_queue_node *node;
+    struct mpsc_queue *queue;
     long long int latency_us;
     long long int next_rcu;
     long long int now;
+    uint64_t backoff;
 
-    next_rcu = time_usec() + DP_NETDEV_OFFLOAD_QUIESCE_INTERVAL_US;
-    for (;;) {
-        ovs_mutex_lock(&dp_offload_thread.mutex);
-        if (ovs_list_is_empty(&dp_offload_thread.list)) {
-            ovsrcu_quiesce_start();
-            ovs_mutex_cond_wait(&dp_offload_thread.cond,
-                                &dp_offload_thread.mutex);
-            ovsrcu_quiesce_end();
-            next_rcu = time_usec() + DP_NETDEV_OFFLOAD_QUIESCE_INTERVAL_US;
-        }
-        list = ovs_list_pop_front(&dp_offload_thread.list);
-        dp_offload_thread.enqueued_item--;
-        offload = CONTAINER_OF(list, struct dp_offload_thread_item, node);
-        ovs_mutex_unlock(&dp_offload_thread.mutex);
-
-        switch (offload->type) {
-        case DP_OFFLOAD_FLOW:
-            dp_offload_flow(offload);
-            break;
-        case DP_OFFLOAD_FLUSH:
-            dp_offload_flush(offload);
-            break;
-        default:
-            OVS_NOT_REACHED();
+    queue = &dp_offload_thread.queue;
+    mpsc_queue_acquire(queue);
+
+    while (true) {
+        backoff = DP_NETDEV_OFFLOAD_BACKOFF_MIN;
+        while (mpsc_queue_tail(queue) == NULL) {
+            xnanosleep(backoff * 1E6);
+            if (backoff < DP_NETDEV_OFFLOAD_BACKOFF_MAX) {
+                backoff <<= 1;
+            }
         }
 
-        now = time_usec();
+        next_rcu = time_usec() + DP_NETDEV_OFFLOAD_QUIESCE_INTERVAL_US;
+        MPSC_QUEUE_FOR_EACH_POP (node, queue) {
+            offload = CONTAINER_OF(node, struct dp_offload_thread_item, node);
+            atomic_count_dec64(&dp_offload_thread.enqueued_item);
 
-        latency_us = now - offload->timestamp;
-        mov_avg_cma_update(&dp_offload_thread.cma, latency_us);
-        mov_avg_ema_update(&dp_offload_thread.ema, latency_us);
+            switch (offload->type) {
+            case DP_OFFLOAD_FLOW:
+                dp_offload_flow(offload);
+                break;
+            case DP_OFFLOAD_FLUSH:
+                dp_offload_flush(offload);
+                break;
+            default:
+                OVS_NOT_REACHED();
+            }
 
-        dp_netdev_free_offload(offload);
+            now = time_usec();
 
-        /* Do RCU synchronization at fixed interval. */
-        if (now > next_rcu) {
-            ovsrcu_quiesce();
-            next_rcu = time_usec() + DP_NETDEV_OFFLOAD_QUIESCE_INTERVAL_US;
+            latency_us = now - offload->timestamp;
+            mov_avg_cma_update(&dp_offload_thread.cma, latency_us);
+            mov_avg_ema_update(&dp_offload_thread.ema, latency_us);
+
+            dp_netdev_free_offload(offload);
+
+            /* Do RCU synchronization at fixed interval. */
+            if (now > next_rcu) {
+                ovsrcu_quiesce();
+                next_rcu = time_usec() + DP_NETDEV_OFFLOAD_QUIESCE_INTERVAL_US;
+            }
         }
     }
 
+    OVS_NOT_REACHED();
+    mpsc_queue_release(queue);
+
     return NULL;
 }
 
@@ -2907,7 +2912,7 @@ queue_netdev_flow_del(struct dp_netdev_pmd_thread *pmd,
     struct dp_offload_thread_item *offload;
 
     if (ovsthread_once_start(&offload_thread_once)) {
-        xpthread_cond_init(&dp_offload_thread.cond, NULL);
+        mpsc_queue_init(&dp_offload_thread.queue);
         ovs_thread_create("hw_offload", dp_netdev_flow_offload_main, NULL);
         ovsthread_once_done(&offload_thread_once);
     }
@@ -2932,7 +2937,7 @@ queue_netdev_flow_put(struct dp_netdev_pmd_thread *pmd,
     }
 
     if (ovsthread_once_start(&offload_thread_once)) {
-        xpthread_cond_init(&dp_offload_thread.cond, NULL);
+        mpsc_queue_init(&dp_offload_thread.queue);
         ovs_thread_create("hw_offload", dp_netdev_flow_offload_main, NULL);
         ovsthread_once_done(&offload_thread_once);
     }
@@ -2983,7 +2988,7 @@ dp_netdev_offload_flush_enqueue(struct dp_netdev *dp,
     struct dp_offload_flush_item *flush;
 
     if (ovsthread_once_start(&offload_thread_once)) {
-        xpthread_cond_init(&dp_offload_thread.cond, NULL);
+        mpsc_queue_init(&dp_offload_thread.queue);
        ovs_thread_create("hw_offload", dp_netdev_flow_offload_main, NULL);
         ovsthread_once_done(&offload_thread_once);
     }
@@ -4470,8 +4475,8 @@ dpif_netdev_offload_stats_get(struct dpif *dpif,
     }
     ovs_mutex_unlock(&dp->port_mutex);
 
-    stats->counters[DP_NETDEV_HW_OFFLOADS_STATS_ENQUEUED].value =
-        dp_offload_thread.enqueued_item;
+    atomic_read_relaxed(&dp_offload_thread.enqueued_item,
+        &stats->counters[DP_NETDEV_HW_OFFLOADS_STATS_ENQUEUED].value);
     stats->counters[DP_NETDEV_HW_OFFLOADS_STATS_INSERTED].value = nb_offloads;
     stats->counters[DP_NETDEV_HW_OFFLOADS_STATS_LAT_CMA_MEAN].value =
         mov_avg_cma(&dp_offload_thread.cma);