From patchwork Tue Jun 15 23:22:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gaetan Rivet X-Patchwork-Id: 1492517 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=u256.net header.i=@u256.net header.a=rsa-sha256 header.s=fm2 header.b=c10WnyqH; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=messagingengine.com header.i=@messagingengine.com header.a=rsa-sha256 header.s=fm3 header.b=WuVWQksS; dkim-atps=neutral Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4G4PWN5KcLz9sWl for ; Wed, 16 Jun 2021 09:23:12 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id A01F8600CA; Tue, 15 Jun 2021 23:23:09 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id WEWKu8yXeEGM; Tue, 15 Jun 2021 23:23:08 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp3.osuosl.org (Postfix) with ESMTPS id CD83960662; Tue, 15 Jun 2021 23:23:07 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id A72E8C000E; Tue, 15 Jun 2021 23:23:07 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) by lists.linuxfoundation.org (Postfix) with ESMTP id EDE7FC000B for ; Tue, 15 Jun 2021 23:23:06 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id D038560662 for ; Tue, 15 Jun 2021 23:23:06 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id dO8saFd-u-Z9 for ; Tue, 15 Jun 2021 23:23:06 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.8.0 Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by smtp3.osuosl.org (Postfix) with ESMTPS id 3EBC460664 for ; Tue, 15 Jun 2021 23:23:06 +0000 (UTC) Received: from compute6.internal (compute6.nyi.internal [10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id 092AB5C012F; Tue, 15 Jun 2021 19:23:03 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute6.internal (MEProxy); Tue, 15 Jun 2021 19:23:03 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=u256.net; h=from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; s=fm2; bh=Hqzz8DYXyh22v PXuOypiRnykVe2r+BP/dApR3zSAw3M=; b=c10WnyqH52BQmb/84rkOTXDvFI5/w 8hT3dp5dUPgzLmonDGgY3mjqUIpMTuW3X5RdUAIeq1o93MUpVt/9PI3XeyvnkF17 kDQClMO4MSAzliS1MQgZe+R0q/ekNe7F7sS9BzYBl/FvlGHP9j3TsD/NzfLTMBMx WPDJiodENYH3oMdLcuWeI9tEZamEVjXJfefXr2ZmAvB8vgx03s7qh6RNIeThA+NT n4ggRtoeuXFoHRngOYJTsX0EJ7OGhTvzojCwpvTYe//VgBs3iLIrrN7tDYGUuW3T zv08GTYQcFZthu5VMzIprq07xJm7FEEOfZieWms1zMZrCHG8JPQbe3rUQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=Hqzz8DYXyh22vPXuOypiRnykVe2r+BP/dApR3zSAw3M=; b=WuVWQksS yuqfaQO8cfv/fAloeFfoTbGgl/uq95QQkT5qKjtQxgxDBycToTJtTmqjuBSgAraj sb+jqXATs0THbUvsAbe6/BrKT9ct2PpFUJtGebximNCfB7aSFy2sPWqdYFy8biNu wkLHy6v0Ma0o9KBeTTTGWhXYefszY7H0QbuxxRjo5HMZE4l2vtehCKpD6q4UMrhA Y3ez+SDWBy1r/EFIGt/ZV1KxcrrdB0uFX5sWR4eeNCdLsMF3CPgfPltRh1DDEmIe rfpJxii8uIJWm7/Imv0FoZVrEVkJiwJjMkddtYe0C+7umjkUAfCN4PrXrJiD0unT g01ogcF+4ZNtWg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrfedvkedgvddtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhephffvufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpefirggvthgr nhcutfhivhgvthcuoehgrhhivhgvsehuvdehiedrnhgvtheqnecuggftrfgrthhtvghrnh ephefgveffkeetheetfeeifedvheelfeejfeehveduteejhfekuedtkeeiuedvteehnecu vehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepghhrihhvvg esuhdvheeirdhnvght X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Jun 2021 19:23:02 -0400 (EDT) From: Gaetan Rivet To: ovs-dev@openvswitch.org Date: Wed, 16 Jun 2021 01:22:46 +0200 Message-Id: <308d3ef9f3fbb8c41f52dd1150cf549c5890287f.1623786081.git.grive@u256.net> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Cc: Eli Britstein Subject: [ovs-dev] [PATCH v3 1/8] conntrack: Init hash basis first at creation X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" The 'hash_basis' field is used sometimes during sub-systems init routine. It will be 0 by default before randomization. Sub-systems would then init some nodes with incorrect hash values. The timeout policies module is affected, making the default policy being referenced using an incorrect hash value. Fixes: 2078901a4c14 ("userspace: Add conntrack timeout policy support.") Signed-off-by: Gaetan Rivet Reviewed-by: Eli Britstein Acked-by: William Tu --- lib/conntrack.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index 99198a601..a5efb37aa 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -291,6 +291,11 @@ conntrack_init(void) static struct ovsthread_once setup_l4_once = OVSTHREAD_ONCE_INITIALIZER; struct conntrack *ct = xzalloc(sizeof *ct); + /* This value can be used during init (e.g. timeout_policy_init()), + * set it first to ensure it is available. + */ + ct->hash_basis = random_uint32(); + ovs_rwlock_init(&ct->resources_lock); ovs_rwlock_wrlock(&ct->resources_lock); hmap_init(&ct->alg_expectations); @@ -308,7 +313,6 @@ conntrack_init(void) timeout_policy_init(ct); ovs_mutex_unlock(&ct->ct_lock); - ct->hash_basis = random_uint32(); atomic_count_init(&ct->n_conn, 0); atomic_init(&ct->n_conn_limit, DEFAULT_N_CONN_LIMIT); atomic_init(&ct->tcp_seq_chk, true); From patchwork Tue Jun 15 23:22:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gaetan Rivet X-Patchwork-Id: 1492519 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=u256.net header.i=@u256.net header.a=rsa-sha256 header.s=fm2 header.b=eSDI/CIw; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=messagingengine.com header.i=@messagingengine.com header.a=rsa-sha256 header.s=fm3 header.b=QPSSfC5H; dkim-atps=neutral Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4G4PWY2Fw6z9sWl for ; Wed, 16 Jun 2021 09:23:21 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 277B883C5B; Tue, 15 Jun 2021 23:23:18 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 27l3ijbC53wa; Tue, 15 Jun 2021 23:23:15 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp1.osuosl.org (Postfix) with ESMTPS id 991D283C49; Tue, 15 Jun 2021 23:23:14 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 5E7C4C0027; Tue, 15 Jun 2021 23:23:13 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 7ADF4C000B for ; Tue, 15 Jun 2021 23:23:08 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 6A8878380B for ; Tue, 15 Jun 2021 23:23:08 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ElzuHQazUByv for ; Tue, 15 Jun 2021 23:23:07 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.8.0 Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by smtp1.osuosl.org (Postfix) with ESMTPS id 617C483C14 for ; Tue, 15 Jun 2021 23:23:06 +0000 (UTC) Received: from compute6.internal (compute6.nyi.internal [10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id AFC065C00E1; Tue, 15 Jun 2021 19:23:03 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute6.internal (MEProxy); Tue, 15 Jun 2021 19:23:03 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=u256.net; h=from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; s=fm2; bh=Yvl4r9kQh64HU ASVvgT1bAiJYkfZycRowrVksd8jWC8=; b=eSDI/CIwYcJQrQGS2TgRQ+vfcG5ES /D+0SsjiED6JgtOhs/aHfGomnqdkppiZtX6fV+qyJYeIOidYHsieiz9x1VO4a60O T+NVxmLlbSPp+hNgu9O0z5B2Z7WZvRbl+kaTPfMki2pJ41cMl5SBC79F6uPQE9Gj IkcBfEfqCEWOTBCzJX6Ffa6KInTZr0Bd85ZppjVgGWt8aX0tCrP/SU3vW78ttXT3 ncNg0cACOR4ShaEAY9kUow4ov2yeRkgMULp28WNoauOsTQ8ulfaBZULWoAsr2y4e 4uP6dz7Jq1Hfx55ZiyULKox34mL7CMw/YNNgr2keAOQKf3tTz8vNpSzcw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=Yvl4r9kQh64HUASVvgT1bAiJYkfZycRowrVksd8jWC8=; b=QPSSfC5H hfHmil6wC3yfo/7HB48qKamgf33CSbccMSKQPrlrYqf5Rp0Qt968ElmbVq94l3wM +pw1mQllkg94g1KKk1exHAIKjCL3rYsos6GigaxEI7dJ4774lBqHQDTG1fuP4Zw/ 98RbzE2FXDR8HnJyXUC0MqzPT8gtRv2adLNzyDNKWCSkKeVA6mtVgmyuT9+KRw+t L1mM22deNBx51sNvg1ySwgO/KV+bvTvuGn8tewfxtCJT/JAduldtYhvzWmKBgomy /m3NOwMM+tuNfZ+Kuyt7J+7taEFwomI6AqQEOjiwBts/Q2638p+jizj6OD3xarwU TNommRm+Xh0BBw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrfedvkedgvddtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhephffvufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpefirggvthgr nhcutfhivhgvthcuoehgrhhivhgvsehuvdehiedrnhgvtheqnecuggftrfgrthhtvghrnh ephefgveffkeetheetfeeifedvheelfeejfeehveduteejhfekuedtkeeiuedvteehnecu vehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepghhrihhvvg esuhdvheeirdhnvght X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Jun 2021 19:23:03 -0400 (EDT) From: Gaetan Rivet To: ovs-dev@openvswitch.org Date: Wed, 16 Jun 2021 01:22:47 +0200 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Cc: Eli Britstein Subject: [ovs-dev] [PATCH v3 2/8] ovs-atomic: Expose atomic exchange operation X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" The atomic exchange operation is a useful primitive that should be available as well. Most compiler already expose or offer a way to use it, but a single symbol needs to be defined. Signed-off-by: Gaetan Rivet Reviewed-by: Eli Britstein --- lib/ovs-atomic-c++.h | 3 +++ lib/ovs-atomic-clang.h | 5 +++++ lib/ovs-atomic-gcc4+.h | 5 +++++ lib/ovs-atomic-gcc4.7+.h | 5 +++++ lib/ovs-atomic-i586.h | 5 +++++ lib/ovs-atomic-locked.h | 9 +++++++++ lib/ovs-atomic-msvc.h | 22 ++++++++++++++++++++++ lib/ovs-atomic-pthreads.h | 5 +++++ lib/ovs-atomic-x86_64.h | 5 +++++ lib/ovs-atomic.h | 8 +++++++- 10 files changed, 71 insertions(+), 1 deletion(-) diff --git a/lib/ovs-atomic-c++.h b/lib/ovs-atomic-c++.h index d47b8dd39..8605fa9d3 100644 --- a/lib/ovs-atomic-c++.h +++ b/lib/ovs-atomic-c++.h @@ -29,6 +29,9 @@ using std::atomic_compare_exchange_strong_explicit; using std::atomic_compare_exchange_weak; using std::atomic_compare_exchange_weak_explicit; +using std::atomic_exchange; +using std::atomic_exchange_explicit; + #define atomic_read(SRC, DST) \ atomic_read_explicit(SRC, DST, memory_order_seq_cst) #define atomic_read_explicit(SRC, DST, ORDER) \ diff --git a/lib/ovs-atomic-clang.h b/lib/ovs-atomic-clang.h index 34cc2faa7..cdf02a512 100644 --- a/lib/ovs-atomic-clang.h +++ b/lib/ovs-atomic-clang.h @@ -67,6 +67,11 @@ typedef enum { #define atomic_compare_exchange_weak_explicit(DST, EXP, SRC, ORD1, ORD2) \ __c11_atomic_compare_exchange_weak(DST, EXP, SRC, ORD1, ORD2) +#define atomic_exchange(RMW, ARG) \ + atomic_exchange_explicit(RMW, ARG, memory_order_seq_cst) +#define atomic_exchange_explicit(RMW, ARG, ORDER) \ + __c11_atomic_exchange(RMW, ARG, ORDER) + #define atomic_add(RMW, ARG, ORIG) \ atomic_add_explicit(RMW, ARG, ORIG, memory_order_seq_cst) #define atomic_sub(RMW, ARG, ORIG) \ diff --git a/lib/ovs-atomic-gcc4+.h b/lib/ovs-atomic-gcc4+.h index 25bcf20a0..f9accde1a 100644 --- a/lib/ovs-atomic-gcc4+.h +++ b/lib/ovs-atomic-gcc4+.h @@ -128,6 +128,11 @@ atomic_signal_fence(memory_order order) #define atomic_compare_exchange_weak_explicit \ atomic_compare_exchange_strong_explicit +#define atomic_exchange_explicit(DST, SRC, ORDER) \ + __sync_lock_test_and_set(DST, SRC) +#define atomic_exchange(DST, SRC) \ + atomic_exchange_explicit(DST, SRC, memory_order_seq_cst) + #define atomic_op__(RMW, OP, ARG, ORIG) \ ({ \ typeof(RMW) rmw__ = (RMW); \ diff --git a/lib/ovs-atomic-gcc4.7+.h b/lib/ovs-atomic-gcc4.7+.h index 4c197ebe0..846e05775 100644 --- a/lib/ovs-atomic-gcc4.7+.h +++ b/lib/ovs-atomic-gcc4.7+.h @@ -61,6 +61,11 @@ typedef enum { #define atomic_compare_exchange_weak_explicit(DST, EXP, SRC, ORD1, ORD2) \ __atomic_compare_exchange_n(DST, EXP, SRC, true, ORD1, ORD2) +#define atomic_exchange_explicit(DST, SRC, ORDER) \ + __atomic_exchange_n(DST, SRC, ORDER) +#define atomic_exchange(DST, SRC) \ + atomic_exchange_explicit(DST, SRC, memory_order_seq_cst) + #define atomic_add(RMW, OPERAND, ORIG) \ atomic_add_explicit(RMW, OPERAND, ORIG, memory_order_seq_cst) #define atomic_sub(RMW, OPERAND, ORIG) \ diff --git a/lib/ovs-atomic-i586.h b/lib/ovs-atomic-i586.h index 9a385ce84..35a0959ff 100644 --- a/lib/ovs-atomic-i586.h +++ b/lib/ovs-atomic-i586.h @@ -400,6 +400,11 @@ atomic_signal_fence(memory_order order) #define atomic_compare_exchange_weak_explicit \ atomic_compare_exchange_strong_explicit +#define atomic_exchange_explicit(RMW, ARG, ORDER) \ + atomic_exchange__(RMW, ARG, ORDER) +#define atomic_exchange(RMW, ARG) \ + atomic_exchange_explicit(RMW, ARG, memory_order_seq_cst) + #define atomic_add__(RMW, ARG, CLOB) \ asm volatile("lock; xadd %0,%1 ; " \ "# atomic_add__ " \ diff --git a/lib/ovs-atomic-locked.h b/lib/ovs-atomic-locked.h index f8f0ba2a5..bf38c4a43 100644 --- a/lib/ovs-atomic-locked.h +++ b/lib/ovs-atomic-locked.h @@ -31,6 +31,15 @@ void atomic_unlock__(void *); atomic_unlock__(DST), \ false))) +#define atomic_exchange_locked(DST, SRC) \ + ({ \ + atomic_lock__(DST); \ + typeof(*(DST)) __tmp = *(DST); \ + *(DST) = SRC; \ + atomic_unlock__(DST); \ + __tmp; \ + }) + #define atomic_op_locked_add += #define atomic_op_locked_sub -= #define atomic_op_locked_or |= diff --git a/lib/ovs-atomic-msvc.h b/lib/ovs-atomic-msvc.h index 9def887d3..ef8310269 100644 --- a/lib/ovs-atomic-msvc.h +++ b/lib/ovs-atomic-msvc.h @@ -345,6 +345,28 @@ atomic_signal_fence(memory_order order) #define atomic_compare_exchange_weak_explicit \ atomic_compare_exchange_strong_explicit +/* While intrinsics offering different memory ordering + * are available in MSVC C compiler, they are not defined + * in the C++ compiler. Ignore for compatibility. + * + * Use nested ternary operators as the GNU extension ({}) + * is not available. + */ + +#define atomic_exchange_explicit(DST, SRC, ORDER) \ + ((sizeof *(DST) == 1) ? \ + _InterlockedExchange8((char volatile *) DST, SRC) \ + : (sizeof *(DST) == 2) ? \ + _InterlockedExchange16((short volatile *) DST, SRC) \ + : (sizeof *(DST) == 4) ? \ + _InterlockedExchange((long int volatile *) DST, SRC) \ + : (sizeof *(DST) == 8) ? \ + _InterlockedExchange64((__int64 volatile *) DST, SRC) \ + : (ovs_abort(), 0)) + +#define atomic_exchange(DST, SRC) \ + atomic_exchange_explicit(DST, SRC, memory_order_seq_cst) + /* MSVCs c++ compiler implements c11 atomics and looking through its * implementation (in xatomic.h), orders are ignored for x86 platform. * Do the same here. */ diff --git a/lib/ovs-atomic-pthreads.h b/lib/ovs-atomic-pthreads.h index 12234e79e..570a67fe4 100644 --- a/lib/ovs-atomic-pthreads.h +++ b/lib/ovs-atomic-pthreads.h @@ -77,6 +77,11 @@ atomic_signal_fence(memory_order order OVS_UNUSED) #define atomic_compare_exchange_weak_explicit \ atomic_compare_exchange_strong_explicit +#define atomic_exchange(DST, SRC) \ + atomic_exchange_locked(DST, SRC) +#define atomic_exchange_explicit(DST, SRC, ORDER) \ + ((void) (ORDER), atomic_exchange(DST, SRC)) + #define atomic_add(RMW, ARG, ORIG) atomic_op_locked(RMW, add, ARG, ORIG) #define atomic_sub(RMW, ARG, ORIG) atomic_op_locked(RMW, sub, ARG, ORIG) #define atomic_or( RMW, ARG, ORIG) atomic_op_locked(RMW, or, ARG, ORIG) diff --git a/lib/ovs-atomic-x86_64.h b/lib/ovs-atomic-x86_64.h index 1e7d42707..3bdaf2f08 100644 --- a/lib/ovs-atomic-x86_64.h +++ b/lib/ovs-atomic-x86_64.h @@ -274,6 +274,11 @@ atomic_signal_fence(memory_order order) #define atomic_compare_exchange_weak_explicit \ atomic_compare_exchange_strong_explicit +#define atomic_exchange_explicit(RMW, ARG, ORDER) \ + atomic_exchange__(RMW, ARG, ORDER) +#define atomic_exchange(RMW, ARG) \ + atomic_exchange_explicit(RMW, ARG, memory_order_seq_cst) + #define atomic_add__(RMW, ARG, CLOB) \ asm volatile("lock; xadd %0,%1 ; " \ "# atomic_add__ " \ diff --git a/lib/ovs-atomic.h b/lib/ovs-atomic.h index 11fa19268..8fdce0cf8 100644 --- a/lib/ovs-atomic.h +++ b/lib/ovs-atomic.h @@ -210,7 +210,7 @@ * In this section, A is an atomic type and C is the corresponding non-atomic * type. * - * The "store" and "compare_exchange" primitives match C11: + * The "store", "exchange", and "compare_exchange" primitives match C11: * * void atomic_store(A *object, C value); * void atomic_store_explicit(A *object, C value, memory_order); @@ -244,6 +244,12 @@ * efficiently, so it should be used if the application will need to * loop anyway. * + * C atomic_exchange(A *object, C desired); + * C atomic_exchange_explicit(A *object, C desired, memory_order); + * + * Atomically stores 'desired' into '*object', returning the value + * previously held. + * * The following primitives differ from the C11 ones (and have different names) * because there does not appear to be a way to implement the standard * primitives in standard C: From patchwork Tue Jun 15 23:22:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gaetan Rivet X-Patchwork-Id: 1492521 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=u256.net header.i=@u256.net header.a=rsa-sha256 header.s=fm2 header.b=GLjzELte; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=messagingengine.com header.i=@messagingengine.com header.a=rsa-sha256 header.s=fm3 header.b=aCf00PZl; dkim-atps=neutral Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4G4PWc1MxQz9sWl for ; Wed, 16 Jun 2021 09:23:24 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 0F4B983C8F; Tue, 15 Jun 2021 23:23:19 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 4wddf4DO0MVf; Tue, 15 Jun 2021 23:23:14 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp1.osuosl.org (Postfix) with ESMTPS id A541283C3D; Tue, 15 Jun 2021 23:23:12 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 6CA0DC001C; Tue, 15 Jun 2021 23:23:12 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) by lists.linuxfoundation.org (Postfix) with ESMTP id 24CCCC000B for ; Tue, 15 Jun 2021 23:23:08 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id F0AC740473 for ; Tue, 15 Jun 2021 23:23:07 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp2.osuosl.org (amavisd-new); dkim=pass (2048-bit key) header.d=u256.net header.b="GLjzELte"; dkim=pass (2048-bit key) header.d=messagingengine.com header.b="aCf00PZl" Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id v-WiYL_PDPg1 for ; Tue, 15 Jun 2021 23:23:05 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.8.0 Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by smtp2.osuosl.org (Postfix) with ESMTPS id AF62F403A5 for ; Tue, 15 Jun 2021 23:23:05 +0000 (UTC) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id E23805C0049; Tue, 15 Jun 2021 19:23:04 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Tue, 15 Jun 2021 19:23:04 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=u256.net; h=from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; s=fm2; bh=wUMwJW8+yntXF +wW+cOpB5Zo5HrIuNuR21WjHV4F0EM=; b=GLjzELtev7OhBvM9ITkv37/XrfYiQ 10PNwGk2EprHlsecLRZzc4lw0a0vpZkQ5SV9XH2GABoBOL4VsfMKs+HvlFgFbNL/ V9TcJTT1vszsc4mX73HSF+udDXT1w7a+CP5kFxtO41H5pivpgb03rJlWM2YiKRDR gsAlcOj/8sIbuqVtiP2RbobYCMtEDwFfWKwyd6vacA2u0JEiuXvQq3fi+G1Pftvc tRopSHGW0rCO03kxahDO0rfX8Yvwwg+AGdFDNZmp8Pino63lknLU3O66jTSZhrLu RjbwysErzFon733EX1QyXV3CIgx0wlhetn9/U/4pI1hHQri8YV71GxVgA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=wUMwJW8+yntXF+wW+cOpB5Zo5HrIuNuR21WjHV4F0EM=; b=aCf00PZl qK3CznTGVbdg2a/d9uhi636kdX0XwEASJoFF1yPcBhLZHY42lcVgnqT/IPDVFOXh X1NT6TJd8ZwnAss2BD/rWIJJTXlrr45ymplSBkzok7wDByhBpo72M24vv4+dRNNN VmXxhnnlD6PCel/r6W7+DMDLEIpDMt7+dROH+wdP1hlVcuVyGFsIQ2+/BmaHg3fR ly4CJZx3j8NENBLXSzOToG5oSk+6NBG4OAlm1D3bwLxbOz3u/AFifLrc69BHpmMF +/ReHtGek77S+uAIffBsZk3LN63CYYLzt+8L3apQljb+sTVCIOvzTxF4gsys5rc8 d8b9HBZH0mRvvQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrfedvkedgvddtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhephffvufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpefirggvthgr nhcutfhivhgvthcuoehgrhhivhgvsehuvdehiedrnhgvtheqnecuggftrfgrthhtvghrnh eptedthefhudeugffgffelhffffedtkeefudefjedvgffhgfdttdfgueeuhfffgfejnecu ffhomhgrihhnpegrphgrtghhvgdrohhrghdpuddtvdegtghorhgvshdrnhgvthdprhhotg hhvghsthgvrhdrvgguuhenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgr ihhlfhhrohhmpehgrhhivhgvsehuvdehiedrnhgvth X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Jun 2021 19:23:03 -0400 (EDT) From: Gaetan Rivet To: ovs-dev@openvswitch.org Date: Wed, 16 Jun 2021 01:22:48 +0200 Message-Id: <8edfa9ace5bd779eab7f8dc956873e5a701d4047.1623786081.git.grive@u256.net> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Cc: Eli Britstein Subject: [ovs-dev] [PATCH v3 3/8] mpsc-queue: Module for lock-free message passing X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Add a lockless multi-producer/single-consumer (MPSC), linked-list based, intrusive, unbounded queue that does not require deferred memory management. The queue is designed to improve the specific MPSC setup. A benchmark accompanies the unit tests to measure the difference in this configuration. A single reader thread polls the queue while N writers enqueue elements as fast as possible. The mpsc-queue is compared against the regular ovs-list as well as the guarded list. The latter usually offers a slight improvement by batching the element removal, however the mpsc-queue is faster. The average is of each producer threads time: $ ./tests/ovstest test-mpsc-queue benchmark 3000000 1 Benchmarking n=3000000 on 1 + 1 threads. type\thread: Reader 1 Avg mpsc-queue: 167 167 167 ms list(spin): 89 80 80 ms list(mutex): 745 745 745 ms guarded list: 788 788 788 ms $ ./tests/ovstest test-mpsc-queue benchmark 3000000 2 Benchmarking n=3000000 on 1 + 2 threads. type\thread: Reader 1 2 Avg mpsc-queue: 98 97 94 95 ms list(spin): 185 171 173 172 ms list(mutex): 203 199 203 201 ms guarded list: 269 269 188 228 ms $ ./tests/ovstest test-mpsc-queue benchmark 3000000 3 Benchmarking n=3000000 on 1 + 3 threads. type\thread: Reader 1 2 3 Avg mpsc-queue: 76 76 65 76 72 ms list(spin): 246 110 240 238 196 ms list(mutex): 542 541 541 539 540 ms guarded list: 535 535 507 511 517 ms $ ./tests/ovstest test-mpsc-queue benchmark 3000000 4 Benchmarking n=3000000 on 1 + 4 threads. type\thread: Reader 1 2 3 4 Avg mpsc-queue: 73 68 68 68 68 68 ms list(spin): 294 275 279 277 282 278 ms list(mutex): 346 309 287 345 302 310 ms guarded list: 378 319 334 378 351 345 ms Signed-off-by: Gaetan Rivet Reviewed-by: Eli Britstein --- lib/automake.mk | 2 + lib/mpsc-queue.c | 251 +++++++++++++ lib/mpsc-queue.h | 190 ++++++++++ tests/automake.mk | 1 + tests/library.at | 5 + tests/test-mpsc-queue.c | 772 ++++++++++++++++++++++++++++++++++++++++ 6 files changed, 1221 insertions(+) create mode 100644 lib/mpsc-queue.c create mode 100644 lib/mpsc-queue.h create mode 100644 tests/test-mpsc-queue.c diff --git a/lib/automake.mk b/lib/automake.mk index db9017591..4b68c7227 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -166,6 +166,8 @@ lib_libopenvswitch_la_SOURCES = \ lib/memory.c \ lib/memory.h \ lib/meta-flow.c \ + lib/mpsc-queue.c \ + lib/mpsc-queue.h \ lib/multipath.c \ lib/multipath.h \ lib/namemap.c \ diff --git a/lib/mpsc-queue.c b/lib/mpsc-queue.c new file mode 100644 index 000000000..ee762e1dc --- /dev/null +++ b/lib/mpsc-queue.c @@ -0,0 +1,251 @@ +/* + * Copyright (c) 2020 NVIDIA Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include + +#include "ovs-atomic.h" + +#include "mpsc-queue.h" + +/* Multi-producer, single-consumer queue + * ===================================== + * + * This an implementation of the MPSC queue described by Dmitri Vyukov [1]. + * + * One atomic exchange operation is done per insertion. Removal in most cases + * will not require atomic operation and will use one atomic exchange to close + * the queue chain. + * + * Insertion + * ========= + * + * The queue is implemented using a linked-list. Insertion is done at the + * back of the queue, by swapping the current end with the new node atomically, + * then pointing the previous end toward the new node. To follow Vyukov + * nomenclature, the end-node of the chain is called head. A producer will + * only manipulate the head. + * + * The head swap is atomic, however the link from the previous head to the new + * one is done in a separate operation. This means that the chain is + * momentarily broken, when the previous head still points to NULL and the + * current head has been inserted. + * + * Considering a series of insertions, the queue state will remain consistent + * and the insertions order is compatible with their precedence, thus the + * queue is serializable. However, because an insertion consists in two + * separate memory transactions, it is not linearizable. + * + * Removal + * ======= + * + * The consumer must deal with the queue inconsistency. It will manipulate + * the tail of the queue and move it along the latest consumed elements. + * When an end of the chain of elements is found (the next pointer is NULL), + * the tail is compared with the head. + * + * If both points to different addresses, then the queue is in an inconsistent + * state: the tail cannot move forward as the next is NULL, but the head is not + * the last element in the chain: this can only happen if the chain is broken. + * + * In this case, the consumer must wait for the producer to finish writing the + * next pointer of its current tail: 'MPSC_QUEUE_RETRY' is returned. + * + * Removal is thus in most cases (when there are elements in the queue) + * accomplished without using atomics, until the last element of the queue. + * There, the head is atomically loaded. If the queue is in a consistent state, + * the head is moved back to the queue stub by inserting the stub in the queue: + * ending the queue is the same as an insertion, which is one atomic XCHG. + * + * Forward guarantees + * ================== + * + * Insertion and peeking are wait-free: they will execute in a known bounded + * number of instructions, regardless of the state of the queue. + * + * However, while removal consists in peeking and a constant write to + * update the tail, it can repeatedly fail until the queue become consistent. + * It is thus dependent on other threads progressing. This means that the + * queue forward progress is obstruction-free only. It has a potential for + * livelocking. + * + * The chain will remain broken as long as a producer is not finished writing + * its next pointer. If a producer is cancelled for example, the queue could + * remain broken for any future readings. This queue should either be used + * with cooperative threads or insertion must only be done outside cancellable + * sections. + * + * Performances + * ============ + * + * In benchmarks this structure was better than alternatives such as: + * + * * A reversed Treiber stack [2], using 1 CAS per operations + * and requiring reversal of the node list on removal. + * + * * Michael-Scott lock-free queue [3], using 2 CAS per operations. + * + * While it is not linearizable, this queue is well-suited for message passing. + * If a proper hardware XCHG operation is used, it scales better than + * CAS-based implementations. + * + * References + * ========== + * + * [1]: http://www.1024cores.net/home/lock-free-algorithms/queues/intrusive-mpsc-node-based-queue + * + * [2]: R. K. Treiber. Systems programming: Coping with parallelism. + * Technical Report RJ 5118, IBM Almaden Research Center, April 1986. + * + * [3]: M. M. Michael, Simple, Fast, and Practical Non-Blocking and + * Blocking Concurrent Queue Algorithms + * [3]: https://www.cs.rochester.edu/research/synchronization/pseudocode/queues.html + * + */ + +void +mpsc_queue_init(struct mpsc_queue *queue) +{ + atomic_store_relaxed(&queue->head, &queue->stub); + atomic_store_relaxed(&queue->tail, &queue->stub); + atomic_store_relaxed(&queue->stub.next, NULL); + + ovs_mutex_init(&queue->read_lock); +} + +void +mpsc_queue_destroy(struct mpsc_queue *queue) + OVS_EXCLUDED(queue->read_lock) +{ + ovs_mutex_destroy(&queue->read_lock); +} + +enum mpsc_queue_poll_result +mpsc_queue_poll(struct mpsc_queue *queue, struct mpsc_queue_node **node) + OVS_REQUIRES(queue->read_lock) +{ + struct mpsc_queue_node *tail; + struct mpsc_queue_node *next; + struct mpsc_queue_node *head; + + atomic_read_relaxed(&queue->tail, &tail); + atomic_read_explicit(&tail->next, &next, memory_order_acquire); + + if (tail == &queue->stub) { + if (next == NULL) { + return MPSC_QUEUE_EMPTY; + } + + atomic_store_relaxed(&queue->tail, next); + tail = next; + atomic_read_explicit(&tail->next, &next, memory_order_acquire); + } + + if (next != NULL) { + atomic_store_relaxed(&queue->tail, next); + *node = tail; + return MPSC_QUEUE_ITEM; + } + + atomic_read_explicit(&queue->head, &head, memory_order_acquire); + if (tail != head) { + return MPSC_QUEUE_RETRY; + } + + mpsc_queue_insert(queue, &queue->stub); + + atomic_read_explicit(&tail->next, &next, memory_order_acquire); + if (next != NULL) { + atomic_store_relaxed(&queue->tail, next); + *node = tail; + return MPSC_QUEUE_ITEM; + } + + return MPSC_QUEUE_EMPTY; +} + +struct mpsc_queue_node * +mpsc_queue_pop(struct mpsc_queue *queue) + OVS_REQUIRES(queue->read_lock) +{ + enum mpsc_queue_poll_result result; + struct mpsc_queue_node *node; + + do { + result = mpsc_queue_poll(queue, &node); + if (result == MPSC_QUEUE_EMPTY) { + return NULL; + } + } while (result == MPSC_QUEUE_RETRY); + + return node; +} + +void +mpsc_queue_push_front(struct mpsc_queue *queue, struct mpsc_queue_node *node) + OVS_REQUIRES(queue->read_lock) +{ + struct mpsc_queue_node *tail; + + atomic_read_relaxed(&queue->tail, &tail); + atomic_store_relaxed(&node->next, tail); + atomic_store_relaxed(&queue->tail, node); +} + +struct mpsc_queue_node * +mpsc_queue_tail(struct mpsc_queue *queue) + OVS_REQUIRES(queue->read_lock) +{ + struct mpsc_queue_node *tail; + struct mpsc_queue_node *next; + + atomic_read_relaxed(&queue->tail, &tail); + atomic_read_explicit(&tail->next, &next, memory_order_acquire); + + if (tail == &queue->stub) { + if (next == NULL) { + return NULL; + } + + atomic_store_relaxed(&queue->tail, next); + tail = next; + } + + return tail; +} + +/* Get the next element of a node. */ +struct mpsc_queue_node *mpsc_queue_next(struct mpsc_queue *queue, + struct mpsc_queue_node *prev) + OVS_REQUIRES(queue->read_lock) +{ + struct mpsc_queue_node *next; + + atomic_read_explicit(&prev->next, &next, memory_order_acquire); + if (next == &queue->stub) { + atomic_read_explicit(&next->next, &next, memory_order_acquire); + } + return next; +} + +void +mpsc_queue_insert(struct mpsc_queue *queue, struct mpsc_queue_node *node) +{ + struct mpsc_queue_node *prev; + + atomic_store_relaxed(&node->next, NULL); + prev = atomic_exchange_explicit(&queue->head, node, memory_order_acq_rel); + atomic_store_explicit(&prev->next, node, memory_order_release); +} diff --git a/lib/mpsc-queue.h b/lib/mpsc-queue.h new file mode 100644 index 000000000..3bb9e3bee --- /dev/null +++ b/lib/mpsc-queue.h @@ -0,0 +1,190 @@ +/* + * Copyright (c) 2020 NVIDIA Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef MPSC_QUEUE_H +#define MPSC_QUEUE_H 1 + +#include +#include +#include + +#include +#include + +#include "ovs-atomic.h" + +/* Multi-producer, single-consumer queue + * ===================================== + * + * This data structure is a lockless queue implementation with + * the following properties: + * + * * Multi-producer: multiple threads can write concurrently. + * Insertion in the queue is thread-safe, no inter-thread + * synchronization is necessary. + * + * * Single-consumer: only a single thread can safely remove + * nodes from the queue. The queue must be 'acquired' using + * 'mpsc_queue_acquire()' before removing nodes. + * + * * Unbounded: the queue is backed by a linked-list and is not + * limited in number of elements. + * + * * Intrusive: queue elements are allocated as part of larger + * objects. Objects are retrieved by offset manipulation. + * + * * per-producer FIFO: Elements in the queue are kept in the + * order their producer inserted them. The consumer retrieves + * them in in the same insertion order. When multiple + * producers insert at the same time, either will proceed. + * + * This queue is well-suited for message passing between threads, + * where any number of thread can insert a message and a single + * thread is meant to receive and process it. + * + * Thread-safety + * ============= + * + * The consumer thread must acquire the queue using 'mpsc_queue_acquire()'. + * Once the queue is protected against concurrent reads, the thread can call + * the consumer API: + * + * * mpsc_queue_poll() to peek and return the tail of the queue + * * mpsc_queue_pop() to remove the tail of the queue + * * mpsc_queue_tail() to read the current tail + * * mpsc_queue_push_front() to enqueue an element safely at the tail + * * MPSC_QUEUE_FOR_EACH() to iterate over the current elements, + * without removing them. + * * MPSC_QUEUE_FOR_EACH_POP() to iterate over the elements while + * removing them. + * + * When a thread is finished with reading the queue, it can release the + * reader lock using 'mpsc_queue_release()'. + * + * Producers can always insert elements in the queue, even if no consumer + * acquired the reader lock. No inter-producer synchronization is needed. + * + * The consumer thread is also allowed to insert elements while it holds the + * reader lock. + * + * Producer threads must never be cancelled while writing to the queue. + * This will block the consumer, that will then lose any subsequent elements + * in the queue. Producers should ideally be cooperatively managed or + * the queue insertion should be within non-cancellable sections. + * + * Queue state + * =========== + * + * When polling the queue, three states can be observed: 'empty', 'non-empty', + * and 'inconsistent'. Three polling results are defined, respectively: + * + * * MPSC_QUEUE_EMPTY: the queue is empty. + * * MPSC_QUEUE_ITEM: an item was available and has been removed. + * * MPSC_QUEUE_RETRY: the queue is inconsistent. + * + * If 'MPSC_QUEUE_RETRY' is returned, then a producer has not yet finished + * writing to the queue and the list of nodes is not coherent. The consumer + * can retry shortly to check if the producer has finished. + * + * This behavior is the reason the removal function is called + * 'mpsc_queue_poll()'. + * + */ + +struct mpsc_queue_node { + ATOMIC(struct mpsc_queue_node *) next; +}; + +struct mpsc_queue { + ATOMIC(struct mpsc_queue_node *) head; + ATOMIC(struct mpsc_queue_node *) tail; + struct mpsc_queue_node stub; + struct ovs_mutex read_lock; +}; + +#define MPSC_QUEUE_INITIALIZER(Q) { \ + .head = ATOMIC_VAR_INIT(&(Q)->stub), \ + .tail = ATOMIC_VAR_INIT(&(Q)->stub), \ + .stub = { .next = ATOMIC_VAR_INIT(NULL) }, \ + .read_lock = OVS_MUTEX_INITIALIZER, \ +} + +/* Consumer API. */ + +/* Initialize the queue. Not necessary is 'MPSC_QUEUE_INITIALIZER' was used. */ +void mpsc_queue_init(struct mpsc_queue *queue); +/* The reader lock must be released prior to destroying the queue. */ +void mpsc_queue_destroy(struct mpsc_queue *queue); + +/* Acquire and release the consumer lock. */ +#define mpsc_queue_acquire(q) do { \ + ovs_mutex_lock(&(q)->read_lock); \ + } while (0) +#define mpsc_queue_release(q) do { \ + ovs_mutex_unlock(&(q)->read_lock); \ + } while (0) + +enum mpsc_queue_poll_result { + /* Queue is empty. */ + MPSC_QUEUE_EMPTY, + /* Polling the queue returned an item. */ + MPSC_QUEUE_ITEM, + /* Data has been enqueued but one or more producer thread have not + * finished writing it. The queue is in an inconsistent state. + * Retrying shortly, if the producer threads are still active, will + * return the data. + */ + MPSC_QUEUE_RETRY, +}; + +/* Set 'node' to a removed item from the queue if 'MPSC_QUEUE_ITEM' is + * returned, otherwise 'node' is not set. + */ +enum mpsc_queue_poll_result mpsc_queue_poll(struct mpsc_queue *queue, + struct mpsc_queue_node **node) + OVS_REQUIRES(queue->read_lock); + +/* Pop an element if there is any in the queue. */ +struct mpsc_queue_node *mpsc_queue_pop(struct mpsc_queue *queue) + OVS_REQUIRES(queue->read_lock); + +/* Insert at the front of the queue. Only the consumer can do it. */ +void mpsc_queue_push_front(struct mpsc_queue *queue, + struct mpsc_queue_node *node) + OVS_REQUIRES(queue->read_lock); + +/* Get the current queue tail. */ +struct mpsc_queue_node *mpsc_queue_tail(struct mpsc_queue *queue) + OVS_REQUIRES(queue->read_lock); + +/* Get the next element of a node. */ +struct mpsc_queue_node *mpsc_queue_next(struct mpsc_queue *queue, + struct mpsc_queue_node *prev) + OVS_REQUIRES(queue->read_lock); + +#define MPSC_QUEUE_FOR_EACH(node, queue) \ + for (node = mpsc_queue_tail(queue); node != NULL; \ + node = mpsc_queue_next((queue), node)) + +#define MPSC_QUEUE_FOR_EACH_POP(node, queue) \ + for (node = mpsc_queue_pop(queue); node != NULL; \ + node = mpsc_queue_pop(queue)) + +/* Producer API. */ + +void mpsc_queue_insert(struct mpsc_queue *queue, struct mpsc_queue_node *node); + +#endif /* MPSC_QUEUE_H */ diff --git a/tests/automake.mk b/tests/automake.mk index 1a528aa39..e95eb0180 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -465,6 +465,7 @@ tests_ovstest_SOURCES = \ tests/test-list.c \ tests/test-lockfile.c \ tests/test-multipath.c \ + tests/test-mpsc-queue.c \ tests/test-netflow.c \ tests/test-odp.c \ tests/test-ofpbuf.c \ diff --git a/tests/library.at b/tests/library.at index 1702b7556..ace6234b5 100644 --- a/tests/library.at +++ b/tests/library.at @@ -254,3 +254,8 @@ AT_SETUP([stopwatch module]) AT_CHECK([ovstest test-stopwatch], [0], [...... ], [ignore]) AT_CLEANUP + +AT_SETUP([mpsc-queue module]) +AT_CHECK([ovstest test-mpsc-queue check], [0], [.... +]) +AT_CLEANUP diff --git a/tests/test-mpsc-queue.c b/tests/test-mpsc-queue.c new file mode 100644 index 000000000..7bcecb8ff --- /dev/null +++ b/tests/test-mpsc-queue.c @@ -0,0 +1,772 @@ +/* + * Copyright (c) 2020 NVIDIA Corporation. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#undef NDEBUG +#include +#include +#include + +#include + +#include "command-line.h" +#include "guarded-list.h" +#include "mpsc-queue.h" +#include "openvswitch/list.h" +#include "openvswitch/util.h" +#include "openvswitch/vlog.h" +#include "ovs-rcu.h" +#include "ovs-thread.h" +#include "ovstest.h" +#include "timeval.h" +#include "util.h" + +struct element { + union { + struct mpsc_queue_node mpscq; + struct ovs_list list; + } node; + uint64_t mark; +}; + +static void +test_mpsc_queue_mark_element(struct mpsc_queue_node *node, + uint64_t mark, + unsigned int *counter) +{ + struct element *elem; + + elem = CONTAINER_OF(node, struct element, node.mpscq); + elem->mark = mark; + *counter += 1; +} + +static void +test_mpsc_queue_insert(void) +{ + struct element elements[100]; + struct mpsc_queue_node *node; + struct mpsc_queue queue; + unsigned int counter; + size_t i; + + memset(elements, 0, sizeof(elements)); + mpsc_queue_init(&queue); + mpsc_queue_acquire(&queue); + + for (i = 0; i < ARRAY_SIZE(elements); i++) { + mpsc_queue_insert(&queue, &elements[i].node.mpscq); + } + + counter = 0; + while (mpsc_queue_poll(&queue, &node) == MPSC_QUEUE_ITEM) { + test_mpsc_queue_mark_element(node, 1, &counter); + } + + mpsc_queue_release(&queue); + mpsc_queue_destroy(&queue); + + ovs_assert(counter == ARRAY_SIZE(elements)); + for (i = 0; i < ARRAY_SIZE(elements); i++) { + ovs_assert(elements[i].mark == 1); + } + + printf("."); +} + +static void +test_mpsc_queue_removal_fifo(void) +{ + struct element elements[100]; + struct mpsc_queue_node *node; + struct mpsc_queue queue; + unsigned int counter; + size_t i; + + memset(elements, 0, sizeof(elements)); + + mpsc_queue_init(&queue); + mpsc_queue_acquire(&queue); + + for (i = 0; i < ARRAY_SIZE(elements); i++) { + mpsc_queue_insert(&queue, &elements[i].node.mpscq); + } + + /* Elements are in the same order in the list as they + * were declared / initialized. + */ + counter = 0; + while (mpsc_queue_poll(&queue, &node) == MPSC_QUEUE_ITEM) { + test_mpsc_queue_mark_element(node, counter, &counter); + } + + /* The list is valid once extracted from the queue, + * the queue can be destroyed here. + */ + mpsc_queue_release(&queue); + mpsc_queue_destroy(&queue); + + for (i = 0; i < ARRAY_SIZE(elements) - 1; i++) { + struct element *e1, *e2; + + e1 = &elements[i]; + e2 = &elements[i + 1]; + + ovs_assert(e1->mark < e2->mark); + } + + printf("."); +} + +/* Partial insert: + * + * Those functions are 'mpsc_queue_insert()' divided in two parts. + * They serve to test the behavior of the queue when forcing the potential + * condition of a thread starting an insertion then yielding. + */ +static struct mpsc_queue_node * +mpsc_queue_insert_begin(struct mpsc_queue *queue, struct mpsc_queue_node *node) +{ + struct mpsc_queue_node *prev; + + atomic_store_explicit(&node->next, NULL, memory_order_relaxed); + prev = atomic_exchange_explicit(&queue->head, node, memory_order_acq_rel); + return prev; +} + +static void +mpsc_queue_insert_end(struct mpsc_queue_node *prev, + struct mpsc_queue_node *node) +{ + atomic_store_explicit(&prev->next, node, memory_order_release); +} + +static void +test_mpsc_queue_insert_partial(void) +{ + struct element elements[10]; + struct mpsc_queue_node *prevs[ARRAY_SIZE(elements)]; + struct mpsc_queue_node *node; + struct mpsc_queue queue, *q = &queue; + size_t i; + + mpsc_queue_init(q); + + /* Insert the first half of elements entirely, + * insert the second hald of elements partially. + */ + for (i = 0; i < ARRAY_SIZE(elements); i++) { + elements[i].mark = i; + if (i > ARRAY_SIZE(elements) / 2) { + prevs[i] = mpsc_queue_insert_begin(q, &elements[i].node.mpscq); + } else { + prevs[i] = NULL; + mpsc_queue_insert(q, &elements[i].node.mpscq); + } + } + + mpsc_queue_acquire(q); + + /* Verify that when the chain is broken, iterators will stop. */ + i = 0; + MPSC_QUEUE_FOR_EACH (node, q) { + struct element *e = CONTAINER_OF(node, struct element, node.mpscq); + ovs_assert(e == &elements[i]); + i++; + } + ovs_assert(i < ARRAY_SIZE(elements)); + + for (i = 0; i < ARRAY_SIZE(elements); i++) { + if (prevs[i] != NULL) { + mpsc_queue_insert_end(prevs[i], &elements[i].node.mpscq); + } + } + + i = 0; + MPSC_QUEUE_FOR_EACH (node, q) { + struct element *e = CONTAINER_OF(node, struct element, node.mpscq); + ovs_assert(e == &elements[i]); + i++; + } + ovs_assert(i == ARRAY_SIZE(elements)); + + MPSC_QUEUE_FOR_EACH_POP (node, q) { + struct element *e = CONTAINER_OF(node, struct element, node.mpscq); + ovs_assert(e->mark == (unsigned int)(e - elements)); + } + + mpsc_queue_release(q); + mpsc_queue_destroy(q); + + printf("."); +} + +static void +test_mpsc_queue_push_front(void) +{ + struct mpsc_queue queue, *q = &queue; + struct mpsc_queue_node *node; + struct element elements[10]; + size_t i; + + mpsc_queue_init(q); + mpsc_queue_acquire(q); + + ovs_assert(mpsc_queue_pop(q) == NULL); + mpsc_queue_push_front(q, &elements[0].node.mpscq); + node = mpsc_queue_pop(q); + ovs_assert(node == &elements[0].node.mpscq); + ovs_assert(mpsc_queue_pop(q) == NULL); + + mpsc_queue_push_front(q, &elements[0].node.mpscq); + mpsc_queue_push_front(q, &elements[1].node.mpscq); + ovs_assert(mpsc_queue_pop(q) == &elements[1].node.mpscq); + ovs_assert(mpsc_queue_pop(q) == &elements[0].node.mpscq); + ovs_assert(mpsc_queue_pop(q) == NULL); + + mpsc_queue_push_front(q, &elements[1].node.mpscq); + mpsc_queue_push_front(q, &elements[0].node.mpscq); + mpsc_queue_insert(q, &elements[2].node.mpscq); + ovs_assert(mpsc_queue_pop(q) == &elements[0].node.mpscq); + ovs_assert(mpsc_queue_pop(q) == &elements[1].node.mpscq); + ovs_assert(mpsc_queue_pop(q) == &elements[2].node.mpscq); + ovs_assert(mpsc_queue_pop(q) == NULL); + + for (i = 0; i < ARRAY_SIZE(elements); i++) { + elements[i].mark = i; + mpsc_queue_insert(q, &elements[i].node.mpscq); + } + + node = mpsc_queue_pop(q); + mpsc_queue_push_front(q, node); + ovs_assert(mpsc_queue_pop(q) == node); + mpsc_queue_push_front(q, node); + + i = 0; + MPSC_QUEUE_FOR_EACH (node, q) { + struct element *e = CONTAINER_OF(node, struct element, node.mpscq); + ovs_assert(e == &elements[i]); + i++; + } + ovs_assert(i == ARRAY_SIZE(elements)); + + MPSC_QUEUE_FOR_EACH_POP (node, q) { + struct element *e = CONTAINER_OF(node, struct element, node.mpscq); + ovs_assert(e->mark == (unsigned int)(e - elements)); + } + + mpsc_queue_release(q); + mpsc_queue_destroy(q); + + printf("."); +} + +static void +run_tests(struct ovs_cmdl_context *ctx OVS_UNUSED) +{ + /* Verify basic insertion. */ + test_mpsc_queue_insert(); + /* Test partial insertion. */ + test_mpsc_queue_insert_partial(); + /* Verify removal order is respected. */ + test_mpsc_queue_removal_fifo(); + /* Verify tail-end insertion works. */ + test_mpsc_queue_push_front(); + printf("\n"); +} + +static struct element *elements; +static uint64_t *thread_working_ms; /* Measured work time. */ + +static unsigned int n_threads; +static unsigned int n_elems; + +static struct ovs_barrier barrier; +static volatile bool working; + +static int +elapsed(const struct timeval *start) +{ + struct timeval end; + + xgettimeofday(&end); + return timeval_to_msec(&end) - timeval_to_msec(start); +} + +static void +print_result(const char *prefix, int reader_elapsed) +{ + uint64_t avg; + size_t i; + + avg = 0; + for (i = 0; i < n_threads; i++) { + avg += thread_working_ms[i]; + } + avg /= n_threads; + printf("%s: %6d", prefix, reader_elapsed); + for (i = 0; i < n_threads; i++) { + printf(" %6" PRIu64, thread_working_ms[i]); + } + printf(" %6" PRIu64 " ms\n", avg); +} + +struct mpscq_aux { + struct mpsc_queue *queue; + atomic_uint thread_id; +}; + +static void * +mpsc_queue_insert_thread(void *aux_) +{ + unsigned int n_elems_per_thread; + struct element *th_elements; + struct mpscq_aux *aux = aux_; + struct timeval start; + unsigned int id; + size_t i; + + atomic_add(&aux->thread_id, 1u, &id); + n_elems_per_thread = n_elems / n_threads; + th_elements = &elements[id * n_elems_per_thread]; + + ovs_barrier_block(&barrier); + xgettimeofday(&start); + + for (i = 0; i < n_elems_per_thread; i++) { + mpsc_queue_insert(aux->queue, &th_elements[i].node.mpscq); + } + + thread_working_ms[id] = elapsed(&start); + ovs_barrier_block(&barrier); + + working = false; + + return NULL; +} + +static void +benchmark_mpsc_queue(void) +{ + struct mpsc_queue_node *node; + struct mpsc_queue queue; + struct timeval start; + unsigned int counter; + bool work_complete; + pthread_t *threads; + struct mpscq_aux aux; + uint64_t epoch; + size_t i; + + memset(elements, 0, n_elems & sizeof *elements); + memset(thread_working_ms, 0, n_threads & sizeof *thread_working_ms); + + mpsc_queue_init(&queue); + + aux.queue = &queue; + atomic_store(&aux.thread_id, 0); + + for (i = n_elems - (n_elems % n_threads); i < n_elems; i++) { + mpsc_queue_insert(&queue, &elements[i].node.mpscq); + } + + working = true; + + threads = xmalloc(n_threads * sizeof *threads); + ovs_barrier_init(&barrier, n_threads); + + for (i = 0; i < n_threads; i++) { + threads[i] = ovs_thread_create("sc_queue_insert", + mpsc_queue_insert_thread, &aux); + } + + mpsc_queue_acquire(&queue); + xgettimeofday(&start); + + counter = 0; + epoch = 1; + do { + while (mpsc_queue_poll(&queue, &node) == MPSC_QUEUE_ITEM) { + test_mpsc_queue_mark_element(node, epoch, &counter); + } + if (epoch == UINT64_MAX) { + epoch = 0; + } + epoch++; + } while (working); + + for (i = 0; i < n_threads; i++) { + xpthread_join(threads[i], NULL); + } + + /* Elements might have been inserted before threads were joined. */ + while (mpsc_queue_poll(&queue, &node) == MPSC_QUEUE_ITEM) { + test_mpsc_queue_mark_element(node, epoch, &counter); + } + + print_result(" mpsc-queue", elapsed(&start)); + + mpsc_queue_release(&queue); + mpsc_queue_destroy(&queue); + ovs_barrier_destroy(&barrier); + free(threads); + + work_complete = true; + for (i = 0; i < n_elems; i++) { + if (elements[i].mark == 0) { + printf("Element %" PRIuSIZE " was never consumed.\n", i); + work_complete = false; + } + } + ovs_assert(work_complete); + ovs_assert(counter == n_elems); +} + +#ifdef HAVE_PTHREAD_SPIN_LOCK +#define spin_lock_type ovs_spin +#define spin_lock_init(l) ovs_spin_init(l) +#define spin_lock_destroy(l) ovs_spin_destroy(l) +#define spin_lock(l) ovs_spin_lock(l) +#define spin_unlock(l) ovs_spin_unlock(l) +#else +#define spin_lock_type ovs_mutex +#define spin_lock_init(l) ovs_mutex_init(l) +#define spin_lock_destroy(l) ovs_mutex_destroy(l) +#define spin_lock(l) ovs_mutex_lock(l) +#define spin_unlock(l) ovs_mutex_unlock(l) +#endif + +struct list_aux { + struct ovs_list *list; + struct ovs_mutex *mutex; + struct spin_lock_type *spin; + atomic_uint thread_id; +}; + +static void * +locked_list_insert_main(void *aux_) + OVS_NO_THREAD_SAFETY_ANALYSIS +{ + unsigned int n_elems_per_thread; + struct element *th_elements; + struct list_aux *aux = aux_; + struct timeval start; + unsigned int id; + size_t i; + + atomic_add(&aux->thread_id, 1u, &id); + n_elems_per_thread = n_elems / n_threads; + th_elements = &elements[id * n_elems_per_thread]; + + ovs_barrier_block(&barrier); + xgettimeofday(&start); + + for (i = 0; i < n_elems_per_thread; i++) { + aux->mutex ? ovs_mutex_lock(aux->mutex) + : spin_lock(aux->spin); + ovs_list_push_front(aux->list, &th_elements[i].node.list); + aux->mutex ? ovs_mutex_unlock(aux->mutex) + : spin_unlock(aux->spin); + } + + thread_working_ms[id] = elapsed(&start); + ovs_barrier_block(&barrier); + + working = false; + + return NULL; +} + +static void +benchmark_list(bool use_mutex) +{ + struct ovs_mutex mutex; + struct spin_lock_type spin; + struct ovs_list list; + struct element *elem; + struct timeval start; + unsigned int counter; + bool work_complete; + pthread_t *threads; + struct list_aux aux; + uint64_t epoch; + size_t i; + + memset(elements, 0, n_elems * sizeof *elements); + memset(thread_working_ms, 0, n_threads * sizeof *thread_working_ms); + + use_mutex ? ovs_mutex_init(&mutex) : spin_lock_init(&spin); + + ovs_list_init(&list); + + aux.list = &list; + aux.mutex = use_mutex ? &mutex : NULL; + aux.spin = use_mutex ? NULL : &spin; + atomic_store(&aux.thread_id, 0); + + for (i = n_elems - (n_elems % n_threads); i < n_elems; i++) { + ovs_list_push_front(&list, &elements[i].node.list); + } + + working = true; + + threads = xmalloc(n_threads * sizeof *threads); + ovs_barrier_init(&barrier, n_threads); + + for (i = 0; i < n_threads; i++) { + threads[i] = ovs_thread_create("locked_list_insert", + locked_list_insert_main, &aux); + } + + xgettimeofday(&start); + + counter = 0; + epoch = 1; + do { + if (use_mutex) { + ovs_mutex_lock(&mutex); + LIST_FOR_EACH_POP (elem, node.list, &list) { + elem->mark = epoch; + counter++; + } + ovs_mutex_unlock(&mutex); + } else { + struct ovs_list *node = NULL; + + spin_lock(&spin); + if (!ovs_list_is_empty(&list)) { + node = ovs_list_pop_front(&list); + } + spin_unlock(&spin); + + if (!node) { + continue; + } + + elem = CONTAINER_OF(node, struct element, node.list); + elem->mark = epoch; + counter++; + } + if (epoch == UINT64_MAX) { + epoch = 0; + } + epoch++; + } while (working); + + for (i = 0; i < n_threads; i++) { + xpthread_join(threads[i], NULL); + } + + /* Elements might have been inserted before threads were joined. */ + LIST_FOR_EACH_POP (elem, node.list, &list) { + elem->mark = epoch; + counter++; + } + + if (use_mutex) { + print_result(" list(mutex)", elapsed(&start)); + } else { + print_result(" list(spin)", elapsed(&start)); + } + + use_mutex ? ovs_mutex_destroy(&mutex) : spin_lock_destroy(&spin); + ovs_barrier_destroy(&barrier); + free(threads); + + work_complete = true; + for (i = 0; i < n_elems; i++) { + if (elements[i].mark == 0) { + printf("Element %" PRIuSIZE " was never consumed.\n", i); + work_complete = false; + } + } + ovs_assert(work_complete); + ovs_assert(counter == n_elems); +} + +struct guarded_list_aux { + struct guarded_list *glist; + atomic_uint thread_id; +}; + +static void * +guarded_list_insert_thread(void *aux_) +{ + unsigned int n_elems_per_thread; + struct element *th_elements; + struct guarded_list_aux *aux = aux_; + struct timeval start; + unsigned int id; + size_t i; + + atomic_add(&aux->thread_id, 1u, &id); + n_elems_per_thread = n_elems / n_threads; + th_elements = &elements[id * n_elems_per_thread]; + + ovs_barrier_block(&barrier); + xgettimeofday(&start); + + for (i = 0; i < n_elems_per_thread; i++) { + guarded_list_push_back(aux->glist, &th_elements[i].node.list, n_elems); + } + + thread_working_ms[id] = elapsed(&start); + ovs_barrier_block(&barrier); + + working = false; + + return NULL; +} + +static void +benchmark_guarded_list(void) +{ + struct guarded_list_aux aux; + struct ovs_list extracted; + struct guarded_list glist; + struct element *elem; + struct timeval start; + unsigned int counter; + bool work_complete; + pthread_t *threads; + uint64_t epoch; + size_t i; + + memset(elements, 0, n_elems * sizeof *elements); + memset(thread_working_ms, 0, n_threads * sizeof *thread_working_ms); + + guarded_list_init(&glist); + ovs_list_init(&extracted); + + aux.glist = &glist; + atomic_store(&aux.thread_id, 0); + + for (i = n_elems - (n_elems % n_threads); i < n_elems; i++) { + guarded_list_push_back(&glist, &elements[i].node.list, n_elems); + } + + working = true; + + threads = xmalloc(n_threads * sizeof *threads); + ovs_barrier_init(&barrier, n_threads); + + for (i = 0; i < n_threads; i++) { + threads[i] = ovs_thread_create("guarded_list_insert", + guarded_list_insert_thread, &aux); + } + + xgettimeofday(&start); + + counter = 0; + epoch = 1; + do { + guarded_list_pop_all(&glist, &extracted); + LIST_FOR_EACH_POP (elem, node.list, &extracted) { + elem->mark = epoch; + counter++; + } + if (epoch == UINT64_MAX) { + epoch = 0; + } + epoch++; + } while (working); + + for (i = 0; i < n_threads; i++) { + xpthread_join(threads[i], NULL); + } + + /* Elements might have been inserted before threads were joined. */ + guarded_list_pop_all(&glist, &extracted); + LIST_FOR_EACH_POP (elem, node.list, &extracted) { + elem->mark = epoch; + counter++; + } + + print_result("guarded list", elapsed(&start)); + + ovs_barrier_destroy(&barrier); + free(threads); + guarded_list_destroy(&glist); + + work_complete = true; + for (i = 0; i < n_elems; i++) { + if (elements[i].mark == 0) { + printf("Element %" PRIuSIZE " was never consumed.\n", i); + work_complete = false; + } + } + ovs_assert(work_complete); + ovs_assert(counter == n_elems); +} + +static void +run_benchmarks(struct ovs_cmdl_context *ctx) +{ + long int l_threads; + long int l_elems; + size_t i; + + ovsrcu_quiesce_start(); + + l_elems = strtol(ctx->argv[1], NULL, 10); + l_threads = strtol(ctx->argv[2], NULL, 10); + ovs_assert(l_elems > 0 && l_threads > 0); + + n_elems = l_elems; + n_threads = l_threads; + + elements = xcalloc(n_elems, sizeof *elements); + thread_working_ms = xcalloc(n_threads, sizeof *thread_working_ms); + + printf("Benchmarking n=%u on 1 + %u threads.\n", n_elems, n_threads); + + printf(" type\\thread: Reader "); + for (i = 0; i < n_threads; i++) { + printf(" %3" PRIuSIZE " ", i + 1); + } + printf(" Avg\n"); + + benchmark_mpsc_queue(); +#ifdef HAVE_PTHREAD_SPIN_LOCK + benchmark_list(false); +#endif + benchmark_list(true); + benchmark_guarded_list(); + + free(thread_working_ms); + free(elements); +} + +static const struct ovs_cmdl_command commands[] = { + {"check", NULL, 0, 0, run_tests, OVS_RO}, + {"benchmark", " ", 2, 2, run_benchmarks, OVS_RO}, + {NULL, NULL, 0, 0, NULL, OVS_RO}, +}; + +static void +test_mpsc_queue_main(int argc, char *argv[]) +{ + struct ovs_cmdl_context ctx = { + .argc = argc - optind, + .argv = argv + optind, + }; + + vlog_set_levels(NULL, VLF_ANY_DESTINATION, VLL_OFF); + + set_program_name(argv[0]); + ovs_cmdl_run_command(&ctx, commands); +} + +OVSTEST_REGISTER("test-mpsc-queue", test_mpsc_queue_main); From patchwork Tue Jun 15 23:22:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gaetan Rivet X-Patchwork-Id: 1492520 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=u256.net header.i=@u256.net header.a=rsa-sha256 header.s=fm2 header.b=S44yuI/w; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=messagingengine.com header.i=@messagingengine.com header.a=rsa-sha256 header.s=fm3 header.b=mYzKgv/w; dkim-atps=neutral Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4G4PWZ6h1vz9sWl for ; Wed, 16 Jun 2021 09:23:22 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id D837C608F0; Tue, 15 Jun 2021 23:23:20 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id KWBBno4FTtvq; Tue, 15 Jun 2021 23:23:17 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp3.osuosl.org (Postfix) with ESMTPS id 06A7960902; Tue, 15 Jun 2021 23:23:16 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 7A91AC002A; Tue, 15 Jun 2021 23:23:14 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 3B0ADC000B for ; Tue, 15 Jun 2021 23:23:09 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 9DEF28380B for ; Tue, 15 Jun 2021 23:23:08 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp1.osuosl.org (amavisd-new); dkim=pass (2048-bit key) header.d=u256.net header.b="S44yuI/w"; dkim=pass (2048-bit key) header.d=messagingengine.com header.b="mYzKgv/w" Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id vYPGeQhQUxEZ for ; Tue, 15 Jun 2021 23:23:07 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.8.0 Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by smtp1.osuosl.org (Postfix) with ESMTPS id 9694283C2B for ; Tue, 15 Jun 2021 23:23:06 +0000 (UTC) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id C38195C019A; Tue, 15 Jun 2021 19:23:05 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Tue, 15 Jun 2021 19:23:05 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=u256.net; h=from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; s=fm2; bh=sEIMfxtLl7+Ft tgjt8734w502YfIZJWNfMCkDYDARB8=; b=S44yuI/wjk3M/NsVR2x7s4PWJ4I9q 3CtjbpJklajastNMprIgjbfLUvwp1gyuORr1M69kdwVGv1OU/6QPh7u1dzCJ0Z7z /WhA3h8SgEstM0Wldg6367GOwafmTAfs4YrTTGTmq94OJ/YCjnLMaPQbWgweOvyW dBmm2lvwnw+03o2+zT8crVBSkKfsr7GsDNfepUDlYO8TQqoGyV7OlaJl3LcMuSKh f2ZcIHIJs5b5sbJFu3SHF6yhBXTFcp7YE7UYlpxlP4NMKOAuCarkB13TtJ31R9R9 SG4Jlkau38EekhcfITP/xgXditca4EkGPr88/9ltO/km0thnXXscTG7/g== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=sEIMfxtLl7+Fttgjt8734w502YfIZJWNfMCkDYDARB8=; b=mYzKgv/w QgAZuCG0cZgGQ7B3ODlccv9/US88Mh6h2apvX4wNlvsraUeC6Cdmw4/nWSv/+F/9 J9mbpoJFyAuMwOQVBNhm4cANOlT1SzHL9MbfZjIbcQVPOWCnvED/uXXed1D9rEoz T9Imn5sRMXjGxO02ExJ+N1z182gYYS4dFSX2dgpsua4q0aAZENxygyN0zVrIuhea lcodOBxid75AbuwV7QBzs8pldQIuLsybBC6EycTEt1kpaE+X01T7b+hAadzlQ8ib A0l2ynPg3hHwk8231+nBGqVfTW891VbLLvLcHbztWSWdlLTMji72LUMr+NeaHfgj slPzb1jgNtKFGA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrfedvkedgvddtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhephffvufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpefirggvthgr nhcutfhivhgvthcuoehgrhhivhgvsehuvdehiedrnhgvtheqnecuggftrfgrthhtvghrnh ephefgveffkeetheetfeeifedvheelfeejfeehveduteejhfekuedtkeeiuedvteehnecu vehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepghhrihhvvg esuhdvheeirdhnvght X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Jun 2021 19:23:05 -0400 (EDT) From: Gaetan Rivet To: ovs-dev@openvswitch.org Date: Wed, 16 Jun 2021 01:22:49 +0200 Message-Id: <027307a5bc5594ee34bce3bbec5b063b874820e7.1623786081.git.grive@u256.net> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Cc: Eli Britstein Subject: [ovs-dev] [PATCH v3 4/8] conntrack: Use mpsc-queue to store conn expirations X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Change the connection expiration lists from ovs_list to mpsc-queue. This is a pre-step towards reducing the granularity of 'ct_lock'. It simplifies the responsibilities toward updating the expiration queue. The dataplane now appends the new conn for expiration once during creation. Any further update will only consist in writing the conn expiration limit and marking the conn for expiration rescheduling. The ageing thread 'ct_clean' is the only one consuming the expiration lists. If a conn was marked for rescheduling by a dataplane, it will move the conn to the end of the queue. Once the locks have been reworked, it means neither the dataplane threads nor 'ct_clean' have to take a lock to update the expiration lists (assuming the consumer lock is perpetually held by 'ct_clean'); Signed-off-by: Gaetan Rivet Reviewed-by: Eli Britstein --- lib/conntrack-private.h | 84 +++++++++++++++++++--------- lib/conntrack-tp.c | 28 +++++----- lib/conntrack.c | 118 ++++++++++++++++++++++++++++++++++------ 3 files changed, 173 insertions(+), 57 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index e8332bdba..537e56534 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -29,6 +29,7 @@ #include "openvswitch/list.h" #include "openvswitch/types.h" #include "packets.h" +#include "mpsc-queue.h" #include "unaligned.h" #include "dp-packet.h" @@ -86,22 +87,57 @@ struct alg_exp_node { bool nat_rpl_dst; }; +/* Timeouts: all the possible timeout states passed to update_expiration() + * are listed here. The name will be prefix by CT_TM_ and the value is in + * milliseconds */ +#define CT_TIMEOUTS \ + CT_TIMEOUT(TCP_FIRST_PACKET) \ + CT_TIMEOUT(TCP_OPENING) \ + CT_TIMEOUT(TCP_ESTABLISHED) \ + CT_TIMEOUT(TCP_CLOSING) \ + CT_TIMEOUT(TCP_FIN_WAIT) \ + CT_TIMEOUT(TCP_CLOSED) \ + CT_TIMEOUT(OTHER_FIRST) \ + CT_TIMEOUT(OTHER_MULTIPLE) \ + CT_TIMEOUT(OTHER_BIDIR) \ + CT_TIMEOUT(ICMP_FIRST) \ + CT_TIMEOUT(ICMP_REPLY) + +enum ct_timeout { +#define CT_TIMEOUT(NAME) CT_TM_##NAME, + CT_TIMEOUTS +#undef CT_TIMEOUT + N_CT_TM +}; + enum OVS_PACKED_ENUM ct_conn_type { CT_CONN_TYPE_DEFAULT, CT_CONN_TYPE_UN_NAT, }; +struct conn_expire { + struct mpsc_queue_node node; + /* Timeout state of the connection. + * It follows the connection state updates. + */ + enum ct_timeout tm; + atomic_flag reschedule; + struct ovs_refcount refcount; +}; + struct conn { /* Immutable data. */ struct conn_key key; struct conn_key rev_key; struct conn_key parent_key; /* Only used for orig_tuple support. */ - struct ovs_list exp_node; struct cmap_node cm_node; struct nat_action_info_t *nat_info; char *alg; struct conn *nat_conn; /* The NAT 'conn' context, if there is one. */ + /* Inserted once by a PMD, then managed by the 'ct_clean' thread. */ + struct conn_expire exp; + /* Mutable data. */ struct ovs_mutex lock; /* Guards all mutable fields. */ ovs_u128 label; @@ -132,33 +168,10 @@ enum ct_update_res { CT_UPDATE_VALID_NEW, }; -/* Timeouts: all the possible timeout states passed to update_expiration() - * are listed here. The name will be prefix by CT_TM_ and the value is in - * milliseconds */ -#define CT_TIMEOUTS \ - CT_TIMEOUT(TCP_FIRST_PACKET) \ - CT_TIMEOUT(TCP_OPENING) \ - CT_TIMEOUT(TCP_ESTABLISHED) \ - CT_TIMEOUT(TCP_CLOSING) \ - CT_TIMEOUT(TCP_FIN_WAIT) \ - CT_TIMEOUT(TCP_CLOSED) \ - CT_TIMEOUT(OTHER_FIRST) \ - CT_TIMEOUT(OTHER_MULTIPLE) \ - CT_TIMEOUT(OTHER_BIDIR) \ - CT_TIMEOUT(ICMP_FIRST) \ - CT_TIMEOUT(ICMP_REPLY) - -enum ct_timeout { -#define CT_TIMEOUT(NAME) CT_TM_##NAME, - CT_TIMEOUTS -#undef CT_TIMEOUT - N_CT_TM -}; - struct conntrack { struct ovs_mutex ct_lock; /* Protects 2 following fields. */ struct cmap conns OVS_GUARDED; - struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED; + struct mpsc_queue exp_lists[N_CT_TM]; struct hmap zone_limits OVS_GUARDED; struct hmap timeout_policies OVS_GUARDED; uint32_t hash_basis; /* Salt for hashing a connection key. */ @@ -204,4 +217,25 @@ struct ct_l4_proto { struct ct_dpif_protoinfo *); }; +static inline void +conn_expire_push_back(struct conntrack *ct, struct conn *conn) +{ + if (ovs_refcount_try_ref_rcu(&conn->exp.refcount)) { + atomic_flag_clear(&conn->exp.reschedule); + mpsc_queue_insert(&ct->exp_lists[conn->exp.tm], &conn->exp.node); + } +} + +static inline void +conn_expire_push_front(struct conntrack *ct, struct conn *conn) + OVS_REQUIRES(ct->exp_lists[conn->exp.tm].read_lock) +{ + if (ovs_refcount_try_ref_rcu(&conn->exp.refcount)) { + /* Do not change 'reschedule' state, if this expire node is put + * at the tail of the list, it will be next to be read from the queue. + */ + mpsc_queue_push_front(&ct->exp_lists[conn->exp.tm], &conn->exp.node); + } +} + #endif /* conntrack-private.h */ diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c index a586d3a8d..6de2354c0 100644 --- a/lib/conntrack-tp.c +++ b/lib/conntrack-tp.c @@ -230,6 +230,15 @@ tm_to_ct_dpif_tp(enum ct_timeout tm) return CT_DPIF_TP_ATTR_MAX; } +static void +conn_schedule_expiration(struct conn *conn, enum ct_timeout tm, long long now, + uint32_t tp_value) +{ + conn->expiration = now + tp_value * 1000; + conn->exp.tm = tm; + ignore(atomic_flag_test_and_set(&conn->exp.reschedule)); +} + static void conn_update_expiration__(struct conntrack *ct, struct conn *conn, enum ct_timeout tm, long long now, @@ -240,11 +249,7 @@ conn_update_expiration__(struct conntrack *ct, struct conn *conn, ovs_mutex_lock(&ct->ct_lock); ovs_mutex_lock(&conn->lock); - if (!conn->cleaned) { - conn->expiration = now + tp_value * 1000; - ovs_list_remove(&conn->exp_node); - ovs_list_push_back(&ct->exp_lists[tm], &conn->exp_node); - } + conn_schedule_expiration(conn, tm, now, tp_value); ovs_mutex_unlock(&conn->lock); ovs_mutex_unlock(&ct->ct_lock); @@ -281,15 +286,6 @@ conn_update_expiration(struct conntrack *ct, struct conn *conn, conn_update_expiration__(ct, conn, tm, now, val); } -static void -conn_init_expiration__(struct conntrack *ct, struct conn *conn, - enum ct_timeout tm, long long now, - uint32_t tp_value) -{ - conn->expiration = now + tp_value * 1000; - ovs_list_push_back(&ct->exp_lists[tm], &conn->exp_node); -} - /* ct_lock must be held. */ void conn_init_expiration(struct conntrack *ct, struct conn *conn, @@ -309,5 +305,7 @@ conn_init_expiration(struct conntrack *ct, struct conn *conn, VLOG_DBG_RL(&rl, "Init timeout %s zone=%u with policy id=%d val=%u sec.", ct_timeout_str[tm], conn->key.zone, conn->tp_id, val); - conn_init_expiration__(ct, conn, tm, now, val); + atomic_flag_clear(&conn->exp.reschedule); + ovs_refcount_init(&conn->exp.refcount); + conn_schedule_expiration(conn, tm, now, val); } diff --git a/lib/conntrack.c b/lib/conntrack.c index a5efb37aa..45de13ebf 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -306,7 +306,7 @@ conntrack_init(void) ovs_mutex_lock(&ct->ct_lock); cmap_init(&ct->conns); for (unsigned i = 0; i < ARRAY_SIZE(ct->exp_lists); i++) { - ovs_list_init(&ct->exp_lists[i]); + mpsc_queue_init(&ct->exp_lists[i]); } hmap_init(&ct->zone_limits); ct->zone_limit_seq = 0; @@ -457,6 +457,17 @@ conn_clean_cmn(struct conntrack *ct, struct conn *conn) } } +static inline bool +conn_unref(struct conn *conn) +{ + ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT); + if (ovs_refcount_unref(&conn->exp.refcount) == 1) { + ovsrcu_postpone(delete_conn, conn); + return true; + } + return false; +} + /* Must be called with 'conn' of 'conn_type' CT_CONN_TYPE_DEFAULT. Also * removes the associated nat 'conn' from the lookup datastructures. */ static void @@ -470,23 +481,31 @@ conn_clean(struct conntrack *ct, struct conn *conn) uint32_t hash = conn_key_hash(&conn->nat_conn->key, ct->hash_basis); cmap_remove(&ct->conns, &conn->nat_conn->cm_node, hash); } - ovs_list_remove(&conn->exp_node); conn->cleaned = true; - ovsrcu_postpone(delete_conn, conn); + conn_unref(conn); atomic_count_dec(&ct->n_conn); } +static inline bool +conn_unref_one(struct conn *conn) +{ + if (ovs_refcount_unref(&conn->exp.refcount) == 1) { + ovsrcu_postpone(delete_conn_one, conn); + return true; + } + return false; +} + static void conn_clean_one(struct conntrack *ct, struct conn *conn) OVS_REQUIRES(ct->ct_lock) { conn_clean_cmn(ct, conn); if (conn->conn_type == CT_CONN_TYPE_DEFAULT) { - ovs_list_remove(&conn->exp_node); conn->cleaned = true; atomic_count_dec(&ct->n_conn); } - ovsrcu_postpone(delete_conn_one, conn); + conn_unref_one(conn); } /* Destroys the connection tracker 'ct' and frees all the allocated memory. @@ -501,6 +520,19 @@ conntrack_destroy(struct conntrack *ct) latch_destroy(&ct->clean_thread_exit); ovs_mutex_lock(&ct->ct_lock); + + for (unsigned i = 0; i < N_CT_TM; i++) { + struct mpsc_queue_node *node; + + mpsc_queue_acquire(&ct->exp_lists[i]); + MPSC_QUEUE_FOR_EACH_POP (node, &ct->exp_lists[i]) { + conn = CONTAINER_OF(node, struct conn, exp.node); + conn_unref(conn); + } + mpsc_queue_release(&ct->exp_lists[i]); + mpsc_queue_destroy(&ct->exp_lists[i]); + } + CMAP_FOR_EACH (conn, cm_node, &ct->conns) { conn_clean_one(ct, conn); } @@ -1059,6 +1091,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, ovs_mutex_init_adaptive(&nc->lock); nc->conn_type = CT_CONN_TYPE_DEFAULT; cmap_insert(&ct->conns, &nc->cm_node, ctx->hash); + conn_expire_push_back(ct, nc); atomic_count_inc(&ct->n_conn); ctx->conn = nc; /* For completeness. */ if (zl) { @@ -1079,7 +1112,6 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, * can limit DoS impact. */ nat_res_exhaustion: free(nat_conn); - ovs_list_remove(&nc->exp_node); delete_conn_cmn(nc); static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); VLOG_WARN_RL(&rl, "Unable to NAT due to tuple space exhaustion - " @@ -1496,29 +1528,72 @@ set_label(struct dp_packet *pkt, struct conn *conn, * if 'limit' is reached */ static long long ct_sweep(struct conntrack *ct, long long now, size_t limit) + OVS_NO_THREAD_SAFETY_ANALYSIS { - struct conn *conn, *next; + struct conn *conn; long long min_expiration = LLONG_MAX; + struct mpsc_queue_node *node; size_t count = 0; ovs_mutex_lock(&ct->ct_lock); for (unsigned i = 0; i < N_CT_TM; i++) { - LIST_FOR_EACH_SAFE (conn, next, exp_node, &ct->exp_lists[i]) { + struct conn *end_of_queue = NULL; + + MPSC_QUEUE_FOR_EACH_POP (node, &ct->exp_lists[i]) { + long long int expiration; + + conn = CONTAINER_OF(node, struct conn, exp.node); + if (conn_unref(conn)) { + continue; + } + ovs_mutex_lock(&conn->lock); - if (now < conn->expiration || count >= limit) { - min_expiration = MIN(min_expiration, conn->expiration); - ovs_mutex_unlock(&conn->lock); - if (count >= limit) { - /* Do not check other lists. */ - COVERAGE_INC(conntrack_long_cleanup); - goto out; - } + expiration = conn->expiration; + ovs_mutex_unlock(&conn->lock); + + if (conn == end_of_queue) { + /* If we already re-enqueued this conn during this sweep, + * stop iterating this list and skip to the next. + */ + min_expiration = MIN(min_expiration, expiration); + conn_expire_push_front(ct, conn); break; + } + + if (count >= limit) { + min_expiration = MIN(min_expiration, expiration); + conn_expire_push_front(ct, conn); + COVERAGE_INC(conntrack_long_cleanup); + /* Do not check other lists. */ + goto out; + } + + if (now < expiration) { + if (atomic_flag_test_and_set(&conn->exp.reschedule)) { + /* Reschedule was true, another thread marked + * this conn to be enqueued back. + * The conn is not yet expired, still valid, and + * this list should still be iterated. + */ + conn_expire_push_back(ct, conn); + if (end_of_queue == NULL) { + end_of_queue = conn; + } + } else { + /* This connection is still valid, while no other thread + * modified it: it means this list iteration is finished + * for now. Put back the connection within the list. + */ + atomic_flag_clear(&conn->exp.reschedule); + min_expiration = MIN(min_expiration, expiration); + conn_expire_push_front(ct, conn); + break; + } } else { - ovs_mutex_unlock(&conn->lock); conn_clean(ct, conn); } + count++; } } @@ -1570,9 +1645,14 @@ conntrack_clean(struct conntrack *ct, long long now) static void * clean_thread_main(void *f_) + OVS_NO_THREAD_SAFETY_ANALYSIS { struct conntrack *ct = f_; + for (unsigned i = 0; i < N_CT_TM; i++) { + mpsc_queue_acquire(&ct->exp_lists[i]); + } + while (!latch_is_set(&ct->clean_thread_exit)) { long long next_wake; long long now = time_msec(); @@ -1587,6 +1667,10 @@ clean_thread_main(void *f_) poll_block(); } + for (unsigned i = 0; i < N_CT_TM; i++) { + mpsc_queue_release(&ct->exp_lists[i]); + } + return NULL; } From patchwork Tue Jun 15 23:22:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gaetan Rivet X-Patchwork-Id: 1492523 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=u256.net header.i=@u256.net header.a=rsa-sha256 header.s=fm2 header.b=lFK1lrFQ; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=messagingengine.com header.i=@messagingengine.com header.a=rsa-sha256 header.s=fm3 header.b=dNS4Qr+k; dkim-atps=neutral Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4G4PWh1gwfz9sWl for ; Wed, 16 Jun 2021 09:23:28 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 6E4A983C89; Tue, 15 Jun 2021 23:23:24 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Eq40Y-5zCdlR; Tue, 15 Jun 2021 23:23:22 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp1.osuosl.org (Postfix) with ESMTPS id BC82783C8A; Tue, 15 Jun 2021 23:23:17 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 023A2C0022; Tue, 15 Jun 2021 23:23:16 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) by lists.linuxfoundation.org (Postfix) with ESMTP id 54920C000E for ; Tue, 15 Jun 2021 23:23:09 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 45FB940547 for ; Tue, 15 Jun 2021 23:23:09 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp2.osuosl.org (amavisd-new); dkim=pass (2048-bit key) header.d=u256.net header.b="lFK1lrFQ"; dkim=pass (2048-bit key) header.d=messagingengine.com header.b="dNS4Qr+k" Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NvmPQwXCpQn5 for ; Tue, 15 Jun 2021 23:23:07 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.8.0 Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by smtp2.osuosl.org (Postfix) with ESMTPS id 39311403A7 for ; Tue, 15 Jun 2021 23:23:07 +0000 (UTC) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 7D49C5C00C8; Tue, 15 Jun 2021 19:23:06 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Tue, 15 Jun 2021 19:23:06 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=u256.net; h=from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; s=fm2; bh=X2mb3N0N5u5up 4wPCz30+OLEq8hAzcEZ22NlDgXEjDc=; b=lFK1lrFQyDDwE+0PKbyTMiVIlPYp6 eD3S92h+jXi/JL/IYIckOjR6CeVd7LqEn6i/h/QDTzFQQD8InIRe2rCsRkLbF0Cy 7c4ppOt1Ryesg+OyrKOX/BZ7ty8OaDGvrQ5KYAaD1SDi3TFIeMwSHv+UvSABc7s+ jy04dBbs4wgzWHEyEzbaFO8zxIBrNfEjLzWwIXp7P0FOSHlVzdMgwxjupHJ6tLwY 3IarEHVXqSvfQnWiS3BMj93/0EI5feahO81Tdk7os2nLhbAseenWoxyKNs95Q1VH qq+A5bH4kztpvsutNMDV01bWOqpY3LpNo/NK/mJocuIj5+BNroE9ZfTsg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=X2mb3N0N5u5up4wPCz30+OLEq8hAzcEZ22NlDgXEjDc=; b=dNS4Qr+k 9CICsQQFOTjt96x07ihhkiUnyBaZKnntU2HaZJS7d0LKJ/eVqo+Z75d1z2yi/5Pz AEtSSnHbHW52wX59eIQSkLo6JwR42QexCmd7JrRSQNekw9B2CKAT8YX0zVy1j3fF ofXhGRcnkxZYdMflLKGdwUrsWdgBh1jhLygDMeOnjCnI7JTJvqT4wkouCdBMRO9r HN7jFTdIQ3CdLMasN6bsRcXhXWnQ8mz/uE7tjLIAvsJ1/4Sm1E20JjvXQchcGVga heLMR4NYhGFUCUI323W5YXc1eDU7Za6RFaQZjqrmZMoFiMllZpjhK3o4Ky+nb0cq 3P6XCvxTQl2ldw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrfedvkedgvddtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhephffvufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpefirggvthgr nhcutfhivhgvthcuoehgrhhivhgvsehuvdehiedrnhgvtheqnecuggftrfgrthhtvghrnh ephefgveffkeetheetfeeifedvheelfeejfeehveduteejhfekuedtkeeiuedvteehnecu vehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepghhrihhvvg esuhdvheeirdhnvght X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Jun 2021 19:23:05 -0400 (EDT) From: Gaetan Rivet To: ovs-dev@openvswitch.org Date: Wed, 16 Jun 2021 01:22:50 +0200 Message-Id: <6e228bd0b76d84716edf1d658bc9d3c3e95b6151.1623786081.git.grive@u256.net> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Cc: Eli Britstein Subject: [ovs-dev] [PATCH v3 5/8] conntrack: Use a cmap to store zone limits X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Change the data structure from hmap to cmap for zone limits. As they are shared amongst multiple conntrack users, multiple readers want to check the current zone limit state before progressing in their processing. Using a CMAP allows doing lookups without taking the global 'ct_lock', thus reducing contention. Signed-off-by: Gaetan Rivet Reviewed-by: Eli Britstein --- lib/conntrack-private.h | 2 +- lib/conntrack.c | 70 ++++++++++++++++++++++++++++------------- lib/conntrack.h | 2 +- lib/dpif-netdev.c | 5 +-- 4 files changed, 53 insertions(+), 26 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index 537e56534..7eb3ca297 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -172,7 +172,7 @@ struct conntrack { struct ovs_mutex ct_lock; /* Protects 2 following fields. */ struct cmap conns OVS_GUARDED; struct mpsc_queue exp_lists[N_CT_TM]; - struct hmap zone_limits OVS_GUARDED; + struct cmap zone_limits OVS_GUARDED; struct hmap timeout_policies OVS_GUARDED; uint32_t hash_basis; /* Salt for hashing a connection key. */ pthread_t clean_thread; /* Periodically cleans up connection tracker. */ diff --git a/lib/conntrack.c b/lib/conntrack.c index 45de13ebf..094367733 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -79,7 +79,7 @@ enum ct_alg_ctl_type { }; struct zone_limit { - struct hmap_node node; + struct cmap_node node; struct conntrack_zone_limit czl; }; @@ -308,7 +308,7 @@ conntrack_init(void) for (unsigned i = 0; i < ARRAY_SIZE(ct->exp_lists); i++) { mpsc_queue_init(&ct->exp_lists[i]); } - hmap_init(&ct->zone_limits); + cmap_init(&ct->zone_limits); ct->zone_limit_seq = 0; timeout_policy_init(ct); ovs_mutex_unlock(&ct->ct_lock); @@ -343,12 +343,25 @@ zone_key_hash(int32_t zone, uint32_t basis) } static struct zone_limit * -zone_limit_lookup(struct conntrack *ct, int32_t zone) +zone_limit_lookup_protected(struct conntrack *ct, int32_t zone) OVS_REQUIRES(ct->ct_lock) { uint32_t hash = zone_key_hash(zone, ct->hash_basis); struct zone_limit *zl; - HMAP_FOR_EACH_IN_BUCKET (zl, node, hash, &ct->zone_limits) { + CMAP_FOR_EACH_WITH_HASH_PROTECTED (zl, node, hash, &ct->zone_limits) { + if (zl->czl.zone == zone) { + return zl; + } + } + return NULL; +} + +static struct zone_limit * +zone_limit_lookup(struct conntrack *ct, int32_t zone) +{ + uint32_t hash = zone_key_hash(zone, ct->hash_basis); + struct zone_limit *zl; + CMAP_FOR_EACH_WITH_HASH (zl, node, hash, &ct->zone_limits) { if (zl->czl.zone == zone) { return zl; } @@ -358,7 +371,6 @@ zone_limit_lookup(struct conntrack *ct, int32_t zone) static struct zone_limit * zone_limit_lookup_or_default(struct conntrack *ct, int32_t zone) - OVS_REQUIRES(ct->ct_lock) { struct zone_limit *zl = zone_limit_lookup(ct, zone); return zl ? zl : zone_limit_lookup(ct, DEFAULT_ZONE); @@ -367,13 +379,16 @@ zone_limit_lookup_or_default(struct conntrack *ct, int32_t zone) struct conntrack_zone_limit zone_limit_get(struct conntrack *ct, int32_t zone) { - ovs_mutex_lock(&ct->ct_lock); - struct conntrack_zone_limit czl = {DEFAULT_ZONE, 0, 0, 0}; + struct conntrack_zone_limit czl = { + .zone = DEFAULT_ZONE, + .limit = 0, + .count = ATOMIC_COUNT_INIT(0), + .zone_limit_seq = 0, + }; struct zone_limit *zl = zone_limit_lookup_or_default(ct, zone); if (zl) { czl = zl->czl; } - ovs_mutex_unlock(&ct->ct_lock); return czl; } @@ -381,13 +396,19 @@ static int zone_limit_create(struct conntrack *ct, int32_t zone, uint32_t limit) OVS_REQUIRES(ct->ct_lock) { + struct zone_limit *zl = zone_limit_lookup_protected(ct, zone); + + if (zl) { + return 0; + } + if (zone >= DEFAULT_ZONE && zone <= MAX_ZONE) { - struct zone_limit *zl = xzalloc(sizeof *zl); + zl = xzalloc(sizeof *zl); zl->czl.limit = limit; zl->czl.zone = zone; zl->czl.zone_limit_seq = ct->zone_limit_seq++; uint32_t hash = zone_key_hash(zone, ct->hash_basis); - hmap_insert(&ct->zone_limits, &zl->node, hash); + cmap_insert(&ct->zone_limits, &zl->node, hash); return 0; } else { return EINVAL; @@ -398,13 +419,14 @@ int zone_limit_update(struct conntrack *ct, int32_t zone, uint32_t limit) { int err = 0; - ovs_mutex_lock(&ct->ct_lock); struct zone_limit *zl = zone_limit_lookup(ct, zone); if (zl) { zl->czl.limit = limit; VLOG_INFO("Changed zone limit of %u for zone %d", limit, zone); } else { + ovs_mutex_lock(&ct->ct_lock); err = zone_limit_create(ct, zone, limit); + ovs_mutex_unlock(&ct->ct_lock); if (!err) { VLOG_INFO("Created zone limit of %u for zone %d", limit, zone); } else { @@ -412,7 +434,6 @@ zone_limit_update(struct conntrack *ct, int32_t zone, uint32_t limit) zone); } } - ovs_mutex_unlock(&ct->ct_lock); return err; } @@ -420,23 +441,25 @@ static void zone_limit_clean(struct conntrack *ct, struct zone_limit *zl) OVS_REQUIRES(ct->ct_lock) { - hmap_remove(&ct->zone_limits, &zl->node); - free(zl); + uint32_t hash = zone_key_hash(zl->czl.zone, ct->hash_basis); + cmap_remove(&ct->zone_limits, &zl->node, hash); + ovsrcu_postpone(free, zl); } int zone_limit_delete(struct conntrack *ct, uint16_t zone) { ovs_mutex_lock(&ct->ct_lock); - struct zone_limit *zl = zone_limit_lookup(ct, zone); + struct zone_limit *zl = zone_limit_lookup_protected(ct, zone); if (zl) { zone_limit_clean(ct, zl); + ovs_mutex_unlock(&ct->ct_lock); VLOG_INFO("Deleted zone limit for zone %d", zone); } else { + ovs_mutex_unlock(&ct->ct_lock); VLOG_INFO("Attempted delete of non-existent zone limit: zone %d", zone); } - ovs_mutex_unlock(&ct->ct_lock); return 0; } @@ -453,7 +476,7 @@ conn_clean_cmn(struct conntrack *ct, struct conn *conn) struct zone_limit *zl = zone_limit_lookup(ct, conn->admit_zone); if (zl && zl->czl.zone_limit_seq == conn->zone_limit_seq) { - zl->czl.count--; + atomic_count_dec(&zl->czl.count); } } @@ -539,10 +562,13 @@ conntrack_destroy(struct conntrack *ct) cmap_destroy(&ct->conns); struct zone_limit *zl; - HMAP_FOR_EACH_POP (zl, node, &ct->zone_limits) { - free(zl); + CMAP_FOR_EACH (zl, node, &ct->zone_limits) { + uint32_t hash = zone_key_hash(zl->czl.zone, ct->hash_basis); + + cmap_remove(&ct->zone_limits, &zl->node, hash); + ovsrcu_postpone(free, zl); } - hmap_destroy(&ct->zone_limits); + cmap_destroy(&ct->zone_limits); struct timeout_policy *tp; HMAP_FOR_EACH_POP (tp, node, &ct->timeout_policies) { @@ -1024,7 +1050,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, if (commit) { struct zone_limit *zl = zone_limit_lookup_or_default(ct, ctx->key.zone); - if (zl && zl->czl.count >= zl->czl.limit) { + if (zl && atomic_count_get(&zl->czl.count) >= zl->czl.limit) { return nc; } @@ -1097,7 +1123,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, if (zl) { nc->admit_zone = zl->czl.zone; nc->zone_limit_seq = zl->czl.zone_limit_seq; - zl->czl.count++; + atomic_count_inc(&zl->czl.count); } else { nc->admit_zone = INVALID_ZONE; } diff --git a/lib/conntrack.h b/lib/conntrack.h index 9553b188a..58b181834 100644 --- a/lib/conntrack.h +++ b/lib/conntrack.h @@ -108,7 +108,7 @@ struct conntrack_dump { struct conntrack_zone_limit { int32_t zone; uint32_t limit; - uint32_t count; + atomic_count count; uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */ }; diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 8fa7eb6d4..e93720e20 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -8215,7 +8215,8 @@ dpif_netdev_ct_get_limits(struct dpif *dpif OVS_UNUSED, czl = zone_limit_get(dp->conntrack, zone_limit->zone); if (czl.zone == zone_limit->zone || czl.zone == DEFAULT_ZONE) { ct_dpif_push_zone_limit(zone_limits_reply, zone_limit->zone, - czl.limit, czl.count); + czl.limit, + atomic_count_get(&czl.count)); } else { return EINVAL; } @@ -8225,7 +8226,7 @@ dpif_netdev_ct_get_limits(struct dpif *dpif OVS_UNUSED, czl = zone_limit_get(dp->conntrack, z); if (czl.zone == z) { ct_dpif_push_zone_limit(zone_limits_reply, z, czl.limit, - czl.count); + atomic_count_get(&czl.count)); } } } From patchwork Tue Jun 15 23:22:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gaetan Rivet X-Patchwork-Id: 1492522 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=u256.net header.i=@u256.net header.a=rsa-sha256 header.s=fm2 header.b=mWDOdpYY; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=messagingengine.com header.i=@messagingengine.com header.a=rsa-sha256 header.s=fm3 header.b=A7VR7rDV; dkim-atps=neutral Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4G4PWf3N4Hz9sWl for ; Wed, 16 Jun 2021 09:23:26 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 7F63860907; Tue, 15 Jun 2021 23:23:24 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id RHUNNTaXooQl; Tue, 15 Jun 2021 23:23:22 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp3.osuosl.org (Postfix) with ESMTPS id 4EB3C60A52; Tue, 15 Jun 2021 23:23:19 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 58641C002A; Tue, 15 Jun 2021 23:23:17 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp4.osuosl.org (smtp4.osuosl.org [140.211.166.137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 80260C000B for ; Tue, 15 Jun 2021 23:23:09 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 20C9F4066F for ; Tue, 15 Jun 2021 23:23:09 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp4.osuosl.org (amavisd-new); dkim=pass (2048-bit key) header.d=u256.net header.b="mWDOdpYY"; dkim=pass (2048-bit key) header.d=messagingengine.com header.b="A7VR7rDV" Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id dwNp_qchOXzu for ; Tue, 15 Jun 2021 23:23:08 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.8.0 Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by smtp4.osuosl.org (Postfix) with ESMTPS id 17F9B40678 for ; Tue, 15 Jun 2021 23:23:07 +0000 (UTC) Received: from compute6.internal (compute6.nyi.internal [10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id 512085C00E1; Tue, 15 Jun 2021 19:23:07 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute6.internal (MEProxy); Tue, 15 Jun 2021 19:23:07 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=u256.net; h=from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; s=fm2; bh=tsbSqlMBINfws VL1x/AFmbQBBNJN/r+QTBMXUZcPlas=; b=mWDOdpYYcLSaXemXQPnHv/aKeYt/K QeI0I/rG48uN0T765Td+pwvzW5phJi5YW6UJMLaIwQW0mmdkOzAnVSqOjtf9JlCd iS90+N74lQgWtXdslV0i2KxnA5utM9WAT3/JESpfKTbzZigi553cYSDT5CXZj3zP O2ZepVAC5syq/vq8STtQKNWor5T7RESQU54G9U0HOCY47qf7tjXu8NAfCwo0AFJd MC6me71tj7sbFB6/7r93BuysVrXeqzmWzeCmFjoRsv1dv5xbMhL4wSzfEQrPsXzY O93uYezCJtUk562ukz7ToFUC4q6MllwkZ6RT68W6nhTMIzKrfijl1pGhg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=tsbSqlMBINfwsVL1x/AFmbQBBNJN/r+QTBMXUZcPlas=; b=A7VR7rDV TjXr8QcLJQnPvdsyzQXCWdLE2AwLL2HvkqSaEPiodI1j3QSUlwJDOWqZfGmQvZYR jMUywsyUQAHF9grdo28lI/Fkmx3519xTlKfw9cngrk/T0enSKPAFMa21FlB9e7d1 uPba1onEXlYBAkKbBWA071xdrv3ZGPGd+81XDWDm80aahQmxSzjJdUQ8Vi6jKFr/ PhHEIydpq2utc5ZHPpHvsVItYLXpHSVKZ0RlkssLKUpY3/Oi3rjZBdyCMdLM1qWB KI77jEG4Q1g5B/BjO1NEARP67jByiMdJwbPcNrdkbSuGxuICBrhBp8Lu65cWwIcg vSej6li+uqK7cQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrfedvkedgvddtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhephffvufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpefirggvthgr nhcutfhivhgvthcuoehgrhhivhgvsehuvdehiedrnhgvtheqnecuggftrfgrthhtvghrnh ephefgveffkeetheetfeeifedvheelfeejfeehveduteejhfekuedtkeeiuedvteehnecu vehluhhsthgvrhfuihiivgepvdenucfrrghrrghmpehmrghilhhfrhhomhepghhrihhvvg esuhdvheeirdhnvght X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Jun 2021 19:23:06 -0400 (EDT) From: Gaetan Rivet To: ovs-dev@openvswitch.org Date: Wed, 16 Jun 2021 01:22:51 +0200 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Cc: Eli Britstein Subject: [ovs-dev] [PATCH v3 6/8] conntrack-tp: Use a cmap to store timeout policies X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Multiple lookups are done to stored timeout policies, each time blocking the global 'ct_lock'. This is usually not necessary and it should be acceptable to get policy updates slightly delayed (by one RCU sync at most). Using a CMAP reduces multiple lock taking and releasing in the connection insertion path. Signed-off-by: Gaetan Rivet Reviewed-by: Eli Britstein Acked-by: William Tu --- lib/conntrack-private.h | 2 +- lib/conntrack-tp.c | 54 +++++++++++++++++++++++------------------ lib/conntrack.c | 9 ++++--- lib/conntrack.h | 2 +- 4 files changed, 38 insertions(+), 29 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index 7eb3ca297..ea2e7ed4d 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -173,7 +173,7 @@ struct conntrack { struct cmap conns OVS_GUARDED; struct mpsc_queue exp_lists[N_CT_TM]; struct cmap zone_limits OVS_GUARDED; - struct hmap timeout_policies OVS_GUARDED; + struct cmap timeout_policies OVS_GUARDED; uint32_t hash_basis; /* Salt for hashing a connection key. */ pthread_t clean_thread; /* Periodically cleans up connection tracker. */ struct latch clean_thread_exit; /* To destroy the 'clean_thread'. */ diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c index 6de2354c0..592e10c6f 100644 --- a/lib/conntrack-tp.c +++ b/lib/conntrack-tp.c @@ -47,14 +47,15 @@ static unsigned int ct_dpif_netdev_tp_def[] = { }; static struct timeout_policy * -timeout_policy_lookup(struct conntrack *ct, int32_t tp_id) +timeout_policy_lookup_protected(struct conntrack *ct, int32_t tp_id) OVS_REQUIRES(ct->ct_lock) { struct timeout_policy *tp; uint32_t hash; hash = hash_int(tp_id, ct->hash_basis); - HMAP_FOR_EACH_IN_BUCKET (tp, node, hash, &ct->timeout_policies) { + CMAP_FOR_EACH_WITH_HASH_PROTECTED (tp, node, hash, + &ct->timeout_policies) { if (tp->policy.id == tp_id) { return tp; } @@ -62,20 +63,25 @@ timeout_policy_lookup(struct conntrack *ct, int32_t tp_id) return NULL; } -struct timeout_policy * -timeout_policy_get(struct conntrack *ct, int32_t tp_id) +static struct timeout_policy * +timeout_policy_lookup(struct conntrack *ct, int32_t tp_id) { struct timeout_policy *tp; + uint32_t hash; - ovs_mutex_lock(&ct->ct_lock); - tp = timeout_policy_lookup(ct, tp_id); - if (!tp) { - ovs_mutex_unlock(&ct->ct_lock); - return NULL; + hash = hash_int(tp_id, ct->hash_basis); + CMAP_FOR_EACH_WITH_HASH (tp, node, hash, &ct->timeout_policies) { + if (tp->policy.id == tp_id) { + return tp; + } } + return NULL; +} - ovs_mutex_unlock(&ct->ct_lock); - return tp; +struct timeout_policy * +timeout_policy_get(struct conntrack *ct, int32_t tp_id) +{ + return timeout_policy_lookup(ct, tp_id); } static void @@ -125,27 +131,30 @@ timeout_policy_create(struct conntrack *ct, init_default_tp(tp, tp_id); update_existing_tp(tp, new_tp); hash = hash_int(tp_id, ct->hash_basis); - hmap_insert(&ct->timeout_policies, &tp->node, hash); + cmap_insert(&ct->timeout_policies, &tp->node, hash); } static void timeout_policy_clean(struct conntrack *ct, struct timeout_policy *tp) OVS_REQUIRES(ct->ct_lock) { - hmap_remove(&ct->timeout_policies, &tp->node); - free(tp); + uint32_t hash = hash_int(tp->policy.id, ct->hash_basis); + cmap_remove(&ct->timeout_policies, &tp->node, hash); + ovsrcu_postpone(free, tp); } static int -timeout_policy_delete__(struct conntrack *ct, uint32_t tp_id) +timeout_policy_delete__(struct conntrack *ct, uint32_t tp_id, + bool warn_on_error) OVS_REQUIRES(ct->ct_lock) { + struct timeout_policy *tp; int err = 0; - struct timeout_policy *tp = timeout_policy_lookup(ct, tp_id); + tp = timeout_policy_lookup_protected(ct, tp_id); if (tp) { timeout_policy_clean(ct, tp); - } else { + } else if (warn_on_error) { VLOG_WARN_RL(&rl, "Failed to delete a non-existent timeout " "policy: id=%d", tp_id); err = ENOENT; @@ -159,7 +168,7 @@ timeout_policy_delete(struct conntrack *ct, uint32_t tp_id) int err; ovs_mutex_lock(&ct->ct_lock); - err = timeout_policy_delete__(ct, tp_id); + err = timeout_policy_delete__(ct, tp_id, true); ovs_mutex_unlock(&ct->ct_lock); return err; } @@ -170,7 +179,7 @@ timeout_policy_init(struct conntrack *ct) { struct timeout_policy tp; - hmap_init(&ct->timeout_policies); + cmap_init(&ct->timeout_policies); /* Create default timeout policy. */ memset(&tp, 0, sizeof tp); @@ -182,14 +191,11 @@ int timeout_policy_update(struct conntrack *ct, struct timeout_policy *new_tp) { - int err = 0; uint32_t tp_id = new_tp->policy.id; + int err = 0; ovs_mutex_lock(&ct->ct_lock); - struct timeout_policy *tp = timeout_policy_lookup(ct, tp_id); - if (tp) { - err = timeout_policy_delete__(ct, tp_id); - } + timeout_policy_delete__(ct, tp_id, false); timeout_policy_create(ct, new_tp); ovs_mutex_unlock(&ct->ct_lock); return err; diff --git a/lib/conntrack.c b/lib/conntrack.c index 094367733..71f51f3d9 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -571,10 +571,13 @@ conntrack_destroy(struct conntrack *ct) cmap_destroy(&ct->zone_limits); struct timeout_policy *tp; - HMAP_FOR_EACH_POP (tp, node, &ct->timeout_policies) { - free(tp); + CMAP_FOR_EACH (tp, node, &ct->timeout_policies) { + uint32_t hash = hash_int(tp->policy.id, ct->hash_basis); + + cmap_remove(&ct->timeout_policies, &tp->node, hash); + ovsrcu_postpone(free, tp); } - hmap_destroy(&ct->timeout_policies); + cmap_destroy(&ct->timeout_policies); ovs_mutex_unlock(&ct->ct_lock); ovs_mutex_destroy(&ct->ct_lock); diff --git a/lib/conntrack.h b/lib/conntrack.h index 58b181834..b064abc9f 100644 --- a/lib/conntrack.h +++ b/lib/conntrack.h @@ -113,7 +113,7 @@ struct conntrack_zone_limit { }; struct timeout_policy { - struct hmap_node node; + struct cmap_node node; struct ct_dpif_timeout_policy policy; }; From patchwork Tue Jun 15 23:22:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gaetan Rivet X-Patchwork-Id: 1492524 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=u256.net header.i=@u256.net header.a=rsa-sha256 header.s=fm2 header.b=G4YDUrYr; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=messagingengine.com header.i=@messagingengine.com header.a=rsa-sha256 header.s=fm3 header.b=ItUDLh+/; dkim-atps=neutral Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4G4PWj6JrHz9sXG for ; Wed, 16 Jun 2021 09:23:29 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 3745C41627; Tue, 15 Jun 2021 23:23:27 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id yM1HIGOBZAgt; Tue, 15 Jun 2021 23:23:25 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp4.osuosl.org (Postfix) with ESMTPS id 27B2541592; Tue, 15 Jun 2021 23:23:21 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 42ECEC0032; Tue, 15 Jun 2021 23:23:18 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) by lists.linuxfoundation.org (Postfix) with ESMTP id 275C5C000B for ; Tue, 15 Jun 2021 23:23:10 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id D80E3402F4 for ; Tue, 15 Jun 2021 23:23:09 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp2.osuosl.org (amavisd-new); dkim=pass (2048-bit key) header.d=u256.net header.b="G4YDUrYr"; dkim=pass (2048-bit key) header.d=messagingengine.com header.b="ItUDLh+/" Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id A9jyZm9qkVzc for ; Tue, 15 Jun 2021 23:23:09 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.8.0 Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by smtp2.osuosl.org (Postfix) with ESMTPS id DA631401F4 for ; Tue, 15 Jun 2021 23:23:08 +0000 (UTC) Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id 251FC5C00C8; Tue, 15 Jun 2021 19:23:08 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute1.internal (MEProxy); Tue, 15 Jun 2021 19:23:08 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=u256.net; h=from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; s=fm2; bh=0xEZdoh2DDWOs LRDL5YVFQJL9xMFbOJ3B5+xvNJrd1g=; b=G4YDUrYrJ41oWK6Wb5xOLZZ5dZLoZ JWS25ZeMOpT2sGvsXgvcO+ocyFJHjSkxb1X5HdSsQXDPG6GFyCViMopsVRctMvhh CmrZHcsOhqefadVUC4+5ZisbFfr2wyqP2OCgKJAYxH50Wwa+dOocl9IASE0m9a2Q qP+P/mqjdysYP+aZawdAHtJrLeJxK3WxI1LGwGV0xhIzCMoMJo2lCx+UsgJP6MnO XJlzZSI2BhWPFhzyuPDLdNsWmkpznAGc2mSKw+4HGX+O2wS6hMYzK5582yar9MLT tHeWlWwJqqGeqaHk9h2gedrlfZJJ1LQc8oQFrcbkQZo0o1SMO3Ro2WdDQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=0xEZdoh2DDWOsLRDL5YVFQJL9xMFbOJ3B5+xvNJrd1g=; b=ItUDLh+/ 1qUTndOcKQhgfhxqeR1D7tENKSR3GaVQQ67E/UF1dISFSmpVjS3RyQ+RMPEfZebk YGdx5iJJD1BHg0pCBIKdmOmiyfBhaRwfrPWjTL8eScmJXJxDBdZPTLBtWRfgrqpD lZIbAyJXJZXrofR0j3A55y8MNC/8ljtuQf7EbxPVjuH9ZdcImg4dlvSPT4UYI+yP tcBM389k9Hyw0tbIGhJXxyhBLCHkf1d32oM90nCciF6UHrPN2sAyZpWZkuBYPuyb /8RjGBc/HCC5FgNII/Pft9YWVSBhD3rVbkasqmc/WCK6w4o+/b77OFjR2S/fbTaL UvUhZcHWnjBa+w== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrfedvkedgvddtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhephffvufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpefirggvthgr nhcutfhivhgvthcuoehgrhhivhgvsehuvdehiedrnhgvtheqnecuggftrfgrthhtvghrnh ephefgveffkeetheetfeeifedvheelfeejfeehveduteejhfekuedtkeeiuedvteehnecu vehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepghhrihhvvg esuhdvheeirdhnvght X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Jun 2021 19:23:07 -0400 (EDT) From: Gaetan Rivet To: ovs-dev@openvswitch.org Date: Wed, 16 Jun 2021 01:22:52 +0200 Message-Id: <9ae8ad243da85be4853b90eccc958600dace7726.1623786081.git.grive@u256.net> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Cc: Eli Britstein Subject: [ovs-dev] [PATCH v3 7/8] conntrack: Inverse conn and ct lock precedence X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" The lock priority order is for the global 'ct_lock' to be taken first and then 'conn->lock'. This is an issue, as multiple operations on connections are thus blocked between threads contending on the global 'ct_lock'. This was previously necessary due to how the expiration lists, timeout policies and zone limits were managed. They are now using RCU-friendly structures that allow concurrent readers. The mutual exclusion now only needs to happen during writes. This allows reducing the 'ct_lock' precedence, and to only take it when writing the relevant structures. This will reduce contention on 'ct_lock', which impairs scalability when the connection tracker is used by many threads. Signed-off-by: Gaetan Rivet Reviewed-by: Eli Britstein --- lib/conntrack-private.h | 7 ++++-- lib/conntrack-tp.c | 30 +--------------------- lib/conntrack.c | 56 +++++++++++++++++++++++++---------------- 3 files changed, 41 insertions(+), 52 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index ea2e7ed4d..bb82252e8 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -134,6 +134,9 @@ struct conn { struct nat_action_info_t *nat_info; char *alg; struct conn *nat_conn; /* The NAT 'conn' context, if there is one. */ + atomic_flag reclaimed; /* False during the lifetime of the connection, + * True as soon as a thread has started freeing + * its memory. */ /* Inserted once by a PMD, then managed by the 'ct_clean' thread. */ struct conn_expire exp; @@ -196,8 +199,8 @@ struct conntrack { }; /* Lock acquisition order: - * 1. 'ct_lock' - * 2. 'conn->lock' + * 1. 'conn->lock' + * 2. 'ct_lock' * 3. 'resources_lock' */ diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c index 592e10c6f..22363e7fe 100644 --- a/lib/conntrack-tp.c +++ b/lib/conntrack-tp.c @@ -245,58 +245,30 @@ conn_schedule_expiration(struct conn *conn, enum ct_timeout tm, long long now, ignore(atomic_flag_test_and_set(&conn->exp.reschedule)); } -static void -conn_update_expiration__(struct conntrack *ct, struct conn *conn, - enum ct_timeout tm, long long now, - uint32_t tp_value) - OVS_REQUIRES(conn->lock) -{ - ovs_mutex_unlock(&conn->lock); - - ovs_mutex_lock(&ct->ct_lock); - ovs_mutex_lock(&conn->lock); - conn_schedule_expiration(conn, tm, now, tp_value); - ovs_mutex_unlock(&conn->lock); - ovs_mutex_unlock(&ct->ct_lock); - - ovs_mutex_lock(&conn->lock); -} - /* The conn entry lock must be held on entry and exit. */ void conn_update_expiration(struct conntrack *ct, struct conn *conn, enum ct_timeout tm, long long now) - OVS_REQUIRES(conn->lock) { struct timeout_policy *tp; uint32_t val; - ovs_mutex_unlock(&conn->lock); - - ovs_mutex_lock(&ct->ct_lock); - ovs_mutex_lock(&conn->lock); tp = timeout_policy_lookup(ct, conn->tp_id); if (tp) { val = tp->policy.attrs[tm_to_ct_dpif_tp(tm)]; } else { val = ct_dpif_netdev_tp_def[tm_to_ct_dpif_tp(tm)]; } - ovs_mutex_unlock(&conn->lock); - ovs_mutex_unlock(&ct->ct_lock); - - ovs_mutex_lock(&conn->lock); VLOG_DBG_RL(&rl, "Update timeout %s zone=%u with policy id=%d " "val=%u sec.", ct_timeout_str[tm], conn->key.zone, conn->tp_id, val); - conn_update_expiration__(ct, conn, tm, now, val); + conn_schedule_expiration(conn, tm, now, val); } -/* ct_lock must be held. */ void conn_init_expiration(struct conntrack *ct, struct conn *conn, enum ct_timeout tm, long long now) - OVS_REQUIRES(ct->ct_lock) { struct timeout_policy *tp; uint32_t val; diff --git a/lib/conntrack.c b/lib/conntrack.c index 71f51f3d9..045710e8d 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -465,7 +465,7 @@ zone_limit_delete(struct conntrack *ct, uint16_t zone) static void conn_clean_cmn(struct conntrack *ct, struct conn *conn) - OVS_REQUIRES(ct->ct_lock) + OVS_REQUIRES(conn->lock, ct->ct_lock) { if (conn->alg) { expectation_clean(ct, &conn->key); @@ -495,18 +495,29 @@ conn_unref(struct conn *conn) * removes the associated nat 'conn' from the lookup datastructures. */ static void conn_clean(struct conntrack *ct, struct conn *conn) - OVS_REQUIRES(ct->ct_lock) + OVS_EXCLUDED(conn->lock, ct->ct_lock) { ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT); + if (atomic_flag_test_and_set(&conn->reclaimed)) { + return; + } + + ovs_mutex_lock(&conn->lock); + + ovs_mutex_lock(&ct->ct_lock); conn_clean_cmn(ct, conn); if (conn->nat_conn) { uint32_t hash = conn_key_hash(&conn->nat_conn->key, ct->hash_basis); cmap_remove(&ct->conns, &conn->nat_conn->cm_node, hash); } + ovs_mutex_unlock(&ct->ct_lock); + conn->cleaned = true; conn_unref(conn); atomic_count_dec(&ct->n_conn); + + ovs_mutex_unlock(&conn->lock); } static inline bool @@ -521,14 +532,25 @@ conn_unref_one(struct conn *conn) static void conn_clean_one(struct conntrack *ct, struct conn *conn) - OVS_REQUIRES(ct->ct_lock) + OVS_EXCLUDED(conn->lock, ct->ct_lock) { + if (atomic_flag_test_and_set(&conn->reclaimed)) { + return; + } + + ovs_mutex_lock(&conn->lock); + + ovs_mutex_lock(&ct->ct_lock); conn_clean_cmn(ct, conn); + ovs_mutex_unlock(&ct->ct_lock); + if (conn->conn_type == CT_CONN_TYPE_DEFAULT) { conn->cleaned = true; atomic_count_dec(&ct->n_conn); } conn_unref_one(conn); + + ovs_mutex_unlock(&conn->lock); } /* Destroys the connection tracker 'ct' and frees all the allocated memory. @@ -542,8 +564,6 @@ conntrack_destroy(struct conntrack *ct) pthread_join(ct->clean_thread, NULL); latch_destroy(&ct->clean_thread_exit); - ovs_mutex_lock(&ct->ct_lock); - for (unsigned i = 0; i < N_CT_TM; i++) { struct mpsc_queue_node *node; @@ -559,7 +579,6 @@ conntrack_destroy(struct conntrack *ct) CMAP_FOR_EACH (conn, cm_node, &ct->conns) { conn_clean_one(ct, conn); } - cmap_destroy(&ct->conns); struct zone_limit *zl; CMAP_FOR_EACH (zl, node, &ct->zone_limits) { @@ -568,7 +587,6 @@ conntrack_destroy(struct conntrack *ct) cmap_remove(&ct->zone_limits, &zl->node, hash); ovsrcu_postpone(free, zl); } - cmap_destroy(&ct->zone_limits); struct timeout_policy *tp; CMAP_FOR_EACH (tp, node, &ct->timeout_policies) { @@ -577,6 +595,11 @@ conntrack_destroy(struct conntrack *ct) cmap_remove(&ct->timeout_policies, &tp->node, hash); ovsrcu_postpone(free, tp); } + + ovs_mutex_lock(&ct->ct_lock); + + cmap_destroy(&ct->conns); + cmap_destroy(&ct->zone_limits); cmap_destroy(&ct->timeout_policies); ovs_mutex_unlock(&ct->ct_lock); @@ -1034,7 +1057,6 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, const struct nat_action_info_t *nat_action_info, const char *helper, const struct alg_exp_node *alg_exp, enum ct_alg_ctl_type ct_alg_ctl, uint32_t tp_id) - OVS_REQUIRES(ct->ct_lock) { struct conn *nc = NULL; struct conn *nat_conn = NULL; @@ -1113,13 +1135,18 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, nat_conn->alg = NULL; nat_conn->nat_conn = NULL; uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis); + ovs_mutex_lock(&ct->ct_lock); cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); + ovs_mutex_unlock(&ct->ct_lock); } nc->nat_conn = nat_conn; ovs_mutex_init_adaptive(&nc->lock); nc->conn_type = CT_CONN_TYPE_DEFAULT; + atomic_flag_clear(&nc->reclaimed); + ovs_mutex_lock(&ct->ct_lock); cmap_insert(&ct->conns, &nc->cm_node, ctx->hash); + ovs_mutex_unlock(&ct->ct_lock); conn_expire_push_back(ct, nc); atomic_count_inc(&ct->n_conn); ctx->conn = nc; /* For completeness. */ @@ -1180,11 +1207,9 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt, pkt->md.ct_state = CS_INVALID; break; case CT_UPDATE_NEW: - ovs_mutex_lock(&ct->ct_lock); if (conn_lookup(ct, &conn->key, now, NULL, NULL)) { conn_clean(ct, conn); } - ovs_mutex_unlock(&ct->ct_lock); create_new_conn = true; break; case CT_UPDATE_VALID_NEW: @@ -1366,11 +1391,9 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, /* Delete found entry if in wrong direction. 'force' implies commit. */ if (OVS_UNLIKELY(force && ctx->reply && conn)) { - ovs_mutex_lock(&ct->ct_lock); if (conn_lookup(ct, &conn->key, now, NULL, NULL)) { conn_clean(ct, conn); } - ovs_mutex_unlock(&ct->ct_lock); conn = NULL; } @@ -1434,12 +1457,10 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, } ovs_rwlock_unlock(&ct->resources_lock); - ovs_mutex_lock(&ct->ct_lock); if (!conn_lookup(ct, &ctx->key, now, NULL, NULL)) { conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info, helper, alg_exp, ct_alg_ctl, tp_id); } - ovs_mutex_unlock(&ct->ct_lock); } write_ct_md(pkt, zone, conn, &ctx->key, alg_exp); @@ -1564,8 +1585,6 @@ ct_sweep(struct conntrack *ct, long long now, size_t limit) struct mpsc_queue_node *node; size_t count = 0; - ovs_mutex_lock(&ct->ct_lock); - for (unsigned i = 0; i < N_CT_TM; i++) { struct conn *end_of_queue = NULL; @@ -1630,7 +1649,6 @@ ct_sweep(struct conntrack *ct, long long now, size_t limit) out: VLOG_DBG("conntrack cleanup %"PRIuSIZE" entries in %lld msec", count, time_msec() - now); - ovs_mutex_unlock(&ct->ct_lock); return min_expiration; } @@ -2688,13 +2706,11 @@ conntrack_flush(struct conntrack *ct, const uint16_t *zone) { struct conn *conn; - ovs_mutex_lock(&ct->ct_lock); CMAP_FOR_EACH (conn, cm_node, &ct->conns) { if (!zone || *zone == conn->key.zone) { conn_clean_one(ct, conn); } } - ovs_mutex_unlock(&ct->ct_lock); return 0; } @@ -2709,7 +2725,6 @@ conntrack_flush_tuple(struct conntrack *ct, const struct ct_dpif_tuple *tuple, memset(&key, 0, sizeof(key)); tuple_to_conn_key(tuple, zone, &key); - ovs_mutex_lock(&ct->ct_lock); conn_lookup(ct, &key, time_msec(), &conn, NULL); if (conn && conn->conn_type == CT_CONN_TYPE_DEFAULT) { @@ -2719,7 +2734,6 @@ conntrack_flush_tuple(struct conntrack *ct, const struct ct_dpif_tuple *tuple, error = ENOENT; } - ovs_mutex_unlock(&ct->ct_lock); return error; } From patchwork Tue Jun 15 23:22:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Gaetan Rivet X-Patchwork-Id: 1492525 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=u256.net header.i=@u256.net header.a=rsa-sha256 header.s=fm2 header.b=H6bW2AUI; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=messagingengine.com header.i=@messagingengine.com header.a=rsa-sha256 header.s=fm3 header.b=UYexDBiF; dkim-atps=neutral Received: from smtp4.osuosl.org (smtp4.osuosl.org [140.211.166.137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4G4PWn1nRfz9sWl for ; Wed, 16 Jun 2021 09:23:33 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id A9D7F41641; Tue, 15 Jun 2021 23:23:28 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id lQK7_jTQ_Lhy; Tue, 15 Jun 2021 23:23:26 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp4.osuosl.org (Postfix) with ESMTPS id D26A041622; Tue, 15 Jun 2021 23:23:22 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 578F6C0033; Tue, 15 Jun 2021 23:23:19 +0000 (UTC) X-Original-To: ovs-dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 496EFC000B for ; Tue, 15 Jun 2021 23:23:10 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 41BB58299E for ; Tue, 15 Jun 2021 23:23:10 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp1.osuosl.org (amavisd-new); dkim=pass (2048-bit key) header.d=u256.net header.b="H6bW2AUI"; dkim=pass (2048-bit key) header.d=messagingengine.com header.b="UYexDBiF" Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id o8pG1C4WpqoC for ; Tue, 15 Jun 2021 23:23:09 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.8.0 Received: from out3-smtp.messagingengine.com (out3-smtp.messagingengine.com [66.111.4.27]) by smtp1.osuosl.org (Postfix) with ESMTPS id 8932C8380B for ; Tue, 15 Jun 2021 23:23:09 +0000 (UTC) Received: from compute3.internal (compute3.nyi.internal [10.202.2.43]) by mailout.nyi.internal (Postfix) with ESMTP id D909A5C00FB; Tue, 15 Jun 2021 19:23:08 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute3.internal (MEProxy); Tue, 15 Jun 2021 19:23:08 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=u256.net; h=from :to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; s=fm2; bh=KG4f5j/F3FCim NB5WBJIHU1M0shsVsMAd1VtcHs+0aA=; b=H6bW2AUIHvPMM10ctf6K82nc4Fciq L5MPrQnTImJWjHBvQiFKr5fJODXrrnW86kJEm4zIKA7LLdaB6PXv3w9SH7fguXNT kn39KdCTrqAFg8yAxAf+7LGoAudTPokGKV+5Tw1tB+DrSeszPdNyP4Eo3hYKvwcG uogYDbOkm9fPRWuyhzgxsdCoJPO0Io/pzoZ0ywQqmORQAFflYi8Q6e5xAxRBdeyA oUAOUBR1fS0unfn5qn/2MhyBfGxO/iyFePGUHSJ7+VuT5it/gf8UbaTkb6k8F5/N MhTAKw7qpMHdfvSb6DHxQWPMrYTFHYZ8lY7S36G6zrscstpDNZX3RqSFw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:date:from :in-reply-to:message-id:mime-version:references:subject:to :x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; bh=KG4f5j/F3FCimNB5WBJIHU1M0shsVsMAd1VtcHs+0aA=; b=UYexDBiF qcedX64E8CotHT7h5nZScoP7GAnbjSDw1lJOwN/KjuUj+rVvIEC2PpKanzpbmgwa 8zgSMVueSqb1DB5j3/Bb8jLagtrcqigf6OFtoEXci6w5tCMS/tyrfWxWCf3T2a2z ECF2rz1El2ZJivAWqRqAqCH+jHO92yug6fTX5aymCFeNfj1JB7mQViF09iB4qDT0 H1NsgJt7Steh7zK8SrcM5isgtohJC3Tlv1xkbWc+gyMGO/j2BTAj4a1Ofn9StvsU 6HTgnOKgjQYlwL2KjjQFnSY2tbsX5Y/dQ2VHcCAbh6qsv8VOzF7IAmhACV/yfaUN GmLM7o+H97v4kA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduledrfedvkedgvddtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhephffvufffkffojghfggfgsedtkeertdertddtnecuhfhrohhmpefirggvthgr nhcutfhivhgvthcuoehgrhhivhgvsehuvdehiedrnhgvtheqnecuggftrfgrthhtvghrnh ephefgveffkeetheetfeeifedvheelfeejfeehveduteejhfekuedtkeeiuedvteehnecu vehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepghhrihhvvg esuhdvheeirdhnvght X-ME-Proxy: Received: by mail.messagingengine.com (Postfix) with ESMTPA; Tue, 15 Jun 2021 19:23:08 -0400 (EDT) From: Gaetan Rivet To: ovs-dev@openvswitch.org Date: Wed, 16 Jun 2021 01:22:53 +0200 Message-Id: <7ba072a13032e3f6e843881205583a8e9a57f5d3.1623786081.git.grive@u256.net> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Cc: Eli Britstein Subject: [ovs-dev] [PATCH v3 8/8] conntrack: Use an atomic conn expiration value X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" A lock is taken during conn_lookup() to check whether a connection is expired before returning it. This lock can have some contention. Even though this lock ensures a consistent sequence of writes, it does not imply a specific order. A ct_clean thread taking the lock first could read a value that would be updated immediately after by a PMD waiting on the same lock, just as well as the inverse order. As such, the expiration time can be stale anytime it is read. In this context, using an atomic will ensure the same guarantees for either writes or reads, i.e. writes are consistent and reads are not undefined behaviour. Reading an atomic is however less costly than taking and releasing a lock. Signed-off-by: Gaetan Rivet Reviewed-by: Eli Britstein Acked-by: William Tu --- lib/conntrack-private.h | 2 +- lib/conntrack-tp.c | 2 +- lib/conntrack.c | 27 +++++++++++++++------------ 3 files changed, 17 insertions(+), 14 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index bb82252e8..d61ab4f36 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -144,7 +144,7 @@ struct conn { /* Mutable data. */ struct ovs_mutex lock; /* Guards all mutable fields. */ ovs_u128 label; - long long expiration; + atomic_llong expiration; uint32_t mark; int seq_skew; diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c index 22363e7fe..5bf2816ca 100644 --- a/lib/conntrack-tp.c +++ b/lib/conntrack-tp.c @@ -240,7 +240,7 @@ static void conn_schedule_expiration(struct conn *conn, enum ct_timeout tm, long long now, uint32_t tp_value) { - conn->expiration = now + tp_value * 1000; + atomic_store_relaxed(&conn->expiration, now + tp_value * 1000); conn->exp.tm = tm; ignore(atomic_flag_test_and_set(&conn->exp.reschedule)); } diff --git a/lib/conntrack.c b/lib/conntrack.c index 045710e8d..03aa21e78 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -99,6 +99,7 @@ static enum ct_update_res conn_update(struct conntrack *ct, struct conn *conn, struct dp_packet *pkt, struct conn_lookup_ctx *ctx, long long now); +static long long int conn_expiration(const struct conn *); static bool conn_expired(struct conn *, long long now); static void set_mark(struct dp_packet *, struct conn *, uint32_t val, uint32_t mask); @@ -1018,13 +1019,10 @@ un_nat_packet(struct dp_packet *pkt, const struct conn *conn, static void conn_seq_skew_set(struct conntrack *ct, const struct conn *conn_in, long long now, int seq_skew, bool seq_skew_dir) - OVS_NO_THREAD_SAFETY_ANALYSIS { struct conn *conn; - ovs_mutex_unlock(&conn_in->lock); - conn_lookup(ct, &conn_in->key, now, &conn, NULL); - ovs_mutex_lock(&conn_in->lock); + conn_lookup(ct, &conn_in->key, now, &conn, NULL); if (conn && seq_skew) { conn->seq_skew = seq_skew; conn->seq_skew_dir = seq_skew_dir; @@ -1596,9 +1594,7 @@ ct_sweep(struct conntrack *ct, long long now, size_t limit) continue; } - ovs_mutex_lock(&conn->lock); - expiration = conn->expiration; - ovs_mutex_unlock(&conn->lock); + expiration = conn_expiration(conn); if (conn == end_of_queue) { /* If we already re-enqueued this conn during this sweep, @@ -2483,14 +2479,21 @@ conn_update(struct conntrack *ct, struct conn *conn, struct dp_packet *pkt, return update_res; } +static long long int +conn_expiration(const struct conn *conn) +{ + long long int expiration; + + atomic_read_relaxed(&CONST_CAST(struct conn *, conn)->expiration, + &expiration); + return expiration; +} + static bool conn_expired(struct conn *conn, long long now) { if (conn->conn_type == CT_CONN_TYPE_DEFAULT) { - ovs_mutex_lock(&conn->lock); - bool expired = now >= conn->expiration ? true : false; - ovs_mutex_unlock(&conn->lock); - return expired; + return now >= conn_expiration(conn); } return false; } @@ -2633,7 +2636,7 @@ conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, entry->mark = conn->mark; memcpy(&entry->labels, &conn->label, sizeof entry->labels); - long long expiration = conn->expiration - now; + long long expiration = conn_expiration(conn) - now; struct ct_l4_proto *class = l4_protos[conn->key.nw_proto]; if (class->conn_get_protoinfo) {