From patchwork Mon Nov 29 18:05:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Valerio X-Patchwork-Id: 1561312 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=JWRsPIDa; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4J2tZG42jMz9sVc for ; Tue, 30 Nov 2021 05:05:58 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 833B8404B7; Mon, 29 Nov 2021 18:05:55 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id HR4lrVleDsuw; Mon, 29 Nov 2021 18:05:54 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp4.osuosl.org (Postfix) with ESMTPS id A8B37404A7; Mon, 29 Nov 2021 18:05:53 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 64A7DC0039; Mon, 29 Nov 2021 18:05:53 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) by lists.linuxfoundation.org (Postfix) with ESMTP id BCB4DC0039 for ; Mon, 29 Nov 2021 18:05:51 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 066CA81B53 for ; Mon, 29 Nov 2021 18:05:47 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp1.osuosl.org (amavisd-new); dkim=pass (1024-bit key) header.d=redhat.com Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 9TLwj2LC7zSp for ; Mon, 29 Nov 2021 18:05:46 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp1.osuosl.org (Postfix) with ESMTPS id 148FC81B60 for ; Mon, 29 Nov 2021 18:05:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1638209145; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1qqyf5PoNOyEFahiVhtpUXktwt8OAHv8VOUNyCA1s1M=; b=JWRsPIDa0NagygM4ebU04040fuVAVt9v0PEs8e8v/u3MxBcQK6eBzX1L22s5h9K/C3yNaK 2CHs1LMuowZ4Z98qkhrpGuH+/L9g4Z3G8wXlA/f2vHDUvxBW+Ug52Xa6BZyCNZksRzHPok 0FF6cOKbGItZqVroTNnA/Z0KbaxRcm0= Received: from mail-wm1-f69.google.com (mail-wm1-f69.google.com [209.85.128.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-545-muCkC3zoMSmyS0xyRp47bA-1; Mon, 29 Nov 2021 13:05:40 -0500 X-MC-Unique: muCkC3zoMSmyS0xyRp47bA-1 Received: by mail-wm1-f69.google.com with SMTP id j25-20020a05600c1c1900b00332372c252dso11275783wms.1 for ; Mon, 29 Nov 2021 10:05:39 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=1qqyf5PoNOyEFahiVhtpUXktwt8OAHv8VOUNyCA1s1M=; b=P4WbKJg/SLjInCeiewCObX0Hf7MHIWahAd8yTwSXnFPA4UfbZk4ILakl3rtNKdS526 BKIxQ5z0XMk4OsDwXDn+SaegkGW8Cp0RAy1swA2of206rq2+D7soOVjIvQCA9xalJTwT LcJPK1DhlpToxfGZS0VUq/aO4EXJpGyEbekBAsORXynVrn4FAypkxDz3gyZJVVbcgwBR 0QpoisX6r7Ncs4Au1zCzTYhwDd3BAHpLQPQsxF9+yZR1AgVfnTjueauUkv4shthbTiS+ IlZ7turpZBywdTZVYlRq1rJOODqIXiilFZTa5WVsMSGJ8df0KBMUbnHPv7RwaS17++GI rqQg== X-Gm-Message-State: AOAM530wT2YqcZniZutA/Z3D0gdvUm3Z5zMozDwQFrO4g8zvYguvrz6H ZjvUeOdEqcw00Yv5f9DUUUZ/jVu+68rD6h+mrxGbmcP0LuWe1CiwY28wrObM1HCSO0xbM+P9KsG 8CA1q9TEH8IhKlCE0ciE42iMVG8+rbwQlCNN2ew8pfJw3OP+EG9jn5R9gHyCn/U4T X-Received: by 2002:a5d:4d07:: with SMTP id z7mr8353663wrt.487.1638209138611; Mon, 29 Nov 2021 10:05:38 -0800 (PST) X-Google-Smtp-Source: ABdhPJxwT+oAQaHk6bW7rieChqG56heIpE4E9ebO/cgyUDRcyfTT6NtiorlE7NBvb+4w6tuwmZ3SyA== X-Received: by 2002:a5d:4d07:: with SMTP id z7mr8353597wrt.487.1638209138122; Mon, 29 Nov 2021 10:05:38 -0800 (PST) Received: from localhost (net-5-88-23-84.cust.vodafonedsl.it. [5.88.23.84]) by smtp.gmail.com with ESMTPSA id k37sm15620wms.21.2021.11.29.10.05.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Nov 2021 10:05:37 -0800 (PST) From: Paolo Valerio To: dev@openvswitch.org Date: Mon, 29 Nov 2021 19:05:37 +0100 Message-ID: <163820913056.3001886.12795917577713677397.stgit@fed.void> In-Reply-To: <163820911055.3001886.13118680364054625917.stgit@fed.void> References: <163820911055.3001886.13118680364054625917.stgit@fed.void> User-Agent: StGit/1.1 MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pvalerio@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: fbl@redhat.com, i.maximets@ovn.org Subject: [ovs-dev] [PATCH 1/5] conntrack: Use a cmap to store zone limits X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Gaetan Rivet Change the data structure from hmap to cmap for zone limits. As they are shared amongst multiple conntrack users, multiple readers want to check the current zone limit state before progressing in their processing. Using a CMAP allows doing lookups without taking the global 'ct_lock', thus reducing contention. Signed-off-by: Gaetan Rivet Reviewed-by: Eli Britstein Signed-off-by: Paolo Valerio --- lib/conntrack-private.h | 2 + lib/conntrack.c | 70 ++++++++++++++++++++++++++++++++--------------- lib/conntrack.h | 2 + lib/dpif-netdev.c | 5 ++- 4 files changed, 53 insertions(+), 26 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index dfdf4e676..d9461b811 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -192,7 +192,7 @@ struct conntrack { struct ovs_mutex ct_lock; /* Protects 2 following fields. */ struct cmap conns OVS_GUARDED; struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED; - struct hmap zone_limits OVS_GUARDED; + struct cmap zone_limits OVS_GUARDED; struct hmap timeout_policies OVS_GUARDED; uint32_t hash_basis; /* Salt for hashing a connection key. */ pthread_t clean_thread; /* Periodically cleans up connection tracker. */ diff --git a/lib/conntrack.c b/lib/conntrack.c index 33a1a9295..cbd3b40ff 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -81,7 +81,7 @@ enum ct_alg_ctl_type { }; struct zone_limit { - struct hmap_node node; + struct cmap_node node; struct conntrack_zone_limit czl; }; @@ -311,7 +311,7 @@ conntrack_init(void) for (unsigned i = 0; i < ARRAY_SIZE(ct->exp_lists); i++) { ovs_list_init(&ct->exp_lists[i]); } - hmap_init(&ct->zone_limits); + cmap_init(&ct->zone_limits); ct->zone_limit_seq = 0; timeout_policy_init(ct); ovs_mutex_unlock(&ct->ct_lock); @@ -346,12 +346,25 @@ zone_key_hash(int32_t zone, uint32_t basis) } static struct zone_limit * -zone_limit_lookup(struct conntrack *ct, int32_t zone) +zone_limit_lookup_protected(struct conntrack *ct, int32_t zone) OVS_REQUIRES(ct->ct_lock) { uint32_t hash = zone_key_hash(zone, ct->hash_basis); struct zone_limit *zl; - HMAP_FOR_EACH_IN_BUCKET (zl, node, hash, &ct->zone_limits) { + CMAP_FOR_EACH_WITH_HASH_PROTECTED (zl, node, hash, &ct->zone_limits) { + if (zl->czl.zone == zone) { + return zl; + } + } + return NULL; +} + +static struct zone_limit * +zone_limit_lookup(struct conntrack *ct, int32_t zone) +{ + uint32_t hash = zone_key_hash(zone, ct->hash_basis); + struct zone_limit *zl; + CMAP_FOR_EACH_WITH_HASH (zl, node, hash, &ct->zone_limits) { if (zl->czl.zone == zone) { return zl; } @@ -361,7 +374,6 @@ zone_limit_lookup(struct conntrack *ct, int32_t zone) static struct zone_limit * zone_limit_lookup_or_default(struct conntrack *ct, int32_t zone) - OVS_REQUIRES(ct->ct_lock) { struct zone_limit *zl = zone_limit_lookup(ct, zone); return zl ? zl : zone_limit_lookup(ct, DEFAULT_ZONE); @@ -370,13 +382,16 @@ zone_limit_lookup_or_default(struct conntrack *ct, int32_t zone) struct conntrack_zone_limit zone_limit_get(struct conntrack *ct, int32_t zone) { - ovs_mutex_lock(&ct->ct_lock); - struct conntrack_zone_limit czl = {DEFAULT_ZONE, 0, 0, 0}; + struct conntrack_zone_limit czl = { + .zone = DEFAULT_ZONE, + .limit = 0, + .count = ATOMIC_COUNT_INIT(0), + .zone_limit_seq = 0, + }; struct zone_limit *zl = zone_limit_lookup_or_default(ct, zone); if (zl) { czl = zl->czl; } - ovs_mutex_unlock(&ct->ct_lock); return czl; } @@ -384,13 +399,19 @@ static int zone_limit_create(struct conntrack *ct, int32_t zone, uint32_t limit) OVS_REQUIRES(ct->ct_lock) { + struct zone_limit *zl = zone_limit_lookup_protected(ct, zone); + + if (zl) { + return 0; + } + if (zone >= DEFAULT_ZONE && zone <= MAX_ZONE) { - struct zone_limit *zl = xzalloc(sizeof *zl); + zl = xzalloc(sizeof *zl); zl->czl.limit = limit; zl->czl.zone = zone; zl->czl.zone_limit_seq = ct->zone_limit_seq++; uint32_t hash = zone_key_hash(zone, ct->hash_basis); - hmap_insert(&ct->zone_limits, &zl->node, hash); + cmap_insert(&ct->zone_limits, &zl->node, hash); return 0; } else { return EINVAL; @@ -401,13 +422,14 @@ int zone_limit_update(struct conntrack *ct, int32_t zone, uint32_t limit) { int err = 0; - ovs_mutex_lock(&ct->ct_lock); struct zone_limit *zl = zone_limit_lookup(ct, zone); if (zl) { zl->czl.limit = limit; VLOG_INFO("Changed zone limit of %u for zone %d", limit, zone); } else { + ovs_mutex_lock(&ct->ct_lock); err = zone_limit_create(ct, zone, limit); + ovs_mutex_unlock(&ct->ct_lock); if (!err) { VLOG_INFO("Created zone limit of %u for zone %d", limit, zone); } else { @@ -415,7 +437,6 @@ zone_limit_update(struct conntrack *ct, int32_t zone, uint32_t limit) zone); } } - ovs_mutex_unlock(&ct->ct_lock); return err; } @@ -423,23 +444,25 @@ static void zone_limit_clean(struct conntrack *ct, struct zone_limit *zl) OVS_REQUIRES(ct->ct_lock) { - hmap_remove(&ct->zone_limits, &zl->node); - free(zl); + uint32_t hash = zone_key_hash(zl->czl.zone, ct->hash_basis); + cmap_remove(&ct->zone_limits, &zl->node, hash); + ovsrcu_postpone(free, zl); } int zone_limit_delete(struct conntrack *ct, uint16_t zone) { ovs_mutex_lock(&ct->ct_lock); - struct zone_limit *zl = zone_limit_lookup(ct, zone); + struct zone_limit *zl = zone_limit_lookup_protected(ct, zone); if (zl) { zone_limit_clean(ct, zl); + ovs_mutex_unlock(&ct->ct_lock); VLOG_INFO("Deleted zone limit for zone %d", zone); } else { + ovs_mutex_unlock(&ct->ct_lock); VLOG_INFO("Attempted delete of non-existent zone limit: zone %d", zone); } - ovs_mutex_unlock(&ct->ct_lock); return 0; } @@ -456,7 +479,7 @@ conn_clean_cmn(struct conntrack *ct, struct conn *conn) struct zone_limit *zl = zone_limit_lookup(ct, conn->admit_zone); if (zl && zl->czl.zone_limit_seq == conn->zone_limit_seq) { - zl->czl.count--; + atomic_count_dec(&zl->czl.count); } } @@ -510,10 +533,13 @@ conntrack_destroy(struct conntrack *ct) cmap_destroy(&ct->conns); struct zone_limit *zl; - HMAP_FOR_EACH_POP (zl, node, &ct->zone_limits) { - free(zl); + CMAP_FOR_EACH (zl, node, &ct->zone_limits) { + uint32_t hash = zone_key_hash(zl->czl.zone, ct->hash_basis); + + cmap_remove(&ct->zone_limits, &zl->node, hash); + ovsrcu_postpone(free, zl); } - hmap_destroy(&ct->zone_limits); + cmap_destroy(&ct->zone_limits); struct timeout_policy *tp; HMAP_FOR_EACH_POP (tp, node, &ct->timeout_policies) { @@ -991,7 +1017,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, if (commit) { struct zone_limit *zl = zone_limit_lookup_or_default(ct, ctx->key.zone); - if (zl && zl->czl.count >= zl->czl.limit) { + if (zl && atomic_count_get(&zl->czl.count) >= zl->czl.limit) { return nc; } @@ -1064,7 +1090,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, if (zl) { nc->admit_zone = zl->czl.zone; nc->zone_limit_seq = zl->czl.zone_limit_seq; - zl->czl.count++; + atomic_count_inc(&zl->czl.count); } else { nc->admit_zone = INVALID_ZONE; } diff --git a/lib/conntrack.h b/lib/conntrack.h index 9553b188a..58b181834 100644 --- a/lib/conntrack.h +++ b/lib/conntrack.h @@ -108,7 +108,7 @@ struct conntrack_dump { struct conntrack_zone_limit { int32_t zone; uint32_t limit; - uint32_t count; + atomic_count count; uint32_t zone_limit_seq; /* Used to disambiguate zone limit counts. */ }; diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 69d7ec26e..b35e3fdcd 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -8519,7 +8519,8 @@ dpif_netdev_ct_get_limits(struct dpif *dpif, czl = zone_limit_get(dp->conntrack, zone_limit->zone); if (czl.zone == zone_limit->zone || czl.zone == DEFAULT_ZONE) { ct_dpif_push_zone_limit(zone_limits_reply, zone_limit->zone, - czl.limit, czl.count); + czl.limit, + atomic_count_get(&czl.count)); } else { return EINVAL; } @@ -8529,7 +8530,7 @@ dpif_netdev_ct_get_limits(struct dpif *dpif, czl = zone_limit_get(dp->conntrack, z); if (czl.zone == z) { ct_dpif_push_zone_limit(zone_limits_reply, z, czl.limit, - czl.count); + atomic_count_get(&czl.count)); } } } From patchwork Mon Nov 29 18:05:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Valerio X-Patchwork-Id: 1561313 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=A/LiC1Ln; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4J2tZN1s9wz9sVc for ; Tue, 30 Nov 2021 05:06:04 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id C3AB881B48; Mon, 29 Nov 2021 18:06:01 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jNniKYTpCrQM; Mon, 29 Nov 2021 18:06:00 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp1.osuosl.org (Postfix) with ESMTPS id CD5BC81B99; Mon, 29 Nov 2021 18:05:59 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id A3393C0012; Mon, 29 Nov 2021 18:05:59 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 06A92C0012 for ; Mon, 29 Nov 2021 18:05:58 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 90CE36071A for ; Mon, 29 Nov 2021 18:05:54 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp3.osuosl.org (amavisd-new); dkim=pass (1024-bit key) header.d=redhat.com Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id BwqWsEV030MW for ; Mon, 29 Nov 2021 18:05:53 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp3.osuosl.org (Postfix) with ESMTPS id A3E2660AA6 for ; Mon, 29 Nov 2021 18:05:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1638209152; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mTdsKWVtlFry0Bs3IqGkvUh1I9RAmL32giCbCZfrq5o=; b=A/LiC1LnVnedWqNcCMF2JD22zOmiTMbkQN41Rx3DYsooHj1oKI+tfyLRtH3wz/OKMN8f9Z pcrloS7ZcoQzuO2975r1whO6F6GvxRdUIdeXSu+dQIVAQ5uA9BQd4a9SPTPMUfZF0qAwGL pFfwdvCGcEo2tIHWvXF02vgXt6EqN7s= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-313-veXcEfiqOaW-_uFNgxM64g-1; Mon, 29 Nov 2021 13:05:50 -0500 X-MC-Unique: veXcEfiqOaW-_uFNgxM64g-1 Received: by mail-wm1-f71.google.com with SMTP id 205-20020a1c00d6000000b003335d1384f1so12276021wma.3 for ; Mon, 29 Nov 2021 10:05:49 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=mTdsKWVtlFry0Bs3IqGkvUh1I9RAmL32giCbCZfrq5o=; b=xIKxhBPTkbiVMq+6typ5ZdIcCoCBXYs8Ex673G3d7norEowEQvSS+mxWkR1t0f+p8O 2kMP8DG+FxYXOzQJMXqq3S+tKsTlM0LhN2hRH8fo9ZxB1XuZUuFV8LTp/1rtQjgfZ5FM 9DuUyzviPmApXBgwh7HeYO0wsoxDIOdFYTdDFWo7xR6qgs7oo8PTmkgYqFx0ZOFOp5t/ U7YmKJ7nvQB4OyU+dQnAkSDTH3SObhgq7cJiGcjFbMfKhZ4JnbxZ/hyIucTPY3DwkNTK GqTJcemibE87ZQS4HUXvZd6zm5/Q5DFjPxmFtg9h6PhqMacU6KCoVWOXZgOBRwubkxKV pWOA== X-Gm-Message-State: AOAM532gCwJk5L8rqs4tYNS3uxeSPHLG2o25yqhug4KsJoC3WMkaK5R1 O4iJHiPzSmlvqG2woHJKqNyB/CVxLxJWRtqohlBI4Zwk3Lk6brk8BTiZyrAal0qNlNgxZ9FqNvD aIWrho4n9oKmoydNHPHHVJBwJyXpWJ7xiV9DIvrFn614g2BRX6NsBdT/8dgyJ8KLP X-Received: by 2002:a5d:5144:: with SMTP id u4mr36267232wrt.91.1638209148476; Mon, 29 Nov 2021 10:05:48 -0800 (PST) X-Google-Smtp-Source: ABdhPJyzU3MWFeLbO3y3zOfLo5TkN5d7FvlgWXMPJNYPxMJIN7KXW7cJ9tnebBTD5YqIxsKQ3wIx1A== X-Received: by 2002:a5d:5144:: with SMTP id u4mr36267190wrt.91.1638209148103; Mon, 29 Nov 2021 10:05:48 -0800 (PST) Received: from localhost (net-5-88-23-84.cust.vodafonedsl.it. [5.88.23.84]) by smtp.gmail.com with ESMTPSA id p14sm8031wms.29.2021.11.29.10.05.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Nov 2021 10:05:47 -0800 (PST) From: Paolo Valerio To: dev@openvswitch.org Date: Mon, 29 Nov 2021 19:05:46 +0100 Message-ID: <163820914325.3001886.1316072665396898657.stgit@fed.void> In-Reply-To: <163820911055.3001886.13118680364054625917.stgit@fed.void> References: <163820911055.3001886.13118680364054625917.stgit@fed.void> User-Agent: StGit/1.1 MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pvalerio@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: fbl@redhat.com, i.maximets@ovn.org Subject: [ovs-dev] [PATCH 2/5] conntrack-tp: Use a cmap to store timeout policies X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Gaetan Rivet Multiple lookups are done to stored timeout policies, each time blocking the global 'ct_lock'. This is usually not necessary and it should be acceptable to get policy updates slightly delayed (by one RCU sync at most). Using a CMAP reduces multiple lock taking and releasing in the connection insertion path. Signed-off-by: Gaetan Rivet Reviewed-by: Eli Britstein Acked-by: William Tu Signed-off-by: Paolo Valerio --- lib/conntrack-private.h | 2 +- lib/conntrack-tp.c | 54 ++++++++++++++++++++++++++--------------------- lib/conntrack.c | 9 +++++--- lib/conntrack.h | 2 +- 4 files changed, 38 insertions(+), 29 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index d9461b811..34c688821 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -193,7 +193,7 @@ struct conntrack { struct cmap conns OVS_GUARDED; struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED; struct cmap zone_limits OVS_GUARDED; - struct hmap timeout_policies OVS_GUARDED; + struct cmap timeout_policies OVS_GUARDED; uint32_t hash_basis; /* Salt for hashing a connection key. */ pthread_t clean_thread; /* Periodically cleans up connection tracker. */ struct latch clean_thread_exit; /* To destroy the 'clean_thread'. */ diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c index a586d3a8d..c2245038b 100644 --- a/lib/conntrack-tp.c +++ b/lib/conntrack-tp.c @@ -47,14 +47,15 @@ static unsigned int ct_dpif_netdev_tp_def[] = { }; static struct timeout_policy * -timeout_policy_lookup(struct conntrack *ct, int32_t tp_id) +timeout_policy_lookup_protected(struct conntrack *ct, int32_t tp_id) OVS_REQUIRES(ct->ct_lock) { struct timeout_policy *tp; uint32_t hash; hash = hash_int(tp_id, ct->hash_basis); - HMAP_FOR_EACH_IN_BUCKET (tp, node, hash, &ct->timeout_policies) { + CMAP_FOR_EACH_WITH_HASH_PROTECTED (tp, node, hash, + &ct->timeout_policies) { if (tp->policy.id == tp_id) { return tp; } @@ -62,20 +63,25 @@ timeout_policy_lookup(struct conntrack *ct, int32_t tp_id) return NULL; } -struct timeout_policy * -timeout_policy_get(struct conntrack *ct, int32_t tp_id) +static struct timeout_policy * +timeout_policy_lookup(struct conntrack *ct, int32_t tp_id) { struct timeout_policy *tp; + uint32_t hash; - ovs_mutex_lock(&ct->ct_lock); - tp = timeout_policy_lookup(ct, tp_id); - if (!tp) { - ovs_mutex_unlock(&ct->ct_lock); - return NULL; + hash = hash_int(tp_id, ct->hash_basis); + CMAP_FOR_EACH_WITH_HASH (tp, node, hash, &ct->timeout_policies) { + if (tp->policy.id == tp_id) { + return tp; + } } + return NULL; +} - ovs_mutex_unlock(&ct->ct_lock); - return tp; +struct timeout_policy * +timeout_policy_get(struct conntrack *ct, int32_t tp_id) +{ + return timeout_policy_lookup(ct, tp_id); } static void @@ -125,27 +131,30 @@ timeout_policy_create(struct conntrack *ct, init_default_tp(tp, tp_id); update_existing_tp(tp, new_tp); hash = hash_int(tp_id, ct->hash_basis); - hmap_insert(&ct->timeout_policies, &tp->node, hash); + cmap_insert(&ct->timeout_policies, &tp->node, hash); } static void timeout_policy_clean(struct conntrack *ct, struct timeout_policy *tp) OVS_REQUIRES(ct->ct_lock) { - hmap_remove(&ct->timeout_policies, &tp->node); - free(tp); + uint32_t hash = hash_int(tp->policy.id, ct->hash_basis); + cmap_remove(&ct->timeout_policies, &tp->node, hash); + ovsrcu_postpone(free, tp); } static int -timeout_policy_delete__(struct conntrack *ct, uint32_t tp_id) +timeout_policy_delete__(struct conntrack *ct, uint32_t tp_id, + bool warn_on_error) OVS_REQUIRES(ct->ct_lock) { + struct timeout_policy *tp; int err = 0; - struct timeout_policy *tp = timeout_policy_lookup(ct, tp_id); + tp = timeout_policy_lookup_protected(ct, tp_id); if (tp) { timeout_policy_clean(ct, tp); - } else { + } else if (warn_on_error) { VLOG_WARN_RL(&rl, "Failed to delete a non-existent timeout " "policy: id=%d", tp_id); err = ENOENT; @@ -159,7 +168,7 @@ timeout_policy_delete(struct conntrack *ct, uint32_t tp_id) int err; ovs_mutex_lock(&ct->ct_lock); - err = timeout_policy_delete__(ct, tp_id); + err = timeout_policy_delete__(ct, tp_id, true); ovs_mutex_unlock(&ct->ct_lock); return err; } @@ -170,7 +179,7 @@ timeout_policy_init(struct conntrack *ct) { struct timeout_policy tp; - hmap_init(&ct->timeout_policies); + cmap_init(&ct->timeout_policies); /* Create default timeout policy. */ memset(&tp, 0, sizeof tp); @@ -182,14 +191,11 @@ int timeout_policy_update(struct conntrack *ct, struct timeout_policy *new_tp) { - int err = 0; uint32_t tp_id = new_tp->policy.id; + int err = 0; ovs_mutex_lock(&ct->ct_lock); - struct timeout_policy *tp = timeout_policy_lookup(ct, tp_id); - if (tp) { - err = timeout_policy_delete__(ct, tp_id); - } + timeout_policy_delete__(ct, tp_id, false); timeout_policy_create(ct, new_tp); ovs_mutex_unlock(&ct->ct_lock); return err; diff --git a/lib/conntrack.c b/lib/conntrack.c index cbd3b40ff..983f824d2 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -542,10 +542,13 @@ conntrack_destroy(struct conntrack *ct) cmap_destroy(&ct->zone_limits); struct timeout_policy *tp; - HMAP_FOR_EACH_POP (tp, node, &ct->timeout_policies) { - free(tp); + CMAP_FOR_EACH (tp, node, &ct->timeout_policies) { + uint32_t hash = hash_int(tp->policy.id, ct->hash_basis); + + cmap_remove(&ct->timeout_policies, &tp->node, hash); + ovsrcu_postpone(free, tp); } - hmap_destroy(&ct->timeout_policies); + cmap_destroy(&ct->timeout_policies); ovs_mutex_unlock(&ct->ct_lock); ovs_mutex_destroy(&ct->ct_lock); diff --git a/lib/conntrack.h b/lib/conntrack.h index 58b181834..b064abc9f 100644 --- a/lib/conntrack.h +++ b/lib/conntrack.h @@ -113,7 +113,7 @@ struct conntrack_zone_limit { }; struct timeout_policy { - struct hmap_node node; + struct cmap_node node; struct ct_dpif_timeout_policy policy; }; From patchwork Mon Nov 29 18:05:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Valerio X-Patchwork-Id: 1561314 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=XyVKLHYU; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4J2tZk5Yk2z9sVc for ; Tue, 30 Nov 2021 05:06:22 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 3B14A81B52; Mon, 29 Nov 2021 18:06:20 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ihbFtlT0NRGS; Mon, 29 Nov 2021 18:06:17 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp1.osuosl.org (Postfix) with ESMTPS id B104E81BF4; Mon, 29 Nov 2021 18:06:16 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 77A80C0012; Mon, 29 Nov 2021 18:06:16 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp4.osuosl.org (smtp4.osuosl.org [140.211.166.137]) by lists.linuxfoundation.org (Postfix) with ESMTP id A0CA7C0012 for ; Mon, 29 Nov 2021 18:06:14 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 08C3A405B8 for ; Mon, 29 Nov 2021 18:06:07 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp4.osuosl.org (amavisd-new); dkim=fail (1024-bit key) reason="fail (body has been altered)" header.d=redhat.com Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id xJWOUkEJvk8Y for ; Mon, 29 Nov 2021 18:06:04 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp4.osuosl.org (Postfix) with ESMTPS id 6ECEA405AA for ; Mon, 29 Nov 2021 18:06:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1638209163; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+eD4ibEStbfqPZd/yLge/H55IToXiV722UhcqeE0NrQ=; b=XyVKLHYU8FVyLB2kQytTX8PM5A99bZn5RTL40sjjNBNsxLu1EYl/eXW4AtuPtyE2WBfPCW ZPlvO47mRvwH+0MoKX4anuGytouGAPAaTBkAlpaAjOmMDltqWLfkfhpe0wnxwMHqICamLK k3h/e4FcMuLhnkKTC4Gkxs4wYmGBCCw= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-587-IKCKM6DoMHeOLTlTrjJ6ww-1; Mon, 29 Nov 2021 13:06:00 -0500 X-MC-Unique: IKCKM6DoMHeOLTlTrjJ6ww-1 Received: by mail-wm1-f71.google.com with SMTP id 205-20020a1c00d6000000b003335d1384f1so12276245wma.3 for ; Mon, 29 Nov 2021 10:06:00 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=ICztUx8UIN75WO6aAuczzeZOnsTegcdYrPjBVHY9Zn0=; b=4b/4awTqhf3nY+jdMoxGn9RzLNTZkJiClGUGqofbodEtgDl8Mx1AZw/T0/k4xUFcFP WGMDYcUuW85ZBtilaluIZjyFq6FSXS/o8cGAxbsVKQTgIdjXaultTHK3YTvdVkehYzTB JGLRGf4whlCwuiw3QDu5u6q42+WRXBB35qYcyipiEOyJhGKH4zcXd456TtgKTN2fK8OW /2GkJJTYJMKdEPeo/amSXL2ZY7E839ZuSOwlSkaTenypIQYIOMA+TqxNJmrMOpsKRNQW YL/PskuDU4bVJbksKhYHqr7UMmvCyeyk08sYJzlvITHqUBD66oCROf16C78mzCZNbEVP k8jQ== X-Gm-Message-State: AOAM531AK44BAgR1GjfvxUo+zEROkAyEAHQ1eCj0j5CoyWtfN1VK2Adi 18RipG6rAycHv2PR7MRoa3I04G8RZgwrYY/AAhtUUUGqw4cUwYP1Wdo98Cu3NeZOr/FBwixGTTT XM2pFrJl/TufmiuNWA7fP7bhPJZBSA8SuYJmPAAZxLfCA456wl1Gg8Si2EQE9kJ+W X-Received: by 2002:adf:d1e2:: with SMTP id g2mr37175758wrd.105.1638209158726; Mon, 29 Nov 2021 10:05:58 -0800 (PST) X-Google-Smtp-Source: ABdhPJxtjYG/V9ZUoJa5BS6TGVDc9OQudpmIkT1MPMosSHn0vVjmrl6vMyc+7Jp0JjMHxSAlDzF66w== X-Received: by 2002:adf:d1e2:: with SMTP id g2mr37175659wrd.105.1638209157852; Mon, 29 Nov 2021 10:05:57 -0800 (PST) Received: from localhost (net-5-88-23-84.cust.vodafonedsl.it. [5.88.23.84]) by smtp.gmail.com with ESMTPSA id r8sm18189882wrz.43.2021.11.29.10.05.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Nov 2021 10:05:57 -0800 (PST) From: Paolo Valerio To: dev@openvswitch.org Date: Mon, 29 Nov 2021 19:05:56 +0100 Message-ID: <163820915323.3001886.1292868316263973958.stgit@fed.void> In-Reply-To: <163820911055.3001886.13118680364054625917.stgit@fed.void> References: <163820911055.3001886.13118680364054625917.stgit@fed.void> User-Agent: StGit/1.1 MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pvalerio@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: fbl@redhat.com, i.maximets@ovn.org Subject: [ovs-dev] [PATCH RFC 3/5] conntrack: Replaces nat_conn introducing key directionality. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Peng He Currently, when doing NAT, the userspace conntrack will use an extra conn for the two directions in a flow. However, each conn has actually the two keys for both orig and rev directions. This patch introduces a key_node[CT_DIR_MAX] member in the conn which consists of key and a cmap_node for hash lookup. Both keys can now be accessed in the following way: conn->key_node[CT_DIR_{FWD,REV}].key similarly to what Aaron Conole suggested. This patch avoids the extra allocation for nat_conn, and makes userspace code cleaner. Signed-off-by: Peng He Co-authored-by: Paolo Valerio Signed-off-by: Paolo Valerio --- Initial patch: https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0320@bytedance.com/ --- lib/conntrack-private.h | 20 +- lib/conntrack-tp.c | 6 - lib/conntrack.c | 499 ++++++++++++++++++++--------------------------- 3 files changed, 229 insertions(+), 296 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index 34c688821..ea5ba3d9e 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -48,9 +48,16 @@ struct ct_endpoint { * hashing in ct_endpoint_hash_add(). */ BUILD_ASSERT_DECL(sizeof(struct ct_endpoint) == sizeof(union ct_addr) + 4); +enum key_dir { + CT_DIR_FWD = 0, + CT_DIR_REV, + CT_DIR_MAX, +}; + /* Changes to this structure need to be reflected in conn_key_hash() * and conn_key_cmp(). */ struct conn_key { + enum key_dir dir; struct ct_endpoint src; struct ct_endpoint dst; @@ -86,21 +93,19 @@ struct alg_exp_node { bool nat_rpl_dst; }; -enum OVS_PACKED_ENUM ct_conn_type { - CT_CONN_TYPE_DEFAULT, - CT_CONN_TYPE_UN_NAT, +struct conn_key_node { + struct conn_key key; + struct cmap_node cm_node; }; struct conn { /* Immutable data. */ - struct conn_key key; - struct conn_key rev_key; + struct conn_key_node key_node[CT_DIR_MAX]; struct conn_key parent_key; /* Only used for orig_tuple support. */ struct ovs_list exp_node; - struct cmap_node cm_node; + uint16_t nat_action; char *alg; - struct conn *nat_conn; /* The NAT 'conn' context, if there is one. */ /* Mutable data. */ struct ovs_mutex lock; /* Guards all mutable fields. */ @@ -120,7 +125,6 @@ struct conn { /* Immutable data. */ bool alg_related; /* True if alg data connection. */ - enum ct_conn_type conn_type; uint32_t tp_id; /* Timeout policy ID. */ }; diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c index c2245038b..9ecb06978 100644 --- a/lib/conntrack-tp.c +++ b/lib/conntrack-tp.c @@ -282,7 +282,8 @@ conn_update_expiration(struct conntrack *ct, struct conn *conn, ovs_mutex_lock(&conn->lock); VLOG_DBG_RL(&rl, "Update timeout %s zone=%u with policy id=%d " "val=%u sec.", - ct_timeout_str[tm], conn->key.zone, conn->tp_id, val); + ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone, + conn->tp_id, val); conn_update_expiration__(ct, conn, tm, now, val); } @@ -313,7 +314,8 @@ conn_init_expiration(struct conntrack *ct, struct conn *conn, } VLOG_DBG_RL(&rl, "Init timeout %s zone=%u with policy id=%d val=%u sec.", - ct_timeout_str[tm], conn->key.zone, conn->tp_id, val); + ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone, + conn->tp_id, val); conn_init_expiration__(ct, conn, tm, now, val); } diff --git a/lib/conntrack.c b/lib/conntrack.c index 983f824d2..a284c57c0 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -96,7 +96,6 @@ static struct conn *new_conn(struct conntrack *ct, struct dp_packet *pkt, uint32_t tp_id); static void delete_conn_cmn(struct conn *); static void delete_conn(struct conn *); -static void delete_conn_one(struct conn *conn); static enum ct_update_res conn_update(struct conntrack *ct, struct conn *conn, struct dp_packet *pkt, struct conn_lookup_ctx *ctx, @@ -110,8 +109,7 @@ static void set_label(struct dp_packet *, struct conn *, static void *clean_thread_main(void *f_); static bool -nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn, - struct conn *nat_conn, +nat_get_unique_tuple(struct conntrack *ct, struct conn *conn, const struct nat_action_info_t *nat_info); static uint8_t @@ -231,61 +229,6 @@ conn_key_cmp(const struct conn_key *key1, const struct conn_key *key2) return 1; } -static void -ct_print_conn_info(const struct conn *c, const char *log_msg, - enum vlog_level vll, bool force, bool rl_on) -{ -#define CT_VLOG(RL_ON, LEVEL, ...) \ - do { \ - if (RL_ON) { \ - static struct vlog_rate_limit rl_ = VLOG_RATE_LIMIT_INIT(5, 5); \ - vlog_rate_limit(&this_module, LEVEL, &rl_, __VA_ARGS__); \ - } else { \ - vlog(&this_module, LEVEL, __VA_ARGS__); \ - } \ - } while (0) - - if (OVS_UNLIKELY(force || vlog_is_enabled(&this_module, vll))) { - if (c->key.dl_type == htons(ETH_TYPE_IP)) { - CT_VLOG(rl_on, vll, "%s: src ip "IP_FMT" dst ip "IP_FMT" rev src " - "ip "IP_FMT" rev dst ip "IP_FMT" src/dst ports " - "%"PRIu16"/%"PRIu16" rev src/dst ports " - "%"PRIu16"/%"PRIu16" zone/rev zone " - "%"PRIu16"/%"PRIu16" nw_proto/rev nw_proto " - "%"PRIu8"/%"PRIu8, log_msg, - IP_ARGS(c->key.src.addr.ipv4), - IP_ARGS(c->key.dst.addr.ipv4), - IP_ARGS(c->rev_key.src.addr.ipv4), - IP_ARGS(c->rev_key.dst.addr.ipv4), - ntohs(c->key.src.port), ntohs(c->key.dst.port), - ntohs(c->rev_key.src.port), ntohs(c->rev_key.dst.port), - c->key.zone, c->rev_key.zone, c->key.nw_proto, - c->rev_key.nw_proto); - } else { - char ip6_s[INET6_ADDRSTRLEN]; - inet_ntop(AF_INET6, &c->key.src.addr.ipv6, ip6_s, sizeof ip6_s); - char ip6_d[INET6_ADDRSTRLEN]; - inet_ntop(AF_INET6, &c->key.dst.addr.ipv6, ip6_d, sizeof ip6_d); - char ip6_rs[INET6_ADDRSTRLEN]; - inet_ntop(AF_INET6, &c->rev_key.src.addr.ipv6, ip6_rs, - sizeof ip6_rs); - char ip6_rd[INET6_ADDRSTRLEN]; - inet_ntop(AF_INET6, &c->rev_key.dst.addr.ipv6, ip6_rd, - sizeof ip6_rd); - - CT_VLOG(rl_on, vll, "%s: src ip %s dst ip %s rev src ip %s" - " rev dst ip %s src/dst ports %"PRIu16"/%"PRIu16 - " rev src/dst ports %"PRIu16"/%"PRIu16" zone/rev zone " - "%"PRIu16"/%"PRIu16" nw_proto/rev nw_proto " - "%"PRIu8"/%"PRIu8, log_msg, ip6_s, ip6_d, ip6_rs, - ip6_rd, ntohs(c->key.src.port), ntohs(c->key.dst.port), - ntohs(c->rev_key.src.port), ntohs(c->rev_key.dst.port), - c->key.zone, c->rev_key.zone, c->key.nw_proto, - c->rev_key.nw_proto); - } - } -} - /* Initializes the connection tracker 'ct'. The caller is responsible for * calling 'conntrack_destroy()', when the instance is not needed anymore */ struct conntrack * @@ -467,34 +410,28 @@ zone_limit_delete(struct conntrack *ct, uint16_t zone) } static void -conn_clean_cmn(struct conntrack *ct, struct conn *conn) +conn_clean(struct conntrack *ct, struct conn *conn) OVS_REQUIRES(ct->ct_lock) { + struct zone_limit *zl; + uint32_t hash; + if (conn->alg) { - expectation_clean(ct, &conn->key); + expectation_clean(ct, &conn->key_node[CT_DIR_FWD].key); } - uint32_t hash = conn_key_hash(&conn->key, ct->hash_basis); - cmap_remove(&ct->conns, &conn->cm_node, hash); + hash = conn_key_hash(&conn->key_node[CT_DIR_FWD].key, ct->hash_basis); + cmap_remove(&ct->conns, &conn->key_node[CT_DIR_FWD].cm_node, hash); - struct zone_limit *zl = zone_limit_lookup(ct, conn->admit_zone); + zl = zone_limit_lookup(ct, conn->admit_zone); if (zl && zl->czl.zone_limit_seq == conn->zone_limit_seq) { atomic_count_dec(&zl->czl.count); } -} -/* Must be called with 'conn' of 'conn_type' CT_CONN_TYPE_DEFAULT. Also - * removes the associated nat 'conn' from the lookup datastructures. */ -static void -conn_clean(struct conntrack *ct, struct conn *conn) - OVS_REQUIRES(ct->ct_lock) -{ - ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT); - - conn_clean_cmn(ct, conn); - if (conn->nat_conn) { - uint32_t hash = conn_key_hash(&conn->nat_conn->key, ct->hash_basis); - cmap_remove(&ct->conns, &conn->nat_conn->cm_node, hash); + if (conn->nat_action) { + hash = conn_key_hash(&conn->key_node[CT_DIR_REV].key, + ct->hash_basis); + cmap_remove(&ct->conns, &conn->key_node[CT_DIR_REV].cm_node, hash); } ovs_list_remove(&conn->exp_node); conn->cleaned = true; @@ -502,33 +439,26 @@ conn_clean(struct conntrack *ct, struct conn *conn) atomic_count_dec(&ct->n_conn); } -static void -conn_clean_one(struct conntrack *ct, struct conn *conn) - OVS_REQUIRES(ct->ct_lock) -{ - conn_clean_cmn(ct, conn); - if (conn->conn_type == CT_CONN_TYPE_DEFAULT) { - ovs_list_remove(&conn->exp_node); - conn->cleaned = true; - atomic_count_dec(&ct->n_conn); - } - ovsrcu_postpone(delete_conn_one, conn); -} - /* Destroys the connection tracker 'ct' and frees all the allocated memory. * The caller of this function must already have shut down packet input * and PMD threads (which would have been quiesced). */ void conntrack_destroy(struct conntrack *ct) { + struct conn_key_node *keyn; struct conn *conn; + latch_set(&ct->clean_thread_exit); pthread_join(ct->clean_thread, NULL); latch_destroy(&ct->clean_thread_exit); ovs_mutex_lock(&ct->ct_lock); - CMAP_FOR_EACH (conn, cm_node, &ct->conns) { - conn_clean_one(ct, conn); + CMAP_FOR_EACH (keyn, cm_node, &ct->conns) { + if (keyn->key.dir != CT_DIR_FWD) { + continue; + } + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); + conn_clean(ct, conn); } cmap_destroy(&ct->conns); @@ -573,31 +503,31 @@ conn_key_lookup(struct conntrack *ct, const struct conn_key *key, uint32_t hash, long long now, struct conn **conn_out, bool *reply) { - struct conn *conn; + struct conn_key_node *keyn; + struct conn *conn = NULL; bool found = false; - CMAP_FOR_EACH_WITH_HASH (conn, cm_node, hash, &ct->conns) { - if (!conn_key_cmp(&conn->key, key) && !conn_expired(conn, now)) { - found = true; - if (reply) { - *reply = false; - } - break; - } - if (!conn_key_cmp(&conn->rev_key, key) && !conn_expired(conn, now)) { - found = true; - if (reply) { - *reply = true; + CMAP_FOR_EACH_WITH_HASH (keyn, cm_node, hash, &ct->conns) { + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); + for (int i = CT_DIR_FWD; i < CT_DIR_MAX; i++) { + if (!conn_key_cmp(&conn->key_node[i].key, key) && + !conn_expired(conn, now)) { + found = true; + if (reply) { + *reply = i; + } + goto out_found; } - break; } } +out_found: if (found && conn_out) { *conn_out = conn; } else if (conn_out) { *conn_out = NULL; } + return found; } @@ -631,7 +561,7 @@ write_ct_md(struct dp_packet *pkt, uint16_t zone, const struct conn *conn, if (conn->alg_related) { key = &conn->parent_key; } else { - key = &conn->key; + key = &conn->key_node[CT_DIR_FWD].key; } } else if (alg_exp) { pkt->md.ct_mark = alg_exp->parent_mark; @@ -754,21 +684,24 @@ handle_alg_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx, static void pat_packet(struct dp_packet *pkt, const struct conn *conn) { + const struct conn_key *rev_key = &conn->key_node[CT_DIR_REV].key, + *key = &conn->key_node[CT_DIR_FWD].key; + if (conn->nat_action & NAT_ACTION_SRC) { - if (conn->key.nw_proto == IPPROTO_TCP) { + if (key->nw_proto == IPPROTO_TCP) { struct tcp_header *th = dp_packet_l4(pkt); - packet_set_tcp_port(pkt, conn->rev_key.dst.port, th->tcp_dst); - } else if (conn->key.nw_proto == IPPROTO_UDP) { + packet_set_tcp_port(pkt, rev_key->dst.port, th->tcp_dst); + } else if (key->nw_proto == IPPROTO_UDP) { struct udp_header *uh = dp_packet_l4(pkt); - packet_set_udp_port(pkt, conn->rev_key.dst.port, uh->udp_dst); + packet_set_udp_port(pkt, rev_key->dst.port, uh->udp_dst); } } else if (conn->nat_action & NAT_ACTION_DST) { - if (conn->key.nw_proto == IPPROTO_TCP) { - packet_set_tcp_port(pkt, conn->rev_key.dst.port, - conn->rev_key.src.port); - } else if (conn->key.nw_proto == IPPROTO_UDP) { - packet_set_udp_port(pkt, conn->rev_key.dst.port, - conn->rev_key.src.port); + if (key->nw_proto == IPPROTO_TCP) { + packet_set_tcp_port(pkt, rev_key->dst.port, + rev_key->src.port); + } else if (key->nw_proto == IPPROTO_UDP) { + packet_set_udp_port(pkt, rev_key->dst.port, + rev_key->src.port); } } } @@ -776,32 +709,35 @@ pat_packet(struct dp_packet *pkt, const struct conn *conn) static void nat_packet(struct dp_packet *pkt, const struct conn *conn, bool related) { + const struct conn_key *rev_key = &conn->key_node[CT_DIR_REV].key, + *key = &conn->key_node[CT_DIR_FWD].key; + if (conn->nat_action & NAT_ACTION_SRC) { pkt->md.ct_state |= CS_SRC_NAT; - if (conn->key.dl_type == htons(ETH_TYPE_IP)) { + if (key->dl_type == htons(ETH_TYPE_IP)) { struct ip_header *nh = dp_packet_l3(pkt); packet_set_ipv4_addr(pkt, &nh->ip_src, - conn->rev_key.dst.addr.ipv4); + rev_key->dst.addr.ipv4); } else { struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt); - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, nh6->ip6_src.be32, - &conn->rev_key.dst.addr.ipv6, true); + &rev_key->dst.addr.ipv6, true); } if (!related) { pat_packet(pkt, conn); } } else if (conn->nat_action & NAT_ACTION_DST) { pkt->md.ct_state |= CS_DST_NAT; - if (conn->key.dl_type == htons(ETH_TYPE_IP)) { + if (key->dl_type == htons(ETH_TYPE_IP)) { struct ip_header *nh = dp_packet_l3(pkt); packet_set_ipv4_addr(pkt, &nh->ip_dst, - conn->rev_key.src.addr.ipv4); + rev_key->src.addr.ipv4); } else { struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt); - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, nh6->ip6_dst.be32, - &conn->rev_key.src.addr.ipv6, true); + &rev_key->src.addr.ipv6, true); } if (!related) { pat_packet(pkt, conn); @@ -812,19 +748,21 @@ nat_packet(struct dp_packet *pkt, const struct conn *conn, bool related) static void un_pat_packet(struct dp_packet *pkt, const struct conn *conn) { + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key; + if (conn->nat_action & NAT_ACTION_SRC) { - if (conn->key.nw_proto == IPPROTO_TCP) { + if (key->nw_proto == IPPROTO_TCP) { struct tcp_header *th = dp_packet_l4(pkt); - packet_set_tcp_port(pkt, th->tcp_src, conn->key.src.port); - } else if (conn->key.nw_proto == IPPROTO_UDP) { + packet_set_tcp_port(pkt, th->tcp_src, key->src.port); + } else if (key->nw_proto == IPPROTO_UDP) { struct udp_header *uh = dp_packet_l4(pkt); - packet_set_udp_port(pkt, uh->udp_src, conn->key.src.port); + packet_set_udp_port(pkt, uh->udp_src, key->src.port); } } else if (conn->nat_action & NAT_ACTION_DST) { - if (conn->key.nw_proto == IPPROTO_TCP) { - packet_set_tcp_port(pkt, conn->key.dst.port, conn->key.src.port); - } else if (conn->key.nw_proto == IPPROTO_UDP) { - packet_set_udp_port(pkt, conn->key.dst.port, conn->key.src.port); + if (key->nw_proto == IPPROTO_TCP) { + packet_set_tcp_port(pkt, key->dst.port, key->src.port); + } else if (key->nw_proto == IPPROTO_UDP) { + packet_set_udp_port(pkt, key->dst.port, key->src.port); } } } @@ -832,23 +770,25 @@ un_pat_packet(struct dp_packet *pkt, const struct conn *conn) static void reverse_pat_packet(struct dp_packet *pkt, const struct conn *conn) { + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key; + if (conn->nat_action & NAT_ACTION_SRC) { - if (conn->key.nw_proto == IPPROTO_TCP) { + if (key->nw_proto == IPPROTO_TCP) { struct tcp_header *th_in = dp_packet_l4(pkt); - packet_set_tcp_port(pkt, conn->key.src.port, + packet_set_tcp_port(pkt, key->src.port, th_in->tcp_dst); - } else if (conn->key.nw_proto == IPPROTO_UDP) { + } else if (key->nw_proto == IPPROTO_UDP) { struct udp_header *uh_in = dp_packet_l4(pkt); - packet_set_udp_port(pkt, conn->key.src.port, + packet_set_udp_port(pkt, key->src.port, uh_in->udp_dst); } } else if (conn->nat_action & NAT_ACTION_DST) { - if (conn->key.nw_proto == IPPROTO_TCP) { - packet_set_tcp_port(pkt, conn->key.src.port, - conn->key.dst.port); - } else if (conn->key.nw_proto == IPPROTO_UDP) { - packet_set_udp_port(pkt, conn->key.src.port, - conn->key.dst.port); + if (key->nw_proto == IPPROTO_TCP) { + packet_set_tcp_port(pkt, key->src.port, + key->dst.port); + } else if (key->nw_proto == IPPROTO_UDP) { + packet_set_udp_port(pkt, key->src.port, + key->dst.port); } } } @@ -856,6 +796,7 @@ reverse_pat_packet(struct dp_packet *pkt, const struct conn *conn) static void reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn) { + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key; char *tail = dp_packet_tail(pkt); uint16_t pad = dp_packet_l2_pad_size(pkt); struct conn_key inner_key; @@ -863,7 +804,7 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn) uint16_t orig_l3_ofs = pkt->l3_ofs; uint16_t orig_l4_ofs = pkt->l4_ofs; - if (conn->key.dl_type == htons(ETH_TYPE_IP)) { + if (key->dl_type == htons(ETH_TYPE_IP)) { struct ip_header *nh = dp_packet_l3(pkt); struct icmp_header *icmp = dp_packet_l4(pkt); struct ip_header *inner_l3 = (struct ip_header *) (icmp + 1); @@ -876,10 +817,10 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn) if (conn->nat_action & NAT_ACTION_SRC) { packet_set_ipv4_addr(pkt, &inner_l3->ip_src, - conn->key.src.addr.ipv4); + key->src.addr.ipv4); } else if (conn->nat_action & NAT_ACTION_DST) { packet_set_ipv4_addr(pkt, &inner_l3->ip_dst, - conn->key.dst.addr.ipv4); + key->dst.addr.ipv4); } reverse_pat_packet(pkt, conn); @@ -899,13 +840,13 @@ reverse_nat_packet(struct dp_packet *pkt, const struct conn *conn) pkt->l4_ofs += inner_l4 - (char *) icmp6; if (conn->nat_action & NAT_ACTION_SRC) { - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, inner_l3_6->ip6_src.be32, - &conn->key.src.addr.ipv6, true); + &key->src.addr.ipv6, true); } else if (conn->nat_action & NAT_ACTION_DST) { - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, inner_l3_6->ip6_dst.be32, - &conn->key.dst.addr.ipv6, true); + &key->dst.addr.ipv6, true); } reverse_pat_packet(pkt, conn); icmp6->icmp6_base.icmp6_cksum = 0; @@ -920,17 +861,19 @@ static void un_nat_packet(struct dp_packet *pkt, const struct conn *conn, bool related) { + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key; + if (conn->nat_action & NAT_ACTION_SRC) { pkt->md.ct_state |= CS_DST_NAT; - if (conn->key.dl_type == htons(ETH_TYPE_IP)) { + if (key->dl_type == htons(ETH_TYPE_IP)) { struct ip_header *nh = dp_packet_l3(pkt); packet_set_ipv4_addr(pkt, &nh->ip_dst, - conn->key.src.addr.ipv4); + key->src.addr.ipv4); } else { struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt); - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, nh6->ip6_dst.be32, - &conn->key.src.addr.ipv6, true); + &key->src.addr.ipv6, true); } if (OVS_UNLIKELY(related)) { @@ -940,15 +883,15 @@ un_nat_packet(struct dp_packet *pkt, const struct conn *conn, } } else if (conn->nat_action & NAT_ACTION_DST) { pkt->md.ct_state |= CS_SRC_NAT; - if (conn->key.dl_type == htons(ETH_TYPE_IP)) { + if (key->dl_type == htons(ETH_TYPE_IP)) { struct ip_header *nh = dp_packet_l3(pkt); packet_set_ipv4_addr(pkt, &nh->ip_src, - conn->key.dst.addr.ipv4); + key->dst.addr.ipv4); } else { struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt); - packet_set_ipv6_addr(pkt, conn->key.nw_proto, + packet_set_ipv6_addr(pkt, key->nw_proto, nh6->ip6_src.be32, - &conn->key.dst.addr.ipv6, true); + &key->dst.addr.ipv6, true); } if (OVS_UNLIKELY(related)) { @@ -966,7 +909,7 @@ conn_seq_skew_set(struct conntrack *ct, const struct conn *conn_in, { struct conn *conn; ovs_mutex_unlock(&conn_in->lock); - conn_lookup(ct, &conn_in->key, now, &conn, NULL); + conn_lookup(ct, &conn_in->key_node[CT_DIR_FWD].key, now, &conn, NULL); ovs_mutex_lock(&conn_in->lock); if (conn && seq_skew) { @@ -1004,7 +947,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, OVS_REQUIRES(ct->ct_lock) { struct conn *nc = NULL; - struct conn *nat_conn = NULL; + uint32_t rev_hash = ctx->hash; if (!valid_new(pkt, &ctx->key)) { pkt->md.ct_state = CS_INVALID; @@ -1018,6 +961,8 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, } if (commit) { + struct conn_key_node *fwd_key_node, *rev_key_node; + struct zone_limit *zl = zone_limit_lookup_or_default(ct, ctx->key.zone); if (zl && atomic_count_get(&zl->czl.count) >= zl->czl.limit) { @@ -1032,9 +977,12 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, } nc = new_conn(ct, pkt, &ctx->key, now, tp_id); - memcpy(&nc->key, &ctx->key, sizeof nc->key); - memcpy(&nc->rev_key, &nc->key, sizeof nc->rev_key); - conn_key_reverse(&nc->rev_key); + fwd_key_node = &nc->key_node[CT_DIR_FWD]; + rev_key_node = &nc->key_node[CT_DIR_REV]; + memcpy(&fwd_key_node->key, &ctx->key, sizeof fwd_key_node->key); + memcpy(&rev_key_node->key, &fwd_key_node->key, + sizeof rev_key_node->key); + conn_key_reverse(&rev_key_node->key); if (ct_verify_helper(helper, ct_alg_ctl)) { nc->alg = nullable_xstrdup(helper); @@ -1049,45 +997,33 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, if (nat_action_info) { nc->nat_action = nat_action_info->nat_action; - nat_conn = xzalloc(sizeof *nat_conn); if (alg_exp) { if (alg_exp->nat_rpl_dst) { - nc->rev_key.dst.addr = alg_exp->alg_nat_repl_addr; + rev_key_node->key.dst.addr = alg_exp->alg_nat_repl_addr; nc->nat_action = NAT_ACTION_SRC; } else { - nc->rev_key.src.addr = alg_exp->alg_nat_repl_addr; + rev_key_node->key.src.addr = alg_exp->alg_nat_repl_addr; nc->nat_action = NAT_ACTION_DST; } } else { - memcpy(nat_conn, nc, sizeof *nat_conn); - bool nat_res = nat_get_unique_tuple(ct, nc, nat_conn, - nat_action_info); + bool nat_res = nat_get_unique_tuple(ct, nc, nat_action_info); if (!nat_res) { goto nat_res_exhaustion; } - - /* Update nc with nat adjustments made to nat_conn by - * nat_get_unique_tuple(). */ - memcpy(nc, nat_conn, sizeof *nc); } nat_packet(pkt, nc, ctx->icmp_related); - memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key); - memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key); - nat_conn->conn_type = CT_CONN_TYPE_UN_NAT; - nat_conn->nat_action = 0; - nat_conn->alg = NULL; - nat_conn->nat_conn = NULL; - uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis); - cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash); + rev_hash = conn_key_hash(&rev_key_node->key, ct->hash_basis); + rev_key_node->key.dir = CT_DIR_REV; + cmap_insert(&ct->conns, &rev_key_node->cm_node, rev_hash); } - nc->nat_conn = nat_conn; ovs_mutex_init_adaptive(&nc->lock); - nc->conn_type = CT_CONN_TYPE_DEFAULT; - cmap_insert(&ct->conns, &nc->cm_node, ctx->hash); + fwd_key_node->key.dir = CT_DIR_FWD; + cmap_insert(&ct->conns, &fwd_key_node->cm_node, ctx->hash); + atomic_count_inc(&ct->n_conn); ctx->conn = nc; /* For completeness. */ if (zl) { @@ -1107,7 +1043,6 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, * firewall rules or a separate firewall. Also using zone partitioning * can limit DoS impact. */ nat_res_exhaustion: - free(nat_conn); ovs_list_remove(&nc->exp_node); delete_conn_cmn(nc); static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); @@ -1121,7 +1056,6 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt, struct conn_lookup_ctx *ctx, struct conn *conn, long long now) { - ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT); bool create_new_conn = false; if (ctx->icmp_related) { @@ -1149,7 +1083,8 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt, break; case CT_UPDATE_NEW: ovs_mutex_lock(&ct->ct_lock); - if (conn_lookup(ct, &conn->key, now, NULL, NULL)) { + if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key, + now, NULL, NULL)) { conn_clean(ct, conn); } ovs_mutex_unlock(&ct->ct_lock); @@ -1330,11 +1265,12 @@ initial_conn_lookup(struct conntrack *ct, struct conn_lookup_ctx *ctx, if (natted) { if (OVS_LIKELY(ctx->conn)) { ctx->reply = !ctx->reply; - ctx->key = ctx->reply ? ctx->conn->rev_key : ctx->conn->key; + ctx->key = ctx->conn->key_node[ctx->reply ? + CT_DIR_REV : CT_DIR_FWD].key; ctx->hash = conn_key_hash(&ctx->key, ct->hash_basis); } else { /* A lookup failure does not necessarily imply that an - * error occurred, it may simply indicate that a conn got + * error occurred, it may simply indicate that a ctx->conn got * removed during the recirculation. */ COVERAGE_INC(conntrack_lookup_natted_miss); conn_key_reverse(&ctx->key); @@ -1364,32 +1300,14 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, /* Delete found entry if in wrong direction. 'force' implies commit. */ if (OVS_UNLIKELY(force && ctx->reply && conn)) { ovs_mutex_lock(&ct->ct_lock); - if (conn_lookup(ct, &conn->key, now, NULL, NULL)) { + if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key, + now, NULL, NULL)) { conn_clean(ct, conn); } ovs_mutex_unlock(&ct->ct_lock); conn = NULL; } - if (OVS_LIKELY(conn)) { - if (conn->conn_type == CT_CONN_TYPE_UN_NAT) { - - ctx->reply = true; - struct conn *rev_conn = conn; /* Save for debugging. */ - uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis); - conn_key_lookup(ct, &ctx->key, hash, now, &conn, &ctx->reply); - - if (!conn) { - pkt->md.ct_state |= CS_INVALID; - write_ct_md(pkt, zone, NULL, NULL, NULL); - char *log_msg = xasprintf("Missing parent conn %p", rev_conn); - ct_print_conn_info(rev_conn, log_msg, VLL_INFO, true, true); - free(log_msg); - return; - } - } - } - enum ct_alg_ctl_type ct_alg_ctl = get_alg_ctl_type(pkt, tp_src, tp_dst, helper); @@ -1482,7 +1400,8 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, struct conn *conn = packet->md.conn; if (OVS_UNLIKELY(packet->md.ct_state == CS_INVALID)) { write_ct_md(packet, zone, NULL, NULL, NULL); - } else if (conn && conn->key.zone == zone && !force + } else if (conn && + conn->key_node[CT_DIR_FWD].key.zone == zone && !force && !get_alg_ctl_type(packet, tp_src, tp_dst, helper)) { process_one_fast(zone, setmark, setlabel, nat_action_info, conn, packet); @@ -2263,7 +2182,7 @@ nat_ipv6_addr_increment(struct in6_addr *ipv6, uint32_t increment) } static uint32_t -nat_range_hash(const struct conn *conn, uint32_t basis, +nat_range_hash(struct conn_key *key, uint32_t basis, const struct nat_action_info_t *nat_info) { uint32_t hash = basis; @@ -2273,11 +2192,11 @@ nat_range_hash(const struct conn *conn, uint32_t basis, hash = hash_add(hash, (nat_info->max_port << 16) | nat_info->min_port); - hash = ct_endpoint_hash_add(hash, &conn->key.src); - hash = ct_endpoint_hash_add(hash, &conn->key.dst); - hash = hash_add(hash, (OVS_FORCE uint32_t) conn->key.dl_type); - hash = hash_add(hash, conn->key.nw_proto); - hash = hash_add(hash, conn->key.zone); + hash = ct_endpoint_hash_add(hash, &key->src); + hash = ct_endpoint_hash_add(hash, &key->dst); + hash = hash_add(hash, (OVS_FORCE uint32_t) key->dl_type); + hash = hash_add(hash, key->nw_proto); + hash = hash_add(hash, key->zone); /* The purpose of the second parameter is to distinguish hashes of data of * different length; our data always has the same length so there is no @@ -2343,7 +2262,7 @@ get_addr_in_range(union ct_addr *min, union ct_addr *max, } static void -get_initial_addr(const struct conn *conn, union ct_addr *min, +get_initial_addr(struct conn_key *key, union ct_addr *min, union ct_addr *max, union ct_addr *curr, uint32_t hash, bool ipv4, const struct nat_action_info_t *nat_info) @@ -2353,9 +2272,9 @@ get_initial_addr(const struct conn *conn, union ct_addr *min, /* All-zero case. */ if (!memcmp(min, &zero_ip, sizeof *min)) { if (nat_info->nat_action & NAT_ACTION_SRC) { - *curr = conn->key.src.addr; + *curr = key->src.addr; } else if (nat_info->nat_action & NAT_ACTION_DST) { - *curr = conn->key.dst.addr; + *curr = key->dst.addr; } } else { get_addr_in_range(min, max, curr, hash, ipv4); @@ -2438,39 +2357,42 @@ next_addr_in_range_guarded(union ct_addr *curr, union ct_addr *min, * * If none can be found, return exhaustion to the caller. */ static bool -nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn, - struct conn *nat_conn, +nat_get_unique_tuple(struct conntrack *ct, struct conn *conn, const struct nat_action_info_t *nat_info) { union ct_addr min_addr = {0}, max_addr = {0}, curr_addr = {0}, guard_addr = {0}; - uint32_t hash = nat_range_hash(conn, ct->hash_basis, nat_info); - bool pat_proto = conn->key.nw_proto == IPPROTO_TCP || - conn->key.nw_proto == IPPROTO_UDP; + struct conn_key *fwd_key = &conn->key_node[CT_DIR_FWD].key, + *rev_key = &conn->key_node[CT_DIR_REV].key; + bool pat_proto = fwd_key->nw_proto == IPPROTO_TCP || + fwd_key->nw_proto == IPPROTO_UDP; uint16_t min_dport, max_dport, curr_dport; uint16_t min_sport, max_sport, curr_sport; + uint32_t hash; + + hash = nat_range_hash(fwd_key, ct->hash_basis, nat_info); min_addr = nat_info->min_addr; max_addr = nat_info->max_addr; - get_initial_addr(conn, &min_addr, &max_addr, &curr_addr, hash, - (conn->key.dl_type == htons(ETH_TYPE_IP)), nat_info); + get_initial_addr(fwd_key, &min_addr, &max_addr, &curr_addr, hash, + (fwd_key->dl_type == htons(ETH_TYPE_IP)), nat_info); /* Save the address we started from so that * we can stop once we reach it. */ guard_addr = curr_addr; - set_sport_range(nat_info, &conn->key, hash, &curr_sport, + set_sport_range(nat_info, fwd_key, hash, &curr_sport, &min_sport, &max_sport); - set_dport_range(nat_info, &conn->key, hash, &curr_dport, + set_dport_range(nat_info, fwd_key, hash, &curr_dport, &min_dport, &max_dport); another_round: - store_addr_to_key(&curr_addr, &nat_conn->rev_key, + store_addr_to_key(&curr_addr, rev_key, nat_info->nat_action); if (!pat_proto) { - if (!conn_lookup(ct, &nat_conn->rev_key, + if (!conn_lookup(ct, rev_key, time_msec(), NULL, NULL)) { return true; } @@ -2479,10 +2401,10 @@ another_round: } FOR_EACH_PORT_IN_RANGE(curr_dport, min_dport, max_dport) { - nat_conn->rev_key.src.port = htons(curr_dport); + rev_key->src.port = htons(curr_dport); FOR_EACH_PORT_IN_RANGE(curr_sport, min_sport, max_sport) { - nat_conn->rev_key.dst.port = htons(curr_sport); - if (!conn_lookup(ct, &nat_conn->rev_key, + rev_key->dst.port = htons(curr_sport); + if (!conn_lookup(ct, rev_key, time_msec(), NULL, NULL)) { return true; } @@ -2494,7 +2416,7 @@ another_round: next_addr: if (next_addr_in_range_guarded(&curr_addr, &min_addr, &max_addr, &guard_addr, - conn->key.dl_type == htons(ETH_TYPE_IP))) { + fwd_key->dl_type == htons(ETH_TYPE_IP))) { return false; } @@ -2506,9 +2428,9 @@ conn_update(struct conntrack *ct, struct conn *conn, struct dp_packet *pkt, struct conn_lookup_ctx *ctx, long long now) { ovs_mutex_lock(&conn->lock); + uint8_t nw_proto = conn->key_node[CT_DIR_FWD].key.nw_proto; enum ct_update_res update_res = - l4_protos[conn->key.nw_proto]->conn_update(ct, conn, pkt, ctx->reply, - now); + l4_protos[nw_proto]->conn_update(ct, conn, pkt, ctx->reply, now); ovs_mutex_unlock(&conn->lock); return update_res; } @@ -2516,13 +2438,10 @@ conn_update(struct conntrack *ct, struct conn *conn, struct dp_packet *pkt, static bool conn_expired(struct conn *conn, long long now) { - if (conn->conn_type == CT_CONN_TYPE_DEFAULT) { - ovs_mutex_lock(&conn->lock); - bool expired = now >= conn->expiration ? true : false; - ovs_mutex_unlock(&conn->lock); - return expired; - } - return false; + ovs_mutex_lock(&conn->lock); + bool expired = now >= conn->expiration ? true : false; + ovs_mutex_unlock(&conn->lock); + return expired; } static bool @@ -2548,19 +2467,7 @@ delete_conn_cmn(struct conn *conn) static void delete_conn(struct conn *conn) { - ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT); ovs_mutex_destroy(&conn->lock); - free(conn->nat_conn); - delete_conn_cmn(conn); -} - -/* Only used by conn_clean_one(). */ -static void -delete_conn_one(struct conn *conn) -{ - if (conn->conn_type == CT_CONN_TYPE_DEFAULT) { - ovs_mutex_destroy(&conn->lock); - } delete_conn_cmn(conn); } @@ -2652,11 +2559,14 @@ static void conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, long long now) { + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key, + *rev_key = &conn->key_node[CT_DIR_REV].key; + memset(entry, 0, sizeof *entry); - conn_key_to_tuple(&conn->key, &entry->tuple_orig); - conn_key_to_tuple(&conn->rev_key, &entry->tuple_reply); + conn_key_to_tuple(key, &entry->tuple_orig); + conn_key_to_tuple(rev_key, &entry->tuple_reply); - entry->zone = conn->key.zone; + entry->zone = key->zone; ovs_mutex_lock(&conn->lock); entry->mark = conn->mark; @@ -2664,7 +2574,7 @@ conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, long long expiration = conn->expiration - now; - struct ct_l4_proto *class = l4_protos[conn->key.nw_proto]; + struct ct_l4_proto *class = l4_protos[key->nw_proto]; if (class->conn_get_protoinfo) { class->conn_get_protoinfo(conn, &entry->protoinfo); } @@ -2712,10 +2622,12 @@ conntrack_dump_next(struct conntrack_dump *dump, struct ct_dpif_entry *entry) if (!cm_node) { break; } + struct conn_key_node *keyn; struct conn *conn; - INIT_CONTAINER(conn, cm_node, cm_node); - if ((!dump->filter_zone || conn->key.zone == dump->zone) && - (conn->conn_type != CT_CONN_TYPE_UN_NAT)) { + INIT_CONTAINER(keyn, cm_node, cm_node); + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); + if ((!dump->filter_zone || keyn->key.zone == dump->zone) && + (keyn->key.dir == CT_DIR_FWD)) { conn_to_ct_dpif_entry(conn, entry, now); return 0; } @@ -2733,12 +2645,17 @@ conntrack_dump_done(struct conntrack_dump *dump OVS_UNUSED) int conntrack_flush(struct conntrack *ct, const uint16_t *zone) { + struct conn_key_node *keyn; struct conn *conn; ovs_mutex_lock(&ct->ct_lock); - CMAP_FOR_EACH (conn, cm_node, &ct->conns) { - if (!zone || *zone == conn->key.zone) { - conn_clean_one(ct, conn); + CMAP_FOR_EACH (keyn, cm_node, &ct->conns) { + if (keyn->key.dir != CT_DIR_FWD) { + continue; + } + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); + if (!zone || *zone == keyn->key.zone) { + conn_clean(ct, conn); } } ovs_mutex_unlock(&ct->ct_lock); @@ -2759,10 +2676,10 @@ conntrack_flush_tuple(struct conntrack *ct, const struct ct_dpif_tuple *tuple, ovs_mutex_lock(&ct->ct_lock); conn_lookup(ct, &key, time_msec(), &conn, NULL); - if (conn && conn->conn_type == CT_CONN_TYPE_DEFAULT) { + if (conn) { conn_clean(ct, conn); } else { - VLOG_WARN("Must flush tuple using the original pre-NATed tuple"); + VLOG_WARN("Tuple not found"); error = ENOENT; } @@ -2906,50 +2823,52 @@ expectation_create(struct conntrack *ct, ovs_be16 dst_port, const struct conn *parent_conn, bool reply, bool src_ip_wc, bool skip_nat) { + const struct conn_key *pconn_key = &parent_conn->key_node[CT_DIR_FWD].key, + *pconn_rev_key = &parent_conn->key_node[CT_DIR_REV].key; union ct_addr src_addr; union ct_addr dst_addr; union ct_addr alg_nat_repl_addr; struct alg_exp_node *alg_exp_node = xzalloc(sizeof *alg_exp_node); if (reply) { - src_addr = parent_conn->key.src.addr; - dst_addr = parent_conn->key.dst.addr; + src_addr = pconn_key->src.addr; + dst_addr = pconn_key->dst.addr; alg_exp_node->nat_rpl_dst = true; if (skip_nat) { alg_nat_repl_addr = dst_addr; } else if (parent_conn->nat_action & NAT_ACTION_DST) { - alg_nat_repl_addr = parent_conn->rev_key.src.addr; + alg_nat_repl_addr = pconn_rev_key->src.addr; alg_exp_node->nat_rpl_dst = false; } else { - alg_nat_repl_addr = parent_conn->rev_key.dst.addr; + alg_nat_repl_addr = pconn_rev_key->dst.addr; } } else { - src_addr = parent_conn->rev_key.src.addr; - dst_addr = parent_conn->rev_key.dst.addr; + src_addr = pconn_rev_key->src.addr; + dst_addr = pconn_rev_key->dst.addr; alg_exp_node->nat_rpl_dst = false; if (skip_nat) { alg_nat_repl_addr = src_addr; } else if (parent_conn->nat_action & NAT_ACTION_DST) { - alg_nat_repl_addr = parent_conn->key.dst.addr; + alg_nat_repl_addr = pconn_key->dst.addr; alg_exp_node->nat_rpl_dst = true; } else { - alg_nat_repl_addr = parent_conn->key.src.addr; + alg_nat_repl_addr = pconn_key->src.addr; } } if (src_ip_wc) { memset(&src_addr, 0, sizeof src_addr); } - alg_exp_node->key.dl_type = parent_conn->key.dl_type; - alg_exp_node->key.nw_proto = parent_conn->key.nw_proto; - alg_exp_node->key.zone = parent_conn->key.zone; + alg_exp_node->key.dl_type = pconn_key->dl_type; + alg_exp_node->key.nw_proto = pconn_key->nw_proto; + alg_exp_node->key.zone = pconn_key->zone; alg_exp_node->key.src.addr = src_addr; alg_exp_node->key.dst.addr = dst_addr; alg_exp_node->key.src.port = ALG_WC_SRC_PORT; alg_exp_node->key.dst.port = dst_port; alg_exp_node->parent_mark = parent_conn->mark; alg_exp_node->parent_label = parent_conn->label; - memcpy(&alg_exp_node->parent_key, &parent_conn->key, + memcpy(&alg_exp_node->parent_key, pconn_key, sizeof alg_exp_node->parent_key); /* Take the write lock here because it is almost 100% * likely that the lookup will fail and @@ -3201,12 +3120,16 @@ process_ftp_ctl_v4(struct conntrack *ct, switch (mode) { case CT_FTP_MODE_ACTIVE: - *v4_addr_rep = conn_for_expectation->rev_key.dst.addr.ipv4; - conn_ipv4_addr = conn_for_expectation->key.src.addr.ipv4; + *v4_addr_rep = + conn_for_expectation->key_node[CT_DIR_REV].key.dst.addr.ipv4; + conn_ipv4_addr = + conn_for_expectation->key_node[CT_DIR_FWD].key.src.addr.ipv4; break; case CT_FTP_MODE_PASSIVE: - *v4_addr_rep = conn_for_expectation->key.dst.addr.ipv4; - conn_ipv4_addr = conn_for_expectation->rev_key.src.addr.ipv4; + *v4_addr_rep = + conn_for_expectation->key_node[CT_DIR_FWD].key.dst.addr.ipv4; + conn_ipv4_addr = + conn_for_expectation->key_node[CT_DIR_REV].key.src.addr.ipv4; break; case CT_TFTP_MODE: default: @@ -3306,17 +3229,20 @@ process_ftp_ctl_v6(struct conntrack *ct, switch (*mode) { case CT_FTP_MODE_ACTIVE: - *v6_addr_rep = conn_for_expectation->rev_key.dst.addr; + *v6_addr_rep = + conn_for_expectation->key_node[CT_DIR_REV].key.dst.addr; /* Although most servers will block this exploit, there may be some * less well managed. */ if (memcmp(&ip6_addr, &v6_addr_rep->ipv6, sizeof ip6_addr) && - memcmp(&ip6_addr, &conn_for_expectation->key.src.addr.ipv6, + memcmp(&ip6_addr, + &conn_for_expectation->key_node[CT_DIR_FWD].key.src.addr.ipv6, sizeof ip6_addr)) { return CT_FTP_CTL_INVALID; } break; case CT_FTP_MODE_PASSIVE: - *v6_addr_rep = conn_for_expectation->key.dst.addr; + *v6_addr_rep = + conn_for_expectation->key_node[CT_DIR_FWD].key.dst.addr; break; case CT_TFTP_MODE: default: @@ -3477,7 +3403,8 @@ handle_tftp_ctl(struct conntrack *ct, long long now OVS_UNUSED, enum ftp_ctl_pkt ftp_ctl OVS_UNUSED, bool nat OVS_UNUSED) { - expectation_create(ct, conn_for_expectation->key.src.port, + expectation_create(ct, + conn_for_expectation->key_node[CT_DIR_FWD].key.src.port, conn_for_expectation, !!(pkt->md.ct_state & CS_REPLY_DIR), false, false); } From patchwork Mon Nov 29 18:06:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Valerio X-Patchwork-Id: 1561315 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=iOFxRhEf; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4J2tb23gy7z9sVc for ; Tue, 30 Nov 2021 05:06:38 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id DBCEE4093E; Mon, 29 Nov 2021 18:06:35 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id BK9KY5WjHXuV; Mon, 29 Nov 2021 18:06:32 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp4.osuosl.org (Postfix) with ESMTPS id E576840605; Mon, 29 Nov 2021 18:06:31 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id B2632C0039; Mon, 29 Nov 2021 18:06:31 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp4.osuosl.org (smtp4.osuosl.org [140.211.166.137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 257BEC0012 for ; Mon, 29 Nov 2021 18:06:30 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 92003405AE for ; Mon, 29 Nov 2021 18:06:17 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id eyzqs45QN7yJ for ; Mon, 29 Nov 2021 18:06:15 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by smtp4.osuosl.org (Postfix) with ESMTPS id AFFF4404AB for ; Mon, 29 Nov 2021 18:06:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1638209174; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=JF+9Qd1WMVlRifTakZWMRU43DvGkFnimCO3C12gQVXk=; b=iOFxRhEf9MYpJf41IBTcFJLcV6VVESKlYlT1+M2oEDyYmydqtu2WeL2YG2ohfADc4T61ev cnIifLoXcFucdGz9RKEKAi8Zq924IOupDncttEDrLCPgx7lmywAdXIhI/2Sm6uhfDt3xZN rYVOph3mVix0uRQVqtDUWfyHfDukzdY= Received: from mail-wr1-f69.google.com (mail-wr1-f69.google.com [209.85.221.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-360-xMjR-7RaPPmkxQuBy96lOg-1; Mon, 29 Nov 2021 13:06:13 -0500 X-MC-Unique: xMjR-7RaPPmkxQuBy96lOg-1 Received: by mail-wr1-f69.google.com with SMTP id d18-20020adfe852000000b001985d36817cso3086986wrn.13 for ; Mon, 29 Nov 2021 10:06:13 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=loiYSW/ndbhuFazRtqUH+GGUZJtSIx7k9NrTyEMv5bE=; b=rsGiRx0Mk5hInPYsaT6rk3bdXLQztaBua8S37sJxOdtACR+Gd149GaIPf/0w2z4Z6w SLyYk9rBBHfPIMH3ssGWr286QWsdxmKsBisUULo6VW1JtuCnQ1h3Dllsytyh0bLi8JRX 7RMyzF6EbzwgBaZxbvYnCxeVXgBr/2Qeo0tM9OG0++tYBzbWkqG/dfT0V0u0fFKPW4jt bojcZLZfFnWFP5uIdtZ9emFkHvrfizRkf1W26N4fd5t+5nJ46OQyZta2uR6ii1zo5IPS +9CG1g5HwIAF5qkg0Svmku8uXdeVOygfc9PGnq9DLI8YUYqAITQW3AdAPdspJy13vcCz Syig== X-Gm-Message-State: AOAM532xtSxVQhhY/z6VdIc7BC5FDXCWWmQvb/uORPn1oe5zXMC570Cb 6iG/zW8Swj0bwZk5ItrW3LL5bWboCCxsB9NfxmE1IKCXppmOo0UqW46prvUJEyje9ALZXmGW3Ys ebMP21qFQg1mDH/NYrM9L6n8FSRxOepuDwnCA1znh0GkaZTIKYpJpOUfGab4U3ba8 X-Received: by 2002:adf:eb52:: with SMTP id u18mr35955749wrn.90.1638209169941; Mon, 29 Nov 2021 10:06:09 -0800 (PST) X-Google-Smtp-Source: ABdhPJw65EoNa08Dc0FLOAninpGY//IZjIxcjzzLyDOT69PKqk4bMbzM4Xw57QIhCynrHt2LVl64jQ== X-Received: by 2002:adf:eb52:: with SMTP id u18mr35955393wrn.90.1638209166745; Mon, 29 Nov 2021 10:06:06 -0800 (PST) Received: from localhost (net-5-88-23-84.cust.vodafonedsl.it. [5.88.23.84]) by smtp.gmail.com with ESMTPSA id 4sm18695640wrz.90.2021.11.29.10.06.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Nov 2021 10:06:06 -0800 (PST) From: Paolo Valerio To: dev@openvswitch.org Date: Mon, 29 Nov 2021 19:06:05 +0100 Message-ID: <163820916292.3001886.15857595669410962866.stgit@fed.void> In-Reply-To: <163820911055.3001886.13118680364054625917.stgit@fed.void> References: <163820911055.3001886.13118680364054625917.stgit@fed.void> User-Agent: StGit/1.1 MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pvalerio@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: fbl@redhat.com, i.maximets@ovn.org Subject: [ovs-dev] [PATCH RFC 4/5] conntrack: Split single cmap to multiple buckets. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" The purpose of this commit is to split the current way of storing the conn nodes. Before this patch the nodes were stored into a single cmap using ct->lock to avoid concurrent write access. With this commit a single connection can be stored into one or two (at most) CONNTRACK_BUCKETS available based on the outcome of the function hash_scale() on the key. Every bucket has its local lock that needs to be acquired every time a node has to be removed/inserted from/to the cmap. This means that, in case the hash of the CT_DIR_FWD key differs from the one of the CT_DIR_REV, we can end up having the reference of the two key nodes in different buckets, and consequently acquiring two locks (one per bucket). This approach may be handy in different ways, depending on the way the stale connection removal gets designed. The attempt of this patch is to remove the expiration lists, removing the stale entries mostly in two ways: - during the key lookup - when the sweeper task wakes up the first case is not very strict, as we remove only expired entries with the same hash. To increase its effectiveness, we should probably increase the number of buckets and replace the cmaps with other data structures like rcu lists. The sweeper task instead takes charge of the remaining stale entries removal. The heuristics used in the sweeper task are mostly an example, but could be modified to match any possible uncovered use case. Signed-off-by: Paolo Valerio --- The cover letter includes further details. --- lib/conntrack-private.h | 34 +++ lib/conntrack-tp.c | 42 ---- lib/conntrack.c | 461 +++++++++++++++++++++++++++++++---------------- tests/system-traffic.at | 5 - 4 files changed, 331 insertions(+), 211 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index ea5ba3d9e..a89ff96fa 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -95,6 +95,7 @@ struct alg_exp_node { struct conn_key_node { struct conn_key key; + uint32_t key_hash; struct cmap_node cm_node; }; @@ -102,7 +103,6 @@ struct conn { /* Immutable data. */ struct conn_key_node key_node[CT_DIR_MAX]; struct conn_key parent_key; /* Only used for orig_tuple support. */ - struct ovs_list exp_node; uint16_t nat_action; char *alg; @@ -121,7 +121,9 @@ struct conn { /* Mutable data. */ bool seq_skew_dir; /* TCP sequence skew direction due to NATTing of FTP * control messages; true if reply direction. */ - bool cleaned; /* True if cleaned from expiry lists. */ + atomic_flag cleaned; /* True if the entry was stale and one of the + * cleaner (i.e. packet path or sweeper) took + * charge of it. */ /* Immutable data. */ bool alg_related; /* True if alg data connection. */ @@ -192,10 +194,25 @@ enum ct_timeout { N_CT_TM }; -struct conntrack { - struct ovs_mutex ct_lock; /* Protects 2 following fields. */ +#define CONNTRACK_BUCKETS_SHIFT 10 +#define CONNTRACK_BUCKETS (1 << CONNTRACK_BUCKETS_SHIFT) + +struct ct_bucket { + /* Protects 'conns'. In case of natted conns, there's a high + * chance that the forward and the reverse key stand in different + * buckets. buckets_lock() should be the preferred way to acquire + * these locks (unless otherwise needed), as it deals with the + * acquisition order. */ + struct ovs_mutex lock; + /* Contains the connections in the bucket, indexed by + * 'struct conn_key'. */ struct cmap conns OVS_GUARDED; - struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED; +}; + +struct conntrack { + struct ct_bucket buckets[CONNTRACK_BUCKETS]; + unsigned int next_bucket; + struct ovs_mutex ct_lock; struct cmap zone_limits OVS_GUARDED; struct cmap timeout_policies OVS_GUARDED; uint32_t hash_basis; /* Salt for hashing a connection key. */ @@ -220,9 +237,10 @@ struct conntrack { }; /* Lock acquisition order: - * 1. 'ct_lock' - * 2. 'conn->lock' - * 3. 'resources_lock' + * 1. 'buckets[p1]->lock' + * 2 'buckets[p2]->lock' (with p1 < p2) + * 3. 'conn->lock' + * 4. 'resources_lock' */ extern struct ct_l4_proto ct_proto_tcp; diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c index 9ecb06978..117810528 100644 --- a/lib/conntrack-tp.c +++ b/lib/conntrack-tp.c @@ -236,27 +236,6 @@ tm_to_ct_dpif_tp(enum ct_timeout tm) return CT_DPIF_TP_ATTR_MAX; } -static void -conn_update_expiration__(struct conntrack *ct, struct conn *conn, - enum ct_timeout tm, long long now, - uint32_t tp_value) - OVS_REQUIRES(conn->lock) -{ - ovs_mutex_unlock(&conn->lock); - - ovs_mutex_lock(&ct->ct_lock); - ovs_mutex_lock(&conn->lock); - if (!conn->cleaned) { - conn->expiration = now + tp_value * 1000; - ovs_list_remove(&conn->exp_node); - ovs_list_push_back(&ct->exp_lists[tm], &conn->exp_node); - } - ovs_mutex_unlock(&conn->lock); - ovs_mutex_unlock(&ct->ct_lock); - - ovs_mutex_lock(&conn->lock); -} - /* The conn entry lock must be held on entry and exit. */ void conn_update_expiration(struct conntrack *ct, struct conn *conn, @@ -266,42 +245,25 @@ conn_update_expiration(struct conntrack *ct, struct conn *conn, struct timeout_policy *tp; uint32_t val; - ovs_mutex_unlock(&conn->lock); - - ovs_mutex_lock(&ct->ct_lock); - ovs_mutex_lock(&conn->lock); tp = timeout_policy_lookup(ct, conn->tp_id); if (tp) { val = tp->policy.attrs[tm_to_ct_dpif_tp(tm)]; } else { val = ct_dpif_netdev_tp_def[tm_to_ct_dpif_tp(tm)]; } - ovs_mutex_unlock(&conn->lock); - ovs_mutex_unlock(&ct->ct_lock); - ovs_mutex_lock(&conn->lock); VLOG_DBG_RL(&rl, "Update timeout %s zone=%u with policy id=%d " "val=%u sec.", ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone, conn->tp_id, val); - conn_update_expiration__(ct, conn, tm, now, val); -} - -static void -conn_init_expiration__(struct conntrack *ct, struct conn *conn, - enum ct_timeout tm, long long now, - uint32_t tp_value) -{ - conn->expiration = now + tp_value * 1000; - ovs_list_push_back(&ct->exp_lists[tm], &conn->exp_node); + conn->expiration = now + val * 1000; } /* ct_lock must be held. */ void conn_init_expiration(struct conntrack *ct, struct conn *conn, enum ct_timeout tm, long long now) - OVS_REQUIRES(ct->ct_lock) { struct timeout_policy *tp; uint32_t val; @@ -317,5 +279,5 @@ conn_init_expiration(struct conntrack *ct, struct conn *conn, ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone, conn->tp_id, val); - conn_init_expiration__(ct, conn, tm, now, val); + conn->expiration = now + val * 1000; } diff --git a/lib/conntrack.c b/lib/conntrack.c index a284c57c0..1c019af29 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -85,9 +85,12 @@ struct zone_limit { struct conntrack_zone_limit czl; }; +static unsigned hash_scale(uint32_t hash); +static void conn_clean(struct conntrack *ct, struct conn *conn); static bool conn_key_extract(struct conntrack *, struct dp_packet *, ovs_be16 dl_type, struct conn_lookup_ctx *, uint16_t zone); +static uint32_t cached_key_hash(struct conn_key_node *n); static uint32_t conn_key_hash(const struct conn_key *, uint32_t basis); static void conn_key_reverse(struct conn_key *); static bool valid_new(struct dp_packet *pkt, struct conn_key *); @@ -109,8 +112,9 @@ static void set_label(struct dp_packet *, struct conn *, static void *clean_thread_main(void *f_); static bool -nat_get_unique_tuple(struct conntrack *ct, struct conn *conn, - const struct nat_action_info_t *nat_info); +nat_get_unique_tuple_lock(struct conntrack *ct, struct conn *conn, + const struct nat_action_info_t *nat_info, + uint32_t *rev_hash); static uint8_t reverse_icmp_type(uint8_t type); @@ -249,16 +253,17 @@ conntrack_init(void) ovs_rwlock_unlock(&ct->resources_lock); ovs_mutex_init_adaptive(&ct->ct_lock); - ovs_mutex_lock(&ct->ct_lock); - cmap_init(&ct->conns); - for (unsigned i = 0; i < ARRAY_SIZE(ct->exp_lists); i++) { - ovs_list_init(&ct->exp_lists[i]); + + ct->next_bucket = 0; + for (unsigned i = 0; i < CONNTRACK_BUCKETS; i++) { + struct ct_bucket *bucket = &ct->buckets[i]; + cmap_init(&bucket->conns); + ovs_mutex_init_recursive(&bucket->lock); } + cmap_init(&ct->zone_limits); ct->zone_limit_seq = 0; timeout_policy_init(ct); - ovs_mutex_unlock(&ct->ct_lock); - atomic_count_init(&ct->n_conn, 0); atomic_init(&ct->n_conn_limit, DEFAULT_N_CONN_LIMIT); atomic_init(&ct->tcp_seq_chk, true); @@ -410,9 +415,9 @@ zone_limit_delete(struct conntrack *ct, uint16_t zone) } static void -conn_clean(struct conntrack *ct, struct conn *conn) - OVS_REQUIRES(ct->ct_lock) +conn_clean__(struct conntrack *ct, struct conn *conn) { + struct ct_bucket *bucket; struct zone_limit *zl; uint32_t hash; @@ -420,8 +425,9 @@ conn_clean(struct conntrack *ct, struct conn *conn) expectation_clean(ct, &conn->key_node[CT_DIR_FWD].key); } - hash = conn_key_hash(&conn->key_node[CT_DIR_FWD].key, ct->hash_basis); - cmap_remove(&ct->conns, &conn->key_node[CT_DIR_FWD].cm_node, hash); + hash = cached_key_hash(&conn->key_node[CT_DIR_FWD]); + bucket = &ct->buckets[hash_scale(hash)]; + cmap_remove(&bucket->conns, &conn->key_node[CT_DIR_FWD].cm_node, hash); zl = zone_limit_lookup(ct, conn->admit_zone); if (zl && zl->czl.zone_limit_seq == conn->zone_limit_seq) { @@ -429,12 +435,10 @@ conn_clean(struct conntrack *ct, struct conn *conn) } if (conn->nat_action) { - hash = conn_key_hash(&conn->key_node[CT_DIR_REV].key, - ct->hash_basis); - cmap_remove(&ct->conns, &conn->key_node[CT_DIR_REV].cm_node, hash); + hash = cached_key_hash(&conn->key_node[CT_DIR_REV]); + bucket = &ct->buckets[hash_scale(hash)]; + cmap_remove(&bucket->conns, &conn->key_node[CT_DIR_REV].cm_node, hash); } - ovs_list_remove(&conn->exp_node); - conn->cleaned = true; ovsrcu_postpone(delete_conn, conn); atomic_count_dec(&ct->n_conn); } @@ -446,22 +450,35 @@ void conntrack_destroy(struct conntrack *ct) { struct conn_key_node *keyn; + struct ct_bucket *bucket; struct conn *conn; + int i; latch_set(&ct->clean_thread_exit); pthread_join(ct->clean_thread, NULL); latch_destroy(&ct->clean_thread_exit); - ovs_mutex_lock(&ct->ct_lock); - CMAP_FOR_EACH (keyn, cm_node, &ct->conns) { - if (keyn->key.dir != CT_DIR_FWD) { - continue; + for (i = 0; i < CONNTRACK_BUCKETS; i++) { + bucket = &ct->buckets[i]; + CMAP_FOR_EACH (keyn, cm_node, &bucket->conns) { + if (keyn->key.dir != CT_DIR_FWD) { + continue; + } + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); + conn_clean(ct, conn); } - conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); - conn_clean(ct, conn); } - cmap_destroy(&ct->conns); + /* XXX: we need this loop because connections may be in multiple + * buckets. The former loop should probably use conn_clean__() + * or an unlocked version of conn_clean(). */ + for (i = 0; i < CONNTRACK_BUCKETS; i++) { + bucket = &ct->buckets[i]; + ovs_mutex_destroy(&bucket->lock); + cmap_destroy(&ct->buckets[i].conns); + } + + ovs_mutex_lock(&ct->ct_lock); struct zone_limit *zl; CMAP_FOR_EACH (zl, node, &ct->zone_limits) { uint32_t hash = zone_key_hash(zl->czl.zone, ct->hash_basis); @@ -498,45 +515,108 @@ conntrack_destroy(struct conntrack *ct) } +static unsigned hash_scale(uint32_t hash) +{ + return (hash >> (32 - CONNTRACK_BUCKETS_SHIFT)) % CONNTRACK_BUCKETS; +} + static bool -conn_key_lookup(struct conntrack *ct, const struct conn_key *key, - uint32_t hash, long long now, struct conn **conn_out, +conn_key_lookup(struct conntrack *ct, unsigned bucket, + const struct conn_key *key, uint32_t hash, + long long now, struct conn **conn_out, bool *reply) { + struct ct_bucket *ctb = &ct->buckets[bucket]; struct conn_key_node *keyn; - struct conn *conn = NULL; bool found = false; + struct conn *conn; - CMAP_FOR_EACH_WITH_HASH (keyn, cm_node, hash, &ct->conns) { + CMAP_FOR_EACH_WITH_HASH (keyn, cm_node, hash, &ctb->conns) { conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); + if (conn_expired(conn, now)) { + conn_clean(ct, conn); + continue; + } + for (int i = CT_DIR_FWD; i < CT_DIR_MAX; i++) { - if (!conn_key_cmp(&conn->key_node[i].key, key) && - !conn_expired(conn, now)) { + if (!conn_key_cmp(&conn->key_node[i].key, key)) { found = true; if (reply) { *reply = i; } - goto out_found; + + goto conn_out_found; } } } -out_found: - if (found && conn_out) { - *conn_out = conn; - } else if (conn_out) { - *conn_out = NULL; +conn_out_found: + if (conn_out) { + *conn_out = found ? conn : NULL; } return found; } +static void +buckets_unlock(struct conntrack *ct, uint32_t h1, uint32_t h2) +{ + unsigned p1 = hash_scale(h1), + p2 = hash_scale(h2); + + if (p1 > p2) { + ovs_mutex_unlock(&ct->buckets[p1].lock); + ovs_mutex_unlock(&ct->buckets[p2].lock); + } else if (p1 < p2) { + ovs_mutex_unlock(&ct->buckets[p2].lock); + ovs_mutex_unlock(&ct->buckets[p1].lock); + } else { + ovs_mutex_unlock(&ct->buckets[p1].lock); + } +} + +/* Acquires both locks in an ordered way. */ +static void +buckets_lock(struct conntrack *ct, uint32_t h1, uint32_t h2) +{ + unsigned p1 = hash_scale(h1), + p2 = hash_scale(h2); + + if (p1 < p2) { + ovs_mutex_lock(&ct->buckets[p1].lock); + ovs_mutex_lock(&ct->buckets[p2].lock); + } else if (p1 > p2) { + ovs_mutex_lock(&ct->buckets[p2].lock); + ovs_mutex_lock(&ct->buckets[p1].lock); + } else { + ovs_mutex_lock(&ct->buckets[p1].lock); + } +} + +static void +conn_clean(struct conntrack *ct, struct conn *conn) +{ + uint32_t h1, h2; + + if (atomic_flag_test_and_set(&conn->cleaned)) { + return; + } + + h1 = cached_key_hash(&conn->key_node[CT_DIR_FWD]); + h2 = cached_key_hash(&conn->key_node[CT_DIR_REV]); + buckets_lock(ct, h1, h2); + conn_clean__(ct, conn); + buckets_unlock(ct, h1, h2); +} + static bool conn_lookup(struct conntrack *ct, const struct conn_key *key, long long now, struct conn **conn_out, bool *reply) { uint32_t hash = conn_key_hash(key, ct->hash_basis); - return conn_key_lookup(ct, key, hash, now, conn_out, reply); + unsigned bucket = hash_scale(hash); + + return conn_key_lookup(ct, bucket, key, hash, now, conn_out, reply); } static void @@ -944,7 +1024,6 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, const struct nat_action_info_t *nat_action_info, const char *helper, const struct alg_exp_node *alg_exp, enum ct_alg_ctl_type ct_alg_ctl, uint32_t tp_id) - OVS_REQUIRES(ct->ct_lock) { struct conn *nc = NULL; uint32_t rev_hash = ctx->hash; @@ -954,6 +1033,8 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, return nc; } + /* XXX: We are unlocked here, so we don't know + * if the tuple already exists in the table. */ pkt->md.ct_state = CS_NEW; if (alg_exp) { @@ -961,10 +1042,11 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, } if (commit) { - struct conn_key_node *fwd_key_node, *rev_key_node; - struct zone_limit *zl = zone_limit_lookup_or_default(ct, ctx->key.zone); + struct conn_key_node *fwd_key_node, *rev_key_node; + bool handle_tuple = false; + if (zl && atomic_count_get(&zl->czl.count) >= zl->czl.limit) { return nc; } @@ -1007,22 +1089,40 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, nc->nat_action = NAT_ACTION_DST; } } else { - bool nat_res = nat_get_unique_tuple(ct, nc, nat_action_info); - - if (!nat_res) { - goto nat_res_exhaustion; - } + handle_tuple = true; } - - nat_packet(pkt, nc, ctx->icmp_related); - rev_hash = conn_key_hash(&rev_key_node->key, ct->hash_basis); - rev_key_node->key.dir = CT_DIR_REV; - cmap_insert(&ct->conns, &rev_key_node->cm_node, rev_hash); } ovs_mutex_init_adaptive(&nc->lock); + atomic_flag_clear(&nc->cleaned); fwd_key_node->key.dir = CT_DIR_FWD; - cmap_insert(&ct->conns, &fwd_key_node->cm_node, ctx->hash); + rev_key_node->key.dir = CT_DIR_REV; + + if (handle_tuple) { + bool nat_res = nat_get_unique_tuple_lock(ct, nc, nat_action_info, + &rev_hash); + + if (!nat_res) { + goto out_error; + } + } else { + rev_hash = conn_key_hash(&rev_key_node->key, ct->hash_basis); + buckets_lock(ct, ctx->hash, rev_hash); + } + + if (conn_lookup(ct, &ctx->key, now, NULL, NULL)) { + goto out_error_unlock; + } + + fwd_key_node->key_hash = ctx->hash; + cmap_insert(&ct->buckets[hash_scale(ctx->hash)].conns, + &fwd_key_node->cm_node, ctx->hash); + if (nat_action_info) { + rev_key_node->key_hash = rev_hash; + cmap_insert(&ct->buckets[hash_scale(rev_hash)].conns, + &rev_key_node->cm_node, rev_hash); + nat_packet(pkt, nc, ctx->icmp_related); + } atomic_count_inc(&ct->n_conn); ctx->conn = nc; /* For completeness. */ @@ -1033,21 +1133,23 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt, } else { nc->admit_zone = INVALID_ZONE; } + buckets_unlock(ct, ctx->hash, rev_hash); } return nc; +out_error_unlock: + buckets_unlock(ct, ctx->hash, rev_hash); /* This would be a user error or a DOS attack. A user error is prevented * by allocating enough combinations of NAT addresses when combined with * ephemeral ports. A DOS attack should be protected against with * firewall rules or a separate firewall. Also using zone partitioning * can limit DoS impact. */ -nat_res_exhaustion: - ovs_list_remove(&nc->exp_node); +out_error: + ovs_mutex_destroy(&nc->lock); delete_conn_cmn(nc); static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); - VLOG_WARN_RL(&rl, "Unable to NAT due to tuple space exhaustion - " - "if DoS attack, use firewalling and/or zone partitioning."); + VLOG_WARN_RL(&rl, "Unable to insert a new connection."); return NULL; } @@ -1082,12 +1184,7 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt, pkt->md.ct_state = CS_INVALID; break; case CT_UPDATE_NEW: - ovs_mutex_lock(&ct->ct_lock); - if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key, - now, NULL, NULL)) { - conn_clean(ct, conn); - } - ovs_mutex_unlock(&ct->ct_lock); + conn_clean(ct, conn); create_new_conn = true; break; case CT_UPDATE_VALID_NEW: @@ -1253,6 +1350,8 @@ static void initial_conn_lookup(struct conntrack *ct, struct conn_lookup_ctx *ctx, long long now, bool natted) { + unsigned bucket = hash_scale(ctx->hash); + if (natted) { /* If the packet has been already natted (e.g. a previous * action took place), retrieve it performing a lookup of its @@ -1260,7 +1359,8 @@ initial_conn_lookup(struct conntrack *ct, struct conn_lookup_ctx *ctx, conn_key_reverse(&ctx->key); } - conn_key_lookup(ct, &ctx->key, ctx->hash, now, &ctx->conn, &ctx->reply); + conn_key_lookup(ct, bucket, &ctx->key, ctx->hash, + now, &ctx->conn, &ctx->reply); if (natted) { if (OVS_LIKELY(ctx->conn)) { @@ -1287,24 +1387,20 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper, uint32_t tp_id) { + bool create_new_conn = false; + /* Reset ct_state whenever entering a new zone. */ if (pkt->md.ct_state && pkt->md.ct_zone != zone) { pkt->md.ct_state = 0; } - bool create_new_conn = false; initial_conn_lookup(ct, ctx, now, !!(pkt->md.ct_state & (CS_SRC_NAT | CS_DST_NAT))); struct conn *conn = ctx->conn; /* Delete found entry if in wrong direction. 'force' implies commit. */ if (OVS_UNLIKELY(force && ctx->reply && conn)) { - ovs_mutex_lock(&ct->ct_lock); - if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key, - now, NULL, NULL)) { - conn_clean(ct, conn); - } - ovs_mutex_unlock(&ct->ct_lock); + conn_clean(ct, conn); conn = NULL; } @@ -1338,7 +1434,6 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, struct alg_exp_node alg_exp_entry; if (OVS_UNLIKELY(create_new_conn)) { - ovs_rwlock_rdlock(&ct->resources_lock); alg_exp = expectation_lookup(&ct->alg_expectations, &ctx->key, ct->hash_basis, @@ -1349,12 +1444,9 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, } ovs_rwlock_unlock(&ct->resources_lock); - ovs_mutex_lock(&ct->ct_lock); - if (!conn_lookup(ct, &ctx->key, now, NULL, NULL)) { - conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info, - helper, alg_exp, ct_alg_ctl, tp_id); - } - ovs_mutex_unlock(&ct->ct_lock); + conn = conn_not_found(ct, pkt, ctx, commit, now, + nat_action_info, helper, alg_exp, + ct_alg_ctl, tp_id); } write_ct_md(pkt, zone, conn, &ctx->key, alg_exp); @@ -1467,83 +1559,92 @@ set_label(struct dp_packet *pkt, struct conn *conn, } -/* Delete the expired connections from 'ctb', up to 'limit'. Returns the - * earliest expiration time among the remaining connections in 'ctb'. Returns - * LLONG_MAX if 'ctb' is empty. The return value might be smaller than 'now', - * if 'limit' is reached */ +/* Delete the expired connections from 'bucket', up to 'limit'. + * Returns the earliest expiration time among the remaining + * connections in 'bucket'. Returns LLONG_MAX if 'bucket' is empty. + * The return value might be smaller than 'now', if 'limit' is + * reached. */ static long long -ct_sweep(struct conntrack *ct, long long now, size_t limit) +sweep_bucket(struct conntrack *ct, struct ct_bucket *bucket, + long long now) { - struct conn *conn, *next; - long long min_expiration = LLONG_MAX; - size_t count = 0; + struct conn_key_node *keyn; + unsigned int conn_count = 0; + struct conn *conn; + long long expiration; - ovs_mutex_lock(&ct->ct_lock); + CMAP_FOR_EACH (keyn, cm_node, &bucket->conns) { + if (keyn->key.dir != CT_DIR_FWD) { + continue; + } - for (unsigned i = 0; i < N_CT_TM; i++) { - LIST_FOR_EACH_SAFE (conn, next, exp_node, &ct->exp_lists[i]) { - ovs_mutex_lock(&conn->lock); - if (now < conn->expiration || count >= limit) { - min_expiration = MIN(min_expiration, conn->expiration); - ovs_mutex_unlock(&conn->lock); - if (count >= limit) { - /* Do not check other lists. */ - COVERAGE_INC(conntrack_long_cleanup); - goto out; - } - break; - } else { - ovs_mutex_unlock(&conn->lock); - conn_clean(ct, conn); - } - count++; + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); + ovs_mutex_lock(&conn->lock); + expiration = conn->expiration; + ovs_mutex_unlock(&conn->lock); + + if (now >= expiration) { + conn_clean(ct, conn); } + + conn_count++; } -out: - VLOG_DBG("conntrack cleanup %"PRIuSIZE" entries in %lld msec", count, - time_msec() - now); - ovs_mutex_unlock(&ct->ct_lock); - return min_expiration; + return conn_count; } -/* Cleans up old connection entries from 'ct'. Returns the time when the - * next expiration might happen. The return value might be smaller than - * 'now', meaning that an internal limit has been reached, and some expired - * connections have not been deleted. */ +/* Cleans up old connection entries from 'ct'. Returns the the next + * wake up time. The return value might be smaller than 'now', meaning + * that an internal limit has been reached, that is, the table + * hasn't been entirely scanned. */ static long long conntrack_clean(struct conntrack *ct, long long now) { - unsigned int n_conn_limit; + long long next_wakeup = now + 90 * 1000; + unsigned int n_conn_limit, i, count = 0; + size_t clean_end; + atomic_read_relaxed(&ct->n_conn_limit, &n_conn_limit); - size_t clean_max = n_conn_limit > 10 ? n_conn_limit / 10 : 1; - long long min_exp = ct_sweep(ct, now, clean_max); - long long next_wakeup = MIN(min_exp, now + CT_DPIF_NETDEV_TP_MIN); + clean_end = n_conn_limit / 64; + + for (i = ct->next_bucket; i < CONNTRACK_BUCKETS; i++) { + struct ct_bucket *bucket = &ct->buckets[i]; + + count += sweep_bucket(ct, bucket, now); + + if (count > clean_end) { + next_wakeup = 0; + break; + } + } + + ct->next_bucket = (i < CONNTRACK_BUCKETS) ? i : 0; return next_wakeup; } /* Cleanup: * - * We must call conntrack_clean() periodically. conntrack_clean() return - * value gives an hint on when the next cleanup must be done (either because - * there is an actual connection that expires, or because a new connection - * might be created with the minimum timeout). + * We must call conntrack_clean() periodically. conntrack_clean() + * return value gives an hint on when the next cleanup must be done + * (either because there is still work to do, or because a new + * connection might be created). * * The logic below has two goals: * - * - We want to reduce the number of wakeups and batch connection cleanup - * when the load is not very high. CT_CLEAN_INTERVAL ensures that if we - * are coping with the current cleanup tasks, then we wait at least - * 5 seconds to do further cleanup. + * - When the load is high, we want to avoid to hog the CPU scanning + * all the buckets and their respective CMAPs "at once". For this + * reason, every batch cleanup aims to scan at most n_conn_limit / + * 64 entries (more if the buckets contains many entrie) before + * yielding the CPU. In this case, the next wake up will happen in + * CT_CLEAN_MIN_INTERVAL_MS and the scan will resume starting from + * the first bucket not scanned. * - * - We don't want to keep the map locked too long, as we might prevent - * traffic from flowing. CT_CLEAN_MIN_INTERVAL ensures that if cleanup is - * behind, there is at least some 200ms blocks of time when the map will be - * left alone, so the datapath can operate unhindered. - */ -#define CT_CLEAN_INTERVAL 5000 /* 5 seconds */ -#define CT_CLEAN_MIN_INTERVAL 200 /* 0.2 seconds */ + * - We also don't want to scan the buckets so frequently, as going + * through all the connections, during high loads, may be costly in + * terms of CPU time. In this case the next wake up is set to 90 + * seconds. */ +#define CT_CLEAN_MIN_INTERVAL_MS 100 /* 0.1 seconds */ static void * clean_thread_main(void *f_) @@ -1556,9 +1657,9 @@ clean_thread_main(void *f_) next_wake = conntrack_clean(ct, now); if (next_wake < now) { - poll_timer_wait_until(now + CT_CLEAN_MIN_INTERVAL); + poll_timer_wait_until(now + CT_CLEAN_MIN_INTERVAL_MS); } else { - poll_timer_wait_until(MAX(next_wake, now + CT_CLEAN_INTERVAL)); + poll_timer_wait_until(next_wake); } latch_wait(&ct->clean_thread_exit); poll_block(); @@ -2088,6 +2189,12 @@ ct_endpoint_hash_add(uint32_t hash, const struct ct_endpoint *ep) return hash_add_bytes32(hash, (const uint32_t *) ep, sizeof *ep); } +static uint32_t +cached_key_hash(struct conn_key_node *n) +{ + return n->key_hash; +} + /* Symmetric */ static uint32_t conn_key_hash(const struct conn_key *key, uint32_t basis) @@ -2357,8 +2464,9 @@ next_addr_in_range_guarded(union ct_addr *curr, union ct_addr *min, * * If none can be found, return exhaustion to the caller. */ static bool -nat_get_unique_tuple(struct conntrack *ct, struct conn *conn, - const struct nat_action_info_t *nat_info) +nat_get_unique_tuple_lock(struct conntrack *ct, struct conn *conn, + const struct nat_action_info_t *nat_info, + uint32_t *rev_hash) { union ct_addr min_addr = {0}, max_addr = {0}, curr_addr = {0}, guard_addr = {0}; @@ -2392,10 +2500,15 @@ another_round: nat_info->nat_action); if (!pat_proto) { + uint32_t key_hash = conn_key_hash(fwd_key, ct->hash_basis); + *rev_hash = conn_key_hash(rev_key, ct->hash_basis); + + buckets_lock(ct, key_hash, *rev_hash); if (!conn_lookup(ct, rev_key, time_msec(), NULL, NULL)) { return true; } + buckets_unlock(ct, key_hash, *rev_hash); goto next_addr; } @@ -2404,10 +2517,15 @@ another_round: rev_key->src.port = htons(curr_dport); FOR_EACH_PORT_IN_RANGE(curr_sport, min_sport, max_sport) { rev_key->dst.port = htons(curr_sport); + uint32_t key_hash = conn_key_hash(fwd_key, ct->hash_basis); + *rev_hash = conn_key_hash(rev_key, ct->hash_basis); + + buckets_lock(ct, key_hash, *rev_hash); if (!conn_lookup(ct, rev_key, time_msec(), NULL, NULL)) { return true; } + buckets_unlock(ct, key_hash, *rev_hash); } } @@ -2615,20 +2733,39 @@ conntrack_dump_next(struct conntrack_dump *dump, struct ct_dpif_entry *entry) { struct conntrack *ct = dump->ct; long long now = time_msec(); + struct ct_bucket *bucket; - for (;;) { - struct cmap_node *cm_node = cmap_next_position(&ct->conns, - &dump->cm_pos); - if (!cm_node) { - break; + while (dump->bucket < CONNTRACK_BUCKETS) { + struct cmap_node *cm_node; + bucket = &ct->buckets[dump->bucket]; + + for (;;) { + cm_node = cmap_next_position(&bucket->conns, + &dump->cm_pos); + if (!cm_node) { + break; + } + struct conn_key_node *keyn; + struct conn *conn; + INIT_CONTAINER(keyn, cm_node, cm_node); + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); + + if (conn_expired(conn, now)) { + /* XXX: ideally this should call conn_clean(). */ + continue; + } + + if ((!dump->filter_zone || keyn->key.zone == dump->zone) && + (keyn->key.dir == CT_DIR_FWD)) { + conn_to_ct_dpif_entry(conn, entry, now); + break; + } } - struct conn_key_node *keyn; - struct conn *conn; - INIT_CONTAINER(keyn, cm_node, cm_node); - conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); - if ((!dump->filter_zone || keyn->key.zone == dump->zone) && - (keyn->key.dir == CT_DIR_FWD)) { - conn_to_ct_dpif_entry(conn, entry, now); + + if (!cm_node) { + memset(&dump->cm_pos, 0, sizeof dump->cm_pos); + dump->bucket++; + } else { return 0; } } @@ -2648,17 +2785,18 @@ conntrack_flush(struct conntrack *ct, const uint16_t *zone) struct conn_key_node *keyn; struct conn *conn; - ovs_mutex_lock(&ct->ct_lock); - CMAP_FOR_EACH (keyn, cm_node, &ct->conns) { - if (keyn->key.dir != CT_DIR_FWD) { - continue; - } - conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); - if (!zone || *zone == keyn->key.zone) { - conn_clean(ct, conn); + for (unsigned i = 0; i < CONNTRACK_BUCKETS; i++) { + CMAP_FOR_EACH (keyn, cm_node, &ct->buckets[i].conns) { + if (keyn->key.dir != CT_DIR_FWD) { + continue; + } + + conn = CONTAINER_OF(keyn, struct conn, key_node[keyn->key.dir]); + if (!zone || *zone == keyn->key.zone) { + conn_clean(ct, conn); + } } } - ovs_mutex_unlock(&ct->ct_lock); return 0; } @@ -2667,15 +2805,19 @@ int conntrack_flush_tuple(struct conntrack *ct, const struct ct_dpif_tuple *tuple, uint16_t zone) { - int error = 0; struct conn_key key; - struct conn *conn; + struct conn *conn = NULL; + unsigned bucket; + uint32_t hash; + int error = 0; memset(&key, 0, sizeof(key)); tuple_to_conn_key(tuple, zone, &key); - ovs_mutex_lock(&ct->ct_lock); - conn_lookup(ct, &key, time_msec(), &conn, NULL); + hash = conn_key_hash(&key, ct->hash_basis); + bucket = hash_scale(hash); + + conn_key_lookup(ct, bucket, &key, hash, time_msec(), &conn, NULL); if (conn) { conn_clean(ct, conn); } else { @@ -2683,7 +2825,6 @@ conntrack_flush_tuple(struct conntrack *ct, const struct ct_dpif_tuple *tuple, error = ENOENT; } - ovs_mutex_unlock(&ct->ct_lock); return error; } diff --git a/tests/system-traffic.at b/tests/system-traffic.at index d79753a99..9a48a9a7f 100644 --- a/tests/system-traffic.at +++ b/tests/system-traffic.at @@ -4611,9 +4611,8 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | sed -e 's/dst= tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.2XX,sport=,dport=),zone=1,protoinfo=(state=) ]) -OVS_TRAFFIC_VSWITCHD_STOP(["dnl -/Unable to NAT due to tuple space exhaustion - if DoS attack, use firewalling and\/or zone partitioning./d -/Dropped .* log messages in last .* seconds \(most recently, .* seconds ago\) due to excessive rate/d"]) +OVS_TRAFFIC_VSWITCHD_STOP(["/dnl +Unable to insert a new connection./d"]) AT_CLEANUP AT_SETUP([conntrack - more complex SNAT]) From patchwork Mon Nov 29 18:06:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Valerio X-Patchwork-Id: 1561316 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=AG4LeDKU; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4J2tbH050qz9sVc for ; Tue, 30 Nov 2021 05:06:50 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id B749841BF2; Mon, 29 Nov 2021 18:06:48 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id w8eKY6tSUFjs; Mon, 29 Nov 2021 18:06:46 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp4.osuosl.org (Postfix) with ESMTPS id 53CEC40585; Mon, 29 Nov 2021 18:06:45 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 188BDC0012; Mon, 29 Nov 2021 18:06:45 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) by lists.linuxfoundation.org (Postfix) with ESMTP id 44B53C000A for ; Mon, 29 Nov 2021 18:06:43 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 9905140449 for ; Mon, 29 Nov 2021 18:06:23 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp2.osuosl.org (amavisd-new); dkim=pass (1024-bit key) header.d=redhat.com Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kM3Dh6F5jO6Q for ; Mon, 29 Nov 2021 18:06:22 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp2.osuosl.org (Postfix) with ESMTPS id 9E99C40434 for ; Mon, 29 Nov 2021 18:06:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1638209181; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BGhRYrbeRXHJw7U2B55OtjxbT3rGMPWH7aQ16Tkbsfs=; b=AG4LeDKUqdWDrs97n6kFTAUNQdxUU2MhE4JvG3HAcY6Pgj5e8L66CSs9yQvKRcPN4DvFJ2 jHxOj/iuVRaLapGzi0OUA+BsMjllVzfZ3K8NW5cBDEN0Gk/TIlZjtWhPS7PJ9aSc+Bxu8k GtXhkdwICo5BGeMVDFXRJ/Bn/iJNTw4= Received: from mail-wm1-f72.google.com (mail-wm1-f72.google.com [209.85.128.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-355-GRuXz5WVMs6uvGGDRW5L1A-1; Mon, 29 Nov 2021 13:06:20 -0500 X-MC-Unique: GRuXz5WVMs6uvGGDRW5L1A-1 Received: by mail-wm1-f72.google.com with SMTP id 145-20020a1c0197000000b0032efc3eb9bcso12278323wmb.0 for ; Mon, 29 Nov 2021 10:06:19 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:from:to:cc:date:message-id:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=BGhRYrbeRXHJw7U2B55OtjxbT3rGMPWH7aQ16Tkbsfs=; b=LsOjIOM1+815wYRrr+DLrcM2m8jRMcRTEQlmGywJmBYgjvF+2H5U8sbpWriYC9VMD0 e006OiHC3YNkmpSH4xI8ym+92/W/ClRt7CeCGrtO2v8LpUjKzCNvo2OENcQtl2LGj+QE Vf20vCNzxY/fK7KWX1ELUTOwgcteK1gSMmExYqVNFFVuSSMWp2RcR0ZCCzbTEMnvvqTp 1kuJHt35NJuRDFxJdJCxvEVgbT9a1fteILLd/6iCLqMG/w6ul8O4F0gr5rhw4SiQ5svK tfRp31gG8BOWDyN77o9DBEj7e6qFLvXcTHPGbxXW46+hZa41wfBUDnELih5I5mhXlWkz h8YQ== X-Gm-Message-State: AOAM531nVhTSJ3/Z6+xa4Lf+Elo8s7iMfJcDQQWxcS/wj4VRZ7+6w+Zf tXciISvFqS4A4YmW6fIj0S+smhEoG+VgcHgep9h7jQPfZnzHFt29b70v37R1HqtXIcJCAGfuApB YjZCXBfO4FrBPadqPsAAVMBhYUk5/YwB0dZ+bonU8WFnj6azc8Cjk/UpZnTvAZePF X-Received: by 2002:adf:ea0a:: with SMTP id q10mr36418486wrm.1.1638209178397; Mon, 29 Nov 2021 10:06:18 -0800 (PST) X-Google-Smtp-Source: ABdhPJwcTwmw0Id6YizYgv+Zz5BMTlGsEEodbsYSGsa7HthzLbapMYAg//kgBxzTh+aqIyXbRTghhw== X-Received: by 2002:adf:ea0a:: with SMTP id q10mr36418454wrm.1.1638209178205; Mon, 29 Nov 2021 10:06:18 -0800 (PST) Received: from localhost (net-5-88-23-84.cust.vodafonedsl.it. [5.88.23.84]) by smtp.gmail.com with ESMTPSA id b13sm11064572wrh.32.2021.11.29.10.06.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Nov 2021 10:06:17 -0800 (PST) From: Paolo Valerio To: dev@openvswitch.org Date: Mon, 29 Nov 2021 19:06:17 +0100 Message-ID: <163820917300.3001886.2414895920910370135.stgit@fed.void> In-Reply-To: <163820911055.3001886.13118680364054625917.stgit@fed.void> References: <163820911055.3001886.13118680364054625917.stgit@fed.void> User-Agent: StGit/1.1 MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pvalerio@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Cc: fbl@redhat.com, i.maximets@ovn.org Subject: [ovs-dev] [PATCH RFC 5/5] conntrack: Make ovs-appctl dpctl/ct-bkts work with multiple buckets X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Without this patch "ovs-appctl dpctl/ct-bkts" produces the following output: Total Buckets: 1 Current Connections: 10246 +-----------+-----------------------------------------+ | Buckets | Connections per Buckets | +-----------+-----------------------------------------+ 0.. 7 | 10246 with this patch applied, the output becomes: Total Buckets: 1024 Current Connections: 95956 +-----------+-----------------------------------------+ | Buckets | Connections per Buckets | +-----------+-----------------------------------------+ 0.. 7 | 87 100 90 91 92 92 101 83 8.. 15 | 98 86 80 91 92 93 92 84 16.. 23 | 82 114 103 90 94 80 95 96 ... 1000..1007 | 98 88 106 100 91 99 89 81 1008..1015 | 97 99 93 67 102 97 89 86 1016..1023 | 113 79 97 86 106 93 80 90 Signed-off-by: Paolo Valerio --- lib/conntrack.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index 1c019af29..cbeafb22b 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -2675,7 +2675,7 @@ tuple_to_conn_key(const struct ct_dpif_tuple *tuple, uint16_t zone, static void conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, - long long now) + long long now, unsigned int bkt) { const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key, *rev_key = &conn->key_node[CT_DIR_REV].key; @@ -2699,6 +2699,7 @@ conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, ovs_mutex_unlock(&conn->lock); entry->timeout = (expiration > 0) ? expiration / 1000 : 0; + entry->bkt = bkt; if (conn->alg) { /* Caller is responsible for freeing. */ @@ -2724,7 +2725,7 @@ conntrack_dump_start(struct conntrack *ct, struct conntrack_dump *dump, } dump->ct = ct; - *ptot_bkts = 1; /* Need to clean up the callers. */ + *ptot_bkts = CONNTRACK_BUCKETS; return 0; } @@ -2757,7 +2758,7 @@ conntrack_dump_next(struct conntrack_dump *dump, struct ct_dpif_entry *entry) if ((!dump->filter_zone || keyn->key.zone == dump->zone) && (keyn->key.dir == CT_DIR_FWD)) { - conn_to_ct_dpif_entry(conn, entry, now); + conn_to_ct_dpif_entry(conn, entry, now, dump->bucket); break; } }