From patchwork Tue Oct 3 19:05:56 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Aaron Conole
X-Patchwork-Id: 1842873
From: Aaron Conole
To: dev@openvswitch.org
Date: Tue, 3 Oct 2023 15:05:56 -0400
Message-Id: <20231003190557.423232-1-aconole@redhat.com>
Cc: Peng He, Ilya Maximets, Michael Plato
Subject: [ovs-dev] [PATCH v3 branch-2.17 1/2] conntrack: simplify cleanup path

The conntrack cleanup and allocation code is spread across multiple list
invocations. This was changed in mainline code when the timeout
expiration lists were refactored, but backporting those fixes would be a
rather large effort. Instead, take only the changes we need to backport
"conntrack: Remove nat_conn introducing key directionality" into
branch-2.17.

Signed-off-by: Aaron Conole
Co-authored-by: Paolo Valerio
Signed-off-by: Paolo Valerio
Tested-by: Frode Nordahl
Acked-by: Simon Horman
---
 lib/conntrack.c | 60 +++++++++++++++----------------------------------
 1 file changed, 18 insertions(+), 42 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index fff8e77db1..83a73995d6 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -94,9 +94,8 @@ static bool valid_new(struct dp_packet *pkt, struct conn_key *);
 static struct conn *new_conn(struct conntrack *ct, struct dp_packet *pkt,
                              struct conn_key *, long long now,
                              uint32_t tp_id);
-static void delete_conn_cmn(struct conn *);
+static void delete_conn__(struct conn *);
 static void delete_conn(struct conn *);
-static void delete_conn_one(struct conn *conn);
 static enum ct_update_res conn_update(struct conntrack *ct, struct conn *conn,
                                       struct dp_packet *pkt,
                                       struct conn_lookup_ctx *ctx,
@@ -444,9 +443,11 @@ zone_limit_delete(struct conntrack *ct, uint16_t zone)
 }
 
 static void
-conn_clean_cmn(struct conntrack *ct, struct conn *conn)
+conn_clean(struct conntrack *ct, struct conn *conn)
     OVS_REQUIRES(ct->ct_lock)
 {
+    ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT);
+
     if (conn->alg) {
         expectation_clean(ct, &conn->key);
     }
@@ -458,19 +459,9 @@ conn_clean_cmn(struct conntrack *ct, struct conn *conn)
     if (zl && zl->czl.zone_limit_seq == conn->zone_limit_seq) {
         zl->czl.count--;
     }
-}
 
-/* Must be called with 'conn' of 'conn_type' CT_CONN_TYPE_DEFAULT. Also
- * removes the associated nat 'conn' from the lookup datastructures. */
-static void
-conn_clean(struct conntrack *ct, struct conn *conn)
-    OVS_REQUIRES(ct->ct_lock)
-{
-    ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT);
-
-    conn_clean_cmn(ct, conn);
     if (conn->nat_conn) {
-        uint32_t hash = conn_key_hash(&conn->nat_conn->key, ct->hash_basis);
+        hash = conn_key_hash(&conn->nat_conn->key, ct->hash_basis);
         cmap_remove(&ct->conns, &conn->nat_conn->cm_node, hash);
     }
     ovs_list_remove(&conn->exp_node);
@@ -479,19 +470,6 @@ conn_clean(struct conntrack *ct, struct conn *conn)
     atomic_count_dec(&ct->n_conn);
 }
 
-static void
-conn_clean_one(struct conntrack *ct, struct conn *conn)
-    OVS_REQUIRES(ct->ct_lock)
-{
-    conn_clean_cmn(ct, conn);
-    if (conn->conn_type == CT_CONN_TYPE_DEFAULT) {
-        ovs_list_remove(&conn->exp_node);
-        conn->cleaned = true;
-        atomic_count_dec(&ct->n_conn);
-    }
-    ovsrcu_postpone(delete_conn_one, conn);
-}
-
 /* Destroys the connection tracker 'ct' and frees all the allocated memory.
  * The caller of this function must already have shut down packet input
  * and PMD threads (which would have been quiesced). */
@@ -505,7 +483,11 @@ conntrack_destroy(struct conntrack *ct)
 
     ovs_mutex_lock(&ct->ct_lock);
     CMAP_FOR_EACH (conn, cm_node, &ct->conns) {
-        conn_clean_one(ct, conn);
+        if (conn->conn_type != CT_CONN_TYPE_DEFAULT) {
+            continue;
+        }
+
+        conn_clean(ct, conn);
     }
     cmap_destroy(&ct->conns);
 
@@ -1009,7 +991,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
 nat_res_exhaustion:
     free(nat_conn);
     ovs_list_remove(&nc->exp_node);
-    delete_conn_cmn(nc);
+    delete_conn__(nc);
     static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5);
     VLOG_WARN_RL(&rl, "Unable to NAT due to tuple space exhaustion - "
                  "if DoS attack, use firewalling and/or zone partitioning.");
@@ -2475,7 +2457,7 @@ new_conn(struct conntrack *ct, struct dp_packet *pkt, struct conn_key *key,
 }
 
 static void
-delete_conn_cmn(struct conn *conn)
+delete_conn__(struct conn *conn)
 {
     free(conn->alg);
     free(conn);
@@ -2487,17 +2469,7 @@ delete_conn(struct conn *conn)
     ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT);
     ovs_mutex_destroy(&conn->lock);
     free(conn->nat_conn);
-    delete_conn_cmn(conn);
-}
-
-/* Only used by conn_clean_one(). */
-static void
-delete_conn_one(struct conn *conn)
-{
-    if (conn->conn_type == CT_CONN_TYPE_DEFAULT) {
-        ovs_mutex_destroy(&conn->lock);
-    }
-    delete_conn_cmn(conn);
+    delete_conn__(conn);
 }
 
 /* Convert a conntrack address 'a' into an IP address 'b' based on 'dl_type'.
@@ -2673,8 +2645,12 @@ conntrack_flush(struct conntrack *ct, const uint16_t *zone)
 
     ovs_mutex_lock(&ct->ct_lock);
     CMAP_FOR_EACH (conn, cm_node, &ct->conns) {
+        if (conn->conn_type != CT_CONN_TYPE_DEFAULT) {
+            continue;
+        }
+
         if (!zone || *zone == conn->key.zone) {
-            conn_clean_one(ct, conn);
+            conn_clean(ct, conn);
         }
     }
     ovs_mutex_unlock(&ct->ct_lock);

From patchwork Tue Oct 3 19:05:57 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Aaron Conole
X-Patchwork-Id: 1842874
From: Aaron Conole
To: dev@openvswitch.org
Date: Tue, 3 Oct 2023 15:05:57 -0400
Message-Id: <20231003190557.423232-2-aconole@redhat.com>
In-Reply-To: <20231003190557.423232-1-aconole@redhat.com>
References: <20231003190557.423232-1-aconole@redhat.com>
Cc: Peng He, Ilya Maximets, Michael Plato
Subject: [ovs-dev] [PATCH v3 branch-2.17 2/2] conntrack: Remove nat_conn introducing key directionality.

From: Peng He

The patch avoids the extra allocation for nat_conn.
Currently, when doing NAT, the userspace conntrack uses an extra conn
for the two directions of a flow. However, each conn already holds the
keys for both the orig and rev directions. This patch introduces a
key_node[CT_DIRS] member in the conn, as per Aaron's suggestion. Each
entry consists of a key, a direction, and a cmap_node for hash lookup,
which addresses the feedback received on the original patch [0].

With this adjustment, we also remove the assertion that connections in
the table are DEFAULT while updating connection state and/or removing
connections.

[0] https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0320@bytedance.com/

[Aaron resolved numerous conflicts due to lack of multiple commits]

Reported-by: Michael Plato
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-September/052065.html
Signed-off-by: Peng He
Co-authored-by: Paolo Valerio
Signed-off-by: Paolo Valerio
Tested-by: Frode Nordahl
Acked-by: Ilya Maximets
Acked-by: Aaron Conole
Signed-off-by: Aaron Conole
Acked-by: Simon Horman
---
NOTE: Inserted a check in conn_clean against conn->cleaned, which should
      be analogous to the check in post rcu-ified branches for the
      atomic test-and-set variable conn->reclaimed

 lib/conntrack-private.h |  19 +-
 lib/conntrack-tp.c      |   6 +-
 lib/conntrack.c         | 382 ++++++++++++++++++----------------------
 3 files changed, 184 insertions(+), 223 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index dfdf4e676b..581f517ad1 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -48,6 +48,12 @@ struct ct_endpoint {
  * hashing in ct_endpoint_hash_add(). */
 BUILD_ASSERT_DECL(sizeof(struct ct_endpoint) == sizeof(union ct_addr) + 4);
 
+enum key_dir {
+    CT_DIR_FWD = 0,
+    CT_DIR_REV,
+    CT_DIRS,
+};
+
 /* Changes to this structure need to be reflected in conn_key_hash()
  * and conn_key_cmp(). */
 struct conn_key {
@@ -86,21 +92,19 @@ struct alg_exp_node {
     bool nat_rpl_dst;
 };
 
-enum OVS_PACKED_ENUM ct_conn_type {
-    CT_CONN_TYPE_DEFAULT,
-    CT_CONN_TYPE_UN_NAT,
+struct conn_key_node {
+    enum key_dir dir;
+    struct conn_key key;
+    struct cmap_node cm_node;
 };
 
 struct conn {
     /* Immutable data. */
-    struct conn_key key;
-    struct conn_key rev_key;
+    struct conn_key_node key_node[CT_DIRS];
     struct conn_key parent_key; /* Only used for orig_tuple support. */
     struct ovs_list exp_node;
-    struct cmap_node cm_node;
     uint16_t nat_action;
     char *alg;
-    struct conn *nat_conn; /* The NAT 'conn' context, if there is one. */
 
     /* Mutable data. */
     struct ovs_mutex lock; /* Guards all mutable fields. */
@@ -120,7 +124,6 @@ struct conn {
 
     /* Immutable data. */
     bool alg_related; /* True if alg data connection. */
-    enum ct_conn_type conn_type;
 
     uint32_t tp_id; /* Timeout policy ID. */
 };
diff --git a/lib/conntrack-tp.c b/lib/conntrack-tp.c
index a586d3a8d3..2bdda67110 100644
--- a/lib/conntrack-tp.c
+++ b/lib/conntrack-tp.c
@@ -276,7 +276,8 @@ conn_update_expiration(struct conntrack *ct, struct conn *conn,
     ovs_mutex_lock(&conn->lock);
     VLOG_DBG_RL(&rl, "Update timeout %s zone=%u with policy id=%d "
                 "val=%u sec.",
-                ct_timeout_str[tm], conn->key.zone, conn->tp_id, val);
+                ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone,
+                conn->tp_id, val);
 
     conn_update_expiration__(ct, conn, tm, now, val);
 }
@@ -307,7 +308,8 @@ conn_init_expiration(struct conntrack *ct, struct conn *conn,
     }
 
     VLOG_DBG_RL(&rl, "Init timeout %s zone=%u with policy id=%d val=%u sec.",
-                ct_timeout_str[tm], conn->key.zone, conn->tp_id, val);
+                ct_timeout_str[tm], conn->key_node[CT_DIR_FWD].key.zone,
+                conn->tp_id, val);
 
     conn_init_expiration__(ct, conn, tm, now, val);
 }
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 83a73995d6..a85e9ba886 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -100,7 +100,7 @@ static enum ct_update_res conn_update(struct conntrack *ct, struct conn *conn,
                                       struct dp_packet *pkt,
                                      struct conn_lookup_ctx *ctx,
                                      long long now);
-static bool conn_expired(struct conn *, long long now);
+static bool conn_expired(const struct conn *, long long now);
 static void set_mark(struct dp_packet *, struct conn *,
                      uint32_t val, uint32_t mask);
 static void set_label(struct dp_packet *, struct conn *,
@@ -109,8 +109,7 @@ static void set_label(struct dp_packet *, struct conn *,
 static void *clean_thread_main(void *f_);
 
 static bool
-nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn,
-                     struct conn *nat_conn,
+nat_get_unique_tuple(struct conntrack *ct, struct conn *conn,
                      const struct nat_action_info_t *nat_info);
 
 static uint8_t
@@ -204,7 +203,7 @@ static alg_helper alg_helpers[] = {
 #define ALG_WC_SRC_PORT 0
 
 /* If the total number of connections goes above this value, no new connections
- * are accepted; this is for CT_CONN_TYPE_DEFAULT connections. */
+ * are accepted. */
 #define DEFAULT_N_CONN_LIMIT 3000000
 
 /* Does a member by member comparison of two conn_keys; this
@@ -230,61 +229,6 @@ conn_key_cmp(const struct conn_key *key1, const struct conn_key *key2)
     return 1;
 }
 
-static void
-ct_print_conn_info(const struct conn *c, const char *log_msg,
-                   enum vlog_level vll, bool force, bool rl_on)
-{
-#define CT_VLOG(RL_ON, LEVEL, ...)                                          \
-    do {                                                                    \
-        if (RL_ON) {                                                        \
-            static struct vlog_rate_limit rl_ = VLOG_RATE_LIMIT_INIT(5, 5); \
-            vlog_rate_limit(&this_module, LEVEL, &rl_, __VA_ARGS__);        \
-        } else {                                                            \
-            vlog(&this_module, LEVEL, __VA_ARGS__);                         \
-        }                                                                   \
-    } while (0)
-
-    if (OVS_UNLIKELY(force || vlog_is_enabled(&this_module, vll))) {
-        if (c->key.dl_type == htons(ETH_TYPE_IP)) {
-            CT_VLOG(rl_on, vll, "%s: src ip "IP_FMT" dst ip "IP_FMT" rev src "
-                    "ip "IP_FMT" rev dst ip "IP_FMT" src/dst ports "
-                    "%"PRIu16"/%"PRIu16" rev src/dst ports "
-                    "%"PRIu16"/%"PRIu16" zone/rev zone "
-                    "%"PRIu16"/%"PRIu16" nw_proto/rev nw_proto "
-                    "%"PRIu8"/%"PRIu8, log_msg,
-                    IP_ARGS(c->key.src.addr.ipv4),
-                    IP_ARGS(c->key.dst.addr.ipv4),
-                    IP_ARGS(c->rev_key.src.addr.ipv4),
-                    IP_ARGS(c->rev_key.dst.addr.ipv4),
-                    ntohs(c->key.src.port), ntohs(c->key.dst.port),
-                    ntohs(c->rev_key.src.port), ntohs(c->rev_key.dst.port),
-                    c->key.zone, c->rev_key.zone, c->key.nw_proto,
-                    c->rev_key.nw_proto);
-        } else {
-            char ip6_s[INET6_ADDRSTRLEN];
-            inet_ntop(AF_INET6, &c->key.src.addr.ipv6, ip6_s, sizeof ip6_s);
-            char ip6_d[INET6_ADDRSTRLEN];
-            inet_ntop(AF_INET6, &c->key.dst.addr.ipv6, ip6_d, sizeof ip6_d);
-            char ip6_rs[INET6_ADDRSTRLEN];
-            inet_ntop(AF_INET6, &c->rev_key.src.addr.ipv6, ip6_rs,
-                      sizeof ip6_rs);
-            char ip6_rd[INET6_ADDRSTRLEN];
-            inet_ntop(AF_INET6, &c->rev_key.dst.addr.ipv6, ip6_rd,
-                      sizeof ip6_rd);
-
-            CT_VLOG(rl_on, vll, "%s: src ip %s dst ip %s rev src ip %s"
-                    " rev dst ip %s src/dst ports %"PRIu16"/%"PRIu16
-                    " rev src/dst ports %"PRIu16"/%"PRIu16" zone/rev zone "
-                    "%"PRIu16"/%"PRIu16" nw_proto/rev nw_proto "
-                    "%"PRIu8"/%"PRIu8, log_msg, ip6_s, ip6_d, ip6_rs,
-                    ip6_rd, ntohs(c->key.src.port), ntohs(c->key.dst.port),
-                    ntohs(c->rev_key.src.port), ntohs(c->rev_key.dst.port),
-                    c->key.zone, c->rev_key.zone, c->key.nw_proto,
-                    c->rev_key.nw_proto);
-        }
-    }
-}
-
 /* Initializes the connection tracker 'ct'. The caller is responsible for
  * calling 'conntrack_destroy()', when the instance is not needed anymore */
 struct conntrack *
@@ -446,23 +390,28 @@ static void
 conn_clean(struct conntrack *ct, struct conn *conn)
     OVS_REQUIRES(ct->ct_lock)
 {
-    ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT);
+    uint32_t hash;
+
+    if (conn->cleaned) {
+        return;
+    }
 
     if (conn->alg) {
-        expectation_clean(ct, &conn->key);
+        expectation_clean(ct, &conn->key_node[CT_DIR_FWD].key);
     }
 
-    uint32_t hash = conn_key_hash(&conn->key, ct->hash_basis);
-    cmap_remove(&ct->conns, &conn->cm_node, hash);
+    hash = conn_key_hash(&conn->key_node[CT_DIR_FWD].key, ct->hash_basis);
+    cmap_remove(&ct->conns, &conn->key_node[CT_DIR_FWD].cm_node, hash);
 
     struct zone_limit *zl = zone_limit_lookup(ct, conn->admit_zone);
     if (zl && zl->czl.zone_limit_seq == conn->zone_limit_seq) {
         zl->czl.count--;
     }
 
-    if (conn->nat_conn) {
-        hash = conn_key_hash(&conn->nat_conn->key, ct->hash_basis);
-        cmap_remove(&ct->conns, &conn->nat_conn->cm_node, hash);
+    if (conn->nat_action) {
+        hash = conn_key_hash(&conn->key_node[CT_DIR_REV].key,
+                             ct->hash_basis);
+        cmap_remove(&ct->conns, &conn->key_node[CT_DIR_REV].cm_node, hash);
     }
     ovs_list_remove(&conn->exp_node);
     conn->cleaned = true;
@@ -476,15 +425,18 @@ conn_clean(struct conntrack *ct, struct conn *conn)
 void
 conntrack_destroy(struct conntrack *ct)
 {
+    struct conn_key_node *keyn;
     struct conn *conn;
     latch_set(&ct->clean_thread_exit);
     pthread_join(ct->clean_thread, NULL);
     latch_destroy(&ct->clean_thread_exit);
 
     ovs_mutex_lock(&ct->ct_lock);
-    CMAP_FOR_EACH (conn, cm_node, &ct->conns) {
-        if (conn->conn_type != CT_CONN_TYPE_DEFAULT) {
-            continue;
+    CMAP_FOR_EACH (keyn, cm_node, &ct->conns) {
+        if (keyn->dir == CT_DIR_FWD) {
+            conn = CONTAINER_OF(keyn, struct conn, key_node[CT_DIR_FWD]);
+        } else {
+            conn = CONTAINER_OF(keyn, struct conn, key_node[CT_DIR_REV]);
         }
 
         conn_clean(ct, conn);
@@ -526,31 +478,39 @@ conn_key_lookup(struct conntrack *ct, const struct conn_key *key,
                 uint32_t hash, long long now, struct conn **conn_out,
                 bool *reply)
 {
-    struct conn *conn;
+    struct conn_key_node *keyn;
+    struct conn *conn = NULL;
     bool found = false;
 
-    CMAP_FOR_EACH_WITH_HASH (conn, cm_node, hash, &ct->conns) {
-        if (!conn_key_cmp(&conn->key, key) && !conn_expired(conn, now)) {
-            found = true;
-            if (reply) {
-                *reply = false;
-            }
-            break;
+    CMAP_FOR_EACH_WITH_HASH (keyn, cm_node, hash, &ct->conns) {
+        if (keyn->dir == CT_DIR_FWD) {
+            conn = CONTAINER_OF(keyn, struct conn, key_node[CT_DIR_FWD]);
+        } else {
+            conn = CONTAINER_OF(keyn, struct conn, key_node[CT_DIR_REV]);
+        }
+
+        if (conn_expired(conn, now)) {
+            continue;
         }
-        if (!conn_key_cmp(&conn->rev_key, key) && !conn_expired(conn, now)) {
-            found = true;
-            if (reply) {
-                *reply = true;
+
+        for (int i = CT_DIR_FWD; i < CT_DIRS; i++) {
+            if (!conn_key_cmp(&conn->key_node[i].key, key)) {
+                found = true;
+                if (reply) {
+                    *reply = (i == CT_DIR_REV);
+                }
+                goto out_found;
             }
-            break;
         }
     }
 
+out_found:
     if (found && conn_out) {
         *conn_out = conn;
     } else if (conn_out) {
         *conn_out = NULL;
     }
+
     return found;
 }
 
@@ -584,7 +544,7 @@ write_ct_md(struct dp_packet *pkt, uint16_t zone, const struct conn *conn,
         if (conn->alg_related) {
             key = &conn->parent_key;
         } else {
-            key = &conn->key;
+            key = &conn->key_node[CT_DIR_FWD].key;
         }
     } else if (alg_exp) {
         pkt->md.ct_mark = alg_exp->parent_mark;
@@ -813,7 +773,8 @@ nat_inner_packet(struct dp_packet *pkt, struct conn_key *key,
 static void
 nat_packet(struct dp_packet *pkt, struct conn *conn, bool reply, bool related)
 {
-    struct conn_key *key = reply ? &conn->key : &conn->rev_key;
+    enum key_dir dir = reply ? CT_DIR_FWD : CT_DIR_REV;
+    struct conn_key *key = &conn->key_node[dir].key;
     uint16_t nat_action = reply ? nat_action_reverse(conn->nat_action)
                                 : conn->nat_action;
 
@@ -848,7 +809,7 @@ conn_seq_skew_set(struct conntrack *ct, const struct conn *conn_in,
 {
     struct conn *conn;
     ovs_mutex_unlock(&conn_in->lock);
-    conn_lookup(ct, &conn_in->key, now, &conn, NULL);
+    conn_lookup(ct, &conn_in->key_node[CT_DIR_FWD].key, now, &conn, NULL);
     ovs_mutex_lock(&conn_in->lock);
 
     if (conn && seq_skew) {
@@ -886,7 +847,6 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
     OVS_REQUIRES(ct->ct_lock)
 {
     struct conn *nc = NULL;
-    struct conn *nat_conn = NULL;
 
     if (!valid_new(pkt, &ctx->key)) {
         pkt->md.ct_state = CS_INVALID;
@@ -900,6 +860,7 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
     }
 
     if (commit) {
+        struct conn_key_node *fwd_key_node, *rev_key_node;
         struct zone_limit *zl = zone_limit_lookup_or_default(ct,
                                                              ctx->key.zone);
         if (zl && zl->czl.count >= zl->czl.limit) {
@@ -914,9 +875,12 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
         }
 
         nc = new_conn(ct, pkt, &ctx->key, now, tp_id);
-        memcpy(&nc->key, &ctx->key, sizeof nc->key);
-        memcpy(&nc->rev_key, &nc->key, sizeof nc->rev_key);
-        conn_key_reverse(&nc->rev_key);
+        fwd_key_node = &nc->key_node[CT_DIR_FWD];
+        rev_key_node = &nc->key_node[CT_DIR_REV];
+        memcpy(&fwd_key_node->key, &ctx->key, sizeof fwd_key_node->key);
+        memcpy(&rev_key_node->key, &fwd_key_node->key,
+               sizeof rev_key_node->key);
+        conn_key_reverse(&rev_key_node->key);
 
         if (ct_verify_helper(helper, ct_alg_ctl)) {
             nc->alg = nullable_xstrdup(helper);
@@ -931,45 +895,33 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
 
         if (nat_action_info) {
             nc->nat_action = nat_action_info->nat_action;
-            nat_conn = xzalloc(sizeof *nat_conn);
 
             if (alg_exp) {
                 if (alg_exp->nat_rpl_dst) {
-                    nc->rev_key.dst.addr = alg_exp->alg_nat_repl_addr;
+                    rev_key_node->key.dst.addr = alg_exp->alg_nat_repl_addr;
                     nc->nat_action = NAT_ACTION_SRC;
                 } else {
-                    nc->rev_key.src.addr = alg_exp->alg_nat_repl_addr;
+                    rev_key_node->key.src.addr = alg_exp->alg_nat_repl_addr;
                     nc->nat_action = NAT_ACTION_DST;
                 }
             } else {
-                memcpy(nat_conn, nc, sizeof *nat_conn);
-                bool nat_res = nat_get_unique_tuple(ct, nc, nat_conn,
-                                                    nat_action_info);
+                bool nat_res = nat_get_unique_tuple(ct, nc, nat_action_info);
 
                 if (!nat_res) {
                     goto nat_res_exhaustion;
                 }
-
-                /* Update nc with nat adjustments made to nat_conn by
-                 * nat_get_unique_tuple(). */
-                memcpy(nc, nat_conn, sizeof *nc);
             }
 
             nat_packet(pkt, nc, false, ctx->icmp_related);
-            memcpy(&nat_conn->key, &nc->rev_key, sizeof nat_conn->key);
-            memcpy(&nat_conn->rev_key, &nc->key, sizeof nat_conn->rev_key);
-            nat_conn->conn_type = CT_CONN_TYPE_UN_NAT;
-            nat_conn->nat_action = 0;
-            nat_conn->alg = NULL;
-            nat_conn->nat_conn = NULL;
-            uint32_t nat_hash = conn_key_hash(&nat_conn->key, ct->hash_basis);
-            cmap_insert(&ct->conns, &nat_conn->cm_node, nat_hash);
+            uint32_t rev_hash = conn_key_hash(&rev_key_node->key,
+                                              ct->hash_basis);
+            cmap_insert(&ct->conns, &rev_key_node->cm_node, rev_hash);
         }
 
-        nc->nat_conn = nat_conn;
         ovs_mutex_init_adaptive(&nc->lock);
-        nc->conn_type = CT_CONN_TYPE_DEFAULT;
-        cmap_insert(&ct->conns, &nc->cm_node, ctx->hash);
+        fwd_key_node->dir = CT_DIR_FWD;
+        rev_key_node->dir = CT_DIR_REV;
+        cmap_insert(&ct->conns, &fwd_key_node->cm_node, ctx->hash);
         atomic_count_inc(&ct->n_conn);
         ctx->conn = nc; /* For completeness. */
         if (zl) {
@@ -989,7 +941,6 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
      * firewall rules or a separate firewall. Also using zone partitioning
      * can limit DoS impact. */
 nat_res_exhaustion:
-    free(nat_conn);
     ovs_list_remove(&nc->exp_node);
     delete_conn__(nc);
     static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5);
@@ -1003,7 +954,6 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt,
                   struct conn_lookup_ctx *ctx, struct conn *conn,
                   long long now)
 {
-    ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT);
     bool create_new_conn = false;
 
     if (ctx->icmp_related) {
@@ -1031,7 +981,8 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt,
                 break;
             case CT_UPDATE_NEW:
                 ovs_mutex_lock(&ct->ct_lock);
-                if (conn_lookup(ct, &conn->key, now, NULL, NULL)) {
+                if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key,
+                                now, NULL, NULL)) {
                     conn_clean(ct, conn);
                 }
                 ovs_mutex_unlock(&ct->ct_lock);
@@ -1208,8 +1159,10 @@ initial_conn_lookup(struct conntrack *ct, struct conn_lookup_ctx *ctx,
 
     if (natted) {
         if (OVS_LIKELY(ctx->conn)) {
+            enum key_dir dir;
             ctx->reply = !ctx->reply;
-            ctx->key = ctx->reply ? ctx->conn->rev_key : ctx->conn->key;
+            dir = ctx->reply ? CT_DIR_REV : CT_DIR_FWD;
+            ctx->key = ctx->conn->key_node[dir].key;
             ctx->hash = conn_key_hash(&ctx->key, ct->hash_basis);
         } else {
             /* A lookup failure does not necessarily imply that an
@@ -1243,32 +1196,14 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
     /* Delete found entry if in wrong direction. 'force' implies commit. */
     if (OVS_UNLIKELY(force && ctx->reply && conn)) {
         ovs_mutex_lock(&ct->ct_lock);
-        if (conn_lookup(ct, &conn->key, now, NULL, NULL)) {
+        if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key,
+                        now, NULL, NULL)) {
             conn_clean(ct, conn);
         }
         ovs_mutex_unlock(&ct->ct_lock);
         conn = NULL;
     }
 
-    if (OVS_LIKELY(conn)) {
-        if (conn->conn_type == CT_CONN_TYPE_UN_NAT) {
-
-            ctx->reply = true;
-            struct conn *rev_conn = conn; /* Save for debugging. */
-            uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis);
-            conn_key_lookup(ct, &ctx->key, hash, now, &conn, &ctx->reply);
-
-            if (!conn) {
-                pkt->md.ct_state |= CS_INVALID;
-                write_ct_md(pkt, zone, NULL, NULL, NULL);
-                char *log_msg = xasprintf("Missing parent conn %p", rev_conn);
-                ct_print_conn_info(rev_conn, log_msg, VLL_INFO, true, true);
-                free(log_msg);
-                return;
-            }
-        }
-    }
-
     enum ct_alg_ctl_type ct_alg_ctl = get_alg_ctl_type(pkt, tp_src, tp_dst,
                                                        helper);
 
@@ -1361,8 +1296,9 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
         struct conn *conn = packet->md.conn;
         if (OVS_UNLIKELY(packet->md.ct_state == CS_INVALID)) {
             write_ct_md(packet, zone, NULL, NULL, NULL);
-        } else if (conn && conn->key.zone == zone && !force
-                   && !get_alg_ctl_type(packet, tp_src, tp_dst, helper)) {
+        } else if (conn &&
+                   conn->key_node[CT_DIR_FWD].key.zone == zone && !force &&
+                   !get_alg_ctl_type(packet, tp_src, tp_dst, helper)) {
             process_one_fast(zone, setmark, setlabel, nat_action_info,
                              conn, packet);
         } else if (OVS_UNLIKELY(!conn_key_extract(ct, packet, dl_type, &ctx,
@@ -2142,7 +2078,7 @@ nat_ipv6_addr_increment(struct in6_addr *ipv6, uint32_t increment)
 }
 
 static uint32_t
-nat_range_hash(const struct conn *conn, uint32_t basis,
+nat_range_hash(const struct conn_key *key, uint32_t basis,
                const struct nat_action_info_t *nat_info)
 {
     uint32_t hash = basis;
@@ -2152,11 +2088,11 @@ nat_range_hash(const struct conn *conn, uint32_t basis,
     hash = hash_add(hash,
                     ((uint32_t) nat_info->max_port << 16)
                     | nat_info->min_port);
-    hash = ct_endpoint_hash_add(hash, &conn->key.src);
-    hash = ct_endpoint_hash_add(hash, &conn->key.dst);
-    hash = hash_add(hash, (OVS_FORCE uint32_t) conn->key.dl_type);
-    hash = hash_add(hash, conn->key.nw_proto);
-    hash = hash_add(hash, conn->key.zone);
+    hash = ct_endpoint_hash_add(hash, &key->src);
+    hash = ct_endpoint_hash_add(hash, &key->dst);
+    hash = hash_add(hash, (OVS_FORCE uint32_t) key->dl_type);
+    hash = hash_add(hash, key->nw_proto);
+    hash = hash_add(hash, key->zone);
 
     /* The purpose of the second parameter is to distinguish hashes of data of
      * different length; our data always has the same length so there is no
@@ -2230,7 +2166,7 @@ get_addr_in_range(union ct_addr *min, union ct_addr *max,
 }
 
 static void
-get_initial_addr(const struct conn *conn, union ct_addr *min,
+get_initial_addr(const struct conn_key *key, union ct_addr *min,
                  union ct_addr *max, union ct_addr *curr,
                  uint32_t hash, bool ipv4,
                  const struct nat_action_info_t *nat_info)
@@ -2240,9 +2176,9 @@ get_initial_addr(const struct conn *conn, union ct_addr *min,
     /* All-zero case. */
     if (!memcmp(min, &zero_ip, sizeof *min)) {
         if (nat_info->nat_action & NAT_ACTION_SRC) {
-            *curr = conn->key.src.addr;
+            *curr = key->src.addr;
         } else if (nat_info->nat_action & NAT_ACTION_DST) {
-            *curr = conn->key.dst.addr;
+            *curr = key->dst.addr;
         }
     } else {
         get_addr_in_range(min, max, curr, hash, ipv4);
@@ -2306,7 +2242,7 @@ next_addr_in_range_guarded(union ct_addr *curr, union ct_addr *min,
 }
 
 static bool
-nat_get_unique_l4(struct conntrack *ct, struct conn *nat_conn,
+nat_get_unique_l4(struct conntrack *ct, struct conn_key *rev_key,
                   ovs_be16 *port, uint16_t curr, uint16_t min,
                   uint16_t max)
 {
@@ -2314,8 +2250,7 @@ nat_get_unique_l4(struct conntrack *ct, struct conn *nat_conn,
 
     FOR_EACH_PORT_IN_RANGE (curr, min, max) {
         *port = htons(curr);
-        if (!conn_lookup(ct, &nat_conn->rev_key,
-                         time_msec(), NULL, NULL)) {
+        if (!conn_lookup(ct, rev_key, time_msec(), NULL, NULL)) {
             return true;
         }
     }
@@ -2347,45 +2282,45 @@ nat_get_unique_l4(struct conntrack *ct, struct conn *nat_conn,
  *
  * If none can be found, return exhaustion to the caller. */
 static bool
-nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn,
-                     struct conn *nat_conn,
+nat_get_unique_tuple(struct conntrack *ct, struct conn *conn,
                      const struct nat_action_info_t *nat_info)
 {
+    struct conn_key *fwd_key = &conn->key_node[CT_DIR_FWD].key;
+    struct conn_key *rev_key = &conn->key_node[CT_DIR_REV].key;
     union ct_addr min_addr = {0}, max_addr = {0}, curr_addr = {0},
                   guard_addr = {0};
-    uint32_t hash = nat_range_hash(conn, ct->hash_basis, nat_info);
-    bool pat_proto = conn->key.nw_proto == IPPROTO_TCP ||
-                     conn->key.nw_proto == IPPROTO_UDP;
+    bool pat_proto = fwd_key->nw_proto == IPPROTO_TCP ||
+                     fwd_key->nw_proto == IPPROTO_UDP;
     uint16_t min_dport, max_dport, curr_dport;
     uint16_t min_sport, max_sport, curr_sport;
+    uint32_t hash;
 
+    hash = nat_range_hash(fwd_key, ct->hash_basis, nat_info);
     min_addr = nat_info->min_addr;
     max_addr = nat_info->max_addr;
 
-    get_initial_addr(conn, &min_addr, &max_addr, &curr_addr, hash,
-                     (conn->key.dl_type == htons(ETH_TYPE_IP)), nat_info);
+    get_initial_addr(fwd_key, &min_addr, &max_addr, &curr_addr, hash,
+                     (fwd_key->dl_type == htons(ETH_TYPE_IP)), nat_info);
 
     /* Save the address we started from so that
      * we can stop once we reach it.
*/ guard_addr = curr_addr; - set_sport_range(nat_info, &conn->key, hash, &curr_sport, + set_sport_range(nat_info, fwd_key, hash, &curr_sport, &min_sport, &max_sport); - set_dport_range(nat_info, &conn->key, hash, &curr_dport, + set_dport_range(nat_info, fwd_key, hash, &curr_dport, &min_dport, &max_dport); if (pat_proto) { - nat_conn->rev_key.src.port = htons(curr_dport); - nat_conn->rev_key.dst.port = htons(curr_sport); + rev_key->src.port = htons(curr_dport); + rev_key->dst.port = htons(curr_sport); } another_round: - store_addr_to_key(&curr_addr, &nat_conn->rev_key, - nat_info->nat_action); + store_addr_to_key(&curr_addr, rev_key, nat_info->nat_action); if (!pat_proto) { - if (!conn_lookup(ct, &nat_conn->rev_key, - time_msec(), NULL, NULL)) { + if (!conn_lookup(ct, rev_key, time_msec(), NULL, NULL)) { return true; } @@ -2394,12 +2329,12 @@ another_round: bool found = false; if (nat_info->nat_action & NAT_ACTION_DST_PORT) { - found = nat_get_unique_l4(ct, nat_conn, &nat_conn->rev_key.src.port, + found = nat_get_unique_l4(ct, rev_key, &rev_key->src.port, curr_dport, min_dport, max_dport); } if (!found) { - found = nat_get_unique_l4(ct, nat_conn, &nat_conn->rev_key.dst.port, + found = nat_get_unique_l4(ct, rev_key, &rev_key->dst.port, curr_sport, min_sport, max_sport); } @@ -2412,7 +2347,7 @@ another_round: next_addr: if (next_addr_in_range_guarded(&curr_addr, &min_addr, &max_addr, &guard_addr, - conn->key.dl_type == htons(ETH_TYPE_IP))) { + fwd_key->dl_type == htons(ETH_TYPE_IP))) { return false; } @@ -2424,23 +2359,20 @@ conn_update(struct conntrack *ct, struct conn *conn, struct dp_packet *pkt, struct conn_lookup_ctx *ctx, long long now) { ovs_mutex_lock(&conn->lock); + uint8_t nw_proto = conn->key_node[CT_DIR_FWD].key.nw_proto; enum ct_update_res update_res = - l4_protos[conn->key.nw_proto]->conn_update(ct, conn, pkt, ctx->reply, - now); + l4_protos[nw_proto]->conn_update(ct, conn, pkt, ctx->reply, now); ovs_mutex_unlock(&conn->lock); return update_res; } static 
bool -conn_expired(struct conn *conn, long long now) +conn_expired(const struct conn *conn, long long now) { - if (conn->conn_type == CT_CONN_TYPE_DEFAULT) { - ovs_mutex_lock(&conn->lock); - bool expired = now >= conn->expiration ? true : false; - ovs_mutex_unlock(&conn->lock); - return expired; - } - return false; + ovs_mutex_lock(&conn->lock); + bool expired = now >= conn->expiration ? true : false; + ovs_mutex_unlock(&conn->lock); + return expired; } static bool @@ -2466,9 +2398,7 @@ delete_conn__(struct conn *conn) static void delete_conn(struct conn *conn) { - ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT); ovs_mutex_destroy(&conn->lock); - free(conn->nat_conn); delete_conn__(conn); } @@ -2560,11 +2490,14 @@ static void conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, long long now) { + const struct conn_key *rev_key = &conn->key_node[CT_DIR_REV].key; + const struct conn_key *key = &conn->key_node[CT_DIR_FWD].key; + memset(entry, 0, sizeof *entry); - conn_key_to_tuple(&conn->key, &entry->tuple_orig); - conn_key_to_tuple(&conn->rev_key, &entry->tuple_reply); + conn_key_to_tuple(key, &entry->tuple_orig); + conn_key_to_tuple(rev_key, &entry->tuple_reply); - entry->zone = conn->key.zone; + entry->zone = key->zone; ovs_mutex_lock(&conn->lock); entry->mark = conn->mark; @@ -2572,7 +2505,7 @@ conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry, long long expiration = conn->expiration - now; - struct ct_l4_proto *class = l4_protos[conn->key.nw_proto]; + struct ct_l4_proto *class = l4_protos[key->nw_proto]; if (class->conn_get_protoinfo) { class->conn_get_protoinfo(conn, &entry->protoinfo); } @@ -2620,10 +2553,21 @@ conntrack_dump_next(struct conntrack_dump *dump, struct ct_dpif_entry *entry) if (!cm_node) { break; } + struct conn_key_node *keyn; struct conn *conn; - INIT_CONTAINER(conn, cm_node, cm_node); - if ((!dump->filter_zone || conn->key.zone == dump->zone) && - (conn->conn_type != CT_CONN_TYPE_UN_NAT)) { + 
+ INIT_CONTAINER(keyn, cm_node, cm_node); + + if (keyn->dir != CT_DIR_FWD) { + continue; + } + + conn = CONTAINER_OF(keyn, struct conn, key_node[CT_DIR_FWD]); + if (conn_expired(conn, now)) { + continue; + } + + if ((!dump->filter_zone || keyn->key.zone == dump->zone)) { conn_to_ct_dpif_entry(conn, entry, now); return 0; } @@ -2641,15 +2585,17 @@ conntrack_dump_done(struct conntrack_dump *dump OVS_UNUSED) int conntrack_flush(struct conntrack *ct, const uint16_t *zone) { + struct conn_key_node *keyn; struct conn *conn; ovs_mutex_lock(&ct->ct_lock); - CMAP_FOR_EACH (conn, cm_node, &ct->conns) { - if (conn->conn_type != CT_CONN_TYPE_DEFAULT) { + CMAP_FOR_EACH (keyn, cm_node, &ct->conns) { + if (keyn->dir != CT_DIR_FWD) { continue; } - if (!zone || *zone == conn->key.zone) { + conn = CONTAINER_OF(keyn, struct conn, key_node[CT_DIR_FWD]); + if (!zone || *zone == keyn->key.zone) { conn_clean(ct, conn); } } @@ -2662,19 +2608,19 @@ int conntrack_flush_tuple(struct conntrack *ct, const struct ct_dpif_tuple *tuple, uint16_t zone) { - int error = 0; struct conn_key key; struct conn *conn; + int error = 0; memset(&key, 0, sizeof(key)); tuple_to_conn_key(tuple, zone, &key); ovs_mutex_lock(&ct->ct_lock); conn_lookup(ct, &key, time_msec(), &conn, NULL); - if (conn && conn->conn_type == CT_CONN_TYPE_DEFAULT) { + if (conn) { conn_clean(ct, conn); } else { - VLOG_WARN("Must flush tuple using the original pre-NATed tuple"); + VLOG_WARN("Tuple not found"); error = ENOENT; } @@ -2818,50 +2764,54 @@ expectation_create(struct conntrack *ct, ovs_be16 dst_port, const struct conn *parent_conn, bool reply, bool src_ip_wc, bool skip_nat) { + const struct conn_key *pconn_key, *pconn_rev_key; union ct_addr src_addr; union ct_addr dst_addr; union ct_addr alg_nat_repl_addr; struct alg_exp_node *alg_exp_node = xzalloc(sizeof *alg_exp_node); + pconn_key = &parent_conn->key_node[CT_DIR_FWD].key; + pconn_rev_key = &parent_conn->key_node[CT_DIR_REV].key; + if (reply) { - src_addr = 
parent_conn->key.src.addr; - dst_addr = parent_conn->key.dst.addr; + src_addr = pconn_key->src.addr; + dst_addr = pconn_key->dst.addr; alg_exp_node->nat_rpl_dst = true; if (skip_nat) { alg_nat_repl_addr = dst_addr; } else if (parent_conn->nat_action & NAT_ACTION_DST) { - alg_nat_repl_addr = parent_conn->rev_key.src.addr; + alg_nat_repl_addr = pconn_rev_key->src.addr; alg_exp_node->nat_rpl_dst = false; } else { - alg_nat_repl_addr = parent_conn->rev_key.dst.addr; + alg_nat_repl_addr = pconn_rev_key->dst.addr; } } else { - src_addr = parent_conn->rev_key.src.addr; - dst_addr = parent_conn->rev_key.dst.addr; + src_addr = pconn_rev_key->src.addr; + dst_addr = pconn_rev_key->dst.addr; alg_exp_node->nat_rpl_dst = false; if (skip_nat) { alg_nat_repl_addr = src_addr; } else if (parent_conn->nat_action & NAT_ACTION_DST) { - alg_nat_repl_addr = parent_conn->key.dst.addr; + alg_nat_repl_addr = pconn_key->dst.addr; alg_exp_node->nat_rpl_dst = true; } else { - alg_nat_repl_addr = parent_conn->key.src.addr; + alg_nat_repl_addr = pconn_key->src.addr; } } if (src_ip_wc) { memset(&src_addr, 0, sizeof src_addr); } - alg_exp_node->key.dl_type = parent_conn->key.dl_type; - alg_exp_node->key.nw_proto = parent_conn->key.nw_proto; - alg_exp_node->key.zone = parent_conn->key.zone; + alg_exp_node->key.dl_type = pconn_key->dl_type; + alg_exp_node->key.nw_proto = pconn_key->nw_proto; + alg_exp_node->key.zone = pconn_key->zone; alg_exp_node->key.src.addr = src_addr; alg_exp_node->key.dst.addr = dst_addr; alg_exp_node->key.src.port = ALG_WC_SRC_PORT; alg_exp_node->key.dst.port = dst_port; alg_exp_node->parent_mark = parent_conn->mark; alg_exp_node->parent_label = parent_conn->label; - memcpy(&alg_exp_node->parent_key, &parent_conn->key, + memcpy(&alg_exp_node->parent_key, pconn_key, sizeof alg_exp_node->parent_key); /* Take the write lock here because it is almost 100% * likely that the lookup will fail and @@ -3113,12 +3063,16 @@ process_ftp_ctl_v4(struct conntrack *ct, switch (mode) { case 
CT_FTP_MODE_ACTIVE: - *v4_addr_rep = conn_for_expectation->rev_key.dst.addr.ipv4; - conn_ipv4_addr = conn_for_expectation->key.src.addr.ipv4; + *v4_addr_rep = + conn_for_expectation->key_node[CT_DIR_REV].key.dst.addr.ipv4; + conn_ipv4_addr = + conn_for_expectation->key_node[CT_DIR_FWD].key.src.addr.ipv4; break; case CT_FTP_MODE_PASSIVE: - *v4_addr_rep = conn_for_expectation->key.dst.addr.ipv4; - conn_ipv4_addr = conn_for_expectation->rev_key.src.addr.ipv4; + *v4_addr_rep = + conn_for_expectation->key_node[CT_DIR_FWD].key.dst.addr.ipv4; + conn_ipv4_addr = + conn_for_expectation->key_node[CT_DIR_REV].key.src.addr.ipv4; break; case CT_TFTP_MODE: default: @@ -3150,7 +3104,7 @@ skip_ipv6_digits(char *str) static enum ftp_ctl_pkt process_ftp_ctl_v6(struct conntrack *ct, struct dp_packet *pkt, - const struct conn *conn_for_expectation, + const struct conn *conn_for_exp, union ct_addr *v6_addr_rep, char **ftp_data_start, size_t *addr_offset_from_ftp_data_start, size_t *addr_size, enum ct_alg_mode *mode) @@ -3218,24 +3172,25 @@ process_ftp_ctl_v6(struct conntrack *ct, switch (*mode) { case CT_FTP_MODE_ACTIVE: - *v6_addr_rep = conn_for_expectation->rev_key.dst.addr; + *v6_addr_rep = conn_for_exp->key_node[CT_DIR_REV].key.dst.addr; /* Although most servers will block this exploit, there may be some * less well managed. 
*/ if (memcmp(&ip6_addr, &v6_addr_rep->ipv6, sizeof ip6_addr) && - memcmp(&ip6_addr, &conn_for_expectation->key.src.addr.ipv6, + memcmp(&ip6_addr, + &conn_for_exp->key_node[CT_DIR_FWD].key.src.addr.ipv6, sizeof ip6_addr)) { return CT_FTP_CTL_INVALID; } break; case CT_FTP_MODE_PASSIVE: - *v6_addr_rep = conn_for_expectation->key.dst.addr; + *v6_addr_rep = conn_for_exp->key_node[CT_DIR_FWD].key.dst.addr; break; case CT_TFTP_MODE: default: OVS_NOT_REACHED(); } - expectation_create(ct, port, conn_for_expectation, + expectation_create(ct, port, conn_for_exp, !!(pkt->md.ct_state & CS_REPLY_DIR), false, false); return CT_FTP_CTL_INTEREST; } @@ -3389,7 +3344,8 @@ handle_tftp_ctl(struct conntrack *ct, long long now OVS_UNUSED, enum ftp_ctl_pkt ftp_ctl OVS_UNUSED, bool nat OVS_UNUSED) { - expectation_create(ct, conn_for_expectation->key.src.port, + expectation_create(ct, + conn_for_expectation->key_node[CT_DIR_FWD].key.src.port, conn_for_expectation, !!(pkt->md.ct_state & CS_REPLY_DIR), false, false); }