From patchwork Wed Apr 8 17:05:57 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Aaron Conole
X-Patchwork-Id: 2221004
To: dev@openvswitch.org
Date: Wed, 8 Apr 2026 13:05:57 -0400
Message-ID: <20260408170613.587902-2-aconole@redhat.com>
In-Reply-To: <20260408170613.587902-1-aconole@redhat.com>
References: <20260408170613.587902-1-aconole@redhat.com>
Subject: [ovs-dev] [RFC 01/12] conntrack: Add per-conn storage for conntrack modules.
From: Aaron Conole
Reply-To: Aaron Conole
Cc: Eli Britstein, Florian Westphal, Flavio Leitner

Currently, if a conntrack submodule wants to add per-connection private
details, the pattern looks like:

    struct private_conn {
        struct conn conn_;
        ... private data ...
    };
    ...
    new_conn = xmalloc(sizeof(struct private_conn));
    ...
    return &new_conn->conn_;
    ...
    struct private_conn *module_conn = (struct private_conn *)conn_;

This is a common pattern: allocation is delegated to the submodule, and
the main processing code assumes that every module places a 'struct conn'
at the head of its connection struct.

However, this means that storage details can't be shared between modules
in a convenient way without leaking details of a module's underlying
implementation. For example, TCP-based connections may want to share
some TCP block details without exposing the full internals of the
private TCP connection module.

To facilitate this, introduce a private storage section in connection
objects. This allows storing pre-defined details that each module can
fill in, guaranteeing a degree of compatibility without completely
exposing the internals. It will be used in upcoming commits.

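The slot-based storage described above can be pictured with a small standalone sketch. This is not the OVS code: every demo_* name is hypothetical, the registry is single-threaded (no atomics, no conn->lock), and the only assumption carried over from the description is that each connection holds a fixed array of pointers indexed by a small integer ID with an optional per-slot destructor.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for the limits introduced by the patch. */
#define DEMO_PRIVATE_MAX 8
#define DEMO_ID_INVALID UINT_MAX

/* Simplified connection: one opaque pointer per registered module. */
struct demo_conn {
    void *private[DEMO_PRIVATE_MAX];
};

/* Process-wide registry: one optional destructor per allocated slot. */
static void (*demo_dtors[DEMO_PRIVATE_MAX])(void *);
static unsigned int demo_next_id;

/* Reserves the next free slot, or returns DEMO_ID_INVALID when full. */
static unsigned int
demo_slot_alloc(void (*dtor)(void *))
{
    if (demo_next_id >= DEMO_PRIVATE_MAX) {
        return DEMO_ID_INVALID;
    }
    demo_dtors[demo_next_id] = dtor;
    return demo_next_id++;
}

/* O(1) accessors: the slot ID indexes directly into the array. */
static void
demo_conn_set(struct demo_conn *conn, unsigned int id, void *data)
{
    assert(id < DEMO_PRIVATE_MAX);
    conn->private[id] = data;
}

static void *
demo_conn_get(const struct demo_conn *conn, unsigned int id)
{
    assert(id < DEMO_PRIVATE_MAX);
    return conn->private[id];
}

/* Runs the destructor of every populated slot, then clears the slot. */
static void
demo_conn_cleanup(struct demo_conn *conn)
{
    for (unsigned int i = 0; i < demo_next_id; i++) {
        if (demo_dtors[i] && conn->private[i]) {
            demo_dtors[i](conn->private[i]);
        }
        conn->private[i] = NULL;
    }
}

/* Sample destructor that only counts how often it was invoked. */
static int demo_dtor_calls;

static void
demo_count_dtor(void *data)
{
    (void) data;
    demo_dtor_calls++;
}
```

A module would call demo_slot_alloc() once at initialization, keep the returned ID in a static variable, and then use demo_conn_set() and demo_conn_get() with that ID on each connection. Other modules only need the ID and the agreed layout of the pointed-to data, not the owning module's full connection struct.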
Signed-off-by: Aaron Conole --- lib/conntrack-private.h | 26 ++++++++ lib/conntrack.c | 44 ++++++++++++++ lib/conntrack.h | 39 ++++++++++++ tests/library.at | 18 ++++++ tests/test-conntrack.c | 130 +++++++++++++++++++++++++++++++++++++++- 5 files changed, 256 insertions(+), 1 deletion(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index f1132e8aa8..bd095277cd 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -156,6 +156,10 @@ struct conn { bool alg_related; /* True if alg data connection. */ uint32_t tp_id; /* Timeout policy ID. */ + + /* Private per-module storage. Indexed by ct_private_id_t values obtained + * via conn_private_id_alloc(). Access is protected by conn->lock. */ + void *private[CT_CONN_PRIVATE_MAX]; }; enum ct_update_res { @@ -264,4 +268,26 @@ struct ct_l4_proto { struct ct_dpif_protoinfo *); }; +/* conn_private_get() / conn_private_set() + * + * Fast-path accessors for per-connection private storage slots. Both + * functions are static inlines so they compile to a single load/store with + * bounds-checking asserts that disappear in release builds. + * + * The caller must hold conn->lock when accessing the pointer. + */ +static inline void * +conn_private_get(const struct conn *conn, ct_private_id_t id) +{ + ovs_assert(id < CT_CONN_PRIVATE_MAX); + return conn->private[id]; +} + +static inline void +conn_private_set(struct conn *conn, ct_private_id_t id, void *data) +{ + ovs_assert(id < CT_CONN_PRIVATE_MAX); + conn->private[id] = data; +} + #endif /* conntrack-private.h */ diff --git a/lib/conntrack.c b/lib/conntrack.c index e25cc25ca8..373c781eb9 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -157,6 +157,19 @@ expectation_clean(struct conntrack *ct, const struct conn_key *parent_key); static struct ct_l4_proto *l4_protos[UINT8_MAX + 1]; +/* Private per-connection storage slot registry. 
+ * + * ct_private_slots[] is written once per slot at module initialization (via + * conn_private_id_alloc()) and then read-only for the lifetime of the process, + * so no additional locking is required to read the destructor pointer. + */ +struct ct_private_slot { + void (*destructor)(void *); /* NULL means no cleanup required. */ +}; + +static struct ct_private_slot ct_private_slots[CT_CONN_PRIVATE_MAX]; +static atomic_uint32_t ct_private_next_id = 0; + static void handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx, struct dp_packet *pkt, struct conn *ec, long long now, @@ -607,6 +620,27 @@ conn_force_expire(struct conn *conn) atomic_store_relaxed(&conn->expiration, 0); } +ct_private_id_t +conn_private_id_alloc(void (*destructor)(void *)) +{ + uint32_t id; + + atomic_add(&ct_private_next_id, 1u, &id); + if (id >= CT_CONN_PRIVATE_MAX) { + /* Undo the increment so the counter doesn't overflow. + * Because we are not supposed to call this after ct initialization, + * there shouldn't be an access race here. */ + atomic_sub(&ct_private_next_id, 1u, &id); + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); + VLOG_ERR_RL(&rl, "conntrack: all %d private storage slots are in use; " + "cannot allocate a new one", CT_CONN_PRIVATE_MAX); + return CT_PRIVATE_ID_INVALID; + } + + ct_private_slots[id].destructor = destructor; + return id; +} + /* Destroys the connection tracker 'ct' and frees all the allocated memory. * The caller of this function must already have shut down packet input * and PMD threads (which would have been quiesced). */ @@ -2719,6 +2753,16 @@ new_conn(struct conntrack *ct, struct dp_packet *pkt, struct conn_key *key, static void delete_conn__(struct conn *conn) { + uint32_t n; + + /* Invoke registered destructors for any non-NULL private slots. 
*/ + atomic_read_relaxed(&ct_private_next_id, &n); + for (uint32_t i = 0; i < n; i++) { + if (ct_private_slots[i].destructor && conn->private[i]) { + ct_private_slots[i].destructor(conn->private[i]); + } + } + free(conn->alg); free(conn); } diff --git a/lib/conntrack.h b/lib/conntrack.h index c3136e9554..e5ca1528bf 100644 --- a/lib/conntrack.h +++ b/lib/conntrack.h @@ -91,6 +91,45 @@ struct nat_action_info_t { uint16_t nat_flags; }; +/* Private per-connection storage slots. + * + * Modules (protocol handlers, offload interfaces, etc.) can reserve a slot + * at initialization time and use it to attach private data to every tracked + * connection. Slot IDs are small integers that index directly into a fixed- + * size array inside struct conn, so get/set operations are O(1) and branch- + * free, safe to call on the datapath fast path. + * + * Usage + * ----- + * // At module initialization, allocate and store the returned id. + * static ct_private_id_t my_id; + * my_id = conn_private_id_alloc(my_conn_data_free); + * + * // On the fast path (no lock needed beyond conn->lock for the pointer). + * conn_private_set(conn, my_id, my_data); + * my_data = conn_private_get(conn, my_id); + * + * Thread-safety + * ------------- + * The pointer slot itself is protected by conn->lock. The pointed-to data + * is the responsibility of the registering module. + */ + +/* Maximum number of private storage slots available per connection. */ +#define CT_CONN_PRIVATE_MAX 8 + +typedef unsigned int ct_private_id_t; + +/* Returned by conn_private_id_alloc() when no slots remain. */ +#define CT_PRIVATE_ID_INVALID UINT_MAX + +/* Allocate a private storage slot. 'destructor' (may be NULL) is called with + * the stored pointer when a connection is freed; it must be safe to call with + * a NULL argument. Returns CT_PRIVATE_ID_INVALID on failure (all slots + * taken). Must be called before any connection is created that should carry + * this slot (i.e. at module initialization time). 
*/ +ct_private_id_t conn_private_id_alloc(void (*destructor)(void *)); + struct conntrack *conntrack_init(void); void conntrack_destroy(struct conntrack *); diff --git a/tests/library.at b/tests/library.at index 449f15fd5a..6c5b55f045 100644 --- a/tests/library.at +++ b/tests/library.at @@ -307,3 +307,21 @@ AT_CLEANUP AT_SETUP([Conntrack Library - FTP ALG parsing]) AT_CHECK([ovstest test-conntrack ftp-alg-large-payload]) AT_CLEANUP + +AT_SETUP([conntrack private storage - id alloc]) +AT_KEYWORDS([conntrack]) +AT_CHECK([ovstest test-conntrack private-id-alloc], [0], [. +]) +AT_CLEANUP + +AT_SETUP([conntrack private storage - slot exhaustion]) +AT_KEYWORDS([conntrack]) +AT_CHECK([ovstest test-conntrack private-id-exhaustion], [0], [......... +], [ignore]) +AT_CLEANUP + +AT_SETUP([conntrack private storage - destructor]) +AT_KEYWORDS([conntrack]) +AT_CHECK([ovstest test-conntrack private-destructor], [0], [. +]) +AT_CLEANUP diff --git a/tests/test-conntrack.c b/tests/test-conntrack.c index 22db95f914..7f42adbb55 100644 --- a/tests/test-conntrack.c +++ b/tests/test-conntrack.c @@ -16,6 +16,7 @@ #include #include "conntrack.h" +#include "conntrack-private.h" #include "dp-packet.h" #include "fatal-signal.h" @@ -492,7 +493,7 @@ test_pcap(struct ovs_cmdl_context *ctx) ovs_pcap_close(pcap); } -/* ALG related testing. */ +/* Conntrack functional testing. */ /* FTP IPv4 PORT payload for testing. */ #define FTP_PORT_CMD_STR "PORT 192,168,123,2,113,42\r\n" @@ -572,6 +573,124 @@ test_ftp_alg_large_payload(struct ovs_cmdl_context *ctx OVS_UNUSED) conntrack_destroy(ct); } +/* Verify that conn_private_id_alloc() returns a valid slot ID and that the + * idiomatic "store the ID in a static variable at module init" pattern works. + */ +static void +test_private_id_alloc(struct ovs_cmdl_context *ctx OVS_UNUSED) +{ + /* Mirrors the real-world pattern: a module stores its slot ID in a static + * so it is initialised once and available everywhere in the translation + * unit. 
*/ + static ct_private_id_t my_id = CT_PRIVATE_ID_INVALID; + + my_id = conn_private_id_alloc(NULL); + + ovs_assert(my_id != CT_PRIVATE_ID_INVALID); + + ovs_assert(my_id < CT_CONN_PRIVATE_MAX); + + /* The first allocation must yield slot 0. */ + ovs_assert(my_id == 0); + printf(".\n"); +} + +/* Allocate every available slot and confirm that the next request returns + * CT_PRIVATE_ID_INVALID. Each successful allocation prints one dot so the + * .at test can verify both the count and the error behaviour. + */ +static void +test_private_id_exhaustion(struct ovs_cmdl_context *ctx OVS_UNUSED) +{ + ct_private_id_t ids[CT_CONN_PRIVATE_MAX]; + + /* Fill all CT_CONN_PRIVATE_MAX slots. */ + for (unsigned int i = 0; i < CT_CONN_PRIVATE_MAX; i++) { + ids[i] = conn_private_id_alloc(NULL); + ovs_assert(ids[i] != CT_PRIVATE_ID_INVALID); + + ovs_assert(ids[i] == i); + printf("."); + } + + /* The very next allocation must fail. */ + ct_private_id_t extra = conn_private_id_alloc(NULL); + ovs_assert(extra == CT_PRIVATE_ID_INVALID); + printf(".\n"); +} + +/* Globals written by the destructor callback used in test 3. */ +static int dtor_call_count = 0; +static void *dtor_last_ptr = NULL; + +static void +record_destructor(void *data) +{ + dtor_call_count++; + dtor_last_ptr = data; +} + +/* Register a destructor, commit a real connection, attach a sentinel pointer + * as private data, then destroy the conntrack instance. After draining the + * RCU queue (ovsrcu_exit) the destructor must have been called exactly + * once with the sentinel value. + */ +static uintptr_t ERRPTR; + +static void +test_private_destructor(struct ovs_cmdl_context *ctx OVS_UNUSED) +{ + /* Sentinel: a non-NULL pointer value we can identify unambiguously. + * ERRPTR is defined above in case we want to use it in the future as + * a platform-agnostic and portable sentinel value rather than some + * hardcoded hex. 
*/ + void *sentinel = (void *)(uintptr_t)&ERRPTR; + + static ct_private_id_t dtor_id = CT_PRIVATE_ID_INVALID; + dtor_id = conn_private_id_alloc(record_destructor); + ovs_assert(dtor_id != CT_PRIVATE_ID_INVALID); + + /* Create a conntrack instance and commit one UDP connection. */ + struct conntrack *lct = conntrack_init(); + ovs_be16 dl_type; + struct dp_packet *pkt = build_packet(1, 2, &dl_type); + struct dp_packet_batch batch; + dp_packet_batch_init(&batch); + dp_packet_batch_add(&batch, pkt); + + long long now = time_msec(); + conntrack_execute(lct, &batch, dl_type, false, true, 0, + NULL, NULL, NULL, NULL, now, 0); + + /* After a committed execute the packet carries a cached conn pointer. */ + struct conn *conn = pkt->md.conn; + ovs_assert(conn != NULL); + + /* Attach the sentinel as private data for our slot. */ + ovs_mutex_lock(&conn->lock); + conn_private_set(conn, dtor_id, sentinel); + ovs_mutex_unlock(&conn->lock); + + /* Destroying the tracker flushes all connections, queuing delete_conn() + * callbacks via ovsrcu_postpone(). The destructor fires once those + * callbacks are processed. */ + conntrack_destroy(lct); + + /* ovsrcu_exit() stops the urcu background thread and synchronously drains + * all pending postponed callbacks (including delete_conn__ / destructor + * chain) before returning. ovsrcu_synchronize() is insufficient here: it + * only waits for threads to quiesce, not for the urcu thread to have + * actually executed the queued callbacks. */ + ovsrcu_exit(); + + ovs_assert(dtor_call_count == 1); + + ovs_assert(dtor_last_ptr == sentinel); + + dp_packet_delete_batch(&batch, true); + printf(".\n"); +} + static const struct ovs_cmdl_command commands[] = { /* Connection tracker tests. */ @@ -597,6 +716,15 @@ static const struct ovs_cmdl_command commands[] = { * is rewritten to the SNAT target rather than causing a crash. */ {"ftp-alg-large-payload", "", 0, 0, test_ftp_alg_large_payload, OVS_RO}, + /* Private per-connection storage registry tests. 
+ * Each MUST be run as a separate ovstest invocation so the process-global + * slot counter is fresh (starts at 0). */ + {"private-id-alloc", "", 0, 0, + test_private_id_alloc, OVS_RO}, + {"private-id-exhaustion", "", 0, 0, + test_private_id_exhaustion, OVS_RO}, + {"private-destructor", "", 0, 0, + test_private_destructor, OVS_RO}, {NULL, NULL, 0, 0, NULL, OVS_RO}, };

From patchwork Wed Apr 8 17:05:58 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Aaron Conole
X-Patchwork-Id: 2221005
To: dev@openvswitch.org
Date: Wed, 8 Apr 2026 13:05:58 -0400
Message-ID: <20260408170613.587902-3-aconole@redhat.com>
In-Reply-To: <20260408170613.587902-1-aconole@redhat.com>
References: <20260408170613.587902-1-aconole@redhat.com>
Subject: [ovs-dev] [RFC 02/12] conntrack: Introduce an observer pattern infrastructure as a hook.
From: Aaron Conole
Reply-To: Aaron Conole
Cc: Eli Britstein, Florian Westphal, Flavio Leitner

Conntrack has a number of places where it would be useful to monitor
changes. Currently, these are hard-coded call sites for things like ALG
handlers, which need to fire when connections are added or change state
so that they can monitor packets. This makes the code hard to read, with
branches scattered throughout and unrelated functions mixed together.

Rename conn_update_state_alg() to conn_update_state_dist() and move the
FTP-specific logic into a hook. The original function had to be modified
by hand to add each new handler; a registry avoids that and is generic
enough for future additions (which can include observers for hardware
offloads).

The hooks are priority based: hooks with a lower priority value run
first, and hooks that consume the event can be registered to run later.

This infrastructure relies on there being only one global conntrack
instance. If conntrack ever returns to allowing multiple instances, this
will need to be re-abstracted.

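The priority-ordered dispatch described above can be sketched in isolation. The following is an illustrative stand-in, not the code from this patch: all demo_* names are hypothetical, a hook takes a plain int event instead of the conntrack arguments, and the insertion strategy (shifting entries to keep the array sorted) is one plausible way to get ascending-priority iteration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_HOOKS_MAX 8

/* A hook returns true when it has fully handled the event. */
typedef bool (*demo_hook_fn)(int event);

struct demo_hook {
    int priority;       /* Lower value means called earlier. */
    demo_hook_fn fn;
};

static struct demo_hook demo_hooks[DEMO_HOOKS_MAX];
static size_t n_demo_hooks;

/* Inserts 'fn' so the array stays sorted by ascending priority.  Hooks
 * with equal priority keep their registration order. */
static void
demo_hook_register(int priority, demo_hook_fn fn)
{
    size_t i = n_demo_hooks;

    assert(n_demo_hooks < DEMO_HOOKS_MAX);
    while (i > 0 && demo_hooks[i - 1].priority > priority) {
        demo_hooks[i] = demo_hooks[i - 1];
        i--;
    }
    demo_hooks[i].priority = priority;
    demo_hooks[i].fn = fn;
    n_demo_hooks++;
}

/* Calls hooks in order until one claims the event.  Returns false when
 * no hook claimed it, so the caller can fall back to a default path. */
static bool
demo_dispatch(int event)
{
    for (size_t i = 0; i < n_demo_hooks; i++) {
        if (demo_hooks[i].fn(event)) {
            return true;
        }
    }
    return false;
}

/* Two sample hooks that record the order in which they were called. */
static int demo_trace[8];
static size_t demo_trace_len;

static bool
demo_hook_observe(int event)
{
    (void) event;
    demo_trace[demo_trace_len++] = 1;
    return false;               /* Observes only; later hooks still run. */
}

static bool
demo_hook_consume(int event)
{
    demo_trace[demo_trace_len++] = 2;
    return event == 7;          /* Claims only event 7. */
}
```

Registering demo_hook_consume at priority 100 and demo_hook_observe at priority 50 makes the observer run first even though it was registered second, which mirrors the "high priority hooks run early, consuming hooks run later" arrangement in the description.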
Signed-off-by: Aaron Conole --- lib/conntrack-private.h | 55 +++++++++++++++ lib/conntrack.c | 148 +++++++++++++++++++++++++++------------- 2 files changed, 157 insertions(+), 46 deletions(-) diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index bd095277cd..a5bf1bb519 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -252,6 +252,17 @@ struct conntrack { * 3. 'resources_lock' */ +/* ALG control type identifiers. These determine which application-layer + * gateway helper applies to a given connection. */ +enum ct_alg_ctl_type { + CT_ALG_CTL_NONE, + CT_ALG_CTL_FTP, + CT_ALG_CTL_TFTP, + /* SIP is not enabled through OpenFlow and is present only as an example + * of an ALG that allows a wildcard source IP address. */ + CT_ALG_CTL_SIP, +}; + extern struct ct_l4_proto ct_proto_tcp; extern struct ct_l4_proto ct_proto_other; extern struct ct_l4_proto ct_proto_icmp4; @@ -268,6 +279,50 @@ struct ct_l4_proto { struct ct_dpif_protoinfo *); }; +/* Transient lookup context built for each packet; private to conntrack.c and + * the ALG helper modules. */ +struct conn_lookup_ctx { + struct conn_key key; + struct conn *conn; + uint32_t hash; + bool reply; + bool icmp_related; +}; + +/* conn_update_state_dist() hook + * + * Modules may register a hook to intercept connection state transitions. + * conn_update_state_dist() calls registered hooks in ascending priority order + * until one returns true (meaning the hook handled the update, including any + * call to conn_update_state() it needed to make). If no hook claims the + * packet the caller falls through to the default conn_update_state() path. + * + * Priority: lower numerical value -> higher precedence -> called first. + * Named constants below cover the common cases; any int value is accepted. + * + * conn->lock is NOT held on entry; hooks must acquire it as needed following + * the lock-ordering rules. + * + * Registration must happen before any connection is processed (i.e. 
at module + * initialisation time, under ovsthread_once). + */ +typedef bool (*conn_update_state_hook_fn)( + struct conntrack *ct, struct dp_packet *pkt, + struct conn_lookup_ctx *ctx, struct conn *conn, + const struct nat_action_info_t *nat_action_info, + enum ct_alg_ctl_type ct_alg_ctl, long long now, + bool *create_new_conn); + +enum conn_update_state_hook_priority { + CT_HOOK_PRI_HIGH = 50, /* Before built-in ALG handlers. */ + CT_HOOK_PRI_NORMAL = 100, /* Default; used by built-in ALG handlers. */ + CT_HOOK_PRI_LOW = 150, /* After built-in ALG handlers. */ +}; + +void conn_update_state_hook_register(int priority, + conn_update_state_hook_fn); +void conn_update_state_hook_unregister(conn_update_state_hook_fn); + /* conn_private_get() / conn_private_set() * * Fast-path accessors for per-connection private storage slots. Both diff --git a/lib/conntrack.c b/lib/conntrack.c index 373c781eb9..d81abe456a 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -55,14 +55,6 @@ COVERAGE_DEFINE(conntrack_l4csum_err); COVERAGE_DEFINE(conntrack_lookup_natted_miss); COVERAGE_DEFINE(conntrack_zone_full); -struct conn_lookup_ctx { - struct conn_key key; - struct conn *conn; - uint32_t hash; - bool reply; - bool icmp_related; -}; - enum ftp_ctl_pkt { /* Control packets with address and/or port specifiers. */ CT_FTP_CTL_INTEREST, @@ -77,15 +69,6 @@ enum ct_alg_mode { CT_TFTP_MODE, }; -enum ct_alg_ctl_type { - CT_ALG_CTL_NONE, - CT_ALG_CTL_FTP, - CT_ALG_CTL_TFTP, - /* SIP is not enabled through Openflow and presently only used as - * an example of an alg that allows a wildcard src ip. */ - CT_ALG_CTL_SIP, -}; - struct zone_limit { struct cmap_node node; struct conntrack_zone_limit czl; @@ -170,6 +153,29 @@ struct ct_private_slot { static struct ct_private_slot ct_private_slots[CT_CONN_PRIVATE_MAX]; static atomic_uint32_t ct_private_next_id = 0; +/* conn_update_state_dist() hook registry. 
+ * + * Written once at module initialisation (under ovsthread_once), then + * read-only during packet processing, so no additional locking is needed. + * Entries are kept sorted by ascending priority so conn_update_state_dist() + * can iterate in order without extra bookkeeping. + */ +#define CT_UPDATE_STATE_HOOKS_MAX 8 + +struct ct_update_hook { + int priority; + conn_update_state_hook_fn fn; +}; + +static struct ct_update_hook ct_update_hooks[CT_UPDATE_STATE_HOOKS_MAX]; +static size_t n_ct_update_hooks; + +static bool ftp_conn_update_state_hook(struct conntrack *, struct dp_packet *, + struct conn_lookup_ctx *, struct conn *, + const struct nat_action_info_t *, + enum ct_alg_ctl_type, long long, + bool *); + static void handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx, struct dp_packet *pkt, struct conn *ec, long long now, @@ -305,6 +311,9 @@ conntrack_init(void) l4_protos[IPPROTO_ICMP] = &ct_proto_icmp4; l4_protos[IPPROTO_ICMPV6] = &ct_proto_icmp6; + conn_update_state_hook_register(CT_HOOK_PRI_NORMAL, + ftp_conn_update_state_hook); + ovsthread_once_done(&setup_l4_once); } return ct; @@ -1314,38 +1323,56 @@ check_orig_tuple(struct conntrack *ct, struct dp_packet *pkt, } static bool -conn_update_state_alg(struct conntrack *ct, struct dp_packet *pkt, - struct conn_lookup_ctx *ctx, struct conn *conn, - const struct nat_action_info_t *nat_action_info, - enum ct_alg_ctl_type ct_alg_ctl, long long now, - bool *create_new_conn) -{ - if (is_ftp_ctl(ct_alg_ctl)) { - /* Keep sequence tracking in sync with the source of the - * sequence skew. */ +ftp_conn_update_state_hook(struct conntrack *ct, struct dp_packet *pkt, + struct conn_lookup_ctx *ctx, struct conn *conn, + const struct nat_action_info_t *nat_action_info, + enum ct_alg_ctl_type ct_alg_ctl, long long now, + bool *create_new_conn) +{ + if (!is_ftp_ctl(ct_alg_ctl)) { + return false; + } + + /* Keep sequence tracking in sync with the source of the sequence skew. 
*/ + ovs_mutex_lock(&conn->lock); + if (ctx->reply != conn->seq_skew_dir) { + handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER, + !!nat_action_info); + /* conn_update_state acquires conn->lock for unrelated fields. */ + ovs_mutex_unlock(&conn->lock); + *create_new_conn = conn_update_state(ct, pkt, ctx, conn, now); + } else { + ovs_mutex_unlock(&conn->lock); + *create_new_conn = conn_update_state(ct, pkt, ctx, conn, now); ovs_mutex_lock(&conn->lock); - if (ctx->reply != conn->seq_skew_dir) { + if (!*create_new_conn) { handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER, !!nat_action_info); - /* conn_update_state locks for unrelated fields, so unlock. */ - ovs_mutex_unlock(&conn->lock); - *create_new_conn = conn_update_state(ct, pkt, ctx, conn, now); - } else { - /* conn_update_state locks for unrelated fields, so unlock. */ - ovs_mutex_unlock(&conn->lock); - *create_new_conn = conn_update_state(ct, pkt, ctx, conn, now); - ovs_mutex_lock(&conn->lock); - if (*create_new_conn == false) { - handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER, - !!nat_action_info); - } - ovs_mutex_unlock(&conn->lock); } - return true; + ovs_mutex_unlock(&conn->lock); } - return false; + return true; } +/* Distribute a connection state-transition event to registered hooks. + * Returns true if a hook handled the update (and set *create_new_conn), + * false if the caller should fall through to default conn_update_state(). 
*/ +static bool +conn_update_state_dist(struct conntrack *ct, struct dp_packet *pkt, + struct conn_lookup_ctx *ctx, struct conn *conn, + const struct nat_action_info_t *nat_action_info, + enum ct_alg_ctl_type ct_alg_ctl, long long now, + bool *create_new_conn) +{ + for (size_t i = 0; i < n_ct_update_hooks; i++) { + if (ct_update_hooks[i].fn(ct, pkt, ctx, conn, nat_action_info, + ct_alg_ctl, now, create_new_conn)) { + return true; + } + } + return false; + } + static void set_cached_conn(const struct nat_action_info_t *nat_action_info, const struct conn_lookup_ctx *ctx, struct conn *conn, @@ -1450,10 +1477,10 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, enum ct_alg_ctl_type ct_alg_ctl = get_alg_ctl_type(pkt, helper); if (OVS_LIKELY(conn)) { - if (OVS_LIKELY(!conn_update_state_alg(ct, pkt, ctx, conn, - nat_action_info, - ct_alg_ctl, now, - &create_new_conn))) { + if (OVS_LIKELY(!conn_update_state_dist(ct, pkt, ctx, conn, + nat_action_info, + ct_alg_ctl, now, + &create_new_conn))) { create_new_conn = conn_update_state(ct, pkt, ctx, conn, now); } if (nat_action_info && !create_new_conn) { @@ -3746,6 +3773,35 @@ adj_seqnum(ovs_16aligned_be32 *val, int32_t inc) put_16aligned_be32(val, htonl(ntohl(get_16aligned_be32(val)) + inc)); } +void +conn_update_state_hook_register(int priority, conn_update_state_hook_fn fn) +{ + ovs_assert(n_ct_update_hooks < CT_UPDATE_STATE_HOOKS_MAX); + + /* Insert in sorted order (ascending priority = higher precedence). 
*/ + size_t i = n_ct_update_hooks; + while (i > 0 && ct_update_hooks[i - 1].priority > priority) { + ct_update_hooks[i] = ct_update_hooks[i - 1]; + i--; + } + ct_update_hooks[i].priority = priority; + ct_update_hooks[i].fn = fn; + n_ct_update_hooks++; +} + +void +conn_update_state_hook_unregister(conn_update_state_hook_fn fn) +{ + for (size_t i = 0; i < n_ct_update_hooks; i++) { + if (ct_update_hooks[i].fn == fn) { + memmove(&ct_update_hooks[i], &ct_update_hooks[i + 1], + (n_ct_update_hooks - i - 1) * sizeof ct_update_hooks[0]); + n_ct_update_hooks--; + return; + } + } +} + static void handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx, struct dp_packet *pkt, struct conn *ec, long long now, From patchwork Wed Apr 8 17:05:59 2026 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aaron Conole X-Patchwork-Id: 2221006 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=BbJgsaac; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4frTxb4HFBz1xv0 for ; Thu, 09 Apr 2026 03:06:43 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 28E904052C; Wed, 8 Apr 2026 17:06:42 
+0000 (UTC) To: dev@openvswitch.org Date: Wed, 8 Apr 2026 13:05:59 -0400 Message-ID: <20260408170613.587902-4-aconole@redhat.com> In-Reply-To: <20260408170613.587902-1-aconole@redhat.com> References: <20260408170613.587902-1-aconole@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: AUlXwVd2lPq9vyipe6d9ji5KqwyYVUikpTYpLF-EK9I_1775667983 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [RFC 03/12] conntrack: Split the FTP and TFTP handling into separate files. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list X-Patchwork-Original-From: Aaron Conole via dev From: Aaron Conole Reply-To: Aaron Conole Cc: Eli Britstein , Florian Westphal , Flavio Leitner Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" The FTP and TFTP helpers were scattered throughout the conntrack translation unit, which made the FTP-specific code hard to follow. Now that ALG handling goes through registered hooks and helper callbacks, move each helper into its own file. 
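For readers new to the hook mechanism this split depends on, the following is a minimal, self-contained sketch of a priority-ordered hook registry in the style of conn_update_state_hook_register() and conn_update_state_dist(). All demo_* names, the two-argument hook signature, and the integer events are hypothetical simplifications and not the OVS API; the real hooks take the conntrack, packet, and connection arguments shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the hook types in the patch. */
#define DEMO_HOOKS_MAX 8

enum demo_hook_priority {
    DEMO_PRI_HIGH = 50,
    DEMO_PRI_NORMAL = 100,
    DEMO_PRI_LOW = 150,
};

/* Returns true if the hook handled the event; '*result' is then valid. */
typedef bool (*demo_hook_fn)(int event, int *result);

struct demo_hook {
    int priority;
    demo_hook_fn fn;
};

static struct demo_hook demo_hooks[DEMO_HOOKS_MAX];
static size_t n_demo_hooks;

/* Inserts 'fn' so the array stays sorted by ascending priority value.
 * Hooks with equal priority keep their registration order. */
static void
demo_hook_register(int priority, demo_hook_fn fn)
{
    assert(n_demo_hooks < DEMO_HOOKS_MAX);

    size_t i = n_demo_hooks;
    while (i > 0 && demo_hooks[i - 1].priority > priority) {
        demo_hooks[i] = demo_hooks[i - 1];
        i--;
    }
    demo_hooks[i].priority = priority;
    demo_hooks[i].fn = fn;
    n_demo_hooks++;
}

/* Calls hooks in array order; the first hook that returns true wins. */
static bool
demo_dispatch(int event, int *result)
{
    for (size_t i = 0; i < n_demo_hooks; i++) {
        if (demo_hooks[i].fn(event, result)) {
            return true;
        }
    }
    return false;
}

/* Example hook that claims only event 1, like a protocol-specific ALG. */
static bool
demo_ftp_like_hook(int event, int *result)
{
    if (event != 1) {
        return false;
    }
    *result = 10;
    return true;
}

/* Example hook that claims every event. */
static bool
demo_catch_all_hook(int event, int *result)
{
    (void) event;
    *result = 99;
    return true;
}
```

With the catch-all hook registered at DEMO_PRI_LOW and the protocol-specific hook at DEMO_PRI_NORMAL, demo_dispatch() tries the protocol-specific hook first regardless of registration order, and falls through to the catch-all only when the first hook declines. That ordering property is what allows a per-module init function to register its hook without the dispatcher knowing which modules exist.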
Signed-off-by: Aaron Conole --- lib/automake.mk | 2 + lib/conntrack-ftp.c | 689 ++++++++++++++++++++++++++++++++++++++ lib/conntrack-private.h | 42 +++ lib/conntrack-tftp.c | 47 +++ lib/conntrack.c | 718 +--------------------------------------- 5 files changed, 786 insertions(+), 712 deletions(-) create mode 100644 lib/conntrack-ftp.c create mode 100644 lib/conntrack-tftp.c diff --git a/lib/automake.mk b/lib/automake.mk index c6e988906f..933b71226b 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -86,9 +86,11 @@ lib_libopenvswitch_la_SOURCES = \ lib/compiler.h \ lib/connectivity.c \ lib/connectivity.h \ + lib/conntrack-ftp.c \ lib/conntrack-icmp.c \ lib/conntrack-private.h \ lib/conntrack-tcp.c \ + lib/conntrack-tftp.c \ lib/conntrack-tp.c \ lib/conntrack-tp.h \ lib/conntrack-other.c \ diff --git a/lib/conntrack-ftp.c b/lib/conntrack-ftp.c new file mode 100644 index 0000000000..6ce17c9efe --- /dev/null +++ b/lib/conntrack-ftp.c @@ -0,0 +1,689 @@ +/* + * Copyright (c) 2015-2019 Nicira, Inc. + * Copyright (c) 2026 Red Hat, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +#include + +#include +#include +#include +#include + +#include "conntrack-private.h" +#include "csum.h" +#include "dp-packet.h" +#include "openvswitch/vlog.h" +#include "packets.h" +#include "unaligned.h" +#include "util.h" + +VLOG_DEFINE_THIS_MODULE(conntrack_ftp); + +/* FTP ALG mode: in active mode (PORT/EPRT) the server opens the data + * connection to the client; in passive mode (PASV/EPSV) the client opens + * it to the server. */ +enum ct_alg_mode { + CT_FTP_MODE_ACTIVE, + CT_FTP_MODE_PASSIVE, + CT_TFTP_MODE, +}; + +/* String buffer used for parsing FTP string messages. + * This is sized about twice what is needed to leave some + * margin of error. */ +#define LARGEST_FTP_MSG_OF_INTEREST 128 +/* FTP port string used in active mode. */ +#define FTP_PORT_CMD "PORT" +/* FTP pasv string used in passive mode. */ +#define FTP_PASV_REPLY_CODE "227" +/* FTP epsv string used in passive mode. */ +#define FTP_EPSV_REPLY_CODE "229" +/* Maximum decimal digits for each port field in a PORT/PASV message. + * The port is encoded as two decimal numbers of up to 3 digits each, + * p1 and p2, with port = p1 * 256 + p2. */ +#define MAX_FTP_PORT_DGTS 3 + +/* FTP extension EPRT string used for active mode. */ +#define FTP_EPRT_CMD "EPRT" +/* FTP extension EPSV string used for passive mode. */ +#define FTP_EPSV_REPLY "EXTENDED PASSIVE" +/* Maximum decimal digits for port in FTP extended command. */ +#define MAX_EXT_FTP_PORT_DGTS 5 +/* FTP extended command code for IPv4. */ +#define FTP_AF_V4 '1' +/* FTP extended command code for IPv6. 
*/ +#define FTP_AF_V6 '2' + +static bool +is_ftp_ctl(const enum ct_alg_ctl_type ct_alg_ctl) +{ + return ct_alg_ctl == CT_ALG_CTL_FTP; +} + +static void +replace_substring(char *substr, size_t substr_size, + size_t total_size, char *rep_str, + size_t rep_str_size) +{ + memmove(substr + rep_str_size, substr + substr_size, + total_size - substr_size); + memcpy(substr, rep_str, rep_str_size); +} + +static void +repl_bytes(char *str, char c1, char c2, int max) +{ + while (*str) { + if (*str == c1) { + *str = c2; + + if (--max == 0) { + break; + } + } + str++; + } +} + +/* Replaces a substring in the packet and rewrites the packet + * size to match. This function assumes the caller has verified + * the lengths to prevent under/over flow. */ +static void +modify_packet(struct dp_packet *pkt, char *pkt_str, size_t size, + char *repl_str, size_t repl_size, + uint32_t orig_used_size) +{ + replace_substring(pkt_str, size, + (const char *) dp_packet_tail(pkt) - pkt_str, + repl_str, repl_size); + dp_packet_set_size(pkt, orig_used_size + (int) repl_size - (int) size); +} + +/* Replace IPV4 address in FTP message with NATed address. */ +static int +repl_ftp_v4_addr(struct dp_packet *pkt, ovs_be32 v4_addr_rep, + char *ftp_data_start, + size_t addr_offset_from_ftp_data_start, + size_t addr_size) +{ + enum { MAX_FTP_V4_NAT_DELTA = 8 }; + + /* EPSV mode. */ + if (addr_offset_from_ftp_data_start == 0 && + addr_size == 0) { + return 0; + } + + /* Do conservative check for pathological MTU usage. 
*/ + uint32_t orig_used_size = dp_packet_size(pkt); + if (orig_used_size + MAX_FTP_V4_NAT_DELTA > + dp_packet_get_allocated(pkt)) { + + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); + VLOG_WARN_RL(&rl, "Unsupported effective MTU %u used with FTP V4", + dp_packet_get_allocated(pkt)); + return 0; + } + + char v4_addr_str[INET_ADDRSTRLEN] = {0}; + ovs_assert(inet_ntop(AF_INET, &v4_addr_rep, v4_addr_str, + sizeof v4_addr_str)); + repl_bytes(v4_addr_str, '.', ',', 0); + modify_packet(pkt, ftp_data_start + addr_offset_from_ftp_data_start, + addr_size, v4_addr_str, strlen(v4_addr_str), + orig_used_size); + return (int) strlen(v4_addr_str) - (int) addr_size; +} + +static char * +skip_non_digits(char *str) +{ + while (!isdigit(*str) && *str != 0) { + str++; + } + return str; +} + +static char * +terminate_number_str(char *str, uint8_t max_digits) +{ + uint8_t digits_found = 0; + while (isdigit(*str) && digits_found <= max_digits) { + str++; + digits_found++; + } + + *str = 0; + return str; +} + +static void +get_ftp_ctl_msg(struct dp_packet *pkt, char *ftp_msg) +{ + struct tcp_header *th = dp_packet_l4(pkt); + char *tcp_hdr = (char *) th; + uint32_t tcp_payload_len = dp_packet_get_tcp_payload_length(pkt); + size_t tcp_payload_of_interest = MIN(tcp_payload_len, + LARGEST_FTP_MSG_OF_INTEREST); + size_t tcp_hdr_len = TCP_OFFSET(th->tcp_ctl) * 4; + + ovs_strlcpy(ftp_msg, tcp_hdr + tcp_hdr_len, + tcp_payload_of_interest); +} + +static enum ftp_ctl_pkt +detect_ftp_ctl_type(const struct conn_lookup_ctx *ctx, + struct dp_packet *pkt) +{ + char ftp_msg[LARGEST_FTP_MSG_OF_INTEREST + 1] = {0}; + get_ftp_ctl_msg(pkt, ftp_msg); + + if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) { + if (strncasecmp(ftp_msg, FTP_EPRT_CMD, strlen(FTP_EPRT_CMD)) && + !strcasestr(ftp_msg, FTP_EPSV_REPLY)) { + return CT_FTP_CTL_OTHER; + } + } else { + if (strncasecmp(ftp_msg, FTP_PORT_CMD, strlen(FTP_PORT_CMD)) && + strncasecmp(ftp_msg, FTP_EPRT_CMD, strlen(FTP_EPRT_CMD)) && + 
strncasecmp(ftp_msg, FTP_PASV_REPLY_CODE, + strlen(FTP_PASV_REPLY_CODE)) && + strncasecmp(ftp_msg, FTP_EPSV_REPLY_CODE, + strlen(FTP_EPSV_REPLY_CODE))) { + return CT_FTP_CTL_OTHER; + } + } + + return CT_FTP_CTL_INTEREST; +} + +static enum ftp_ctl_pkt +process_ftp_ctl_v4(struct conntrack *ct, + struct dp_packet *pkt, + const struct conn *conn_for_expectation, + ovs_be32 *v4_addr_rep, + char **ftp_data_v4_start, + size_t *addr_offset_from_ftp_data_start, + size_t *addr_size) +{ + struct tcp_header *th = dp_packet_l4(pkt); + size_t tcp_hdr_len = TCP_OFFSET(th->tcp_ctl) * 4; + char *tcp_hdr = (char *) th; + *ftp_data_v4_start = tcp_hdr + tcp_hdr_len; + char ftp_msg[LARGEST_FTP_MSG_OF_INTEREST + 1] = {0}; + get_ftp_ctl_msg(pkt, ftp_msg); + char *ftp = ftp_msg; + struct in_addr ip_addr; + enum ct_alg_mode mode; + bool extended = false; + + if (!strncasecmp(ftp, FTP_PORT_CMD, strlen(FTP_PORT_CMD))) { + ftp = ftp_msg + strlen(FTP_PORT_CMD); + mode = CT_FTP_MODE_ACTIVE; + } else if (!strncasecmp(ftp, FTP_EPRT_CMD, strlen(FTP_EPRT_CMD))) { + ftp = ftp_msg + strlen(FTP_EPRT_CMD); + mode = CT_FTP_MODE_ACTIVE; + extended = true; + } else if (!strncasecmp(ftp, FTP_EPSV_REPLY_CODE, + strlen(FTP_EPSV_REPLY_CODE))) { + ftp = ftp_msg + strlen(FTP_EPSV_REPLY_CODE); + mode = CT_FTP_MODE_PASSIVE; + extended = true; + } else { + ftp = ftp_msg + strlen(FTP_PASV_REPLY_CODE); + mode = CT_FTP_MODE_PASSIVE; + } + + /* Find first space. */ + ftp = strchr(ftp, ' '); + if (!ftp) { + return CT_FTP_CTL_INVALID; + } + + /* Find the first digit, after space. */ + ftp = skip_non_digits(ftp); + if (*ftp == 0) { + return CT_FTP_CTL_INVALID; + } + + /* EPRT, verify address family. 
+ */ + if (extended && mode == CT_FTP_MODE_ACTIVE) { + if (ftp[0] != FTP_AF_V4 || isdigit(ftp[1])) { + return CT_FTP_CTL_INVALID; + } + + ftp = skip_non_digits(ftp + 1); + if (*ftp == 0) { + return CT_FTP_CTL_INVALID; + } + } + + if (!extended || mode == CT_FTP_MODE_ACTIVE) { + char *ip_addr_start = ftp; + *addr_offset_from_ftp_data_start = ip_addr_start - ftp_msg; + repl_bytes(ftp, ',', '.', 3); + + /* Advance to end of IP address, to terminate it. */ + while (*ftp) { + if (!isdigit(*ftp) && *ftp != '.') { + break; + } + ftp++; + } + *ftp = 0; + ftp++; + + int rc2 = inet_pton(AF_INET, ip_addr_start, &ip_addr); + if (rc2 != 1) { + return CT_FTP_CTL_INVALID; + } + + *addr_size = ftp - ip_addr_start - 1; + } else { + *addr_size = 0; + *addr_offset_from_ftp_data_start = 0; + } + + char *save_ftp = ftp; + uint16_t port_hs; + + if (!extended) { + ftp = terminate_number_str(ftp, MAX_FTP_PORT_DGTS); + if (!ftp) { + return CT_FTP_CTL_INVALID; + } + int value; + if (!str_to_int(save_ftp, 10, &value)) { + return CT_FTP_CTL_INVALID; + } + + /* Each half is one byte, as the L4 port maximum is 65535. */ + if (value > 255) { + return CT_FTP_CTL_INVALID; + } + + port_hs = value; + port_hs <<= 8; + + /* Skip over comma. 
*/ + ftp++; + save_ftp = ftp; + bool digit_found = false; + while (isdigit(*ftp)) { + ftp++; + digit_found = true; + } + if (!digit_found) { + return CT_FTP_CTL_INVALID; + } + *ftp = 0; + if (!str_to_int(save_ftp, 10, &value)) { + return CT_FTP_CTL_INVALID; + } + + if (value > 255) { + return CT_FTP_CTL_INVALID; + } + + port_hs |= value; + } else { + ftp = terminate_number_str(ftp, MAX_EXT_FTP_PORT_DGTS); + if (!ftp) { + return CT_FTP_CTL_INVALID; + } + int value; + if (!str_to_int(save_ftp, 10, &value)) { + return CT_FTP_CTL_INVALID; + } + if (value > UINT16_MAX) { + return CT_FTP_CTL_INVALID; + } + port_hs = (uint16_t) value; + } + + ovs_be16 port = htons(port_hs); + ovs_be32 conn_ipv4_addr; + + switch (mode) { + case CT_FTP_MODE_ACTIVE: + *v4_addr_rep = + conn_for_expectation->key_node[CT_DIR_REV].key.dst.addr.ipv4; + conn_ipv4_addr = + conn_for_expectation->key_node[CT_DIR_FWD].key.src.addr.ipv4; + break; + case CT_FTP_MODE_PASSIVE: + *v4_addr_rep = + conn_for_expectation->key_node[CT_DIR_FWD].key.dst.addr.ipv4; + conn_ipv4_addr = + conn_for_expectation->key_node[CT_DIR_REV].key.src.addr.ipv4; + break; + case CT_TFTP_MODE: + default: + OVS_NOT_REACHED(); + } + + if (!extended || mode == CT_FTP_MODE_ACTIVE) { + ovs_be32 ftp_ipv4_addr; + ftp_ipv4_addr = ip_addr.s_addr; + /* Although most servers will block this exploit, there may be some + * less well managed. 
*/ + if (ftp_ipv4_addr != conn_ipv4_addr && ftp_ipv4_addr != *v4_addr_rep) { + return CT_FTP_CTL_INVALID; + } + } + + expectation_create(ct, port, conn_for_expectation, + !!(pkt->md.ct_state & CS_REPLY_DIR), false, false); + return CT_FTP_CTL_INTEREST; +} + +static char * +skip_ipv6_digits(char *str) +{ + while (isxdigit(*str) || *str == ':' || *str == '.') { + str++; + } + return str; +} + +static enum ftp_ctl_pkt +process_ftp_ctl_v6(struct conntrack *ct, + struct dp_packet *pkt, + const struct conn *conn_for_exp, + union ct_addr *v6_addr_rep, char **ftp_data_start, + size_t *addr_offset_from_ftp_data_start, + size_t *addr_size, enum ct_alg_mode *mode) +{ + struct tcp_header *th = dp_packet_l4(pkt); + size_t tcp_hdr_len = TCP_OFFSET(th->tcp_ctl) * 4; + char *tcp_hdr = (char *) th; + char ftp_msg[LARGEST_FTP_MSG_OF_INTEREST + 1] = {0}; + get_ftp_ctl_msg(pkt, ftp_msg); + *ftp_data_start = tcp_hdr + tcp_hdr_len; + char *ftp = ftp_msg; + struct in6_addr ip6_addr; + + if (!strncasecmp(ftp, FTP_EPRT_CMD, strlen(FTP_EPRT_CMD))) { + ftp = ftp_msg + strlen(FTP_EPRT_CMD); + ftp = skip_non_digits(ftp); + if (*ftp != FTP_AF_V6 || isdigit(ftp[1])) { + return CT_FTP_CTL_INVALID; + } + /* Jump over delimiter. */ + ftp += 2; + + memset(&ip6_addr, 0, sizeof ip6_addr); + char *ip_addr_start = ftp; + *addr_offset_from_ftp_data_start = ip_addr_start - ftp_msg; + ftp = skip_ipv6_digits(ftp); + *ftp = 0; + *addr_size = ftp - ip_addr_start; + int rc2 = inet_pton(AF_INET6, ip_addr_start, &ip6_addr); + if (rc2 != 1) { + return CT_FTP_CTL_INVALID; + } + ftp++; + *mode = CT_FTP_MODE_ACTIVE; + } else { + ftp = ftp_msg + strcspn(ftp_msg, "("); + ftp = skip_non_digits(ftp); + if (!isdigit(*ftp)) { + return CT_FTP_CTL_INVALID; + } + + /* Not used for passive mode. 
*/ + *addr_offset_from_ftp_data_start = 0; + *addr_size = 0; + + *mode = CT_FTP_MODE_PASSIVE; + } + + char *save_ftp = ftp; + ftp = terminate_number_str(ftp, MAX_EXT_FTP_PORT_DGTS); + if (!ftp) { + return CT_FTP_CTL_INVALID; + } + + int value; + if (!str_to_int(save_ftp, 10, &value)) { + return CT_FTP_CTL_INVALID; + } + if (value > CT_MAX_L4_PORT) { + return CT_FTP_CTL_INVALID; + } + + uint16_t port_hs = value; + ovs_be16 port = htons(port_hs); + + switch (*mode) { + case CT_FTP_MODE_ACTIVE: + *v6_addr_rep = conn_for_exp->key_node[CT_DIR_REV].key.dst.addr; + /* Although most servers will block this exploit, there may be some + * less well managed. */ + if (memcmp(&ip6_addr, &v6_addr_rep->ipv6, sizeof ip6_addr) && + memcmp(&ip6_addr, + &conn_for_exp->key_node[CT_DIR_FWD].key.src.addr.ipv6, + sizeof ip6_addr)) { + return CT_FTP_CTL_INVALID; + } + break; + case CT_FTP_MODE_PASSIVE: + *v6_addr_rep = conn_for_exp->key_node[CT_DIR_FWD].key.dst.addr; + break; + case CT_TFTP_MODE: + default: + OVS_NOT_REACHED(); + } + + expectation_create(ct, port, conn_for_exp, + !!(pkt->md.ct_state & CS_REPLY_DIR), false, false); + return CT_FTP_CTL_INTEREST; +} + +static int +repl_ftp_v6_addr(struct dp_packet *pkt, union ct_addr v6_addr_rep, + char *ftp_data_start, + size_t addr_offset_from_ftp_data_start, + size_t addr_size, enum ct_alg_mode mode) +{ + /* This is slightly bigger than really possible. */ + enum { MAX_FTP_V6_NAT_DELTA = 45 }; + + if (mode == CT_FTP_MODE_PASSIVE) { + return 0; + } + + /* Do conservative check for pathological MTU usage. 
*/ + uint32_t orig_used_size = dp_packet_size(pkt); + if (orig_used_size + MAX_FTP_V6_NAT_DELTA > + dp_packet_get_allocated(pkt)) { + + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); + VLOG_WARN_RL(&rl, "Unsupported effective MTU %u used with FTP V6", + dp_packet_get_allocated(pkt)); + return 0; + } + + char v6_addr_str[INET6_ADDRSTRLEN] = {0}; + ovs_assert(inet_ntop(AF_INET6, &v6_addr_rep.ipv6, v6_addr_str, + sizeof v6_addr_str)); + modify_packet(pkt, ftp_data_start + addr_offset_from_ftp_data_start, + addr_size, v6_addr_str, strlen(v6_addr_str), + orig_used_size); + return (int) strlen(v6_addr_str) - (int) addr_size; +} + +/* Increment/decrement a TCP sequence number. */ +static void +adj_seqnum(ovs_16aligned_be32 *val, int32_t inc) +{ + put_16aligned_be32(val, htonl(ntohl(get_16aligned_be32(val)) + inc)); +} + +static void +handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx, + struct dp_packet *pkt, struct conn *ec, long long now, + enum ftp_ctl_pkt ftp_ctl, bool nat) +{ + struct ip_header *l3_hdr = dp_packet_l3(pkt); + ovs_be32 v4_addr_rep = 0; + union ct_addr v6_addr_rep; + size_t addr_offset_from_ftp_data_start = 0; + size_t addr_size = 0; + char *ftp_data_start; + enum ct_alg_mode mode = CT_FTP_MODE_ACTIVE; + + if (detect_ftp_ctl_type(ctx, pkt) != ftp_ctl) { + return; + } + + struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt); + int64_t seq_skew = 0; + + if (ftp_ctl == CT_FTP_CTL_INTEREST) { + enum ftp_ctl_pkt rc; + if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) { + rc = process_ftp_ctl_v6(ct, pkt, ec, + &v6_addr_rep, &ftp_data_start, + &addr_offset_from_ftp_data_start, + &addr_size, &mode); + } else { + rc = process_ftp_ctl_v4(ct, pkt, ec, + &v4_addr_rep, &ftp_data_start, + &addr_offset_from_ftp_data_start, + &addr_size); + } + if (rc == CT_FTP_CTL_INVALID) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); + VLOG_WARN_RL(&rl, "Invalid FTP control packet format"); + pkt->md.ct_state |= CS_TRACKED | 
CS_INVALID; + return; + } else if (rc == CT_FTP_CTL_INTEREST) { + uint16_t ip_len; + + if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) { + if (nat) { + seq_skew = repl_ftp_v6_addr(pkt, v6_addr_rep, + ftp_data_start, + addr_offset_from_ftp_data_start, + addr_size, mode); + } + + if (seq_skew) { + ip_len = ntohs(nh6->ip6_ctlun.ip6_un1.ip6_un1_plen) + + seq_skew; + nh6->ip6_ctlun.ip6_un1.ip6_un1_plen = htons(ip_len); + } + } else { + if (nat) { + seq_skew = repl_ftp_v4_addr(pkt, v4_addr_rep, + ftp_data_start, + addr_offset_from_ftp_data_start, + addr_size); + } + if (seq_skew) { + ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew; + if (dp_packet_ip_checksum_valid(pkt)) { + dp_packet_ip_checksum_set_partial(pkt); + } else { + l3_hdr->ip_csum = recalc_csum16(l3_hdr->ip_csum, + l3_hdr->ip_tot_len, + htons(ip_len)); + } + l3_hdr->ip_tot_len = htons(ip_len); + } + } + } else { + OVS_NOT_REACHED(); + } + } + + struct tcp_header *th = dp_packet_l4(pkt); + + if (nat && ec->seq_skew != 0) { + ctx->reply != ec->seq_skew_dir ? + adj_seqnum(&th->tcp_ack, -ec->seq_skew) : + adj_seqnum(&th->tcp_seq, ec->seq_skew); + } + + if (dp_packet_l4_checksum_valid(pkt)) { + dp_packet_l4_checksum_set_partial(pkt); + } else { + th->tcp_csum = 0; + if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) { + th->tcp_csum = packet_csum_upperlayer6(nh6, th, ctx->key.nw_proto, + dp_packet_l4_size(pkt)); + } else { + uint32_t tcp_csum = packet_csum_pseudoheader(l3_hdr); + th->tcp_csum = csum_finish( + csum_continue(tcp_csum, th, dp_packet_l4_size(pkt))); + } + } + + if (seq_skew) { + conn_seq_skew_set(ct, ec, now, seq_skew + ec->seq_skew, + ctx->reply); + } +} + +/* FTP requires sequence-number tracking to stay in sync with the source of + * any sequence skew introduced by address/port rewriting. This hook + * interleaves handle_ftp_ctl() calls with conn_update_state() depending on + * packet direction so that the skew accounting is always correct. 
*/ +static bool +ftp_conn_update_state_hook(struct conntrack *ct, struct dp_packet *pkt, + struct conn_lookup_ctx *ctx, struct conn *conn, + const struct nat_action_info_t *nat_action_info, + enum ct_alg_ctl_type ct_alg_ctl, long long now, + bool *create_new_conn) +{ + if (!is_ftp_ctl(ct_alg_ctl)) { + return false; + } + + /* Keep sequence tracking in sync with the source of the sequence skew. */ + ovs_mutex_lock(&conn->lock); + if (ctx->reply != conn->seq_skew_dir) { + handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER, + !!nat_action_info); + /* conn_update_state acquires conn->lock for unrelated fields. */ + ovs_mutex_unlock(&conn->lock); + *create_new_conn = conn_update_state(ct, pkt, ctx, conn, now); + } else { + ovs_mutex_unlock(&conn->lock); + *create_new_conn = conn_update_state(ct, pkt, ctx, conn, now); + ovs_mutex_lock(&conn->lock); + if (!*create_new_conn) { + handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER, + !!nat_action_info); + } + ovs_mutex_unlock(&conn->lock); + } + return true; +} + +void +conntrack_ftp_init(void) +{ + static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; + + if (ovsthread_once_start(&once)) { + conn_update_state_hook_register(CT_HOOK_PRI_NORMAL, + ftp_conn_update_state_hook); + alg_helpers[CT_ALG_CTL_FTP] = handle_ftp_ctl; + ovsthread_once_done(&once); + } +} diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h index a5bf1bb519..8eab1d3703 100644 --- a/lib/conntrack-private.h +++ b/lib/conntrack-private.h @@ -177,6 +177,9 @@ enum ct_ephemeral_range { MAX_NAT_EPHEMERAL_PORT = 65535 }; +/* The maximum TCP or UDP port number. */ +#define CT_MAX_L4_PORT 65535 + #define IN_RANGE(curr, min, max) \ (curr >= min && curr <= max) @@ -261,6 +264,9 @@ enum ct_alg_ctl_type { /* SIP is not enabled through OpenFlow and is present only as an example * of an ALG that allows a wildcard source IP address. 
*/ CT_ALG_CTL_SIP, + + /* MAX ALG */ + CT_ALG_CTL_MAX, }; extern struct ct_l4_proto ct_proto_tcp; @@ -289,6 +295,28 @@ struct conn_lookup_ctx { bool icmp_related; }; +/* FTP control-packet classification used by ALG helpers. + * CT_FTP_CTL_INTEREST carries an address/port specifier (PORT, PASV, EPRT, + * EPSV); CT_FTP_CTL_OTHER does not; CT_FTP_CTL_INVALID is malformed. */ +enum ftp_ctl_pkt { + CT_FTP_CTL_INTEREST, + CT_FTP_CTL_OTHER, + CT_FTP_CTL_INVALID, +}; + +/* ALG helper callback signature. Each registered helper receives the + * classified control-packet type so it can decide whether to act. */ +typedef void (*alg_helper)(struct conntrack *ct, + const struct conn_lookup_ctx *ctx, + struct dp_packet *pkt, + struct conn *conn_for_expectation, + long long now, enum ftp_ctl_pkt ftp_ctl, + bool nat); + +/* Array indexed by ct_alg_ctl_type; populated by per-module init functions + * (conntrack_ftp_init, conntrack_tftp_init, ...) before first use. */ +extern alg_helper alg_helpers[]; + /* conn_update_state_dist() hook * * Modules may register a hook to intercept connection state transitions. @@ -323,6 +351,20 @@ void conn_update_state_hook_register(int priority, conn_update_state_hook_fn); void conn_update_state_hook_unregister(conn_update_state_hook_fn); +/* Functions in conntrack.c that ALG modules need. */ +bool conn_update_state(struct conntrack *ct, struct dp_packet *pkt, + struct conn_lookup_ctx *ctx, struct conn *conn, + long long now); +void conn_seq_skew_set(struct conntrack *ct, const struct conn *conn_in, + long long now, int seq_skew, bool seq_skew_dir); +void expectation_create(struct conntrack *ct, ovs_be16 dst_port, + const struct conn *parent_conn, bool reply, + bool src_ip_wc, bool skip_nat); + +/* ALG module initialization functions. */ +void conntrack_ftp_init(void); +void conntrack_tftp_init(void); + /* conn_private_get() / conn_private_set() * * Fast-path accessors for per-connection private storage slots. 
Both diff --git a/lib/conntrack-tftp.c b/lib/conntrack-tftp.c new file mode 100644 index 0000000000..61297f7240 --- /dev/null +++ b/lib/conntrack-tftp.c @@ -0,0 +1,47 @@ +/* + * Copyright (c) 2015-2019 Nicira, Inc. + * Copyright (c) 2026 Red Hat, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include + +#include "conntrack-private.h" +#include "dp-packet.h" +#include "ovs-thread.h" +#include "packets.h" + +static void +handle_tftp_ctl(struct conntrack *ct, + const struct conn_lookup_ctx *ctx OVS_UNUSED, + struct dp_packet *pkt, struct conn *conn_for_expectation, + long long now OVS_UNUSED, enum ftp_ctl_pkt ftp_ctl OVS_UNUSED, + bool nat OVS_UNUSED) +{ + expectation_create(ct, + conn_for_expectation->key_node[CT_DIR_FWD].key.src.port, + conn_for_expectation, + !!(pkt->md.ct_state & CS_REPLY_DIR), false, false); +} + +void +conntrack_tftp_init(void) +{ + static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; + + if (ovsthread_once_start(&once)) { + alg_helpers[CT_ALG_CTL_TFTP] = handle_tftp_ctl; + ovsthread_once_done(&once); + } +} diff --git a/lib/conntrack.c b/lib/conntrack.c index d81abe456a..462c0e0ad1 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -55,20 +55,6 @@ COVERAGE_DEFINE(conntrack_l4csum_err); COVERAGE_DEFINE(conntrack_lookup_natted_miss); COVERAGE_DEFINE(conntrack_zone_full); -enum ftp_ctl_pkt { - /* Control packets with address and/or port specifiers. 
*/ - CT_FTP_CTL_INTEREST, - /* Control packets without address and/or port specifiers. */ - CT_FTP_CTL_OTHER, - CT_FTP_CTL_INVALID, -}; - -enum ct_alg_mode { - CT_FTP_MODE_ACTIVE, - CT_FTP_MODE_PASSIVE, - CT_TFTP_MODE, -}; - struct zone_limit { struct cmap_node node; struct conntrack_zone_limit czl; @@ -117,24 +103,6 @@ static struct alg_exp_node * expectation_lookup(struct hmap *alg_expectations, const struct conn_key *key, uint32_t basis, bool src_ip_wc); -static int -repl_ftp_v4_addr(struct dp_packet *pkt, ovs_be32 v4_addr_rep, - char *ftp_data_v4_start, - size_t addr_offset_from_ftp_data_start, size_t addr_size); - -static enum ftp_ctl_pkt -process_ftp_ctl_v4(struct conntrack *ct, - struct dp_packet *pkt, - const struct conn *conn_for_expectation, - ovs_be32 *v4_addr_rep, - char **ftp_data_v4_start, - size_t *addr_offset_from_ftp_data_start, - size_t *addr_size); - -static enum ftp_ctl_pkt -detect_ftp_ctl_type(const struct conn_lookup_ctx *ctx, - struct dp_packet *pkt); - static void expectation_clean(struct conntrack *ct, const struct conn_key *parent_key); @@ -170,64 +138,8 @@ struct ct_update_hook { static struct ct_update_hook ct_update_hooks[CT_UPDATE_STATE_HOOKS_MAX]; static size_t n_ct_update_hooks; -static bool ftp_conn_update_state_hook(struct conntrack *, struct dp_packet *, - struct conn_lookup_ctx *, struct conn *, - const struct nat_action_info_t *, - enum ct_alg_ctl_type, long long, - bool *); - -static void -handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx, - struct dp_packet *pkt, struct conn *ec, long long now, - enum ftp_ctl_pkt ftp_ctl, bool nat); - -static void -handle_tftp_ctl(struct conntrack *ct, - const struct conn_lookup_ctx *ctx OVS_UNUSED, - struct dp_packet *pkt, struct conn *conn_for_expectation, - long long now OVS_UNUSED, enum ftp_ctl_pkt ftp_ctl OVS_UNUSED, - bool nat OVS_UNUSED); - -typedef void (*alg_helper)(struct conntrack *ct, - const struct conn_lookup_ctx *ctx, - struct dp_packet *pkt, - struct conn 
*conn_for_expectation, - long long now, enum ftp_ctl_pkt ftp_ctl, - bool nat); - -static alg_helper alg_helpers[] = { - [CT_ALG_CTL_NONE] = NULL, - [CT_ALG_CTL_FTP] = handle_ftp_ctl, - [CT_ALG_CTL_TFTP] = handle_tftp_ctl, -}; +alg_helper alg_helpers[CT_ALG_CTL_MAX]; -/* The maximum TCP or UDP port number. */ -#define CT_MAX_L4_PORT 65535 -/* String buffer used for parsing FTP string messages. - * This is sized about twice what is needed to leave some - * margin of error. */ -#define LARGEST_FTP_MSG_OF_INTEREST 128 -/* FTP port string used in active mode. */ -#define FTP_PORT_CMD "PORT" -/* FTP pasv string used in passive mode. */ -#define FTP_PASV_REPLY_CODE "227" -/* FTP epsv string used in passive mode. */ -#define FTP_EPSV_REPLY_CODE "229" -/* Maximum decimal digits for port in FTP command. - * The port is represented as two 3 digit numbers with the - * high part a multiple of 256. */ -#define MAX_FTP_PORT_DGTS 3 - -/* FTP extension EPRT string used for active mode. */ -#define FTP_EPRT_CMD "EPRT" -/* FTP extension EPSV string used for passive mode. */ -#define FTP_EPSV_REPLY "EXTENDED PASSIVE" -/* Maximum decimal digits for port in FTP extended command. */ -#define MAX_EXT_FTP_PORT_DGTS 5 -/* FTP extended command code for IPv4. */ -#define FTP_AF_V4 '1' -/* FTP extended command code for IPv6. */ -#define FTP_AF_V6 '2' /* Used to indicate a wildcard L4 source port number for ALGs. * This is used for port numbers that we cannot predict in * expectations. 
*/ @@ -311,8 +223,8 @@ conntrack_init(void) l4_protos[IPPROTO_ICMP] = &ct_proto_icmp4; l4_protos[IPPROTO_ICMPV6] = &ct_proto_icmp6; - conn_update_state_hook_register(CT_HOOK_PRI_NORMAL, - ftp_conn_update_state_hook); + conntrack_ftp_init(); + conntrack_tftp_init(); ovsthread_once_done(&setup_l4_once); } @@ -835,12 +747,6 @@ get_ip_proto(const struct dp_packet *pkt) return ip_proto; } -static bool -is_ftp_ctl(const enum ct_alg_ctl_type ct_alg_ctl) -{ - return ct_alg_ctl == CT_ALG_CTL_FTP; -} - static enum ct_alg_ctl_type get_alg_ctl_type(const struct dp_packet *pkt, const char *helper) { @@ -1044,7 +950,7 @@ nat_packet(struct dp_packet *pkt, struct conn *conn, bool reply, bool related) } } -static void +void conn_seq_skew_set(struct conntrack *ct, const struct conn *conn_in, long long now, int seq_skew, bool seq_skew_dir) { @@ -1202,7 +1108,7 @@ nat_res_exhaustion: return NULL; } -static bool +bool conn_update_state(struct conntrack *ct, struct dp_packet *pkt, struct conn_lookup_ctx *ctx, struct conn *conn, long long now) @@ -1322,38 +1228,6 @@ check_orig_tuple(struct conntrack *ct, struct dp_packet *pkt, return *conn ? true : false; } -static bool -ftp_conn_update_state_hook(struct conntrack *ct, struct dp_packet *pkt, - struct conn_lookup_ctx *ctx, struct conn *conn, - const struct nat_action_info_t *nat_action_info, - enum ct_alg_ctl_type ct_alg_ctl, long long now, - bool *create_new_conn) -{ - if (!is_ftp_ctl(ct_alg_ctl)) { - return false; - } - - /* Keep sequence tracking in sync with the source of the sequence skew. */ - ovs_mutex_lock(&conn->lock); - if (ctx->reply != conn->seq_skew_dir) { - handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER, - !!nat_action_info); - /* conn_update_state acquires conn->lock for unrelated fields. 
*/ - ovs_mutex_unlock(&conn->lock); - *create_new_conn = conn_update_state(ct, pkt, ctx, conn, now); - } else { - ovs_mutex_unlock(&conn->lock); - *create_new_conn = conn_update_state(ct, pkt, ctx, conn, now); - ovs_mutex_lock(&conn->lock); - if (!*create_new_conn) { - handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER, - !!nat_action_info); - } - ovs_mutex_unlock(&conn->lock); - } - return true; -} - /* Distribute a connection state-transition event to registered hooks. * Returns true if a hook handled the update (and set *create_new_conn), * false if the caller should fall through to default conn_update_state(). */ @@ -3238,7 +3112,7 @@ expectation_clean(struct conntrack *ct, const struct conn_key *parent_key) ovs_rwlock_unlock(&ct->resources_lock); } -static void +void expectation_create(struct conntrack *ct, ovs_be16 dst_port, const struct conn *parent_conn, bool reply, bool src_ip_wc, bool skip_nat) @@ -3312,467 +3186,6 @@ expectation_create(struct conntrack *ct, ovs_be16 dst_port, ovs_rwlock_unlock(&ct->resources_lock); } -static void -replace_substring(char *substr, size_t substr_size, - size_t total_size, char *rep_str, - size_t rep_str_size) -{ - memmove(substr + rep_str_size, substr + substr_size, - total_size - substr_size); - memcpy(substr, rep_str, rep_str_size); -} - -static void -repl_bytes(char *str, char c1, char c2, int max) -{ - while (*str) { - if (*str == c1) { - *str = c2; - - if (--max == 0) { - break; - } - } - str++; - } -} - -/* Replaces a substring in the packet and rewrites the packet - * size to match. This function assumes the caller has verified - * the lengths to prevent under/over flow. 
*/ -static void -modify_packet(struct dp_packet *pkt, char *pkt_str, size_t size, - char *repl_str, size_t repl_size, - uint32_t orig_used_size) -{ - replace_substring(pkt_str, size, - (const char *) dp_packet_tail(pkt) - pkt_str, - repl_str, repl_size); - dp_packet_set_size(pkt, orig_used_size + (int) repl_size - (int) size); -} - -/* Replace IPV4 address in FTP message with NATed address. */ -static int -repl_ftp_v4_addr(struct dp_packet *pkt, ovs_be32 v4_addr_rep, - char *ftp_data_start, - size_t addr_offset_from_ftp_data_start, - size_t addr_size) -{ - enum { MAX_FTP_V4_NAT_DELTA = 8 }; - - /* EPSV mode. */ - if (addr_offset_from_ftp_data_start == 0 && - addr_size == 0) { - return 0; - } - - /* Do conservative check for pathological MTU usage. */ - uint32_t orig_used_size = dp_packet_size(pkt); - if (orig_used_size + MAX_FTP_V4_NAT_DELTA > - dp_packet_get_allocated(pkt)) { - - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); - VLOG_WARN_RL(&rl, "Unsupported effective MTU %u used with FTP V4", - dp_packet_get_allocated(pkt)); - return 0; - } - - char v4_addr_str[INET_ADDRSTRLEN] = {0}; - ovs_assert(inet_ntop(AF_INET, &v4_addr_rep, v4_addr_str, - sizeof v4_addr_str)); - repl_bytes(v4_addr_str, '.', ',', 0); - modify_packet(pkt, ftp_data_start + addr_offset_from_ftp_data_start, - addr_size, v4_addr_str, strlen(v4_addr_str), - orig_used_size); - return (int) strlen(v4_addr_str) - (int) addr_size; -} - -static char * -skip_non_digits(char *str) -{ - while (!isdigit(*str) && *str != 0) { - str++; - } - return str; -} - -static char * -terminate_number_str(char *str, uint8_t max_digits) -{ - uint8_t digits_found = 0; - while (isdigit(*str) && digits_found <= max_digits) { - str++; - digits_found++; - } - - *str = 0; - return str; -} - - -static void -get_ftp_ctl_msg(struct dp_packet *pkt, char *ftp_msg) -{ - struct tcp_header *th = dp_packet_l4(pkt); - char *tcp_hdr = (char *) th; - uint32_t tcp_payload_len = dp_packet_get_tcp_payload_length(pkt); - 
size_t tcp_payload_of_interest = MIN(tcp_payload_len, - LARGEST_FTP_MSG_OF_INTEREST); - size_t tcp_hdr_len = TCP_OFFSET(th->tcp_ctl) * 4; - - ovs_strlcpy(ftp_msg, tcp_hdr + tcp_hdr_len, - tcp_payload_of_interest); -} - -static enum ftp_ctl_pkt -detect_ftp_ctl_type(const struct conn_lookup_ctx *ctx, - struct dp_packet *pkt) -{ - char ftp_msg[LARGEST_FTP_MSG_OF_INTEREST + 1] = {0}; - get_ftp_ctl_msg(pkt, ftp_msg); - - if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) { - if (strncasecmp(ftp_msg, FTP_EPRT_CMD, strlen(FTP_EPRT_CMD)) && - !strcasestr(ftp_msg, FTP_EPSV_REPLY)) { - return CT_FTP_CTL_OTHER; - } - } else { - if (strncasecmp(ftp_msg, FTP_PORT_CMD, strlen(FTP_PORT_CMD)) && - strncasecmp(ftp_msg, FTP_EPRT_CMD, strlen(FTP_EPRT_CMD)) && - strncasecmp(ftp_msg, FTP_PASV_REPLY_CODE, - strlen(FTP_PASV_REPLY_CODE)) && - strncasecmp(ftp_msg, FTP_EPSV_REPLY_CODE, - strlen(FTP_EPSV_REPLY_CODE))) { - return CT_FTP_CTL_OTHER; - } - } - - return CT_FTP_CTL_INTEREST; -} - -static enum ftp_ctl_pkt -process_ftp_ctl_v4(struct conntrack *ct, - struct dp_packet *pkt, - const struct conn *conn_for_expectation, - ovs_be32 *v4_addr_rep, - char **ftp_data_v4_start, - size_t *addr_offset_from_ftp_data_start, - size_t *addr_size) -{ - struct tcp_header *th = dp_packet_l4(pkt); - size_t tcp_hdr_len = TCP_OFFSET(th->tcp_ctl) * 4; - char *tcp_hdr = (char *) th; - *ftp_data_v4_start = tcp_hdr + tcp_hdr_len; - char ftp_msg[LARGEST_FTP_MSG_OF_INTEREST + 1] = {0}; - get_ftp_ctl_msg(pkt, ftp_msg); - char *ftp = ftp_msg; - struct in_addr ip_addr; - enum ct_alg_mode mode; - bool extended = false; - - if (!strncasecmp(ftp, FTP_PORT_CMD, strlen(FTP_PORT_CMD))) { - ftp = ftp_msg + strlen(FTP_PORT_CMD); - mode = CT_FTP_MODE_ACTIVE; - } else if (!strncasecmp(ftp, FTP_EPRT_CMD, strlen(FTP_EPRT_CMD))) { - ftp = ftp_msg + strlen(FTP_EPRT_CMD); - mode = CT_FTP_MODE_ACTIVE; - extended = true; - } else if (!strncasecmp(ftp, FTP_EPSV_REPLY_CODE, - strlen(FTP_EPSV_REPLY_CODE))) { - ftp = ftp_msg + 
strlen(FTP_EPSV_REPLY_CODE); - mode = CT_FTP_MODE_PASSIVE; - extended = true; - } else { - ftp = ftp_msg + strlen(FTP_PASV_REPLY_CODE); - mode = CT_FTP_MODE_PASSIVE; - } - - /* Find first space. */ - ftp = strchr(ftp, ' '); - if (!ftp) { - return CT_FTP_CTL_INVALID; - } - - /* Find the first digit, after space. */ - ftp = skip_non_digits(ftp); - if (*ftp == 0) { - return CT_FTP_CTL_INVALID; - } - - /* EPRT, verify address family. */ - if (extended && mode == CT_FTP_MODE_ACTIVE) { - if (ftp[0] != FTP_AF_V4 || isdigit(ftp[1])) { - return CT_FTP_CTL_INVALID; - } - - ftp = skip_non_digits(ftp + 1); - if (*ftp == 0) { - return CT_FTP_CTL_INVALID; - } - } - - if (!extended || mode == CT_FTP_MODE_ACTIVE) { - char *ip_addr_start = ftp; - *addr_offset_from_ftp_data_start = ip_addr_start - ftp_msg; - repl_bytes(ftp, ',', '.', 3); - - /* Advance to end of IP address, to terminate it. */ - while (*ftp) { - if (!isdigit(*ftp) && *ftp != '.') { - break; - } - ftp++; - } - *ftp = 0; - ftp++; - - int rc2 = inet_pton(AF_INET, ip_addr_start, &ip_addr); - if (rc2 != 1) { - return CT_FTP_CTL_INVALID; - } - - *addr_size = ftp - ip_addr_start - 1; - } else { - *addr_size = 0; - *addr_offset_from_ftp_data_start = 0; - } - - char *save_ftp = ftp; - uint16_t port_hs; - - if (!extended) { - ftp = terminate_number_str(ftp, MAX_FTP_PORT_DGTS); - if (!ftp) { - return CT_FTP_CTL_INVALID; - } - int value; - if (!str_to_int(save_ftp, 10, &value)) { - return CT_FTP_CTL_INVALID; - } - - /* This is derived from the L4 port maximum is 65535. */ - if (value > 255) { - return CT_FTP_CTL_INVALID; - } - - port_hs = value; - port_hs <<= 8; - - /* Skip over comma. 
*/ - ftp++; - save_ftp = ftp; - bool digit_found = false; - while (isdigit(*ftp)) { - ftp++; - digit_found = true; - } - if (!digit_found) { - return CT_FTP_CTL_INVALID; - } - *ftp = 0; - if (!str_to_int(save_ftp, 10, &value)) { - return CT_FTP_CTL_INVALID; - } - - if (value > 255) { - return CT_FTP_CTL_INVALID; - } - - port_hs |= value; - } else { - ftp = terminate_number_str(ftp, MAX_EXT_FTP_PORT_DGTS); - if (!ftp) { - return CT_FTP_CTL_INVALID; - } - int value; - if (!str_to_int(save_ftp, 10, &value)) { - return CT_FTP_CTL_INVALID; - } - if (value > UINT16_MAX) { - return CT_FTP_CTL_INVALID; - } - port_hs = (uint16_t) value; - } - - ovs_be16 port = htons(port_hs); - ovs_be32 conn_ipv4_addr; - - switch (mode) { - case CT_FTP_MODE_ACTIVE: - *v4_addr_rep = - conn_for_expectation->key_node[CT_DIR_REV].key.dst.addr.ipv4; - conn_ipv4_addr = - conn_for_expectation->key_node[CT_DIR_FWD].key.src.addr.ipv4; - break; - case CT_FTP_MODE_PASSIVE: - *v4_addr_rep = - conn_for_expectation->key_node[CT_DIR_FWD].key.dst.addr.ipv4; - conn_ipv4_addr = - conn_for_expectation->key_node[CT_DIR_REV].key.src.addr.ipv4; - break; - case CT_TFTP_MODE: - default: - OVS_NOT_REACHED(); - } - - if (!extended || mode == CT_FTP_MODE_ACTIVE) { - ovs_be32 ftp_ipv4_addr; - ftp_ipv4_addr = ip_addr.s_addr; - /* Although most servers will block this exploit, there may be some - * less well managed. 
*/ - if (ftp_ipv4_addr != conn_ipv4_addr && ftp_ipv4_addr != *v4_addr_rep) { - return CT_FTP_CTL_INVALID; - } - } - - expectation_create(ct, port, conn_for_expectation, - !!(pkt->md.ct_state & CS_REPLY_DIR), false, false); - return CT_FTP_CTL_INTEREST; -} - -static char * -skip_ipv6_digits(char *str) -{ - while (isxdigit(*str) || *str == ':' || *str == '.') { - str++; - } - return str; -} - -static enum ftp_ctl_pkt -process_ftp_ctl_v6(struct conntrack *ct, - struct dp_packet *pkt, - const struct conn *conn_for_exp, - union ct_addr *v6_addr_rep, char **ftp_data_start, - size_t *addr_offset_from_ftp_data_start, - size_t *addr_size, enum ct_alg_mode *mode) -{ - struct tcp_header *th = dp_packet_l4(pkt); - size_t tcp_hdr_len = TCP_OFFSET(th->tcp_ctl) * 4; - char *tcp_hdr = (char *) th; - char ftp_msg[LARGEST_FTP_MSG_OF_INTEREST + 1] = {0}; - get_ftp_ctl_msg(pkt, ftp_msg); - *ftp_data_start = tcp_hdr + tcp_hdr_len; - char *ftp = ftp_msg; - struct in6_addr ip6_addr; - - if (!strncasecmp(ftp, FTP_EPRT_CMD, strlen(FTP_EPRT_CMD))) { - ftp = ftp_msg + strlen(FTP_EPRT_CMD); - ftp = skip_non_digits(ftp); - if (*ftp != FTP_AF_V6 || isdigit(ftp[1])) { - return CT_FTP_CTL_INVALID; - } - /* Jump over delimiter. */ - ftp += 2; - - memset(&ip6_addr, 0, sizeof ip6_addr); - char *ip_addr_start = ftp; - *addr_offset_from_ftp_data_start = ip_addr_start - ftp_msg; - ftp = skip_ipv6_digits(ftp); - *ftp = 0; - *addr_size = ftp - ip_addr_start; - int rc2 = inet_pton(AF_INET6, ip_addr_start, &ip6_addr); - if (rc2 != 1) { - return CT_FTP_CTL_INVALID; - } - ftp++; - *mode = CT_FTP_MODE_ACTIVE; - } else { - ftp = ftp_msg + strcspn(ftp_msg, "("); - ftp = skip_non_digits(ftp); - if (!isdigit(*ftp)) { - return CT_FTP_CTL_INVALID; - } - - /* Not used for passive mode. 
*/ - *addr_offset_from_ftp_data_start = 0; - *addr_size = 0; - - *mode = CT_FTP_MODE_PASSIVE; - } - - char *save_ftp = ftp; - ftp = terminate_number_str(ftp, MAX_EXT_FTP_PORT_DGTS); - if (!ftp) { - return CT_FTP_CTL_INVALID; - } - - int value; - if (!str_to_int(save_ftp, 10, &value)) { - return CT_FTP_CTL_INVALID; - } - if (value > CT_MAX_L4_PORT) { - return CT_FTP_CTL_INVALID; - } - - uint16_t port_hs = value; - ovs_be16 port = htons(port_hs); - - switch (*mode) { - case CT_FTP_MODE_ACTIVE: - *v6_addr_rep = conn_for_exp->key_node[CT_DIR_REV].key.dst.addr; - /* Although most servers will block this exploit, there may be some - * less well managed. */ - if (memcmp(&ip6_addr, &v6_addr_rep->ipv6, sizeof ip6_addr) && - memcmp(&ip6_addr, - &conn_for_exp->key_node[CT_DIR_FWD].key.src.addr.ipv6, - sizeof ip6_addr)) { - return CT_FTP_CTL_INVALID; - } - break; - case CT_FTP_MODE_PASSIVE: - *v6_addr_rep = conn_for_exp->key_node[CT_DIR_FWD].key.dst.addr; - break; - case CT_TFTP_MODE: - default: - OVS_NOT_REACHED(); - } - - expectation_create(ct, port, conn_for_exp, - !!(pkt->md.ct_state & CS_REPLY_DIR), false, false); - return CT_FTP_CTL_INTEREST; -} - -static int -repl_ftp_v6_addr(struct dp_packet *pkt, union ct_addr v6_addr_rep, - char *ftp_data_start, - size_t addr_offset_from_ftp_data_start, - size_t addr_size, enum ct_alg_mode mode) -{ - /* This is slightly bigger than really possible. */ - enum { MAX_FTP_V6_NAT_DELTA = 45 }; - - if (mode == CT_FTP_MODE_PASSIVE) { - return 0; - } - - /* Do conservative check for pathological MTU usage. 
*/ - uint32_t orig_used_size = dp_packet_size(pkt); - if (orig_used_size + MAX_FTP_V6_NAT_DELTA > - dp_packet_get_allocated(pkt)) { - - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); - VLOG_WARN_RL(&rl, "Unsupported effective MTU %u used with FTP V6", - dp_packet_get_allocated(pkt)); - return 0; - } - - char v6_addr_str[INET6_ADDRSTRLEN] = {0}; - ovs_assert(inet_ntop(AF_INET6, &v6_addr_rep.ipv6, v6_addr_str, - sizeof v6_addr_str)); - modify_packet(pkt, ftp_data_start + addr_offset_from_ftp_data_start, - addr_size, v6_addr_str, strlen(v6_addr_str), - orig_used_size); - return (int) strlen(v6_addr_str) - (int) addr_size; -} - -/* Increment/decrement a TCP sequence number. */ -static void -adj_seqnum(ovs_16aligned_be32 *val, int32_t inc) -{ - put_16aligned_be32(val, htonl(ntohl(get_16aligned_be32(val)) + inc)); -} - void conn_update_state_hook_register(int priority, conn_update_state_hook_fn fn) { @@ -3801,122 +3214,3 @@ conn_update_state_hook_unregister(conn_update_state_hook_fn fn) } } } - -static void -handle_ftp_ctl(struct conntrack *ct, const struct conn_lookup_ctx *ctx, - struct dp_packet *pkt, struct conn *ec, long long now, - enum ftp_ctl_pkt ftp_ctl, bool nat) -{ - struct ip_header *l3_hdr = dp_packet_l3(pkt); - ovs_be32 v4_addr_rep = 0; - union ct_addr v6_addr_rep; - size_t addr_offset_from_ftp_data_start = 0; - size_t addr_size = 0; - char *ftp_data_start; - enum ct_alg_mode mode = CT_FTP_MODE_ACTIVE; - - if (detect_ftp_ctl_type(ctx, pkt) != ftp_ctl) { - return; - } - - struct ovs_16aligned_ip6_hdr *nh6 = dp_packet_l3(pkt); - int64_t seq_skew = 0; - - if (ftp_ctl == CT_FTP_CTL_INTEREST) { - enum ftp_ctl_pkt rc; - if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) { - rc = process_ftp_ctl_v6(ct, pkt, ec, - &v6_addr_rep, &ftp_data_start, - &addr_offset_from_ftp_data_start, - &addr_size, &mode); - } else { - rc = process_ftp_ctl_v4(ct, pkt, ec, - &v4_addr_rep, &ftp_data_start, - &addr_offset_from_ftp_data_start, - &addr_size); - } - if (rc == 
CT_FTP_CTL_INVALID) { - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); - VLOG_WARN_RL(&rl, "Invalid FTP control packet format"); - pkt->md.ct_state |= CS_TRACKED | CS_INVALID; - return; - } else if (rc == CT_FTP_CTL_INTEREST) { - uint16_t ip_len; - - if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) { - if (nat) { - seq_skew = repl_ftp_v6_addr(pkt, v6_addr_rep, - ftp_data_start, - addr_offset_from_ftp_data_start, - addr_size, mode); - } - - if (seq_skew) { - ip_len = ntohs(nh6->ip6_ctlun.ip6_un1.ip6_un1_plen) + - seq_skew; - nh6->ip6_ctlun.ip6_un1.ip6_un1_plen = htons(ip_len); - } - } else { - if (nat) { - seq_skew = repl_ftp_v4_addr(pkt, v4_addr_rep, - ftp_data_start, - addr_offset_from_ftp_data_start, - addr_size); - } - if (seq_skew) { - ip_len = ntohs(l3_hdr->ip_tot_len) + seq_skew; - if (dp_packet_ip_checksum_valid(pkt)) { - dp_packet_ip_checksum_set_partial(pkt); - } else { - l3_hdr->ip_csum = recalc_csum16(l3_hdr->ip_csum, - l3_hdr->ip_tot_len, - htons(ip_len)); - } - l3_hdr->ip_tot_len = htons(ip_len); - } - } - } else { - OVS_NOT_REACHED(); - } - } - - struct tcp_header *th = dp_packet_l4(pkt); - - if (nat && ec->seq_skew != 0) { - ctx->reply != ec->seq_skew_dir ? 
- adj_seqnum(&th->tcp_ack, -ec->seq_skew) : - adj_seqnum(&th->tcp_seq, ec->seq_skew); - } - - if (dp_packet_l4_checksum_valid(pkt)) { - dp_packet_l4_checksum_set_partial(pkt); - } else { - th->tcp_csum = 0; - if (ctx->key.dl_type == htons(ETH_TYPE_IPV6)) { - th->tcp_csum = packet_csum_upperlayer6(nh6, th, ctx->key.nw_proto, - dp_packet_l4_size(pkt)); - } else { - uint32_t tcp_csum = packet_csum_pseudoheader(l3_hdr); - th->tcp_csum = csum_finish( - csum_continue(tcp_csum, th, dp_packet_l4_size(pkt))); - } - } - - if (seq_skew) { - conn_seq_skew_set(ct, ec, now, seq_skew + ec->seq_skew, - ctx->reply); - } -} - -static void -handle_tftp_ctl(struct conntrack *ct, - const struct conn_lookup_ctx *ctx OVS_UNUSED, - struct dp_packet *pkt, struct conn *conn_for_expectation, - long long now OVS_UNUSED, enum ftp_ctl_pkt ftp_ctl OVS_UNUSED, - bool nat OVS_UNUSED) -{ - expectation_create(ct, - conn_for_expectation->key_node[CT_DIR_FWD].key.src.port, - conn_for_expectation, - !!(pkt->md.ct_state & CS_REPLY_DIR), false, false); -} From patchwork Wed Apr 8 17:06:00 2026 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aaron Conole X-Patchwork-Id: 2221008 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=Xdj0rMAY; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange x25519 server-signature 
ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4frTxx3GrGz1xv0 for ; Thu, 09 Apr 2026 03:07:01 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id DAB1740833; Wed, 8 Apr 2026 17:06:59 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id Ggpn__PnfCHW; Wed, 8 Apr 2026 17:06:58 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=2605:bc80:3010:104::8cd3:938; helo=lists.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org BD7FC40819 Authentication-Results: smtp2.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=Xdj0rMAY Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp2.osuosl.org (Postfix) with ESMTPS id BD7FC40819; Wed, 8 Apr 2026 17:06:58 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id B1A82C054A; Wed, 8 Apr 2026 17:06:58 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 1D488C054A for ; Wed, 8 Apr 2026 17:06:57 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id B5287825B9 for ; Wed, 8 Apr 2026 17:06:36 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id ecOcpzb0v1xf for ; Wed, 8 Apr 2026 17:06:34 +0000 (UTC) Received-SPF: Pass 
(mailfrom) identity=mailfrom; client-ip=170.10.133.124; helo=us-smtp-delivery-124.mimecast.com; envelope-from=aconole@redhat.com; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp1.osuosl.org 92B2D826DF Authentication-Results: smtp1.osuosl.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org 92B2D826DF Authentication-Results: smtp1.osuosl.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=Xdj0rMAY Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by smtp1.osuosl.org (Postfix) with ESMTPS id 92B2D826DF for ; Wed, 8 Apr 2026 17:06:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1775667992; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0KZWW+/GJssnUCHrRkpga/1lhU2qSCopeBg5ghCippw=; b=Xdj0rMAYK1UHt0tghtIYNxUoDqCK09sN+xMWseD94HhO118AeUqqaRAHf7lHxQPpW0bUtF l6nZ98I8Z7gZtpOYM5sLwO+75xDp5OuR6SGyseoupJs486XMrRD7uekAT1E0tuTX7taV8X ArpLsPRyEi9oEs1gAWfde9Dp6bJAV5c= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-348-TaLT9Ev3Myu7KzF_XaZyHw-1; Wed, 08 Apr 2026 13:06:29 -0400 X-MC-Unique: TaLT9Ev3Myu7KzF_XaZyHw-1 X-Mimecast-MFC-AGG-ID: TaLT9Ev3Myu7KzF_XaZyHw_1775667985 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client 
certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 580D51956052; Wed, 8 Apr 2026 17:06:25 +0000 (UTC) Received: from RHTRH0061144.redhat.com (unknown [10.22.89.172]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id B4FE43000203; Wed, 8 Apr 2026 17:06:23 +0000 (UTC) To: dev@openvswitch.org Date: Wed, 8 Apr 2026 13:06:00 -0400 Message-ID: <20260408170613.587902-5-aconole@redhat.com> In-Reply-To: <20260408170613.587902-1-aconole@redhat.com> References: <20260408170613.587902-1-aconole@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: cdQQ7FsMJGObqCYNkGPmgDsjq88GP6UyhhtMvkS-oEg_1775667985 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [RFC 04/12] conntrack-tcp: Convert to using the per-conn storage area. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Aaron Conole via dev From: Aaron Conole Reply-To: Aaron Conole Cc: Eli Britstein , Florian Westphal , Flavio Leitner Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev"

Refactor the TCP module to use a new ct private storage area rather than a compatible extended conn struct, so that future modules will have access to the TCP state details. This will be needed when getting the actual TCP state of the connection for offload.
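
As a rough illustration of the idea only (not the code in this series): the sketch below models a connection that carries a small array of opaque per-module slots, with the TCP module storing its state in one slot so that another module can read it without knowing how the conn was allocated. Every name here (toy_conn, toy_private_id_alloc, toy_private_set, toy_private_get, toy_peek_tcp_state, the slot count) is a hypothetical stand-in chosen for the example and is not the conntrack API from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy model; none of these names come from the patch. */
enum { TOY_PRIVATE_SLOTS = 4 };
typedef int toy_private_id_t;
#define TOY_PRIVATE_ID_INVALID (-1)

/* A connection with one opaque pointer per registered module. */
struct toy_conn {
    void *priv[TOY_PRIVATE_SLOTS];
};

struct toy_tcp_peer {
    uint32_t seqlo;
    uint8_t state;
};

/* TCP state lives outside the conn and is reached through a slot. */
struct toy_tcp_state {
    struct toy_tcp_peer peer[2];
};

static toy_private_id_t toy_next_id;
static toy_private_id_t toy_tcp_id = TOY_PRIVATE_ID_INVALID;

/* Hands out the next free slot index, or the invalid ID when full. */
toy_private_id_t
toy_private_id_alloc(void)
{
    return toy_next_id < TOY_PRIVATE_SLOTS ? toy_next_id++
                                           : TOY_PRIVATE_ID_INVALID;
}

void
toy_private_set(struct toy_conn *conn, toy_private_id_t id, void *data)
{
    assert(id >= 0 && id < TOY_PRIVATE_SLOTS);
    conn->priv[id] = data;
}

/* Returns the slot contents, or NULL for an invalid ID or empty slot. */
void *
toy_private_get(const struct toy_conn *conn, toy_private_id_t id)
{
    if (id < 0 || id >= TOY_PRIVATE_SLOTS) {
        return NULL;
    }
    return conn->priv[id];
}

/* TCP module side: allocate a generic conn and attach TCP state to it. */
struct toy_conn *
toy_tcp_new_conn(void)
{
    if (toy_tcp_id == TOY_PRIVATE_ID_INVALID) {
        toy_tcp_id = toy_private_id_alloc();
    }

    struct toy_conn *conn = calloc(1, sizeof *conn);
    struct toy_tcp_state *st = calloc(1, sizeof *st);
    if (!conn || !st) {
        free(conn);
        free(st);
        return NULL;
    }
    st->peer[0].state = 1;      /* Arbitrary marker for the originator. */
    toy_private_set(conn, toy_tcp_id, st);
    return conn;
}

/* Another module: read TCP state without casting the conn itself.
 * Returns -1 when the connection has no TCP state attached. */
int
toy_peek_tcp_state(const struct toy_conn *conn, int dir)
{
    const struct toy_tcp_state *st = toy_private_get(conn, toy_tcp_id);
    return st ? st->peer[dir ? 1 : 0].state : -1;
}

void
toy_conn_destroy(struct toy_conn *conn)
{
    if (conn) {
        free(toy_private_get(conn, toy_tcp_id));
        free(conn);
    }
}
```

With the embedded-struct approach, only code that knows the conn was allocated as the larger TCP struct can safely reach the peer state. With a slot lookup, a reader such as toy_peek_tcp_state() only needs the slot ID and can handle a missing entry.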
Signed-off-by: Aaron Conole --- lib/automake.mk | 1 + lib/conntrack-tcp.c | 64 +++++++++++++++++++-------------------------- lib/conntrack-tcp.h | 61 ++++++++++++++++++++++++++++++++++++++++++ lib/conntrack.c | 2 ++ 4 files changed, 91 insertions(+), 37 deletions(-) create mode 100644 lib/conntrack-tcp.h diff --git a/lib/automake.mk b/lib/automake.mk index 933b71226b..027dd986ba 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -90,6 +90,7 @@ lib_libopenvswitch_la_SOURCES = \ lib/conntrack-icmp.c \ lib/conntrack-private.h \ lib/conntrack-tcp.c \ + lib/conntrack-tcp.h \ lib/conntrack-tftp.c \ lib/conntrack-tp.c \ lib/conntrack-tp.h \ diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c index 8a7c98cc45..696fd5c109 100644 --- a/lib/conntrack-tcp.c +++ b/lib/conntrack-tcp.c @@ -3,6 +3,7 @@ * Copyright (c) 2002 - 2008 Henning Brauer * Copyright (c) 2012 Gleb Smirnoff * Copyright (c) 2015, 2016 Nicira, Inc. + * Copyright (c) 2026 Red Hat, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -39,6 +40,7 @@ #include #include "conntrack-private.h" +#include "conntrack-tcp.h" #include "conntrack-tp.h" #include "coverage.h" #include "ct-dpif.h" @@ -49,18 +51,7 @@ COVERAGE_DEFINE(conntrack_tcp_seq_chk_bypass); COVERAGE_DEFINE(conntrack_tcp_seq_chk_failed); COVERAGE_DEFINE(conntrack_invalid_tcp_flags); -struct tcp_peer { - uint32_t seqlo; /* Max sequence number sent */ - uint32_t seqhi; /* Max the other end ACKd + win */ - uint16_t max_win; /* largest window (pre scaling) */ - uint8_t wscale; /* window scaling factor */ - enum ct_dpif_tcp_state state; -}; - -struct conn_tcp { - struct conn up; - struct tcp_peer peer[2]; /* 'conn' lock protected. 
*/ -}; +ct_private_id_t conntrack_tcp_private_id = CT_PRIVATE_ID_INVALID; enum { TCPOPT_EOL, @@ -79,12 +70,6 @@ enum { #define SEQ_MIN(a, b) INT_MOD_MIN(a, b) #define SEQ_MAX(a, b) INT_MOD_MAX(a, b) -static struct conn_tcp* -conn_tcp_cast(const struct conn* conn) -{ - return CONTAINER_OF(conn, struct conn_tcp, up); -} - /* pf does this in in pf_normalize_tcp(), and it is called only if scrub * is enabled. We're not scrubbing, but this check seems reasonable. */ static bool @@ -113,9 +98,6 @@ tcp_invalid_flags(uint16_t flags) } #define TCP_MAX_WSCALE 14 -#define CT_WSCALE_FLAG 0x80 -#define CT_WSCALE_UNKNOWN 0x40 -#define CT_WSCALE_MASK 0xf static uint8_t tcp_get_wscale(const struct tcp_header *tcp) @@ -164,7 +146,7 @@ static enum ct_update_res tcp_conn_update(struct conntrack *ct, struct conn *conn_, struct dp_packet *pkt, bool reply, long long now) { - struct conn_tcp *conn = conn_tcp_cast(conn_); + struct conn_tcp_state *conn = conn_tcp_state_get(conn_); struct tcp_header *tcp = dp_packet_l4(pkt); /* The peer that sent 'pkt' */ struct tcp_peer *src = &conn->peer[reply ? 
1 : 0]; @@ -189,7 +171,7 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_, return CT_UPDATE_NEW; } else if (src->state <= CT_DPIF_TCPS_SYN_SENT) { src->state = CT_DPIF_TCPS_SYN_SENT; - conn_update_expiration(ct, &conn->up, CT_TM_TCP_FIRST_PACKET, now); + conn_update_expiration(ct, conn_, CT_TM_TCP_FIRST_PACKET, now); return CT_UPDATE_VALID_NEW; } } @@ -340,18 +322,18 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_, if (src->state >= CT_DPIF_TCPS_FIN_WAIT_2 && dst->state >= CT_DPIF_TCPS_FIN_WAIT_2) { - conn_update_expiration(ct, &conn->up, CT_TM_TCP_CLOSED, now); + conn_update_expiration(ct, conn_, CT_TM_TCP_CLOSED, now); } else if (src->state >= CT_DPIF_TCPS_CLOSING && dst->state >= CT_DPIF_TCPS_CLOSING) { - conn_update_expiration(ct, &conn->up, CT_TM_TCP_FIN_WAIT, now); + conn_update_expiration(ct, conn_, CT_TM_TCP_FIN_WAIT, now); } else if (src->state < CT_DPIF_TCPS_ESTABLISHED || dst->state < CT_DPIF_TCPS_ESTABLISHED) { - conn_update_expiration(ct, &conn->up, CT_TM_TCP_OPENING, now); + conn_update_expiration(ct, conn_, CT_TM_TCP_OPENING, now); } else if (src->state >= CT_DPIF_TCPS_CLOSING || dst->state >= CT_DPIF_TCPS_CLOSING) { - conn_update_expiration(ct, &conn->up, CT_TM_TCP_CLOSING, now); + conn_update_expiration(ct, conn_, CT_TM_TCP_CLOSING, now); } else { - conn_update_expiration(ct, &conn->up, CT_TM_TCP_ESTABLISHED, now); + conn_update_expiration(ct, conn_, CT_TM_TCP_ESTABLISHED, now); } } else if ((dst->state < CT_DPIF_TCPS_SYN_SENT || dst->state >= CT_DPIF_TCPS_FIN_WAIT_2 @@ -439,15 +421,14 @@ static struct conn * tcp_new_conn(struct conntrack *ct, struct dp_packet *pkt, long long now, uint32_t tp_id) { - struct conn_tcp* newconn = NULL; struct tcp_header *tcp = dp_packet_l4(pkt); + struct conn_tcp_state *tcp_state; struct tcp_peer *src, *dst; uint16_t tcp_flags = TCP_FLAGS(tcp->tcp_ctl); - newconn = xzalloc(sizeof *newconn); - - src = &newconn->peer[0]; - dst = &newconn->peer[1]; + tcp_state = xzalloc(sizeof *tcp_state); + src = 
&tcp_state->peer[0]; + dst = &tcp_state->peer[1]; src->seqlo = ntohl(get_16aligned_be32(&tcp->tcp_seq)); src->seqhi = src->seqlo + dp_packet_get_tcp_payload_length(pkt) + 1; @@ -473,10 +454,12 @@ tcp_new_conn(struct conntrack *ct, struct dp_packet *pkt, long long now, src->state = CT_DPIF_TCPS_SYN_SENT; dst->state = CT_DPIF_TCPS_CLOSED; - newconn->up.tp_id = tp_id; - conn_init_expiration(ct, &newconn->up, CT_TM_TCP_FIRST_PACKET, now); + struct conn *newconn = xzalloc(sizeof *newconn); + newconn->tp_id = tp_id; + conn_private_set(newconn, conntrack_tcp_private_id, tcp_state); + conn_init_expiration(ct, newconn, CT_TM_TCP_FIRST_PACKET, now); - return &newconn->up; + return newconn; } static uint8_t @@ -499,7 +482,7 @@ static void tcp_conn_get_protoinfo(const struct conn *conn_, struct ct_dpif_protoinfo *protoinfo) { - const struct conn_tcp *conn = conn_tcp_cast(conn_); + const struct conn_tcp_state *conn = conn_tcp_state_get(conn_); protoinfo->proto = IPPROTO_TCP; protoinfo->tcp.state_orig = conn->peer[0].state; @@ -518,3 +501,10 @@ struct ct_l4_proto ct_proto_tcp = { .conn_update = tcp_conn_update, .conn_get_protoinfo = tcp_conn_get_protoinfo, }; + + +void +conntrack_tcp_init(void) +{ + conntrack_tcp_private_id = conn_private_id_alloc(free); +} diff --git a/lib/conntrack-tcp.h b/lib/conntrack-tcp.h new file mode 100644 index 0000000000..519993874c --- /dev/null +++ b/lib/conntrack-tcp.h @@ -0,0 +1,61 @@ +/* + * Copyright (c) 2026 Red Hat, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef CONNTRACK_TCP_H +#define CONNTRACK_TCP_H + +#include "conntrack.h" +#include "ct-dpif.h" + +/* wscale field flags stored in tcp_peer.wscale. */ +#define CT_WSCALE_FLAG 0x80 /* Negotiated window scaling is in use. */ +#define CT_WSCALE_UNKNOWN 0x40 /* Scale factor not yet known. */ +#define CT_WSCALE_MASK 0x0f /* Actual scale factor (0-14). */ + +/* Per-direction TCP state tracked by the conntrack TCP module. */ +struct tcp_peer { + uint32_t seqlo; /* Max sequence number sent. */ + uint32_t seqhi; /* Max the other end ACKd + win. */ + uint16_t max_win; /* Largest window (pre-scaling). */ + uint8_t wscale; /* Window scaling factor + flags. */ + enum ct_dpif_tcp_state state; +}; + +/* TCP-specific connection state stored in the conntrack private data slot. + * Access via conn_tcp_state_get(). */ +struct conn_tcp_state { + struct tcp_peer peer[2]; /* peer[0]=original, peer[1]=reply. */ +}; + +/* Private slot ID for TCP state; valid after conntrack_tcp_init(). */ +extern ct_private_id_t conntrack_tcp_private_id; + +/* Must be called once at module initialization before any connections are + * created (called internally by conntrack_init()). */ +void conntrack_tcp_init(void); + +/* Returns the TCP state for 'conn', or NULL if not a TCP connection or + * conntrack_tcp_init() has not been called. 
*/ +static inline struct conn_tcp_state * +conn_tcp_state_get(const struct conn *conn) +{ + if (conntrack_tcp_private_id == CT_PRIVATE_ID_INVALID) { + return NULL; + } + return conn_private_get(conn, conntrack_tcp_private_id); +} + +#endif /* CONNTRACK_TCP_H */ diff --git a/lib/conntrack.c b/lib/conntrack.c index 462c0e0ad1..eab65e48f2 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -24,6 +24,7 @@ #include "conntrack.h" #include "conntrack-private.h" +#include "conntrack-tcp.h" #include "conntrack-tp.h" #include "coverage.h" #include "crc32c.h" @@ -223,6 +224,7 @@ conntrack_init(void) l4_protos[IPPROTO_ICMP] = &ct_proto_icmp4; l4_protos[IPPROTO_ICMPV6] = &ct_proto_icmp6; + conntrack_tcp_init(); conntrack_ftp_init(); conntrack_tftp_init(); From patchwork Wed Apr 8 17:06:01 2026 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aaron Conole X-Patchwork-Id: 2221007 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=i0sf5DiW; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4frTxt4XzVz1xv0 for ; Thu, 09 Apr 2026 03:06:58 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 
441FE40817; Wed, 8 Apr 2026 17:06:57 +0000 (UTC) Received: from RHTRH0061144.redhat.com (unknown [10.22.89.172]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com
(Postfix) with ESMTP id BE0A530001BB; Wed, 8 Apr 2026 17:06:25 +0000 (UTC) To: dev@openvswitch.org Date: Wed, 8 Apr 2026 13:06:01 -0400 Message-ID: <20260408170613.587902-6-aconole@redhat.com> In-Reply-To: <20260408170613.587902-1-aconole@redhat.com> References: <20260408170613.587902-1-aconole@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: -pmcfD26oibaG6dOTNk9qhy_wrn746CYZU7yZz3Q3Rw_1775667987 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [RFC 05/12] ct-offload: Add a new interface as an offload provider. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Aaron Conole via dev From: Aaron Conole Reply-To: Aaron Conole Cc: Eli Britstein , Florian Westphal , Flavio Leitner Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This adds the basic primitives, initialization, and operations that conntrack offload providers will need to implement in order to offer a path to offloading. Signed-off-by: Aaron Conole --- lib/automake.mk | 2 + lib/ct-offload.c | 257 +++++++++++++++++++++++++++++++++++++++++++++++ lib/ct-offload.h | 97 ++++++++++++++++++ 3 files changed, 356 insertions(+) create mode 100644 lib/ct-offload.c create mode 100644 lib/ct-offload.h diff --git a/lib/automake.mk b/lib/automake.mk index 027dd986ba..f11e3de27c 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -97,6 +97,8 @@ lib_libopenvswitch_la_SOURCES = \ lib/conntrack-other.c \ lib/conntrack.c \ lib/conntrack.h \ + lib/ct-offload.c \ + lib/ct-offload.h \ lib/cooperative-multitasking.c \ lib/cooperative-multitasking.h \ lib/cooperative-multitasking-private.h \ diff --git a/lib/ct-offload.c b/lib/ct-offload.c new file mode 100644 index 0000000000..3bd6200e37 --- /dev/null +++ b/lib/ct-offload.c @@ -0,0 +1,257 @@ +/* + * Copyright (c) 2026 Red Hat, Inc. 
+ * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include + +#include "ct-offload.h" +#include "ovs-thread.h" +#include "util.h" + +#include "openvswitch/list.h" +#include "openvswitch/vlog.h" + +VLOG_DEFINE_THIS_MODULE(ct_offload); + +/* Node in the registered-provider list. */ +struct ct_offload_class_node { + const struct ct_offload_class *class; + struct ovs_list list_node; +}; + +/* Global list of registered CT offload classes and a mutex to protect it. + * Providers are expected to be registered at module init time and + * unregistered only at module teardown, so contention is minimal. */ +static struct ovs_mutex ct_offload_mutex = OVS_MUTEX_INITIALIZER; +static struct ovs_list ct_offload_classes + OVS_GUARDED_BY(ct_offload_mutex) + = OVS_LIST_INITIALIZER(&ct_offload_classes); + + +/* ct_offload_register() - register a CT offload provider class. + * + * Calls class->init() if provided. Returns 0 on success or a positive + * errno value on failure. Attempting to register the same class twice + * returns EEXIST. */ +int +ct_offload_register(const struct ct_offload_class *class) +{ + struct ct_offload_class_node *node; + int error = 0; + + ovs_assert(class); + ovs_assert(class->name); + + ovs_mutex_lock(&ct_offload_mutex); + + /* Detect duplicate registrations. 
*/ + LIST_FOR_EACH (node, list_node, &ct_offload_classes) { + if (!strcmp(node->class->name, class->name)) { + VLOG_WARN("attempted to register duplicate ct offload class: %s", + class->name); + error = EEXIST; + goto out; + } + } + + error = class->init ? class->init() : 0; + if (error) { + VLOG_WARN("failed to initialize ct offload class %s: %s", + class->name, ovs_strerror(error)); + goto out; + } + + node = xmalloc(sizeof *node); + node->class = class; + ovs_list_push_back(&ct_offload_classes, &node->list_node); + VLOG_DBG("registered ct offload class: %s", class->name); + +out: + ovs_mutex_unlock(&ct_offload_mutex); + return error; +} + +/* ct_offload_unregister() - unregister a previously registered class. + * + * Safe to call even if the class was never registered (no-op in that + * case). */ +void +ct_offload_unregister(const struct ct_offload_class *class) +{ + struct ct_offload_class_node *node; + + ovs_assert(class); + + ovs_mutex_lock(&ct_offload_mutex); + LIST_FOR_EACH (node, list_node, &ct_offload_classes) { + if (node->class == class) { + ovs_list_remove(&node->list_node); + free(node); + VLOG_DBG("unregistered ct offload class: %s", class->name); + goto out; + } + } + VLOG_WARN("attempted to unregister unknown ct offload class: %s", + class->name); + +out: + ovs_mutex_unlock(&ct_offload_mutex); +} + +/* ct_offload_module_init() - register built-in CT offload providers. + * + * Must be called once before any connections are created. */ +void +ct_offload_module_init(void) +{ + /* No built-in providers yet; third parties call ct_offload_register() + * directly from their own module-init routines. */ +} + +/* ct_offload_conn_add() - notify all eligible providers of a new connection. + * + * Iterates over registered providers and calls conn_add() on each one that + * reports can_offload() == true for this context. Returns the first non-zero + * error encountered, but continues notifying remaining providers. 
This allows + * the underlying hardware conntrack details across providers function. */ +int +ct_offload_conn_add(const struct ct_offload_ctx *ctx) +{ + struct ct_offload_class_node *node; + int ret = 0; + + ovs_mutex_lock(&ct_offload_mutex); + LIST_FOR_EACH (node, list_node, &ct_offload_classes) { + const struct ct_offload_class *class = node->class; + + if (class->can_offload && !class->can_offload(ctx)) { + continue; + } + if (class->conn_add) { + int error = class->conn_add(ctx); + + if (error && !ret) { + ret = error; + } + } + } + ovs_mutex_unlock(&ct_offload_mutex); + + return ret; +} + +/* ct_offload_conn_del() - notify all providers that a connection was removed. + * + * Called unconditionally on all providers so that each can clean up any + * state it may have installed. */ +void +ct_offload_conn_del(const struct ct_offload_ctx *ctx) +{ + struct ct_offload_class_node *node; + + ovs_mutex_lock(&ct_offload_mutex); + LIST_FOR_EACH (node, list_node, &ct_offload_classes) { + const struct ct_offload_class *class = node->class; + + if (class->conn_del) { + class->conn_del(ctx); + } + } + ovs_mutex_unlock(&ct_offload_mutex); +} + +void +ct_offload_conn_established(const struct ct_offload_ctx *ctx) +{ + struct ct_offload_class_node *node; + + ovs_mutex_lock(&ct_offload_mutex); + LIST_FOR_EACH (node, list_node, &ct_offload_classes) { + const struct ct_offload_class *class = node->class; + + if (class->conn_established) { + class->conn_established(ctx); + } + } + ovs_mutex_unlock(&ct_offload_mutex); +} + +/* ct_offload_conn_update() - query the hardware last-used timestamp. + * + * Iterates over providers and returns the first non-zero timestamp returned + * by a provider's conn_update() callback. Returns 0 if no provider + * supplies a timestamp. 
*/ +long long +ct_offload_conn_update(const struct ct_offload_ctx *ctx) +{ + struct ct_offload_class_node *node; + long long last_used = 0; + + ovs_mutex_lock(&ct_offload_mutex); + LIST_FOR_EACH (node, list_node, &ct_offload_classes) { + const struct ct_offload_class *class = node->class; + + if (class->conn_update) { + long long ts = class->conn_update(ctx); + + if (ts) { + last_used = ts; + break; + } + } + } + ovs_mutex_unlock(&ct_offload_mutex); + + return last_used; +} + +/* ct_offload_can_offload() - returns true if any provider can offload ctx. */ +bool +ct_offload_can_offload(const struct ct_offload_ctx *ctx) +{ + struct ct_offload_class_node *node; + bool result = false; + + ovs_mutex_lock(&ct_offload_mutex); + LIST_FOR_EACH (node, list_node, &ct_offload_classes) { + const struct ct_offload_class *class = node->class; + + if (class->can_offload && class->can_offload(ctx)) { + result = true; + break; + } + } + ovs_mutex_unlock(&ct_offload_mutex); + + return result; +} + +/* ct_offload_flush() - flush all offloaded connections from every provider. */ +void +ct_offload_flush(void) +{ + struct ct_offload_class_node *node; + + ovs_mutex_lock(&ct_offload_mutex); + LIST_FOR_EACH (node, list_node, &ct_offload_classes) { + const struct ct_offload_class *class = node->class; + + if (class->flush) { + class->flush(); + } + } + ovs_mutex_unlock(&ct_offload_mutex); +} diff --git a/lib/ct-offload.h b/lib/ct-offload.h new file mode 100644 index 0000000000..824b94a5c1 --- /dev/null +++ b/lib/ct-offload.h @@ -0,0 +1,97 @@ +/* + * Copyright (c) 2026 Red Hat, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef CT_OFFLOAD_H +#define CT_OFFLOAD_H + +#include "openvswitch/types.h" + +struct conn; +struct netdev; + +/* Context for offload as part of the callbacks that all connection + * offload APIs receive. + */ +struct ct_offload_ctx { + const struct conn *conn; /* Connection object being offloaded. */ + struct netdev *netdev_in; /* Input netdev. */ + odp_port_t input_port_id; /* ODP port number. */ +}; + +enum ct_offload_op_type { + CT_OFFLOAD_OP_ADD, /* Add operation. */ + CT_OFFLOAD_OP_DEL, /* Del operation. */ + CT_OFFLOAD_OP_UPD, /* Update operation. */ + CT_OFFLOAD_OP_POLICY, /* Policy check operation. */ + CT_OFFLOAD_OP_FLUSH, /* Flush. */ + CT_OFFLOAD_OP_EST, /* Established - notify that a connection + * has a reply seen. */ +}; + +struct ct_offload_op { + enum ct_offload_op_type type; + struct ct_offload_ctx ctx; + int error; +}; + +/* Batched set of offload contexts and operations.*/ +struct ct_offload_op_batch { + struct ct_offload_op *ops; + size_t n_ops; + size_t allocated; +}; + + +/* CT offload class describes a conntrack offload provider implementation. */ +struct ct_offload_class { + const char *name; + + /* Initialization routine for the provider. */ + int (*init)(void); + + /* Per-connection operation callbacks get called for individual operations + * on the fast path or when batching is not in use. */ + int (*conn_add)(const struct ct_offload_ctx *); + void (*conn_del)(const struct ct_offload_ctx *); + /* Populate the last-used timestamp for the connection. 
Returns the + * last-used time in milliseconds since epoch, or 0 on failure. */ + long long (*conn_update)(const struct ct_offload_ctx *); + /* Called exactly once when the first reply-direction packet is seen + * for an offloaded connection. */ + void (*conn_established)(const struct ct_offload_ctx *); + /* Check whether this provider can offload a connection. */ + bool (*can_offload)(const struct ct_offload_ctx *); + /* Flush all offloaded connections. */ + void (*flush)(void); +}; + +/* Register/unregister a provider. Must be called at module init, before + * any connections are created. */ +int ct_offload_register(const struct ct_offload_class *); +void ct_offload_unregister(const struct ct_offload_class *); + +/* Module initialization (register built-in providers). */ +void ct_offload_module_init(void); + +/* Per-connection offload API that dispatches to all registered providers. */ +int ct_offload_conn_add(const struct ct_offload_ctx *); +void ct_offload_conn_del(const struct ct_offload_ctx *); +long long ct_offload_conn_update(const struct ct_offload_ctx *); +void ct_offload_conn_established(const struct ct_offload_ctx *); +bool ct_offload_can_offload(const struct ct_offload_ctx *); +void ct_offload_flush(void); + +#endif /* CT_OFFLOAD_H */ From patchwork Wed Apr 8 17:06:02 2026 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aaron Conole X-Patchwork-Id: 2221010 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=VVwKybX9; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.136; helo=smtp3.osuosl.org; 
envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com
(mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id D3F72195608D; Wed, 8 Apr 2026 17:06:29 +0000 (UTC) Received: from RHTRH0061144.redhat.com (unknown [10.22.89.172]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id D9885300019F; Wed, 8 Apr 2026 17:06:27 +0000 (UTC) To: dev@openvswitch.org Date: Wed, 8 Apr 2026 13:06:02 -0400 Message-ID: <20260408170613.587902-7-aconole@redhat.com> In-Reply-To: <20260408170613.587902-1-aconole@redhat.com> References: <20260408170613.587902-1-aconole@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: qmEAItb1oXfx5LIdJnQIv1j_DdE-CG7ixK2yfRzCuSI_1775667989 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [RFC 06/12] ct-offload: Add batching support. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Aaron Conole via dev From: Aaron Conole Reply-To: Aaron Conole Cc: Eli Britstein , Florian Westphal , Flavio Leitner Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" The CT offload operations API currently operates on a single connection at a time. However, it can be useful to accumulate offload API operations and execute them as a single large batch. Provide a basic batch abstraction that accumulates operations and then executes them all at once. This will be used in an upcoming commit, especially by the ct expiration logic. A provider may also implement its own batched interface, which lets it apply provider-specific optimizations.
As part of this extension, move the lock management up a level for the batching system to have a single bulk operations lock. Signed-off-by: Aaron Conole --- lib/ct-offload.c | 254 +++++++++++++++++++++++++++++++++++++++++++---- lib/ct-offload.h | 54 ++++++++++ 2 files changed, 286 insertions(+), 22 deletions(-) diff --git a/lib/ct-offload.c b/lib/ct-offload.c index 3bd6200e37..97c922dde1 100644 --- a/lib/ct-offload.c +++ b/lib/ct-offload.c @@ -121,25 +121,33 @@ ct_offload_module_init(void) * directly from their own module-init routines. */ } -/* ct_offload_conn_add() - notify all eligible providers of a new connection. +/* ct_offload_conn_add_() - notify all eligible providers of a new connection. * * Iterates over registered providers and calls conn_add() on each one that * reports can_offload() == true for this context. Returns the first non-zero * error encountered, but continues notifying remaining providers. This allows - * the underlying hardware conntrack details across providers function. */ -int -ct_offload_conn_add(const struct ct_offload_ctx *ctx) + * the underlying hardware conntrack details across providers function. + */ +static int +ct_offload_conn_add_(const struct ct_offload_ctx *ctx, bool batched) { struct ct_offload_class_node *node; int ret = 0; - ovs_mutex_lock(&ct_offload_mutex); LIST_FOR_EACH (node, list_node, &ct_offload_classes) { const struct ct_offload_class *class = node->class; + if (batched && class->batch_submit) { + /* Called via the batched path - skip the providers + * that support batched submits since they already processed + * this. 
*/ + continue; + } + if (class->can_offload && !class->can_offload(ctx)) { continue; } + if (class->conn_add) { int error = class->conn_add(ctx); @@ -148,44 +156,83 @@ ct_offload_conn_add(const struct ct_offload_ctx *ctx) } } } + + return ret; +} + +int +ct_offload_conn_add(const struct ct_offload_ctx *ctx) +{ + int ret; + + ovs_mutex_lock(&ct_offload_mutex); + ret = ct_offload_conn_add_(ctx, false); ovs_mutex_unlock(&ct_offload_mutex); return ret; } -/* ct_offload_conn_del() - notify all providers that a connection was removed. +/* ct_offload_conn_del_() - notify all providers that a connection was removed. * * Called unconditionally on all providers so that each can clean up any * state it may have installed. */ -void -ct_offload_conn_del(const struct ct_offload_ctx *ctx) +static void +ct_offload_conn_del_(const struct ct_offload_ctx *ctx, bool batched) { struct ct_offload_class_node *node; - ovs_mutex_lock(&ct_offload_mutex); LIST_FOR_EACH (node, list_node, &ct_offload_classes) { const struct ct_offload_class *class = node->class; + if (batched && class->batch_submit) { + /* Called via the batched path - skip the providers + * that support batched submits since they already processed + * this. 
*/ + continue; + } + if (class->conn_del) { class->conn_del(ctx); } } - ovs_mutex_unlock(&ct_offload_mutex); } void -ct_offload_conn_established(const struct ct_offload_ctx *ctx) +ct_offload_conn_del(const struct ct_offload_ctx *ctx) +{ + ovs_mutex_lock(&ct_offload_mutex); + ct_offload_conn_del_(ctx, false); + ovs_mutex_unlock(&ct_offload_mutex); +} + +static int +ct_offload_conn_established_(const struct ct_offload_ctx *ctx, bool batched) { struct ct_offload_class_node *node; - ovs_mutex_lock(&ct_offload_mutex); LIST_FOR_EACH (node, list_node, &ct_offload_classes) { const struct ct_offload_class *class = node->class; + if (batched && class->batch_submit) { + /* Called via the batched path - skip the providers + * that support batched submits since they already processed + * this. */ + continue; + } + if (class->conn_established) { class->conn_established(ctx); } } + + return 0; +} + +void +ct_offload_conn_established(const struct ct_offload_ctx *ctx) +{ + ovs_mutex_lock(&ct_offload_mutex); + (void) ct_offload_conn_established_(ctx, false); ovs_mutex_unlock(&ct_offload_mutex); } @@ -194,16 +241,22 @@ ct_offload_conn_established(const struct ct_offload_ctx *ctx) * Iterates over providers and returns the first non-zero timestamp returned * by a provider's conn_update() callback. Returns 0 if no provider * supplies a timestamp. */ -long long -ct_offload_conn_update(const struct ct_offload_ctx *ctx) +static long long +ct_offload_conn_update_(const struct ct_offload_ctx *ctx, bool batched) { struct ct_offload_class_node *node; long long last_used = 0; - ovs_mutex_lock(&ct_offload_mutex); LIST_FOR_EACH (node, list_node, &ct_offload_classes) { const struct ct_offload_class *class = node->class; + if (batched && class->batch_submit) { + /* Called via the batched path - skip the providers + * that support batched submits since they already processed + * this. 
*/ + continue; + } + if (class->conn_update) { long long ts = class->conn_update(ctx); @@ -213,45 +266,202 @@ ct_offload_conn_update(const struct ct_offload_ctx *ctx) } } } + return last_used; +} + +long long +ct_offload_conn_update(const struct ct_offload_ctx *ctx) +{ + long long ret; + + ovs_mutex_lock(&ct_offload_mutex); + ret = ct_offload_conn_update_(ctx, false); ovs_mutex_unlock(&ct_offload_mutex); - return last_used; + return ret; } /* ct_offload_can_offload() - returns true if any provider can offload ctx. */ -bool -ct_offload_can_offload(const struct ct_offload_ctx *ctx) +static bool +ct_offload_can_offload_(const struct ct_offload_ctx *ctx, bool batched) { struct ct_offload_class_node *node; bool result = false; - ovs_mutex_lock(&ct_offload_mutex); LIST_FOR_EACH (node, list_node, &ct_offload_classes) { const struct ct_offload_class *class = node->class; + if (batched && class->batch_submit) { + /* Called via the batched path - skip the providers + * that support batched submits since they already processed + * this. */ + continue; + } + if (class->can_offload && class->can_offload(ctx)) { result = true; break; } } - ovs_mutex_unlock(&ct_offload_mutex); return result; } +bool +ct_offload_can_offload(const struct ct_offload_ctx *ctx) +{ + bool can_offload; + + ovs_mutex_lock(&ct_offload_mutex); + can_offload = ct_offload_can_offload_(ctx, false); + ovs_mutex_unlock(&ct_offload_mutex); + + return can_offload; +} + /* ct_offload_flush() - flush all offloaded connections from every provider. */ +static void +ct_offload_flush_(bool batched) +{ + struct ct_offload_class_node *node; + + LIST_FOR_EACH (node, list_node, &ct_offload_classes) { + const struct ct_offload_class *class = node->class; + + if (batched && class->batch_submit) { + /* Called via the batched path - skip the providers + * that support batched submits since they already processed + * this. 
*/ + continue; + } + + if (class->flush) { + class->flush(); + } + } +} + void ct_offload_flush(void) +{ + ovs_mutex_lock(&ct_offload_mutex); + ct_offload_flush_(false); + ovs_mutex_unlock(&ct_offload_mutex); +} + + +/* Batch API + * ========= + * + * The default implementation serialises each operation in the batch through + * the individual per-connection dispatch functions above. All provider + * callbacks are invoked under the ct_offload_mutex, so the per-operation + * lock/unlock overhead of the single-op path is avoided across the batch. + */ + +#define CT_OFFLOAD_BATCH_INITIAL_SIZE 8 + +/* ct_offload_op_batch_init() - prepare an empty batch for use. */ +void +ct_offload_op_batch_init(struct ct_offload_op_batch *batch) +{ + batch->ops = NULL; + batch->n_ops = 0; + batch->allocated = 0; +} + +/* ct_offload_op_batch_add() - append one operation to the batch. + * + * The batch grows dynamically; callers need not pre-size it. */ +void +ct_offload_op_batch_add(struct ct_offload_op_batch *batch, + enum ct_offload_op_type type, + const struct ct_offload_ctx *ctx) +{ + if (batch->n_ops == batch->allocated) { + batch->allocated = batch->allocated + ? batch->allocated * 2 + : CT_OFFLOAD_BATCH_INITIAL_SIZE; + batch->ops = xrealloc(batch->ops, + batch->allocated * sizeof *batch->ops); + } + + struct ct_offload_op *op = &batch->ops[batch->n_ops++]; + op->type = type; + op->ctx = *ctx; + op->error = 0; +} + +/* ct_offload_op_batch_submit() - execute every operation in the batch. + * + * Each op's 'error' field is set to the result of the corresponding + * per-connection dispatch. The mutex is held for the whole batch; the + * internal dispatch helpers are called directly rather than through the + * public single-op wrappers to avoid repeated lock/unlock cycles.
*/ +void +ct_offload_op_batch_submit(struct ct_offload_op_batch *batch) { struct ct_offload_class_node *node; + struct ct_offload_op *op; ovs_mutex_lock(&ct_offload_mutex); LIST_FOR_EACH (node, list_node, &ct_offload_classes) { const struct ct_offload_class *class = node->class; - if (class->flush) { - class->flush(); + if (class->batch_submit) { + class->batch_submit(batch); + } + } + + CT_OFFLOAD_BATCH_OP_FOR_EACH (idx, op, batch) { + + switch (op->type) { + case CT_OFFLOAD_OP_ADD: + op->error = ct_offload_conn_add_(&op->ctx, true); + break; + + case CT_OFFLOAD_OP_DEL: + ct_offload_conn_del_(&op->ctx, true); + op->error = 0; + break; + + case CT_OFFLOAD_OP_UPD: { + long long ts = ct_offload_conn_update_(&op->ctx, true); + + op->error = ts ? 0 : ENODATA; + break; + } + + case CT_OFFLOAD_OP_POLICY: + op->error = ct_offload_can_offload_(&op->ctx, true) ? 0 : EPERM; + break; + + case CT_OFFLOAD_OP_FLUSH: + ct_offload_flush_(true); + op->error = 0; + break; + + case CT_OFFLOAD_OP_EST: + op->error = ct_offload_conn_established_(&op->ctx, true); + break; + + default: + op->error = EINVAL; + break; } } ovs_mutex_unlock(&ct_offload_mutex); } + +/* ct_offload_op_batch_destroy() - release memory held by the batch. + * + * The batch may be re-initialised with ct_offload_op_batch_init() after + * this call. */ +void +ct_offload_op_batch_destroy(struct ct_offload_op_batch *batch) +{ + free(batch->ops); + batch->ops = NULL; + batch->n_ops = 0; + batch->allocated = 0; +} diff --git a/lib/ct-offload.h b/lib/ct-offload.h index 824b94a5c1..36871d12cb 100644 --- a/lib/ct-offload.h +++ b/lib/ct-offload.h @@ -62,6 +62,10 @@ struct ct_offload_class { /* Initialization routine for the provider. */ int (*init)(void); + /* Interface to allow offload providers to operate in bulk. This + * is called as part of batch processing. If a provider does not implement + * it, the fallback is the individual per-connection callbacks.
*/ + void (*batch_submit)(struct ct_offload_op_batch *); /* Per-connection operation callbacks get called for individual operations * on the fast path or when batching is not in use. */ int (*conn_add)(const struct ct_offload_ctx *); @@ -94,4 +98,54 @@ void ct_offload_conn_established(const struct ct_offload_ctx *); bool ct_offload_can_offload(const struct ct_offload_ctx *); void ct_offload_flush(void); +/* Batch offload API. + * + * The default implementation dispatches each operation individually using the + * per-connection API above. Providers that can handle a native batch may do + * so by implementing the batch_submit callback in struct ct_offload_class; + * such providers are skipped by the per-operation fallback. + * + * Typical usage: + * + * struct ct_offload_op_batch batch; + * ct_offload_op_batch_init(&batch); + * + * ct_offload_op_batch_add(&batch, CT_OFFLOAD_OP_ADD, &ctx_a); + * ct_offload_op_batch_add(&batch, CT_OFFLOAD_OP_ADD, &ctx_b); + * + * ct_offload_op_batch_submit(&batch); + * for_each_op inspect batch.ops[i].error + * + * ct_offload_op_batch_destroy(&batch); + * + * For CT_OFFLOAD_OP_UPD, op->error is set to 0 when the hardware returned a + * valid last-used timestamp (expiration was refreshed by the provider), or to + * ENODATA when no hardware record was found. + * + * For CT_OFFLOAD_OP_POLICY, op->error is set to 0 when the connection is + * eligible for offload, or EPERM when no provider will accept it.
+ */ +void ct_offload_op_batch_init(struct ct_offload_op_batch *); +void ct_offload_op_batch_add(struct ct_offload_op_batch *, + enum ct_offload_op_type, + const struct ct_offload_ctx *); +void ct_offload_op_batch_submit(struct ct_offload_op_batch *); +void ct_offload_op_batch_destroy(struct ct_offload_op_batch *); + +static inline +size_t ct_offload_op_batch_len(struct ct_offload_op_batch *batch) +{ + return batch->n_ops; +} + +static inline +size_t ct_offload_op_batch_size(struct ct_offload_op_batch *batch) +{ + return batch->allocated; +} + +#define CT_OFFLOAD_BATCH_OP_FOR_EACH(IDX, OP, BATCH) \ + for (size_t IDX = 0; IDX < ct_offload_op_batch_len(BATCH); IDX++) \ + if (OP = &((BATCH)->ops[IDX]), true) + #endif /* CT_OFFLOAD_H */ From patchwork Wed Apr 8 17:06:03 2026 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aaron Conole X-Patchwork-Id: 2221009 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=W3uEZUwl; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4frTy2467Zz1xv0 for ; Thu, 09 Apr 2026 03:07:06 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 29A4E41078; 
Wed, 8 Apr 2026 17:07:05 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id mUBNT6LvFLbF; Wed, 8 Apr 2026 17:07:02 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=2605:bc80:3010:104::8cd3:938; helo=lists.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org F1CF24107B Authentication-Results: smtp4.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=W3uEZUwl Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp4.osuosl.org (Postfix) with ESMTPS id F1CF24107B; Wed, 8 Apr 2026 17:07:00 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id CF241C0902; Wed, 8 Apr 2026 17:07:00 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) by lists.linuxfoundation.org (Postfix) with ESMTP id 070FEC054A for ; Wed, 8 Apr 2026 17:06:59 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 596BA60FCE for ; Wed, 8 Apr 2026 17:06:38 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id NpcFoH3D54j0 for ; Wed, 8 Apr 2026 17:06:37 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=170.10.129.124; helo=us-smtp-delivery-124.mimecast.com; envelope-from=aconole@redhat.com; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp3.osuosl.org 208D060FA0 Authentication-Results: smtp3.osuosl.org; dmarc=pass (p=quarantine dis=none) 
header.from=redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 208D060FA0 Authentication-Results: smtp3.osuosl.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=W3uEZUwl Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp3.osuosl.org (Postfix) with ESMTPS id 208D060FA0 for ; Wed, 8 Apr 2026 17:06:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1775667996; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=q3zQYA9MryvLBGKhdaeed6pNT/e0OC5Bxsv22zMX1r4=; b=W3uEZUwlCY2G4bnXA+iQeVu7d79eUa6BsFRkqp5HFI+lzifU5L4S5HDwjUwm8euF2hH9KU n9Jf+WbphnntJzNYLncn+Dh/kdTDFFEsJg3AtdrKyC1H3vnVFNutQmqJ52tg2NPTGE5R4i v7Qx62ad/wHHAyXf+Q2djo2CtPM4N9A= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-101-uKa8dl-ePs6Z5rgxk1kK-w-1; Wed, 08 Apr 2026 13:06:32 -0400 X-MC-Unique: uKa8dl-ePs6Z5rgxk1kK-w-1 X-Mimecast-MFC-AGG-ID: uKa8dl-ePs6Z5rgxk1kK-w_1775667991 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id ADE251955E7B; Wed, 8 Apr 2026 17:06:31 +0000 (UTC) Received: from RHTRH0061144.redhat.com (unknown [10.22.89.172]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com 
(Postfix) with ESMTP id 3C19A300019F; Wed, 8 Apr 2026 17:06:30 +0000 (UTC) To: dev@openvswitch.org Date: Wed, 8 Apr 2026 13:06:03 -0400 Message-ID: <20260408170613.587902-8-aconole@redhat.com> In-Reply-To: <20260408170613.587902-1-aconole@redhat.com> References: <20260408170613.587902-1-aconole@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: aIP1VUqMmvzaKosGE4nJXxMMhC3pNV1aUr3hSXlGtbM_1775667991 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [RFC 07/12] ct-offload: Add a mark for offloaded connections. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Aaron Conole via dev From: Aaron Conole Reply-To: Aaron Conole Cc: Eli Britstein , Florian Westphal , Flavio Leitner Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This helps future work to determine whether a connection needs to be cleaned up during offload sweeping, and whether to notify offload providers about established connections. Additionally, update the TCP sequence check to skip verifying sequence numbers for offloaded connections. 
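As a rough illustration of the mark's life cycle, the sketch below models a connection that moves from "not offloaded" to "added" to "established", and shows how the sequence check bypass could depend on it. All names (demo_conn, demo_mark and the demo_*() helpers) are hypothetical and exist only for this example; the patch itself stores the state in a per-connection private slot rather than in a struct field.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-connection mark; illustration only. */
enum demo_mark { DEMO_MARK_NONE, DEMO_MARK_ADDED, DEMO_MARK_EST };

struct demo_conn {
    enum demo_mark mark;
};

/* Marks the connection only when the provider add succeeded, that is, when
 * 'provider_error' is 0.  Returns 'provider_error' unchanged. */
static int
demo_conn_add(struct demo_conn *conn, int provider_error)
{
    if (!provider_error) {
        conn->mark = DEMO_MARK_ADDED;
    }
    return provider_error;
}

/* Moves ADDED -> EST exactly once; any other state reports EALREADY so the
 * caller does not notify providers again. */
static int
demo_conn_established(struct demo_conn *conn)
{
    if (conn->mark != DEMO_MARK_ADDED) {
        return EALREADY;
    }
    conn->mark = DEMO_MARK_EST;
    return 0;
}

/* Clears the mark when the connection is removed. */
static void
demo_conn_del(struct demo_conn *conn)
{
    conn->mark = DEMO_MARK_NONE;
}

static bool
demo_conn_is_offloaded(const struct demo_conn *conn)
{
    return conn->mark != DEMO_MARK_NONE;
}

/* The sequence check is skipped when it is globally disabled or when the
 * connection carries an offload mark. */
static bool
demo_bypass_seq_chk(bool seq_chk_enabled, const struct demo_conn *conn)
{
    return !seq_chk_enabled || demo_conn_is_offloaded(conn);
}
```

With this shape, a sweeper can ask demo_conn_is_offloaded() to decide whether a provider still needs to be consulted, and the established notification is naturally sent at most once per connection.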
Signed-off-by: Aaron Conole --- lib/conntrack-tcp.c | 8 +++-- lib/ct-offload.c | 81 +++++++++++++++++++++++++++++++++++++++++++-- lib/ct-offload.h | 8 +++++ 3 files changed, 92 insertions(+), 5 deletions(-) diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c index 696fd5c109..1e71b40d40 100644 --- a/lib/conntrack-tcp.c +++ b/lib/conntrack-tcp.c @@ -39,6 +39,7 @@ #include +#include "ct-offload.h" #include "conntrack-private.h" #include "conntrack-tcp.h" #include "conntrack-tp.h" @@ -133,9 +134,10 @@ tcp_get_wscale(const struct tcp_header *tcp) } static bool -tcp_bypass_seq_chk(struct conntrack *ct) +tcp_bypass_seq_chk(struct conntrack *ct, struct conn *conn) { - if (!conntrack_get_tcp_seq_chk(ct)) { + if (!conntrack_get_tcp_seq_chk(ct) || + ct_offload_conn_is_offloaded(conn)) { COVERAGE_INC(conntrack_tcp_seq_chk_bypass); return true; } @@ -286,7 +288,7 @@ tcp_conn_update(struct conntrack *ct, struct conn *conn_, /* Acking not more than one window forward */ && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) - || tcp_bypass_seq_chk(ct)) { + || tcp_bypass_seq_chk(ct, conn_)) { /* Require an exact/+1 sequence match on resets when possible */ /* update max window */ diff --git a/lib/ct-offload.c b/lib/ct-offload.c index 97c922dde1..618bd655d0 100644 --- a/lib/ct-offload.c +++ b/lib/ct-offload.c @@ -17,6 +17,8 @@ #include #include +#include "conntrack.h" +#include "conntrack-private.h" #include "ct-offload.h" #include "ovs-thread.h" #include "util.h" @@ -26,6 +28,15 @@ VLOG_DEFINE_THIS_MODULE(ct_offload); +/* Private data slot used to mark connections that have been successfully + * offloaded. Allocated once at module init; no destructor needed because + * the stored value is a plain integer cast to pointer, not heap data. 
*/ +static ct_private_id_t ct_offload_private_id = CT_PRIVATE_ID_INVALID; + +#define CT_OFFLOAD_STATE_NONE ((void *) 0) +#define CT_OFFLOAD_STATE_ADDED ((void *) 1) +#define CT_OFFLOAD_STATE_EST ((void *) 2) + /* Node in the registered-provider list. */ struct ct_offload_class_node { const struct ct_offload_class *class; @@ -111,14 +122,29 @@ out: ovs_mutex_unlock(&ct_offload_mutex); } +void +ct_offload_alloc_private_slot(void) +{ + static struct ovsthread_once once_enable = OVSTHREAD_ONCE_INITIALIZER; + + if (ovsthread_once_start(&once_enable)) { + /* Allocate the per-connection private slot. */ + ct_offload_private_id = conn_private_id_alloc(NULL); + if (ct_offload_private_id == CT_PRIVATE_ID_INVALID) { + VLOG_ERR("failed to allocate ct offload private id: " + "is-offloaded tracking disabled"); + } + ovsthread_once_done(&once_enable); + } +} + /* ct_offload_module_init() - register built-in CT offload providers. * * Must be called once before any connections are created. */ void ct_offload_module_init(void) { - /* No built-in providers yet; third parties call ct_offload_register() - * directly from their own module-init routines. */ + ct_offload_alloc_private_slot(); } /* ct_offload_conn_add_() - notify all eligible providers of a new connection. 
@@ -157,6 +183,11 @@ ct_offload_conn_add_(const struct ct_offload_ctx *ctx, bool batched) } } + if (!ret && ct_offload_private_id != CT_PRIVATE_ID_INVALID) { + conn_private_set(CONST_CAST(struct conn *, ctx->conn), + ct_offload_private_id, CT_OFFLOAD_STATE_ADDED); + } + return ret; } @@ -195,6 +226,11 @@ ct_offload_conn_del_(const struct ct_offload_ctx *ctx, bool batched) class->conn_del(ctx); } } + + if (ct_offload_private_id != CT_PRIVATE_ID_INVALID) { + conn_private_set(CONST_CAST(struct conn *, ctx->conn), + ct_offload_private_id, CT_OFFLOAD_STATE_NONE); + } } void @@ -208,8 +244,19 @@ ct_offload_conn_del(const struct ct_offload_ctx *ctx) static int ct_offload_conn_established_(const struct ct_offload_ctx *ctx, bool batched) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(600, 600); struct ct_offload_class_node *node; + if (ct_offload_private_id == CT_PRIVATE_ID_INVALID) { + VLOG_WARN_RL(&rl, "ct_offload id not allocated: not sending est."); + return EAGAIN; + } + + if (conn_private_get(ctx->conn, ct_offload_private_id) != + CT_OFFLOAD_STATE_ADDED) { + return EALREADY; + } + LIST_FOR_EACH (node, list_node, &ct_offload_classes) { const struct ct_offload_class *class = node->class; @@ -225,6 +272,8 @@ ct_offload_conn_established_(const struct ct_offload_ctx *ctx, bool batched) } } + conn_private_set(CONST_CAST(struct conn *, ctx->conn), + ct_offload_private_id, CT_OFFLOAD_STATE_EST); return 0; } @@ -453,6 +502,34 @@ ct_offload_op_batch_submit(struct ct_offload_op_batch *batch) ovs_mutex_unlock(&ct_offload_mutex); } +/* ct_offload_conn_is_offloaded() - return true if conn is currently offloaded. + * + * Reads the private slot set by ct_offload_conn_add() on success and cleared + * by ct_offload_conn_del(). Returns false when the private slot could not be + * allocated at init time.
*/ +bool +ct_offload_conn_is_offloaded(const struct conn *conn) +{ + if (ct_offload_private_id == CT_PRIVATE_ID_INVALID) { + return false; + } + return conn_private_get(conn, ct_offload_private_id) != + CT_OFFLOAD_STATE_NONE; +} + +/* ct_offload_conn_is_established() - return true if conn transitioned to + * established state. Returns false when the private slot could not be + * allocated at init time. */ +bool +ct_offload_conn_is_established(const struct conn *conn) +{ + if (ct_offload_private_id == CT_PRIVATE_ID_INVALID) { + return false; + } + return conn_private_get(conn, ct_offload_private_id) == + CT_OFFLOAD_STATE_EST; +} + /* ct_offload_op_batch_destroy() - release memory held by the batch. * * The batch may be re-initialised with ct_offload_op_batch_init() after diff --git a/lib/ct-offload.h b/lib/ct-offload.h index 36871d12cb..fcb3170fa1 100644 --- a/lib/ct-offload.h +++ b/lib/ct-offload.h @@ -87,6 +87,8 @@ struct ct_offload_class { int ct_offload_register(const struct ct_offload_class *); void ct_offload_unregister(const struct ct_offload_class *); +/* Allocate private slot id. */ +void ct_offload_alloc_private_slot(void); /* Module initialization (register built-in providers). */ void ct_offload_module_init(void); @@ -98,6 +100,12 @@ void ct_offload_conn_established(const struct ct_offload_ctx *); bool ct_offload_can_offload(const struct ct_offload_ctx *); void ct_offload_flush(void); +/* Returns true if 'conn' has been successfully offloaded to hardware. + * Set by ct_offload_conn_add(); cleared by ct_offload_conn_del(). */ +bool ct_offload_conn_is_offloaded(const struct conn *); +/* Returns true if 'conn' has been transitioned to established state. */ +bool ct_offload_conn_is_established(const struct conn *); + /* Batch offload API. 
* * The default implementation dispatches each operation individually using the From patchwork Wed Apr 8 17:06:04 2026 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aaron Conole X-Patchwork-Id: 2221011 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=ZVaMZKFj; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4frTy55Ymcz1xv0 for ; Thu, 09 Apr 2026 03:07:09 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 50E2C41084; Wed, 8 Apr 2026 17:07:08 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id QWAMrFsPr6Xz; Wed, 8 Apr 2026 17:07:06 +0000 (UTC) X-Comment: SPF check N/A for local connections - client-ip=2605:bc80:3010:104::8cd3:938; helo=lists.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 7C0964102C Authentication-Results: smtp4.osuosl.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 
header.s=mimecast20190719 header.b=ZVaMZKFj Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp4.osuosl.org (Postfix) with ESMTPS id 7C0964102C; Wed, 8 Apr 2026 17:07:03 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 716B7C054A; Wed, 8 Apr 2026 17:07:03 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp4.osuosl.org (smtp4.osuosl.org [140.211.166.137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 1388AC0908 for ; Wed, 8 Apr 2026 17:07:01 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id C47FD41032 for ; Wed, 8 Apr 2026 17:06:40 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id GFcnVIV4WXsi for ; Wed, 8 Apr 2026 17:06:39 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=170.10.129.124; helo=us-smtp-delivery-124.mimecast.com; envelope-from=aconole@redhat.com; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp4.osuosl.org 44EBE4100E Authentication-Results: smtp4.osuosl.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 smtp4.osuosl.org 44EBE4100E Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp4.osuosl.org (Postfix) with ESMTPS id 44EBE4100E for ; Wed, 8 Apr 2026 17:06:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1775667998; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=expsvgU0zmJrqjHdwIzXa358pNB5bLBGD5z4Fn7Q1yA=; 
b=ZVaMZKFjV1SMzwPHMg6nhDE4ItvW+uU/RysI/5BtKHSsBGq7pR6B7SB88EYCyqKEDKau4g ujhqEZdFzkgaksqkXEXn2N+JiiB0JJPZ/hduVtAyU0uTETvWCzTcvYziQxC9zvqxbo2cac 6e8Zq76+7LpKEuTMnEIcaQEwZOUveUU= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-277-RvYghVTCMA2EXiBhGSDRCg-1; Wed, 08 Apr 2026 13:06:34 -0400 X-MC-Unique: RvYghVTCMA2EXiBhGSDRCg-1 X-Mimecast-MFC-AGG-ID: RvYghVTCMA2EXiBhGSDRCg_1775667993 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id B825D19560A7; Wed, 8 Apr 2026 17:06:33 +0000 (UTC) Received: from RHTRH0061144.redhat.com (unknown [10.22.89.172]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 0F3EC300019F; Wed, 8 Apr 2026 17:06:31 +0000 (UTC) To: dev@openvswitch.org Date: Wed, 8 Apr 2026 13:06:04 -0400 Message-ID: <20260408170613.587902-9-aconole@redhat.com> In-Reply-To: <20260408170613.587902-1-aconole@redhat.com> References: <20260408170613.587902-1-aconole@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 4WHmA3RYRqJXF1sxgecI1GnuxsXh0LD6vUGV-YgnQ8E_1775667993 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [RFC 08/12] conntrack: Add calls to ct-offload infrastructure. 
X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Aaron Conole via dev From: Aaron Conole Reply-To: Aaron Conole Cc: Eli Britstein , Florian Westphal , Flavio Leitner Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This connects the offload provider infrastructure with the various places in conntrack where connection status changes take place. Signed-off-by: Aaron Conole --- lib/conntrack.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 76 insertions(+), 3 deletions(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index eab65e48f2..9872d7af51 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -27,6 +27,7 @@ #include "conntrack-tcp.h" #include "conntrack-tp.h" #include "coverage.h" +#include "ct-offload.h" #include "crc32c.h" #include "csum.h" #include "ct-dpif.h" @@ -533,6 +534,13 @@ conn_clean(struct conntrack *ct, struct conn *conn) atomic_count_dec(&zl->czl.count); } + struct ct_offload_ctx offload_ctx = { + .conn = conn, + .netdev_in = NULL, + .input_port_id = ODPP_NONE, + }; + ct_offload_conn_del(&offload_ctx); + ovsrcu_postpone(delete_conn, conn); atomic_count_dec(&ct->n_conn); } @@ -1396,6 +1404,31 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, helper, alg_exp, ct_alg_ctl, tp_id); } ovs_mutex_unlock(&ct->ct_lock); + + if (conn) { + struct ct_offload_ctx offload_ctx = { + .conn = conn, + .netdev_in = NULL, + .input_port_id = pkt->md.in_port.odp_port, + }; + ct_offload_conn_add(&offload_ctx); + } + } + + if (!create_new_conn && conn && ctx->reply && + (pkt->md.ct_state & CS_ESTABLISHED) && + ct_offload_conn_is_offloaded(conn) && + !ct_offload_conn_is_established(conn)) { + /* Notify offload providers that the connection is established. 
+ * We use the reply bit to detect that the connection has + * transitioned and give us the input port, which should be the + * reverse direction port. */ + struct ct_offload_ctx offload_ctx = { + .conn = conn, + .netdev_in = NULL, + .input_port_id = pkt->md.in_port.odp_port, + }; + ct_offload_conn_established(&offload_ctx); } write_ct_md(pkt, zone, conn, &ctx->key, alg_exp); @@ -1541,19 +1574,59 @@ ct_sweep(struct conntrack *ct, struct rculist *list, long long now, size_t *cleaned_count) OVS_NO_THREAD_SAFETY_ANALYSIS { + struct ct_offload_op_batch batch; + struct ct_offload_op *op; + struct conn *conn; size_t cleaned = 0; size_t count = 0; + + ct_offload_op_batch_init(&batch); + RCULIST_FOR_EACH (conn, node, list) { if (conn_expired(conn, now)) { - conn_clean(ct, conn); - cleaned++; + if (!ct_offload_conn_is_offloaded(conn)) { + conn_clean(ct, conn); + cleaned++; + } else { + struct ct_offload_ctx offload_ctx = { + .conn = conn, + .netdev_in = NULL, + .input_port_id = ODPP_NONE, + }; + + ct_offload_op_batch_add(&batch, CT_OFFLOAD_OP_UPD, + &offload_ctx); + } } - count++; } + /* Run the batch. */ + ct_offload_op_batch_submit(&batch); + + CT_OFFLOAD_BATCH_OP_FOR_EACH (idx, op, &batch) { + struct conn *c = CONST_CAST(struct conn *, op->ctx.conn); + + if (op->error) { + conn_clean(ct, c); + cleaned++; + } else { + /* Extend expiration by one sweep interval from now so the + * connection survives until the next pass. 
*/ + long long new_exp = now + conntrack_get_sweep_interval(ct); + long long cur; + + atomic_read_relaxed(&c->expiration, &cur); + if (new_exp > cur) { + atomic_store_relaxed(&c->expiration, new_exp); + } + } + } + + ct_offload_op_batch_destroy(&batch); + if (cleaned_count) { *cleaned_count = cleaned; } From patchwork Wed Apr 8 17:06:05 2026 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aaron Conole X-Patchwork-Id: 2221012 To: dev@openvswitch.org Date: Wed, 8 Apr 2026 13:06:05 -0400 Message-ID: <20260408170613.587902-10-aconole@redhat.com> In-Reply-To: <20260408170613.587902-1-aconole@redhat.com> References: <20260408170613.587902-1-aconole@redhat.com> Subject: [ovs-dev] [RFC 09/12] ct-offload: Add
configuration infrastructure. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Aaron Conole via dev From: Aaron Conole Reply-To: Aaron Conole Cc: Eli Britstein , Florian Westphal , Flavio Leitner Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This allows configuring hardware offload on/off. The ct_sweep expiration is always trying to fill the ops batch anyway, so it doesn't need an actual check for enabled / disabled. Wrapping that code in a check may also be harmful in the event that offload is disabled with offloaded connections. Signed-off-by: Aaron Conole --- lib/conntrack.c | 25 +++++++++++++---------- lib/ct-offload.c | 51 +++++++++++++++++++++++++++++++++++++++++++++- lib/ct-offload.h | 8 ++++++++ lib/dpif-offload.c | 13 ++++++++++++ lib/dpif-offload.h | 1 + vswitchd/bridge.c | 4 ++++ 6 files changed, 90 insertions(+), 12 deletions(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index 9872d7af51..e59630aa2b 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -534,12 +534,14 @@ conn_clean(struct conntrack *ct, struct conn *conn) atomic_count_dec(&zl->czl.count); } - struct ct_offload_ctx offload_ctx = { - .conn = conn, - .netdev_in = NULL, - .input_port_id = ODPP_NONE, - }; - ct_offload_conn_del(&offload_ctx); + if (ct_offload_enabled()) { + struct ct_offload_ctx offload_ctx = { + .conn = conn, + .netdev_in = NULL, + .input_port_id = ODPP_NONE, + }; + ct_offload_conn_del(&offload_ctx); + } ovsrcu_postpone(delete_conn, conn); atomic_count_dec(&ct->n_conn); @@ -1405,7 +1407,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, } ovs_mutex_unlock(&ct->ct_lock); - if (conn) { + if (conn && ct_offload_enabled()) { struct ct_offload_ctx offload_ctx = { .conn = conn, .netdev_in = NULL, @@ -1417,6 +1419,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, if (!create_new_conn && conn && ctx->reply 
&& (pkt->md.ct_state & CS_ESTABLISHED) && + ct_offload_enabled() && ct_offload_conn_is_offloaded(conn) && !ct_offload_conn_is_established(conn)) { /* Notify offload providers that the connection is established. @@ -1581,7 +1584,6 @@ ct_sweep(struct conntrack *ct, struct rculist *list, long long now, size_t cleaned = 0; size_t count = 0; - ct_offload_op_batch_init(&batch); RCULIST_FOR_EACH (conn, node, list) { @@ -1595,7 +1597,6 @@ ct_sweep(struct conntrack *ct, struct rculist *list, long long now, .netdev_in = NULL, .input_port_id = ODPP_NONE, }; - ct_offload_op_batch_add(&batch, CT_OFFLOAD_OP_UPD, &offload_ctx); } @@ -1603,12 +1604,14 @@ ct_sweep(struct conntrack *ct, struct rculist *list, long long now, count++; } - /* Run the batch. */ + /* Run the batch: providers that can supply a hw last-used timestamp + * return error==0, allowing us to extend the expiration. A non-zero + * error (typically ENODATA) means the connection has no hw activity + * and should be expired normally. */ ct_offload_op_batch_submit(&batch); CT_OFFLOAD_BATCH_OP_FOR_EACH (idx, op, &batch) { struct conn *c = CONST_CAST(struct conn *, op->ctx.conn); - if (op->error) { conn_clean(ct, c); cleaned++; diff --git a/lib/ct-offload.c b/lib/ct-offload.c index 618bd655d0..b777801ab9 100644 --- a/lib/ct-offload.c +++ b/lib/ct-offload.c @@ -20,8 +20,10 @@ #include "conntrack.h" #include "conntrack-private.h" #include "ct-offload.h" +#include "dpif-offload.h" #include "ovs-thread.h" #include "util.h" +#include "vswitch-idl.h" #include "openvswitch/list.h" #include "openvswitch/vlog.h" @@ -51,6 +53,12 @@ static struct ovs_list ct_offload_classes OVS_GUARDED_BY(ct_offload_mutex) = OVS_LIST_INITIALIZER(&ct_offload_classes); +/* Built-in CT offload provider classes. Only those whose name matches a + * registered dpif offload class will be activated by ct_offload_module_init(). 
+ */ +static const struct ct_offload_class *base_ct_offload_classes[] = { +}; + /* ct_offload_register() - register a CT offload provider class. * @@ -140,11 +148,52 @@ ct_offload_alloc_private_slot(void) /* ct_offload_module_init() - register built-in CT offload providers. * - * Must be called once before any connections are created. */ + * Only registers providers whose name matches a currently-registered dpif + * offload class, so CT offload is automatically tied to the active hardware + * offload provider. Safe to call multiple times; subsequent calls are + * no-ops (duplicate registration is detected and skipped). */ void ct_offload_module_init(void) { ct_offload_alloc_private_slot(); + + for (int i = 0; i < ARRAY_SIZE(base_ct_offload_classes); i++) { + const struct ct_offload_class *class = base_ct_offload_classes[i]; + + if (dpif_offload_class_is_registered(class->name)) { + ct_offload_register(class); + } + } +} + +/* ct_offload_enabled() - returns true when hardware offload is active. + * + * Delegates to dpif_offload_enabled() so CT offload shares the same global + * enable/disable knob as datapath hardware offload. */ +bool +ct_offload_enabled(void) +{ + return dpif_offload_enabled(); +} + +/* ct_offload_set_global_cfg() - configure CT offload from OVSDB. + * + * Must be called alongside dpif_offload_set_global_cfg() so that CT offload + * providers are registered once hardware offload has been enabled and the + * appropriate dpif offload classes are known. */ +void +ct_offload_set_global_cfg(const struct ovsrec_open_vswitch *cfg OVS_UNUSED) +{ + static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; + + if (!dpif_offload_enabled()) { + return; + } + + if (ovsthread_once_start(&once)) { + ct_offload_module_init(); + ovsthread_once_done(&once); + } } /* ct_offload_conn_add_() - notify all eligible providers of a new connection. 
diff --git a/lib/ct-offload.h b/lib/ct-offload.h index fcb3170fa1..fe4ecd33b8 100644 --- a/lib/ct-offload.h +++ b/lib/ct-offload.h @@ -21,6 +21,7 @@ struct conn; struct netdev; +struct ovsrec_open_vswitch; /* Context for offload as part of the callbacks that all connection * offload APIs receive. @@ -91,6 +92,13 @@ void ct_offload_unregister(const struct ct_offload_class *); void ct_offload_alloc_private_slot(void); /* Module initialization (register built-in providers). */ void ct_offload_module_init(void); +/* Global configuration: call alongside dpif_offload_set_global_cfg() to + * enable CT offload when hardware offload is active. */ +void ct_offload_set_global_cfg(const struct ovsrec_open_vswitch *); + +/* Returns true when CT offload is enabled (delegates to dpif_offload_enabled). + */ +bool ct_offload_enabled(void); /* Per-connection offload API that dispatches to all registered providers. */ int ct_offload_conn_add(const struct ct_offload_ctx *); diff --git a/lib/dpif-offload.c b/lib/dpif-offload.c index bb2feced9e..30cc4fd271 100644 --- a/lib/dpif-offload.c +++ b/lib/dpif-offload.c @@ -516,6 +516,19 @@ dpif_offload_enabled(void) return enabled; } +/* dpif_offload_class_is_registered() - returns true if a dpif offload class + * with the given name has been successfully registered. 
*/ +bool +dpif_offload_class_is_registered(const char *name) +{ + bool found; + + ovs_mutex_lock(&dpif_offload_mutex); + found = shash_find(&dpif_offload_classes, name) != NULL; + ovs_mutex_unlock(&dpif_offload_mutex); + return found; +} + bool dpif_offload_rebalance_policy_enabled(void) { diff --git a/lib/dpif-offload.h b/lib/dpif-offload.h index 7fad3ebee3..d95e4b9463 100644 --- a/lib/dpif-offload.h +++ b/lib/dpif-offload.h @@ -45,6 +45,7 @@ enum dpif_offload_impl_type { void dpif_offload_set_global_cfg(const struct ovsrec_open_vswitch *); bool dpif_offload_enabled(void); bool dpif_offload_rebalance_policy_enabled(void); +bool dpif_offload_class_is_registered(const char *name); /* Per dpif specific functions. */ diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c index 7a68e19ac3..91ffe76648 100644 --- a/vswitchd/bridge.c +++ b/vswitchd/bridge.c @@ -28,6 +28,7 @@ #include "daemon.h" #include "dirs.h" #include "dpif.h" +#include "ct-offload.h" #include "dpif-offload.h" #include "dpdk.h" #include "hash.h" @@ -543,6 +544,8 @@ bridge_init(const char *remote) void bridge_exit(bool delete_datapath) { + ct_offload_flush(); + if_notifier_manual_set_cb(NULL); if_notifier_destroy(ifnotifier); seq_destroy(ifaces_changed); @@ -3396,6 +3399,7 @@ bridge_run(void) if (cfg && ovsdb_idl_get_seqno(idl) != idl_seqno) { dpif_offload_set_global_cfg(cfg); + ct_offload_set_global_cfg(cfg); } if (cfg) { From patchwork Wed Apr 8 17:06:06 2026 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aaron Conole X-Patchwork-Id: 2221013 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com
(mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 3EF401956060; Wed, 8 Apr 2026 17:06:38 +0000 (UTC) Received: from RHTRH0061144.redhat.com (unknown [10.22.89.172]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 77D52300019F; Wed, 8 Apr 2026 17:06:36 +0000 (UTC) To: dev@openvswitch.org Date: Wed, 8 Apr 2026 13:06:06 -0400 Message-ID: <20260408170613.587902-11-aconole@redhat.com> In-Reply-To: <20260408170613.587902-1-aconole@redhat.com> References: <20260408170613.587902-1-aconole@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: SFDuyCD9G5Wcn42Tqrn8zk1LWXz9K3Rch0Cx0BsUSNc_1775667998 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [RFC 10/12] conntrack: Propagate input netdev pointer to conntrack. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Aaron Conole via dev From: Aaron Conole Reply-To: Aaron Conole Cc: Eli Britstein , Florian Westphal , Flavio Leitner Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Offloading providers will need input port details in order to correctly map the packet movement. They will also need the output port mapping for the batch, but that will come in the future. 
Signed-off-by: Aaron Conole --- lib/conntrack.c | 10 +++++----- lib/conntrack.h | 5 ++++- lib/dpif-netdev.c | 14 +++++++++++++- tests/test-conntrack.c | 18 +++++++++--------- 4 files changed, 31 insertions(+), 16 deletions(-) diff --git a/lib/conntrack.c b/lib/conntrack.c index e59630aa2b..5491b3471f 100644 --- a/lib/conntrack.c +++ b/lib/conntrack.c @@ -1335,7 +1335,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, bool force, bool commit, long long now, const uint32_t *setmark, const struct ovs_key_ct_labels *setlabel, const struct nat_action_info_t *nat_action_info, - const char *helper, uint32_t tp_id) + const char *helper, uint32_t tp_id, struct netdev *in_netdev) { /* Reset ct_state whenever entering a new zone. */ if (pkt->md.ct_state && pkt->md.ct_zone != zone) { @@ -1410,7 +1410,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, if (conn && ct_offload_enabled()) { struct ct_offload_ctx offload_ctx = { .conn = conn, - .netdev_in = NULL, + .netdev_in = in_netdev, .input_port_id = pkt->md.in_port.odp_port, }; ct_offload_conn_add(&offload_ctx); @@ -1428,7 +1428,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt, * reverse direction port. 
*/ struct ct_offload_ctx offload_ctx = { .conn = conn, - .netdev_in = NULL, + .netdev_in = in_netdev, .input_port_id = pkt->md.in_port.odp_port, }; ct_offload_conn_established(&offload_ctx); @@ -1465,7 +1465,7 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, const struct ovs_key_ct_labels *setlabel, const char *helper, const struct nat_action_info_t *nat_action_info, - long long now, uint32_t tp_id) + long long now, uint32_t tp_id, struct netdev *in_netdev) { odp_port_t in_port = ODPP_LOCAL; struct conn_lookup_ctx ctx; @@ -1502,7 +1502,7 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, write_ct_md(packet, zone, NULL, NULL, NULL); } else { process_one(ct, packet, &ctx, zone, force, commit, now, setmark, - setlabel, nat_action_info, helper, tp_id); + setlabel, nat_action_info, helper, tp_id, in_netdev); } } diff --git a/lib/conntrack.h b/lib/conntrack.h index e5ca1528bf..fc4a529e2a 100644 --- a/lib/conntrack.h +++ b/lib/conntrack.h @@ -130,6 +130,8 @@ typedef unsigned int ct_private_id_t; * this slot (i.e. at module initialization time). 
*/ ct_private_id_t conn_private_id_alloc(void (*destructor)(void *)); +struct netdev; + struct conntrack *conntrack_init(void); void conntrack_destroy(struct conntrack *); @@ -139,7 +141,8 @@ int conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch, const struct ovs_key_ct_labels *setlabel, const char *helper, const struct nat_action_info_t *nat_action_info, - long long now, uint32_t tp_id); + long long now, uint32_t tp_id, + struct netdev *in_netdev); void conntrack_clear(struct dp_packet *packet); struct conntrack_dump { diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 9df05c4c28..b721169758 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -8630,6 +8630,8 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_, break; case OVS_ACTION_ATTR_CT: { + struct dp_netdev_port *in_port_p = NULL; + struct netdev *in_netdev = NULL; const struct nlattr *b; bool force = false; bool commit = false; @@ -8762,9 +8764,19 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_, VLOG_WARN_RL(&rl, "NAT specified without commit."); } + if (!dp_packet_batch_is_empty(packets_)) { + odp_port_t query_port = packets_->packets[0]->md.in_port.odp_port; + in_port_p = dp_netdev_lookup_port(dp, query_port); + } + + if (in_port_p) { + in_netdev = in_port_p->netdev; + } + conntrack_execute(dp->conntrack, packets_, aux->flow->dl_type, force, commit, zone, setmark, setlabel, helper, - nat_action_info_ref, pmd->ctx.now / 1000, tp_id); + nat_action_info_ref, pmd->ctx.now / 1000, tp_id, + in_netdev); break; } diff --git a/tests/test-conntrack.c b/tests/test-conntrack.c index 7f42adbb55..3c409b373b 100644 --- a/tests/test-conntrack.c +++ b/tests/test-conntrack.c @@ -198,7 +198,7 @@ ct_thread_main(void *aux_) ovs_barrier_block(&barrier); for (i = 0; i < n_pkts; i += batch_size) { conntrack_execute(ct, pkt_batch, dl_type, false, true, 0, NULL, NULL, - NULL, NULL, now, 0); + NULL, NULL, now, 0, NULL); DP_PACKET_BATCH_FOR_EACH (j, pkt, pkt_batch) {
pkt_metadata_init_conn(&pkt->md); } @@ -311,7 +311,7 @@ test_benchmark_zones(struct ovs_cmdl_context *ctx) for (i = 0; i < n_zones; i++) { for (j = 0; j < n_conns; j++) { conntrack_execute(ct, pkt_batch[j], dl_type, false, true, i, - NULL, NULL, NULL, NULL, now, 0); + NULL, NULL, NULL, NULL, now, 0, NULL); pkt_metadata_init_conn(&pkt_batch[j]->packets[0]->md); } } @@ -334,7 +334,7 @@ test_benchmark_zones(struct ovs_cmdl_context *ctx) stopwatch_start(STOPWATCH_CT_EXECUTE_COMMIT, time_usec()); for (j = 0; j < n_conns; j++) { conntrack_execute(ct, pkt_batch[j], dl_type, false, true, zone, - NULL, NULL, NULL, NULL, now, 0); + NULL, NULL, NULL, NULL, now, 0, NULL); pkt_metadata_init_conn(&pkt_batch[j]->packets[0]->md); } stopwatch_stop(STOPWATCH_CT_EXECUTE_COMMIT, time_usec()); @@ -343,7 +343,7 @@ test_benchmark_zones(struct ovs_cmdl_context *ctx) stopwatch_start(STOPWATCH_CT_EXECUTE_NO_COMMIT, time_usec()); for (j = 0; j < n_conns; j++) { conntrack_execute(ct, pkt_batch[j], dl_type, false, false, zone, - NULL, NULL, NULL, NULL, now, 0); + NULL, NULL, NULL, NULL, now, 0, NULL); pkt_metadata_init_conn(&pkt_batch[j]->packets[0]->md); } stopwatch_stop(STOPWATCH_CT_EXECUTE_NO_COMMIT, time_usec()); @@ -419,7 +419,7 @@ pcap_batch_execute_conntrack(struct conntrack *ct_, if (flow.dl_type != dl_type) { conntrack_execute(ct_, &new_batch, dl_type, false, true, 0, - NULL, NULL, NULL, NULL, now, 0); + NULL, NULL, NULL, NULL, now, 0, NULL); dp_packet_batch_init(&new_batch); } dp_packet_batch_add(&new_batch, packet); @@ -427,7 +427,7 @@ pcap_batch_execute_conntrack(struct conntrack *ct_, if (!dp_packet_batch_is_empty(&new_batch)) { conntrack_execute(ct_, &new_batch, dl_type, false, true, 0, NULL, NULL, - NULL, NULL, now, 0); + NULL, NULL, now, 0, NULL); } } @@ -540,7 +540,7 @@ test_ftp_alg_large_payload(struct ovs_cmdl_context *ctx OVS_UNUSED) struct dp_packet_batch syn_batch; dp_packet_batch_init_packet(&syn_batch, syn); conntrack_execute(ct, &syn_batch, htons(ETH_TYPE_IP), false, 
true, 0, - NULL, NULL, "ftp", &nat_info, now, 0); + NULL, NULL, "ftp", &nat_info, now, 0, NULL); dp_packet_delete_batch(&syn_batch, true); /* We get to skip some of the processing because the conntrack execute @@ -563,7 +563,7 @@ test_ftp_alg_large_payload(struct ovs_cmdl_context *ctx OVS_UNUSED) struct dp_packet_batch port_batch; dp_packet_batch_init_packet(&port_batch, port_pkt); conntrack_execute(ct, &port_batch, htons(ETH_TYPE_IP), false, true, 0, - NULL, NULL, "ftp", &nat_info, now, 0); + NULL, NULL, "ftp", &nat_info, now, 0, NULL); struct tcp_header *th = dp_packet_l4(port_pkt); size_t tcp_hdr_len = TCP_OFFSET(th->tcp_ctl) * 4; @@ -660,7 +660,7 @@ test_private_destructor(struct ovs_cmdl_context *ctx OVS_UNUSED) long long now = time_msec(); conntrack_execute(lct, &batch, dl_type, false, true, 0, - NULL, NULL, NULL, NULL, now, 0); + NULL, NULL, NULL, NULL, now, 0, NULL); /* After a committed execute the packet carries a cached conn pointer. */ struct conn *conn = pkt->md.conn; From patchwork Wed Apr 8 17:06:07 2026 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Aaron Conole X-Patchwork-Id: 2221014 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate
requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 69DA7195608A; Wed, 8 Apr 2026 17:06:40 +0000 (UTC) Received: from RHTRH0061144.redhat.com (unknown [10.22.89.172]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 9DB94300019F; Wed, 8 Apr 2026 17:06:38 +0000 (UTC) To: dev@openvswitch.org Date: Wed, 8 Apr 2026 13:06:07 -0400 Message-ID: <20260408170613.587902-12-aconole@redhat.com> In-Reply-To: <20260408170613.587902-1-aconole@redhat.com> References: <20260408170613.587902-1-aconole@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: _gnBeOEOZqn-xnQvSG32QhfsIyNCOanT4muWwHG0ZEM_1775668000 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [RFC 11/12] ct-offload-dummy: Introduce dummy ct offload. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Aaron Conole via dev From: Aaron Conole Reply-To: Aaron Conole Cc: Eli Britstein , Florian Westphal , Flavio Leitner Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This includes a test netdev offload and a suite of unit tests to ensure functionality. To facilitate the testing, some special offload APIs are added that force offload to true. These APIs are expected to be called only within a testing environment. 
Signed-off-by: Aaron Conole --- lib/automake.mk | 2 + lib/ct-offload-dummy.c | 253 +++++++++++++++++++++++++++++++++ lib/ct-offload-dummy.h | 64 +++++++++ lib/ct-offload.c | 12 +- lib/ct-offload.h | 10 ++ tests/dpif-netdev.at | 72 ++++++++++ tests/library.at | 36 +++++ tests/test-conntrack.c | 314 +++++++++++++++++++++++++++++++++++++++++ 8 files changed, 762 insertions(+), 1 deletion(-) create mode 100644 lib/ct-offload-dummy.c create mode 100644 lib/ct-offload-dummy.h diff --git a/lib/automake.mk b/lib/automake.mk index f11e3de27c..b9dc5118fa 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -99,6 +99,8 @@ lib_libopenvswitch_la_SOURCES = \ lib/conntrack.h \ lib/ct-offload.c \ lib/ct-offload.h \ + lib/ct-offload-dummy.c \ + lib/ct-offload-dummy.h \ lib/cooperative-multitasking.c \ lib/cooperative-multitasking.h \ lib/cooperative-multitasking-private.h \ diff --git a/lib/ct-offload-dummy.c b/lib/ct-offload-dummy.c new file mode 100644 index 0000000000..c85f478e6c --- /dev/null +++ b/lib/ct-offload-dummy.c @@ -0,0 +1,253 @@ +/* + * Copyright (c) 2026 Red Hat, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#include + +#include "ct-offload-dummy.h" +#include "ct-offload.h" +#include "hash.h" +#include "openvswitch/list.h" +#include "openvswitch/vlog.h" +#include "ovs-thread.h" +#include "timeval.h" +#include "util.h" + +VLOG_DEFINE_THIS_MODULE(ct_offload_dummy); + +/* ----------------------------------------------------------------------- + * Per-connection tracking + * ----------------------------------------------------------------------- */ + +struct ct_dummy_entry { + struct ovs_list list_node; + const struct conn *conn; + struct netdev *netdev_fwd_in; + struct netdev *netdev_rev_in; +}; + +/* ct-offload infrastructure guarantees that we get called under the offload + * mutex, but the counters that we have are simple ints that can be reset + * at any time from any thread, so we have this extra mutex for consistency. + */ +static struct ovs_mutex dummy_mutex = OVS_MUTEX_INITIALIZER; + +/* Since this is a testing interface, we can use the above mutex when checking + * the fake list of offloaded connections for other properties (like the + * bidirectionality, etc.). A proper hardware offload implementation shouldn't + * generally need this many critical sections. + */ +static struct ovs_list dummy_conns OVS_GUARDED_BY(dummy_mutex) + = OVS_LIST_INITIALIZER(&dummy_conns); + +static unsigned int n_added = 0; +static unsigned int n_deleted = 0; +static unsigned int n_updated = 0; +static unsigned int n_established = 0; + +/* Lookup must be called with dummy_mutex held. 
*/ +static struct ct_dummy_entry * +dummy_find__(const struct conn *conn) + OVS_REQUIRES(dummy_mutex) +{ + struct ct_dummy_entry *e; + + LIST_FOR_EACH (e, list_node, &dummy_conns) { + if (e->conn == conn) { + return e; + } + } + return NULL; +} + +static bool +dummy_can_offload(const struct ct_offload_ctx *ctx OVS_UNUSED) +{ + /* Always accept that we can offload in the dummy provider */ + return true; +} + +static int +dummy_conn_add(const struct ct_offload_ctx *ctx) +{ + struct ct_dummy_entry *e = xmalloc(sizeof *e); + + e->conn = ctx->conn; + e->netdev_fwd_in = ctx->netdev_in; + e->netdev_rev_in = NULL; + + ovs_mutex_lock(&dummy_mutex); + ovs_list_push_back(&dummy_conns, &e->list_node); + n_added++; + ovs_mutex_unlock(&dummy_mutex); + + VLOG_DBG("ct_offload_dummy: conn add: conn=%p, netdev_fwd_in=%p", + ctx->conn, ctx->netdev_in); + return 0; +} + +static void +dummy_conn_del(const struct ct_offload_ctx *ctx) +{ + ovs_mutex_lock(&dummy_mutex); + struct ct_dummy_entry *e = dummy_find__(ctx->conn); + + if (e) { + ovs_list_remove(&e->list_node); + n_deleted++; + free(e); + } + ovs_mutex_unlock(&dummy_mutex); + + VLOG_DBG("ct_offload_dummy: conn del: conn=%p", ctx->conn); +} + +static void +dummy_conn_established(const struct ct_offload_ctx *ctx) +{ + ovs_mutex_lock(&dummy_mutex); + struct ct_dummy_entry *e = dummy_find__(ctx->conn); + + if (e && !e->netdev_rev_in) { + e->netdev_rev_in = ctx->netdev_in; + n_established++; + VLOG_DBG("ct_offload_dummy: conn established: conn=%p " + "netdev_fwd_in=%p netdev_rev_in=%p", + ctx->conn, e->netdev_fwd_in, e->netdev_rev_in); + } + ovs_mutex_unlock(&dummy_mutex); +} + +static long long +dummy_conn_update(const struct ct_offload_ctx *ctx) +{ + ovs_mutex_lock(&dummy_mutex); + struct ct_dummy_entry *e = dummy_find__(ctx->conn); + + if (!e) { + ovs_mutex_unlock(&dummy_mutex); + return 0; + } + + n_updated++; + ovs_mutex_unlock(&dummy_mutex); + + VLOG_DBG("ct_offload_dummy: conn update: conn=%p", ctx->conn); + return time_msec(); 
+} + +static void +dummy_flush(void) +{ + ovs_mutex_lock(&dummy_mutex); + struct ct_dummy_entry *e; + LIST_FOR_EACH_POP (e, list_node, &dummy_conns) { + n_deleted++; + free(e); + } + ovs_mutex_unlock(&dummy_mutex); +} + +/* ----------------------------------------------------------------------- + * Provider class + * ----------------------------------------------------------------------- */ + +const struct ct_offload_class ct_offload_dummy_class = { + .name = "dummy", + .init = NULL, + .batch_submit = NULL, + .conn_add = dummy_conn_add, + .conn_del = dummy_conn_del, + .conn_update = dummy_conn_update, + .conn_established = dummy_conn_established, + .can_offload = dummy_can_offload, + .flush = dummy_flush, +}; + +/* ----------------------------------------------------------------------- + * Public API + * ----------------------------------------------------------------------- */ + +void +ct_offload_dummy_register(void) +{ + ct_offload_dummy_reset_counters(); + ct_offload_register(&ct_offload_dummy_class); +} + +void +ct_offload_dummy_unregister(void) +{ + /* Flush any leftover entries before unregistering so we do not leak. 
*/ + dummy_flush(); + ct_offload_unregister(&ct_offload_dummy_class); +} + +unsigned int +ct_offload_dummy_n_added(void) +{ + return n_added; +} + +unsigned int +ct_offload_dummy_n_deleted(void) +{ + return n_deleted; +} + +unsigned int +ct_offload_dummy_n_updated(void) +{ + return n_updated; +} + +unsigned int +ct_offload_dummy_n_established(void) +{ + return n_established; +} + +void +ct_offload_dummy_reset_counters(void) +{ + ovs_mutex_lock(&dummy_mutex); + n_added = 0; + n_deleted = 0; + n_updated = 0; + n_established = 0; + ovs_mutex_unlock(&dummy_mutex); +} + +bool +ct_offload_dummy_contains(const struct conn *conn) +{ + ovs_mutex_lock(&dummy_mutex); + bool found = dummy_find__(conn) != NULL; + ovs_mutex_unlock(&dummy_mutex); + return found; +} + +/* Returns true if the dummy provider has seen both the forward-direction + * input netdev (recorded at conn_add) and the reply-direction input netdev + * (recorded at conn_established) for 'conn'. */ +bool +ct_offload_dummy_is_bidirectional(const struct conn *conn) +{ + ovs_mutex_lock(&dummy_mutex); + struct ct_dummy_entry *e = dummy_find__(conn); + bool bidi = e && e->netdev_fwd_in && e->netdev_rev_in; + ovs_mutex_unlock(&dummy_mutex); + return bidi; +} diff --git a/lib/ct-offload-dummy.h b/lib/ct-offload-dummy.h new file mode 100644 index 0000000000..1e7ecfdb04 --- /dev/null +++ b/lib/ct-offload-dummy.h @@ -0,0 +1,64 @@ +/* + * Copyright (c) 2026 Red Hat, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef CT_OFFLOAD_DUMMY_H +#define CT_OFFLOAD_DUMMY_H 1 + +/* Dummy CT offload provider + * ========================= + * + * A software-only implementation of the ct_offload_class interface used for + * unit testing. It records every conn_add/conn_del/conn_update call and + * exposes inspection helpers so tests can verify that the correct hooks are + * reached without requiring any hardware. + * + * Typical usage: + * + * ct_offload_dummy_register(); // activate the provider + * conntrack_execute(...); // exercises conn_add + * ovs_assert(ct_offload_dummy_n_added() == 1); + * conntrack_flush(...); // exercises conn_del + * ovs_assert(ct_offload_dummy_n_deleted() == 1); + * ct_offload_dummy_unregister(); // tear down after test + */ + +#include + +struct conn; + +/* Register (or unregister) the dummy provider. + * + * ct_offload_dummy_register() resets the counters, then registers. It does + * not enable CT offload: tests call ct_offload_force_enable(true) so that the + * conntrack.c guards fire. Call ct_offload_dummy_unregister() to undo. */ +void ct_offload_dummy_register(void); +void ct_offload_dummy_unregister(void); + +/* Counters. Initialized to zero and can be reset. */ +unsigned int ct_offload_dummy_n_added(void); +unsigned int ct_offload_dummy_n_deleted(void); +unsigned int ct_offload_dummy_n_updated(void); +unsigned int ct_offload_dummy_n_established(void); + +/* Reset all counters without changing registered state. */ +void ct_offload_dummy_reset_counters(void); + +/* Returns true if 'conn' is currently tracked by the dummy (was added but + * not yet deleted or flushed). 
*/ +bool ct_offload_dummy_contains(const struct conn *conn); +bool ct_offload_dummy_is_bidirectional(const struct conn *conn); + +#endif /* CT_OFFLOAD_DUMMY_H */ diff --git a/lib/ct-offload.c b/lib/ct-offload.c index b777801ab9..707e71c03f 100644 --- a/lib/ct-offload.c +++ b/lib/ct-offload.c @@ -57,6 +57,10 @@ static struct ovs_list ct_offload_classes * registered dpif offload class will be activated by ct_offload_module_init(). */ static const struct ct_offload_class *base_ct_offload_classes[] = { + /* Dummy provider: activated whenever the "dummy" dpif offload class is + * registered (hw-offload=true with a dummy datapath). Also used directly + * by unit tests via ct_offload_dummy_register(). */ + &ct_offload_dummy_class, }; @@ -166,6 +170,12 @@ ct_offload_module_init(void) } } +static bool ct_offload_forced = false; +void ct_offload_force_enable(bool value) +{ + ct_offload_forced = value; +} + /* ct_offload_enabled() - returns true when hardware offload is active. * * Delegates to dpif_offload_enabled() so CT offload shares the same global @@ -173,7 +183,7 @@ ct_offload_module_init(void) bool ct_offload_enabled(void) { - return dpif_offload_enabled(); + return dpif_offload_enabled() || ct_offload_forced; } /* ct_offload_set_global_cfg() - configure CT offload from OVSDB. diff --git a/lib/ct-offload.h b/lib/ct-offload.h index fe4ecd33b8..3836852703 100644 --- a/lib/ct-offload.h +++ b/lib/ct-offload.h @@ -83,6 +83,12 @@ struct ct_offload_class { void (*flush)(void); }; +/* Dummy (software-only) CT offload provider, always compiled in. + * Registered automatically when the "dummy" dpif offload class is active + * (e.g. hw-offload=true with a dummy datapath), and available directly for + * unit tests via ct_offload_dummy_register() in ct-offload-dummy.h. */ +extern const struct ct_offload_class ct_offload_dummy_class; + /* Register/unregister a provider. Must be called at module init, before * any connections are created. 
*/ int ct_offload_register(const struct ct_offload_class *); @@ -100,6 +106,10 @@ void ct_offload_set_global_cfg(const struct ovsrec_open_vswitch *); */ bool ct_offload_enabled(void); +/* Used for testing. Forces an additional parameter for the offload enable + * check. Set to 'true' to always enable the offloads. */ +void ct_offload_force_enable(bool); + /* Per-connection offload API that dispatches to all registered providers. */ int ct_offload_conn_add(const struct ct_offload_ctx *); void ct_offload_conn_del(const struct ct_offload_ctx *); diff --git a/tests/dpif-netdev.at b/tests/dpif-netdev.at index 2311979709..ae890f72fb 100644 --- a/tests/dpif-netdev.at +++ b/tests/dpif-netdev.at @@ -50,6 +50,14 @@ filter_hw_packet_netdev_dummy () { | sort | uniq } +filter_ct_offload_dummy_conn_add () { + grep 'ct_offload_dummy.*conn add:' | sed 's/.*|DBG|//' | sort | uniq +} + +filter_ct_offload_dummy_conn_del () { + grep 'ct_offload_dummy.*conn del:' | sed 's/.*|DBG|//' | sort | uniq +} + filter_flow_dump () { grep 'flow_dump ' | sed ' s/.*flow_dump // @@ -3709,3 +3717,67 @@ AT_CHECK_UNQUOTED([tail -n 1 p1.pcap.txt], [0], [${good_expected_v6} OVS_VSWITCHD_STOP AT_CLEANUP + +dnl Test that the CT offload dummy provider receives conn_add and conn_del +dnl callbacks when packets are processed through a conntrack commit flow on a +dnl dummy datapath with hw-offload enabled. 
+AT_SETUP([dpif-netdev - conntrack offload dummy]) +AT_KEYWORDS([conntrack offload]) +OVS_VSWITCHD_START( + [add-port br0 p1 -- \ + set interface p1 type=dummy ofport_request=1 \ + options:pstream=punix:$OVS_RUNDIR/p1.sock \ + options:ifindex=1100 -- \ + add-port br0 p2 -- \ + set interface p2 type=dummy ofport_request=2 \ + options:pstream=punix:$OVS_RUNDIR/p2.sock \ + options:ifindex=1101 -- \ + set bridge br0 datapath-type=dummy \ + other-config:datapath-id=1234 fail-mode=secure], [], [], []) + +dnl Enable debug logging for the dpif offload and CT offload dummy modules so +dnl the test can detect hook calls via log grep. +AT_CHECK([ovs-appctl vlog/set dpif_offload_dummy:file:dbg ct_offload_dummy:file:dbg]) + +dnl Enable hardware offload — this registers the "dummy" dpif offload class +dnl and automatically activates the CT offload dummy provider. +AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:hw-offload=true]) +OVS_WAIT_UNTIL([grep "Flow HW offload is enabled" ovs-vswitchd.log]) + +dnl Add a two-table conntrack flow: +dnl table 0: untracked packets → ct(commit) recirculate to table 1 +dnl table 1: tracked packets → output on p2 +AT_CHECK([ovs-ofctl add-flow br0 \ + 'table=0,priority=100,in_port=p1,ip,ct_state=-trk,actions=ct(commit,table=1)']) +AT_CHECK([ovs-ofctl add-flow br0 \ + 'table=1,priority=100,in_port=p1,ip,ct_state=+trk,actions=output:p2']) + +dnl Compose and inject a UDP packet on p1. The first packet misses the +dnl datapath, causes an upcall, executes ct(commit) to create a conntrack +dnl entry, and triggers the ct_offload_dummy conn_add callback. +flow_s="eth_src=50:54:00:00:00:01,eth_dst=50:54:00:00:00:02,udp,ip_src=10.0.0.1,ip_dst=10.0.0.2,ip_frag=no,udp_src=1000,udp_dst=2000" +pkt=$(ovs-ofctl compose-packet --bare "${flow_s}") +AT_CHECK([ovs-appctl netdev-dummy/receive p1 "${pkt}"]) + +dnl Wait for the CT offload dummy conn_add hook to fire. 
+OVS_WAIT_UNTIL([grep 'ct_offload_dummy.*conn add:' ovs-vswitchd.log]) + +dnl Verify exactly one connection was added. +AT_CHECK([filter_ct_offload_dummy_conn_add < ovs-vswitchd.log | wc -l | tr -d ' '], + [0], [1 +]) + +dnl Flush all conntrack entries — conn_clean is called for every tracked +dnl connection, which invokes ct_offload_conn_del on each registered provider. +AT_CHECK([ovs-appctl dpctl/flush-conntrack]) + +dnl Wait for the CT offload dummy conn_del hook to fire. +OVS_WAIT_UNTIL([grep 'ct_offload_dummy.*conn del:' ovs-vswitchd.log]) + +dnl Verify exactly one connection was deleted. +AT_CHECK([filter_ct_offload_dummy_conn_del < ovs-vswitchd.log | wc -l | tr -d ' '], + [0], [1 +]) + +OVS_VSWITCHD_STOP +AT_CLEANUP diff --git a/tests/library.at b/tests/library.at index 6c5b55f045..2d5b02f75b 100644 --- a/tests/library.at +++ b/tests/library.at @@ -325,3 +325,39 @@ AT_KEYWORDS([conntrack]) AT_CHECK([ovstest test-conntrack private-destructor], [0], [. ]) AT_CLEANUP + +AT_SETUP([conntrack offload dummy - conn add hook]) +AT_KEYWORDS([conntrack offload]) +AT_CHECK([ovstest test-conntrack offload-conn-add], [0], [. +]) +AT_CLEANUP + +AT_SETUP([conntrack offload dummy - conn del hook]) +AT_KEYWORDS([conntrack offload]) +AT_CHECK([ovstest test-conntrack offload-conn-del], [0], [. +]) +AT_CLEANUP + +AT_SETUP([conntrack offload dummy - conn update hook]) +AT_KEYWORDS([conntrack offload]) +AT_CHECK([ovstest test-conntrack offload-conn-update], [0], [. +]) +AT_CLEANUP + +AT_SETUP([conntrack offload dummy - multiple connections]) +AT_KEYWORDS([conntrack offload]) +AT_CHECK([ovstest test-conntrack offload-multi-conn], [0], [. +]) +AT_CLEANUP + +AT_SETUP([conntrack offload dummy - conn established hook (end-to-end)]) +AT_KEYWORDS([conntrack offload]) +AT_CHECK([ovstest test-conntrack offload-conn-established], [0], [. 
+]) +AT_CLEANUP + +AT_SETUP([conntrack offload dummy - conn established fires exactly once (API)]) +AT_KEYWORDS([conntrack offload]) +AT_CHECK([ovstest test-conntrack offload-conn-established-api], [0], [. +]) +AT_CLEANUP diff --git a/tests/test-conntrack.c b/tests/test-conntrack.c index 3c409b373b..86f1f36d3f 100644 --- a/tests/test-conntrack.c +++ b/tests/test-conntrack.c @@ -17,6 +17,8 @@ #include #include "conntrack.h" #include "conntrack-private.h" +#include "ct-offload.h" +#include "ct-offload-dummy.h" #include "dp-packet.h" #include "fatal-signal.h" @@ -691,6 +693,304 @@ test_private_destructor(struct ovs_cmdl_context *ctx OVS_UNUSED) printf(".\n"); } + +/* =========================================================================== + * CT offload dummy provider tests + * + * Most of these tests call the ct_offload provider API directly; the + * conn-established tests also go through conntrack_execute(). The global + * hw-offload flag is not set here: each test forces enablement with + * ct_offload_force_enable(true). End-to-end enablement (hw-offload=true via + * DB config) is covered by the dpif-netdev integration test. + * + * Each test must be run as a separate ovstest invocation so that the + * process-global provider list starts empty. + * =========================================================================== + */ + +/* The dummy only compares pointer addresses and never dereferences them, so a + * small integer cast is sufficient. */ +#define FAKE_CONN(n) ((struct conn *)(uintptr_t)(n)) +#define FAKE_NETDEV(n) ((struct netdev *)(uintptr_t)(n)) + +/* Test: offload-conn-add + * ---------------------- + * Register the dummy provider, call ct_offload_conn_add() directly, and + * verify that the conn_add hook was invoked and the connection is tracked. 
+ */ +static void +test_offload_conn_add(struct ovs_cmdl_context *ctx OVS_UNUSED) +{ + ct_offload_force_enable(true); + ct_offload_dummy_register(); + + struct conn *fake = FAKE_CONN(1); + struct ct_offload_ctx offload_ctx = { + .conn = fake, .netdev_in = NULL, + }; + ct_offload_conn_add(&offload_ctx); + + ovs_assert(ct_offload_dummy_n_added() == 1); + ovs_assert(ct_offload_dummy_contains(fake)); + + ct_offload_dummy_unregister(); + ct_offload_force_enable(false); + printf(".\n"); +} + +/* Test: offload-conn-del + * ---------------------- + * Register the dummy, add then delete a connection via the API, and verify + * that conn_del was called and the connection is no longer tracked. + */ +static void +test_offload_conn_del(struct ovs_cmdl_context *ctx OVS_UNUSED) +{ + ct_offload_force_enable(true); + ct_offload_dummy_register(); + + struct conn *fake = FAKE_CONN(1); + struct ct_offload_ctx offload_ctx = { + .conn = fake, .netdev_in = NULL, + }; + + ct_offload_conn_add(&offload_ctx); + ovs_assert(ct_offload_dummy_n_added() == 1); + + ct_offload_conn_del(&offload_ctx); + ovs_assert(ct_offload_dummy_n_deleted() == 1); + ovs_assert(!ct_offload_dummy_contains(fake)); + + ct_offload_dummy_unregister(); + ct_offload_force_enable(false); + printf(".\n"); +} + +/* Test: offload-conn-update + * ------------------------- + * Register the dummy, add a connection, call ct_offload_conn_update() + * directly, and verify that a non-zero last-used timestamp is returned. 
+ */ +static void +test_offload_conn_update(struct ovs_cmdl_context *ctx OVS_UNUSED) +{ + ct_offload_force_enable(true); + ct_offload_dummy_register(); + + struct conn *fake = FAKE_CONN(1); + struct ct_offload_ctx offload_ctx = { + .conn = fake, .netdev_in = NULL, + }; + + ct_offload_conn_add(&offload_ctx); + + long long ts = ct_offload_conn_update(&offload_ctx); + ovs_assert(ts != 0); + ovs_assert(ct_offload_dummy_n_updated() == 1); + + ct_offload_dummy_unregister(); + ct_offload_force_enable(false); + printf(".\n"); +} + +/* Test: offload-multi-conn + * ------------------------ + * Register the dummy, add N connections via the API, and verify that each + * is tracked independently. + */ +#define OFFLOAD_MULTI_N 4 + +static void +test_offload_multi_conn(struct ovs_cmdl_context *ctx OVS_UNUSED) +{ + ct_offload_force_enable(true); + ct_offload_dummy_register(); + + for (unsigned i = 1; i <= OFFLOAD_MULTI_N; i++) { + struct ct_offload_ctx offload_ctx = { + .conn = FAKE_CONN(i), .netdev_in = NULL, + }; + ct_offload_conn_add(&offload_ctx); + } + + ovs_assert(ct_offload_dummy_n_added() == OFFLOAD_MULTI_N); + for (unsigned i = 1; i <= OFFLOAD_MULTI_N; i++) { + ovs_assert(ct_offload_dummy_contains(FAKE_CONN(i))); + } + + ct_offload_dummy_unregister(); + ct_offload_force_enable(false); + printf(".\n"); +} + +/* Test: offload-conn-established + * -------------------------------- + * Drive a TCP three-way handshake through conntrack_execute() with the dummy + * offload provider registered. Verifies three properties: + * + * (a) conn_add fires on the SYN (new connection created, forward netdev + * recorded); conn_established does NOT fire yet. + * (b) conn_established fires exactly once on the first ESTABLISHED reply + * (SYN-ACK), recording the reply-direction netdev so that the dummy + * entry is fully bidirectional. + * (c) A subsequent reply packet (ACK) does NOT cause a second + * conn_established call: the "exactly once" guarantee holds. 
+ * + * The test calls ct_offload_force_enable(true) before registering, which + * makes ct_offload_enabled() return true so the guards in conntrack.c fire + * without a real hardware offload backend. + */ +static void +test_offload_conn_established(struct ovs_cmdl_context *ctx OVS_UNUSED) +{ + /* Allocate the per-connection private slot before registering so that the + * ADD/ESTABLISHED state transitions are tracked in conn->private[]. + * The simple FAKE_CONN tests skip this step because they do not exercise + * the private-slot code path. */ + ct_offload_alloc_private_slot(); + ct_offload_force_enable(true); + ct_offload_dummy_register(); + + struct conntrack *lct = conntrack_init(); + /* Disable TCP sequence-number checking so test packets with seq=0 are + * accepted by the state machine. */ + conntrack_set_tcp_seq_chk(lct, false); + + long long now = time_msec(); + + struct eth_addr eth_a = ETH_ADDR_C(00, 00, 00, 00, 00, 01); + struct eth_addr eth_b = ETH_ADDR_C(00, 00, 00, 00, 00, 02); + ovs_be32 ip_a = inet_addr("10.0.0.1"); + ovs_be32 ip_b = inet_addr("10.0.0.2"); + uint16_t sport = 1234; + uint16_t dport = 80; + + /* --- (a) SYN: forward direction, creates the connection entry. --- */ + struct dp_packet *syn = build_eth_ip_packet(NULL, eth_a, eth_b, + ip_a, ip_b, + IPPROTO_TCP, 0); + build_tcp_packet(syn, sport, dport, TCP_SYN, NULL, 0); + + struct dp_packet_batch syn_batch; + dp_packet_batch_init_packet(&syn_batch, syn); + conntrack_execute(lct, &syn_batch, htons(ETH_TYPE_IP), false, true, 0, + NULL, NULL, NULL, NULL, now, 0, FAKE_NETDEV(1)); + + /* conn_add must have fired; conn_established must not have. */ + ovs_assert(ct_offload_dummy_n_added() == 1); + ovs_assert(ct_offload_dummy_n_established() == 0); + + /* The packet carries the conn pointer after commit. 
*/ + struct conn *conn = syn->md.conn; + ovs_assert(conn != NULL); + ovs_assert(ct_offload_conn_is_offloaded(conn)); + ovs_assert(!ct_offload_conn_is_established(conn)); + + dp_packet_delete_batch(&syn_batch, true); + + /* --- (b) SYN-ACK: reply direction, transitions to ESTABLISHED. --- */ + struct dp_packet *synack = build_eth_ip_packet(NULL, eth_b, eth_a, + ip_b, ip_a, + IPPROTO_TCP, 0); + build_tcp_packet(synack, dport, sport, TCP_SYN | TCP_ACK, NULL, 0); + + struct dp_packet_batch synack_batch; + dp_packet_batch_init_packet(&synack_batch, synack); + conntrack_execute(lct, &synack_batch, htons(ETH_TYPE_IP), false, true, 0, + NULL, NULL, NULL, NULL, now, 0, FAKE_NETDEV(2)); + + /* conn_established fires exactly once on the first ESTABLISHED reply. */ + ovs_assert(ct_offload_dummy_n_established() == 1); + ovs_assert(ct_offload_conn_is_established(conn)); + /* Both netdev pointers are now known: the entry is fully bidirectional. */ + ovs_assert(ct_offload_dummy_is_bidirectional(conn)); + + dp_packet_delete_batch(&synack_batch, true); + + /* --- (c) ACK: another reply packet must NOT trigger conn_established + * again. The private-slot guard enforces this. --- */ + struct dp_packet *ack = build_eth_ip_packet(NULL, eth_b, eth_a, + ip_b, ip_a, + IPPROTO_TCP, 0); + build_tcp_packet(ack, dport, sport, TCP_ACK, NULL, 0); + + struct dp_packet_batch ack_batch; + dp_packet_batch_init_packet(&ack_batch, ack); + conntrack_execute(lct, &ack_batch, htons(ETH_TYPE_IP), false, true, 0, + NULL, NULL, NULL, NULL, now, 0, FAKE_NETDEV(2)); + + /* Counter must still be 1 - conn_established must not have fired again. 
*/ + ovs_assert(ct_offload_dummy_n_established() == 1); + + dp_packet_delete_batch(&ack_batch, true); + + conntrack_destroy(lct); + ct_offload_dummy_unregister(); + ct_offload_force_enable(false); + printf(".\n"); +} + +/* Test: offload-conn-established-api + * ------------------------------------ + * Exercise ct_offload_conn_established() directly (not through + * conntrack_execute) to verify that the "exactly once" guarantee in the + * dispatch layer holds independently of the conntrack state machine. + * + * Sequence: + * 1. conn_add() - transitions the private slot to CT_OFFLOAD_STATE_ADDED. + * 2. conn_established() - should dispatch to the provider exactly once and + * advance the slot to CT_OFFLOAD_STATE_EST. + * 3. A second conn_established() call with the same conn must be a no-op + * (provider not called again, counter unchanged). + */ +static void +test_offload_conn_established_api(struct ovs_cmdl_context *ctx OVS_UNUSED) +{ + ct_offload_alloc_private_slot(); + ct_offload_force_enable(true); + ct_offload_dummy_register(); + + /* We need a real conn with a live private-data slot, so spin up a minimal + * conntrack instance and commit one UDP packet to get a conn. */ + struct conntrack *lct = conntrack_init(); + long long now = time_msec(); + + ovs_be16 dl_type; + struct dp_packet *pkt = build_packet(1, 2, &dl_type); + struct dp_packet_batch batch; + dp_packet_batch_init_packet(&batch, pkt); + conntrack_execute(lct, &batch, dl_type, false, true, 0, + NULL, NULL, NULL, NULL, now, 0, FAKE_NETDEV(1)); + struct conn *conn = pkt->md.conn; + ovs_assert(conn != NULL); + dp_packet_delete_batch(&batch, true); + + /* conn_add should have fired (via conntrack_execute). */ + ovs_assert(ct_offload_dummy_n_added() == 1); + ovs_assert(ct_offload_dummy_n_established() == 0); + ovs_assert(ct_offload_conn_is_offloaded(conn)); + ovs_assert(!ct_offload_conn_is_established(conn)); + + /* First call: must dispatch to the provider. 
*/
+    struct ct_offload_ctx ctx1 = {
+        .conn = conn, .netdev_in = FAKE_NETDEV(2),
+    };
+    ct_offload_conn_established(&ctx1);
+    ovs_assert(ct_offload_dummy_n_established() == 1);
+    ovs_assert(ct_offload_conn_is_established(conn));
+    ovs_assert(ct_offload_dummy_is_bidirectional(conn));
+
+    /* Second call with the same conn: must be a no-op. */
+    ct_offload_conn_established(&ctx1);
+
+    ovs_assert(ct_offload_dummy_n_established() == 1); /* unchanged */
+
+    conntrack_destroy(lct);
+    ct_offload_dummy_unregister();
+    ct_offload_force_enable(false);
+    printf(".\n");
+}
+
 static const struct ovs_cmdl_command commands[] = {
     /* Connection tracker tests. */
@@ -725,6 +1025,20 @@ static const struct ovs_cmdl_command commands[] = {
      test_private_id_exhaustion, OVS_RO},
     {"private-destructor", "", 0, 0,
      test_private_destructor, OVS_RO},
+    /* CT offload dummy provider tests.
+     * Each must be run as a separate ovstest invocation. */
+    {"offload-conn-add", "", 0, 0,
+     test_offload_conn_add, OVS_RO},
+    {"offload-conn-del", "", 0, 0,
+     test_offload_conn_del, OVS_RO},
+    {"offload-conn-update", "", 0, 0,
+     test_offload_conn_update, OVS_RO},
+    {"offload-multi-conn", "", 0, 0,
+     test_offload_multi_conn, OVS_RO},
+    {"offload-conn-established", "", 0, 0,
+     test_offload_conn_established, OVS_RO},
+    {"offload-conn-established-api", "", 0, 0,
+     test_offload_conn_established_api, OVS_RO},

     {NULL, NULL, 0, 0, NULL, OVS_RO},
 };

From patchwork Wed Apr 8 17:06:08 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Aaron Conole
X-Patchwork-Id: 2221015
To: dev@openvswitch.org
Date: Wed, 8 Apr 2026 13:06:08 -0400
Message-ID: <20260408170613.587902-13-aconole@redhat.com>
In-Reply-To: <20260408170613.587902-1-aconole@redhat.com>
References: <20260408170613.587902-1-aconole@redhat.com>
Subject: [ovs-dev] [RFC 12/12] Documentation: Announce and describe the conntrack offload feature.
From: Aaron Conole
Cc: Eli Britstein, Florian Westphal, Flavio Leitner

Signed-off-by: Aaron Conole
---
 Documentation/automake.mk                     |  1 +
 Documentation/topics/index.rst                |  1 +
 .../topics/userspace-conntrack-offloading.rst | 76 +++++++++++++++++++
 NEWS                                          |  1 +
 4 files changed, 79 insertions(+)
 create mode 100644 Documentation/topics/userspace-conntrack-offloading.rst

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index ea9459b555..7b84af79ba 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -59,6 +59,7 @@ DOC_SOURCE = \
 	Documentation/topics/tracing.rst \
 	Documentation/topics/usdt-probes.rst \
 	Documentation/topics/userspace-checksum-offloading.rst \
+	Documentation/topics/userspace-conntrack-offloading.rst \
 	Documentation/topics/userspace-tso.rst \
 	Documentation/topics/userspace-tx-steering.rst \
 	Documentation/topics/windows.rst \
diff --git a/Documentation/topics/index.rst b/Documentation/topics/index.rst
index 9ddb145dd4..871871a3dc 100644
--- a/Documentation/topics/index.rst
+++ b/Documentation/topics/index.rst
@@ -56,6 +56,7 @@ OVS
    idl-compound-indexes
    ovs-extensions
    userspace-checksum-offloading
+   userspace-conntrack-offloading
    userspace-tx-steering
    usdt-probes
    flow-visualization
diff --git a/Documentation/topics/userspace-conntrack-offloading.rst b/Documentation/topics/userspace-conntrack-offloading.rst
new file mode 100644
index 0000000000..26ba838c88
--- /dev/null
+++ b/Documentation/topics/userspace-conntrack-offloading.rst
@@ -0,0 +1,76 @@
+..
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+=========================================
+Userspace Datapath - Conntrack offloading
+=========================================
+
+This document explains the internals of connection tracking offloading in the
+Open vSwitch userspace datapath.
+
+Design
+------
+
+Open vSwitch provides a connection tracking facility, based on a modified BSD
+stack, which primarily processes one packet at a time and turns each packet
+into state updates. This runs inline with the PMD execution pipeline, through
+``conntrack_execute`` and into the ``process_one`` call.
+
+The core of the offload mechanism is the ``ct_offload_class`` structure. This
+structure defines the callbacks for offload providers, allowing them to
+register for specific connection tracking events. Each offload provider
+instance is placed in a list in priority order, and each one is called during
+operation processing. There is a single bulk operations interface, but it is
+currently limited to calling into each ops list one facility at a time.
+
+All offload is done under a single coarse-grained ``ct_offload`` lock to keep
+the offload provider list coherent.
+
+Primary Connection Events
+-------------------------
+
+An offload provider handles events that correspond to the lifecycle of a
+connection. These are callbacks defined by the ``ct_offload_class`` structure.
+
+* Connection Add (``conn_add``) is triggered when a connection is created and
+  committed to the connection list.
+  The provider uses this event to initialize tracking for the new
+  connection.
+* Connection Delete (``conn_del``) is triggered when a connection is removed.
+  The provider uses this event to clean up resources.
+* Connection Established (``conn_established``) is a special event that occurs
+  exactly once, when the first reply-direction packet is seen for an offloaded
+  connection.
+  The ``netdev_in`` field will contain the reply netdev. The offload provider
+  then has the initial netdev from the ``conn_add`` event and the reply
+  netdev from the ``conn_established`` event, which allows it to track both
+  sides of the connection.
+* Connection Update (``conn_update``) is called when the connection tracking
+  (ct) expiration timer is set to run expiration processing for a connection.
+  It asks the provider for an update on the packet list and returns the
+  last-used timestamp in milliseconds since the epoch, or 0 on failure.
+
+Configuration
+-------------
+Conntrack offload is configured as part of dpif offloading for userspace. It
+uses the same configuration knob that enables the other offloading features.
diff --git a/NEWS b/NEWS
index 1a3044cbfb..80ee597abb 100644
--- a/NEWS
+++ b/NEWS
@@ -3,6 +3,7 @@ Post-v3.7.0
    - Userspace datapath:
      * ARP/ND lookups for native tunnel are now rate limited. The holdout
        timer can be configured with 'tnl/neigh/retrans_time'.
+     * Add preliminary support for conntrack offloading.


 v3.7.0 - 16 Feb 2026
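
The dispatch model described in the new document (providers kept in priority order, each one called per event, and conn_established delivered only once per connection) can be illustrated with a small standalone sketch. This is not the code from the series: every `demo_*` name, the fixed-size array, and the "higher priority value runs first" rule are assumptions made for illustration, and the `ct_offload` lock is left out for brevity. Only conn_add and conn_established are shown; conn_del and conn_update would presumably follow the same pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the OVS definitions. */
struct demo_conn {
    bool established;       /* Set once the reply direction was reported. */
};

struct demo_ctx {
    struct demo_conn *conn;
    int netdev_in;          /* Stand-in for the input netdev. */
};

struct demo_class {
    const char *name;
    int priority;           /* Assumption: higher values are called first. */
    void (*conn_add)(const struct demo_ctx *);
    void (*conn_established)(const struct demo_ctx *);
};

#define DEMO_MAX_PROVIDERS 8
static const struct demo_class *demo_providers[DEMO_MAX_PROVIDERS];
static size_t demo_n_providers;

/* Inserts 'cls' so the array stays sorted by descending priority. */
static bool
demo_register(const struct demo_class *cls)
{
    assert(cls != NULL);
    if (demo_n_providers == DEMO_MAX_PROVIDERS) {
        return false;
    }
    size_t i = demo_n_providers++;
    while (i > 0 && demo_providers[i - 1]->priority < cls->priority) {
        demo_providers[i] = demo_providers[i - 1];
        i--;
    }
    demo_providers[i] = cls;
    return true;
}

/* Calls every provider that implements conn_add, in priority order. */
static void
demo_conn_add(const struct demo_ctx *ctx)
{
    for (size_t i = 0; i < demo_n_providers; i++) {
        if (demo_providers[i]->conn_add) {
            demo_providers[i]->conn_add(ctx);
        }
    }
}

/* Delivers conn_established at most once per connection; repeat calls
 * for the same connection are no-ops. */
static void
demo_conn_established(const struct demo_ctx *ctx)
{
    if (ctx->conn->established) {
        return;
    }
    ctx->conn->established = true;
    for (size_t i = 0; i < demo_n_providers; i++) {
        if (demo_providers[i]->conn_established) {
            demo_providers[i]->conn_established(ctx);
        }
    }
}

/* A counting provider, loosely in the spirit of a dummy test provider. */
static int demo_n_add;
static int demo_n_established;

static void
demo_count_add(const struct demo_ctx *ctx)
{
    (void) ctx;
    demo_n_add++;
}

static void
demo_count_established(const struct demo_ctx *ctx)
{
    (void) ctx;
    demo_n_established++;
}

static const struct demo_class demo_counter = {
    .name = "counter",
    .priority = 10,
    .conn_add = demo_count_add,
    .conn_established = demo_count_established,
};

/* A provider that implements no callbacks at all. */
static const struct demo_class demo_noop = {
    .name = "noop",
    .priority = 1,
};
```

Registering `demo_noop` and then `demo_counter` leaves `demo_counter` at the front of the array, and calling `demo_conn_established()` twice with the same connection bumps the counter only once, which is the same property the `offload-conn-established-api` test above asserts for the real API.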