From patchwork Mon Jun 21 10:06:33 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valerio
X-Patchwork-Id: 1494997
From: Paolo Valerio
To: dev@openvswitch.org
Date: Mon, 21 Jun 2021 12:06:33 +0200
Message-ID: <162426999162.3650320.8139157744646493261.stgit@fed.void>
In-Reply-To: <162426992691.3650320.15060936163617998030.stgit@fed.void>
References: <162426992691.3650320.15060936163617998030.stgit@fed.void>
User-Agent: StGit/0.23
MIME-Version: 1.0
Authentication-Results: relay.mimecast.com;
auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=pvalerio@redhat.com
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Cc: i.maximets@ovn.org
Subject: [ovs-dev] [PATCH v7 3/4] conntrack: handle SNAT with all-zero IP address
This patch introduces, for the userspace datapath, handling of rules
like the following:
ct(commit,nat(src=0.0.0.0),...)
The kernel datapath already handles this case, which is particularly
handy in scenarios like the following:
Given A: 10.1.1.1, B: 192.168.2.100, C: 10.1.1.2
A opens a connection toward B on port 80 selecting as source port 10000.
B's IP gets dnat'ed to C's IP (10.1.1.1:10000 -> 192.168.2.100:80).
This will result in:
tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=10000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10000),protoinfo=(state=ESTABLISHED)
A now tries to establish another connection with C using source port
10000, this time using C's IP address (10.1.1.1:10000 -> 10.1.1.2:80).
This second connection, if processed by conntrack with no SNAT/DNAT
involved, collides with the reverse tuple of the first connection,
so the entry for this valid connection doesn't get created.
With this commit, adding a SNAT rule with 0.0.0.0 for
10.1.1.1:10000 -> 10.1.1.2:80 allows the conn entry to be created:
tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=10000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10001),protoinfo=(state=ESTABLISHED)
tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=10000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10000),protoinfo=(state=ESTABLISHED)
The issue exists even in the opposite case (with A trying to connect
to C using B's IP after establishing a direct connection from A to C).
This commit refactors the relevant function so that both of the
previously mentioned cases are handled.
Suggested-by: Eelco Chaudron
Signed-off-by: Paolo Valerio
Acked-by: Gaetan Rivet
Acked-by: Aaron Conole
---
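The collision described in the commit message can be sketched as
follows. This is only an illustration: struct endpoint, struct tuple,
IP4(), reverse_of() and reply_collides() are hypothetical, simplified
stand-ins and not the conntrack types or lookup used by the patch.
Addresses and ports are kept in host byte order for readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: builds an IPv4 address in host byte order. */
#define IP4(a, b, c, d) \
    (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) | \
     ((uint32_t) (c) << 8) | (uint32_t) (d))

/* Hypothetical, simplified stand-ins for the conntrack key types. */
struct endpoint {
    uint32_t addr;
    uint16_t port;
};

struct tuple {
    struct endpoint src;
    struct endpoint dst;
};

/* Reply tuple expected when no NAT is applied: src and dst swapped. */
static struct tuple
reverse_of(const struct tuple *orig)
{
    struct tuple rev = { .src = orig->dst, .dst = orig->src };
    return rev;
}

static bool
tuple_equal(const struct tuple *a, const struct tuple *b)
{
    return a->src.addr == b->src.addr && a->src.port == b->src.port
           && a->dst.addr == b->dst.addr && a->dst.port == b->dst.port;
}

/* Returns true if 'candidate' equals the reply tuple of a connection
 * that is already tracked. */
static bool
reply_collides(const struct tuple *replies, size_t n,
               const struct tuple *candidate)
{
    assert(replies != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tuple_equal(&replies[i], candidate)) {
            return true;
        }
    }
    return false;
}
```

With the first connection's reply tuple (10.1.1.2:80 -> 10.1.1.1:10000)
already tracked, the reverse of 10.1.1.1:10000 -> 10.1.1.2:80 is the
same tuple, so reply_collides() returns true. Changing the candidate's
dst port to 10001, which is the effect of the all-zero SNAT shown in the
commit message, makes it unique again.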
NEWS | 3
lib/conntrack-private.h | 33 ++++
lib/conntrack.c | 335 ++++++++++++++++++++++++--------------
lib/ovs-actions.xml | 3
tests/system-userspace-macros.at | 8 -
5 files changed, 251 insertions(+), 131 deletions(-)
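
The port walk performed by the new macros in lib/conntrack-private.h
(IN_RANGE, NEXT_PORT_IN_RANGE, N_PORT_ATTEMPTS) can be sketched with
plain functions. This is a rough approximation under stated assumptions:
find_free_port(), the port_taken_fn callback and the two example
predicates are hypothetical and stand in for the conn_lookup() call on
the reverse key. The sketch also uses a 32-bit attempt counter, which is
a choice made here and not what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool
in_range(uint16_t p, uint16_t min, uint16_t max)
{
    return p >= min && p <= max;
}

/* An out-of-range port, or the last port of the range, wraps to 'min';
 * any other port advances by one. */
static uint16_t
next_port(uint16_t curr, uint16_t min, uint16_t max)
{
    return (!in_range(curr, min, max) || curr == max) ? min : curr + 1;
}

/* Number of candidates to test: every in-range port, plus the starting
 * port itself when it lies outside the range. */
static uint32_t
n_attempts(uint16_t curr, uint16_t min, uint16_t max)
{
    assert(min <= max);
    uint32_t span = (uint32_t) (max - min) + 1;
    return in_range(curr, min, max) ? span : span + 1;
}

/* Hypothetical callback: returns true if 'port' would collide. */
typedef bool (*port_taken_fn)(uint16_t port);

/* Tests 'start' first, then walks the range with wraparound.  Stores
 * the first free port in '*out' and returns true, or returns false if
 * every candidate is taken. */
static bool
find_free_port(uint16_t start, uint16_t min, uint16_t max,
               port_taken_fn taken, uint16_t *out)
{
    uint16_t curr = start;

    for (uint32_t left = n_attempts(curr, min, max); left > 0;
         left--, curr = next_port(curr, min, max)) {
        if (!taken(curr)) {
            *out = curr;
            return true;
        }
    }
    return false;
}

/* Example predicates, for illustration only. */
static bool
port_10000_taken(uint16_t port)
{
    return port == 10000;
}

static bool
every_port_taken(uint16_t port)
{
    (void) port;
    return true;
}
```

Starting from the sender's port 10000 in the range [1024, 65535] with
only port 10000 taken, find_free_port() selects 10001, matching the
dport=10001 entry in the commit message. A starting port below 1024 is
tested once and then the walk continues from 1024.
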
diff --git a/NEWS b/NEWS
index ebba17b22..ca6f52522 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,9 @@ Post-v2.15.0
- Userspace datapath:
* Auto load balancing of PMDs now partially supports cross-NUMA polling
cases, e.g if all PMD threads are running on the same NUMA node.
+ * Add all-zero IP SNAT handling to conntrack. In case of collision,
+ using ct(src=0.0.0.0), the source port will be replaced with another
+ non-colliding port in the ephemeral range (1024, 65535).
- ovs-ctl:
* New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index e8332bdba..cc2fb045d 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -148,6 +148,39 @@ enum ct_update_res {
CT_TIMEOUT(ICMP_FIRST) \
CT_TIMEOUT(ICMP_REPLY)
+#define NAT_ACTION_SNAT_ALL (NAT_ACTION_SRC | NAT_ACTION_SRC_PORT)
+#define NAT_ACTION_DNAT_ALL (NAT_ACTION_DST | NAT_ACTION_DST_PORT)
+
+enum ct_ephemeral_range {
+ MIN_NAT_EPHEMERAL_PORT = 1024,
+ MAX_NAT_EPHEMERAL_PORT = 65535
+};
+
+#define IN_RANGE(curr, min, max) \
+ (curr >= min && curr <= max)
+
+#define NEXT_PORT_IN_RANGE(curr, min, max) \
+ (curr = (!IN_RANGE(curr, min, max) || curr == max) ? min : curr + 1)
+
+/* if the current port is out of range increase the attempts by
+ * one so that in the worst case scenario the current out of
+ * range port plus all the in-range ports get tested.
+ * Note that curr can be an out of range port only in case of
+ * source port (SNAT with port range unspecified or DNAT),
+ * furthermore the source port in the packet has to be less than
+ * MIN_NAT_EPHEMERAL_PORT. */
+#define N_PORT_ATTEMPTS(curr, min, max) \
+ ((!IN_RANGE(curr, min, max)) ? (max - min) + 2 : (max - min) + 1)
+
+/* loose in-range check, the first curr port can be any port out of
+ * the range. */
+#define FOR_EACH_PORT_IN_RANGE__(curr, min, max, INAME) \
+ for (uint16_t INAME = N_PORT_ATTEMPTS(curr, min, max); \
+ INAME > 0; INAME--, NEXT_PORT_IN_RANGE(curr, min, max))
+
+#define FOR_EACH_PORT_IN_RANGE(curr, min, max) \
+ FOR_EACH_PORT_IN_RANGE__(curr, min, max, OVS_JOIN(idx, __COUNTER__))
+
enum ct_timeout {
#define CT_TIMEOUT(NAME) CT_TM_##NAME,
CT_TIMEOUTS
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 7e8b16a3e..f49382adb 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -108,8 +108,8 @@ static void set_label(struct dp_packet *, struct conn *,
static void *clean_thread_main(void *f_);
static bool
-nat_select_range_tuple(struct conntrack *ct, const struct conn *conn,
- struct conn *nat_conn);
+nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn,
+ struct conn *nat_conn);
static uint8_t
reverse_icmp_type(uint8_t type);
@@ -728,11 +728,11 @@ pat_packet(struct dp_packet *pkt, const struct conn *conn)
}
} else if (conn->nat_info->nat_action & NAT_ACTION_DST) {
if (conn->key.nw_proto == IPPROTO_TCP) {
- struct tcp_header *th = dp_packet_l4(pkt);
- packet_set_tcp_port(pkt, th->tcp_src, conn->rev_key.src.port);
+ packet_set_tcp_port(pkt, conn->rev_key.dst.port,
+ conn->rev_key.src.port);
} else if (conn->key.nw_proto == IPPROTO_UDP) {
- struct udp_header *uh = dp_packet_l4(pkt);
- packet_set_udp_port(pkt, uh->udp_src, conn->rev_key.src.port);
+ packet_set_udp_port(pkt, conn->rev_key.dst.port,
+ conn->rev_key.src.port);
}
}
}
@@ -786,11 +786,9 @@ un_pat_packet(struct dp_packet *pkt, const struct conn *conn)
}
} else if (conn->nat_info->nat_action & NAT_ACTION_DST) {
if (conn->key.nw_proto == IPPROTO_TCP) {
- struct tcp_header *th = dp_packet_l4(pkt);
- packet_set_tcp_port(pkt, conn->key.dst.port, th->tcp_dst);
+ packet_set_tcp_port(pkt, conn->key.dst.port, conn->key.src.port);
} else if (conn->key.nw_proto == IPPROTO_UDP) {
- struct udp_header *uh = dp_packet_l4(pkt);
- packet_set_udp_port(pkt, conn->key.dst.port, uh->udp_dst);
+ packet_set_udp_port(pkt, conn->key.dst.port, conn->key.src.port);
}
}
}
@@ -810,12 +808,10 @@ reverse_pat_packet(struct dp_packet *pkt, const struct conn *conn)
}
} else if (conn->nat_info->nat_action & NAT_ACTION_DST) {
if (conn->key.nw_proto == IPPROTO_TCP) {
- struct tcp_header *th_in = dp_packet_l4(pkt);
- packet_set_tcp_port(pkt, th_in->tcp_src,
+ packet_set_tcp_port(pkt, conn->key.src.port,
conn->key.dst.port);
} else if (conn->key.nw_proto == IPPROTO_UDP) {
- struct udp_header *uh_in = dp_packet_l4(pkt);
- packet_set_udp_port(pkt, uh_in->udp_src,
+ packet_set_udp_port(pkt, conn->key.src.port,
conn->key.dst.port);
}
}
@@ -1029,14 +1025,14 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
}
} else {
memcpy(nat_conn, nc, sizeof *nat_conn);
- bool nat_res = nat_select_range_tuple(ct, nc, nat_conn);
+ bool nat_res = nat_get_unique_tuple(ct, nc, nat_conn);
if (!nat_res) {
goto nat_res_exhaustion;
}
/* Update nc with nat adjustments made to nat_conn by
- * nat_select_range_tuple(). */
+ * nat_get_unique_tuple(). */
memcpy(nc, nat_conn, sizeof *nc);
}
@@ -2238,130 +2234,221 @@ nat_range_hash(const struct conn *conn, uint32_t basis)
return hash_finish(hash, 0);
}
-static bool
-nat_select_range_tuple(struct conntrack *ct, const struct conn *conn,
- struct conn *nat_conn)
-{
- enum { MIN_NAT_EPHEMERAL_PORT = 1024,
- MAX_NAT_EPHEMERAL_PORT = 65535 };
-
- uint16_t min_port;
- uint16_t max_port;
- uint16_t first_port;
- uint32_t hash = nat_range_hash(conn, ct->hash_basis);
+/* Ports are stored in host byte order for convenience. */
+static void
+set_sport_range(struct nat_action_info_t *ni, const struct conn_key *k,
+ uint32_t hash, uint16_t *curr, uint16_t *min,
+ uint16_t *max)
+{
+ if (((ni->nat_action & NAT_ACTION_SNAT_ALL) == NAT_ACTION_SRC) ||
+ ((ni->nat_action & NAT_ACTION_DST))) {
+ *curr = ntohs(k->src.port);
+ *min = MIN_NAT_EPHEMERAL_PORT;
+ *max = MAX_NAT_EPHEMERAL_PORT;
+ } else {
+ *min = ni->min_port;
+ *max = ni->max_port;
+ *curr = *min + (hash % ((*max - *min) + 1));
+ }
+}
- if ((conn->nat_info->nat_action & NAT_ACTION_SRC) &&
- (!(conn->nat_info->nat_action & NAT_ACTION_SRC_PORT))) {
- min_port = ntohs(conn->key.src.port);
- max_port = ntohs(conn->key.src.port);
- first_port = min_port;
- } else if ((conn->nat_info->nat_action & NAT_ACTION_DST) &&
- (!(conn->nat_info->nat_action & NAT_ACTION_DST_PORT))) {
- min_port = ntohs(conn->key.dst.port);
- max_port = ntohs(conn->key.dst.port);
- first_port = min_port;
+static void
+set_dport_range(struct nat_action_info_t *ni, const struct conn_key *k,
+ uint32_t hash, uint16_t *curr, uint16_t *min,
+ uint16_t *max)
+{
+ if (ni->nat_action & NAT_ACTION_DST_PORT) {
+ *min = ni->min_port;
+ *max = ni->max_port;
+ *curr = *min + (hash % ((*max - *min) + 1));
} else {
- uint16_t deltap = conn->nat_info->max_port - conn->nat_info->min_port;
- uint32_t port_index = hash % (deltap + 1);
- first_port = conn->nat_info->min_port + port_index;
- min_port = conn->nat_info->min_port;
- max_port = conn->nat_info->max_port;
+ *curr = ntohs(k->dst.port);
+ *min = *max = *curr;
}
+}
- uint32_t deltaa = 0;
- uint32_t address_index;
- union ct_addr ct_addr;
- memset(&ct_addr, 0, sizeof ct_addr);
- union ct_addr max_ct_addr;
- memset(&max_ct_addr, 0, sizeof max_ct_addr);
- max_ct_addr = conn->nat_info->max_addr;
+/* Gets the initial in range address based on the hash.
+ * Addresses are kept in network order. */
+static void
+get_addr_in_range(union ct_addr *min, union ct_addr *max,
+ union ct_addr *curr, uint32_t hash,
+ bool ipv4)
+{
+ uint32_t offt, range;
- if (conn->key.dl_type == htons(ETH_TYPE_IP)) {
- deltaa = ntohl(conn->nat_info->max_addr.ipv4) -
- ntohl(conn->nat_info->min_addr.ipv4);
- address_index = hash % (deltaa + 1);
- ct_addr.ipv4 = htonl(
- ntohl(conn->nat_info->min_addr.ipv4) + address_index);
+ if (ipv4) {
+ range = (ntohl(max->ipv4) - ntohl(min->ipv4)) + 1;
+ offt = hash % range;
+ curr->ipv4 = htonl(ntohl(min->ipv4) + offt);
} else {
- deltaa = nat_ipv6_addrs_delta(&conn->nat_info->min_addr.ipv6,
- &conn->nat_info->max_addr.ipv6);
- /* deltaa must be within 32 bits for full hash coverage. A 64 or
+ range = nat_ipv6_addrs_delta(&min->ipv6,
+ &max->ipv6) + 1;
+ /* range must be within 32 bits for full hash coverage. A 64 or
* 128 bit hash is unnecessary and hence not used here. Most code
* is kept common with V4; nat_ipv6_addrs_delta() will do the
* enforcement via max_ct_addr. */
- max_ct_addr = conn->nat_info->min_addr;
- nat_ipv6_addr_increment(&max_ct_addr.ipv6, deltaa);
- address_index = hash % (deltaa + 1);
- ct_addr.ipv6 = conn->nat_info->min_addr.ipv6;
- nat_ipv6_addr_increment(&ct_addr.ipv6, address_index);
- }
-
- uint16_t port = first_port;
- bool all_ports_tried = false;
- /* For DNAT or for specified port ranges, we don't use ephemeral ports. */
- bool ephemeral_ports_tried
- = conn->nat_info->nat_action & NAT_ACTION_DST ||
- conn->nat_info->nat_action & NAT_ACTION_SRC_PORT
- ? true : false;
- union ct_addr first_addr = ct_addr;
- bool pat_enabled = conn->key.nw_proto == IPPROTO_TCP ||
- conn->key.nw_proto == IPPROTO_UDP;
-
- while (true) {
+ offt = hash % range;
+ curr->ipv6 = min->ipv6;
+ nat_ipv6_addr_increment(&curr->ipv6, offt);
+ }
+}
+
+static void
+get_initial_addr(const struct conn *conn, union ct_addr *min,
+ union ct_addr *max, union ct_addr *curr,
+ uint32_t hash, bool ipv4)
+{
+ const union ct_addr zero_ip = {0};
+
+ /* all-zero CASE */
+ if (!memcmp(min, &zero_ip, sizeof(*min))) {
if (conn->nat_info->nat_action & NAT_ACTION_SRC) {
- nat_conn->rev_key.dst.addr = ct_addr;
- if (pat_enabled) {
- nat_conn->rev_key.dst.port = htons(port);
- }
- } else {
- nat_conn->rev_key.src.addr = ct_addr;
- if (pat_enabled) {
- nat_conn->rev_key.src.port = htons(port);
- }
+ *curr = conn->key.src.addr;
+ } else if (conn->nat_info->nat_action & NAT_ACTION_DST) {
+ *curr = conn->key.dst.addr;
+ }
+ } else {
+ get_addr_in_range(min, max, curr, hash, ipv4);
+ }
+}
+
+static void
+store_addr_to_key(union ct_addr *addr, struct conn_key *key,
+ uint16_t action)
+{
+ if (action & NAT_ACTION_SRC) {
+ key->dst.addr = *addr;
+ } else {
+ key->src.addr = *addr;
+ }
+}
+
+static void
+next_addr_in_range(union ct_addr *curr, union ct_addr *min,
+ union ct_addr *max, bool ipv4)
+{
+ if (ipv4) {
+ /* this check could be unified with IPv6, but let's avoid
+ * an unneeded memcmp() in case of IPv4. */
+ if (min->ipv4 == max->ipv4) {
+ return;
+ }
+
+ curr->ipv4 = (curr->ipv4 == max->ipv4) ?
+ min->ipv4 :
+ htonl(ntohl(curr->ipv4) + 1);
+ } else {
+ if (!memcmp(min, max, sizeof(*min))) {
+ return;
+ }
+
+ if (!memcmp(curr, max, sizeof(*curr))) {
+ *curr = *min;
+ return;
}
- bool found = conn_lookup(ct, &nat_conn->rev_key, time_msec(), NULL,
- NULL);
- if (!found) {
+ nat_ipv6_addr_increment(&curr->ipv6, 1);
+ }
+}
+
+static bool
+next_addr_in_range_guarded(union ct_addr *curr, union ct_addr *min,
+ union ct_addr *max, union ct_addr *guard,
+ bool ipv4)
+{
+ bool exhausted;
+
+ next_addr_in_range(curr, min, max, ipv4);
+
+ if (ipv4) {
+ exhausted = (curr->ipv4 == guard->ipv4);
+ } else {
+ exhausted = !memcmp(curr, guard, sizeof(*curr));
+ }
+
+ return exhausted;
+}
+
+/* This function tries to get a unique tuple.
+ * Every iteration checks that the reverse tuple doesn't
+ * collide with any existing one.
+ *
+ * in case of SNAT:
+ * - for each src IP address in the range (if any)
+ * - try to find a source port in range (if any)
+ * - if no port range exists, use the whole
+ * ephemeral range (after testing the port
+ * used by the sender), otherwise use the
+ * specified range
+ *
+ * in case of DNAT:
+ * - for each dst IP address in the range (if any)
+ * - for each dport in range (if any)
+ * - try to find a source port in the ephemeral range
+ * (after testing the port used by the sender)
+ *
+ * If none can be found, return exhaustion to the caller. */
+static bool
+nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn,
+ struct conn *nat_conn)
+{
+ union ct_addr min_addr = {0}, max_addr = {0}, curr_addr = {0},
+ guard_addr = {0};
+ uint32_t hash = nat_range_hash(conn, ct->hash_basis);
+ bool pat_proto = conn->key.nw_proto == IPPROTO_TCP ||
+ conn->key.nw_proto == IPPROTO_UDP;
+ uint16_t min_dport, max_dport, curr_dport;
+ uint16_t min_sport, max_sport, curr_sport;
+
+ min_addr = conn->nat_info->min_addr;
+ max_addr = conn->nat_info->max_addr;
+
+ get_initial_addr(conn, &min_addr, &max_addr, &curr_addr, hash,
+ (conn->key.dl_type == htons(ETH_TYPE_IP)));
+
+ /* save the address we started from so that
+ * we can stop once we reach it. */
+ guard_addr = curr_addr;
+
+ set_sport_range(conn->nat_info, &conn->key, hash, &curr_sport,
+ &min_sport, &max_sport);
+ set_dport_range(conn->nat_info, &conn->key, hash, &curr_dport,
+ &min_dport, &max_dport);
+
+another_round:
+ store_addr_to_key(&curr_addr, &nat_conn->rev_key,
+ conn->nat_info->nat_action);
+
+ if (!pat_proto) {
+ if (!conn_lookup(ct, &nat_conn->rev_key,
+ time_msec(), NULL, NULL)) {
return true;
- } else if (pat_enabled && !all_ports_tried) {
- if (min_port == max_port) {
- all_ports_tried = true;
- } else if (port == max_port) {
- port = min_port;
- } else {
- port++;
- }
- if (port == first_port) {
- all_ports_tried = true;
- }
- } else {
- if (memcmp(&ct_addr, &max_ct_addr, sizeof ct_addr)) {
- if (conn->key.dl_type == htons(ETH_TYPE_IP)) {
- ct_addr.ipv4 = htonl(ntohl(ct_addr.ipv4) + 1);
- } else {
- nat_ipv6_addr_increment(&ct_addr.ipv6, 1);
- }
- } else {
- ct_addr = conn->nat_info->min_addr;
- }
- if (!memcmp(&ct_addr, &first_addr, sizeof ct_addr)) {
- if (pat_enabled && !ephemeral_ports_tried) {
- ephemeral_ports_tried = true;
- ct_addr = conn->nat_info->min_addr;
- first_addr = ct_addr;
- min_port = MIN_NAT_EPHEMERAL_PORT;
- max_port = MAX_NAT_EPHEMERAL_PORT;
- } else {
- break;
- }
+ }
+
+ goto next_addr;
+ }
+
+ FOR_EACH_PORT_IN_RANGE(curr_dport, min_dport, max_dport) {
+ nat_conn->rev_key.src.port = htons(curr_dport);
+ FOR_EACH_PORT_IN_RANGE(curr_sport, min_sport, max_sport) {
+ nat_conn->rev_key.dst.port = htons(curr_sport);
+ if (!conn_lookup(ct, &nat_conn->rev_key,
+ time_msec(), NULL, NULL)) {
+ return true;
}
- first_port = min_port;
- port = first_port;
- all_ports_tried = false;
}
}
- return false;
+
+ /* Check if next IP is in range and respin. Otherwise, notify
+ * exhaustion to the caller. */
+next_addr:
+ if (next_addr_in_range_guarded(&curr_addr, &min_addr,
+ &max_addr, &guard_addr,
+ conn->key.dl_type == htons(ETH_TYPE_IP))) {
+ return false;
+ }
+
+ goto another_round;
}
static enum ct_update_res
diff --git a/lib/ovs-actions.xml b/lib/ovs-actions.xml
index 1668e5187..9bfc7ddd1 100644
--- a/lib/ovs-actions.xml
+++ b/lib/ovs-actions.xml
@@ -2138,8 +2138,7 @@ for i in [1,n_members]:
nat(src=0.0.0.0)
. In this case, when a source port
collision is detected during the commit, the source port will be
translated to an ephemeral port. If there is no collision, no SNAT
- is performed. Note that this is currently only implemented in the
- Linux kernel datapath.
+ is performed.
diff --git a/tests/system-userspace-macros.at b/tests/system-userspace-macros.at
index 9f0d38dfb..f639ba53a 100644
--- a/tests/system-userspace-macros.at
+++ b/tests/system-userspace-macros.at
@@ -99,12 +99,10 @@ m4_define([CHECK_CONNTRACK_NAT])
# CHECK_CONNTRACK_ZEROIP_SNAT()
#
# Perform requirements checks for running conntrack all-zero IP SNAT tests.
-# The userspace datapath does not support all-zero IP SNAT.
+# The userspace datapath always supports all-zero IP SNAT, so no check is
+# needed.
#
-m4_define([CHECK_CONNTRACK_ZEROIP_SNAT],
-[
- AT_SKIP_IF([:])
-])
+m4_define([CHECK_CONNTRACK_ZEROIP_SNAT])
# CHECK_CONNTRACK_TIMEOUT()
#