From patchwork Tue Oct 2 01:07:37 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Marcelo Henrique Cerri
X-Patchwork-Id: 977565
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [bionic/linux-azure, cosmic/linux-azure][PATCH 1/2] UBUNTU: SAUCE:
 netfilter: nf_conntrack: resolve clash for matching conntracks
Date: Mon, 1 Oct 2018 22:07:37 -0300
Message-Id: <20181002010738.20799-2-marcelo.cerri@canonical.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181002010738.20799-1-marcelo.cerri@canonical.com>
References: <20181002010738.20799-1-marcelo.cerri@canonical.com>
List-Id: Kernel team discussions

From: Martynas Pumputis

BugLink: http://bugs.launchpad.net/bugs/1795493

This patch enables the clash resolution for NAT (disabled in
"590b52e10d41") if clashing conntracks match (i.e.
both tuples are equal) and the protocol allows it.

The clash might happen for a connection-less protocol (e.g. UDP) when
two threads write to the same socket in parallel and the subsequent
calls to "get_unique_tuple" return the same tuples (incl. reply tuples).

In this case it is safe to perform the resolution, as the losing CT
describes the same mangling as the winning CT, so no modifications to
the packet are needed, and the result of rules traversal for the loser's
packet stays valid.

Signed-off-by: Martynas Pumputis
Signed-off-by: Pablo Neira Ayuso
(cherry picked from commit ed07d9a021df6da53456663a76999189badc432a)
[source: git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git]
Signed-off-by: Marcelo Henrique Cerri
---
 net/netfilter/nf_conntrack_core.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 3d5280425027..52a4101e8814 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -502,6 +502,18 @@ nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
 	       net_eq(net, nf_ct_net(ct));
 }
 
+static inline bool
+nf_ct_match(const struct nf_conn *ct1, const struct nf_conn *ct2)
+{
+	return nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
+				 &ct2->tuplehash[IP_CT_DIR_ORIGINAL].tuple) &&
+	       nf_ct_tuple_equal(&ct1->tuplehash[IP_CT_DIR_REPLY].tuple,
+				 &ct2->tuplehash[IP_CT_DIR_REPLY].tuple) &&
+	       nf_ct_zone_equal(ct1, nf_ct_zone(ct2), IP_CT_DIR_ORIGINAL) &&
+	       nf_ct_zone_equal(ct1, nf_ct_zone(ct2), IP_CT_DIR_REPLY) &&
+	       net_eq(nf_ct_net(ct1), nf_ct_net(ct2));
+}
+
 /* caller must hold rcu readlock and none of the nf_conntrack_locks */
 static void nf_ct_gc_expired(struct nf_conn *ct)
 {
@@ -695,19 +707,21 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
 	/* This is the conntrack entry already in hashes that won race. */
 	struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
 	const struct nf_conntrack_l4proto *l4proto;
+	enum ip_conntrack_info oldinfo;
+	struct nf_conn *loser_ct = nf_ct_get(skb, &oldinfo);
 
 	l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
 	if (l4proto->allow_clash &&
-	    ((ct->status & IPS_NAT_DONE_MASK) == 0) &&
 	    !nf_ct_is_dying(ct) &&
 	    atomic_inc_not_zero(&ct->ct_general.use)) {
-		enum ip_conntrack_info oldinfo;
-		struct nf_conn *loser_ct = nf_ct_get(skb, &oldinfo);
-
-		nf_ct_acct_merge(ct, ctinfo, loser_ct);
-		nf_conntrack_put(&loser_ct->ct_general);
-		nf_ct_set(skb, ct, oldinfo);
-		return NF_ACCEPT;
+		if (((ct->status & IPS_NAT_DONE_MASK) == 0) ||
+		    nf_ct_match(ct, loser_ct)) {
+			nf_ct_acct_merge(ct, ctinfo, loser_ct);
+			nf_conntrack_put(&loser_ct->ct_general);
+			nf_ct_set(skb, ct, oldinfo);
+			return NF_ACCEPT;
+		}
+		nf_ct_put(ct);
 	}
 	NF_CT_STAT_INC(net, drop);
 	return NF_DROP;

From patchwork Tue Oct 2 01:07:38 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Marcelo Henrique Cerri
X-Patchwork-Id: 977566
From: Marcelo Henrique Cerri
To: kernel-team@lists.ubuntu.com
Subject: [bionic/linux-azure, cosmic/linux-azure][PATCH 2/2] UBUNTU: SAUCE:
 netfilter: nf_nat: return the same reply tuple for matching CTs
Date: Mon, 1 Oct 2018 22:07:38 -0300
Message-Id: <20181002010738.20799-3-marcelo.cerri@canonical.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181002010738.20799-1-marcelo.cerri@canonical.com>
References: <20181002010738.20799-1-marcelo.cerri@canonical.com>
List-Id: Kernel team discussions

From: Martynas Pumputis

BugLink: http://bugs.launchpad.net/bugs/1795493

It is possible that two concurrent packets originating from the same
socket of a connection-less protocol (e.g. UDP) end up having different
IP_CT_DIR_REPLY tuples, which results in one of the packets being
dropped.

To illustrate this, consider the following simplified scenario:

1. No DNAT/SNAT/MASQUERADE rules are installed, but the nf_nat module
   is loaded.
2. Packets A and B are sent at the same time from two different threads
   via the same UDP socket which hasn't been used before (=no CT has
   been created before). Both packets have the same IP_CT_DIR_ORIGINAL
   tuple.
3. The CT of A has been created and confirmed; afterwards
   get_unique_tuple is called for B. Because the IP_CT_DIR_REPLY tuple
   (the inverse of the IP_CT_DIR_ORIGINAL tuple) is already taken by
   A's confirmed CT (nf_nat_used_tuple finds it), get_unique_tuple
   calls UDP's unique_tuple, which returns a different IP_CT_DIR_REPLY
   tuple (usually with src port = 1024).
4. B's CT cannot get confirmed in __nf_conntrack_confirm due to
   the found IP_CT_DIR_ORIGINAL tuple of A and the different
   IP_CT_DIR_REPLY tuples, thus packet B gets dropped.
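
As a rough userspace illustration of steps 2 and 3 (not kernel code), the
lookup can be modelled as below. The struct layouts and every function name
here are hypothetical simplifications invented for this sketch; only the
ignore_same_orig idea is taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct nf_conntrack_tuple. */
struct tuple {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

/* Hypothetical, simplified stand-in for struct nf_conn. */
struct conn {
	struct tuple orig;
	struct tuple reply;
};

static bool tuple_equal(const struct tuple *a, const struct tuple *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* With no NAT rules installed, the reply tuple is the inverted original. */
static struct tuple invert(const struct tuple *t)
{
	struct tuple r = {
		.src_ip = t->dst_ip, .dst_ip = t->src_ip,
		.src_port = t->dst_port, .dst_port = t->src_port,
	};
	return r;
}

/*
 * Model of the "is this reply tuple taken?" check over confirmed entries.
 * When ignore_same_orig is set, a confirmed entry whose original tuple
 * equals that of new_ct is skipped instead of being reported as a clash.
 */
static bool reply_tuple_taken(const struct conn *confirmed, size_t n,
			      const struct tuple *reply,
			      const struct conn *new_ct,
			      bool ignore_same_orig)
{
	for (size_t i = 0; i < n; i++) {
		if (!tuple_equal(&confirmed[i].reply, reply))
			continue;
		if (ignore_same_orig &&
		    tuple_equal(&confirmed[i].orig, &new_ct->orig))
			continue;
		return true;
	}
	return false;
}

/* A is already confirmed; B, sent from the same socket, checks its tuple. */
static bool packet_b_sees_clash(bool ignore_same_orig)
{
	struct tuple orig = {
		.src_ip = 0x0a000001, .dst_ip = 0x0a000002,
		.src_port = 40000, .dst_port = 53,
	};
	struct conn a = { .orig = orig, .reply = invert(&orig) };
	struct conn b = { .orig = orig, .reply = invert(&orig) };

	return reply_tuple_taken(&a, 1, &b.reply, &b, ignore_same_orig);
}
```

In this model packet_b_sees_clash(false) reports a clash, which corresponds
to B being pushed onto a different reply tuple, while
packet_b_sees_clash(true) does not, so B would keep the same reply tuple as A.
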
This patch modifies get_unique_tuple so that the function may return a
reply tuple that is already used by a confirmed CT, provided that the L4
protocol allows clash resolution and the IP_CT_DIR_ORIGINAL tuples are
equal.

Signed-off-by: Martynas Pumputis
[from http://patchwork.ozlabs.org/patch/952939/]
Signed-off-by: Marcelo Henrique Cerri
---
 include/net/netfilter/nf_conntrack.h     |  5 ++--
 include/net/netfilter/nf_nat.h           |  3 ++-
 net/ipv4/netfilter/nf_nat_proto_gre.c    |  2 +-
 net/ipv4/netfilter/nf_nat_proto_icmp.c   |  2 +-
 net/ipv6/netfilter/nf_nat_proto_icmpv6.c |  2 +-
 net/netfilter/nf_conntrack_core.c        | 12 ++++++---
 net/netfilter/nf_nat_core.c              | 34 +++++++++++++++++++-----
 net/netfilter/nf_nat_proto_common.c      |  2 +-
 8 files changed, 46 insertions(+), 16 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 062dc19b5840..498d5d8159f5 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -135,8 +135,9 @@ void nf_conntrack_alter_reply(struct nf_conn *ct,
 
 /* Is this tuple taken? (ignoring any belonging to the given
    conntrack). */
-int nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
-			     const struct nf_conn *ignored_conntrack);
+int nf_conntrack_reply_tuple_taken(const struct nf_conntrack_tuple *tuple,
+				   const struct nf_conn *ignored_conntrack,
+				   bool ignore_same_orig);
 
 #define NFCT_INFOMASK	7UL
 #define NFCT_PTRMASK	~(NFCT_INFOMASK)
diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
index a17eb2f8d40e..fee9737a65a7 100644
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -49,7 +49,8 @@ struct nf_conn_nat *nf_ct_nat_ext_add(struct nf_conn *ct);
 
 /* Is this tuple already taken? (not by us)*/
 int nf_nat_used_tuple(const struct nf_conntrack_tuple *tuple,
-		      const struct nf_conn *ignored_conntrack);
+		      const struct nf_conn *ignored_conntrack,
+		      bool ignore_same_orig);
 
 static inline struct nf_conn_nat *nfct_nat(const struct nf_conn *ct)
 {
diff --git a/net/ipv4/netfilter/nf_nat_proto_gre.c b/net/ipv4/netfilter/nf_nat_proto_gre.c
index 00fda6331ce5..c3083b68d3c2 100644
--- a/net/ipv4/netfilter/nf_nat_proto_gre.c
+++ b/net/ipv4/netfilter/nf_nat_proto_gre.c
@@ -72,7 +72,7 @@ gre_unique_tuple(const struct nf_nat_l3proto *l3proto,
 
 	for (i = 0; ; ++key) {
 		*keyptr = htons(min + key % range_size);
-		if (++i == range_size || !nf_nat_used_tuple(tuple, ct))
+		if (++i == range_size || !nf_nat_used_tuple(tuple, ct, false))
 			return;
 	}
 
diff --git a/net/ipv4/netfilter/nf_nat_proto_icmp.c b/net/ipv4/netfilter/nf_nat_proto_icmp.c
index 6d7cf1d79baf..589e9a9b5509 100644
--- a/net/ipv4/netfilter/nf_nat_proto_icmp.c
+++ b/net/ipv4/netfilter/nf_nat_proto_icmp.c
@@ -47,7 +47,7 @@ icmp_unique_tuple(const struct nf_nat_l3proto *l3proto,
 	for (i = 0; ; ++id) {
 		tuple->src.u.icmp.id = htons(ntohs(range->min_proto.icmp.id) +
 					     (id % range_size));
-		if (++i == range_size || !nf_nat_used_tuple(tuple, ct))
+		if (++i == range_size || !nf_nat_used_tuple(tuple, ct, false))
 			return;
 	}
 	return;
diff --git a/net/ipv6/netfilter/nf_nat_proto_icmpv6.c b/net/ipv6/netfilter/nf_nat_proto_icmpv6.c
index d9bf42ba44fa..cf47f5f549ee 100644
--- a/net/ipv6/netfilter/nf_nat_proto_icmpv6.c
+++ b/net/ipv6/netfilter/nf_nat_proto_icmpv6.c
@@ -49,7 +49,7 @@ icmpv6_unique_tuple(const struct nf_nat_l3proto *l3proto,
 	for (i = 0; ; ++id) {
 		tuple->src.u.icmp.id = htons(ntohs(range->min_proto.icmp.id) +
 					     (id % range_size));
-		if (++i == range_size || !nf_nat_used_tuple(tuple, ct))
+		if (++i == range_size || !nf_nat_used_tuple(tuple, ct, false))
 			return;
 	}
 }
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 52a4101e8814..8c8c1780c56c 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -847,8 +847,9 @@ EXPORT_SYMBOL_GPL(__nf_conntrack_confirm);
 /* Returns true if a connection correspondings to the tuple (required
    for NAT). */
 int
-nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
-			 const struct nf_conn *ignored_conntrack)
+nf_conntrack_reply_tuple_taken(const struct nf_conntrack_tuple *tuple,
+			       const struct nf_conn *ignored_conntrack,
+			       bool ignore_same_orig)
 {
 	struct net *net = nf_ct_net(ignored_conntrack);
 	const struct nf_conntrack_zone *zone;
@@ -877,6 +878,11 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
 		}
 
 		if (nf_ct_key_equal(h, tuple, zone, net)) {
+			if (ignore_same_orig &&
+			    nf_ct_tuple_equal(&ignored_conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
+					      &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple)) {
+				continue;
+			}
 			NF_CT_STAT_INC_ATOMIC(net, found);
 			rcu_read_unlock();
 			return 1;
@@ -892,7 +898,7 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(nf_conntrack_tuple_taken);
+EXPORT_SYMBOL_GPL(nf_conntrack_reply_tuple_taken);
 
 #define NF_CT_EVICTION_RANGE	8
 
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 46f9df99d276..12fb39d953e0 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -154,7 +154,8 @@ hash_by_src(const struct net *n, const struct nf_conntrack_tuple *tuple)
 /* Is this tuple already taken? (not by us) */
 int
 nf_nat_used_tuple(const struct nf_conntrack_tuple *tuple,
-		  const struct nf_conn *ignored_conntrack)
+		  const struct nf_conn *ignored_conntrack,
+		  bool ignore_same_orig)
 {
 	/* Conntrack tracking doesn't keep track of outgoing tuples; only
 	 * incoming ones. NAT means they don't have a fixed mapping,
@@ -165,7 +166,15 @@ nf_nat_used_tuple(const struct nf_conntrack_tuple *tuple,
 	struct nf_conntrack_tuple reply;
 
 	nf_ct_invert_tuplepr(&reply, tuple);
-	return nf_conntrack_tuple_taken(&reply, ignored_conntrack);
+	/* If ignore_same_orig is enabled, the following function will ignore
+	 * any matching CT with the same IP_CT_DIR_ORIGINAL tuple.
+	 *
+	 * Used when calling the function for a CT of a connection-less protocol
+	 * such as UDP to ignore a clashing CT which originated from the same
+	 * socket.
+	 */
+	return nf_conntrack_reply_tuple_taken(&reply, ignored_conntrack,
+					      ignore_same_orig);
 }
 EXPORT_SYMBOL(nf_nat_used_tuple);
 
@@ -323,7 +332,9 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple,
 	const struct nf_conntrack_zone *zone;
 	const struct nf_nat_l3proto *l3proto;
 	const struct nf_nat_l4proto *l4proto;
+	const struct nf_conntrack_l4proto *ct_l4proto;
 	struct net *net = nf_ct_net(ct);
+	bool ignore_same_orig = false;
 
 	zone = nf_ct_zone(ct);
 
@@ -331,6 +342,16 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple,
 	l3proto = __nf_nat_l3proto_find(orig_tuple->src.l3num);
 	l4proto = __nf_nat_l4proto_find(orig_tuple->src.l3num,
 					orig_tuple->dst.protonum);
+	ct_l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
+
+	/* If the protocol allows the clash resolution, then when searching
+	 * for clashing CTs ignore the ones with the same IP_CT_DIR_ORIGINAL
+	 * tuple as they originate from the same socket. This prevents from
+	 * generating different reply tuples for two racing packets from
+	 * the same connection-less (e.g. UDP) socket which results in dropping
+	 * one of the packets in __nf_conntrack_confirm.
+	 */
+	ignore_same_orig = ct_l4proto->allow_clash;
 
 	/* 1) If this srcip/proto/src-proto-part is currently mapped,
 	 * and that same mapping gives a unique tuple within the given
@@ -344,14 +365,15 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple,
 	    !(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) {
 		/* try the original tuple first */
 		if (in_range(l3proto, l4proto, orig_tuple, range)) {
-			if (!nf_nat_used_tuple(orig_tuple, ct)) {
+			if (!nf_nat_used_tuple(orig_tuple, ct,
+					       ignore_same_orig)) {
 				*tuple = *orig_tuple;
 				goto out;
 			}
 		} else if (find_appropriate_src(net, zone, l3proto, l4proto,
 						orig_tuple, tuple, range)) {
 			pr_debug("get_unique_tuple: Found current src map\n");
-			if (!nf_nat_used_tuple(tuple, ct))
+			if (!nf_nat_used_tuple(tuple, ct, ignore_same_orig))
 				goto out;
 		}
 	}
@@ -372,9 +394,9 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple,
 					  &range->min_proto,
 					  &range->max_proto) &&
 			    (range->min_proto.all == range->max_proto.all ||
-			     !nf_nat_used_tuple(tuple, ct)))
+			     !nf_nat_used_tuple(tuple, ct, ignore_same_orig)))
 				goto out;
-		} else if (!nf_nat_used_tuple(tuple, ct)) {
+		} else if (!nf_nat_used_tuple(tuple, ct, ignore_same_orig)) {
 			goto out;
 		}
 	}
diff --git a/net/netfilter/nf_nat_proto_common.c b/net/netfilter/nf_nat_proto_common.c
index 5d849d835561..851517cdfbd7 100644
--- a/net/netfilter/nf_nat_proto_common.c
+++ b/net/netfilter/nf_nat_proto_common.c
@@ -91,7 +91,7 @@ void nf_nat_l4proto_unique_tuple(const struct nf_nat_l3proto *l3proto,
 
 	for (i = 0; ; ++off) {
 		*portptr = htons(min + off % range_size);
-		if (++i != range_size && nf_nat_used_tuple(tuple, ct))
+		if (++i != range_size && nf_nat_used_tuple(tuple, ct, false))
 			continue;
 		if (!(range->flags & (NF_NAT_RANGE_PROTO_RANDOM_ALL|
 				      NF_NAT_RANGE_PROTO_OFFSET)))