Message ID | 20210923131243.24071-3-fw@strlen.de |
---|---|
State | RFC |
Delegated to: | Pablo Neira |
Headers | show |
Series | [nf,1/2] selftests: nft_nat: add udp hole punch test case | expand |
Florian Westphal <fw@strlen.de> wrote: > PoC, incomplete -- ipv4 udp only. > > Ipv6 support needs help from ipv6 indirection infra. > > Also lacks direction support: the check should only be done > for nf_conn objects created by externally generated packets. Alternate fix idea: 1. store skb->skb_iif in nf_conn. This means locally vs. remote-generated nf_conn can be identified via ct->skb_iff != 0. 2. For "remote" case, force following behaviour: check that sport > dport and sport > 1024. OTOH, this isn't transparent to users and might cause issues with very very old "credential passing" applications that insist on using privileged port range (< 1024) :-/
On Fri, Oct 01, 2021 at 03:21:28PM +0200, Florian Westphal wrote: > Florian Westphal <fw@strlen.de> wrote: > > PoC, incomplete -- ipv4 udp only. > > > > Ipv6 support needs help from ipv6 indirection infra. > > > > Also lacks direction support: the check should only be done > > for nf_conn objects created by externally generated packets. > > Alternate fix idea: > > 1. store skb->skb_iif in nf_conn. > > This means locally vs. remote-generated nf_conn can be identified > via ct->skb_iff != 0. > > 2. For "remote" case, force following behaviour: > check that sport > dport and sport > 1024. > > OTOH, this isn't transparent to users and might cause issues > with very very old "credential passing" applications that insist > on using privileged port range (< 1024) :-/ Can't this be just expressed through ruleset? I mean, conditionally masquerade depending on whether the packet is locally generated or not, for remove for sport > 1024 range.
Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Fri, Oct 01, 2021 at 03:21:28PM +0200, Florian Westphal wrote: > > Alternate fix idea: > > > > 1. store skb->skb_iif in nf_conn. > > > > This means locally vs. remote-generated nf_conn can be identified > > via ct->skb_iff != 0. > > > > 2. For "remote" case, force following behaviour: > > check that sport > dport and sport > 1024. > > > > OTOH, this isn't transparent to users and might cause issues > > with very very old "credential passing" applications that insist > > on using privileged port range (< 1024) :-/ > > Can't this be just expressed through ruleset? I mean, conditionally > masquerade depending on whether the packet is locally generated or > not, for remove for sport > 1024 range. Yes, see patch #1, it demos a couple of ruleset based fixes/mitigations for this problem.
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c index 273117683922..843b639200f8 100644 --- a/net/netfilter/nf_nat_core.c +++ b/net/netfilter/nf_nat_core.c @@ -24,6 +24,7 @@ #include <net/netfilter/nf_nat.h> #include <net/netfilter/nf_nat_helper.h> #include <uapi/linux/netfilter/nf_nat.h> +#include <net/udp.h> #include "nf_internals.h" @@ -372,6 +373,30 @@ find_best_ips_proto(const struct nf_conntrack_zone *zone, } } +static bool is_port_shadow(struct net *net, const struct nf_conntrack_tuple *tuple) +{ + const struct sock *sk; + __be32 saddr, daddr; + __be16 sport, dport; + + if (tuple->src.l3num != NFPROTO_IPV4 || + tuple->dst.protonum != IPPROTO_UDP) + return false; + + saddr = tuple->dst.u3.ip; + daddr = tuple->src.u3.ip; + sport = tuple->dst.u.udp.port; + dport = tuple->src.u.udp.port; + + sk = __udp4_lib_lookup(net, saddr, sport, daddr, dport, 0, 0, &udp_table, NULL); + + /* if this returns a socket, then replies might be reverse-natted and + * forwarded instead of being delivered to the local socket. + */ + + return sk != NULL; +} + /* Alter the per-proto part of the tuple (depending on maniptype), to * give a unique tuple in the given range if possible. * @@ -483,6 +508,10 @@ static void nf_nat_l4proto_unique_tuple(struct nf_conntrack_tuple *tuple, another_round: for (i = 0; i < attempts; i++, off++) { *keyptr = htons(min + off % range_size); + + if (maniptype == NF_NAT_MANIP_SRC && is_port_shadow(nf_ct_net(ct), tuple)) + continue; + if (!nf_nat_used_tuple(tuple, ct)) return; } @@ -507,6 +536,7 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple, struct nf_conn *ct, enum nf_nat_manip_type maniptype) { + bool force_random = range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL; const struct nf_conntrack_zone *zone; struct net *net = nf_ct_net(ct); @@ -520,8 +550,12 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple, * So far, we don't do local source mappings, so multiple * manips not an issue. */ - if (maniptype == NF_NAT_MANIP_SRC && - !(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) { + if (maniptype == NF_NAT_MANIP_SRC && !force_random) { + if (is_port_shadow(nf_ct_net(ct), orig_tuple)) { + force_random = true; + goto find_best_ips; + } + /* try the original tuple first */ if (in_range(orig_tuple, range)) { if (!nf_nat_used_tuple(orig_tuple, ct)) { @@ -536,6 +570,7 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple, } } +find_best_ips: /* 2) Select the least-used IP/proto combination in the given range */ *tuple = *orig_tuple; find_best_ips_proto(zone, tuple, range, ct, maniptype); @@ -545,7 +580,7 @@ get_unique_tuple(struct nf_conntrack_tuple *tuple, */ /* Only bother mapping if it's not already in range and unique */ - if (!(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) { + if (!force_random) { if (range->flags & NF_NAT_RANGE_PROTO_SPECIFIED) { if (!(range->flags & NF_NAT_RANGE_PROTO_OFFSET) && l4proto_in_range(tuple, maniptype,