Message ID | CAHmME9rapx6z1T76ik2vhF0bBotVy+GFkTTDY810aNso7ViV5w@mail.gmail.com |
---|---|
State | Not Applicable |
Delegated to: | Pablo Neira |
Headers | show |
Series | net/icmp: restore source address if packet is NATed | expand |
Jason A. Donenfeld <Jason@zx2c4.com> wrote: > When I sent this to netdev back in June, Dave pointed out how horrible > this is, since it breaks all sorts of layering. The result of that > discussion was that something like this -- the backwards > transformation and the correct rate limiting -- belongs inside > netfilter and not polluting the icmp code directly. He ended by > telling me, "I highly encourage you to continue pursuing the netfilter > based approach, and to also discuss it on netfilter-devel which will > hit more capable minds than just here on netdev." So, a few months > late, I'm forwarding this email here, in case anybody is interested. If I understand this correctly this cannot be fixed inside netfilter. > Most of the time, the icmp_send and icmpv6_send routines can just reach > down into the skb's IP header to determine the saddr. However, if > icmp_send or icmpv6_send is being called from a network device driver -- There you have it. Why TF do network drivers send icmp error messages from ndo_start_xmit()?! > there are a few in the tree -- then it's possible that by the time > icmp_send or icmpv6_send looks at the packet, the packet's source > address has already been transformed by SNAT or MASQUERADE or some other > transformation that CONNTRACK knows about. In this case, the packet's > source address is most certainly the *wrong* source address to be used > for the purpose of ICMP replies. Right. But then, *iff* you want to send packets in response to packets fed to ndo_start_xmit then you only have 2.5 choices: 1. accept that you will send it to wrong destination if snat was applied 2. something like your patch 3. add a icmp_ndo_xmit helper (variant of 2) to keep the nfct query out of normal stacsks icmp_send function. I think correct solution is to never allow drivers to do something like this. What is the use case here? Normal IP stack should send imcp/pkttobig errors, and that will work fine since it occurs pre-snat. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Nov 9, 2017 at 9:01 AM, Florian Westphal <fw@strlen.de> wrote: > I think correct solution is to never allow drivers to do something like > this. What is the use case here? Normal IP stack should send > imcp/pkttobig errors, and that will work fine since it occurs pre-snat. The use case is particular tunneling devices -- virtual interfaces. It's possible that a tunnel device won't be able to transmit a packet, due to some particular state issue -- no encryption key available, incomplete configuration for a particular destination, etc -- and in that case, it's quite important to be able to send an error -- ICMP_DEST_UNREACH,ICMP_HOST_UNREACH -- back to the sender, rather than letting packets disappear into a blackhole without any useful notification. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index c2be26b98b5f..30aa6aa79fd2 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -97,6 +97,10 @@ #include <net/inet_common.h> #include <net/ip_fib.h> #include <net/l3mdev.h> +#if IS_ENABLED(CONFIG_NF_CONNTRACK) +#include <net/netfilter/nf_conntrack.h> +#include <net/netfilter/nf_nat_core.h> +#endif /* * Build xmit assembly blocks @@ -586,6 +590,10 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info) u32 mark; struct net *net; struct sock *sk; +#if IS_ENABLED(CONFIG_NF_CONNTRACK) + enum ip_conntrack_info ctinfo; + struct nf_conn *ct; +#endif if (!rt) goto out; @@ -604,6 +612,19 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info) goto out; /* + * If this function is called after the skb has already been + * NAT transformed, the ratelimiting will apply to the wrong + * saddr, and the reply will will be marked as coming from the + * wrong host. So, we fix it up here in case connection tracking + * enables that. + */ +#if IS_ENABLED(CONFIG_NF_CONNTRACK) + ct = nf_ct_get(skb_in, &ctinfo); + if (ct) + iph->saddr = ct->tuplehash[0].tuple.src.u3.ip; +#endif + + /* * No replies to physical multicast/broadcast */ if (skb_in->pkt_type != PACKET_HOST) diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c index 8d7b113958b1..ee8a2853121e 100644 --- a/net/ipv6/icmp.c +++ b/net/ipv6/icmp.c @@ -69,6 +69,10 @@ #include <net/inet_common.h> #include <net/dsfield.h> #include <net/l3mdev.h> +#if IS_ENABLED(CONFIG_NF_CONNTRACK) +#include <net/netfilter/nf_conntrack.h> +#include <net/netfilter/nf_nat_core.h> +#endif #include <linux/uaccess.h> @@ -422,12 +426,29 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, int len; int err = 0; u32 mark = IP6_REPLY_MARK(net, skb->mark); +#if IS_ENABLED(CONFIG_NF_CONNTRACK) + enum ip_conntrack_info ctinfo; + struct nf_conn *ct; +#endif if ((u8 *)hdr < skb->head || (skb_network_header(skb) + sizeof(*hdr)) > skb_tail_pointer(skb)) return; /* + * If this function is called after the skb has already been + * NAT transformed, the ratelimiting will apply to the wrong + * saddr, and the reply will will be marked as coming from the + * wrong host. So, we fix it up here in case connection tracking + * enables that. + */ +#if IS_ENABLED(CONFIG_NF_CONNTRACK) + ct = nf_ct_get(skb, &ctinfo); + if (ct) + hdr->saddr = ct->tuplehash[0].tuple.src.u3.in6; +#endif + + /* * Make sure we respect the rules * i.e. RFC 1885 2.4(e)