diff mbox

[RFC] Fixing up TCP/UDP checksum for UDP encap. ESP4 packets in transport mode

Message ID 20090630140835.GA20811@gondor.apana.org.au
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Herbert Xu June 30, 2009, 2:08 p.m. UTC
On Tue, Jun 30, 2009 at 03:00:36PM +0800, Herbert Xu wrote:
> 
> Now as to the technical problem of how to recompute the checksums
> cleanly, may I draw your attention to gso_send_checksum which does
> exactly what you want.

Something like this untested patch.  Note that I still think this
is totally wrong (see the patch description for an explanation).
Perhaps a better way to do this is to write a netfilter module to
fix up checksums on egress.  That way it would be even more explicit
that you should do the checksum verification on the opposite end
as well.

The real solution is to get natoa, or even better, ditch transport
mode if you're doing NAT.

ipsec: Optionally recompute transport mode NAT checksum

We never bothered adjusting transport mode NAT checksums because
as long as the packet originated on the IPsec peer and terminated
on our host, it doesn't really matter much.

However, once you put inner NAT into the equation, the checksum
must be updated for it to work at all.

This patch does this by inanely recomputing the checksum.  Note
that this is almost always wrong! It can only be right if the
traffic originated on the IPsec peer, and terminated beyond our
host.  If it went the other way, that is, it originated beyond
our IPsec peer and termianted on us, then the checksum will not
be verified for the path between the source and the IPsec peer.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


Cheers,

Comments

David Miller July 7, 2009, 1:30 a.m. UTC | #1
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 30 Jun 2009 22:08:35 +0800

> On Tue, Jun 30, 2009 at 03:00:36PM +0800, Herbert Xu wrote:
>> 
>> Now as to the technical problem of how to recompute the checksums
>> cleanly, may I draw your attention to gso_send_checksum which does
>> exactly what you want.
> 
> Something like this untested patch.  Note that I still think this
> is totally wrong (see the patch description for an explanation).
> Perhaps a better way to do this is to write a netfilter module to
> fix up checksums on egress.  That way it would be even more explicit
> that you should do the checksum verification on the opposite end
> as well.
> 
> The real solution is to get natoa, or even better, ditch transport
> mode if you're doing NAT.

Ugly solution or not I don't like this patch because it requires
userspace to set this new attribute just to get correct checksums.

Can't we just detect the "came through remote peer" situation and
just do the checksum fixup in that case?  Anything that doesn't
require use changes, and as you've implemented it the user change
is only possible with netlink IPSEC users.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu July 7, 2009, 1:40 a.m. UTC | #2
On Mon, Jul 06, 2009 at 06:30:29PM -0700, David Miller wrote:
> 
> Ugly solution or not I don't like this patch because it requires
> userspace to set this new attribute just to get correct checksums.
> 
> Can't we just detect the "came through remote peer" situation and
> just do the checksum fixup in that case?  Anything that doesn't
> require use changes, and as you've implemented it the user change
> is only possible with netlink IPSEC users.

Hmm I deliberately didn't want to have this as the default because
I want whoever that enables it to think about the implications.
Having it on by default means that people will just set this up
without realising that they're leaving the packet unprotected by
checksums for a fraction of the path.

As I explained, it's almost impossible to use this without leaving
the packet unprotected at least in one direction.

Having said that I'm fine with turning this into a sysctl or some
global setting that's easier to enable.

Cheers,
David Miller July 7, 2009, 1:54 a.m. UTC | #3
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 7 Jul 2009 09:40:08 +0800

> Hmm I deliberately didn't want to have this as the default because
> I want whoever that enables it to think about the implications.
> Having it on by default means that people will just set this up
> without realising that they're leaving the packet unprotected by
> checksums for a fraction of the path.
> 
> As I explained, it's almost impossible to use this without leaving
> the packet unprotected at least in one direction.
> 
> Having said that I'm fine with turning this into a sysctl or some
> global setting that's easier to enable.

Hmmm, aren't we talking about packets which were protected by either a
hash, strong encryption, or both at some point?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu July 7, 2009, 1:58 a.m. UTC | #4
On Mon, Jul 06, 2009 at 06:54:11PM -0700, David Miller wrote:
> 
> Hmmm, aren't we talking about packets which were protected by either a
> hash, strong encryption, or both at some point?

Only if there is no inner NAT, i.e., only if this patch isn't
needed.  Otherwise

Source  ---  GW1  ----  GW2  ---  Dest

the path between Source and GW1, will be unprotected if transport
mode is used between GW1 and GW2.  The only bit protected by IPsec's
hash is between GW1 and GW2.

When the traffic comes back, then the bit between Dest and GW2 will
be unprotected.  This is why the only safe way to use this would be
if your traffic is one-way and you only had inner NAT at the other
end.

Cheers,
David Miller July 7, 2009, 2:02 a.m. UTC | #5
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 7 Jul 2009 09:58:50 +0800

> On Mon, Jul 06, 2009 at 06:54:11PM -0700, David Miller wrote:
>> 
>> Hmmm, aren't we talking about packets which were protected by either a
>> hash, strong encryption, or both at some point?
> 
> Only if there is no inner NAT, i.e., only if this patch isn't
> needed.  Otherwise
> 
> Source  ---  GW1  ----  GW2  ---  Dest
> 
> the path between Source and GW1, will be unprotected if transport
> mode is used between GW1 and GW2.  The only bit protected by IPsec's
> hash is between GW1 and GW2.
> 
> When the traffic comes back, then the bit between Dest and GW2 will
> be unprotected.  This is why the only safe way to use this would be
> if your traffic is one-way and you only had inner NAT at the other
> end.

I see.  So people are using transport IPSEC + NAT as a kind of
specialized tunnel.

Indeed, there is no way to handle checksums sanely.  The whole
end-to-end protection of the checksum would be entirely subverted
if we fixed it up.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu July 7, 2009, 2:18 a.m. UTC | #6
On Mon, Jul 06, 2009 at 07:02:35PM -0700, David Miller wrote:
>
> Indeed, there is no way to handle checksums sanely.  The whole
> end-to-end protection of the checksum would be entirely subverted
> if we fixed it up.

Exactly, the only safe solution is to use natoa to fix up the
checksums properly (doable in theory, but almost no one actually
uses it in practice, as seen by the fact that it's not even
possible with the current spec of IKEv2), or better yet, use
tunnel mode.  20 bytes is small change on the Internet.

Cheers,
Herbert Xu July 7, 2009, 2:21 a.m. UTC | #7
On Tue, Jul 07, 2009 at 10:18:45AM +0800, Herbert Xu wrote:
> On Mon, Jul 06, 2009 at 07:02:35PM -0700, David Miller wrote:
> >
> > Indeed, there is no way to handle checksums sanely.  The whole
> > end-to-end protection of the checksum would be entirely subverted
> > if we fixed it up.

Oh and if we were going to fix it up, it'd make much more sense
as an explicit netfilter target so people know what they're
getting themselves into.

Cheers,
diff mbox

Patch

diff --git a/include/linux/xfrm.h b/include/linux/xfrm.h
index 2d4ec15..52c96a6 100644
--- a/include/linux/xfrm.h
+++ b/include/linux/xfrm.h
@@ -344,6 +344,7 @@  struct xfrm_usersa_info {
 #define XFRM_STATE_WILDRECV	8
 #define XFRM_STATE_ICMP		16
 #define XFRM_STATE_AF_UNSPEC	32
+#define XFRM_STATE_NATZAPCSUM	64
 };
 
 struct xfrm_usersa_id {
diff --git a/include/net/ip.h b/include/net/ip.h
index 4ac7577..146dcc7 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -405,4 +405,6 @@  int ipv4_doint_and_flush_strategy(ctl_table *table,
 extern int ip_misc_proc_init(void);
 #endif
 
+extern int __inet_gso_send_check(struct sk_buff *skb, int proto);
+
 #endif	/* _IP_H */
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 7f03373..ea31cfb 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1149,7 +1149,6 @@  EXPORT_SYMBOL(inet_sk_rebuild_header);
 static int inet_gso_send_check(struct sk_buff *skb)
 {
 	struct iphdr *iph;
-	struct net_protocol *ops;
 	int proto;
 	int ihl;
 	int err = -EINVAL;
@@ -1166,10 +1165,21 @@  static int inet_gso_send_check(struct sk_buff *skb)
 		goto out;
 
 	__skb_pull(skb, ihl);
-	skb_reset_transport_header(skb);
 	iph = ip_hdr(skb);
 	proto = iph->protocol & (MAX_INET_PROTOS - 1);
-	err = -EPROTONOSUPPORT;
+
+	return __inet_gso_send_check(skb, proto);
+
+out:
+	return err;
+}
+
+int __inet_gso_send_check(struct sk_buff *skb, int proto)
+{
+	struct net_protocol *ops;
+	int err = -EPROTONOSUPPORT;
+
+	skb_reset_transport_header(skb);
 
 	rcu_read_lock();
 	ops = rcu_dereference(inet_protos[proto]);
@@ -1177,9 +1187,9 @@  static int inet_gso_send_check(struct sk_buff *skb)
 		err = ops->gso_send_check(skb);
 	rcu_read_unlock();
 
-out:
 	return err;
 }
+EXPORT_SYMBOL_GPL(__inet_gso_send_check);
 
 static struct sk_buff *inet_gso_segment(struct sk_buff *skb, int features)
 {
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 18bb383..11f1a34 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -227,6 +227,31 @@  error:
 	return err;
 }
 
+static void esp_fix_csum(struct sk_buff *skb, int proto)
+{
+	struct xfrm_state *x = xfrm_input_state(skb);
+
+	/*
+	 * Ignore UDP/TCP checksums in case
+	 * of NAT-T in Transport Mode, or
+	 * perform other post-processing fixes
+	 * as per draft-ietf-ipsec-udp-encaps-06,
+	 * section 3.1.2
+	 */
+	skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+	if (!(x->props.flags & XFRM_STATE_NATZAPCSUM))
+		return;
+
+	/*
+	 * Aiee! This transport-mode packet may need to be forwarded.
+	 * Prepare the checksum for forwarding.  If it fails we'll
+	 * keep quiet about it since we only support this for very
+	 * few protocols.
+	 */
+	__inet_gso_send_check(skb, proto);
+}
+
 static int esp_input_done2(struct sk_buff *skb, int err)
 {
 	struct iphdr *iph;
@@ -253,6 +278,9 @@  static int esp_input_done2(struct sk_buff *skb, int err)
 	if (padlen + 2 + alen >= elen)
 		goto out;
 
+	pskb_trim(skb, skb->len - (padlen + 2 + alen));
+	__skb_pull(skb, hlen);
+
 	/* ... check padding bits here. Silly. :-) */
 
 	iph = ip_hdr(skb);
@@ -284,19 +312,10 @@  static int esp_input_done2(struct sk_buff *skb, int err)
 			 */
 		}
 
-		/*
-		 * 2) ignore UDP/TCP checksums in case
-		 *    of NAT-T in Transport Mode, or
-		 *    perform other post-processing fixes
-		 *    as per draft-ietf-ipsec-udp-encaps-06,
-		 *    section 3.1.2
-		 */
 		if (x->props.mode == XFRM_MODE_TRANSPORT)
-			skb->ip_summed = CHECKSUM_UNNECESSARY;
+			esp_fix_csum(skb, nexthdr[1]);
 	}
 
-	pskb_trim(skb, skb->len - alen - padlen - 2);
-	__skb_pull(skb, hlen);
 	skb_set_transport_header(skb, -ihl);
 
 	err = nexthdr[1];