
xfrm: UFO + ESP = double fragmentation

Message ID 20160129234424.GC7907@midget.suse.cz
State RFC, archived
Delegated to: David Miller

Commit Message

Jiri Bohac Jan. 29, 2016, 11:44 p.m. UTC
Hi,

I'm seeing wrong fragmentation on locally generated UDPv6 packets
going out over ESP (transport mode):

UFO is turned on on the outgoing interface and MTU is 1500.
When 8 kB is written to a UDP socket, udpv6_sendmsg() calls
ip6_append_data(), which generates a single 8 kB GSO skb.

Through ip6_send_skb() it reaches xfrm_output(). Since
skb_is_gso(skb) is nonzero, xfrm_output_gso() is called.
It immediately segments the skb via skb_gso_segment() and then
calls xfrm_output2() on each individual segment.

This is wrong. RFC4303 says:
	3.3.4.  Fragmentation
	   If necessary, fragmentation is performed after ESP
	   processing within an IPsec implementation.  Thus,
	   transport mode ESP is applied only to whole IP
	   datagrams (not to IP fragments).

Instead, xfrm_output_gso() applies the transform to each segment.
Since the fragmentation header _and_ the ESP headers together no
longer fit in the MTU, the ESP-encapsulated segments are
fragmented for a second time in ip6_finish_output().

The outcome is:
- the original 8 kB UDP packet is split into 6 ESP fragments
- the first 5 ESP fragments are 1508 bytes each, thus fragmented
  again into two fragments
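
The second round of fragmentation can be checked with a little
arithmetic. What follows is a minimal user-space sketch, not kernel
code: the helper names ipv6_frags_needed() and wire_packets() are
hypothetical, and the model assumes that everything after the fixed
40-byte IPv6 header is fragmentable and that the sixth segment fits
in the MTU, as the report above implies.

```c
#include <assert.h>

#define IPV6_HDR_LEN 40u	/* fixed IPv6 header */
#define FRAG_HDR_LEN  8u	/* IPv6 Fragment extension header */

/*
 * Number of on-wire packets needed to carry an IPv6 packet of
 * pkt_len bytes over a link with the given MTU.
 * Simplification (assumption): everything after the fixed IPv6
 * header is treated as the fragmentable part.
 */
static unsigned int ipv6_frags_needed(unsigned int pkt_len, unsigned int mtu)
{
	unsigned int per_frag, payload;

	assert(pkt_len >= IPV6_HDR_LEN);
	assert(mtu >= IPV6_HDR_LEN + FRAG_HDR_LEN + 8);

	if (pkt_len <= mtu)
		return 1;	/* fits, no fragmentation needed */

	/* Every fragment but the last carries a multiple of 8 bytes. */
	per_frag = (mtu - IPV6_HDR_LEN - FRAG_HDR_LEN) & ~7u;
	payload = pkt_len - IPV6_HDR_LEN;

	return (payload + per_frag - 1) / per_frag;	/* round up */
}

/*
 * Total packets on the wire when `oversized` segments of
 * oversized_len bytes exceed the MTU and `fitting` segments do not.
 */
static unsigned int wire_packets(unsigned int oversized,
				 unsigned int oversized_len,
				 unsigned int fitting,
				 unsigned int mtu)
{
	return oversized * ipv6_frags_needed(oversized_len, mtu) + fitting;
}
```

Under these assumptions, ipv6_frags_needed(1508, 1500) yields 2, and
wire_packets(5, 1508, 1, 1500) yields 11 packets for a single 8 kB
write.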

The destination host replies with ICMP parameter problem.

How is this supposed to work?
This hack fixes this specific case:



Is there a situation when xfrm_output_gso() does the right thing?

Thanks,

Comments

Herbert Xu Jan. 30, 2016, 4:21 a.m. UTC | #1
On Sat, Jan 30, 2016 at 12:44:24AM +0100, Jiri Bohac wrote:
>
> Is there a situation when xfrm_output_gso() does the right thing?

Yes, because you've just broken TSO over IPsec.

In fact you're remarkably close to the right solution which is
to avoid xfrm_output_gso for SKB_GSO_UDP packets.

You should also work through all the other types (e.g., tunnels)
one-by-one and determine which ones should be fragmented and
which ones shouldn't.

Cheers,
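
The suggestion above can be modelled in user space. This is only a
sketch of the decision, not a kernel patch: struct model_skb, the
MODEL_GSO_* bits and model_segment_before_xfrm() are hypothetical
stand-ins for struct sk_buff, the kernel's SKB_GSO_* flags and the
check in xfrm_output(), and the flag values are illustrative. Tunnels
and other GSO types are deliberately left out, since each of them
would still have to be examined separately.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's SKB_GSO_* bits. */
enum model_gso_type {
	MODEL_GSO_TCP = 1 << 0,	/* TSO: each segment is a complete packet */
	MODEL_GSO_UDP = 1 << 1,	/* UFO: segments are fragments of one datagram */
};

/* Only the two fields the decision depends on. */
struct model_skb {
	unsigned int gso_size;	/* nonzero means the skb is GSO */
	unsigned int gso_type;	/* bitmask of MODEL_GSO_* */
};

/* Mirrors the idea of skb_is_gso(): a GSO skb has a nonzero gso_size. */
static bool model_is_gso(const struct model_skb *skb)
{
	return skb->gso_size != 0;
}

/*
 * true:  segment first, then transform every segment
 *        (the xfrm_output_gso() path, which TSO relies on).
 * false: transform the whole datagram and let IP fragment the
 *        result afterwards, as RFC 4303 section 3.3.4 describes.
 */
static bool model_segment_before_xfrm(const struct model_skb *skb)
{
	if (!model_is_gso(skb))
		return false;
	if (skb->gso_type & MODEL_GSO_UDP)
		return false;	/* UFO: do not apply ESP to fragments */
	return true;
}
```

In this model a TSO skb still takes the segmenting path, while a UFO
skb is transformed as a whole datagram.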

Patch

--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -198,7 +198,7 @@  int xfrm_output(struct sock *sk, struct sk_buff *skb)
 	int err;
 
 	if (skb_is_gso(skb))
-		return xfrm_output_gso(net, sk, skb);
+		return xfrm_output2(net, sk, skb);
 
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		err = skb_checksum_help(skb);