Message ID | 1285049787.2323.797.camel@edumazet-laptop |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 21 Sep 2010 08:16:27 +0200 > [PATCH] ip : take care of last fragment in ip_append_data > > While investigating a bit, I found ip_fragment() slow path was taken > because ip_append_data() provides following layout for a send(MTU + > N*(MTU - 20)) syscall : > > - one skb with 1500 (mtu) bytes > - N fragments of 1480 (mtu-20) bytes (before adding IP header) > last fragment gets 17 bytes of trail data because of following bit: > > if (datalen == length + fraggap) > alloclen += rt->dst.trailer_len; > > Then esp4 adds 16 bytes of data (while trailer_len is 17... hmm... > another bug ?) > > In ip_fragment(), we notice last fragment is too big (1496 + 20) > mtu, > so we take slow path, building another skb chain. > > In order to avoid taking slow path, we should correct ip_append_data() > to make sure last fragment has real trail space, under mtu... > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> This patch largely looks fine, but: 1) I want to find out where that "17" tailer_len comes from before applying this, that doesn't make any sense. 2) Even with #1 addressed, this function is tricky so I want to review this patch some more. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mardi 21 septembre 2010 à 16:38 -0700, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Tue, 21 Sep 2010 08:16:27 +0200 > > > [PATCH] ip : take care of last fragment in ip_append_data > > > > While investigating a bit, I found ip_fragment() slow path was taken > > because ip_append_data() provides following layout for a send(MTU + > > N*(MTU - 20)) syscall : > > > > - one skb with 1500 (mtu) bytes > > - N fragments of 1480 (mtu-20) bytes (before adding IP header) > > last fragment gets 17 bytes of trail data because of following bit: > > > > if (datalen == length + fraggap) > > alloclen += rt->dst.trailer_len; > > > > Then esp4 adds 16 bytes of data (while trailer_len is 17... hmm... > > another bug ?) > > > > In ip_fragment(), we notice last fragment is too big (1496 + 20) > mtu, > > so we take slow path, building another skb chain. > > > > In order to avoid taking slow path, we should correct ip_append_data() > > to make sure last fragment has real trail space, under mtu... > > > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > > This patch largely looks fine, but: > > 1) I want to find out where that "17" tailer_len comes from before > applying this, that doesn't make any sense. > > 2) Even with #1 addressed, this function is tricky so I want to review > this patch some more. The "17" (instead of probable 16 need) comes from : net/ipv4/esp4.c line 599 : x->props.trailer_len = align + 1 + crypto_aead_authsize(esp->aead); In my Nick ipsec script case, crypto_aead_blocksize(aead) = 16, crypto_aead_authsize(esp->aead) = 0 -> align = 16 trailer_len = 16 + 1 + 0; I am not sure we need the "+ 1", but I know nothing about this stuff. Same in net/ipv6/esp6.c ? Anyway the last frag problem is for packets with lengths : MTU + N*(MTU - 20) + LAST LAST being from [(MTU - trailer_len) ... MTU], not only MTU as I wrote in changelog -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 22 Sep 2010 06:44:06 +0200 > The "17" (instead of probable 16 need) comes from : > > net/ipv4/esp4.c line 599 : > > x->props.trailer_len = align + 1 + crypto_aead_authsize(esp->aead); > > In my Nick ipsec script case, > crypto_aead_blocksize(aead) = 16, > crypto_aead_authsize(esp->aead) = 0 > > -> align = 16 > trailer_len = 16 + 1 + 0; > > I am not sure we need the "+ 1", but I know nothing about this stuff. > > Same in net/ipv6/esp6.c ? I think the author of this code thought that the alignment was expressed as "N - 1" instead of plain "N". I'll do some research. :-) Thanks Eric. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 21 Sep 2010 08:16:27 +0200 > [PATCH] ip : take care of last fragment in ip_append_data > > While investigating a bit, I found ip_fragment() slow path was taken > because ip_append_data() provides following layout for a send(MTU + > N*(MTU - 20)) syscall : > > - one skb with 1500 (mtu) bytes > - N fragments of 1480 (mtu-20) bytes (before adding IP header) > last fragment gets 17 bytes of trail data because of following bit: > > if (datalen == length + fraggap) > alloclen += rt->dst.trailer_len; > > Then esp4 adds 16 bytes of data (while trailer_len is 17... hmm... > another bug ?) > > In ip_fragment(), we notice last fragment is too big (1496 + 20) > mtu, > so we take slow path, building another skb chain. > > In order to avoid taking slow path, we should correct ip_append_data() > to make sure last fragment has real trail space, under mtu... > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Applied, thanks Eric. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 04b6989..dfd9a0d 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -927,16 +927,19 @@ alloc_new_skb: !(rt->dst.dev->features&NETIF_F_SG)) alloclen = mtu; else - alloclen = datalen + fragheaderlen; + alloclen = fraglen; /* The last fragment gets additional space at tail. * Note, with MSG_MORE we overallocate on fragments, * because we have no idea what fragment will be * the last. */ - if (datalen == length + fraggap) + if (datalen == length + fraggap) { alloclen += rt->dst.trailer_len; - + /* make sure mtu is not reached */ + if (datalen > mtu - fragheaderlen - rt->dst.trailer_len) + datalen -= ALIGN(rt->dst.trailer_len, 8); + } if (transhdrlen) { skb = sock_alloc_send_skb(sk, alloclen + hh_len + 15,