Message ID | 20141231133923.GA30248@gondor.apana.org.au
---|---
State | Accepted, archived
Delegated to | David Miller
On Thu, Jan 01, 2015 at 12:39:23AM +1100, Herbert Xu wrote:
>
> Thomas Jarosch reported IPsec TCP stalls when a PMTU event occurs.
>
> In fact the problem was completely unrelated to IPsec. The bug is
> also reproducible if you just disable TSO/GSO.

This raises two interesting questions.

Firstly not many people test non-TSO code paths anymore, so bugs are
likely to persist for a long time there. Perhaps it's time to remove
the non-TSO code path altogether? The GSO code path should provide
enough speed-up in terms of boosting the effective MTU to offset the
cost of copying.

Secondly, why are we dealing with hardware TSO segment limits by
limiting the size of the TSO packet in the TCP stack? Surely in this
case GSO is free since there won't be any copying?

Cheers,
On Thu, 2015-01-01 at 00:42 +1100, Herbert Xu wrote:
> On Thu, Jan 01, 2015 at 12:39:23AM +1100, Herbert Xu wrote:
> >
> > Thomas Jarosch reported IPsec TCP stalls when a PMTU event occurs.
> >
> > In fact the problem was completely unrelated to IPsec. The bug is
> > also reproducible if you just disable TSO/GSO.
>
> This raises two interesting questions.
>
> Firstly not many people test non-TSO code paths anymore so bugs
> are likely to persist for a long time there. Perhaps it's time
> to remove the non-TSO code path altogether? The GSO code path
> should provide enough speed-up in terms of boosting the effective
> MTU to offset the cost of copying.
>
> Secondly why are we dealing with hardware TSO segment limits
> by limiting the size of the TSO packet in the TCP stack? Surely
> in this case GSO is free since there won't be any copying?

It might depend on the device capabilities.

The non-TSO/GSO path is known to be better for devices unable to
perform TX checksumming, as we compute the checksum at the time we
copy data from user to kernel (csum_and_copy_from_user() from
tcp_sendmsg()).

With BQL+TSQ, having to compute the TX checksum means bringing data
into cpu caches a second time right before ndo_start_xmit().

But maybe this gain is very relative in a full blown configuration,
with netfilter / complex qdisc being used.

Thanks Herbert !

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 02 Jan 2015 10:24:00 -0800

> On Thu, 2015-01-01 at 00:42 +1100, Herbert Xu wrote:
>> Firstly not many people test non-TSO code paths anymore so bugs
>> are likely to persist for a long time there. Perhaps it's time
>> to remove the non-TSO code path altogether? The GSO code path
>> should provide enough speed-up in terms of boosting the effective
>> MTU to offset the cost of copying.
>>
>> Secondly why are we dealing with hardware TSO segment limits
>> by limiting the size of the TSO packet in the TCP stack? Surely
>> in this case GSO is free since there won't be any copying?
>
> It might depend on the device capabilities.
>
> The non-TSO/GSO path is known to be better for devices unable to
> perform TX checksumming, as we compute the checksum at the time we
> copy data from user to kernel (csum_and_copy_from_user() from
> tcp_sendmsg()).

Non-SG capable devices suffer in this scenario as well.
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 1 Jan 2015 00:39:23 +1100

> Thomas Jarosch reported IPsec TCP stalls when a PMTU event occurs.
>
> In fact the problem was completely unrelated to IPsec. The bug is
> also reproducible if you just disable TSO/GSO.
>
> The problem is that when the MSS goes down, existing queued packets
> on the TX queue that have not been transmitted yet all look like
> TSO packets and get treated as such.
>
> This then triggers a bug where tcp_mss_split_point tells us to
> generate a zero-sized packet on the TX queue. Once that happens
> we're screwed because the zero-sized packet can never be removed
> by ACKs.
>
> Fixes: 1485348d242 ("tcp: Apply device TSO segment limit earlier")
> Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied and queued up for -stable, thanks Herbert.
On Fri, Jan 02, 2015 at 03:36:55PM -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 02 Jan 2015 10:24:00 -0800
>
> > The non-TSO/GSO path is known to be better for devices unable to
> > perform TX checksumming, as we compute the checksum at the time we
> > copy data from user to kernel (csum_and_copy_from_user() from
> > tcp_sendmsg()).
>
> Non-SG capable devices suffer in this scenario as well.

Yes, I was referring to using GSO on non-SG/non-checksumming devices.
Anything that supports checksum/SG should obviously be using GSO.

IIRC when I first tested this, GSO was basically on par for the
non-SG case, as the overhead of the extra copying was offset by the
benefit of a larger MTU through the stack.

So has anyone actually observed worse performance with GSO on these
devices (you could even test on a GSO-capable device simply by
disabling checksumming)?

Also, the fact that this bug took two years to surface means that
very few people are actually using non-GSO in the real world, as
this bug is easily triggered by a PMTU event.

Cheers,
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 3 Jan 2015 09:01:07 +1100

> So has anyone actually observed worse performance with GSO on these
> devices (you could even test on a GSO-capable device simply by
> disabling checksumming)?

Good question.

> Also the fact that this bug took two years to surface means that
> very few people are actually using non-GSO in the real world as
> this bug is easily triggered by a PMTU event.

I think the rarity of PMTU events on non-VPN'd connections plays a
part in how long it took as well. But I totally accept your point,
for sure.
On Fri, Jan 02, 2015 at 05:06:29PM -0500, David Miller wrote:
>
> I think the rarity of PMTU events on non-VPN'd connections plays
> a part in how long it took as well.

Maybe in your neck of the woods, but certainly in China I observe
loads of PMTU events without involving any VPNs at all :) In fact
even the fibre connections here use PPPoE, so PMTU is sort of
unavoidable.

Cheers,
Hi Herbert,

On Thursday, 1. January 2015 00:39:23 Herbert Xu wrote:
> The problem is that when the MSS goes down, existing queued packets
> on the TX queue that have not been transmitted yet all look like
> TSO packets and get treated as such.
>
> This then triggers a bug where tcp_mss_split_point tells us to
> generate a zero-sized packet on the TX queue. Once that happens
> we're screwed because the zero-sized packet can never be removed
> by ACKs.

Picking up this one again: is there any valid use case for having
zero-sized packets in the TX queue? If not, maybe a WARN_ON() could
be added to the processing of the TX queue. That would help to
prevent future issues like this.

Cheers,
Thomas
On Mon, Jan 19, 2015 at 09:19:48AM -0800, Eric Dumazet wrote:
> Zero sized packets are valid, just take a look at tcpdump ;)
But zero-sized packets in the send queue are not valid and will
not ever be cleaned, which is what triggered the original bug.
Although I'm not sure whether a WARN_ON will actually help since
it won't tell you why there is a zero-sized packet on the queue,
just the fact that there is. You'll know you've got a zero-sized
packet on the queue anyway by looking at the tcpdump.
Cheers,
On Mon, Jan 19, 2015 at 2:36 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Mon, Jan 19, 2015 at 09:19:48AM -0800, Eric Dumazet wrote:
>> Zero sized packets are valid, just take a look at tcpdump ;)
>
> But zero-sized packets in the send queue are not valid and will
> not ever be cleaned, which is what triggered the original bug.

SYN and FIN packets are in the send queue. They are valid and might
have a zero size.
On Mon, Jan 19, 2015 at 02:38:47PM -0800, Eric Dumazet wrote:
>
> SYN and FIN packets are in the send queue.
>
> They are valid and might have a zero size.

Right, obviously I meant packets in between the SYN and the FIN.
Technically SYN/FIN are not really zero-sized since they carry one
bit of information, and yes they do get cleaned while the ones in
the middle will not.

Cheers,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7f18262..65caf8b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2019,7 +2019,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now)))
 			break;
 
-		if (tso_segs == 1) {
+		if (tso_segs == 1 || !max_segs) {
 			if (unlikely(!tcp_nagle_test(tp, skb, mss_now,
 						     (tcp_skb_is_last(sk, skb) ?
 						      nonagle : TCP_NAGLE_PUSH))))
@@ -2032,7 +2032,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		}
 
 		limit = mss_now;
-		if (tso_segs > 1 && !tcp_urg_mode(tp))
+		if (tso_segs > 1 && max_segs && !tcp_urg_mode(tp))
 			limit = tcp_mss_split_point(sk, skb, mss_now,
 						    min_t(unsigned int,
 							  cwnd_quota,