Message ID: E1LCl3n-00052k-6h@gondolin.me.apana.org.au
State: RFC, archived
Delegated to: David Miller
On Wed, 17 Dec 2008, Herbert Xu wrote:

> Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> >
> > I should have noticed this earlier... :-) The previous solution
> > to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zag
> > patterns, or even worse, a steep downward slope into packet
> > counting because each skb pcount would be truncated to pcount
> > of 2 and then the following fragments of the later portion would
> > restore the window again.
>
> How about just handling it, URG isn't that hard. If anyone is
> bored you can do this for GRO as well.
>
> tcp: Add GSO support for urgent data
>
> When segmenting packets with urgent data we need to clear the
> urgent flag/pointer once we move past the end of the urgent data.
> This was not handled correctly in GSO or (presumably) in existing
> hardware.
>
> This patch adds a new GSO flag for TCP urgent data and also fixes
> its handling in GSO.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

This is certainly nice and possible, but as mentioned earlier, the harder
problem is that urg_ptr is only 16 bits, so there will still be a problem
if the urgent seq-64k mark falls within a super-skb: the tail segments
need URG while the head ones won't. I think we would need out-of-band
means to communicate the extra (17th) bit from TCP if we want to solve
that one. It would be rather easy to change TCP to only do splitting for
that particular skb, though, and keep everything else intact.

...The same consideration applies to GRO as well.
Herbert Xu wrote on Wed, 17 Dec 2008 at 12:18 +1100:
> Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> >
> > I should have noticed this earlier... :-) The previous solution
> > to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zag
> > patterns, or even worse, a steep downward slope into packet
> > counting because each skb pcount would be truncated to pcount
> > of 2 and then the following fragments of the later portion would
> > restore the window again.
>
> How about just handling it, URG isn't that hard. If anyone is
> bored you can do this for GRO as well.
>
> tcp: Add GSO support for urgent data
>
> When segmenting packets with urgent data we need to clear the
> urgent flag/pointer once we move past the end of the urgent data.
> This was not handled correctly in GSO or (presumably) in existing
> hardware.
>
> This patch adds a new GSO flag for TCP urgent data and also fixes
> its handling in GSO.

This would work, but:

1. Currently this code path will never get run, because large
   offloads are avoided both for hardware TSO and software GSO.
   You'd also have to modify the conditionals that guard calls
   to tcp_mss_split_point().

2. Is it worth the trouble? This slows down software GSO slightly.
   I mean, urgent data should be quite rare, so it makes sense to
   treat non-urgent data as the "fast path" and urgent data as the
   "slow path". While Ilpo's approach only slows down the slow path,
   your patch slows down both. No, I don't have any numbers, no. ;)

Anyway, we have to keep the code that avoids large offloads with URG,
because many NICs are broken in this regard (at least all Intel NICs
I've seen so far), so why treat software GSO specially?
Petr Tesarik

> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> Cheers,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index e26f549..4508c48 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -532,9 +532,11 @@ struct net_device
>  #define NETIF_F_GSO_ROBUST	(SKB_GSO_DODGY << NETIF_F_GSO_SHIFT)
>  #define NETIF_F_TSO_ECN		(SKB_GSO_TCP_ECN << NETIF_F_GSO_SHIFT)
>  #define NETIF_F_TSO6		(SKB_GSO_TCPV6 << NETIF_F_GSO_SHIFT)
> +#define NETIF_F_TSO_URG		(SKB_GSO_TCP_URG << NETIF_F_GSO_SHIFT)
>
>  	/* List of features with software fallbacks. */
> -#define NETIF_F_GSO_SOFTWARE	(NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6)
> +#define NETIF_F_GSO_SOFTWARE	(NETIF_F_TSO | NETIF_F_TSO_ECN | \
> +				 NETIF_F_TSO6 | NETIF_F_TSO_URG)
>
>  #define NETIF_F_GEN_CSUM	(NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2725f4e..56e9179 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -188,6 +188,9 @@ enum {
>  	SKB_GSO_TCP_ECN = 1 << 3,
>
>  	SKB_GSO_TCPV6 = 1 << 4,
> +
> +	/* This indicates the tcp segment has URG set. */
> +	SKB_GSO_TCP_URG = 1 << 5,
>  };
>
>  #if BITS_PER_LONG > 32
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 1aa2dc9..33305f6 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1202,6 +1202,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb, int features)
>  		       SKB_GSO_UDP |
>  		       SKB_GSO_DODGY |
>  		       SKB_GSO_TCP_ECN |
> +		       SKB_GSO_TCP_URG |
>  		       0)))
>  		goto out;
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index c5aca0b..e6796a8 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2383,6 +2383,7 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features)
>  	__be32 delta;
>  	unsigned int oldlen;
>  	unsigned int len;
> +	unsigned int urg_ptr;
>
>  	if (!pskb_may_pull(skb, sizeof(*th)))
>  		goto out;
> @@ -2408,6 +2409,7 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features)
>  				SKB_GSO_DODGY |
>  				SKB_GSO_TCP_ECN |
>  				SKB_GSO_TCPV6 |
> +				SKB_GSO_TCP_URG |
>  				0) ||
>  			     !(type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))))
>  			goto out;
> @@ -2429,6 +2431,7 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features)
>  	skb = segs;
>  	th = tcp_hdr(skb);
>  	seq = ntohl(th->seq);
> +	urg_ptr = th->urg ? ntohs(th->urg_ptr) : 0;
>
>  	do {
>  		th->fin = th->psh = 0;
> @@ -2446,6 +2449,14 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features)
>
>  		th->seq = htonl(seq);
>  		th->cwr = 0;
> +
> +		if (urg_ptr <= len)
> +			urg_ptr = 0;
> +		else
> +			urg_ptr -= len;
> +
> +		th->urg = !!urg_ptr;
> +		th->urg_ptr = htons(urg_ptr);
>  	} while (skb->next);
>
>  	delta = htonl(oldlen + (skb->tail - skb->transport_header) +
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index fe3b4bd..d4dc0fa 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -667,6 +667,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
>  				 between(tp->snd_up, tcb->seq + 1, tcb->seq + 0xFFFF))) {
>  		th->urg_ptr = htons(tp->snd_up - tcb->seq);
>  		th->urg = 1;
> +		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_URG;
>  	}
>
>  	tcp_options_write((__be32 *)(th + 1), tp, &opts, &md5_hash_location);
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index 01edac8..c11d3d9 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -746,6 +746,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb, int features)
>  			     SKB_GSO_DODGY |
>  			     SKB_GSO_TCP_ECN |
>  			     SKB_GSO_TCPV6 |
> +			     SKB_GSO_TCP_URG |
>  			     0)))
>  		goto out;

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 17 Dec 2008, Petr Tesarik wrote:

> Herbert Xu wrote on Wed, 17 Dec 2008 at 12:18 +1100:
> > Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> > >
> > > I should have noticed this earlier... :-) The previous solution
> > > to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zag
> > > patterns, or even worse, a steep downward slope into packet
> > > counting because each skb pcount would be truncated to pcount
> > > of 2 and then the following fragments of the later portion would
> > > restore the window again.
> >
> > How about just handling it, URG isn't that hard. If anyone is
> > bored you can do this for GRO as well.
> >
> > tcp: Add GSO support for urgent data
> >
> > When segmenting packets with urgent data we need to clear the
> > urgent flag/pointer once we move past the end of the urgent data.
> > This was not handled correctly in GSO or (presumably) in existing
> > hardware.
> >
> > This patch adds a new GSO flag for TCP urgent data and also fixes
> > its handling in GSO.
>
> This would work, but:
>
> 1. Currently this code path will never get run, because large
>    offloads are avoided both for hardware TSO and software GSO.
>    You'd also have to modify the conditionals that guard calls
>    to tcp_mss_split_point().
>
> 2. Is it worth the trouble? This slows down software GSO slightly.
>    I mean, urgent data should be quite rare, so it makes sense to
>    treat non-urgent data as the "fast path" and urgent data as the
>    "slow path". While Ilpo's approach only slows down the slow path,
>    your patch slows down both.

My intention is to make it slightly more fine-grained on the TCP side
still, so that the affected portion only includes those segments that
really reside within 64k of the urg seq, and the rest will be
unaffected.

> No, I don't have any numbers, no. ;)

To put this in contrast (an example from elsewhere)... Once we got any
SACK blocks, we were doing tcp_fragment for almost every incoming
segment, in practice for the whole window (I made it a lot better in
net-next, though some tcp_fragment()s will still be needed), and the
send side uses tso_fragment, which is less heavy-weight than the
tcp_fragment used on the SACK side.

> Anyway, we have to keep the code that avoids large offloads with URG,
> because many NICs are broken in this regard (at least all Intel NICs
> I've seen so far), so why treat software GSO specially?

All TSO implementations will be subject to the other problem I
mentioned in my response to Herbert. There just aren't enough bits for
them to do things right w.r.t. the 64k-before point.
Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
>
> This is certainly nice and possible, but as mentioned earlier, the harder
> problem is that urg_ptr is only 16 bits, so there will still be a problem
> if the urgent seq-64k mark falls within a super-skb: the tail segments
> need URG while the head ones won't. I think we would need out-of-band
> means to communicate the extra (17th) bit from TCP if we want to solve
> that one. It would be rather easy to change TCP to only do splitting for
> that particular skb, though, and keep everything else intact.

I think you've got it the wrong way around. The test for setting
urg_ptr is:

	if (unlikely(tcp_urg_mode(tp) &&
		     between(tp->snd_up, tcb->seq + 1, tcb->seq + 0xFFFF))) {

So if this is true for the super-packet, then by definition it is also
true for the first packet. In fact, it will be true for the first n
packets where

	tcb->seq + n * mss + 1 <= tp->snd_up

The only time when this will fall apart is if we had super-packets
bigger than 64K. Fortunately or unfortunately that is a long way off.

Cheers,
Petr Tesarik <ptesarik@suse.cz> wrote:
>
> 1. Currently this code path will never get run, because large
>    offloads are avoided both for hardware TSO and software GSO.
>    You'd also have to modify the conditionals that guard calls
>    to tcp_mss_split_point().

Obviously one would revert the existing fixes after this is applied.

> 2. Is it worth the trouble? This slows down software GSO slightly.
>    I mean, urgent data should be quite rare, so it makes sense to
>    treat non-urgent data as the "fast path" and urgent data as the
>    "slow path". While Ilpo's approach only slows down the slow path,
>    your patch slows down both.

Well it might slow down the GSO path slightly, but it speeds up the
TSO generation path accordingly :) I think the cost is negligible in
either case so the benefit of TSO dominates.

> Anyway, we have to keep the code that avoids large offloads with URG,
> because many NICs are broken in this regard (at least all Intel NICs
> I've seen so far), so why treat software GSO specially?

That's not a problem. Unless the NIC sets NETIF_F_TSO_URG (which
nobody does currently because it's a new flag), we'll do the
segmentation in software, but only if the packet is marked as URG.

This is identical to how we handle ECN, which most NICs get wrong
as well.

Cheers,
On Wed, Dec 17, 2008 at 09:13:22PM +1100, Herbert Xu wrote:
> Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> >
> > This is certainly nice and possible, but as mentioned earlier, the harder
> > problem is that urg_ptr is only 16 bits, so there will still be a problem
> > if the urgent seq-64k mark falls within a super-skb: the tail segments
> > need URG while the head ones won't. I think we would need out-of-band
> > means to communicate the extra (17th) bit from TCP if we want to solve
> > that one. It would be rather easy to change TCP to only do splitting for
> > that particular skb, though, and keep everything else intact.
>
> I think you've got it the wrong way around. The test for setting
> urg_ptr is:

Oh I see what you mean now. Yes, this isn't as easy as it appears.

Thanks,
On Wed, Dec 17, 2008 at 09:31:29PM +1100, Herbert Xu wrote:
>
> Oh I see what you mean now. Yes, this isn't as easy as it appears.

Reading RFC 793 again, it seems that what we're doing may not be
correct. It would appear that the intention is to flag urgent mode
as soon as urgent data appears from the user, regardless of whether
it's 64K or more beyond the current sequence number.

The fact that the pointer is 16 bits long is not an issue. We
should be setting it to 0xffff.

The way we do it currently means that if there is loads of data
in the pipe then we don't signal urgent mode at all until we're
within 64K of snd.up, which defeats the whole point of urgent
mode.

Cheers,
Herbert Xu wrote on Wed, 17 Dec 2008 at 21:17 +1100:
> Petr Tesarik <ptesarik@suse.cz> wrote:
> >
> > 1. Currently this code path will never get run, because large
> >    offloads are avoided both for hardware TSO and software GSO.
> >    You'd also have to modify the conditionals that guard calls
> >    to tcp_mss_split_point().
>
> Obviously one would revert the existing fixes after this is applied.
>
> > 2. Is it worth the trouble? This slows down software GSO slightly.
> >    I mean, urgent data should be quite rare, so it makes sense to
> >    treat non-urgent data as the "fast path" and urgent data as the
> >    "slow path". While Ilpo's approach only slows down the slow path,
> >    your patch slows down both.
>
> Well it might slow down the GSO path slightly, but it speeds
> up the TSO generation path accordingly :) I think the cost is
> negligible in either case so the benefit of TSO dominates.

You're talking about the urgent-data case, right? Yes, for urgent data,
correctly implemented GSO will no doubt be faster than splitting the
skb. But GSO will also get (a bit) slower in the non-urgent-data case,
where the slowdown is not compensated by anything. Or did I miss
something obvious?

Petr Tesarik

> > Anyway, we have to keep the code that avoids large offloads with URG,
> > because many NICs are broken in this regard (at least all Intel NICs
> > I've seen so far), so why treat software GSO specially?
>
> That's not a problem. Unless the NIC sets NETIF_F_TSO_URG (which
> nobody does currently because it's a new flag), we'll do the
> segmentation in software, but only if the packet is marked as URG.
>
> This is identical to how we handle ECN which most NICs get wrong
> as well.
>
> Cheers,
On Wed, Dec 17, 2008 at 11:45:13AM +0100, Petr Tesarik wrote:
>
> You're talking about the urgent-data case, right? Yes, for urgent data,
> correctly implemented GSO will no doubt be faster than splitting the
> skb. But GSO will also get (a bit) slower in the non-urgent-data case,
> where the slowdown is not compensated by anything. Or did I miss
> something obvious?

Unless you've got numbers I'm not buying this :)

Cheers,
Herbert Xu wrote on Wed, 17 Dec 2008 at 21:42 +1100:
> On Wed, Dec 17, 2008 at 09:31:29PM +1100, Herbert Xu wrote:
> >
> > Oh I see what you mean now. Yes, this isn't as easy as it appears.
>
> Reading RFC 793 again, it seems that what we're doing may not be
> correct. It would appear that the intention is to flag urgent mode
> as soon as urgent data appears from the user, regardless of whether
> it's 64K or more beyond the current sequence number.
>
> The fact that the pointer is 16 bits long is not an issue. We
> should be setting it to 0xffff.

You might read it this way, but this would have some undesired
consequences in practice. Both Linux and *BSD interpret the offset as
a pointer to the first byte after the "out-of-band" data, so they
remove the immediately preceding byte from the TCP stream (unless you
set MSG_OOB_INLINE, but that's not on by default).

Additionally, whenever the urgent pointer (the absolute offset in the
stream) changes, Linux (and maybe also *BSD -- not tested) sends a
SIGURG to the process.

In short, if you keep urg_ptr at 0xffff while moving forward in the
stream, the receiving side will interpret it as multiple urgent data.
Not quite what you want, I think.

Petr Tesarik

> The way we do it currently means that if there is loads of data
> in the pipe then we don't signal urgent mode at all until we're
> within 64K of snd.up, which defeats the whole point of urgent
> mode.
>
> Cheers,
On Wed, Dec 17, 2008 at 11:52:23AM +0100, Petr Tesarik wrote:
>
> In short, if you keep urg_ptr at 0xffff while moving forward in the
> stream, the receiving side will interpret it as multiple urgent data.
> Not quite what you want, I think.

I just checked NetBSD and guess what, they do exactly what I've
suggested:

	if (SEQ_GT(tp->snd_up, tp->snd_nxt)) {
		u_int32_t urp = tp->snd_up - tp->snd_nxt;
		if (urp > IP_MAXPACKET)
			urp = IP_MAXPACKET;
		th->th_urp = htons((u_int16_t)urp);
		th->th_flags |= TH_URG;
	} else

So if you think this is bad then it's been there for decades :)

Cheers,
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 17 Dec 2008 22:26:00 +1100

> 	if (SEQ_GT(tp->snd_up, tp->snd_nxt)) {
> 		u_int32_t urp = tp->snd_up - tp->snd_nxt;
> 		if (urp > IP_MAXPACKET)
> 			urp = IP_MAXPACKET;
> 		th->th_urp = htons((u_int16_t)urp);
> 		th->th_flags |= TH_URG;
> 	} else
>
> So if you think this is bad then it's been there for decades :)

No, this is a hack the BSD guys added. This code does not appear in
Richard Stevens' TCP/IP Illustrated, Volume 2, as I showed in my other
reply.

And bad or not, you cannot compare the exposure and deployment of the
BSD stack to, say, ours or Microsoft's, especially in such an unusual
circumstance as URG with the urgent pointer outside of the 16-bit
offset limit. Your quick discovery of the retransmit bug nearly proves
this :)