Message ID | alpine.DEB.2.02.1406041706450.14949@tomh.mtv.corp.google.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, Jun 5, 2014 at 3:20 AM, Tom Herbert <therbert@google.com> wrote: > > Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates > that a device is capable of computing the UDP checksum in the > encapsulating header of a UDP tunnel. Tom, Do we have upstream driver that supports GSO_UDP_TUNNEL_CSUM? did you had such driver/patch while doing this patches? when a driver advertizes that bit, should they look over the xmit path on the new encap_hdr_csum bit? Or. > > include/linux/netdev_features.h | 1 + > include/linux/skbuff.h | 2 ++ > net/ipv4/af_inet.c | 1 + > net/ipv4/tcp_offload.c | 1 + > net/ipv4/udp.c | 40 +++++++++++++++++++--------------------- > net/ipv4/udp_offload.c | 4 +++- > net/ipv6/ip6_offload.c | 1 + > net/ipv6/udp_offload.c | 4 +++- > 8 files changed, 31 insertions(+), 23 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 26, 2014 at 2:05 PM, Or Gerlitz <or.gerlitz@gmail.com> wrote: > On Thu, Jun 5, 2014 at 3:20 AM, Tom Herbert <therbert@google.com> wrote: >> >> Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates >> that a device is capable of computing the UDP checksum in the >> encapsulating header of a UDP tunnel. > > > > Tom, > > Do we have upstream driver that supports GSO_UDP_TUNNEL_CSUM? did you > had such driver/patch while doing this patches? when a driver > advertizes that bit, should they look over the xmit path on the new > encap_hdr_csum bit? > No, no, and encap_hdr_csum should only be set with SKB_GSO_UDP_TUNNEL_CSUM or SKB_GSO_GRE_CSUM. find . -name '*.[ch]' -exec fgrep -l GSO_UDP_TUNNEL_CSUM {} \; returns nothing. find . -name '*.[ch]' -exec fgrep -l GSO_UDP_TUNNEL {} \; returns ./ethernet/intel/i40e/i40e_main.c ./ethernet/broadcom/bnx2x/bnx2x_main.c ./ethernet/qlogic/qlcnic/qlcnic_main.c ./ethernet/mellanox/mlx4/en_netdev.c ./ethernet/emulex/benet/be_main.c > Or. > > >> >> include/linux/netdev_features.h | 1 + >> include/linux/skbuff.h | 2 ++ >> net/ipv4/af_inet.c | 1 + >> net/ipv4/tcp_offload.c | 1 + >> net/ipv4/udp.c | 40 +++++++++++++++++++--------------------- >> net/ipv4/udp_offload.c | 4 +++- >> net/ipv6/ip6_offload.c | 1 + >> net/ipv6/udp_offload.c | 4 +++- >> 8 files changed, 31 insertions(+), 23 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Sep 27, 2014 at 12:57 AM, Tom Herbert <therbert@google.com> wrote: > On Fri, Sep 26, 2014 at 2:05 PM, Or Gerlitz <or.gerlitz@gmail.com> wrote: >> On Thu, Jun 5, 2014 at 3:20 AM, Tom Herbert <therbert@google.com> wrote: >>> Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates >>> that a device is capable of computing the UDP checksum in the >>> encapsulating header of a UDP tunnel. >> Do we have upstream driver that supports GSO_UDP_TUNNEL_CSUM? did you >> had such driver/patch while doing this patches? when a driver >> advertizes that bit, should they look over the xmit path on the new >> encap_hdr_csum bit? > No, no, and encap_hdr_csum should only be set with > SKB_GSO_UDP_TUNNEL_CSUM or SKB_GSO_GRE_CSUM. I'm still trying to dig the bigger picture w.r.t checksum of the outer UDP packet from the patches -- if I got it right, once these patches were picked upstream, there's a scheme where the kernel either computes this checksum or let the device do that - when they advertize NETIF_F_GSO_YYY_CSUM and in that case (skb->encap_hdr_csum == true) should be interpreted as a directive to do that, right? So what happens when the device isn't capable to compute that checksum (e.g they don't set _GSO_UDP_TUNNEL_CSUM) but they do advertize the GSO_UDP_TUNNEL bit? I was worried that we can run into this scheme - the stack computes the outer checksum for the giant 64K UDP chunck that encapsulate a 64K TCP segment, but when the NIC will issue the segmentation, they will likely to just repeat ~40 times (64K/1500) the original udp checksum for the packets they send, which will be treated as bad checksum on the receiving end, bad. > find . -name '*.[ch]' -exec fgrep -l GSO_UDP_TUNNEL_CSUM {} \; > returns nothing. grepping I know, I was just hoping you were able to test such a sensitive change with HW that cause both code parts (outer udp checksum offloaded vs. non-offloaded) to be exercised > find . -name '*.[ch]' -exec fgrep -l GSO_UDP_TUNNEL {} \; > returns > > ./ethernet/intel/i40e/i40e_main.c > ./ethernet/broadcom/bnx2x/bnx2x_main.c > ./ethernet/qlogic/qlcnic/qlcnic_main.c > ./ethernet/mellanox/mlx4/en_netdev.c > ./ethernet/emulex/benet/be_main.c cool, I see here four 40Gbs NICs that support GSO offloading of VXLAN traffic, each of them can serve you for testing new developments you do in that area. Or. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Sep 27, 2014 at 1:39 PM, Or Gerlitz <or.gerlitz@gmail.com> wrote: > On Sat, Sep 27, 2014 at 12:57 AM, Tom Herbert <therbert@google.com> wrote: >> On Fri, Sep 26, 2014 at 2:05 PM, Or Gerlitz <or.gerlitz@gmail.com> wrote: >>> On Thu, Jun 5, 2014 at 3:20 AM, Tom Herbert <therbert@google.com> wrote: > >>>> Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates >>>> that a device is capable of computing the UDP checksum in the >>>> encapsulating header of a UDP tunnel. > >>> Do we have upstream driver that supports GSO_UDP_TUNNEL_CSUM? did you >>> had such driver/patch while doing this patches? when a driver >>> advertizes that bit, should they look over the xmit path on the new >>> encap_hdr_csum bit? > >> No, no, and encap_hdr_csum should only be set with >> SKB_GSO_UDP_TUNNEL_CSUM or SKB_GSO_GRE_CSUM. > > I'm still trying to dig the bigger picture w.r.t checksum of the outer > UDP packet from the patches -- if I got it right, once these patches > were picked upstream, there's a scheme where the kernel either > computes this checksum or let the device do that - when they advertize > NETIF_F_GSO_YYY_CSUM and in that case > (skb->encap_hdr_csum == true) should be interpreted as a directive to > do that, right? NETIF_F_GSO_UDP_CSUM does indicate that device is capable of setting outer UDP checksum. gso_type in an skbuff indicates to driver that the outer checksum needs to be computed (e.g. SKB_GSO_UDP_TUNNEL_CSUM). skb->encap_hdr_csum is only used in software GSO, drivers should not care about this. > > So what happens when the device isn't capable to compute that checksum > (e.g they don't set _GSO_UDP_TUNNEL_CSUM) but they do advertize the > GSO_UDP_TUNNEL bit? > Then for tunnels configured to use outer checksum software GSO would be used. > I was worried that we can run into this scheme - the stack computes > the outer checksum for the giant 64K UDP chunck that encapsulate a 64K > TCP segment, but when the NIC will issue the segmentation, they will > likely to just repeat ~40 times (64K/1500) the original udp checksum > for the packets they send, which will be treated as bad checksum on > the receiving end, bad. > This shouldn't happen since the driver would not advertise SKB_GSO_UDP_TUNNEL_CSUM. We can add some comments in skbuff.h to clarify some of the GSO types. Tom >> find . -name '*.[ch]' -exec fgrep -l GSO_UDP_TUNNEL_CSUM {} \; >> returns nothing. > > grepping I know, I was just hoping you were able to test such a sensitive change > with HW that cause both code parts (outer udp checksum offloaded vs. > non-offloaded) to be exercised > >> find . -name '*.[ch]' -exec fgrep -l GSO_UDP_TUNNEL {} \; >> returns >> >> ./ethernet/intel/i40e/i40e_main.c >> ./ethernet/broadcom/bnx2x/bnx2x_main.c >> ./ethernet/qlogic/qlcnic/qlcnic_main.c >> ./ethernet/mellanox/mlx4/en_netdev.c >> ./ethernet/emulex/benet/be_main.c > > cool, I see here four 40Gbs NICs that support GSO offloading of VXLAN > traffic, each of them can serve you for testing new developments you > do in that area. > > Or. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Sep 29, 2014 at 7:11 AM, Tom Herbert <therbert@google.com> wrote: > On Sat, Sep 27, 2014 at 1:39 PM, Or Gerlitz <or.gerlitz@gmail.com> wrote: >> On Sat, Sep 27, 2014 at 12:57 AM, Tom Herbert <therbert@google.com> wrote: >>> On Fri, Sep 26, 2014 at 2:05 PM, Or Gerlitz <or.gerlitz@gmail.com> wrote: >>>> On Thu, Jun 5, 2014 at 3:20 AM, Tom Herbert <therbert@google.com> wrote: >> >>>>> Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates >>>>> that a device is capable of computing the UDP checksum in the >>>>> encapsulating header of a UDP tunnel. >> >>>> Do we have upstream driver that supports GSO_UDP_TUNNEL_CSUM? did you >>>> had such driver/patch while doing this patches? when a driver >>>> advertizes that bit, should they look over the xmit path on the new >>>> encap_hdr_csum bit? >> >>> No, no, and encap_hdr_csum should only be set with >>> SKB_GSO_UDP_TUNNEL_CSUM or SKB_GSO_GRE_CSUM. >> >> I'm still trying to dig the bigger picture w.r.t checksum of the outer >> UDP packet from the patches -- if I got it right, once these patches >> were picked upstream, there's a scheme where the kernel either >> computes this checksum or let the device do that - when they advertize >> NETIF_F_GSO_YYY_CSUM and in that case >> (skb->encap_hdr_csum == true) should be interpreted as a directive to >> do that, right? > > NETIF_F_GSO_UDP_CSUM does indicate that device is capable of setting > outer UDP checksum. gso_type in an skbuff indicates to driver that the > outer checksum needs to be computed (e.g. SKB_GSO_UDP_TUNNEL_CSUM). > skb->encap_hdr_csum is only used in software GSO, drivers should not > care about this. > >> >> So what happens when the device isn't capable to compute that checksum >> (e.g they don't set _GSO_UDP_TUNNEL_CSUM) but they do advertize the >> GSO_UDP_TUNNEL bit? >> > Then for tunnels configured to use outer checksum software GSO would be used. I see. Can you point on the code that does this job, i.e goes to SW GSO unless the driver advertizes both bits Also, I saw some weird behavior recently (which I haven't fully nailed down, so didn't rush to post that) when I tested vxlan between node/NIC that supported offloads to another node/NIC which doesn't. That is packets in one direction where flying w. udp csum = 0 and in the other direction with udp csum != 0, did you run such test? does it work for you? >> I was worried that we can run into this scheme - the stack computes >> the outer checksum for the giant 64K UDP chunck that encapsulate a 64K >> TCP segment, but when the NIC will issue the segmentation, they will >> likely to just repeat ~40 times (64K/1500) the original udp checksum >> for the packets they send, which will be treated as bad checksum on >> the receiving end, bad. >> > This shouldn't happen since the driver would not advertise > SKB_GSO_UDP_TUNNEL_CSUM. > > We can add some comments in skbuff.h to clarify some of the GSO types. that will be very helpful >>> find . -name '*.[ch]' -exec fgrep -l GSO_UDP_TUNNEL {} \; >>> returns >>> ./ethernet/intel/i40e/i40e_main.c >>> ./ethernet/broadcom/bnx2x/bnx2x_main.c >>> ./ethernet/qlogic/qlcnic/qlcnic_main.c >>> ./ethernet/mellanox/mlx4/en_netdev.c >>> ./ethernet/emulex/benet/be_main.c >> cool, I see here four 40Gbs NICs that support GSO offloading of VXLAN >> traffic, each of them can serve you for testing new developments you >> do in that area. any thoughts here? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index c26d0ec..f1338e0 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -45,6 +45,7 @@ enum { NETIF_F_GSO_IPIP_BIT, /* ... IPIP tunnel with TSO */ NETIF_F_GSO_SIT_BIT, /* ... SIT tunnel with TSO */ NETIF_F_GSO_UDP_TUNNEL_BIT, /* ... UDP TUNNEL with TSO */ + NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT,/* ... UDP TUNNEL with TSO & CSUM */ NETIF_F_GSO_MPLS_BIT, /* ... MPLS segmentation */ /**/NETIF_F_GSO_LAST = /* last bit, see GSO_MASK */ NETIF_F_GSO_MPLS_BIT, diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d8d397a..5a6d10a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -345,6 +345,8 @@ enum { SKB_GSO_UDP_TUNNEL = 1 << 9, SKB_GSO_MPLS = 1 << 10, + + SKB_GSO_UDP_TUNNEL_CSUM = 1 << 11, }; #if BITS_PER_LONG > 32 diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 0e9bb08..0070ab8 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1258,6 +1258,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb, SKB_GSO_SIT | SKB_GSO_TCPV6 | SKB_GSO_UDP_TUNNEL | + SKB_GSO_UDP_TUNNEL_CSUM | SKB_GSO_MPLS | 0))) goto out; diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index d8de7b9..c02f2d2 100644 --- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -61,6 +61,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb, SKB_GSO_SIT | SKB_GSO_MPLS | SKB_GSO_UDP_TUNNEL | + SKB_GSO_UDP_TUNNEL_CSUM | 0) || !(type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)))) goto out; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index a84f676..8d8c33d 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2528,7 +2528,11 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb, int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb); __be16 protocol = skb->protocol; netdev_features_t enc_features; - int outer_hlen; + int udp_offset, outer_hlen; + unsigned int oldlen; + bool need_csum; + + oldlen = (u16)~skb->len; if (unlikely(!pskb_may_pull(skb, tnl_hlen))) goto out; @@ -2540,6 +2544,10 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb, skb->mac_len = skb_inner_network_offset(skb); skb->protocol = htons(ETH_P_TEB); + need_csum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM); + if (need_csum) + skb->encap_hdr_csum = 1; + /* segment inner packet. */ enc_features = skb->dev->hw_enc_features & netif_skb_features(skb); segs = skb_mac_gso_segment(skb, enc_features); @@ -2550,10 +2558,11 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb, } outer_hlen = skb_tnl_header_len(skb); + udp_offset = outer_hlen - tnl_hlen; skb = segs; do { struct udphdr *uh; - int udp_offset = outer_hlen - tnl_hlen; + int len; skb_reset_inner_headers(skb); skb->encapsulation = 1; @@ -2564,31 +2573,20 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb, skb_reset_mac_header(skb); skb_set_network_header(skb, mac_len); skb_set_transport_header(skb, udp_offset); + len = skb->len - udp_offset; uh = udp_hdr(skb); - uh->len = htons(skb->len - udp_offset); - - /* csum segment if tunnel sets skb with csum. */ - if (protocol == htons(ETH_P_IP) && unlikely(uh->check)) { - struct iphdr *iph = ip_hdr(skb); + uh->len = htons(len); - uh->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr, - skb->len - udp_offset, - IPPROTO_UDP, 0); - uh->check = csum_fold(skb_checksum(skb, udp_offset, - skb->len - udp_offset, 0)); - if (uh->check == 0) - uh->check = CSUM_MANGLED_0; + if (need_csum) { + __be32 delta = htonl(oldlen + len); - } else if (protocol == htons(ETH_P_IPV6)) { - struct ipv6hdr *ipv6h = ipv6_hdr(skb); - u32 len = skb->len - udp_offset; + uh->check = ~csum_fold((__force __wsum) + ((__force u32)uh->check + + (__force u32)delta)); + uh->check = gso_make_checksum(skb, ~uh->check); - uh->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, - len, IPPROTO_UDP, 0); - uh->check = csum_fold(skb_checksum(skb, udp_offset, len, 0)); if (uh->check == 0) uh->check = CSUM_MANGLED_0; - skb->ip_summed = CHECKSUM_NONE; } skb->protocol = protocol; diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index 88b4023..5c23f47 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -56,7 +56,8 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, __wsum csum; if (skb->encapsulation && - skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) { + (skb_shinfo(skb)->gso_type & + (SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM))) { segs = skb_udp_tunnel_segment(skb, features); goto out; } @@ -71,6 +72,7 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, if (unlikely(type & ~(SKB_GSO_UDP | SKB_GSO_DODGY | SKB_GSO_UDP_TUNNEL | + SKB_GSO_UDP_TUNNEL_CSUM | SKB_GSO_IPIP | SKB_GSO_GRE | SKB_GSO_MPLS) || !(type & (SKB_GSO_UDP)))) diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c index b2f0915..d54c574 100644 --- a/net/ipv6/ip6_offload.c +++ b/net/ipv6/ip6_offload.c @@ -100,6 +100,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb, SKB_GSO_IPIP | SKB_GSO_SIT | SKB_GSO_UDP_TUNNEL | + SKB_GSO_UDP_TUNNEL_CSUM | SKB_GSO_MPLS | SKB_GSO_TCPV6 | 0))) diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c index b261ee8..79da8b3 100644 --- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -63,6 +63,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, if (unlikely(type & ~(SKB_GSO_UDP | SKB_GSO_DODGY | SKB_GSO_UDP_TUNNEL | + SKB_GSO_UDP_TUNNEL_CSUM | SKB_GSO_GRE | SKB_GSO_IPIP | SKB_GSO_SIT | @@ -76,7 +77,8 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, goto out; } - if (skb->encapsulation && skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) + if (skb->encapsulation && skb_shinfo(skb)->gso_type & + (SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM)) segs = skb_udp_tunnel_segment(skb, features); else { /* Do software UFO. Complete and fill in the UDP checksum as HW cannot
Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates that a device is capable of computing the UDP checksum in the encapsulating header of a UDP tunnel. Signed-off-by: Tom Herbert <therbert@google.com> --- include/linux/netdev_features.h | 1 + include/linux/skbuff.h | 2 ++ net/ipv4/af_inet.c | 1 + net/ipv4/tcp_offload.c | 1 + net/ipv4/udp.c | 40 +++++++++++++++++++--------------------- net/ipv4/udp_offload.c | 4 +++- net/ipv6/ip6_offload.c | 1 + net/ipv6/udp_offload.c | 4 +++- 8 files changed, 31 insertions(+), 23 deletions(-)