Message ID | 1462410164-1953217-15-git-send-email-tom@herbertland.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On Wed, 2016-05-04 at 18:02 -0700, Tom Herbert wrote: > Signed-off-by: Tom Herbert <tom@herbertland.com> > --- > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h > index 928b456..6a811fa 100644 > --- a/include/linux/skbuff.h > +++ b/include/linux/skbuff.h > @@ -484,6 +484,8 @@ enum { > SKB_GSO_TUNNEL_REMCSUM = 1 << 14, > > SKB_GSO_IP6IP6 = 1 << 15, > + > + SKB_GSO_IP4IP6 = 1 << 16, > }; > Houston, we have a problem. gso_type is 16bit (unsigned short), or maybe I missed something ? struct skb_shared_info { unsigned char nr_frags; __u8 tx_flags; unsigned short gso_size; unsigned short gso_segs; unsigned short gso_type; <<-->>
On Wed, May 4, 2016 at 6:20 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2016-05-04 at 18:02 -0700, Tom Herbert wrote: >> Signed-off-by: Tom Herbert <tom@herbertland.com> >> --- >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h >> index 928b456..6a811fa 100644 >> --- a/include/linux/skbuff.h >> +++ b/include/linux/skbuff.h >> @@ -484,6 +484,8 @@ enum { >> SKB_GSO_TUNNEL_REMCSUM = 1 << 14, >> >> SKB_GSO_IP6IP6 = 1 << 15, >> + >> + SKB_GSO_IP4IP6 = 1 << 16, >> }; >> > > Houston, we have a problem. > > gso_type is 16bit (unsigned short), or maybe I missed something ? > > struct skb_shared_info { > unsigned char nr_frags; > __u8 tx_flags; > unsigned short gso_size; > unsigned short gso_segs; > unsigned short gso_type; <<-->> No, I am pretty sure you have it right. We are out of bits. Also it seems like we are generating a number of duplicate entries. What is the difference between IP6IP6 and SIT over IPv6? I'm not really seeing the difference. I'm wondering if maybe we shouldn't look at the possibly using the IPIP and SIT bits to instead indicate that the frame is encapsulated in an outer IPv4 or outer IPv6 header since we already have TCPV4 and TCPV6 to indicate if the inner network type is a V4 or V6. Then there is no need for this patch set to introduce any new tunnel types to be segmented since all cases should be covered. As far as I can tell SKB_GSO_IPIP/SIT were never really being tested against anyway so we might want to go the IPIPV4 IPIPV6 route instead as that is probably closer to what most device limitations would be. - Alex
On Wed, May 4, 2016 at 8:26 PM, Alexander Duyck <alexander.duyck@gmail.com> wrote: > On Wed, May 4, 2016 at 6:20 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> On Wed, 2016-05-04 at 18:02 -0700, Tom Herbert wrote: >>> Signed-off-by: Tom Herbert <tom@herbertland.com> >>> --- >>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h >>> index 928b456..6a811fa 100644 >>> --- a/include/linux/skbuff.h >>> +++ b/include/linux/skbuff.h >>> @@ -484,6 +484,8 @@ enum { >>> SKB_GSO_TUNNEL_REMCSUM = 1 << 14, >>> >>> SKB_GSO_IP6IP6 = 1 << 15, >>> + >>> + SKB_GSO_IP4IP6 = 1 << 16, >>> }; >>> >> >> Houston, we have a problem. >> >> gso_type is 16bit (unsigned short), or maybe I missed something ? >> >> struct skb_shared_info { >> unsigned char nr_frags; >> __u8 tx_flags; >> unsigned short gso_size; >> unsigned short gso_segs; >> unsigned short gso_type; <<-->> > > No, I am pretty sure you have it right. We are out of bits. > > Also it seems like we are generating a number of duplicate entries. > What is the difference between IP6IP6 and SIT over IPv6? I'm not > really seeing the difference. > > I'm wondering if maybe we shouldn't look at the possibly using the > IPIP and SIT bits to instead indicate that the frame is encapsulated > in an outer IPv4 or outer IPv6 header since we already have TCPV4 and > TCPV6 to indicate if the inner network type is a V4 or V6. Then there > is no need for this patch set to introduce any new tunnel types to be > segmented since all cases should be covered. As far as I can tell > SKB_GSO_IPIP/SIT were never really being tested against anyway so we > might want to go the IPIPV4 IPIPV6 route instead as that is probably > closer to what most device limitations would be. > My worry is that the current public interface means IPIP is IPv4 over IPv4, and SIT means IPv6 over IPv4. There are some drivers advertising offload support for these so I don't think we safely redefine SKB_GSO_IPIP/SIT. For finding more bits in gso_type there are some SKB_GOS_* that are not really types at all but more flags. I'm going try to turn SKB_GSO_DODGY into a skbuff for the purposes here. We'll still have the problem of running out bits next time a new type is added (maybe SKB_GSO_SCTP?) but can probably make room for one or two more. Longer term I think the the solution is too eliminate gso_type altogether as we are once again running into the problem of trying to indicate a combinatorial set of constraints and complex layering in a just few flag bits. Tom > - Alex
On Thu, May 5, 2016 at 9:48 AM, Tom Herbert <tom@herbertland.com> wrote: > On Wed, May 4, 2016 at 8:26 PM, Alexander Duyck > <alexander.duyck@gmail.com> wrote: >> On Wed, May 4, 2016 at 6:20 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >>> On Wed, 2016-05-04 at 18:02 -0700, Tom Herbert wrote: >>>> Signed-off-by: Tom Herbert <tom@herbertland.com> >>>> --- >>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h >>>> index 928b456..6a811fa 100644 >>>> --- a/include/linux/skbuff.h >>>> +++ b/include/linux/skbuff.h >>>> @@ -484,6 +484,8 @@ enum { >>>> SKB_GSO_TUNNEL_REMCSUM = 1 << 14, >>>> >>>> SKB_GSO_IP6IP6 = 1 << 15, >>>> + >>>> + SKB_GSO_IP4IP6 = 1 << 16, >>>> }; >>>> >>> >>> Houston, we have a problem. >>> >>> gso_type is 16bit (unsigned short), or maybe I missed something ? >>> >>> struct skb_shared_info { >>> unsigned char nr_frags; >>> __u8 tx_flags; >>> unsigned short gso_size; >>> unsigned short gso_segs; >>> unsigned short gso_type; <<-->> >> >> No, I am pretty sure you have it right. We are out of bits. >> >> Also it seems like we are generating a number of duplicate entries. >> What is the difference between IP6IP6 and SIT over IPv6? I'm not >> really seeing the difference. >> >> I'm wondering if maybe we shouldn't look at the possibly using the >> IPIP and SIT bits to instead indicate that the frame is encapsulated >> in an outer IPv4 or outer IPv6 header since we already have TCPV4 and >> TCPV6 to indicate if the inner network type is a V4 or V6. Then there >> is no need for this patch set to introduce any new tunnel types to be >> segmented since all cases should be covered. As far as I can tell >> SKB_GSO_IPIP/SIT were never really being tested against anyway so we >> might want to go the IPIPV4 IPIPV6 route instead as that is probably >> closer to what most device limitations would be. >> > My worry is that the current public interface means IPIP is IPv4 over > IPv4, and SIT means IPv6 over IPv4. There are some drivers advertising > offload support for these so I don't think we safely redefine > SKB_GSO_IPIP/SIT. The group is pretty small, and they all advertise support for SIT and IPIP. Even if they don't support an outer IPv6 tunnel it just means we are essentially combining the 2 bits into 1. In the case of the Intel drivers that advertise support for it they could support the change without any issue since the use the header pointers to set the offsets of the inner and outer headers and don't care if they are v4 or v6. > For finding more bits in gso_type there are some SKB_GOS_* that are > not really types at all but more flags. I'm going try to turn > SKB_GSO_DODGY into a skbuff for the purposes here. We'll still have > the problem of running out bits next time a new type is added (maybe > SKB_GSO_SCTP?) but can probably make room for one or two more. Longer > term I think the the solution is too eliminate gso_type altogether as > we are once again running into the problem of trying to indicate a > combinatorial set of constraints and complex layering in a just few > flag bits. Still, I don't think there is much gain to be had by specifying the inner network type twice, once in the outer tunnel segmentation type, and then again in the inner TCP segmentation type. That is why I was thinking it would be better to just specify an IPIPV4 and IPIPV6 since then you can run v4 or v6 on the inside and let TSOV4 or TSOV6 in the hw_enc_features take care of specifying if you can support inner IPV6 or not. - Alex
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index db52030..d30ea91 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -54,8 +54,9 @@ enum { */ NETIF_F_GSO_TUNNEL_REMCSUM_BIT, /* ... TUNNEL with TSO & REMCSUM */ NETIF_F_GSO_IP6IP6_BIT, /* ... IP6IP6 tunnel with TSO */ + NETIF_F_GSO_IP4IP6_BIT, /* ... IP4IP6 tunnel with TSO */ /**/NETIF_F_GSO_LAST = /* last bit, see GSO_MASK */ - NETIF_F_GSO_IP6IP6_BIT, + NETIF_F_GSO_IP4IP6_BIT, NETIF_F_FCOE_CRC_BIT, /* FCoE CRC32 */ NETIF_F_SCTP_CRC_BIT, /* SCTP checksum offload */ @@ -125,6 +126,7 @@ enum { #define NETIF_F_GSO_IPIP __NETIF_F(GSO_IPIP) #define NETIF_F_GSO_IP6IP6 __NETIF_F(GSO_IP6IP6) #define NETIF_F_GSO_SIT __NETIF_F(GSO_SIT) +#define NETIF_F_GSO_IP4IP6 __NETIF_F(GSO_IP4IP6) #define NETIF_F_GSO_UDP_TUNNEL __NETIF_F(GSO_UDP_TUNNEL) #define NETIF_F_GSO_UDP_TUNNEL_CSUM __NETIF_F(GSO_UDP_TUNNEL_CSUM) #define NETIF_F_TSO_MANGLEID __NETIF_F(TSO_MANGLEID) @@ -204,6 +206,7 @@ enum { NETIF_F_GSO_GRE_CSUM | \ NETIF_F_GSO_IPIP | \ NETIF_F_GSO_IP6IP6 | \ + NETIF_F_GSO_IP4IP6 | \ NETIF_F_GSO_SIT | \ NETIF_F_GSO_UDP_TUNNEL | \ NETIF_F_GSO_UDP_TUNNEL_CSUM) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 724b9d5..4fa678a 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4006,6 +4006,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type) BUILD_BUG_ON(SKB_GSO_GRE_CSUM != (NETIF_F_GSO_GRE_CSUM >> NETIF_F_GSO_SHIFT)); BUILD_BUG_ON(SKB_GSO_IPIP != (NETIF_F_GSO_IPIP >> NETIF_F_GSO_SHIFT)); BUILD_BUG_ON(SKB_GSO_IP6IP6 != (NETIF_F_GSO_IP6IP6 >> NETIF_F_GSO_SHIFT)); + BUILD_BUG_ON(SKB_GSO_IP4IP6 != (NETIF_F_GSO_IP4IP6 >> NETIF_F_GSO_SHIFT)); BUILD_BUG_ON(SKB_GSO_SIT != (NETIF_F_GSO_SIT >> NETIF_F_GSO_SHIFT)); BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL != (NETIF_F_GSO_UDP_TUNNEL >> NETIF_F_GSO_SHIFT)); BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL_CSUM != (NETIF_F_GSO_UDP_TUNNEL_CSUM >> NETIF_F_GSO_SHIFT)); diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 928b456..6a811fa 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -484,6 +484,8 @@ enum { SKB_GSO_TUNNEL_REMCSUM = 1 << 14, SKB_GSO_IP6IP6 = 1 << 15, + + SKB_GSO_IP4IP6 = 1 << 16, }; #if BITS_PER_LONG > 32 diff --git a/include/net/inet_common.h b/include/net/inet_common.h index 109e3ee..5d68342 100644 --- a/include/net/inet_common.h +++ b/include/net/inet_common.h @@ -39,6 +39,11 @@ int inet_ctl_sock_create(struct sock **sk, unsigned short family, int inet_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len); +struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff *skb); +int inet_gro_complete(struct sk_buff *skb, int nhoff); +struct sk_buff *inet_gso_segment(struct sk_buff *skb, + netdev_features_t features); + static inline void inet_ctl_sock_destroy(struct sock *sk) { if (sk) diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 7f08d45..9d2bf87 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1192,8 +1192,8 @@ int inet_sk_rebuild_header(struct sock *sk) } EXPORT_SYMBOL(inet_sk_rebuild_header); -static struct sk_buff *inet_gso_segment(struct sk_buff *skb, - netdev_features_t features) +struct sk_buff *inet_gso_segment(struct sk_buff *skb, + netdev_features_t features) { bool udpfrag = false, fixedid = false, encap; struct sk_buff *segs = ERR_PTR(-EINVAL); @@ -1280,9 +1280,9 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb, out: return segs; } +EXPORT_SYMBOL(inet_gso_segment); -static struct sk_buff **inet_gro_receive(struct sk_buff **head, - struct sk_buff *skb) +struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff *skb) { const struct net_offload *ops; struct sk_buff **pp = NULL; @@ -1398,6 +1398,7 @@ out: return pp; } +EXPORT_SYMBOL(inet_gro_receive); static struct sk_buff **ipip_gro_receive(struct sk_buff **head, struct sk_buff *skb) @@ -1449,7 +1450,7 @@ int inet_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len) return -EINVAL; } -static int inet_gro_complete(struct sk_buff *skb, int nhoff) +int inet_gro_complete(struct sk_buff *skb, int nhoff) { __be16 newlen = htons(skb->len - nhoff); struct iphdr *iph = (struct iphdr *)(skb->data + nhoff); @@ -1479,6 +1480,7 @@ out_unlock: return err; } +EXPORT_SYMBOL(inet_gro_complete); static int ipip_gro_complete(struct sk_buff *skb, int nhoff) { diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c index cea42ad..507f701 100644 --- a/net/ipv6/ip6_offload.c +++ b/net/ipv6/ip6_offload.c @@ -16,6 +16,7 @@ #include <net/protocol.h> #include <net/ipv6.h> +#include <net/inet_common.h> #include "ip6_offload.h" @@ -269,6 +270,21 @@ static struct sk_buff **sit_ip6ip6_gro_receive(struct sk_buff **head, return ipv6_gro_receive(head, skb); } +static struct sk_buff **ip4ip6_gro_receive(struct sk_buff **head, + struct sk_buff *skb) +{ + /* Common GRO receive for SIT and IP6IP6 */ + + if (NAPI_GRO_CB(skb)->encap_mark) { + NAPI_GRO_CB(skb)->flush = 1; + return NULL; + } + + NAPI_GRO_CB(skb)->encap_mark = 1; + + return inet_gro_receive(head, skb); +} + static int ipv6_gro_complete(struct sk_buff *skb, int nhoff) { const struct net_offload *ops; @@ -308,6 +324,13 @@ static int ip6ip6_gro_complete(struct sk_buff *skb, int nhoff) return ipv6_gro_complete(skb, nhoff); } +static int ip4ip6_gro_complete(struct sk_buff *skb, int nhoff) +{ + skb->encapsulation = 1; + skb_shinfo(skb)->gso_type |= SKB_GSO_IP4IP6; + return inet_gro_complete(skb, nhoff); +} + static struct packet_offload ipv6_packet_offload __read_mostly = { .type = cpu_to_be16(ETH_P_IPV6), .callbacks = { @@ -325,6 +348,14 @@ static const struct net_offload sit_offload = { }, }; +static const struct net_offload ip4ip6_offload = { + .callbacks = { + .gso_segment = inet_gso_segment, + .gro_receive = ip4ip6_gro_receive, + .gro_complete = ip4ip6_gro_complete, + }, +}; + static const struct net_offload ip6ip6_offload = { .callbacks = { .gso_segment = ipv6_gso_segment, @@ -332,7 +363,6 @@ static const struct net_offload ip6ip6_offload = { .gro_complete = ip6ip6_gro_complete, }, }; - static int __init ipv6_offload_init(void) { @@ -345,6 +375,7 @@ static int __init ipv6_offload_init(void) inet_add_offload(&sit_offload, IPPROTO_IPV6); inet6_add_offload(&ip6ip6_offload, IPPROTO_IPV6); + inet6_add_offload(&ip4ip6_offload, IPPROTO_IPIP); return 0; } diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 0d01cef..7ad31b6 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -897,7 +897,7 @@ drop: static int ip4ip6_rcv(struct sk_buff *skb) { - return ipxip6_rcv(skb, IPPROTO_IP, &tpi_v4, + return ipxip6_rcv(skb, IPPROTO_IPIP, &tpi_v4, ip4ip6_dscp_ecn_decapsulate); } @@ -1187,6 +1187,9 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev) if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK) fl6.flowi6_mark = skb->mark; + if (iptunnel_handle_offloads(skb, SKB_GSO_IP4IP6)) + return -1; + err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu, IPPROTO_IPIP); if (err != 0) {
Signed-off-by: Tom Herbert <tom@herbertland.com> --- include/linux/netdev_features.h | 5 ++++- include/linux/netdevice.h | 1 + include/linux/skbuff.h | 2 ++ include/net/inet_common.h | 5 +++++ net/ipv4/af_inet.c | 12 +++++++----- net/ipv6/ip6_offload.c | 33 ++++++++++++++++++++++++++++++++- net/ipv6/ip6_tunnel.c | 5 ++++- 7 files changed, 55 insertions(+), 8 deletions(-)