Message ID | 20150716002350.GA7529@gondor.apana.org.au |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Headers | show |
Am Donnerstag, 16. Juli 2015, 08:23:50 schrieb Herbert Xu: > On Wed, Jul 15, 2015 at 02:25:59PM +0200, Wolfgang Walter wrote: > > Yes. Switching TSO off and leaving GRO on works, too. > > OK, could you please try this patch? Patch works here. Thanks, Wolfgang > > ---8<--- > We need to set the SKB_GSO_SIT bit if we detect a 6-in-4 tunnel > when doing GRO. Otherwise we may throw a packet at TSO hardware > that doesn't know what to do with it. > > Fixes: 19424e052fb4 ("sit: Add gro callbacks to sit_offload") > Reported-by: Wolfgang Walter <linux@stwm.de> > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c > index e893cd1..1252eac 100644 > --- a/net/ipv6/ip6_offload.c > +++ b/net/ipv6/ip6_offload.c > @@ -289,11 +289,21 @@ static struct packet_offload ipv6_packet_offload > __read_mostly = { }, > }; > > +static int sit_gro_complete(struct sk_buff *skb, int nhoff) > +{ > + int err = ipv6_gro_complete(skb, nhoff); > + > + skb->encapsulation = 1; > + skb_shinfo(skb)->gso_type |= SKB_GSO_SIT; > + > + return err; > +} > + > static const struct net_offload sit_offload = { > .callbacks = { > .gso_segment = ipv6_gso_segment, > .gro_receive = ipv6_gro_receive, > - .gro_complete = ipv6_gro_complete, > + .gro_complete = sit_gro_complete, > }, > };
On Thu, Jul 16, 2015 at 12:58:45PM +0200, Wolfgang Walter wrote: > Am Donnerstag, 16. Juli 2015, 08:23:50 schrieb Herbert Xu: > > On Wed, Jul 15, 2015 at 02:25:59PM +0200, Wolfgang Walter wrote: > > > Yes. Switching TSO off and leaving GRO on works, too. > > > > OK, could you please try this patch? > > Patch works here. Thanks for the confirmation. Let's add a tag for patchwork: Tested-by: Wolfgang Walter <linux@stwm.de>
Am Freitag, 17. Juli 2015, 09:56:51 schrieb Herbert Xu: > On Thu, Jul 16, 2015 at 12:58:45PM +0200, Wolfgang Walter wrote: > > Am Donnerstag, 16. Juli 2015, 08:23:50 schrieb Herbert Xu: > > > On Wed, Jul 15, 2015 at 02:25:59PM +0200, Wolfgang Walter wrote: > > > > Yes. Switching TSO off and leaving GRO on works, too. > > > > > > OK, could you please try this patch? > > > > Patch works here. > > Thanks for the confirmation. Let's add a tag for patchwork: > > Tested-by: Wolfgang Walter <linux@stwm.de> It seems that this patch may cause a problem with another one of our routers. Without the patch it had no problem, so I didn't tested it there. With that patch one interface blocks after some time. Not even arp requests get answered. It still receives packets though. Restarting the interface fixes the problem. Switching off gro for the other interface helps. This router is different from the other ones. It does not directly route isatap packets. It may routes isatap packets encapsulated in GRE packets, though. It is itself not an GRE-endpoint. The router does NAT. Basically it routes the GRE-tunnel packets unatted and NATs most of the rest. Not doing NAT and conntrack (and unloading all modules like nf_conntrack_ipv4, nf_defrag_ipv4) does not help. eth0: extern eth1: intern One (IPv4) GRE-tunnel is routed between eth0 und eth1. IPv6 ESP-tunnels are routed between eth0 and eth1 IPv4 UDP/TCP/ICMP from intern is natted with netfilter. eth1 stops sending with the patch after some time disabling gro on eth0 helps disabling tso or gso on eth0 and/or eth1 or both does not help eth0 and eth1 are both intel I350. Regards,
On Fri, Jul 17, 2015 at 05:38:30PM +0200, Wolfgang Walter wrote: > > eth1 stops sending with the patch after some time > disabling gro on eth0 helps > disabling tso or gso on eth0 and/or eth1 or both does not help > > eth0 and eth1 are both intel I350. What does ethtool -k eth1 say? Can you confirm that disabling tso on eth1 does not help? Because the most plausible explanation is that we're feeding some bogus TSO packet to the hardware causing a tx lockup. But in any case if it is a hardware lockup then it's no longer just a pure software bug. No matter what we do in the stack the hardware should not lock up (unless of course we're feeding it something that's completely bogus). If we can't figure this out then the safest solution would be to disable tunnel GRO completely because it's broken as it stands. Cheers,
Am Montag, 20. Juli 2015, 14:14:59 schrieb Herbert Xu: > On Fri, Jul 17, 2015 at 05:38:30PM +0200, Wolfgang Walter wrote: > > eth1 stops sending with the patch after some time > > disabling gro on eth0 helps > > disabling tso or gso on eth0 and/or eth1 or both does not help > > > > eth0 and eth1 are both intel I350. > > What does ethtool -k eth1 say? With TSO enabled: # ethtool -k eth0 Features for eth0: rx-checksumming: on tx-checksumming: on tx-checksum-ipv4: on tx-checksum-ip-generic: off [fixed] tx-checksum-ipv6: on tx-checksum-fcoe-crc: off [fixed] tx-checksum-sctp: on scatter-gather: on tx-scatter-gather: on tx-scatter-gather-fraglist: off [fixed] tcp-segmentation-offload: on tx-tcp-segmentation: on tx-tcp-ecn-segmentation: off [fixed] tx-tcp6-segmentation: on udp-fragmentation-offload: off [fixed] generic-segmentation-offload: on generic-receive-offload: on large-receive-offload: off [fixed] rx-vlan-offload: on tx-vlan-offload: on ntuple-filters: off [fixed] receive-hashing: on highdma: on [fixed] rx-vlan-filter: on [fixed] vlan-challenged: off [fixed] tx-lockless: off [fixed] netns-local: off [fixed] tx-gso-robust: off [fixed] tx-fcoe-segmentation: off [fixed] tx-gre-segmentation: off [fixed] tx-ipip-segmentation: off [fixed] tx-sit-segmentation: off [fixed] tx-udp_tnl-segmentation: off [fixed] fcoe-mtu: off [fixed] tx-nocache-copy: off loopback: off [fixed] rx-fcs: off [fixed] rx-all: off tx-vlan-stag-hw-insert: off [fixed] rx-vlan-stag-hw-parse: off [fixed] rx-vlan-stag-filter: off [fixed] l2-fwd-offload: off [fixed] busy-poll: off [fixed] hw-switch-offload: off [fixed] > > Can you confirm that disabling tso on eth1 does not help? Disabling TSO on eth1 does not help. > > Because the most plausible explanation is that we're feeding > some bogus TSO packet to the hardware causing a tx lockup. I run the unpatched 4.1.2 again since saturday without look. With your patch the network card hangs within 10 minutes or so. On the other hand I run the the patched kernel on serveral other routers (same hardware, by the way) without problems. So maybe the problem is that the former one routes GRE-tunnel-packets which contains ISATAP packets. I don't know how deep GRO/GSO inspects a packet. > > But in any case if it is a hardware lockup then it's no longer > just a pure software bug. No matter what we do in the stack > the hardware should not lock up (unless of course we're feeding > it something that's completely bogus). > > If we can't figure this out then the safest solution would be > to disable tunnel GRO completely because it's broken as it stands. > > Cheers, Regards,
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c index e893cd1..1252eac 100644 --- a/net/ipv6/ip6_offload.c +++ b/net/ipv6/ip6_offload.c @@ -289,11 +289,21 @@ static struct packet_offload ipv6_packet_offload __read_mostly = { }, }; +static int sit_gro_complete(struct sk_buff *skb, int nhoff) +{ + int err = ipv6_gro_complete(skb, nhoff); + + skb->encapsulation = 1; + skb_shinfo(skb)->gso_type |= SKB_GSO_SIT; + + return err; +} + static const struct net_offload sit_offload = { .callbacks = { .gso_segment = ipv6_gso_segment, .gro_receive = ipv6_gro_receive, - .gro_complete = ipv6_gro_complete, + .gro_complete = sit_gro_complete, }, };