Message ID | 1366683556-4956-3-git-send-email-horms@verge.net.au |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On Mon, Apr 22, 2013 at 7:19 PM, Simon Horman <horms@verge.net.au> wrote: > In the case where a non-MPLS GSO skb becomes an MPLS GSO skb, via > Open vSwitch's push MPLS action it is desirable to provide segmentation > in software. In this case the original protocol of the skb may have allowed > its checksumming to be offloaded but this may no longer be supported now > the skb is MPLS. Actually it seems to me that this is the likely case. > > In order to allow the checksum to be updated in this case loosen > the rules for recalculating the checksum on in skb_segment(). > > N.B.: I must confess that I am a little unsure of the details of > the implementation of skb_checksum(). But I have observed that this > is necessary as skb_checksum() hits the following: > > if (!hsize && i >= nfrags) { > ... > fskb = fskb->next; > ... > } > ... > if (fskb != skb_shinfo(skb)->frag_list) > ... > > Signed-off-by: Simon Horman <horms@verge.net.au> I'm surprised at the need for changes here since neither vlans nor tunneling protocols needed something similar. Do you know what is special about MPLS that is triggering this? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 23, 2013 at 04:43:58PM -0700, Jesse Gross wrote: > On Mon, Apr 22, 2013 at 7:19 PM, Simon Horman <horms@verge.net.au> wrote: > > In the case where a non-MPLS GSO skb becomes an MPLS GSO skb, via > > Open vSwitch's push MPLS action it is desirable to provide segmentation > > in software. In this case the original protocol of the skb may have allowed > > its checksumming to be offloaded but this may no longer be supported now > > the skb is MPLS. Actually it seems to me that this is the likely case. > > > > In order to allow the checksum to be updated in this case loosen > > the rules for recalculating the checksum on in skb_segment(). > > > > N.B.: I must confess that I am a little unsure of the details of > > the implementation of skb_checksum(). But I have observed that this > > is necessary as skb_checksum() hits the following: > > > > if (!hsize && i >= nfrags) { > > ... > > fskb = fskb->next; > > ... > > } > > ... > > if (fskb != skb_shinfo(skb)->frag_list) > > ... > > > > Signed-off-by: Simon Horman <horms@verge.net.au> > > I'm surprised at the need for changes here since neither vlans nor > tunneling protocols needed something similar. Do you know what is > special about MPLS that is triggering this? After some testing I believe that the answer for GRE is, much to my surprise, that it doesn't work without this patch. At least not for the test case I have been using. To test GRE I set up a machine to receive IP packets and forward them over a GRE (not TEB) tunnel created using the ip command. In that case I see incorrect TCP checksums and then retransmissions when GSO segmentation occurs. A tcpdump is below. With this patch applied the checksums are correct. This seems to be the same behaviour I observed when using Open vSwtich to output a GSO skb after it had been changed from IPv4/TCP to MPLS with an IPv4/TCP payload using an MPLS PUSH action. Without this patch the TCP checksums are incorrect, with it they are calculated correctly. tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes 16:10:54.502631 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 19650, offset 0, flags [DF], proto GRE (47), length 84) 10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 64 (tos 0x0, ttl 63, id 19650, offset 0, flags [DF], proto TCP (6), length 60) 10.3.3.135.54778 > 10.0.98.153.10000: Flags [S], cksum 0x7cb8 (correct), seq 1709735614, win 14560, options [mss 1456,sackOK,TS val 68891260 ecr 0,nop,wscale 7], length 0 16:10:54.502882 00:23:7d:e8:dc:8c > 00:23:7d:e8:5d:a0, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto GRE (47), length 84) 10.3.3.153 > 10.3.3.149: GREv0, Flags [none], proto IPv4 (0x0800), length 64 (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60) 10.0.98.153.10000 > 10.3.3.135.54778: Flags [S.], cksum 0x2e77 (correct), seq 3486876063, ack 1709735615, win 14240, options [mss 1436,sackOK,TS val 1310203 ecr 68891260,nop,wscale 7], length 0 16:10:54.503038 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 63, id 19651, offset 0, flags [DF], proto GRE (47), length 76) 10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 56 (tos 0x0, ttl 63, id 19651, offset 0, flags [DF], proto TCP (6), length 52) 10.3.3.135.54778 > 10.0.98.153.10000: Flags [.], cksum 0x9459 (correct), seq 1, ack 1, win 114, options [nop,nop,TS val 68891260 ecr 1310203], length 0 16:10:54.503181 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 63, id 19652, offset 0, flags [DF], proto GRE (47), length 1500) 10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 1480 (tos 0x0, ttl 63, id 19652, offset 0, flags [DF], proto TCP (6), length 1476) 10.3.3.135.54778 > 10.0.98.153.10000: Flags [.], cksum 0x7fd9 (incorrect -> 0x8ec9), seq 1:1425, ack 1, win 114, options [nop,nop,TS val 68891260 ecr 1310203], length 1424 16:10:54.503277 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 714: (tos 0x0, ttl 63, id 19653, offset 0, flags [DF], proto GRE (47), length 700) 10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 680 (tos 0x0, ttl 63, id 19653, offset 0, flags [DF], proto TCP (6), length 676) 10.3.3.135.54778 > 10.0.98.153.10000: Flags [P.], cksum 0x7cb9 (incorrect -> 0x8c51), seq 1425:2049, ack 1, win 114, options [nop,nop,TS val 68891260 ecr 1310203], length 624 16:10:54.503286 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 63, id 19654, offset 0, flags [DF], proto GRE (47), length 76) 10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 56 (tos 0x0, ttl 63, id 19654, offset 0, flags [DF], proto TCP (6), length 52) 10.3.3.135.54778 > 10.0.98.153.10000: Flags [F.], cksum 0x8c58 (correct), seq 2049, ack 1, win 114, options [nop,nop,TS val 68891260 ecr 1310203], length 0 16:10:54.503414 00:23:7d:e8:dc:8c > 00:23:7d:e8:5d:a0, ethertype IPv4 (0x0800), length 102: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto GRE (47), length 88) 10.3.3.153 > 10.3.3.149: GREv0, Flags [none], proto IPv4 (0x0800), length 68 (tos 0x0, ttl 64, id 31835, offset 0, flags [DF], proto TCP (6), length 64) 10.0.98.153.10000 > 10.3.3.135.54778: Flags [.], cksum 0x84f3 (correct), seq 1, ack 1, win 112, options [nop,nop,TS val 1310203 ecr 68891260,nop,nop,sack 1 {2049:2050}], length 0 16:10:54.704485 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 63, id 19655, offset 0, flags [DF], proto GRE (47), length 1500) 10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 1480 (tos 0x0, ttl 63, id 19655, offset 0, flags [DF], proto TCP (6), length 1476) 10.3.3.135.54778 > 10.0.98.153.10000: Flags [.], cksum 0x8e96 (correct), seq 1:1425, ack 1, win 114, options [nop,nop,TS val 68891311 ecr 1310203], length 1424 16:10:54.704666 00:23:7d:e8:dc:8c > 00:23:7d:e8:5d:a0, ethertype IPv4 (0x0800), length 102: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto GRE (47), length 88) 10.3.3.153 > 10.3.3.149: GREv0, Flags [none], proto IPv4 (0x0800), length 68 (tos 0x0, ttl 64, id 31836, offset 0, flags [DF], proto TCP (6), length 64) 10.0.98.153.10000 > 10.3.3.135.54778: Flags [.], cksum 0x7ee8 (correct), seq 1, ack 1425, win 134, options [nop,nop,TS val 1310253 ecr 68891311,nop,nop,sack 1 {2049:2050}], length 0 16:10:54.704817 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 714: (tos 0x0, ttl 63, id 19656, offset 0, flags [DF], proto GRE (47), length 700) 10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 680 (tos 0x0, ttl 63, id 19656, offset 0, flags [DF], proto TCP (6), length 676) 10.3.3.135.54778 > 10.0.98.153.10000: Flags [P.], cksum 0x8bec (correct), seq 1425:2049, ack 1, win 114, options [nop,nop,TS val 68891311 ecr 1310253], length 624 16:10:54.704961 00:23:7d:e8:dc:8c > 00:23:7d:e8:5d:a0, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto GRE (47), length 76) 10.3.3.153 > 10.3.3.149: GREv0, Flags [none], proto IPv4 (0x0800), length 56 (tos 0x0, ttl 64, id 31837, offset 0, flags [DF], proto TCP (6), length 52) 10.0.98.153.10000 > 10.3.3.135.54778: Flags [.], cksum 0x8bc9 (correct), seq 1, ack 2050, win 156, options [nop,nop,TS val 1310253 ecr 68891311], length 0 16:10:54.704990 00:23:7d:e8:dc:8c > 00:23:7d:e8:5d:a0, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto GRE (47), length 76) 10.3.3.153 > 10.3.3.149: GREv0, Flags [none], proto IPv4 (0x0800), length 56 (tos 0x0, ttl 64, id 31838, offset 0, flags [DF], proto TCP (6), length 52) 10.0.98.153.10000 > 10.3.3.135.54778: Flags [F.], cksum 0x8bc8 (correct), seq 1, ack 2050, win 156, options [nop,nop,TS val 1310253 ecr 68891311], length 0 16:10:54.705164 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 63, id 20581, offset 0, flags [DF], proto GRE (47), length 76) 10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 56 (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 52) 10.3.3.135.54778 > 10.0.98.153.10000: Flags [.], cksum 0x8bf2 (correct), seq 2050, ack 2, win 114, options [nop,nop,TS val 68891311 ecr 1310253], length 0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 25, 2013 at 12:26 AM, Simon Horman <horms@verge.net.au> wrote: > On Tue, Apr 23, 2013 at 04:43:58PM -0700, Jesse Gross wrote: >> On Mon, Apr 22, 2013 at 7:19 PM, Simon Horman <horms@verge.net.au> wrote: >> > In the case where a non-MPLS GSO skb becomes an MPLS GSO skb, via >> > Open vSwitch's push MPLS action it is desirable to provide segmentation >> > in software. In this case the original protocol of the skb may have allowed >> > its checksumming to be offloaded but this may no longer be supported now >> > the skb is MPLS. Actually it seems to me that this is the likely case. >> > >> > In order to allow the checksum to be updated in this case loosen >> > the rules for recalculating the checksum on in skb_segment(). >> > >> > N.B.: I must confess that I am a little unsure of the details of >> > the implementation of skb_checksum(). But I have observed that this >> > is necessary as skb_checksum() hits the following: >> > >> > if (!hsize && i >= nfrags) { >> > ... >> > fskb = fskb->next; >> > ... >> > } >> > ... >> > if (fskb != skb_shinfo(skb)->frag_list) >> > ... >> > >> > Signed-off-by: Simon Horman <horms@verge.net.au> >> >> I'm surprised at the need for changes here since neither vlans nor >> tunneling protocols needed something similar. Do you know what is >> special about MPLS that is triggering this? > > After some testing I believe that the answer for GRE is, much to my > surprise, that it doesn't work without this patch. At least not for the > test case I have been using. > > To test GRE I set up a machine to receive IP packets and forward them over > a GRE (not TEB) tunnel created using the ip command. In that case I see > incorrect TCP checksums and then retransmissions when GSO segmentation > occurs. A tcpdump is below. Hmm, skb_segment() should be independent of protocol but I can see how this is a case that wouldn't get exercised frequently and therefore could have bugs. It seems like the case would be receiving a packet that gets merged using GRO and then imposing some kind of header (but not a single level of vlan since that is stored out of band). I think probably it's necessary to do some more careful analysis to make sure that this change is safe in all cases (or at least expand the commit message since it's not immediately obvious at the moment). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Apr 29, 2013 at 01:01:44PM -0700, Jesse Gross wrote: > On Thu, Apr 25, 2013 at 12:26 AM, Simon Horman <horms@verge.net.au> wrote: > > On Tue, Apr 23, 2013 at 04:43:58PM -0700, Jesse Gross wrote: > >> On Mon, Apr 22, 2013 at 7:19 PM, Simon Horman <horms@verge.net.au> wrote: > >> > In the case where a non-MPLS GSO skb becomes an MPLS GSO skb, via > >> > Open vSwitch's push MPLS action it is desirable to provide segmentation > >> > in software. In this case the original protocol of the skb may have allowed > >> > its checksumming to be offloaded but this may no longer be supported now > >> > the skb is MPLS. Actually it seems to me that this is the likely case. > >> > > >> > In order to allow the checksum to be updated in this case loosen > >> > the rules for recalculating the checksum on in skb_segment(). > >> > > >> > N.B.: I must confess that I am a little unsure of the details of > >> > the implementation of skb_checksum(). But I have observed that this > >> > is necessary as skb_checksum() hits the following: --,- s/skb_checksum/skb_segment/ > >> > > >> > if (!hsize && i >= nfrags) { > >> > ... > >> > fskb = fskb->next; > >> > ... > >> > } > >> > ... > >> > if (fskb != skb_shinfo(skb)->frag_list) > >> > ... > >> > > >> > Signed-off-by: Simon Horman <horms@verge.net.au> > >> > >> I'm surprised at the need for changes here since neither vlans nor > >> tunneling protocols needed something similar. Do you know what is > >> special about MPLS that is triggering this? > > > > After some testing I believe that the answer for GRE is, much to my > > surprise, that it doesn't work without this patch. At least not for the > > test case I have been using. > > > > To test GRE I set up a machine to receive IP packets and forward them over > > a GRE (not TEB) tunnel created using the ip command. In that case I see > > incorrect TCP checksums and then retransmissions when GSO segmentation > > occurs. A tcpdump is below. > > Hmm, skb_segment() should be independent of protocol but I can see how > this is a case that wouldn't get exercised frequently and therefore > could have bugs. It seems like the case would be receiving a packet > that gets merged using GRO and then imposing some kind of header (but > not a single level of vlan since that is stored out of band). Yes, that is what I think too. > I think probably it's necessary to do some more careful analysis to make > sure that this change is safe in all cases (or at least expand the commit > message since it's not immediately obvious at the moment). I will look over things a little more but I think that it is correct so long as the calculation made by can_checksum_protocol(), and the value of features passed to skb_segment() and in turn can_checksum_protocol() is correct. The reason is that in cases where can_checksum_protocol returns true then this change will have no effect on the checksum calculation. It is only in cases where it returns false that there may be an effect. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index f23d136..81c9856 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2854,7 +2854,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features) doffset + tnl_hlen); if (fskb != skb_shinfo(skb)->frag_list) - continue; + goto csum; if (!sg) { nskb->ip_summed = CHECKSUM_NONE; @@ -2918,6 +2918,7 @@ skip_fraglist: nskb->len += nskb->data_len; nskb->truesize += nskb->data_len; +csum: if (!csum) { nskb->csum = skb_checksum(nskb, doffset, nskb->len - doffset, 0);
In the case where a non-MPLS GSO skb becomes an MPLS GSO skb, via Open vSwitch's push MPLS action it is desirable to provide segmentation in software. In this case the original protocol of the skb may have allowed its checksumming to be offloaded but this may no longer be supported now the skb is MPLS. Actually it seems to me that this is the likely case. In order to allow the checksum to be updated in this case loosen the rules for recalculating the checksum on in skb_segment(). N.B.: I must confess that I am a little unsure of the details of the implementation of skb_checksum(). But I have observed that this is necessary as skb_checksum() hits the following: if (!hsize && i >= nfrags) { ... fskb = fskb->next; ... } ... if (fskb != skb_shinfo(skb)->frag_list) ... Signed-off-by: Simon Horman <horms@verge.net.au> --- net/core/skbuff.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)