diff mbox

[net-next,2/2] net: Loosen constraints for recalculating checksum in skb_segment()

Message ID 1366683556-4956-3-git-send-email-horms@verge.net.au
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Simon Horman April 23, 2013, 2:19 a.m. UTC
In the case where a non-MPLS GSO skb becomes an MPLS GSO skb, via
Open vSwitch's push MPLS action it is desirable to provide segmentation
in software. In this case the original protocol of the skb may have allowed
its checksumming to be offloaded but this may no longer be supported now
the skb is MPLS. Actually it seems to me that this is the likely case.

In order to allow the checksum to be updated in this case loosen
the rules for recalculating the checksum on in skb_segment().

N.B.: I must confess that I am a little unsure of the details of
the implementation of skb_checksum(). But I have observed that this
is necessary as skb_checksum() hits the following:

		if (!hsize && i >= nfrags) {
			...
			fskb = fskb->next;
			...
		}
		...
		if (fskb != skb_shinfo(skb)->frag_list)
			...

Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/core/skbuff.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Jesse Gross April 23, 2013, 11:43 p.m. UTC | #1
On Mon, Apr 22, 2013 at 7:19 PM, Simon Horman <horms@verge.net.au> wrote:
> In the case where a non-MPLS GSO skb becomes an MPLS GSO skb, via
> Open vSwitch's push MPLS action it is desirable to provide segmentation
> in software. In this case the original protocol of the skb may have allowed
> its checksumming to be offloaded but this may no longer be supported now
> the skb is MPLS. Actually it seems to me that this is the likely case.
>
> In order to allow the checksum to be updated in this case loosen
> the rules for recalculating the checksum on in skb_segment().
>
> N.B.: I must confess that I am a little unsure of the details of
> the implementation of skb_checksum(). But I have observed that this
> is necessary as skb_checksum() hits the following:
>
>                 if (!hsize && i >= nfrags) {
>                         ...
>                         fskb = fskb->next;
>                         ...
>                 }
>                 ...
>                 if (fskb != skb_shinfo(skb)->frag_list)
>                         ...
>
> Signed-off-by: Simon Horman <horms@verge.net.au>

I'm surprised at the need for changes here since neither vlans nor
tunneling protocols needed something similar. Do you know what is
special about MPLS that is triggering this?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simon Horman April 25, 2013, 7:26 a.m. UTC | #2
On Tue, Apr 23, 2013 at 04:43:58PM -0700, Jesse Gross wrote:
> On Mon, Apr 22, 2013 at 7:19 PM, Simon Horman <horms@verge.net.au> wrote:
> > In the case where a non-MPLS GSO skb becomes an MPLS GSO skb, via
> > Open vSwitch's push MPLS action it is desirable to provide segmentation
> > in software. In this case the original protocol of the skb may have allowed
> > its checksumming to be offloaded but this may no longer be supported now
> > the skb is MPLS. Actually it seems to me that this is the likely case.
> >
> > In order to allow the checksum to be updated in this case loosen
> > the rules for recalculating the checksum on in skb_segment().
> >
> > N.B.: I must confess that I am a little unsure of the details of
> > the implementation of skb_checksum(). But I have observed that this
> > is necessary as skb_checksum() hits the following:
> >
> >                 if (!hsize && i >= nfrags) {
> >                         ...
> >                         fskb = fskb->next;
> >                         ...
> >                 }
> >                 ...
> >                 if (fskb != skb_shinfo(skb)->frag_list)
> >                         ...
> >
> > Signed-off-by: Simon Horman <horms@verge.net.au>
> 
> I'm surprised at the need for changes here since neither vlans nor
> tunneling protocols needed something similar. Do you know what is
> special about MPLS that is triggering this?

After some testing I believe that the answer for GRE is, much to my
surprise, that it doesn't work without this patch. At least not for the
test case I have been using.

To test GRE I set up a machine to receive IP packets and forward them over
a GRE (not TEB) tunnel created using the ip command. In that case I see
incorrect TCP checksums and then retransmissions when GSO segmentation
occurs.  A tcpdump is below.

With this patch applied the checksums are correct.

This seems to be the same behaviour I observed when using Open vSwtich to
output a GSO skb after it had been changed from IPv4/TCP to MPLS with an
IPv4/TCP payload using an MPLS PUSH action. Without this patch the TCP
checksums are incorrect, with it they are calculated correctly.


tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
16:10:54.502631 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 19650, offset 0, flags [DF], proto GRE (47), length 84)
    10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 64
	(tos 0x0, ttl 63, id 19650, offset 0, flags [DF], proto TCP (6), length 60)
    10.3.3.135.54778 > 10.0.98.153.10000: Flags [S], cksum 0x7cb8 (correct), seq 1709735614, win 14560, options [mss 1456,sackOK,TS val 68891260 ecr 0,nop,wscale 7], length 0
16:10:54.502882 00:23:7d:e8:dc:8c > 00:23:7d:e8:5d:a0, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto GRE (47), length 84)
    10.3.3.153 > 10.3.3.149: GREv0, Flags [none], proto IPv4 (0x0800), length 64
	(tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.98.153.10000 > 10.3.3.135.54778: Flags [S.], cksum 0x2e77 (correct), seq 3486876063, ack 1709735615, win 14240, options [mss 1436,sackOK,TS val 1310203 ecr 68891260,nop,wscale 7], length 0
16:10:54.503038 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 63, id 19651, offset 0, flags [DF], proto GRE (47), length 76)
    10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 56
	(tos 0x0, ttl 63, id 19651, offset 0, flags [DF], proto TCP (6), length 52)
    10.3.3.135.54778 > 10.0.98.153.10000: Flags [.], cksum 0x9459 (correct), seq 1, ack 1, win 114, options [nop,nop,TS val 68891260 ecr 1310203], length 0
16:10:54.503181 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 63, id 19652, offset 0, flags [DF], proto GRE (47), length 1500)
    10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 1480
	(tos 0x0, ttl 63, id 19652, offset 0, flags [DF], proto TCP (6), length 1476)
    10.3.3.135.54778 > 10.0.98.153.10000: Flags [.], cksum 0x7fd9 (incorrect -> 0x8ec9), seq 1:1425, ack 1, win 114, options [nop,nop,TS val 68891260 ecr 1310203], length 1424
16:10:54.503277 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 714: (tos 0x0, ttl 63, id 19653, offset 0, flags [DF], proto GRE (47), length 700)
    10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 680
	(tos 0x0, ttl 63, id 19653, offset 0, flags [DF], proto TCP (6), length 676)
    10.3.3.135.54778 > 10.0.98.153.10000: Flags [P.], cksum 0x7cb9 (incorrect -> 0x8c51), seq 1425:2049, ack 1, win 114, options [nop,nop,TS val 68891260 ecr 1310203], length 624
16:10:54.503286 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 63, id 19654, offset 0, flags [DF], proto GRE (47), length 76)
    10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 56
	(tos 0x0, ttl 63, id 19654, offset 0, flags [DF], proto TCP (6), length 52)
    10.3.3.135.54778 > 10.0.98.153.10000: Flags [F.], cksum 0x8c58 (correct), seq 2049, ack 1, win 114, options [nop,nop,TS val 68891260 ecr 1310203], length 0
16:10:54.503414 00:23:7d:e8:dc:8c > 00:23:7d:e8:5d:a0, ethertype IPv4 (0x0800), length 102: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto GRE (47), length 88)
    10.3.3.153 > 10.3.3.149: GREv0, Flags [none], proto IPv4 (0x0800), length 68
	(tos 0x0, ttl 64, id 31835, offset 0, flags [DF], proto TCP (6), length 64)
    10.0.98.153.10000 > 10.3.3.135.54778: Flags [.], cksum 0x84f3 (correct), seq 1, ack 1, win 112, options [nop,nop,TS val 1310203 ecr 68891260,nop,nop,sack 1 {2049:2050}], length 0
16:10:54.704485 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 1514: (tos 0x0, ttl 63, id 19655, offset 0, flags [DF], proto GRE (47), length 1500)
    10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 1480
	(tos 0x0, ttl 63, id 19655, offset 0, flags [DF], proto TCP (6), length 1476)
    10.3.3.135.54778 > 10.0.98.153.10000: Flags [.], cksum 0x8e96 (correct), seq 1:1425, ack 1, win 114, options [nop,nop,TS val 68891311 ecr 1310203], length 1424
16:10:54.704666 00:23:7d:e8:dc:8c > 00:23:7d:e8:5d:a0, ethertype IPv4 (0x0800), length 102: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto GRE (47), length 88)
    10.3.3.153 > 10.3.3.149: GREv0, Flags [none], proto IPv4 (0x0800), length 68
	(tos 0x0, ttl 64, id 31836, offset 0, flags [DF], proto TCP (6), length 64)
    10.0.98.153.10000 > 10.3.3.135.54778: Flags [.], cksum 0x7ee8 (correct), seq 1, ack 1425, win 134, options [nop,nop,TS val 1310253 ecr 68891311,nop,nop,sack 1 {2049:2050}], length 0
16:10:54.704817 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 714: (tos 0x0, ttl 63, id 19656, offset 0, flags [DF], proto GRE (47), length 700)
    10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 680
	(tos 0x0, ttl 63, id 19656, offset 0, flags [DF], proto TCP (6), length 676)
    10.3.3.135.54778 > 10.0.98.153.10000: Flags [P.], cksum 0x8bec (correct), seq 1425:2049, ack 1, win 114, options [nop,nop,TS val 68891311 ecr 1310253], length 624
16:10:54.704961 00:23:7d:e8:dc:8c > 00:23:7d:e8:5d:a0, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto GRE (47), length 76)
    10.3.3.153 > 10.3.3.149: GREv0, Flags [none], proto IPv4 (0x0800), length 56
	(tos 0x0, ttl 64, id 31837, offset 0, flags [DF], proto TCP (6), length 52)
    10.0.98.153.10000 > 10.3.3.135.54778: Flags [.], cksum 0x8bc9 (correct), seq 1, ack 2050, win 156, options [nop,nop,TS val 1310253 ecr 68891311], length 0
16:10:54.704990 00:23:7d:e8:dc:8c > 00:23:7d:e8:5d:a0, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto GRE (47), length 76)
    10.3.3.153 > 10.3.3.149: GREv0, Flags [none], proto IPv4 (0x0800), length 56
	(tos 0x0, ttl 64, id 31838, offset 0, flags [DF], proto TCP (6), length 52)
    10.0.98.153.10000 > 10.3.3.135.54778: Flags [F.], cksum 0x8bc8 (correct), seq 1, ack 2050, win 156, options [nop,nop,TS val 1310253 ecr 68891311], length 0
16:10:54.705164 00:23:7d:e8:5d:a0 > 00:23:7d:e8:dc:8c, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 63, id 20581, offset 0, flags [DF], proto GRE (47), length 76)
    10.3.3.149 > 10.3.3.153: GREv0, Flags [none], proto IPv4 (0x0800), length 56
	(tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    10.3.3.135.54778 > 10.0.98.153.10000: Flags [.], cksum 0x8bf2 (correct), seq 2050, ack 2, win 114, options [nop,nop,TS val 68891311 ecr 1310253], length 0




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesse Gross April 29, 2013, 8:01 p.m. UTC | #3
On Thu, Apr 25, 2013 at 12:26 AM, Simon Horman <horms@verge.net.au> wrote:
> On Tue, Apr 23, 2013 at 04:43:58PM -0700, Jesse Gross wrote:
>> On Mon, Apr 22, 2013 at 7:19 PM, Simon Horman <horms@verge.net.au> wrote:
>> > In the case where a non-MPLS GSO skb becomes an MPLS GSO skb, via
>> > Open vSwitch's push MPLS action it is desirable to provide segmentation
>> > in software. In this case the original protocol of the skb may have allowed
>> > its checksumming to be offloaded but this may no longer be supported now
>> > the skb is MPLS. Actually it seems to me that this is the likely case.
>> >
>> > In order to allow the checksum to be updated in this case loosen
>> > the rules for recalculating the checksum on in skb_segment().
>> >
>> > N.B.: I must confess that I am a little unsure of the details of
>> > the implementation of skb_checksum(). But I have observed that this
>> > is necessary as skb_checksum() hits the following:
>> >
>> >                 if (!hsize && i >= nfrags) {
>> >                         ...
>> >                         fskb = fskb->next;
>> >                         ...
>> >                 }
>> >                 ...
>> >                 if (fskb != skb_shinfo(skb)->frag_list)
>> >                         ...
>> >
>> > Signed-off-by: Simon Horman <horms@verge.net.au>
>>
>> I'm surprised at the need for changes here since neither vlans nor
>> tunneling protocols needed something similar. Do you know what is
>> special about MPLS that is triggering this?
>
> After some testing I believe that the answer for GRE is, much to my
> surprise, that it doesn't work without this patch. At least not for the
> test case I have been using.
>
> To test GRE I set up a machine to receive IP packets and forward them over
> a GRE (not TEB) tunnel created using the ip command. In that case I see
> incorrect TCP checksums and then retransmissions when GSO segmentation
> occurs.  A tcpdump is below.

Hmm, skb_segment() should be independent of protocol but I can see how
this is a case that wouldn't get exercised frequently and therefore
could have bugs. It seems like the case would be receiving a packet
that gets merged using GRO and then imposing some kind of header (but
not a single level of vlan since that is stored out of band). I think
probably it's necessary to do some more careful analysis to make sure
that this change is safe in all cases (or at least expand the commit
message since it's not immediately obvious at the moment).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simon Horman April 30, 2013, 3:17 a.m. UTC | #4
On Mon, Apr 29, 2013 at 01:01:44PM -0700, Jesse Gross wrote:
> On Thu, Apr 25, 2013 at 12:26 AM, Simon Horman <horms@verge.net.au> wrote:
> > On Tue, Apr 23, 2013 at 04:43:58PM -0700, Jesse Gross wrote:
> >> On Mon, Apr 22, 2013 at 7:19 PM, Simon Horman <horms@verge.net.au> wrote:
> >> > In the case where a non-MPLS GSO skb becomes an MPLS GSO skb, via
> >> > Open vSwitch's push MPLS action it is desirable to provide segmentation
> >> > in software. In this case the original protocol of the skb may have allowed
> >> > its checksumming to be offloaded but this may no longer be supported now
> >> > the skb is MPLS. Actually it seems to me that this is the likely case.
> >> >
> >> > In order to allow the checksum to be updated in this case loosen
> >> > the rules for recalculating the checksum on in skb_segment().
> >> >
> >> > N.B.: I must confess that I am a little unsure of the details of
> >> > the implementation of skb_checksum(). But I have observed that this
> >> > is necessary as skb_checksum() hits the following:

--,- s/skb_checksum/skb_segment/

> >> >
> >> >                 if (!hsize && i >= nfrags) {
> >> >                         ...
> >> >                         fskb = fskb->next;
> >> >                         ...
> >> >                 }
> >> >                 ...
> >> >                 if (fskb != skb_shinfo(skb)->frag_list)
> >> >                         ...
> >> >
> >> > Signed-off-by: Simon Horman <horms@verge.net.au>
> >>
> >> I'm surprised at the need for changes here since neither vlans nor
> >> tunneling protocols needed something similar. Do you know what is
> >> special about MPLS that is triggering this?
> >
> > After some testing I believe that the answer for GRE is, much to my
> > surprise, that it doesn't work without this patch. At least not for the
> > test case I have been using.
> >
> > To test GRE I set up a machine to receive IP packets and forward them over
> > a GRE (not TEB) tunnel created using the ip command. In that case I see
> > incorrect TCP checksums and then retransmissions when GSO segmentation
> > occurs.  A tcpdump is below.
> 
> Hmm, skb_segment() should be independent of protocol but I can see how
> this is a case that wouldn't get exercised frequently and therefore
> could have bugs. It seems like the case would be receiving a packet
> that gets merged using GRO and then imposing some kind of header (but
> not a single level of vlan since that is stored out of band).

Yes, that is what I think too.

> I think probably it's necessary to do some more careful analysis to make
> sure that this change is safe in all cases (or at least expand the commit
> message since it's not immediately obvious at the moment).

I will look over things a little more but I think that it is
correct so long as the calculation made by can_checksum_protocol(),
and the value of features passed to skb_segment() and in turn
can_checksum_protocol() is correct.

The reason is that in cases where can_checksum_protocol returns true
then this change will have no effect on the checksum calculation.
It is only in cases where it returns false that there may be an effect.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f23d136..81c9856 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2854,7 +2854,7 @@  struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 						 doffset + tnl_hlen);
 
 		if (fskb != skb_shinfo(skb)->frag_list)
-			continue;
+			goto csum;
 
 		if (!sg) {
 			nskb->ip_summed = CHECKSUM_NONE;
@@ -2918,6 +2918,7 @@  skip_fraglist:
 		nskb->len += nskb->data_len;
 		nskb->truesize += nskb->data_len;
 
+csum:
 		if (!csum) {
 			nskb->csum = skb_checksum(nskb, doffset,
 						  nskb->len - doffset, 0);