[v3] net: Do not include padding in TCP GRO checksum

Message ID 20131116064611.GA12146@gondor.apana.org.au
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Herbert Xu Nov. 16, 2013, 6:46 a.m. UTC
On Fri, Nov 15, 2013 at 08:10:00PM -0800, Alexander Duyck wrote:
>
> The case I am addressing is padding added by the remote side. 
> Specifically in my case I was seeing TCP frames without options that
> were padded up to 60 bytes from a netperf TCP_RR test.  I messed up the
> padding/checksum logic so it was making the same mistake in the Tx
> checksum logic in the driver that I caught here in GRO.  As a result I
> was seeing checksum errors in wireshark, but noticed they were
> being accepted by the stack as valid.

OK great.  So this isn't normal data that we expect to aggregate.

In that case the simplest solution is to skip the checksum check
altogether.  We only require it if the packet is going to be merged.

So how about something like this?

gro: Only verify TCP checksums for candidates

In some cases we may receive IP packets that are longer than
their stated lengths.  Such packets are never merged in GRO.
However, we may end up computing their checksums incorrectly
and end up allowing packets with a bogus checksum to enter our
stack with the checksum status set as verified.

Since such packets are rare and not performance-critical, this
patch simply skips the checksum verification for them.

Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
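
As an illustration of the failure described above (not part of the
posted patch), here is a minimal user-space sketch.  It assumes a
1-byte TCP_RR payload, so the 21-byte TCP segment is padded to 26
bytes to fill a 60-byte frame, and it models a sender that wrongly
computes the checksum over the padded length.  The helper names,
addresses and flag values are hypothetical; only the one's-complement
arithmetic is standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SADDR  0xc0a80001u  /* 192.168.0.1, arbitrary */
#define DEMO_DADDR  0xc0a80002u  /* 192.168.0.2, arbitrary */
#define TRUE_LEN    21           /* 20-byte TCP header + 1 payload byte (assumed) */
#define PADDED_LEN  26           /* 60 - 14 (Ethernet) - 20 (IPv4) */

/*
 * One's-complement TCP checksum over the IPv4 pseudo-header plus `len`
 * bytes of segment.  Returns 0 when the checksum stored in the segment
 * is consistent with `len`.
 */
static uint16_t tcp4_csum(const uint8_t *seg, size_t len,
			  uint32_t saddr, uint32_t daddr)
{
	uint32_t acc = 0;
	size_t i;

	assert(len <= 0xffff);

	/* Pseudo-header: addresses, protocol (6 = TCP), TCP length. */
	acc += (saddr >> 16) + (saddr & 0xffff);
	acc += (daddr >> 16) + (daddr & 0xffff);
	acc += 6;
	acc += (uint32_t)len;

	/* Segment bytes as big-endian 16-bit words. */
	for (i = 0; i + 1 < len; i += 2)
		acc += ((uint32_t)seg[i] << 8) | seg[i + 1];
	if (i < len)
		acc += (uint32_t)seg[i] << 8;	/* odd trailing byte */

	/* Fold the carries back in, then complement. */
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)~acc;
}

/* Build a padded segment whose checksum was (wrongly) computed over
 * PADDED_LEN, as the buggy Tx path in the report did. */
static void build_padded_segment(uint8_t seg[PADDED_LEN])
{
	uint16_t c;

	memset(seg, 0, PADDED_LEN);
	seg[12] = 0x50;		/* data offset: 5 words, no options */
	seg[13] = 0x18;		/* PSH|ACK, arbitrary */
	seg[20] = 'x';		/* the single payload byte */

	c = tcp4_csum(seg, PADDED_LEN, DEMO_SADDR, DEMO_DADDR);
	seg[16] = (uint8_t)(c >> 8);
	seg[17] = (uint8_t)(c & 0xff);
}
```

Verifying this segment with PADDED_LEN yields 0 (accepted), while
verifying with TRUE_LEN does not, because the pseudo-header length
differs.  A receiver that checks against the frame length rather than
the stated IP length therefore agrees with the buggy sender.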

Thanks,

Comments

Alexander H Duyck Nov. 16, 2013, 5:02 p.m. UTC | #1
On 11/15/2013 10:46 PM, Herbert Xu wrote:
> On Fri, Nov 15, 2013 at 08:10:00PM -0800, Alexander Duyck wrote:
>> The case I am addressing is padding added by the remote side. 
>> Specifically in my case I was seeing TCP frames without options that
>> were padded up to 60 bytes from a netperf TCP_RR test.  I messed up the
>> padding/checksum logic so it was making the same mistake in the Tx
>> checksum logic in the driver that I caught here in GRO.  As a result I
>> was seeing checksum errors in wireshark, but noticed they were
>> being accepted by the stack as valid.
> OK great.  So this isn't normal data that we expect to aggregate.

Sorry, I thought that was obvious.  The checks in the IPv4/IPv6 GRO
functions always flush any frames that contain padding.

> In that case the simplest solution is to skip the checksum check
> altogether.  We only require it if the packet is going to be merged.
>
> So how about something like this?
>
> gro: Only verify TCP checksums for candidates
>
> In some cases we may receive IP packets that are longer than
> their stated lengths.  Such packets are never merged in GRO.
> However, we may end up computing their checksums incorrectly
> and end up allowing packets with a bogus checksum to enter our
> stack with the checksum status set as verified.
>
> Since such packets are rare and not performance-critical, this
> patch simply skips the checksum verification for them.
>
> Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index a2b68a1..55aeec9 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -276,6 +276,10 @@ static struct sk_buff **tcp4_gro_receive(struct sk_buff **head, struct sk_buff *
>  	__wsum wsum;
>  	__sum16 sum;
>  
> +	/* Don't bother verifying checksum if we're going to flush anyway. */
> +	if (NAPI_GRO_CB(skb)->flush)
> +		goto skip_csum;
> +
>  	switch (skb->ip_summed) {
>  	case CHECKSUM_COMPLETE:
>  		if (!tcp_v4_check(skb_gro_len(skb), iph->saddr, iph->daddr,
> @@ -301,6 +305,7 @@ flush:
>  		break;
>  	}
>  
> +skip_csum:
>  	return tcp_gro_receive(head, skb);
>  }
>  
> diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
> index c1097c7..71923d1 100644
> --- a/net/ipv6/tcpv6_offload.c
> +++ b/net/ipv6/tcpv6_offload.c
> @@ -39,6 +39,10 @@ static struct sk_buff **tcp6_gro_receive(struct sk_buff **head,
>  	__wsum wsum;
>  	__sum16 sum;
>  
> +	/* Don't bother verifying checksum if we're going to flush anyway. */
> +	if (NAPI_GRO_CB(skb)->flush)
> +		goto skip_csum;
> +
>  	switch (skb->ip_summed) {
>  	case CHECKSUM_COMPLETE:
>  		if (!tcp_v6_check(skb_gro_len(skb), &iph->saddr, &iph->daddr,
> @@ -65,6 +69,7 @@ flush:
>  		break;
>  	}
>  
> +skip_csum:
>  	return tcp_gro_receive(head, skb);
>  }
>
> Thanks,

This should work.  I was just playing it safe in the patches I was
submitting by trying not to alter the behaviour.  As long as it is safe
to push something with a bad checksum and the flush bit set, I am fine
with this.

That being the case though, why don't we set the flush flag on detecting
a bad checksum and hand it off to tcp_gro_receive instead of returning
NULL?  It seems like it would be in our interest to flush the flow and
then report the bad checksum instead of keeping the flow and handing off
the bad checksum to the stack.

Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu Nov. 17, 2013, 3:17 a.m. UTC | #2
On Sat, Nov 16, 2013 at 09:02:34AM -0800, Alexander Duyck wrote:
>
> That being the case though, why don't we set the flush flag on detecting
> a bad checksum and hand it off to tcp_gro_receive instead of returning
> NULL?  It seems like it would be in our interest to flush the flow and
> then report the bad checksum instead of keeping the flow and handing off
> the bad checksum to the stack.

Because if the TCP checksum is wrong then it may belong to a
different flow.

Cheers,
Alexander H Duyck Nov. 17, 2013, 6:24 p.m. UTC | #3
On 11/16/2013 07:17 PM, Herbert Xu wrote:
> On Sat, Nov 16, 2013 at 09:02:34AM -0800, Alexander Duyck wrote:
>> That being the case though, why don't we set the flush flag on detecting
>> a bad checksum and hand it off to tcp_gro_receive instead of returning
>> NULL?  It seems like it would be in our interest to flush the flow and
>> then report the bad checksum instead of keeping the flow and handing off
>> the bad checksum to the stack.
> Because if the TCP checksum is wrong then it may belong to a
> different flow.
>
> Cheers,

It seems like it would be much more likely that a checksum error
occurring with a padded frame would corrupt the flow identifying data
than one that isn't padded.

I suppose it doesn't really matter though since checksum errors are
probably not going to be all that common of a case to deal with anyway.

Thanks,

Alex
Herbert Xu Nov. 18, 2013, 12:03 a.m. UTC | #4
On Sun, Nov 17, 2013 at 10:24:36AM -0800, Alexander Duyck wrote:
>
> It seems like it would be much more likely that a checksum error
> occurring with a padded frame would corrupt the flow identifying data
> than one that isn't padded.

Well, it doesn't matter, because a packet with an incorrect checksum
will be dropped by the receiving stack anyway, so a little reordering
is not going to do any harm.  After all, to be 100% correct, you'd
have to flush every single flow between those IP addresses, which is
just silly.

Cheers,
Duyck, Alexander H Nov. 18, 2013, 5:43 p.m. UTC | #5
On 11/15/2013 10:46 PM, Herbert Xu wrote:
> On Fri, Nov 15, 2013 at 08:10:00PM -0800, Alexander Duyck wrote:
>>
>> The case I am addressing is padding added by the remote side. 
>> Specifically in my case I was seeing TCP frames without options that
>> were padded up to 60 bytes from a netperf TCP_RR test.  I messed up the
>> padding/checksum logic so it was making the same mistake in the Tx
>> checksum logic in the driver that I caught here in GRO.  As a result I
>> was seeing checksum errors in wireshark, but noticed they were
>> being accepted by the stack as valid.
> 
> OK great.  So this isn't normal data that we expect to aggregate.
> 
> In that case the simplest solution is to skip the checksum check
> altogether.  We only require it if the packet is going to be merged.
> 
> So how about something like this?
> 
> gro: Only verify TCP checksums for candidates
> 
> In some cases we may receive IP packets that are longer than
> their stated lengths.  Such packets are never merged in GRO.
> However, we may end up computing their checksums incorrectly
> and end up allowing packets with a bogus checksum to enter our
> stack with the checksum status set as verified.
> 
> Since such packets are rare and not performance-critical, this
> patch simply skips the checksum verification for them.
> 
> Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index a2b68a1..55aeec9 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -276,6 +276,10 @@ static struct sk_buff **tcp4_gro_receive(struct sk_buff **head, struct sk_buff *
>  	__wsum wsum;
>  	__sum16 sum;
>  
> +	/* Don't bother verifying checksum if we're going to flush anyway. */
> +	if (NAPI_GRO_CB(skb)->flush)
> +		goto skip_csum;
> +
>  	switch (skb->ip_summed) {
>  	case CHECKSUM_COMPLETE:
>  		if (!tcp_v4_check(skb_gro_len(skb), iph->saddr, iph->daddr,
> @@ -301,6 +305,7 @@ flush:
>  		break;
>  	}
>  
> +skip_csum:
>  	return tcp_gro_receive(head, skb);
>  }
>  
> diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
> index c1097c7..71923d1 100644
> --- a/net/ipv6/tcpv6_offload.c
> +++ b/net/ipv6/tcpv6_offload.c
> @@ -39,6 +39,10 @@ static struct sk_buff **tcp6_gro_receive(struct sk_buff **head,
>  	__wsum wsum;
>  	__sum16 sum;
>  
> +	/* Don't bother verifying checksum if we're going to flush anyway. */
> +	if (NAPI_GRO_CB(skb)->flush)
> +		goto skip_csum;
> +
>  	switch (skb->ip_summed) {
>  	case CHECKSUM_COMPLETE:
>  		if (!tcp_v6_check(skb_gro_len(skb), &iph->saddr, &iph->daddr,
> @@ -65,6 +69,7 @@ flush:
>  		break;
>  	}
>  
> +skip_csum:
>  	return tcp_gro_receive(head, skb);
>  }
> 
> Thanks,
> 

I'm not going to have a chance to test this today, but on review it
should fix the issue.

Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
David Miller Nov. 21, 2013, 6:35 p.m. UTC | #6
From: Alexander Duyck <alexander.h.duyck@intel.com>
Date: Mon, 18 Nov 2013 09:43:18 -0800

> I'm not going to have a chance to test this today, but on review it
> should fix the issue.
> 
> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>

I've lost track of this discussion.  Herbert, could you post the patches
I should apply?  I think it was this one and a follow-on simplification
to the checksum handling?

Thanks.
Herbert Xu Nov. 22, 2013, 2:30 a.m. UTC | #7
On Thu, Nov 21, 2013 at 01:35:01PM -0500, David Miller wrote:
> 
> I've lost track of this discussion.  Herbert, could you post the patches
> I should apply?  I think it was this one and a follow-on simplification
> to the checksum handling?

OK, I'll repost.

Patch

diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index a2b68a1..55aeec9 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -276,6 +276,10 @@  static struct sk_buff **tcp4_gro_receive(struct sk_buff **head, struct sk_buff *
 	__wsum wsum;
 	__sum16 sum;
 
+	/* Don't bother verifying checksum if we're going to flush anyway. */
+	if (NAPI_GRO_CB(skb)->flush)
+		goto skip_csum;
+
 	switch (skb->ip_summed) {
 	case CHECKSUM_COMPLETE:
 		if (!tcp_v4_check(skb_gro_len(skb), iph->saddr, iph->daddr,
@@ -301,6 +305,7 @@  flush:
 		break;
 	}
 
+skip_csum:
 	return tcp_gro_receive(head, skb);
 }
 
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index c1097c7..71923d1 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -39,6 +39,10 @@  static struct sk_buff **tcp6_gro_receive(struct sk_buff **head,
 	__wsum wsum;
 	__sum16 sum;
 
+	/* Don't bother verifying checksum if we're going to flush anyway. */
+	if (NAPI_GRO_CB(skb)->flush)
+		goto skip_csum;
+
 	switch (skb->ip_summed) {
 	case CHECKSUM_COMPLETE:
 		if (!tcp_v6_check(skb_gro_len(skb), &iph->saddr, &iph->daddr,
@@ -65,6 +69,7 @@  flush:
 		break;
 	}
 
+skip_csum:
 	return tcp_gro_receive(head, skb);
 }
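
For readers following the control flow, the decision the patch adds
can be modelled in user space as below.  This is only a sketch: the
struct, enum and function names are hypothetical stand-ins, and the
failure path is assumed from the discussion above (the handler returns
NULL instead of calling tcp_gro_receive).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-packet state the patch consults. */
struct gro_pkt {
	bool flush;		/* models NAPI_GRO_CB(skb)->flush */
	bool csum_ok;		/* what the checksum check would report */
	bool csum_verified;	/* models ip_summed == CHECKSUM_UNNECESSARY */
};

enum gro_path {
	GRO_BYPASS,		/* models "return NULL" on a bad checksum */
	GRO_TO_TCP_RECEIVE,	/* models "return tcp_gro_receive(head, skb)" */
};

static enum gro_path gro_csum_gate(struct gro_pkt *p)
{
	assert(p != NULL);

	/* Patched behaviour: a packet already marked for flush is never
	 * merged, so its checksum status is left untouched (skip_csum). */
	if (p->flush)
		return GRO_TO_TCP_RECEIVE;

	if (!p->csum_ok)
		return GRO_BYPASS;

	/* Only merge candidates get marked as verified. */
	p->csum_verified = true;
	return GRO_TO_TCP_RECEIVE;
}
```

In this model a padded packet (flush already set) reaches
tcp_gro_receive with csum_verified still false, so the regular receive
path remains responsible for checking it.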