diff mbox

[9/9,v2] net: Add sysctl to trust checksum_complete

Message ID alpine.DEB.2.02.1404261402390.27405@tomh.mtv.corp.google.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Tom Herbert April 26, 2014, 9:26 p.m. UTC
Currently if a device provides CHECKSUM_COMPLETE but the checksum
is calculated to be invalid we recompute the checksum and try
again in software. On the other hand, if device returns
CHECKSUM_UNNECESSARY we implicitly trust it and don't verify what it
did. This seems backwards!

Add a sysctl to trust the device and report an invalid checksum when
CHECKSUM_COMPLETE shows it is incorrect. sysctl defaults to enabled.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/skbuff.h     | 7 ++++++-
 net/core/dev.c             | 3 +++
 net/core/sysctl_net_core.c | 7 +++++++
 3 files changed, 16 insertions(+), 1 deletion(-)

Comments

Ben Hutchings April 26, 2014, 9:58 p.m. UTC | #1
On Sat, 2014-04-26 at 14:26 -0700, Tom Herbert wrote:
> Currently if a device provides CHECKSUM_COMPLETE but the checksum
> is calculated to be invalid we recompute the checksum and try
> again in software. On the other hand, if device returns
> CHECKSUM_UNNECESSARY we implicitly trust it and don't verify what it
> did. This seems backwards!
> 
> Add a sysctl to trust the device and report an invalid checksum when
> CHECKSUM_COMPLETE shows it is incorrect. sysctl defaults to enabled.
[...]

I think the sysctl and variable names need an '_error' or '_err' on the
end, to distinguish this from the NETIF_F_RXCSUM feature flag.

The new sysctl also needs to be documented, again making that
distinction clear.

Finally, if we're going to start trusting device checksum errors,
perhaps there should be a way to indicate a checksum error from a device
that only reports a checksum good/bad flag.  I do understand that such
devices are more likely to have bugs in checksum validation, though.

Ben.
Stephen Hemminger April 27, 2014, 2:31 a.m. UTC | #2
On Sat, 26 Apr 2014 14:26:35 -0700 (PDT)
Tom Herbert <therbert@google.com> wrote:

> Currently if a device provides CHECKSUM_COMPLETE but the checksum
> is calculated to be invalid we recompute the checksum and try
> again in software. On the other hand, if device returns
> CHECKSUM_UNNECESSARY we implicitly trust it and don't verify what it
> did. This seems backwards!
> 
> Add a sysctl to trust the device and report an invalid checksum when
> CHECKSUM_COMPLETE shows it is incorrect. sysctl defaults to enabled.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---

NO. Make one choice and do it consistently.

Papering over driver bugs or design confusion with a sysctl is not a
reasonable choice.

If some device (or code path) has invalid checksum logic, it should
be reported once and go ahead and fix it in software. The problem with
CHECKSUM_UNNECESSARY is that there is no way to check that the device
is broken without computing the checksum (catch-22).

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Herbert April 27, 2014, 5:03 p.m. UTC | #3
On Sat, Apr 26, 2014 at 7:31 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Sat, 26 Apr 2014 14:26:35 -0700 (PDT)
> Tom Herbert <therbert@google.com> wrote:
>
>> Currently if a device provides CHECKSUM_COMPLETE but the checksum
>> is calculated to be invalid we recompute the checksum and try
>> again in software. On the other hand, if device returns
>> CHECKSUM_UNNECESSARY we implicitly trust it and don't verify what it
>> did. This seems backwards!
>>
>> Add a sysctl to trust the device and report an invalid checksum when
>> CHECKSUM_COMPLETE shows it is incorrect. sysctl defaults to enabled.
>>
>> Signed-off-by: Tom Herbert <therbert@google.com>
>> ---
>
> NO. Make one choice and do it consistently.
>
> Papering over driver bugs or design confusion with a sysctl is not a
> reasonable choice.
>
> If some device (or code path) has invalid checksum logic, it should
> be reported once and go ahead and fix it in software. The problem with
> CHECKSUM_UNNECESSARY is that there is no way to check that the device
> is broken without computing the checksum (catch-22).
>
Unlike CHECKSUM_UNNECESSARY, CHECKSUM_COMPLETE provides information
for both valid and invalid checksums. Also, CHECKSUM_COMPLETE is well
defined and specific as to what the device and driver are returning
and I have to believe should be easier to get right. So I don't see
any reason why we shouldn't use the negative information returned by
CHECKSUM_COMPLETE; always recomputing an invalid checksum in SW is a
waste of CPU cycles and could become a basis for DOS attack at high
speeds.

In the unlikely event that the HW (or driver) is incorrectly computing
the checksum this would manifest itself as checksum errors. This is
quite debuggable, and the sysctl would be useful to narrow down
whether the issue is in checksum calculation logic or actual bad
checksums. Compare this to incorrect reporting of CHECKSUM_UNNECESSARY
which could result in packets with errors being accepted without any
detection-- it might take years to identify such a problem!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 28, 2014, 3:32 a.m. UTC | #4
From: Tom Herbert <therbert@google.com>
Date: Sun, 27 Apr 2014 10:03:48 -0700

> On Sat, Apr 26, 2014 at 7:31 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> On Sat, 26 Apr 2014 14:26:35 -0700 (PDT)
>> Tom Herbert <therbert@google.com> wrote:
>>
>>> Currently if a device provides CHECKSUM_COMPLETE but the checksum
>>> is calculated to be invalid we recompute the checksum and try
>>> again in software. On the other hand, if device returns
>>> CHECKSUM_UNNECESSARY we implicitly trust it and don't verify what it
>>> did. This seems backwards!
>>>
>>> Add a sysctl to trust the device and report an invalid checksum when
>>> CHECKSUM_COMPLETE shows it is incorrect. sysctl defaults to enabled.
>>>
>>> Signed-off-by: Tom Herbert <therbert@google.com>
>>> ---
>>
>> NO. Make one choice and do it consistently.
>>
>> Papering over driver bugs or design confusion with a sysctl is not a
>> reasonable choice.
>>
>> If some device (or code path) has invalid checksum logic, it should
>> be reported once and go ahead and fix it in software. The problem with
>> CHECKSUM_UNNECESSARY is that there is no way to check that the device
>> is broken without computing the checksum (catch-22).
>>
> Unlike CHECKSUM_UNNECESSARY, CHECKSUM_COMPLETE provides information
> for both valid and invalid checksums. Also, CHECKSUM_COMPLETE is well
> defined and specific as to what the device and driver are returning
> and I have to believe should be easier to get right. So I don't see
> any reason why we shouldn't use the negative information returned by
> CHECKSUM_COMPLETE; always recomputing an invalid checksum in SW is a
> waste of CPU cycles and could become a basis for DOS attack at high
> speeds.
> 
> In the unlikely event that the HW (or driver) is incorrectly computing
> the checksum this would manifest itself as checksum errors. This is
> quite debuggable, and the sysctl would be useful to narrow down
> whether the issue is in checksum calculation logic or actual bad
> checksums. Compare this to incorrect reporting of CHECKSUM_UNNECESSARY
> which could result in packets with errors being accepted without any
> detection-- it might take years to identify such a problem!

If we do anything, we should do it consistently and not just for one
specific checksum delivery type.

So if we add a sysctl, it should revalidate the checksum in software
for all checksum offload variants, and such a sysctl should be off by
default.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Herbert April 28, 2014, 4:07 a.m. UTC | #5
> If we do anything, we should do it consistently and not just for one
> specific checksum delivery type.
>
> So if we add a sysctl, it should revalidate the checksum in software
> for all checksum offload variants, and such a sysctl should be off by
> default.

Okay, but I would want to add a new checksum type to distinguish
checksum_complete that was done in software as opposed to one received
from a device. I was planning on doing this in a later patch set. I
suppose you can disregard this patch for now.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 28, 2014, 4:15 a.m. UTC | #6
From: Tom Herbert <therbert@google.com>
Date: Sun, 27 Apr 2014 21:07:24 -0700

>> If we do anything, we should do it consistently and not just for one
>> specific checksum delivery type.
>>
>> So if we add a sysctl, it should revalidate the checksum in software
>> for all checksum offload variants, and such a sysctl should be off by
>> default.
> 
> Okay, but I would want to add a new checksum type to distinguish
> checksum_complete that was done in software as opposed to one received
> from a device.

I don't think it's wise to keep CHECKSUM_COMPLETE once you've software
verified it.

If anything, you should at that point treat it as we would have treated
it were it marked CHECKSUM_NONE.

Therefore I see no reason for a new checksum type.

Please repost your series if you want me to consider it with this patch
removed, rather than just saying "apply the series without patch X".

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Herbert April 28, 2014, 4:22 a.m. UTC | #7
On Sun, Apr 27, 2014 at 9:15 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Sun, 27 Apr 2014 21:07:24 -0700
>
>>> If we do anything, we should do it consistently and not just for one
>>> specific checksum delivery type.
>>>
>>> So if we add a sysctl, it should revalidate the checksum in software
>>> for all checksum offload variants, and such a sysctl should be off by
>>> default.
>>
>> Okay, but I would want to add a new checksum type to distinguish
>> checksum_complete that was done in software as opposed to one received
>> from a device.
>
> I don't think it's wise to keep CHECKSUM_COMPLETE once you've software
> verified it.
>
> If anything, you should at that point treat it as we would have treated
> it were it marked CHECKSUM_NONE.
>
We want to keep it if there are multiple checksums in the packet which
is possible with encapsulation. It really shouldn't add any cost to
maintain it with the packet.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 3b14acc..f3c6018 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2763,6 +2763,8 @@  static inline __sum16 __skb_checksum_validate_unnecessary(struct sk_buff *skb,
 	return 1;
 }
 
+extern int sysctl_trust_dev_checksum;
+
 /*
  * Validate (init) checksum based on checksum complete.
  *
@@ -2775,9 +2777,12 @@  static inline __sum16 __skb_checksum_validate_complete(struct sk_buff *skb,
 						       __wsum psum)
 {
 	if (skb->ip_summed == CHECKSUM_COMPLETE) {
-		if (!csum_fold(csum_add(psum, skb->csum))) {
+		__sum16 csum = csum_fold(csum_add(psum, skb->csum));
+		if (!csum) {
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
 			return 0;
+		} else if (sysctl_trust_dev_checksum) {
+			return csum;
 		}
 	}
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 11d70e3..4f7980a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2918,6 +2918,9 @@  int netdev_tstamp_prequeue __read_mostly = 1;
 int netdev_budget __read_mostly = 300;
 int weight_p __read_mostly = 64;            /* old backlog weight */
 
+int sysctl_trust_dev_checksum = 1;
+EXPORT_SYMBOL(sysctl_trust_dev_checksum);
+
 /* Called with irq disabled */
 static inline void ____napi_schedule(struct softnet_data *sd,
 				     struct napi_struct *napi)
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index cf9cd13..246b9e3 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -263,6 +263,13 @@  static struct ctl_table net_core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "trust_dev_checksum",
+		.data		= &sysctl_trust_dev_checksum,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 #ifdef CONFIG_BPF_JIT
 	{
 		.procname	= "bpf_jit_enable",