diff mbox

[3.2,085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

Message ID lsq.1461711744.699003961@decadent.org.uk
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Ben Hutchings April 26, 2016, 11:02 p.m. UTC
3.2.80-rc1 review patch.  If anyone has any objections, please let me know.

------------------

From: Vijay Pandurangan <vijayp@vijayp.ca>

[ Upstream commit ce8c839b74e3017996fad4e1b7ba2e2625ede82f ]

Packets that arrive from real hardware devices have ip_summed ==
CHECKSUM_UNNECESSARY if the hardware verified the checksums, or
CHECKSUM_NONE if the packet is bad or it was unable to verify it. The
current version of veth will replace CHECKSUM_NONE with
CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to
a veth device to be delivered to the application. This caused applications
at Twitter to receive corrupt data when network hardware was corrupting
packets.

We believe this was added as an optimization to skip computing and
verifying checksums for communication between containers. However, locally
generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as
written does nothing for them. As far as we can tell, after removing this
code, these packets are transmitted from one stack to another unmodified
(tcpdump shows invalid checksums on both sides, as expected), and they are
delivered correctly to applications. We didn’t test every possible network
configuration, but we tried a few common ones such as bridging containers,
using NAT between the host and a container, and routing from hardware
devices to containers. We have effectively deployed this in production at
Twitter (by disabling RX checksum offloading on veth devices).

This code dates back to the first version of the driver, commit
<e314dbdc1c0dc6a548ecf> ("[NET]: Virtual ethernet device driver"), so I
suspect this bug occurred mostly because the driver API has evolved
significantly since then. Commit <0b7967503dc97864f283a> ("net/veth: Fix
packet checksumming") (in December 2010) fixed this for packets that get
created locally and sent to hardware devices, by not changing
CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming
in from hardware devices.

Co-authored-by: Evan Jones <ej@evanjones.ca>
Signed-off-by: Evan Jones <ej@evanjones.ca>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Phil Sutter <phil@nwl.cc>
Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vijay Pandurangan <vijayp@vijayp.ca>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/net/veth.c | 6 ------
 1 file changed, 6 deletions(-)

Comments

Ben Greear April 27, 2016, 3:59 p.m. UTC | #1
On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.

I would be careful about this.  It causes regressions when sending
PACKET_SOCKET buffers from user-space to veth devices.

There was a proposed upstream fix for the regression, but it has not gone
into the tree as far as I know.

http://www.spinics.net/lists/netdev/msg370436.html

Thanks,
Ben

>
> ------------------
>
> From: Vijay Pandurangan <vijayp@vijayp.ca>
>
> [ Upstream commit ce8c839b74e3017996fad4e1b7ba2e2625ede82f ]
>
> Packets that arrive from real hardware devices have ip_summed ==
> CHECKSUM_UNNECESSARY if the hardware verified the checksums, or
> CHECKSUM_NONE if the packet is bad or it was unable to verify it. The
> current version of veth will replace CHECKSUM_NONE with
> CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to
> a veth device to be delivered to the application. This caused applications
> at Twitter to receive corrupt data when network hardware was corrupting
> packets.
>
> We believe this was added as an optimization to skip computing and
> verifying checksums for communication between containers. However, locally
> generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as
> written does nothing for them. As far as we can tell, after removing this
> code, these packets are transmitted from one stack to another unmodified
> (tcpdump shows invalid checksums on both sides, as expected), and they are
> delivered correctly to applications. We didn’t test every possible network
> configuration, but we tried a few common ones such as bridging containers,
> using NAT between the host and a container, and routing from hardware
> devices to containers. We have effectively deployed this in production at
> Twitter (by disabling RX checksum offloading on veth devices).
>
> This code dates back to the first version of the driver, commit
> <e314dbdc1c0dc6a548ecf> ("[NET]: Virtual ethernet device driver"), so I
> suspect this bug occurred mostly because the driver API has evolved
> significantly since then. Commit <0b7967503dc97864f283a> ("net/veth: Fix
> packet checksumming") (in December 2010) fixed this for packets that get
> created locally and sent to hardware devices, by not changing
> CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming
> in from hardware devices.
>
> Co-authored-by: Evan Jones <ej@evanjones.ca>
> Signed-off-by: Evan Jones <ej@evanjones.ca>
> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Cc: Phil Sutter <phil@nwl.cc>
> Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Vijay Pandurangan <vijayp@vijayp.ca>
> Acked-by: Cong Wang <cwang@twopensource.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> [bwh: Backported to 3.2: adjust context]
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> ---
>   drivers/net/veth.c | 6 ------
>   1 file changed, 6 deletions(-)
>
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -126,11 +126,6 @@ static netdev_tx_t veth_xmit(struct sk_b
>   	stats = this_cpu_ptr(priv->stats);
>   	rcv_stats = this_cpu_ptr(rcv_priv->stats);
>
> -	/* don't change ip_summed == CHECKSUM_PARTIAL, as that
> -	   will cause bad checksum on forwarded packets */
> -	if (skb->ip_summed == CHECKSUM_NONE &&
> -	    rcv->features & NETIF_F_RXCSUM)
> -		skb->ip_summed = CHECKSUM_UNNECESSARY;
>
>   	length = skb->len;
>   	if (dev_forward_skb(rcv, skb) != NET_RX_SUCCESS)
>
Ben Hutchings April 27, 2016, 6:07 p.m. UTC | #2
On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
> On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> > 
> > 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
> I would be careful about this.  It causes regressions when sending
> PACKET_SOCKET buffers from user-space to veth devices.
> 
> There was a proposed upstream fix for the regression, but it has not gone
> into the tree as far as I know.
> 
> http://www.spinics.net/lists/netdev/msg370436.html
[...]

OK, I'll drop this for now.

Ben.
Hannes Frederic Sowa April 28, 2016, midnight UTC | #3
Hi Ben,

On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
> On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
> > On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> > > 
> > > 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
> > I would be careful about this.  It causes regressions when sending
> > PACKET_SOCKET buffers from user-space to veth devices.
> > 
> > There was a proposed upstream fix for the regression, but it has not gone
> > into the tree as far as I know.
> > 
> > http://www.spinics.net/lists/netdev/msg370436.html
> [...]
> 
> OK, I'll drop this for now.

The fall out from not having this patch is in my opinion a bigger
fallout than not having this patch. This patch fixes silent data
corruption vs. the problem Ben Greear is talking about, which might not
be that a common usage.

What do others think?

Bye,
Hannes
diff mbox

Patch

--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -126,11 +126,6 @@  static netdev_tx_t veth_xmit(struct sk_b
 	stats = this_cpu_ptr(priv->stats);
 	rcv_stats = this_cpu_ptr(rcv_priv->stats);
 
-	/* don't change ip_summed == CHECKSUM_PARTIAL, as that
-	   will cause bad checksum on forwarded packets */
-	if (skb->ip_summed == CHECKSUM_NONE &&
-	    rcv->features & NETIF_F_RXCSUM)
-		skb->ip_summed = CHECKSUM_UNNECESSARY;
 
 	length = skb->len;
 	if (dev_forward_skb(rcv, skb) != NET_RX_SUCCESS)