[RFC,1/3] Avoid making inappropriate requests of NETIF_F_V[46]_CSUM devices

Submitted by David Woodhouse on Jan. 14, 2013, 12:10 p.m.

Details

Message ID 1358165431.27054.62.camel@shinybook.infradead.org
State RFC
Delegated to: David Miller

Commit Message

David Woodhouse Jan. 14, 2013, 12:10 p.m.
Devices with the NETIF_F_V[46]_CSUM feature(s) are *only* required to
handle checksumming of UDP and TCP.

In netif_skb_features() we attempt to filter out the capabilities which
are inappropriate for the device that the skb will actually be sent
from... but there we assume that NETIF_F_V4_CSUM devices can handle
*all* Legacy IP, and that NETIF_F_V6_CSUM devices can handle *all* IPv6.

This may have been OK in the days when CHECKSUM_PARTIAL packets would
*only* be produced by the local stack, and we knew the local stack
didn't generate them for anything but UDP and TCP. But these days that's
not true. When a tun device receives a packet from userspace with
VIRTIO_NET_HDR_F_NEEDS_CSUM, that translates fairly directly into
setting CHECKSUM_PARTIAL on the resulting skb. Since virtio_net
advertises NETIF_F_HW_CSUM to its guests, we should expect to be asked
to checksum *anything*.

This patch attempts to cope with that by checking skb->csum_offset for
such devices. If that doesn't match the offset for UDP or TCP, then we
don't use hardware checksum. It won't catch 100% of cases, but a full
check of the actual skb contents in the fast path isn't a good idea.
It'll probably do well enough for now.

This expands the check in can_checksum_protocol() to make it more
readable, but doing so shouldn't make the resulting code any *bigger*,
except obviously for the additional checks.
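
As a rough illustration of the offset test described in this message, here is a minimal userspace sketch in C. It is not the kernel code: the sketch_* structs, the SK_F_* and SK_P_* constants and sketch_can_checksum() are hypothetical stand-ins, the protocol is compared in host byte order, and the FCoE case is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the UDP and TCP header layouts. */
struct sketch_udphdr {
	uint16_t source, dest, len, check;
};

struct sketch_tcphdr {
	uint16_t source, dest;
	uint32_t seq, ack_seq;
	uint16_t flags;		/* data offset, reserved bits and flags */
	uint16_t window, check, urg_ptr;
};

/* The checksum fields sit at the well-known offsets 6 (UDP) and 16 (TCP). */
static_assert(offsetof(struct sketch_udphdr, check) == 6, "UDP csum offset");
static_assert(offsetof(struct sketch_tcphdr, check) == 16, "TCP csum offset");

/* Hypothetical feature bits; the values do not match the kernel's. */
enum {
	SK_F_GEN_CSUM = 1 << 0,	/* device can checksum anything */
	SK_F_V4_CSUM  = 1 << 1,	/* TCP/UDP over IPv4 only */
	SK_F_V6_CSUM  = 1 << 2,	/* TCP/UDP over IPv6 only */
};

/* EtherType values, in host byte order for this sketch. */
enum { SK_P_IP = 0x0800, SK_P_IPV6 = 0x86DD };

static bool sketch_can_checksum(unsigned int features, uint16_t protocol,
				uint16_t csum_offset)
{
	if (features & SK_F_GEN_CSUM)
		return true;

	/* Protocol-specific offload: only trust UDP- or TCP-shaped offsets. */
	if (csum_offset != offsetof(struct sketch_udphdr, check) &&
	    csum_offset != offsetof(struct sketch_tcphdr, check))
		return false;

	if ((features & SK_F_V4_CSUM) && protocol == SK_P_IP)
		return true;

	if ((features & SK_F_V6_CSUM) && protocol == SK_P_IPV6)
		return true;

	return false;
}
```

With only SK_F_V4_CSUM set, offsets 6 and 16 pass, while offset 8 (where SCTP keeps its checksum) is refused, so the caller would fall back to software checksumming. A non-UDP/TCP protocol whose checksum happens to sit at offset 6 or 16 would still slip through, which is the imprecision this message accepts.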

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

Comments

David Miller Jan. 16, 2013, 8:54 p.m.
From: David Woodhouse <dwmw2@infradead.org>
Date: Mon, 14 Jan 2013 12:10:31 +0000

> Devices with the NETIF_F_V[46]_CSUM feature(s) are *only* required to
> handle checksumming of UDP and TCP.
> 
> In netif_skb_features() we attempt to filter out the capabilities which
> are inappropriate for the device that the skb will actually be sent
> from... but there we assume that NETIF_F_V4_CSUM devices can handle
> *all* Legacy IP, and that NETIF_F_V6_CSUM devices can handle *all* IPv6.
> 
> This may have been OK in the days when CHECKSUM_PARTIAL packets would
> *only* be produced by the local stack, and we knew the local stack
> didn't generate them for anything but UDP and TCP. But these days that's
> not true. When a tun device receives a packet from userspace with
> VIRTIO_NET_HDR_F_NEEDS_CSUM, that translates fairly directly into
> setting CHECKSUM_PARTIAL on the resulting skb. Since virtio_net
> advertises NETIF_F_HW_CSUM to its guests, we should expect to be asked
> to checksum *anything*.

My opinion on this is that the injectors of packets are responsible
for ensuring checksum types are set on SKBs in an appropriate way.

So we ensure this in the local protocol stacks that generate packets,
and if foreign alien entities can inject SKBs with these checksum
settings (like the tun device can) the burden of verification falls
upon whatever layer allows that to happen.

So really, the fix is in the tun device and the virtio layer.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Woodhouse Jan. 16, 2013, 10:34 p.m.
On Wed, 2013-01-16 at 15:54 -0500, David Miller wrote:
> 
> My opinion on this is that the injectors of packets are responsible
> for ensuring checksum types are set on SKBs in an appropriate way.
> 
> So we ensure this in the local protocol stacks that generate packets,
> and if foreign alien entities can inject SKBs with these checksum
> settings (like the tun device can) the burden of verification falls
> upon whatever layer allows that to happen.
> 
> So really, the fix is in the tun device and the virtio layer.

The virtio layer (and the tun device) expose the equivalent of the
NETIF_F_HW_CSUM capability to the guest. In the case where we have a
real device on the host which *also* has NETIF_F_HW_CSUM capability, are
you saying that the tun driver should do the checksum for non-UDP/TCP
packets in software *anyway*, just because the packet might end up going
out a device *without* that capability, and the check in
harmonize_features() isn't sophisticated enough to cope properly?

David Miller Jan. 16, 2013, 11 p.m.
From: David Woodhouse <dwmw2@infradead.org>
Date: Wed, 16 Jan 2013 22:34:18 +0000

> On Wed, 2013-01-16 at 15:54 -0500, David Miller wrote:
>> 
>> My opinion on this is that the injectors of packets are responsible
>> for ensuring checksum types are set on SKBs in an appropriate way.
>> 
>> So we ensure this in the local protocol stacks that generate packets,
>> and if foreign alien entities can inject SKBs with these checksum
>> settings (like the tun device can) the burden of verification falls
>> upon whatever layer allows that to happen.
>> 
>> So really, the fix is in the tun device and the virtio layer.
> 
> The virtio layer (and the tun device) expose the equivalent of the
> NETIF_F_HW_CSUM capability to the guest. In the case where we have a
> real device on the host which *also* has NETIF_F_HW_CSUM capability, are
> you saying that the tun driver should do the checksum for non-UDP/TCP
> packets in software *anyway*, just because the packet might end up going
> out a device *without* that capability, and the check in
> harmonize_features() isn't sophisticated enough to cope properly?

I'm saying that tun can't inject unchecked crap into our stack.

David Woodhouse Jan. 17, 2013, 12:03 a.m.
On Wed, 2013-01-16 at 18:00 -0500, David Miller wrote:
> I'm saying that tun can't inject unchecked crap into our stack.

That's a very strange way of putting it.

Our stack has explicit support for sane hardware devices with
NETIF_F_HW_CSUM capability that can checksum anything.

And it has checks (harmonize_features) on output, so that if the device
on which a packet is being emitted *doesn't* have appropriate hardware
checksum capability, we'll do the checksum in software.

Except that the check in harmonize_features() doesn't do that check
*properly*. It only catches *some* of the packets that the device can't
handle, and lets others through.

So we basically can't use NETIF_F_HW_CSUM in the general case, for
anything like SCTP or any other protocol, because harmonize_features()
is buggy and will let it go out a device that can't handle it.

Um, SCTP *does* use CHECKSUM_PARTIAL. Am I missing something or does
that suffer the same problem?

David Woodhouse Jan. 29, 2013, 4:35 p.m.
On Thu, 2013-01-17 at 00:03 +0000, David Woodhouse wrote:
> Um, SCTP *does* use CHECKSUM_PARTIAL. Am I missing something or does
> that suffer the same problem?

Am I mistaken, or does SCTP end up sending un-checksummed packets
because of the same 'bug' in harmonize_features()?

David Woodhouse Sept. 21, 2015, 4:29 p.m.
On Wed, 2013-01-16 at 18:00 -0500, David Miller wrote:
> From: David Woodhouse <dwmw2@infradead.org>
> Date: Wed, 16 Jan 2013 22:34:18 +0000
> 
> > On Wed, 2013-01-16 at 15:54 -0500, David Miller wrote:
> >> 
> >> My opinion on this is that the injectors of packets are responsible
> >> for ensuring checksum types are set on SKBs in an appropriate way.
> >> 
> >> So we ensure this in the local protocol stacks that generate packets,
> >> and if foreign alien entities can inject SKBs with these checksum
> >> settings (like the tun device can) the burden of verification falls
> >> upon whatever layer allows that to happen.
> >> 
> >> So really, the fix is in the tun device and the virtio layer.
> > 
> > The virtio layer (and the tun device) expose the equivalent of the
> > NETIF_F_HW_CSUM capability to the guest. In the case where we have a
> > real device on the host which *also* has NETIF_F_HW_CSUM capability, are
> > you saying that the tun driver should do the checksum for non-UDP/TCP
> > packets in software *anyway*, just because the packet might end up going
> > out a device *without* that capability, and the check in
> > harmonize_features() isn't sophisticated enough to cope properly?
> 
> I'm saying that tun can't inject unchecked crap into our stack.

Did we ever resolve this? AFAICT from inspecting the code the
virtio_net device still advertises hardware csum capabilities to the
guest. And accepts packets which need checksumming, calling
skb_partial_csum_set() as appropriate. Likewise tun, xen, macvtap and
af_packet.

And that works fine — it's a nice performance win because it means that
VM guests (and other clients) can make full use of the HW csum
capabilities of the real network hardware. And when the outbound
netdevice *doesn't* have HW csum support, we generally do the right
thing and complete the csum in software in the host kernel before
transmitting it.

Perhaps I'm missing something, but I'm not sure why you refer to that
as 'injecting unchecked crap'. Surely it's using CHECKSUM_PARTIAL
precisely as it was designed, and allowing the checksum to be completed
either by hardware or software as appropriate?
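
For reference, completing a CHECKSUM_PARTIAL packet in software amounts to roughly the following. This is a minimal userspace sketch in C, not the kernel's helper: ones_sum() and complete_partial_csum() are hypothetical names, the packet is assumed to be a flat byte buffer, and the checksum field is assumed to already hold whatever seed the sender left there (for UDP and TCP, the pseudo-header sum).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of len bytes, folded to 16 bits (not inverted). */
static uint16_t ones_sum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;	/* pad odd byte with 0 */

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}

/*
 * Sum everything from csum_start to the end of the packet, including the
 * seed already stored in the checksum field, and write the inverted result
 * at csum_start + csum_offset in network byte order.
 * Returns 0 on success, -1 if the offsets do not fit inside the packet.
 */
static int complete_partial_csum(uint8_t *pkt, size_t len,
				 size_t csum_start, size_t csum_offset)
{
	uint16_t csum;

	assert(pkt != NULL);

	if (csum_start > len || len - csum_start < 2 ||
	    csum_offset > len - csum_start - 2)
		return -1;

	csum = (uint16_t)~ones_sum(pkt + csum_start, len - csum_start);
	pkt[csum_start + csum_offset] = (uint8_t)(csum >> 8);
	pkt[csum_start + csum_offset + 1] = (uint8_t)(csum & 0xff);

	return 0;
}
```

Nothing in this routine depends on the transport protocol. It only needs csum_start and csum_offset, which is why a software fallback can finish the checksum for any packet, while a device limited to UDP and TCP cannot.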

The *only* problem is the false positive in harmonize_features(), which
was addressed by my patch which started this thread (in 2013). The
problem is that an IP packet that *isn't* TCP or UDP, being sent out a
device that has only NETIF_F_IP_CSUM capability, ends up being handed
to the device unchecksummed because harmonize_features() fails to clear
the HW csum flag as it (arguably) should.

Original thread at
http://comments.gmane.org/gmane.linux.network/254981

I'm only looking at it again because I'm pondering enabling HW csum in
8139cp (now that I've fixed TSO), and it reminded me of this...

David Woodhouse Sept. 23, 2015, 3:42 p.m.
On Mon, 2015-09-21 at 17:29 +0100, David Woodhouse wrote:
> 
> Did we ever resolve this? AFAICT from inspecting the code the
> virtio_net device still advertises hardware csum capabilities to the
> guest. And accepts packets which need checksumming, calling
> skb_partial_csum_set() as appropriate. Likewise tun, xen, macvtap and
> af_packet.

Here's a test case which provokes the network stack into handing a
CHECKSUM_PARTIAL skb to a device which it knows can't handle it. (It
obviously needs the AF_PACKET endianness ABI fix I sent earlier.)

You might well be right to refer to this as 'injecting unchecked crap',
but we are *gaining* injection points with the ability to do this, and
for not entirely insane reasons — people want to be able to make full
use of hardware offload capabilities.

And we *have* a safety check, to avoid handing CHECKSUM_PARTIAL buffers
to devices which can't handle them. We already do check the
capabilities of the device we end up routing it to, and complete the
checksum in software if the device can't cope.

All we're talking about here is a corner case when that existing check
doesn't actually give the right results, because it assumes a device
with NETIF_F_IP_CSUM can checksum *all* Legacy IP packets, not just TCP
and UDP.

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 515473e..f1048b6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2229,22 +2229,39 @@  static int dev_gso_segment(struct sk_buff *skb, netdev_features_t features)
 	return 0;
 }
 
-static bool can_checksum_protocol(netdev_features_t features, __be16 protocol)
+static bool can_checksum_protocol(netdev_features_t features, __be16 protocol,
+				  __u16 csum_offset)
 {
-	return ((features & NETIF_F_GEN_CSUM) ||
-		((features & NETIF_F_V4_CSUM) &&
-		 protocol == htons(ETH_P_IP)) ||
-		((features & NETIF_F_V6_CSUM) &&
-		 protocol == htons(ETH_P_IPV6)) ||
-		((features & NETIF_F_FCOE_CRC) &&
-		 protocol == htons(ETH_P_FCOE)));
+	if (features & NETIF_F_GEN_CSUM)
+		return 1;
+
+	if ((features & NETIF_F_FCOE_CRC) && protocol == htons(ETH_P_FCOE))
+		return 1;
+
+	/*
+	 * Only allow NETIF_F_V[46]_CSUM for UDP/TCP packets. This is an
+	 * overly permissive check, but it's very unlikely to have false
+	 * positives in practice, and actually looking in the packet for
+	 * a proper confirmation would be very slow.
+	 */
+	if (csum_offset != offsetof(struct udphdr, check) &&
+	    csum_offset != offsetof(struct tcphdr, check))
+		return 0;
+
+	if ((features & NETIF_F_V4_CSUM) && protocol == htons(ETH_P_IP))
+		return 1;
+
+	if ((features & NETIF_F_V6_CSUM) && protocol == htons(ETH_P_IPV6))
+		return 1;
+
+	return 0;
 }
 
 static netdev_features_t harmonize_features(struct sk_buff *skb,
 	__be16 protocol, netdev_features_t features)
 {
 	if (skb->ip_summed != CHECKSUM_NONE &&
-	    !can_checksum_protocol(features, protocol)) {
+	    !can_checksum_protocol(features, protocol, skb->csum_offset)) {
 		features &= ~NETIF_F_ALL_CSUM;
 		features &= ~NETIF_F_SG;
 	} else if (illegal_highdma(skb->dev, skb)) {