Message ID | 1358165431.27054.62.camel@shinybook.infradead.org |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
From: David Woodhouse <dwmw2@infradead.org>
Date: Mon, 14 Jan 2013 12:10:31 +0000

> Devices with the NETIF_F_V[46]_CSUM feature(s) are *only* required to
> handle checksumming of UDP and TCP.
>
> In netif_skb_features() we attempt to filter out the capabilities which
> are inappropriate for the device that the skb will actually be sent
> from... but there we assume that NETIF_F_V4_CSUM devices can handle
> *all* Legacy IP, and that NETIF_F_V6_CSUM devices can handle *all* IPv6.
>
> This may have been OK in the days when CHECKSUM_PARTIAL packets would
> *only* be produced by the local stack, and we knew the local stack
> didn't generate them for anything but UDP and TCP. But these days that's
> not true. When a tun device receives a packet from userspace with
> VIRTIO_NET_HDR_F_NEEDS_CSUM, that translates fairly directly into
> setting CHECKSUM_PARTIAL on the resulting skb. Since virtio_net
> advertises NETIF_F_HW_CSUM to its guests, we should expect to be asked
> to checksum *anything*.

My opinion on this is that the injectors of packets are responsible
for ensuring checksum types are set on SKBs in an appropriate way.

So we ensure this in the local protocol stacks that generate packets,
and if foreign alien entities can inject SKBs with these checksum
settings (like the tun device can) the burden of verification falls
upon whatever layer allows that to happen.

So really, the fix is in the tun device and the virtio layer.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2013-01-16 at 15:54 -0500, David Miller wrote:
>
> My opinion on this is that the injectors of packets are responsible
> for ensuring checksum types are set on SKBs in an appropriate way.
>
> So we ensure this in the local protocol stacks that generate packets,
> and if foreign alien entities can inject SKBs with these checksum
> settings (like the tun device can) the burden of verification falls
> upon whatever layer allows that to happen.
>
> So really, the fix is in the tun device and the virtio layer.

The virtio layer (and the tun device) expose the equivalent of the
NETIF_F_HW_CSUM capability to the guest. In the case where we have a
real device on the host which *also* has NETIF_F_HW_CSUM capability, are
you saying that the tun driver should do the checksum for non-UDP/TCP
packets in software *anyway*, just because the packet might end up going
out a device *without* that capability, and the check in
harmonize_features() isn't sophisticated enough to cope properly?
From: David Woodhouse <dwmw2@infradead.org>
Date: Wed, 16 Jan 2013 22:34:18 +0000

> On Wed, 2013-01-16 at 15:54 -0500, David Miller wrote:
>>
>> My opinion on this is that the injectors of packets are responsible
>> for ensuring checksum types are set on SKBs in an appropriate way.
>>
>> So we ensure this in the local protocol stacks that generate packets,
>> and if foreign alien entities can inject SKBs with these checksum
>> settings (like the tun device can) the burden of verification falls
>> upon whatever layer allows that to happen.
>>
>> So really, the fix is in the tun device and the virtio layer.
>
> The virtio layer (and the tun device) expose the equivalent of the
> NETIF_F_HW_CSUM capability to the guest. In the case where we have a
> real device on the host which *also* has NETIF_F_HW_CSUM capability, are
> you saying that the tun driver should do the checksum for non-UDP/TCP
> packets in software *anyway*, just because the packet might end up going
> out a device *without* that capability, and the check in
> harmonize_features() isn't sophisticated enough to cope properly?

I'm saying that tun can't inject unchecked crap into our stack.
On Wed, 2013-01-16 at 18:00 -0500, David Miller wrote:
> I'm saying that tun can't inject unchecked crap into our stack.

That's a very strange way of putting it.

Our stack has explicit support for sane hardware devices with
NETIF_F_HW_CSUM capability that can checksum anything.

And it has checks (harmonize_features) on output, so that if the device
on which a packet is being emitted *doesn't* have appropriate hardware
checksum capability, we'll do the checksum in software.

Except that the check in harmonize_features() doesn't do that check
*properly*. It only catches *some* of the packets that the device can't
handle, and lets others through.

So we basically can't use NETIF_F_HW_CSUM in the general case, for
anything like SCTP or any other protocol, because harmonize_features()
is buggy and will let it go out a device that can't handle it.

Um, SCTP *does* use CHECKSUM_PARTIAL. Am I missing something or does
that suffer the same problem?
On Thu, 2013-01-17 at 00:03 +0000, David Woodhouse wrote:
> Um, SCTP *does* use CHECKSUM_PARTIAL. Am I missing something or does
> that suffer the same problem?

Am I mistaken, or does SCTP end up sending un-checksummed packets
because of the same 'bug' in harmonize_features()?
On Wed, 2013-01-16 at 18:00 -0500, David Miller wrote:
> From: David Woodhouse <dwmw2@infradead.org>
> Date: Wed, 16 Jan 2013 22:34:18 +0000
>
> > On Wed, 2013-01-16 at 15:54 -0500, David Miller wrote:
> >>
> >> My opinion on this is that the injectors of packets are responsible
> >> for ensuring checksum types are set on SKBs in an appropriate way.
> >>
> >> So we ensure this in the local protocol stacks that generate packets,
> >> and if foreign alien entities can inject SKBs with these checksum
> >> settings (like the tun device can) the burden of verification falls
> >> upon whatever layer allows that to happen.
> >>
> >> So really, the fix is in the tun device and the virtio layer.
> >
> > The virtio layer (and the tun device) expose the equivalent of the
> > NETIF_F_HW_CSUM capability to the guest. In the case where we have a
> > real device on the host which *also* has NETIF_F_HW_CSUM capability, are
> > you saying that the tun driver should do the checksum for non-UDP/TCP
> > packets in software *anyway*, just because the packet might end up going
> > out a device *without* that capability, and the check in
> > harmonize_features() isn't sophisticated enough to cope properly?
>
> I'm saying that tun can't inject unchecked crap into our stack.

Did we ever resolve this? AFAICT from inspecting the code the
virtio_net device still advertises hardware csum capabilities to the
guest. And accepts packets which need checksumming, calling
skb_partial_csum_set() as appropriate. Likewise tun, xen, macvtap and
af_packet.

And that works fine. It's a nice performance win because it means that
VM guests (and other clients) can make full use of the HW csum
capabilities of the real network hardware. And when the outbound
netdevice *doesn't* have HW csum support, we generally do the right
thing and complete the csum in software in the host kernel before
transmitting it.

Perhaps I'm missing something, but I'm not sure why you refer to that
as 'injecting unchecked crap'. Surely it's using CHECKSUM_PARTIAL
precisely as it was designed, and allowing the checksum to be completed
either by hardware or software as appropriate?

The *only* problem is the false positive in harmonize_features(), which
was addressed by my patch which started this thread (in 2013). The
problem is that an IP packet that *isn't* TCP or UDP, being sent out a
device that has only NETIF_F_IP_CSUM capability, ends up being handed
to the device unchecksummed because harmonize_features() fails to clear
the HW csum flag as it (arguably) should.

Original thread at http://comments.gmane.org/gmane.linux.network/254981

I'm only looking at it again because I'm pondering enabling HW csum in
8139cp (now that I've fixed TSO), and it reminded me of this...
On Mon, 2015-09-21 at 17:29 +0100, David Woodhouse wrote:
>
> Did we ever resolve this? AFAICT from inspecting the code the
> virtio_net device still advertises hardware csum capabilities to the
> guest. And accepts packets which need checksumming, calling
> skb_partial_csum_set() as appropriate. Likewise tun, xen, macvtap and
> af_packet.

Here's a test case which provokes the network stack into handing a
CHECKSUM_PARTIAL skb to a device which it knows can't handle it. (It
obviously needs the AF_PACKET endianness ABI fix I sent earlier.)

You might well be right to refer to this as 'injecting unchecked crap',
but we are *gaining* injection points with the ability to do this, and
for not entirely insane reasons: people want to be able to make full
use of hardware offload capabilities.

And we *have* a safety check, to avoid handing CHECKSUM_PARTIAL buffers
to devices which can't handle them. We already do check the
capabilities of the device we end up routing it to, and complete the
checksum in software if the device can't cope.

All we're talking about here is a corner case when that existing check
doesn't actually give the right results, because it assumes a device
with NETIF_F_IP_CSUM can checksum *all* Legacy IP packets, not just TCP
and UDP.
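As an illustration of the software fallback discussed above, here is a minimal user-space sketch of what "completing" a CHECKSUM_PARTIAL checksum means: sum everything from csum_start to the end of the packet and store the complemented result at csum_start + csum_offset. This is not the kernel's code. The helper names ones_sum() and complete_partial_csum() are hypothetical, the packet is modelled as a flat byte buffer rather than an skb, and details such as how the kernel handles a zero result are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum of buf[0..len), read as big-endian 16-bit words. */
static uint16_t ones_sum(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* pad odd tail with zero */
	while (sum >> 16)				/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Hypothetical helper modelling the software fallback.
 * The checksum field is assumed to already hold whatever seed the
 * sender put there (for TCP/UDP, the pseudo-header sum), so it is
 * included in the sum like any other data.
 * Returns 0 on success, -1 if the field would lie outside the packet.
 */
static int complete_partial_csum(uint8_t *pkt, size_t len,
				 size_t csum_start, size_t csum_offset)
{
	uint16_t csum;
	uint8_t *field;

	if (csum_start > len || csum_offset + 2 > len - csum_start)
		return -1;

	csum = (uint16_t)~ones_sum(pkt + csum_start, len - csum_start);
	field = pkt + csum_start + csum_offset;
	field[0] = (uint8_t)(csum >> 8);
	field[1] = (uint8_t)(csum & 0xff);
	return 0;
}
```

Note that nothing in this procedure depends on the protocol: only the two offsets matter. That is why a device or software path working purely from csum_start and csum_offset can checksum "anything" that uses this kind of sum, while a device that parses headers to find TCP or UDP cannot.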
diff --git a/net/core/dev.c b/net/core/dev.c
index 515473e..f1048b6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2229,22 +2229,39 @@ static int dev_gso_segment(struct sk_buff *skb, netdev_features_t features)
 	return 0;
 }
 
-static bool can_checksum_protocol(netdev_features_t features, __be16 protocol)
+static bool can_checksum_protocol(netdev_features_t features, __be16 protocol,
+				  __u16 csum_offset)
 {
-	return ((features & NETIF_F_GEN_CSUM) ||
-		((features & NETIF_F_V4_CSUM) &&
-		 protocol == htons(ETH_P_IP)) ||
-		((features & NETIF_F_V6_CSUM) &&
-		 protocol == htons(ETH_P_IPV6)) ||
-		((features & NETIF_F_FCOE_CRC) &&
-		 protocol == htons(ETH_P_FCOE)));
+	if (features & NETIF_F_GEN_CSUM)
+		return 1;
+
+	if ((features & NETIF_F_FCOE_CRC) && protocol == htons(ETH_P_FCOE))
+		return 1;
+
+	/*
+	 * Only allow NETIF_F_V[46]_CSUM for UDP/TCP packets. This is an
+	 * overly permissive check, but it's very unlikely to have false
+	 * positives in practice, and actually looking in the packet for
+	 * a proper confirmation would be very slow.
+	 */
+	if (csum_offset != offsetof(struct udphdr, check) &&
+	    csum_offset != offsetof(struct tcphdr, check))
+		return 0;
+
+	if ((features & NETIF_F_V4_CSUM) && protocol == htons(ETH_P_IP))
+		return 1;
+
+	if ((features & NETIF_F_V6_CSUM) && protocol == htons(ETH_P_IPV6))
+		return 1;
+
+	return 0;
 }
 
 static netdev_features_t harmonize_features(struct sk_buff *skb,
 	__be16 protocol, netdev_features_t features)
 {
 	if (skb->ip_summed != CHECKSUM_NONE &&
-	    !can_checksum_protocol(features, protocol)) {
+	    !can_checksum_protocol(features, protocol, skb->csum_offset)) {
 		features &= ~NETIF_F_ALL_CSUM;
 		features &= ~NETIF_F_SG;
 	} else if (illegal_highdma(skb->dev, skb)) {
Devices with the NETIF_F_V[46]_CSUM feature(s) are *only* required to
handle checksumming of UDP and TCP.

In netif_skb_features() we attempt to filter out the capabilities which
are inappropriate for the device that the skb will actually be sent
from... but there we assume that NETIF_F_V4_CSUM devices can handle
*all* Legacy IP, and that NETIF_F_V6_CSUM devices can handle *all* IPv6.

This may have been OK in the days when CHECKSUM_PARTIAL packets would
*only* be produced by the local stack, and we knew the local stack
didn't generate them for anything but UDP and TCP. But these days that's
not true. When a tun device receives a packet from userspace with
VIRTIO_NET_HDR_F_NEEDS_CSUM, that translates fairly directly into
setting CHECKSUM_PARTIAL on the resulting skb. Since virtio_net
advertises NETIF_F_HW_CSUM to its guests, we should expect to be asked
to checksum *anything*.

This patch attempts to cope with that by checking skb->csum_offset for
such devices. If that doesn't match the offset for UDP or TCP, then we
don't use hardware checksum. It won't catch 100% of cases, but a full
check of the actual skb contents in the fast path isn't a good idea.
It'll probably do well enough for now.

This expands the check in can_checksum_protocol() to make it more
readable, but doing so shouldn't make the resulting code any *bigger*,
except obviously for the additional checks.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
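To make the effect of the csum_offset test easier to see, the following is a user-space model of the check before and after the patch. It is only a sketch under stated assumptions: the F_* bits and PROTO_* values are illustrative stand-ins rather than the kernel's definitions, the header structs exist only to obtain the checksum field offsets (6 for UDP, 16 for TCP), the FCoE case is omitted, and protocols are compared in host byte order for simplicity. The function names can_checksum_old() and can_checksum_new() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative feature bits; NOT the kernel's real values. */
#define F_IP_CSUM	(1u << 0)	/* models NETIF_F_V4_CSUM: TCP/UDP over IPv4 */
#define F_IPV6_CSUM	(1u << 1)	/* models NETIF_F_V6_CSUM: TCP/UDP over IPv6 */
#define F_HW_CSUM	(1u << 2)	/* models the generic "checksum anything" case */

#define PROTO_IPV4	0x0800
#define PROTO_IPV6	0x86DD

/* Minimal header layouts, used only for offsetof() on the checksum field. */
struct udp_hdr_model {
	uint16_t source, dest, len, check;
};
struct tcp_hdr_model {
	uint16_t source, dest;
	uint32_t seq, ack_seq;
	uint16_t flags, window, check, urg_ptr;
};

/* Pre-patch logic: any IPv4/IPv6 packet is assumed to be checksummable. */
static bool can_checksum_old(unsigned int features, uint16_t proto)
{
	return (features & F_HW_CSUM) ||
	       ((features & F_IP_CSUM) && proto == PROTO_IPV4) ||
	       ((features & F_IPV6_CSUM) && proto == PROTO_IPV6);
}

/* Post-patch logic: protocol-specific offload only for TCP/UDP offsets. */
static bool can_checksum_new(unsigned int features, uint16_t proto,
			     uint16_t csum_offset)
{
	if (features & F_HW_CSUM)
		return true;

	if (csum_offset != offsetof(struct udp_hdr_model, check) &&
	    csum_offset != offsetof(struct tcp_hdr_model, check))
		return false;

	if ((features & F_IP_CSUM) && proto == PROTO_IPV4)
		return true;
	if ((features & F_IPV6_CSUM) && proto == PROTO_IPV6)
		return true;

	return false;
}
```

For example, an IPv4 packet whose checksum field sits at offset 2 (as ICMP's does) is accepted by can_checksum_old(F_IP_CSUM, PROTO_IPV4) but rejected by can_checksum_new(F_IP_CSUM, PROTO_IPV4, 2), so the caller would fall back to software. A non-TCP/UDP protocol that happens to keep its checksum at offset 6 or 16 would still slip through, which is the "won't catch 100% of cases" caveat in the commit message.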