diff mbox series

[net] packet: copy user buffers before orphan or clone

Message ID 20181120180018.178537-1-willemdebruijn.kernel@gmail.com
State Accepted, archived
Delegated to: David Miller
Headers show
Series [net] packet: copy user buffers before orphan or clone | expand

Commit Message

Willem de Bruijn Nov. 20, 2018, 6 p.m. UTC
From: Willem de Bruijn <willemb@google.com>

tpacket_snd sends packets with user pages linked into skb frags. It
notifies that pages can be reused when the skb is released by setting
skb->destructor to tpacket_destruct_skb.

This can cause data corruption if the skb is orphaned (e.g., on
transmit through veth) or cloned (e.g., on mirror to another psock).

Create a kernel-private copy of data in these cases, same as tun/tap
zerocopy transmission. Reuse that infrastructure: mark the skb as
SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).

Unlike other zerocopy packets, do not set shinfo destructor_arg to
struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify
when the original skb is released and a timestamp is recorded. Do not
change this timestamp behavior. The ubuf_info->callback is not needed
anyway, as no zerocopy notification is expected.

Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The
resulting value is not a valid ubuf_info pointer, nor a valid
tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this.

The fix relies on features introduced in commit 52267790ef52 ("sock:
add MSG_ZEROCOPY"), so can be backported as is only to 4.14.

Tested with from `./in_netns.sh ./txring_overwrite` from
http://github.com/wdebruij/kerneltools/tests

Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
Reported-by: Anand H. Krishnan <anandhkrishnan@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>

---

I did not find a better location to store not-a-uarg than in the ptr.
An alternative is to check skb->sk->sk_family. That adds a cacheline
read and needs a helper as skbuff.h does not include sock.h.
---
 include/linux/skbuff.h | 18 +++++++++++++++++-
 net/packet/af_packet.c |  4 ++--
 2 files changed, 19 insertions(+), 3 deletions(-)

Comments

David Miller Nov. 23, 2018, 7:09 p.m. UTC | #1
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Tue, 20 Nov 2018 13:00:18 -0500

> From: Willem de Bruijn <willemb@google.com>
> 
> tpacket_snd sends packets with user pages linked into skb frags. It
> notifies that pages can be reused when the skb is released by setting
> skb->destructor to tpacket_destruct_skb.
> 
> This can cause data corruption if the skb is orphaned (e.g., on
> transmit through veth) or cloned (e.g., on mirror to another psock).
> 
> Create a kernel-private copy of data in these cases, same as tun/tap
> zerocopy transmission. Reuse that infrastructure: mark the skb as
> SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).
> 
> Unlike other zerocopy packets, do not set shinfo destructor_arg to
> struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify
> when the original skb is released and a timestamp is recorded. Do not
> change this timestamp behavior. The ubuf_info->callback is not needed
> anyway, as no zerocopy notification is expected.
> 
> Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The
> resulting value is not a valid ubuf_info pointer, nor a valid
> tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this.
> 
> The fix relies on features introduced in commit 52267790ef52 ("sock:
> add MSG_ZEROCOPY"), so can be backported as is only to 4.14.
> 
> Tested with from `./in_netns.sh ./txring_overwrite` from
> http://github.com/wdebruij/kerneltools/tests
> 
> Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
> Reported-by: Anand H. Krishnan <anandhkrishnan@gmail.com>
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Applied, and queued up for -stable.  Thanks for the backporting notes.

Any chance those tests from your kerneltools repo can make their way
into selftests?

Thanks.
Willem de Bruijn Nov. 23, 2018, 10:32 p.m. UTC | #2
On Fri, Nov 23, 2018 at 2:09 PM David Miller <davem@davemloft.net> wrote:
>
> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Date: Tue, 20 Nov 2018 13:00:18 -0500
>
> > From: Willem de Bruijn <willemb@google.com>
> >
> > tpacket_snd sends packets with user pages linked into skb frags. It
> > notifies that pages can be reused when the skb is released by setting
> > skb->destructor to tpacket_destruct_skb.
> >
> > This can cause data corruption if the skb is orphaned (e.g., on
> > transmit through veth) or cloned (e.g., on mirror to another psock).
> >
> > Create a kernel-private copy of data in these cases, same as tun/tap
> > zerocopy transmission. Reuse that infrastructure: mark the skb as
> > SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).
> >
> > Unlike other zerocopy packets, do not set shinfo destructor_arg to
> > struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify
> > when the original skb is released and a timestamp is recorded. Do not
> > change this timestamp behavior. The ubuf_info->callback is not needed
> > anyway, as no zerocopy notification is expected.
> >
> > Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The
> > resulting value is not a valid ubuf_info pointer, nor a valid
> > tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this.
> >
> > The fix relies on features introduced in commit 52267790ef52 ("sock:
> > add MSG_ZEROCOPY"), so can be backported as is only to 4.14.
> >
> > Tested with from `./in_netns.sh ./txring_overwrite` from
> > http://github.com/wdebruij/kerneltools/tests
> >
> > Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
> > Reported-by: Anand H. Krishnan <anandhkrishnan@gmail.com>
> > Signed-off-by: Willem de Bruijn <willemb@google.com>
>
> Applied, and queued up for -stable.  Thanks for the backporting notes.
>
> Any chance those tests from your kerneltools repo can make their way
> into selftests?

Absolutely. I'll send it to net-next once it has the fix.
diff mbox series

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0ba687454267..0d1b2c3f127b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1326,6 +1326,22 @@  static inline void skb_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg)
 	}
 }
 
+static inline void skb_zcopy_set_nouarg(struct sk_buff *skb, void *val)
+{
+	skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t) val | 0x1UL);
+	skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG;
+}
+
+static inline bool skb_zcopy_is_nouarg(struct sk_buff *skb)
+{
+	return (uintptr_t) skb_shinfo(skb)->destructor_arg & 0x1UL;
+}
+
+static inline void *skb_zcopy_get_nouarg(struct sk_buff *skb)
+{
+	return (void *)((uintptr_t) skb_shinfo(skb)->destructor_arg & ~0x1UL);
+}
+
 /* Release a reference on a zerocopy structure */
 static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
 {
@@ -1335,7 +1351,7 @@  static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy)
 		if (uarg->callback == sock_zerocopy_callback) {
 			uarg->zerocopy = uarg->zerocopy && zerocopy;
 			sock_zerocopy_put(uarg);
-		} else {
+		} else if (!skb_zcopy_is_nouarg(skb)) {
 			uarg->callback(uarg, zerocopy);
 		}
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index ec3095f13aae..a74650e98f42 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2394,7 +2394,7 @@  static void tpacket_destruct_skb(struct sk_buff *skb)
 		void *ph;
 		__u32 ts;
 
-		ph = skb_shinfo(skb)->destructor_arg;
+		ph = skb_zcopy_get_nouarg(skb);
 		packet_dec_pending(&po->tx_ring);
 
 		ts = __packet_set_timestamp(po, ph, skb);
@@ -2461,7 +2461,7 @@  static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 	skb->mark = po->sk.sk_mark;
 	skb->tstamp = sockc->transmit_time;
 	sock_tx_timestamp(&po->sk, sockc->tsflags, &skb_shinfo(skb)->tx_flags);
-	skb_shinfo(skb)->destructor_arg = ph.raw;
+	skb_zcopy_set_nouarg(skb, ph.raw);
 
 	skb_reserve(skb, hlen);
 	skb_reset_network_header(skb);