Netlink mmap tx security?

Message ID 20141216.175817.576861457076632402.davem@davemloft.net
State Accepted, archived
Delegated to: David Miller

Commit Message

David Miller Dec. 16, 2014, 10:58 p.m. UTC
From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 16 Oct 2014 08:07:53 +0100

> On 10/16/14 at 08:45am, Daniel Borkmann wrote:
>> On 10/15/2014 04:09 AM, David Miller wrote:
>> >That would work as well.
>> >
>> >There are pros and cons to all of these approaches.
>> >
>> >I was thinking that if we do the "TX mmap --> copy to kernel buffer"
>> >approach, then if in the future we find a way to make it work
>> >reliably, we can avoid the copy.  And frankly performance wise it's no
>> >worse than what happens via normal sendmsg() calls.
>> >
>> >And all applications using NETLINK_RX_RING keep working and keep
>> >getting the performance boost.
>> 
>> That would be better, yes. This would avoid having such a TPACKET_V*
>> API chaos we have in packet sockets if this could be fixed for netlink
>> eventually.
> 
> Only saw the second part of Dave's message now. I agree that this
> is even a better option.

Sorry for dropping the ball on this, here is the patch I have right
now, any comments?

Comments

Daniel Borkmann Dec. 16, 2014, 11:58 p.m. UTC | #1
On 12/16/2014 11:58 PM, David Miller wrote:
> From: Thomas Graf <tgraf@suug.ch>
> Date: Thu, 16 Oct 2014 08:07:53 +0100
>
>> On 10/16/14 at 08:45am, Daniel Borkmann wrote:
>>> On 10/15/2014 04:09 AM, David Miller wrote:
>>>> That would work as well.
>>>>
>>>> There are pros and cons to all of these approaches.
>>>>
>>>> I was thinking that if we do the "TX mmap --> copy to kernel buffer"
>>>> approach, then if in the future we find a way to make it work
>>>> reliably, we can avoid the copy.  And frankly performance wise it's no
>>>> worse than what happens via normal sendmsg() calls.
>>>>
>>>> And all applications using NETLINK_RX_RING keep working and keep
>>>> getting the performance boost.
>>>
>>> That would be better, yes. This would avoid having such a TPACKET_V*
>>> API chaos we have in packet sockets if this could be fixed for netlink
>>> eventually.
>>
>> Only saw the second part of Dave's message now. I agree that this
>> is even a better option.
>
> Sorry for dropping the ball on this, here is the patch I have right
> now, any comments?
>
> ====================
> [PATCH] netlink: Always copy on mmap TX.
>
> Checking the file f_count and the nlk->mapped count is not completely
> sufficient to prevent the mmap'd area contents from changing from
> under us during netlink mmap sendmsg() operations.
>
> Be careful to sample the header's length field only once, because this
> could change from under us as well.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>

Looks good to me, thanks for following up on this Dave!

Now this also leaves NETLINK_SKB_TX unused afterwards, so we should
get rid of those bits, too. The flag was only set in the broken
'exclusive' path and tested in the skb destructor when resetting the
slot status, but _now_ we set the slot straight back to
NL_MMAP_STATUS_UNUSED instead of going through the
NL_MMAP_STATUS_RESERVED -> NL_MMAP_STATUS_UNUSED transition via the
destructor.

The atomic (tx)ring->pending logic in the send path also seems unused
now; as in packet sockets, from which it was taken, it was there to
make blocking syscalls wait for completion until
netlink_skb_destructor() had been invoked and the slot thus given
back to user space.

Anyway, the above can probably be done as a follow-up; other than that:

Fixes: 5fd96123ee19 ("netlink: implement memory mapped sendmsg()")
Acked-by: Daniel Borkmann <dborkman@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Dec. 17, 2014, 12:02 a.m. UTC | #2
On Tue, 2014-12-16 at 17:58 -0500, David Miller wrote:

> +		__skb_put(skb, nm_len);
> +		memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, nm_len);
> +		netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
>  

Not related to this patch, but it looks like the netlink_set_status()
barrier is wrong?

It seems we need smp_wmb() after the memcpy() and before the
hdr->nm_status = status; 




Thomas Graf Dec. 17, 2014, 4:26 p.m. UTC | #3
On 12/16/14 at 04:02pm, Eric Dumazet wrote:
> On Tue, 2014-12-16 at 17:58 -0500, David Miller wrote:
> 
> > +		__skb_put(skb, nm_len);
> > +		memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, nm_len);
> > +		netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
> >  
> 
> Not related to this patch, but it looks like netlink_set_status()
> barrier is wrong ?
> 
> It seems we need smp_wmb() after the memcpy() and before the
> hdr->nm_status = status; 

Yes, definitely wrong as-is. I'll send a patch. For the particular
case we'd need a smp_rmb() after the memcpy() to complete the loads.
The skb destructor needs a smp_wmb() after setting nm_len. We could
get away with a smp_wmb() first thing in netlink_set_status() with
the code as-is but smp_mb() might be the less fragile thing to do.
Objections?
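The ordering Eric and Thomas are discussing can be sketched in user space. This is an illustrative stand-in, not the kernel code: C11 release/acquire atomics play the role of the kernel's smp_wmb()/smp_rmb() pairing, and the struct layout is a simplified stand-in for struct nl_mmap_hdr. The point is that the frame payload must be globally visible before the nm_status word is, so the peer never pairs a VALID status with a stale payload:

```c
#include <stdatomic.h>
#include <string.h>

#define NL_MMAP_STATUS_UNUSED 0
#define NL_MMAP_STATUS_VALID  1

/* Simplified stand-in for one ring frame shared between two sides. */
struct frame {
	_Atomic unsigned int nm_status;
	unsigned int nm_len;
	unsigned char payload[64];
};

/* Producer side: copy the message in, then publish the status.
 * The release store guarantees the payload and nm_len stores are
 * visible before any observer sees NL_MMAP_STATUS_VALID. */
void publish_frame(struct frame *f, const void *data, unsigned int len)
{
	memcpy(f->payload, data, len);
	f->nm_len = len;
	atomic_store_explicit(&f->nm_status, NL_MMAP_STATUS_VALID,
			      memory_order_release);
}

/* Consumer side: acquire-load the status first; the payload loads
 * below cannot be reordered before it.  Returns 1 if a frame was
 * consumed, 0 if the slot was not valid. */
int consume_frame(struct frame *f, void *out)
{
	if (atomic_load_explicit(&f->nm_status, memory_order_acquire) !=
	    NL_MMAP_STATUS_VALID)
		return 0;
	memcpy(out, f->payload, f->nm_len);
	atomic_store_explicit(&f->nm_status, NL_MMAP_STATUS_UNUSED,
			      memory_order_release);
	return 1;
}
```

Without the release/acquire pairing (i.e. with plain stores and loads), the status write may become visible before the payload write, which is exactly the hole a status-store-before-barrier netlink_set_status() would leave open.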
Thomas Graf Dec. 17, 2014, 4:27 p.m. UTC | #4
On 12/17/14 at 12:58am, Daniel Borkmann wrote:
> Fixes: 5fd96123ee19 ("netlink: implement memory mapped sendmsg()")
> Acked-by: Daniel Borkmann <dborkman@redhat.com>

Nothing to add to Daniel's excellent feedback.

Acked-by: Thomas Graf <tgraf@suug.ch>
David Miller Dec. 18, 2014, 5:36 p.m. UTC | #5
From: Thomas Graf <tgraf@suug.ch>
Date: Wed, 17 Dec 2014 16:27:49 +0000

> On 12/17/14 at 12:58am, Daniel Borkmann wrote:
>> Fixes: 5fd96123ee19 ("netlink: implement memory mapped sendmsg()")
>> Acked-by: Daniel Borkmann <dborkman@redhat.com>
> 
> Nothing to add to Daniel's excellent feedback.
> 
> Acked-by: Thomas Graf <tgraf@suug.ch>

Applied and queued up for -stable.
Patch

====================
[PATCH] netlink: Always copy on mmap TX.

Checking the file f_count and the nlk->mapped count is not completely
sufficient to prevent the mmap'd area contents from changing from
under us during netlink mmap sendmsg() operations.

Be careful to sample the header's length field only once, because this
could change from under us as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/netlink/af_netlink.c | 52 +++++++++++++++---------------------------------
 1 file changed, 16 insertions(+), 36 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ef5f77b..a64680a 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -525,14 +525,14 @@  out:
 	return err;
 }
 
-static void netlink_frame_flush_dcache(const struct nl_mmap_hdr *hdr)
+static void netlink_frame_flush_dcache(const struct nl_mmap_hdr *hdr, unsigned int nm_len)
 {
 #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE == 1
 	struct page *p_start, *p_end;
 
 	/* First page is flushed through netlink_{get,set}_status */
 	p_start = pgvec_to_page(hdr + PAGE_SIZE);
-	p_end   = pgvec_to_page((void *)hdr + NL_MMAP_HDRLEN + hdr->nm_len - 1);
+	p_end   = pgvec_to_page((void *)hdr + NL_MMAP_HDRLEN + nm_len - 1);
 	while (p_start <= p_end) {
 		flush_dcache_page(p_start);
 		p_start++;
@@ -714,24 +714,16 @@  static int netlink_mmap_sendmsg(struct sock *sk, struct msghdr *msg,
 	struct nl_mmap_hdr *hdr;
 	struct sk_buff *skb;
 	unsigned int maxlen;
-	bool excl = true;
 	int err = 0, len = 0;
 
-	/* Netlink messages are validated by the receiver before processing.
-	 * In order to avoid userspace changing the contents of the message
-	 * after validation, the socket and the ring may only be used by a
-	 * single process, otherwise we fall back to copying.
-	 */
-	if (atomic_long_read(&sk->sk_socket->file->f_count) > 1 ||
-	    atomic_read(&nlk->mapped) > 1)
-		excl = false;
-
 	mutex_lock(&nlk->pg_vec_lock);
 
 	ring   = &nlk->tx_ring;
 	maxlen = ring->frame_size - NL_MMAP_HDRLEN;
 
 	do {
+		unsigned int nm_len;
+
 		hdr = netlink_current_frame(ring, NL_MMAP_STATUS_VALID);
 		if (hdr == NULL) {
 			if (!(msg->msg_flags & MSG_DONTWAIT) &&
@@ -739,35 +731,23 @@  static int netlink_mmap_sendmsg(struct sock *sk, struct msghdr *msg,
 				schedule();
 			continue;
 		}
-		if (hdr->nm_len > maxlen) {
+
+		nm_len = ACCESS_ONCE(hdr->nm_len);
+		if (nm_len > maxlen) {
 			err = -EINVAL;
 			goto out;
 		}
 
-		netlink_frame_flush_dcache(hdr);
+		netlink_frame_flush_dcache(hdr, nm_len);
 
-		if (likely(dst_portid == 0 && dst_group == 0 && excl)) {
-			skb = alloc_skb_head(GFP_KERNEL);
-			if (skb == NULL) {
-				err = -ENOBUFS;
-				goto out;
-			}
-			sock_hold(sk);
-			netlink_ring_setup_skb(skb, sk, ring, hdr);
-			NETLINK_CB(skb).flags |= NETLINK_SKB_TX;
-			__skb_put(skb, hdr->nm_len);
-			netlink_set_status(hdr, NL_MMAP_STATUS_RESERVED);
-			atomic_inc(&ring->pending);
-		} else {
-			skb = alloc_skb(hdr->nm_len, GFP_KERNEL);
-			if (skb == NULL) {
-				err = -ENOBUFS;
-				goto out;
-			}
-			__skb_put(skb, hdr->nm_len);
-			memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, hdr->nm_len);
-			netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
+		skb = alloc_skb(nm_len, GFP_KERNEL);
+		if (skb == NULL) {
+			err = -ENOBUFS;
+			goto out;
 		}
+		__skb_put(skb, nm_len);
+		memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, nm_len);
+		netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
 
 		netlink_increment_head(ring);
 
@@ -813,7 +793,7 @@  static void netlink_queue_mmaped_skb(struct sock *sk, struct sk_buff *skb)
 	hdr->nm_pid	= NETLINK_CB(skb).creds.pid;
 	hdr->nm_uid	= from_kuid(sk_user_ns(sk), NETLINK_CB(skb).creds.uid);
 	hdr->nm_gid	= from_kgid(sk_user_ns(sk), NETLINK_CB(skb).creds.gid);
-	netlink_frame_flush_dcache(hdr);
+	netlink_frame_flush_dcache(hdr, hdr->nm_len);
 	netlink_set_status(hdr, NL_MMAP_STATUS_VALID);
 
 	NETLINK_CB(skb).flags |= NETLINK_SKB_DELIVERED;
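The bug class the patch's single-sample of nm_len addresses can be sketched in user space. This is an illustrative stand-in, not the kernel code: the struct and READ_ONCE macro below are simplified stand-ins for struct nl_mmap_hdr and the kernel's ACCESS_ONCE(). Because nm_len lives in memory that user space can rewrite at any moment, the kernel must sample it exactly once into a local, bound-check the local, and use only the local thereafter:

```c
#include <string.h>

/* Stand-in for the kernel's ACCESS_ONCE()/READ_ONCE(): force a single
 * load that the compiler cannot duplicate or re-fetch. */
#define READ_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Simplified stand-in for an mmap'd TX frame header. */
struct nl_mmap_hdr_sketch {
	unsigned int nm_len;      /* writable by the other side at any time */
	unsigned char data[128];  /* assumed >= any maxlen passed below */
};

/* Copy one frame out of the shared ring into a private buffer.
 * Returns the copied length, or -1 if the sampled length exceeds
 * maxlen.  Sampling nm_len once is the whole point: re-reading
 * hdr->nm_len after the check would let the peer change it between
 * the bounds check and the memcpy(). */
int copy_frame(struct nl_mmap_hdr_sketch *hdr, void *dst,
	       unsigned int maxlen)
{
	unsigned int nm_len = READ_ONCE(hdr->nm_len); /* sample once */

	if (nm_len > maxlen)
		return -1;
	memcpy(dst, hdr->data, nm_len);  /* uses the local copy only */
	return (int)nm_len;
}
```

The pre-patch code read hdr->nm_len several times (bounds check, __skb_put(), memcpy()), so a concurrent writer could pass the check with a small value and then grow it before the copy; always copying through a once-sampled local closes that window.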