diff mbox

udp: ipv4: must add synchronization in udp_sk_rx_dst_set()

Message ID 1386802011.19078.4.camel@edumazet-glaptop2.roam.corp.google.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Dec. 11, 2013, 10:46 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

Unlike TCP, UDP input path does not hold the socket lock.

Before messing with sk->sk_rx_dst, we must use a spinlock, otherwise
multiple cpus could leak a refcount.

This patch also takes care of renewing a stale dst entry.
(When the sk->sk_rx_dst would not be used by IP early demux)

Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
---
 net/ipv4/udp.c |   22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Dec. 12, 2013, 1:22 a.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 11 Dec 2013 14:46:51 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Unlike TCP, UDP input path does not hold the socket lock.
> 
> Before messing with sk->sk_rx_dst, we must use a spinlock, otherwise
> multiple cpus could leak a refcount.
> 
> This patch also takes care of renewing a stale dst entry.
> (When the sk->sk_rx_dst would not be used by IP early demux)
> 
> Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

Longterm, perhaps a candidate for using xchg() instead of this
spinlock?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Dec. 15, 2013, 6:30 p.m. UTC | #2
On Wed, 2013-12-11 at 20:22 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 11 Dec 2013 14:46:51 -0800
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Unlike TCP, UDP input path does not hold the socket lock.
> > 
> > Before messing with sk->sk_rx_dst, we must use a spinlock, otherwise
> > multiple cpus could leak a refcount.
> > 
> > This patch also takes care of renewing a stale dst entry.
> > (When the sk->sk_rx_dst would not be used by IP early demux)
> > 
> > Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> Applied.
> 
> Longterm, perhaps a candidate for using xchg() instead of this
> spinlock?

Indeed... It appears sk_dst_lock should not be used from BH context
anyway...

I am sending a fix, thanks.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 16d246a51a02..62c19fdd102d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1599,12 +1599,21 @@  static void flush_stack(struct sock **stack, unsigned int count,
 		kfree_skb(skb1);
 }
 
-static void udp_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
+/* For TCP sockets, sk_rx_dst is protected by socket lock
+ * For UDP, we use sk_dst_lock to guard against concurrent changes.
+ */
+static void udp_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst)
 {
-	struct dst_entry *dst = skb_dst(skb);
+	struct dst_entry *old;
 
-	dst_hold(dst);
-	sk->sk_rx_dst = dst;
+	spin_lock(&sk->sk_dst_lock);
+	old = sk->sk_rx_dst;
+	if (likely(old != dst)) {
+		dst_hold(dst);
+		sk->sk_rx_dst = dst;
+		dst_release(old);
+	}
+	spin_unlock(&sk->sk_dst_lock);
 }
 
 /*
@@ -1737,10 +1746,11 @@  int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 
 	sk = skb_steal_sock(skb);
 	if (sk) {
+		struct dst_entry *dst = skb_dst(skb);
 		int ret;
 
-		if (unlikely(sk->sk_rx_dst == NULL))
-			udp_sk_rx_dst_set(sk, skb);
+		if (unlikely(sk->sk_rx_dst != dst))
+			udp_sk_rx_dst_set(sk, dst);
 
 		ret = udp_queue_rcv_skb(sk, skb);
 		sock_put(sk);