diff mbox

[net,v2] net: Do not hold the reference for the same sk_rx_dst

Message ID 1489888121-1051-1-git-send-email-kaiwen.xu@hulu.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Kevin Xu March 19, 2017, 1:48 a.m. UTC
In some rare cases, inet_sk_rx_dst_set() may be called multiple times
on the same dst, causing double refcounting. Eventually, it
prevents net_device to be destroyed. The bug manifested as

unregister_netdevice: waiting for lo to become free. Usage count = 1

in the kernel log, preventing new network namespace creation.

Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>
---
 net/ipv4/tcp_ipv4.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Cong Wang March 19, 2017, 3:49 a.m. UTC | #1
On Sat, Mar 18, 2017 at 6:48 PM, Kevin Xu <kaiwen.xu@hulu.com> wrote:
> In some rare cases, inet_sk_rx_dst_set() may be called multiple times
> on the same dst, causing double refcounting. Eventually, it
> prevents net_device to be destroyed. The bug manifested as
>
> unregister_netdevice: waiting for lo to become free. Usage count = 1
>
> in the kernel log, preventing new network namespace creation.
>
> Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>

Don't know why you don't follow the discussion on your v1...

It is protected by bh_lock_sock(), so your patch is not needed
at all.

Read net/ipv4/udp.c:

1762 /* For TCP sockets, sk_rx_dst is protected by socket lock
1763  * For UDP, we use xchg() to guard against concurrent changes.
1764  */
Kevin Xu March 19, 2017, 4:03 a.m. UTC | #2
On Sat, Mar 18, 2017 at 08:49:43PM -0700, Cong Wang wrote:
> On Sat, Mar 18, 2017 at 6:48 PM, Kevin Xu <kaiwen.xu@hulu.com> wrote:
> > In some rare cases, inet_sk_rx_dst_set() may be called multiple times
> > on the same dst, causing double refcounting. Eventually, it
> > prevents net_device to be destroyed. The bug manifested as
> >
> > unregister_netdevice: waiting for lo to become free. Usage count = 1
> >
> > in the kernel log, preventing new network namespace creation.
> >
> > Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>
> 
> Don't know why you don't follow the discussion on your v1...
> 
> It is protected by bh_lock_sock(), so your patch is not needed
> at all.
> 
> Read net/ipv4/udp.c:
> 
> 1762 /* For TCP sockets, sk_rx_dst is protected by socket lock
> 1763  * For UDP, we use xchg() to guard against concurrent changes.
> 1764  */

I probably misunderstood. Do you mean v2 patch is actually not needed or
the whole workaround is not necessary?
Cong Wang March 20, 2017, 4:09 a.m. UTC | #3
On Sat, Mar 18, 2017 at 9:03 PM, Kaiwen Xu <kaiwen.xu@hulu.com> wrote:
> On Sat, Mar 18, 2017 at 08:49:43PM -0700, Cong Wang wrote:
>> On Sat, Mar 18, 2017 at 6:48 PM, Kevin Xu <kaiwen.xu@hulu.com> wrote:
>> > In some rare cases, inet_sk_rx_dst_set() may be called multiple times
>> > on the same dst, causing double refcounting. Eventually, it
>> > prevents net_device to be destroyed. The bug manifested as
>> >
>> > unregister_netdevice: waiting for lo to become free. Usage count = 1
>> >
>> > in the kernel log, preventing new network namespace creation.
>> >
>> > Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>
>>
>> Don't know why you don't follow the discussion on your v1...
>>
>> It is protected by bh_lock_sock(), so your patch is not needed
>> at all.
>>
>> Read net/ipv4/udp.c:
>>
>> 1762 /* For TCP sockets, sk_rx_dst is protected by socket lock
>> 1763  * For UDP, we use xchg() to guard against concurrent changes.
>> 1764  */
>
> I probably misunderstood. Do you mean v2 patch is actually not needed or
> the whole workaround is not necessary?

Your patch, no matter v1 or v2, is not needed because we use
bh_lock_sock() to serialize inet_sk_rx_dst_set(), unless you find
a case where we miss the bh_lock_sock(), but you don't say it in
your changelog. "some rare cases" is not enough to justify this bug.
Kevin Xu March 20, 2017, 9:23 p.m. UTC | #4
On Sun, Mar 19, 2017 at 09:09:38PM -0700, Cong Wang wrote:
> On Sat, Mar 18, 2017 at 9:03 PM, Kaiwen Xu <kaiwen.xu@hulu.com> wrote:
> > On Sat, Mar 18, 2017 at 08:49:43PM -0700, Cong Wang wrote:
> >> On Sat, Mar 18, 2017 at 6:48 PM, Kevin Xu <kaiwen.xu@hulu.com> wrote:
> >> > In some rare cases, inet_sk_rx_dst_set() may be called multiple times
> >> > on the same dst, causing double refcounting. Eventually, it
> >> > prevents net_device to be destroyed. The bug manifested as
> >> >
> >> > unregister_netdevice: waiting for lo to become free. Usage count = 1
> >> >
> >> > in the kernel log, preventing new network namespace creation.
> >> >
> >> > Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>
> >>
> >> Don't know why you don't follow the discussion on your v1...
> >>
> >> It is protected by bh_lock_sock(), so your patch is not needed
> >> at all.
> >>
> >> Read net/ipv4/udp.c:
> >>
> >> 1762 /* For TCP sockets, sk_rx_dst is protected by socket lock
> >> 1763  * For UDP, we use xchg() to guard against concurrent changes.
> >> 1764  */
> >
> > I probably misunderstood. Do you mean v2 patch is actually not needed or
> > the whole workaround is not necessary?
> 
> Your patch, no matter v1 or v2, is not needed because we use
> bh_lock_sock() to serialize inet_sk_rx_dst_set(), unless you find
> a case where we miss the bh_lock_sock(), but you don't say it in
> your changelog. "some rare cases" is not enough to justify this bug.

I see, thanks for your explanation! I will try to dig in more to see if
I can find the root cause.
diff mbox

Patch

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 575e19d..ca588d1 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1806,9 +1806,11 @@  int tcp_v4_rcv(struct sk_buff *skb)
 void inet_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
+	struct dst_entry *old;
 
 	if (dst && dst_hold_safe(dst)) {
-		sk->sk_rx_dst = dst;
+		old = xchg(&sk->sk_rx_dst, dst);
+		dst_release(old);
 		inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
 	}
 }