Message ID | 1489888121-1051-1-git-send-email-kaiwen.xu@hulu.com |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
On Sat, Mar 18, 2017 at 6:48 PM, Kevin Xu <kaiwen.xu@hulu.com> wrote: > In some rare cases, inet_sk_rx_dst_set() may be called multiple times > on the same dst, causing double refcounting. Eventually, it > prevents net_device to be destroyed. The bug manifested as > > unregister_netdevice: waiting for lo to become free. Usage count = 1 > > in the kernel log, preventing new network namespace creation. > > Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com> Don't know why you don't follow the discussion on your v1... It is protected by bh_lock_sock(), so your patch is not needed at all. Read net/ipv4/udp.c: 1762 /* For TCP sockets, sk_rx_dst is protected by socket lock 1763 * For UDP, we use xchg() to guard against concurrent changes. 1764 */
On Sat, Mar 18, 2017 at 08:49:43PM -0700, Cong Wang wrote: > On Sat, Mar 18, 2017 at 6:48 PM, Kevin Xu <kaiwen.xu@hulu.com> wrote: > > In some rare cases, inet_sk_rx_dst_set() may be called multiple times > > on the same dst, causing double refcounting. Eventually, it > > prevents net_device to be destroyed. The bug manifested as > > > > unregister_netdevice: waiting for lo to become free. Usage count = 1 > > > > in the kernel log, preventing new network namespace creation. > > > > Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com> > > Don't know why you don't follow the discussion on your v1... > > It is protected by bh_lock_sock(), so your patch is not needed > at all. > > Read net/ipv4/udp.c: > > 1762 /* For TCP sockets, sk_rx_dst is protected by socket lock > 1763 * For UDP, we use xchg() to guard against concurrent changes. > 1764 */ I probably misunderstood. Do you mean v2 patch is actually not needed or the whole workaround is not necessary?
On Sat, Mar 18, 2017 at 9:03 PM, Kaiwen Xu <kaiwen.xu@hulu.com> wrote: > On Sat, Mar 18, 2017 at 08:49:43PM -0700, Cong Wang wrote: >> On Sat, Mar 18, 2017 at 6:48 PM, Kevin Xu <kaiwen.xu@hulu.com> wrote: >> > In some rare cases, inet_sk_rx_dst_set() may be called multiple times >> > on the same dst, causing double refcounting. Eventually, it >> > prevents net_device to be destroyed. The bug manifested as >> > >> > unregister_netdevice: waiting for lo to become free. Usage count = 1 >> > >> > in the kernel log, preventing new network namespace creation. >> > >> > Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com> >> >> Don't know why you don't follow the discussion on your v1... >> >> It is protected by bh_lock_sock(), so your patch is not needed >> at all. >> >> Read net/ipv4/udp.c: >> >> 1762 /* For TCP sockets, sk_rx_dst is protected by socket lock >> 1763 * For UDP, we use xchg() to guard against concurrent changes. >> 1764 */ > > I probably misunderstood. Do you mean v2 patch is actually not needed or > the whole workaround is not necessary? Your patch, no matter v1 or v2, is not needed because we use bh_lock_sock() to serialize inet_sk_rx_dst_set(), unless you find a case where we miss the bh_lock_sock(), but you don't say it in your changelog. "some rare cases" is not enough to justify this bug.
On Sun, Mar 19, 2017 at 09:09:38PM -0700, Cong Wang wrote: > On Sat, Mar 18, 2017 at 9:03 PM, Kaiwen Xu <kaiwen.xu@hulu.com> wrote: > > On Sat, Mar 18, 2017 at 08:49:43PM -0700, Cong Wang wrote: > >> On Sat, Mar 18, 2017 at 6:48 PM, Kevin Xu <kaiwen.xu@hulu.com> wrote: > >> > In some rare cases, inet_sk_rx_dst_set() may be called multiple times > >> > on the same dst, causing double refcounting. Eventually, it > >> > prevents net_device to be destroyed. The bug manifested as > >> > > >> > unregister_netdevice: waiting for lo to become free. Usage count = 1 > >> > > >> > in the kernel log, preventing new network namespace creation. > >> > > >> > Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com> > >> > >> Don't know why you don't follow the discussion on your v1... > >> > >> It is protected by bh_lock_sock(), so your patch is not needed > >> at all. > >> > >> Read net/ipv4/udp.c: > >> > >> 1762 /* For TCP sockets, sk_rx_dst is protected by socket lock > >> 1763 * For UDP, we use xchg() to guard against concurrent changes. > >> 1764 */ > > > > I probably misunderstood. Do you mean v2 patch is actually not needed or > > the whole workaround is not necessary? > > Your patch, no matter v1 or v2, is not needed because we use > bh_lock_sock() to serialize inet_sk_rx_dst_set(), unless you find > a case where we miss the bh_lock_sock(), but you don't say it in > your changelog. "some rare cases" is not enough to justify this bug. I see, thanks for your explanation! I will try to dig in more to see if I can find the root cause.
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 575e19d..ca588d1 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1806,9 +1806,11 @@ int tcp_v4_rcv(struct sk_buff *skb) void inet_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb) { struct dst_entry *dst = skb_dst(skb); + struct dst_entry *old; if (dst && dst_hold_safe(dst)) { - sk->sk_rx_dst = dst; + old = xchg(&sk->sk_rx_dst, dst); + dst_release(old); inet_sk(sk)->rx_dst_ifindex = skb->skb_iif; } }
In some rare cases, inet_sk_rx_dst_set() may be called multiple times on the same dst, causing double refcounting. Eventually, it prevents net_device to be destroyed. The bug manifested as unregister_netdevice: waiting for lo to become free. Usage count = 1 in the kernel log, preventing new network namespace creation. Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com> --- net/ipv4/tcp_ipv4.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)