Message ID | 1393972752.26794.145.camel@edumazet-glaptop2.roam.corp.google.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 04 Mar 2014 14:39:12 -0800 > @@ -767,6 +767,17 @@ void tcp_release_cb(struct sock *sk) > if (flags & (1UL << TCP_TSQ_DEFERRED)) > tcp_tsq_handler(sk); > > + /* Here begins the tricky part : > + * We are called from release_sock() with : > + * 1) BH disabled > + * 2) sk_lock.slock spinlock held > + * 3) socket owned by us (sk->sk_lock.owned == 1) > + * > + * But following code is meant to be called from BH handlers, > + * so we should keep BH disabled, but early release socket ownership > + */ > + sock_release_ownership(sk); > + It really means that sk_lock.owned cannot ever be accessed without the sk_lock spinlock held. Most of this is easy to hand audit, except sock_owned_by_user() which has call sites everywhere. Consider adding a locking assertion to it. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2014-03-05 at 20:59 -0500, David Miller wrote: > It really means that sk_lock.owned cannot ever be accessed without the > sk_lock spinlock held. > > Most of this is easy to hand audit, except sock_owned_by_user() which > has call sites everywhere. > > Consider adding a locking assertion to it. We can do that, but would it be a stable candidate ? What about I send a followup for net-next ? Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 05 Mar 2014 18:06:41 -0800 > On Wed, 2014-03-05 at 20:59 -0500, David Miller wrote: > >> It really means that sk_lock.owned cannot ever be accessed without the >> sk_lock spinlock held. >> >> Most of this is easy to hand audit, except sock_owned_by_user() which >> has call sites everywhere. >> >> Consider adding a locking assertion to it. > > We can do that, but would it be a stable candidate ? > > What about I send a followup for net-next ? Targetting net-next for the assertion is fine. Did you get test results back yet? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2014-03-05 at 21:15 -0500, David Miller wrote: > Targetting net-next for the assertion is fine. > > Did you get test results back yet? Lars gave the results this morning for the first patch. He said he was now testing the official patch. Lets wait a bit more ;) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2014-03-07 at 16:45 +0100, Lars Persson wrote: > So far no deadlock on the socket lock is seen with this patch. I have seen two out-of-memory crashes which leaves the system deadlocked trying to allocate memory. > > Stack traces and kernel log from one of the oom crashes is attached. I was thinking this may be caused by having the dirty_ratio set too high given our product's heavy use of memory. Currently re-testing with dirty_ratio reduced to 5%. > > > [11515.530000] Recording maint invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 > [11515.540000] CPU: 0 PID: 1304 Comm: Recording maint Tainted: G O 3.10.32 #10 > [11515.550000] Stack : 80666e52 00000049 80660000 00000000 805563dc 8b1876f8 8054af80 805d2227 > 00000518 00000000 80656d40 8b1876f8 000007ce 00000002 00000000 8047d858 > 00000000 8003bb64 00000006 00000000 8054d024 > [11515.580000] 8b189a84 8b189a84 8054af80 > 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 00000000 00000000 00000000 8b189a18 > ... Well, this is 3.10.32, with local changes to dirty_ratio.... so I will simply ignore mm issues for the moment. Thanks a lot Lars Tested-by: Lars Persson <lars.persson@axis.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2014-03-07 at 08:01 -0800, Eric Dumazet wrote: > On Fri, 2014-03-07 at 16:45 +0100, Lars Persson wrote: > > So far no deadlock on the socket lock is seen with this patch. I have seen two out-of-memory crashes which leaves the system deadlocked trying to allocate memory. > > > > Stack traces and kernel log from one of the oom crashes is attached. I was thinking this may be caused by having the dirty_ratio set too high given our product's heavy use of memory. Currently re-testing with dirty_ratio reduced to 5%. > > > > > > [11515.530000] Recording maint invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 > > [11515.540000] CPU: 0 PID: 1304 Comm: Recording maint Tainted: G O 3.10.32 #10 > > [11515.550000] Stack : 80666e52 00000049 80660000 00000000 805563dc 8b1876f8 8054af80 805d2227 > > 00000518 00000000 80656d40 8b1876f8 000007ce 00000002 00000000 8047d858 > > 00000000 8003bb64 00000006 00000000 8054d024 > > [11515.580000] 8b189a84 8b189a84 8054af80 > > 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > > 00000000 00000000 00000000 00000000 00000000 00000000 00000000 8b189a18 > > ... > > > Well, this is 3.10.32, with local changes to dirty_ratio.... > > so I will simply ignore mm issues for the moment. > > Thanks a lot Lars > > Tested-by: Lars Persson <lars.persson@axis.com> David, what happened with this patch ? Should I re-submit it ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 10 Mar 2014 09:43:51 -0700 > On Fri, 2014-03-07 at 08:01 -0800, Eric Dumazet wrote: >> On Fri, 2014-03-07 at 16:45 +0100, Lars Persson wrote: >> > So far no deadlock on the socket lock is seen with this patch. I have seen two out-of-memory crashes which leaves the system deadlocked trying to allocate memory. >> > >> > Stack traces and kernel log from one of the oom crashes is attached. I was thinking this may be caused by having the dirty_ratio set too high given our product's heavy use of memory. Currently re-testing with dirty_ratio reduced to 5%. >> > >> > >> > [11515.530000] Recording maint invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 >> > [11515.540000] CPU: 0 PID: 1304 Comm: Recording maint Tainted: G O 3.10.32 #10 >> > [11515.550000] Stack : 80666e52 00000049 80660000 00000000 805563dc 8b1876f8 8054af80 805d2227 >> > 00000518 00000000 80656d40 8b1876f8 000007ce 00000002 00000000 8047d858 >> > 00000000 8003bb64 00000006 00000000 8054d024 >> > [11515.580000] 8b189a84 8b189a84 8054af80 >> > 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 >> > 00000000 00000000 00000000 00000000 00000000 00000000 00000000 8b189a18 >> > ... >> >> >> Well, this is 3.10.32, with local changes to dirty_ratio.... >> >> so I will simply ignore mm issues for the moment. >> >> Thanks a lot Lars >> >> Tested-by: Lars Persson <lars.persson@axis.com> > > David, what happened with this patch ? > > Should I re-submit it ? Yes, please do. Thanks! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/sock.h b/include/net/sock.h index 5c3f7c3624aa..2100d5e8809a 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1488,6 +1488,11 @@ static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb) */ #define sock_owned_by_user(sk) ((sk)->sk_lock.owned) +static inline void sock_release_ownership(struct sock *sk) +{ + sk->sk_lock.owned = 0; +} + /* * Macro so as to not evaluate some arguments when * lockdep is not enabled. diff --git a/net/core/sock.c b/net/core/sock.c index 5b6a9431b017..c0fc6bdad1e3 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2357,10 +2357,13 @@ void release_sock(struct sock *sk) if (sk->sk_backlog.tail) __release_sock(sk); + /* Warning : release_cb() might need to release sk ownership, + * ie call sock_release_ownership(sk) before us. + */ if (sk->sk_prot->release_cb) sk->sk_prot->release_cb(sk); - sk->sk_lock.owned = 0; + sock_release_ownership(sk); if (waitqueue_active(&sk->sk_lock.wq)) wake_up(&sk->sk_lock.wq); spin_unlock_bh(&sk->sk_lock.slock); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index f0eb4e337ec8..17a11e65e57f 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -767,6 +767,17 @@ void tcp_release_cb(struct sock *sk) if (flags & (1UL << TCP_TSQ_DEFERRED)) tcp_tsq_handler(sk); + /* Here begins the tricky part : + * We are called from release_sock() with : + * 1) BH disabled + * 2) sk_lock.slock spinlock held + * 3) socket owned by us (sk->sk_lock.owned == 1) + * + * But following code is meant to be called from BH handlers, + * so we should keep BH disabled, but early release socket ownership + */ + sock_release_ownership(sk); + if (flags & (1UL << TCP_WRITE_TIMER_DEFERRED)) { tcp_write_timer_handler(sk); __sock_put(sk);