Message ID | 1512151616.19682.44.camel@gmail.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
Series | [net] tcp/dccp: block bh before arming time_wait timer | expand |
Acked-by: Maciej Żenczykowski <maze@google.com>
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Fri, 01 Dec 2017 10:06:56 -0800 > From: Eric Dumazet <edumazet@google.com> > > Maciej Żenczykowski reported some panics in tcp_twsk_destructor() > that might be caused by the following bug. > > timewait timer is pinned to the cpu, because we want to transition > timwewait refcount from 0 to 4 in one go, once everything has been > initialized. > > At the time commit ed2e92394589 ("tcp/dccp: fix timewait races in timer > handling") was merged, TCP was always running from BH habdler. > > After commit 5413d1babe8f ("net: do not block BH while processing > socket backlog") we definitely can run tcp_time_wait() from process > context. > > We need to block BH in the critical section so that the pinned timer > has still its purpose. > > This bug is more likely to happen under stress and when very small RTO > are used in datacenter flows. > > Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog") > Signed-off-by: Eric Dumazet <edumazet@google.com> > Reported-by: Maciej Żenczykowski <maze@google.com> Applied and queued up for -stable, thanks Eric.
On Fri, 2017-12-01 at 15:12 -0500, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Fri, 01 Dec 2017 10:06:56 -0800 > > > From: Eric Dumazet <edumazet@google.com> > > > > Maciej Żenczykowski reported some panics in tcp_twsk_destructor() > > that might be caused by the following bug. > > > > timewait timer is pinned to the cpu, because we want to transition > > timwewait refcount from 0 to 4 in one go, once everything has been > > initialized. > > > > At the time commit ed2e92394589 ("tcp/dccp: fix timewait races in > timer > > handling") was merged, TCP was always running from BH habdler. > > > > After commit 5413d1babe8f ("net: do not block BH while processing > > socket backlog") we definitely can run tcp_time_wait() from process > > context. > > > > We need to block BH in the critical section so that the pinned > timer > > has still its purpose. > > > > This bug is more likely to happen under stress and when very small > RTO > > are used in datacenter flows. > > > > Fixes: 5413d1babe8f ("net: do not block BH while processing socket > backlog") > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > Reported-by: Maciej Żenczykowski <maze@google.com> > > Applied and queued up for -stable, thanks Eric. It just occurred to me that we can now revert 614bdd4d6e61d26 ("tcp: must block bh in __inet_twsk_hashdance()")
diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c index abd07a443219853b022bef41cb072e90ff8f07f0..178bb9833311f83205317b07fe64cb2e45a9f734 100644 --- a/net/dccp/minisocks.c +++ b/net/dccp/minisocks.c @@ -57,10 +57,16 @@ void dccp_time_wait(struct sock *sk, int state, int timeo) if (state == DCCP_TIME_WAIT) timeo = DCCP_TIMEWAIT_LEN; + /* tw_timer is pinned, so we need to make sure BH are disabled + * in following section, otherwise timer handler could run before + * we complete the initialization. + */ + local_bh_disable(); inet_twsk_schedule(tw, timeo); /* Linkage updates. */ __inet_twsk_hashdance(tw, sk, &dccp_hashinfo); inet_twsk_put(tw); + local_bh_enable(); } else { /* Sorry, if we're out of memory, just CLOSE this * socket up. We've got bigger problems than diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index e36eff0403f4e80c4f7291a70614f40125652133..b079b619b60ca577d5ef20a5065fce87acecd96c 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -310,10 +310,16 @@ void tcp_time_wait(struct sock *sk, int state, int timeo) if (state == TCP_TIME_WAIT) timeo = TCP_TIMEWAIT_LEN; + /* tw_timer is pinned, so we need to make sure BH are disabled + * in following section, otherwise timer handler could run before + * we complete the initialization. + */ + local_bh_disable(); inet_twsk_schedule(tw, timeo); /* Linkage updates. */ __inet_twsk_hashdance(tw, sk, &tcp_hashinfo); inet_twsk_put(tw); + local_bh_enable(); } else { /* Sorry, if we're out of memory, just CLOSE this * socket up. We've got bigger problems than