Message ID | 4c8332354d9423207740a814ea75706f59ca7e72.1511195030.git.sowmini.varadhan@oracle.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Series | rds-tcp netns delete related fixes |
On 11/30/2017 11:11 AM, Sowmini Varadhan wrote:
> Commit 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
> introduces a regression in rds-tcp netns cleanup. The cleanup_net(),
> (and thus rds_tcp_dev_event notification) is only called from put_net()
> when all netns refcounts go to 0, but this cannot happen if the
> rds_connection itself is holding a c_net ref that it expects to
> release in rds_tcp_kill_sock.
>
> Instead, the rds_tcp_kill_sock callback should make sure to
> tear down state carefully, ensuring that the socket teardown
> is only done after all data-structures and workqs that depend
> on it are quiesced.
>
> The original motivation for commit 8edc3affc077 ("rds: tcp: Take explicit
> refcounts on struct net") was to resolve a race condition reported by
> syzkaller where workqs for tx/rx/connect were triggered after the
> namespace was deleted. Those worker threads should have been
> cancelled/flushed before socket tear-down and indeed,
> rds_conn_path_destroy() does try to sequence this by doing
>      /* cancel cp_send_w */
>      /* cancel cp_recv_w */
>      /* flush cp_down_w */
>      /* free data structures */
> Here the "flush cp_down_w" will trigger rds_conn_shutdown and thus
> invoke rds_tcp_conn_path_shutdown() to close the tcp socket, so that
> we ought to have satisfied the requirement that "socket-close is
> done after all other dependent state is quiesced". However,
> rds_conn_shutdown has a bug in that it *always* triggers the reconnect
> workq (and if connection is successful, we always restart tx/rx
> workqs so with the right timing, we risk the race conditions reported
> by syzkaller).
>
> Netns deletion is like module teardown- no need to restart a
> reconnect in this case. We can use the c_destroy_in_prog bit
> to avoid restarting the reconnect.
>
> Fixes: 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> ---
>  net/rds/connection.c | 3 ++-
>  net/rds/rds.h        | 6 +++---
>  net/rds/tcp.c        | 4 ++--
>  3 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/net/rds/connection.c b/net/rds/connection.c
> index 7ee2d5d..9efc82c 100644
> --- a/net/rds/connection.c
> +++ b/net/rds/connection.c
> @@ -366,6 +366,8 @@ void rds_conn_shutdown(struct rds_conn_path *cp)
>  		 * to the conn hash, so we never trigger a reconnect on this
>  		 * conn - the reconnect is always triggered by the active peer. */
>  		cancel_delayed_work_sync(&cp->cp_conn_w);
> +		if (conn->c_destroy_in_prog)
> +			return;

Not related to this patch but it will be more safe to use cp_flags
or if needed add flag and conn level for bundle and use bit wise
to avoid possible races to set c_destroy_in_prog. Something similar
to RDS_DESTROY_PENDING etc.

The patch itself looks good to me in terms of netns ref counting.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
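
The review comment above suggests tracking teardown with an atomically updated flag bit rather than a plain field. The following is only a userspace sketch of that idea using C11 atomics; it is not kernel code, and `demo_conn_path`, `DEMO_DESTROY_PENDING` and the helper names are hypothetical. In the kernel the analogous operations would presumably be the atomic bitops applied to `cp_flags`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit, named after the RDS_DESTROY_PENDING idea in the review. */
#define DEMO_DESTROY_PENDING (1UL << 0)

/* Hypothetical stand-in for a connection path; flags plays the role of cp_flags. */
struct demo_conn_path {
	atomic_ulong flags;
};

void demo_path_init(struct demo_conn_path *cp)
{
	assert(cp);
	atomic_init(&cp->flags, 0UL);
}

/*
 * Atomically set the destroy-pending bit.
 * Returns true only for the caller that actually flipped the bit, so two
 * racing teardown paths cannot both believe they started the destroy.
 */
bool demo_mark_destroy_pending(struct demo_conn_path *cp)
{
	unsigned long old = atomic_fetch_or(&cp->flags, DEMO_DESTROY_PENDING);

	return (old & DEMO_DESTROY_PENDING) == 0;
}

/* Readers (e.g. a shutdown path deciding whether to requeue) test the bit. */
bool demo_destroy_pending(struct demo_conn_path *cp)
{
	return (atomic_load(&cp->flags) & DEMO_DESTROY_PENDING) != 0;
}
```

Because the read-modify-write is a single atomic operation, a second caller of `demo_mark_destroy_pending()` sees the bit already set and gets `false` back, which is the property a plain assignment to a shared field does not give.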
diff --git a/net/rds/connection.c b/net/rds/connection.c
index 7ee2d5d..9efc82c 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -366,6 +366,8 @@ void rds_conn_shutdown(struct rds_conn_path *cp)
 		 * to the conn hash, so we never trigger a reconnect on this
 		 * conn - the reconnect is always triggered by the active peer. */
 		cancel_delayed_work_sync(&cp->cp_conn_w);
+		if (conn->c_destroy_in_prog)
+			return;
 		rcu_read_lock();
 		if (!hlist_unhashed(&conn->c_hash_node)) {
 			rcu_read_unlock();
@@ -445,7 +447,6 @@ void rds_conn_destroy(struct rds_connection *conn)
 	 */
 	rds_cong_remove_conn(conn);

-	put_net(conn->c_net);
 	kfree(conn->c_path);
 	kmem_cache_free(rds_conn_slab, conn);

diff --git a/net/rds/rds.h b/net/rds/rds.h
index 2e0315b..d11301b 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -149,7 +149,7 @@ struct rds_connection {

 	/* Protocol version */
 	unsigned int		c_version;
-	struct net		*c_net;
+	possible_net_t		c_net;

 	struct list_head	c_map_item;
 	unsigned long		c_map_queued;
@@ -164,13 +164,13 @@ struct rds_connection {
 static inline
 struct net *rds_conn_net(struct rds_connection *conn)
 {
-	return conn->c_net;
+	return read_pnet(&conn->c_net);
 }

 static inline
 void rds_conn_net_set(struct rds_connection *conn, struct net *net)
 {
-	conn->c_net = get_net(net);
+	write_pnet(&conn->c_net, net);
 }

 #define RDS_FLAG_CONG_BITMAP	0x01
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 222cc53..f580f72 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -506,7 +506,7 @@ static void rds_tcp_kill_sock(struct net *net)
 	rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w);
 	spin_lock_irq(&rds_tcp_conn_lock);
 	list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) {
-		struct net *c_net = tc->t_cpath->cp_conn->c_net;
+		struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net);

 		if (net != c_net || !tc->t_sock)
 			continue;
@@ -563,7 +563,7 @@ static void rds_tcp_sysctl_reset(struct net *net)

 	spin_lock_irq(&rds_tcp_conn_lock);
 	list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) {
-		struct net *c_net = tc->t_cpath->cp_conn->c_net;
+		struct net *c_net = read_pnet(&tc->t_cpath->cp_conn->c_net);

 		if (net != c_net || !tc->t_sock)
 			continue;
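
The diff replaces a counted `struct net *` (taken with get_net() and dropped with put_net()) by a `possible_net_t` accessed through read_pnet()/write_pnet(), which record the namespace without holding a reference. The toy userspace model below illustrates why that matters for cleanup. It is a sketch under simplifying assumptions, not the kernel implementation: every `demo_` name is hypothetical, the refcount is a plain int, and "cleanup" is just a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct net: a refcount plus a "cleanup ran" marker. */
struct demo_net {
	int  refs;
	bool cleaned_up;
};

/* Counted reference, in the spirit of get_net(). */
struct demo_net *demo_get_net(struct demo_net *net)
{
	net->refs++;
	return net;
}

/* In the spirit of put_net(): cleanup runs only when the last ref is dropped. */
void demo_put_net(struct demo_net *net)
{
	assert(net->refs > 0);
	if (--net->refs == 0)
		net->cleaned_up = true;
}

/* Toy stand-in for possible_net_t: remembers the pointer, takes no reference. */
typedef struct {
	struct demo_net *net;
} demo_pnet_t;

void demo_write_pnet(demo_pnet_t *pnet, struct demo_net *net)
{
	pnet->net = net;	/* note: refs is deliberately left untouched */
}

struct demo_net *demo_read_pnet(const demo_pnet_t *pnet)
{
	return pnet->net;
}
```

In this model, if a connection calls `demo_get_net()` and only intends to call `demo_put_net()` from the cleanup path, the count never reaches zero and the cleanup never starts. With `demo_write_pnet()` the owner's final `demo_put_net()` triggers cleanup, and the teardown code then becomes responsible for not using the stored pointer afterwards.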
Commit 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
introduces a regression in rds-tcp netns cleanup. The cleanup_net(),
(and thus rds_tcp_dev_event notification) is only called from put_net()
when all netns refcounts go to 0, but this cannot happen if the
rds_connection itself is holding a c_net ref that it expects to
release in rds_tcp_kill_sock.

Instead, the rds_tcp_kill_sock callback should make sure to
tear down state carefully, ensuring that the socket teardown
is only done after all data-structures and workqs that depend
on it are quiesced.

The original motivation for commit 8edc3affc077 ("rds: tcp: Take explicit
refcounts on struct net") was to resolve a race condition reported by
syzkaller where workqs for tx/rx/connect were triggered after the
namespace was deleted. Those worker threads should have been
cancelled/flushed before socket tear-down and indeed,
rds_conn_path_destroy() does try to sequence this by doing
     /* cancel cp_send_w */
     /* cancel cp_recv_w */
     /* flush cp_down_w */
     /* free data structures */
Here the "flush cp_down_w" will trigger rds_conn_shutdown and thus
invoke rds_tcp_conn_path_shutdown() to close the tcp socket, so that
we ought to have satisfied the requirement that "socket-close is
done after all other dependent state is quiesced". However,
rds_conn_shutdown has a bug in that it *always* triggers the reconnect
workq (and if connection is successful, we always restart tx/rx
workqs so with the right timing, we risk the race conditions reported
by syzkaller).

Netns deletion is like module teardown- no need to restart a
reconnect in this case. We can use the c_destroy_in_prog bit
to avoid restarting the reconnect.

Fixes: 8edc3affc077 ("rds: tcp: Take explicit refcounts on struct net")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 net/rds/connection.c | 3 ++-
 net/rds/rds.h        | 6 +++---
 net/rds/tcp.c        | 4 ++--
 3 files changed, 7 insertions(+), 6 deletions(-)
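
The teardown sequence described in the commit message, and the effect of the new c_destroy_in_prog check, can be modelled in a few lines of userspace C. This is a simplified sketch and not the RDS code: the `demo_` names are hypothetical, work queues are reduced to a counter of queued reconnects, and it assumes the destroy flag is set before the teardown steps begin.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for an RDS connection. */
struct demo_conn {
	bool destroy_in_prog;	/* plays the role of c_destroy_in_prog */
	bool sock_open;
	int  reconnects_queued;	/* stands in for queueing the reconnect work */
};

/*
 * Models the shutdown path: close the socket, then requeue a reconnect
 * unless the connection is being destroyed.
 */
void demo_conn_shutdown(struct demo_conn *conn)
{
	conn->sock_open = false;
	if (conn->destroy_in_prog)
		return;		/* the early return the patch adds */
	conn->reconnects_queued++;
}

/* Models the destroy ordering listed in the commit message. */
void demo_conn_destroy(struct demo_conn *conn)
{
	conn->destroy_in_prog = true;	/* assumed to happen first */
	/* cancel send work: nothing to model here */
	/* cancel recv work: nothing to model here */
	demo_conn_shutdown(conn);	/* stands in for flushing the down work */
	assert(!conn->sock_open);	/* socket is closed before data is freed */
	/* free data structures: nothing to model here */
}
```

Without the early return, `demo_conn_destroy()` would leave a reconnect queued after the socket was closed, which corresponds to the window in which the tx/rx work could be restarted on a namespace that is going away.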