Message ID | 1529934085-181126-1-git-send-email-sowmini.varadhan@oracle.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
Series | [net-next] rds: clean up loopback rds_connections on netns deletion | expand |
On (06/25/18 06:41), Sowmini Varadhan wrote: : > Add the changes aligned with the changes from > commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize > netns/module teardown and rds connection/workq management") for > rds_loop_transport FWIW, I am optimistic that this will take care of a number of the use-after-free panics reported by syzbot (I have not marked the patch with the recommended syzkaller Reported-by tags because I was not able to reproduce each original issue, but inspection of the traces suggests this missing patch may be behind the races that cause the reports). --Sowmini
From: Sowmini Varadhan <sowmini.varadhan@oracle.com> Date: Mon, 25 Jun 2018 06:41:25 -0700 > The RDS core module creates rds_connections based on callbacks > from rds_loop_transport when sending/receiving packets to local > addresses. > > These connections will need to be cleaned up when they are > created from a netns that is not init_net, and that netns is deleted. > > Add the changes aligned with the changes from > commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize > netns/module teardown and rds connection/workq management") for > rds_loop_transport > > Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> > Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Since this probably fixes syzbot reports, this can be targetted at 'net' instead?
On (06/26/18 22:23), David Miller wrote: > > Since this probably fixes syzbot reports, this can be targetted > at 'net' instead? that thought occurred to me but I wanted to be conservative and have it in net-next first, have the syzkaller-bugs team confirm the the fixes and then backport to earlier kernels (if needed).. --Sowmini
From: Sowmini Varadhan <sowmini.varadhan@oracle.com> Date: Tue, 26 Jun 2018 09:40:43 -0400 > On (06/26/18 22:23), David Miller wrote: >> >> Since this probably fixes syzbot reports, this can be targetted >> at 'net' instead? > > that thought occurred to me but I wanted to be conservative and have > it in net-next first, have the syzkaller-bugs team confirm the > the fixes and then backport to earlier kernels (if needed).. I think there is a way to ask syzbot to test a patch in an email.
On (06/26/18 23:29), David Miller wrote: > > I think there is a way to ask syzbot to test a patch in an > email. Dmitry/syzkaller-bugs, can you clarify? This is for the cluster of dup reports like https://groups.google.com/forum/#!topic/syzkaller-bugs/zBph8Vu-q2U and (most recently) https://www.spinics.net/lists/linux-rdma/msg66020.html as I understand it, if there is no reproducer, you cannot really have a pass/fail test to confirm the fix. --Sowmini
On Tue, Jun 26, 2018 at 4:44 PM, Sowmini Varadhan <sowmini.varadhan@oracle.com> wrote: > On (06/26/18 23:29), David Miller wrote: >> >> I think there is a way to ask syzbot to test a patch in an >> email. > > Dmitry/syzkaller-bugs, can you clarify? > > This is for the cluster of dup reports like > https://groups.google.com/forum/#!topic/syzkaller-bugs/zBph8Vu-q2U > and (most recently) > https://www.spinics.net/lists/linux-rdma/msg66020.html > > as I understand it, if there is no reproducer, you cannot really > have a pass/fail test to confirm the fix. This bug has a reproducer as far as I see: https://syzkaller.appspot.com/bug?id=f4ef381349e100280193c25f24e01d9d364132d9 It seems to be a subtle race since syzbot did not progress with minimization too much: https://syzkaller.appspot.com/text?tag=ReproSyz&x=16cbfeaf800000 it probably hit the race by a pure luck of the large program, but then never had the same luck when tried to remove any syscalls. So it can make sense to submit several test requests to get more testing.
On (06/26/18 23:29), David Miller wrote: > >> > >> Since this probably fixes syzbot reports, this can be targetted > >> at 'net' instead? > > > > that thought occurred to me but I wanted to be conservative and have > > it in net-next first, have the syzkaller-bugs team confirm the > > the fixes and then backport to earlier kernels (if needed).. > > I think there is a way to ask syzbot to test a patch in an > email. and just to add, the fix itself is logically correct, so belongs in net-next. What I dont have (and therefore did not target net) is official confirmation that the syzbot failures are root-caused to the absence of this patch (since there is no reproducer for many of these, and no crash dumps available from syzbot). --Sowmini
On (06/26/18 16:48), Dmitry Vyukov wrote: > it probably hit the race by a pure luck of the large program, but then > never had the same luck when tried to remove any syscalls. > So it can make sense to submit several test requests to get more testing. How does one submit test requests by email? the last time I asked this question, the answer was a pointer to https://groups.google.com/forum/#!msg/syzkaller-bugs/7ucgCkAJKSk/skZjgavRAQAJ Thanks --Sowmini
On Tue, Jun 26, 2018 at 5:04 PM, Sowmini Varadhan <sowmini.varadhan@oracle.com> wrote: > On (06/26/18 16:48), Dmitry Vyukov wrote: >> it probably hit the race by a pure luck of the large program, but then >> never had the same luck when tried to remove any syscalls. >> So it can make sense to submit several test requests to get more testing. > > How does one submit test requests by email? https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches > the last time I asked this question, the answer was a pointer to > https://groups.google.com/forum/#!msg/syzkaller-bugs/7ucgCkAJKSk/skZjgavRAQAJ You probably asked to apply an unsubmitted patch to syzbot git tree. That's the question that I gave that link to. But now it's also detailed here: https://github.com/google/syzkaller/blob/master/docs/syzbot.md#no-custom-patches
On (06/26/18 10:53), Sowmini Varadhan wrote: > Date: Tue, 26 Jun 2018 10:53:23 -0400 > From: Sowmini Varadhan <sowmini.varadhan@oracle.com> > To: David Miller <davem@davemloft.net> > Cc: netdev@vger.kernel.org, rds-devel@oss.oracle.com > Subject: Re: [rds-devel] [PATCH net-next] rds: clean up loopback > > and just to add, the fix itself is logically correct, so belongs in > net-next. What I dont have (and therefore did not target net) is > official confirmation that the syzbot failures are root-caused to the > absence of this patch (since there is no reproducer for many of these, > and no crash dumps available from syzbot). > With help from Dmitry, I just got the confirmation from syzbot that "syzbot has tested the proposed patch and the reproducer did not trigger crash:" thus, we can mark this Reported-and-tested-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com and yes, it can target net. Thanks --Sowmini
From: Sowmini Varadhan <sowmini.varadhan@oracle.com> Date: Tue, 26 Jun 2018 12:28:22 -0400 > On (06/26/18 10:53), Sowmini Varadhan wrote: >> Date: Tue, 26 Jun 2018 10:53:23 -0400 >> From: Sowmini Varadhan <sowmini.varadhan@oracle.com> >> To: David Miller <davem@davemloft.net> >> Cc: netdev@vger.kernel.org, rds-devel@oss.oracle.com >> Subject: Re: [rds-devel] [PATCH net-next] rds: clean up loopback >> >> and just to add, the fix itself is logically correct, so belongs in >> net-next. What I dont have (and therefore did not target net) is >> official confirmation that the syzbot failures are root-caused to the >> absence of this patch (since there is no reproducer for many of these, >> and no crash dumps available from syzbot). >> > > With help from Dmitry, I just got the confirmation from syzbot that > "syzbot has tested the proposed patch and the reproducer did not trigger > crash:" > > thus, we can mark this > > Reported-and-tested-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com > > and yes, it can target net. Great, applied, thanks!
diff --git a/net/rds/connection.c b/net/rds/connection.c index abef75d..cfb0595 100644 --- a/net/rds/connection.c +++ b/net/rds/connection.c @@ -659,11 +659,19 @@ static void rds_conn_info(struct socket *sock, unsigned int len, int rds_conn_init(void) { + int ret; + + ret = rds_loop_net_init(); /* register pernet callback */ + if (ret) + return ret; + rds_conn_slab = kmem_cache_create("rds_connection", sizeof(struct rds_connection), 0, 0, NULL); - if (!rds_conn_slab) + if (!rds_conn_slab) { + rds_loop_net_exit(); return -ENOMEM; + } rds_info_register_func(RDS_INFO_CONNECTIONS, rds_conn_info); rds_info_register_func(RDS_INFO_SEND_MESSAGES, @@ -676,6 +684,7 @@ int rds_conn_init(void) void rds_conn_exit(void) { + rds_loop_net_exit(); /* unregister pernet callback */ rds_loop_exit(); WARN_ON(!hlist_empty(rds_conn_hash)); diff --git a/net/rds/loop.c b/net/rds/loop.c index dac6218..feea1f9 100644 --- a/net/rds/loop.c +++ b/net/rds/loop.c @@ -33,6 +33,8 @@ #include <linux/kernel.h> #include <linux/slab.h> #include <linux/in.h> +#include <net/net_namespace.h> +#include <net/netns/generic.h> #include "rds_single_path.h" #include "rds.h" @@ -40,6 +42,17 @@ static DEFINE_SPINLOCK(loop_conns_lock); static LIST_HEAD(loop_conns); +static atomic_t rds_loop_unloading = ATOMIC_INIT(0); + +static void rds_loop_set_unloading(void) +{ + atomic_set(&rds_loop_unloading, 1); +} + +static bool rds_loop_is_unloading(struct rds_connection *conn) +{ + return atomic_read(&rds_loop_unloading) != 0; +} /* * This 'loopback' transport is a special case for flows that originate @@ -165,6 +178,8 @@ void rds_loop_exit(void) struct rds_loop_connection *lc, *_lc; LIST_HEAD(tmp_list); + rds_loop_set_unloading(); + synchronize_rcu(); /* avoid calling conn_destroy with irqs off */ spin_lock_irq(&loop_conns_lock); list_splice(&loop_conns, &tmp_list); @@ -177,6 +192,46 @@ void rds_loop_exit(void) } } +static void rds_loop_kill_conns(struct net *net) +{ + struct rds_loop_connection *lc, *_lc; + LIST_HEAD(tmp_list); + + spin_lock_irq(&loop_conns_lock); + list_for_each_entry_safe(lc, _lc, &loop_conns, loop_node) { + struct net *c_net = read_pnet(&lc->conn->c_net); + + if (net != c_net) + continue; + list_move_tail(&lc->loop_node, &tmp_list); + } + spin_unlock_irq(&loop_conns_lock); + + list_for_each_entry_safe(lc, _lc, &tmp_list, loop_node) { + WARN_ON(lc->conn->c_passive); + rds_conn_destroy(lc->conn); + } +} + +static void __net_exit rds_loop_exit_net(struct net *net) +{ + rds_loop_kill_conns(net); +} + +static struct pernet_operations rds_loop_net_ops = { + .exit = rds_loop_exit_net, +}; + +int rds_loop_net_init(void) +{ + return register_pernet_device(&rds_loop_net_ops); +} + +void rds_loop_net_exit(void) +{ + unregister_pernet_device(&rds_loop_net_ops); +} + /* * This is missing .xmit_* because loop doesn't go through generic * rds_send_xmit() and doesn't call rds_recv_incoming(). .listen_stop and @@ -194,4 +249,5 @@ struct rds_transport rds_loop_transport = { .inc_free = rds_loop_inc_free, .t_name = "loopback", .t_type = RDS_TRANS_LOOP, + .t_unloading = rds_loop_is_unloading, }; diff --git a/net/rds/loop.h b/net/rds/loop.h index 469fa4b..bbc8cdd 100644 --- a/net/rds/loop.h +++ b/net/rds/loop.h @@ -5,6 +5,8 @@ /* loop.c */ extern struct rds_transport rds_loop_transport; +int rds_loop_net_init(void); +void rds_loop_net_exit(void); void rds_loop_exit(void); #endif