Message ID | 87zirtofgp.fsf@x220.int.ebiederm.org
---|---
State | Awaiting Upstream, archived |
Delegated to | David Miller
Eric W. Biederman <ebiederm@xmission.com> wrote:
> Florian could you test and verify this patch fixes your issues?

Yes, this seems to work.

Pablo, I'm fine with this patch going into -nf/stable but I do not think
making the pointers per netns is a desirable option in the long term.

> Unlike the other possibilities that have been discussed this also
> addresses the nf_queue path as well as the nf_queue_hook_drop path.

The nf_queue path should have been fine, no?

Or putting it differently: can we start processing skbs before a netns
is fully initialized?
Florian Westphal <fw@strlen.de> writes:

> Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Florian could you test and verify this patch fixes your issues?
>
> Yes, this seems to work.
>
> Pablo, I'm fine with this patch going into -nf/stable but I do not think
> making the pointers per netns is a desirable option in the long term.
>
>> Unlike the other possibilities that have been discussed this also
>> addresses the nf_queue path as well as the nf_queue_hook_drop path.
>
> The nf_queue path should have been fine, no?
>
> Or putting it differently: can we start processing skbs before a netns
> is fully initialized?

The practical case that worries me is what happens when someone does
"rmmod nfnetlink_queue" while the system is running.  It appears to me
that today we could free the per netns data during the rcu grace period
and cause a similar issue in nfnl_queue_pernet.

That looks like it could affect both the nf_queue path and the
nf_queue_nf_hook_drop path.

Eric
Eric W. Biederman <ebiederm@xmission.com> wrote:
> Florian Westphal <fw@strlen.de> writes:
>
> > Eric W. Biederman <ebiederm@xmission.com> wrote:
> >> Florian could you test and verify this patch fixes your issues?
> >
> > Yes, this seems to work.
> >
> > Pablo, I'm fine with this patch going into -nf/stable but I do not think
> > making the pointers per netns is a desirable option in the long term.
> >
> >> Unlike the other possibilities that have been discussed this also
> >> addresses the nf_queue path as well as the nf_queue_hook_drop path.
> >
> > The nf_queue path should have been fine, no?
> >
> > Or putting it differently: can we start processing skbs before a netns
> > is fully initialized?
>
> The practical case that worries me is what happens when someone does
> "rmmod nfnetlink_queue" while the system is running.  It appears to me
> that today we could free the per netns data during the rcu grace period
> and cause a similar issue in nfnl_queue_pernet.
>
> That looks like it could affect both the nf_queue path and the
> nf_queue_nf_hook_drop path.

OK, I'll check this again but I seem to recall this was fine (the
nfqueue module exit path sets the handler to NULL before doing anything
else).

The normal netns exit path should be fine too, as exit and free happen
in two distinct loops, i.e. while (without your change) we can have
calls to nf_queue_hook_drop after the nfqueue netns exit function was
called, these calls will always happen before the pernet data is freed.
Florian Westphal <fw@strlen.de> writes:

> Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Florian Westphal <fw@strlen.de> writes:
>>
>> > Eric W. Biederman <ebiederm@xmission.com> wrote:
>> >> Florian could you test and verify this patch fixes your issues?
>> >
>> > Yes, this seems to work.
>> >
>> > Pablo, I'm fine with this patch going into -nf/stable but I do not think
>> > making the pointers per netns is a desirable option in the long term.
>> >
>> >> Unlike the other possibilities that have been discussed this also
>> >> addresses the nf_queue path as well as the nf_queue_hook_drop path.
>> >
>> > The nf_queue path should have been fine, no?
>> >
>> > Or putting it differently: can we start processing skbs before a netns
>> > is fully initialized?
>>
>> The practical case that worries me is what happens when someone does
>> "rmmod nfnetlink_queue" while the system is running.  It appears to me
>> that today we could free the per netns data during the rcu grace period
>> and cause a similar issue in nfnl_queue_pernet.
>>
>> That looks like it could affect both the nf_queue path and the
>> nf_queue_nf_hook_drop path.
>
> OK, I'll check this again but I seem to recall this was fine (the
> nfqueue module exit path sets the handler to NULL before doing anything
> else).

Good point.  Yes, the nfnetlink_queue module calls
nf_unregister_queue_handler() in the module fini method before it does
anything else.  That sets queue_handler to NULL and calls
synchronize_rcu() before any other teardown.

So in practice that is not a problem, but being disconnected from
everything else it is not immediately apparent.  Sigh.

> The normal netns exit path should be fine too, as exit and free happen
> in two distinct loops, i.e. while (without your change) we can have
> calls to nf_queue_hook_drop after the nfqueue netns exit function was
> called, these calls will always happen before the pernet data is
> freed.

Ouch.  That is a little scary.
Today we only have remove_proc_entry in nfnl_queue_net_exit.  If we had
something more substantial in those calls after .exit (without my
change), we could get into nasty-to-find oopses.  So I guess I did
prevent a possible future issue with my patch.

I am half wondering if we could make everything simpler by simply not
allowing nfnetlink_queue to be a module.

Eric
diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index 9c5638ad872e..0dbce55437f2 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -28,8 +28,8 @@ struct nf_queue_handler {
 				struct nf_hook_ops *ops);
 };
 
-void nf_register_queue_handler(const struct nf_queue_handler *qh);
-void nf_unregister_queue_handler(void);
+void nf_register_queue_handler(struct net *net, const struct nf_queue_handler *qh);
+void nf_unregister_queue_handler(struct net *net);
 void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict);
 
 void nf_queue_entry_get_refs(struct nf_queue_entry *entry);
diff --git a/include/net/netns/netfilter.h b/include/net/netns/netfilter.h
index 38aa4983e2a9..36d723579af2 100644
--- a/include/net/netns/netfilter.h
+++ b/include/net/netns/netfilter.h
@@ -5,11 +5,13 @@
 
 struct proc_dir_entry;
 struct nf_logger;
+struct nf_queue_handler;
 
 struct netns_nf {
 #if defined CONFIG_PROC_FS
 	struct proc_dir_entry *proc_netfilter;
 #endif
+	const struct nf_queue_handler __rcu *queue_handler;
 	const struct nf_logger __rcu *nf_loggers[NFPROTO_NUMPROTO];
 #ifdef CONFIG_SYSCTL
 	struct ctl_table_header *nf_log_dir_header;
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 5baa8e24e6ac..b19ad20a705c 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -26,23 +26,21 @@
  * Once the queue is registered it must reinject all packets it
  * receives, no matter what.
  */
-static const struct nf_queue_handler __rcu *queue_handler __read_mostly;
 
 /* return EBUSY when somebody else is registered, return EEXIST if the
  * same handler is registered, return 0 in case of success.
  */
-void nf_register_queue_handler(const struct nf_queue_handler *qh)
+void nf_register_queue_handler(struct net *net, const struct nf_queue_handler *qh)
 {
 	/* should never happen, we only have one queueing backend in kernel */
-	WARN_ON(rcu_access_pointer(queue_handler));
-	rcu_assign_pointer(queue_handler, qh);
+	WARN_ON(rcu_access_pointer(net->nf.queue_handler));
+	rcu_assign_pointer(net->nf.queue_handler, qh);
 }
 EXPORT_SYMBOL(nf_register_queue_handler);
 
 /* The caller must flush their queue before this */
-void nf_unregister_queue_handler(void)
+void nf_unregister_queue_handler(struct net *net)
 {
-	RCU_INIT_POINTER(queue_handler, NULL);
-	synchronize_rcu();
+	RCU_INIT_POINTER(net->nf.queue_handler, NULL);
 }
 EXPORT_SYMBOL(nf_unregister_queue_handler);
 
@@ -103,7 +101,7 @@ void nf_queue_nf_hook_drop(struct net *net, struct nf_hook_ops *ops)
 	const struct nf_queue_handler *qh;
 
 	rcu_read_lock();
-	qh = rcu_dereference(queue_handler);
+	qh = rcu_dereference(net->nf.queue_handler);
 	if (qh)
 		qh->nf_hook_drop(net, ops);
 	rcu_read_unlock();
@@ -122,9 +120,10 @@ int nf_queue(struct sk_buff *skb,
 	struct nf_queue_entry *entry = NULL;
 	const struct nf_afinfo *afinfo;
 	const struct nf_queue_handler *qh;
+	struct net *net = state->net;
 
 	/* QUEUE == DROP if no one is waiting, to be safe. */
-	qh = rcu_dereference(queue_handler);
+	qh = rcu_dereference(net->nf.queue_handler);
 	if (!qh) {
 		status = -ESRCH;
 		goto err;
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index cb5b630a645b..2c13e41c508c 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -1377,21 +1377,29 @@ static int __net_init nfnl_queue_net_init(struct net *net)
 			 net->nf.proc_netfilter, &nfqnl_file_ops))
 		return -ENOMEM;
 #endif
+	nf_register_queue_handler(net, &nfqh);
 	return 0;
 }
 
 static void __net_exit nfnl_queue_net_exit(struct net *net)
 {
+	nf_unregister_queue_handler(net);
 #ifdef CONFIG_PROC_FS
 	remove_proc_entry("nfnetlink_queue", net->nf.proc_netfilter);
 #endif
 }
 
+static void nfnl_queue_net_exit_batch(struct list_head *net_exit_list)
+{
+	synchronize_rcu();
+}
+
 static struct pernet_operations nfnl_queue_net_ops = {
-	.init	= nfnl_queue_net_init,
-	.exit	= nfnl_queue_net_exit,
-	.id	= &nfnl_queue_net_id,
-	.size	= sizeof(struct nfnl_queue_net),
+	.init		= nfnl_queue_net_init,
+	.exit		= nfnl_queue_net_exit,
+	.exit_batch	= nfnl_queue_net_exit_batch,
+	.id		= &nfnl_queue_net_id,
+	.size		= sizeof(struct nfnl_queue_net),
 };
 
 static int __init nfnetlink_queue_init(void)
@@ -1412,7 +1420,6 @@ static int __init nfnetlink_queue_init(void)
 	}
 
 	register_netdevice_notifier(&nfqnl_dev_notifier);
-	nf_register_queue_handler(&nfqh);
 	return status;
 
 cleanup_netlink_notifier:
@@ -1424,7 +1431,6 @@ out:
 
 static void __exit nfnetlink_queue_fini(void)
 {
-	nf_unregister_queue_handler();
 	unregister_netdevice_notifier(&nfqnl_dev_notifier);
 	nfnetlink_subsys_unregister(&nfqnl_subsys);
 	netlink_unregister_notifier(&nfqnl_rtnl_notifier);