Message ID | 1381802507-7934-3-git-send-email-horms@verge.net.au |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, Oct 15, 2013 at 11:01:46AM +0900, Simon Horman wrote: > From: Julian Anastasov <ja@ssi.bg> > > commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added > rcu_barrier() on cleanup to wait dest users and schedulers > like LBLC and LBLCR to put their last dest reference. > Using rcu_barrier with many namespaces is problematic. > > Trying to fix it by freeing dest with kfree_rcu is not > a solution, RCU callbacks can run in parallel and execution > order is random. > > Fix it by creating new function ip_vs_dest_put_and_free() > which is heavier than ip_vs_dest_put(). We will use it just > for schedulers like LBLC, LBLCR that can delay their dest > release. > > By default, dests reference is above 0 if they are present in > service and it is 0 when deleted but still in trash list. > Change the dest trash code to use ip_vs_dest_put_and_free(), > so that refcnt -1 can be used for freeing. As result, > such checks remain in slow path and the rcu_barrier() from > netns cleanup can be removed. I can enqueue this fix to nf if you like. No need to resend, I can manually apply. Let me know. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote: > I can enqueue this fix to nf if you like. No need to resend, I can > manually apply. > > Let me know. It is not critical. I waited weeks the net tree to be copied into net-next because it collides with the recent "ipvs: make the service replacement more robust" change in net tree :) But if a rcu_barrier in the netns cleanup looks scary enough you can push it to nf. IMHO, it just adds unneeded delay there. Regards -- Julian Anastasov <ja@ssi.bg> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote: > > Hello, > > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote: > > > I can enqueue this fix to nf if you like. No need to resend, I can > > manually apply. > > > > Let me know. > > It is not critical. I waited weeks the net tree to be > copied into net-next because it collides with the recent > "ipvs: make the service replacement more robust" change in > net tree :) But if a rcu_barrier in the netns cleanup looks > scary enough you can push it to nf. IMHO, it just adds > unneeded delay there. If it is not critical I would prefer for it to travel through nf-next. Though I do not feel strongly about this. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 17, 2013 at 09:49:39AM +0900, Simon Horman wrote: > On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote: > > > > Hello, > > > > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote: > > > > > I can enqueue this fix to nf if you like. No need to resend, I can > > > manually apply. > > > > > > Let me know. > > > > It is not critical. I waited weeks the net tree to be > > copied into net-next because it collides with the recent > > "ipvs: make the service replacement more robust" change in > > net tree :) But if a rcu_barrier in the netns cleanup looks > > scary enough you can push it to nf. IMHO, it just adds > > unneeded delay there. > > If it is not critical I would prefer for it to travel through > nf-next. Though I do not feel strongly about this. Will enqueue for nf-next. I'd appreciate if you can recover the tradition of attaching a short evaluation in the cover letter as I do when I send pull requests to David. Thanks! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 17, 2013 at 10:11:42AM +0200, Pablo Neira Ayuso wrote: > On Thu, Oct 17, 2013 at 09:49:39AM +0900, Simon Horman wrote: > > On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote: > > > > > > Hello, > > > > > > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote: > > > > > > > I can enqueue this fix to nf if you like. No need to resend, I can > > > > manually apply. > > > > > > > > Let me know. > > > > > > It is not critical. I waited weeks the net tree to be > > > copied into net-next because it collides with the recent > > > "ipvs: make the service replacement more robust" change in > > > net tree :) But if a rcu_barrier in the netns cleanup looks > > > scary enough you can push it to nf. IMHO, it just adds > > > unneeded delay there. > > > > If it is not critical I would prefer for it to travel through > > nf-next. Though I do not feel strongly about this. > > Will enqueue for nf-next. > > I'd appreciate if you can recover the tradition of attaching a short > evaluation in the cover letter as I do when I send pull requests to > David. Thanks! Sure, will do. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h index 1c2e1b9..cd7275f 100644 --- a/include/net/ip_vs.h +++ b/include/net/ip_vs.h @@ -1442,6 +1442,12 @@ static inline void ip_vs_dest_put(struct ip_vs_dest *dest) atomic_dec(&dest->refcnt); } +static inline void ip_vs_dest_put_and_free(struct ip_vs_dest *dest) +{ + if (atomic_dec_return(&dest->refcnt) < 0) + kfree(dest); +} + /* * IPVS sync daemon data and function prototypes * (from ip_vs_sync.c) diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index a3df9bd..62786a4 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -704,7 +704,7 @@ static void ip_vs_dest_free(struct ip_vs_dest *dest) __ip_vs_dst_cache_reset(dest); __ip_vs_svc_put(svc, false); free_percpu(dest->stats.cpustats); - kfree(dest); + ip_vs_dest_put_and_free(dest); } /* @@ -3820,10 +3820,6 @@ void __net_exit ip_vs_control_net_cleanup(struct net *net) { struct netns_ipvs *ipvs = net_ipvs(net); - /* Some dest can be in grace period even before cleanup, we have to - * defer ip_vs_trash_cleanup until ip_vs_dest_wait_readers is called. - */ - rcu_barrier(); ip_vs_trash_cleanup(net); ip_vs_stop_estimator(net, &ipvs->tot_stats); ip_vs_control_net_cleanup_sysctl(net); diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c index eff13c9..ca056a3 100644 --- a/net/netfilter/ipvs/ip_vs_lblc.c +++ b/net/netfilter/ipvs/ip_vs_lblc.c @@ -136,7 +136,7 @@ static void ip_vs_lblc_rcu_free(struct rcu_head *head) struct ip_vs_lblc_entry, rcu_head); - ip_vs_dest_put(en->dest); + ip_vs_dest_put_and_free(en->dest); kfree(en); } diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c index 0b85500..3f21a2f 100644 --- a/net/netfilter/ipvs/ip_vs_lblcr.c +++ b/net/netfilter/ipvs/ip_vs_lblcr.c @@ -130,7 +130,7 @@ static void ip_vs_lblcr_elem_rcu_free(struct rcu_head *head) struct ip_vs_dest_set_elem *e; e = container_of(head, struct ip_vs_dest_set_elem, rcu_head); - ip_vs_dest_put(e->dest); + ip_vs_dest_put_and_free(e->dest); kfree(e); }