Patchwork [2/3] ipvs: avoid rcu_barrier during netns cleanup

login
register
mail settings
Submitter Simon Horman
Date Oct. 15, 2013, 2:01 a.m.
Message ID <1381802507-7934-3-git-send-email-horms@verge.net.au>
Download mbox | patch
Permalink /patch/283437/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Simon Horman - Oct. 15, 2013, 2:01 a.m.
From: Julian Anastasov <ja@ssi.bg>

commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added
rcu_barrier() on cleanup to wait dest users and schedulers
like LBLC and LBLCR to put their last dest reference.
Using rcu_barrier with many namespaces is problematic.

Trying to fix it by freeing dest with kfree_rcu is not
a solution, RCU callbacks can run in parallel and execution
order is random.

Fix it by creating new function ip_vs_dest_put_and_free()
which is heavier than ip_vs_dest_put(). We will use it just
for schedulers like LBLC, LBLCR that can delay their dest
release.

By default, dests reference is above 0 if they are present in
service and it is 0 when deleted but still in trash list.
Change the dest trash code to use ip_vs_dest_put_and_free(),
so that refcnt -1 can be used for freeing. As result,
such checks remain in slow path and the rcu_barrier() from
netns cleanup can be removed.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/net/ip_vs.h              | 6 ++++++
 net/netfilter/ipvs/ip_vs_ctl.c   | 6 +-----
 net/netfilter/ipvs/ip_vs_lblc.c  | 2 +-
 net/netfilter/ipvs/ip_vs_lblcr.c | 2 +-
 4 files changed, 9 insertions(+), 7 deletions(-)
Pablo Neira - Oct. 16, 2013, 10:43 a.m.
On Tue, Oct 15, 2013 at 11:01:46AM +0900, Simon Horman wrote:
> From: Julian Anastasov <ja@ssi.bg>
> 
> commit 578bc3ef1e473a ("ipvs: reorganize dest trash") added
> rcu_barrier() on cleanup to wait dest users and schedulers
> like LBLC and LBLCR to put their last dest reference.
> Using rcu_barrier with many namespaces is problematic.
> 
> Trying to fix it by freeing dest with kfree_rcu is not
> a solution, RCU callbacks can run in parallel and execution
> order is random.
> 
> Fix it by creating new function ip_vs_dest_put_and_free()
> which is heavier than ip_vs_dest_put(). We will use it just
> for schedulers like LBLC, LBLCR that can delay their dest
> release.
> 
> By default, dests reference is above 0 if they are present in
> service and it is 0 when deleted but still in trash list.
> Change the dest trash code to use ip_vs_dest_put_and_free(),
> so that refcnt -1 can be used for freeing. As result,
> such checks remain in slow path and the rcu_barrier() from
> netns cleanup can be removed.

I can enqueue this fix to nf if you like. No need to resend, I can
manually apply.

Let me know.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julian Anastasov - Oct. 16, 2013, 7:52 p.m.
Hello,

On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:

> I can enqueue this fix to nf if you like. No need to resend, I can
> manually apply.
> 
> Let me know.

	It is not critical. I waited weeks the net tree to be
copied into net-next because it collides with the recent
"ipvs: make the service replacement more robust" change in
net tree :) But if a rcu_barrier in the netns cleanup looks
scary enough you can push it to nf. IMHO, it just adds
unneeded delay there.

Regards

--
Julian Anastasov <ja@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simon Horman - Oct. 17, 2013, 12:49 a.m.
On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> 
> > I can enqueue this fix to nf if you like. No need to resend, I can
> > manually apply.
> > 
> > Let me know.
> 
> 	It is not critical. I waited weeks the net tree to be
> copied into net-next because it collides with the recent
> "ipvs: make the service replacement more robust" change in
> net tree :) But if a rcu_barrier in the netns cleanup looks
> scary enough you can push it to nf. IMHO, it just adds
> unneeded delay there.

If it is not critical I would prefer for it to travel through
nf-next. Though I do not feel strongly about this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pablo Neira - Oct. 17, 2013, 8:11 a.m.
On Thu, Oct 17, 2013 at 09:49:39AM +0900, Simon Horman wrote:
> On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
> > 
> > 	Hello,
> > 
> > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> > 
> > > I can enqueue this fix to nf if you like. No need to resend, I can
> > > manually apply.
> > > 
> > > Let me know.
> > 
> > 	It is not critical. I waited weeks the net tree to be
> > copied into net-next because it collides with the recent
> > "ipvs: make the service replacement more robust" change in
> > net tree :) But if a rcu_barrier in the netns cleanup looks
> > scary enough you can push it to nf. IMHO, it just adds
> > unneeded delay there.
> 
> If it is not critical I would prefer for it to travel through
> nf-next. Though I do not feel strongly about this.

Will enqueue for nf-next.

I'd appreciate if you can recover the tradition of attaching a short
evaluation in the cover letter as I do when I send pull requests to
David. Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Simon Horman - Oct. 17, 2013, 8:30 a.m.
On Thu, Oct 17, 2013 at 10:11:42AM +0200, Pablo Neira Ayuso wrote:
> On Thu, Oct 17, 2013 at 09:49:39AM +0900, Simon Horman wrote:
> > On Wed, Oct 16, 2013 at 10:52:14PM +0300, Julian Anastasov wrote:
> > > 
> > > 	Hello,
> > > 
> > > On Wed, 16 Oct 2013, Pablo Neira Ayuso wrote:
> > > 
> > > > I can enqueue this fix to nf if you like. No need to resend, I can
> > > > manually apply.
> > > > 
> > > > Let me know.
> > > 
> > > 	It is not critical. I waited weeks the net tree to be
> > > copied into net-next because it collides with the recent
> > > "ipvs: make the service replacement more robust" change in
> > > net tree :) But if a rcu_barrier in the netns cleanup looks
> > > scary enough you can push it to nf. IMHO, it just adds
> > > unneeded delay there.
> > 
> > If it is not critical I would prefer for it to travel through
> > nf-next. Though I do not feel strongly about this.
> 
> Will enqueue for nf-next.
> 
> I'd appreciate if you can recover the tradition of attaching a short
> evaluation in the cover letter as I do when I send pull requests to
> David. Thanks!

Sure, will do.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 1c2e1b9..cd7275f 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1442,6 +1442,12 @@  static inline void ip_vs_dest_put(struct ip_vs_dest *dest)
 	atomic_dec(&dest->refcnt);
 }
 
+static inline void ip_vs_dest_put_and_free(struct ip_vs_dest *dest)
+{
+	if (atomic_dec_return(&dest->refcnt) < 0)
+		kfree(dest);
+}
+
 /*
  *      IPVS sync daemon data and function prototypes
  *      (from ip_vs_sync.c)
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index a3df9bd..62786a4 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -704,7 +704,7 @@  static void ip_vs_dest_free(struct ip_vs_dest *dest)
 	__ip_vs_dst_cache_reset(dest);
 	__ip_vs_svc_put(svc, false);
 	free_percpu(dest->stats.cpustats);
-	kfree(dest);
+	ip_vs_dest_put_and_free(dest);
 }
 
 /*
@@ -3820,10 +3820,6 @@  void __net_exit ip_vs_control_net_cleanup(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
-	/* Some dest can be in grace period even before cleanup, we have to
-	 * defer ip_vs_trash_cleanup until ip_vs_dest_wait_readers is called.
-	 */
-	rcu_barrier();
 	ip_vs_trash_cleanup(net);
 	ip_vs_stop_estimator(net, &ipvs->tot_stats);
 	ip_vs_control_net_cleanup_sysctl(net);
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index eff13c9..ca056a3 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -136,7 +136,7 @@  static void ip_vs_lblc_rcu_free(struct rcu_head *head)
 						   struct ip_vs_lblc_entry,
 						   rcu_head);
 
-	ip_vs_dest_put(en->dest);
+	ip_vs_dest_put_and_free(en->dest);
 	kfree(en);
 }
 
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 0b85500..3f21a2f 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -130,7 +130,7 @@  static void ip_vs_lblcr_elem_rcu_free(struct rcu_head *head)
 	struct ip_vs_dest_set_elem *e;
 
 	e = container_of(head, struct ip_vs_dest_set_elem, rcu_head);
-	ip_vs_dest_put(e->dest);
+	ip_vs_dest_put_and_free(e->dest);
 	kfree(e);
 }