diff mbox

[net-2.6,v2] pkt_sched: gen_estimator: add a new lock

Message ID 1276076350.2442.90.camel@edumazet-laptop
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet June 9, 2010, 9:39 a.m. UTC
Le lundi 07 juin 2010 à 19:18 +0200, Eric Dumazet a écrit :


> [PATCH net-2.6] pkt_sched: gen_estimator: add a new lock
> 
> gen_kill_estimator() / gen_new_estimator() is not always called with
> RTNL held.
> 
> net/netfilter/xt_RATEEST.c is one user of these API that do not hold
> RTNL, so random corruptions can occur between "tc" and "iptables"
> 
> Add a new fine grained lock instead of trying to use RTNL in xt_RATEEST


Here is v2 of patch, adding protection in gen_estimator_active() as
well.

[PATCH net-2.6 v2] pkt_sched: gen_estimator: add a new lock

gen_kill_estimator() / gen_new_estimator() is not always called with
RTNL held.

net/netfilter/xt_RATEEST.c is one user of these API that do not hold
RTNL, so random corruptions can occur between "tc" and "iptables".

Add a new fine grained lock instead of trying to use RTNL in netfilter.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/gen_estimator.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jarek Poplawski June 9, 2010, 11:33 a.m. UTC | #1
On Wed, Jun 09, 2010 at 11:39:10AM +0200, Eric Dumazet wrote:
...
> Here is v2 of patch, adding protection in gen_estimator_active() as
> well.
> 
> [PATCH net-2.6 v2] pkt_sched: gen_estimator: add a new lock
> 
> gen_kill_estimator() / gen_new_estimator() is not always called with
> RTNL held.
> 
> net/netfilter/xt_RATEEST.c is one user of these API that do not hold
> RTNL, so random corruptions can occur between "tc" and "iptables".
> 
> Add a new fine grained lock instead of trying to use RTNL in netfilter.

Btw, maybe it's a different case, but RTNL happens in netfilter already
(nf_conntrack_helper.c, nf_conntrack_proto.c).

It would be nice to mention Changli's argument that
gen_replace_estimator isn't atomic operation after this change.

> @@ -312,8 +315,14 @@ EXPORT_SYMBOL(gen_replace_estimator);
>  bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
>  			  const struct gnet_stats_rate_est *rate_est)
>  {
> +	bool res;
> +
>  	ASSERT_RTNL();

This line should go away as well.

I'm OK with this patch unless there is an alternative xt_RATEEST RTNL
fix available.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet June 9, 2010, 11:55 a.m. UTC | #2
Le mercredi 09 juin 2010 à 11:33 +0000, Jarek Poplawski a écrit :
> On Wed, Jun 09, 2010 at 11:39:10AM +0200, Eric Dumazet wrote:
> ...
> > Here is v2 of patch, adding protection in gen_estimator_active() as
> > well.
> > 
> > [PATCH net-2.6 v2] pkt_sched: gen_estimator: add a new lock
> > 
> > gen_kill_estimator() / gen_new_estimator() is not always called with
> > RTNL held.
> > 
> > net/netfilter/xt_RATEEST.c is one user of these API that do not hold
> > RTNL, so random corruptions can occur between "tc" and "iptables".
> > 
> > Add a new fine grained lock instead of trying to use RTNL in netfilter.
> 
> Btw, maybe it's a different case, but RTNL happens in netfilter already
> (nf_conntrack_helper.c, nf_conntrack_proto.c).
> 
> It would be nice to mention Changli's argument that
> gen_replace_estimator isn't atomic operation after this change.
> 

See my next comment. This function is still used in qdisc domain only.
We could add ASSERT_RTNL() here, but its not a bug fix...

> > @@ -312,8 +315,14 @@ EXPORT_SYMBOL(gen_replace_estimator);
> >  bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
> >  			  const struct gnet_stats_rate_est *rate_est)
> >  {
> > +	bool res;
> > +
> >  	ASSERT_RTNL();
> 
> This line should go away as well.
> 

Nope. This is really needed for safety. Of course, its a debugging knob
and you can remove it if you trust all callers to be good.

Caller should still use RTNL here, or else, as soon as we exit from
gen_estimator_active() with a 'true' result, some other qdisc user could
destroy the estimator.

Fortunatly up to 2.6.35, xt_RATEEST cannot reuse/delete an estimator
created by a qdisc user, and it doesnt use gen_estimator_active().

In a future patch, we should add RCU safety here instead of ASSERT_RTNL.
Thats to make sure caller of gen_estimator_active() is inside a
rcu_read_lock() section, and estimator cannot disappear under it, even
if another cpu/thread calls gen_kill_estimator() on same estimator.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller June 11, 2010, 5:54 a.m. UTC | #3
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 09 Jun 2010 11:39:10 +0200

> [PATCH net-2.6 v2] pkt_sched: gen_estimator: add a new lock
> 
> gen_kill_estimator() / gen_new_estimator() is not always called with
> RTNL held.
> 
> net/netfilter/xt_RATEEST.c is one user of these API that do not hold
> RTNL, so random corruptions can occur between "tc" and "iptables".
> 
> Add a new fine grained lock instead of trying to use RTNL in netfilter.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Wow, a lot of discussion to read and digest for this one :-)

Applied, thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index cf8e703..785e527 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -107,6 +107,7 @@  static DEFINE_RWLOCK(est_lock);
 
 /* Protects against soft lockup during large deletion */
 static struct rb_root est_root = RB_ROOT;
+static DEFINE_SPINLOCK(est_tree_lock);
 
 static void est_timer(unsigned long arg)
 {
@@ -201,7 +202,6 @@  struct gen_estimator *gen_find_node(const struct gnet_stats_basic_packed *bstats
  *
  * Returns 0 on success or a negative error code.
  *
- * NOTE: Called under rtnl_mutex
  */
 int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 		      struct gnet_stats_rate_est *rate_est,
@@ -232,6 +232,7 @@  int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
 
+	spin_lock(&est_tree_lock);
 	if (!elist[idx].timer.function) {
 		INIT_LIST_HEAD(&elist[idx].list);
 		setup_timer(&elist[idx].timer, est_timer, idx);
@@ -242,6 +243,7 @@  int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 
 	list_add_rcu(&est->list, &elist[idx].list);
 	gen_add_node(est);
+	spin_unlock(&est_tree_lock);
 
 	return 0;
 }
@@ -261,13 +263,13 @@  static void __gen_kill_estimator(struct rcu_head *head)
  *
  * Removes the rate estimator specified by &bstats and &rate_est.
  *
- * NOTE: Called under rtnl_mutex
  */
 void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
 			struct gnet_stats_rate_est *rate_est)
 {
 	struct gen_estimator *e;
 
+	spin_lock(&est_tree_lock);
 	while ((e = gen_find_node(bstats, rate_est))) {
 		rb_erase(&e->node, &est_root);
 
@@ -278,6 +280,7 @@  void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
 		list_del_rcu(&e->list);
 		call_rcu(&e->e_rcu, __gen_kill_estimator);
 	}
+	spin_unlock(&est_tree_lock);
 }
 EXPORT_SYMBOL(gen_kill_estimator);
 
@@ -312,8 +315,14 @@  EXPORT_SYMBOL(gen_replace_estimator);
 bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats,
 			  const struct gnet_stats_rate_est *rate_est)
 {
+	bool res;
+
 	ASSERT_RTNL();
 
-	return gen_find_node(bstats, rate_est) != NULL;
+	spin_lock(&est_tree_lock);
+	res = gen_find_node(bstats, rate_est) != NULL;
+	spin_unlock(&est_tree_lock);
+
+	return res;
 }
 EXPORT_SYMBOL(gen_estimator_active);