diff mbox

[net-2.6] pkt_sched: gen_estimator: add a new lock

Message ID 1275931108.2545.168.camel@edumazet-laptop
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet June 7, 2010, 5:18 p.m. UTC
Le lundi 07 juin 2010 à 18:56 +0200, Eric Dumazet a écrit :
> > 
> > For your information, bug is already there before my patch.
> > 
> > So this est_lock is a wrong protection, in the sense its so convoluted
> > that nobody but you and me even noticed it was buggy in the first place.
> > 
> > (see commit 5d944c640b4 for a first patch)
> > 
> > 
> 
> Here is v2 of the patch.
> 
> Even if its a bug correction, I cooked it for net-next-2.6 since bug
> probably never occured, and patch is too large to be sent to
> net-2.6/linux-2.6 before testing.
> 
> Another bug comes from net/netfilter/xt_RATEEST.c : It apparently
> calls gen_kill_estimator() / gen_new_estimator() without holding RTNL ?
> 
> So we should add another lock to protect things (est_root, elist[], ...)
> 
> David, I can send a net-2.6 patch for this one, since it should be small
> enough. If yes, I'll respin this patch of course ;)

[PATCH net-2.6] pkt_sched: gen_estimator: add a new lock

gen_kill_estimator() / gen_new_estimator() is not always called with
RTNL held.

net/netfilter/xt_RATEEST.c is one user of these API that do not hold
RTNL, so random corruptions can occur between "tc" and "iptables"

Add a new fine grained lock instead of trying to use RTNL in xt_RATEEST

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/gen_estimator.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Changli Gao June 8, 2010, 1 a.m. UTC | #1
On Tue, Jun 8, 2010 at 1:18 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> [PATCH net-2.6] pkt_sched: gen_estimator: add a new lock
>
> gen_kill_estimator() / gen_new_estimator() is not always called with
> RTNL held.
>
> net/netfilter/xt_RATEEST.c is one user of these API that do not hold
> RTNL, so random corruptions can occur between "tc" and "iptables"
>
> Add a new fine grained lock instead of trying to use RTNL in xt_RATEEST
>

Why not use RTNL in xt_RATEEST? It seems xt_RATEEST misuse the APIs.

and I think gen_replace_estimator is expected to be an atomic operation.

And gen_estimator_active() is also assumed to be called with RTNL locked.
Eric Dumazet June 8, 2010, 4:30 a.m. UTC | #2
Le mardi 08 juin 2010 à 09:00 +0800, Changli Gao a écrit :
> On Tue, Jun 8, 2010 at 1:18 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> >
> > [PATCH net-2.6] pkt_sched: gen_estimator: add a new lock
> >
> > gen_kill_estimator() / gen_new_estimator() is not always called with
> > RTNL held.
> >
> > net/netfilter/xt_RATEEST.c is one user of these API that do not hold
> > RTNL, so random corruptions can occur between "tc" and "iptables"
> >
> > Add a new fine grained lock instead of trying to use RTNL in xt_RATEEST
> >
> 
> Why not use RTNL in xt_RATEEST? It seems xt_RATEEST misuse the APIs.
> 
> and I think gen_replace_estimator is expected to be an atomic operation.
> 
> And gen_estimator_active() is also assumed to be called with RTNL locked.
> 

Thank you for asking this question.

Because I want to kill RTNL when possible, I dont even want to try
adding RTNL to xt_RATEEST and solve all lock dependencies it might
raise.

RTNL = Big and Horrible Network LOCK

You never got blocked because of this RTNL thing, dont you ?

I did. And it sucks, because when you hit this, you are in a hurry and
locating the bottleneck takes lot of time.

RTNL is the thing we must hold during device register / unregister.
Its locked for long delays because of all synchronize_rcu() that must be
done, and that is already a big problem on some setups.

Every time someone adds a RTNL requirement, you can be sure another guy
will zap it during following ten years.

Let's do this right now, not later.

For an example of horrible rtnl behavior, take a look at following
construct :

if (!rtnl_trylock())
	return restart_syscall();

I saw hundred of udev looping, trying to get rtnl to dump some
information. (Patrick added a rtnl requirement to all dump operations,
and it sucks)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Changli Gao June 8, 2010, 4:57 a.m. UTC | #3
On Tue, Jun 8, 2010 at 12:30 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Thank you for asking this question.
>
> Because I want to kill RTNL when possible, I dont even want to try
> adding RTNL to xt_RATEEST and solve all lock dependencies it might
> raise.
>
> RTNL = Big and Horrible Network LOCK
>

It is much like the BKL, and should be killed too. Thanks. :)
Eric Dumazet June 8, 2010, 4:58 a.m. UTC | #4
Le mardi 08 juin 2010 à 09:00 +0800, Changli Gao a écrit :

> and I think gen_replace_estimator is expected to be an atomic operation.
> 
> And gen_estimator_active() is also assumed to be called with RTNL locked.
> 

My patch fixes a bug of new/kill operators, regardless of RTNL being
held or not. Its should be small enough to be included in linux-2.6.35.

If what you say is right, all gen_replace_estimator() /
gen_estimator_active() callers should still holds RTNL.
I didnt change this part.
If you believe one caller doesnt hold RTNL, please submit another patch.

Then, in net-next-2.6, we can probably cleanup this to remove RTNL
requirement if possible for gen_replace_estimator() /
gen_estimator_active()

Yes, it sounds a bit difficult (three patches instead of a single one),
but this is the how things should be done, step by step.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Changli Gao June 8, 2010, 5:20 a.m. UTC | #5
On Tue, Jun 8, 2010 at 12:58 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mardi 08 juin 2010 à 09:00 +0800, Changli Gao a écrit :
>
>> and I think gen_replace_estimator is expected to be an atomic operation.
>>
>> And gen_estimator_active() is also assumed to be called with RTNL locked.
>>
>
> My patch fixes a bug of new/kill operators, regardless of RTNL being
> held or not. Its should be small enough to be included in linux-2.6.35.
>
> If what you say is right, all gen_replace_estimator() /
> gen_estimator_active() callers should still holds RTNL.
> I didnt change this part.
> If you believe one caller doesnt hold RTNL, please submit another patch.
>
> Then, in net-next-2.6, we can probably cleanup this to remove RTNL
> requirement if possible for gen_replace_estimator() /
> gen_estimator_active()
>
> Yes, it sounds a bit difficult (three patches instead of a single one),
> but this is the how things should be done, step by step.
>

IMO, this bug should be fixed by adding rtnl_lock to xt_RATEEST.c.
Killing rtnl should be done in separated patches. They are different
things. Your patch introduces another locks, and it is extra overhead
for other users.
Eric Dumazet June 8, 2010, 5:39 a.m. UTC | #6
Le mardi 08 juin 2010 à 13:20 +0800, Changli Gao a écrit :

> IMO, this bug should be fixed by adding rtnl_lock to xt_RATEEST.c.
> Killing rtnl should be done in separated patches. They are different
> things. Your patch introduces another locks, and it is extra overhead
> for other users.
> 

extra overhead, in new/kill estimators ? 

Are you kidding ?

RTNL is taken, taking an extra-uncontended spinlock is free.

Nope, I wont add rtnl lock to xt_RATEEST.c

I believe you dont really understood what I patiently explained to you.

Thats becoming rediculous.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index cf8e703..3d11203 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -107,6 +107,7 @@  static DEFINE_RWLOCK(est_lock);
 
 /* Protects against soft lockup during large deletion */
 static struct rb_root est_root = RB_ROOT;
+static DEFINE_SPINLOCK(est_tree_lock);
 
 static void est_timer(unsigned long arg)
 {
@@ -201,7 +202,6 @@  struct gen_estimator *gen_find_node(const struct gnet_stats_basic_packed *bstats
  *
  * Returns 0 on success or a negative error code.
  *
- * NOTE: Called under rtnl_mutex
  */
 int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 		      struct gnet_stats_rate_est *rate_est,
@@ -222,6 +222,7 @@  int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 	if (est == NULL)
 		return -ENOBUFS;
 
+	spin_lock(&est_tree_lock);
 	idx = parm->interval + 2;
 	est->bstats = bstats;
 	est->rate_est = rate_est;
@@ -242,6 +243,7 @@  int gen_new_estimator(struct gnet_stats_basic_packed *bstats,
 
 	list_add_rcu(&est->list, &elist[idx].list);
 	gen_add_node(est);
+	spin_unlock(&est_tree_lock);
 
 	return 0;
 }
@@ -261,13 +263,13 @@  static void __gen_kill_estimator(struct rcu_head *head)
  *
  * Removes the rate estimator specified by &bstats and &rate_est.
  *
- * NOTE: Called under rtnl_mutex
  */
 void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
 			struct gnet_stats_rate_est *rate_est)
 {
 	struct gen_estimator *e;
 
+	spin_lock(&est_tree_lock);
 	while ((e = gen_find_node(bstats, rate_est))) {
 		rb_erase(&e->node, &est_root);
 
@@ -278,6 +280,7 @@  void gen_kill_estimator(struct gnet_stats_basic_packed *bstats,
 		list_del_rcu(&e->list);
 		call_rcu(&e->e_rcu, __gen_kill_estimator);
 	}
+	spin_unlock(&est_tree_lock);
 }
 EXPORT_SYMBOL(gen_kill_estimator);