diff mbox

[1/3] net: serialize hrtimer callback in sched_cbq

Message ID 4A5C402A.7090906@trash.net
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Patrick McHardy July 14, 2009, 8:22 a.m. UTC
David Miller wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Thu, 09 Jul 2009 21:59:22 -0000
> 
>> The hrtimer callback cbq_undelay() is not serialized against
>> cbq_ovl_delay(). That affects at least q->pmask and q->delay_timer.
>>
>> Lock it proper.
>>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> The problems here are even much deeper than it appears.
> 
> First of all, I am to understand that hrtimers run from hardware
> interrupt context, right?  If so, all of these datastructures are
> softirq safe only.
> 
> And it is not merely the immediate things you see being modified in
> this hrtimer, such as ->pmask etc., it is also the q->active[]
> pointers, the list state for the classes, just about everything in the
> qdisc state is referenced in this hrtimer code path.
> 
> I wonder how many queer unexplainable bugs we see because of this.
> 
> What should probably happen is that the hrtimer merely fires off work
> at software interrupt context (perhaps a tasklet or similar), and that
> software interrupt code take the qdisc's root lock throughout it's
> execution.

That's my understanding what HRTIMER_SOFTIRQ is used for. I think
simply grabbing the root lock in cbq_undelay() should be fine.

Compile-tested only.

Comments

Peter Zijlstra July 14, 2009, 8:30 a.m. UTC | #1
On Tue, 2009-07-14 at 10:22 +0200, Patrick McHardy wrote:
> 
> That's my understanding what HRTIMER_SOFTIRQ is used for. I think
> simply grabbing the root lock in cbq_undelay() should be fine.

Its not, and its going away soon (again) :-)

The current use of HRTIMER_SOFTIRQ is for when we enqueue a hrtimer with
an expiration time in the past. The current implementation tries to run
the timer instantly, however we cannot do it from the context calling
hrtimer_start(), since that might be holding locks the timer callback
also wants to hold, resulting in deadlocks.

Instead we queue the timer to the softirq, and kick the softirq. Which
leads to another problem in that we cannot always kick the softirq (esp
from within the scheduler).

We're going to change hrtimer_start() to return -ETIME instead of trying
to run the timer in-place, leaving the callers to figure it out.

The basic patch is done, but I still need to audit all the hrtimer users
in the kernel.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller July 14, 2009, 4:01 p.m. UTC | #2
From: Patrick McHardy <kaber@trash.net>
Date: Tue, 14 Jul 2009 10:22:02 +0200

> That's my understanding what HRTIMER_SOFTIRQ is used for. I think
> simply grabbing the root lock in cbq_undelay() should be fine.
> 
> Compile-tested only.

Unfortunately, as Peter and Thomas explained, this is not the case.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

commit a790fb8873f1cbe8b9cb48cb368851e30d3ec172
Author: Patrick McHardy <kaber@trash.net>
Date:   Tue Jul 14 10:19:47 2009 +0200

    net-sched: sch_cbq: fix locking in cbq_undelay()
    
    The hrtimer callback cbq_undelay() is not serialized against
    cbq_ovl_delay(). That affects at least q->pmask and q->delay_timer.
    
    Lock it proper.
    
    Based on patch by Thomas Gleixner <tglx@linutronix.de>
    
    Signed-off-by: Patrick McHardy <kaber@trash.net>

diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 23a1676..7c659c6 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -593,12 +593,16 @@  static enum hrtimer_restart cbq_undelay(struct hrtimer *timer)
 	struct cbq_sched_data *q = container_of(timer, struct cbq_sched_data,
 						delay_timer);
 	struct Qdisc *sch = q->watchdog.qdisc;
+	spinlock_t *root_lock;
 	psched_time_t now;
 	psched_tdiff_t delay = 0;
 	unsigned pmask;
 
 	now = psched_get_time();
 
+	root_lock = qdisc_lock(qdisc_root(sch));
+	spin_lock(root_lock);
+
 	pmask = q->pmask;
 	q->pmask = 0;
 
@@ -615,6 +619,7 @@  static enum hrtimer_restart cbq_undelay(struct hrtimer *timer)
 				delay = tmp;
 		}
 	}
+	spin_unlock(root_lock);
 
 	if (delay) {
 		ktime_t time;