diff mbox

[1/3] net: serialize hrtimer callback in sched_cbq

Message ID 20090709215606.526259917@linutronix.de
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Thomas Gleixner July 9, 2009, 9:59 p.m. UTC
The hrtimer callback cbq_undelay() is not serialized against
cbq_ovl_delay(). That affects at least q->pmask and q->delay_timer.

Lock it proper.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 net/sched/sch_cbq.c |    8 ++++++++
 1 file changed, 8 insertions(+)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller July 12, 2009, 8:55 p.m. UTC | #1
From: Thomas Gleixner <tglx@linutronix.de>
Date: Thu, 09 Jul 2009 21:59:22 -0000

> The hrtimer callback cbq_undelay() is not serialized against
> cbq_ovl_delay(). That affects at least q->pmask and q->delay_timer.
> 
> Lock it proper.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

The problems here are even much deeper than it appears.

First of all, I am to understand that hrtimers run from hardware
interrupt context, right?  If so, all of these datastructures are
softirq safe only.

And it is not merely the immediate things you see being modified in
this hrtimer, such as ->pmask etc., it is also the q->active[]
pointers, the list state for the classes, just about everything in the
qdisc state is referenced in this hrtimer code path.

I wonder how many queer unexplainable bugs we see because of this.

What should probably happen is that the hrtimer merely fires off work
at software interrupt context (perhaps a tasklet or similar), and that
software interrupt code take the qdisc's root lock throughout it's
execution.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Gleixner July 14, 2009, 8:55 a.m. UTC | #2
David,

On Sun, 12 Jul 2009, David Miller wrote:

> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Thu, 09 Jul 2009 21:59:22 -0000
> 
> > The hrtimer callback cbq_undelay() is not serialized against
> > cbq_ovl_delay(). That affects at least q->pmask and q->delay_timer.
> > 
> > Lock it proper.
> > 
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> 
> The problems here are even much deeper than it appears.
> 
> First of all, I am to understand that hrtimers run from hardware
> interrupt context, right?  If so, all of these datastructures are
> softirq safe only.
> 
> And it is not merely the immediate things you see being modified in
> this hrtimer, such as ->pmask etc., it is also the q->active[]
> pointers, the list state for the classes, just about everything in the
> qdisc state is referenced in this hrtimer code path.

That's what I was worried about.
 
> I wonder how many queer unexplainable bugs we see because of this.
> 
> What should probably happen is that the hrtimer merely fires off work
> at software interrupt context (perhaps a tasklet or similar), and that
> software interrupt code take the qdisc's root lock throughout it's
> execution.

Sigh, I almost expected that the removal of the callback modes will
fire back some day.

Thanks,

	tglx

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller July 14, 2009, 4 p.m. UTC | #3
From: Thomas Gleixner <tglx@linutronix.de>
Date: Tue, 14 Jul 2009 10:55:14 +0200 (CEST)

> David,
> 
> On Sun, 12 Jul 2009, David Miller wrote:
> 
>> What should probably happen is that the hrtimer merely fires off work
>> at software interrupt context (perhaps a tasklet or similar), and that
>> software interrupt code take the qdisc's root lock throughout it's
>> execution.
> 
> Sigh, I almost expected that the removal of the callback modes will
> fire back some day.

Well this makes hrtimers decidedly less useful for networking and we
have a ton of bugs right now, basically in every hrtimer used by the
networking currently.

The only way we can use them, as things currently stand, is as
triggers for softirq work.

Is it really that troublesome to provide this kind of facility
generically, rather than having various subsystems replicate such code
where they want to use hrtimers and are restricted to softirqs?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra July 14, 2009, 4:28 p.m. UTC | #4
On Tue, 2009-07-14 at 09:00 -0700, David Miller wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Tue, 14 Jul 2009 10:55:14 +0200 (CEST)

> > On Sun, 12 Jul 2009, David Miller wrote:
> > 
> >> What should probably happen is that the hrtimer merely fires off work
> >> at software interrupt context (perhaps a tasklet or similar), and that
> >> software interrupt code take the qdisc's root lock throughout it's
> >> execution.
> > 
> > Sigh, I almost expected that the removal of the callback modes will
> > fire back some day.
> 
> Well this makes hrtimers decidedly less useful for networking and we
> have a ton of bugs right now, basically in every hrtimer used by the
> networking currently.
> 
> The only way we can use them, as things currently stand, is as
> triggers for softirq work.
> 
> Is it really that troublesome to provide this kind of facility
> generically, rather than having various subsystems replicate such code
> where they want to use hrtimers and are restricted to softirqs?

Linus really hated the softirq mode, which is what prompted me to change
that.

Now, it might be he only hated the particular interface and the
resulting code, but I think to remember he simply thought the whole
thing daft.

I can look into adding it back if we can agree on the interface and code
impact, but looking at:

# git grep hrtimer_init net/ | sort -u
net/can/bcm.c:          hrtimer_init(&op->thrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
net/can/bcm.c:          hrtimer_init(&op->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
net/sched/sch_api.c:    hrtimer_init(&wd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
net/sched/sch_cbq.c:    hrtimer_init(&q->delay_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);

I wonder if its worth the impact on the core kernel code, or whether its
better for these few timers to kick off a tasklet or the like.

Further, I don't think a lot of subsystems would need this, as the
general trend is away from softirqs/tasklets and towards
threads/workqueues as most people want to schedule. And for those
hardirq hrtimers are good enough as a wakeup source.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds July 14, 2009, 4:42 p.m. UTC | #5
On Tue, 14 Jul 2009, Peter Zijlstra wrote:
> 
> Linus really hated the softirq mode, which is what prompted me to change
> that.
> 
> Now, it might be he only hated the particular interface and the
> resulting code, but I think to remember he simply thought the whole
> thing daft.

Yes. And I hated the bugs it had. 

Don't make something as core as timers any more complicated. Don't take 
locks in timers and then complain about deadlocks. If your locking is 
broken, don't make the core timers be idiotically broken.

Because it was. The code was a total mess to follow, and had bugs.

		Linus
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Hartkopp July 15, 2009, 9:56 a.m. UTC | #6
David Miller wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Tue, 14 Jul 2009 10:55:14 +0200 (CEST)
> 
>> David,
>>
>> On Sun, 12 Jul 2009, David Miller wrote:
>>
>>> What should probably happen is that the hrtimer merely fires off work
>>> at software interrupt context (perhaps a tasklet or similar), and that
>>> software interrupt code take the qdisc's root lock throughout it's
>>> execution.
>> Sigh, I almost expected that the removal of the callback modes will
>> fire back some day.
> 
> Well this makes hrtimers decidedly less useful for networking and we
> have a ton of bugs right now, basically in every hrtimer used by the
> networking currently.

The CAN stuff is clean in this topic. See below.

> 
> The only way we can use them, as things currently stand, is as
> triggers for softirq work.
> 
> Is it really that troublesome to provide this kind of facility
> generically, rather than having various subsystems replicate such code
> where they want to use hrtimers and are restricted to softirqs?

Indeed this had been my concerns also, when i moved the hrtimer usage in a CAN
protocol to use tasklets.

("can: update can-bcm for hrtimer hardirq callbacks")
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=6e5c172cf7ca1ab878cc6a6a4c1d52fef60f3ee0

due to

("hrtimer: removing all ur callback modes")
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ca109491f612aab5c8152207631c0444f63da97f

I was not very amused that time and wanted to NACK that change, but Linus said:

"Quite frankly, your NAK doesn't matter.
We've had too many bugs in hrtimers. They _will_ get simplified."
(http://lkml.indiana.edu/hypermail/linux/kernel/0812.1/00218.html)

Thomas, is there chance to get this nice simple possibility back to invoke at
least a hrtimer for SOFT_IRQ context additional to the current functionality??

I would expect this to save lot's of tasklet code that is - and will be -
created due to the lack of the hrtimers softirq capability ...

FWIK there were really many callback modes before (per-cpu stuff and so), that
were probably too much. But having a SOFT_IRQ callback mode again would really
help.

Regards,
Oliver

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux-2.6/net/sched/sch_cbq.c
===================================================================
--- linux-2.6.orig/net/sched/sch_cbq.c
+++ linux-2.6/net/sched/sch_cbq.c
@@ -163,6 +163,7 @@  struct cbq_sched_data
 	psched_time_t		now_rt;		/* Cached real time */
 	unsigned		pmask;
 
+	spinlock_t		lock;
 	struct hrtimer		delay_timer;
 	struct qdisc_watchdog	watchdog;	/* Watchdog timer,
 						   started when CBQ has
@@ -503,6 +504,9 @@  static void cbq_ovl_delay(struct cbq_cla
 		cl->undertime = q->now + delay;
 
 		if (delay > 0) {
+			unsigned long flags;
+
+			spin_lock_irqsave(&q->lock, flags);
 			sched += delay + cl->penalty;
 			cl->penalized = sched;
 			cl->cpriority = TC_CBQ_MAXPRIO;
@@ -518,6 +522,7 @@  static void cbq_ovl_delay(struct cbq_cla
 			hrtimer_restart(&q->delay_timer);
 			cl->delayed = 1;
 			cl->xstats.overactions++;
+			spin_unlock_irqrestore(&q->lock, flags);
 			return;
 		}
 		delay = 1;
@@ -599,6 +604,7 @@  static enum hrtimer_restart cbq_undelay(
 
 	now = psched_get_time();
 
+	spin_lock(&q->lock);
 	pmask = q->pmask;
 	q->pmask = 0;
 
@@ -623,6 +629,7 @@  static enum hrtimer_restart cbq_undelay(
 		time = ktime_add_ns(time, PSCHED_TICKS2NS(now + delay));
 		hrtimer_start(&q->delay_timer, time, HRTIMER_MODE_ABS);
 	}
+	spin_unlock(&q->lock);
 
 	sch->flags &= ~TCQ_F_THROTTLED;
 	__netif_schedule(qdisc_root(sch));
@@ -1396,6 +1403,7 @@  static int cbq_init(struct Qdisc *sch, s
 	q->link.avpkt = q->link.allot/2;
 	q->link.minidle = -0x7FFFFFFF;
 
+	spin_lock_init(&q->lock);
 	qdisc_watchdog_init(&q->watchdog, sch);
 	hrtimer_init(&q->delay_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
 	q->delay_timer.function = cbq_undelay;