softirq: Use local_irq_save() in local_bh_enable()

Message ID: 20081117133548.GC6345@ff.dom.local
State: Rejected, archived
Delegated to: David Miller

Commit Message

Jarek Poplawski Nov. 17, 2008, 1:35 p.m. UTC
This report: http://marc.info/?l=linux-netdev&m=122599341430090&w=2
shows local_bh_enable() being called in the wrong context (irqs disabled).
It happens when the usual network receive path is called by netconsole,
which simply turns off irqs around all of it. This is probably wrong,
but it has worked this way for a long time, and it is not trivial to fix.

Anyway, commit 0f476b6d91a1395bda6464e653ce66ea9bea7167 "softirq:
remove irqs_disabled warning from local_bh_enable" can break things
because it changed local_irq_save() to local_irq_disable(). Before
this commit there was only a warning; now a lockup is possible, so
it could be treated as a regression. This patch reverts that part of
the change (the irq handling).

Reported-by: Ferenc Wagner <wferi@niif.hu>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 kernel/softirq.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
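
The difference the commit message relies on can be sketched in userspace. This is only a model under stated assumptions: a single boolean stands in for the CPU's interrupt-enable bit, and the model_* and exit_state_* helpers are hypothetical names, not kernel functions. In the kernel, the lines the patch touches are compiled only under CONFIG_TRACE_IRQFLAGS.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: one flag stands in for the CPU's interrupt-enable bit. */
static bool irqs_on = true;

/* Hypothetical stand-ins for local_irq_disable()/local_irq_enable(). */
static void model_irq_disable(void) { irqs_on = false; }
static void model_irq_enable(void)  { irqs_on = true; }

/* Hypothetical stand-ins for local_irq_save()/local_irq_restore(). */
static bool model_irq_save(void)
{
	bool flags = irqs_on;	/* remember the caller's state */

	irqs_on = false;
	return flags;
}

static void model_irq_restore(bool flags) { irqs_on = flags; }

/* Section bracketed by disable/enable: returns the irq state on exit. */
static bool exit_state_disable_enable(bool irqs_on_at_entry)
{
	irqs_on = irqs_on_at_entry;
	model_irq_disable();
	/* ... critical section ... */
	model_irq_enable();	/* unconditionally turns irqs back on */
	return irqs_on;
}

/* Section bracketed by save/restore: returns the irq state on exit. */
static bool exit_state_save_restore(bool irqs_on_at_entry)
{
	bool flags;

	irqs_on = irqs_on_at_entry;
	flags = model_irq_save();
	/* ... critical section ... */
	model_irq_restore(flags);	/* puts back whatever the caller had */
	return irqs_on;
}

/* Self-check of the two cases discussed in the commit message. */
static void check_irq_model(void)
{
	/* Caller had irqs off: disable/enable leaves them on, save/restore does not. */
	assert(exit_state_disable_enable(false) == true);
	assert(exit_state_save_restore(false) == false);
	/* Caller had irqs on: both variants behave the same. */
	assert(exit_state_disable_enable(true) == true);
	assert(exit_state_save_restore(true) == true);
}
```

Calling check_irq_model() from a test program exercises both cases. Only the entry state "irqs off" (the netconsole situation described above) distinguishes the two variants.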

Comments

Johannes Berg Nov. 17, 2008, 2:18 p.m. UTC | #1
On Mon, 2008-11-17 at 13:35 +0000, Jarek Poplawski wrote:
> This report: http://marc.info/?l=linux-netdev&m=122599341430090&w=2
> shows local_bh_enable() is used in the wrong context (irqs disabled).
> It happens when a usual network receive path is called by netconsole,
> which simply turns off irqs around this all. Probably this is wrong,
> but it worked like this long time, and it's not trivial to fix this.

Unfortunately my brain lacks the magic to decrypt x86 stack traces, so
I'm unable to read much from that report other than that it hit the
WARN_ON. That looks more like the TX path to me? Anyway, my patch made
that trigger for everybody rather than just on NOPREEMPT/UP (or
something like that) and made the code easier to understand by removing
the flags that are pointless anyway if the API is used correctly.

You can find discussion around the patch at
http://lkml.org/lkml/2008/6/17/259

> Anyway, a commit 0f476b6d91a1395bda6464e653ce66ea9bea7167 "softirq:
> remove irqs_disabled warning from local_bh_enable" can break things
> after changing from local_irq_save() to local_irq_disable(). Before
> this commit there was only a warning, now a lockup is possible, so
> it could be treated as a regression. This patch reverts the change
> in irqs.

Do we have evidence of this actually hitting often? This is the first
report of anything going wrong that I've seen ever since a single one
right after this commit went into testing five months ago.

IFF we want to add this back (and I'm not in favour) then please add a
big comment that this is only to accommodate broken users.

johannes
Ingo Molnar Nov. 17, 2008, 4:16 p.m. UTC | #2
* Jarek Poplawski <jarkao2@gmail.com> wrote:

> This report: http://marc.info/?l=linux-netdev&m=122599341430090&w=2 
> shows local_bh_enable() is used in the wrong context (irqs 
> disabled). It happens when a usual network receive path is called by 
> netconsole, which simply turns off irqs around this all. Probably 
> this is wrong, but it worked like this long time, and it's not 
> trivial to fix this.
> 
> Anyway, a commit 0f476b6d91a1395bda6464e653ce66ea9bea7167 "softirq: 
> remove irqs_disabled warning from local_bh_enable" can break things 
> after changing from local_irq_save() to local_irq_disable(). Before 
> this commit there was only a warning, now a lockup is possible, so 
> it could be treated as a regression. This patch reverts the change 
> in irqs.

hm, but calling local_bh_enable() with hardirqs off is a bug. It might 
be a long-standing bug, but it can cause lockups even with that change 
reverted: when we process softirqs in local_bh_enable(). So why not 
fix the bug instead?

	Ingo
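
The hazard described here can be modelled in userspace. The sketch assumes, as in kernel/softirq.c of that period, that do_softirq() saves and restores the irq state and that __do_softirq() enables interrupts while the handlers run. All names below are hypothetical stand-ins, and the booleans only model CPU and per-CPU state.

```c
#include <assert.h>
#include <stdbool.h>

static bool irqs_on;			/* models the CPU interrupt-enable flag */
static bool softirq_pending;		/* models local_softirq_pending() */
static bool handler_saw_irqs_on;	/* recorded by the model handler */

/* Stand-in for a softirq handler: it only records the irq state it sees. */
static void model_softirq_handler(void)
{
	handler_saw_irqs_on = irqs_on;
}

/* Modelled on do_softirq()/__do_softirq(): handlers run with irqs enabled. */
static void model_do_softirq(void)
{
	bool flags = irqs_on;		/* local_irq_save(flags) */

	irqs_on = false;
	if (softirq_pending) {
		softirq_pending = false;
		irqs_on = true;		/* assumed local_irq_enable() around handlers */
		model_softirq_handler();
		irqs_on = false;	/* assumed local_irq_disable() afterwards */
	}
	irqs_on = flags;		/* local_irq_restore(flags) */
}

/* Modelled on local_bh_enable() with the save/restore variant applied. */
static void model_local_bh_enable(void)
{
	bool flags = irqs_on;

	irqs_on = false;
	if (softirq_pending)
		model_do_softirq();
	irqs_on = flags;
}

/* Caller has irqs off and a softirq is pending: what does the handler see? */
static bool handler_runs_with_irqs_on(void)
{
	irqs_on = false;
	softirq_pending = true;
	handler_saw_irqs_on = false;
	model_local_bh_enable();
	assert(!irqs_on);		/* the caller's state is restored on return */
	return handler_saw_irqs_on;
}
```

In this model the handler observes interrupts enabled even though the caller entered with them disabled, so restoring the caller's flags on exit does not protect whatever the caller meant to guard by disabling irqs.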
Jarek Poplawski Nov. 18, 2008, 7:38 a.m. UTC | #3
On Mon, Nov 17, 2008 at 05:16:17PM +0100, Ingo Molnar wrote:
> 
> * Jarek Poplawski <jarkao2@gmail.com> wrote:
> 
> > This report: http://marc.info/?l=linux-netdev&m=122599341430090&w=2 
> > shows local_bh_enable() is used in the wrong context (irqs 
> > disabled). It happens when a usual network receive path is called by 
> > netconsole, which simply turns off irqs around this all. Probably 
> > this is wrong, but it worked like this long time, and it's not 
> > trivial to fix this.
> > 
> > Anyway, a commit 0f476b6d91a1395bda6464e653ce66ea9bea7167 "softirq: 
> > remove irqs_disabled warning from local_bh_enable" can break things 
> > after changing from local_irq_save() to local_irq_disable(). Before 
> > this commit there was only a warning, now a lockup is possible, so 
> > it could be treated as a regression. This patch reverts the change 
> > in irqs.
> 
> hm, but calling local_bh_enable() with hardirqs off is a bug. It might 
> be a long-standing bug, but it can cause lockups even with that change 
> reverted: when we process softirqs in local_bh_enable().

I think it's what they call a regression: this is a long-standing bug,
and this commit doesn't fix this, but can cause additional lockups.

> So why not 
> fix the bug instead?

It's not a question of "instead": this bug could be fixed as well (if
somebody knows how to do it "properly", without hacks like:
	if (!in_irq())
		local_bh_disable();
etc.; but, I guess, the network code has more such bh disabling).

Jarek P.
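
For readers unfamiliar with the two predicates in the WARN_ON_ONCE() condition, here is a small userspace model of how they differ. It assumes in_irq() tests the hardirq field of preempt_count while irqs_disabled() tests the CPU interrupt flag. The bit layout and the model_* names are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative layout only: hardirq nesting count kept in bits 16..27. */
#define MODEL_HARDIRQ_SHIFT	16
#define MODEL_HARDIRQ_MASK	(0xfffU << MODEL_HARDIRQ_SHIFT)

struct cpu_model {
	unsigned int preempt_count;	/* models the per-task preempt_count */
	bool irq_flag_on;		/* models the CPU interrupt-enable flag */
};

/* Models in_irq(): true only while a hardirq handler is running. */
static bool model_in_irq(const struct cpu_model *c)
{
	return (c->preempt_count & MODEL_HARDIRQ_MASK) != 0;
}

/* Models irqs_disabled(): true whenever the CPU flag is off. */
static bool model_irqs_disabled(const struct cpu_model *c)
{
	return !c->irq_flag_on;
}

/* Models the WARN_ON_ONCE() condition in _local_bh_enable_ip(). */
static bool model_would_warn(const struct cpu_model *c)
{
	return model_in_irq(c) || model_irqs_disabled(c);
}

/* Self-check over three contexts. */
static void check_context_model(void)
{
	/* Process context that turned irqs off itself, as netconsole is said to do. */
	struct cpu_model irqs_off = { 0, false };
	/* Inside a hardirq handler. */
	struct cpu_model hardirq = { 1U << MODEL_HARDIRQ_SHIFT, false };
	/* Ordinary process context with irqs on. */
	struct cpu_model normal = { 0, true };

	assert(!model_in_irq(&irqs_off) && model_irqs_disabled(&irqs_off));
	assert(model_in_irq(&hardirq));
	assert(model_would_warn(&irqs_off) && model_would_warn(&hardirq));
	assert(!model_would_warn(&normal));
}
```

In this model, a caller that merely disabled interrupts is not "in irq", so a check based on in_irq() alone does not see that case, while the combined condition does.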
Jarek Poplawski Nov. 18, 2008, 7:49 a.m. UTC | #4
On Mon, Nov 17, 2008 at 03:18:28PM +0100, Johannes Berg wrote:
> On Mon, 2008-11-17 at 13:35 +0000, Jarek Poplawski wrote:
> > This report: http://marc.info/?l=linux-netdev&m=122599341430090&w=2
> > shows local_bh_enable() is used in the wrong context (irqs disabled).
> > It happens when a usual network receive path is called by netconsole,
> > which simply turns off irqs around this all. Probably this is wrong,
> > but it worked like this long time, and it's not trivial to fix this.
> 
> Unfortunately my brain lacks the magic to decrypt x86 stack traces, so
> I'm unable to read much from that report other than that it hit the
> WARN_ON. That looks more like the TX path to me?

OK, this looks like both paths (which is probably common in networking).

> Anyway, my patch made
> that trigger for everybody rather than just on NOPREEMPT/UP (or
> something like that) and made the code easier to understand by removing
> the flags that are pointless anyway if the API is used correctly.
> 
> You can find discussion around the patch at
> http://lkml.org/lkml/2008/6/17/259

Yes, it's very interesting.

> 
> > Anyway, a commit 0f476b6d91a1395bda6464e653ce66ea9bea7167 "softirq:
> > remove irqs_disabled warning from local_bh_enable" can break things
> > after changing from local_irq_save() to local_irq_disable(). Before
> > this commit there was only a warning, now a lockup is possible, so
> > it could be treated as a regression. This patch reverts the change
> > in irqs.
> 
> Do we have evidence of this actually hitting often? This is the first
> report of anything going wrong that I've seen ever since a single one
> right after this commit went into testing five months ago.
> 
> IFF we want to add this back (and I'm not in favour) then please add a
> big comment that this is only to accommodate broken users.

Yes, it seems there should be more such reports from netconsole users.
But, I guess we kind of expect this if we still use WARN_ON and not
BUG_ON here?

Jarek P.
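
One possible reading of why such reports are rare is the "once" behaviour of the check: WARN_ON_ONCE() reports the first hit at a call site and then stays silent, and execution continues either way, whereas BUG_ON() would stop. A userspace sketch of the reporting behaviour follows. The struct and function names are hypothetical, and the real macro keeps its state in a static variable per call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Per-call-site state; the kernel macro uses a hidden static flag instead. */
struct warn_site {
	bool warned;
	unsigned int reports;
};

/* Models WARN_ON_ONCE(): report the first hit only, always return cond. */
static bool model_warn_on_once(struct warn_site *site, bool cond)
{
	if (cond && !site->warned) {
		site->warned = true;
		site->reports++;
		fprintf(stderr, "WARNING (model): condition hit\n");
	}
	return cond;
}

/* How many reports result from a given number of hits at one site? */
static unsigned int reports_after_hits(unsigned int hits)
{
	struct warn_site site = { false, 0 };
	unsigned int i;

	for (i = 0; i < hits; i++)
		model_warn_on_once(&site, true);
	return site.reports;
}

/* Self-check: any number of hits yields at most one report. */
static void check_warn_model(void)
{
	assert(reports_after_hits(0) == 0);
	assert(reports_after_hits(1) == 1);
	assert(reports_after_hits(1000) == 1);
}
```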

Patch

diff --git a/kernel/softirq.c b/kernel/softirq.c
index e7c69a7..756c928 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -135,9 +135,12 @@  EXPORT_SYMBOL(_local_bh_enable);
 
 static inline void _local_bh_enable_ip(unsigned long ip)
 {
+#ifdef CONFIG_TRACE_IRQFLAGS
+	unsigned long flags;
+#endif
 	WARN_ON_ONCE(in_irq() || irqs_disabled());
 #ifdef CONFIG_TRACE_IRQFLAGS
-	local_irq_disable();
+	local_irq_save(flags);
 #endif
 	/*
 	 * Are softirqs going to be turned on now:
@@ -155,7 +158,7 @@  static inline void _local_bh_enable_ip(unsigned long ip)
 
 	dec_preempt_count();
 #ifdef CONFIG_TRACE_IRQFLAGS
-	local_irq_enable();
+	local_irq_restore(flags);
 #endif
 	preempt_check_resched();
 }