Message ID | 20081117133548.GC6345@ff.dom.local |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
On Mon, 2008-11-17 at 13:35 +0000, Jarek Poplawski wrote:
> This report: http://marc.info/?l=linux-netdev&m=122599341430090&w=2
> shows local_bh_enable() is used in the wrong context (irqs disabled).
> It happens when a usual network receive path is called by netconsole,
> which simply turns off irqs around this all. Probably this is wrong,
> but it worked like this long time, and it's not trivial to fix this.

Unfortunately my brain lacks the magic to decrypt x86 stack traces, so
I'm unable to read much from that report other than that it hit the
WARN_ON. That looks more like the TX path to me?

Anyway, my patch made that trigger for everybody rather than just on
NOPREEMPT/UP (or something like that) and made the code easier to
understand by removing the flags that are pointless anyway if the API is
used correctly.

You can find discussion around the patch at
http://lkml.org/lkml/2008/6/17/259

> Anyway, a commit 0f476b6d91a1395bda6464e653ce66ea9bea7167 "softirq:
> remove irqs_disabled warning from local_bh_enable" can break things
> after changing from local_irq_save() to local_irq_disable(). Before
> this commit there was only a warning, now a lockup is possible, so
> it could be treated as a regression. This patch reverts the change
> in irqs.

Do we have evidence of this actually hitting often? This is the first
report of anything going wrong that I've seen ever since a single one
right after this commit went into testing five months ago.

IFF we want to add this back (and I'm not in favour) then please add a
big comment that this is only to accommodate broken users.

johannes
* Jarek Poplawski <jarkao2@gmail.com> wrote:

> This report: http://marc.info/?l=linux-netdev&m=122599341430090&w=2
> shows local_bh_enable() is used in the wrong context (irqs
> disabled). It happens when a usual network receive path is called by
> netconsole, which simply turns off irqs around this all. Probably
> this is wrong, but it worked like this long time, and it's not
> trivial to fix this.
>
> Anyway, a commit 0f476b6d91a1395bda6464e653ce66ea9bea7167 "softirq:
> remove irqs_disabled warning from local_bh_enable" can break things
> after changing from local_irq_save() to local_irq_disable(). Before
> this commit there was only a warning, now a lockup is possible, so
> it could be treated as a regression. This patch reverts the change
> in irqs.

hm, but calling local_bh_enable() with hardirqs off is a bug. It might
be a long-standing bug, but it can cause lockups even with that change
reverted: when we process softirqs in local_bh_enable().

So why not fix the bug instead?

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 17, 2008 at 05:16:17PM +0100, Ingo Molnar wrote:
>
> * Jarek Poplawski <jarkao2@gmail.com> wrote:
>
> > This report: http://marc.info/?l=linux-netdev&m=122599341430090&w=2
> > shows local_bh_enable() is used in the wrong context (irqs
> > disabled). It happens when a usual network receive path is called by
> > netconsole, which simply turns off irqs around this all. Probably
> > this is wrong, but it worked like this long time, and it's not
> > trivial to fix this.
> >
> > Anyway, a commit 0f476b6d91a1395bda6464e653ce66ea9bea7167 "softirq:
> > remove irqs_disabled warning from local_bh_enable" can break things
> > after changing from local_irq_save() to local_irq_disable(). Before
> > this commit there was only a warning, now a lockup is possible, so
> > it could be treated as a regression. This patch reverts the change
> > in irqs.
>
> hm, but calling local_bh_enable() with hardirqs off is a bug. It might
> be a long-standing bug, but it can cause lockups even with that change
> reverted: when we process softirqs in local_bh_enable().

I think it's what they call a regression: this is a long-standing bug,
and this commit doesn't fix this, but can cause additional lockups.

> So why not
> fix the bug instead?

It's not about instead: this bug could be fixed as well (if somebody
knows how to do it "properly" without hacks like:
if (!in_irq()) local_bh_disable(); etc.; but, I guess, the network code
has more such bh disabling).

Jarek P.
On Mon, Nov 17, 2008 at 03:18:28PM +0100, Johannes Berg wrote:
> On Mon, 2008-11-17 at 13:35 +0000, Jarek Poplawski wrote:
> > This report: http://marc.info/?l=linux-netdev&m=122599341430090&w=2
> > shows local_bh_enable() is used in the wrong context (irqs disabled).
> > It happens when a usual network receive path is called by netconsole,
> > which simply turns off irqs around this all. Probably this is wrong,
> > but it worked like this long time, and it's not trivial to fix this.
>
> Unfortunately my brain lacks the magic to decrypt x86 stack traces, so
> I'm unable to read much from that report other than that it hit the
> WARN_ON. That looks more like the TX path to me?

OK, this looks like both paths (which is probably common in networking).

> Anyway, my patch made
> that trigger for everybody rather than just on NOPREEMPT/UP (or
> something like that) and made the code easier to understand by removing
> the flags that are pointless anyway if the API is used correctly.
>
> You can find discussion around the patch at
> http://lkml.org/lkml/2008/6/17/259

Yes, it's very interesting.

>
> > Anyway, a commit 0f476b6d91a1395bda6464e653ce66ea9bea7167 "softirq:
> > remove irqs_disabled warning from local_bh_enable" can break things
> > after changing from local_irq_save() to local_irq_disable(). Before
> > this commit there was only a warning, now a lockup is possible, so
> > it could be treated as a regression. This patch reverts the change
> > in irqs.
>
> Do we have evidence of this actually hitting often? This is the first
> report of anything going wrong that I've seen ever since a single one
> right after this commit went into testing five months ago.
>
> IFF we want to add this back (and I'm not in favour) then please add a
> big comment that this is only to accommodate broken users.

Yes, it seems there should be more such reports from netconsole users.
But, I guess we kind of expect this if we still use WARN_ON and not
BUG_ON here?

Jarek P.
diff --git a/kernel/softirq.c b/kernel/softirq.c
index e7c69a7..756c928 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -135,9 +135,12 @@ EXPORT_SYMBOL(_local_bh_enable);
 
 static inline void _local_bh_enable_ip(unsigned long ip)
 {
+#ifdef CONFIG_TRACE_IRQFLAGS
+	unsigned long flags;
+#endif
 	WARN_ON_ONCE(in_irq() || irqs_disabled());
 #ifdef CONFIG_TRACE_IRQFLAGS
-	local_irq_disable();
+	local_irq_save(flags);
 #endif
 	/*
 	 * Are softirqs going to be turned on now:
@@ -155,7 +158,7 @@ static inline void _local_bh_enable_ip(unsigned long ip)
 
 	dec_preempt_count();
 #ifdef CONFIG_TRACE_IRQFLAGS
-	local_irq_enable();
+	local_irq_restore(flags);
 #endif
 	preempt_check_resched();
 }
This report: http://marc.info/?l=linux-netdev&m=122599341430090&w=2
shows that local_bh_enable() is used in the wrong context (irqs
disabled). It happens when the usual network receive path is called by
netconsole, which simply turns off irqs around all of it. This is
probably wrong, but it has worked like this for a long time, and it is
not trivial to fix.

Anyway, commit 0f476b6d91a1395bda6464e653ce66ea9bea7167 "softirq:
remove irqs_disabled warning from local_bh_enable" can break things
by changing local_irq_save() to local_irq_disable(). Before this
commit there was only a warning; now a lockup is possible, so it could
be treated as a regression. This patch reverts the change in irqs.

Reported-by: Ferenc Wagner <wferi@niif.hu>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---
 kernel/softirq.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)