Patchwork net: fix NOHZ: local_softirq_pending 08

login
register
mail settings
Submitter Oliver Hartkopp
Date Sept. 30, 2009, 6:18 p.m.
Message ID <4AC3A0F1.3060306@hartkopp.net>
Download mbox | patch
Permalink /patch/34610/
State Changes Requested
Delegated to: David Miller
Headers show

Comments

Oliver Hartkopp - Sept. 30, 2009, 6:18 p.m.
Socket buffers that are generated and received inside softirqs or from process
context must not use netif_rx() that's intended to be used from irq context only.

This patch introduces a new helper function netif_rx_ti(skb) that tests for
in_interrupt() before invoking netif_rx() or netif_rx_ni().

It fixes the ratelimited kernel warning

        NOHZ: local_softirq_pending 08

in the mac80211 and can subsystems.

Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>

---
John W. Linville - Sept. 30, 2009, 6:47 p.m.
On Wed, Sep 30, 2009 at 08:18:25PM +0200, Oliver Hartkopp wrote:
> Socket buffers that are generated and received inside softirqs or from process
> context must not use netif_rx() that's intended to be used from irq context only.
> 
> This patch introduces a new helper function netif_rx_ti(skb) that tests for
> in_interrupt() before invoking netif_rx() or netif_rx_ni().
> 
> It fixes the ratelimited kernel warning
> 
>         NOHZ: local_softirq_pending 08
> 
> in the mac80211 and can subsystems.
> 
> Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>

http://bugzilla.kernel.org/show_bug.cgi?id=14278

Acked-by: John W. Linville <linville@tuxdriver.com>
David Miller - Sept. 30, 2009, 11:33 p.m.
From: Oliver Hartkopp <oliver@hartkopp.net>
Date: Wed, 30 Sep 2009 20:18:25 +0200

> Socket buffers that are generated and received inside softirqs or from process
> context must not use netif_rx() that's intended to be used from irq context only.
> 
> This patch introduces a new helper function netif_rx_ti(skb) that tests for
> in_interrupt() before invoking netif_rx() or netif_rx_ni().
> 
> It fixes the ratelimited kernel warning
> 
>         NOHZ: local_softirq_pending 08
> 
> in the mac80211 and can subsystems.
> 
> Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>

I bet all of these code paths can use netif_receive_skb() or
don't need this conditional blob at all.

Looking at some specific cases in this patch:

1) VCAN:  This RX routine is only invoked from the drivers
   ->ndo_start_xmit() handler, and therefore like the loopback
   driver we know that BH's are already disabled and therefore
   it can always use netif_rx() safely.

   Why did you convert this case?

   And if this needs to be converted, why doesn't loopback need
   to be?

2) af_can.c:  In what situation will netif_rx_ni() not be appropriate
   here?  It should always execute in softirq or user context, now
   hardirq context.

And the list goes on and on, I don't really like this new conditional
testing of interrupt state.  As always, that's usually a red flag and
as far as I can see these spots where you're changing things are only
trying to receive packets in one of the two possible cases not both.

I'm not applying this until all of these details are sorted out and
you add some verbosity to the commit message explaining each and every
case you are changing, what contexts those cases can be called
from, and from where.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oliver Hartkopp - Oct. 1, 2009, 7:08 a.m.
David Miller wrote:
> From: Oliver Hartkopp <oliver@hartkopp.net>
> Date: Wed, 30 Sep 2009 20:18:25 +0200
> 
>> Socket buffers that are generated and received inside softirqs or from process
>> context must not use netif_rx() that's intended to be used from irq context only.
>>
>> This patch introduces a new helper function netif_rx_ti(skb) that tests for
>> in_interrupt() before invoking netif_rx() or netif_rx_ni().
>>
>> It fixes the ratelimited kernel warning
>>
>>         NOHZ: local_softirq_pending 08
>>
>> in the mac80211 and can subsystems.
>>
>> Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
> 
> I bet all of these code paths can use netif_receive_skb() or
> don't need this conditional blob at all.
> 
> Looking at some specific cases in this patch:
> 
> 1) VCAN:  This RX routine is only invoked from the drivers
>    ->ndo_start_xmit() handler, and therefore like the loopback
>    driver we know that BH's are already disabled and therefore
>    it can always use netif_rx() safely.
> 
>    Why did you convert this case?
> 
>    And if this needs to be converted, why doesn't loopback need
>    to be?
> 
> 2) af_can.c:  In what situation will netif_rx_ni() not be appropriate
>    here?  It should always execute in softirq or user context, now
>    hardirq context.
> 
> And the list goes on and on, I don't really like this new conditional
> testing of interrupt state.

Hello Dave,

i'm confused about in_interrupt(), in_softirq() and in_irq() as pointed out by
Johannes here:

http://marc.info/?l=linux-wireless&m=125432410405562&w=2

Indeed in the two cases for the CAN stuff (in vcan.c and af_can.c) the skb's
are received in process-context and softirq-context only.

Therefore i used netif_rx_ni() in my last change of this code.

Now i was reading from Johannes that in_interrupt() is used for
hardirq-context /and/ softirq-context, so i was just *unsure* and used the
newly introduced netif_rx_ti() for that, which tests for in_interrupt().

Indeed i'm not really happy with that, as it is always better to check each
receive case in which context it can be used and used exactly the right
function for that.

So when netif_rx_ni() or netif_receive_skb() is the best i can use when in
process-context or in softirq-context, i'll do it with pleasure.

And if it is like this the problematic netif_rx() calls in mac80211 need to be
sorted out in detail also ...

Regards,
Oliver

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Buesch - Oct. 1, 2009, 2:04 p.m.
On Thursday 01 October 2009 01:33:33 David Miller wrote:

> I'm not applying this until all of these details are sorted out 

John, please apply my fix to wireless-testing to get rid of the regression.
You can revert it later, if there's a better fix available.
Kalle Valo - Oct. 1, 2009, 2:24 p.m.
Michael Buesch <mb@bu3sch.de> writes:

> On Thursday 01 October 2009 01:33:33 David Miller wrote:
>
>> I'm not applying this until all of these details are sorted out 
>
> John, please apply my fix to wireless-testing to get rid of the
> regression. You can revert it later, if there's a better fix
> available.

I agree, please take Michael's patch. It's trivial to change mac80211
part whenever there's better support available.

But I don't think this is a regression because I see the bug also with
2.6.28, most probably it has been in mac80211 forever. But it's still
a bug which needs to be fixed.
Johannes Berg - Oct. 1, 2009, 6:42 p.m.
On Thu, 2009-10-01 at 16:04 +0200, Michael Buesch wrote:
> On Thursday 01 October 2009 01:33:33 David Miller wrote:
> 
> > I'm not applying this until all of these details are sorted out 
> 
> John, please apply my fix to wireless-testing to get rid of the regression.
> You can revert it later, if there's a better fix available.

I agree with davem, don't. Just fix the driver to local_bh_disable()
around the rx function if necessary.

johannes
Michael Buesch - Oct. 1, 2009, 7:10 p.m.
On Thursday 01 October 2009 20:42:28 Johannes Berg wrote:
> On Thu, 2009-10-01 at 16:04 +0200, Michael Buesch wrote:
> > On Thursday 01 October 2009 01:33:33 David Miller wrote:
> > 
> > > I'm not applying this until all of these details are sorted out 
> > 
> > John, please apply my fix to wireless-testing to get rid of the regression.
> > You can revert it later, if there's a better fix available.
> 
> I agree with davem, don't. Just fix the driver to local_bh_disable()
> around the rx function if necessary.

For the benefit of a much bigger critical section? I don't get it why this would be any better.

I _am_ going to do one thing now, however. That is ignoring any regression bugreport.
(Yes, it _is_ a regression for b43)
Johannes Berg - Oct. 1, 2009, 7:26 p.m.
On Thu, 2009-10-01 at 21:10 +0200, Michael Buesch wrote:

> > I agree with davem, don't. Just fix the driver to local_bh_disable()
> > around the rx function if necessary.
> 
> For the benefit of a much bigger critical section? I don't get it why this would be any better.

And how do you know mac80211 is actually safe with this change? It uses
tasklets too. At the very least you'd have to require drivers to never
mix & match the regular/irqsafe functions at all.

johannes
David Miller - Oct. 1, 2009, 7:32 p.m.
From: Michael Buesch <mb@bu3sch.de>
Date: Thu, 1 Oct 2009 21:10:32 +0200

> For the benefit of a much bigger critical section? I don't get it
> why this would be any better.

Think about what you are saying when you introduce things
like this into your code:

	if (in_interrupt())
		foo();
	else
		bar();

That thing there means you don't know anything about how you'll need
to do locking properly, because you have no idea about even the
context in which your code is executed.

Sure, you can lock for the most stringent case, but that's silly and
wasteful.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/can/vcan.c b/drivers/net/can/vcan.c
index 80ac563..899f3d3 100644
--- a/drivers/net/can/vcan.c
+++ b/drivers/net/can/vcan.c
@@ -80,7 +80,7 @@  static void vcan_rx(struct sk_buff *skb, struct net_device *dev)
 	skb->dev       = dev;
 	skb->ip_summed = CHECKSUM_UNNECESSARY;
 
-	netif_rx_ni(skb);
+	netif_rx_ti(skb);
 }
 
 static netdev_tx_t vcan_tx(struct sk_buff *skb, struct net_device *dev)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 94958c1..dc8dfb2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1509,6 +1509,19 @@  extern int		netdev_budget;
 extern void netdev_run_todo(void);
 
 /**
+ *	netif_rx_ti - test for irq context and post buffer to the network code
+ *	@skb: buffer to post
+ *
+ */
+static inline int netif_rx_ti(struct sk_buff *skb)
+{
+	if (in_interrupt())
+		return netif_rx(skb);
+	else
+		return netif_rx_ni(skb);
+}
+
+/**
  *	dev_put - release reference to device
  *	@dev: network device
  *
diff --git a/net/can/af_can.c b/net/can/af_can.c
index 6068321..c21e7f4 100644
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -199,8 +199,6 @@  static int can_create(struct net *net, struct socket *sock, int protocol)
  * @skb: pointer to socket buffer with CAN frame in data section
  * @loop: loopback for listeners on local CAN sockets (recommended default!)
  *
- * Due to the loopback this routine must not be called from hardirq context.
- *
  * Return:
  *  0 on success
  *  -ENETDOWN when the selected interface is down
@@ -280,7 +278,7 @@  int can_send(struct sk_buff *skb, int loop)
 	}
 
 	if (newskb)
-		netif_rx_ni(newskb);
+		netif_rx_ti(newskb);
 
 	/* update statistics */
 	can_stats.tx_frames++;
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 5608f6c..bbcb4cb 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -606,7 +606,7 @@  static void ieee80211_send_layer2_update(struct sta_info *sta)
 	skb->dev = sta->sdata->dev;
 	skb->protocol = eth_type_trans(skb, sta->sdata->dev);
 	memset(skb->cb, 0, sizeof(skb->cb));
-	netif_rx(skb);
+	netif_rx_ti(skb);
 }
 
 static void sta_apply_parameters(struct ieee80211_local *local,
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 797f539..1109f99 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -591,7 +591,7 @@  void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 				skb2 = skb_clone(skb, GFP_ATOMIC);
 				if (skb2) {
 					skb2->dev = prev_dev;
-					netif_rx(skb2);
+					netif_rx_ti(skb2);
 				}
 			}
 
@@ -600,7 +600,7 @@  void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 	}
 	if (prev_dev) {
 		skb->dev = prev_dev;
-		netif_rx(skb);
+		netif_rx_ti(skb);
 		skb = NULL;
 	}
 	rcu_read_unlock();
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index c01588f..5bb7c04 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -309,7 +309,7 @@  ieee80211_rx_monitor(struct ieee80211_local *local, struct sk_buff *origskb,
 			skb2 = skb_clone(skb, GFP_ATOMIC);
 			if (skb2) {
 				skb2->dev = prev_dev;
-				netif_rx(skb2);
+				netif_rx_ti(skb2);
 			}
 		}
 
@@ -320,7 +320,7 @@  ieee80211_rx_monitor(struct ieee80211_local *local, struct sk_buff *origskb,
 
 	if (prev_dev) {
 		skb->dev = prev_dev;
-		netif_rx(skb);
+		netif_rx_ti(skb);
 	} else
 		dev_kfree_skb(skb);
 
@@ -1349,7 +1349,7 @@  ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
 			/* deliver to local stack */
 			skb->protocol = eth_type_trans(skb, dev);
 			memset(skb->cb, 0, sizeof(skb->cb));
-			netif_rx(skb);
+			netif_rx_ti(skb);
 		}
 	}
 
@@ -1943,7 +1943,7 @@  static void ieee80211_rx_cooked_monitor(struct ieee80211_rx_data *rx)
 			skb2 = skb_clone(skb, GFP_ATOMIC);
 			if (skb2) {
 				skb2->dev = prev_dev;
-				netif_rx(skb2);
+				netif_rx_ti(skb2);
 			}
 		}
 
@@ -1954,7 +1954,7 @@  static void ieee80211_rx_cooked_monitor(struct ieee80211_rx_data *rx)
 
 	if (prev_dev) {
 		skb->dev = prev_dev;
-		netif_rx(skb);
+		netif_rx_ti(skb);
 		skb = NULL;
 	} else
 		goto out_free_skb;