diff mbox

[6/8] netpoll: Allow netpoll_setup/cleanup recursion

Message ID E1OMtjk-0006P2-6r@gondolin.me.apana.org.au
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Herbert Xu June 11, 2010, 2:12 a.m. UTC
netpoll: Allow netpoll_setup/cleanup recursion

This patch adds the functions __netpoll_setup/__netpoll_cleanup
which is designed to be called recursively through ndo_netpoll_seutp.

They must be called with RTNL held, and the caller must initialise
np->dev and ensure that it has a valid reference count.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/netpoll.h |    2 
 net/core/netpoll.c      |  176 ++++++++++++++++++++++++++----------------------
 2 files changed, 99 insertions(+), 79 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Andrew Morton June 25, 2010, 1:21 a.m. UTC | #1
On Fri, 11 Jun 2010 12:12:48 +1000
Herbert Xu <herbert@gondor.hengli.com.au> wrote:

> netpoll: Allow netpoll_setup/cleanup recursion
> 
> This patch adds the functions __netpoll_setup/__netpoll_cleanup
> which is designed to be called recursively through ndo_netpoll_seutp.
> 
> They must be called with RTNL held, and the caller must initialise
> np->dev and ensure that it has a valid reference count.
> 
> ...
>
> -int netpoll_setup(struct netpoll *np)
> +int __netpoll_setup(struct netpoll *np)
>  {
> -	struct net_device *ndev = NULL;
> -	struct in_device *in_dev;
> +	struct net_device *ndev = np->dev;
>  	struct netpoll_info *npinfo;
>  	const struct net_device_ops *ops;
>  	unsigned long flags;
>  	int err;
>  
> +	if ((ndev->priv_flags & IFF_DISABLE_NETPOLL) ||
> +	    !ndev->netdev_ops->ndo_poll_controller) {
> +		printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
> +		       np->name, np->dev_name);
> +		err = -ENOTSUPP;
> +		goto out;
> +	}
> +
> +	if (!ndev->npinfo) {
> +		npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
> +		if (!npinfo) {
> +			err = -ENOMEM;
> +			goto out;
> +		}
> +
> +		npinfo->rx_flags = 0;
> +		INIT_LIST_HEAD(&npinfo->rx_np);
> +
> +		spin_lock_init(&npinfo->rx_lock);
> +		skb_queue_head_init(&npinfo->arp_tx);
> +		skb_queue_head_init(&npinfo->txq);
> +		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
> +
> +		atomic_set(&npinfo->refcnt, 1);
> +
> +		ops = np->dev->netdev_ops;
> +		if (ops->ndo_netpoll_setup) {
> +			err = ops->ndo_netpoll_setup(ndev, npinfo);
> +			if (err)
> +				goto free_npinfo;
> +		}
> +	} else {
> +		npinfo = ndev->npinfo;
> +		atomic_inc(&npinfo->refcnt);
> +	}
> +
> +	npinfo->netpoll = np;
> +
> +	if (np->rx_hook) {
> +		spin_lock_irqsave(&npinfo->rx_lock, flags);
> +		npinfo->rx_flags |= NETPOLL_RX_ENABLED;
> +		list_add_tail(&np->rx, &npinfo->rx_np);
> +		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
> +	}
> +
> +	/* last thing to do is link it to the net device structure */
> +	rcu_assign_pointer(ndev->npinfo, npinfo);
> +	rtnl_unlock();

And there it is, an unbalanced rtnl_unlock().

This stupid little thing took me over a day's work to find - it's just
been awful.

The user-visible symptom was that a bug in the netpoll code causes the
machine to hang after loading ipv6 (!), because
addrconf_fixup_forwarding()'s rtnl_trylock() kept on failing, and the
restart_syscall() kept on getting restarted, so an initscripts procfs
write just kept banging its head against the excessively-unlocked
mutex.

The mutex code handles an excessively-unlocked mutex (mutex.count==2)
really badly.  Some API functions say "its locked", others say "it
isn't", etc.

Maybe it's better with mutex debugging enabled - didn't try that. 
Things get pretty user-unfriendly when there's a bug within the
netconsole code itself.

Enabling lockdep simply made the bug cure itself - I suspect the mutex
code's handling of mutexes is different if lockdep is enabled.  That
would be pretty bad behaviour from the lockdep code.

I just removed the rtnl_unlock() - I couldn't see much in there which
needed rtnl_locking..

Dave, the fixup should be folded into the original patch please -
otherwise we'll have a machine-hangs-up bisection hole which spans two
weeks work of commits.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu June 25, 2010, 3:01 a.m. UTC | #2
On Thu, Jun 24, 2010 at 06:21:23PM -0700, Andrew Morton wrote:
>
> I just removed the rtnl_unlock() - I couldn't see much in there which
> needed rtnl_locking..

Yes that's the correct fix.  I forgot to remove it from the
recursive setup path after moving the RTNL lock out.  Sorry!
David Miller June 25, 2010, 3:30 a.m. UTC | #3
From: Andrew Morton <akpm@linux-foundation.org>
Date: Thu, 24 Jun 2010 18:21:23 -0700

> Dave, the fixup should be folded into the original patch please -
> otherwise we'll have a machine-hangs-up bisection hole which spans two
> weeks work of commits.

I'm not unwrapping my tree that far back, and there's already been
a dependent tree pull or two which depend upon the tree past that
commit, one of it was a rather large wireless merge.

It's going to screw over too many people.

We'll just merge the bug fix in, and be done with it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Morton June 25, 2010, 3:50 a.m. UTC | #4
On Thu, 24 Jun 2010 20:30:06 -0700 (PDT) David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Thu, 24 Jun 2010 18:21:23 -0700
> 
> > Dave, the fixup should be folded into the original patch please -
> > otherwise we'll have a machine-hangs-up bisection hole which spans two
> > weeks work of commits.
> 
> I'm not unwrapping my tree that far back, and there's already been
> a dependent tree pull or two which depend upon the tree past that
> commit, one of it was a rather large wireless merge.
> 
> It's going to screw over too many people.
> 
> We'll just merge the bug fix in, and be done with it.

Sigh.  That really sucks.

What happens if you want to actually *drop* a patch from net-next? 
Surely that happens?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller June 25, 2010, 4:27 a.m. UTC | #5
From: Andrew Morton <akpm@linux-foundation.org>
Date: Thu, 24 Jun 2010 20:50:59 -0700

> What happens if you want to actually *drop* a patch from net-next? 
> Surely that happens?

I've only respun the tree on two or three occasions and that was
because I made some significant error myself and screwed up the
GIT tree somehow.

We've fixed much worse bugs than this one weeks after the changes
causing them went in, life goes on.

And the fact that it took two weeks of it being in -next before
anyone even reported it says how wide a net this particular bug
covers :-)  (hint: personally I've still never used netconsole
even one single time, and it's been around for what, something
like 6 years?)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andrew Morton June 25, 2010, 4:42 a.m. UTC | #6
On Thu, 24 Jun 2010 21:27:13 -0700 (PDT) David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Thu, 24 Jun 2010 20:50:59 -0700
> 
> > What happens if you want to actually *drop* a patch from net-next? 
> > Surely that happens?
> 
> I've only respun the tree on two or three occasions and that was
> because I made some significant error myself and screwed up the
> GIT tree somehow.
> 
> We've fixed much worse bugs than this one weeks after the changes
> causing them went in, life goes on.

Still sucks - this is a quite ugly drawback to how we're using git. 
I've hit bisection holes several times which held up the show. 
Sometimes you can make them go away by fiddling the .config, other
times I've hunted down the fix and manually applied it for each
iteration.  It makes me feel all guilty each time I ask some poor sap
to bisect a bug for us.

> And the fact that it took two weeks of it being in -next before
> anyone even reported it says how wide a net this particular bug
> covers :-)  (hint: personally I've still never used netconsole
> even one single time, and it's been around for what, something
> like 6 years?)

I'd imagine that netconsole would get in the way rather a lot for net
developers, but it's really useful!


That being said, I wonder why Herbert didn't hit this in his testing. 
I suspect that he'd enabled lockdep, which hid the bug.  I haven't
worked out _why_ lockdep hides the double-mutex_unlock bug, but it's a
pretty bad thing to do.

Presumably mutex debugging _would_ have found it, but because the bug
was in netconsole code, the mutex-debugging blurt of course didn't come
out.  We don't replay the log buffer when netconsole is brought up -
perhaps we should.

And that machine has a screwy USB keyboard on which I've never managed
to invoke the vt-srcoll-backwards thing, so it would have been darned
hard for me to see and mutex-debugging warnings anyway.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller June 25, 2010, 4:52 a.m. UTC | #7
From: Andrew Morton <akpm@linux-foundation.org>
Date: Thu, 24 Jun 2010 21:42:04 -0700

> I'd imagine that netconsole would get in the way rather a lot for net
> developers, but it's really useful!

I have a fully logging stable serial or hypervisor console on every
one of my boxes, why would I even bother?

> That being said, I wonder why Herbert didn't hit this in his testing. 
> I suspect that he'd enabled lockdep, which hid the bug.  I haven't
> worked out _why_ lockdep hides the double-mutex_unlock bug, but it's a
> pretty bad thing to do.

Yes, definitely something to look into.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ingo Molnar June 25, 2010, 8:46 a.m. UTC | #8
* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 24 Jun 2010 21:27:13 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
> 
> > From: Andrew Morton <akpm@linux-foundation.org>
> > Date: Thu, 24 Jun 2010 20:50:59 -0700
> > 
> > > What happens if you want to actually *drop* a patch from net-next? 
> > > Surely that happens?
> > 
> > I've only respun the tree on two or three occasions and that was
> > because I made some significant error myself and screwed up the
> > GIT tree somehow.
> > 
> > We've fixed much worse bugs than this one weeks after the changes
> > causing them went in, life goes on.
> 
> Still sucks - this is a quite ugly drawback to how we're using git. 
> I've hit bisection holes several times which held up the show. 
> Sometimes you can make them go away by fiddling the .config, other
> times I've hunted down the fix and manually applied it for each
> iteration.  It makes me feel all guilty each time I ask some poor sap
> to bisect a bug for us.

One solution would be, for really grave bugs that introduce a significant 
window of bisection breakage, to add a special tag:

  Fixes-Bug: SHA1

Which could then be parsed by a special variant of git bisect and which would 
try to cherry-pick all Fixes-Bug commits into the bisection point.

But there are numerous disadvantages that:

 - The bisection itself would be slower (maybe not a big issue for most 
   bisection sessions)

 - Such cherry-picked trees wouldnt be precisely the same thing as the tree 
   as-it-was-there.

 - Awkward combination bugs could ensue

 - The cherry-pick may fail or may mis-apply things.

 - If the Fixes-Bug tag is typoed or misconstrued then that might not be found 
   until someone tries a bisection and fails ... leading to awkward 
   add-the-tag-again-to-some-commit scenarios.

My gut feeling is that we really dont want to go there. As painful as 
bisection breakages are, they are relatively rare (compared to regular 
regressions), and the value of testing _that precise_ sha1 is a fundamental 
thing - git bisect shouldnt interact and reorder fixes.

If a bug wasnt found sooner then it either didnt matter that much, or the tree 
QA was bad. Neither of those factors forces a change in the development model.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nick Piggin June 25, 2010, 10:08 a.m. UTC | #9
On Thu, Jun 24, 2010 at 09:42:04PM -0700, Andrew Morton wrote:
> On Thu, 24 Jun 2010 21:27:13 -0700 (PDT) David Miller <davem@davemloft.net> wrote:
> 
> > From: Andrew Morton <akpm@linux-foundation.org>
> > Date: Thu, 24 Jun 2010 20:50:59 -0700
> > 
> > > What happens if you want to actually *drop* a patch from net-next? 
> > > Surely that happens?
> > 
> > I've only respun the tree on two or three occasions and that was
> > because I made some significant error myself and screwed up the
> > GIT tree somehow.
> > 
> > We've fixed much worse bugs than this one weeks after the changes
> > causing them went in, life goes on.
> 
> Still sucks - this is a quite ugly drawback to how we're using git. 
> I've hit bisection holes several times which held up the show. 
> Sometimes you can make them go away by fiddling the .config, other
> times I've hunted down the fix and manually applied it for each
> iteration.  It makes me feel all guilty each time I ask some poor sap
> to bisect a bug for us.

This might be somewhat improved by tagging a fix with the id of the
patch causing the regression, so that bisect could go through and try
to apply a fix out of order.

This could be done with a specific note format in the changelog, but
I don't know whether it's realistic to support in git format or if
you'd have to have a tool that builds up a mapping going the other
way...

Could also be useful for distros backporting things.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 95c9f7e..f3ad74a 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -46,9 +46,11 @@  void netpoll_poll(struct netpoll *np);
 void netpoll_send_udp(struct netpoll *np, const char *msg, int len);
 void netpoll_print_options(struct netpoll *np);
 int netpoll_parse_options(struct netpoll *np, char *opt);
+int __netpoll_setup(struct netpoll *np);
 int netpoll_setup(struct netpoll *np);
 int netpoll_trap(void);
 void netpoll_set_trap(int trap);
+void __netpoll_cleanup(struct netpoll *np);
 void netpoll_cleanup(struct netpoll *np);
 int __netpoll_rx(struct sk_buff *skb);
 void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb);
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index c445896..f728208 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -724,15 +724,78 @@  int netpoll_parse_options(struct netpoll *np, char *opt)
 	return -1;
 }
 
-int netpoll_setup(struct netpoll *np)
+int __netpoll_setup(struct netpoll *np)
 {
-	struct net_device *ndev = NULL;
-	struct in_device *in_dev;
+	struct net_device *ndev = np->dev;
 	struct netpoll_info *npinfo;
 	const struct net_device_ops *ops;
 	unsigned long flags;
 	int err;
 
+	if ((ndev->priv_flags & IFF_DISABLE_NETPOLL) ||
+	    !ndev->netdev_ops->ndo_poll_controller) {
+		printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
+		       np->name, np->dev_name);
+		err = -ENOTSUPP;
+		goto out;
+	}
+
+	if (!ndev->npinfo) {
+		npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
+		if (!npinfo) {
+			err = -ENOMEM;
+			goto out;
+		}
+
+		npinfo->rx_flags = 0;
+		INIT_LIST_HEAD(&npinfo->rx_np);
+
+		spin_lock_init(&npinfo->rx_lock);
+		skb_queue_head_init(&npinfo->arp_tx);
+		skb_queue_head_init(&npinfo->txq);
+		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
+
+		atomic_set(&npinfo->refcnt, 1);
+
+		ops = np->dev->netdev_ops;
+		if (ops->ndo_netpoll_setup) {
+			err = ops->ndo_netpoll_setup(ndev, npinfo);
+			if (err)
+				goto free_npinfo;
+		}
+	} else {
+		npinfo = ndev->npinfo;
+		atomic_inc(&npinfo->refcnt);
+	}
+
+	npinfo->netpoll = np;
+
+	if (np->rx_hook) {
+		spin_lock_irqsave(&npinfo->rx_lock, flags);
+		npinfo->rx_flags |= NETPOLL_RX_ENABLED;
+		list_add_tail(&np->rx, &npinfo->rx_np);
+		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+	}
+
+	/* last thing to do is link it to the net device structure */
+	rcu_assign_pointer(ndev->npinfo, npinfo);
+	rtnl_unlock();
+
+	return 0;
+
+free_npinfo:
+	kfree(npinfo);
+out:
+	return err;
+}
+EXPORT_SYMBOL_GPL(__netpoll_setup);
+
+int netpoll_setup(struct netpoll *np)
+{
+	struct net_device *ndev = NULL;
+	struct in_device *in_dev;
+	int err;
+
 	if (np->dev_name)
 		ndev = dev_get_by_name(&init_net, np->dev_name);
 	if (!ndev) {
@@ -805,61 +868,14 @@  int netpoll_setup(struct netpoll *np)
 	refill_skbs();
 
 	rtnl_lock();
-	if ((ndev->priv_flags & IFF_DISABLE_NETPOLL) ||
-	    !ndev->netdev_ops->ndo_poll_controller) {
-		printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
-		       np->name, np->dev_name);
-		err = -ENOTSUPP;
-		goto unlock;
-	}
-
-	if (!ndev->npinfo) {
-		npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
-		if (!npinfo) {
-			err = -ENOMEM;
-			goto unlock;
-		}
-
-		npinfo->rx_flags = 0;
-		INIT_LIST_HEAD(&npinfo->rx_np);
-
-		spin_lock_init(&npinfo->rx_lock);
-		skb_queue_head_init(&npinfo->arp_tx);
-		skb_queue_head_init(&npinfo->txq);
-		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
-
-		atomic_set(&npinfo->refcnt, 1);
-
-		ops = np->dev->netdev_ops;
-		if (ops->ndo_netpoll_setup) {
-			err = ops->ndo_netpoll_setup(ndev, npinfo);
-			if (err)
-				goto free_npinfo;
-		}
-	} else {
-		npinfo = ndev->npinfo;
-		atomic_inc(&npinfo->refcnt);
-	}
-
-	npinfo->netpoll = np;
-
-	if (np->rx_hook) {
-		spin_lock_irqsave(&npinfo->rx_lock, flags);
-		npinfo->rx_flags |= NETPOLL_RX_ENABLED;
-		list_add_tail(&np->rx, &npinfo->rx_np);
-		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
-	}
-
-	/* last thing to do is link it to the net device structure */
-	rcu_assign_pointer(ndev->npinfo, npinfo);
+	err = __netpoll_setup(np);
 	rtnl_unlock();
 
+	if (err)
+		goto put;
+
 	return 0;
 
-free_npinfo:
-	kfree(npinfo);
-unlock:
-	rtnl_unlock();
 put:
 	dev_put(ndev);
 	return err;
@@ -872,40 +888,32 @@  static int __init netpoll_init(void)
 }
 core_initcall(netpoll_init);
 
-void netpoll_cleanup(struct netpoll *np)
+void __netpoll_cleanup(struct netpoll *np)
 {
 	struct netpoll_info *npinfo;
 	unsigned long flags;
-	int free = 0;
 
-	if (!np->dev)
+	npinfo = np->dev->npinfo;
+	if (!npinfo)
 		return;
 
-	rtnl_lock();
-	npinfo = np->dev->npinfo;
-	if (npinfo) {
-		if (!list_empty(&npinfo->rx_np)) {
-			spin_lock_irqsave(&npinfo->rx_lock, flags);
-			list_del(&np->rx);
-			if (list_empty(&npinfo->rx_np))
-				npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
-			spin_unlock_irqrestore(&npinfo->rx_lock, flags);
-		}
+	if (!list_empty(&npinfo->rx_np)) {
+		spin_lock_irqsave(&npinfo->rx_lock, flags);
+		list_del(&np->rx);
+		if (list_empty(&npinfo->rx_np))
+			npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
+		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+	}
 
-		free = atomic_dec_and_test(&npinfo->refcnt);
-		if (free) {
-			const struct net_device_ops *ops;
+	if (atomic_dec_and_test(&npinfo->refcnt)) {
+		const struct net_device_ops *ops;
 
-			ops = np->dev->netdev_ops;
-			if (ops->ndo_netpoll_cleanup)
-				ops->ndo_netpoll_cleanup(np->dev);
+		ops = np->dev->netdev_ops;
+		if (ops->ndo_netpoll_cleanup)
+			ops->ndo_netpoll_cleanup(np->dev);
 
-			rcu_assign_pointer(np->dev->npinfo, NULL);
-		}
-	}
-	rtnl_unlock();
+		rcu_assign_pointer(np->dev->npinfo, NULL);
 
-	if (free) {
 		/* avoid racing with NAPI reading npinfo */
 		synchronize_rcu_bh();
 
@@ -917,9 +925,19 @@  void netpoll_cleanup(struct netpoll *np)
 		__skb_queue_purge(&npinfo->txq);
 		kfree(npinfo);
 	}
+}
+EXPORT_SYMBOL_GPL(__netpoll_cleanup);
 
-	dev_put(np->dev);
+void netpoll_cleanup(struct netpoll *np)
+{
+	if (!np->dev)
+		return;
 
+	rtnl_lock();
+	__netpoll_cleanup(np);
+	rtnl_unlock();
+
+	dev_put(np->dev);
 	np->dev = NULL;
 }