diff mbox

[RFC] spinlock: split out debugging check from spin_lock_mutex

Message ID 20130418192912.GB29130@hmsreliant.think-freely.org
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Neil Horman April 18, 2013, 7:29 p.m. UTC
Well, I guess Ingo doesn't have time  to respond here.  Given that, I've been
looking at other ways of adressing this problem (as the more I think about it,
the more I think changing mutex semantics, even to be more accepting, is going
to be met with lots of friction).  I've come up with the below change, and it
works well for me (I've hammered it with ifup/downs while sending messages over
netconsole for 24 hours now without any incident).  I'm not sure I like all the
details yet (I modified the rx_flag field to be a gneral flags field that uses
test/set/clear_bit, which I don't think is necessecary, but it seems saner to
me).  the meat of the change though is that instead of a mutex, we use two flags
that, respectively, inhibit netpoll usage, and indicate when we're in a netpoll
service path.  Using these two flags, in conjunction with a wait_queue, we can
signal to executors in the netpoll path that we're inhibiting its use and expect
to any user in the path to complete and wake up any contexts in the inhibiting
path.  Seems to work well for me.  Let me know what you think, and I'll make any
desired changes and post an official patch.

Best
Neil


commit c277d54d471572b0ec7b27812d205fb303ad44f9
Author: Neil Horman <nhorman@tuxdriver.com>
Date:   Wed Apr 17 15:59:44 2013 -0400

    netpoll: convert mutex into a wait_event
    
    Bart Van Assche recently reported a warning to me:
    
    <IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
     [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
     [<ffffffff814761dd>] mutex_trylock+0x16d/0x180
     [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
     [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
     [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
     [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
     [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
     [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
     [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
     [<ffffffff8103fb76>] console_unlock+0x2d6/0x450
     [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
     [<ffffffff8146f9f6>] printk+0x4d/0x4f
     [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]
    
    This resulted from my commit ca99ca14c which introduced a mutex_trylock
    operation in a path that could execute in interrupt context.  When mutex
    debugging is enabled, the above warns the user when we are in fact exectuting in
    interrupt context.
    
    After some discussion, I'm not sure a mutex is the right choice here, as it
    seems unsafe to use (deadlock possibilities due to implementation details).
    Instead lets convert the mutex to a mechanism that makes use of a
    wait_queue_head and some flag bits.  The dev_close paths can now set a bit to
    suspend netpoll and then sleep waiting for the NETPOLL_POLLING flag to clear,
    which allows for flushing of any in-progress execution in the
    netpoll_poll_dev_path.
    
    Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
    Reported-by: Bart Van Assche <bvanassche@acm.org>
    CC: Bart Van Assche <bvanassche@acm.org>
    CC: David Miller <davem@davemloft.net>
    CC: netdev@vger.kernel.org

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Neil Horman April 22, 2013, 8:12 p.m. UTC | #1
On Thu, Apr 18, 2013 at 03:29:12PM -0400, Neil Horman wrote:
> Well, I guess Ingo doesn't have time  to respond here.  Given that, I've been
> looking at other ways of adressing this problem (as the more I think about it,
> the more I think changing mutex semantics, even to be more accepting, is going
> to be met with lots of friction).  I've come up with the below change, and it
> works well for me (I've hammered it with ifup/downs while sending messages over
> netconsole for 24 hours now without any incident).  I'm not sure I like all the
> details yet (I modified the rx_flag field to be a gneral flags field that uses
> test/set/clear_bit, which I don't think is necessecary, but it seems saner to
> me).  the meat of the change though is that instead of a mutex, we use two flags
> that, respectively, inhibit netpoll usage, and indicate when we're in a netpoll
> service path.  Using these two flags, in conjunction with a wait_queue, we can
> signal to executors in the netpoll path that we're inhibiting its use and expect
> to any user in the path to complete and wake up any contexts in the inhibiting
> path.  Seems to work well for me.  Let me know what you think, and I'll make any
> desired changes and post an official patch.
> 
> Best
> Neil
> 
Bart, any testing/thoughts here?  i'd like to get this fixed in the next few
days, but I'd like to get any feedback you might have first.

Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman April 23, 2013, 1:23 p.m. UTC | #2
On Mon, Apr 22, 2013 at 01:41:11PM -0700, Bart Van Assche wrote:
> On Mon, Apr 22, 2013 at 1:12 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > Bart, any testing/thoughts here?  i'd like to get this fixed in the next
> > few
> > days, but I'd like to get any feedback you might have first.
> >
> 
> Sorry but I haven't been able to run any tests during the past eight days
> since I'm traveling and because of that I do not have access to my regular
> test setup.
> 
> Bart.
Copy that.  I can propose this change based on my own testing, or wait for you
to get back.  Do you have a preference?
Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman April 23, 2013, 1:44 p.m. UTC | #3
On Tue, Apr 23, 2013 at 06:38:19AM -0700, Bart Van Assche wrote:
> On Tue, Apr 23, 2013 at 6:23 AM, Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > Copy that.  I can propose this change based on my own testing, or wait for
> > you
> > to get back.  Do you have a preference?
> >
> 
> It will take until next week before I can run more tests. It depends on
> whether a fix can wait that long ?
> 
> Bart.
Dave, how do you feel about it?  I'm comfortable with the wait queue change I've
proposed, but I've not received any reports of actual netpoll deadlocks (i.e.
the mutex solution is reporting a warning, but no actual problems).  So I think
its safe to wait a bit longer, unless you just want this squared away now.

FWIw, I think theres still some discussion to be had about the possibility of
converting mutexes to use spin_lock_irqsave internally, thereby making
mutex_trylock irq safe, but I think we can deal with that independently of this
issue.

Regards
Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller April 23, 2013, 5:33 p.m. UTC | #4
From: Neil Horman <nhorman@tuxdriver.com>
Date: Tue, 23 Apr 2013 09:44:43 -0400

> Dave, how do you feel about it?  I'm comfortable with the wait queue change I've
> proposed, but I've not received any reports of actual netpoll deadlocks (i.e.
> the mutex solution is reporting a warning, but no actual problems).  So I think
> its safe to wait a bit longer, unless you just want this squared away now.

If it's just a warning and people aren't actually hitting the
potential deadlock, it can wait.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman April 23, 2013, 5:50 p.m. UTC | #5
On Tue, Apr 23, 2013 at 01:33:15PM -0400, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Tue, 23 Apr 2013 09:44:43 -0400
> 
> > Dave, how do you feel about it?  I'm comfortable with the wait queue change I've
> > proposed, but I've not received any reports of actual netpoll deadlocks (i.e.
> > the mutex solution is reporting a warning, but no actual problems).  So I think
> > its safe to wait a bit longer, unless you just want this squared away now.
> 
> If it's just a warning and people aren't actually hitting the
> potential deadlock, it can wait.
> 
Copy that. Bart, I'll wait till you get back then.
Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
bvba Bart Van Assche April 27, 2013, 6:53 p.m. UTC | #6
On 04/23/13 19:50, Neil Horman wrote:
> On Tue, Apr 23, 2013 at 01:33:15PM -0400, David Miller wrote:
>> From: Neil Horman <nhorman@tuxdriver.com>
>> Date: Tue, 23 Apr 2013 09:44:43 -0400
>>
>>> Dave, how do you feel about it?  I'm comfortable with the wait queue change I've
>>> proposed, but I've not received any reports of actual netpoll deadlocks (i.e.
>>> the mutex solution is reporting a warning, but no actual problems).  So I think
>>> its safe to wait a bit longer, unless you just want this squared away now.
>>
>> If it's just a warning and people aren't actually hitting the
>> potential deadlock, it can wait.
>>
> Copy that. Bart, I'll wait till you get back then.

(Just arrived home)

Sorry Neil, but I can still trigger the CPU stuck messages with kernel v3.9-rc8-24-gd7d7271:

kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601]
kernel: irq event stamp: 1999192
kernel: hardirqs last  enabled at (1999191): [<ffffffff8103e89d>] console_unlock+0x41d/0x450
kernel: hardirqs last disabled at (1999192): [<ffffffff8143e96a>] apic_timer_interrupt+0x6a/0x80
kernel: softirqs last  enabled at (1999188): [<ffffffff81044e26>] __do_softirq+0x196/0x280
kernel: softirqs last disabled at (1999181): [<ffffffff810450c5>] irq_exit+0xb5/0xc0
kernel: CPU 0 
kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G           O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE
kernel: RIP: 0010:[<ffffffff8103e8a0>]  [<ffffffff8103e8a0>] console_unlock+0x420/0x450
kernel: Call Trace:
kernel: [<ffffffff812bbf37>] do_con_write.part.19+0x887/0x2040
kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
kernel: [<ffffffff814316fc>] ? mutex_lock_nested+0x28c/0x350
kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
kernel: [<ffffffff812bd764>] con_write+0x34/0x50
kernel: [<ffffffff812a51e9>] do_output_char+0x179/0x210
kernel: [<ffffffff812a52cd>] process_output+0x4d/0x70
kernel: [<ffffffff812a59d0>] n_tty_write+0x210/0x480
kernel: [<ffffffff81072710>] ? try_to_wake_up+0x2e0/0x2e0
kernel: [<ffffffff812a2839>] tty_write+0x159/0x300
kernel: [<ffffffff8109326f>] ? lock_release_holdtime.part.22+0xf/0x180
kernel: [<ffffffff812a57c0>] ? n_tty_poll+0x1c0/0x1c0
kernel: [<ffffffff81151a3b>] vfs_write+0xab/0x170
kernel: [<ffffffff81151ea5>] sys_write+0x55/0xa0
kernel: [<ffffffff8143dd82>] system_call_fastpath+0x16/0x1b

Bart.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Neil Horman April 29, 2013, 6:13 p.m. UTC | #7
On Sat, Apr 27, 2013 at 08:53:54PM +0200, bvba Bart Van Assche wrote:
> On 04/23/13 19:50, Neil Horman wrote:
> > On Tue, Apr 23, 2013 at 01:33:15PM -0400, David Miller wrote:
> >> From: Neil Horman <nhorman@tuxdriver.com>
> >> Date: Tue, 23 Apr 2013 09:44:43 -0400
> >>
> >>> Dave, how do you feel about it?  I'm comfortable with the wait queue change I've
> >>> proposed, but I've not received any reports of actual netpoll deadlocks (i.e.
> >>> the mutex solution is reporting a warning, but no actual problems).  So I think
> >>> its safe to wait a bit longer, unless you just want this squared away now.
> >>
> >> If it's just a warning and people aren't actually hitting the
> >> potential deadlock, it can wait.
> >>
> > Copy that. Bart, I'll wait till you get back then.
> 
> (Just arrived home)
> 
> Sorry Neil, but I can still trigger the CPU stuck messages with kernel v3.9-rc8-24-gd7d7271:
> 
> kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601]
> kernel: irq event stamp: 1999192
> kernel: hardirqs last  enabled at (1999191): [<ffffffff8103e89d>] console_unlock+0x41d/0x450
> kernel: hardirqs last disabled at (1999192): [<ffffffff8143e96a>] apic_timer_interrupt+0x6a/0x80
> kernel: softirqs last  enabled at (1999188): [<ffffffff81044e26>] __do_softirq+0x196/0x280
> kernel: softirqs last disabled at (1999181): [<ffffffff810450c5>] irq_exit+0xb5/0xc0
> kernel: CPU 0 
> kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G           O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE
> kernel: RIP: 0010:[<ffffffff8103e8a0>]  [<ffffffff8103e8a0>] console_unlock+0x420/0x450
> kernel: Call Trace:
> kernel: [<ffffffff812bbf37>] do_con_write.part.19+0x887/0x2040
> kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
> kernel: [<ffffffff814316fc>] ? mutex_lock_nested+0x28c/0x350
> kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
> kernel: [<ffffffff812bd764>] con_write+0x34/0x50
> kernel: [<ffffffff812a51e9>] do_output_char+0x179/0x210
> kernel: [<ffffffff812a52cd>] process_output+0x4d/0x70
> kernel: [<ffffffff812a59d0>] n_tty_write+0x210/0x480
> kernel: [<ffffffff81072710>] ? try_to_wake_up+0x2e0/0x2e0
> kernel: [<ffffffff812a2839>] tty_write+0x159/0x300
> kernel: [<ffffffff8109326f>] ? lock_release_holdtime.part.22+0xf/0x180
> kernel: [<ffffffff812a57c0>] ? n_tty_poll+0x1c0/0x1c0
> kernel: [<ffffffff81151a3b>] vfs_write+0xab/0x170
> kernel: [<ffffffff81151ea5>] sys_write+0x55/0xa0
> kernel: [<ffffffff8143dd82>] system_call_fastpath+0x16/0x1b
> 
> Bart.
> 

So, looking at this, I'm not 100% sure whats going on.  console_unlock has the
ability to detect the need for rescheduling when we call
raw_spin_unlock_irqrestore at the bottom of the function, so there should be no
reason that we don't give the per-cpu watchdog task a chance to run.  About the
only case I see in which we might be in trouble would be if somehow got into an
infinite loop if every message in the log buffer had LOG_NOCONS set, in which
case we would spin in this loop without ever scheduling or re-enabling
interrupts.  But for that to happen I think you would have to be flooding the
log with messages that were bigger than the default message buffer size (about
1000 bytes without a newline).  You're not doing that are you?

Regardless, this is a different problem from the one we were origionally
addressing, we should probably handle it in a different thread.  

On the up side, I did discover while looking at this that semaphores, unlike
mutexes are quite liberal in what contexts they can be used in (i.e. irq
contexts are perfectly legitimate places to call up and down_trylock).  I'll
respin my patch to use that, as its much simpler than a wait queue

Best
Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bart Van Assche April 29, 2013, 7:12 p.m. UTC | #8
On 04/29/13 20:13, Neil Horman wrote:
> On Sat, Apr 27, 2013 at 08:53:54PM +0200, Bart Van Assche wrote:
>> Sorry Neil, but I can still trigger the CPU stuck messages with kernel v3.9-rc8-24-gd7d7271:
>>
>> kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601]
>> kernel: irq event stamp: 1999192
>> kernel: hardirqs last  enabled at (1999191): [<ffffffff8103e89d>] console_unlock+0x41d/0x450
>> kernel: hardirqs last disabled at (1999192): [<ffffffff8143e96a>] apic_timer_interrupt+0x6a/0x80
>> kernel: softirqs last  enabled at (1999188): [<ffffffff81044e26>] __do_softirq+0x196/0x280
>> kernel: softirqs last disabled at (1999181): [<ffffffff810450c5>] irq_exit+0xb5/0xc0
>> kernel: CPU 0
>> kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G           O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE
>> kernel: RIP: 0010:[<ffffffff8103e8a0>]  [<ffffffff8103e8a0>] console_unlock+0x420/0x450
>> kernel: Call Trace:
>> kernel: [<ffffffff812bbf37>] do_con_write.part.19+0x887/0x2040
>> kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
>> kernel: [<ffffffff814316fc>] ? mutex_lock_nested+0x28c/0x350
>> kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70
>> kernel: [<ffffffff812bd764>] con_write+0x34/0x50
>> kernel: [<ffffffff812a51e9>] do_output_char+0x179/0x210
>> kernel: [<ffffffff812a52cd>] process_output+0x4d/0x70
>> kernel: [<ffffffff812a59d0>] n_tty_write+0x210/0x480
>> kernel: [<ffffffff81072710>] ? try_to_wake_up+0x2e0/0x2e0
>> kernel: [<ffffffff812a2839>] tty_write+0x159/0x300
>> kernel: [<ffffffff8109326f>] ? lock_release_holdtime.part.22+0xf/0x180
>> kernel: [<ffffffff812a57c0>] ? n_tty_poll+0x1c0/0x1c0
>> kernel: [<ffffffff81151a3b>] vfs_write+0xab/0x170
>> kernel: [<ffffffff81151ea5>] sys_write+0x55/0xa0
>> kernel: [<ffffffff8143dd82>] system_call_fastpath+0x16/0x1b
>
> So, looking at this, I'm not 100% sure whats going on.  console_unlock has the
> ability to detect the need for rescheduling when we call
> raw_spin_unlock_irqrestore at the bottom of the function, so there should be no
> reason that we don't give the per-cpu watchdog task a chance to run.  About the
> only case I see in which we might be in trouble would be if somehow got into an
> infinite loop if every message in the log buffer had LOG_NOCONS set, in which
> case we would spin in this loop without ever scheduling or re-enabling
> interrupts.  But for that to happen I think you would have to be flooding the
> log with messages that were bigger than the default message buffer size (about
> 1000 bytes without a newline).  You're not doing that are you?

The above call trace was triggered by sending messages at a high rate 
over netconsole where each message has a length between 40 and (about) 
160 characters. I'm not sure though what exactly is causing this call 
trace - I haven't had the time yet to analyze this in detail.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index 9d7d8c6..52e1d14 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -35,12 +35,19 @@  struct netpoll {
 	struct work_struct cleanup_work;
 };
 
+/* Flag defs for netpoll_info->flags */
+
+#define NETPOLL_RX_ENABLED	1
+#define NETPOLL_RX_DROP		2
+#define NETPOLL_SUSPENDED	3
+#define NETPOLL_POLLING		4
+
 struct netpoll_info {
 	atomic_t refcnt;
 
-	unsigned long rx_flags;
+	unsigned long flags;
 	spinlock_t rx_lock;
-	struct mutex dev_lock;
+	wait_queue_head_t dev_wait;
 	struct list_head rx_np; /* netpolls that registered an rx_hook */
 
 	struct sk_buff_head neigh_tx; /* list of neigh requests to reply to */
@@ -53,11 +60,11 @@  struct netpoll_info {
 };
 
 #ifdef CONFIG_NETPOLL
-extern int netpoll_rx_disable(struct net_device *dev);
-extern void netpoll_rx_enable(struct net_device *dev);
+extern int netpoll_suspend(struct net_device *dev);
+extern void netpoll_resume(struct net_device *dev);
 #else
-static inline int netpoll_rx_disable(struct net_device *dev) { return 0; }
-static inline void netpoll_rx_enable(struct net_device *dev) { return; }
+static inline int netpoll_suspend(struct net_device *dev) { return 0; }
+static inline void netpoll_resume(struct net_device *dev) { return; }
 #endif
 
 void netpoll_send_udp(struct netpoll *np, const char *msg, int len);
@@ -88,7 +95,8 @@  static inline bool netpoll_rx_on(struct sk_buff *skb)
 {
 	struct netpoll_info *npinfo = rcu_dereference_bh(skb->dev->npinfo);
 
-	return npinfo && (!list_empty(&npinfo->rx_np) || npinfo->rx_flags);
+	return npinfo && (!list_empty(&npinfo->rx_np) ||
+	       test_bit(NETPOLL_RX_ENABLED, &npinfo->flags));
 }
 
 static inline bool netpoll_rx(struct sk_buff *skb)
@@ -104,8 +112,9 @@  static inline bool netpoll_rx(struct sk_buff *skb)
 
 	npinfo = rcu_dereference_bh(skb->dev->npinfo);
 	spin_lock(&npinfo->rx_lock);
-	/* check rx_flags again with the lock held */
-	if (npinfo->rx_flags && __netpoll_rx(skb, npinfo))
+	/* check flags again with the lock held */
+	if (test_bit(NETPOLL_RX_ENABLED, &npinfo->flags) &&
+	    __netpoll_rx(skb, npinfo))
 		ret = true;
 	spin_unlock(&npinfo->rx_lock);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 3655ff9..163dd3e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1200,7 +1200,7 @@  static int __dev_open(struct net_device *dev)
 	 * If we don't do this there is a chance ndo_poll_controller
 	 * or ndo_poll may be running while we open the device
 	 */
-	ret = netpoll_rx_disable(dev);
+	ret = netpoll_suspend(dev);
 	if (ret)
 		return ret;
 
@@ -1217,7 +1217,7 @@  static int __dev_open(struct net_device *dev)
 	if (!ret && ops->ndo_open)
 		ret = ops->ndo_open(dev);
 
-	netpoll_rx_enable(dev);
+	netpoll_resume(dev);
 
 	if (ret)
 		clear_bit(__LINK_STATE_START, &dev->state);
@@ -1311,7 +1311,7 @@  static int __dev_close(struct net_device *dev)
 	LIST_HEAD(single);
 
 	/* Temporarily disable netpoll until the interface is down */
-	retval = netpoll_rx_disable(dev);
+	retval = netpoll_suspend(dev);
 	if (retval)
 		return retval;
 
@@ -1319,7 +1319,7 @@  static int __dev_close(struct net_device *dev)
 	retval = __dev_close_many(&single);
 	list_del(&single);
 
-	netpoll_rx_enable(dev);
+	netpoll_resume(dev);
 	return retval;
 }
 
@@ -1360,7 +1360,7 @@  int dev_close(struct net_device *dev)
 		LIST_HEAD(single);
 
 		/* Block netpoll rx while the interface is going down */
-		ret = netpoll_rx_disable(dev);
+		ret = netpoll_suspend(dev);
 		if (ret)
 			return ret;
 
@@ -1368,7 +1368,7 @@  int dev_close(struct net_device *dev)
 		dev_close_many(&single);
 		list_del(&single);
 
-		netpoll_rx_enable(dev);
+		netpoll_resume(dev);
 	}
 	return ret;
 }
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index a3a17ae..a094227 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -50,8 +50,6 @@  static atomic_t trapped;
 DEFINE_STATIC_SRCU(netpoll_srcu);
 
 #define USEC_PER_POLL	50
-#define NETPOLL_RX_ENABLED  1
-#define NETPOLL_RX_DROP     2
 
 #define MAX_SKB_SIZE							\
 	(sizeof(struct ethhdr) +					\
@@ -155,7 +153,7 @@  static int poll_one_napi(struct netpoll_info *npinfo,
 	if (!test_bit(NAPI_STATE_SCHED, &napi->state))
 		return budget;
 
-	npinfo->rx_flags |= NETPOLL_RX_DROP;
+	set_bit(NETPOLL_RX_DROP, &npinfo->flags);
 	atomic_inc(&trapped);
 	set_bit(NAPI_STATE_NPSVC, &napi->state);
 
@@ -164,7 +162,7 @@  static int poll_one_napi(struct netpoll_info *npinfo,
 
 	clear_bit(NAPI_STATE_NPSVC, &napi->state);
 	atomic_dec(&trapped);
-	npinfo->rx_flags &= ~NETPOLL_RX_DROP;
+	clear_bit(NETPOLL_RX_DROP, &npinfo->flags);
 
 	return budget - work;
 }
@@ -206,27 +204,22 @@  static void netpoll_poll_dev(struct net_device *dev)
 	 * the dev_open/close paths use this to block netpoll activity
 	 * while changing device state
 	 */
-	if (!mutex_trylock(&ni->dev_lock))
-		return;
+	set_bit(NETPOLL_POLLING, &ni->flags);
+	if (test_bit(NETPOLL_SUSPENDED, &ni->flags))
+		goto out;
 
-	if (!netif_running(dev)) {
-		mutex_unlock(&ni->dev_lock);
-		return;
-	}
+	if (!netif_running(dev))
+		goto out;
 
 	ops = dev->netdev_ops;
-	if (!ops->ndo_poll_controller) {
-		mutex_unlock(&ni->dev_lock);
-		return;
-	}
+	if (!ops->ndo_poll_controller)
+		goto out;
 
 	/* Process pending work on NIC */
 	ops->ndo_poll_controller(dev);
 
 	poll_napi(dev);
 
-	mutex_unlock(&ni->dev_lock);
-
 	if (dev->flags & IFF_SLAVE) {
 		if (ni) {
 			struct net_device *bond_dev;
@@ -245,32 +238,37 @@  static void netpoll_poll_dev(struct net_device *dev)
 	service_neigh_queue(ni);
 
 	zap_completion_queue();
+out:
+	clear_bit(NETPOLL_POLLING, &ni->dev_wait.lock);
+	wake_up(&ni->dev_wait);
 }
 
-int netpoll_rx_disable(struct net_device *dev)
+int netpoll_suspend(struct net_device *dev)
 {
 	struct netpoll_info *ni;
 	int idx;
+
 	might_sleep();
 	idx = srcu_read_lock(&netpoll_srcu);
 	ni = srcu_dereference(dev->npinfo, &netpoll_srcu);
-	if (ni)
-		mutex_lock(&ni->dev_lock);
+	if (ni && !test_and_set_bit(NETPOLL_SUSPENDED, &ni->flags))
+		wait_event(ni->dev_wait, !test_bit(NETPOLL_POLLING,
+			   &ni->flags));
 	srcu_read_unlock(&netpoll_srcu, idx);
 	return 0;
 }
-EXPORT_SYMBOL(netpoll_rx_disable);
+EXPORT_SYMBOL(netpoll_suspend);
 
-void netpoll_rx_enable(struct net_device *dev)
+void netpoll_resume(struct net_device *dev)
 {
 	struct netpoll_info *ni;
 	rcu_read_lock();
 	ni = rcu_dereference(dev->npinfo);
 	if (ni)
-		mutex_unlock(&ni->dev_lock);
+		clear_bit(NETPOLL_SUSPENDED, &ni->flags);
 	rcu_read_unlock();
 }
-EXPORT_SYMBOL(netpoll_rx_enable);
+EXPORT_SYMBOL(netpoll_resume);
 
 static void refill_skbs(void)
 {
@@ -1042,11 +1040,11 @@  int __netpoll_setup(struct netpoll *np, struct net_device *ndev, gfp_t gfp)
 			goto out;
 		}
 
-		npinfo->rx_flags = 0;
+		npinfo->flags = 0;
 		INIT_LIST_HEAD(&npinfo->rx_np);
 
 		spin_lock_init(&npinfo->rx_lock);
-		mutex_init(&npinfo->dev_lock);
+		init_waitqueue_head(&npinfo->dev_wait);
 		skb_queue_head_init(&npinfo->neigh_tx);
 		skb_queue_head_init(&npinfo->txq);
 		INIT_DELAYED_WORK(&npinfo->tx_work, queue_process);
@@ -1068,7 +1066,7 @@  int __netpoll_setup(struct netpoll *np, struct net_device *ndev, gfp_t gfp)
 
 	if (np->rx_hook) {
 		spin_lock_irqsave(&npinfo->rx_lock, flags);
-		npinfo->rx_flags |= NETPOLL_RX_ENABLED;
+		set_bit(NETPOLL_RX_ENABLED, &npinfo->flags);
 		list_add_tail(&np->rx, &npinfo->rx_np);
 		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
 	}
@@ -1251,7 +1249,7 @@  void __netpoll_cleanup(struct netpoll *np)
 		spin_lock_irqsave(&npinfo->rx_lock, flags);
 		list_del(&np->rx);
 		if (list_empty(&npinfo->rx_np))
-			npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
+			clear_bit(NETPOLL_RX_ENABLED, &npinfo->flags);
 		spin_unlock_irqrestore(&npinfo->rx_lock, flags);
 	}