Message ID | 20130418192912.GB29130@hmsreliant.think-freely.org |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, Apr 18, 2013 at 03:29:12PM -0400, Neil Horman wrote: > Well, I guess Ingo doesn't have time to respond here. Given that, I've been > looking at other ways of adressing this problem (as the more I think about it, > the more I think changing mutex semantics, even to be more accepting, is going > to be met with lots of friction). I've come up with the below change, and it > works well for me (I've hammered it with ifup/downs while sending messages over > netconsole for 24 hours now without any incident). I'm not sure I like all the > details yet (I modified the rx_flag field to be a gneral flags field that uses > test/set/clear_bit, which I don't think is necessecary, but it seems saner to > me). the meat of the change though is that instead of a mutex, we use two flags > that, respectively, inhibit netpoll usage, and indicate when we're in a netpoll > service path. Using these two flags, in conjunction with a wait_queue, we can > signal to executors in the netpoll path that we're inhibiting its use and expect > to any user in the path to complete and wake up any contexts in the inhibiting > path. Seems to work well for me. Let me know what you think, and I'll make any > desired changes and post an official patch. > > Best > Neil > Bart, any testing/thoughts here? i'd like to get this fixed in the next few days, but I'd like to get any feedback you might have first. Neil -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Apr 22, 2013 at 01:41:11PM -0700, Bart Van Assche wrote: > On Mon, Apr 22, 2013 at 1:12 PM, Neil Horman <nhorman@tuxdriver.com> wrote: > > > Bart, any testing/thoughts here? i'd like to get this fixed in the next > > few > > days, but I'd like to get any feedback you might have first. > > > > Sorry but I haven't been able to run any tests during the past eight days > since I'm traveling and because of that I do not have access to my regular > test setup. > > Bart. Copy that. I can propose this change based on my own testing, or wait for you to get back. Do you have a preference? Neil -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 23, 2013 at 06:38:19AM -0700, Bart Van Assche wrote: > On Tue, Apr 23, 2013 at 6:23 AM, Neil Horman <nhorman@tuxdriver.com> wrote: > > > Copy that. I can propose this change based on my own testing, or wait for > > you > > to get back. Do you have a preference? > > > > It will take until next week before I can run more tests. It depends on > whether a fix can wait that long ? > > Bart. Dave, how do you feel about it? I'm comfortable with the wait queue change I've proposed, but I've not received any reports of actual netpoll deadlocks (i.e. the mutex solution is reporting a warning, but no actual problems). So I think its safe to wait a bit longer, unless you just want this squared away now. FWIw, I think theres still some discussion to be had about the possibility of converting mutexes to use spin_lock_irqsave internally, thereby making mutex_trylock irq safe, but I think we can deal with that independently of this issue. Regards Neil -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Neil Horman <nhorman@tuxdriver.com> Date: Tue, 23 Apr 2013 09:44:43 -0400 > Dave, how do you feel about it? I'm comfortable with the wait queue change I've > proposed, but I've not received any reports of actual netpoll deadlocks (i.e. > the mutex solution is reporting a warning, but no actual problems). So I think > its safe to wait a bit longer, unless you just want this squared away now. If it's just a warning and people aren't actually hitting the potential deadlock, it can wait. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 23, 2013 at 01:33:15PM -0400, David Miller wrote: > From: Neil Horman <nhorman@tuxdriver.com> > Date: Tue, 23 Apr 2013 09:44:43 -0400 > > > Dave, how do you feel about it? I'm comfortable with the wait queue change I've > > proposed, but I've not received any reports of actual netpoll deadlocks (i.e. > > the mutex solution is reporting a warning, but no actual problems). So I think > > its safe to wait a bit longer, unless you just want this squared away now. > > If it's just a warning and people aren't actually hitting the > potential deadlock, it can wait. > Copy that. Bart, I'll wait till you get back then. Neil -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/23/13 19:50, Neil Horman wrote: > On Tue, Apr 23, 2013 at 01:33:15PM -0400, David Miller wrote: >> From: Neil Horman <nhorman@tuxdriver.com> >> Date: Tue, 23 Apr 2013 09:44:43 -0400 >> >>> Dave, how do you feel about it? I'm comfortable with the wait queue change I've >>> proposed, but I've not received any reports of actual netpoll deadlocks (i.e. >>> the mutex solution is reporting a warning, but no actual problems). So I think >>> its safe to wait a bit longer, unless you just want this squared away now. >> >> If it's just a warning and people aren't actually hitting the >> potential deadlock, it can wait. >> > Copy that. Bart, I'll wait till you get back then. (Just arrived home) Sorry Neil, but I can still trigger the CPU stuck messages with kernel v3.9-rc8-24-gd7d7271: kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601] kernel: irq event stamp: 1999192 kernel: hardirqs last enabled at (1999191): [<ffffffff8103e89d>] console_unlock+0x41d/0x450 kernel: hardirqs last disabled at (1999192): [<ffffffff8143e96a>] apic_timer_interrupt+0x6a/0x80 kernel: softirqs last enabled at (1999188): [<ffffffff81044e26>] __do_softirq+0x196/0x280 kernel: softirqs last disabled at (1999181): [<ffffffff810450c5>] irq_exit+0xb5/0xc0 kernel: CPU 0 kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE kernel: RIP: 0010:[<ffffffff8103e8a0>] [<ffffffff8103e8a0>] console_unlock+0x420/0x450 kernel: Call Trace: kernel: [<ffffffff812bbf37>] do_con_write.part.19+0x887/0x2040 kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70 kernel: [<ffffffff814316fc>] ? mutex_lock_nested+0x28c/0x350 kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70 kernel: [<ffffffff812bd764>] con_write+0x34/0x50 kernel: [<ffffffff812a51e9>] do_output_char+0x179/0x210 kernel: [<ffffffff812a52cd>] process_output+0x4d/0x70 kernel: [<ffffffff812a59d0>] n_tty_write+0x210/0x480 kernel: [<ffffffff81072710>] ? try_to_wake_up+0x2e0/0x2e0 kernel: [<ffffffff812a2839>] tty_write+0x159/0x300 kernel: [<ffffffff8109326f>] ? lock_release_holdtime.part.22+0xf/0x180 kernel: [<ffffffff812a57c0>] ? n_tty_poll+0x1c0/0x1c0 kernel: [<ffffffff81151a3b>] vfs_write+0xab/0x170 kernel: [<ffffffff81151ea5>] sys_write+0x55/0xa0 kernel: [<ffffffff8143dd82>] system_call_fastpath+0x16/0x1b Bart. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Apr 27, 2013 at 08:53:54PM +0200, bvba Bart Van Assche wrote: > On 04/23/13 19:50, Neil Horman wrote: > > On Tue, Apr 23, 2013 at 01:33:15PM -0400, David Miller wrote: > >> From: Neil Horman <nhorman@tuxdriver.com> > >> Date: Tue, 23 Apr 2013 09:44:43 -0400 > >> > >>> Dave, how do you feel about it? I'm comfortable with the wait queue change I've > >>> proposed, but I've not received any reports of actual netpoll deadlocks (i.e. > >>> the mutex solution is reporting a warning, but no actual problems). So I think > >>> its safe to wait a bit longer, unless you just want this squared away now. > >> > >> If it's just a warning and people aren't actually hitting the > >> potential deadlock, it can wait. > >> > > Copy that. Bart, I'll wait till you get back then. > > (Just arrived home) > > Sorry Neil, but I can still trigger the CPU stuck messages with kernel v3.9-rc8-24-gd7d7271: > > kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601] > kernel: irq event stamp: 1999192 > kernel: hardirqs last enabled at (1999191): [<ffffffff8103e89d>] console_unlock+0x41d/0x450 > kernel: hardirqs last disabled at (1999192): [<ffffffff8143e96a>] apic_timer_interrupt+0x6a/0x80 > kernel: softirqs last enabled at (1999188): [<ffffffff81044e26>] __do_softirq+0x196/0x280 > kernel: softirqs last disabled at (1999181): [<ffffffff810450c5>] irq_exit+0xb5/0xc0 > kernel: CPU 0 > kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE > kernel: RIP: 0010:[<ffffffff8103e8a0>] [<ffffffff8103e8a0>] console_unlock+0x420/0x450 > kernel: Call Trace: > kernel: [<ffffffff812bbf37>] do_con_write.part.19+0x887/0x2040 > kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70 > kernel: [<ffffffff814316fc>] ? mutex_lock_nested+0x28c/0x350 > kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70 > kernel: [<ffffffff812bd764>] con_write+0x34/0x50 > kernel: [<ffffffff812a51e9>] do_output_char+0x179/0x210 > kernel: [<ffffffff812a52cd>] process_output+0x4d/0x70 > kernel: [<ffffffff812a59d0>] n_tty_write+0x210/0x480 > kernel: [<ffffffff81072710>] ? try_to_wake_up+0x2e0/0x2e0 > kernel: [<ffffffff812a2839>] tty_write+0x159/0x300 > kernel: [<ffffffff8109326f>] ? lock_release_holdtime.part.22+0xf/0x180 > kernel: [<ffffffff812a57c0>] ? n_tty_poll+0x1c0/0x1c0 > kernel: [<ffffffff81151a3b>] vfs_write+0xab/0x170 > kernel: [<ffffffff81151ea5>] sys_write+0x55/0xa0 > kernel: [<ffffffff8143dd82>] system_call_fastpath+0x16/0x1b > > Bart. > So, looking at this, I'm not 100% sure whats going on. console_unlock has the ability to detect the need for rescheduling when we call raw_spin_unlock_irqrestore at the bottom of the function, so there should be no reason that we don't give the per-cpu watchdog task a chance to run. About the only case I see in which we might be in trouble would be if somehow got into an infinite loop if every message in the log buffer had LOG_NOCONS set, in which case we would spin in this loop without ever scheduling or re-enabling interrupts. But for that to happen I think you would have to be flooding the log with messages that were bigger than the default message buffer size (about 1000 bytes without a newline). You're not doing that are you? Regardless, this is a different problem from the one we were origionally addressing, we should probably handle it in a different thread. On the up side, I did discover while looking at this that semaphores, unlike mutexes are quite liberal in what contexts they can be used in (i.e. irq contexts are perfectly legitimate places to call up and down_trylock). I'll respin my patch to use that, as its much simpler than a wait queue Best Neil -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/29/13 20:13, Neil Horman wrote: > On Sat, Apr 27, 2013 at 08:53:54PM +0200, Bart Van Assche wrote: >> Sorry Neil, but I can still trigger the CPU stuck messages with kernel v3.9-rc8-24-gd7d7271: >> >> kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601] >> kernel: irq event stamp: 1999192 >> kernel: hardirqs last enabled at (1999191): [<ffffffff8103e89d>] console_unlock+0x41d/0x450 >> kernel: hardirqs last disabled at (1999192): [<ffffffff8143e96a>] apic_timer_interrupt+0x6a/0x80 >> kernel: softirqs last enabled at (1999188): [<ffffffff81044e26>] __do_softirq+0x196/0x280 >> kernel: softirqs last disabled at (1999181): [<ffffffff810450c5>] irq_exit+0xb5/0xc0 >> kernel: CPU 0 >> kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE >> kernel: RIP: 0010:[<ffffffff8103e8a0>] [<ffffffff8103e8a0>] console_unlock+0x420/0x450 >> kernel: Call Trace: >> kernel: [<ffffffff812bbf37>] do_con_write.part.19+0x887/0x2040 >> kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70 >> kernel: [<ffffffff814316fc>] ? mutex_lock_nested+0x28c/0x350 >> kernel: [<ffffffff812a52b7>] ? process_output+0x37/0x70 >> kernel: [<ffffffff812bd764>] con_write+0x34/0x50 >> kernel: [<ffffffff812a51e9>] do_output_char+0x179/0x210 >> kernel: [<ffffffff812a52cd>] process_output+0x4d/0x70 >> kernel: [<ffffffff812a59d0>] n_tty_write+0x210/0x480 >> kernel: [<ffffffff81072710>] ? try_to_wake_up+0x2e0/0x2e0 >> kernel: [<ffffffff812a2839>] tty_write+0x159/0x300 >> kernel: [<ffffffff8109326f>] ? lock_release_holdtime.part.22+0xf/0x180 >> kernel: [<ffffffff812a57c0>] ? n_tty_poll+0x1c0/0x1c0 >> kernel: [<ffffffff81151a3b>] vfs_write+0xab/0x170 >> kernel: [<ffffffff81151ea5>] sys_write+0x55/0xa0 >> kernel: [<ffffffff8143dd82>] system_call_fastpath+0x16/0x1b > > So, looking at this, I'm not 100% sure whats going on. console_unlock has the > ability to detect the need for rescheduling when we call > raw_spin_unlock_irqrestore at the bottom of the function, so there should be no > reason that we don't give the per-cpu watchdog task a chance to run. About the > only case I see in which we might be in trouble would be if somehow got into an > infinite loop if every message in the log buffer had LOG_NOCONS set, in which > case we would spin in this loop without ever scheduling or re-enabling > interrupts. But for that to happen I think you would have to be flooding the > log with messages that were bigger than the default message buffer size (about > 1000 bytes without a newline). You're not doing that are you? The above call trace was triggered by sending messages at a high rate over netconsole where each message has a length between 40 and (about) 160 characters. I'm not sure though what exactly is causing this call trace - I haven't had the time yet to analyze this in detail. Bart. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h index 9d7d8c6..52e1d14 100644 --- a/include/linux/netpoll.h +++ b/include/linux/netpoll.h @@ -35,12 +35,19 @@ struct netpoll { struct work_struct cleanup_work; }; +/* Flag defs for netpoll_info->flags */ + +#define NETPOLL_RX_ENABLED 1 +#define NETPOLL_RX_DROP 2 +#define NETPOLL_SUSPENDED 3 +#define NETPOLL_POLLING 4 + struct netpoll_info { atomic_t refcnt; - unsigned long rx_flags; + unsigned long flags; spinlock_t rx_lock; - struct mutex dev_lock; + wait_queue_head_t dev_wait; struct list_head rx_np; /* netpolls that registered an rx_hook */ struct sk_buff_head neigh_tx; /* list of neigh requests to reply to */ @@ -53,11 +60,11 @@ struct netpoll_info { }; #ifdef CONFIG_NETPOLL -extern int netpoll_rx_disable(struct net_device *dev); -extern void netpoll_rx_enable(struct net_device *dev); +extern int netpoll_suspend(struct net_device *dev); +extern void netpoll_resume(struct net_device *dev); #else -static inline int netpoll_rx_disable(struct net_device *dev) { return 0; } -static inline void netpoll_rx_enable(struct net_device *dev) { return; } +static inline int netpoll_suspend(struct net_device *dev) { return 0; } +static inline void netpoll_resume(struct net_device *dev) { return; } #endif void netpoll_send_udp(struct netpoll *np, const char *msg, int len); @@ -88,7 +95,8 @@ static inline bool netpoll_rx_on(struct sk_buff *skb) { struct netpoll_info *npinfo = rcu_dereference_bh(skb->dev->npinfo); - return npinfo && (!list_empty(&npinfo->rx_np) || npinfo->rx_flags); + return npinfo && (!list_empty(&npinfo->rx_np) || + test_bit(NETPOLL_RX_ENABLED, &npinfo->flags)); } static inline bool netpoll_rx(struct sk_buff *skb) @@ -104,8 +112,9 @@ static inline bool netpoll_rx(struct sk_buff *skb) npinfo = rcu_dereference_bh(skb->dev->npinfo); spin_lock(&npinfo->rx_lock); - /* check rx_flags again with the lock held */ - if (npinfo->rx_flags && __netpoll_rx(skb, npinfo)) + /* check flags again with the lock held */ + if (test_bit(NETPOLL_RX_ENABLED, &npinfo->flags) && + __netpoll_rx(skb, npinfo)) ret = true; spin_unlock(&npinfo->rx_lock); diff --git a/net/core/dev.c b/net/core/dev.c index 3655ff9..163dd3e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1200,7 +1200,7 @@ static int __dev_open(struct net_device *dev) * If we don't do this there is a chance ndo_poll_controller * or ndo_poll may be running while we open the device */ - ret = netpoll_rx_disable(dev); + ret = netpoll_suspend(dev); if (ret) return ret; @@ -1217,7 +1217,7 @@ static int __dev_open(struct net_device *dev) if (!ret && ops->ndo_open) ret = ops->ndo_open(dev); - netpoll_rx_enable(dev); + netpoll_resume(dev); if (ret) clear_bit(__LINK_STATE_START, &dev->state); @@ -1311,7 +1311,7 @@ static int __dev_close(struct net_device *dev) LIST_HEAD(single); /* Temporarily disable netpoll until the interface is down */ - retval = netpoll_rx_disable(dev); + retval = netpoll_suspend(dev); if (retval) return retval; @@ -1319,7 +1319,7 @@ static int __dev_close(struct net_device *dev) retval = __dev_close_many(&single); list_del(&single); - netpoll_rx_enable(dev); + netpoll_resume(dev); return retval; } @@ -1360,7 +1360,7 @@ int dev_close(struct net_device *dev) LIST_HEAD(single); /* Block netpoll rx while the interface is going down */ - ret = netpoll_rx_disable(dev); + ret = netpoll_suspend(dev); if (ret) return ret; @@ -1368,7 +1368,7 @@ int dev_close(struct net_device *dev) dev_close_many(&single); list_del(&single); - netpoll_rx_enable(dev); + netpoll_resume(dev); } return ret; } diff --git a/net/core/netpoll.c b/net/core/netpoll.c index a3a17ae..a094227 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -50,8 +50,6 @@ static atomic_t trapped; DEFINE_STATIC_SRCU(netpoll_srcu); #define USEC_PER_POLL 50 -#define NETPOLL_RX_ENABLED 1 -#define NETPOLL_RX_DROP 2 #define MAX_SKB_SIZE \ (sizeof(struct ethhdr) + \ @@ -155,7 +153,7 @@ static int poll_one_napi(struct netpoll_info *npinfo, if (!test_bit(NAPI_STATE_SCHED, &napi->state)) return budget; - npinfo->rx_flags |= NETPOLL_RX_DROP; + set_bit(NETPOLL_RX_DROP, &npinfo->flags); atomic_inc(&trapped); set_bit(NAPI_STATE_NPSVC, &napi->state); @@ -164,7 +162,7 @@ static int poll_one_napi(struct netpoll_info *npinfo, clear_bit(NAPI_STATE_NPSVC, &napi->state); atomic_dec(&trapped); - npinfo->rx_flags &= ~NETPOLL_RX_DROP; + clear_bit(NETPOLL_RX_DROP, &npinfo->flags); return budget - work; } @@ -206,27 +204,22 @@ static void netpoll_poll_dev(struct net_device *dev) * the dev_open/close paths use this to block netpoll activity * while changing device state */ - if (!mutex_trylock(&ni->dev_lock)) - return; + set_bit(NETPOLL_POLLING, &ni->flags); + if (test_bit(NETPOLL_SUSPENDED, &ni->flags)) + goto out; - if (!netif_running(dev)) { - mutex_unlock(&ni->dev_lock); - return; - } + if (!netif_running(dev)) + goto out; ops = dev->netdev_ops; - if (!ops->ndo_poll_controller) { - mutex_unlock(&ni->dev_lock); - return; - } + if (!ops->ndo_poll_controller) + goto out; /* Process pending work on NIC */ ops->ndo_poll_controller(dev); poll_napi(dev); - mutex_unlock(&ni->dev_lock); - if (dev->flags & IFF_SLAVE) { if (ni) { struct net_device *bond_dev; @@ -245,32 +238,37 @@ static void netpoll_poll_dev(struct net_device *dev) service_neigh_queue(ni); zap_completion_queue(); +out: + clear_bit(NETPOLL_POLLING, &ni->dev_wait.lock); + wake_up(&ni->dev_wait); } -int netpoll_rx_disable(struct net_device *dev) +int netpoll_suspend(struct net_device *dev) { struct netpoll_info *ni; int idx; + might_sleep(); idx = srcu_read_lock(&netpoll_srcu); ni = srcu_dereference(dev->npinfo, &netpoll_srcu); - if (ni) - mutex_lock(&ni->dev_lock); + if (ni && !test_and_set_bit(NETPOLL_SUSPENDED, &ni->flags)) + wait_event(ni->dev_wait, !test_bit(NETPOLL_POLLING, + &ni->flags)); srcu_read_unlock(&netpoll_srcu, idx); return 0; } -EXPORT_SYMBOL(netpoll_rx_disable); +EXPORT_SYMBOL(netpoll_suspend); -void netpoll_rx_enable(struct net_device *dev) +void netpoll_resume(struct net_device *dev) { struct netpoll_info *ni; rcu_read_lock(); ni = rcu_dereference(dev->npinfo); if (ni) - mutex_unlock(&ni->dev_lock); + clear_bit(NETPOLL_SUSPENDED, &ni->flags); rcu_read_unlock(); } -EXPORT_SYMBOL(netpoll_rx_enable); +EXPORT_SYMBOL(netpoll_resume); static void refill_skbs(void) { @@ -1042,11 +1040,11 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev, gfp_t gfp) goto out; } - npinfo->rx_flags = 0; + npinfo->flags = 0; INIT_LIST_HEAD(&npinfo->rx_np); spin_lock_init(&npinfo->rx_lock); - mutex_init(&npinfo->dev_lock); + init_waitqueue_head(&npinfo->dev_wait); skb_queue_head_init(&npinfo->neigh_tx); skb_queue_head_init(&npinfo->txq); INIT_DELAYED_WORK(&npinfo->tx_work, queue_process); @@ -1068,7 +1066,7 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev, gfp_t gfp) if (np->rx_hook) { spin_lock_irqsave(&npinfo->rx_lock, flags); - npinfo->rx_flags |= NETPOLL_RX_ENABLED; + set_bit(NETPOLL_RX_ENABLED, &npinfo->flags); list_add_tail(&np->rx, &npinfo->rx_np); spin_unlock_irqrestore(&npinfo->rx_lock, flags); } @@ -1251,7 +1249,7 @@ void __netpoll_cleanup(struct netpoll *np) spin_lock_irqsave(&npinfo->rx_lock, flags); list_del(&np->rx); if (list_empty(&npinfo->rx_np)) - npinfo->rx_flags &= ~NETPOLL_RX_ENABLED; + clear_bit(NETPOLL_RX_ENABLED, &npinfo->flags); spin_unlock_irqrestore(&npinfo->rx_lock, flags); }
Well, I guess Ingo doesn't have time to respond here. Given that, I've been looking at other ways of adressing this problem (as the more I think about it, the more I think changing mutex semantics, even to be more accepting, is going to be met with lots of friction). I've come up with the below change, and it works well for me (I've hammered it with ifup/downs while sending messages over netconsole for 24 hours now without any incident). I'm not sure I like all the details yet (I modified the rx_flag field to be a gneral flags field that uses test/set/clear_bit, which I don't think is necessecary, but it seems saner to me). the meat of the change though is that instead of a mutex, we use two flags that, respectively, inhibit netpoll usage, and indicate when we're in a netpoll service path. Using these two flags, in conjunction with a wait_queue, we can signal to executors in the netpoll path that we're inhibiting its use and expect to any user in the path to complete and wake up any contexts in the inhibiting path. Seems to work well for me. Let me know what you think, and I'll make any desired changes and post an official patch. Best Neil commit c277d54d471572b0ec7b27812d205fb303ad44f9 Author: Neil Horman <nhorman@tuxdriver.com> Date: Wed Apr 17 15:59:44 2013 -0400 netpoll: convert mutex into a wait_event Bart Van Assche recently reported a warning to me: <IRQ> [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff814761dd>] mutex_trylock+0x16d/0x180 [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30 [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0 [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410 [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0 [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole] [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole] [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0 [<ffffffff8103fb76>] console_unlock+0x2d6/0x450 [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510 [<ffffffff8146f9f6>] printk+0x4d/0x4f [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod] This resulted from my commit ca99ca14c which introduced a mutex_trylock operation in a path that could execute in interrupt context. When mutex debugging is enabled, the above warns the user when we are in fact exectuting in interrupt context. After some discussion, I'm not sure a mutex is the right choice here, as it seems unsafe to use (deadlock possibilities due to implementation details). Instead lets convert the mutex to a mechanism that makes use of a wait_queue_head and some flag bits. The dev_close paths can now set a bit to suspend netpoll and then sleep waiting for the NETPOLL_POLLING flag to clear, which allows for flushing of any in-progress execution in the netpoll_poll_dev_path. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Bart Van Assche <bvanassche@acm.org> CC: Bart Van Assche <bvanassche@acm.org> CC: David Miller <davem@davemloft.net> CC: netdev@vger.kernel.org -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html