diff mbox

netconsole: prevent deadlock when removing net driver that netconsole is using

Message ID 1303481216-10348-1-git-send-email-nhorman@tuxdriver.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Neil Horman April 22, 2011, 2:06 p.m. UTC
A deadlock was reported to me recently that occured when netconsole was being
used in a virtual guest.  If the virtio_net driver was removed while netconsole
was setup to use an interface that was driven by that driver, the guest
deadlocked.  No backtrace was provided because netconsole was the only console
configured, but it became clear pretty quickly what the problem was.  In
netconsole_netdev_event, if we get an unregister event, we call
__netpoll_cleanup with the target_list_lock held and irqs disabled.
__netpoll_cleanup can, if pending netpoll packets are waiting call
cancel_delayed_work_sync, which is a sleeping path.  the might_sleep call in
that path gets triggered, causing a console warning to be issued.  The
netconsole write handler of course tries to take the target_list_lock again,
which we already hold, causing deadlock.

The fix is pretty striaghtforward.  Simply drop the target_list_lock and
re-enable irqs prior to calling __netpoll_cleanup, the re-acquire the lock, and
restart the loop.  Confirmed by myself to fix the problem reported.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
---
 drivers/net/netconsole.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

Comments

Neil Horman April 22, 2011, 2:44 p.m. UTC | #1
On Fri, Apr 22, 2011 at 10:06:56AM -0400, Neil Horman wrote:
> A deadlock was reported to me recently that occured when netconsole was being
> used in a virtual guest.  If the virtio_net driver was removed while netconsole
> was setup to use an interface that was driven by that driver, the guest
> deadlocked.  No backtrace was provided because netconsole was the only console
> configured, but it became clear pretty quickly what the problem was.  In
> netconsole_netdev_event, if we get an unregister event, we call
> __netpoll_cleanup with the target_list_lock held and irqs disabled.
> __netpoll_cleanup can, if pending netpoll packets are waiting call
> cancel_delayed_work_sync, which is a sleeping path.  the might_sleep call in
> that path gets triggered, causing a console warning to be issued.  The
> netconsole write handler of course tries to take the target_list_lock again,
> which we already hold, causing deadlock.
> 
> The fix is pretty striaghtforward.  Simply drop the target_list_lock and
> re-enable irqs prior to calling __netpoll_cleanup, the re-acquire the lock, and
> restart the loop.  Confirmed by myself to fix the problem reported.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: "David S. Miller" <davem@davemloft.net>
> ---
>  drivers/net/netconsole.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
> index dfb67eb..f365840 100644
> --- a/drivers/net/netconsole.c
> +++ b/drivers/net/netconsole.c
> @@ -670,6 +670,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
>  	      event == NETDEV_BONDING_DESLAVE || event == NETDEV_GOING_DOWN))
>  		goto done;
>  
> +restart:
>  	spin_lock_irqsave(&target_list_lock, flags);
>  	list_for_each_entry(nt, &target_list, list) {
>  		netconsole_target_get(nt);
> @@ -683,9 +684,12 @@ static int netconsole_netdev_event(struct notifier_block *this,
>  				 * rtnl_lock already held
>  				 */
>  				if (nt->np.dev) {
> +					spin_unlock_irqrestore(&target_list_lock,
> +							       flags);
>  					__netpoll_cleanup(&nt->np);
>  					dev_put(nt->np.dev);
>  					nt->np.dev = NULL;
> +					goto restart;
>  				}
>  				/* Fall through */
>  			case NETDEV_GOING_DOWN:
> -- 
> 1.7.4.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Crap, hold on this, sorry.  Just saw that I forgot to issue a
netconsole_target_put to balance the get in the restart path.  I'll send a new
patch shortly.
Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index dfb67eb..f365840 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -670,6 +670,7 @@  static int netconsole_netdev_event(struct notifier_block *this,
 	      event == NETDEV_BONDING_DESLAVE || event == NETDEV_GOING_DOWN))
 		goto done;
 
+restart:
 	spin_lock_irqsave(&target_list_lock, flags);
 	list_for_each_entry(nt, &target_list, list) {
 		netconsole_target_get(nt);
@@ -683,9 +684,12 @@  static int netconsole_netdev_event(struct notifier_block *this,
 				 * rtnl_lock already held
 				 */
 				if (nt->np.dev) {
+					spin_unlock_irqrestore(&target_list_lock,
+							       flags);
 					__netpoll_cleanup(&nt->np);
 					dev_put(nt->np.dev);
 					nt->np.dev = NULL;
+					goto restart;
 				}
 				/* Fall through */
 			case NETDEV_GOING_DOWN: