Subject: [libsas PATCH v13] scsi, sd: limit the scope of the async probe domain
From: Dan Williams
To: linux-scsi@vger.kernel.org
Cc: linux-ide@vger.kernel.org, Alan Stern
Date: Thu, 22 Mar 2012 17:05:11 -0700

sd schedules and synchronizes its probe work on the global, kernel-wide async domain.
This conflicts with PM, which performs resume actions in async context:

[ 494.237079] INFO: task kworker/u:3:554 blocked for more than 120 seconds.
[ 494.294396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 494.360809] kworker/u:3 D 0000000000000000 0 554 2 0x00000000
[ 494.420739] ffff88012e4d3af0 0000000000000046 ffff88013200c160 ffff88012e4d3fd8
[ 494.484392] ffff88012e4d3fd8 0000000000012500 ffff8801394ea0b0 ffff88013200c160
[ 494.548038] ffff88012e4d3ae0 00000000000001e3 ffffffff81a249e0 ffff8801321c5398
[ 494.611685] Call Trace:
[ 494.632649] schedule+0x5a/0x5c
[ 494.674687] async_synchronize_cookie_domain+0xb6/0x112
[ 494.734177] ? __init_waitqueue_head+0x50/0x50
[ 494.787134] ? scsi_remove_target+0x48/0x48
[ 494.837900] async_synchronize_cookie+0x15/0x17
[ 494.891567] async_synchronize_full+0x54/0x70 <-- here we wait for async contexts to complete
[ 494.943783] ? async_synchronize_full_domain+0x1a/0x1a
[ 495.002547] sd_remove+0x2c/0xa2 [sd_mod]
[ 495.051861] __device_release_driver+0x86/0xcf
[ 495.104807] device_release_driver+0x25/0x32 <-- here we take device_lock()

[ 853.511341] INFO: task kworker/u:4:549 blocked for more than 120 seconds.
[ 853.568693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 853.635119] kworker/u:4 D ffff88013097b5d0 0 549 2 0x00000000
[ 853.695129] ffff880132773c40 0000000000000046 ffff880130790000 ffff880132773fd8
[ 853.758990] ffff880132773fd8 0000000000012500 ffff88013288a0b0 ffff880130790000
[ 853.822796] 0000000000000246 0000000000000040 ffff88013097b5c8 ffff880130790000
[ 853.886633] Call Trace:
[ 853.907631] schedule+0x5a/0x5c
[ 853.949670] __mutex_lock_common+0x220/0x351
[ 854.001225] ? device_resume+0x58/0x1c4
[ 854.049082] ? device_resume+0x58/0x1c4
[ 854.097011] mutex_lock_nested+0x2f/0x36 <-- here we wait for device_lock()
[ 854.145591] device_resume+0x58/0x1c4
[ 854.192066] async_resume+0x1e/0x45
[ 854.237019] async_run_entry_fn+0xc6/0x173 <-- ...while running in async context

Provide a 'scsi_sd_probe_domain' so that async probe actions can be
flushed without regard for the state of PM, and allow the resume path to
handle devices that have transitioned from SDEV_QUIESCE to SDEV_DEL prior
to resume.

Acked-by: Alan Stern
[alan: uplevel scsi_sd_probe_domain, clarify scsi_device_resume]
Signed-off-by: Dan Williams
---
Changes since v12: http://marc.info/?l=linux-scsi&m=133239705303419&w=2

Only this patch is updated, with Alan's feedback to move the declaration
of scsi_sd_probe_domain into scsi_priv.h.

git://git.kernel.org/pub/scm/linux/kernel/git/djbw/isci.git libsas-eh-reworks-v13

 drivers/scsi/scsi.c      |    6 ++++++
 drivers/scsi/scsi_lib.c  |   10 +++++++---
 drivers/scsi/scsi_pm.c   |    2 +-
 drivers/scsi/scsi_priv.h |    4 ++++
 drivers/scsi/sd.c        |    5 +++--
 5 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 07322ec..61c82a3 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -90,6 +90,12 @@ unsigned int scsi_logging_level;
 EXPORT_SYMBOL(scsi_logging_level);
 #endif
 
+#if IS_ENABLED(CONFIG_PM) || IS_ENABLED(CONFIG_BLK_DEV_SD)
+/* sd and scsi_pm need to coordinate flushing async actions */
+LIST_HEAD(scsi_sd_probe_domain);
+EXPORT_SYMBOL(scsi_sd_probe_domain);
+#endif
+
 /* NB: These are exposed through /proc/scsi/scsi and form part of the ABI.
  * You may not alter any existing entry (although adding new ones is
  * encouraged once assigned by ANSI/INCITS T10
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b4833de..992ff63 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2348,10 +2348,14 @@ EXPORT_SYMBOL(scsi_device_quiesce);
  *
  *	Must be called with user context, may sleep.
  */
-void
-scsi_device_resume(struct scsi_device *sdev)
+void scsi_device_resume(struct scsi_device *sdev)
 {
-	if(scsi_device_set_state(sdev, SDEV_RUNNING))
+	/* check if the device state was mutated prior to resume, and if
+	 * so assume the state is being managed elsewhere (for example
+	 * device deleted during suspend)
+	 */
+	if (sdev->sdev_state != SDEV_QUIESCE ||
+	    scsi_device_set_state(sdev, SDEV_RUNNING))
 		return;
 	scsi_run_queue(sdev->request_queue);
 }
diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index c467064..f661a41 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -97,7 +97,7 @@ static int scsi_bus_prepare(struct device *dev)
 {
 	if (scsi_is_sdev_device(dev)) {
 		/* sd probing uses async_schedule. Wait until it finishes. */
-		async_synchronize_full();
+		async_synchronize_full_domain(&scsi_sd_probe_domain);
 
 	} else if (scsi_is_host_device(dev)) {
 		/* Wait until async scanning is finished */
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index be4fa6d..bec061d 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -163,6 +163,10 @@ static inline int scsi_autopm_get_host(struct Scsi_Host *h) { return 0; }
 static inline void scsi_autopm_put_host(struct Scsi_Host *h) {}
 #endif /* CONFIG_PM_RUNTIME */
 
+#if IS_ENABLED(CONFIG_PM) || IS_ENABLED(CONFIG_BLK_DEV_SD)
+extern struct list_head scsi_sd_probe_domain;
+#endif
+
 /*
  * internal scsi timeout functions: for use by mid-layer and transport
  * classes.
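The new check in scsi_device_resume() only restarts the queue when the device is still in SDEV_QUIESCE; any other state is assumed to be managed elsewhere (for example, a device deleted during suspend). The following is a minimal userspace sketch of that guard, not kernel code: the toy_* names, the reduced set of states, and the transition table in toy_set_state() are hypothetical simplifications chosen for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, reduced stand-ins for the SCSI device states. */
enum toy_sdev_state {
	TOY_SDEV_RUNNING,
	TOY_SDEV_QUIESCE,
	TOY_SDEV_OFFLINE,
	TOY_SDEV_DEL,
};

struct toy_sdev {
	enum toy_sdev_state state;
	int queue_runs;		/* counts calls standing in for scsi_run_queue() */
};

/* Simplified transition table (an assumption, not the kernel's): entering
 * RUNNING is legal only from QUIESCE or OFFLINE; anything else is -EINVAL. */
int toy_set_state(struct toy_sdev *s, enum toy_sdev_state to)
{
	if (to == TOY_SDEV_RUNNING &&
	    (s->state == TOY_SDEV_QUIESCE || s->state == TOY_SDEV_OFFLINE)) {
		s->state = to;
		return 0;
	}
	return -EINVAL;
}

/* Shape of the old code: attempt the transition unconditionally. */
void toy_resume_old(struct toy_sdev *s)
{
	if (toy_set_state(s, TOY_SDEV_RUNNING))
		return;
	s->queue_runs++;
}

/* Shape of the new code: only a device still in QUIESCE is resumed;
 * any other state is left to whoever changed it during suspend. */
void toy_resume_new(struct toy_sdev *s)
{
	if (s->state != TOY_SDEV_QUIESCE ||
	    toy_set_state(s, TOY_SDEV_RUNNING))
		return;
	s->queue_runs++;
}
```

With this toy table a DEL device is left alone by both shapes, because DEL to RUNNING is rejected anyway; the difference shows up for a state whose transition would succeed, such as OFFLINE here, which the old shape would flip back to RUNNING.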
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index bd17cf8..19e27d8 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -65,6 +65,7 @@
 #include
 
 #include "sd.h"
+#include "scsi_priv.h"
 #include "scsi_logging.h"
 
 MODULE_AUTHOR("Eric Youngdale");
@@ -2717,7 +2718,7 @@ static int sd_probe(struct device *dev)
 	dev_set_drvdata(dev, sdkp);
 
 	get_device(&sdkp->dev);	/* prevent release before async_schedule */
-	async_schedule(sd_probe_async, sdkp);
+	async_schedule_domain(sd_probe_async, sdkp, &scsi_sd_probe_domain);
 
 	return 0;
 
@@ -2751,7 +2752,7 @@ static int sd_remove(struct device *dev)
 	sdkp = dev_get_drvdata(dev);
 	scsi_autopm_get_device(sdkp->device);
 
-	async_synchronize_full();
+	async_synchronize_full_domain(&scsi_sd_probe_domain);
 	blk_queue_prep_rq(sdkp->device->request_queue, scsi_prep_fn);
 	blk_queue_unprep_rq(sdkp->device->request_queue, NULL);
 	device_del(&sdkp->dev);
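The core of the fix is that sd_remove() and scsi_bus_prepare() now wait only on sd's own probe work instead of on every pending async entry, including an async_resume() entry that is itself waiting for the device_lock() held by the caller. The following deterministic, single-threaded toy model illustrates that difference. It is a sketch under stated assumptions, not the kernel's async API: the toy_* names, the fixed-size queue, and the needs_device_lock flag are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an async domain (the patch uses a list_head). */
struct toy_domain {
	const char *name;
};

struct toy_entry {
	struct toy_domain *domain;
	bool needs_device_lock;	/* e.g. async_resume() -> device_lock() */
	bool done;
};

#define TOY_MAX_ENTRIES 8

static struct toy_entry toy_pending[TOY_MAX_ENTRIES];
static size_t toy_npending;
static bool toy_device_lock_held;

struct toy_domain toy_global_domain = { "global" };
struct toy_domain toy_sd_probe_domain = { "scsi_sd_probe_domain" };

/* Queue an entry on domain @d (loosely mirrors async_schedule_domain()). */
void toy_schedule(struct toy_domain *d, bool needs_device_lock)
{
	assert(toy_npending < TOY_MAX_ENTRIES);
	toy_pending[toy_npending++] = (struct toy_entry){
		.domain = d,
		.needs_device_lock = needs_device_lock,
		.done = false,
	};
}

/* Complete the entries in @d, or every entry when @d is NULL (loosely
 * mirrors async_synchronize_full()). Returns false when an entry can never
 * finish because it needs the device lock the caller already holds. */
bool toy_synchronize(struct toy_domain *d)
{
	for (size_t i = 0; i < toy_npending; i++) {
		struct toy_entry *e = &toy_pending[i];

		if ((d && e->domain != d) || e->done)
			continue;
		if (e->needs_device_lock && toy_device_lock_held)
			return false;	/* would block forever: the hang */
		e->done = true;
	}
	return true;
}

/* Model sd_remove() running under device_lock() while a resume entry is
 * pending; @scoped selects domain-scoped vs. full synchronization. */
bool toy_sd_remove(bool scoped)
{
	toy_npending = 0;
	toy_device_lock_held = true;
	toy_schedule(&toy_sd_probe_domain, false);	/* sd_probe_async */
	toy_schedule(&toy_global_domain, true);		/* async_resume */
	return toy_synchronize(scoped ? &toy_sd_probe_domain : NULL);
}
```

Calling toy_sd_remove(false) models the old async_synchronize_full() and reports that it cannot finish; toy_sd_remove(true) models async_synchronize_full_domain(&scsi_sd_probe_domain) and completes, because the pending resume entry lives in a different domain.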