diff mbox

[2/6] cxlflash: Remove the device cleanly in the system shutdown path

Message ID 1472848756-64998-1-git-send-email-ukrishn@linux.vnet.ibm.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Uma Krishnan Sept. 2, 2016, 8:39 p.m. UTC
Commit 704c4b0ddc03 ("cxlflash: Shutdown notify support for CXL Flash
cards") was recently introduced to notify the AFU when a system is going
down. Due to the position of the cxlflash driver in the device stack,
cxlflash devices are _always_ removed during a reboot/shutdown. This can
lead to a crash if the cxlflash shutdown hook is invoked _after_ the
shutdown hook for the owning virtual PHB. Furthermore, the current
implementation of shutdown/remove hooks for cxlflash are not tolerant to
being invoked when the device is not enabled. This can also lead to a
crash in situations where the remove hook is invoked after the device has
been removed via the vPHBs shutdown hook. An example of this scenario
would be an EEH reset failure while a reboot/shutdown is in progress.

To solve both problems, the shutdown hook for cxlflash is updated to
simply remove the device. This path already includes the AFU notification
and thus this solution will continue to perform the original intent. At
the same time, the remove hook is updated to protect against being
called when the device is not enabled.

Fixes: 704c4b0ddc03 ("cxlflash: Shutdown notify support for CXL Flash
cards")
Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
---
 drivers/scsi/cxlflash/main.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

Comments

Andrew Donnellan Sept. 5, 2016, 7:12 a.m. UTC | #1
On 03/09/16 06:39, Uma Krishnan wrote:
> Commit 704c4b0ddc03 ("cxlflash: Shutdown notify support for CXL Flash
> cards") was recently introduced to notify the AFU when a system is going
> down. Due to the position of the cxlflash driver in the device stack,
> cxlflash devices are _always_ removed during a reboot/shutdown. This can
> lead to a crash if the cxlflash shutdown hook is invoked _after_ the
> shutdown hook for the owning virtual PHB. Furthermore, the current
> implementation of shutdown/remove hooks for cxlflash are not tolerant to
> being invoked when the device is not enabled. This can also lead to a
> crash in situations where the remove hook is invoked after the device has
> been removed via the vPHBs shutdown hook. An example of this scenario
> would be an EEH reset failure while a reboot/shutdown is in progress.
>
> To solve both problems, the shutdown hook for cxlflash is updated to
> simply remove the device. This path already includes the AFU notification
> and thus this solution will continue to perform the original intent. At
> the same time, the remove hook is updated to protect against being
> called when the device is not enabled.
>
> Fixes: 704c4b0ddc03 ("cxlflash: Shutdown notify support for CXL Flash
> cards")
> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
> ---
>  drivers/scsi/cxlflash/main.c | 18 ++++++------------
>  1 file changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
> index b063c41..4c2559a 100644
> --- a/drivers/scsi/cxlflash/main.c
> +++ b/drivers/scsi/cxlflash/main.c
> @@ -823,17 +823,6 @@ static void notify_shutdown(struct cxlflash_cfg *cfg, bool wait)
>  }
>
>  /**
> - * cxlflash_shutdown() - shutdown handler
> - * @pdev:	PCI device associated with the host.
> - */
> -static void cxlflash_shutdown(struct pci_dev *pdev)
> -{
> -	struct cxlflash_cfg *cfg = pci_get_drvdata(pdev);
> -
> -	notify_shutdown(cfg, false);

You can get rid of the second parameter to notify_shutdown() now.

> -}
> -
> -/**
>   * cxlflash_remove() - PCI entry point to tear down host
>   * @pdev:	PCI device associated with the host.
>   *
> @@ -844,6 +833,11 @@ static void cxlflash_remove(struct pci_dev *pdev)
>  	struct cxlflash_cfg *cfg = pci_get_drvdata(pdev);
>  	ulong lock_flags;
>
> +	if (!pci_is_enabled(pdev)) {
> +		pr_debug("%s: Device is disabled\n", __func__);
> +		return;
> +	}
> +
>  	/* If a Task Management Function is active, wait for it to complete
>  	 * before continuing with remove.
>  	 */
> @@ -2685,7 +2679,7 @@ static struct pci_driver cxlflash_driver = {
>  	.id_table = cxlflash_pci_table,
>  	.probe = cxlflash_probe,
>  	.remove = cxlflash_remove,
> -	.shutdown = cxlflash_shutdown,
> +	.shutdown = cxlflash_remove,

What's the justification for using cxlflash_remove() as the shutdown 
hook, rather than just not having a shutdown hook at all?
Uma Krishnan Sept. 6, 2016, 8:06 p.m. UTC | #2
On 9/5/2016 2:12 AM, Andrew Donnellan wrote:
> On 03/09/16 06:39, Uma Krishnan wrote:
>> Commit 704c4b0ddc03 ("cxlflash: Shutdown notify support for CXL Flash
>> cards") was recently introduced to notify the AFU when a system is going
>> down. Due to the position of the cxlflash driver in the device stack,
>> cxlflash devices are _always_ removed during a reboot/shutdown. This can
>> lead to a crash if the cxlflash shutdown hook is invoked _after_ the
>> shutdown hook for the owning virtual PHB. Furthermore, the current
>> implementation of shutdown/remove hooks for cxlflash are not tolerant to
>> being invoked when the device is not enabled. This can also lead to a
>> crash in situations where the remove hook is invoked after the device has
>> been removed via the vPHBs shutdown hook. An example of this scenario
>> would be an EEH reset failure while a reboot/shutdown is in progress.
>>
>> To solve both problems, the shutdown hook for cxlflash is updated to
>> simply remove the device. This path already includes the AFU notification
>> and thus this solution will continue to perform the original intent. At
>> the same time, the remove hook is updated to protect against being
>> called when the device is not enabled.
>>
>> Fixes: 704c4b0ddc03 ("cxlflash: Shutdown notify support for CXL Flash
>> cards")
>> Signed-off-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com>
>> ---
>>  drivers/scsi/cxlflash/main.c | 18 ++++++------------
>>  1 file changed, 6 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
>> index b063c41..4c2559a 100644
>> --- a/drivers/scsi/cxlflash/main.c
>> +++ b/drivers/scsi/cxlflash/main.c
>> @@ -823,17 +823,6 @@ static void notify_shutdown(struct cxlflash_cfg
>> *cfg, bool wait)
>>  }
>>
>>  /**
>> - * cxlflash_shutdown() - shutdown handler
>> - * @pdev:    PCI device associated with the host.
>> - */
>> -static void cxlflash_shutdown(struct pci_dev *pdev)
>> -{
>> -    struct cxlflash_cfg *cfg = pci_get_drvdata(pdev);
>> -
>> -    notify_shutdown(cfg, false);
>
> You can get rid of the second parameter to notify_shutdown() now.
>

Valid comment. This was considered while working the patch. I left it
for now, in case we need this flag in future. If we end up not having
to use it, I will clean it up in a future patch.

>> -}
>> -
>> -/**
>>   * cxlflash_remove() - PCI entry point to tear down host
>>   * @pdev:    PCI device associated with the host.
>>   *
>> @@ -844,6 +833,11 @@ static void cxlflash_remove(struct pci_dev *pdev)
>>      struct cxlflash_cfg *cfg = pci_get_drvdata(pdev);
>>      ulong lock_flags;
>>
>> +    if (!pci_is_enabled(pdev)) {
>> +        pr_debug("%s: Device is disabled\n", __func__);
>> +        return;
>> +    }
>> +
>>      /* If a Task Management Function is active, wait for it to complete
>>       * before continuing with remove.
>>       */
>> @@ -2685,7 +2679,7 @@ static struct pci_driver cxlflash_driver = {
>>      .id_table = cxlflash_pci_table,
>>      .probe = cxlflash_probe,
>>      .remove = cxlflash_remove,
>> -    .shutdown = cxlflash_shutdown,
>> +    .shutdown = cxlflash_remove,
>
> What's the justification for using cxlflash_remove() as the shutdown
> hook, rather than just not having a shutdown hook at all?
>

Even though cxlflash gets cleaned up via cxl_remove() shutdown
hook, today it is a dependency on the code external to our driver.
To protect us from future API changes that could possibly impact
cxlflash, we want to maintain a clean and reliable way to get
called in the shutdown path.
Matthew R. Ochs Sept. 7, 2016, 11:46 p.m. UTC | #3
> On Sep 2, 2016, at 3:39 PM, Uma Krishnan <ukrishn@linux.vnet.ibm.com> wrote:
> 
> Commit 704c4b0ddc03 ("cxlflash: Shutdown notify support for CXL Flash
> cards") was recently introduced to notify the AFU when a system is going
> down. Due to the position of the cxlflash driver in the device stack,
> cxlflash devices are _always_ removed during a reboot/shutdown. This can
> lead to a crash if the cxlflash shutdown hook is invoked _after_ the
> shutdown hook for the owning virtual PHB. Furthermore, the current
> implementation of shutdown/remove hooks for cxlflash are not tolerant to
> being invoked when the device is not enabled. This can also lead to a
> crash in situations where the remove hook is invoked after the device has
> been removed via the vPHBs shutdown hook. An example of this scenario
> would be an EEH reset failure while a reboot/shutdown is in progress.
> 
> To solve both problems, the shutdown hook for cxlflash is updated to
> simply remove the device. This path already includes the AFU notification
> and thus this solution will continue to perform the original intent. At
> the same time, the remove hook is updated to protect against being
> called when the device is not enabled.
> 
> Fixes: 704c4b0ddc03 ("cxlflash: Shutdown notify support for CXL Flash
> cards")
> Signed-off-by: Uma Krishna <ukrishn@linux.vnet.ibm.com>

Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
diff mbox

Patch

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index b063c41..4c2559a 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -823,17 +823,6 @@  static void notify_shutdown(struct cxlflash_cfg *cfg, bool wait)
 }
 
 /**
- * cxlflash_shutdown() - shutdown handler
- * @pdev:	PCI device associated with the host.
- */
-static void cxlflash_shutdown(struct pci_dev *pdev)
-{
-	struct cxlflash_cfg *cfg = pci_get_drvdata(pdev);
-
-	notify_shutdown(cfg, false);
-}
-
-/**
  * cxlflash_remove() - PCI entry point to tear down host
  * @pdev:	PCI device associated with the host.
  *
@@ -844,6 +833,11 @@  static void cxlflash_remove(struct pci_dev *pdev)
 	struct cxlflash_cfg *cfg = pci_get_drvdata(pdev);
 	ulong lock_flags;
 
+	if (!pci_is_enabled(pdev)) {
+		pr_debug("%s: Device is disabled\n", __func__);
+		return;
+	}
+
 	/* If a Task Management Function is active, wait for it to complete
 	 * before continuing with remove.
 	 */
@@ -2685,7 +2679,7 @@  static struct pci_driver cxlflash_driver = {
 	.id_table = cxlflash_pci_table,
 	.probe = cxlflash_probe,
 	.remove = cxlflash_remove,
-	.shutdown = cxlflash_shutdown,
+	.shutdown = cxlflash_remove,
 	.err_handler = &cxlflash_err_handler,
 };