diff mbox series

[SRU,Bionic] scsi: libiscsi: Allow sd_shutdown on bad transport

Message ID 20180119203539.17885-1-rafael.tinoco@canonical.com
State New
Headers show
Series [SRU,Bionic] scsi: libiscsi: Allow sd_shutdown on bad transport | expand

Commit Message

Rafael David Tinoco Jan. 19, 2018, 8:35 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1569925

If, for any reason, userland shuts down iscsi transport interfaces
before proper logouts - like when logging in to LUNs manually, without
logging out on server shutdown, or when automated scripts can't
umount/logout from logged LUNs - kernel will hang forever on its
sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all
still existent paths.

PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow"
 #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
 #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
 #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
 #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
 #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
 #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
 #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
 #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
 #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
 #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c

This happens because iscsi_eh_cmd_timed_out(), the transport layer
timeout helper, would tell the queue timeout function (scsi_times_out)
to reset the request timer over and over, until the session state is
back to logged in state. Unfortunately, during server shutdown, this
might never happen again.

Other option would be "not to handle" the issue in the transport
layer. That would trigger the error handler logic, which would also need
the session state to be logged in again.

Best option, for such case, is to tell upper layers that the command was
handled during the transport layer error handler helper, marking it as
DID_NO_CONNECT, which will allow completion and inform about the
problem.

After the session was marked as ISCSI_STATE_FAILED, due to the first
timeout during the server shutdown phase, all subsequent cmds will fail
to be queued, allowing upper logic to fail faster.

Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com>
(cherry-picked from commit d754941225a7dbc61f6dd2173fa9498049f9a7ee next-20180117)
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/scsi/libiscsi.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

Comments

Kleber Sacilotto de Souza Jan. 23, 2018, 3:27 p.m. UTC | #1
On 01/19/18 21:35, Rafael David Tinoco wrote:
> BugLink: https://bugs.launchpad.net/bugs/1569925
> 
> If, for any reason, userland shuts down iscsi transport interfaces
> before proper logouts - like when logging in to LUNs manually, without
> logging out on server shutdown, or when automated scripts can't
> umount/logout from logged LUNs - kernel will hang forever on its
> sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all
> still existent paths.
> 
> PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow"
>  #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
>  #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
>  #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
>  #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
>  #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
>  #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
>  #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
>  #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
>  #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
>  #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c
> 
> This happens because iscsi_eh_cmd_timed_out(), the transport layer
> timeout helper, would tell the queue timeout function (scsi_times_out)
> to reset the request timer over and over, until the session state is
> back to logged in state. Unfortunately, during server shutdown, this
> might never happen again.
> 
> Other option would be "not to handle" the issue in the transport
> layer. That would trigger the error handler logic, which would also need
> the session state to be logged in again.
> 
> Best option, for such case, is to tell upper layers that the command was
> handled during the transport layer error handler helper, marking it as
> DID_NO_CONNECT, which will allow completion and inform about the
> problem.
> 
> After the session was marked as ISCSI_STATE_FAILED, due to the first
> timeout during the server shutdown phase, all subsequent cmds will fail
> to be queued, allowing upper logic to fail faster.
> 
> Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com>
> (cherry-picked from commit d754941225a7dbc61f6dd2173fa9498049f9a7ee next-20180117)

Hi Rafael,

If the same patch can be applied cleanly for more than one series
there's no need to send a patch for each one of them. You can send a
single email just adding to the subject the prefixes for the series that
it applies to.

In this case it could be:

[PATCH][SRU][Bionic][Artful][Xenial][Trusty] scsi: libiscsi: ...

You can also use only the first letter of the series to have an
abbreviated form, e.g.:

[PATCH][SRU][B/A/X/T] scsi: libiscsi: ...


Thanks,
Kleber

> Reviewed-by: Lee Duncan <lduncan@suse.com>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
>  drivers/scsi/libiscsi.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 42381adf0769..f11c16500ff1 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1696,6 +1696,15 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc)
>  		 */
>  		switch (session->state) {
>  		case ISCSI_STATE_FAILED:
> +			/*
> +			 * cmds should fail during shutdown, if the session
> +			 * state is bad, allowing completion to happen
> +			 */
> +			if (unlikely(system_state != SYSTEM_RUNNING)) {
> +				reason = FAILURE_SESSION_FAILED;
> +				sc->result = DID_NO_CONNECT << 16;
> +				break;
> +			}
>  		case ISCSI_STATE_IN_RECOVERY:
>  			reason = FAILURE_SESSION_IN_RECOVERY;
>  			sc->result = DID_IMM_RETRY << 16;
> @@ -1980,6 +1989,19 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
>  	}
>  
>  	if (session->state != ISCSI_STATE_LOGGED_IN) {
> +		/*
> +		 * During shutdown, if session is prematurely disconnected,
> +		 * recovery won't happen and there will be hung cmds. Not
> +		 * handling cmds would trigger EH, also bad in this case.
> +		 * Instead, handle cmd, allow completion to happen and let
> +		 * upper layer to deal with the result.
> +		 */
> +		if (unlikely(system_state != SYSTEM_RUNNING)) {
> +			sc->result = DID_NO_CONNECT << 16;
> +			ISCSI_DBG_EH(session, "sc on shutdown, handled\n");
> +			rc = BLK_EH_HANDLED;
> +			goto done;
> +		}
>  		/*
>  		 * We are probably in the middle of iscsi recovery so let
>  		 * that complete and handle the error.
> @@ -2084,7 +2106,7 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
>  		task->last_timeout = jiffies;
>  	spin_unlock(&session->frwd_lock);
>  	ISCSI_DBG_EH(session, "return %s\n", rc == BLK_EH_RESET_TIMER ?
> -		     "timer reset" : "nh");
> +		     "timer reset" : "shutdown or nh");
>  	return rc;
>  }
>  EXPORT_SYMBOL_GPL(iscsi_eh_cmd_timed_out);
>
Rafael David Tinoco Jan. 23, 2018, 3:33 p.m. UTC | #2
Hi Kleber

> Hi Rafael,
> 
> If the same patch can be applied cleanly for more than one series
> there's no need to send a patch for each one of them. You can send a
> single email just adding to the subject the prefixes for the series that
> it applies to.

I was in doubt of that. I confess the public documentation:

https://wiki.ubuntu.com/Kernel/Dev/StablePatchFormat <https://wiki.ubuntu.com/Kernel/Dev/StablePatchFormat>

Is a bit outdated, and I looked for such information there.

> 
> In this case it could be:
> 
> [PATCH][SRU][Bionic][Artful][Xenial][Trusty] scsi: libiscsi: ...

Good to know. Will use in the future. For this case, please note that the Trusty patch wasn't clean cherry-pick, like others, but didn't differ in anything meaningful. 

> 
> You can also use only the first letter of the series to have an
> abbreviated form, e.g.:
> 
> [PATCH][SRU][B/A/X/T] scsi: libiscsi: ...

Yep, will use it. Sorry for the overhead this time.

> Thanks,
> Kleber

No, Tku!
<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div>Hi Kleber</div><div><br class=""><blockquote type="cite" class=""><div class=""><span style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">Hi Rafael,</span><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">If the same patch can be applied cleanly for more than one series</span><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">there's no need to send a patch for each one of them. You can send a</span><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">single email just adding to the subject the prefixes for the series that</span><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">it applies to.</span><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""></div></blockquote><div><br class=""></div><div>I was in doubt of that. I confess the public documentation:</div><div><br class=""></div><div><a href="https://wiki.ubuntu.com/Kernel/Dev/StablePatchFormat" class="">https://wiki.ubuntu.com/Kernel/Dev/StablePatchFormat</a></div><div><br class=""></div><div>Is a bit outdated, and I looked for such information there.</div><br class=""><blockquote type="cite" class=""><div class=""><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">In this case it could be:</span><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">[PATCH][SRU][Bionic][Artful][Xenial][Trusty] scsi: libiscsi: ...</span><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""></div></blockquote><div><br class=""></div><div>Good to know. Will use in the future. For this case, please note that the Trusty patch wasn't clean cherry-pick, like others, but didn't differ in anything meaningful.&nbsp;</div><br class=""><blockquote type="cite" class=""><div class=""><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">You can also use only the first letter of the series to have an</span><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">abbreviated form, e.g.:</span><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">[PATCH][SRU][B/A/X/T] scsi: libiscsi: ...</span><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""></div></blockquote><div><br class=""></div><div>Yep, will use it. Sorry for the overhead this time.</div><br class=""><blockquote type="cite" class=""><div class=""><span style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">Thanks,</span><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">Kleber</span><br style="font-family: LucidaConsole; font-size: 13px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""></div></blockquote></div><br class=""><div class="">No, Tku!</div></body></html>
Kleber Sacilotto de Souza Jan. 23, 2018, 4 p.m. UTC | #3
On 01/23/18 16:33, Rafael David Tinoco wrote:
> Hi Kleber
> 
>> Hi Rafael,
>>
>> If the same patch can be applied cleanly for more than one series
>> there's no need to send a patch for each one of them. You can send a
>> single email just adding to the subject the prefixes for the series that
>> it applies to.
> 
> I was in doubt of that. I confess the public documentation:
> 
> https://wiki.ubuntu.com/Kernel/Dev/StablePatchFormat
> 
> Is a bit outdated, and I looked for such information there.

It's indeed a bit outdated, we should spend some time in updating it to
our current procedures.

> 
>>
>> In this case it could be:
>>
>> [PATCH][SRU][Bionic][Artful][Xenial][Trusty] scsi: libiscsi: ...
> 
> Good to know. Will use in the future. For this case, please note that
> the Trusty patch wasn't clean cherry-pick, like others, but didn't
> differ in anything meaningful.

When it doesn't apply cleanly please mark it as (backported from ...),
even if it's only for small things such as context adjustment. The rule
is that 'cherry picked from ...' should be used only if the patch can be
applied with git am.

Also, to help making the review a bit easier for us, the patches can be
sent in a thread with a cover letter containing the SRU justification.
 
> 
>>
>> You can also use only the first letter of the series to have an
>> abbreviated form, e.g.:
>>
>> [PATCH][SRU][B/A/X/T] scsi: libiscsi: ...
> 
> Yep, will use it. Sorry for the overhead this time.
> 
>> Thanks,
>> Kleber
> 
> No, Tku!
Rafael David Tinoco Jan. 23, 2018, 4:05 p.m. UTC | #4
> On 23/01/2018, at 02:00 PM, Kleber Souza <kleber.souza@canonical.com> wrote:
> 
>>> In this case it could be:
>>> 
>>> [PATCH][SRU][Bionic][Artful][Xenial][Trusty] scsi: libiscsi: ...
>> 
>> Good to know. Will use in the future. For this case, please note that
>> the Trusty patch wasn't clean cherry-pick, like others, but didn't
>> differ in anything meaningful.
> 
> When it doesn't apply cleanly please mark it as (backported from ...),
> even if it's only for small things such as context adjustment. The rule
> is that 'cherry picked from ...' should be used only if the patch can be
> applied with git am.
> 
> Also, to help making the review a bit easier for us, the patches can be
> sent in a thread with a cover letter containing the SRU justification.

Makes perfect sense. Should I re-do it ? What do you prefer ?

Tku
Kleber Sacilotto de Souza Jan. 23, 2018, 4:40 p.m. UTC | #5
On 01/23/18 17:05, Rafael David Tinoco wrote:
> 
>> On 23/01/2018, at 02:00 PM, Kleber Souza <kleber.souza@canonical.com> wrote:
>>
>>>> In this case it could be:
>>>>
>>>> [PATCH][SRU][Bionic][Artful][Xenial][Trusty] scsi: libiscsi: ...
>>>
>>> Good to know. Will use in the future. For this case, please note that
>>> the Trusty patch wasn't clean cherry-pick, like others, but didn't
>>> differ in anything meaningful.
>>
>> When it doesn't apply cleanly please mark it as (backported from ...),
>> even if it's only for small things such as context adjustment. The rule
>> is that 'cherry picked from ...' should be used only if the patch can be
>> applied with git am.
>>
>> Also, to help making the review a bit easier for us, the patches can be
>> sent in a thread with a cover letter containing the SRU justification.
> 
> Makes perfect sense. Should I re-do it ? What do you prefer ?
> 
> Tku
> 

Could you please re-send the patches?

Another note [sorry for so many small rules :-)]: we try to keep the
original provenance block of the original patch intact and add the
additional information below it. So in this case that you are also the
original author of the patch, that would require another SOB line and
the backport/cherry-pick information, so it would be something like this:

Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com>
Reviewed-by: [...]
Signed-off-by: [...]
(cherry-picked from commit d754941225a7dbc61f6dd2173fa9498049f9a7ee
next-20180117)
Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com>


thanks!
Kleber
Seth Forshee Jan. 23, 2018, 4:43 p.m. UTC | #6
On Fri, Jan 19, 2018 at 08:35:39PM +0000, Rafael David Tinoco wrote:
> BugLink: https://bugs.launchpad.net/bugs/1569925
> 
> If, for any reason, userland shuts down iscsi transport interfaces
> before proper logouts - like when logging in to LUNs manually, without
> logging out on server shutdown, or when automated scripts can't
> umount/logout from logged LUNs - kernel will hang forever on its
> sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all
> still existent paths.
> 
> PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow"
>  #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
>  #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
>  #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
>  #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
>  #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
>  #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
>  #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
>  #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
>  #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
>  #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c
> 
> This happens because iscsi_eh_cmd_timed_out(), the transport layer
> timeout helper, would tell the queue timeout function (scsi_times_out)
> to reset the request timer over and over, until the session state is
> back to logged in state. Unfortunately, during server shutdown, this
> might never happen again.
> 
> Other option would be "not to handle" the issue in the transport
> layer. That would trigger the error handler logic, which would also need
> the session state to be logged in again.
> 
> Best option, for such case, is to tell upper layers that the command was
> handled during the transport layer error handler helper, marking it as
> DID_NO_CONNECT, which will allow completion and inform about the
> problem.
> 
> After the session was marked as ISCSI_STATE_FAILED, due to the first
> timeout during the server shutdown phase, all subsequent cmds will fail
> to be queued, allowing upper logic to fail faster.
> 
> Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com>
> (cherry-picked from commit d754941225a7dbc61f6dd2173fa9498049f9a7ee next-20180117)
> Reviewed-by: Lee Duncan <lduncan@suse.com>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Applied to bionic/master-next and unstable/master, thanks!
diff mbox series

Patch

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 42381adf0769..f11c16500ff1 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1696,6 +1696,15 @@  int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc)
 		 */
 		switch (session->state) {
 		case ISCSI_STATE_FAILED:
+			/*
+			 * cmds should fail during shutdown, if the session
+			 * state is bad, allowing completion to happen
+			 */
+			if (unlikely(system_state != SYSTEM_RUNNING)) {
+				reason = FAILURE_SESSION_FAILED;
+				sc->result = DID_NO_CONNECT << 16;
+				break;
+			}
 		case ISCSI_STATE_IN_RECOVERY:
 			reason = FAILURE_SESSION_IN_RECOVERY;
 			sc->result = DID_IMM_RETRY << 16;
@@ -1980,6 +1989,19 @@  enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
 	}
 
 	if (session->state != ISCSI_STATE_LOGGED_IN) {
+		/*
+		 * During shutdown, if session is prematurely disconnected,
+		 * recovery won't happen and there will be hung cmds. Not
+		 * handling cmds would trigger EH, also bad in this case.
+		 * Instead, handle cmd, allow completion to happen and let
+		 * upper layer to deal with the result.
+		 */
+		if (unlikely(system_state != SYSTEM_RUNNING)) {
+			sc->result = DID_NO_CONNECT << 16;
+			ISCSI_DBG_EH(session, "sc on shutdown, handled\n");
+			rc = BLK_EH_HANDLED;
+			goto done;
+		}
 		/*
 		 * We are probably in the middle of iscsi recovery so let
 		 * that complete and handle the error.
@@ -2084,7 +2106,7 @@  enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
 		task->last_timeout = jiffies;
 	spin_unlock(&session->frwd_lock);
 	ISCSI_DBG_EH(session, "return %s\n", rc == BLK_EH_RESET_TIMER ?
-		     "timer reset" : "nh");
+		     "timer reset" : "shutdown or nh");
 	return rc;
 }
 EXPORT_SYMBOL_GPL(iscsi_eh_cmd_timed_out);