Message ID | 20180119203544.17983-1-rafael.tinoco@canonical.com |
---|---|
State | New |
Headers | show |
Series | [SRU,Xenial] scsi: libiscsi: Allow sd_shutdown on bad transport | expand |
On 01/19/18 21:35, Rafael David Tinoco wrote: > BugLink: https://bugs.launchpad.net/bugs/1569925 > > If, for any reason, userland shuts down iscsi transport interfaces > before proper logouts - like when logging in to LUNs manually, without > logging out on server shutdown, or when automated scripts can't > umount/logout from logged LUNs - kernel will hang forever on its > sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all > still existent paths. > > PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow" > #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee > #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5 > #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199 > #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604 > #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c > #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10 > #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7 > #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe > #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7 > #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c > > This happens because iscsi_eh_cmd_timed_out(), the transport layer > timeout helper, would tell the queue timeout function (scsi_times_out) > to reset the request timer over and over, until the session state is > back to logged in state. Unfortunately, during server shutdown, this > might never happen again. > > Other option would be "not to handle" the issue in the transport > layer. That would trigger the error handler logic, which would also need > the session state to be logged in again. > > Best option, for such case, is to tell upper layers that the command was > handled during the transport layer error handler helper, marking it as > DID_NO_CONNECT, which will allow completion and inform about the > problem. > > After the session was marked as ISCSI_STATE_FAILED, due to the first > timeout during the server shutdown phase, all subsequent cmds will fail > to be queued, allowing upper logic to fail faster. > > Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com> > (cherry-picked from commit d754941225a7dbc61f6dd2173fa9498049f9a7ee next-20180117) > Reviewed-by: Lee Duncan <lduncan@suse.com> > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> > --- > drivers/scsi/libiscsi.c | 24 +++++++++++++++++++++++- > 1 file changed, 23 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index 42381adf0769..f11c16500ff1 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -1696,6 +1696,15 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc) > */ > switch (session->state) { > case ISCSI_STATE_FAILED: > + /* > + * cmds should fail during shutdown, if the session > + * state is bad, allowing completion to happen > + */ > + if (unlikely(system_state != SYSTEM_RUNNING)) { > + reason = FAILURE_SESSION_FAILED; > + sc->result = DID_NO_CONNECT << 16; > + break; > + } > case ISCSI_STATE_IN_RECOVERY: > reason = FAILURE_SESSION_IN_RECOVERY; > sc->result = DID_IMM_RETRY << 16; > @@ -1980,6 +1989,19 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc) > } > > if (session->state != ISCSI_STATE_LOGGED_IN) { > + /* > + * During shutdown, if session is prematurely disconnected, > + * recovery won't happen and there will be hung cmds. Not > + * handling cmds would trigger EH, also bad in this case. > + * Instead, handle cmd, allow completion to happen and let > + * upper layer to deal with the result. > + */ > + if (unlikely(system_state != SYSTEM_RUNNING)) { > + sc->result = DID_NO_CONNECT << 16; > + ISCSI_DBG_EH(session, "sc on shutdown, handled\n"); > + rc = BLK_EH_HANDLED; > + goto done; > + } > /* > * We are probably in the middle of iscsi recovery so let > * that complete and handle the error. > @@ -2084,7 +2106,7 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc) > task->last_timeout = jiffies; > spin_unlock(&session->frwd_lock); > ISCSI_DBG_EH(session, "return %s\n", rc == BLK_EH_RESET_TIMER ? > - "timer reset" : "nh"); > + "timer reset" : "shutdown or nh"); > return rc; > } > EXPORT_SYMBOL_GPL(iscsi_eh_cmd_timed_out); > This patch has been resent and can be ignored.
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 42381adf0769..f11c16500ff1 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1696,6 +1696,15 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc) */ switch (session->state) { case ISCSI_STATE_FAILED: + /* + * cmds should fail during shutdown, if the session + * state is bad, allowing completion to happen + */ + if (unlikely(system_state != SYSTEM_RUNNING)) { + reason = FAILURE_SESSION_FAILED; + sc->result = DID_NO_CONNECT << 16; + break; + } case ISCSI_STATE_IN_RECOVERY: reason = FAILURE_SESSION_IN_RECOVERY; sc->result = DID_IMM_RETRY << 16; @@ -1980,6 +1989,19 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc) } if (session->state != ISCSI_STATE_LOGGED_IN) { + /* + * During shutdown, if session is prematurely disconnected, + * recovery won't happen and there will be hung cmds. Not + * handling cmds would trigger EH, also bad in this case. + * Instead, handle cmd, allow completion to happen and let + * upper layer to deal with the result. + */ + if (unlikely(system_state != SYSTEM_RUNNING)) { + sc->result = DID_NO_CONNECT << 16; + ISCSI_DBG_EH(session, "sc on shutdown, handled\n"); + rc = BLK_EH_HANDLED; + goto done; + } /* * We are probably in the middle of iscsi recovery so let * that complete and handle the error. @@ -2084,7 +2106,7 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc) task->last_timeout = jiffies; spin_unlock(&session->frwd_lock); ISCSI_DBG_EH(session, "return %s\n", rc == BLK_EH_RESET_TIMER ? - "timer reset" : "nh"); + "timer reset" : "shutdown or nh"); return rc; } EXPORT_SYMBOL_GPL(iscsi_eh_cmd_timed_out);