diff mbox series

[v2] ata: libata-core: do not issue non-internal commands once EH is pending

Message ID 20221107161036.670237-1-niklas.cassel@wdc.com
State New
Headers show
Series [v2] ata: libata-core: do not issue non-internal commands once EH is pending | expand

Commit Message

Niklas Cassel Nov. 7, 2022, 4:10 p.m. UTC
scsi_queue_rq() will check if scsi_host_in_recovery() (state is
SHOST_RECOVERY), and if so, it will _not_ issue a command via:
scsi_dispatch_cmd() -> host->hostt->queuecommand() (ata_scsi_queuecmd())
-> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()

Before commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
when receiving a TFES error IRQ, this was the call chain:
ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)

Which meant that as soon as the error IRQ was serviced, no additional
commands sent from the block layer (scsi_queue_rq()) would be sent down
to the device. (ATA internal commands originating for ATA EH could of
course still be sent.)

However, after commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
scsi_times_out() would instead result in a call to scsi_abort_command() ->
queue_delayed_work(). work function: scmd_eh_abort_handler() would call
scsi_eh_scmd_add(), which calls scsi_host_set_state(shost, SHOST_RECOVERY).

(It was possible to get the old behavior if host->hostt->no_async_abort
was set, but libata never used it, and this option was completely removed
in commit a06586325f37 ("scsi: make asynchronous aborts mandatory"))

Additionally, later in commit 358f70da49d7 ("blk-mq: make
blk_abort_request() trigger timeout path"), blk_abort_request() was changed
to also call the abort callback asynchronously.

So now, after the TFES error irq has been serviced, we need to wait for
two different workqueues to run their work, before the SHOST_RECOVERY
state gets set.

While the ATA specification states that a device should return command
aborted for all commands queued after the device has entered error state,
since ATA only keeps the sense data for the latest command (in non-NCQ
case), we really don't want to send block layer commands to the device
after it has entered error state. (Only ATA EH commands should be sent,
to read the sense data etc.)

While we could just call scsi_host_set_state(shost, SHOST_RECOVERY) from
ata_qc_schedule_eh() directly, that might be a layering violation.
So instead of doing that, add an additional check against the libata's own
EH flag(s) before calling the qc_defer callback.

Fixes: e494f6a72839 ("[SCSI] improved eh timeout handler")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
Changes since v1:
-Implemented review comments from Damien.

 drivers/ata/libata-scsi.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Comments

John Garry Nov. 7, 2022, 5:01 p.m. UTC | #1
On 07/11/2022 16:10, Niklas Cassel wrote:
> scsi_queue_rq() will check if scsi_host_in_recovery() (state is
> SHOST_RECOVERY), and if so, it will _not_ issue a command via:
> scsi_dispatch_cmd() -> host->hostt->queuecommand() (ata_scsi_queuecmd())
> -> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()
> 
> Before commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
> when receiving a TFES error IRQ, this was the call chain:
> ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
> ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
> blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
> scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)
> 
> Which meant that as soon as the error IRQ was serviced, no additional
> commands sent from the block layer (scsi_queue_rq()) would be sent down
> to the device. (ATA internal commands originating for ATA EH could of
> course still be sent.)
> 
> However, after commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
> scsi_times_out() would instead result in a call to scsi_abort_command() ->
> queue_delayed_work(). work function: scmd_eh_abort_handler() would call
> scsi_eh_scmd_add(), which calls scsi_host_set_state(shost, SHOST_RECOVERY).
> 
> (It was possible to get the old behavior if host->hostt->no_async_abort
> was set, but libata never used it, and this option was completely removed
> in commit a06586325f37 ("scsi: make asynchronous aborts mandatory"))
> 
> Additionally, later in commit 358f70da49d7 ("blk-mq: make
> blk_abort_request() trigger timeout path"), blk_abort_request() was changed
> to also call the abort callback asynchronously.
> 
> So now, after the TFES error irq has been serviced, we need to wait for
> two different workqueues to run their work, before the SHOST_RECOVERY
> state gets set.
> 
> While the ATA specification states that a device should return command
> aborted for all commands queued after the device has entered error state,
> since ATA only keeps the sense data for the latest command (in non-NCQ
> case), we really don't want to send block layer commands to the device
> after it has entered error state. (Only ATA EH commands should be sent,
> to read the sense data etc.)
> 
> While we could just call scsi_host_set_state(shost, SHOST_RECOVERY) from
> ata_qc_schedule_eh() directly, that might be a layering violation.
> So instead of doing that, add an additional check against the libata's own
> EH flag(s) before calling the qc_defer callback.
> 
> Fixes: e494f6a72839 ("[SCSI] improved eh timeout handler")
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
> Changes since v1:
> -Implemented review comments from Damien.
> 
>   drivers/ata/libata-scsi.c | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 4cb914103382..383a208f5f99 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1736,6 +1736,26 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
>   	if (xlat_func(qc))
>   		goto early_finish;
>   
> +	/*
> +	 * scsi_queue_rq() will defer commands when in state SHOST_RECOVERY.
> +	 *
> +	 * When getting an error interrupt, ata_port_abort() will be called,
> +	 * which ends up calling ata_qc_schedule_eh() on all QCs.
> +	 *
> +	 * ata_qc_schedule_eh() will call ata_eh_set_pending() and then call
> +	 * blk_abort_request() on the given QC. blk_abort_request() will
> +	 * asynchronously end up calling scsi_eh_scmd_add(), which will set
> +	 * the state to SHOST_RECOVERY and wake up SCSI EH.
> +	 *
> +	 * In order to avoid requests from being issued to the device from the
> +	 * time ata_eh_set_pending() is called, to the time scsi_eh_scmd_add()
> +	 * sets the state to SHOST_RECOVERY, we defer requests here as well.
> +	 */
> +	if (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) {
> +		rc = ATA_DEFER_LINK;
> +		goto defer;

Could we move this check earlier? I mean, we didn't need to have the qc 
alloc'ed and xlat'ed for this check to be done, right?

> +	}
> +
>   	if (ap->ops->qc_defer) {
>   		if ((rc = ap->ops->qc_defer(qc)))
>   			goto defer;

This solves the issue I saw in a QEMU AHCI NCQ error model where 
ahci_error_intr() -> ata_port_abort() is called multiple times, so, FWIW:

Tested-by: John Garry <john.g.garry@oracle.com>

Thanks
Niklas Cassel Nov. 7, 2022, 11:57 p.m. UTC | #2
On Mon, Nov 07, 2022 at 05:01:53PM +0000, John Garry wrote:
> On 07/11/2022 16:10, Niklas Cassel wrote:
> > scsi_queue_rq() will check if scsi_host_in_recovery() (state is
> > SHOST_RECOVERY), and if so, it will _not_ issue a command via:
> > scsi_dispatch_cmd() -> host->hostt->queuecommand() (ata_scsi_queuecmd())
> > -> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()
> > 
> > Before commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
> > when receiving a TFES error IRQ, this was the call chain:
> > ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
> > ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
> > blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
> > scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)
> > 
> > Which meant that as soon as the error IRQ was serviced, no additional
> > commands sent from the block layer (scsi_queue_rq()) would be sent down
> > to the device. (ATA internal commands originating for ATA EH could of
> > course still be sent.)
> > 
> > However, after commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
> > scsi_times_out() would instead result in a call to scsi_abort_command() ->
> > queue_delayed_work(). work function: scmd_eh_abort_handler() would call
> > scsi_eh_scmd_add(), which calls scsi_host_set_state(shost, SHOST_RECOVERY).
> > 
> > (It was possible to get the old behavior if host->hostt->no_async_abort
> > was set, but libata never used it, and this option was completely removed
> > in commit a06586325f37 ("scsi: make asynchronous aborts mandatory"))
> > 
> > Additionally, later in commit 358f70da49d7 ("blk-mq: make
> > blk_abort_request() trigger timeout path"), blk_abort_request() was changed
> > to also call the abort callback asynchronously.
> > 
> > So now, after the TFES error irq has been serviced, we need to wait for
> > two different workqueues to run their work, before the SHOST_RECOVERY
> > state gets set.
> > 
> > While the ATA specification states that a device should return command
> > aborted for all commands queued after the device has entered error state,
> > since ATA only keeps the sense data for the latest command (in non-NCQ
> > case), we really don't want to send block layer commands to the device
> > after it has entered error state. (Only ATA EH commands should be sent,
> > to read the sense data etc.)
> > 
> > While we could just call scsi_host_set_state(shost, SHOST_RECOVERY) from
> > ata_qc_schedule_eh() directly, that might be a layering violation.
> > So instead of doing that, add an additional check against the libata's own
> > EH flag(s) before calling the qc_defer callback.
> > 
> > Fixes: e494f6a72839 ("[SCSI] improved eh timeout handler")
> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > ---
> > Changes since v1:
> > -Implemented review comments from Damien.
> > 
> >   drivers/ata/libata-scsi.c | 20 ++++++++++++++++++++
> >   1 file changed, 20 insertions(+)
> > 
> > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> > index 4cb914103382..383a208f5f99 100644
> > --- a/drivers/ata/libata-scsi.c
> > +++ b/drivers/ata/libata-scsi.c
> > @@ -1736,6 +1736,26 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
> >   	if (xlat_func(qc))
> >   		goto early_finish;
> > +	/*
> > +	 * scsi_queue_rq() will defer commands when in state SHOST_RECOVERY.
> > +	 *
> > +	 * When getting an error interrupt, ata_port_abort() will be called,
> > +	 * which ends up calling ata_qc_schedule_eh() on all QCs.
> > +	 *
> > +	 * ata_qc_schedule_eh() will call ata_eh_set_pending() and then call
> > +	 * blk_abort_request() on the given QC. blk_abort_request() will
> > +	 * asynchronously end up calling scsi_eh_scmd_add(), which will set
> > +	 * the state to SHOST_RECOVERY and wake up SCSI EH.
> > +	 *
> > +	 * In order to avoid requests from being issued to the device from the
> > +	 * time ata_eh_set_pending() is called, to the time scsi_eh_scmd_add()
> > +	 * sets the state to SHOST_RECOVERY, we defer requests here as well.
> > +	 */
> > +	if (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) {
> > +		rc = ATA_DEFER_LINK;
> > +		goto defer;
> 
> Could we move this check earlier? I mean, we didn't need to have the qc
> alloc'ed and xlat'ed for this check to be done, right?

Sure, we could put it in e.g. ata_scsi_queuecmd() or __ata_scsi_queuecmd().


Or, perhaps it is just time to accept that ATA EH is so interconnected with
SCSI EH, so the simplest thing is just to do:

--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -913,6 +913,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
 
        qc->flags |= ATA_QCFLAG_FAILED;
        ata_eh_set_pending(ap, 1);
+       scsi_host_set_state(ap->scsi_host, SHOST_RECOVERY);
 
        /* The following will fail if timeout has already expired.
         * ata_scsi_error() takes care of such scmds on EH entry.


Which appears to work just as well as the patch in $subject.

In commit ee7863bc68fa ("[PATCH] SCSI: implement shost->host_eh_scheduled")
Tejun mentioned that "... libata is planning to depart from SCSI, so, for the
time being, libata will be using SCSI EH to handle such exceptions."

Now, 16 years later, ATA is still using SCSI EH (see ata_port_schedule_eh() and
ata_std_sched_eh()) to schedule EH in the case when there are no QCs to abort:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-eh.c?h=v6.1-rc4#n1004

Damien, thoughts?
Damien Le Moal Nov. 8, 2022, 3:50 a.m. UTC | #3
+ linux-scsi, Martin and James

On 11/8/22 08:57, Niklas Cassel wrote:
> On Mon, Nov 07, 2022 at 05:01:53PM +0000, John Garry wrote:
>> On 07/11/2022 16:10, Niklas Cassel wrote:
>>> scsi_queue_rq() will check if scsi_host_in_recovery() (state is
>>> SHOST_RECOVERY), and if so, it will _not_ issue a command via:
>>> scsi_dispatch_cmd() -> host->hostt->queuecommand() (ata_scsi_queuecmd())
>>> -> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()
>>>
>>> Before commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
>>> when receiving a TFES error IRQ, this was the call chain:
>>> ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
>>> ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
>>> blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
>>> scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)
>>>
>>> Which meant that as soon as the error IRQ was serviced, no additional
>>> commands sent from the block layer (scsi_queue_rq()) would be sent down
>>> to the device. (ATA internal commands originating for ATA EH could of
>>> course still be sent.)
>>>
>>> However, after commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
>>> scsi_times_out() would instead result in a call to scsi_abort_command() ->
>>> queue_delayed_work(). work function: scmd_eh_abort_handler() would call
>>> scsi_eh_scmd_add(), which calls scsi_host_set_state(shost, SHOST_RECOVERY).
>>>
>>> (It was possible to get the old behavior if host->hostt->no_async_abort
>>> was set, but libata never used it, and this option was completely removed
>>> in commit a06586325f37 ("scsi: make asynchronous aborts mandatory"))
>>>
>>> Additionally, later in commit 358f70da49d7 ("blk-mq: make
>>> blk_abort_request() trigger timeout path"), blk_abort_request() was changed
>>> to also call the abort callback asynchronously.
>>>
>>> So now, after the TFES error irq has been serviced, we need to wait for
>>> two different workqueues to run their work, before the SHOST_RECOVERY
>>> state gets set.
>>>
>>> While the ATA specification states that a device should return command
>>> aborted for all commands queued after the device has entered error state,
>>> since ATA only keeps the sense data for the latest command (in non-NCQ
>>> case), we really don't want to send block layer commands to the device
>>> after it has entered error state. (Only ATA EH commands should be sent,
>>> to read the sense data etc.)
>>>
>>> While we could just call scsi_host_set_state(shost, SHOST_RECOVERY) from
>>> ata_qc_schedule_eh() directly, that might be a layering violation.
>>> So instead of doing that, add an additional check against the libata's own
>>> EH flag(s) before calling the qc_defer callback.
>>>
>>> Fixes: e494f6a72839 ("[SCSI] improved eh timeout handler")
>>> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
>>> ---
>>> Changes since v1:
>>> -Implemented review comments from Damien.
>>>
>>>   drivers/ata/libata-scsi.c | 20 ++++++++++++++++++++
>>>   1 file changed, 20 insertions(+)
>>>
>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>> index 4cb914103382..383a208f5f99 100644
>>> --- a/drivers/ata/libata-scsi.c
>>> +++ b/drivers/ata/libata-scsi.c
>>> @@ -1736,6 +1736,26 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
>>>   	if (xlat_func(qc))
>>>   		goto early_finish;
>>> +	/*
>>> +	 * scsi_queue_rq() will defer commands when in state SHOST_RECOVERY.
>>> +	 *
>>> +	 * When getting an error interrupt, ata_port_abort() will be called,
>>> +	 * which ends up calling ata_qc_schedule_eh() on all QCs.
>>> +	 *
>>> +	 * ata_qc_schedule_eh() will call ata_eh_set_pending() and then call
>>> +	 * blk_abort_request() on the given QC. blk_abort_request() will
>>> +	 * asynchronously end up calling scsi_eh_scmd_add(), which will set
>>> +	 * the state to SHOST_RECOVERY and wake up SCSI EH.
>>> +	 *
>>> +	 * In order to avoid requests from being issued to the device from the
>>> +	 * time ata_eh_set_pending() is called, to the time scsi_eh_scmd_add()
>>> +	 * sets the state to SHOST_RECOVERY, we defer requests here as well.
>>> +	 */
>>> +	if (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) {
>>> +		rc = ATA_DEFER_LINK;
>>> +		goto defer;
>>
>> Could we move this check earlier? I mean, we didn't need to have the qc
>> alloc'ed and xlat'ed for this check to be done, right?
> 
> Sure, we could put it in e.g. ata_scsi_queuecmd() or __ata_scsi_queuecmd().
> 
> 
> Or, perhaps it is just time to accept that ATA EH is so interconnected with
> SCSI EH, so the simplest thing is just to do:
> 
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -913,6 +913,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
>  
>         qc->flags |= ATA_QCFLAG_FAILED;
>         ata_eh_set_pending(ap, 1);
> +       scsi_host_set_state(ap->scsi_host, SHOST_RECOVERY);

Why put this in this function ? Nothing in ata_qc_schedule_eh() calls
scsi_schedule_eh() or scsi_eh_scmd_add(), which set that state.
It looks like ata_port->ops->sched_eh(), that is, ata_std_sched_eh(),
would be a better place for this.

>  
>         /* The following will fail if timeout has already expired.
>          * ata_scsi_error() takes care of such scmds on EH entry.
> 
> 
> Which appears to work just as well as the patch in $subject.
> 
> In commit ee7863bc68fa ("[PATCH] SCSI: implement shost->host_eh_scheduled")
> Tejun mentioned that "... libata is planning to depart from SCSI, so, for the
> time being, libata will be using SCSI EH to handle such exceptions."
> 
> Now, 16 years later, ATA is still using SCSI EH (see ata_port_schedule_eh() and
> ata_std_sched_eh()) to schedule EH in the case when there are no QCs to abort:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-eh.c?h=v6.1-rc4#n1004
> 
> Damien, thoughts?

I like this simpler fix better, but it does introduce a risk of (again)
having problems with ata EH if scsi EH trigger/timing is changed.
Unlikely now, but as this fix proves, not unheard of.

The v2 change is fine too, modulo John's suggestion, which I agree with.
At least it is consistent with the ata internal eh state accounting, so
self contained within libata and somewhat less dependent on scsi state
machine.

Martin ? James ? Thoughts ?
Hannes Reinecke Nov. 8, 2022, 7:10 a.m. UTC | #4
On 11/8/22 00:57, Niklas Cassel wrote:
> On Mon, Nov 07, 2022 at 05:01:53PM +0000, John Garry wrote:
>> On 07/11/2022 16:10, Niklas Cassel wrote:
>>> scsi_queue_rq() will check if scsi_host_in_recovery() (state is
>>> SHOST_RECOVERY), and if so, it will _not_ issue a command via:
>>> scsi_dispatch_cmd() -> host->hostt->queuecommand() (ata_scsi_queuecmd())
>>> -> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()
>>>
>>> Before commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
>>> when receiving a TFES error IRQ, this was the call chain:
>>> ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
>>> ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
>>> blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
>>> scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)
>>>
>>> Which meant that as soon as the error IRQ was serviced, no additional
>>> commands sent from the block layer (scsi_queue_rq()) would be sent down
>>> to the device. (ATA internal commands originating for ATA EH could of
>>> course still be sent.)
>>>
>>> However, after commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
>>> scsi_times_out() would instead result in a call to scsi_abort_command() ->
>>> queue_delayed_work(). work function: scmd_eh_abort_handler() would call
>>> scsi_eh_scmd_add(), which calls scsi_host_set_state(shost, SHOST_RECOVERY).
>>>
>>> (It was possible to get the old behavior if host->hostt->no_async_abort
>>> was set, but libata never used it, and this option was completely removed
>>> in commit a06586325f37 ("scsi: make asynchronous aborts mandatory"))
>>>
>>> Additionally, later in commit 358f70da49d7 ("blk-mq: make
>>> blk_abort_request() trigger timeout path"), blk_abort_request() was changed
>>> to also call the abort callback asynchronously.
>>>
>>> So now, after the TFES error irq has been serviced, we need to wait for
>>> two different workqueues to run their work, before the SHOST_RECOVERY
>>> state gets set.
>>>
>>> While the ATA specification states that a device should return command
>>> aborted for all commands queued after the device has entered error state,
>>> since ATA only keeps the sense data for the latest command (in non-NCQ
>>> case), we really don't want to send block layer commands to the device
>>> after it has entered error state. (Only ATA EH commands should be sent,
>>> to read the sense data etc.)
>>>
>>> While we could just call scsi_host_set_state(shost, SHOST_RECOVERY) from
>>> ata_qc_schedule_eh() directly, that might be a layering violation.
>>> So instead of doing that, add an additional check against the libata's own
>>> EH flag(s) before calling the qc_defer callback.
>>>
>>> Fixes: e494f6a72839 ("[SCSI] improved eh timeout handler")
>>> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
>>> ---
>>> Changes since v1:
>>> -Implemented review comments from Damien.
>>>
>>>    drivers/ata/libata-scsi.c | 20 ++++++++++++++++++++
>>>    1 file changed, 20 insertions(+)
>>>
>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>> index 4cb914103382..383a208f5f99 100644
>>> --- a/drivers/ata/libata-scsi.c
>>> +++ b/drivers/ata/libata-scsi.c
>>> @@ -1736,6 +1736,26 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
>>>    	if (xlat_func(qc))
>>>    		goto early_finish;
>>> +	/*
>>> +	 * scsi_queue_rq() will defer commands when in state SHOST_RECOVERY.
>>> +	 *
>>> +	 * When getting an error interrupt, ata_port_abort() will be called,
>>> +	 * which ends up calling ata_qc_schedule_eh() on all QCs.
>>> +	 *
>>> +	 * ata_qc_schedule_eh() will call ata_eh_set_pending() and then call
>>> +	 * blk_abort_request() on the given QC. blk_abort_request() will
>>> +	 * asynchronously end up calling scsi_eh_scmd_add(), which will set
>>> +	 * the state to SHOST_RECOVERY and wake up SCSI EH.
>>> +	 *
>>> +	 * In order to avoid requests from being issued to the device from the
>>> +	 * time ata_eh_set_pending() is called, to the time scsi_eh_scmd_add()
>>> +	 * sets the state to SHOST_RECOVERY, we defer requests here as well.
>>> +	 */
>>> +	if (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) {
>>> +		rc = ATA_DEFER_LINK;
>>> +		goto defer;
>>
>> Could we move this check earlier? I mean, we didn't need to have the qc
>> alloc'ed and xlat'ed for this check to be done, right?
> 
> Sure, we could put it in e.g. ata_scsi_queuecmd() or __ata_scsi_queuecmd().
> 
> 
> Or, perhaps it is just time to accept that ATA EH is so interconnected with
> SCSI EH, so the simplest thing is just to do:
> 
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -913,6 +913,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
>   
>          qc->flags |= ATA_QCFLAG_FAILED;
>          ata_eh_set_pending(ap, 1);
> +       scsi_host_set_state(ap->scsi_host, SHOST_RECOVERY);
>   
>          /* The following will fail if timeout has already expired.
>           * ata_scsi_error() takes care of such scmds on EH entry.
> 
> 
> Which appears to work just as well as the patch in $subject.
> 
> In commit ee7863bc68fa ("[PATCH] SCSI: implement shost->host_eh_scheduled")
> Tejun mentioned that "... libata is planning to depart from SCSI, so, for the
> time being, libata will be using SCSI EH to handle such exceptions."
> 
> Now, 16 years later, ATA is still using SCSI EH (see ata_port_schedule_eh() and
> ata_std_sched_eh()) to schedule EH in the case when there are no QCs to abort:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-eh.c?h=v6.1-rc4#n1004
> 
> Damien, thoughts?

Yeah, that was the original idea. By the time libata was designed we 
still had quite a lot of drivers in drivers/ide, and it wasn't quite 
clear if managing everything via SATL was the way to go.
So one idea was to allow libata to become stand-alone, and SATL working 
as a modular interface on top of libata.
However, this never materialized, as everyone was quite happy having to 
deal with SCSI devices only, even if that meant to go via an additional 
layer of interpolation.

With the introduction of libsas this idea became even more dubious, as 
libsas really called for a _tighter_ integration with SCSI.
Plus ATA in general seems to be heading into the sunset, so I sincerely 
doubt anyone really wants to head that way.

In fact, with the recent patches we should finally drop the pretension 
that libata will ever be standalone, and concentrate on making it more 
aligned with SCSI.

Cheers,

Hannes
Damien Le Moal Nov. 8, 2022, 7:13 a.m. UTC | #5
On 11/8/22 16:10, Hannes Reinecke wrote:
> On 11/8/22 00:57, Niklas Cassel wrote:
>> On Mon, Nov 07, 2022 at 05:01:53PM +0000, John Garry wrote:
>>> On 07/11/2022 16:10, Niklas Cassel wrote:
>>>> scsi_queue_rq() will check if scsi_host_in_recovery() (state is
>>>> SHOST_RECOVERY), and if so, it will _not_ issue a command via:
>>>> scsi_dispatch_cmd() -> host->hostt->queuecommand()
>>>> (ata_scsi_queuecmd())
>>>> -> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()
>>>>
>>>> Before commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
>>>> when receiving a TFES error IRQ, this was the call chain:
>>>> ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
>>>> ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
>>>> blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
>>>> scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)
>>>>
>>>> Which meant that as soon as the error IRQ was serviced, no additional
>>>> commands sent from the block layer (scsi_queue_rq()) would be sent down
>>>> to the device. (ATA internal commands originating for ATA EH could of
>>>> course still be sent.)
>>>>
>>>> However, after commit e494f6a72839 ("[SCSI] improved eh timeout
>>>> handler"),
>>>> scsi_times_out() would instead result in a call to
>>>> scsi_abort_command() ->
>>>> queue_delayed_work(). work function: scmd_eh_abort_handler() would call
>>>> scsi_eh_scmd_add(), which calls scsi_host_set_state(shost,
>>>> SHOST_RECOVERY).
>>>>
>>>> (It was possible to get the old behavior if host->hostt->no_async_abort
>>>> was set, but libata never used it, and this option was completely
>>>> removed
>>>> in commit a06586325f37 ("scsi: make asynchronous aborts mandatory"))
>>>>
>>>> Additionally, later in commit 358f70da49d7 ("blk-mq: make
>>>> blk_abort_request() trigger timeout path"), blk_abort_request() was
>>>> changed
>>>> to also call the abort callback asynchronously.
>>>>
>>>> So now, after the TFES error irq has been serviced, we need to wait for
>>>> two different workqueues to run their work, before the SHOST_RECOVERY
>>>> state gets set.
>>>>
>>>> While the ATA specification states that a device should return command
>>>> aborted for all commands queued after the device has entered error
>>>> state,
>>>> since ATA only keeps the sense data for the latest command (in non-NCQ
>>>> case), we really don't want to send block layer commands to the device
>>>> after it has entered error state. (Only ATA EH commands should be sent,
>>>> to read the sense data etc.)
>>>>
>>>> While we could just call scsi_host_set_state(shost, SHOST_RECOVERY)
>>>> from
>>>> ata_qc_schedule_eh() directly, that might be a layering violation.
>>>> So instead of doing that, add an additional check against the
>>>> libata's own
>>>> EH flag(s) before calling the qc_defer callback.
>>>>
>>>> Fixes: e494f6a72839 ("[SCSI] improved eh timeout handler")
>>>> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
>>>> ---
>>>> Changes since v1:
>>>> -Implemented review comments from Damien.
>>>>
>>>>    drivers/ata/libata-scsi.c | 20 ++++++++++++++++++++
>>>>    1 file changed, 20 insertions(+)
>>>>
>>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>>> index 4cb914103382..383a208f5f99 100644
>>>> --- a/drivers/ata/libata-scsi.c
>>>> +++ b/drivers/ata/libata-scsi.c
>>>> @@ -1736,6 +1736,26 @@ static int ata_scsi_translate(struct
>>>> ata_device *dev, struct scsi_cmnd *cmd,
>>>>        if (xlat_func(qc))
>>>>            goto early_finish;
>>>> +    /*
>>>> +     * scsi_queue_rq() will defer commands when in state
>>>> SHOST_RECOVERY.
>>>> +     *
>>>> +     * When getting an error interrupt, ata_port_abort() will be
>>>> called,
>>>> +     * which ends up calling ata_qc_schedule_eh() on all QCs.
>>>> +     *
>>>> +     * ata_qc_schedule_eh() will call ata_eh_set_pending() and then
>>>> call
>>>> +     * blk_abort_request() on the given QC. blk_abort_request() will
>>>> +     * asynchronously end up calling scsi_eh_scmd_add(), which will
>>>> set
>>>> +     * the state to SHOST_RECOVERY and wake up SCSI EH.
>>>> +     *
>>>> +     * In order to avoid requests from being issued to the device
>>>> from the
>>>> +     * time ata_eh_set_pending() is called, to the time
>>>> scsi_eh_scmd_add()
>>>> +     * sets the state to SHOST_RECOVERY, we defer requests here as
>>>> well.
>>>> +     */
>>>> +    if (ap->pflags & (ATA_PFLAG_EH_PENDING |
>>>> ATA_PFLAG_EH_IN_PROGRESS)) {
>>>> +        rc = ATA_DEFER_LINK;
>>>> +        goto defer;
>>>
>>> Could we move this check earlier? I mean, we didn't need to have the qc
>>> alloc'ed and xlat'ed for this check to be done, right?
>>
>> Sure, we could put it in e.g. ata_scsi_queuecmd() or
>> __ata_scsi_queuecmd().
>>
>>
>> Or, perhaps it is just time to accept that ATA EH is so interconnected
>> with
>> SCSI EH, so the simplest thing is just to do:
>>
>> --- a/drivers/ata/libata-eh.c
>> +++ b/drivers/ata/libata-eh.c
>> @@ -913,6 +913,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
>>            qc->flags |= ATA_QCFLAG_FAILED;
>>          ata_eh_set_pending(ap, 1);
>> +       scsi_host_set_state(ap->scsi_host, SHOST_RECOVERY);
>>            /* The following will fail if timeout has already expired.
>>           * ata_scsi_error() takes care of such scmds on EH entry.
>>
>>
>> Which appears to work just as well as the patch in $subject.
>>
>> In commit ee7863bc68fa ("[PATCH] SCSI: implement
>> shost->host_eh_scheduled")
>> Tejun mentioned that "... libata is planning to depart from SCSI, so,
>> for the
>> time being, libata will be using SCSI EH to handle such exceptions."
>>
>> Now, 16 years later, ATA is still using SCSI EH (see
>> ata_port_schedule_eh() and
>> ata_std_sched_eh()) to schedule EH in the case when there are no QCs
>> to abort:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-eh.c?h=v6.1-rc4#n1004
>>
>> Damien, thoughts?
> 
> Yeah, that was the original idea. By the time libata was designed we
> still had quite a lot of drivers in drivers/ide, and it wasn't quite
> clear if managing everything via SATL was the way to go.
> So one idea was to allow libata to become stand-alone, and SATL working
> as a modular interface on top of libata.
> However, this never materialized, as everyone was quite happy having to
> deal with SCSI devices only, even if that meant to go via an additional
> layer of interpolation.
> 
> With the introduction of libsas this idea became even more dubious, as
> libsas really called for a _tighter_ integration with SCSI.
> Plus ATA in general seems to be heading into the sunset, so I sincerely
> doubt anyone really wants to head that way.
> 
> In fact, with the recent patches we should finally drop the pretension
> that libata will ever be standalone, and concentrate on making it more
> aligned with SCSI.

So you are saying you prefer the fix proposed by Niklas above, calling
scsi_host_set_state() from libata ? I feel OK-ish about it... But if it
is called in the right place, why not.

> 
> Cheers,
> 
> Hannes
Niklas Cassel Nov. 8, 2022, 11:14 p.m. UTC | #6
On Tue, Nov 08, 2022 at 12:50:44PM +0900, Damien Le Moal wrote:
> >>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> >>> index 4cb914103382..383a208f5f99 100644
> >>> --- a/drivers/ata/libata-scsi.c
> >>> +++ b/drivers/ata/libata-scsi.c
> >>> @@ -1736,6 +1736,26 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
> >>>   	if (xlat_func(qc))
> >>>   		goto early_finish;
> >>> +	/*
> >>> +	 * scsi_queue_rq() will defer commands when in state SHOST_RECOVERY.
> >>> +	 *
> >>> +	 * When getting an error interrupt, ata_port_abort() will be called,
> >>> +	 * which ends up calling ata_qc_schedule_eh() on all QCs.
> >>> +	 *
> >>> +	 * ata_qc_schedule_eh() will call ata_eh_set_pending() and then call
> >>> +	 * blk_abort_request() on the given QC. blk_abort_request() will
> >>> +	 * asynchronously end up calling scsi_eh_scmd_add(), which will set
> >>> +	 * the state to SHOST_RECOVERY and wake up SCSI EH.
> >>> +	 *
> >>> +	 * In order to avoid requests from being issued to the device from the
> >>> +	 * time ata_eh_set_pending() is called, to the time scsi_eh_scmd_add()
> >>> +	 * sets the state to SHOST_RECOVERY, we defer requests here as well.
> >>> +	 */
> >>> +	if (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) {
> >>> +		rc = ATA_DEFER_LINK;
> >>> +		goto defer;
> >>
> >> Could we move this check earlier? I mean, we didn't need to have the qc
> >> alloc'ed and xlat'ed for this check to be done, right?
> > 
> > Sure, we could put it in e.g. ata_scsi_queuecmd() or __ata_scsi_queuecmd().
> > 
> > 
> > Or, perhaps it is just time to accept that ATA EH is so interconnected with
> > SCSI EH, so the simplest thing is just to do:
> > 
> > --- a/drivers/ata/libata-eh.c
> > +++ b/drivers/ata/libata-eh.c
> > @@ -913,6 +913,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
> >  
> >         qc->flags |= ATA_QCFLAG_FAILED;
> >         ata_eh_set_pending(ap, 1);
> > +       scsi_host_set_state(ap->scsi_host, SHOST_RECOVERY);
> 
> Why put this in this function ? Nothing in ata_qc_schedule_eh() calls
> scsi_schedule_eh() or scsi_eh_scmd_add(), which set that state.

It does, but after calling blk_abort_request(), we need to wait for
two different workqueues to run their work, before the SHOST_RECOVERY
state gets set:

blk_abort_request() -> kblockd_schedule_work(&req->q->timeout_work) ->
queue_work(kblockd_workqueue, work)

... -> blk_mq_timeout_work() -> blk_mq_handle_expired() ->
blk_mq_rq_timed_out() -> req->q->mq_ops->timeout() (scsi_timeout()) ->
scsi_abort_command() ->
queue_delayed_work(shost->tmf_work_q, &scmd->abort_work, HZ / 100);

... -> scmd_eh_abort_handler() -> scsi_eh_scmd_add() ->
scsi_host_set_state(shost, SHOST_RECOVERY)

After setting state to SHOST_RECOVERY, scsi_eh_scmd_add() will
call scsi_eh_inc_host_failed(), which will cause the while (true) loop
in scsi_error_handler() to proceed to perform SCSI EH, and eventually
call shost->transportt->eh_strategy_handler(shost) which for ATA is set
to ata_scsi_error().

Then we have the regular ATA side of things:
ata_scsi_error() -> ata_scsi_port_error_handler() ->
ap->ops->error_handler(ap) -> (for e.g. AHCI) ahci_error_handler() ->
sata_pmp_error_handler() -> ata_eh_autopsy() -> ata_eh_link_autopsy() ->
ata_eh_analyze_ncq_error(). (Where reading the NCQ error log in the
command that brings the device out of error state.)


Looking at this original commit, there are two ways for libata to trigger
SCSI EH:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b70fc039824bc7303e4007a5f758f832de56611

ata_qc_schedule_eh(): which is explained above. It indirectly schedules
SCSI with an associated QC.

ata_port_schedule_eh(): It directly schedules EH for @ap without an
associated qc. (I assume this is for e.g. an errors with the link,
where we get an error interrupt and need to read SError register.)


The commit message here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ee7863bc68fa6ad6fe7cfcc0e5ebe9efe0c0664e

"In short, SCSI host is not supposed to know about exceptions unrelated
to specific device or command.  Such exceptions should be handled by
transport layer proper.  However, the distinction is not essential to
ATA and libata is planning to depart from SCSI, so, for the time
being, libata will be using SCSI EH to handle such exceptions."

So it appears that this distinction is not important for libata.
Sure, if libata EH function ata_scsi_error() sees any commands in
host->eh_cmd_q, then we know that they timed out or aborted.
But ata_scsi_cmd_error_handler() will leave any QC alone that
has ATA_QCFLAG_FAILED flag set.
Those QCs will instead be processed by ata_scsi_port_error_handler()
which totally ignores the host->eh_cmd_q list supplied by SCSI EH,
and only looks at QCs with ATA_QCFLAG_FAILED set.


So it would be tempting to do something like:
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1001,8 +1001,7 @@ static int ata_do_link_abort(struct ata_port *ap, struct ata_link *link)
                }
        }
 
-       if (!nr_aborted)
-               ata_port_schedule_eh(ap);
+       ata_port_schedule_eh(ap);
 
        return nr_aborted;


However, doing so would go against how this API is supposed to be used, see:
36fed4980529 ("[SCSI] libsas: cleanup spurious calls to scsi_schedule_eh")

I did decide to try it anyway, but it turns out both this and the previous
suggestion:

--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -913,6 +913,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)

        qc->flags |= ATA_QCFLAG_FAILED;
        ata_eh_set_pending(ap, 1);
+       scsi_host_set_state(ap->scsi_host, SHOST_RECOVERY);


Both actually lead to two error interrupts.
(We should only have one.)

QEMU shows that this is from scsi_restart_operations(),
which temporary clears the SHOST_RECOVERY bit, then calls
scsi_run_host_queues().

Since we've reached this far, that must mean that in
#3  scsi_queue_rq

when the:
if (unlikely(scsi_host_in_recovery(shost))) {
check was done, we were not in recovery.

But from that time, to #0, we must have received an error irq,
because a:
(gdb) p /x scmd->device->host->shost_state
$9 = 0x5

which is SHOST_RECOVERY,


#0  __ata_scsi_queuecmd (scmd=scmd@entry=0xffff8881016d7b88, dev=0xffff888101e124c0) at drivers/ata/libata-scsi.c:4256
#1  0xffffffff819c1d8b in ata_scsi_queuecmd (shost=<optimized out>, cmd=0xffff8881016d7b88) at drivers/ata/libata-scsi.c:4337
#2  0xffffffff81995c41 in scsi_dispatch_cmd (cmd=0xffff8881016d7b88) at drivers/scsi/scsi_lib.c:1516
#3  scsi_queue_rq (hctx=<optimized out>, bd=<optimized out>) at drivers/scsi/scsi_lib.c:1757
--Type <RET> for more, q to quit, c to continue without paging--
#4  0xffffffff81578a42 in blk_mq_dispatch_rq_list (hctx=hctx@entry=0xffff8881008e0000, list=list@entry=0xffffc90000367da0, nr_budgets=0, nr_budgets@entry=1) at block/blk-mq.c:2056
#5  0xffffffff8157ef1b in __blk_mq_do_dispatch_sched (hctx=0xffff8881008e0000) at block/blk-mq-sched.c:173
#6  blk_mq_do_dispatch_sched (hctx=hctx@entry=0xffff8881008e0000) at block/blk-mq-sched.c:187
#7  0xffffffff8157f238 in __blk_mq_sched_dispatch_requests (hctx=hctx@entry=0xffff8881008e0000) at block/blk-mq-sched.c:313
#8  0xffffffff8157f2fb in blk_mq_sched_dispatch_requests (hctx=hctx@entry=0xffff8881008e0000) at block/blk-mq-sched.c:339
#9  0xffffffff81573cc0 in __blk_mq_run_hw_queue (hctx=0xffff8881008e0000) at block/blk-mq.c:2174
#10 0xffffffff8157546c in __blk_mq_delay_run_hw_queue (hctx=<optimized out>, async=<optimized out>, msecs=msecs@entry=0) at block/blk-mq.c:2250
#11 0xffffffff815756a6 in blk_mq_run_hw_queue (hctx=<optimized out>, async=async@entry=false) at block/blk-mq.c:2298
#12 0xffffffff81575990 in blk_mq_run_hw_queues (q=0xffff888102530000, async=async@entry=false) at block/blk-mq.c:2346
#13 0xffffffff8199221d in scsi_run_queue (q=<optimized out>) at drivers/scsi/scsi_lib.c:457
#14 0xffffffff81994ead in scsi_run_host_queues (shost=shost@entry=0xffff888101ca2000) at drivers/scsi/scsi_lib.c:475
#15 0xffffffff819918bf in scsi_restart_operations (shost=0xffff888101ca2000) at drivers/scsi/scsi_error.c:2134
#16 scsi_error_handler (data=0xffff888101ca2000) at drivers/scsi/scsi_error.c:2327


So it appears that simply checking if SHOST_RECOVERY is set in
scsi_queue_rq() is not enough. Since this is done without holding
ap->lock (which is in libata..), we can always get an error irq.

So the only reliable way to ensure that we don't send down requests
while the drive is in error state, is to have a check against
ATA_PFLAG_EH_PENDING inside __ata_scsi_queuecmd(), while holding
the ap->lock.
Will send a V3 that is similar to V2, but with the check inside
__ata_scsi_queuecmd().


Kind regards,
Niklas
Hannes Reinecke Nov. 9, 2022, 7:11 a.m. UTC | #7
On 11/9/22 00:14, Niklas Cassel wrote:
> On Tue, Nov 08, 2022 at 12:50:44PM +0900, Damien Le Moal wrote:
>>>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>>>> index 4cb914103382..383a208f5f99 100644
>>>>> --- a/drivers/ata/libata-scsi.c
>>>>> +++ b/drivers/ata/libata-scsi.c
>>>>> @@ -1736,6 +1736,26 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
>>>>>    	if (xlat_func(qc))
>>>>>    		goto early_finish;
>>>>> +	/*
>>>>> +	 * scsi_queue_rq() will defer commands when in state SHOST_RECOVERY.
>>>>> +	 *
>>>>> +	 * When getting an error interrupt, ata_port_abort() will be called,
>>>>> +	 * which ends up calling ata_qc_schedule_eh() on all QCs.
>>>>> +	 *
>>>>> +	 * ata_qc_schedule_eh() will call ata_eh_set_pending() and then call
>>>>> +	 * blk_abort_request() on the given QC. blk_abort_request() will
>>>>> +	 * asynchronously end up calling scsi_eh_scmd_add(), which will set
>>>>> +	 * the state to SHOST_RECOVERY and wake up SCSI EH.
>>>>> +	 *
>>>>> +	 * In order to avoid requests from being issued to the device from the
>>>>> +	 * time ata_eh_set_pending() is called, to the time scsi_eh_scmd_add()
>>>>> +	 * sets the state to SHOST_RECOVERY, we defer requests here as well.
>>>>> +	 */
>>>>> +	if (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) {
>>>>> +		rc = ATA_DEFER_LINK;
>>>>> +		goto defer;
>>>>
>>>> Could we move this check earlier? I mean, we didn't need to have the qc
>>>> alloc'ed and xlat'ed for this check to be done, right?
>>>
>>> Sure, we could put it in e.g. ata_scsi_queuecmd() or __ata_scsi_queuecmd().
>>>
>>>
>>> Or, perhaps it is just time to accept that ATA EH is so interconnected with
>>> SCSI EH, so the simplest thing is just to do:
>>>
>>> --- a/drivers/ata/libata-eh.c
>>> +++ b/drivers/ata/libata-eh.c
>>> @@ -913,6 +913,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
>>>   
>>>          qc->flags |= ATA_QCFLAG_FAILED;
>>>          ata_eh_set_pending(ap, 1);
>>> +       scsi_host_set_state(ap->scsi_host, SHOST_RECOVERY);
>>
>> Why put this in this function ? Nothing in ata_qc_schedule_eh() calls
>> scsi_schedule_eh() or scsi_eh_scmd_add(), which set that state.
> 
> It does, but after calling blk_abort_request(), we need to wait for
> two different workqueues to run their work, before the SHOST_RECOVERY
> state gets set:
> 
> blk_abort_request() -> kblockd_schedule_work(&req->q->timeout_work) ->
> queue_work(kblockd_workqueue, work)
> 
> ... -> blk_mq_timeout_work() -> blk_mq_handle_expired() ->
> blk_mq_rq_timed_out() -> req->q->mq_ops->timeout() (scsi_timeout()) ->
> scsi_abort_command() ->
> queue_delayed_work(shost->tmf_work_q, &scmd->abort_work, HZ / 100);
> 
> ... -> scmd_eh_abort_handler() -> scsi_eh_scmd_add() ->
> scsi_host_set_state(shost, SHOST_RECOVERY)
> 
> After setting state to SHOST_RECOVERY, scsi_eh_scmd_add() will
> call scsi_eh_inc_host_failed(), which will cause the while (true) loop
> in scsi_error_handler() to proceed to perform SCSI EH, and eventually
> call shost->transportt->eh_strategy_handler(shost) which for ATA is set
> to ata_scsi_error().
> 
> Then we have the regular ATA side of things:
> ata_scsi_error() -> ata_scsi_port_error_handler() ->
> ap->ops->error_handler(ap) -> (for e.g. AHCI) ahci_error_handler() ->
> sata_pmp_error_handler() -> ata_eh_autopsy() -> ata_eh_link_autopsy() ->
> ata_eh_analyze_ncq_error(). (Where reading the NCQ error log in the
> command that brings the device out of error state.)
> 
> 
> Looking at this original commit, there are two ways for libata to trigger
> SCSI EH:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b70fc039824bc7303e4007a5f758f832de56611
> 
> ata_qc_schedule_eh(): which is explained above. It indirectly schedules
> SCSI with an associated QC.
> 
> ata_port_schedule_eh(): It directly schedules EH for @ap without an
> associated qc. (I assume this is for e.g. an errors with the link,
> where we get an error interrupt and need to read SError register.)
> 
> 
> The commit message here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ee7863bc68fa6ad6fe7cfcc0e5ebe9efe0c0664e
> 
> "In short, SCSI host is not supposed to know about exceptions unrelated
> to specific device or command.  Such exceptions should be handled by
> transport layer proper.  However, the distinction is not essential to
> ATA and libata is planning to depart from SCSI, so, for the time
> being, libata will be using SCSI EH to handle such exceptions."
> 
> So it appears that this distinction is not important for libata.
> Sure, if libata EH function ata_scsi_error() sees any commands in
> host->eh_cmd_q, then we know that they timed out or aborted.
> But ata_scsi_cmd_error_handler() will leave any QC alone that
> has ATA_QCFLAG_FAILED flag set.
> Those QCs will instead be processed by ata_scsi_port_error_handler()
> which totally ignores the host->eh_cmd_q list supplied by SCSI EH,
> and only looks at QCs with ATA_QCFLAG_FAILED set.
> 
> 
> So it would be tempting to do something like:
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -1001,8 +1001,7 @@ static int ata_do_link_abort(struct ata_port *ap, struct ata_link *link)
>                  }
>          }
>   
> -       if (!nr_aborted)
> -               ata_port_schedule_eh(ap);
> +       ata_port_schedule_eh(ap);
>   
>          return nr_aborted;
> 
> 
> However, doing so would go against how this API is supposed to be used, see:
> 36fed4980529 ("[SCSI] libsas: cleanup spurious calls to scsi_schedule_eh")
> 
> I did decide to try it anyway, but it turns out both this and the previous
> suggestion:
> 
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -913,6 +913,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
> 
>          qc->flags |= ATA_QCFLAG_FAILED;
>          ata_eh_set_pending(ap, 1);
> +       scsi_host_set_state(ap->scsi_host, SHOST_RECOVERY);
> 
> 
> Both actually lead to two error interrupts.
> (We should only have one.)
> 
> QEMU shows that this is from scsi_restart_operations(),
> which temporary clears the SHOST_RECOVERY bit, then calls
> scsi_run_host_queues().
> 
> Since we've reached this far, that must mean that in
> #3  scsi_queue_rq
> 
> when the:
> if (unlikely(scsi_host_in_recovery(shost))) {
> check was done, we were not in recovery.
> 
> But from that time, to #0, we must have received an error irq,
> because a:
> (gdb) p /x scmd->device->host->shost_state
> $9 = 0x5
> 
> which is SHOST_RECOVERY,
> 
> 
> #0  __ata_scsi_queuecmd (scmd=scmd@entry=0xffff8881016d7b88, dev=0xffff888101e124c0) at drivers/ata/libata-scsi.c:4256
> #1  0xffffffff819c1d8b in ata_scsi_queuecmd (shost=<optimized out>, cmd=0xffff8881016d7b88) at drivers/ata/libata-scsi.c:4337
> #2  0xffffffff81995c41 in scsi_dispatch_cmd (cmd=0xffff8881016d7b88) at drivers/scsi/scsi_lib.c:1516
> #3  scsi_queue_rq (hctx=<optimized out>, bd=<optimized out>) at drivers/scsi/scsi_lib.c:1757
> --Type <RET> for more, q to quit, c to continue without paging--
> #4  0xffffffff81578a42 in blk_mq_dispatch_rq_list (hctx=hctx@entry=0xffff8881008e0000, list=list@entry=0xffffc90000367da0, nr_budgets=0, nr_budgets@entry=1) at block/blk-mq.c:2056
> #5  0xffffffff8157ef1b in __blk_mq_do_dispatch_sched (hctx=0xffff8881008e0000) at block/blk-mq-sched.c:173
> #6  blk_mq_do_dispatch_sched (hctx=hctx@entry=0xffff8881008e0000) at block/blk-mq-sched.c:187
> #7  0xffffffff8157f238 in __blk_mq_sched_dispatch_requests (hctx=hctx@entry=0xffff8881008e0000) at block/blk-mq-sched.c:313
> #8  0xffffffff8157f2fb in blk_mq_sched_dispatch_requests (hctx=hctx@entry=0xffff8881008e0000) at block/blk-mq-sched.c:339
> #9  0xffffffff81573cc0 in __blk_mq_run_hw_queue (hctx=0xffff8881008e0000) at block/blk-mq.c:2174
> #10 0xffffffff8157546c in __blk_mq_delay_run_hw_queue (hctx=<optimized out>, async=<optimized out>, msecs=msecs@entry=0) at block/blk-mq.c:2250
> #11 0xffffffff815756a6 in blk_mq_run_hw_queue (hctx=<optimized out>, async=async@entry=false) at block/blk-mq.c:2298
> #12 0xffffffff81575990 in blk_mq_run_hw_queues (q=0xffff888102530000, async=async@entry=false) at block/blk-mq.c:2346
> #13 0xffffffff8199221d in scsi_run_queue (q=<optimized out>) at drivers/scsi/scsi_lib.c:457
> #14 0xffffffff81994ead in scsi_run_host_queues (shost=shost@entry=0xffff888101ca2000) at drivers/scsi/scsi_lib.c:475
> #15 0xffffffff819918bf in scsi_restart_operations (shost=0xffff888101ca2000) at drivers/scsi/scsi_error.c:2134
> #16 scsi_error_handler (data=0xffff888101ca2000) at drivers/scsi/scsi_error.c:2327
> 
> 
> So it appears that simply checking if SHOST_RECOVERY is set in
> scsi_queue_rq() is not enough. Since this is done without holding
> ap->lock (which is in libata..), we can always get an error irq.
> 
> So the only reliable way to ensure that we don't send down requests
> while the drive is in error state, is to have a check against
> ATA_PFLAG_EH_PENDING inside __ata_scsi_queuecmd(), while holding
> the ap->lock.
> Will send a V3 that is similar to V2, but with the check inside
> __ata_scsi_queuecmd().
> Thanks for the detailed explanation, Niklas.

However, one fundamental thing is still unresolved:
I've switched SCSI EH to do asynchronous aborts with commit e494f6a72839 
("[SCSI] improved eh timeout handler"); since then commands will be 
aborted without invoking SCSI EH.

This goes _fundamentally_ against libata's .eh_strategy (as it's never 
invoked), and as such libata _cannot_ use command aborts.
Which typically wouldn't matter as ATA doesn't have command aborts, and 
realistically any error is causing the NCQ queue to be drained.

So SCSI EH scsi_abort_command() really shouldn't queue a workqueue item, 
as it'll never be able to do anything meaningful.

You need this patch:

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index be2a70c5ac6d..4fb72b73871e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -242,6 +242,11 @@ scsi_abort_command(struct scsi_cmnd *scmd)
                 return FAILED;
         }

+       if (!hostt->eh_abort_handler) {
+               /* No abort handler, fail command directly */
+               return FAILED;
+       }
+
         spin_lock_irqsave(shost->host_lock, flags);
         if (shost->eh_deadline != -1 && !shost->last_reset)
                 shost->last_reset = jiffies;

to avoid having libata trying to queue a (pointless) abort workqueue 
item, and get rid of the various workqueue thingies you mentioned above.

And it's a sensible fix anyway, will be sending it as a separate patch.

Cheers,

Hannes
Niklas Cassel Nov. 9, 2022, 9:28 a.m. UTC | #8
On Wed, Nov 09, 2022 at 08:11:17AM +0100, Hannes Reinecke wrote:
> Thanks for the detailed explanation, Niklas.
> 
> However, one fundamental thing is still unresolved:
> I've switched SCSI EH to do asynchronous aborts with commit e494f6a72839
> ("[SCSI] improved eh timeout handler"); since then commands will be aborted
> without invoking SCSI EH.
> 
> This goes _fundamentally_ against libata's .eh_strategy (as it's never
> invoked), and as such libata _cannot_ use command aborts.
> Which typically wouldn't matter as ATA doesn't have command aborts, and
> realistically any error is causing the NCQ queue to be drained.
> 
> So SCSI EH scsi_abort_command() really shouldn't queue a workqueue item, as
> it'll never be able to do anything meaningful.
> 
> You need this patch:
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index be2a70c5ac6d..4fb72b73871e 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -242,6 +242,11 @@ scsi_abort_command(struct scsi_cmnd *scmd)
>                 return FAILED;
>         }
> 
> +       if (!hostt->eh_abort_handler) {
> +               /* No abort handler, fail command directly */
> +               return FAILED;
> +       }
> +
>         spin_lock_irqsave(shost->host_lock, flags);
>         if (shost->eh_deadline != -1 && !shost->last_reset)
>                 shost->last_reset = jiffies;
> 
> to avoid having libata trying to queue a (pointless) abort workqueue item,
> and get rid of the various workqueue thingies you mentioned above.
> 
> And it's a sensible fix anyway, will be sending it as a separate patch.

Hello Hannes,

This is how it looks before your patch:
scsi_logging_level -s -E 7

[   33.737069] sd 0:0:0:0: [sda] tag#0 abort scheduled
[   33.738812] sd 0:0:0:0: [sda] tag#3 abort scheduled
[   33.749085] sd 0:0:0:0: [sda] tag#0 aborting command
[   33.751393] sd 0:0:0:0: [sda] tag#0 cmd abort failed
[   33.753541] sd 0:0:0:0: [sda] tag#3 aborting command
[   33.755565] sd 0:0:0:0: [sda] tag#3 cmd abort failed
[   33.763051] scsi host0: Waking error handler thread
[   33.765727] scsi host0: scsi_eh_0: waking up 0/2/2
[   33.768815] ata1.00: exception Emask 0x0 SAct 0x9 SErr 0x0 action 0x0
[   33.772211] ata1.00: irq_stat 0x40000000
[   33.774187] ata1.00: failed command: WRITE FPDMA QUEUED
[   33.776962] ata1.00: cmd 61/00:00:00:00:0f/01:00:00:00:00/40 tag 0 ncq dma 131072 out
[   33.776962]          res 43/04:00:00:00:00/00:00:00:00:00/00 Emask 0x400 (NCQ error) <F>
[   33.783598] ata1.00: status: { DRDY SENSE ERR }
[   33.785252] ata1.00: error: { ABRT }
[   33.791290] ata1.00: configured for UDMA/100
[   33.792195] sd 0:0:0:0: [sda] tag#0 scsi_eh_0: flush finish cmd
[   33.793426] sd 0:0:0:0: [sda] tag#3 scsi_eh_0: flush finish cmd
[   33.794653] ata1: EH complete

So we do get the scmd:s sent to ATA EH (strategy_handler).

In scmd_eh_abort_handler(), scsi_try_to_abort_cmd() will return FAILED
since hostt->eh_abort_handler is not implemented for libata, so
scmd_eh_abort_handler() will do goto out; which calls scsi_eh_scmd_add().


This is how it looks after your patch:
scsi_logging_level -s -E 7

[  223.417749] scsi host0: Waking error handler thread
[  223.419782] scsi host0: scsi_eh_0: waking up 0/2/2
[  223.423101] ata1.00: exception Emask 0x0 SAct 0x80002 SErr 0x0 action 0x0
[  223.425362] ata1.00: irq_stat 0x40000008
[  223.426778] ata1.00: failed command: WRITE FPDMA QUEUED
[  223.428925] ata1.00: cmd 61/00:08:00:00:0f/01:00:00:00:00/40 tag 1 ncq dma 131072 out
[  223.428925]          res 43/04:00:00:00:00/00:00:00:00:00/00 Emask 0x400 (NCQ error) <F>
[  223.436077] ata1.00: status: { DRDY SENSE ERR }
[  223.438015] ata1.00: error: { ABRT }
[  223.441179] ata1.00: Security Log not supported
[  223.445698] ata1.00: Security Log not supported
[  223.448475] ata1.00: configured for UDMA/100
[  223.449790] sd 0:0:0:0: [sda] tag#1 scsi_eh_0: flush finish cmd
[  223.451063] sd 0:0:0:0: [sda] tag#19 scsi_eh_0: flush finish cmd
[  223.452648] ata1: EH complete

So your patch looks good to me, like you said, it removes a
a pointless queue_work().

Do we perhaps want to remove the !hostt->eh_abort_handler check
from scmd_eh_abort_handler(), now when you've moved it earlier
(to scsi_abort_command()) ? Perhaps we need to keep it, since
the function used for checking this, scsi_try_to_abort_cmd()
is used in other places as well.


Kind regards,
Niklas
Hannes Reinecke Nov. 9, 2022, 9:37 a.m. UTC | #9
On 11/9/22 10:28, Niklas Cassel wrote:
> On Wed, Nov 09, 2022 at 08:11:17AM +0100, Hannes Reinecke wrote:
>> Thanks for the detailed explanation, Niklas.
>>
>> However, one fundamental thing is still unresolved:
>> I've switched SCSI EH to do asynchronous aborts with commit e494f6a72839
>> ("[SCSI] improved eh timeout handler"); since then commands will be aborted
>> without invoking SCSI EH.
>>
>> This goes _fundamentally_ against libata's .eh_strategy (as it's never
>> invoked), and as such libata _cannot_ use command aborts.
>> Which typically wouldn't matter as ATA doesn't have command aborts, and
>> realistically any error is causing the NCQ queue to be drained.
>>
>> So SCSI EH scsi_abort_command() really shouldn't queue a workqueue item, as
>> it'll never be able to do anything meaningful.
>>
>> You need this patch:
>>
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index be2a70c5ac6d..4fb72b73871e 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -242,6 +242,11 @@ scsi_abort_command(struct scsi_cmnd *scmd)
>>                  return FAILED;
>>          }
>>
>> +       if (!hostt->eh_abort_handler) {
>> +               /* No abort handler, fail command directly */
>> +               return FAILED;
>> +       }
>> +
>>          spin_lock_irqsave(shost->host_lock, flags);
>>          if (shost->eh_deadline != -1 && !shost->last_reset)
>>                  shost->last_reset = jiffies;
>>
>> to avoid having libata trying to queue a (pointless) abort workqueue item,
>> and get rid of the various workqueue thingies you mentioned above.
>>
>> And it's a sensible fix anyway, will be sending it as a separate patch.
> 
> Hello Hannes,
> 
> This is how it looks before your patch:
> scsi_logging_level -s -E 7
> 
> [   33.737069] sd 0:0:0:0: [sda] tag#0 abort scheduled
> [   33.738812] sd 0:0:0:0: [sda] tag#3 abort scheduled
> [   33.749085] sd 0:0:0:0: [sda] tag#0 aborting command
> [   33.751393] sd 0:0:0:0: [sda] tag#0 cmd abort failed
> [   33.753541] sd 0:0:0:0: [sda] tag#3 aborting command
> [   33.755565] sd 0:0:0:0: [sda] tag#3 cmd abort failed
> [   33.763051] scsi host0: Waking error handler thread
> [   33.765727] scsi host0: scsi_eh_0: waking up 0/2/2
> [   33.768815] ata1.00: exception Emask 0x0 SAct 0x9 SErr 0x0 action 0x0
> [   33.772211] ata1.00: irq_stat 0x40000000
> [   33.774187] ata1.00: failed command: WRITE FPDMA QUEUED
> [   33.776962] ata1.00: cmd 61/00:00:00:00:0f/01:00:00:00:00/40 tag 0 ncq dma 131072 out
> [   33.776962]          res 43/04:00:00:00:00/00:00:00:00:00/00 Emask 0x400 (NCQ error) <F>
> [   33.783598] ata1.00: status: { DRDY SENSE ERR }
> [   33.785252] ata1.00: error: { ABRT }
> [   33.791290] ata1.00: configured for UDMA/100
> [   33.792195] sd 0:0:0:0: [sda] tag#0 scsi_eh_0: flush finish cmd
> [   33.793426] sd 0:0:0:0: [sda] tag#3 scsi_eh_0: flush finish cmd
> [   33.794653] ata1: EH complete
> 
> So we do get the scmd:s sent to ATA EH (strategy_handler).
> 
> In scmd_eh_abort_handler(), scsi_try_to_abort_cmd() will return FAILED
> since hostt->eh_abort_handler is not implemented for libata, so
> scmd_eh_abort_handler() will do goto out; which calls scsi_eh_scmd_add().
> 
> 
> This is how it looks after your patch:
> scsi_logging_level -s -E 7
> 
> [  223.417749] scsi host0: Waking error handler thread
> [  223.419782] scsi host0: scsi_eh_0: waking up 0/2/2
> [  223.423101] ata1.00: exception Emask 0x0 SAct 0x80002 SErr 0x0 action 0x0
> [  223.425362] ata1.00: irq_stat 0x40000008
> [  223.426778] ata1.00: failed command: WRITE FPDMA QUEUED
> [  223.428925] ata1.00: cmd 61/00:08:00:00:0f/01:00:00:00:00/40 tag 1 ncq dma 131072 out
> [  223.428925]          res 43/04:00:00:00:00/00:00:00:00:00/00 Emask 0x400 (NCQ error) <F>
> [  223.436077] ata1.00: status: { DRDY SENSE ERR }
> [  223.438015] ata1.00: error: { ABRT }
> [  223.441179] ata1.00: Security Log not supported
> [  223.445698] ata1.00: Security Log not supported
> [  223.448475] ata1.00: configured for UDMA/100
> [  223.449790] sd 0:0:0:0: [sda] tag#1 scsi_eh_0: flush finish cmd
> [  223.451063] sd 0:0:0:0: [sda] tag#19 scsi_eh_0: flush finish cmd
> [  223.452648] ata1: EH complete
> 
> So your patch looks good to me, like you said, it removes a
> a pointless queue_work().
> 
> Do we perhaps want to remove the !hostt->eh_abort_handler check
> from scmd_eh_abort_handler(), now when you've moved it earlier
> (to scsi_abort_command()) ? Perhaps we need to keep it, since
> the function used for checking this, scsi_try_to_abort_cmd()
> is used in other places as well.
> 
I'd keep it as it aligns with all the other 'scsi_try_to_XXX()' commands.

Cheers,

Hannes
diff mbox series

Patch

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 4cb914103382..383a208f5f99 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1736,6 +1736,26 @@  static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
 	if (xlat_func(qc))
 		goto early_finish;
 
+	/*
+	 * scsi_queue_rq() will defer commands when in state SHOST_RECOVERY.
+	 *
+	 * When getting an error interrupt, ata_port_abort() will be called,
+	 * which ends up calling ata_qc_schedule_eh() on all QCs.
+	 *
+	 * ata_qc_schedule_eh() will call ata_eh_set_pending() and then call
+	 * blk_abort_request() on the given QC. blk_abort_request() will
+	 * asynchronously end up calling scsi_eh_scmd_add(), which will set
+	 * the state to SHOST_RECOVERY and wake up SCSI EH.
+	 *
+	 * In order to avoid requests from being issued to the device from the
+	 * time ata_eh_set_pending() is called, to the time scsi_eh_scmd_add()
+	 * sets the state to SHOST_RECOVERY, we defer requests here as well.
+	 */
+	if (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) {
+		rc = ATA_DEFER_LINK;
+		goto defer;
+	}
+
 	if (ap->ops->qc_defer) {
 		if ((rc = ap->ops->qc_defer(qc)))
 			goto defer;