diff mbox series

[1/1] libata: only wake a drive once on system resume

Message ID 20231230182128.296675-2-phill@thesusis.net
State New
Headers show
Series Only activate drive once during system resume | expand

Commit Message

Phillip Susi Dec. 30, 2023, 6:21 p.m. UTC
In the event that more than one pass of EH is needed during system resume,
only request the drive be started once.
---
 drivers/ata/libata-core.c | 4 ++--
 drivers/ata/libata-eh.c   | 4 ----
 2 files changed, 2 insertions(+), 6 deletions(-)

Comments

Sergey Shtylyov Dec. 30, 2023, 7:42 p.m. UTC | #1
On 12/30/23 9:21 PM, Phillip Susi wrote:

> In the event that more than one pass of EH is needed during system resume,
> only request the drive be started once.
> ---
>  drivers/ata/libata-core.c | 4 ++--
>  drivers/ata/libata-eh.c   | 4 ----
>  2 files changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index d1389ce6943e..ca3ca8107a3e 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5223,7 +5223,7 @@ static void ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
>  	/* on system resume, don't wake PuiS drives */
>  	if (mesg.event == PM_EVENT_RESUME)
>  		ehi_flags |= ATA_EHI_NOSTART;
> -	
> +

   Unrelated whitespace change? :-)

[...]

MBR, Sergey
Damien Le Moal Jan. 2, 2024, 11:17 p.m. UTC | #2
On 12/31/23 03:21, Phillip Susi wrote:
> In the event that more than one pass of EH is needed during system resume,
> only request the drive be started once.

This is not an improvement... What if the verify command that wakes-up the drive
fails to be issued, or EH does not reach the call to ata_dev_power_set_active()
on the first run ? You would want to retry it but your patch will prevent that.

I do not really see any fundamental issue here given that calling
ata_dev_power_set_active() is indeed useless if the drive is already active, but
that does not hurt either. The only overhead is issuing a check power mode
command (see the call to ata_dev_power_is_active() in ata_dev_power_set_active()).

Are you seeing different behavior with your system ? Any error ?

> ---
>  drivers/ata/libata-core.c | 4 ++--
>  drivers/ata/libata-eh.c   | 4 ----
>  2 files changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index d1389ce6943e..ca3ca8107a3e 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -5223,7 +5223,7 @@ static void ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
>  	/* on system resume, don't wake PuiS drives */
>  	if (mesg.event == PM_EVENT_RESUME)
>  		ehi_flags |= ATA_EHI_NOSTART;
> -	
> +
>  	/* Request PM operation to EH */
>  	ap->pm_mesg = mesg;
>  	ap->pflags |= ATA_PFLAG_PM_PENDING;
> @@ -5297,7 +5297,7 @@ static int ata_port_pm_poweroff(struct device *dev)
>  static void ata_port_resume(struct ata_port *ap, pm_message_t mesg,
>  			    bool async)
>  {
> -	ata_port_request_pm(ap, mesg, ATA_EH_RESET,
> +	ata_port_request_pm(ap, mesg, ATA_EH_RESET | ATA_EH_SET_ACTIVE,
>  			    ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET,
>  			    async);
>  }
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 120f7d7fb450..f44ce7068a8b 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -711,10 +711,6 @@ void ata_scsi_port_error_handler(struct Scsi_Host *host, struct ata_port *ap)
>  			ehc->saved_xfer_mode[devno] = dev->xfer_mode;
>  			if (ata_ncq_enabled(dev))
>  				ehc->saved_ncq_enabled |= 1 << devno;
> -
> -			/* If we are resuming, wake up the device */
> -			if (ap->pflags & ATA_PFLAG_RESUMING)
> -				ehc->i.dev_action[devno] |= ATA_EH_SET_ACTIVE;
>  		}
>  	}
>
Phillip Susi Jan. 3, 2024, 9 p.m. UTC | #3
Damien Le Moal <dlemoal@kernel.org> writes:

> This is not an improvement... What if the verify command that wakes-up the drive
> fails to be issued, or EH does not reach the call to ata_dev_power_set_active()
> on the first run ? You would want to retry it but your patch will prevent that.

Perhaps if it fails, then it should set the flag to request it be tried
again?  As long as it succeeds, then there's no need to do it again?

> I do not really see any fundamental issue here given that calling
> ata_dev_power_set_active() is indeed useless if the drive is already active, but
> that does not hurt either. The only overhead is issuing a check power mode
> command (see the call to ata_dev_power_is_active() in ata_dev_power_set_active()).
>
> Are you seeing different behavior with your system ? Any error ?

My main issue with it was that it caused errors with my PuiS patch.  I
tried canceling the SET_ACTIVE flag when PuiS was detected, but then the
flag got turned back on for the second pass of EH, but not the flag for
revalidate_and_attach, so it didn't detect the PuiS and clear the
SET_ACTIVE flag the second time.  Since the SET FEATURES command was not
issued, the VERIFY command failed, and after the 5th attempt, eh gave
up.  Without PuiS, it would also be nice to not waste time with a second
VERIFY command.
Damien Le Moal Jan. 4, 2024, 1:21 a.m. UTC | #4
On 1/4/24 06:00, Phillip Susi wrote:
> Damien Le Moal <dlemoal@kernel.org> writes:
> 
>> This is not an improvement... What if the verify command that wakes-up the drive
>> fails to be issued, or EH does not reach the call to ata_dev_power_set_active()
>> on the first run ? You would want to retry it but your patch will prevent that.
> 
> Perhaps if it fails, then it should set the flag to request it be tried
> again?  As long as it succeeds, then there's no need to do it again?

Correct, if it succeeds, no need to do it again. The problem with clearing the
flag though is that ATA_EH_SET_ACTIVE is for the device and that is set only if
ATA_PFLAG_RESUMING is set, but that one is for the port. So if resume for the
device succeeds, you can clear the ATA_PFLAG_RESUMING flag *only* if there is
only a single link/device for that port. If not, other devices on the port may
need a retry so we cannot clear ATA_PFLAG_RESUMING.

> 
>> I do not really see any fundamental issue here given that calling
>> ata_dev_power_set_active() is indeed useless if the drive is already active, but
>> that does not hurt either. The only overhead is issuing a check power mode
>> command (see the call to ata_dev_power_is_active() in ata_dev_power_set_active()).
>>
>> Are you seeing different behavior with your system ? Any error ?
> 
> My main issue with it was that it caused errors with my PuiS patch.  I

Ah, now I think I understand: is it your patch that prevents resuming a drive if
it has PUIS on ? If yes, then sure, the verify command to spin-up the drive will
indeed fail immediately if the drive is in standby from PUIS, since getting out
of standby state driven by PUIS requires a set features spinup. So... with your
patch, things will not work.

> tried canceling the SET_ACTIVE flag when PuiS was detected, but then the
> flag got turned back on for the second pass of EH, but not the flag for
> revalidate_and_attach, so it didn't detect the PuiS and clear the
> SET_ACTIVE flag the second time.  Since the SET FEATURES command was not
> issued, the VERIFY command failed, and after the 5th attempt, eh gave
> up.  Without PuiS, it would also be nice to not waste time with a second
> VERIFY command.

If PUIS is not enabled, the only thing wasted is a check power mode command. If
the drive confirms that it is active, verify command is not issued.

With PUIS enabled though, if you leave the drive in standby mode, when/how do
you actually wake it up ?

Scratching my head about this, I think that doing this cleanly should be
possible if:
1) The dive gives complete identify data when that command is issued with the
drive in standby state driven by PUIS.
2) The call to ata_dev_configure() executed by ata EH started from system resume
does not spinup the device if requested not to (libata module & sysfs parameter
can do that). But I think this requires that the drive be instead put into a
state equivalent to *runtime* suspend, that is, with the scsi disk associated
with the device must also be in runtime suspend state.

With that, it would only be a matter of adding a device flag to remember that
the drive is in "PUIS stnadby" instead of regular standby, and then have
ata_dev_power_set_active() use a set feature spinup command instead of a verify.
Drive spinup will then be cleanly driven by accesses to the scsi disk, similarly
to a regular runtime suspend.

With such changes, everything would be cleaner and safer and all work as
expected. The exception will be drives that do not give complete identify data
when PUIS is on. For these, it is too risky to not wake them up to get the full
information first.

Do you want to try to tackle this ? If you do not feel like it, I can give it a
try too.
Phillip Susi Jan. 4, 2024, 2:05 p.m. UTC | #5
Damien Le Moal <dlemoal@kernel.org> writes:

> Correct, if it succeeds, no need to do it again. The problem with clearing the
> flag though is that ATA_EH_SET_ACTIVE is for the device and that is set only if
> ATA_PFLAG_RESUMING is set, but that one is for the port. So if resume for the
> device succeeds, you can clear the ATA_PFLAG_RESUMING flag *only* if there is
> only a single link/device for that port. If not, other devices on the port may
> need a retry so we cannot clear ATA_PFLAG_RESUMING.

Rather than clear ATA_PFLAG_RESUMING, I was thinking of keeping my
previus change to specify ATA_EH_SET_ACTIVE in the request_pm path
rather than by setting it based on ATA_PFLAG_RESUMING, but just adding a
check to see if the VERIFY fails, and if so, set ATA_EH_SET_ACTIVE again.

> Ah, now I think I understand: is it your patch that prevents resuming a drive if
> it has PUIS on ? If yes, then sure, the verify command to spin-up the drive will
> indeed fail immediately if the drive is in standby from PUIS, since getting out
> of standby state driven by PUIS requires a set features spinup. So... with your
> patch, things will not work.

Right... unless you also apply this patch to make sure that
ATA_EH_SET_ACTIVE isn't turned on again after it is cleared when PuiS is detected.

> If PUIS is not enabled, the only thing wasted is a check power mode command. If
> the drive confirms that it is active, verify command is not issued.

Ahh, right... I forgot you did add a CHECK POWER MODE first.

> With PUIS enabled though, if you leave the drive in standby mode, when/how do
> you actually wake it up ?

Initially I just set the ATA_DFLAG_SLEEPING flag as if the drive had been
put into SLEEP mode, which triggers EH to wake it up when the drive is
accessed.  I have since switched to actually putting the drive to SLEEP
mode when PuiS is detected since it will save a little more power than
leaving it in STANDBY.

> Scratching my head about this, I think that doing this cleanly should be
> possible if:
> 1) The dive gives complete identify data when that command is issued with the
> drive in standby state driven by PUIS.

I think you can do it even if the IDENTIFY data is incomplete, as long
as a revalidate_and_attach is done eventually, when waking the drive up.

> 2) The call to ata_dev_configure() executed by ata EH started from system resume
> does not spinup the device if requested not to (libata module & sysfs parameter
> can do that). But I think this requires that the drive be instead put into a
> state equivalent to *runtime* suspend, that is, with the scsi disk associated
> with the device must also be in runtime suspend state.

Yea, I was trying to make it work with runtime suspend before, but you
indicated that the current device hierarchy seems to make that
impossible, so I put that aside for now.  Currently runtime pm thinks
the drive is running even though it isn't, but that isn't any different
than when you hdparm -y or hdparm -Y, or hdparm -S and let the drive
decide to auto standby.  Eventually I'll try to get the runtime pm
sorted but I figured I'd try to get the basic concept working first.

> With that, it would only be a matter of adding a device flag to remember that
> the drive is in "PUIS stnadby" instead of regular standby, and then have
> ata_dev_power_set_active() use a set feature spinup command instead of a verify.
> Drive spinup will then be cleanly driven by accesses to the scsi disk, similarly
> to a regular runtime suspend.

Right now I'm using the SLEEP flag to do this, and when the disk is
accessed, it triggers a round of EH that does the revalidate_and_attach
and in the process, issues the SET FEATURES command to wake the drive.

> With such changes, everything would be cleaner and safer and all work as
> expected. The exception will be drives that do not give complete identify data
> when PUIS is on. For these, it is too risky to not wake them up to get the full
> information first.
>
> Do you want to try to tackle this ? If you do not feel like it, I can give it a
> try too.

I've already got it working ;)

I think I sent you an earlier version of the patch a few weeks ago.
I'll post my whole series tonight, after I fix the case of retrying the
VERIFY if it fails.
diff mbox series

Patch

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index d1389ce6943e..ca3ca8107a3e 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5223,7 +5223,7 @@  static void ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
 	/* on system resume, don't wake PuiS drives */
 	if (mesg.event == PM_EVENT_RESUME)
 		ehi_flags |= ATA_EHI_NOSTART;
-	
+
 	/* Request PM operation to EH */
 	ap->pm_mesg = mesg;
 	ap->pflags |= ATA_PFLAG_PM_PENDING;
@@ -5297,7 +5297,7 @@  static int ata_port_pm_poweroff(struct device *dev)
 static void ata_port_resume(struct ata_port *ap, pm_message_t mesg,
 			    bool async)
 {
-	ata_port_request_pm(ap, mesg, ATA_EH_RESET,
+	ata_port_request_pm(ap, mesg, ATA_EH_RESET | ATA_EH_SET_ACTIVE,
 			    ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET,
 			    async);
 }
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 120f7d7fb450..f44ce7068a8b 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -711,10 +711,6 @@  void ata_scsi_port_error_handler(struct Scsi_Host *host, struct ata_port *ap)
 			ehc->saved_xfer_mode[devno] = dev->xfer_mode;
 			if (ata_ncq_enabled(dev))
 				ehc->saved_ncq_enabled |= 1 << devno;
-
-			/* If we are resuming, wake up the device */
-			if (ap->pflags & ATA_PFLAG_RESUMING)
-				ehc->i.dev_action[devno] |= ATA_EH_SET_ACTIVE;
 		}
 	}