diff mbox series

libata: fix read log timeout value

Message ID 20211105073106.422623-1-damien.lemoal@opensource.wdc.com
State New
Headers show
Series libata: fix read log timeout value | expand

Commit Message

Damien Le Moal Nov. 5, 2021, 7:31 a.m. UTC
Some ATA drives are very slow to respond to READ_LOG_EXT and
READ_LOG_DMA_EXT commands issued from ata_dev_configure() when the
device is revalidated right after resuming a system or inserting the
ATA adapter driver (e.g. ahci). The default 5s timeout
(ATA_EH_CMD_DFL_TIMEOUT) used for these commands is too short, causing
errors during the device configuration. Ex:

...
ata9: SATA max UDMA/133 abar m524288@0x9d200000 port 0x9d200400 irq 209
ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata9.00: ATA-9: XXX  XXXXXXXXXXXXXXX, XXXXXXXX, max UDMA/133
ata9.00: qc timeout (cmd 0x2f)
ata9.00: Read log page 0x00 failed, Emask 0x4
ata9.00: Read log page 0x00 failed, Emask 0x40
ata9.00: NCQ Send/Recv Log not supported
ata9.00: Read log page 0x08 failed, Emask 0x40
ata9.00: 27344764928 sectors, multi 16: LBA48 NCQ (depth 32), AA
ata9.00: Read log page 0x00 failed, Emask 0x40
ata9.00: ATA Identify Device Log not supported
ata9.00: failed to set xfermode (err_mask=0x40)
ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata9.00: configured for UDMA/133
...

The timeoutout error causes a soft reset of the drive link, followed in
most cases by a successful revalidation as that give enough time to the
drive to become fully ready to quickly process the read log commands.
However, in some cases, this also fails resulting in the device being
dropped.

Fix this by using adding the ata_eh_revalidate_timeouts entries for the
READ_LOG_EXT and READ_LOG_DMA_EXT commands. This defines a timeout
increased to 15s, retriable one time.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
---
 drivers/ata/libata-eh.c | 8 ++++++++
 include/linux/libata.h  | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

Comments

Geert Uytterhoeven Nov. 8, 2021, 9:47 a.m. UTC | #1
Hi Damien,

On Fri, Nov 5, 2021 at 8:31 AM Damien Le Moal
<damien.lemoal@opensource.wdc.com> wrote:
> Some ATA drives are very slow to respond to READ_LOG_EXT and
> READ_LOG_DMA_EXT commands issued from ata_dev_configure() when the
> device is revalidated right after resuming a system or inserting the
> ATA adapter driver (e.g. ahci). The default 5s timeout
> (ATA_EH_CMD_DFL_TIMEOUT) used for these commands is too short, causing
> errors during the device configuration. Ex:
>
> ...
> ata9: SATA max UDMA/133 abar m524288@0x9d200000 port 0x9d200400 irq 209
> ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> ata9.00: ATA-9: XXX  XXXXXXXXXXXXXXX, XXXXXXXX, max UDMA/133
> ata9.00: qc timeout (cmd 0x2f)
> ata9.00: Read log page 0x00 failed, Emask 0x4
> ata9.00: Read log page 0x00 failed, Emask 0x40
> ata9.00: NCQ Send/Recv Log not supported
> ata9.00: Read log page 0x08 failed, Emask 0x40
> ata9.00: 27344764928 sectors, multi 16: LBA48 NCQ (depth 32), AA
> ata9.00: Read log page 0x00 failed, Emask 0x40
> ata9.00: ATA Identify Device Log not supported
> ata9.00: failed to set xfermode (err_mask=0x40)
> ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> ata9.00: configured for UDMA/133
> ...
>
> The timeoutout error causes a soft reset of the drive link, followed in

timeout

> most cases by a successful revalidation as that give enough time to the
> drive to become fully ready to quickly process the read log commands.
> However, in some cases, this also fails resulting in the device being
> dropped.
>
> Fix this by using adding the ata_eh_revalidate_timeouts entries for the
> READ_LOG_EXT and READ_LOG_DMA_EXT commands. This defines a timeout
> increased to 15s, retriable one time.
>
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

Thanks for your patch! Impact on system resume on Salvator-XS:

s2idle:

     ata1: link resume succeeded after 1 retries
     ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    -ata1.00: qc timeout (cmd 0x2f)
    -ata1.00: Read log page 0x00 failed, Emask 0x4
    -ata1.00: ATA Identify Device Log not supported
    -ata1.00: failed to set xfermode (err_mask=0x40)
    -ata1: link resume succeeded after 1 retries
    -ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
     ata1.00: ATA Identify Device Log not supported
     ata1.00: ATA Identify Device Log not supported
     ata1.00: configured for UDMA/133

s2ram:

     ata1: link resume succeeded after 1 retries
     ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    -ata1.00: qc timeout (cmd 0x2f)
    -ata1.00: Read log page 0x00 failed, Emask 0x4
    -ata1.00: ATA Identify Device Log not supported
    -ata1.00: failed to set xfermode (err_mask=0x40)
    -ata1: link resume succeeded after 1 retries
    -ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
    -ata1.00: NODEV after polling detection
    -ata1.00: revalidation failed (errno=-2)
    -ata1: link resume succeeded after 1 retries
    -ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
     ata1.00: ATA Identify Device Log not supported
     ata1.00: ATA Identify Device Log not supported
     ata1.00: configured for UDMA/133

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Damien Le Moal Nov. 8, 2021, 9:38 p.m. UTC | #2
On 2021/11/08 18:47, Geert Uytterhoeven wrote:
> Hi Damien,
> 
> On Fri, Nov 5, 2021 at 8:31 AM Damien Le Moal
> <damien.lemoal@opensource.wdc.com> wrote:
>> Some ATA drives are very slow to respond to READ_LOG_EXT and
>> READ_LOG_DMA_EXT commands issued from ata_dev_configure() when the
>> device is revalidated right after resuming a system or inserting the
>> ATA adapter driver (e.g. ahci). The default 5s timeout
>> (ATA_EH_CMD_DFL_TIMEOUT) used for these commands is too short, causing
>> errors during the device configuration. Ex:
>>
>> ...
>> ata9: SATA max UDMA/133 abar m524288@0x9d200000 port 0x9d200400 irq 209
>> ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>> ata9.00: ATA-9: XXX  XXXXXXXXXXXXXXX, XXXXXXXX, max UDMA/133
>> ata9.00: qc timeout (cmd 0x2f)
>> ata9.00: Read log page 0x00 failed, Emask 0x4
>> ata9.00: Read log page 0x00 failed, Emask 0x40
>> ata9.00: NCQ Send/Recv Log not supported
>> ata9.00: Read log page 0x08 failed, Emask 0x40
>> ata9.00: 27344764928 sectors, multi 16: LBA48 NCQ (depth 32), AA
>> ata9.00: Read log page 0x00 failed, Emask 0x40
>> ata9.00: ATA Identify Device Log not supported
>> ata9.00: failed to set xfermode (err_mask=0x40)
>> ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>> ata9.00: configured for UDMA/133
>> ...
>>
>> The timeoutout error causes a soft reset of the drive link, followed in
> 
> timeout
> 
>> most cases by a successful revalidation as that give enough time to the
>> drive to become fully ready to quickly process the read log commands.
>> However, in some cases, this also fails resulting in the device being
>> dropped.
>>
>> Fix this by using adding the ata_eh_revalidate_timeouts entries for the
>> READ_LOG_EXT and READ_LOG_DMA_EXT commands. This defines a timeout
>> increased to 15s, retriable one time.
>>
>> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> 
> Thanks for your patch! Impact on system resume on Salvator-XS:
> 
> s2idle:
> 
>      ata1: link resume succeeded after 1 retries
>      ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>     -ata1.00: qc timeout (cmd 0x2f)
>     -ata1.00: Read log page 0x00 failed, Emask 0x4
>     -ata1.00: ATA Identify Device Log not supported
>     -ata1.00: failed to set xfermode (err_mask=0x40)
>     -ata1: link resume succeeded after 1 retries
>     -ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>      ata1.00: ATA Identify Device Log not supported
>      ata1.00: ATA Identify Device Log not supported
>      ata1.00: configured for UDMA/133
> 
> s2ram:
> 
>      ata1: link resume succeeded after 1 retries
>      ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>     -ata1.00: qc timeout (cmd 0x2f)
>     -ata1.00: Read log page 0x00 failed, Emask 0x4
>     -ata1.00: ATA Identify Device Log not supported
>     -ata1.00: failed to set xfermode (err_mask=0x40)
>     -ata1: link resume succeeded after 1 retries
>     -ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>     -ata1.00: NODEV after polling detection
>     -ata1.00: revalidation failed (errno=-2)
>     -ata1: link resume succeeded after 1 retries
>     -ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>      ata1.00: ATA Identify Device Log not supported
>      ata1.00: ATA Identify Device Log not supported
>      ata1.00: configured for UDMA/133
> 
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Great ! Thanks for testing.

> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
>
diff mbox series

Patch

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index bf9c4b6c5c3d..1d4a6f1e88cd 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -93,6 +93,12 @@  static const unsigned long ata_eh_identify_timeouts[] = {
 	ULONG_MAX,
 };
 
+static const unsigned long ata_eh_revalidate_timeouts[] = {
+	15000,	/* Some drives are slow to read log pages when waking-up */
+	15000,  /* combined time till here is enough even for media access */
+	ULONG_MAX,
+};
+
 static const unsigned long ata_eh_flush_timeouts[] = {
 	15000,	/* be generous with flush */
 	15000,  /* ditto */
@@ -129,6 +135,8 @@  static const struct ata_eh_cmd_timeout_ent
 ata_eh_cmd_timeout_table[ATA_EH_CMD_TIMEOUT_TABLE_SIZE] = {
 	{ .commands = CMDS(ATA_CMD_ID_ATA, ATA_CMD_ID_ATAPI),
 	  .timeouts = ata_eh_identify_timeouts, },
+	{ .commands = CMDS(ATA_CMD_READ_LOG_EXT, ATA_CMD_READ_LOG_DMA_EXT),
+	  .timeouts = ata_eh_revalidate_timeouts, },
 	{ .commands = CMDS(ATA_CMD_READ_NATIVE_MAX, ATA_CMD_READ_NATIVE_MAX_EXT),
 	  .timeouts = ata_eh_other_timeouts, },
 	{ .commands = CMDS(ATA_CMD_SET_MAX, ATA_CMD_SET_MAX_EXT),
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 236ec689056a..0ab8d9594541 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -394,7 +394,7 @@  enum {
 	/* This should match the actual table size of
 	 * ata_eh_cmd_timeout_table in libata-eh.c.
 	 */
-	ATA_EH_CMD_TIMEOUT_TABLE_SIZE = 6,
+	ATA_EH_CMD_TIMEOUT_TABLE_SIZE = 7,
 
 	/* Horkage types. May be set by libata or controller on drives
 	   (some horkage may be drive/controller pair dependent */