diff mbox

libata: Allow SOFT_RESET for Sil3726

Message ID 1317881037-11831-1-git-send-email-gwendal@google.com
State Not Applicable
Delegated to: David Miller
Headers show

Commit Message

Gwendal Grignou Oct. 6, 2011, 6:03 a.m. UTC
Allow controllers to send SOFT_RESET to Sil3726 PMP.
This PMP does not accept frames until the drive connected to
its port spins up.
Some controller [Sil3132 family] can not wait for the drive to spinup
and fails the reset, leading to unnecessary speed downgrade.

Not allowing to send SOFT_RESET can lead some drive slow to spinup
to be ignored and produces weird error messages.

This fix allows the error handler to wait if the controller is unable
to send a SOFT_RESET.

Change-Id: I7eeea152facb4b76e5c69cfde5ef8188874fbaba

Signed-off-by: Gwendal Grignou <gwendal@google.com>
---
 drivers/ata/libata-eh.c  |   11 ++++++++++-
 drivers/ata/libata-pmp.c |   10 ++++------
 include/linux/libata.h   |    1 +
 3 files changed, 15 insertions(+), 7 deletions(-)

Comments

Tejun Heo Oct. 6, 2011, 8:03 a.m. UTC | #1
Hello, Gwendal.

Which tree is this patch against?

On Wed, Oct 05, 2011 at 11:03:57PM -0700, Gwendal Grignou wrote:
> Allow controllers to send SOFT_RESET to Sil3726 PMP.
> This PMP does not accept frames until the drive connected to
> its port spins up.

Do you mean until the device sets RDY by sending D2H Reg FIS?

> Some controller [Sil3132 family] can not wait for the drive to spinup
> and fails the reset, leading to unnecessary speed downgrade.
> Not allowing to send SOFT_RESET can lead some drive slow to spinup
> to be ignored and produces weird error messages.

Yeap, I agree this is nasty.

> @@ -2805,7 +2805,14 @@ int ata_eh_reset(struct ata_link *link, int classify,
>  	    sata_scr_read(link, SCR_STATUS, &sstatus))
>  		rc = -ERESTART;
>  
> -	if (rc == -ERESTART || try >= max_tries)
> +	if (try >= max_tries)
> +		goto out;
> +
> +	/* Some PMP will not serve SRST until the disk is spunup,
> +	 * if the controller can not wait for the PMP to acknowledge the frame,
> +	 * wait here */
> +	if (rc == -ERESTART &&
> +	    !((lflags & ATA_LFLAG_WAIT_SRST) && (reset == softreset)))
>  		goto out;
>  
>  	now = jiffies;
> @@ -2820,6 +2827,8 @@ int ata_eh_reset(struct ata_link *link, int classify,
>  			delta = schedule_timeout_uninterruptible(delta);
>  	}
>  
> +	if (rc == -ERESTART)
> +		goto out;

So, now libata waits for full reset period before proceeding to reset
PMP.  Hmmm... yeah, it makes sense.  Unfortunately, the only way to
achieve spinup wait in this case is waiting blindly and libata's reset
timeouts are configured to accomodate drive spinup times.  PMP SCR
failure kinda destroys those blind wait periods.

I'm not too sure about ATA_LFLAG_WAIT_SRST.  I don't think making the
new behavior default would hurt.

Can you please post before & after logs?

Thank you.
Sergei Shtylyov Oct. 6, 2011, 10:42 a.m. UTC | #2
Hello.

On 06-10-2011 10:03, Gwendal Grignou wrote:

> Allow controllers to send SOFT_RESET to Sil3726 PMP.
> This PMP does not accept frames until the drive connected to
> its port spins up.
> Some controller [Sil3132 family] can not wait for the drive to spinup
> and fails the reset, leading to unnecessary speed downgrade.

> Not allowing to send SOFT_RESET can lead some drive slow to spinup
> to be ignored and produces weird error messages.

> This fix allows the error handler to wait if the controller is unable
> to send a SOFT_RESET.

> Change-Id: I7eeea152facb4b76e5c69cfde5ef8188874fbaba

    Please get rid of this line, it has no place in the upstream commit.

> Signed-off-by: Gwendal Grignou<gwendal@google.com>
> ---
>   drivers/ata/libata-eh.c  |   11 ++++++++++-
>   drivers/ata/libata-pmp.c |   10 ++++------
>   include/linux/libata.h   |    1 +
>   3 files changed, 15 insertions(+), 7 deletions(-)

> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 49af350..60223c3 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -2805,7 +2805,14 @@ int ata_eh_reset(struct ata_link *link, int classify,
>   	    sata_scr_read(link, SCR_STATUS,&sstatus))
>   		rc = -ERESTART;
>
> -	if (rc == -ERESTART || try>= max_tries)
> +	if (try>= max_tries)
> +		goto out;
> +
> +	/* Some PMP will not serve SRST until the disk is spunup,
> +	 * if the controller can not wait for the PMP to acknowledge the frame,
> +	 * wait here */

    The preferred multi-line comment style:

/*
  * bla
  * bla
  */

> +	if (rc == -ERESTART&&
> +	    !((lflags&  ATA_LFLAG_WAIT_SRST)&&  (reset == softreset)))
>   		goto out;
>
>   	now = jiffies;
[...]
> diff --git a/drivers/ata/libata-pmp.c b/drivers/ata/libata-pmp.c
> index ad0e71d..5fbbe2f 100644
> --- a/drivers/ata/libata-pmp.c
> +++ b/drivers/ata/libata-pmp.c
> @@ -365,13 +365,11 @@ static void sata_pmp_quirks(struct ata_port *ap)
>   	if (vendor == 0x1095&&  devid == 0x3726) {
>   		/* sil3726 quirks */
>   		ata_for_each_link(link, ap, EDGE) {
> -			/* Class code report is unreliable and SRST
> -			 * times out under certain configurations.
> -			 */
> +			/* Class code report is unreliable */
> +			/* PMP does not forward SRST until the drive spins up */
>   			if (link->pmp < 5)
> -				link->flags |= ATA_LFLAG_NO_SRST |
> -					       ATA_LFLAG_ASSUME_ATA;
> -

    Why remove the empty line?

> +				link->flags |= ATA_LFLAG_ASSUME_ATA |
> +					       ATA_LFLAG_WAIT_SRST;
>   			/* port 5 is for SEMB device and it doesn't like SRST */
>   			if (link->pmp == 5)
>   				link->flags |= ATA_LFLAG_NO_SRST |

WBR, Sergei
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gwendal Grignou Oct. 6, 2011, 8:44 p.m. UTC | #3
On Thu, Oct 6, 2011 at 1:03 AM, Tejun Heo <htejun@gmail.com> wrote:
> Hello, Gwendal.
>
> Which tree is this patch against?
I am using 2.6.34. I try to have this mail follow the thread "RE:
Problem w/ hotplug on sata_sil24 w/ PMP (sil3726)", Derry started. It
did not work, sorry.
I will rebase the ata-dev branch soon - and clean up the patch
following Sergei comments.
>
> On Wed, Oct 05, 2011 at 11:03:57PM -0700, Gwendal Grignou wrote:
>> Allow controllers to send SOFT_RESET to Sil3726 PMP.
>> This PMP does not accept frames until the drive connected to
>> its port spins up.
>
> Do you mean until the device sets RDY by sending D2H Reg FIS?
Yes. Until the device sends the async D2H Reg FIS indicating the drive
spun up, the MPM does not accept the SoftRest FIS from the controller.
On most controller, that fine, the controller state machine keeps
retrying, but on Sil3132 it stops after a second or so and send an
error back to the driver.
>
>> Some controller [Sil3132 family] can not wait for the drive to spinup
>> and fails the reset, leading to unnecessary speed downgrade.
>> Not allowing to send SOFT_RESET can lead some drive slow to spinup
>> to be ignored and produces weird error messages.
>
> Yeap, I agree this is nasty.
>
>> @@ -2805,7 +2805,14 @@ int ata_eh_reset(struct ata_link *link, int classify,
>>           sata_scr_read(link, SCR_STATUS, &sstatus))
>>               rc = -ERESTART;
>>
>> -     if (rc == -ERESTART || try >= max_tries)
>> +     if (try >= max_tries)
>> +             goto out;
>> +
>> +     /* Some PMP will not serve SRST until the disk is spunup,
>> +      * if the controller can not wait for the PMP to acknowledge the frame,
>> +      * wait here */
>> +     if (rc == -ERESTART &&
>> +         !((lflags & ATA_LFLAG_WAIT_SRST) && (reset == softreset)))
>>               goto out;
>>
>>       now = jiffies;
>> @@ -2820,6 +2827,8 @@ int ata_eh_reset(struct ata_link *link, int classify,
>>                       delta = schedule_timeout_uninterruptible(delta);
>>       }
>>
>> +     if (rc == -ERESTART)
>> +             goto out;
>
> So, now libata waits for full reset period before proceeding to reset
> PMP.  Hmmm... yeah, it makes sense.  Unfortunately, the only way to
> achieve spinup wait in this case is waiting blindly and libata's reset
> timeouts are configured to accomodate drive spinup times.  PMP SCR
> failure kinda destroys those blind wait periods.
Yes, I totally agree this blind wait is not clean. Normally we would
wait until an event occurs [async FIS] and have the timeout just for
the error case.
Here we wait [10s] because we think the device is spinning up.
>
> I'm not too sure about ATA_LFLAG_WAIT_SRST.  I don't think making the
> new behavior default would hurt.
I see your point. But if there is no PMP, Sil3132 is behaving, there
is no need of this logic.
>
> Can you please post before & after logs?
There are 2 problem with the current solution:
- by not waiting for device spin up, we basically disabled staggered
spinup: we send hard reset to all port very fast.
That may put burden on enclosure with weak power supplies.
- as Derry found out, disk which are slow to spin up can be ignored by
the kernel.

From my experience:

Before:
Apr 11 13:29:23 cigg22 kernel: ata5.15: Port Multiplier 1.1,
0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
Apr 11 13:29:23 cigg22 kernel: ata5.00: hard resetting link
Apr 11 13:29:23 cigg22 kernel: ata5.00: SATA link up 3.0 Gbps (SStatus
123 SControl 320)
Apr 11 13:29:23 cigg22 kernel: ata5.01: hard resetting link
...
Apr 11 13:29:23 cigg22 kernel: ata5.04: SATA link up 3.0 Gbps (SStatus
123 SControl 300)
Apr 11 13:29:23 cigg22 kernel: ata5.05: hard resetting link
Apr 11 13:29:23 cigg22 kernel: ata5.05: SATA link up 1.5 Gbps (SStatus
113 SControl 320)
Apr 11 13:29:23 cigg22 kernel: ata5.00: failed to IDENTIFY (I/O error,
err_mask=0x11)
Apr 11 13:29:23 cigg22 kernel: ata5.15: hard resetting link
Apr 11 13:29:23 cigg22 kernel: ata5: controller in dubious state,
performing PORT_RST
Apr 11 13:29:23 cigg22 kernel: ata5.15: SATA link up 3.0 Gbps (SStatus
123 SControl 0)
Apr 11 13:29:23 cigg22 kernel: ata5.00: hard resetting link
Apr 11 13:29:23 cigg22 kernel: ata5.00: SATA link up 3.0 Gbps (SStatus
123 SControl 320)
...

We are hoping that by the time it takes to hard reset 5 ports, the
disks would have spun up.

After:
Sep 12 12:40:38 pnkv6 kern.info kernel: ata7: SATA link up 3.0 Gbps
(SStatus 123 SControl 0)
Sep 12 12:40:38 pnkv6 kern.info kernel: ata7.15: Port Multiplier 1.1,
0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
Sep 12 12:40:38 pnkv6 kern.info kernel: ata7.00: hard resetting link
Sep 12 12:40:38 pnkv6 kern.err kernel: ata7.00: softreset failed (SRST
command error)
Sep 12 12:40:38 pnkv6 kern.warn kernel: ata7.00: failed to read SCR 0
(Emask=0x40)
Sep 12 12:40:38 pnkv6 kern.warn kernel: ata7.00: reset failed
(errno=-85), retrying in 10 secs

<<< this allows the disk to spin up >>>

Sep 12 12:40:48 pnkv6 kern.err kernel: ata7.00: reset failed, giving up
Sep 12 12:40:48 pnkv6 kern.info kernel: ata7.15: hard resetting link
Sep 12 12:40:48 pnkv6 kern.warn kernel: ata7: controller in dubious
state, performing PORT_RST
Sep 12 12:40:50 pnkv6 kern.info kernel: ata7.15: SATA link up 3.0 Gbps
(SStatus 123 SControl 0)
Sep 12 12:40:50 pnkv6 kern.info kernel: ata7.00: hard resetting link
Sep 12 12:40:50 pnkv6 kern.info kernel: ata7.00: SATA link up 3.0 Gbps
(SStatus 123 SControl 320)
Sep 12 12:40:50 pnkv6 kern.info kernel: ata7.01: hard resetting link

>
> Thank you.
>
> --
> tejun
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo Oct. 6, 2011, 10:10 p.m. UTC | #4
Hello,

On Thu, Oct 06, 2011 at 01:44:27PM -0700, Gwendal Grignou wrote:
> Yes. Until the device sends the async D2H Reg FIS indicating the drive
> spun up, the MPM does not accept the SoftRest FIS from the controller.
> On most controller, that fine, the controller state machine keeps
> retrying, but on Sil3132 it stops after a second or so and send an
> error back to the driver.

I see.

> > So, now libata waits for full reset period before proceeding to reset
> > PMP.  Hmmm... yeah, it makes sense.  Unfortunately, the only way to
> > achieve spinup wait in this case is waiting blindly and libata's reset
> > timeouts are configured to accomodate drive spinup times.  PMP SCR
> > failure kinda destroys those blind wait periods.
>
> Yes, I totally agree this blind wait is not clean. Normally we would
> wait until an event occurs [async FIS] and have the timeout just for
> the error case.
> Here we wait [10s] because we think the device is spinning up.

Yes but that's exactly how the reset timeouts are set up.  They're
supposed to provide reasonable spinup timeouts when the proper wait
mechanisms can't do so and here it becomes a problem because the blind
timeouts are circumvented by SCR read failure handling.

> > I'm not too sure about ATA_LFLAG_WAIT_SRST.  I don't think making the
> > new behavior default would hurt.
>
> I see your point. But if there is no PMP, Sil3132 is behaving, there
> is no need of this logic.

Yes, sure, the behavior is necessary iff PMP is attached as that's
only time SCR read failure can occur anyway and I think it would
generally be a good idea to always enforce the blind timeouts if PMP
is attached, so no need for ATA_LFLAG_WAIT_SRST.

> > Can you please post before & after logs?
> There are 2 problem with the current solution:
> - by not waiting for device spin up, we basically disabled staggered
> spinup: we send hard reset to all port very fast.
> That may put burden on enclosure with weak power supplies.
> - as Derry found out, disk which are slow to spin up can be ignored by
> the kernel.

Looks pretty good to me.  Nice spotting.  Thanks a lot for tracking it
down and coming up with good solution. :)

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Lord Oct. 12, 2011, 2:03 p.m. UTC | #5
On 11-10-06 06:10 PM, Tejun Heo wrote:
> Hello,
> 
> On Thu, Oct 06, 2011 at 01:44:27PM -0700, Gwendal Grignou wrote:
>> Yes. Until the device sends the async D2H Reg FIS indicating the drive
>> spun up, the MPM does not accept the SoftRest FIS from the controller.
>> On most controller, that fine, the controller state machine keeps
>> retrying, but on Sil3132 it stops after a second or so and send an
>> error back to the driver.
> 
> I see.
> 
>>> So, now libata waits for full reset period before proceeding to reset
>>> PMP.  Hmmm... yeah, it makes sense.  Unfortunately, the only way to
>>> achieve spinup wait in this case is waiting blindly and libata's reset
>>> timeouts are configured to accomodate drive spinup times.  PMP SCR
>>> failure kinda destroys those blind wait periods.
>>
>> Yes, I totally agree this blind wait is not clean. Normally we would
>> wait until an event occurs [async FIS] and have the timeout just for
>> the error case.
>> Here we wait [10s] because we think the device is spinning up.
> 
> Yes but that's exactly how the reset timeouts are set up.  They're
> supposed to provide reasonable spinup timeouts when the proper wait
> mechanisms can't do so and here it becomes a problem because the blind
> timeouts are circumvented by SCR read failure handling.
> 
>>> I'm not too sure about ATA_LFLAG_WAIT_SRST.  I don't think making the
>>> new behavior default would hurt.
>>
>> I see your point. But if there is no PMP, Sil3132 is behaving, there
>> is no need of this logic.
> 
> Yes, sure, the behavior is necessary iff PMP is attached as that's
> only time SCR read failure can occur anyway and I think it would
> generally be a good idea to always enforce the blind timeouts if PMP
> is attached, so no need for ATA_LFLAG_WAIT_SRST.

I wonder if this same logic is more generally applicable for PMP
rather than sil3132 specific?  What I've read thus far in this thread
sounds very much like the issues I see here with PMPs not working
on JMB and Marvell controllers (where they USED to work fine).

Gwendal, have you got a version of that patch which applies the same
change globally rather than only for sil3132?  I'd like to try it out here.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gwendal Grignou Oct. 20, 2011, 12:02 a.m. UTC | #6
By issuing SRST,
. We re enable staggered spin up
. We allow slow disks to spin up while we are still in initial error handler

However, some controller fails sending SRST while the disk is spinning up
behind the port multiplier.
To avoid slow disk to be ignored, or link to be downgraded unnecessarily, wait 10s if
SRST fails before resetting the ATA port.

With a Sil3132.
Before the patch:
[  236.184036] ata4.15: hard resetting link
[  236.184038] ata4: controller in dubious state, performing PORT_RST
[  238.338048] ata4.15: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
[  239.549102] ata4.00: hard resetting link
[  239.854315] ata4.00: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  239.854344] ata4.01: hard resetting link
[  240.159316] ata4.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  240.159344] ata4.02: hard resetting link
[  240.464316] ata4.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  240.464343] ata4.03: hard resetting link
[  240.769314] ata4.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  240.769342] ata4.04: hard resetting link
[  241.074316] ata4.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  241.074344] ata4.05: hard resetting link
[  241.379314] ata4.05: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  241.382672] ata4.00: configured for UDMA/100 
[  241.492027] ata4.01: failed to IDENTIFY (I/O error, err_mask=0x11)
[  241.492030] ata4.01: revalidation failed (errno=-5)
[  241.492033] ata4.15: hard resetting link

After the patch:
[   19.436422] ata4.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
[   19.454715] ata4.00: hard resetting link
[   19.880032] ata4.00: softreset failed (SRST command error)
[   19.880048] ata4.00: failed to read SCR 0 (Emask=0x40)
[   19.880051] ata4.00: reset failed (errno=-85), retrying in 10 secs 
[   29.454092] ata4.00: reset failed, giving up
[   29.454099] ata4.15: hard resetting link
[   29.454102] ata4: controller in dubious state, performing PORT_RST 
[   31.608046] ata4.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   31.608431] ata4.00: hard resetting link
[   31.924284] ata4.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   31.924335] ata4.01: hard resetting link 
[   32.350029] ata4.01: softreset failed (SRST command error)
[   32.460032] ata4.01: failed to read SCR 0 (Emask=0x1)
[   32.460036] ata4.01: reset failed (errno=-85), retrying in 10 secs 
[   41.924139] ata4.01: reset failed, giving up 
[   41.924146] ata4.15: hard resetting link
[   41.924148] ata4: controller in dubious state, performing PORT_RST
[   44.078047] ata4.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   44.078384] ata4.00: hard resetting link

Patch also tested with Marvel 7042 and Sil3726.

Gwendal Grignou (1):
  [libata]Issue SRST to Sil3726 PMP

 drivers/ata/libata-eh.c  |   12 +++++++++++-
 drivers/ata/libata-pmp.c |    7 ++-----
 2 files changed, 13 insertions(+), 6 deletions(-)
Gwendal Grignou Oct. 20, 2011, 12:17 a.m. UTC | #7
By issuing SRST,
. We re enable staggered spin up
. We allow slow disks to spin up while we are still in initial error handler

However, some controller fails sending SRST while the disk is spinning up
behind the port multiplier.
To avoid slow disk to be ignored, or link to be downgraded unnecessarily, wait 10s if
SRST fails before resetting the ATA port.

With a Sil3132.
Before the patch:
[  236.184036] ata4.15: hard resetting link
[  236.184038] ata4: controller in dubious state, performing PORT_RST
[  238.338048] ata4.15: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
[  239.549102] ata4.00: hard resetting link
[  239.854315] ata4.00: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  239.854344] ata4.01: hard resetting link
[  240.159316] ata4.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  240.159344] ata4.02: hard resetting link
[  240.464316] ata4.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  240.464343] ata4.03: hard resetting link
[  240.769314] ata4.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  240.769342] ata4.04: hard resetting link
[  241.074316] ata4.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  241.074344] ata4.05: hard resetting link
[  241.379314] ata4.05: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  241.382672] ata4.00: configured for UDMA/100 
[  241.492027] ata4.01: failed to IDENTIFY (I/O error, err_mask=0x11)
[  241.492030] ata4.01: revalidation failed (errno=-5)
[  241.492033] ata4.15: hard resetting link

After the patch:
[   19.436422] ata4.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
[   19.454715] ata4.00: hard resetting link
[   19.880032] ata4.00: softreset failed (SRST command error)
[   19.880048] ata4.00: failed to read SCR 0 (Emask=0x40)
[   19.880051] ata4.00: reset failed (errno=-85), retrying in 10 secs 
[   29.454092] ata4.00: reset failed, giving up
[   29.454099] ata4.15: hard resetting link
[   29.454102] ata4: controller in dubious state, performing PORT_RST 
[   31.608046] ata4.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   31.608431] ata4.00: hard resetting link
[   31.924284] ata4.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   31.924335] ata4.01: hard resetting link 
[   32.350029] ata4.01: softreset failed (SRST command error)
[   32.460032] ata4.01: failed to read SCR 0 (Emask=0x1)
[   32.460036] ata4.01: reset failed (errno=-85), retrying in 10 secs 
[   41.924139] ata4.01: reset failed, giving up 
[   41.924146] ata4.15: hard resetting link
[   41.924148] ata4: controller in dubious state, performing PORT_RST
[   44.078047] ata4.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   44.078384] ata4.00: hard resetting link

Patch also tested with Marvel 7042 and Sil3726.

Signed off version.

Gwendal Grignou (1):
  [libata]Issue SRST to Sil3726 PMP

 drivers/ata/libata-eh.c  |   12 +++++++++++-
 drivers/ata/libata-pmp.c |    7 ++-----
 2 files changed, 13 insertions(+), 6 deletions(-)
Gwendal Grignou Oct. 20, 2011, 12:35 a.m. UTC | #8
By issuing SRST,
. We re enable staggered spin up
. We allow slow disks to spin up while we are still in initial error handler

However, some controller fails sending SRST while the disk is spinning up
behind the port multiplier.
To avoid slow disk to be ignored, or link to be downgraded unnecessarily,
wait for the reset deadline if SRST fails before resetting the ATA port.

With a Sil3132.
Before the patch:
[  236.184036] ata4.15: hard resetting link
[  236.184038] ata4: controller in dubious state, performing PORT_RST
[  238.338048] ata4.15: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
[  239.549102] ata4.00: hard resetting link
[  239.854315] ata4.00: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  239.854344] ata4.01: hard resetting link
[  240.159316] ata4.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  240.159344] ata4.02: hard resetting link
[  240.464316] ata4.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  240.464343] ata4.03: hard resetting link
[  240.769314] ata4.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  240.769342] ata4.04: hard resetting link
[  241.074316] ata4.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  241.074344] ata4.05: hard resetting link
[  241.379314] ata4.05: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  241.382672] ata4.00: configured for UDMA/100 
[  241.492027] ata4.01: failed to IDENTIFY (I/O error, err_mask=0x11)
[  241.492030] ata4.01: revalidation failed (errno=-5)
[  241.492033] ata4.15: hard resetting link

After the patch:
[   19.436422] ata4.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
[   19.454715] ata4.00: hard resetting link
[   19.880032] ata4.00: softreset failed (SRST command error)
[   19.880048] ata4.00: failed to read SCR 0 (Emask=0x40)
[   19.880051] ata4.00: reset failed (errno=-85), retrying in 10 secs 
[   29.454092] ata4.00: reset failed, giving up
[   29.454099] ata4.15: hard resetting link
[   29.454102] ata4: controller in dubious state, performing PORT_RST 
[   31.608046] ata4.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   31.608431] ata4.00: hard resetting link
[   31.924284] ata4.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   31.924335] ata4.01: hard resetting link 
[   32.350029] ata4.01: softreset failed (SRST command error)
[   32.460032] ata4.01: failed to read SCR 0 (Emask=0x1)
[   32.460036] ata4.01: reset failed (errno=-85), retrying in 10 secs 
[   41.924139] ata4.01: reset failed, giving up 
[   41.924146] ata4.15: hard resetting link
[   41.924148] ata4: controller in dubious state, performing PORT_RST
[   44.078047] ata4.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   44.078384] ata4.00: hard resetting link

Patch also tested with Marvel 7042 and Sil3726.

Gwendal Grignou (1):
  [libata]Issue SRST to Sil3726 PMP

 drivers/ata/libata-eh.c  |   12 +++++++++++-
 drivers/ata/libata-pmp.c |    7 ++-----
 2 files changed, 13 insertions(+), 6 deletions(-)
diff mbox

Patch

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 49af350..60223c3 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2805,7 +2805,14 @@  int ata_eh_reset(struct ata_link *link, int classify,
 	    sata_scr_read(link, SCR_STATUS, &sstatus))
 		rc = -ERESTART;
 
-	if (rc == -ERESTART || try >= max_tries)
+	if (try >= max_tries)
+		goto out;
+
+	/* Some PMP will not serve SRST until the disk is spunup,
+	 * if the controller can not wait for the PMP to acknowledge the frame,
+	 * wait here */
+	if (rc == -ERESTART &&
+	    !((lflags & ATA_LFLAG_WAIT_SRST) && (reset == softreset)))
 		goto out;
 
 	now = jiffies;
@@ -2820,6 +2827,8 @@  int ata_eh_reset(struct ata_link *link, int classify,
 			delta = schedule_timeout_uninterruptible(delta);
 	}
 
+	if (rc == -ERESTART)
+		goto out;
 	if (try == max_tries - 1) {
 		sata_down_spd_limit(link, 0);
 		if (slave)
diff --git a/drivers/ata/libata-pmp.c b/drivers/ata/libata-pmp.c
index ad0e71d..5fbbe2f 100644
--- a/drivers/ata/libata-pmp.c
+++ b/drivers/ata/libata-pmp.c
@@ -365,13 +365,11 @@  static void sata_pmp_quirks(struct ata_port *ap)
 	if (vendor == 0x1095 && devid == 0x3726) {
 		/* sil3726 quirks */
 		ata_for_each_link(link, ap, EDGE) {
-			/* Class code report is unreliable and SRST
-			 * times out under certain configurations.
-			 */
+			/* Class code report is unreliable */
+			/* PMP does not forward SRST until the drive spins up */
 			if (link->pmp < 5)
-				link->flags |= ATA_LFLAG_NO_SRST |
-					       ATA_LFLAG_ASSUME_ATA;
-
+				link->flags |= ATA_LFLAG_ASSUME_ATA |
+					       ATA_LFLAG_WAIT_SRST;
 			/* port 5 is for SEMB device and it doesn't like SRST */
 			if (link->pmp == 5)
 				link->flags |= ATA_LFLAG_NO_SRST |
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 14b9f50..4577ed2 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -173,6 +173,7 @@  enum {
 	ATA_LFLAG_DISABLED	= (1 << 6), /* link is disabled */
 	ATA_LFLAG_SW_ACTIVITY	= (1 << 7), /* keep activity stats */
 	ATA_LFLAG_PHYOFF        = (1 << 8), /* phy is powered off */
+	ATA_LFLAG_WAIT_SRST	= (1 << 9), /* add delay when SRST fails */
 
 	/* struct ata_port flags */
 	ATA_FLAG_SLAVE_POSS	= (1 << 0), /* host supports slave dev */