diff mbox

Problem w/ hotplug on sata_sil24 w/ PMP (sil3726)

Message ID CAMHSBOWbaLjbWZ7PNwTa-AsZRAzrOGLgSip=q2=u+qVQW4u3mQ@mail.gmail.com
State Not Applicable
Delegated to: David Miller
Headers show

Commit Message

Gwendal Grignou Oct. 6, 2011, 5:48 a.m. UTC
I think I know what is going on. One of your disks at least is slow to
spinup. Due to a bug/feature in silicon image disk controller and pmp,
at bring up we can not issue a SOFT_RESET and wait for the disk to
spinup and then continue.
That why we set ATA_LFLAG_NO_SRST in sata_pmp_quirks().
So what happen is we go into a function that issue identify, but we
fail, the disk is not ready [it is spinning up], so we retry.
3 times.

From the first hard reset: 12888.470385, to the time you got the final
error: 12901.397305 ~ 12.9s
In the second case, your controller can send SOFT_RESET and wait for
the device to respond.
Time for the disk to spinup:
28010.630028 - 27997.097116 ~ 13.5s
As you can see, you are borderline with the PMP, but the controller
did not "wait" enough in the first case.
Given the spinup time varies with drive, age, time since last
spin-up..., it may work one day and fail the next.
To work around the problem, I have a patch that consist of allowing
the silicon image control to send a reset, but if it fails, we spin
for a fixed amount of time and retry. This is not very nice, it is a
better design to wait for event that waiting a fixed amount of time.
You may have to alter ATA_LFLAG_WAIT_SRST to use the first bit available.

Can you try with the following patch?

Thanks,
Gwendal.



On Fri, Sep 30, 2011 at 2:54 PM, Mike I <mihrcke@gmail.com> wrote:
>
> tj <at> kernel.org <tj <at> kernel.org> writes:
>
> >
> > Hello,
> >
> > How did that go?
> >
> > Thanks.
> >
>
> Like Derry who started this thread, I too had seen an old thread from
> October/November 2008 with what appeared to be no resolution to this problem.
> Now, finding this thread, again, with no apparent resolution to this problem.
>
> I'm currently running Ubuntu 10.04 (lucid), kernel 2.6.32-33-generic.  I've no
> experience with applying these git patches, and my searching to figure out how
> it works have not helped.
>
> I'm using an Addonics eSATA PCI-X controller with the SiI3124 chipset, and I
> have an Addonics PM in an external enclosure, with a 5 bay/tray DAS.  Some of
> my drives give me this problem: (this occurs for me with pretty much ALL
> Samsung hard drives)
> [12888.470308] ata9.01: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xf
> [12888.470313] ata9.01: SError: { PHYRdyChg CommWake DevExch }
> [12888.470385] ata9.01: hard resetting link
> [12889.211597] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
> [12889.211686] ata9.01: failed to IDENTIFY (I/O error, err_mask=0x11)
> [12889.211692] ata9.15: hard resetting link
> [12891.430086] ata9.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> [12891.430397] ata9.00: hard resetting link
> [12891.780786] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
> [12894.211103] ata9.01: hard resetting link
> [12894.560424] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12894.560466] ata9.02: hard resetting link
> [12894.914176] ata9.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12894.914222] ata9.03: hard resetting link
> [12895.264141] ata9.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12895.264169] ata9.04: hard resetting link
> [12895.612930] ata9.04: SATA link down (SStatus 0 SControl 320)
> [12895.613007] ata9.05: hard resetting link
> [12895.964143] ata9.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
> [12896.065908] ata9.00: configured for UDMA/100
> [12896.065970] ata9.01: failed to IDENTIFY (I/O error, err_mask=0x11)
> [12896.065977] ata9.15: hard resetting link
> [12898.283804] ata9.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> [12898.284128] ata9.00: hard resetting link
> [12898.634174] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
> [12899.562524] ata9.01: hard resetting link
> [12899.914147] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12899.914180] ata9.02: hard resetting link
> [12900.261682] ata9.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12900.261724] ata9.03: hard resetting link
> [12900.610413] ata9.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12900.961283] ata9.05: hard resetting link
> [12901.310385] ata9.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
> [12901.397241] ata9.00: configured for UDMA/100
> [12901.397300] ata9.01: failed to IDENTIFY (I/O error, err_mask=0x11)
> [12901.397305] ata9.01: failed to recover link after 3 tries, disabling
> [12901.397311] ata9.15: hard resetting link
> [12903.613694] ata9.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> [12903.960564] ata9.00: hard resetting link
> [12904.311125] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
> [12905.260154] ata9.02: hard resetting link
> [12905.602929] ata9.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12905.611319] ata9.03: hard resetting link
> [12905.962555] ata9.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [12905.962592] ata9.04: hard resetting link
> [12906.312931] ata9.04: SATA link down (SStatus 0 SControl 320)
> [12906.313004] ata9.05: hard resetting link
> [12906.660409] ata9.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
> [12906.753619] ata9.00: configured for UDMA/100
> [12906.766586] ata9.02: configured for UDMA/100
> [12906.771917] ata9.03: configured for UDMA/100
> [12907.121462] ata9: EH complete
>
> If I hot plug the same drive using a port directly off my mobo(no PM in the
> mix), I get this result(drive connects/mounts/works):
> [27997.097104] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action
> 0xe frozen
> [27997.097108] ata5: irq_stat 0x00400040, connection status changed
> [27997.097111] ata5: SError: { PHYRdyChg CommWake DevExch }
> [27997.097116] ata5: hard resetting link
> [28007.147622] ata5: softreset failed (device not ready)
> [28007.147627] ata5: hard resetting link
> [28010.630028] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [28010.748595] ata5.00: ATA-7: SAMSUNG HD154UI, 1AG01118, max UDMA7
> [28010.748599] ata5.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> [28010.755227] ata5.00: configured for UDMA/133
> [28010.755237] ata5: EH complete
> [28010.756338] scsi 4:0:0:0: Direct-Access     ATA      SAMSUNG HD154UI  1AG0
> PQ: 0 ANSI: 5
> [28010.756475] sd 4:0:0:0: Attached scsi generic sg10 type 0
> [28010.756572] sd 4:0:0:0: [sdj] 2930277168 512-byte logical blocks: (1.50
> TB/1.36 TiB)
> [28010.756613] sd 4:0:0:0: [sdj] Write Protect is off
> [28010.756616] sd 4:0:0:0: [sdj] Mode Sense: 00 3a 00 00
> [28010.756636] sd 4:0:0:0: [sdj] Write cache: enabled, read cache: enabled,
> doesn't support DPO or FUA
> [28010.756760]  sdj: sdj1
> [28010.816161] sd 4:0:0:0: [sdj] Attached SCSI disk
>
> I've been using Ubuntu for a few years now, and have been living with the
> problem...working around it with USB docking stations and such.  But, I'd
> really hope to see/find this problem worked out.
>
> Thoughts/tips/suggestions?  Since I'm pretty much a novice when it comes to
> patching, a link to a guide for git patching would be appreciated too.
>
> Thank You,
> Mike
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Gwendal Grignou Oct. 6, 2011, 6:07 a.m. UTC | #1
Forgot to remove make it plain text, sorry for the spam.

Gwendal.

On Wed, Oct 5, 2011 at 11:05 PM, Gwendal Grignou <gwendal@google.com> wrote:
>
> Forgot gmail is not great to send patches, used git send-email instead.
> Gwendal.
>
> On Wed, Oct 5, 2011 at 10:48 PM, Gwendal Grignou <gwendal@google.com> wrote:
>>
>> I think I know what is going on. One of your disks at least is slow to
>> spinup. Due to a bug/feature in silicon image disk controller and pmp,
>> at bring up we can not issue a SOFT_RESET and wait for the disk to
>> spinup and then continue.
>> That why we set ATA_LFLAG_NO_SRST in sata_pmp_quirks().
>> So what happen is we go into a function that issue identify, but we
>> fail, the disk is not ready [it is spinning up], so we retry.
>> 3 times.
>>
>> From the first hard reset: 12888.470385, to the time you got the final
>> error: 12901.397305 ~ 12.9s
>> In the second case, your controller can send SOFT_RESET and wait for
>> the device to respond.
>> Time for the disk to spinup:
>> 28010.630028 - 27997.097116 ~ 13.5s
>> As you can see, you are borderline with the PMP, but the controller
>> did not "wait" enough in the first case.
>> Given the spinup time varies with drive, age, time since last
>> spin-up..., it may work one day and fail the next.
>> To work around the problem, I have a patch that consist of allowing
>> the silicon image control to send a reset, but if it fails, we spin
>> for a fixed amount of time and retry. This is not very nice, it is a
>> better design to wait for event that waiting a fixed amount of time.
>> You may have to alter ATA_LFLAG_WAIT_SRST to use the first bit available.
>>
>> Can you try with the following patch?
>>
>> Thanks,
>> Gwendal.
>>
>> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
>> index 228740f..b98b02d 100644
>> --- a/drivers/ata/libata-eh.c
>> +++ b/drivers/ata/libata-eh.c
>> @@ -2798,7 +2798,14 @@ int ata_eh_reset(struct ata_link *link, int classify,
>>      sata_scr_read(link, SCR_STATUS, &sstatus))
>>   rc = -ERESTART;
>>
>> - if (rc == -ERESTART || try >= max_tries)
>> + if (try >= max_tries)
>> + goto out;
>> +
>> + /* Some PMP will not serve SRST until the disk is spunup,
>> + * if the controller can not wait for the PMP to acknowledge the frame,
>> + * wait here */
>> + if (rc == -ERESTART &&
>> +    !((lflags & ATA_LFLAG_WAIT_SRST) && (reset == softreset)))
>>   goto out;
>>
>>   now = jiffies;
>> @@ -2813,6 +2820,8 @@ int ata_eh_reset(struct ata_link *link, int classify,
>>   delta = schedule_timeout_uninterruptible(delta);
>>   }
>>
>> + if (rc == -ERESTART)
>> + goto out;
>>   if (try == max_tries - 1) {
>>   sata_down_spd_limit(link, 0);
>>   if (slave)
>> diff --git a/drivers/ata/libata-pmp.c b/drivers/ata/libata-pmp.c
>> index 00305f4..d21ad7d 100644
>> --- a/drivers/ata/libata-pmp.c
>> +++ b/drivers/ata/libata-pmp.c
>> @@ -325,13 +351,11 @@ static void sata_pmp_quirks(struct ata_port *ap)
>>   if (vendor == 0x1095 && devid == 0x3726) {
>>   /* sil3726 quirks */
>>   ata_for_each_link(link, ap, EDGE) {
>> - /* Class code report is unreliable and SRST
>> - * times out under certain configurations.
>> - */
>> + /* Class code report is unreliable */
>> + /* PMP does not forward SRST until the drive spins up */
>>   if (link->pmp < 5)
>> - link->flags |= ATA_LFLAG_NO_SRST |
>> -       ATA_LFLAG_ASSUME_ATA;
>> -
>> + link->flags |= ATA_LFLAG_ASSUME_ATA |
>> +       ATA_LFLAG_WAIT_SRST;
>>   /* port 5 is for SEMB device and it doesn't like SRST */
>>   if (link->pmp == 5)
>>   link->flags |= ATA_LFLAG_NO_SRST |
>> diff --git a/include/linux/libata.h b/include/linux/libata.h
>> index b2f2003..3a18caa 100644
>> --- a/include/linux/libata.h
>> +++ b/include/linux/libata.h
>> @@ -172,6 +172,7 @@ enum {
>>   ATA_LFLAG_NO_RETRY = (1 << 5), /* don't retry this link */
>>   ATA_LFLAG_DISABLED = (1 << 6), /* link is disabled */
>>   ATA_LFLAG_SW_ACTIVITY = (1 << 7), /* keep activity stats */
>> + ATA_LFLAG_WAIT_SRST = (1 << 8), /* add delay when SRST fails */
>>
>>   /* struct ata_port flags */
>>   ATA_FLAG_SLAVE_POSS = (1 << 0), /* host supports slave dev */
>>
>>
>> On Fri, Sep 30, 2011 at 2:54 PM, Mike I <mihrcke@gmail.com> wrote:
>> >
>> > tj <at> kernel.org <tj <at> kernel.org> writes:
>> >
>> > >
>> > > Hello,
>> > >
>> > > How did that go?
>> > >
>> > > Thanks.
>> > >
>> >
>> > Like Derry who started this thread, I too had seen an old thread from
>> > October/November 2008 with what appeared to be no resolution to this problem.
>> > Now, finding this thread, again, with no apparent resolution to this problem.
>> >
>> > I'm currently running Ubuntu 10.04 (lucid), kernel 2.6.32-33-generic.  I've no
>> > experience with applying these git patches, and my searching to figure out how
>> > it works have not helped.
>> >
>> > I'm using an Addonics eSATA PCI-X controller with the SiI3124 chipset, and I
>> > have an Addonics PM in an external enclosure, with a 5 bay/tray DAS.  Some of
>> > my drives give me this problem: (this occurs for me with pretty much ALL
>> > Samsung hard drives)
>> > [12888.470308] ata9.01: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xf
>> > [12888.470313] ata9.01: SError: { PHYRdyChg CommWake DevExch }
>> > [12888.470385] ata9.01: hard resetting link
>> > [12889.211597] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>> > [12889.211686] ata9.01: failed to IDENTIFY (I/O error, err_mask=0x11)
>> > [12889.211692] ata9.15: hard resetting link
>> > [12891.430086] ata9.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
>> > [12891.430397] ata9.00: hard resetting link
>> > [12891.780786] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>> > [12894.211103] ata9.01: hard resetting link
>> > [12894.560424] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> > [12894.560466] ata9.02: hard resetting link
>> > [12894.914176] ata9.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> > [12894.914222] ata9.03: hard resetting link
>> > [12895.264141] ata9.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> > [12895.264169] ata9.04: hard resetting link
>> > [12895.612930] ata9.04: SATA link down (SStatus 0 SControl 320)
>> > [12895.613007] ata9.05: hard resetting link
>> > [12895.964143] ata9.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> > [12896.065908] ata9.00: configured for UDMA/100
>> > [12896.065970] ata9.01: failed to IDENTIFY (I/O error, err_mask=0x11)
>> > [12896.065977] ata9.15: hard resetting link
>> > [12898.283804] ata9.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
>> > [12898.284128] ata9.00: hard resetting link
>> > [12898.634174] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>> > [12899.562524] ata9.01: hard resetting link
>> > [12899.914147] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> > [12899.914180] ata9.02: hard resetting link
>> > [12900.261682] ata9.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> > [12900.261724] ata9.03: hard resetting link
>> > [12900.610413] ata9.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> > [12900.961283] ata9.05: hard resetting link
>> > [12901.310385] ata9.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> > [12901.397241] ata9.00: configured for UDMA/100
>> > [12901.397300] ata9.01: failed to IDENTIFY (I/O error, err_mask=0x11)
>> > [12901.397305] ata9.01: failed to recover link after 3 tries, disabling
>> > [12901.397311] ata9.15: hard resetting link
>> > [12903.613694] ata9.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
>> > [12903.960564] ata9.00: hard resetting link
>> > [12904.311125] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
>> > [12905.260154] ata9.02: hard resetting link
>> > [12905.602929] ata9.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> > [12905.611319] ata9.03: hard resetting link
>> > [12905.962555] ata9.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> > [12905.962592] ata9.04: hard resetting link
>> > [12906.312931] ata9.04: SATA link down (SStatus 0 SControl 320)
>> > [12906.313004] ata9.05: hard resetting link
>> > [12906.660409] ata9.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>> > [12906.753619] ata9.00: configured for UDMA/100
>> > [12906.766586] ata9.02: configured for UDMA/100
>> > [12906.771917] ata9.03: configured for UDMA/100
>> > [12907.121462] ata9: EH complete
>> >
>> > If I hot plug the same drive using a port directly off my mobo(no PM in the
>> > mix), I get this result(drive connects/mounts/works):
>> > [27997.097104] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action
>> > 0xe frozen
>> > [27997.097108] ata5: irq_stat 0x00400040, connection status changed
>> > [27997.097111] ata5: SError: { PHYRdyChg CommWake DevExch }
>> > [27997.097116] ata5: hard resetting link
>> > [28007.147622] ata5: softreset failed (device not ready)
>> > [28007.147627] ata5: hard resetting link
>> > [28010.630028] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> > [28010.748595] ata5.00: ATA-7: SAMSUNG HD154UI, 1AG01118, max UDMA7
>> > [28010.748599] ata5.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
>> > [28010.755227] ata5.00: configured for UDMA/133
>> > [28010.755237] ata5: EH complete
>> > [28010.756338] scsi 4:0:0:0: Direct-Access     ATA      SAMSUNG HD154UI  1AG0
>> > PQ: 0 ANSI: 5
>> > [28010.756475] sd 4:0:0:0: Attached scsi generic sg10 type 0
>> > [28010.756572] sd 4:0:0:0: [sdj] 2930277168 512-byte logical blocks: (1.50
>> > TB/1.36 TiB)
>> > [28010.756613] sd 4:0:0:0: [sdj] Write Protect is off
>> > [28010.756616] sd 4:0:0:0: [sdj] Mode Sense: 00 3a 00 00
>> > [28010.756636] sd 4:0:0:0: [sdj] Write cache: enabled, read cache: enabled,
>> > doesn't support DPO or FUA
>> > [28010.756760]  sdj: sdj1
>> > [28010.816161] sd 4:0:0:0: [sdj] Attached SCSI disk
>> >
>> > I've been using Ubuntu for a few years now, and have been living with the
>> > problem...working around it with USB docking stations and such.  But, I'd
>> > really hope to see/find this problem worked out.
>> >
>> > Thoughts/tips/suggestions?  Since I'm pretty much a novice when it comes to
>> > patching, a link to a guide for git patching would be appreciated too.
>> >
>> > Thank You,
>> > Mike
>> >
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-ide" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mike I Oct. 8, 2011, 6:25 p.m. UTC | #2
Thank you...

Will this require a complete kernel recompile?
Or is there an easier way to include this patch?

I have little to no experience with working with patches at this level.
I've been doing research on how to use this patch, and I'm close, but, I'm not
sure if I need to attack compiling a kernel, or if I can somehow just patch
libata or something like that?

Thank you,
Mike

On Thu, Oct 6, 2011 at 12:48 AM, Gwendal Grignou <gwendal@google.com> wrote:
>
> I think I know what is going on. One of your disks at least is slow to
> spinup. Due to a bug/feature in silicon image disk controller and pmp,
> at bring up we can not issue a SOFT_RESET and wait for the disk to
> spinup and then continue.
> That why we set ATA_LFLAG_NO_SRST in sata_pmp_quirks().
> So what happen is we go into a function that issue identify, but we
> fail, the disk is not ready [it is spinning up], so we retry.
> 3 times.
>
> From the first hard reset: 12888.470385, to the time you got the final
> error: 12901.397305 ~ 12.9s
> In the second case, your controller can send SOFT_RESET and wait for
> the device to respond.
> Time for the disk to spinup:
> 28010.630028 - 27997.097116 ~ 13.5s
> As you can see, you are borderline with the PMP, but the controller
> did not "wait" enough in the first case.
> Given the spinup time varies with drive, age, time since last
> spin-up..., it may work one day and fail the next.
> To work around the problem, I have a patch that consist of allowing
> the silicon image control to send a reset, but if it fails, we spin
> for a fixed amount of time and retry. This is not very nice, it is a
> better design to wait for event that waiting a fixed amount of time.
> You may have to alter ATA_LFLAG_WAIT_SRST to use the first bit available.
>
> Can you try with the following patch?
>
> Thanks,
> Gwendal.
>
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 228740f..b98b02d 100644
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -2798,7 +2798,14 @@ int ata_eh_reset(struct ata_link *link, int classify,
>      sata_scr_read(link, SCR_STATUS, &sstatus))
>   rc = -ERESTART;
>
> - if (rc == -ERESTART || try >= max_tries)
> + if (try >= max_tries)
> + goto out;
> +
> + /* Some PMP will not serve SRST until the disk is spunup,
> + * if the controller can not wait for the PMP to acknowledge the frame,
> + * wait here */
> + if (rc == -ERESTART &&
> +    !((lflags & ATA_LFLAG_WAIT_SRST) && (reset == softreset)))
>   goto out;
>
>   now = jiffies;
> @@ -2813,6 +2820,8 @@ int ata_eh_reset(struct ata_link *link, int classify,
>   delta = schedule_timeout_uninterruptible(delta);
>   }
>
> + if (rc == -ERESTART)
> + goto out;
>   if (try == max_tries - 1) {
>   sata_down_spd_limit(link, 0);
>   if (slave)
> diff --git a/drivers/ata/libata-pmp.c b/drivers/ata/libata-pmp.c
> index 00305f4..d21ad7d 100644
> --- a/drivers/ata/libata-pmp.c
> +++ b/drivers/ata/libata-pmp.c
> @@ -325,13 +351,11 @@ static void sata_pmp_quirks(struct ata_port *ap)
>   if (vendor == 0x1095 && devid == 0x3726) {
>   /* sil3726 quirks */
>   ata_for_each_link(link, ap, EDGE) {
> - /* Class code report is unreliable and SRST
> - * times out under certain configurations.
> - */
> + /* Class code report is unreliable */
> + /* PMP does not forward SRST until the drive spins up */
>   if (link->pmp < 5)
> - link->flags |= ATA_LFLAG_NO_SRST |
> -       ATA_LFLAG_ASSUME_ATA;
> -
> + link->flags |= ATA_LFLAG_ASSUME_ATA |
> +       ATA_LFLAG_WAIT_SRST;
>   /* port 5 is for SEMB device and it doesn't like SRST */
>   if (link->pmp == 5)
>   link->flags |= ATA_LFLAG_NO_SRST |
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index b2f2003..3a18caa 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -172,6 +172,7 @@ enum {
>   ATA_LFLAG_NO_RETRY = (1 << 5), /* don't retry this link */
>   ATA_LFLAG_DISABLED = (1 << 6), /* link is disabled */
>   ATA_LFLAG_SW_ACTIVITY = (1 << 7), /* keep activity stats */
> + ATA_LFLAG_WAIT_SRST = (1 << 8), /* add delay when SRST fails */
>
>   /* struct ata_port flags */
>   ATA_FLAG_SLAVE_POSS = (1 << 0), /* host supports slave dev */
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mike I Oct. 12, 2011, 2:06 a.m. UTC | #3
Thanks again for the help, but, I could use a little more instruction...

I've figured out how to apply the patch to the source, and compiled
the libata-pmp, and libata-eh modules(/drivers/ata), but, I cannot
figure out what to do from here.

The libata-pmp driver/module does not seem to be loaded on my system
to begin with...maybe this was part of my problem.

On Sat, Oct 8, 2011 at 1:25 PM, Michael Ihrcke <mihrcke@gmail.com> wrote:
> Thank you...
>
> Will this require a complete kernel recompile?
> Or is there an easier way to include this patch?
>
> I have little to no experience with working with patches at this level.
> I've been doing research on how to use this patch, and I'm close, but, I'm not
> sure if I need to attack compiling a kernel, or if I can somehow just patch
> libata or something like that?
>
> Thank you,
> Mike
>
> On Thu, Oct 6, 2011 at 12:48 AM, Gwendal Grignou <gwendal@google.com> wrote:
>>
>> I think I know what is going on. One of your disks at least is slow to
>> spinup. Due to a bug/feature in silicon image disk controller and pmp,
>> at bring up we can not issue a SOFT_RESET and wait for the disk to
>> spinup and then continue.
>> That why we set ATA_LFLAG_NO_SRST in sata_pmp_quirks().
>> So what happen is we go into a function that issue identify, but we
>> fail, the disk is not ready [it is spinning up], so we retry.
>> 3 times.
>>
>> From the first hard reset: 12888.470385, to the time you got the final
>> error: 12901.397305 ~ 12.9s
>> In the second case, your controller can send SOFT_RESET and wait for
>> the device to respond.
>> Time for the disk to spinup:
>> 28010.630028 - 27997.097116 ~ 13.5s
>> As you can see, you are borderline with the PMP, but the controller
>> did not "wait" enough in the first case.
>> Given the spinup time varies with drive, age, time since last
>> spin-up..., it may work one day and fail the next.
>> To work around the problem, I have a patch that consist of allowing
>> the silicon image control to send a reset, but if it fails, we spin
>> for a fixed amount of time and retry. This is not very nice, it is a
>> better design to wait for event that waiting a fixed amount of time.
>> You may have to alter ATA_LFLAG_WAIT_SRST to use the first bit available.
>>
>> Can you try with the following patch?
>>
>> Thanks,
>> Gwendal.
>>
>> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
>> index 228740f..b98b02d 100644
>> --- a/drivers/ata/libata-eh.c
>> +++ b/drivers/ata/libata-eh.c
>> @@ -2798,7 +2798,14 @@ int ata_eh_reset(struct ata_link *link, int classify,
>>      sata_scr_read(link, SCR_STATUS, &sstatus))
>>   rc = -ERESTART;
>>
>> - if (rc == -ERESTART || try >= max_tries)
>> + if (try >= max_tries)
>> + goto out;
>> +
>> + /* Some PMP will not serve SRST until the disk is spunup,
>> + * if the controller can not wait for the PMP to acknowledge the frame,
>> + * wait here */
>> + if (rc == -ERESTART &&
>> +    !((lflags & ATA_LFLAG_WAIT_SRST) && (reset == softreset)))
>>   goto out;
>>
>>   now = jiffies;
>> @@ -2813,6 +2820,8 @@ int ata_eh_reset(struct ata_link *link, int classify,
>>   delta = schedule_timeout_uninterruptible(delta);
>>   }
>>
>> + if (rc == -ERESTART)
>> + goto out;
>>   if (try == max_tries - 1) {
>>   sata_down_spd_limit(link, 0);
>>   if (slave)
>> diff --git a/drivers/ata/libata-pmp.c b/drivers/ata/libata-pmp.c
>> index 00305f4..d21ad7d 100644
>> --- a/drivers/ata/libata-pmp.c
>> +++ b/drivers/ata/libata-pmp.c
>> @@ -325,13 +351,11 @@ static void sata_pmp_quirks(struct ata_port *ap)
>>   if (vendor == 0x1095 && devid == 0x3726) {
>>   /* sil3726 quirks */
>>   ata_for_each_link(link, ap, EDGE) {
>> - /* Class code report is unreliable and SRST
>> - * times out under certain configurations.
>> - */
>> + /* Class code report is unreliable */
>> + /* PMP does not forward SRST until the drive spins up */
>>   if (link->pmp < 5)
>> - link->flags |= ATA_LFLAG_NO_SRST |
>> -       ATA_LFLAG_ASSUME_ATA;
>> -
>> + link->flags |= ATA_LFLAG_ASSUME_ATA |
>> +       ATA_LFLAG_WAIT_SRST;
>>   /* port 5 is for SEMB device and it doesn't like SRST */
>>   if (link->pmp == 5)
>>   link->flags |= ATA_LFLAG_NO_SRST |
>> diff --git a/include/linux/libata.h b/include/linux/libata.h
>> index b2f2003..3a18caa 100644
>> --- a/include/linux/libata.h
>> +++ b/include/linux/libata.h
>> @@ -172,6 +172,7 @@ enum {
>>   ATA_LFLAG_NO_RETRY = (1 << 5), /* don't retry this link */
>>   ATA_LFLAG_DISABLED = (1 << 6), /* link is disabled */
>>   ATA_LFLAG_SW_ACTIVITY = (1 << 7), /* keep activity stats */
>> + ATA_LFLAG_WAIT_SRST = (1 << 8), /* add delay when SRST fails */
>>
>>   /* struct ata_port flags */
>>   ATA_FLAG_SLAVE_POSS = (1 << 0), /* host supports slave dev */
>>
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mike I Oct. 13, 2011, 2:09 a.m. UTC | #4
Working much better now...this is what I get when I hot plug a drive that was
giving me problems before:

[  211.873069] ata3.01: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xf
[  211.873075] ata3.01: SError: { PHYRdyChg CommWake DevExch }
[  211.873181] ata3.01: limiting SATA link speed to 1.5 Gbps
[  211.873188] ata3.01: hard resetting link
[  212.750032] ata3.01: softreset failed (SRST command error)
[  212.850022] ata3.01: failed to read SCR 0 (Emask=0x1)
[  212.850026] ata3.01: reset failed (errno=-85), retrying in 10 secs
[  221.870023] ata3.01: reset failed, giving up
[  221.870029] ata3.15: hard resetting link
[  221.870032] ata3: controller in dubious state, performing PORT_RST
[  224.140057] ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  224.140364] ata3.02: limiting SATA link speed to 1.5 Gbps
[  224.140402] ata3.03: limiting SATA link speed to 1.5 Gbps
[  224.140440] ata3.04: limiting SATA link speed to 1.5 Gbps
[  224.140477] ata3.05: limiting SATA link speed to 1.5 Gbps
[  224.140481] ata3.00: hard resetting link
[  224.550272] ata3.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  224.550311] ata3.01: hard resetting link
[  225.020043] ata3.01: softreset failed (SRST command error)
[  225.020069] ata3.01: failed to read SCR 0 (Emask=0x40)
[  225.020073] ata3.01: reset failed (errno=-85), retrying in 10 secs
[  234.550024] ata3.01: reset failed, giving up
[  234.550032] ata3.15: hard resetting link
[  234.550036] ata3: controller in dubious state, performing PORT_RST
[  236.810049] ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  236.810336] ata3.00: hard resetting link
[  237.230271] ata3.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  237.230310] ata3.01: hard resetting link
[  237.920270] ata3.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  237.920309] ata3.02: hard resetting link
[  238.270420] ata3.02: SATA link down (SStatus 0 SControl 310)
[  238.270492] ata3.03: hard resetting link
[  238.610434] ata3.03: SATA link down (SStatus 0 SControl 310)
[  238.610504] ata3.04: hard resetting link
[  238.960436] ata3.04: SATA link down (SStatus 0 SControl 310)
[  238.960508] ata3.05: hard resetting link
[  239.310401] ata3.05: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  239.413575] ata3.00: configured for UDMA/100
[  239.420008] ata3.01: ATA-7: SAMSUNG HD154UI, 1AG01118, max UDMA7
[  239.420012] ata3.01: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[  239.426691] ata3.01: configured for UDMA/100
[  239.426763] ata3: EH complete
[  239.426967] scsi 2:1:0:0: Direct-Access     ATA      SAMSUNG HD154UI  1AG0
PQ: 0 ANSI: 5
[  239.427123] sd 2:1:0:0: Attached scsi generic sg6 type 0
[  239.427138] sd 2:1:0:0: [sde] 2930277168 512-byte logical blocks: (1.50
TB/1.36 TiB)
[  239.427182] sd 2:1:0:0: [sde] Write Protect is off
[  239.427185] sd 2:1:0:0: [sde] Mode Sense: 00 3a 00 00
[  239.427206] sd 2:1:0:0: [sde] Write cache: enabled, read cache: enabled,
doesn't support DPO or FUA
[  239.427348]  sde: sde1
[  239.581901] sd 2:1:0:0: [sde] Attached SCSI disk

Thank you,
Mike


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 228740f..b98b02d 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2798,7 +2798,14 @@  int ata_eh_reset(struct ata_link *link, int classify,
     sata_scr_read(link, SCR_STATUS, &sstatus))
  rc = -ERESTART;

- if (rc == -ERESTART || try >= max_tries)
+ if (try >= max_tries)
+ goto out;
+
+ /* Some PMP will not serve SRST until the disk is spunup,
+ * if the controller can not wait for the PMP to acknowledge the frame,
+ * wait here */
+ if (rc == -ERESTART &&
+    !((lflags & ATA_LFLAG_WAIT_SRST) && (reset == softreset)))
  goto out;

  now = jiffies;
@@ -2813,6 +2820,8 @@  int ata_eh_reset(struct ata_link *link, int classify,
  delta = schedule_timeout_uninterruptible(delta);
  }

+ if (rc == -ERESTART)
+ goto out;
  if (try == max_tries - 1) {
  sata_down_spd_limit(link, 0);
  if (slave)
diff --git a/drivers/ata/libata-pmp.c b/drivers/ata/libata-pmp.c
index 00305f4..d21ad7d 100644
--- a/drivers/ata/libata-pmp.c
+++ b/drivers/ata/libata-pmp.c
@@ -325,13 +351,11 @@  static void sata_pmp_quirks(struct ata_port *ap)
  if (vendor == 0x1095 && devid == 0x3726) {
  /* sil3726 quirks */
  ata_for_each_link(link, ap, EDGE) {
- /* Class code report is unreliable and SRST
- * times out under certain configurations.
- */
+ /* Class code report is unreliable */
+ /* PMP does not forward SRST until the drive spins up */
  if (link->pmp < 5)
- link->flags |= ATA_LFLAG_NO_SRST |
-       ATA_LFLAG_ASSUME_ATA;
-
+ link->flags |= ATA_LFLAG_ASSUME_ATA |
+       ATA_LFLAG_WAIT_SRST;
  /* port 5 is for SEMB device and it doesn't like SRST */
  if (link->pmp == 5)
  link->flags |= ATA_LFLAG_NO_SRST |
diff --git a/include/linux/libata.h b/include/linux/libata.h
index b2f2003..3a18caa 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -172,6 +172,7 @@  enum {
  ATA_LFLAG_NO_RETRY = (1 << 5), /* don't retry this link */
  ATA_LFLAG_DISABLED = (1 << 6), /* link is disabled */
  ATA_LFLAG_SW_ACTIVITY = (1 << 7), /* keep activity stats */
+ ATA_LFLAG_WAIT_SRST = (1 << 8), /* add delay when SRST fails */

  /* struct ata_port flags */
  ATA_FLAG_SLAVE_POSS = (1 << 0), /* host supports slave dev */