diff mbox

Problem w/ hotplug on sata_sil24 w/ PMP (sil3726)

Message ID 20110713143936.GQ2872@htj.dyndns.org
State Not Applicable
Delegated to: David Miller
Headers show

Commit Message

Tejun Heo July 13, 2011, 2:39 p.m. UTC
Hello, again.

On Wed, Jul 13, 2011 at 03:34:25PM +0200, tj@kernel.org wrote:
> I see.  Yeah, it doesn't seem to be PHY flickering.  It looks like
> IDENTIFY is failing before disk finishes spinning up, which isn't the
> correct behavior.  SRST is supposed to wait for disk spin up
> completion.  I don't know why sil24 softreset is failing without
> waiting for spinup.  I'll see whether I can reproduce the problem
> here.

I played with it a bit but couldn't reproduce the problem; however,
PMP EH does lack per-device retry back-off.  On host ports, when
reset/identification fails for whatever reason, retry is backed off
such that even if controller/device interaction is faulty during spin
up, EH always retries after sufficient time is given to the device.
With PMP EH, this logic is missing partly because PMP EH sometimes
already takes too long and partly because tracking the timestamps
seemed a bit complex during the initial implementation.

So, this definitely contributes to why PMP EH fails to work around the
misinteraction between the drive and port and should probably be
improved, but let's also find out why the controller isn't waiting for
the drive after issuing IDENTIFY.  Can you please apply the following
patch, reproduce the problem and report the kernel log?

Thank you.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Derry Bryson July 13, 2011, 4:23 p.m. UTC | #1
Hi:

Here is the result after the patch of hotplugging WD 3TB drive (only drive):

[   92.201163] ata1.02: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xf
[   92.210719] ata1.02: SError: { PHYRdyChg CommWake DevExch }
[   92.220365] ata1.02: hard resetting link
[   92.980398] ata1.02: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   92.980449] ata1.02: issuing IDENTIFY
[   92.980476] ata1.02: IDENTIFY failed, irq_stat=0x60002 cerr=4
[   92.980517] ata1.02: failed to IDENTIFY (I/O error, err_mask=0x11)
[   92.980526] ata1.15: hard resetting link
[   95.200045] ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[   95.200327] ata1.00: hard resetting link
[   95.640059] ata1.00: link resume succeeded after 1 retries
[   95.760404] ata1.00: SATA link down (SStatus 0 SControl 310)
[   95.760481] ata1.01: hard resetting link
[   96.110392] ata1.01: SATA link down (SStatus 0 SControl 320)
[   97.980042] ata1.02: hard resetting link
[   98.330391] ata1.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   98.330437] ata1.03: hard resetting link
[   98.680386] ata1.03: SATA link down (SStatus 0 SControl 320)
[   98.680461] ata1.04: hard resetting link
[   99.030393] ata1.04: SATA link down (SStatus 0 SControl 320)
[   99.030468] ata1.05: hard resetting link
[   99.390388] ata1.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[   99.390438] ata1.02: issuing IDENTIFY
[   99.390464] ata1.02: IDENTIFY failed, irq_stat=0x60002 cerr=4
[   99.390506] ata1.02: failed to IDENTIFY (I/O error, err_mask=0x11)
[   99.390515] ata1.15: hard resetting link
[  101.610048] ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  101.610325] ata1.00: hard resetting link
[  102.050070] ata1.00: link resume succeeded after 1 retries
[  102.170389] ata1.00: SATA link down (SStatus 0 SControl 310)
[  102.170466] ata1.01: hard resetting link
[  102.520390] ata1.01: SATA link down (SStatus 0 SControl 320)
[  103.330025] ata1.02: hard resetting link
[  103.680391] ata1.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  104.390026] ata1.05: hard resetting link
[  104.740391] ata1.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[  104.740442] ata1.02: issuing IDENTIFY
[  104.740468] ata1.02: IDENTIFY failed, irq_stat=0x60002 cerr=4
[  104.740510] ata1.02: failed to IDENTIFY (I/O error, err_mask=0x11)
[  104.740518] ata1.15: hard resetting link
[  106.960046] ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  106.960329] ata1.00: hard resetting link
[  107.400059] ata1.00: link resume succeeded after 1 retries
[  107.520397] ata1.00: SATA link down (SStatus 0 SControl 310)
[  107.520472] ata1.01: hard resetting link
[  107.870392] ata1.01: SATA link down (SStatus 0 SControl 320)
[  108.680021] ata1.02: hard resetting link
[  109.030393] ata1.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  109.030439] ata1.03: hard resetting link
[  109.380397] ata1.03: SATA link down (SStatus 0 SControl 320)
[  109.380473] ata1.04: hard resetting link
[  109.730395] ata1.04: SATA link down (SStatus 0 SControl 320)
[  109.741269] ata1.05: hard resetting link
[  110.090393] ata1.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[  110.090443] ata1.02: issuing IDENTIFY
[  110.118499] ata1.02: ATA-8: WDC WD30EZRX-00MMMB0, 80.00A80, max UDMA/133
[  110.118506] ata1.02: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[  110.118642] ata1.02: issuing IDENTIFY
[  110.123817] ata1.02: configured for UDMA/100
[  110.123904] ata1: EH complete
[  110.124108] scsi 0:2:0:0: Direct-Access     ATA      WDC WD30EZRX-00M 80.0 PQ: 0 ANSI: 5
[  110.124554] sd 0:2:0:0: [sda] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
[  110.124578] sd 0:2:0:0: Attached scsi generic sg0 type 0
[  110.124596] sd 0:2:0:0: [sda] 4096-byte physical blocks
[  110.124735] sd 0:2:0:0: [sda] Write Protect is off
[  110.124742] sd 0:2:0:0: [sda] Mode Sense: 00 3a 00 00
[  110.124813] sd 0:2:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  110.125220]  sda: sda1 sda2
[  110.170313] sd 0:2:0:0: [sda] Attached SCSI disk
[  110.172439] ata1.02: issuing IDENTIFY
[  110.250997] ata1.02: issuing IDENTIFY
[  110.274885] ata1.02: issuing IDENTIFY

> -----Original Message-----
> From: Tejun Heo [mailto:htejun@gmail.com] On Behalf Of tj@kernel.org
> Sent: Wednesday, July 13, 2011 7:40 AM
> To: Derry Bryson
> Cc: linux-ide@vger.kernel.org
> Subject: Re: Problem w/ hotplug on sata_sil24 w/ PMP (sil3726)
> 
> Hello, again.
> 
> On Wed, Jul 13, 2011 at 03:34:25PM +0200, tj@kernel.org wrote:
> > I see.  Yeah, it doesn't seem to be PHY flickering.  It looks like
> > IDENTIFY is failing before disk finishes spinning up, which isn't the
> > correct behavior.  SRST is supposed to wait for disk spin up
> > completion.  I don't know why sil24 softreset is failing without
> > waiting for spinup.  I'll see whether I can reproduce the problem
> > here.
> 
> I played with it a bit but couldn't reproduce the problem; however,
> PMP EH does lack per-device retry back-off.  On host ports, when
> reset/identification fails for whatever reason, retry is backed off
> such that even if controller/device interaction is faulty during spin
> up, EH always retries after sufficient time is given to the device.
> With PMP EH, this logic is missing partly because PMP EH sometimes
> already takes too long and partly because tracking the timestamps
> seemed a bit complex during the initial implementation.
> 
> So, this definitely contributes to why PMP EH fails to work around the
> misinteraction between the drive and port and should probably be
> improved, but let's also find out why the controller isn't waiting for
> the drive after issuing IDENTIFY.  Can you please apply the following
> patch, reproduce the problem and report the kernel log?
> 
> Thank you.
> 
> diff --git a/drivers/ata/sata_sil24.c b/drivers/ata/sata_sil24.c
> index 06c564e..cbd3475 100644
> --- a/drivers/ata/sata_sil24.c
> +++ b/drivers/ata/sata_sil24.c
> @@ -896,6 +896,9 @@ static unsigned int sil24_qc_issue(struct
> ata_queued_cmd *qc)
>  	dma_addr_t paddr;
>  	void __iomem *activate;
> 
> +	if (qc->tf.command == ATA_CMD_ID_ATA)
> +		ata_dev_printk(qc->dev, KERN_INFO, "issuing IDENTIFY\n");
> +
>  	paddr = pp->cmd_block_dma + tag * sizeof(*pp->cmd_block);
>  	activate = port + PORT_CMD_ACTIVATE + tag * 8;
> 
> @@ -1081,9 +1084,12 @@ static void sil24_error_intr(struct ata_port
> *ap)
>  		}
> 
>  		/* record error info */
> -		if (qc)
> +		if (qc) {
> +			if (qc->tf.command == ATA_CMD_ID_ATA)
> +				ata_dev_printk(qc->dev, KERN_INFO, "IDENTIFY
> failed, irq_stat=%#0x cerr=%u\n",
> +					       irq_stat, cerr);
>  			qc->err_mask |= err_mask;
> -		else
> +		} else
>  			ehi->err_mask |= err_mask;
> 
>  		ehi->action |= action;
> 
> 
> -----
> No virus found in this message.
> Checked by AVG - www.avg.com
> Version: 10.0.1390 / Virus Database: 1516/3761 - Release Date: 07/12/11


Derry Bryson
Sr. Software Engineer 

Highly Reliable Systems
www.High-Rely.com
"Making Backup Invisible"

Phone: 775.329.5139 X129
Fax:     775.370.1001

Sales & Support: 877.384.6838

This e-mail and any files transmitted with it are the property of Highly Reliable Systems, Inc. This (email) information is confidential, and is intended solely for the use of the individual or entity to whom this e-mail is addressed. If you are not one of the named recipient(s) or otherwise have reason to believe that you have received this message in error, please notify the sender and delete this message immediately from your computer. Any other use, disclosure, retention, dissemination, forwarding, printing, distribution or copying of this e-mail is strictly prohibited.




--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo July 14, 2011, 7:14 a.m. UTC | #2
Hello,

On Wed, Jul 13, 2011 at 04:23:27PM +0000, Derry Bryson wrote:
> [   92.980476] ata1.02: IDENTIFY failed, irq_stat=0x60002 cerr=4

The controller is saying that it failed to transmit command FIS
because PHYRDY state changed, so that is PHY flickering.  Can you
please bump the second debounce timing to 1000 (not only the first one
but always) and see whether the failure pattern changes?

Thanks.
Derry Bryson July 14, 2011, 5:37 p.m. UTC | #3
Hi:

Dmesg with second timing value set to 1000 always, hotplugging WD 3TB drive:

[  139.145917] ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xf
[  139.156180] ata5.00: SError: { PHYRdyChg CommWake DevExch }
[  139.166619] ata5.00: hard resetting link
[  139.166639] ata5.00: XXXX hardreset hotplugged = true
[  139.166646] ata5.00: XXX0 hardreset debounce 25 500 2000
[  139.166651] ata5.00: XXX1 hardreset debounce 25 1000 2000
[  140.430401] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  140.430453] ata5.00: issuing IDENTIFY
[  140.430480] ata5.00: IDENTIFY failed, irq_stat=0x60002 cerr=4
[  140.430521] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x11)
[  140.430532] ata5.15: hard resetting link
[  142.650047] ata5.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  145.431276] ata5.00: hard resetting link
[  145.431298] ata5.00: XXXX hardreset hotplugged = false
[  145.431304] ata5.00: XXX0 hardreset debounce 5 100 2000
[  145.431310] ata5.00: XXX1 hardreset debounce 5 1000 2000
[  146.690413] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  146.690461] ata5.01: hard resetting link
[  146.690481] ata5.01: XXXX hardreset hotplugged = false
[  146.690487] ata5.01: XXX0 hardreset debounce 5 100 2000
[  146.690492] ata5.01: XXX1 hardreset debounce 5 1000 2000
[  147.940402] ata5.01: SATA link down (SStatus 0 SControl 320)
[  147.940481] ata5.02: hard resetting link
[  147.940501] ata5.02: XXXX hardreset hotplugged = false
[  147.940507] ata5.02: XXX0 hardreset debounce 5 100 2000
[  147.940513] ata5.02: XXX1 hardreset debounce 5 1000 2000
[  149.190398] ata5.02: SATA link down (SStatus 0 SControl 320)
[  149.190476] ata5.03: hard resetting link
[  149.190497] ata5.03: XXXX hardreset hotplugged = false
[  149.190503] ata5.03: XXX0 hardreset debounce 5 100 2000
[  149.190508] ata5.03: XXX1 hardreset debounce 5 1000 2000
[  150.440399] ata5.03: SATA link down (SStatus 0 SControl 320)
[  150.440478] ata5.04: hard resetting link
[  150.440499] ata5.04: XXXX hardreset hotplugged = false
[  150.440505] ata5.04: XXX0 hardreset debounce 5 100 2000
[  150.440510] ata5.04: XXX1 hardreset debounce 5 1000 2000
[  151.690410] ata5.04: SATA link down (SStatus 0 SControl 320)
[  151.690488] ata5.05: hard resetting link
[  151.690509] ata5.05: XXXX hardreset hotplugged = false
[  151.690515] ata5.05: XXX0 hardreset debounce 5 100 2000
[  151.690520] ata5.05: XXX1 hardreset debounce 5 1000 2000
[  152.940399] ata5.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[  152.940451] ata5.00: issuing IDENTIFY
[  152.940477] ata5.00: IDENTIFY failed, irq_stat=0x60002 cerr=4
[  152.940518] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x11)
[  152.940529] ata5.15: hard resetting link
[  155.160047] ata5.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  155.160327] ata5.00: hard resetting link
[  155.160347] ata5.00: XXXX hardreset hotplugged = false
[  155.160353] ata5.00: XXX0 hardreset debounce 5 100 2000
[  155.160358] ata5.00: XXX1 hardreset debounce 5 1000 2000
[  156.410398] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  157.940026] ata5.05: hard resetting link
[  157.940048] ata5.05: XXXX hardreset hotplugged = false
[  157.940054] ata5.05: XXX0 hardreset debounce 5 100 2000
[  157.940059] ata5.05: XXX1 hardreset debounce 5 1000 2000
[  159.190398] ata5.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[  159.190450] ata5.00: issuing IDENTIFY
[  159.199553] ata5.00: ATA-8: WDC WD30EZRX-00MMMB0, 80.00A80, max UDMA/133
[  159.199560] ata5.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[  159.199702] ata5.00: issuing IDENTIFY
[  159.205571] ata5.00: configured for UDMA/100
[  159.205661] ata5: EH complete
[  159.205857] scsi 4:0:0:0: Direct-Access     ATA      WDC WD30EZRX-00M 80.0 PQ: 0 ANSI: 5
[  159.206297] sd 4:0:0:0: [sdb] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
[  159.206305] sd 4:0:0:0: [sdb] 4096-byte physical blocks
[  159.206319] sd 4:0:0:0: Attached scsi generic sg1 type 0
[  159.206440] sd 4:0:0:0: [sdb] Write Protect is off
[  159.206447] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[  159.206519] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  159.206938]  sdb: sdb1 sdb2
[  159.251508] sd 4:0:0:0: [sdb] Attached SCSI disk
[  159.254218] ata5.00: issuing IDENTIFY
[  159.330233] ata5.00: issuing IDENTIFY
[  159.350570] ata5.00: issuing IDENTIFY

> -----Original Message-----
> From: Tejun Heo [mailto:htejun@gmail.com] On Behalf Of tj@kernel.org
> Sent: Thursday, July 14, 2011 12:15 AM
> To: Derry Bryson
> Cc: linux-ide@vger.kernel.org
> Subject: Re: Problem w/ hotplug on sata_sil24 w/ PMP (sil3726)
> 
> Hello,
> 
> On Wed, Jul 13, 2011 at 04:23:27PM +0000, Derry Bryson wrote:
> > [   92.980476] ata1.02: IDENTIFY failed, irq_stat=0x60002 cerr=4
> 
> The controller is saying that it failed to transmit command FIS
> because PHYRDY state changed, so that is PHY flickering.  Can you
> please bump the second debounce timing to 1000 (not only the first one
> but always) and see whether the failure pattern changes?
> 
> Thanks.
> 
> --
> tejun
> 

Derry Bryson
Sr. Software Engineer 

Highly Reliable Systems
www.High-Rely.com
"Making Backup Invisible"

Phone: 775.329.5139 X129
Fax:     775.370.1001

Sales & Support: 877.384.6838

This e-mail and any files transmitted with it are the property of Highly Reliable Systems, Inc. This (email) information is confidential, and is intended solely for the use of the individual or entity to whom this e-mail is addressed. If you are not one of the named recipient(s) or otherwise have reason to believe that you have received this message in error, please notify the sender and delete this message immediately from your computer. Any other use, disclosure, retention, dissemination, forwarding, printing, distribution or copying of this e-mail is strictly prohibited.




--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo July 21, 2011, 9 a.m. UTC | #4
On Thu, Jul 14, 2011 at 05:37:37PM +0000, Derry Bryson wrote:
> Hi:
> 
> Dmesg with second timing value set to 1000 always, hotplugging WD 3TB drive:
> 
> [  139.145917] ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xf
> [  139.156180] ata5.00: SError: { PHYRdyChg CommWake DevExch }
> [  139.166619] ata5.00: hard resetting link
> [  140.430401] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
> [  140.430453] ata5.00: issuing IDENTIFY
> [  140.430480] ata5.00: IDENTIFY failed, irq_stat=0x60002 cerr=4
> [  140.430521] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x11)
...
> [  145.431276] ata5.00: hard resetting link
> [  145.431298] ata5.00: XXXX hardreset hotplugged = false
> [  145.431304] ata5.00: XXX0 hardreset debounce 5 100 2000
> [  145.431310] ata5.00: XXX1 hardreset debounce 5 1000 2000
> [  146.690413] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
...
> [  152.940451] ata5.00: issuing IDENTIFY
> [  152.940477] ata5.00: IDENTIFY failed, irq_stat=0x60002 cerr=4
> [  152.940518] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x11)

So, no change.  Weird.  I think we need to implement retry backoff
properly anyways but am still wondering why probing failure behaves
differently for you.  I'll try with similar drive but maybe it's
difference in PMP firmware?

Thanks.
Derry Bryson July 21, 2011, 4:22 p.m. UTC | #5
Tejun: 

Here are the specs on the port multiplier board:

Lycom ST-126RM (http://www.lycom.com.tw/)
Sil3726 chip revision 17
EEPROM FW Version: 1.0114

Also there is another problem:  ports that are not connected at power on
are ignored until a port that was connected at power on is hotplugged.  So,
for instance, if I turn the unit on with only a drive on the first port, the
other four ports will not work and I don't even see kernel messages when
hotplugging until I hotplug the first port.  I haven't looked into this yet,
but it would seem the driver must be disabling the ports and they are not
looked at again until the hard reset is performed.

Let me know if I can help in any way.

Thanks,

Derry Bryson
Sr. Software Engineer 

Highly Reliable Systems
www.High-Rely.com
"Making Backup Invisible"

Phone: 775.329.5139 X129
Fax:     775.370.1001

Sales & Support: 877.384.6838

This e-mail and any files transmitted with it are the property of Highly Reliable Systems, Inc. This (email) information is confidential, and is intended solely for the use of the individual or entity to whom this e-mail is addressed. If you are not one of the named recipient(s) or otherwise have reason to believe that you have received this message in error, please notify the sender and delete this message immediately from your computer. Any other use, disclosure, retention, dissemination, forwarding, printing, distribution or copying of this e-mail is strictly prohibited.

> -----Original Message-----
> From: Tejun Heo [mailto:htejun@gmail.com] On Behalf Of tj@kernel.org
> Sent: Thursday, July 21, 2011 2:00 AM
> To: Derry Bryson
> Cc: linux-ide@vger.kernel.org
> Subject: Re: Problem w/ hotplug on sata_sil24 w/ PMP (sil3726)
> 
> On Thu, Jul 14, 2011 at 05:37:37PM +0000, Derry Bryson wrote:
> > Hi:
> >
> > Dmesg with second timing value set to 1000 always, hotplugging WD 3TB
> drive:
> >
> > [  139.145917] ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x4050000
> action 0xf
> > [  139.156180] ata5.00: SError: { PHYRdyChg CommWake DevExch }
> > [  139.166619] ata5.00: hard resetting link
> > [  140.430401] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl
> 320)
> > [  140.430453] ata5.00: issuing IDENTIFY
> > [  140.430480] ata5.00: IDENTIFY failed, irq_stat=0x60002 cerr=4
> > [  140.430521] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x11)
> ...
> > [  145.431276] ata5.00: hard resetting link
> > [  145.431298] ata5.00: XXXX hardreset hotplugged = false
> > [  145.431304] ata5.00: XXX0 hardreset debounce 5 100 2000
> > [  145.431310] ata5.00: XXX1 hardreset debounce 5 1000 2000
> > [  146.690413] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl
> 320)
> ...
> > [  152.940451] ata5.00: issuing IDENTIFY
> > [  152.940477] ata5.00: IDENTIFY failed, irq_stat=0x60002 cerr=4
> > [  152.940518] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x11)
> 
> So, no change.  Weird.  I think we need to implement retry backoff
> properly anyways but am still wondering why probing failure behaves
> differently for you.  I'll try with similar drive but maybe it's
> difference in PMP firmware?
> 
> Thanks.
> 
> --
> tejun
> 
> 
> -----
> No virus found in this message.
> Checked by AVG - www.avg.com
> Version: 10.0.1390 / Virus Database: 1518/3777 - Release Date: 07/20/11

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo July 22, 2011, 9:50 a.m. UTC | #6
Hello,

On Thu, Jul 21, 2011 at 04:22:07PM +0000, Derry Bryson wrote:
> Here are the specs on the port multiplier board:
> 
> Lycom ST-126RM (http://www.lycom.com.tw/)
> Sil3726 chip revision 17
> EEPROM FW Version: 1.0114
> 
> Also there is another problem:  ports that are not connected at power on
> are ignored until a port that was connected at power on is hotplugged.  So,
> for instance, if I turn the unit on with only a drive on the first port, the
> other four ports will not work and I don't even see kernel messages when
> hotplugging until I hotplug the first port.  I haven't looked into this yet,
> but it would seem the driver must be disabling the ports and they are not
> looked at again until the hard reset is performed.

That's most likely the sil pmp itself.  It has been known to show
erratic behavior if the first port is not occupied during power up.  I
don't know whether later firmware updates fixed that tho.  Can you get
hold of a marvell PMP and see how it behaves?

Thanks.
Derry Bryson July 22, 2011, 5:29 p.m. UTC | #7
> -----Original Message-----
> From: Tejun Heo [mailto:htejun@gmail.com] On Behalf Of tj@kernel.org
> Sent: Friday, July 22, 2011 2:51 AM
> To: Derry Bryson
> Cc: linux-ide@vger.kernel.org
> Subject: Re: Problem w/ hotplug on sata_sil24 w/ PMP (sil3726)
> 
> Hello,
> 
> On Thu, Jul 21, 2011 at 04:22:07PM +0000, Derry Bryson wrote:
> > Here are the specs on the port multiplier board:
> >
> > Lycom ST-126RM (http://www.lycom.com.tw/)
> > Sil3726 chip revision 17
> > EEPROM FW Version: 1.0114
> >
> > Also there is another problem:  ports that are not connected at power
> on
> > are ignored until a port that was connected at power on is
> hotplugged.  So,
> > for instance, if I turn the unit on with only a drive on the first
> port, the
> > other four ports will not work and I don't even see kernel messages
> when
> > hotplugging until I hotplug the first port.  I haven't looked into
> this yet,
> > but it would seem the driver must be disabling the ports and they are
> not
> > looked at again until the hard reset is performed.
> 
> That's most likely the sil pmp itself.  It has been known to show
> erratic behavior if the first port is not occupied during power up.  I
> don't know whether later firmware updates fixed that tho.  Can you get
> hold of a marvell PMP and see how it behaves?
> 

That may well be a problem, but is not the problem in this case since it
is not limited to port 0.  Seems like the sil driver or pmp code in the kernel
is disabling the ports that are not connected at pmp power on and not looking
at them again until we hotplug one of the ports that was connected and cause
a hard reset or something like that.  

For instance, I turn off the pmp unit and connect drives to
ports 0, 1, and 2 then power on the unit.  Linux sees the pmp unit and finds the
three drives.  Then I connect a drive to port 3 and Linux reports nothing.  No
interrupts, no nothing.  Same with port 4.  However if I connect a drive to
port 3 or 4 (or both) and hotplug the a drive on any of ports 0, 1, or 2 now
the drives are detected after the hard reset.

This hardware works fine under Windows, hotplugging drives and powering the 
unit off and on, etc. using the same Sil3124 eSATA PCI controller.  No problems, 
so I think the hardware is okay.

I don't know of nor have a Marvell PMP, but I do have another Sil3726
PMP board with newer firmware and a board with a JMicron PMP that I am
going to test with. 

Thanks for your help.

Derry Bryson
Sr. Software Engineer 

Highly Reliable Systems
www.High-Rely.com
"Making Backup Invisible"

Phone: 775.329.5139 X129
Fax:     775.370.1001

Sales & Support: 877.384.6838

This e-mail and any files transmitted with it are the property of Highly Reliable Systems, Inc. This (email) information is confidential, and is intended solely for the use of the individual or entity to whom this e-mail is addressed. If you are not one of the named recipient(s) or otherwise have reason to believe that you have received this message in error, please notify the sender and delete this message immediately from your computer. Any other use, disclosure, retention, dissemination, forwarding, printing, distribution or copying of this e-mail is strictly prohibited.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Derry Bryson July 22, 2011, 7:34 p.m. UTC | #8
I did some testing with a PMP board containing a Sil3726 with firmware
Version 1.0124 and a PMP/Raid board with a JMicron 393 configured as
PMP only.  I am connecting to a Sil3124 esata pci controller.  

The Sil3726 board acted the same as the previous Sil3726 board with the 1.0114
firmware.

The JMicron board worked much better.  It works properly with a stock 
kernel and does not fail the IDENTIFY after hotplugging and does
not require retries.  It seems to wait for the drive to spin up before
doing anything so it works properly the first time without retries or
hard resets.  The only problem I saw was that it seemed to not work at
all a couple of times if not powered up/connected before booting Linux.  
Couldn't see it at all, even after several power cycles and unplugging/pluggin
esata cable.  I need  to test this more as it is inconsistent and may 
have to do with the Sil3124 controller rather than the JMicron PMP.

Derry Bryson
Sr. Software Engineer 

Highly Reliable Systems
www.High-Rely.com
"Making Backup Invisible"

Phone: 775.329.5139 X129
Fax:     775.370.1001

Sales & Support: 877.384.6838

This e-mail and any files transmitted with it are the property of Highly Reliable Systems, Inc. This (email) information is confidential, and is intended solely for the use of the individual or entity to whom this e-mail is addressed. If you are not one of the named recipient(s) or otherwise have reason to believe that you have received this message in error, please notify the sender and delete this message immediately from your computer. Any other use, disclosure, retention, dissemination, forwarding, printing, distribution or copying of this e-mail is strictly prohibited.



> -----Original Message-----
> From: Tejun Heo [mailto:htejun@gmail.com] On Behalf Of tj@kernel.org
> Sent: Friday, July 22, 2011 2:51 AM
> To: Derry Bryson
> Cc: linux-ide@vger.kernel.org
> Subject: Re: Problem w/ hotplug on sata_sil24 w/ PMP (sil3726)
> 
> Hello,
> 
> On Thu, Jul 21, 2011 at 04:22:07PM +0000, Derry Bryson wrote:
> > Here are the specs on the port multiplier board:
> >
> > Lycom ST-126RM (http://www.lycom.com.tw/)
> > Sil3726 chip revision 17
> > EEPROM FW Version: 1.0114
> >
> > Also there is another problem:  ports that are not connected at power
> on
> > are ignored until a port that was connected at power on is
> hotplugged.  So,
> > for instance, if I turn the unit on with only a drive on the first
> port, the
> > other four ports will not work and I don't even see kernel messages
> when
> > hotplugging until I hotplug the first port.  I haven't looked into
> this yet,
> > but it would seem the driver must be disabling the ports and they are
> not
> > looked at again until the hard reset is performed.
> 
> That's most likely the sil pmp itself.  It has been known to show
> erratic behavior if the first port is not occupied during power up.  I
> don't know whether later firmware updates fixed that tho.  Can you get
> hold of a marvell PMP and see how it behaves?
> 
> Thanks.
> 
> --
> tejun
> 
> 
> -----
> No virus found in this message.
> Checked by AVG - www.avg.com
> Version: 10.0.1390 / Virus Database: 1518/3779 - Release Date: 07/21/11

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo July 30, 2011, 12:54 p.m. UTC | #9
Hello,

On Fri, Jul 22, 2011 at 07:34:37PM +0000, Derry Bryson wrote:
> I did some testing with a PMP board containing a Sil3726 with firmware
> Version 1.0124 and a PMP/Raid board with a JMicron 393 configured as
> PMP only.  I am connecting to a Sil3124 esata pci controller.  
> 
> The Sil3726 board acted the same as the previous Sil3726 board with the 1.0114
> firmware.
> 
> The JMicron board worked much better.  It works properly with a stock 
> kernel and does not fail the IDENTIFY after hotplugging and does
> not require retries.  It seems to wait for the drive to spin up before
> doing anything so it works properly the first time without retries or
> hard resets.

Interesting, that's the behavior I see w/ my 3726 but all my PMPs are
very old, so I guess it has changed since then.  A bit too lazy to
hook it up to windows machine to check firmware version ATM. :)

> The only problem I saw was that it seemed to not work at all a
> couple of times if not powered up/connected before booting Linux.
> Couldn't see it at all, even after several power cycles and
> unplugging/pluggin esata cable.  I need to test this more as it is
> inconsistent and may have to do with the Sil3124 controller rather
> than the JMicron PMP.

How did that go?

Thanks.
Mike I Sept. 30, 2011, 9:54 p.m. UTC | #10
tj <at> kernel.org <tj <at> kernel.org> writes:

> 
> Hello,
> 
> How did that go?
> 
> Thanks.
> 

Like Derry who started this thread, I too had seen an old thread from
October/November 2008 with what appeared to be no resolution to this problem. 
Now, finding this thread, again, with no apparent resolution to this problem.

I'm currently running Ubuntu 10.04 (lucid), kernel 2.6.32-33-generic.  I've no
experience with applying these git patches, and my searching to figure out how
it works have not helped.

I'm using an Addonics eSATA PCI-X controller with the SiI3124 chipset, and I
have an Addonics PM in an external enclosure, with a 5 bay/tray DAS.  Some of
my drives give me this problem: (this occurs for me with pretty much ALL 
Samsung hard drives)
[12888.470308] ata9.01: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xf
[12888.470313] ata9.01: SError: { PHYRdyChg CommWake DevExch }
[12888.470385] ata9.01: hard resetting link
[12889.211597] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[12889.211686] ata9.01: failed to IDENTIFY (I/O error, err_mask=0x11)
[12889.211692] ata9.15: hard resetting link
[12891.430086] ata9.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[12891.430397] ata9.00: hard resetting link
[12891.780786] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[12894.211103] ata9.01: hard resetting link
[12894.560424] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[12894.560466] ata9.02: hard resetting link
[12894.914176] ata9.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[12894.914222] ata9.03: hard resetting link
[12895.264141] ata9.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[12895.264169] ata9.04: hard resetting link
[12895.612930] ata9.04: SATA link down (SStatus 0 SControl 320)
[12895.613007] ata9.05: hard resetting link
[12895.964143] ata9.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[12896.065908] ata9.00: configured for UDMA/100
[12896.065970] ata9.01: failed to IDENTIFY (I/O error, err_mask=0x11)
[12896.065977] ata9.15: hard resetting link
[12898.283804] ata9.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[12898.284128] ata9.00: hard resetting link
[12898.634174] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[12899.562524] ata9.01: hard resetting link
[12899.914147] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[12899.914180] ata9.02: hard resetting link
[12900.261682] ata9.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[12900.261724] ata9.03: hard resetting link
[12900.610413] ata9.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[12900.961283] ata9.05: hard resetting link
[12901.310385] ata9.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[12901.397241] ata9.00: configured for UDMA/100
[12901.397300] ata9.01: failed to IDENTIFY (I/O error, err_mask=0x11)
[12901.397305] ata9.01: failed to recover link after 3 tries, disabling
[12901.397311] ata9.15: hard resetting link
[12903.613694] ata9.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[12903.960564] ata9.00: hard resetting link
[12904.311125] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[12905.260154] ata9.02: hard resetting link
[12905.602929] ata9.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[12905.611319] ata9.03: hard resetting link
[12905.962555] ata9.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[12905.962592] ata9.04: hard resetting link
[12906.312931] ata9.04: SATA link down (SStatus 0 SControl 320)
[12906.313004] ata9.05: hard resetting link
[12906.660409] ata9.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
[12906.753619] ata9.00: configured for UDMA/100
[12906.766586] ata9.02: configured for UDMA/100
[12906.771917] ata9.03: configured for UDMA/100
[12907.121462] ata9: EH complete

If I hot plug the same drive using a port directly off my mobo(no PM in the
mix), I get this result(drive connects/mounts/works):
[27997.097104] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 
0xe frozen
[27997.097108] ata5: irq_stat 0x00400040, connection status changed
[27997.097111] ata5: SError: { PHYRdyChg CommWake DevExch }
[27997.097116] ata5: hard resetting link
[28007.147622] ata5: softreset failed (device not ready)
[28007.147627] ata5: hard resetting link
[28010.630028] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[28010.748595] ata5.00: ATA-7: SAMSUNG HD154UI, 1AG01118, max UDMA7
[28010.748599] ata5.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[28010.755227] ata5.00: configured for UDMA/133
[28010.755237] ata5: EH complete
[28010.756338] scsi 4:0:0:0: Direct-Access     ATA      SAMSUNG HD154UI  1AG0
PQ: 0 ANSI: 5
[28010.756475] sd 4:0:0:0: Attached scsi generic sg10 type 0
[28010.756572] sd 4:0:0:0: [sdj] 2930277168 512-byte logical blocks: (1.50
TB/1.36 TiB)
[28010.756613] sd 4:0:0:0: [sdj] Write Protect is off
[28010.756616] sd 4:0:0:0: [sdj] Mode Sense: 00 3a 00 00
[28010.756636] sd 4:0:0:0: [sdj] Write cache: enabled, read cache: enabled,
doesn't support DPO or FUA
[28010.756760]  sdj: sdj1
[28010.816161] sd 4:0:0:0: [sdj] Attached SCSI disk

I've been using Ubuntu for a few years now, and have been living with the
problem...working around it with USB docking stations and such.  But, I'd
really hope to see/find this problem worked out.

Thoughts/tips/suggestions?  Since I'm pretty much a novice when it comes to
patching, a link to a guide for git patching would be appreciated too.

Thank You,
Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/ata/sata_sil24.c b/drivers/ata/sata_sil24.c
index 06c564e..cbd3475 100644
--- a/drivers/ata/sata_sil24.c
+++ b/drivers/ata/sata_sil24.c
@@ -896,6 +896,9 @@  static unsigned int sil24_qc_issue(struct ata_queued_cmd *qc)
 	dma_addr_t paddr;
 	void __iomem *activate;
 
+	if (qc->tf.command == ATA_CMD_ID_ATA)
+		ata_dev_printk(qc->dev, KERN_INFO, "issuing IDENTIFY\n");
+
 	paddr = pp->cmd_block_dma + tag * sizeof(*pp->cmd_block);
 	activate = port + PORT_CMD_ACTIVATE + tag * 8;
 
@@ -1081,9 +1084,12 @@  static void sil24_error_intr(struct ata_port *ap)
 		}
 
 		/* record error info */
-		if (qc)
+		if (qc) {
+			if (qc->tf.command == ATA_CMD_ID_ATA)
+				ata_dev_printk(qc->dev, KERN_INFO, "IDENTIFY failed, irq_stat=%#0x cerr=%u\n",
+					       irq_stat, cerr);
 			qc->err_mask |= err_mask;
-		else
+		} else
 			ehi->err_mask |= err_mask;
 
 		ehi->action |= action;