Message ID | 20110713143936.GQ2872@htj.dyndns.org |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Headers | show |
Hi: Here is the result after the patch of hotplugging WD 3TB drive (only drive): [ 92.201163] ata1.02: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xf [ 92.210719] ata1.02: SError: { PHYRdyChg CommWake DevExch } [ 92.220365] ata1.02: hard resetting link [ 92.980398] ata1.02: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [ 92.980449] ata1.02: issuing IDENTIFY [ 92.980476] ata1.02: IDENTIFY failed, irq_stat=0x60002 cerr=4 [ 92.980517] ata1.02: failed to IDENTIFY (I/O error, err_mask=0x11) [ 92.980526] ata1.15: hard resetting link [ 95.200045] ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0) [ 95.200327] ata1.00: hard resetting link [ 95.640059] ata1.00: link resume succeeded after 1 retries [ 95.760404] ata1.00: SATA link down (SStatus 0 SControl 310) [ 95.760481] ata1.01: hard resetting link [ 96.110392] ata1.01: SATA link down (SStatus 0 SControl 320) [ 97.980042] ata1.02: hard resetting link [ 98.330391] ata1.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 98.330437] ata1.03: hard resetting link [ 98.680386] ata1.03: SATA link down (SStatus 0 SControl 320) [ 98.680461] ata1.04: hard resetting link [ 99.030393] ata1.04: SATA link down (SStatus 0 SControl 320) [ 99.030468] ata1.05: hard resetting link [ 99.390388] ata1.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320) [ 99.390438] ata1.02: issuing IDENTIFY [ 99.390464] ata1.02: IDENTIFY failed, irq_stat=0x60002 cerr=4 [ 99.390506] ata1.02: failed to IDENTIFY (I/O error, err_mask=0x11) [ 99.390515] ata1.15: hard resetting link [ 101.610048] ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0) [ 101.610325] ata1.00: hard resetting link [ 102.050070] ata1.00: link resume succeeded after 1 retries [ 102.170389] ata1.00: SATA link down (SStatus 0 SControl 310) [ 102.170466] ata1.01: hard resetting link [ 102.520390] ata1.01: SATA link down (SStatus 0 SControl 320) [ 103.330025] ata1.02: hard resetting link [ 103.680391] ata1.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 104.390026] ata1.05: hard resetting link [ 104.740391] ata1.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320) [ 104.740442] ata1.02: issuing IDENTIFY [ 104.740468] ata1.02: IDENTIFY failed, irq_stat=0x60002 cerr=4 [ 104.740510] ata1.02: failed to IDENTIFY (I/O error, err_mask=0x11) [ 104.740518] ata1.15: hard resetting link [ 106.960046] ata1.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0) [ 106.960329] ata1.00: hard resetting link [ 107.400059] ata1.00: link resume succeeded after 1 retries [ 107.520397] ata1.00: SATA link down (SStatus 0 SControl 310) [ 107.520472] ata1.01: hard resetting link [ 107.870392] ata1.01: SATA link down (SStatus 0 SControl 320) [ 108.680021] ata1.02: hard resetting link [ 109.030393] ata1.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 109.030439] ata1.03: hard resetting link [ 109.380397] ata1.03: SATA link down (SStatus 0 SControl 320) [ 109.380473] ata1.04: hard resetting link [ 109.730395] ata1.04: SATA link down (SStatus 0 SControl 320) [ 109.741269] ata1.05: hard resetting link [ 110.090393] ata1.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320) [ 110.090443] ata1.02: issuing IDENTIFY [ 110.118499] ata1.02: ATA-8: WDC WD30EZRX-00MMMB0, 80.00A80, max UDMA/133 [ 110.118506] ata1.02: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32) [ 110.118642] ata1.02: issuing IDENTIFY [ 110.123817] ata1.02: configured for UDMA/100 [ 110.123904] ata1: EH complete [ 110.124108] scsi 0:2:0:0: Direct-Access ATA WDC WD30EZRX-00M 80.0 PQ: 0 ANSI: 5 [ 110.124554] sd 0:2:0:0: [sda] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB) [ 110.124578] sd 0:2:0:0: Attached scsi generic sg0 type 0 [ 110.124596] sd 0:2:0:0: [sda] 4096-byte physical blocks [ 110.124735] sd 0:2:0:0: [sda] Write Protect is off [ 110.124742] sd 0:2:0:0: [sda] Mode Sense: 00 3a 00 00 [ 110.124813] sd 0:2:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 110.125220] sda: sda1 sda2 [ 110.170313] sd 0:2:0:0: [sda] Attached SCSI disk [ 110.172439] ata1.02: issuing IDENTIFY [ 110.250997] ata1.02: issuing IDENTIFY [ 110.274885] ata1.02: issuing IDENTIFY > -----Original Message----- > From: Tejun Heo [mailto:htejun@gmail.com] On Behalf Of tj@kernel.org > Sent: Wednesday, July 13, 2011 7:40 AM > To: Derry Bryson > Cc: linux-ide@vger.kernel.org > Subject: Re: Problem w/ hotplug on sata_sil24 w/ PMP (sil3726) > > Hello, again. > > On Wed, Jul 13, 2011 at 03:34:25PM +0200, tj@kernel.org wrote: > > I see. Yeah, it doesn't seem to be PHY flickering. It looks like > > IDENTIFY is failing before disk finishes spinning up, which isn't the > > correct behavior. SRST is supposed to wait for disk spin up > > completion. I don't know why sil24 softreset is failing without > > waiting for spinup. I'll see whether I can reproduce the problem > > here. > > I played with it a bit but couldn't reproduce the problem; however, > PMP EH does lack per-device retry back-off. On host ports, when > reset/identification fails for whatever reason, retry is backed off > such that even if controller/device interaction is faulty during spin > up, EH always retries after sufficient time is given to the device. > With PMP EH, this logic is missing partly because PMP EH sometimes > already takes too long and partly because tracking the timestamps > seemed a bit complex during the initial implementation. > > So, this definitely contributes to why PMP EH fails to work around the > misinteraction between the drive and port and should probably be > improved, but let's also find out why the controller isn't waiting for > the drive after issuing IDENTIFY. Can you please apply the following > patch, reproduce the problem and report the kernel log? > > Thank you. > > diff --git a/drivers/ata/sata_sil24.c b/drivers/ata/sata_sil24.c > index 06c564e..cbd3475 100644 > --- a/drivers/ata/sata_sil24.c > +++ b/drivers/ata/sata_sil24.c > @@ -896,6 +896,9 @@ static unsigned int sil24_qc_issue(struct > ata_queued_cmd *qc) > dma_addr_t paddr; > void __iomem *activate; > > + if (qc->tf.command == ATA_CMD_ID_ATA) > + ata_dev_printk(qc->dev, KERN_INFO, "issuing IDENTIFY\n"); > + > paddr = pp->cmd_block_dma + tag * sizeof(*pp->cmd_block); > activate = port + PORT_CMD_ACTIVATE + tag * 8; > > @@ -1081,9 +1084,12 @@ static void sil24_error_intr(struct ata_port > *ap) > } > > /* record error info */ > - if (qc) > + if (qc) { > + if (qc->tf.command == ATA_CMD_ID_ATA) > + ata_dev_printk(qc->dev, KERN_INFO, "IDENTIFY > failed, irq_stat=%#0x cerr=%u\n", > + irq_stat, cerr); > qc->err_mask |= err_mask; > - else > + } else > ehi->err_mask |= err_mask; > > ehi->action |= action; > > > ----- > No virus found in this message. > Checked by AVG - www.avg.com > Version: 10.0.1390 / Virus Database: 1516/3761 - Release Date: 07/12/11 Derry Bryson Sr. Software Engineer Highly Reliable Systems www.High-Rely.com "Making Backup Invisible" Phone: 775.329.5139 X129 Fax: 775.370.1001 Sales & Support: 877.384.6838 This e-mail and any files transmitted with it are the property of Highly Reliable Systems, Inc. This (email) information is confidential, and is intended solely for the use of the individual or entity to whom this e-mail is addressed. If you are not one of the named recipient(s) or otherwise have reason to believe that you have received this message in error, please notify the sender and delete this message immediately from your computer. Any other use, disclosure, retention, dissemination, forwarding, printing, distribution or copying of this e-mail is strictly prohibited. -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello,
On Wed, Jul 13, 2011 at 04:23:27PM +0000, Derry Bryson wrote:
> [ 92.980476] ata1.02: IDENTIFY failed, irq_stat=0x60002 cerr=4
The controller is saying that it failed to transmit command FIS
because PHYRDY state changed, so that is PHY flickering. Can you
please bump the second debounce timing to 1000 (not only the first one
but always) and see whether the failure pattern changes?
Thanks.
Hi: Dmesg with second timing value set to 1000 always, hotplugging WD 3TB drive: [ 139.145917] ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xf [ 139.156180] ata5.00: SError: { PHYRdyChg CommWake DevExch } [ 139.166619] ata5.00: hard resetting link [ 139.166639] ata5.00: XXXX hardreset hotplugged = true [ 139.166646] ata5.00: XXX0 hardreset debounce 25 500 2000 [ 139.166651] ata5.00: XXX1 hardreset debounce 25 1000 2000 [ 140.430401] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [ 140.430453] ata5.00: issuing IDENTIFY [ 140.430480] ata5.00: IDENTIFY failed, irq_stat=0x60002 cerr=4 [ 140.430521] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x11) [ 140.430532] ata5.15: hard resetting link [ 142.650047] ata5.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0) [ 145.431276] ata5.00: hard resetting link [ 145.431298] ata5.00: XXXX hardreset hotplugged = false [ 145.431304] ata5.00: XXX0 hardreset debounce 5 100 2000 [ 145.431310] ata5.00: XXX1 hardreset debounce 5 1000 2000 [ 146.690413] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [ 146.690461] ata5.01: hard resetting link [ 146.690481] ata5.01: XXXX hardreset hotplugged = false [ 146.690487] ata5.01: XXX0 hardreset debounce 5 100 2000 [ 146.690492] ata5.01: XXX1 hardreset debounce 5 1000 2000 [ 147.940402] ata5.01: SATA link down (SStatus 0 SControl 320) [ 147.940481] ata5.02: hard resetting link [ 147.940501] ata5.02: XXXX hardreset hotplugged = false [ 147.940507] ata5.02: XXX0 hardreset debounce 5 100 2000 [ 147.940513] ata5.02: XXX1 hardreset debounce 5 1000 2000 [ 149.190398] ata5.02: SATA link down (SStatus 0 SControl 320) [ 149.190476] ata5.03: hard resetting link [ 149.190497] ata5.03: XXXX hardreset hotplugged = false [ 149.190503] ata5.03: XXX0 hardreset debounce 5 100 2000 [ 149.190508] ata5.03: XXX1 hardreset debounce 5 1000 2000 [ 150.440399] ata5.03: SATA link down (SStatus 0 SControl 320) [ 150.440478] ata5.04: hard resetting link [ 150.440499] ata5.04: XXXX hardreset hotplugged = false [ 150.440505] ata5.04: XXX0 hardreset debounce 5 100 2000 [ 150.440510] ata5.04: XXX1 hardreset debounce 5 1000 2000 [ 151.690410] ata5.04: SATA link down (SStatus 0 SControl 320) [ 151.690488] ata5.05: hard resetting link [ 151.690509] ata5.05: XXXX hardreset hotplugged = false [ 151.690515] ata5.05: XXX0 hardreset debounce 5 100 2000 [ 151.690520] ata5.05: XXX1 hardreset debounce 5 1000 2000 [ 152.940399] ata5.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320) [ 152.940451] ata5.00: issuing IDENTIFY [ 152.940477] ata5.00: IDENTIFY failed, irq_stat=0x60002 cerr=4 [ 152.940518] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x11) [ 152.940529] ata5.15: hard resetting link [ 155.160047] ata5.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0) [ 155.160327] ata5.00: hard resetting link [ 155.160347] ata5.00: XXXX hardreset hotplugged = false [ 155.160353] ata5.00: XXX0 hardreset debounce 5 100 2000 [ 155.160358] ata5.00: XXX1 hardreset debounce 5 1000 2000 [ 156.410398] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [ 157.940026] ata5.05: hard resetting link [ 157.940048] ata5.05: XXXX hardreset hotplugged = false [ 157.940054] ata5.05: XXX0 hardreset debounce 5 100 2000 [ 157.940059] ata5.05: XXX1 hardreset debounce 5 1000 2000 [ 159.190398] ata5.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320) [ 159.190450] ata5.00: issuing IDENTIFY [ 159.199553] ata5.00: ATA-8: WDC WD30EZRX-00MMMB0, 80.00A80, max UDMA/133 [ 159.199560] ata5.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32) [ 159.199702] ata5.00: issuing IDENTIFY [ 159.205571] ata5.00: configured for UDMA/100 [ 159.205661] ata5: EH complete [ 159.205857] scsi 4:0:0:0: Direct-Access ATA WDC WD30EZRX-00M 80.0 PQ: 0 ANSI: 5 [ 159.206297] sd 4:0:0:0: [sdb] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB) [ 159.206305] sd 4:0:0:0: [sdb] 4096-byte physical blocks [ 159.206319] sd 4:0:0:0: Attached scsi generic sg1 type 0 [ 159.206440] sd 4:0:0:0: [sdb] Write Protect is off [ 159.206447] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00 [ 159.206519] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 159.206938] sdb: sdb1 sdb2 [ 159.251508] sd 4:0:0:0: [sdb] Attached SCSI disk [ 159.254218] ata5.00: issuing IDENTIFY [ 159.330233] ata5.00: issuing IDENTIFY [ 159.350570] ata5.00: issuing IDENTIFY > -----Original Message----- > From: Tejun Heo [mailto:htejun@gmail.com] On Behalf Of tj@kernel.org > Sent: Thursday, July 14, 2011 12:15 AM > To: Derry Bryson > Cc: linux-ide@vger.kernel.org > Subject: Re: Problem w/ hotplug on sata_sil24 w/ PMP (sil3726) > > Hello, > > On Wed, Jul 13, 2011 at 04:23:27PM +0000, Derry Bryson wrote: > > [ 92.980476] ata1.02: IDENTIFY failed, irq_stat=0x60002 cerr=4 > > The controller is saying that it failed to transmit command FIS > because PHYRDY state changed, so that is PHY flickering. Can you > please bump the second debounce timing to 1000 (not only the first one > but always) and see whether the failure pattern changes? > > Thanks. > > -- > tejun > Derry Bryson Sr. Software Engineer Highly Reliable Systems www.High-Rely.com "Making Backup Invisible" Phone: 775.329.5139 X129 Fax: 775.370.1001 Sales & Support: 877.384.6838 This e-mail and any files transmitted with it are the property of Highly Reliable Systems, Inc. This (email) information is confidential, and is intended solely for the use of the individual or entity to whom this e-mail is addressed. If you are not one of the named recipient(s) or otherwise have reason to believe that you have received this message in error, please notify the sender and delete this message immediately from your computer. Any other use, disclosure, retention, dissemination, forwarding, printing, distribution or copying of this e-mail is strictly prohibited. -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jul 14, 2011 at 05:37:37PM +0000, Derry Bryson wrote: > Hi: > > Dmesg with second timing value set to 1000 always, hotplugging WD 3TB drive: > > [ 139.145917] ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xf > [ 139.156180] ata5.00: SError: { PHYRdyChg CommWake DevExch } > [ 139.166619] ata5.00: hard resetting link > [ 140.430401] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320) > [ 140.430453] ata5.00: issuing IDENTIFY > [ 140.430480] ata5.00: IDENTIFY failed, irq_stat=0x60002 cerr=4 > [ 140.430521] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x11) ... > [ 145.431276] ata5.00: hard resetting link > [ 145.431298] ata5.00: XXXX hardreset hotplugged = false > [ 145.431304] ata5.00: XXX0 hardreset debounce 5 100 2000 > [ 145.431310] ata5.00: XXX1 hardreset debounce 5 1000 2000 > [ 146.690413] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320) ... > [ 152.940451] ata5.00: issuing IDENTIFY > [ 152.940477] ata5.00: IDENTIFY failed, irq_stat=0x60002 cerr=4 > [ 152.940518] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x11) So, no change. Weird. I think we need to implement retry backoff properly anyways but am still wondering why probing failure behaves differently for you. I'll try with similar drive but maybe it's difference in PMP firmware? Thanks.
Tejun: Here are the specs on the port multiplier board: Lycom ST-126RM (http://www.lycom.com.tw/) Sil3726 chip revision 17 EEPROM FW Version: 1.0114 Also there is another problem: ports that are not connected at power on are ignored until a port that was connected at power on is hotplugged. So, for instance, if I turn the unit on with only a drive on the first port, the other four ports will not work and I don't even see kernel messages when hotplugging until I hotplug the first port. I haven't looked into this yet, but it would seem the driver must be disabling the ports and they are not looked at again until the hard reset is performed. Let me know if I can help in any way. Thanks, Derry Bryson Sr. Software Engineer Highly Reliable Systems www.High-Rely.com "Making Backup Invisible" Phone: 775.329.5139 X129 Fax: 775.370.1001 Sales & Support: 877.384.6838 This e-mail and any files transmitted with it are the property of Highly Reliable Systems, Inc. This (email) information is confidential, and is intended solely for the use of the individual or entity to whom this e-mail is addressed. If you are not one of the named recipient(s) or otherwise have reason to believe that you have received this message in error, please notify the sender and delete this message immediately from your computer. Any other use, disclosure, retention, dissemination, forwarding, printing, distribution or copying of this e-mail is strictly prohibited. > -----Original Message----- > From: Tejun Heo [mailto:htejun@gmail.com] On Behalf Of tj@kernel.org > Sent: Thursday, July 21, 2011 2:00 AM > To: Derry Bryson > Cc: linux-ide@vger.kernel.org > Subject: Re: Problem w/ hotplug on sata_sil24 w/ PMP (sil3726) > > On Thu, Jul 14, 2011 at 05:37:37PM +0000, Derry Bryson wrote: > > Hi: > > > > Dmesg with second timing value set to 1000 always, hotplugging WD 3TB > drive: > > > > [ 139.145917] ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x4050000 > action 0xf > > [ 139.156180] ata5.00: SError: { PHYRdyChg CommWake DevExch } > > [ 139.166619] ata5.00: hard resetting link > > [ 140.430401] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl > 320) > > [ 140.430453] ata5.00: issuing IDENTIFY > > [ 140.430480] ata5.00: IDENTIFY failed, irq_stat=0x60002 cerr=4 > > [ 140.430521] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x11) > ... > > [ 145.431276] ata5.00: hard resetting link > > [ 145.431298] ata5.00: XXXX hardreset hotplugged = false > > [ 145.431304] ata5.00: XXX0 hardreset debounce 5 100 2000 > > [ 145.431310] ata5.00: XXX1 hardreset debounce 5 1000 2000 > > [ 146.690413] ata5.00: SATA link up 3.0 Gbps (SStatus 123 SControl > 320) > ... > > [ 152.940451] ata5.00: issuing IDENTIFY > > [ 152.940477] ata5.00: IDENTIFY failed, irq_stat=0x60002 cerr=4 > > [ 152.940518] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x11) > > So, no change. Weird. I think we need to implement retry backoff > properly anyways but am still wondering why probing failure behaves > differently for you. I'll try with similar drive but maybe it's > difference in PMP firmware? > > Thanks. > > -- > tejun > > > ----- > No virus found in this message. > Checked by AVG - www.avg.com > Version: 10.0.1390 / Virus Database: 1518/3777 - Release Date: 07/20/11 -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Thu, Jul 21, 2011 at 04:22:07PM +0000, Derry Bryson wrote: > Here are the specs on the port multiplier board: > > Lycom ST-126RM (http://www.lycom.com.tw/) > Sil3726 chip revision 17 > EEPROM FW Version: 1.0114 > > Also there is another problem: ports that are not connected at power on > are ignored until a port that was connected at power on is hotplugged. So, > for instance, if I turn the unit on with only a drive on the first port, the > other four ports will not work and I don't even see kernel messages when > hotplugging until I hotplug the first port. I haven't looked into this yet, > but it would seem the driver must be disabling the ports and they are not > looked at again until the hard reset is performed. That's most likely the sil pmp itself. It has been known to show erratic behavior if the first port is not occupied during power up. I don't know whether later firmware updates fixed that tho. Can you get hold of a marvell PMP and see how it behaves? Thanks.
> -----Original Message----- > From: Tejun Heo [mailto:htejun@gmail.com] On Behalf Of tj@kernel.org > Sent: Friday, July 22, 2011 2:51 AM > To: Derry Bryson > Cc: linux-ide@vger.kernel.org > Subject: Re: Problem w/ hotplug on sata_sil24 w/ PMP (sil3726) > > Hello, > > On Thu, Jul 21, 2011 at 04:22:07PM +0000, Derry Bryson wrote: > > Here are the specs on the port multiplier board: > > > > Lycom ST-126RM (http://www.lycom.com.tw/) > > Sil3726 chip revision 17 > > EEPROM FW Version: 1.0114 > > > > Also there is another problem: ports that are not connected at power > on > > are ignored until a port that was connected at power on is > hotplugged. So, > > for instance, if I turn the unit on with only a drive on the first > port, the > > other four ports will not work and I don't even see kernel messages > when > > hotplugging until I hotplug the first port. I haven't looked into > this yet, > > but it would seem the driver must be disabling the ports and they are > not > > looked at again until the hard reset is performed. > > That's most likely the sil pmp itself. It has been known to show > erratic behavior if the first port is not occupied during power up. I > don't know whether later firmware updates fixed that tho. Can you get > hold of a marvell PMP and see how it behaves? > That may well be a problem, but is not the problem in this case since it is not limited to port 0. Seems like the sil driver or pmp code in the kernel is disabling the ports that are not connected at pmp power on and not looking at them again until we hotplug one of the ports that was connected and cause a hard reset or something like that. For instance, I turn off the pmp unit and connect drives to ports 0, 1, and 2 then power on the unit. Linux sees the pmp unit and finds the three drives. Then I connect a drive to port 3 and Linux reports nothing. No interrupts, no nothing. Same with port 4. However if I connect a drive to port 3 or 4 (or both) and hotplug the a drive on any of ports 0, 1, or 2 now the drives are detected after the hard reset. This hardware works fine under Windows, hotplugging drives and powering the unit off and on, etc. using the same Sil3124 eSATA PCI controller. No problems, so I think the hardware is okay. I don't know of nor have a Marvell PMP, but I do have another Sil3726 PMP board with newer firmware and a board with a JMicron PMP that I am going to test with. Thanks for your help. Derry Bryson Sr. Software Engineer Highly Reliable Systems www.High-Rely.com "Making Backup Invisible" Phone: 775.329.5139 X129 Fax: 775.370.1001 Sales & Support: 877.384.6838 This e-mail and any files transmitted with it are the property of Highly Reliable Systems, Inc. This (email) information is confidential, and is intended solely for the use of the individual or entity to whom this e-mail is addressed. If you are not one of the named recipient(s) or otherwise have reason to believe that you have received this message in error, please notify the sender and delete this message immediately from your computer. Any other use, disclosure, retention, dissemination, forwarding, printing, distribution or copying of this e-mail is strictly prohibited. -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I did some testing with a PMP board containing a Sil3726 with firmware Version 1.0124 and a PMP/Raid board with a JMicron 393 configured as PMP only. I am connecting to a Sil3124 esata pci controller. The Sil3726 board acted the same as the previous Sil3726 board with the 1.0114 firmware. The JMicron board worked much better. It works properly with a stock kernel and does not fail the IDENTIFY after hotplugging and does not require retries. It seems to wait for the drive to spin up before doing anything so it works properly the first time without retries or hard resets. The only problem I saw was that it seemed to not work at all a couple of times if not powered up/connected before booting Linux. Couldn't see it at all, even after several power cycles and unplugging/pluggin esata cable. I need to test this more as it is inconsistent and may have to do with the Sil3124 controller rather than the JMicron PMP. Derry Bryson Sr. Software Engineer Highly Reliable Systems www.High-Rely.com "Making Backup Invisible" Phone: 775.329.5139 X129 Fax: 775.370.1001 Sales & Support: 877.384.6838 This e-mail and any files transmitted with it are the property of Highly Reliable Systems, Inc. This (email) information is confidential, and is intended solely for the use of the individual or entity to whom this e-mail is addressed. If you are not one of the named recipient(s) or otherwise have reason to believe that you have received this message in error, please notify the sender and delete this message immediately from your computer. Any other use, disclosure, retention, dissemination, forwarding, printing, distribution or copying of this e-mail is strictly prohibited. > -----Original Message----- > From: Tejun Heo [mailto:htejun@gmail.com] On Behalf Of tj@kernel.org > Sent: Friday, July 22, 2011 2:51 AM > To: Derry Bryson > Cc: linux-ide@vger.kernel.org > Subject: Re: Problem w/ hotplug on sata_sil24 w/ PMP (sil3726) > > Hello, > > On Thu, Jul 21, 2011 at 04:22:07PM +0000, Derry Bryson wrote: > > Here are the specs on the port multiplier board: > > > > Lycom ST-126RM (http://www.lycom.com.tw/) > > Sil3726 chip revision 17 > > EEPROM FW Version: 1.0114 > > > > Also there is another problem: ports that are not connected at power > on > > are ignored until a port that was connected at power on is > hotplugged. So, > > for instance, if I turn the unit on with only a drive on the first > port, the > > other four ports will not work and I don't even see kernel messages > when > > hotplugging until I hotplug the first port. I haven't looked into > this yet, > > but it would seem the driver must be disabling the ports and they are > not > > looked at again until the hard reset is performed. > > That's most likely the sil pmp itself. It has been known to show > erratic behavior if the first port is not occupied during power up. I > don't know whether later firmware updates fixed that tho. Can you get > hold of a marvell PMP and see how it behaves? > > Thanks. > > -- > tejun > > > ----- > No virus found in this message. > Checked by AVG - www.avg.com > Version: 10.0.1390 / Virus Database: 1518/3779 - Release Date: 07/21/11 -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, On Fri, Jul 22, 2011 at 07:34:37PM +0000, Derry Bryson wrote: > I did some testing with a PMP board containing a Sil3726 with firmware > Version 1.0124 and a PMP/Raid board with a JMicron 393 configured as > PMP only. I am connecting to a Sil3124 esata pci controller. > > The Sil3726 board acted the same as the previous Sil3726 board with the 1.0114 > firmware. > > The JMicron board worked much better. It works properly with a stock > kernel and does not fail the IDENTIFY after hotplugging and does > not require retries. It seems to wait for the drive to spin up before > doing anything so it works properly the first time without retries or > hard resets. Interesting, that's the behavior I see w/ my 3726 but all my PMPs are very old, so I guess it has changed since then. A bit too lazy to hook it up to windows machine to check firmware version ATM. :) > The only problem I saw was that it seemed to not work at all a > couple of times if not powered up/connected before booting Linux. > Couldn't see it at all, even after several power cycles and > unplugging/pluggin esata cable. I need to test this more as it is > inconsistent and may have to do with the Sil3124 controller rather > than the JMicron PMP. How did that go? Thanks.
tj <at> kernel.org <tj <at> kernel.org> writes: > > Hello, > > How did that go? > > Thanks. > Like Derry who started this thread, I too had seen an old thread from October/November 2008 with what appeared to be no resolution to this problem. Now, finding this thread, again, with no apparent resolution to this problem. I'm currently running Ubuntu 10.04 (lucid), kernel 2.6.32-33-generic. I've no experience with applying these git patches, and my searching to figure out how it works have not helped. I'm using an Addonics eSATA PCI-X controller with the SiI3124 chipset, and I have an Addonics PM in an external enclosure, with a 5 bay/tray DAS. Some of my drives give me this problem: (this occurs for me with pretty much ALL Samsung hard drives) [12888.470308] ata9.01: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xf [12888.470313] ata9.01: SError: { PHYRdyChg CommWake DevExch } [12888.470385] ata9.01: hard resetting link [12889.211597] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [12889.211686] ata9.01: failed to IDENTIFY (I/O error, err_mask=0x11) [12889.211692] ata9.15: hard resetting link [12891.430086] ata9.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0) [12891.430397] ata9.00: hard resetting link [12891.780786] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [12894.211103] ata9.01: hard resetting link [12894.560424] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [12894.560466] ata9.02: hard resetting link [12894.914176] ata9.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [12894.914222] ata9.03: hard resetting link [12895.264141] ata9.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [12895.264169] ata9.04: hard resetting link [12895.612930] ata9.04: SATA link down (SStatus 0 SControl 320) [12895.613007] ata9.05: hard resetting link [12895.964143] ata9.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320) [12896.065908] ata9.00: configured for UDMA/100 [12896.065970] ata9.01: failed to IDENTIFY (I/O error, err_mask=0x11) [12896.065977] ata9.15: hard resetting link [12898.283804] ata9.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0) [12898.284128] ata9.00: hard resetting link [12898.634174] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [12899.562524] ata9.01: hard resetting link [12899.914147] ata9.01: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [12899.914180] ata9.02: hard resetting link [12900.261682] ata9.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [12900.261724] ata9.03: hard resetting link [12900.610413] ata9.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [12900.961283] ata9.05: hard resetting link [12901.310385] ata9.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320) [12901.397241] ata9.00: configured for UDMA/100 [12901.397300] ata9.01: failed to IDENTIFY (I/O error, err_mask=0x11) [12901.397305] ata9.01: failed to recover link after 3 tries, disabling [12901.397311] ata9.15: hard resetting link [12903.613694] ata9.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0) [12903.960564] ata9.00: hard resetting link [12904.311125] ata9.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [12905.260154] ata9.02: hard resetting link [12905.602929] ata9.02: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [12905.611319] ata9.03: hard resetting link [12905.962555] ata9.03: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [12905.962592] ata9.04: hard resetting link [12906.312931] ata9.04: SATA link down (SStatus 0 SControl 320) [12906.313004] ata9.05: hard resetting link [12906.660409] ata9.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320) [12906.753619] ata9.00: configured for UDMA/100 [12906.766586] ata9.02: configured for UDMA/100 [12906.771917] ata9.03: configured for UDMA/100 [12907.121462] ata9: EH complete If I hot plug the same drive using a port directly off my mobo(no PM in the mix), I get this result(drive connects/mounts/works): [27997.097104] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen [27997.097108] ata5: irq_stat 0x00400040, connection status changed [27997.097111] ata5: SError: { PHYRdyChg CommWake DevExch } [27997.097116] ata5: hard resetting link [28007.147622] ata5: softreset failed (device not ready) [28007.147627] ata5: hard resetting link [28010.630028] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [28010.748595] ata5.00: ATA-7: SAMSUNG HD154UI, 1AG01118, max UDMA7 [28010.748599] ata5.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32) [28010.755227] ata5.00: configured for UDMA/133 [28010.755237] ata5: EH complete [28010.756338] scsi 4:0:0:0: Direct-Access ATA SAMSUNG HD154UI 1AG0 PQ: 0 ANSI: 5 [28010.756475] sd 4:0:0:0: Attached scsi generic sg10 type 0 [28010.756572] sd 4:0:0:0: [sdj] 2930277168 512-byte logical blocks: (1.50 TB/1.36 TiB) [28010.756613] sd 4:0:0:0: [sdj] Write Protect is off [28010.756616] sd 4:0:0:0: [sdj] Mode Sense: 00 3a 00 00 [28010.756636] sd 4:0:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [28010.756760] sdj: sdj1 [28010.816161] sd 4:0:0:0: [sdj] Attached SCSI disk I've been using Ubuntu for a few years now, and have been living with the problem...working around it with USB docking stations and such. But, I'd really hope to see/find this problem worked out. Thoughts/tips/suggestions? Since I'm pretty much a novice when it comes to patching, a link to a guide for git patching would be appreciated too. Thank You, Mike -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/ata/sata_sil24.c b/drivers/ata/sata_sil24.c index 06c564e..cbd3475 100644 --- a/drivers/ata/sata_sil24.c +++ b/drivers/ata/sata_sil24.c @@ -896,6 +896,9 @@ static unsigned int sil24_qc_issue(struct ata_queued_cmd *qc) dma_addr_t paddr; void __iomem *activate; + if (qc->tf.command == ATA_CMD_ID_ATA) + ata_dev_printk(qc->dev, KERN_INFO, "issuing IDENTIFY\n"); + paddr = pp->cmd_block_dma + tag * sizeof(*pp->cmd_block); activate = port + PORT_CMD_ACTIVATE + tag * 8; @@ -1081,9 +1084,12 @@ static void sil24_error_intr(struct ata_port *ap) } /* record error info */ - if (qc) + if (qc) { + if (qc->tf.command == ATA_CMD_ID_ATA) + ata_dev_printk(qc->dev, KERN_INFO, "IDENTIFY failed, irq_stat=%#0x cerr=%u\n", + irq_stat, cerr); qc->err_mask |= err_mask; - else + } else ehi->err_mask |= err_mask; ehi->action |= action;