Message ID | 20091026162011.GA3289@frolo.macqel |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
On 10/26/2009 10:20 AM, Philippe De Muyter wrote:
> Hi,
>
> I just encountered a problem with write-access to a batch of CF cards
> (KINGSTON TECHNOLOGY 4GB COMPACT FLASH CF/4GB
> 3.3V/5V 9904321 - 006.AOOLF 4449081 - 1219643 X001 ASSY IN TAIWAN (c) 2008)
> connected to a PC-CARD / PCMCIA interface, with the following error messages :
>
> hda: status timeout: status=0xd0 { Busy }
> ide: failed opcode was: unknown
> hda: no DRQ after issuing MULTWRITE
>
> After testing with different bigger values for the WAIT_DRQ timeout value,
> the problem disappeared. I had success with WAIT_DRQ = 500ms, then with
> WAIT_DRQ = 300ms. I then tested with WAIT_DRQ = 200ms, but the problem
> reappeared. So I kept the 300ms value.
>
> Signed-off-by: Philippe De Muyter<phdm@macqel.be>
>
> diff -r a145344bb228 include/linux/ide.h
> --- a/include/linux/ide.h Thu Oct 22 08:28:28 2009 +0900
> +++ b/include/linux/ide.h Mon Oct 26 16:51:23 2009 +0100
> @@ -125,8 +125,8 @@
> * Timeouts for various operations:
> */
> enum {
> - /* spec allows up to 20ms */
> - WAIT_DRQ = HZ / 10, /* 100ms */
> + /* spec allows up to 20ms, but some CF cards need more than 200ms */
> + WAIT_DRQ = 3 * HZ / 10, /* 300ms */
> /* some laptops are very slow */
> WAIT_READY = 5 * HZ, /* 5s */
> /* should be less than 3ms (?), if all ATAPI CD is closed at boot */

This has come up before:

http://marc.info/?l=linux-ide&m=123064513313466&w=2

I think this timeout should not even exist. libata has no such timeout
(only the overall command completion timeout), and I can't find any
reference in current ATA specs to the device being required to raise
DRQ in any particular amount of time.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Robert Hancock <hancockrwd@gmail.com>
Date: Mon, 26 Oct 2009 18:34:57 -0600

> This has come up before:
>
> http://marc.info/?l=linux-ide&m=123064513313466&w=2
>
> I think this timeout should not even exist. libata has no such timeout
> (only the overall command completion timeout), and I can't find any
> reference in current ATA specs to the device being required to raise
> DRQ in any particular amount of time.

So is the issue that, whilst we should wait for BUSY to clear,
waiting around for DRQ is unreasonable?

It seems that WAIT_DRQ is passed to ide_wait_stat() but that
only controls how long we wait for BUSY to clear, the ATA_DRQ
'bad' bit we pass there only gets probed in a fixed limit loop:

	for (i = 0; i < 10; i++) {
		udelay(1);
		stat = tp_ops->read_status(hwif);

		if (OK_STAT(stat, good, bad)) {
			*rstat = stat;
			return 0;
		}
	}
	*rstat = stat;
	return -EFAULT;

Therefore, if increasing WAIT_DRQ helps things for people, it's
because the BUSY bit needs that much time to clear in these
cases.

The talking in that thread seems to state that the ATA layer
waits only for BUSY to clear, it does not wait for DRQ. But
from the data we're seeing here, it is in fact BUSY which needs
so much more time to clear so removing the DRQ bit probe to
be more like ATA won't fix anything.
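The two-phase behaviour described in the message above can be modelled in userspace. What follows is only a sketch under stated assumptions, not the kernel's code: the device, the clock and every name in it (sim_dev, sim_read_status(), sim_wait_stat(), the WAIT_* return codes) are hypothetical, and time advances in simulated 1 us steps. It shows that the timeout argument bounds only the wait for BSY to clear, while the good/bad bits are probed a fixed ten times afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ST_BSY 0x80u /* ATA status register: busy */
#define ST_DRQ 0x08u /* ATA status register: data request */
#define ST_ERR 0x01u /* ATA status register: error */

enum { WAIT_OK = 0, WAIT_BUSY = -1, WAIT_BAD = -2 }; /* hypothetical codes */

/* Hypothetical simulated device: BSY until busy_until_us, DRQ from drq_at_us. */
struct sim_dev {
	uint64_t now_us;
	uint64_t busy_until_us;
	uint64_t drq_at_us;
};

uint8_t sim_read_status(const struct sim_dev *d)
{
	if (d->now_us < d->busy_until_us)
		return ST_BSY;
	return d->now_us >= d->drq_at_us ? ST_DRQ : 0;
}

int sim_wait_stat(struct sim_dev *d, uint8_t good, uint8_t bad,
		  uint64_t timeout_us, uint8_t *rstat)
{
	uint64_t deadline = d->now_us + timeout_us;
	uint8_t stat;

	assert(rstat != NULL);

	/* Phase 1: the timeout bounds only the wait for BSY to clear. */
	while ((stat = sim_read_status(d)) & ST_BSY) {
		if (d->now_us > deadline) {
			*rstat = stat;
			return WAIT_BUSY;
		}
		d->now_us += 1; /* simulated tight polling, 1 us per read */
	}

	/* Phase 2: ten probes 1 us apart, independent of timeout_us. */
	for (int i = 0; i < 10; i++) {
		d->now_us += 1;
		stat = sim_read_status(d);
		if ((stat & (good | bad)) == good) {
			*rstat = stat;
			return WAIT_OK;
		}
	}
	*rstat = stat;
	return WAIT_BAD;
}
```

In this model, a device that stays busy for 250 ms fails with a 100 ms timeout and succeeds with a 300 ms one, whereas a device that drops BSY quickly but raises DRQ more than ten probes later fails no matter how large the timeout is.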
On Mon, Oct 26, 2009 at 6:45 PM, David Miller <davem@davemloft.net> wrote:
> From: Robert Hancock <hancockrwd@gmail.com>
> Date: Mon, 26 Oct 2009 18:34:57 -0600
>
>> This has come up before:
>>
>> http://marc.info/?l=linux-ide&m=123064513313466&w=2
>>
>> I think this timeout should not even exist. libata has no such timeout
>> (only the overall command completion timeout), and I can't find any
>> reference in current ATA specs to the device being required to raise
>> DRQ in any particular amount of time.
>
> So is the issue that, whilst we should wait for BUSY to clear,
> waiting around for DRQ is unreasonable?
>
> It seems that WAIT_DRQ is passed to ide_wait_stat() but that
> only controls how long we wait for BUSY to clear, the ATA_DRQ
> 'bad' bit we pass there only gets probed in a fixed limit loop:
>
> for (i = 0; i < 10; i++) {
> udelay(1);
> stat = tp_ops->read_status(hwif);
>
> if (OK_STAT(stat, good, bad)) {
> *rstat = stat;
> return 0;
> }
> }
> *rstat = stat;
> return -EFAULT;
>
> Therefore, if increasing WAIT_DRQ helps things for people, it's
> because the BUSY bit needs that much time to clear in these
> cases.
>
> The talking in that thread seems to state that the ATA layer
> waits only for BUSY to clear, it does not wait for DRQ. But
> from the data we're seeing here, it is in fact BUSY which needs
> so much more time to clear so removing the DRQ bit probe to
> be more like ATA won't fix anything.

Hmm, I think you're right.. seems it expects BSY to be de-asserted
within 100ms when issuing a write, which is fairly ridiculous. Maybe
not a problem for a hard drive in typical cases, but if a CF or SSD is
in an erase cycle or something it's quite possible for this not to
work. Of course, just jacking up the timeout may make the problem
alluded to in the comment in __ide_wait_stat more evident ("This
routine should get fixed to not hog the cpu during extra long waits"),
as it just does a tight loop polling the status with no sleeps.

libata only busy-waits for 50 microseconds, if not set then it sleeps
for 2ms and polls for another 10 microseconds, if still not set it
tries the whole thing again at 16ms intervals. Only after (typically)
30 seconds does it give up.
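The polling schedule described in the last paragraph can be sketched as follows. This is one reading of that description, run against a simulated clock, and not libata's actual code: the names sim_clock, poll_for(), sim_sleep() and backoff_wait() are hypothetical, and treating "16ms intervals" as a 16 ms sleep between rounds is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated clock and device: ready from ready_at_us onwards. */
struct sim_clock {
	uint64_t now_us;      /* simulated time */
	uint64_t ready_at_us; /* when the device stops being busy */
	uint64_t spun_us;     /* total time spent busy-polling */
};

/* Busy-poll for at most spin_us simulated microseconds. */
bool poll_for(struct sim_clock *c, uint64_t spin_us)
{
	for (uint64_t i = 0; i < spin_us; i++) {
		if (c->now_us >= c->ready_at_us)
			return true;
		c->now_us++;
		c->spun_us++;
	}
	return c->now_us >= c->ready_at_us;
}

/* Sleeping advances time without consuming CPU in this model. */
void sim_sleep(struct sim_clock *c, uint64_t us)
{
	c->now_us += us;
}

/* Short spin, short sleep, short spin, then a longer sleep before retrying. */
bool backoff_wait(struct sim_clock *c, uint64_t give_up_us)
{
	assert(c != NULL);

	while (c->now_us < give_up_us) {
		if (poll_for(c, 50))      /* busy-wait 50 us */
			return true;
		sim_sleep(c, 2000);       /* sleep 2 ms */
		if (poll_for(c, 10))      /* poll another 10 us */
			return true;
		sim_sleep(c, 16000);      /* assumed 16 ms gap before the next round */
	}
	return false;                     /* overall timeout, e.g. 30 s */
}
```

For a device that needs 250 ms, this schedule spends well under a millisecond busy-polling in total, whereas a tight loop with the same deadline would spin for the full 250 ms.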
From: Robert Hancock <hancockrwd@gmail.com>
Date: Mon, 26 Oct 2009 19:07:18 -0600

> libata only busy-waits for 50 microseconds, if not set then it sleeps
> for 2ms and polls for another 10 microseconds, if still not set it
> tries the whole thing again at 16ms intervals. Only after (typically)
> 30 seconds does it give up.

Porting that kind of logic over to IDE is a non-starter.

It's easier to get people to move over to using the ATA layer for
their devices.

Meanwhile we should provide a way for things to work, and
realistically the only way to do that currently is to bump the
WAIT_DRQ value to some large number.

And that's exactly the kind of patch I'm willing to accept for this.
On Mon, Oct 26, 2009 at 7:19 PM, David Miller <davem@davemloft.net> wrote:
> From: Robert Hancock <hancockrwd@gmail.com>
> Date: Mon, 26 Oct 2009 19:07:18 -0600
>
>> libata only busy-waits for 50 microseconds, if not set then it sleeps
>> for 2ms and polls for another 10 microseconds, if still not set it
>> tries the whole thing again at 16ms intervals. Only after (typically)
>> 30 seconds does it give up.
>
> Porting that kind of logic over to IDE is a non-starter.
>
> It's easier to get people to move over to using the ATA layer for
> their devices.
>
> Meanwhile we should provide a way for things to work, and
> realistically the only way to do that currently is to bump the
> WAIT_DRQ value to some large number.
>
> And that's exactly the kind of patch I'm willing to accept for this.

I agree, it's sub-optimal but it helps.. if the user wants better
behavior they should a) fix it so that the card isn't using PIO, at
least if it supports DMA and b) not use drivers/ide..
From: Robert Hancock <hancockrwd@gmail.com>
Date: Mon, 26 Oct 2009 19:40:03 -0600

> On Mon, Oct 26, 2009 at 7:19 PM, David Miller <davem@davemloft.net> wrote:
>> Meanwhile we should provide a way for things to work, and
>> realistically the only way to do that currently is to bump the
>> WAIT_DRQ value to some large number.
>>
>> And that's exactly the kind of patch I'm willing to accept for this.
>
> I agree, it's sub-optimal but it helps.. if the user wants better
> behavior they should a) fix it so that the card isn't using PIO, at
> least if it supports DMA and b) not use drivers/ide..

Philippe's patch that started this thread uses "3 * HZ / 10"
which isn't large enough for the SSD cases. Can someone please
post a patch that uses a large enough value?

Thanks.
Hi David,

On Mon, Oct 26, 2009 at 06:43:18PM -0700, David Miller wrote:
> From: Robert Hancock <hancockrwd@gmail.com>
> Date: Mon, 26 Oct 2009 19:40:03 -0600
>
> > On Mon, Oct 26, 2009 at 7:19 PM, David Miller <davem@davemloft.net> wrote:
> >> Meanwhile we should provide a way for things to work, and
> >> realistically the only way to do that currently is to bump the
> >> WAIT_DRQ value to some large number.
> >>
> >> And that's exactly the kind of patch I'm willing to accept for this.
> >
> > I agree, it's sub-optimal but it helps.. if the user wants better
> > behavior they should a) fix it so that the card isn't using PIO, at
> > least if it supports DMA and b) not use drivers/ide..

Strangely enough, I also had no timeout problem if I started my kernel with
'ide=nodma', instead of increasing WAIT_DRQ.
So I surmise that WAIT_DRQ is used in the dma case.

>
> Philippe's patch that started this thread uses "3 * HZ / 10"
> which isn't large enough for the SSD cases. Can someone please
> post a patch that uses a large enough value?

How big a timeout do you want/accept ?

Mark Lord wrote about SSD's in the mail referred by Robert Hancock :

	It should probably be at least 500msec or more now.

Philippe
Hello.

Philippe De Muyter wrote:

>> From: Robert Hancock <hancockrwd@gmail.com>
>> Date: Mon, 26 Oct 2009 19:40:03 -0600
>>
>>
>>> On Mon, Oct 26, 2009 at 7:19 PM, David Miller <davem@davemloft.net> wrote:
>>>
>>>> Meanwhile we should provide a way for things to work, and
>>>> realistically the only way to do that currently is to bump the
>>>> WAIT_DRQ value to some large number.
>>>>
>>>> And that's exactly the kind of patch I'm willing to accept for this.
>>>>
>>> I agree, it's sub-optimal but it helps.. if the user wants better
>>> behavior they should a) fix it so that the card isn't using PIO, at
>>> least if it supports DMA and b) not use drivers/ide..
>>>
>
> Strangely enough, I also had no timeout problem if I started my kernel with
> 'ide=nodma', instead of increasing WAIT_DRQ.

Hm, interesting...

> So I surmise that WAIT_DRQ is used in the dma case.
>
>

It's used only for the PIO write commands -- see do_rw_taskfile() in
ide-taskfile.c... DMA commands don't require waiting for BSY=0, DRQ=1
condition.

> Philippe
>

WBR, Sergei
Robert Hancock wrote:
..
> This has come up before:
>
> http://marc.info/?l=linux-ide&m=123064513313466&w=2
>
> I think this timeout should not even exist. libata has no such timeout
> (only the overall command completion timeout), and I can't find any
> reference in current ATA specs to the device being required to raise DRQ
> in any particular amount of time.
..

The reason for the original (20ms, then 50ms) timeout was this text
from the ATA1 specification, long since outdated:

 - Upon receipt of a Class 3 command, the drive sets BSY within 400 nsec,
   sets up the sector buffer for a write operation, sets DRQ within 20 msec,
   and clears BSY within 400 nsec of setting DRQ.

Cheers
diff -r a145344bb228 include/linux/ide.h
--- a/include/linux/ide.h	Thu Oct 22 08:28:28 2009 +0900
+++ b/include/linux/ide.h	Mon Oct 26 16:51:23 2009 +0100
@@ -125,8 +125,8 @@
  * Timeouts for various operations:
  */
 enum {
-	/* spec allows up to 20ms */
-	WAIT_DRQ = HZ / 10, /* 100ms */
+	/* spec allows up to 20ms, but some CF cards need more than 200ms */
+	WAIT_DRQ = 3 * HZ / 10, /* 300ms */
 	/* some laptops are very slow */
 	WAIT_READY = 5 * HZ, /* 5s */
 	/* should be less than 3ms (?), if all ATAPI CD is closed at boot */
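The WAIT_DRQ values in this patch are tick counts derived from HZ, so the millisecond figures in the comments follow from dividing by HZ again. A minimal userspace sketch of that arithmetic, assuming HZ settings of 100, 250 and 1000 as examples; the helpers timeout_ticks(), ticks_to_ms() and print_wait_drq_table() are hypothetical names and not kernel functions:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: evaluate "num * HZ / den" with integer arithmetic,
 * the same way an initialiser such as "3 * HZ / 10" is evaluated. */
unsigned int timeout_ticks(unsigned int hz, unsigned int num, unsigned int den)
{
	assert(den != 0);
	return num * hz / den;
}

/* Hypothetical helper: convert a tick count back to milliseconds. */
unsigned int ticks_to_ms(unsigned int hz, unsigned int ticks)
{
	assert(hz != 0);
	return ticks * 1000u / hz;
}

/* Print the old (HZ / 10) and patched (3 * HZ / 10) values per HZ setting. */
void print_wait_drq_table(void)
{
	static const unsigned int hz_values[] = { 100, 250, 1000 };

	for (size_t i = 0; i < sizeof(hz_values) / sizeof(hz_values[0]); i++) {
		unsigned int hz = hz_values[i];
		unsigned int old_ticks = timeout_ticks(hz, 1, 10);
		unsigned int new_ticks = timeout_ticks(hz, 3, 10);

		printf("HZ=%u: old %u ticks (%u ms), new %u ticks (%u ms)\n",
		       hz, old_ticks, ticks_to_ms(hz, old_ticks),
		       new_ticks, ticks_to_ms(hz, new_ticks));
	}
}
```

For each of these three HZ values the old setting works out to 100 ms and the patched one to 300 ms; a 500 ms timeout would be written as 5 * HZ / 10, or simply HZ / 2.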
Hi,

I just encountered a problem with write-access to a batch of CF cards
(KINGSTON TECHNOLOGY 4GB COMPACT FLASH CF/4GB
3.3V/5V 9904321 - 006.AOOLF 4449081 - 1219643 X001 ASSY IN TAIWAN (c) 2008)
connected to a PC-CARD / PCMCIA interface, with the following error messages :

hda: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: no DRQ after issuing MULTWRITE

After testing with different bigger values for the WAIT_DRQ timeout value,
the problem disappeared. I had success with WAIT_DRQ = 500ms, then with
WAIT_DRQ = 300ms. I then tested with WAIT_DRQ = 200ms, but the problem
reappeared. So I kept the 300ms value.

Signed-off-by: Philippe De Muyter <phdm@macqel.be>