[ide] : Increase WAIT_DRQ to support slow CF cards

Message ID 20091026162011.GA3289@frolo.macqel
State Changes Requested
Delegated to: David Miller
Commit Message

Philippe De Muyter Oct. 26, 2009, 4:20 p.m. UTC
Hi,

I just encountered a problem with write-access to a batch of CF cards
(KINGSTON TECHNOLOGY 4GB COMPACT FLASH CF/4GB
3.3V/5V 9904321 - 006.AOOLF 4449081 - 1219643 X001 ASSY IN TAIWAN (c) 2008)
connected to a PC-CARD / PCMCIA interface, with the following error messages:

	hda: status timeout: status=0xd0 { Busy }
	ide: failed opcode was: unknown
	hda: no DRQ after issuing MULTWRITE

After testing with several larger values for the WAIT_DRQ timeout,
the problem disappeared.  I had success with WAIT_DRQ = 500ms, then with
WAIT_DRQ = 300ms.  I then tested with WAIT_DRQ = 200ms, but the problem
reappeared.  So I kept the 300ms value.

Signed-off-by: Philippe De Muyter <phdm@macqel.be>

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Robert Hancock Oct. 27, 2009, 12:34 a.m. UTC | #1
On 10/26/2009 10:20 AM, Philippe De Muyter wrote:
> Hi,
>
> I just encountered a problem with write-access to a batch of CF cards
> (KINGSTON TECHNOLOGY 4GB COMPACT FLASH CF/4GB
> 3.3V/5V 9904321 - 006.AOOLF 4449081 - 1219643 X001 ASSY IN TAIWAN (c) 2008)
> connected to a PC-CARD / PCMCIA interface, with the following error messages :
>
> 	hda: status timeout: status=0xd0 { Busy }
> 	ide: failed opcode was: unknown
> 	hda: no DRQ after issuing MULTWRITE
>
> After testing with different bigger values for the WAIT_DRQ timeout value,
> the problem disappeared.  I had success with WAIT_DRQ = 500ms, then with
> WAIT_DRQ = 300ms.  I then tested with WAIT_DRQ = 200ms, but the problem
> reappeared.  So I kept the 300ms value.
>
> Signed-off-by: Philippe De Muyter <phdm@macqel.be>
>
> diff -r a145344bb228 include/linux/ide.h
> --- a/include/linux/ide.h	Thu Oct 22 08:28:28 2009 +0900
> +++ b/include/linux/ide.h	Mon Oct 26 16:51:23 2009 +0100
> @@ -125,8 +125,8 @@
>    * Timeouts for various operations:
>    */
>   enum {
> -	/* spec allows up to 20ms */
> -	WAIT_DRQ	= HZ / 10,	/* 100ms */
> +	/* spec allows up to 20ms, but some CF cards need more than 200ms */
> +	WAIT_DRQ	= 3 * HZ / 10,	/* 300ms */
>   	/* some laptops are very slow */
>   	WAIT_READY	= 5 * HZ,	/* 5s */
>   	/* should be less than 3ms (?), if all ATAPI CD is closed at boot */

This has come up before:

http://marc.info/?l=linux-ide&m=123064513313466&w=2

I think this timeout should not even exist. libata has no such timeout 
(only the overall command completion timeout), and I can't find any 
reference in current ATA specs to the device being required to raise DRQ 
in any particular amount of time.
David Miller Oct. 27, 2009, 12:45 a.m. UTC | #2
From: Robert Hancock <hancockrwd@gmail.com>
Date: Mon, 26 Oct 2009 18:34:57 -0600

> This has come up before:
> 
> http://marc.info/?l=linux-ide&m=123064513313466&w=2
> 
> I think this timeout should not even exist. libata has no such timeout
> (only the overall command completion timeout), and I can't find any
> reference in current ATA specs to the device being required to raise
> DRQ in any particular amount of time.

So is the issue that, whilst we should wait for BUSY to clear,
waiting around for DRQ is unreasonable?

It seems that WAIT_DRQ is passed to ide_wait_stat(), but that
only controls how long we wait for BUSY to clear.  The ATA_DRQ
'bad' bit we pass there only gets probed in a fixed-limit loop:

	for (i = 0; i < 10; i++) {
		udelay(1);
		stat = tp_ops->read_status(hwif);

		if (OK_STAT(stat, good, bad)) {
			*rstat = stat;
			return 0;
		}
	}
	*rstat = stat;
	return -EFAULT;

Therefore, if increasing WAIT_DRQ helps things for people, it's
because the BUSY bit needs that much time to clear in these
cases.

The discussion in that thread seems to say that the ATA layer
waits only for BUSY to clear and does not wait for DRQ.  But
from the data we're seeing here, it is in fact BUSY which needs
so much more time to clear, so removing the DRQ bit probe to
be more like ATA won't fix anything.
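
The two-phase wait described above can be illustrated with a small
user-space simulation.  This is only a sketch, not the kernel's
ide_wait_stat(): the sim_* names, the ST_* constants, the tick-based
clock and the return codes are invented for illustration, and passing
DRQ as the required ("good") bit is an assumption of the sketch.  What
it tries to show is that the timeout argument only bounds the wait for
BUSY to clear, while DRQ gets at most ten further status reads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status bits, named after the usual ATA status register. */
#define ST_BUSY 0x80
#define ST_DRQ  0x08
#define ST_ERR  0x01

/* Assumed semantics: all "good" bits set and no "bad" bit set. */
#define OK_STAT(stat, good, bad) ((((stat) & ((good) | (bad))) == (good)))

/* Simulated device: every status read advances the clock by one tick. */
struct sim_dev {
	unsigned long now;        /* simulated clock, in ticks */
	unsigned long busy_until; /* BUSY is set while now < busy_until */
	unsigned long drq_at;     /* DRQ is set once now >= drq_at */
};

uint8_t sim_read_status(struct sim_dev *d)
{
	uint8_t stat = 0;

	if (d->now < d->busy_until)
		stat = ST_BUSY;
	else if (d->now >= d->drq_at)
		stat = ST_DRQ;
	d->now++;
	return stat;
}

/*
 * Phase 1: poll until BUSY clears, giving up after "timeout" ticks.
 * Phase 2: probe the good/bad bits at most ten times.
 * Returns 0 on success, -1 if BUSY never cleared, -2 if the bits
 * were still wrong after the fixed-limit loop.
 */
int sim_wait_stat(struct sim_dev *d, uint8_t good, uint8_t bad,
		  unsigned long timeout, uint8_t *rstat)
{
	unsigned long deadline = d->now + timeout;
	uint8_t stat = sim_read_status(d);
	int i;

	while (stat & ST_BUSY) {
		if (d->now > deadline) {
			*rstat = stat;
			return -1;
		}
		stat = sim_read_status(d);
	}

	for (i = 0; i < 10; i++) {
		stat = sim_read_status(d);
		if (OK_STAT(stat, good, bad)) {
			*rstat = stat;
			return 0;
		}
	}
	*rstat = stat;
	return -2;
}
```

With a simulated device that holds BUSY for 250 ticks, a timeout of 100
fails in phase 1 and a timeout of 300 succeeds, which matches the
pattern reported for the CF cards.  A device that drops BUSY early but
raises DRQ late fails in phase 2 whatever the timeout is.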
Robert Hancock Oct. 27, 2009, 1:07 a.m. UTC | #3
On Mon, Oct 26, 2009 at 6:45 PM, David Miller <davem@davemloft.net> wrote:
> From: Robert Hancock <hancockrwd@gmail.com>
> Date: Mon, 26 Oct 2009 18:34:57 -0600
>
>> This has come up before:
>>
>> http://marc.info/?l=linux-ide&m=123064513313466&w=2
>>
>> I think this timeout should not even exist. libata has no such timeout
>> (only the overall command completion timeout), and I can't find any
>> reference in current ATA specs to the device being required to raise
>> DRQ in any particular amount of time.
>
> So is the issue that, whilst we should wait for BUSY to clear,
> waiting around for DRQ is unreasonable?
>
> It seems that WAIT_DRQ is passed to ide_wait_stat() but that
> only controls how long we wait for BUSY to clear, the ATA_DRQ
> 'bad' bit we pass there only gets probed in a fixed limit loop:
>
>        for (i = 0; i < 10; i++) {
>                udelay(1);
>                stat = tp_ops->read_status(hwif);
>
>                if (OK_STAT(stat, good, bad)) {
>                        *rstat = stat;
>                        return 0;
>                }
>        }
>        *rstat = stat;
>        return -EFAULT;
>
> Therefore, if increasing WAIT_DRQ helps things for people, it's
> because the BUSY bit needs that much time to clear in these
> cases.
>
> The talking in that thread seems to state that the ATA layer
> waits only for BUSY to clear, it does not wait for DRQ.  But
> from the data we're seeing here, it is in fact BUSY which needs
> so much more time to clear so removing the DRQ bit probe to
> be more like ATA won't fix anything.

Hmm, I think you're right.. seems it expects BSY to be de-asserted
within 100ms when issuing a write, which is fairly ridiculous. Maybe
not a problem for a hard drive in typical cases, but if a CF or SSD is
in an erase cycle or something it's quite possible for this not to
work.

Of course, just jacking up the timeout may make the problem alluded to
in the comment in __ide_wait_stat more evident ("This routine should
get fixed to not hog the cpu during extra long waits"), as it just
does a tight loop polling the status with no sleeps.

libata only busy-waits for 50 microseconds.  If the status is still not
as expected, it sleeps for 2ms and polls for another 10 microseconds; if
that also fails, it tries the whole thing again at 16ms intervals.  Only
after (typically) 30 seconds does it give up.
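
The difference between a tight polling loop and the spin/sleep/retry
pattern described above can be sketched with a simulated clock.  This is
not libata code: the function names, the sim_clock structure and the
exact way the 16ms interval is applied are assumptions made for the
example.  It only counts how many microseconds each strategy spends
busy-waiting before a slow device becomes ready.

```c
#include <assert.h>

struct sim_clock {
	unsigned long long now_us;  /* simulated time */
	unsigned long long spin_us; /* time spent busy-waiting on the CPU */
};

/* Busy-wait for up to budget_us; returns 1 once the device is ready. */
int spin_until_ready(struct sim_clock *c, unsigned long long ready_at_us,
		     unsigned int budget_us)
{
	unsigned int i;

	for (i = 0; i < budget_us; i++) {
		if (c->now_us >= ready_at_us)
			return 1;
		c->now_us++;
		c->spin_us++;
	}
	return c->now_us >= ready_at_us;
}

/* Tight loop: spins until ready or until timeout_us has elapsed. */
int poll_tight(struct sim_clock *c, unsigned long long ready_at_us,
	       unsigned long long timeout_us)
{
	unsigned long long deadline = c->now_us + timeout_us;

	while (c->now_us < ready_at_us) {
		if (c->now_us >= deadline)
			return -1;
		c->now_us++;
		c->spin_us++;
	}
	return 0;
}

/*
 * Spin 50us, sleep 2ms, spin 10us, then wait 16ms before the next
 * round.  Sleeping advances the clock without adding to spin_us.
 */
int poll_with_sleeps(struct sim_clock *c, unsigned long long ready_at_us,
		     unsigned long long timeout_us)
{
	unsigned long long deadline = c->now_us + timeout_us;

	while (c->now_us < deadline) {
		if (spin_until_ready(c, ready_at_us, 50))
			return 0;
		c->now_us += 2000;  /* sleep: CPU is free for other work */
		if (spin_until_ready(c, ready_at_us, 10))
			return 0;
		c->now_us += 16000; /* idle until the next retry */
	}
	return -1;
}
```

For a device that needs 250ms, the tight loop spins for the full 250ms,
while the sleeping variant spins for well under a millisecond in total
and can therefore afford a much longer overall timeout.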
David Miller Oct. 27, 2009, 1:19 a.m. UTC | #4
From: Robert Hancock <hancockrwd@gmail.com>
Date: Mon, 26 Oct 2009 19:07:18 -0600

> libata only busy-waits for 50 microseconds, if not set then it sleeps
> for 2ms and polls for another 10 microseconds, if still not set it
> tries the whole thing again at 16ms intervals. Only after (typically)
> 30 seconds does it give up.

Porting that kind of logic over to IDE is a non-starter.

It's easier to get people to move over to using the ATA layer for
their devices.

Meanwhile we should provide a way for things to work, and
realistically the only way to do that currently is to bump the
WAIT_DRQ value to some large number.

And that's exactly the kind of patch I'm willing to accept for this.
Robert Hancock Oct. 27, 2009, 1:40 a.m. UTC | #5
On Mon, Oct 26, 2009 at 7:19 PM, David Miller <davem@davemloft.net> wrote:
> From: Robert Hancock <hancockrwd@gmail.com>
> Date: Mon, 26 Oct 2009 19:07:18 -0600
>
>> libata only busy-waits for 50 microseconds, if not set then it sleeps
>> for 2ms and polls for another 10 microseconds, if still not set it
>> tries the whole thing again at 16ms intervals. Only after (typically)
>> 30 seconds does it give up.
>
> Porting that kind of logic over to IDE is a non-starter.
>
> It's easier to get people to move over to using the ATA layer for
> their devices.
>
> Meanwhile we should provide a way for things to work, and
> realistically the only way to do that currently is to bump the
> WAIT_DRQ value to some large number.
>
> And that's exactly the kind of patch I'm willing to accept for this.

I agree, it's sub-optimal but it helps.. if the user wants better
behavior they should a) fix it so that the card isn't using PIO, at
least if it supports DMA and b) not use drivers/ide..
David Miller Oct. 27, 2009, 1:43 a.m. UTC | #6
From: Robert Hancock <hancockrwd@gmail.com>
Date: Mon, 26 Oct 2009 19:40:03 -0600

> On Mon, Oct 26, 2009 at 7:19 PM, David Miller <davem@davemloft.net> wrote:
>> Meanwhile we should provide a way for things to work, and
>> realistically the only way to do that currently is to bump the
>> WAIT_DRQ value to some large number.
>>
>> And that's exactly the kind of patch I'm willing to accept for this.
> 
> I agree, it's sub-optimal but it helps.. if the user wants better
> behavior they should a) fix it so that the card isn't using PIO, at
> least if it supports DMA and b) not use drivers/ide..

Philippe's patch that started this thread uses "3 * HZ / 10"
which isn't large enough for the SSD cases.  Can someone please
post a patch that uses a large enough value?

Thanks.
Philippe De Muyter Oct. 27, 2009, 9:45 a.m. UTC | #7
Hi David,

On Mon, Oct 26, 2009 at 06:43:18PM -0700, David Miller wrote:
> From: Robert Hancock <hancockrwd@gmail.com>
> Date: Mon, 26 Oct 2009 19:40:03 -0600
> 
> > On Mon, Oct 26, 2009 at 7:19 PM, David Miller <davem@davemloft.net> wrote:
> >> Meanwhile we should provide a way for things to work, and
> >> realistically the only way to do that currently is to bump the
> >> WAIT_DRQ value to some large number.
> >>
> >> And that's exactly the kind of patch I'm willing to accept for this.
> > 
> > I agree, it's sub-optimal but it helps.. if the user wants better
> > behavior they should a) fix it so that the card isn't using PIO, at
> > least if it supports DMA and b) not use drivers/ide..

Strangely enough, I also had no timeout problem if I started my kernel with
'ide=nodma', instead of increasing WAIT_DRQ.  So I surmise that WAIT_DRQ
is used in the dma case.

> 
> Philippe's patch that started this thread uses "3 * HZ / 10"
> which isn't large enough for the SSD cases.  Can someone please
> post a patch that uses a large enough value?

How big a timeout would you accept? Mark Lord wrote about SSDs in the mail
referred to by Robert Hancock:
	It should probably be at least 500msec or more now.

Philippe
Sergei Shtylyov Oct. 27, 2009, 10:24 a.m. UTC | #8
Hello.

Philippe De Muyter wrote:

>> From: Robert Hancock <hancockrwd@gmail.com>
>> Date: Mon, 26 Oct 2009 19:40:03 -0600
>>
>>> On Mon, Oct 26, 2009 at 7:19 PM, David Miller <davem@davemloft.net> wrote:
>>>> Meanwhile we should provide a way for things to work, and
>>>> realistically the only way to do that currently is to bump the
>>>> WAIT_DRQ value to some large number.
>>>>
>>>> And that's exactly the kind of patch I'm willing to accept for this.
>>>
>>> I agree, it's sub-optimal but it helps.. if the user wants better
>>> behavior they should a) fix it so that the card isn't using PIO, at
>>> least if it supports DMA and b) not use drivers/ide..
>
> Strangely enough, I also had no timeout problem if I started my kernel with
> 'ide=nodma', instead of increasing WAIT_DRQ.

    Hm, interesting...

>  So I surmise that WAIT_DRQ is used in the dma case.
>
>   

   It's used only for the PIO write commands -- see do_rw_taskfile() in
ide-taskfile.c... DMA commands don't require waiting for the BSY=0, DRQ=1
condition.

> Philippe
>   

WBR, Sergei


Mark Lord Oct. 31, 2009, 1:56 p.m. UTC | #9
Robert Hancock wrote:
..
> This has come up before:
> 
> http://marc.info/?l=linux-ide&m=123064513313466&w=2
> 
> I think this timeout should not even exist. libata has no such timeout 
> (only the overall command completion timeout), and I can't find any 
> reference in current ATA specs to the device being required to raise DRQ 
> in any particular amount of time.
..

The reason for the original (20ms, then 50ms) timeout was this text
from the ATA1 specification, long since outdated:

   -  Upon receipt of a Class 3 command, the drive sets BSY within 400 nsec,
      sets  up the sector buffer for a write operation, sets DRQ within 20
      msec, and clears BSY within 400 nsec of setting DRQ.

Cheers
Patch

diff -r a145344bb228 include/linux/ide.h
--- a/include/linux/ide.h	Thu Oct 22 08:28:28 2009 +0900
+++ b/include/linux/ide.h	Mon Oct 26 16:51:23 2009 +0100
@@ -125,8 +125,8 @@ 
  * Timeouts for various operations:
  */
 enum {
-	/* spec allows up to 20ms */
-	WAIT_DRQ	= HZ / 10,	/* 100ms */
+	/* spec allows up to 20ms, but some CF cards need more than 200ms */
+	WAIT_DRQ	= 3 * HZ / 10,	/* 300ms */
 	/* some laptops are very slow */
 	WAIT_READY	= 5 * HZ,	/* 5s */
 	/* should be less than 3ms (?), if all ATAPI CD is closed at boot */
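
For reference, the "N * HZ / 10" form used in the hunk above can be
checked with a little integer arithmetic.  The helper names below are
hypothetical and exist only for this example; the kernel has its own
conversion helpers.  The sketch shows how many jiffies the expression
yields for a given HZ and what that corresponds to in milliseconds,
including the truncation that appears when HZ is not a multiple of 10.

```c
#include <assert.h>

/*
 * Timeout in jiffies for a value given in tenths of a second,
 * mirroring the "N * HZ / 10" form used in the patch.
 */
unsigned int tenths_to_jiffies(unsigned int hz, unsigned int tenths)
{
	return tenths * hz / 10;
}

/* Convert jiffies back to milliseconds, rounding down. */
unsigned int jiffies_to_ms_approx(unsigned int jiffies, unsigned int hz)
{
	return jiffies * 1000u / hz;
}
```

With HZ=250, for example, 3 * HZ / 10 is 75 jiffies, i.e. 300ms, and
the old HZ / 10 is 25 jiffies, i.e. 100ms.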