Patchwork 2.6.32: Promise UDMA33 card refuses to work in UDMA mode

login
register
mail settings
Submitter Russell King
Date Dec. 24, 2009, 9:54 p.m.
Message ID <20091224215451.GA2476@flint.arm.linux.org.uk>
Download mbox | patch
Permalink /patch/41790/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Russell King - Dec. 24, 2009, 9:54 p.m.
Some additions due to further testing:

On Thu, Dec 24, 2009 at 06:13:00PM +0000, Russell King wrote:
> I tried upgrading my main machine from an old 2.6.23 kernel to 2.6.32.
> In doing so, I switched from IDE to ATA support.

IDE on 2.6.32 results in "irq 24: nobody cared" for the IDE interrupt.
Booting back to 2.6.23 results in everything working normally again.

> Could this be another case where the kernel should always need to write
> out the full task file to the drive?  I don't see any form of hardware
> control on the PDC20247 (which is the UDMA add-on to the PDC20246 chip)
> to control its mode, other than it snooping the taskfile writes.

I tried commenting out the caching of the drive control register:


This resulted in the same failure, but without the 'timeout' for the
second drive.  So there seems to be a separate problem in that when
the Promise host gets reset, the host state isn't properly updated on
all channels, resulting in the drive control register not being
written for the 'other' channel.
Russell King - Jan. 3, 2010, 12:23 a.m.
Okay, things have moved on, but not very much.

I've pulled the card out of the ARM machine, and its now in an x86
desktop, where, with 2.6.33-rc2 it behaves in exactly the same way.
That is:

- with libata
  - UDMA reads work fine, every time.
  - UDMA writes always fail with a CRC error.

  Note: there appears to be an additional libata bug here: when the
  speed gets knocked down to PIO mode, the block IO request is failed
  with *no* PIO retries.  Subsequent IO requests succeed.  This means
  that if you're using MD raid, your drive unnecessarily gets marked
  as failed - not nice when things are actually still operational,
  which can be seen by the other MD raid partitions on the drive
  behaving just fine in PIO mode.

  An additional note: it seems that libata reads the ATA taskfile
  back from the drive on every command completion, successful or not -
  there are at least 30 IO accesses between any two ATA commands.
  Is there any reason for that?

- with IDE
  - locks the interrupt line, and makes the machine extremely painful -
    about an hour to get to the point of being able to unload the
    pdc202xx_old module.

Having spent a lot of time today looking at the ATA drive, SFF DMA,
promise IO configuration and PCI configuration registers, I can't see
anything obviously wrong.

Nevertheless, the fact is that this card does work with previous kernels
just fine, and continues to work if I boot back to old kernels.  So,
there is something going on which the old kernels do, which makes this
card behave.

I'm going to drop investigation of libata for the time being, and
possibly switch to the old IDE driver to find out why its getting a
stuck PCI interrupt.  Once the old IDE stuff works again, I'll be able
to compare the differences between libata and IDE.

On Thu, Dec 24, 2009 at 09:54:51PM +0000, Russell King wrote:
> Some additions due to further testing:
> 
> On Thu, Dec 24, 2009 at 06:13:00PM +0000, Russell King wrote:
> > I tried upgrading my main machine from an old 2.6.23 kernel to 2.6.32.
> > In doing so, I switched from IDE to ATA support.
> 
> IDE on 2.6.32 results in "irq 24: nobody cared" for the IDE interrupt.
> Booting back to 2.6.23 results in everything working normally again.
> 
> > Could this be another case where the kernel should always need to write
> > out the full task file to the drive?  I don't see any form of hardware
> > control on the PDC20247 (which is the UDMA add-on to the PDC20246 chip)
> > to control its mode, other than it snooping the taskfile writes.
> 
> I tried commenting out the caching of the drive control register:
> 
> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
> index bbbb1fa..ddd275a 100644
> --- a/drivers/ata/libata-sff.c
> +++ b/drivers/ata/libata-sff.c
> @@ -574,12 +574,12 @@ void ata_sff_tf_load(struct ata_port *ap, const struct ata_taskfile *tf)
>  	struct ata_ioports *ioaddr = &ap->ioaddr;
>  	unsigned int is_addr = tf->flags & ATA_TFLAG_ISADDR;
>  
> -	if (tf->ctl != ap->last_ctl) {
> +//	if (tf->ctl != ap->last_ctl) {
>  		if (ioaddr->ctl_addr)
>  			iowrite8(tf->ctl, ioaddr->ctl_addr);
>  		ap->last_ctl = tf->ctl;
>  		ata_wait_idle(ap);
> -	}
> +//	}
>  
>  	if (is_addr && (tf->flags & ATA_TFLAG_LBA48)) {
>  		WARN_ON_ONCE(!ioaddr->ctl_addr);
> 
> This resulted in the same failure, but without the 'timeout' for the
> second drive.  So there seems to be a separate problem in that when
> the Promise host gets reset, the host state isn't properly updated on
> all channels, resulting in the drive control register not being
> written for the 'other' channel.
> 
> -- 
> Russell King
>  Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
>  maintainer of:
Robert Hancock - Jan. 3, 2010, 3:08 a.m.
On 01/02/2010 06:23 PM, Russell King wrote:
> Okay, things have moved on, but not very much.
>
> I've pulled the card out of the ARM machine, and its now in an x86
> desktop, where, with 2.6.33-rc2 it behaves in exactly the same way.
> That is:
>
> - with libata
>    - UDMA reads work fine, every time.
>    - UDMA writes always fail with a CRC error.
>
>    Note: there appears to be an additional libata bug here: when the
>    speed gets knocked down to PIO mode, the block IO request is failed
>    with *no* PIO retries.  Subsequent IO requests succeed.  This means
>    that if you're using MD raid, your drive unnecessarily gets marked
>    as failed - not nice when things are actually still operational,
>    which can be seen by the other MD raid partitions on the drive
>    behaving just fine in PIO mode.

libata doesn't attempt any retries internally, that's up to the upper 
layers. If they're marking the array as failed after one failed request 
without any retries, that's presumably either intentional or a bug in 
the MD layer..

>
>    An additional note: it seems that libata reads the ATA taskfile
>    back from the drive on every command completion, successful or not -
>    there are at least 30 IO accesses between any two ATA commands.
>    Is there any reason for that?

It shouldn't be reading the entire taskfile back after completion unless 
it failed or it's marked as requiring result taskfile (ATA passthru 
commands or internal commands, i.e. during probing). You're seeing this 
on normal read/write commands?

>
> - with IDE
>    - locks the interrupt line, and makes the machine extremely painful -
>      about an hour to get to the point of being able to unload the
>      pdc202xx_old module.
>
> Having spent a lot of time today looking at the ATA drive, SFF DMA,
> promise IO configuration and PCI configuration registers, I can't see
> anything obviously wrong.
>
> Nevertheless, the fact is that this card does work with previous kernels
> just fine, and continues to work if I boot back to old kernels.  So,
> there is something going on which the old kernels do, which makes this
> card behave.
>
> I'm going to drop investigation of libata for the time being, and
> possibly switch to the old IDE driver to find out why its getting a
> stuck PCI interrupt.  Once the old IDE stuff works again, I'll be able
> to compare the differences between libata and IDE.
>
> On Thu, Dec 24, 2009 at 09:54:51PM +0000, Russell King wrote:
>> Some additions due to further testing:
>>
>> On Thu, Dec 24, 2009 at 06:13:00PM +0000, Russell King wrote:
>>> I tried upgrading my main machine from an old 2.6.23 kernel to 2.6.32.
>>> In doing so, I switched from IDE to ATA support.
>>
>> IDE on 2.6.32 results in "irq 24: nobody cared" for the IDE interrupt.
>> Booting back to 2.6.23 results in everything working normally again.
>>
>>> Could this be another case where the kernel should always need to write
>>> out the full task file to the drive?  I don't see any form of hardware
>>> control on the PDC20247 (which is the UDMA add-on to the PDC20246 chip)
>>> to control its mode, other than it snooping the taskfile writes.
>>
>> I tried commenting out the caching of the drive control register:
>>
>> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
>> index bbbb1fa..ddd275a 100644
>> --- a/drivers/ata/libata-sff.c
>> +++ b/drivers/ata/libata-sff.c
>> @@ -574,12 +574,12 @@ void ata_sff_tf_load(struct ata_port *ap, const struct ata_taskfile *tf)
>>   	struct ata_ioports *ioaddr =&ap->ioaddr;
>>   	unsigned int is_addr = tf->flags&  ATA_TFLAG_ISADDR;
>>
>> -	if (tf->ctl != ap->last_ctl) {
>> +//	if (tf->ctl != ap->last_ctl) {
>>   		if (ioaddr->ctl_addr)
>>   			iowrite8(tf->ctl, ioaddr->ctl_addr);
>>   		ap->last_ctl = tf->ctl;
>>   		ata_wait_idle(ap);
>> -	}
>> +//	}
>>
>>   	if (is_addr&&  (tf->flags&  ATA_TFLAG_LBA48)) {
>>   		WARN_ON_ONCE(!ioaddr->ctl_addr);
>>
>> This resulted in the same failure, but without the 'timeout' for the
>> second drive.  So there seems to be a separate problem in that when
>> the Promise host gets reset, the host state isn't properly updated on
>> all channels, resulting in the drive control register not being
>> written for the 'other' channel.
>>
>> --
>> Russell King
>>   Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
>>   maintainer of:
>

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - Jan. 3, 2010, 10:05 a.m.
On Sat, Jan 02, 2010 at 09:08:00PM -0600, Robert Hancock wrote:
> libata doesn't attempt any retries internally, that's up to the upper  
> layers. If they're marking the array as failed after one failed request  
> without any retries, that's presumably either intentional or a bug in  
> the MD layer..

Why should MD care?  All that MD knows is that the IO request failed,
it isn't told why.  If a request fails, then it has to assume disk
failure.

Maybe it's not a libata problem, but it _is_ bad system behaviour that
needs to be resolved in some way.

However, it's only libata which knows that a UDMA CRC error can't happen
with PIO.  If its getting UDMA CRC errors, backing the speed down
eventually to PIO is guaranteed to fix it.  The SCSI layer can't know
this, and shouldn't know this.  The MD layer certainly doesn't get to
know why requests are failing, nor can it know about these kinds of
interface behaviour, so it can't be fixed there.  That only leaves libata
which knows this, knows what's happening and knows how to fix it.

>>    An additional note: it seems that libata reads the ATA taskfile
>>    back from the drive on every command completion, successful or not -
>>    there are at least 30 IO accesses between any two ATA commands.
>>    Is there any reason for that?
>
> It shouldn't be reading the entire taskfile back after completion unless  
> it failed or it's marked as requiring result taskfile (ATA passthru  
> commands or internal commands, i.e. during probing). You're seeing this  
> on normal read/write commands?

Hmm, yes, you're right.  For normal DMA read/write commands which are
successful, no readback occurs.  However, it appears for non-data and
PIO commands during probe/error recovery - eg, this is what the accesses
for a set features, xfer mode command looks like:

ioread8(0001b407) = 50		read status
iowrite8(a0, 0001b406)		write drive
ioread8(0001b006) = 50		read alt status
ioread8(0001b407) = 50		read status
iowrite8(0a, 0001b006)		write ctrl
ioread8(0001b407) = 50		read status
iowrite8(03, 0001b401)		write taskfile
iowrite8(42, 0001b402)
iowrite8(00, 0001b403)
iowrite8(00, 0001b404)
iowrite8(00, 0001b405)
iowrite8(a0, 0001b406)
ioread8(0001b407) = 50		read status
iowrite8(ef, 0001b407)		write command
ioread8(0001b006) = d0		read altstatus
ioread8(0001b407) = d0		poll status
ioread8(0001b407) = d0
ioread8(0001b407) = d0
ioread8(0001b407) = d0
ioread8(0001b407) = d0
ioread8(0001b407) = 50
iowrite8(08, 0001b006)		write ctrl
ioread8(0001b407) = 50		read status
ioread8(0001ac0a) = 60		read sff dma status
iowrite8(60, 0001ac0a)		write sff dma status
ioread8(0001ac0a) = 60		read sff dma status
ioread8(0001b407) = 50		read status
ioread8(0001b401) = 00		read taskfile
ioread8(0001b402) = 42
ioread8(0001b403) = 00
ioread8(0001b404) = 00
ioread8(0001b405) = 00
ioread8(0001b406) = a0
ioread8(0001ac08) = 00		read sff dma control
iowrite8(00, 0001ac08)		write sff dma control
ioread8(0001ac08) = 00		read sff dma control

Identify (ec) gets the same treatment.  I don't have a trace for a PIO
read/write to hand at the moment to check what their behaviour is, but
I suspect the same on the grounds that I didn't notice a "please read
back the taskfile" flag being set for the set features command.
Alan Cox - Jan. 3, 2010, 11:40 a.m.
> eventually to PIO is guaranteed to fix it.  The SCSI layer can't know
> this, and shouldn't know this.  The MD layer certainly doesn't get to

That depends which bit you mean - at the lower levels SCSI most certainly
knows this kind of stuff as it too must deal with speed shifting and
of course has to handle domain validation.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - Jan. 3, 2010, 11:46 p.m.
On Sun, Jan 03, 2010 at 12:23:14AM +0000, Russell King wrote:
> Okay, things have moved on, but not very much.
> 
> I've pulled the card out of the ARM machine, and its now in an x86
> desktop, where, with 2.6.33-rc2 it behaves in exactly the same way.
> That is:
> 
> - with libata
>   - UDMA reads work fine, every time.
>   - UDMA writes always fail with a CRC error.
> 
>   Note: there appears to be an additional libata bug here: when the
>   speed gets knocked down to PIO mode, the block IO request is failed
>   with *no* PIO retries.  Subsequent IO requests succeed.  This means
>   that if you're using MD raid, your drive unnecessarily gets marked
>   as failed - not nice when things are actually still operational,
>   which can be seen by the other MD raid partitions on the drive
>   behaving just fine in PIO mode.
> 
>   An additional note: it seems that libata reads the ATA taskfile
>   back from the drive on every command completion, successful or not -
>   there are at least 30 IO accesses between any two ATA commands.
>   Is there any reason for that?

Now that the old IDE driver is working, comparing 2.6.33-rc2 setup.
Timing registers are at 0x60, each of 32-bits for primary master, slave,
secondary master, slave.  As can be seen, the setup is identical, so it's
not a card configuration issue.

ATA:
00: 5a 10 33 4d 07 00 00 02 01 00 04 01 00 20 00 00
10: 01 bc 00 00 05 b8 00 00 01 b4 00 00 05 b0 00 00
20: 01 ac 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 0f ff 00 00 00 00 00 00 00 00 0b 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: ee 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: f1 24 41 00 c9 f3 4f 00 f1 24 41 00 c9 f3 4f 00

IDE:
00: 5a 10 33 4d 07 00 00 02 01 00 04 01 00 20 00 00
10: 01 bc 00 00 05 b8 00 00 01 b4 00 00 05 b0 00 00
20: 01 ac 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 0f ff 00 00 00 00 00 00 00 00 0b 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: ee 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: f1 24 41 00 c4 f3 4f 00 f1 24 41 00 c4 f3 4f 00

The system control register is almost same - ATA 01d2110c vs 01da110c.
Programming it to the value it has under ATA doesn't cause any ill
effects with IDE.

This leaves one thing - the way the ATA taskfile is accessed.  Those
who don't know the Promise Ultra33 hardware, it consists of two devices -
the PDC20246, which is a PIO/SWDMA/MWDMA host, and a PDC20247 add-on,
which partly sits between the host and the drives providing the UDMA
protocol.  The 247 intercepts INTRQ, DIOR, DIOW, IORDY, DMARQ, DMACK
and DD0:15 signals, but monitors the DA0:2 and two CS signals to the
drive.

The documentation doesn't specify how the 247 works, except specifying
that it has a UDMA mode and a pass-through mode.  However, we can make
some assumptions from the wiring:

1. There is no way for the 247 to see any configuration settings;
   the only thing it can see are the taskfile reads and writes, and
   the timing of the DMA signals from the 246.

2. The 246 timings for MWDMA2 and UDMA0 are identical; there is no
   programming difference between these two modes.  The only way the
   246 can know that UDMA is selected is by looking for the SET
   FEATURES command to the drive.

So, taking all this together, I'm now down to the way libata drives the
ATA task file, which seems to be *extremely* noisy.  What is the best
way to cut down that noise, down to the bare minimum that's actually
required, while still keeping compatibility for my ICH5 SATA controller
(which I need for my x86 machine's boot drive.)

I've tried changing the set xfermode code to use a version of
ata_exec_internal() which doesn't return the taskfile, but this makes no
difference to the promise exploding with CRC errors with UDMA writes.
Is it possible to do a similar thing with IDENTIFY?

Also, is it possible to get rid of the additional identify and read native
max address commands which seem to be repeated (command register writes
listed):

iowrite8(ec, 0001b407)	identify
iowrite8(f8, 0001b407)	read native max
iowrite8(ef, 0001b407)	set features
iowrite8(ec, 0001b407)	identify
iowrite8(f8, 0001b407)	read native max
iowrite8(c8, 0001b407)	read dma, succeeds
iowrite8(c8, 0001b407)
...
iowrite8(c8, 0001b407)
repeat {
iowrite8(ca, 0001b407)	write dma, fails
iowrite8(ec, 0001b407)	identify
iowrite8(f8, 0001b407)	read native max
iowrite8(ef, 0001b407)	set features
iowrite8(ec, 0001b407)	identify
iowrite8(f8, 0001b407)	read native max
} until PIO
Alan Cox - Jan. 4, 2010, 10:37 a.m.
> 1. There is no way for the 247 to see any configuration settings;
>    the only thing it can see are the taskfile reads and writes, and
>    the timing of the DMA signals from the 246.
> 
> 2. The 246 timings for MWDMA2 and UDMA0 are identical; there is no
>    programming difference between these two modes.  The only way the
>    246 can know that UDMA is selected is by looking for the SET
>    FEATURES command to the drive.

The 2026x certainly snoops SET FEATURES so that would be a reasonable
assumption.

> I've tried changing the set xfermode code to use a version of
> ata_exec_internal() which doesn't return the taskfile, but this makes no
> difference to the promise exploding with CRC errors with UDMA writes.
> Is it possible to do a similar thing with IDENTIFY?

No because you need to know if it worked.

> Also, is it possible to get rid of the additional identify and read native
> max address commands which seem to be repeated (command register writes
> listed):

You can turn off Host Protected Area support for this. You could also btw
turn *on* host protected area for the IDE stack and see what occurs but I
imagine its a red herring anyway. If the snoop is failing then it is more
likely to be that the chip has some limitations on the taskfile snooping
and perhaps requires that the device register is always written or that
the registers are written in a specific order when writing the set
features command.

Another thing to try if that fails is using a polled set features in case
the chip has problems in that area. We've seen a similar bug on some older
VIA devices.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - Jan. 4, 2010, 1:30 p.m.
On Mon, Jan 04, 2010 at 10:37:56AM +0000, Alan Cox wrote:
> > 1. There is no way for the 247 to see any configuration settings;
> >    the only thing it can see are the taskfile reads and writes, and
> >    the timing of the DMA signals from the 246.
> > 
> > 2. The 246 timings for MWDMA2 and UDMA0 are identical; there is no
> >    programming difference between these two modes.  The only way the
> >    246 can know that UDMA is selected is by looking for the SET
> >    FEATURES command to the drive.
> 
> The 2026x certainly snoops SET FEATURES so that would be a reasonable
> assumption.
> 
> > I've tried changing the set xfermode code to use a version of
> > ata_exec_internal() which doesn't return the taskfile, but this makes no
> > difference to the promise exploding with CRC errors with UDMA writes.
> > Is it possible to do a similar thing with IDENTIFY?
> 
> No because you need to know if it worked.
> 
> > Also, is it possible to get rid of the additional identify and read native
> > max address commands which seem to be repeated (command register writes
> > listed):
> 
> You can turn off Host Protected Area support for this. You could also btw
> turn *on* host protected area for the IDE stack and see what occurs but I
> imagine its a red herring anyway. If the snoop is failing then it is more
> likely to be that the chip has some limitations on the taskfile snooping
> and perhaps requires that the device register is always written or that
> the registers are written in a specific order when writing the set
> features command.
> 
> Another thing to try if that fails is using a polled set features in case
> the chip has problems in that area. We've seen a similar bug on some older
> VIA devices.

Found the problem - getting rid of the read of the alt status register
after the command has been written fixes the UDMA CRC errors on write:

@@ -676,7 +676,8 @@ void ata_sff_exec_command(struct ata_port *ap, const struct
ata_taskfile *tf)
        DPRINTK("ata%u: cmd 0x%X\n", ap->print_id, tf->command);

        iowrite8(tf->command, ap->ioaddr.command_addr);
-       ata_sff_pause(ap);
+       ndelay(400);
+//     ata_sff_pause(ap);
 }
 EXPORT_SYMBOL_GPL(ata_sff_exec_command);


This rather makes sense.  The PDC20247 handles the UDMA part of the
protocol.  It has no way to tell the PDC20246 to wait while it suspends
UDMA, so that a normal register access can take place - the 246 ploughs
on with the register access without any regard to the state of the 247.

If the drive immediately starts the UDMA protocol after a write to the
command register (as it probably will for the DMA WRITE command), then
we'll be accessing the taskfile in the middle of the UDMA setup, which
can't be good.  It's certainly a violation of the ATA specs.
Alan Cox - Jan. 4, 2010, 3:16 p.m.
> If the drive immediately starts the UDMA protocol after a write to the
> command register (as it probably will for the DMA WRITE command), then
> we'll be accessing the taskfile in the middle of the UDMA setup, which
> can't be good.  It's certainly a violation of the ATA specs.

I concur entirely - that wants pushing into the upstream kernel ASAP and
if its ok backporting. May also be worth seeing if that fixes the funny
in the ALi setup.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Jan. 4, 2010, 3:25 p.m.
On 01/04/2010 08:30 AM, Russell King wrote:
> On Mon, Jan 04, 2010 at 10:37:56AM +0000, Alan Cox wrote:
>>> 1. There is no way for the 247 to see any configuration settings;
>>>     the only thing it can see are the taskfile reads and writes, and
>>>     the timing of the DMA signals from the 246.
>>>
>>> 2. The 246 timings for MWDMA2 and UDMA0 are identical; there is no
>>>     programming difference between these two modes.  The only way the
>>>     246 can know that UDMA is selected is by looking for the SET
>>>     FEATURES command to the drive.
>>
>> The 2026x certainly snoops SET FEATURES so that would be a reasonable
>> assumption.
>>
>>> I've tried changing the set xfermode code to use a version of
>>> ata_exec_internal() which doesn't return the taskfile, but this makes no
>>> difference to the promise exploding with CRC errors with UDMA writes.
>>> Is it possible to do a similar thing with IDENTIFY?
>>
>> No because you need to know if it worked.
>>
>>> Also, is it possible to get rid of the additional identify and read native
>>> max address commands which seem to be repeated (command register writes
>>> listed):
>>
>> You can turn off Host Protected Area support for this. You could also btw
>> turn *on* host protected area for the IDE stack and see what occurs but I
>> imagine its a red herring anyway. If the snoop is failing then it is more
>> likely to be that the chip has some limitations on the taskfile snooping
>> and perhaps requires that the device register is always written or that
>> the registers are written in a specific order when writing the set
>> features command.
>>
>> Another thing to try if that fails is using a polled set features in case
>> the chip has problems in that area. We've seen a similar bug on some older
>> VIA devices.
>
> Found the problem - getting rid of the read of the alt status register
> after the command has been written fixes the UDMA CRC errors on write:
>
> @@ -676,7 +676,8 @@ void ata_sff_exec_command(struct ata_port *ap, const struct
> ata_taskfile *tf)
>          DPRINTK("ata%u: cmd 0x%X\n", ap->print_id, tf->command);
>
>          iowrite8(tf->command, ap->ioaddr.command_addr);
> -       ata_sff_pause(ap);
> +       ndelay(400);
> +//     ata_sff_pause(ap);
>   }
>   EXPORT_SYMBOL_GPL(ata_sff_exec_command);
>
>
> This rather makes sense.  The PDC20247 handles the UDMA part of the
> protocol.  It has no way to tell the PDC20246 to wait while it suspends
> UDMA, so that a normal register access can take place - the 246 ploughs
> on with the register access without any regard to the state of the 247.
>
> If the drive immediately starts the UDMA protocol after a write to the
> command register (as it probably will for the DMA WRITE command), then
> we'll be accessing the taskfile in the middle of the UDMA setup, which
> can't be good.  It's certainly a violation of the ATA specs.

Well...

Without AltStatus, you would not be able to check and see if a command 
is complete...  pretty much by definition you should able to access that 
at any time.  If BSY=1, none of the other bits are valid, but 
nonetheless it is axiomatic that you cannot determine BSY state without 
reading AltStatus.

It is certainly conceivable that this is a problem on a rare controller 
or two, but in general, AltStatus is how you poll for DMA completion. 
You are allowed to hit it during a DMA transfer.

Well known BIOS code, ATADRVR, the IDE driver and libata all read 
AltStatus __in the middle of a transfer__ in the case of PCI shared 
interrupts, or in the case of interrupt-free polling for DMA completion.

HOWEVER, accessing AltStatus prior to the 400ns post-exec-command period 
is potentially an area of undefined behavior.  Changing this code is 
only a problem for MMIO-based controllers, which use the AltStatus read 
to guarantee that the 400ns delay occurs outside any posted writes.

	Jeff



--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Jan. 4, 2010, 3:32 p.m.
On 01/04/2010 10:16 AM, Alan Cox wrote:
>> If the drive immediately starts the UDMA protocol after a write to the
>> command register (as it probably will for the DMA WRITE command), then
>> we'll be accessing the taskfile in the middle of the UDMA setup, which
>> can't be good.  It's certainly a violation of the ATA specs.
>
> I concur entirely - that wants pushing into the upstream kernel ASAP and
> if its ok backporting. May also be worth seeing if that fixes the funny
> in the ALi setup.

How do you think one tests for command completion?  :)  There does not 
seem to be a general problem reading AltStatus(BSY) during a DMA 
transfer.  IDE driver and other well known codebases have been doing 
that for decades.

I think the undefined behavior is only where the AltStatus read occurs 
before the 400ns period specified by ATA...  excising the AltStatus read 
there (ata_sff_pause) should be fine for PIO, but problematic for MMIO 
chipsets that need posted writes flushed prior to the 400ns delay.

	Jeff




--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - Jan. 4, 2010, 3:42 p.m.
On Mon, Jan 04, 2010 at 10:25:37AM -0500, Jeff Garzik wrote:
> On 01/04/2010 08:30 AM, Russell King wrote:
>> This rather makes sense.  The PDC20247 handles the UDMA part of the
>> protocol.  It has no way to tell the PDC20246 to wait while it suspends
>> UDMA, so that a normal register access can take place - the 246 ploughs
>> on with the register access without any regard to the state of the 247.
>>
>> If the drive immediately starts the UDMA protocol after a write to the
>> command register (as it probably will for the DMA WRITE command), then
>> we'll be accessing the taskfile in the middle of the UDMA setup, which
>> can't be good.  It's certainly a violation of the ATA specs.
>
> Well...
>
> Without AltStatus, you would not be able to check and see if a command  
> is complete...

I wasn't inferring that reading altstatus was a bad idea.  I was saying
that causing altstatus to be read in the middle of an active UDMA
transfer (in other words, while the hosts DMACK is active) is a
violation of the spec.

There's no way for the 20247 (UDMA add-on) to know that the IO access
is coming; the first the 247 knows about it is when the CS/address lines
to the drive have been activated - in violation of the ATA specification
(which calls for the CS lines to be inactive and address lines to be zero
while UDMA is occuring.)

That means there's absolutely no way for the 20247 to stop the UDMA
protocol to allow a read of any registers on the drive until UDMA has
completed.

Yes, technically a buggy host design...
Russell King - Jan. 4, 2010, 3:44 p.m.
On Mon, Jan 04, 2010 at 10:32:55AM -0500, Jeff Garzik wrote:
> On 01/04/2010 10:16 AM, Alan Cox wrote:
>>> If the drive immediately starts the UDMA protocol after a write to the
>>> command register (as it probably will for the DMA WRITE command), then
>>> we'll be accessing the taskfile in the middle of the UDMA setup, which
>>> can't be good.  It's certainly a violation of the ATA specs.
>>
>> I concur entirely - that wants pushing into the upstream kernel ASAP and
>> if its ok backporting. May also be worth seeing if that fixes the funny
>> in the ALi setup.
>
> How do you think one tests for command completion?  :)  There does not  
> seem to be a general problem reading AltStatus(BSY) during a DMA  
> transfer.  IDE driver and other well known codebases have been doing  
> that for decades.
>
> I think the undefined behavior is only where the AltStatus read occurs  
> before the 400ns period specified by ATA...  excising the AltStatus read  
> there (ata_sff_pause) should be fine for PIO, but problematic for MMIO  
> chipsets that need posted writes flushed prior to the 400ns delay.

That's not the problem here - with the printk's I've added, there's no
way for the IO read/writes to come that fast.  Slowing down the IO
accesses made no difference what so ever.
Alan Cox - Jan. 4, 2010, 3:46 p.m.
> Without AltStatus, you would not be able to check and see if a command 
> is complete...  pretty much by definition you should able to access that 

Correct - and you may not make that check for 400nS after the command is
issued. The old IDE code enforces this (I know because I added it to fix
real problems as PC's got fast enough to go from write to IRQ in 400nS)

> It is certainly conceivable that this is a problem on a rare controller 
> or two, but in general, AltStatus is how you poll for DMA completion. 
> You are allowed to hit it during a DMA transfer.

But not for 400nS after a command is issued, or on certain old
controllers during a DMA transfer before you halt it.

> HOWEVER, accessing AltStatus prior to the 400ns post-exec-command period 
> is potentially an area of undefined behavior.  Changing this code is 
> only a problem for MMIO-based controllers, which use the AltStatus read 
> to guarantee that the 400ns delay occurs outside any posted writes.

Certainly for pio interfaces we should use ndelay(400). All our MMIO
controllers are SATA and that's a different world anyway, including the
fact that an altstatus read isn't going to produce a timing delay !
Basically we know ndelay(400) works.

SIL680 is MMIO in the old IDE stack but the hardware appears to be doing
the required magic itself. In fact as it can stack two commands in non
LBA48 mode (a feature we don't use) it would seem to have a fairly
complete internal state machine.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox - Jan. 4, 2010, 3:48 p.m.
> How do you think one tests for command completion?  :)  There does not 
> seem to be a general problem reading AltStatus(BSY) during a DMA 
> transfer.  IDE driver and other well known codebases have been doing 
> that for decades.

You use altstatus if available but you can't do so for 400nS after
command issue.

> I think the undefined behavior is only where the AltStatus read occurs 
> before the 400ns period specified by ATA...  excising the AltStatus read 
> there (ata_sff_pause) should be fine for PIO, but problematic for MMIO 
> chipsets that need posted writes flushed prior to the 400ns delay.

Old IDE says no such chipsets exist. Probably because their own designers
would have been lynched by the in house software team ;)

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox - Jan. 4, 2010, 3:55 p.m.
> That's not the problem here - with the printk's I've added, there's no
> way for the IO read/writes to come that fast.  Slowing down the IO
> accesses made no difference what so ever.

Doing it within 400nS of command setup is suspect at the ATA level and
can trivially fixed.

Sane PCI controllers (and quite a few less than sane ones) all implement
an AltStatus which can be checked for busy bits with DMA running or stall
that access until the DMA completes (wrongly in some CMD64x cases).

The 20246 implements its own magic 'am I busy' flag which old IDE
supports and libata doesn't. So as well as the ndelay(400) fix for PIO
controllers it needs pdc202xx_test_irq porting over to complete the fix.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - Jan. 4, 2010, 4:15 p.m.
On Mon, Jan 04, 2010 at 03:55:44PM +0000, Alan Cox wrote:
> > That's not the problem here - with the printk's I've added, there's no
> > way for the IO read/writes to come that fast.  Slowing down the IO
> > accesses made no difference what so ever.
> 
> Doing it within 400nS of command setup is suspect at the ATA level and
> can trivially fixed.
> 
> Sane PCI controllers (and quite a few less than sane ones) all implement
> an AltStatus which can be checked for busy bits with DMA running or stall
> that access until the DMA completes (wrongly in some CMD64x cases).
> 
> The 20246 implements its own magic 'am I busy' flag which old IDE
> supports and libata doesn't. So as well as the ndelay(400) fix for PIO
> controllers it needs pdc202xx_test_irq porting over to complete the fix.

Yes, but please use a non-buggy version of pdc202xx_test_irq() as per
my recent patch.  Testing for the DMA FIFOs being non-empty to qualify
an interrupt is Very Bad News too.

It's interesting to note that SFF 8038 suggests an approach to dealing
with IDE interrupts connected to a SFF compatible host interface:

3) Software issues the appropriate DMA transfer command to the disk device.
4) Engage the bus master function by writing a '1' to the Start bit in the
   Bus Master IDE Command Register for the appropriate channel.
5) The controller transfers data to/from memory responding to DMA requests
   from the IDE device.
6) at the end of the transfer the IDE device signals an interrupt .
7) In response to the interrupt, software resets the Start/Stop bit in the
   command register. It then reads the controller status and then the drive
   status to determine if the transfer completed successfully.

It goes on to say that the interrupt status bit in the BM-DMA status
register will only be set after buffered data has been transferred -
which means reading the altstatus register to determine when the device
has finished would give a premature indication and doesn't take any
account of the BM-DMA buffering.

So I think Jeff is not right on the "must read altstatus" point.

I would not be surprised if lots of host controllers were actually buggy
in respect of reading any ATA device register mid-transfer - firstly a
started UDMA operation can't be stopped before at least one word has been
transferred, nor can it be stopped without first transferring the CRC
bytes.  That's an awful lot of logic (and overhead) to be randomly
interacting with due to other shared devices interrupt activity.

Reading the BM-DMA status register (or equivalent on hosts) seems to be
the right way to tell if an interrupt is pending.
Bartlomiej Zolnierkiewicz - Jan. 4, 2010, 4:31 p.m.
On Monday 04 January 2010 02:30:24 pm Russell King wrote:
> On Mon, Jan 04, 2010 at 10:37:56AM +0000, Alan Cox wrote:
> > > 1. There is no way for the 247 to see any configuration settings;
> > >    the only thing it can see are the taskfile reads and writes, and
> > >    the timing of the DMA signals from the 246.
> > > 
> > > 2. The 246 timings for MWDMA2 and UDMA0 are identical; there is no
> > >    programming difference between these two modes.  The only way the
> > >    246 can know that UDMA is selected is by looking for the SET
> > >    FEATURES command to the drive.
> > 
> > The 2026x certainly snoops SET FEATURES so that would be a reasonable
> > assumption.
> > 
> > > I've tried changing the set xfermode code to use a version of
> > > ata_exec_internal() which doesn't return the taskfile, but this makes no
> > > difference to the promise exploding with CRC errors with UDMA writes.
> > > Is it possible to do a similar thing with IDENTIFY?
> > 
> > No because you need to know if it worked.
> > 
> > > Also, is it possible to get rid of the additional identify and read native
> > > max address commands which seem to be repeated (command register writes
> > > listed):
> > 
> > You can turn off Host Protected Area support for this. You could also btw
> > turn *on* host protected area for the IDE stack and see what occurs but I
> > imagine its a red herring anyway. If the snoop is failing then it is more
> > likely to be that the chip has some limitations on the taskfile snooping
> > and perhaps requires that the device register is always written or that
> > the registers are written in a specific order when writing the set
> > features command.
> > 
> > Another thing to try if that fails is using a polled set features in case
> > the chip has problems in that area. We've seen a similar bug on some older
> > VIA devices.
> 
> Found the problem - getting rid of the read of the alt status register
> after the command has been written fixes the UDMA CRC errors on write:
> 
> @@ -676,7 +676,8 @@ void ata_sff_exec_command(struct ata_port *ap, const struct
> ata_taskfile *tf)
>         DPRINTK("ata%u: cmd 0x%X\n", ap->print_id, tf->command);
> 
>         iowrite8(tf->command, ap->ioaddr.command_addr);
> -       ata_sff_pause(ap);
> +       ndelay(400);
> +//     ata_sff_pause(ap);
>  }
>  EXPORT_SYMBOL_GPL(ata_sff_exec_command);
> 
> 
> This rather makes sense.  The PDC20247 handles the UDMA part of the
> protocol.  It has no way to tell the PDC20246 to wait while it suspends
> UDMA, so that a normal register access can take place - the 246 ploughs
> on with the register access without any regard to the state of the 247.
> 
> If the drive immediately starts the UDMA protocol after a write to the
> command register (as it probably will for the DMA WRITE command), then
> we'll be accessing the taskfile in the middle of the UDMA setup, which
> can't be good.  It's certainly a violation of the ATA specs.

Great debugging work!

It is very likely that your fix is also valid for UDMA problems reported
for later PDC2026x UDMA66 chips and libata..

--
Bartlomiej Zolnierkiewicz
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Jan. 4, 2010, 4:32 p.m.
On 01/04/2010 10:46 AM, Alan Cox wrote:
>> Without AltStatus, you would not be able to check and see if a command
>> is complete...  pretty much by definition you should able to access that
>
> Correct - and you may not make that check for 400nS after the command is
> issued. The old IDE code enforces this (I know because I added it to fix
> real problems as PC's got fast enough to go from write to IRQ in 400nS)
>
>> It is certainly conceivable that this is a problem on a rare controller
>> or two, but in general, AltStatus is how you poll for DMA completion.
>> You are allowed to hit it during a DMA transfer.
>
> But not for 400nS after a command is issued, or on certain old
> controllers during a DMA transfer before you halt it.
>
>> HOWEVER, accessing AltStatus prior to the 400ns post-exec-command period
>> is potentially an area of undefined behavior.  Changing this code is
>> only a problem for MMIO-based controllers, which use the AltStatus read
>> to guarantee that the 400ns delay occurs outside any posted writes.
>
> Certainly for pio interfaces we should use ndelay(400). All our MMIO
> controllers are SATA and that's a different world anyway, including the
> fact that an altstatus read isn't going to produce a timing delay !
> Basically we know ndelay(400) works.

I think you misunderstood the "Changing this code..." sentence.  I make 
no claim, nor is there code, that says AltStatus produces the necessary 
400ns delay.

To rephrase/repeat, the AltStatus read in ata_sff_pause() exists to 
ensure that any previous posted MMIO writes -- most notably the taskfile 
registers we just wrote -- are flushed from CPU and PCI bridges etc. to 
the ATA host controller chip.

After the AltStatus read is executed, the delay is executed.  Here is 
the libata implementation, in its entirety:

	void ata_sff_pause(struct ata_port *ap)
	{
	        ata_sff_sync(ap); /* dummy AltStatus register read */
	        ndelay(400);
	}

On MMIO, you cannot guarantee that ndelay(400) occurs after the ATA 
Command register is written, without some method of ensuring that all 
posted writes are flushed.  Typically, this is done by a dummy register 
read of some worthless register -- AltStatus was used in this case.

Absent that, ndelay(400) could occur in _parallel_ with the posted 
taskfile writes, essentially eliminating the spec-required delay.

Thus,

(1a)  The solution for non-MMIO controllers is obvious and easy: remove 
the dummy AltStatus register read, as Russell has done in his test patch.

(1b)  The solution for MMIO controllers is a bit more complex:  replace 
the dummy AltStatus register read with something else.


> SIL680 is MMIO in the old IDE stack but the hardware appears to be doing
> the required magic itself.

You cannot draw such a conclusion from looking only at the SIL680 chip. 
   SIL680 might easily be behind a PCI bridge.

	Jeff


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Jan. 4, 2010, 4:48 p.m.
On 01/04/2010 11:15 AM, Russell King wrote:
> It goes on to say that the interrupt status bit in the BM-DMA status
> register will only be set after buffered data has been transferred -
> which means reading the altstatus register to determine when the device
> has finished would give a premature indication and doesn't take any
> account of the BM-DMA buffering.
>
> So I think Jeff is not right on the "must read altstatus" point.
>
> I would not be surprised if lots of host controllers were actually buggy
> in respect of reading any ATA device register mid-transfer - firstly a
> started UDMA operation can't be stopped before at least one word has been
> transferred, nor can it be stopped without first transferring the CRC
> bytes.  That's an awful lot of logic (and overhead) to be randomly
> interacting with due to other shared devices interrupt activity.
>
> Reading the BM-DMA status register (or equivalent on hosts) seems to be
> the right way to tell if an interrupt is pending.


libata already does that, as does IDE:

         case HSM_ST_LAST:
                 if (qc->tf.protocol == ATA_PROT_DMA ||
                     qc->tf.protocol == ATAPI_PROT_DMA) {

                         /* check status of DMA engine */
                         host_stat = ap->ops->bmdma_status(ap);
                         VPRINTK("ata%u: host_stat 0x%X\n",
                                 ap->print_id, host_stat);

                         /* if it's not our irq... */
                         if (!(host_stat & ATA_DMA_INTR))
                                 goto idle_irq;

[...]

         /* check main status, clearing INTRQ if needed */

         status = ata_sff_irq_status(ap);  <-- reads AltStatus here
         if (status & ATA_BUSY)
                 goto idle_irq;

IDE driver has basically the same logic in drive_is_ready()

I never said "must" read, I was only responding to the notion that 
touching AltStatus at all, in general, was a violation of the ATA spec. 
  IFF there is a DMA status register available to be checked, and IFF 
the transfer is DMA, then we do check that first.

	Jeff


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox - Jan. 4, 2010, 5:02 p.m.
> (1b)  The solution for MMIO controllers is a bit more complex:  replace 
> the dummy AltStatus register read with something else.

If we had any SFF PATA controllers using MMIO. I can't find any. SATA is
different anyway. In fact we probably want to avoid such delays on a pure
SATA controller.

> > SIL680 is MMIO in the old IDE stack but the hardware appears to be doing
> > the required magic itself.
> 
> You cannot draw such a conclusion from looking only at the SIL680 chip. 
>    SIL680 might easily be behind a PCI bridge.

Not the one I have, nor it appears have there ever been any reports of
the problem on any system in years. That problem would show up if it
exists. It's also necessary that the chip can do its own timings in some
of its fancier modes.

The person who has the problem to worry about is Ben, but the older PMAC
stuff has its own implementation of all the DMA/IRQ/Command stuff as far
as I can see so its not relevant there.

Several years and zillions of users say there isn't a problem. The fact
we only enable the SIL680 mmio on libata also means its a rather
irrelevant discussion in the first place.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sergei Shtylyov - Jan. 4, 2010, 5:16 p.m.
Hello.

Russell King wrote:

>>>That's not the problem here - with the printk's I've added, there's no
>>>way for the IO read/writes to come that fast.  Slowing down the IO
>>>accesses made no difference what so ever.

>>Doing it within 400nS of command setup is suspect at the ATA level and
>>can trivially fixed.

>>Sane PCI controllers (and quite a few less than sane ones) all implement
>>an AltStatus which can be checked for busy bits with DMA running or stall
>>that access until the DMA completes (wrongly in some CMD64x cases).

>>The 20246 implements its own magic 'am I busy' flag which old IDE
>>supports and libata doesn't. So as well as the ndelay(400) fix for PIO
>>controllers it needs pdc202xx_test_irq porting over to complete the fix.

> Yes, but please use a non-buggy version of pdc202xx_test_irq() as per
> my recent patch.  Testing for the DMA FIFOs being non-empty to qualify
> an interrupt is Very Bad News too.

    Sorry about that -- I was the author of pdc202xx_test_irq()...

> It's interesting to note that SFF 8038 suggests an approach to dealing
> with IDE interrupts connected to a SFF compatible host interface:

> 3) Software issues the appropriate DMA transfer command to the disk device.
> 4) Engage the bus master function by writing a '1' to the Start bit in the
>    Bus Master IDE Command Register for the appropriate channel.
> 5) The controller transfers data to/from memory responding to DMA requests
>    from the IDE device.
> 6) at the end of the transfer the IDE device signals an interrupt .
> 7) In response to the interrupt, software resets the Start/Stop bit in the
>    command register. It then reads the controller status and then the drive
>    status to determine if the transfer completed successfully.

> It goes on to say that the interrupt status bit in the BM-DMA status
> register will only be set after buffered data has been transferred -
> which means reading the altstatus register to determine when the device
> has finished would give a premature indication and doesn't take any
> account of the BM-DMA buffering.

    Yes.

> So I think Jeff is not right on the "must read altstatus" point.

    Frankly speaking, I'm not sure why Jeff insisted on that, at least for 
the DMA case...

> I would not be surprised if lots of host controllers were actually buggy
> in respect of reading any ATA device register mid-transfer - firstly a
> started UDMA operation can't be stopped before at least one word has been
> transferred, nor can it be stopped without first transferring the CRC
> bytes.  That's an awful lot of logic (and overhead) to be randomly
> interacting with due to other shared devices interrupt activity.

> Reading the BM-DMA status register (or equivalent on hosts) seems to be
> the right way to tell if an interrupt is pending.

    Except the BMDMA interrupt bit (generally) doesn't get  set for the PIO 
transfers. Anyway, the IDE core eventually reads that register for the DMA 
transfers. I'm sure libata also does eventually read it; looking at the code 
in ata_sff_host_intr(), it even reads the BMDMA status first, before 
touching any of the IDE status registers.

MBR, Sergei
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Jan. 4, 2010, 5:27 p.m.
On 01/04/2010 12:02 PM, Alan Cox wrote:
>> (1b)  The solution for MMIO controllers is a bit more complex:  replace
>> the dummy AltStatus register read with something else.
>
> If we had any SFF PATA controllers using MMIO. I can't find any. SATA is
> different anyway. In fact we probably want to avoid such delays on a pure
> SATA controller.

Early SATA controllers are just PATA controllers in disguise.  All SFF 
controllers want that 400ns delay.  The 400ns delay should -not- be avoided.

Because several SATA controllers are SFF and use the code in question, 
the MMIO issue is relevant for the code change, even if it is irrelevant 
to drivers/ata/pata_*.c.

	Jeff


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - Jan. 4, 2010, 5:28 p.m.
On Mon, Jan 04, 2010 at 05:31:35PM +0100, Bartlomiej Zolnierkiewicz wrote:
> Great debugging work!
> 
> It is very likely that your fix is also valid for UDMA problems reported
> for later PDC2026x UDMA66 chips and libata..

Possibly...

But, unrelated to this thread, I have a question for you.  While digging
through the changes to find out why drivers/ide broke on PDC20246, I
noticed dff8817.

This change is bogus.

        writeb(0, base + ICS_ARCIN_V6_INTROFFSET_1);
        readb(base + ICS_ARCIN_V6_INTROFFSET_2);

        writeb(0, base + ICS_ARCIN_V6_INTROFFSET_2);
        readb(base + ICS_ARCIN_V6_INTROFFSET_1);

This sequence of code does:

1. enable interrupt 1
2. disable interrupt 2
3. enable interrupt 2
4. disable interrupt 1

which results in the interrupt for the second channel being enabled -
leaving channel 1 blocked.

Firstly, icside shares its two IDE channels with one DMA engine - so it's
a simplex interface.  IDE supports those (or did when the code was written)
serializing requests between the two interfaces.  libata does not.

Secondly, the interrupt lines on icside float when there's no drive connected
or when the drive has its NIEN bit set, which means that you get spurious
screaming interrupts which can kill off all expansion card interrupts on
the machine unless you disable the channel interrupt on the card.

Since libata can not serialize the operation of the two channels like IDE
can, the libata version of the icside driver does not contain the interrupt
stearing logic.  Instead, it looks at the status after reset, and if
nothing was found on that channel, it masks the interrupt from that
channel.

In this respect, the IDE driver is more fully featured than the libata
driver - the IDE driver supports DMA on both channels, whereas the libata
version only supports DMA on one channel.

So, your change which was "inspired by pata_icside" is plain wrong.
Please revert it.
Russell King - Jan. 4, 2010, 5:30 p.m.
On Mon, Jan 04, 2010 at 12:27:54PM -0500, Jeff Garzik wrote:
> On 01/04/2010 12:02 PM, Alan Cox wrote:
>>> (1b)  The solution for MMIO controllers is a bit more complex:  replace
>>> the dummy AltStatus register read with something else.
>>
>> If we had any SFF PATA controllers using MMIO. I can't find any. SATA is
>> different anyway. In fact we probably want to avoid such delays on a pure
>> SATA controller.
>
> Early SATA controllers are just PATA controllers in disguise.  All SFF  
> controllers want that 400ns delay.  The 400ns delay should -not- be 
> avoided.

Note that ICH5 SATA is SFF, which only offers non-MMIO addressing; the
change I made is running there just fine:

libata version 3.00 loaded.
pata_acpi 0000:00:1f.1: PCI INT A -> GSI 18 (level, low) -> IRQ 18
pata_acpi 0000:00:1f.1: setting latency timer to 64
pata_acpi 0000:00:1f.1: PCI INT A disabled
pata_acpi 0000:00:1f.2: PCI INT A -> GSI 18 (level, low) -> IRQ 18
pata_acpi 0000:00:1f.2: setting latency timer to 64
pata_acpi 0000:00:1f.2: PCI INT A disabled
input: PS/2 Logitech Mouse as /devices/platform/i8042/serio1/input/input3
ata_piix 0000:00:1f.1: version 2.13
ata_piix 0000:00:1f.1: PCI INT A -> GSI 18 (level, low) -> IRQ 18
ata_piix 0000:00:1f.1: setting latency timer to 64
scsi0 : ata_piix
scsi1 : ata_piix
ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xfc00 irq 14
ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xfc08 irq 15
ata_piix 0000:00:1f.2: PCI INT A -> GSI 18 (level, low) -> IRQ 18
ata_piix 0000:00:1f.2: MAP [ P0 -- P1 -- ]
ata_piix 0000:00:1f.2: setting latency timer to 64
scsi2 : ata_piix
scsi3 : ata_piix
ata3: SATA max UDMA/133 cmd 0xd800 ctl 0xd400 bmdma 0xc800 irq 18
ata4: SATA max UDMA/133 cmd 0xd000 ctl 0xcc00 bmdma 0xc808 irq 18
ata4.00: ATA-7: MAXTOR STM3160811AS, 3.AAE, max UDMA/133
ata4.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata4.00: configured for UDMA/133
scsi 3:0:0:0: Direct-Access     ATA      MAXTOR STM316081 3.AA PQ: 0 ANSI: 5
sd 3:0:0:0: [sda] 312581808 512-byte logical blocks: (160 GB/149 GiB)
sd 3:0:0:0: [sda] Write Protect is off
sd 3:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2 sda3
sd 3:0:0:0: [sda] Attached SCSI disk
Alan Cox - Jan. 4, 2010, 5:38 p.m.
> Early SATA controllers are just PATA controllers in disguise.  All SFF 
> controllers want that 400ns delay.  The 400ns delay should -not- be avoided.

Its a signalling protocol requirement for PATA or a PATA controller with
a PATA/SATA bridge. What makes you think a SATA controller cares one
iota ?

> Because several SATA controllers are SFF and use the code in question, 
> the MMIO issue is relevant for the code change, even if it is irrelevant 
> to drivers/ata/pata_*.c.

and to SATA. It's simply not relevant at a protocol level to SATA which
is message passing anyway.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox - Jan. 4, 2010, 5:39 p.m.
> Firstly, icside shares its two IDE channels with one DMA engine - so it's
> a simplex interface.  IDE supports those (or did when the code was written)
> serializing requests between the two interfaces.  libata does not.

Libata does the driver may not.

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - Jan. 4, 2010, 5:46 p.m.
On Mon, Jan 04, 2010 at 05:39:58PM +0000, Alan Cox wrote:
> > Firstly, icside shares its two IDE channels with one DMA engine - so it's
> > a simplex interface.  IDE supports those (or did when the code was written)
> > serializing requests between the two interfaces.  libata does not.
> 
> Libata does the driver may not.

libata certainly didn't when I wrote the driver in 2007.

int ata_do_set_mode(struct ata_link *link, struct ata_device **r_failed_dev)
{
        /* Record simplex status. If we selected DMA then the other
         * host channels are not permitted to do so.
         */
        if (used_dma && (ap->host->flags & ATA_HOST_SIMPLEX))
                ap->host->simplex_claimed = ap;
...
static void ata_dev_xfermask(struct ata_device *dev)
{
...
        if ((host->flags & ATA_HOST_SIMPLEX) &&
            host->simplex_claimed && host->simplex_claimed != ap) {
                xfer_mask &= ~(ATA_MASK_MWDMA | ATA_MASK_UDMA);
                ata_dev_printk(dev, KERN_WARNING, "simplex DMA is claimed by "
                               "other device, disabling DMA\n");
        }

and apparantly still doesn't.  Only one channel of a simplex device is
permitted to be configured for DMA at any one time.
Bartlomiej Zolnierkiewicz - Jan. 4, 2010, 5:49 p.m.
On Monday 04 January 2010 06:39:58 pm Alan Cox wrote:
> > Firstly, icside shares its two IDE channels with one DMA engine - so it's
> > a simplex interface.  IDE supports those (or did when the code was written)
> > serializing requests between the two interfaces.  libata does not.
> 
> Libata does the driver may not.

IIRC it didn't at the time pata_icside was written.

--
Bartlomiej Zolnierkiewicz
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Jan. 4, 2010, 6:03 p.m.
On 01/04/2010 12:30 PM, Russell King wrote:
> On Mon, Jan 04, 2010 at 12:27:54PM -0500, Jeff Garzik wrote:
>> On 01/04/2010 12:02 PM, Alan Cox wrote:
>>>> (1b)  The solution for MMIO controllers is a bit more complex:  replace
>>>> the dummy AltStatus register read with something else.
>>>
>>> If we had any SFF PATA controllers using MMIO. I can't find any. SATA is
>>> different anyway. In fact we probably want to avoid such delays on a pure
>>> SATA controller.
>>
>> Early SATA controllers are just PATA controllers in disguise.  All SFF
>> controllers want that 400ns delay.  The 400ns delay should -not- be
>> avoided.
>
> Note that ICH5 SATA is SFF, which only offers non-MMIO addressing; the
> change I made is running there just fine:

Yes, your change should be fine for all non-MMIO SFF controllers, SATA 
or PATA.

The AltStatus read is simply a dummy read, for MMIO controllers, to 
ensure the previously-written taskfile registers make it to the 
controller before the ndelay() begins execution.

	Jeff




--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - Jan. 4, 2010, 6:06 p.m.
On Mon, Jan 04, 2010 at 01:03:36PM -0500, Jeff Garzik wrote:
> On 01/04/2010 12:30 PM, Russell King wrote:
>> On Mon, Jan 04, 2010 at 12:27:54PM -0500, Jeff Garzik wrote:
>>> On 01/04/2010 12:02 PM, Alan Cox wrote:
>>>>> (1b)  The solution for MMIO controllers is a bit more complex:  replace
>>>>> the dummy AltStatus register read with something else.
>>>>
>>>> If we had any SFF PATA controllers using MMIO. I can't find any. SATA is
>>>> different anyway. In fact we probably want to avoid such delays on a pure
>>>> SATA controller.
>>>
>>> Early SATA controllers are just PATA controllers in disguise.  All SFF
>>> controllers want that 400ns delay.  The 400ns delay should -not- be
>>> avoided.
>>
>> Note that ICH5 SATA is SFF, which only offers non-MMIO addressing; the
>> change I made is running there just fine:
>
> Yes, your change should be fine for all non-MMIO SFF controllers, SATA  
> or PATA.
>
> The AltStatus read is simply a dummy read, for MMIO controllers, to  
> ensure the previously-written taskfile registers make it to the  
> controller before the ndelay() begins execution.

Can we solve the problem by doing a read of the BMDMA status register?

As it is, the command register and alt status registers are in different
BARs on the controller, so reading from BAR 4 instead of 1/3 should have
the same effect?
Jeff Garzik - Jan. 4, 2010, 6:07 p.m.
On 01/04/2010 12:38 PM, Alan Cox wrote:
>> Early SATA controllers are just PATA controllers in disguise.  All SFF
>> controllers want that 400ns delay.  The 400ns delay should -not- be avoided.
>
> Its a signalling protocol requirement for PATA or a PATA controller with
> a PATA/SATA bridge. What makes you think a SATA controller cares one
> iota ?

Because first gen SATA controllers, notably the Promise ones, were often 
just the same old PATA silicon logic with additional PATA<->SATA 
translation glue logic, rather than a pure SATA SFF implementation.


>> Because several SATA controllers are SFF and use the code in question,
>> the MMIO issue is relevant for the code change, even if it is irrelevant
>> to drivers/ata/pata_*.c.
>
> and to SATA. It's simply not relevant at a protocol level to SATA which
> is message passing anyway.

It is not relevant to FIS-based SATA (SiI 3124, AHCI), sure.  But SFF is 
not FIS-based SATA.

	Jeff



--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox - Jan. 4, 2010, 6:20 p.m.
> libata certainly didn't when I wrote the driver in 2007.
> 
> int ata_do_set_mode(struct ata_link *link, struct ata_device **r_failed_dev)
> {
>         /* Record simplex status. If we selected DMA then the other
>          * host channels are not permitted to do so.
>          */
>         if (used_dma && (ap->host->flags & ATA_HOST_SIMPLEX))
>                 ap->host->simplex_claimed = ap;
> ...
> static void ata_dev_xfermask(struct ata_device *dev)
> {
> ...
>         if ((host->flags & ATA_HOST_SIMPLEX) &&
>             host->simplex_claimed && host->simplex_claimed != ap) {
>                 xfer_mask &= ~(ATA_MASK_MWDMA | ATA_MASK_UDMA);
>                 ata_dev_printk(dev, KERN_WARNING, "simplex DMA is claimed by "
>                                "other device, disabling DMA\n");
>         }
> 
> and apparantly still doesn't.  Only one channel of a simplex device is
> permitted to be configured for DMA at any one time.

Simplex has a specific meaning and we implement it to the meaning of the
standard PCI config bits, which is that you have a single channel that can
be configured for DMA. There is nothing in the standard that says you can
play fast and loose flipping them back and forth.

If your controller simply can't handle two DMA transactions (or two
transactions in general) at the same time you can provide a private
qc_defer function and ask the midlayer to defer commands that don't fit
your needs.

Ie the simplex handling is like the Simplex bit handling in drivers/ide,
and the qc_defer method is akin to the ->serialize flag in drivers/ide,
but a bit more flexible.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Jan. 4, 2010, 6:29 p.m.
On 01/04/2010 12:02 PM, Alan Cox wrote:
> If we had any SFF PATA controllers using MMIO. I can't find any.

Well,

[jgarzik@bd linux-2.6]$ grep -l ATA_FLAG_MMIO drivers/ata/pata_*.c
drivers/ata/pata_at32.c
drivers/ata/pata_bf54x.c
drivers/ata/pata_ixp4xx_cf.c
drivers/ata/pata_macio.c
drivers/ata/pata_octeon_cf.c
drivers/ata/pata_palmld.c
drivers/ata/pata_pdc2027x.c
drivers/ata/pata_rb532_cf.c
drivers/ata/pata_scc.c

plus drivers/ata/pata_platform.c

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Jan. 4, 2010, 6:35 p.m.
On 01/04/2010 01:06 PM, Russell King wrote:
> On Mon, Jan 04, 2010 at 01:03:36PM -0500, Jeff Garzik wrote:
>> On 01/04/2010 12:30 PM, Russell King wrote:
>>> On Mon, Jan 04, 2010 at 12:27:54PM -0500, Jeff Garzik wrote:
>>>> On 01/04/2010 12:02 PM, Alan Cox wrote:
>>>>>> (1b)  The solution for MMIO controllers is a bit more complex:  replace
>>>>>> the dummy AltStatus register read with something else.
>>>>>
>>>>> If we had any SFF PATA controllers using MMIO. I can't find any. SATA is
>>>>> different anyway. In fact we probably want to avoid such delays on a pure
>>>>> SATA controller.
>>>>
>>>> Early SATA controllers are just PATA controllers in disguise.  All SFF
>>>> controllers want that 400ns delay.  The 400ns delay should -not- be
>>>> avoided.
>>>
>>> Note that ICH5 SATA is SFF, which only offers non-MMIO addressing; the
>>> change I made is running there just fine:
>>
>> Yes, your change should be fine for all non-MMIO SFF controllers, SATA
>> or PATA.
>>
>> The AltStatus read is simply a dummy read, for MMIO controllers, to
>> ensure the previously-written taskfile registers make it to the
>> controller before the ndelay() begins execution.
>
> Can we solve the problem by doing a read of the BMDMA status register?
>
> As it is, the command register and alt status registers are in different
> BARs on the controller, so reading from BAR 4 instead of 1/3 should have
> the same effect?

Correct -- it does not matter which controller register is read, to 
ensure the MMIO flush.

Standard MMIO flush / PCI posting rules, nothing unique to PATA at all 
really.  You can see examples of MMIO flushes all over the tree, such as 
the cpw8_f() and cpw32_f() macros in drivers/net/8139cp.c.

	Jeff



--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Robert Hancock - Jan. 5, 2010, 1:03 a.m.
On 01/04/2010 07:30 AM, Russell King wrote:
> On Mon, Jan 04, 2010 at 10:37:56AM +0000, Alan Cox wrote:
>>> 1. There is no way for the 247 to see any configuration settings;
>>>     the only thing it can see are the taskfile reads and writes, and
>>>     the timing of the DMA signals from the 246.
>>>
>>> 2. The 246 timings for MWDMA2 and UDMA0 are identical; there is no
>>>     programming difference between these two modes.  The only way the
>>>     246 can know that UDMA is selected is by looking for the SET
>>>     FEATURES command to the drive.
>>
>> The 2026x certainly snoops SET FEATURES so that would be a reasonable
>> assumption.
>>
>>> I've tried changing the set xfermode code to use a version of
>>> ata_exec_internal() which doesn't return the taskfile, but this makes no
>>> difference to the promise exploding with CRC errors with UDMA writes.
>>> Is it possible to do a similar thing with IDENTIFY?
>>
>> No because you need to know if it worked.
>>
>>> Also, is it possible to get rid of the additional identify and read native
>>> max address commands which seem to be repeated (command register writes
>>> listed):
>>
>> You can turn off Host Protected Area support for this. You could also btw
>> turn *on* host protected area for the IDE stack and see what occurs but I
>> imagine its a red herring anyway. If the snoop is failing then it is more
>> likely to be that the chip has some limitations on the taskfile snooping
>> and perhaps requires that the device register is always written or that
>> the registers are written in a specific order when writing the set
>> features command.
>>
>> Another thing to try if that fails is using a polled set features in case
>> the chip has problems in that area. We've seen a similar bug on some older
>> VIA devices.
>
> Found the problem - getting rid of the read of the alt status register
> after the command has been written fixes the UDMA CRC errors on write:
>
> @@ -676,7 +676,8 @@ void ata_sff_exec_command(struct ata_port *ap, const struct
> ata_taskfile *tf)
>          DPRINTK("ata%u: cmd 0x%X\n", ap->print_id, tf->command);
>
>          iowrite8(tf->command, ap->ioaddr.command_addr);
> -       ata_sff_pause(ap);
> +       ndelay(400);
> +//     ata_sff_pause(ap);
>   }
>   EXPORT_SYMBOL_GPL(ata_sff_exec_command);
>
>
> This rather makes sense.  The PDC20247 handles the UDMA part of the
> protocol.  It has no way to tell the PDC20246 to wait while it suspends
> UDMA, so that a normal register access can take place - the 246 ploughs
> on with the register access without any regard to the state of the 247.
>
> If the drive immediately starts the UDMA protocol after a write to the
> command register (as it probably will for the DMA WRITE command), then
> we'll be accessing the taskfile in the middle of the UDMA setup, which
> can't be good.  It's certainly a violation of the ATA specs.

I don't think it is. From ATA-8 APT:

"HDMA0: Check_Status State: This state is entered when the host has 
written a DMA command to the device; when all data for the command has 
been transferred and nIEN is set to one; or when all data for the 
command has been transferred, nIEN is cleared zero, and INTRQ has been 
asserted.

When in this state, the host shall read the device Status register. When 
entering this state from the HI4 state, the host shall wait 400 ns 
before reading the Status register. When entering this state from the 
HDMA1 state,
the host shall wait one PIO transfer cycle time before reading the 
Status register. The wait may be accomplished by reading the Alternate 
Status register and ignoring the result."

The last sentence is basically what the code in ata_sff_pause is doing..
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Robert Hancock - Jan. 5, 2010, 2:06 a.m.
On 01/04/2010 09:42 AM, Russell King wrote:
> On Mon, Jan 04, 2010 at 10:25:37AM -0500, Jeff Garzik wrote:
>> On 01/04/2010 08:30 AM, Russell King wrote:
>>> This rather makes sense.  The PDC20247 handles the UDMA part of the
>>> protocol.  It has no way to tell the PDC20246 to wait while it suspends
>>> UDMA, so that a normal register access can take place - the 246 ploughs
>>> on with the register access without any regard to the state of the 247.
>>>
>>> If the drive immediately starts the UDMA protocol after a write to the
>>> command register (as it probably will for the DMA WRITE command), then
>>> we'll be accessing the taskfile in the middle of the UDMA setup, which
>>> can't be good.  It's certainly a violation of the ATA specs.
>>
>> Well...
>>
>> Without AltStatus, you would not be able to check and see if a command
>> is complete...
>
> I wasn't inferring that reading altstatus was a bad idea.  I was saying
> that causing altstatus to be read in the middle of an active UDMA
> transfer (in other words, while the hosts DMACK is active) is a
> violation of the spec.

I wouldn't say that's the case, as I don't see any such prohibition in 
the ATA or SFF host adapter specs. The host doesn't necessarily know 
whether or not a UDMA transfer is active or not. It's supposed to be the 
controller's job to either terminate the UDMA burst or stall the access 
until it can handle the request. It seems a bit of a deficiency of this 
controller that it can't.

The problem with just taking the altstatus read out is that in certain 
cases, the ATA spec requires that we wait for not 400ns, but one PIO 
cycle time before reading the status register (which may be up to 
600ns). The altstatus read inherently accomplishes this since it causes 
a bus cycle. The spec says nothing about not being allowed to access 
altstatus during this period, and in fact ATA-6 and up explictly suggest 
a dummy altstatus read to do this.

One alternative might be to increase the delay to 600ns and change it to 
read the BMDMA status register instead (to ensure PCI posted write 
flushing with MMIO). The trick in doing this in generic code is that the 
register might not exist (BMDMA controllers are a subset of SFF 
controllers, which is what this code deals with).

> There's no way for the 20247 (UDMA add-on) to know that the IO access
> is coming; the first the 247 knows about it is when the CS/address lines
> to the drive have been activated - in violation of the ATA specification
> (which calls for the CS lines to be inactive and address lines to be zero
> while UDMA is occuring.)
>
> That means there's absolutely no way for the 20247 to stop the UDMA
> protocol to allow a read of any registers on the drive until UDMA has
> completed.
>
> Yes, technically a buggy host design...

Indeed, which is why I suspect that a global change to handle this 
controller may not be a good idea. Overriding sff_exec_command for this 
driver to not do the altstatus read (just wait 600ns, maybe) might be 
the best solution for now.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Jan. 5, 2010, 10:04 a.m.
On 01/04/2010 08:03 PM, Robert Hancock wrote:
> rom ATA-8 APT:
>
> "HDMA0: Check_Status State: This state is entered when the host has
> written a DMA command to the device; when all data for the command has
> been transferred and nIEN is set to one; or when all data for the
> command has been transferred, nIEN is cleared zero, and INTRQ has been
> asserted.
>
> When in this state, the host shall read the device Status register. When
> entering this state from the HI4 state, the host shall wait 400 ns
> before reading the Status register. When entering this state from the
> HDMA1 state,
> the host shall wait one PIO transfer cycle time before reading the
> Status register. The wait may be accomplished by reading the Alternate
> Status register and ignoring the result."


Indeed...  I had forgotten about that bit.

	Jeff



--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox - Jan. 5, 2010, 11:25 a.m.
> Indeed, which is why I suspect that a global change to handle this 
> controller may not be a good idea. Overriding sff_exec_command for this 
> driver to not do the altstatus read (just wait 600ns, maybe) might be 
> the best solution for now.

Old IDE does 400nS and no read. We know that works. At least for all the
standard PATA controllers. Thats an incredibly strong argument for doing
command issue that way for non MMIO controllers.

The MMIO case is muddy and probably simply does not matter because no
sane hardware implementation is not going to handle it internally.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Jan. 5, 2010, 1 p.m.
On 01/05/2010 06:25 AM, Alan Cox wrote:
>> Indeed, which is why I suspect that a global change to handle this
>> controller may not be a good idea. Overriding sff_exec_command for this
>> driver to not do the altstatus read (just wait 600ns, maybe) might be
>> the best solution for now.
>
> Old IDE does 400nS and no read. We know that works. At least for all the
> standard PATA controllers. Thats an incredibly strong argument for doing
> command issue that way for non MMIO controllers.

To which old-IDE do you refer?  :)  This specific area of functionality 
varies from 2.4.x to 2.6.x.  Old-Old-IDE even picks and chooses whether 
or not to wait depending on protocol (PIO or not), something that has 
changed in the more recent old-IDE code.


> The MMIO case is muddy and probably simply does not matter because no
> sane hardware implementation is not going to handle it internally.

It's not muddy, the rules for MMIO are quite clear, as is kernel 
practice for drivers driving MMIO-based hardware.

Any PATA MMIO controller on a PCI card -- pata_pdc2027x comes to mind -- 
must handle this situation, because the PCI posting situation varies 
depending on where you slot in your PCI card.  That's not something a 
controller can really handle internally.

	Jeff


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Jan. 5, 2010, 1:11 p.m.
On 01/04/2010 09:06 PM, Robert Hancock wrote:
> Indeed, which is why I suspect that a global change to handle this
> controller may not be a good idea. Overriding sff_exec_command for this
> driver to not do the altstatus read (just wait 600ns, maybe) might be
> the best solution for now.


Well, I think it's fair to say that this should be applied more 
generally than just one driver.  It is definitely conceivable that 
reading AltStatus causes a problem with more than one host controller, 
even though the ATA7 and ATA8 state diagrams recommends doing so.

One solution could be taken from the old-IDE driver: handle things at 
the other end of the transaction.  Audit the paths that check status at 
the end of a drive command, rather than focusing on the start of the 
transaction.

That is where the "hard requirements" exist, anyway.  ATA requires us to

* wait 400ns before reading Status, and/or
* wait one PIO cycle before reading Status

And logically, we only need to read Status at the end of the 
transaction, or when handling a non-DMA PCI shared interrupt.

	Jeff


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox - Jan. 5, 2010, 1:37 p.m.
> To which old-IDE do you refer?  :)  This specific area of functionality 
> varies from 2.4.x to 2.6.x.  Old-Old-IDE even picks and chooses whether 
> or not to wait depending on protocol (PIO or not), something that has 
> changed in the more recent old-IDE code.

Which is irrelevant to the discussion.

> It's not muddy, the rules for MMIO are quite clear, as is kernel 
> practice for drivers driving MMIO-based hardware.

The IDE layer shows the SI3112 doesn't have this problem so we shouldn't
cripple it with Jeff paranoia.

> Any PATA MMIO controller on a PCI card -- pata_pdc2027x comes to mind -- 
> must handle this situation, because the PCI posting situation varies 
> depending on where you slot in your PCI card.  That's not something a 
> controller can really handle internally.

Of course it can. The controller can implement a 400nS delay internally
if it gets a back to back command write and a read of a tf register state.
Nobody implementing an MMIO controller is going to not do that because
avoiding an artificial posting stall is a big performance win. Compared
to the big chunk of logic they all have for fifo draining interlocks its
nothing. Without that its going to be cheaper to operate PIO cycles not
MMIO cycles !

My SI controllers at least don't seem to need any kind of posting
protection in MMIO mode. They seem to be handling it all internally
somehow - Try rdtsc/write to cmd/rdtsc/read status/rdtsc

Alan


--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - Jan. 5, 2010, 5:44 p.m.
On Mon, Jan 04, 2010 at 07:03:49PM -0600, Robert Hancock wrote:
> I don't think it is. From ATA-8 APT:

Now go and look at the drive state diagram rather than the host
state diagram.
Robert Hancock - Jan. 6, 2010, 12:30 a.m.
On Tue, Jan 5, 2010 at 11:44 AM, Russell King <rmk@arm.linux.org.uk> wrote:
> On Mon, Jan 04, 2010 at 07:03:49PM -0600, Robert Hancock wrote:
>> I don't think it is. From ATA-8 APT:
>
> Now go and look at the drive state diagram rather than the host
> state diagram.

What about it are you referring to?
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
index bbbb1fa..ddd275a 100644
--- a/drivers/ata/libata-sff.c
+++ b/drivers/ata/libata-sff.c
@@ -574,12 +574,12 @@  void ata_sff_tf_load(struct ata_port *ap, const struct ata_taskfile *tf)
 	struct ata_ioports *ioaddr = &ap->ioaddr;
 	unsigned int is_addr = tf->flags & ATA_TFLAG_ISADDR;
 
-	if (tf->ctl != ap->last_ctl) {
+//	if (tf->ctl != ap->last_ctl) {
 		if (ioaddr->ctl_addr)
 			iowrite8(tf->ctl, ioaddr->ctl_addr);
 		ap->last_ctl = tf->ctl;
 		ata_wait_idle(ap);
-	}
+//	}
 
 	if (is_addr && (tf->flags & ATA_TFLAG_LBA48)) {
 		WARN_ON_ONCE(!ioaddr->ctl_addr);