diff mbox

Fix Promise UDMA33 IDE driver (pdc202xx_old)

Message ID 20100103223542.GA24920@flint.arm.linux.org.uk
State Accepted
Delegated to: David Miller
Headers show

Commit Message

Russell King Jan. 3, 2010, 10:35 p.m. UTC
On Sun, Jan 03, 2010 at 12:23:14AM +0000, Russell King wrote:
> - with IDE
>   - locks the interrupt line, and makes the machine extremely painful -
>     about an hour to get to the point of being able to unload the
>     pdc202xx_old module.

Having manually bisected kernel versions, I've narrowed it down to some
change between 2.6.30 and 2.6.31.  There's not much which has changed
between the two kernels, but one change stands out like a sore thumb:

+static int pdc202xx_test_irq(ide_hwif_t *hwif)
+{
+       struct pci_dev *dev     = to_pci_dev(hwif->dev);
+       unsigned long high_16   = pci_resource_start(dev, 4);
+       u8 sc1d                 = inb(high_16 + 0x1d);
+
+       if (hwif->channel) {
+               /*
+                * bit 7: error, bit 6: interrupting,
+                * bit 5: FIFO full, bit 4: FIFO empty
+                */
+               return ((sc1d & 0x50) == 0x40) ? 1 : 0;
+       } else  {
+               /*
+                * bit 3: error, bit 2: interrupting,
+                * bit 1: FIFO full, bit 0: FIFO empty
+                */
+               return ((sc1d & 0x05) == 0x04) ? 1 : 0;
+       }
+}

Reading the (documented as a 32-bit) system control register when the
interface is idle gives: 0x01da110c

So, the byte at 0x1d is 0x11, which is documented as meaning that the
primary and secondary FIFOs are empty.

The code above, which is trying to see whether an IRQ is pending, checks
for the IRQ bit to be one, and the FIFO bit to be zero - or in English,
to be non-empty.

Since during a BM-DMA read, the FIFOs will naturally be drained to the
PCI bus, the chance of us getting to the interface before this happens
are extremely small - and if we don't, it means we decide not to service
the interrupt.  Hence, the screaming interrupt problem with drivers/ide.

Fix this by only indicating an interrupt is ready if both the interrupt
and FIFO empty bits are at '1'.

This bug only affects PDC20246/PDC20247 (Promise Ultra33) based cards,
and has been tested on 2.6.31 and 2.6.33-rc2.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 drivers/ide/pdc202xx_old.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Sergei Shtylyov Jan. 4, 2010, 7:14 p.m. UTC | #1
Hello.

Russell King wrote:

>>- with IDE
>>  - locks the interrupt line, and makes the machine extremely painful -
>>    about an hour to get to the point of being able to unload the
>>    pdc202xx_old module.

> Having manually bisected kernel versions, I've narrowed it down to some
> change between 2.6.30 and 2.6.31.  There's not much which has changed
> between the two kernels, but one change stands out like a sore thumb:

> +static int pdc202xx_test_irq(ide_hwif_t *hwif)
> +{
> +       struct pci_dev *dev     = to_pci_dev(hwif->dev);
> +       unsigned long high_16   = pci_resource_start(dev, 4);
> +       u8 sc1d                 = inb(high_16 + 0x1d);
> +
> +       if (hwif->channel) {
> +               /*
> +                * bit 7: error, bit 6: interrupting,
> +                * bit 5: FIFO full, bit 4: FIFO empty
> +                */
> +               return ((sc1d & 0x50) == 0x40) ? 1 : 0;
> +       } else  {
> +               /*
> +                * bit 3: error, bit 2: interrupting,
> +                * bit 1: FIFO full, bit 0: FIFO empty
> +                */
> +               return ((sc1d & 0x05) == 0x04) ? 1 : 0;
> +       }
> +}

    Sigh, it was mine... :-/

> Reading the (documented as a 32-bit) system control register when the
> interface is idle gives: 0x01da110c

    Oh, you even have the docs... :-)

> So, the byte at 0x1d is 0x11, which is documented as meaning that the
> primary and secondary FIFOs are empty.

> The code above, which is trying to see whether an IRQ is pending, checks
> for the IRQ bit to be one, and the FIFO bit to be zero - or in English,
> to be non-empty.

    Yeah, here's the older driver code that I got inspiration from:

static int pdc202xx_dma_test_irq(ide_drive_t *drive)
{
        ide_hwif_t *hwif        = drive->hwif;
        unsigned long high_16   = hwif->extra_base - 16;
        u8 dma_stat             = inb(hwif->dma_base + ATA_DMA_STATUS);
        u8 sc1d                 = inb(high_16 + 0x001d);

        if (hwif->channel) {
                /* bit7: Error, bit6: Interrupting, bit5: FIFO Full, bit4: 
FIFO Empty */
                if ((sc1d & 0x50) == 0x50)
                        goto somebody_else;
                else if ((sc1d & 0x40) == 0x40)
                        return (dma_stat & 4) == 4;
        } else {
                /* bit3: Error, bit2: Interrupting, bit1: FIFO Full, bit0: 
FIFO Empty */
                if ((sc1d & 0x05) == 0x05)
                        goto somebody_else;
                else if ((sc1d & 0x04) == 0x04)
                        return (dma_stat & 4) == 4;
        }
somebody_else:
        return (dma_stat & 4) == 4;     /* return 1 if INTR asserted */
}

    Now you can see the source of the confusion -- interrupts with FIFO 
empty were regarded as coming from something else.  The 'sc1d' value was 
effectively ignored at that time though, that's why this went unnoticed so far.

> Since during a BM-DMA read, the FIFOs will naturally be drained to the
> PCI bus, the chance of us getting to the interface before this happens
> are extremely small - and if we don't, it means we decide not to service
> the interrupt.  Hence, the screaming interrupt problem with drivers/ide.

> Fix this by only indicating an interrupt is ready if both the interrupt
> and FIFO empty bits are at '1'.

    I suggest that we completely ignore the "FIFO empty" bit. dma_test_irq() 
method which is called eventually when doing FMA should yield the needed 
result WRT FIFO flushing, filtering out .

> This bug only affects PDC20246/PDC20247 (Promise Ultra33) based cards,
> and has been tested on 2.6.31 and 2.6.33-rc2.
> 
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
> ---
>  drivers/ide/pdc202xx_old.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ide/pdc202xx_old.c b/drivers/ide/pdc202xx_old.c
> index cb812f3..f5c08f4 100644
> --- a/drivers/ide/pdc202xx_old.c
> +++ b/drivers/ide/pdc202xx_old.c
> @@ -100,13 +100,13 @@ static int pdc202xx_test_irq(ide_hwif_t *hwif)
>  		 * bit 7: error, bit 6: interrupting,
>  		 * bit 5: FIFO full, bit 4: FIFO empty
>  		 */
> -		return ((sc1d & 0x50) == 0x40) ? 1 : 0;
> +		return ((sc1d & 0x50) == 0x50) ? 1 : 0;

+		return (sc1d & 0x40) ? 1 : 0;

>  	} else	{
>  		/*
>  		 * bit 3: error, bit 2: interrupting,
>  		 * bit 1: FIFO full, bit 0: FIFO empty
>  		 */
> -		return ((sc1d & 0x05) == 0x04) ? 1 : 0;
> +		return ((sc1d & 0x05) == 0x05) ? 1 : 0;

+		return (sc1d & 0x04) ? 1 : 0;

MBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Jan. 5, 2010, 6:26 a.m. UTC | #2
From: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Date: Mon, 04 Jan 2010 22:14:55 +0300

>    I suggest that we completely ignore the "FIFO empty"
>    bit. dma_test_irq() method which is called eventually when doing FMA
>    should yield the needed result WRT FIFO flushing, filtering out .

Can someone implement and test such a revised patch?

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King Jan. 5, 2010, 5:49 p.m. UTC | #3
On Mon, Jan 04, 2010 at 10:26:58PM -0800, David Miller wrote:
> From: Sergei Shtylyov <sshtylyov@ru.mvista.com>
> Date: Mon, 04 Jan 2010 22:14:55 +0300
> 
> >    I suggest that we completely ignore the "FIFO empty"
> >    bit. dma_test_irq() method which is called eventually when doing FMA
> >    should yield the needed result WRT FIFO flushing, filtering out .
> 
> Can someone implement and test such a revised patch?

Do we have any guarantee that the interrupt bit in this register will
only be set when the DMA FIFOs on the card have emptied?

Unlike the SFF status register's interrupt bit, which is defined by SFF
to only be asserted when the drive signals an interrupt _and_ the FIFOs
have emptied, there seems to be no such words in the PDC20246 data which
suggests that the same is true of the interrupt bit in this alternative
register.

Therefore, I would suggest that using this alternative bit could be unsafe
with shared interrupts without also qualifying it with the FIFO empty
bit - we could end up disabling the DMA before the transfer has completed,
thereby corrupting the last few words of a transfer from the drive.

What was the reason for checking this alternate promise-special register
in the first place, rather than the SFF BM-DMA status register?
Sergei Shtylyov Jan. 5, 2010, 5:52 p.m. UTC | #4
Hello.

Russell King wrote:

>>From: Sergei Shtylyov <sshtylyov@ru.mvista.com>
>>Date: Mon, 04 Jan 2010 22:14:55 +0300

>>>   I suggest that we completely ignore the "FIFO empty"
>>>   bit. dma_test_irq() method which is called eventually when doing FMA
>>>   should yield the needed result WRT FIFO flushing, filtering out .

>>Can someone implement and test such a revised patch?

> Do we have any guarantee that the interrupt bit in this register will
> only be set when the DMA FIFOs on the card have emptied?

> Unlike the SFF status register's interrupt bit, which is defined by SFF
> to only be asserted when the drive signals an interrupt _and_ the FIFOs
> have emptied, there seems to be no such words in the PDC20246 data which
> suggests that the same is true of the interrupt bit in this alternative
> register.

    So what? The SFF status register will be read eventually.

> Therefore, I would suggest that using this alternative bit could be unsafe
> with shared interrupts without also qualifying it with the FIFO empty
> bit - we could end up disabling the DMA before the transfer has completed,
> thereby corrupting the last few words of a transfer from the drive.

    How so?

> What was the reason for checking this alternate promise-special register
> in the first place, rather than the SFF BM-DMA status register?

    It's not "rather then" -- we're checking both.

MBR, Sergei
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/ide/pdc202xx_old.c b/drivers/ide/pdc202xx_old.c
index cb812f3..f5c08f4 100644
--- a/drivers/ide/pdc202xx_old.c
+++ b/drivers/ide/pdc202xx_old.c
@@ -100,13 +100,13 @@  static int pdc202xx_test_irq(ide_hwif_t *hwif)
 		 * bit 7: error, bit 6: interrupting,
 		 * bit 5: FIFO full, bit 4: FIFO empty
 		 */
-		return ((sc1d & 0x50) == 0x40) ? 1 : 0;
+		return ((sc1d & 0x50) == 0x50) ? 1 : 0;
 	} else	{
 		/*
 		 * bit 3: error, bit 2: interrupting,
 		 * bit 1: FIFO full, bit 0: FIFO empty
 		 */
-		return ((sc1d & 0x05) == 0x04) ? 1 : 0;
+		return ((sc1d & 0x05) == 0x05) ? 1 : 0;
 	}
 }