Patchwork [RFC] ata/core: don't enable the interrupt while activating the host

login
register
mail settings
Submitter Sebastian Siewior
Date March 11, 2010, 10:04 p.m.
Message ID <20100311220417.GA16022@Chamillionaire.breakpoint.cc>
Download mbox | patch
Permalink /patch/47635/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Sebastian Siewior - March 11, 2010, 10:04 p.m.
I just migrated from ide_platform to pata_platform on my Swarm/Mips
board. This resulted in "nobody cared" during request_irq().
The only difference I noticed between ATA and IDE before registering the
interrupt handler is that IDE does not enable the interrupt.
After removing the flag it looks like the controller is working with the
ATA layer.

If I flip the polarity of the interrupt, I see the "nobody cared"
message shortly after seeing ata_sff_softreset() during boot. So I tried
to perform the software reset with ATA_SRST before requesting the
interrupt hoping that it makes interrupt go away. It did not so.

Does anyone have an idea what might go wrong here?

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
---
 drivers/ata/libata-core.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)
Tejun Heo - March 25, 2010, 12:30 a.m.
Hello,

On 03/12/2010 07:04 AM, Sebastian Andrzej Siewior wrote:
> I just migrated from ide_platform to pata_platform on my Swarm/Mips
> board. This resulted in "nobody cared" during request_irq().
> The only difference I noticed between ATA and IDE before registering the
> interrupt handler is that IDE does not enable the interrupt.
> After removing the flag it looks like the controller is working with the
> ATA layer.
> 
> If I flip the polarity of the interrupt, I see the "nobody cared"
> message shortly after seeing ata_sff_softreset() during boot. So I tried
> to perform the software reset with ATA_SRST before requesting the
> interrupt hoping that it makes interrupt go away. It did not so.
> 
> Does anyone have an idea what might go wrong here?

ata_host_start() calls ata_eh_freeze_port() on each port which is
supposed to block the IRQs.  For pata_platform, the callback is
libata-sff.c::ata_sff_freeze() which does all that's necessary.  Can
you please play with ata_sff_freeze() and find out why it's not
working?

Thanks.
Sebastian Siewior - March 25, 2010, 10:04 p.m.
* Tejun Heo | 2010-03-25 09:30:25 [+0900]:

>ata_host_start() calls ata_eh_freeze_port() on each port which is
>supposed to block the IRQs.  For pata_platform, the callback is
>libata-sff.c::ata_sff_freeze() which does all that's necessary.  Can
>you please play with ata_sff_freeze() and find out why it's not
>working?
I've been there. I've moved the functions around and nothing changed.
After removing the ATA_NIEN bit it started working. So it looks like
->sff_check_status() & ->sff_irq_clear() are not clearing the interrupt
for what reason ever.
Right now I belive that the firmware is doing something with the chip.
Is it possible to reset the chip to its power-on state?
ata_bus_softreset() did not help. Maybe I find something when I dig
through the firmware code....

>Thanks.

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - March 25, 2010, 11:54 p.m.
Hello,

On 03/26/2010 07:04 AM, Sebastian Andrzej Siewior wrote:
>> Can you please play with ata_sff_freeze() and find out why it's not
>> working?
> I've been there. I've moved the functions around and nothing changed.
> After removing the ATA_NIEN bit it started working. So it looks like
> ->sff_check_status() & ->sff_irq_clear() are not clearing the interrupt
> for what reason ever.

What happens if you remove everything else from the function and just
set ATA_NIEN?

Thanks.
Sebastian Siewior - April 6, 2010, 9:45 a.m.
* Tejun Heo | 2010-03-26 08:54:54 [+0900]:

>Hello,
Hello,

sorry for the delay.

>On 03/26/2010 07:04 AM, Sebastian Andrzej Siewior wrote:
>>> Can you please play with ata_sff_freeze() and find out why it's not
>>> working?
>> I've been there. I've moved the functions around and nothing changed.
>> After removing the ATA_NIEN bit it started working. So it looks like
>> ->sff_check_status() & ->sff_irq_clear() are not clearing the interrupt
>> for what reason ever.
>
>What happens if you remove everything else from the function and just
>set ATA_NIEN?

removing everything from ata_sff_freeze() makes the interrupt storm go
away.
Leaving
- ap->ctl |= ATA_NIEN;
  => seems to work
- the upper and "ap->last_ctl = ap->ctl;"
  => seems to work
- the upper and "iowrite8(ap->ctl, ioaddr->ctl_addr);"
  => irq storm on request irq.

ata_sff_check_status() does not report ATA_DRQ set so I don't know why
the interrupt is comming.

>Thanks.
>
Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sergei Shtylyov - April 6, 2010, 10 a.m.
Hello.

Sebastian Andrzej Siewior wrote:

>> On 03/26/2010 07:04 AM, Sebastian Andrzej Siewior wrote:
>>     
>>>> Can you please play with ata_sff_freeze() and find out why it's not
>>>> working?
>>>>         
>>> I've been there. I've moved the functions around and nothing changed.
>>> After removing the ATA_NIEN bit it started working. So it looks like
>>> ->sff_check_status() & ->sff_irq_clear() are not clearing the interrupt
>>> for what reason ever.
>>>       
>> What happens if you remove everything else from the function and just
>> set ATA_NIEN?
>>     
>
> removing everything from ata_sff_freeze() makes the interrupt storm go
> away.
> Leaving
> - ap->ctl |= ATA_NIEN;
>   => seems to work
> - the upper and "ap->last_ctl = ap->ctl;"
>   => seems to work
> - the upper and "iowrite8(ap->ctl, ioaddr->ctl_addr);"
>   => irq storm on request irq.
>
> ata_sff_check_status() does not report ATA_DRQ set so I don't know why
> the interrupt is comming.
>   

   Perhaps the drive incorrectly triggers interrupt on setting the nIEN 
bit -- some drives are known to do this; there is a blacklist of such 
drives in the IDE core so that it avoids setting nIEN bit for them.

WBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tejun Heo - April 6, 2010, 9:56 p.m.
On 04/06/2010 07:00 PM, Sergei Shtylyov wrote:
>> ata_sff_check_status() does not report ATA_DRQ set so I don't know why
>> the interrupt is comming.
> 
>   Perhaps the drive incorrectly triggers interrupt on setting the nIEN
> bit -- some drives are known to do this; there is a blacklist of such
> drives in the IDE core so that it avoids setting nIEN bit for them.

Arghh.... that stinks.  Maybe we should port the list over?  What is
the ATA device in question here?
Sebastian Siewior - April 7, 2010, 7:52 a.m.
* Tejun Heo | 2010-04-07 06:56:10 [+0900]:

>On 04/06/2010 07:00 PM, Sergei Shtylyov wrote:
>>> ata_sff_check_status() does not report ATA_DRQ set so I don't know why
>>> the interrupt is comming.
>> 
>>   Perhaps the drive incorrectly triggers interrupt on setting the nIEN
>> bit -- some drives are known to do this; there is a blacklist of such
>> drives in the IDE core so that it avoids setting nIEN bit for them.
>
>Arghh.... that stinks.  Maybe we should port the list over?  What is
>the ATA device in question here?

[    1.257000] ata1.00: ATA-6: WDC WD1600PB-98FBA0, 15.05R15, max UDMA/100
[    1.264000] ata1.00: 312581808 sectors, multi 0: LBA48
[    1.269000] ata1.01: ATAPI: HL-DT-ST DVDRAM GSA-4081B, A100, max UDMA/33
[    1.335000] scsi 0:0:0:0: Direct-Access     ATA      WDC WD1600PB-98F 15.0 PQ: 0 ANSI: 5
[    1.387000] scsi 0:0:1:0: CD-ROM            HL-DT-ST DVDRAM GSA-4081B A100 PQ: 0 ANSI: 5

I try to unplug them later and see if the problem goes away once there
is no device.
I'm not really convinced that this the fault of the device. I just
booted the kernel that worked yesterday and I ended up in "no body
carred" irq storm. Hard reset, the same kernel and it worked. Two
reboots later and I had the "nobody cared" again. So I guess that the
irq line is active in every case but if the device (or the CPU) is fast
enough then I don't notice it. Later (after the first command or so) it
seems to get normal and issue the interrupt only if required.

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sebastian Siewior - April 13, 2010, 2:38 p.m.
* Sebastian Andrzej Siewior | 2010-04-07 09:52:49 [+0200]:

>>>   Perhaps the drive incorrectly triggers interrupt on setting the nIEN
>>> bit -- some drives are known to do this; there is a blacklist of such
>>> drives in the IDE core so that it avoids setting nIEN bit for them.
>>
>>Arghh.... that stinks.  Maybe we should port the list over?  What is
>>the ATA device in question here?
>
>[    1.257000] ata1.00: ATA-6: WDC WD1600PB-98FBA0, 15.05R15, max UDMA/100
>[    1.264000] ata1.00: 312581808 sectors, multi 0: LBA48
>[    1.269000] ata1.01: ATAPI: HL-DT-ST DVDRAM GSA-4081B, A100, max UDMA/33
>[    1.335000] scsi 0:0:0:0: Direct-Access     ATA      WDC WD1600PB-98F 15.0 PQ: 0 ANSI: 5
>[    1.387000] scsi 0:0:1:0: CD-ROM            HL-DT-ST DVDRAM GSA-4081B A100 PQ: 0 ANSI: 5
>
>I try to unplug them later and see if the problem goes away once there
>is no device.
This is odd:
- a different HD & CD-ROM => seems to work
- just this CD-ROM => IRQ storm
- just the HD => 
 [    6.109000] ata1: link is slow to respond, please be patient (ready=0)
 [   11.113000] ata1: device not ready (errno=-16), forcing hardreset
 [   16.319000] ata1: link is slow to respond, please be patient (ready=0)
 [   21.170000] ata1: SRST failed (errno=-16)
 [   26.374000] ata1: link is slow to respond, please be patient (ready=0)
 [   31.174000] ata1: SRST failed (errno=-16)
 [   36.378000] ata1: link is slow to respond, please be patient (ready=0)
- this HD nad other CD-ROM => seems to work

The "odd" HD seems to work on X86 however the boot process gets delayed
by like 30 seconds. The CD-ROM seems not show any side effects.

So I fixed this problem by replacing the HD. Thanks for the pointers :)

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 6728328..8484223 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6274,6 +6274,13 @@  int ata_host_activate(struct ata_host *host, int irq,
 		return ata_host_register(host, sht);
 	}
 
+	for (i = 0; i < host->n_ports; i++) {
+		struct ata_port *ap;
+
+		ap = host->ports[i];
+		iowrite8(ap->ctl & ~ATA_NIEN, ap->ioaddr.ctl_addr);
+	}
+
 	rc = devm_request_irq(host->dev, irq, irq_handler, irq_flags,
 			      dev_driver_string(host->dev), host);
 	if (rc)