Patchwork [regression] 2.6.37-rc5: scsi_eh_11 CPU loop

Submitter: Tejun Heo
Date: Dec. 20, 2010, 5:12 p.m.
Message ID: <4D0F8E98.9070500@kernel.org>
Permalink: /patch/76213/
State: Not Applicable
Delegated to: David Miller

Comments

Tejun Heo - Dec. 20, 2010, 5:12 p.m.
Hello,

On 12/20/2010 11:05 AM, Martin Steigerwald wrote:
> Hi!
> 
> top - 10:49:07 up 3 days, 14:24,  8 users,  load average: 2.31, 2.62, 2.28
> Tasks: 198 total,   2 running, 194 sleeping,   0 stopped,   2 zombie
> Cpu(s):  6.8%us, 93.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:   2073660k total,  1941152k used,   132508k free,   153876k buffers
> Swap:  4000180k total,   243452k used,  3756728k free,   676612k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                               
>   735 root      20   0     0    0    0 R 85.2  0.0 137:28.94 scsi_eh_11
> 
> I don't see anything in dmesg. Everything appears to work as normal, 
> except for the slowness, which got a bit better after renicing scsi_eh_11 
> to 19 (not knowing whether it's really safe, but it works for now).
> 
> Interrupts appear to be within usual range as well:

Can you please apply the following patch, trigger the problem and
attach the kernel log?

Martin Steigerwald - Dec. 21, 2010, 10:31 a.m.
Am Monday 20 December 2010 schrieb Tejun Heo:
> Hello,
> 
> On 12/20/2010 11:05 AM, Martin Steigerwald wrote:
> > Hi!
> > 
> > top - 10:49:07 up 3 days, 14:24,  8 users,  load average: 2.31, 2.62, 2.28
> > Tasks: 198 total,   2 running, 194 sleeping,   0 stopped,   2 zombie
> > Cpu(s):  6.8%us, 93.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> > Mem:   2073660k total,  1941152k used,   132508k free,   153876k buffers
> > Swap:  4000180k total,   243452k used,  3756728k free,   676612k cached
> > 
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >   735 root      20   0     0    0    0 R 85.2  0.0 137:28.94 scsi_eh_11
> > 
> > I don't see anything in dmesg. Everything appears to work as normal,
> > except for the slowness, which got a bit better after renicing
> > scsi_eh_11 to 19 (not knowing whether it's really safe, but it works
> > for now).
> 
> > Interrupts appear to be within usual range as well:

> Can you please apply the following patch, trigger the problem and
> attach the kernel log?
> 
> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> index 5e59050..c748b5a 100644
[...]
> 
> +	if (!(ap->pflags & ATA_PFLAG_LOADING) && xxx < 16) {
> +		ata_port_printk(ap, KERN_WARNING, "XXX ata_eh_set_pending()\n");
> +		dump_stack();
> +		xxx++;
> +	}
> +

I will do so with my next kernel compile, likely rc7 if it comes out 
before Christmas.

As I do not know what exactly triggers it, it may take a while until I 
experience it again. I hope to be able to trigger it by inserting and 
removing the PCMCIA eSATA adapter many times. When the bug happened, I 
had forgotten to deactivate the LVM on the external drive with 
vgchange -an. That may or may not be related to triggering the issue. 
I will try this as well.

Reported-as: https://bugzilla.kernel.org/show_bug.cgi?id=25392

It might make sense to move this bugzilla report to mail, or to keep 
using bugzilla, or to use bugzilla just for tracking and continue by mail. 
I'll do whatever suits you best (but prefer mail).

Thanks,
Tejun Heo - Dec. 21, 2010, 11:03 a.m.
Hello,

On Tue, Dec 21, 2010 at 11:31:15AM +0100, Martin Steigerwald wrote:
> > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
> > index 5e59050..c748b5a 100644
> [...]
> > 
> > +	if (!(ap->pflags & ATA_PFLAG_LOADING) && xxx < 16) {
> > +		ata_port_printk(ap, KERN_WARNING, "XXX ata_eh_set_pending()\n");
> > +		dump_stack();
> > +		xxx++;
> > +	}
> > +
> 
> I will do so with my next kernel compile, likely rc7 if it comes out 
> before Christmas.
> 
> As I do not know what exactly triggers it, it may take a while until I 
> experience it again. I hope to be able to trigger it by inserting and 
> removing the PCMCIA eSATA adapter many times. When the bug happened, I 
> had forgotten to deactivate the LVM on the external drive with 
> vgchange -an. That may or may not be related to triggering the issue. 
> I will try this as well.
> 
> Reported-as: https://bugzilla.kernel.org/show_bug.cgi?id=25392
> 
> It might make sense to move this bugzilla report to mail, or to keep 
> using bugzilla, or to use bugzilla just for tracking and continue by mail. 
> I'll do whatever suits you best (but prefer mail).

I don't care either way.  Just ping me whenever you hit the problem
again.

Thanks.

Patch

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 5e59050..c748b5a 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -888,12 +888,19 @@  void ata_eh_fastdrain_timerfn(unsigned long arg)
  */
 static void ata_eh_set_pending(struct ata_port *ap, int fastdrain)
 {
+	static int xxx;
 	int cnt;

 	/* already scheduled? */
 	if (ap->pflags & ATA_PFLAG_EH_PENDING)
 		return;

+	if (!(ap->pflags & ATA_PFLAG_LOADING) && xxx < 16) {
+		ata_port_printk(ap, KERN_WARNING, "XXX ata_eh_set_pending()\n");
+		dump_stack();
+		xxx++;
+	}
+
 	ap->pflags |= ATA_PFLAG_EH_PENDING;

 	if (!fastdrain)
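The debug patch uses a common kernel pattern: report a suspicious call with a backtrace, but only the first few times, so that a function being hit in a tight loop (as scsi_eh_11 appears to be) cannot flood the log. The ATA_PFLAG_LOADING test presumably skips the EH scheduling that is expected while the port is first being probed. Below is a minimal userspace sketch of that logic under stated assumptions: struct port, PFLAG_EH_PENDING, PFLAG_LOADING, MAX_REPORTS and set_pending() are hypothetical stand-ins modelled on ata_port, ATA_PFLAG_EH_PENDING, ATA_PFLAG_LOADING, the literal 16 and ata_eh_set_pending(), and the kernel-only dump_stack() is replaced by a plain message on stderr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the libata port flags used in the patch. */
#define PFLAG_EH_PENDING (1u << 0)
#define PFLAG_LOADING    (1u << 1)
#define MAX_REPORTS      16   /* the patch hard-codes 16 */

struct port {
    unsigned int pflags;
};

/* Reports emitted so far. Like the patch's function-local
 * "static int xxx", it is shared by all ports and never reset. */
static int reports;

/* Marks EH as pending; returns true if this call emitted a report. */
bool set_pending(struct port *ap)
{
    assert(ap != NULL);

    /* Already scheduled? Then there is nothing new to record. */
    if (ap->pflags & PFLAG_EH_PENDING)
        return false;

    bool reported = false;

    /* Report only outside of loading, and only the first MAX_REPORTS
     * times, so a caller stuck in a loop cannot flood the log. */
    if (!(ap->pflags & PFLAG_LOADING) && reports < MAX_REPORTS) {
        fprintf(stderr, "XXX set_pending() report %d\n", reports + 1);
        /* The kernel patch calls dump_stack() at this point. */
        reports++;
        reported = true;
    }

    ap->pflags |= PFLAG_EH_PENDING;
    return reported;
}
```

Driving set_pending() from a loop that clears PFLAG_EH_PENDING after every pass (as completed error handling would) yields exactly 16 reports no matter how long the loop runs. Because the counter is never reset, the first backtraces are the ones that survive, which is what is needed to identify who keeps rescheduling EH.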