2.6.34 PDC20268 PATA IO error loop makes system unusable

Message ID 1276532426.5374.38.camel@mulgrave.site
State Not Applicable
Delegated to: David Miller

Commit Message

James Bottomley June 14, 2010, 4:20 p.m. UTC
On Mon, 2010-06-14 at 09:59 +0200, Tejun Heo wrote:
> Hello,
> 
> On 06/14/2010 09:53 AM, Andi Kleen wrote:
> > On Mon, Jun 14, 2010 at 09:43:28AM +0200, Tejun Heo wrote:
> >>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> >>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f0 08 00 01 00 00
> >>> sd 11:0:0:0: [sdd] Unhandled error code
> >>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> >>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f1 08 00 01 00 00
> >>> sd 11:0:0:0: [sdd] Unhandled error code
> >>> sd 11:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> >>> sd 11:0:0:0: [sdd] CDB: Write(10): 2a 00 00 e5 f2 08 00 01 00 00
> >>>
> >>> same messages repeating forever, just with CDB changing occasionally.
> >>>
> >>> ....
> >>>
> >>> not stopping until I reset the box.
> >>
> >> Did you have a lot of dirty pages?  It looks like upper layer is
> > 
> > Yes, there was a dd running.
> > 
> >> trying to flush all the dirty buffers and SCSI is a tad bit too
> >> verbose about failing each IO w/ DID_BAD_TARGET thus taking a very
> > 
> > A bit too verbose?  That's really a euphemism ...
> 
> Yeap, of course it was. :-)
> 
> > During the CDB: Write loop the console was totally unusable!
> > 
> > And I think the fsyncs in syslogd completely made the performance
> > tank.
> 
> Console often becomes the bottleneck too when there are a lot of
> kernel messages.
> 
> > So basically it was a "reset button only" situation.
> > 
> > When the device is gone what's the point in giving a message 
> > more than once? Can't the requests just be silently failed in this
> > case?
> 
> Yeah, it would be better to somehow summarize those error message
> instead of spitting out all of them.

I don't think we can summarize.  However, when things start to go wrong,
it's usually only the first set of errors that are significant, so we
could do a simple ratelimit.
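
To illustrate what a "simple ratelimit" buys here, the following is a minimal
userspace sketch of an interval/burst limiter, loosely modelled on the idea
behind printk_ratelimit(). It is not the kernel implementation: the struct,
the function names (ratelimit_ok, count_logged) and the tick-based clock are
all hypothetical, and the 5-tick/10-message figures used in the note below
are illustrative values only.

```c
#include <stdbool.h>

/* Hypothetical userspace model of an interval/burst rate limiter. */
struct ratelimit {
	long interval;	/* window length, in caller-defined ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* messages allowed so far in this window */
	int missed;	/* messages suppressed so far in this window */
};

/* Returns true if the caller may print a message at time `now`. */
static bool ratelimit_ok(struct ratelimit *rl, long now)
{
	if (now - rl->begin >= rl->interval) {
		/* Window expired: start a fresh one and reset counters. */
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->missed++;
	return false;
}

/*
 * Simulate `n` failed requests all completing at tick `now` and
 * return how many of them would have been logged.
 */
static int count_logged(struct ratelimit *rl, long now, int n)
{
	int logged = 0;

	for (int i = 0; i < n; i++)
		if (ratelimit_ok(rl, now))
			logged++;
	return logged;
}
```

With interval = 5 and burst = 10, a flood of 1000 failures in one tick logs
only the first 10 and counts the other 990 as missed, which matches the
observation that only the first set of errors is usually significant.
Logging resumes once the window has expired.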

James

---




Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1646fe7..c8c7483 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -896,7 +896,7 @@  void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 	case ACTION_FAIL:
 		/* Give up and fail the remainder of the request */
 		scsi_release_buffers(cmd);
-		if (!(req->cmd_flags & REQ_QUIET)) {
+		if (!(req->cmd_flags & REQ_QUIET) && printk_ratelimit()) {
 			if (description)
 				scmd_printk(KERN_INFO, cmd, "%s\n",
 					    description);