Patchwork 'Device not ready' issue on mpt2sas since 3.1.10

Submitter James Bottomley
Date July 25, 2012, 7:55 p.m.
Message ID <1343246155.12094.59.camel@dabdike>
Permalink /patch/173252/
State Not Applicable
Delegated to: David Miller

Comments

James Bottomley - July 25, 2012, 7:55 p.m.
On Wed, 2012-07-25 at 10:17 -0700, Tejun Heo wrote:
> Hello, James.
> 
> On Wed, Jul 25, 2012 at 06:19:13PM +0400, James Bottomley wrote:
> > > I haven't consulted SAT but it seems like a bug in SAS driver or
> > > firmware.  If it's a driver bug, we better fix it there.  If a
> > > firmware bug, working around those is one of major roles of drivers,
> > > so I think setting allow_restart is fine.
> > 
> > Actually, I don't think so.  SAT-2 section 8.12.2 does say 
> > 
> >         if the device is in the stopped state as the result of
> >         processing a START STOP UNIT command (see 9.11), then the SATL
> >         shall terminate the TEST UNIT READY command with CHECK CONDITION
> >         status with the sense key set to NOT READY and the additional
> >         sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
> >         REQUIRED;
> > 
> > START STOP UNIT (with START=0) translates to STANDBY IMMEDIATE, and
> > that's what hdparm -y issues.  We don't see this in /drivers/ata because
> > TEST UNIT READY always returns success.
> 
> Urgh... ATA device in standby mode is ready for any command and
> definitely doesn't need an "initializing command".  Oh, well...

Well, it does in sleep mode ... which seems to most closely map to what
SCSI thinks of as a stopped unit. I checked the specs just in case there
was an error ... they all say STANDBY not SLEEP.

> > So it looks like the mpt2sas SAT is doing the correct thing and we only
> > don't see this problem in normal SATA devices because of a bug in the
> > libata-scsi SAT.
> 
> libata is inconsistent with the standard but I think the standard is
> wrong here. :(

Well, reading it, so do I.  Unfortunately, we get to deal with the world
as it is rather than as we would wish it to be.  We likely have this
problem with a lot of USB SATLs as well ...

It looks like a hack like this might be needed.

James

---



--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthias Prager - July 25, 2012, 11:56 p.m.
Hello James,

On 25.07.2012 21:55, James Bottomley wrote:
> It looks like a hack like this might be needed.
>
> James
>

<SNIP>

I don't yet understand all the code but I'm following your discussion
with Tejun: I've set up a minimal vm running gentoo with a mpt2sas
driven controller in passthrough mode. I've applied your proposed patch
against the vanilla 3.5.0 kernel (which includes Tejun's commit), and
I'm happy to report the problem does seem to get fixed by it.
Well, at least sending the SATA drive into standby using 'hdparm -y' now
works (according to 'hdparm -C') without these nasty I/O errors on later
I/O. That is to say, the drive wakes up again (e.g. from a 'fdisk -l
/dev/sda' command) and returns data.

--
Matthias
Robert Trace - July 26, 2012, 7:16 p.m.
On 07/25/2012 07:56 PM, Matthias Prager wrote:
> 
> I don't yet understand all the code but I'm following your discussion
> with Tejun: I've set up a minimal vm running gentoo with a mpt2sas
> driven controller in passthrough mode. I've applied your proposed patch
> against the vanilla 3.5.0 kernel (which includes Tejun's commit), and
> I'm happy to report the problem does seem to get fixed by it.

I can confirm this on my hardware as well with both 3.4.4 and 3.5.0.
Without James' patch the kernels will immediately drop the I/O and with
the patch both kernels will wake the SATA disks and then complete the
I/O successfully.

-- Robert
Robert Trace - Aug. 16, 2012, 6:26 p.m.
On 07/25/2012 03:55 PM, James Bottomley wrote:
> 
> Well, reading it, so do I.  Unfortunately, we get to deal with the world
> as it is rather than as we would wish it to be.  We likely have this
> problem with a lot of USB SATLs as well ...

Has this patch made it into the main git trees yet?

I haven't seen anything about it in nearly a month, but I've been using
James' patch since he posted it and the sleep/wakeup behavior seems
improved/correct.

-- Robert
Matthias Prager - Aug. 16, 2012, 8:24 p.m.
On 16.08.2012 20:26, Robert Trace wrote:
> On 07/25/2012 03:55 PM, James Bottomley wrote:
>>
>> Well, reading it, so do I.  Unfortunately, we get to deal with the world
>> as it is rather than as we would wish it to be.  We likely have this
>> problem with a lot of USB SATLs as well ...
> 
> Has this patch made it into the main git trees yet?

Not yet, but it is in James' scsi misc tree and, last I heard, was
scheduled for inclusion in the 3.6 kernel.

Anyway, here is his commit:
<http://git.kernel.org/?p=linux/kernel/git/jejb/scsi.git;a=commit;h=98dc81b0d6c483a3eb256764ae10f156ccefdbbb>

> 
> I haven't seen anything about it in nearly a month, but I've been using
> James' patch since he posted it and the sleep/wakeup behavior seems
> improved/correct.

I have been running smoothly with the patch too - problem solved I'd say :-)

> 
> -- Robert
> 

Robert Trace - Aug. 16, 2012, 8:33 p.m.
On 08/16/2012 04:24 PM, Matthias Prager wrote:
> 
> Not yet, but it is in James' scsi misc tree and, last I heard, was
> scheduled for inclusion in the 3.6 kernel.

Close enough. :-)  I didn't track the changes on the SCSI tree and I
just wanted to make sure that it didn't slip through the cracks.

Thanks to all involved for all of the help and a speedy fix!

-- Robert

Patch

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 4a6381c..7e59a7f 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -42,6 +42,8 @@ 
 
 #include <trace/events/scsi.h>
 
+static void scsi_eh_done(struct scsi_cmnd *scmd);
+
 #define SENSE_TIMEOUT		(10*HZ)
 
 /*
@@ -241,6 +243,14 @@  static int scsi_check_sense(struct scsi_cmnd *scmd)
 	if (! scsi_command_normalize_sense(scmd, &sshdr))
 		return FAILED;	/* no valid sense data */
 
+	if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
+		/* 
+		 * nasty: for mid-layer issued TURs, we need to return the
+		 * actual sense data without any recovery attempt.  For eh
+		 * issued ones, we need to try to recover and interpret
+		 */
+		return SUCCESS;
+
 	if (scsi_sense_is_deferred(&sshdr))
 		return NEEDS_RETRY;
 
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 56a9379..91d3366 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -764,6 +764,16 @@  static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
 	sdev->model = (char *) (sdev->inquiry + 16);
 	sdev->rev = (char *) (sdev->inquiry + 32);
 
+	if (strncmp(sdev->vendor, "ATA     ", 8) == 0) {
+		/* 
+		 * sata emulation layer device.  This is a hack to work around
+		 * the SATL power management specifications which state that
+		 * when the SATL detects the device has gone into standby
+		 * mode, it shall respond with NOT READY.
+		 */
+		sdev->allow_restart = 1;
+	}
+
 	if (*bflags & BLIST_ISROM) {
 		sdev->type = TYPE_ROM;
 		sdev->removable = 1;
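
Illustration (not part of the patch)

The two hunks above interact with the sense data quoted from SAT-2 earlier
in the thread: NOT READY (sense key 02h) with LOGICAL UNIT NOT READY,
INITIALIZING COMMAND REQUIRED (ASC/ASCQ 04h/02h). The following is a rough
user-space model of that decision, offered only as a sketch. The names
normalize_sense(), is_sat_device(), classify() and the enum values are
hypothetical and do not exist in the kernel; the "wake the error handler"
branch reflects my understanding of the existing NOT READY handling in
scsi_check_sense() and is an assumption here, not something the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OP_TEST_UNIT_READY 0x00
#define SK_NOT_READY       0x02

/* Hypothetical outcomes for this sketch; not the kernel's return codes. */
enum disposition {
	PASS_SENSE_TO_CALLER,	/* first hunk: hand raw sense back, no recovery */
	WAKE_ERROR_HANDLER,	/* error handler may then start the unit */
	OTHER_HANDLING		/* everything this model does not cover */
};

struct sense_hdr {
	uint8_t key, asc, ascq;
};

/* Decode fixed (70h/71h) or descriptor (72h/73h) format sense data. */
static bool normalize_sense(const uint8_t *buf, size_t len,
			    struct sense_hdr *out)
{
	uint8_t rc;

	if (len < 1)
		return false;
	rc = buf[0] & 0x7f;
	if ((rc == 0x70 || rc == 0x71) && len >= 14) {
		out->key = buf[2] & 0x0f;
		out->asc = buf[12];
		out->ascq = buf[13];
		return true;
	}
	if ((rc == 0x72 || rc == 0x73) && len >= 4) {
		out->key = buf[1] & 0x0f;
		out->asc = buf[2];
		out->ascq = buf[3];
		return true;
	}
	return false;
}

/*
 * Second hunk: INQUIRY bytes 8..15 hold the space-padded vendor
 * identification, which a SATL reports as "ATA" plus five spaces.
 */
static bool is_sat_device(const uint8_t *inquiry, size_t len)
{
	return len >= 16 && memcmp(inquiry + 8, "ATA     ", 8) == 0;
}

/* Model of the decision for one completed command that carries sense. */
static enum disposition classify(uint8_t opcode, bool issued_by_eh,
				 bool allow_restart,
				 const struct sense_hdr *s)
{
	assert(s != NULL);

	/* TEST UNIT READY issued outside the error handler: no recovery. */
	if (opcode == OP_TEST_UNIT_READY && !issued_by_eh)
		return PASS_SENSE_TO_CALLER;

	/* Stopped unit on a device that may be restarted. */
	if (allow_restart && s->key == SK_NOT_READY &&
	    s->asc == 0x04 && s->ascq == 0x02)
		return WAKE_ERROR_HANDLER;

	return OTHER_HANDLING;
}
```

In this model, a READ(10) (opcode 28h) that fails with 02h/04h/02h on a
device whose vendor string matched "ATA     " would be classified as
WAKE_ERROR_HANDLER, while a TEST UNIT READY issued outside the error
handler would get its sense data back untouched.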