Patchwork Storage related regression in linux-next 20120824

login
register
mail settings
Submitter Hugh Dickins
Date Sept. 9, 2012, 7:38 a.m.
Message ID <alpine.LSU.2.00.1209090012390.2416@eggly.anvils>
Download mbox | patch
Permalink /patch/182612/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Hugh Dickins - Sept. 9, 2012, 7:38 a.m.
On Mon, 27 Aug 2012, Arvydas Sidorenko wrote:
> On 08/27/2012 06:39 PM, Arvydas Sidorenko wrote:
> > > Can you pastebin 'dmesg' and 'lspci'?  Did this occur only once, or is
> > > it reproducible?
> > > 
> > >       Jeff
> > It does happen every time when booting into -next 20120824.

I don't know what went into next-20120824 versus next-20120813 (which
you reported to be good), but I'm seeing similar behaviour on PowerMac
G5 on Thursday's mmotm based on next-20120907 - critical target error,
root remounted read-only, the reboot with good kernel then has to fsck
(although fsck doesn't find anything interesting to fix in my case).

> > [   11.035530] sd 0:0:0:0: [sda]
> > [   11.035533] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > [   11.035535] sd 0:0:0:0: [sda]
> > [   11.035536] Sense Key : Illegal Request [current]
> > [   11.035539] sd 0:0:0:0: [sda]
> > [   11.035541] Add. Sense: Invalid field in cdb
> > [   11.035543] sd 0:0:0:0: [sda] CDB:
> > [   11.035544] Write(10): 2a 08 0b ad d1 78 00 00 08 00
> > [   11.035550] end_request: critical target error, dev sda, sector
> > 195940728
> > [   11.035552] end_request: critical target error, dev sda, sector
> > 195940728
> > [   11.035557] Aborting journal on device sda4-8.
> > [   11.046413] EXT4-fs error (device sda4): ext4_journal_start_sb:348:
> > Detected aborted journal
> > [   11.046418] EXT4-fs (sda4): Remounting filesystem read-only
> > 
> I believe the problem is in SCSI. Mode sense command catches my attention:
>     [    3.955397] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 10
> All logs from older kernels has different mode page code: 00 3a 00 00
> 
> sg_modes from broken kernel shows DPOFUA set, which is 0x10th bit.
> Anyone knowing SCSI better could tell if that might cause the problems?

I think you know your way around SCSI/libata much better than I do.

I just bisected linux-next, and it comes down to the commit below, which
introduces the regression for me, and I'm guessing for you also.  Maybe
it can be fixed up to satisfy us, but otherwise will have to be reverted:
we don't invert a default if it's going to break older working systems.

A good workaround for me meanwhile is to add boot option "libata.fua=0":
please try that (or reverting the commit) and let us know the result.

Thanks,
Hugh

commit 91895b786e631ab47b618c901231f22b5a44115b
Author: Zheng Liu <wenqing.lz@taobao.com>
Date:   Tue May 8 11:24:03 2012 +0800

    libata: enable SATA disk fua detection on default
    
    Currently, SATA disk fua detection is disabled on default because most of
    devices don't support this feature at that time.  With the development of
    technology, more and more SATA disks support this feature.  So now we can enable
    this detection on default.
    
    Although fua detection is defined as a kernel module parameter, it is too hard
    to set its value because it must be loaded and set before system starts up.
    That needs to modify initrd file.  So it is inconvenient for administrator who
    needs to manage a huge number of servers.
    
    Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
    Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arvydas Sidorenko - Sept. 9, 2012, 8:11 p.m.
> I think you know your way around SCSI/libata much better than I do.
>
> I just bisected linux-next, and it comes down to the commit below, which
> introduces the regression for me, and I'm guessing for you also.  Maybe
> it can be fixed up to satisfy us, but otherwise will have to be reverted:
> we don't invert a default if it's going to break older working systems.
>
> A good workaround for me meanwhile is to add boot option "libata.fua=0":
> please try that (or reverting the commit) and let us know the result.
>
> Thanks,
> Hugh
>
> commit 91895b786e631ab47b618c901231f22b5a44115b
> Author: Zheng Liu <wenqing.lz@taobao.com>
> Date:   Tue May 8 11:24:03 2012 +0800
>
>     libata: enable SATA disk fua detection on default
>
>     Currently, SATA disk fua detection is disabled on default because most of
>     devices don't support this feature at that time.  With the development of
>     technology, more and more SATA disks support this feature.  So now we can enable
>     this detection on default.
>
>     Although fua detection is defined as a kernel module parameter, it is too hard
>     to set its value because it must be loaded and set before system starts up.
>     That needs to modify initrd file.  So it is inconvenient for administrator who
>     needs to manage a huge number of servers.
>
>     Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
>     Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 5eee1c1..c3fbdca 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -135,9 +135,9 @@ int atapi_passthru16 = 1;
>  module_param(atapi_passthru16, int, 0444);
>  MODULE_PARM_DESC(atapi_passthru16, "Enable ATA_16 passthru for ATAPI devices (0=off, 1=on [default])");
>
> -int libata_fua = 0;
> +int libata_fua = 1;
>  module_param_named(fua, libata_fua, int, 0444);
> -MODULE_PARM_DESC(fua, "FUA support (0=off [default], 1=on)");
> +MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on [default])");
>
>  static int ata_ignore_hpa;
>  module_param_named(ignore_hpa, ata_ignore_hpa, int, 0644);

Indeed, disabling FUA explicitly solved the issue on my disk as well.
Hugh, what hard drive you have this issue on?

I believe there are two solutions:
- Revert FUA default back to '0'
- Start filling SATA drive blacklist in function:
> static int ata_dev_supports_fua(u16 *id)

Arvydas
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Sept. 9, 2012, 8:28 p.m.
On 09/09/2012 04:11 PM, Arvydas Sidorenko wrote:
>> I think you know your way around SCSI/libata much better than I do.
>>
>> I just bisected linux-next, and it comes down to the commit below, which
>> introduces the regression for me, and I'm guessing for you also.  Maybe
>> it can be fixed up to satisfy us, but otherwise will have to be reverted:
>> we don't invert a default if it's going to break older working systems.
>>
>> A good workaround for me meanwhile is to add boot option "libata.fua=0":
>> please try that (or reverting the commit) and let us know the result.
>>
>> Thanks,
>> Hugh
>>
>> commit 91895b786e631ab47b618c901231f22b5a44115b
>> Author: Zheng Liu <wenqing.lz@taobao.com>
>> Date:   Tue May 8 11:24:03 2012 +0800
>>
>>      libata: enable SATA disk fua detection on default
>>
>>      Currently, SATA disk fua detection is disabled on default because most of
>>      devices don't support this feature at that time.  With the development of
>>      technology, more and more SATA disks support this feature.  So now we can enable
>>      this detection on default.
>>
>>      Although fua detection is defined as a kernel module parameter, it is too hard
>>      to set its value because it must be loaded and set before system starts up.
>>      That needs to modify initrd file.  So it is inconvenient for administrator who
>>      needs to manage a huge number of servers.
>>
>>      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
>>      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
>>
>> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
>> index 5eee1c1..c3fbdca 100644
>> --- a/drivers/ata/libata-core.c
>> +++ b/drivers/ata/libata-core.c
>> @@ -135,9 +135,9 @@ int atapi_passthru16 = 1;
>>   module_param(atapi_passthru16, int, 0444);
>>   MODULE_PARM_DESC(atapi_passthru16, "Enable ATA_16 passthru for ATAPI devices (0=off, 1=on [default])");
>>
>> -int libata_fua = 0;
>> +int libata_fua = 1;
>>   module_param_named(fua, libata_fua, int, 0444);
>> -MODULE_PARM_DESC(fua, "FUA support (0=off [default], 1=on)");
>> +MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on [default])");
>>
>>   static int ata_ignore_hpa;
>>   module_param_named(ignore_hpa, ata_ignore_hpa, int, 0644);
>
> Indeed, disabling FUA explicitly solved the issue on my disk as well.
> Hugh, what hard drive you have this issue on?
>
> I believe there are two solutions:
> - Revert FUA default back to '0'
> - Start filling SATA drive blacklist in function:

I think the right thing to do for release is disable it (again), then we 
can try again later with better logic.

I'll send Linus a patch to disable.

It is entirely possible that this is a software problem, where we 
missing some detail turning on FUA (thereby engaging some less traveled 
core block layer machinery), or even a remote possibility of triggering 
a filesystem bug.

	Jeff





--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hugh Dickins - Sept. 9, 2012, 8:36 p.m.
On Sun, Sep 9, 2012 at 1:28 PM, Jeff Garzik <jgarzik@pobox.com> wrote:
> I'll send Linus a patch to disable.

Thanks, but no, the change in question hasn't reached Linus yet, it's
just a linux-next or mmotm thing - isn't it?

I'll reply again later, but just about to rush out, and thought it
best to clear this up quickly.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Garzik - Sept. 9, 2012, 8:50 p.m.
On 09/09/2012 04:36 PM, Hugh Dickins wrote:
> On Sun, Sep 9, 2012 at 1:28 PM, Jeff Garzik <jgarzik@pobox.com> wrote:
>> I'll send Linus a patch to disable.
>
> Thanks, but no, the change in question hasn't reached Linus yet, it's
> just a linux-next or mmotm thing - isn't it?

Yep, libata-dev#upstream.  Sorry, got my own branches confused for a 
second there.

	Jeff




--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hugh Dickins - Sept. 10, 2012, 5:43 p.m.
On Sun, 9 Sep 2012, Jeff Garzik wrote:
> On 09/09/2012 04:11 PM, Arvydas Sidorenko wrote:
> > > I think you know your way around SCSI/libata much better than I do.
> > > 
> > > I just bisected linux-next, and it comes down to the commit below, which
> > > introduces the regression for me, and I'm guessing for you also.  Maybe
> > > it can be fixed up to satisfy us, but otherwise will have to be reverted:
> > > we don't invert a default if it's going to break older working systems.
> > > 
> > > A good workaround for me meanwhile is to add boot option "libata.fua=0":
> > > please try that (or reverting the commit) and let us know the result.
> > > 
> > > Thanks,
> > > Hugh
> > > 
> > > commit 91895b786e631ab47b618c901231f22b5a44115b
> > > Author: Zheng Liu <wenqing.lz@taobao.com>
> > > Date:   Tue May 8 11:24:03 2012 +0800
> > > 
> > >      libata: enable SATA disk fua detection on default
> > > 
> > >      Currently, SATA disk fua detection is disabled on default because
> > > most of
> > >      devices don't support this feature at that time.  With the
> > > development of
> > >      technology, more and more SATA disks support this feature.  So now
> > > we can enable
> > >      this detection on default.
> > > 
> > >      Although fua detection is defined as a kernel module parameter, it
> > > is too hard
> > >      to set its value because it must be loaded and set before system
> > > starts up.
> > >      That needs to modify initrd file.  So it is inconvenient for
> > > administrator who
> > >      needs to manage a huge number of servers.
> > > 
> > >      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
> > >      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
> > > 
> > > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> > > index 5eee1c1..c3fbdca 100644
> > > --- a/drivers/ata/libata-core.c
> > > +++ b/drivers/ata/libata-core.c
> > > @@ -135,9 +135,9 @@ int atapi_passthru16 = 1;
> > >   module_param(atapi_passthru16, int, 0444);
> > >   MODULE_PARM_DESC(atapi_passthru16, "Enable ATA_16 passthru for ATAPI
> > > devices (0=off, 1=on [default])");
> > > 
> > > -int libata_fua = 0;
> > > +int libata_fua = 1;
> > >   module_param_named(fua, libata_fua, int, 0444);
> > > -MODULE_PARM_DESC(fua, "FUA support (0=off [default], 1=on)");
> > > +MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on [default])");
> > > 
> > >   static int ata_ignore_hpa;
> > >   module_param_named(ignore_hpa, ata_ignore_hpa, int, 0644);
> > 
> > Indeed, disabling FUA explicitly solved the issue on my disk as well.

Good, thanks for letting us know.

> > Hugh, what hard drive you have this issue on?

The machine is a PowerMac G5, and here are (I think)
the relevant lines from dmesg (from a libata.fua=0 bootup):

ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: WDC WD2500JS-41MVB1, 10.02E01, max UDMA/133
ata1.00: 488397168 sectors, multi 16: LBA48 
ata1.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access     ATA      WDC WD2500JS-41M 10.0 PQ: 0 ANSI: 5
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 0:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/232 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: [mac] sda1 sda2 sda3 sda4 sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 sda13 sda14 sda15
sd 0:0:0:0: [sda] Attached SCSI disk

I have not jotted down the more interesting lines near the critical
target failure on a bad libata.fua=1 boot, but can certainly do so
later if we think they may help. 

> > 
> > I believe there are two solutions:
> > - Revert FUA default back to '0'
> > - Start filling SATA drive blacklist in function:
> 
> I think the right thing to do for release is disable it (again), then we can
> try again later with better logic.
> 
> I'll send Linus a patch to disable.

We've got some time to play before it goes to Linus (agreed in later mail),
and we've got a good libata.fua=0 workaround for now, with only Arvydas and
me complaining (there was a report from Dieter Ries in the thread, but that
was actually on 3.6-rc3, so must be something else).

So I think it would be okay to keep the commit in your tree for the moment,
if anyone has ideas of what we could try out.  Looking at the code, I see
that libata.fua=1 _ought_ to be harmless, but somehow is not.

I notice in the dmesg above that the SCSI end is reporting "doesn't support
DPO or FUA" whereas the ATA must be believing that FUA is supported on that
device.

I'm probably making a fool of myself by speculating in this area,
but I wonder if we could update the ATA view from the SCSI view
once it's known.

> 
> It is entirely possible that this is a software problem, where we missing
> some detail turning on FUA (thereby engaging some less traveled core block
> layer machinery), or even a remote possibility of triggering a filesystem
> bug.

Please let me know if you have anything for me to try,
or just want more debug output to help.

Thanks,
Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zheng Liu - Sept. 12, 2012, 4:36 a.m.
On Sun, Sep 09, 2012 at 04:50:47PM -0400, Jeff Garzik wrote:
> On 09/09/2012 04:36 PM, Hugh Dickins wrote:
> >On Sun, Sep 9, 2012 at 1:28 PM, Jeff Garzik <jgarzik@pobox.com> wrote:
> >>I'll send Linus a patch to disable.
> >
> >Thanks, but no, the change in question hasn't reached Linus yet, it's
> >just a linux-next or mmotm thing - isn't it?
> 
> Yep, libata-dev#upstream.  Sorry, got my own branches confused for a
> second there.

Sorry for delay reply.  In our product system, we tested a lot of
different brands of SATA disks, and it is OK after fua is enabled on
default.  Certainly we can't test all disks. :-(

So I think we quite have to revert my patch now, and try to fix the
current problem.

Regards,
Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 5eee1c1..c3fbdca 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -135,9 +135,9 @@  int atapi_passthru16 = 1;
 module_param(atapi_passthru16, int, 0444);
 MODULE_PARM_DESC(atapi_passthru16, "Enable ATA_16 passthru for ATAPI devices (0=off, 1=on [default])");
 
-int libata_fua = 0;
+int libata_fua = 1;
 module_param_named(fua, libata_fua, int, 0444);
-MODULE_PARM_DESC(fua, "FUA support (0=off [default], 1=on)");
+MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on [default])");
 
 static int ata_ignore_hpa;
 module_param_named(ignore_hpa, ata_ignore_hpa, int, 0644);