Race to power off harming SATA SSDs

Message ID	20170508134807.498b3ee7@bbrezillon
State	Not Applicable
Headers	show Return-Path: <linux-mtd-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org> Date: Mon, 8 May 2017 13:48:07 +0200 From: Boris Brezillon <boris.brezillon@free-electrons.com> To: Richard Weinberger <richard.weinberger@gmail.com> Subject: Re: Race to power off harming SATA SSDs Message-ID: <20170508134807.498b3ee7@bbrezillon> In-Reply-To: <CAFLxGvzAmURHU_6K3gmBfOqHKgvQOkkO_+qgMKsmzhWZwwGnqw@mail.gmail.com> References: <20170410232118.GA4816@khazad-dum.debian.net> <20170410235206.GA28603@wtj.duckdns.org> <20170507204007.GA25628@atrey.karlin.mff.cuni.cz> <1494228094.6528.14.camel@infradead.org> <20170508092846.GA11029@amd> <1494236048.6528.32.camel@infradead.org> <20170508104953.GA17773@amd> <CAFLxGvzAmURHU_6K3gmBfOqHKgvQOkkO_+qgMKsmzhWZwwGnqw@mail.gmail.com> MIME-Version: 1.0 summary: Content analysis details: (-1.9 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 SPF_PASS SPF: sender matches SPF record -0.0 RP_MATCHES_RCVD Envelope sender domain matches handover relay domain -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] Precedence: list Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>, "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>, Hans de Goede <hdegoede@redhat.com>, LKML <linux-kernel@vger.kernel.org>, linux-ide@vger.kernel.org, "linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>, Pavel Machek <pavel@ucw.cz>, Tejun Heo <tj@kernel.org>, David Woodhouse <dwmw2@infradead.org> Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-mtd" <linux-mtd-bounces@lists.infradead.org> Errors-To: linux-mtd-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org

Message ID

20170508134807.498b3ee7@bbrezillon

State

Not Applicable

Headers

Date: Mon, 8 May 2017 13:48:07 +0200
From: Boris Brezillon <boris.brezillon@free-electrons.com>
To: Richard Weinberger <richard.weinberger@gmail.com>
Subject: Re: Race to power off harming SATA SSDs
Message-ID: <20170508134807.498b3ee7@bbrezillon>
In-Reply-To: <CAFLxGvzAmURHU_6K3gmBfOqHKgvQOkkO_+qgMKsmzhWZwwGnqw@mail.gmail.com>
References: <20170410232118.GA4816@khazad-dum.debian.net>
	<20170410235206.GA28603@wtj.duckdns.org>
	<20170507204007.GA25628@atrey.karlin.mff.cuni.cz>
	<1494228094.6528.14.camel@infradead.org>
	<20170508092846.GA11029@amd>
	<1494236048.6528.32.camel@infradead.org>
	<20170508104953.GA17773@amd>
	<CAFLxGvzAmURHU_6K3gmBfOqHKgvQOkkO_+qgMKsmzhWZwwGnqw@mail.gmail.com>
MIME-Version: 1.0
Precedence: list
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	Hans de Goede <hdegoede@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>, linux-ide@vger.kernel.org,
	"linux-mtd@lists.infradead.org" <linux-mtd@lists.infradead.org>,
	Pavel Machek <pavel@ucw.cz>, Tejun Heo <tj@kernel.org>,
	David Woodhouse <dwmw2@infradead.org>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-mtd" <linux-mtd-bounces@lists.infradead.org>
Errors-To: linux-mtd-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org

Commit Message

Boris Brezillon May 8, 2017, 11:48 a.m. UTC

On Mon, 8 May 2017 13:06:17 +0200
Richard Weinberger <richard.weinberger@gmail.com> wrote:

> On Mon, May 8, 2017 at 12:49 PM, Pavel Machek <pavel@ucw.cz> wrote:
> > Aha, nice, so it looks like ubifs is a step back here.
> >
> > 'clean marker' is a good idea... empty pages have plenty of space.  
> 
> If UBI (not UBIFS) faces an empty block, it also re-erases it.

Unfortunately, that's not the case, though UBI can easily be patched
to do that (see below).

> The EC header is uses as clean marker.

That is true. If the EC header has been written to a block, that means
this block has been correctly erased.

> 
> > How do you handle the issue during regular write? Always ignore last
> > successfully written block?  

I guess UBIFS can know what was written last, because of the log-based
approach + the seqnum stored along with FS nodes, but I'm pretty sure
UBIFS does not re-write the last written block in case of an unclean
mount. Richard, am I wrong?

> 
> The last page of a block is inspected and allowed to be corrupted.

Actually, it's not really about corrupted pages, it's about pages that
might become unreadable after a few reads.

> 
> > Do you handle "paired pages" problem on MLC?  
> 
> Nope, no MLC support in mainline so far.

Richard and I have put a lot of effort to reliably support MLC NANDs in
mainline, unfortunately this projects has been paused. You can access
the last version of our work here [1] if you're interested (it's
clearly not in a shippable state ;-)).

[1]https://github.com/bbrezillon/linux-sunxi/commits/bb/4.7/ubi-mlc

--->8---

Comments

Boris Brezillon May 8, 2017, 11:55 a.m. UTC | #1

On Mon, 8 May 2017 13:48:07 +0200
Boris Brezillon <boris.brezillon@free-electrons.com> wrote:

> On Mon, 8 May 2017 13:06:17 +0200
> Richard Weinberger <richard.weinberger@gmail.com> wrote:
> 
> > On Mon, May 8, 2017 at 12:49 PM, Pavel Machek <pavel@ucw.cz> wrote:  
> > > Aha, nice, so it looks like ubifs is a step back here.
> > >
> > > 'clean marker' is a good idea... empty pages have plenty of space.    
> > 
> > If UBI (not UBIFS) faces an empty block, it also re-erases it.  
> 
> Unfortunately, that's not the case, though UBI can easily be patched
> to do that (see below).

Sorry for the noise, I was wrong, UBI already re-erases empty blocks
[1].

[1]http://elixir.free-electrons.com/linux/latest/source/drivers/mtd/ubi/attach.c#L983

Richard Weinberger May 8, 2017, 12:13 p.m. UTC | #2

Boris,

Am 08.05.2017 um 13:48 schrieb Boris Brezillon:
>>> How do you handle the issue during regular write? Always ignore last
>>> successfully written block?  
> 
> I guess UBIFS can know what was written last, because of the log-based
> approach + the seqnum stored along with FS nodes, but I'm pretty sure
> UBIFS does not re-write the last written block in case of an unclean
> mount. Richard, am I wrong?

Yes. UBIFS has the machinery but uses it differently. When it faces ECC
errors while replying the journal it can recover good data from the LEB.
It assumes that an interrupted write leads always to ECC errors.

>>
>> The last page of a block is inspected and allowed to be corrupted.
> 
> Actually, it's not really about corrupted pages, it's about pages that
> might become unreadable after a few reads.

As stated before, it assumes an ECC error from an interrupted read.

We could automatically re-write everything in UBIFS that was written last
but we don't have this information for data UBI itself wrote since UBI has
no journal.

If unstable bit can be triggered with current systems we can think of a clever
trick to deal with that. So far nobody was able to show me the problem.

Thanks,
//richard

diff --git a/drivers/mtd/ubi/attach.c b/drivers/mtd/ubi/attach.c
index 93ceea4f27d5..3d76941c9570 100644
--- a/drivers/mtd/ubi/attach.c
+++ b/drivers/mtd/ubi/attach.c
@@ -1121,21 +1121,20 @@  static int scan_peb(struct ubi_device *ubi, struct ubi_attach_info *ai,
                        return err;
                goto adjust_mean_ec;
        case UBI_IO_FF_BITFLIPS:
+       case UBI_IO_FF:
+               /*
+                * Always erase the block if the EC header is empty, even if
+                * no bitflips were reported because otherwise we might
+                * expose ourselves to the 'unstable bits' issue described
+                * here:
+                *
+                * http://www.linux-mtd.infradead.org/doc/ubifs.html#L_unstable_bits
+                */
                err = add_to_list(ai, pnum, UBI_UNKNOWN, UBI_UNKNOWN,
                                  ec, 1, &ai->erase);
                if (err)
                        return err;
                goto adjust_mean_ec;
-       case UBI_IO_FF:
-               if (ec_err || bitflips)
-                       err = add_to_list(ai, pnum, UBI_UNKNOWN,
-                                         UBI_UNKNOWN, ec, 1, &ai->erase);
-               else
-                       err = add_to_list(ai, pnum, UBI_UNKNOWN,
-                                         UBI_UNKNOWN, ec, 0, &ai->free);
-               if (err)
-                       return err;
-               goto adjust_mean_ec;
        default:
                ubi_err(ubi, "'ubi_io_read_vid_hdr()' returned unknown code %d",
                        err);

Race to power off harming SATA SSDs

Commit Message

Comments

Patch